When you go on a trip, you might make use of a road map-either the oldfashioned, fold-out kind or an electronic version presented by a navigation system. Whatever form it takes, the map is not the land over which you travel, but rather a representation of that land. The map has captured some of the information needed to accomplish the goal of getting from one place to another. 当你外出旅行时,你可能会使用路线图--无论是老式的折叠式地图,还是导航系统提供的电子版地图。无论采取哪种形式,地图都不是你旅行所经过的土地,而是这片土地的代表。地图记录了实现从一个地方到另一个地方的目标所需的部分信息。
Likewise, the data we need to store and manage on a computer must be represented in a way that captures the essence of the information we care about, and it must do so in a form convenient for computer processing. Building on the fundamental concepts of the binary number system established in Chapter 2, this chapter explores how we represent and store the various kinds of data a computer manages. 同样,我们需要在计算机上存储和管理的数据,其表示方式必须能够捕捉到我们所关心的信息的本质,而且必须以方便计算机处理的形式来表示。本章将以第 2 章中建立的二进制数系统基本概念为基础,探讨如何表示和存储计算机管理的各种数据。
GOALS 目标
After studying this chapter, you should be able to: 学完本章后,您应该能够
distinguish between analog and digital data. 区分模拟数据和数字数据
explain data compression and calculate compression ratios. 解释数据压缩并计算压缩率。
explain the binary formats for negative and floating-point values. 解释负值和浮点数值的二进制格式。
describe the characteristics of the ASCII and Unicode character sets. 描述 ASCII 和 Unicode 字符集的特征。
perform various types of text compression. 执行各种类型的文本压缩。
explain the nature of sound and its representation. 解释声音的性质及其表现形式。
explain how RGB values define a color. 解释 RGB 值如何定义颜色。
distinguish between raster and vector graphics. 区分光栅和矢量图形
explain temporal and spatial video compression. 解释时间和空间视频压缩。
3.1 Data and Computers 3.1 数据和计算机
Without data, computers would be useless. Every task a computer undertakes deals with managing data in some way. Therefore, our need to represent and organize data in appropriate ways is paramount. 没有数据,计算机将一无是处。计算机所执行的每项任务都以某种方式涉及数据管理。因此,我们需要以适当的方式表示和组织数据。
Let’s start by distinguishing the terms data and information. Although these terms are often used interchangeably, making the distinction is sometimes useful, especially in computing. Data is basic values or facts, whereas information is data that has been organized and/or processed in a way that is useful in solving some kind of problem. Data can be unstructured and lack context. Information helps us answer questions (it “informs”). This distinction, of course, is relative to the needs of the user, but it captures the essence of the role that computers play in helping us solve problems. 让我们先来区分一下数据和信息这两个术语。虽然这两个术语经常被互换使用,但进行区分有时还是很有用的,尤其是在计算机领域。数据是基本的数值或事实,而信息则是经过组织和/或处理的数据,其方式有助于解决某种问题。数据可能是非结构化的,也可能缺乏上下文。信息可以帮助我们回答问题("提供信息")。当然,这种区分是相对于用户的需求而言的,但它抓住了计算机在帮助我们解决问题方面所起作用的本质。
Data Basic values or facts 数据 基本数值或事实
Information Data that has been organized or processed in a useful manner 信息 以有用方式组织或处理的数据
In this chapter, we focus on representing different types of data. In later chapters, we discuss the various ways to organize data so as to solve particular types of problems. 在本章中,我们将重点讨论如何表示不同类型的数据。在后面的章节中,我们将讨论组织数据以解决特定类型问题的各种方法。
In the not-so-distant past, computers dealt almost exclusively with numeric and textual data. Today, however, computers are truly multime dia devices, dealing with a vast array of information categories. Computers store, present, and help us modify many different types of data: 在不久的过去,计算机几乎只处理数字和文本数据。但如今,计算机已成为真正的多功能设备,可以处理大量的信息类别。计算机可以存储、呈现并帮助我们修改多种不同类型的数据:
Multimedia Several different media types 多媒体 几种不同的媒体类型
■ Numbers 数字
Text 文本
■ Audio 音频
Images and graphics 图像和图形
Video 视频
Ultimately, all of this data is stored as binary digits. Each document, picture, and sound bite is somehow represented as strings of 1 s and 0 s . This chapter explores each of these types of data in turn and discusses the basic ideas behind the ways in which we represent these types of data on a computer. 最终,所有这些数据都以二进制数字的形式存储。本章将依次探讨每种类型的数据,并讨论我们在计算机上表示这些类型数据的方法背后的基本思想。
We can’t discuss data representation without also talking about data compression-reducing the amount of space needed to store a piece of data. In the past, we needed to keep data small because of storage limitations. Today, computer storage is relatively cheap-but now we have an even more pressing reason to shrink our data: the need to share it with others. The Web and its underlying networks have inherent bandwidth restrictions that define the maximum number of bits or bytes that can be transmitted from one place to another in a fixed amount of time. 在讨论数据表示时,我们不能不谈到数据压缩--减少存储数据所需的空间。过去,由于存储空间的限制,我们需要将数据压缩得很小。如今,计算机存储空间相对便宜,但我们现在有一个更迫切的理由来缩小数据:需要与他人共享数据。网络及其底层网络具有固有的带宽限制,规定了在固定时间内从一个地方传输到另一个地方的比特或字节的最大数量。
Data compression Reducing the amount of space needed to store a piece of data 数据压缩 减少储存数据所需的空间
Bandwidth The number of bits or bytes that can be transmitted from one place to another in a fixed amount of time 带宽 在固定时间内可从一个地方传输到另一个地方的比特或字节的数量
The compression ratio gives an indication of how much compression occurs. The compression ratio is the size of the compressed data divided by the size of the original data. The values could be in bits or characters (or whatever is appropriate), as long as both values measure the same thing. The ratio should result in a number between 0 and 1 . The closer the ratio is to zero, the tighter the compression. 压缩比表示压缩的程度。压缩比是压缩数据的大小除以原始数据的大小。数值的单位可以是比特或字符(或任何合适的单位),只要这两个值衡量的是同一件事。比值应介于 0 和 1 之间。比值越接近零,压缩就越紧密。
Compression ratio The size of the compressed data divided by the size of the uncompressed data 压缩比 压缩数据的大小除以未压缩数据的大小
A data compression technique can be lossless, which means the data can be retrieved without losing any of the original information, or it can be lossy, in which case some information is lost in the process of compaction. Although we never want to lose information, in some cases this loss is acceptable. When 数据压缩技术可以是无损的,即在检索数据时不会丢失任何原始信息;也可以是有损的,即在压缩过程中会丢失一些信息。虽然我们绝不希望丢失信息,但在某些情况下,这种损失是可以接受的。当
dealing with data representation and compression, we always face a tradeoff between accuracy and size. 在处理数据表示和压缩时,我们总是要在准确性和大小之间做出权衡。
Lossless compression A data compression technique in which there is no loss of information 无损压缩 一种不会丢失信息的数据压缩技术
Lossy compression A data compression technique in which there is loss of information 有损压缩 一种数据压缩技术,会损失信息
Analog and Digital Data 模拟和数字数据
The natural world, for the most part, is continuous and infinite. A number line is continuous, with values growing infinitely large and small. That is, you can always come up with a number that is larger or smaller than any given number. Likewise, the numeric space between two integers is infinite. For instance, any number can be divided in half. But the world is not just infinite in a mathematical sense. The spectrum of colors is a continuous rainbow of infinite shades. Objects in the real world move through continuous and infinite space. Theoretically, you could always close the distance between you and a wall by half, and you would never actually reach the wall. 自然界在大多数情况下是连续和无限的。数线是连续的,数值无限大或无限小。也就是说,你总能想出一个比任何给定数字更大或更小的数字。同样,两个整数之间的数值空间也是无限的。例如,任何数字都可以对半分。但是,世界并不仅仅是数学意义上的无限。光谱是由无限深浅组成的连续彩虹。现实世界中的物体在连续而无限的空间中运动。从理论上讲,你可以将你与墙壁之间的距离缩短一半,但实际上你永远也到不了墙壁。
Computers, by contrast, are finite. Computer memory and other hardware devices have only so much room to store and manipulate a certain amount of data. We always fail in our attempt to represent an infinite world on a finite machine. The goal, then, is to represent enough of the world to satisfy our computational needs and our senses of sight and sound. We want to make our representations good enough to get the job done, whatever that job might be. 相比之下,计算机是有限的。计算机内存和其他硬件设备只有这么大的空间来存储和处理一定量的数据。我们试图在有限的机器上呈现无限的世界,却总是以失败告终。因此,我们的目标是呈现足够多的世界,以满足我们的计算需求以及视觉和听觉。我们要让我们的表征足够好,以便完成工作,无论这项工作是什么。
Data can be represented in one of two ways: analog or digital. Analog data is a continuous representation, analogous to the actual information it represents. Digital data is a discrete representation, breaking the information up into separate elements. 数据可以用两种方式之一表示:模拟或数字。模拟数据是一种连续的表示法,类似于它所代表的实际信息。数字数据是一种离散的表示法,将信息分割成独立的元素。
Analog data A continuous representation of data 模拟数据 数据的连续表示
Digital data A discrete representation of data 数字数据 数据的离散表示
A mercury thermometer is an analog device. The mercury rises in a continuous flow in the tube in direct proportion to the temperature. We calibrate and mark the tube so that we can read the current temperature, usually as an integer such as 75 degrees Fahrenheit. However, the mercury in such a thermometer is actually rising in a continuous manner between degrees. At some point in time, the temperature is actually 74.568 degrees Fahrenheit, and the mercury is accurately indicating that, even if our markings are not detailed enough to note such small changes. See FIGURE 3.1. 水银温度计是一种模拟设备。水银在管内不断上升,与温度成正比。我们对管子进行校准和标记,这样就能读出当前的温度,通常是一个整数,如华氏 75 度。然而,这种温度计中的水银实际上是在度数之间不断上升的。在某一时刻,温度实际上是 74.568 华氏度,水银正在准确地显示这一温度,即使我们的标记不够详细,无法注意到如此微小的变化。参见图 3.1。
FIGURE 3.1 A mercury thermo meter continually rises in direct proportion to the temperature 图 3.1 水银温度计的持续升高与温度成正比
Analog data is directly proportional to the continuous, infinite world around us. Computers, therefore, cannot work well with analog data. Instead, we digitize data by breaking it into pieces and representing those pieces separately. Each of the representations discussed in this chapter takes a continuous entity and separates it into discrete elements. Those discrete elements are then individually represented using binary digits. 模拟数据与我们周围连续、无限的世界成正比。因此,计算机无法很好地处理模拟数据。取而代之的是,我们通过将数据分解成不同的片段并分别表示这些片段来实现数据的数字化。本章讨论的每一种表示方法都是将连续的实体拆分成离散的元素。然后用二进制数字来单独表示这些离散元素。
Digitize The act of breaking information down into discrete pieces 数字化 将信息分解成离散片段的行为
But why do we use the binary system? We know from Chapter 2 that binary is just one of many equivalent number systems. Couldn’t we use, say, the decimal number system, with which we are already more familiar? We could. In fact, it’s been done. Computers have been built that are based on other number systems. However, modern computers are designed to use and manage binary values because the devices that store and manage the data are far less expensive and far more reliable if they have to represent only one of two possible values. 但我们为什么要使用二进制系统呢?从第 2 章中我们知道,二进制只是众多等价数制中的一种。难道我们不能使用我们已经比较熟悉的十进制数系吗?可以。事实上,我们已经做到了。我们已经制造出了基于其他数制的计算机。不过,现代计算机的设计是为了使用和管理二进制数值,因为如果存储和管理数据的设备只能表示两种可能数值中的一种,那么它们的成本就会低得多,可靠性也会高得多。
Also, electronic signals are far easier to maintain if they transfer only binary data. An analog signal continually fluctuates up and down in voltage, but a digital signal has only a high or low state, corresponding to the two binary digits. See FIGURE 3.2. 此外,如果电子信号只传输二进制数据,则更易于维护。模拟信号的电压不断上下波动,但数字信号只有高电平或低电平状态,与两位二进制数相对应。参见图 3.2。
All electronic signals (both analog and digital) degrade as they move down a line. That is, the voltage of the signal fluctuates due to environmental effects. The trouble is that as soon as an analog signal degrades, information is lost. Because any voltage level within the range is valid, it’s impossible to know what the original signal state was or even that it changed at all. 所有电子信号(包括模拟信号和数字信号)都会随着线路的移动而衰减。也就是说,信号的电压会因环境影响而波动。问题是,模拟信号一旦衰减,信息就会丢失。因为在这个范围内的任何电压电平都是有效的,所以不可能知道最初的信号状态是什么,甚至不可能知道它是否发生了变化。
Digital signals, by contrast, jump sharply between two extremes-a behavior referred to as pulse-code modulation (PCM). A digital signal can become degraded by quite a bit before any information is lost, because any voltage value above a certain threshold is considered a high value, and any value below that threshold is considered a low value. Periodically, a digital signal is reclocked to regain its original shape. As long as it is reclocked before too much degradation occurs, no information is lost. FIGURE 3.3 shows the degradation effects of analog and digital signals. 相比之下,数字信号在两个极端之间急剧跳变,这种行为被称为脉冲编码调制(PCM)。在丢失任何信息之前,数字信号可能会有相当程度的劣化,因为任何高于某个阈值的电压值都被视为高值,而任何低于该阈值的电压值都被视为低值。数字信号会定期重新计时,以恢复其原来的状态。只要在信号衰减过快之前重新计时,就不会丢失任何信息。图 3.3 显示了模拟信号和数字信号的衰减效应。
Pulse-code modulation Variation in a signal that jumps sharply between two extremes 脉冲编码调制 信号在两个极端之间急剧跳变
Reclock The act of reasserting an original digital signal before too much degradation occurs 重新锁定 在原始数字信号发生严重衰减之前重新确认原始数字信号的行为
FIGURE 3.2 An analog signal and a digital signal 图 3.2 模拟信号和数字信号
FIGURE 3.3 Degradation of analog and digital signals 图 3.3 模拟和数字信号的衰减
Binary Representations 二进制表示法
As we investigate the details of representing particular types of data, it’s important to remember the inherent nature of using binary. One bit can be either 0 or 1 . There are no other possibilities. Therefore, one bit can represent only two things. For example, if we wanted to classify a food as being either sweet or sour, we would need only one bit to do it. We could say that if the bit is 0 , the food is sweet, and if the bit is 1 , the food is sour. But if we want to have additional classifications (such as spicy), one bit is not sufficient. 在我们研究表示特定类型数据的细节时,记住使用二进制的固有特性非常重要。一位只能是 0 或 1 。没有其他可能性。因此,一位只能代表两种情况。例如,如果我们想把一种食物分为甜的或酸的,我们只需要一个比特就可以做到。我们可以说,如果该位为 0,食物就是甜的;如果该位为 1,食物就是酸的。但是,如果我们想有额外的分类(比如辣),一个比特就不够了。
To represent more than two things, we need multiple bits. Two bits can represent four things because four combinations of 0 and 1 can be made from two bits: 00,01,1000,01,10, and 11 . For instance, if we want to represent which of four possible gears a car is in (park, drive, reverse, or neutral), we need only two bits: Park could be represented by 00 , drive by 01 , reverse by 10 , and neutral by 11 . The actual mapping between bit combinations and the thing each combination represents is sometimes irrelevant ( 00 could be used to represent reverse, if you prefer), although sometimes the mapping can be meaningful and important, as we discuss in later sections of this chapter. 要表示两个以上的事物,我们需要多个比特。两个比特可以表示四种事物,因为两个比特可以组成 0 和 1 的四种组合: 00,01,1000,01,10 和 11。例如,如果我们想表示一辆汽车在四个档位(驻车、驱动、倒车或空档)中的哪个档位,我们只需要两个比特:驻车档用 00 表示,驱动档用 01 表示,倒档用 10 表示,空档用 11 表示。位组合与每种组合所代表的事物之间的实际映射有时并不相关(如果你愿意,00 可以用来表示倒车),但有时映射可能是有意义和重要的,我们将在本章后面的章节中讨论。
If we want to represent more than four things, we need more than two bits. For example, three bits can represent eight things because eight combinations of 0 and 1 can be made from three bits. Likewise, four bits can represent 16 things, five bits can represent 32 things, and so on. See FIGURE 3.4. In the 如果我们想表示四个以上的事物,就需要两个以上的比特。例如,三个比特可以表示八个事物,因为三个比特可以组成八个 0 和 1 的组合。同样,四个比特可以表示 16 种事物,五个比特可以表示 32 种事物,依此类推。参见图 3.4。在
figure, note that the bit combinations are simply counting in binary as you move down a column. 请注意,在向下移动一列时,位数组合只是以二进制计数。
In general, nn bits can represent 2^(n)2^{n} things because 2^(n)2^{n} combinations of 0 and 1 can be made from nn bits. Every time we increase the number of available bits by 1 , we double the number of things we can represent. 一般来说, nn 比特可以表示 2^(n)2^{n} 事物,因为 2^(n)2^{n} 0 和 1 的组合可以由 nn 比特组成。每增加 1 个可用比特,我们可以表示的事物数量就会增加一倍。
Let’s turn the question around. How many bits do you need to represent, say, 25 unique things? Well, four bits wouldn’t be enough because four bits can represent only 16 things. We would have to use at least five bits, which would allow us to represent 32 things. Given that we need to represent only 25 things, some of the bit combinations would not have a valid interpretation. 让我们把问题反过来。比方说,你需要多少比特来表示 25 种独特的事物?那么,四个比特是不够的,因为四个比特只能表示 16 种事物。我们至少要使用五个比特,这样才能表示 32 种事物。考虑到我们只需要表示 25 种事物,一些比特组合将无法进行有效的解释。
Keep in mind that even though we may technically need only a certain minimum number of bits to represent a set of items, we may allocate more than that for the storage of data. There is a minimum number of bits that a computer architecture can address and move around at one time, and it is usually a power of 2 , such as 8,16 , or 32 bits. Therefore, the minimum amount of storage given to any type of data is allocated in multiples of that value. 请记住,尽管我们在技术上可能只需要一定的最小位数来表示一组项目,但我们可能会分配比这更多的位数来存储数据。计算机体系结构一次可寻址和移动的最小位数通常是 2 的幂次,如 8、16 或 32 位。因此,任何类型数据的最小存储量都是以该值的倍数分配的。
3.2 Representing Numeric Data 3.2 表示数值数据
Numeric values are the most prevalent type of data used in a computer system. Unlike with other types of data, there may seem to be no need to come up with a clever mapping between binary codes and numeric data. Because binary is a number system, a natural relationship exists between the numeric data and the binary values that we store to represent them. This is true, in general, for positive integer data. The basic issues regarding integer conversions were covered in Chapter 2 in the general discussion of the binary system and its equivalence to other bases. However, we have other issues regarding the representation of numeric data to consider at this point. Integers are just the beginning in terms of numeric data. This section discusses the representation of negative and noninteger values. 数值是计算机系统中最常用的数据类型。与其他类型的数据不同,二进制代码和数值数据之间似乎不需要巧妙的映射。因为二进制是一种数字系统,数字数据与我们存储的二进制值之间存在着一种自然的关系。一般来说,正整数数据也是如此。有关整数转换的基本问题已在第 2 章关于二进制系统及其与其他进制等价的一般性讨论中阐述。不过,我们现在还要考虑有关数字数据表示的其他问题。整数只是数值数据的开始。本节将讨论负值和非整数值的表示。
Representing Negative Values 表示负值
Aren’t negative numbers just numbers with a minus sign in front? Perhaps. That is certainly one valid way to think about them. Let’s explore the issue of negative numbers and discuss appropriate ways to represent them on a computer. 负数不就是前面带减号的数字吗?也许吧。这当然是一种有效的思考方式。让我们来探讨负数的问题,并讨论在计算机上表示负数的适当方法。
Signed-Magnitude Representation 带符号幅度表示法
You have used the signed-magnitude representation of numbers since you first learned about negative numbers in grade school. In the traditional decimal system, a sign ( + or - ) is placed before a number’s value, although the positive sign is often assumed. The sign represents the ordering, and the digits represent 自从你在小学时第一次认识负数,你就开始使用带符号的数字表示法了。在传统的十进制中,数字的数值前要加一个符号(+ 或-),不过通常都是正号。符号代表排序,数字代表
the magnitude of the number. The classic number line looks something like this, in which a negative sign means that the number is to the left of zero and the positive number is to the right of zero: 数字的大小。经典的数字线是这样的:负号表示数字在零的左边,正号表示数字在零的右边:
Signed-magnitude representation Number representation in which the sign represents the ordering of the number (negative and positive) and the value represents the magnitude 带符号幅度表示法 符号代表数字的排序(负数和正数),数值代表幅度的数字表示法
Performing addition and subtraction with signed integer numbers can be described as moving a certain number of units in one direction or another. To add two numbers, you find the first number on the scale and move in the direction of the sign of the second as many units as specified. Subtraction is done in a similar way, moving along the number line as dictated by the sign and the operation. In grade school, you soon graduated to doing addition and subtraction without using the number line. 用有符号整数进行加减法运算,可以说是朝一个方向移动一定数量的单位。要将两个数相加,需要在刻度上找到第一个数,然后朝第二个数的符号方向移动指定的单位数。减法也是用类似的方法,根据符号和运算的要求沿着数线移动。到了小学,你很快就可以不用数线做加减法了。
There is a problem with the signed-magnitude representation: There are two representations of zero-plus zero and minus zero. The idea of a negative zero doesn’t necessarily bother us; we just ignore it. However, two representations of zero within a computer can cause unnecessary complexity, so other representations of negative numbers are used. Let’s examine another alternative. 带符号的幅度表示法有问题:零有两种表示法--正零和负零。负零的概念并不一定会困扰我们,我们只是忽略了它。但是,在计算机中,两种零表示法会造成不必要的复杂性,因此我们使用了其他负数表示法。让我们来看看另一种选择。
Fixed-Sized Numbers 固定大小的数字
If we allow only a fixed number of values, we can represent numbers as just integer values, where half of them represent negative numbers. The sign is determined by the magnitude of the number. For example, if the maximum number of decimal digits we can represent is two, we can let 1 through 49 be the positive numbers 1 through 49 and let 50 through 99 represent the negative numbers -50 through -1 . Let’s take the number line and number the negative 如果我们只允许固定数量的数值,我们就可以只用整数值来表示数字,其中一半表示负数。符号由数字的大小决定。例如,如果我们能表示的最大小数位数是两位,我们可以让 1 到 49 表示正数 1 到 49,让 50 到 99 表示负数 -50 到 -1 。让我们在数线上给负数编号
values on the top according to this scheme: 根据这一方案,顶部的数值
To perform addition within this scheme, you just add the numbers together and discard any carry. Adding positive numbers should be okay. Let’s try adding a positive number and a negative number, a negative number and a positive number, and two negative numbers. These are shown in the following table in signed-magnitude and in this scheme (the carries are discarded): 要在此方案中执行加法运算,只需将数字相加,然后舍弃任何进位。正数相加应该没问题。让我们试着把一个正数和一个负数、一个负数和一个正数以及两个负数相加。下表显示了这些加法的带符号幅度和本方案(舍弃了进位):
What about subtraction, using this scheme for representing negative numbers? The key is in the relationship between addition and subtraction: A-B=A+(-\mathrm{A}-\mathrm{B}=\mathrm{A}+(-BB ). We can subtract one number from another by adding the negative of the second to the first: 那么,用这种表示负数的方法来做减法呢?关键在于加法和减法之间的关系: A-B=A+(-\mathrm{A}-\mathrm{B}=\mathrm{A}+(-BB )。我们可以用一个数减去另一个数,方法是将第二个数的负数加到第一个数上:
In this example, we have assumed a fixed size of 100 values and kept our numbers small enough to use the number line to calculate the negative representation of a number. However, you can also use a formula to compute the negative representation: 在本例中,我们假定数值的大小固定为 100,并将数字保持在足够小的范围内,以便使用数列来计算数字的负表示。不过,您也可以使用公式来计算负数的表示:
Negative (I)=10^(k)-I(I)=10^{k}-I, where kk is the number of digits 负数 (I)=10^(k)-I(I)=10^{k}-I ,其中 kk 为位数
Let’s apply this formula to -3 in our two-digit representation: 让我们把这个公式应用到两位数表示的 -3 中: -(3)=10^(2)-3=97-(3)=10^{2}-3=97
What about a three-digit representation? 三位数表示法如何? -(3)=10^(3)-3=997-(3)=10^{3}-3=997
This representation of negative numbers is called the ten’s complement. Although humans tend to think in terms of sign and magnitude to represent numbers, the complement strategy is actually easier in some ways when it comes to electronic calculations. Because we store everything in a modern computer in binary, we use the binary equivalent of the ten’s complement, called the two’s complement. 这种表示负数的方法被称为十进制补码。虽然人类倾向于用符号和大小来表示数字,但在电子计算中,补码策略在某些方面其实更简单。由于现代计算机中的所有数据都以二进制存储,因此我们使用相当于十进制的二进制,即二进制。
Ten’s complement A representation of negative numbers such that the negative of II is 10 raised to kk minus II 十的补码 负数的表示方法,如 II 的负数是 10 升至 kk 减去 II
Two's Complement 两人互补
Let’s assume that a number must be represented in eight bits, seven for the number and one for the sign. To make it easier to look at long binary numbers, we make the number line vertical: 假设一个数字必须用八个比特来表示,七个比特表示数字,一个比特表示符号。为了便于查看较长的二进制数,我们将数字线改为垂直:
Would the ten’s complement formula work with the 10 replaced by 2 ? That is, could we compute the negative binary representation of a number using the formula negative (I)=2^(k)-I(I)=2^{k}-I ? Let’s try it and see: 如果把 10 换成 2,十进制公式还能用吗?也就是说,我们能用负 (I)=2^(k)-I(I)=2^{k}-I 公式计算出一个数字的负二进制表示吗?让我们试试看:
-(2)=2^(7)-2=128-2=-126-(2)=2^{7}-2=128-2=-126
Decimal 126 is octal 176, which is 1111110 in binary-but the number line has one more 1 digit to the left. What’s wrong? Nothing. This is a negative number. The leftmost digit determines whether the number is negative or positive. A 0 bit in the leftmost digit says that the number is positive; a 1 bit says the number is negative. Thus, -(2)-(2) is 11111110 . 十进制 126 是八进制 176,二进制是 1111110,但数列左边多了一个 1 位数。怎么了?没什么。这是一个负数。最左边的数字决定了数字是负数还是正数。最左边数字的 0 位表示数字是正数;1 位表示数字是负数。因此, -(2)-(2) 是 11111110 。
There is an easier way to calculate the two’s complement: invert the bits and add 1 . That is, take the positive value and change all the 1 bits to 0 and all the 0 bits to 1 , and then add 1 . 还有一种更简单的方法来计算二进制补码:反转位并加上 1。也就是说,取正值,将所有 1 位变为 0,所有 0 位变为 1,然后加上 1。
Addition and subtraction are accomplished the same way as in ten’s 加法和减法的计算方法与十的计算方法相同
complement arithmetic: 补码算术: -12710000001-12710000001 +quad100000001+\quad 100000001 -12610000010-12610000010
With this representation, the leftmost bit in a negative number is always a 1. Therefore, you can tell immediately whether a binary number in two’s complement is negative or positive. 使用这种表示法,负数最左边的一位总是 1,因此可以立即判断二进制数是负数还是正数。
Number Overflow 数字溢出
Overflow occurs when the value that we compute cannot fit into the number of bits we have allocated for the result. For example, if each value is stored using eight bits, adding 127 to 3 would produce an overflow: 当我们计算的值无法容纳进我们为结果分配的位数时,就会出现溢出。例如,如果每个值用 8 位存储,那么将 127 加到 3 就会产生溢出:
Overflow A situation where a calculated value cannot fit into the number of digits reserved for it 溢出 计算值无法容纳预留位数的情况
In our scheme 10000010 represents -126 , not +130 . If we were not representing negative numbers, however, the result would be correct. 在我们的方案中,10000010 表示 -126 ,而不是 +130 。不过,如果我们不表示负数,结果也是正确的。
Overflow is a classic example of the type of problems we encounter by mapping an infinite world onto a finite machine. No matter how many bits we allocate for a number, there is always the potential need to represent a number that doesn’t fit. How overflow problems are handled varies by computer hardware and by the differences in programming languages. 溢出是我们将无限世界映射到有限机器时所遇到问题类型的一个典型例子。无论我们为一个数字分配多少位,都有可能需要表示一个不合适的数字。处理溢出问题的方式因计算机硬件和编程语言的不同而异。
Representing Real Numbers 表示实数
In computing, we call all noninteger values (that can be represented) real values. For our purposes here, we define a real number as a value with a potential fractional part. That is, real numbers have a whole part and a 在计算中,我们将所有非整数值(可以表示的)都称为实数。在这里,我们将实数定义为具有潜在分数部分的数值。也就是说,实数有整数部分和小数部分。
fractional part, either of which may be zero. For example, some real numbers in base 10 are 104.32, 0.999999,357.00.999999,357.0, and 3.14159. 小数部分,其中任何一部分都可能为零。例如,以 10 为基数的实数有 104.32、 0.999999,357.00.999999,357.0 和 3.14159。
As we explored in Chapter 2, the digits represent values according to their position, and those position values are relative to the base. To the left of the decimal point, in base 10 , we have the ones position, the tens position, the hundreds position, and so forth. These position values come from raising the base value to increasing powers (moving from the decimal point to the left). The positions to the right of the decimal point work the same way, except that the powers are negative. So the positions to the right of the decimal point are the tenths position ( 10^(-1)10^{-1} or one tenth), the hundredths position (10^(-2):}\left(10^{-2}\right. or one hundredth), and so forth. 正如我们在第 2 章中所探讨的,数位根据其位置来表示数值,而这些位置值是相对于基数而言的。在小数点左边,在基数 10 中,我们有 1 位、10 位、100 位,依此类推。这些位置值来自于将基数提高到幂的增大(从小数点向左移动)。小数点右边的位置也是这样,只不过幂是负数。因此,小数点右边的位置是十位( 10^(-1)10^{-1} 或十分之一)、百位( (10^(-2):}\left(10^{-2}\right. 或百分之一),以此类推。
In binary, the same rules apply but the base value is 2 . Since we are not working in base 10 , the decimal point is referred to as a radix point, a term that can be used in any base. The positions to the right of the radix point in binary are the halves position ( 2^(-1)2^{-1} or one half), the quarters position ( 2^(-2)2^{-2} or one quarter), and so forth. 在二进制中,同样的规则适用,但基数是 2。由于我们使用的不是 10 进制,因此十进制点被称为弧度点,这个术语可用于任何进制。在二进制中,弧度点右边的位置是二分之一位置( 2^(-1)2^{-1} 或二分之一)、四分之一位置( 2^(-2)2^{-2} 或四分之一),以此类推。
Radix point The dot that separates the whole part from the fractional part in a real number in any base 任意基数实数中分隔整数部分和分数部分的点
How do we represent a real value in a computer? We store the value as an integer and include information showing where the radix point is. That is, any real value can be described by three properties: the sign (positive or negative; the mantissa, which is made up of the digits in the value with the radix point assumed to be to the right; and the exponent, which determines how the radix point is shifted relative to the mantissa. A real value in base 10 can , therefore, be defined by the following formula: 如何在计算机中表示实数值?我们将数值存储为整数,并包含显示弧度点位置的信息。也就是说,任何实数值都可以用三个属性来描述:符号(正或负)、尾数(由数值中的数字组成,假定弧度点在右边)和指数(决定弧度点相对于尾数的移动方式)。因此,以 10 为底数的实数值可以用下面的公式定义:
sign * mantissa * 10^("exp ")10^{\text {exp }} 符号 * 尾数 * 10^("exp ")10^{\text {exp }}
The representation is called floating point because the number of digits is fixed but the radix point floats. When a value is in floating-point form, a positive exponent shifts the decimal point to the right, and a negative exponent shifts the decimal point to the left. 这种表示法之所以称为浮点,是因为位数是固定的,但弧度点是浮动的。当数值是浮点形式时,正指数会将小数点向右移动,负指数会将小数点向左移动。
Floating point A representation of a real number that keeps track of the sign, mantissa, and exponent 浮点数 指实数的表示形式,可跟踪符号、尾数和指数
Let’s look at how to convert a real number expressed in our usual decimal notation into floating-point notation. As an example, consider the number 148.69. The sign is positive, and two digits appear to the right of the decimal point. Thus the exponent is -2 , giving us 14869**10^(-2)14869 * 10^{-2}. TABLE 3.1 shows other examples. For the sake of this discussion, we assume that only five digits can be represented. 让我们来看看如何将以我们常用的十进制符号表示的实数转换成浮点数符号。以 148.69 这个数字为例。这个数字的符号是正数,小数点右边有两位数。因此指数为 -2 ,得出 14869**10^(-2)14869 * 10^{-2} 。表 3.1 显示了其他示例。为了便于讨论,我们假设只能表示五位数。
How do we convert a value in floating-point form back into decimal notation? The exponent on the base tells us how many positions to move the radix point. If the exponent is negative, we move the radix point to the left. If the exponent is positive, we move the radix point to the right. Apply this scheme to the floating-point values in Table 3.1. 如何将浮点形式的数值转换回十进制符号?基数上的指数告诉我们要将弧度点移动多少个位置。如果指数是负数,我们就将半整点向左移动。如果指数为正,我们就将弧度点向右移动。将此方案应用于表 3.1 中的浮点数值。
Notice that in the last example in Table 3.1, we lose information. Because we are storing only five digits to represent the significant digits (the mantissa), the whole part of the value is not accurately represented in floating-point notation. 请注意,在表 3.1 的最后一个示例中,我们丢失了一些信息。因为我们只存储了五位数来表示有效数字(尾数),所以整个数值部分无法用浮点符号准确表示。
TABLE 3.1 Values in decimal notation and floating-point notation (five digits) 表 3.1 十进制符号和浮点符号数值(五位数)
Real Value 真实价值
Floating-Point Value 浮点数值
12001.00
12001**10^(0)12001 * 10^{0}
-120.01
-12001**10^(-2)-12001 * 10^{-2}
0.12000
12000**10^(-5)12000 * 10^{-5}
-123.10
-12310^(**)10^(-2)-12310^{*} 10^{-2}
155555000.00
15555**10^(4)15555 * 10^{4}
Real Value Floating-Point Value
12001.00 12001**10^(0)
-120.01 -12001**10^(-2)
0.12000 12000**10^(-5)
-123.10 -12310^(**)10^(-2)
155555000.00 15555**10^(4)| Real Value | Floating-Point Value |
| :---: | :---: |
| 12001.00 | $12001 * 10^{0}$ |
| -120.01 | $-12001 * 10^{-2}$ |
| 0.12000 | $12000 * 10^{-5}$ |
| -123.10 | $-12310^{*} 10^{-2}$ |
| 155555000.00 | $15555 * 10^{4}$ |
Likewise, a binary floating-point value is defined by the following formula: 同样,二进制浮点数值的定义公式如下:
sign * mantissa * 2^("exp ")2^{\text {exp }} 符号 * 尾数 * 2^("exp ")2^{\text {exp }}
Note that only the base value has changed. Of course, in this scheme the mantissa would contain only binary digits. To store a floating-point number in 请注意,只有基数发生了变化。当然,在这种方案中,尾数只包含二进制数。要将浮点数存储在
binary on a computer, we can store the three values that define it. For example, according to one common standard, if we devote 64 bits to the storage of a floating-point value, we use 1 bit for the sign, 11 bits for the exponent, and 52 bits for the mantissa. Internally, this format is taken into account whenever the value is used in a calculation or is displayed. 在计算机上,我们可以存储定义浮点数的三个值。例如,根据一种常见的标准,如果我们用 64 位来存储浮点数值,那么我们用 1 位来表示符号,11 位来表示指数,52 位来表示尾数。在内部,当数值用于计算或显示时,就会考虑到这种格式。
But how do we get the correct value for the mantissa if the value is not a whole number? In Chapter 2, we discussed how to change a natural number from one base to another. Here we have shown how real numbers are represented in a computer, using decimal examples. We know that all values are represented in binary in the computer. How do we change the fractional part of a decimal value to binary? 但是,如果尾数的值不是整数,我们如何得到正确的尾数值呢?在第 2 章中,我们讨论了如何将自然数从一个基数转换为另一个基数。在这里,我们用十进制举例说明了实数在计算机中的表示方法。我们知道,所有数值在计算机中都用二进制表示。如何将十进制数值的小数部分转换为二进制呢?
To convert a whole value from base 10 to another base, we divide by the new base, recording the remainder as the next digit to the left in the result and continuing to divide the quotient by the new base until the quotient becomes zero. Converting the fractional part is similar, but we multiply by the new base rather than dividing. The carry from the multiplication becomes the next digit to the right in the answer. The fractional part of the result is then multiplied by the new base. The process continues until the fractional part of the result is zero. Let’s convert .75 to binary. 要将一个整数值从 10 进制转换成另一个进制,我们先除以新的进制,将余数记为结果左边的下一位数,然后继续用商除以新的进制,直到商为零。小数部分的转换与此类似,但我们是乘以新的基数,而不是除以新的基数。乘法的进位成为答案右边的下一位数。结果的小数部分再乘以新的基数。这个过程一直持续到结果的小数部分为零为止。让我们把 0.75 转换成二进制。 .75**2=1.50.75 * 2=1.50 .50**2=1.00.50 * 2=1.00
Thus, .75 in decimal is .11 in binary. Let’s try another example. 因此,十进制的 0.75 就是二进制的 0.11。让我们再举一个例子。 .435**2=0.870.870.435 * 2=0.870 .870
2=1.740.7402=1.740 .740
2=1.480.4802=1.480 .480
2=0.960.9602=0.960 .960
2=1.920.9202=1.920 .920 **2=1.840* 2=1.840
Thus, .435 is 011011 … in binary. Will the fractional part ever become zero? Keep multiplying it out and see. 因此,0.435 是二进制的 011011......。小数部分会变成零吗?继续乘下去就知道了。
Now let’s go through the entire conversion process by converting 20.25 in 现在,让我们通过将 20.25 换算成
decimal to binary. First we convert 20. 十进制转换为二进制。首先转换 20。
20 in binary is 10100 . Now we convert the fractional part: 二进制的 20 是 10100。现在我们转换小数部分: .25**2=0.50.25 * 2=0.50 .50**2=1.00.50 * 2=1.00
Thus 20.25 in decimal is 10100.01 in binary. 因此,十进制的 20.25 就是二进制的 10100.01。
Scientific notation is a term with which you may already be familiar, so we mention it here. Scientific notation is a form of floating-point representation in which the decimal point is kept to the right of the leftmost digit. That is, there is one whole number. In many programming languages, if you print out a large real value without specifying how to print it, the value is printed in scientific notation. Because exponents could not be printed in early machines, an “E” was used instead. For example, “12001.32708” would be written as " 1.200132708E+41.200132708 \mathrm{E}+4 " in scientific notation. 科学记数法是一个您可能已经熟悉的术语,因此我们在此提及。科学记数法是一种浮点表示法,小数点保留在最左边数字的右边。也就是说,只有一个整数。在许多编程语言中,如果打印一个较大的实数值而不指定打印方式,该数值就会以科学计数法打印出来。由于早期的机器无法打印指数,因此用 "E "代替。例如,"12001.32708 "在科学计数法中将写成" 1.200132708E+41.200132708 \mathrm{E}+4 "。
Scientific notation An alternative floating-point representation 科学记数法 另一种浮点表示法
3.3 Representing Text 3.3 表示文本
A text document can be decomposed into paragraphs, sentences, words, and ultimately individual characters. To represent a text document in digital form, we simply need to be able to represent every possible character that may appear. The document is the continuous (analog) entity, and the separate characters are the discrete (digital) elements that we need to represent and store in computer memory. 文本文档可以分解成段落、句子、单词,最后是单个字符。要以数字形式表示文本文档,我们只需能够表示可能出现的每一个字符。文档是连续的(模拟)实体,而独立的字符则是我们需要表示并存储在计算机内存中的离散(数字)元素。
At this point, we should distinguish between the basic idea of representing text and the more involved concept of word processing. When we create a document in a word processing program such as Microsoft ^(®){ }^{\circledR} Word, we can specify all kinds of formatting: fonts, margins, tabs, color, and so on. Many word processors also let us add art, equations, and other elements. This extra information is stored along with the rest of the text so that the document can be displayed and printed the way you want it. The core issue, however, is the way we represent the characters themselves; therefore, those techniques remain our focus at this point. 在这一点上,我们应该区分表示文本的基本概念和更复杂的文字处理概念。当我们在 Microsoft ^(®){ }^{\circledR} Word 等文字处理程序中创建文档时,我们可以指定各种格式:字体、页边距、制表符、颜色等。许多文字处理程序还允许我们添加美术、公式和其他元素。这些额外的信息会和其他文本一起存储,这样文档就可以按照你想要的方式显示和打印。然而,核心问题是我们如何表现字符本身;因此,这些技术仍然是我们目前的重点。
There are a finite number of characters to represent. The general approach for representing characters is to list all of them and then assign a binary string to each character. For example, to store a particular letter, we store the appropriate bit string. 要表示的字符数量有限。表示字符的一般方法是列出所有字符,然后为每个字符分配一个二进制字符串。例如,要存储一个特定的字母,我们就存储相应的位字符串。
So which characters do we need to represent? The English language includes 26 letters. But uppercase and lowercase letters have to be treated separately, so that’s really 52 unique characters. Punctuation characters also have to be represented, as do the numeric digits (the actual characters ’ 0 ', ’ 1 ', through ’ 9 '). Even the space character must have a representation. And what about languages other than English? The list of characters we may want to represent starts to grow quickly once you begin to think about it. Keep in mind that, as we discussed earlier in this chapter, the number of unique things (characters, in this case) we want to represent determines how many bits we’ll need to represent any one of them. 那么,我们需要表示哪些字符呢?英语包括 26 个字母。但大写字母和小写字母必须分开处理,因此实际上有 52 个独特的字符。标点符号和数字(实际字符 "0"、"1 "到 "9")也需要表示。甚至空格字符也必须有表示方法。那么英语以外的语言呢?一旦开始考虑,我们可能需要表示的字符列表就会迅速增加。请记住,正如我们在本章前面所讨论的,我们想要表示的独特事物(这里指字符)的数量决定了我们需要多少比特来表示其中任何一个字符。
A character set is simply a list of characters and the codes used to represent each one. Several character sets have been used over the years, though a few have dominated. By agreeing to use a particular character set, 字符集只是一个字符列表,以及用于表示每个字符的代码。多年来,已经使用了多个字符集,但只有少数几个字符集占据主导地位。通过同意使用特定的字符集、