Computers store numbers in a variety of ways, but all
use base-2 arithmetic, rather than our base 10.
Almost all computers use bits in multiples of 8 to store numbers;
one unit of 8 bits is called a **byte**.
There are two main ways to represent numbers:
integer and floating point.

Suppose we consider positive integers only. What range of values can we represent with a single byte?

    0000 0000  =  0
    0000 0001  =  1
    0000 0010  =  2
    0000 0011  =  3
    ...
    1111 1110  =  254
    1111 1111  =  255

The range is 0 to 255. In general, the range of unsigned integers stored in N bits is

    0 to 2^N - 1
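The formula is easy to check; here is a quick sketch in Python (my choice of language, not part of the original notes):

```python
# Range of unsigned integers stored in N bits: 0 to 2^N - 1.
for n_bits in (8, 16, 32):
    print(n_bits, "bits:", 0, "to", 2**n_bits - 1)
```

For 8 bits this reproduces the 0-to-255 range above.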

What happens if you try to add numbers to the highest possible value?

    1111 1111  =  255 + 1  =  0000 0000  =  0
    1111 1111  =  255 + 2  =  0000 0001  =  1

Whoops. The values roll over and start again from zero. Likewise, if you subtract from a small number, you find rollunder:

    0000 0000  =  0 - 1  =  1111 1111  =  255
    0000 0000  =  0 - 2  =  1111 1110  =  254

So, if you use integer storage, you must watch out for rollover/under.
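Rollover is easy to demonstrate: masking a result to its low 8 bits mimics one-byte storage. A small Python sketch (the helper name `u8` is my own):

```python
def u8(value):
    """Keep only the low 8 bits, mimicking unsigned one-byte storage."""
    return value & 0xFF

print(u8(255 + 1))  # rollover:  0
print(u8(255 + 2))  # rollover:  1
print(u8(0 - 1))    # rollunder: 255
print(u8(0 - 2))    # rollunder: 254
```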

*Signed integers*:
Suppose you want to keep track of positive and negative values.
You must devote one of the bits to the sign, using, for example,
0 to mean "positive" and 1 to mean "negative". That would leave
only 7 of the 8 bits for numerical values.
If the highest-order (leftmost) bit is used for the sign, then
we have

    0000 0000  =   0
    0000 0001  =  +1
    1000 0001  =  -1
    0000 0010  =  +2
    1000 0010  =  -2

The range of values has changed from

       0 to  255    unsigned
    -127 to +127    signed

Yes, this simple scheme wastes two representations for zero (since +0 and -0 are the same). Real computers use a slightly different arrangement, called two's complement, which doesn't waste two values on zero, but instead gives a slightly larger range of -128 to +127.
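The simple sign-magnitude scheme above can be sketched in Python (the function name is mine; remember that real hardware uses two's complement instead):

```python
def sign_magnitude(bits):
    """Interpret an 8-bit string: leftmost bit is the sign, the rest the value."""
    sign = -1 if bits[0] == "1" else +1
    return sign * int(bits[1:], 2)

print(sign_magnitude("00000010"))  # +2
print(sign_magnitude("10000010"))  # -2
print(sign_magnitude("11111111"))  # -127, the most negative value
```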

*Challenge:*

You are in charge of the US Census for 2000. You must design computers to keep track of people. How many bytes should you devote to the population count?

Now, what happens if we want to use a computer to simulate a real, physical situation?

*Example 1:*
We drop a ball from a height of 1 meter above the ground, and
want to calculate its height and velocity as a function
of time.
Using the equations of 1-D kinematics, we can figure
out that the ball will take about 0.452 seconds
to reach the ground, and will be travelling at
a speed of about 4.4 meters per second when it gets there.

If we use integers, and units of meters and seconds, then we have a big problem: all the action takes place between t = 0 and t = 1 second. Oops. And the height of the ball can be stated only as h = 1 m and h = 0 m. That's a pretty poor way to simulate a real ball.

There is one thing we can do: rescale the units so that they are better matched to the problem. For example, we could change

    time    from seconds to milliseconds     0 msec --> 452 msec
    height  from meters to millimeters       0 mm   --> 1000 mm

With these units, we can do a decent job of simulating a smooth drop, even with integers.
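A sketch of that integer simulation in Python (my own illustration, with g approximated as 9800 mm/s^2):

```python
G = 9800  # gravitational acceleration, in mm/s^2

def height_mm(t_ms):
    """Height of the ball in mm after t_ms milliseconds, integer arithmetic only."""
    # h = 1000 mm - (1/2) g t^2; the factor 1000*1000 converts ms^2 to s^2
    return 1000 - (G * t_ms * t_ms) // (2 * 1000 * 1000)

for t in range(0, 500, 100):
    print(t, "msec:", max(height_mm(t), 0), "mm")
```

Even with integers, the height now changes smoothly from step to step instead of jumping straight from 1 m to 0 m.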

*Example 2:*
We want to calculate the motion of the Earth and the Moon,
due to the gravitational forces they exert on each other.
We know that the equation for gravitational force is

            M * m
    F = G * -----
             R^2

where

    G = 6.67 x 10^(-11) N*m^2/kg^2
    M = 5.98 x 10^(24) kg
    m = 7 x 10^(22) kg
    R = 3.82 x 10^(8) m

Yikes! It's not so easy to rescale all the units by the same factor, since they are a mix of VERY big and VERY small.

It's almost always better to give up on integers,
and move to a **floating point** representation
of numbers.
You are familiar with it already: scientific notation
is an example:

    Mass = 5.98 x 10^(24) kg

We can divide this value into three parts:
The first part, 5.98, is called the **mantissa**.
The second part, 10, is the **base**
(we always use base 10 for scientific notation).
The last part, 24, is the **exponent**.

What's the advantage of scientific notation? Well, we can mix very small and very large values together easily: we combine the mantissa portions using one sort of arithmetic rule, and the exponent portions using another sort of rule. And this is much more compact than integer representation:

5.98 x 10^(24) vs. 5,980,000,000,000,000,000,000,000
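The Earth-Moon force from Example 2 shows the payoff: with floating point, mixing 10^(-11) and 10^(24) is painless. A quick Python check (my own sketch, using the values given earlier):

```python
G = 6.67e-11  # N*m^2/kg^2
M = 5.98e24   # kg (Earth)
m = 7e22      # kg (Moon)
R = 3.82e8    # m

# F = G * M * m / R^2, with no rescaling of units needed
F = G * M * m / R**2
print(F)  # about 1.9e20 N
```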

Although we humans usually write numbers as some value
between 1 and 10, times ten to some exponent,
it turns out that computers usually **normalize**
the numbers so that the mantissa lies between 0 and 1:

    we write        5.98  x 10^(24)
    computer uses   0.598 x 10^(25)

It's no big deal.

There is a drawback to this floating-point representation,
though.
If we only use a certain number of digits to hold the mantissa --
say, 3 digits, as shown above --
then we end up with a certain amount of **quantization**:
we can't represent all the numbers within some range,
only a selected few.
For example, we can represent these numbers:

    5,980 = 5.98 x 10^3
    5,990 = 5.99 x 10^3

but not these:

    5,981    = 5.981 x 10^3       too many digits in mantissa
    5,985.43 = 5.98543 x 10^3     way too many digits in mantissa
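One way to see quantization in action: keep only three significant digits, as in the examples above. A Python sketch (the helper name is mine):

```python
def quantize3(x):
    """Round x to a 3-digit mantissa, mimicking limited floating-point storage."""
    return float(f"{x:.2e}")  # 2 digits after the point = 3 significant digits

print(quantize3(5980))     # representable as-is
print(quantize3(5981))     # rounds to 5980.0 -- 5981 cannot be represented
print(quantize3(5985.43))  # rounds to 5990.0
```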

*Floating point in base 2:*

Humans use base 10 for calculations. But computers use base 2. Let's see how that works out.

The normalized mantissa in a base-2 representation is interpreted like this:

    mantissa bits     first    second   third    fourth   etc.
    hold place for    2^(-1)   2^(-2)   2^(-3)   2^(-4)

So, for example, a 4-bit mantissa of 1011 would mean

    1011 = 1*2^(-1) + 0*2^(-2) + 1*2^(-3) + 1*2^(-4)
         = 0.5 + 0 + 0.125 + 0.0625
         = 0.6875
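The same sum, as a little Python function (my own helper) that decodes any mantissa bit-string:

```python
def mantissa_value(bits):
    """Decode a normalized base-2 mantissa: bit i (from the left) is worth 2^-(i+1)."""
    return sum(int(b) * 2**-(i + 1) for i, b in enumerate(bits))

print(mantissa_value("1011"))  # 0.6875
```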

Note that the smallest possible value for a normalized mantissa is a one followed by zeroes:

    1000 = 0.5

It's kind of weird to have the smallest possible value be 0.5 (instead of 0.1), but that's what happens when one normalizes in base 2.

Suppose we want to represent the number 4049, and we decide to use

    8 bits for the normalized mantissa
    4 bits for the exponent

We can write this number in normalized scientific notation:

    base 10:  0.4049 x 10^(4)        exactly
    base 2:   0.98852539 x 2^(12)    approximately
    base 2:   0.98828125 x 2^(12)    to a rougher approximation

Since we only have 8 bits for the normalized mantissa, we have to cut off the number of digits somewhere. I've chosen the approximation 0.98828125, because we can represent that in 8 bits, as follows:

    0.98828125 = 0.5 + 0.25 + 0.125 + 0.0625 + 0.03125 + 0.015625 + 0.00390625
               = 2^(-1) + 2^(-2) + 2^(-3) + 2^(-4) + 2^(-5) + 2^(-6) + 2^(-8)
               = 1*2^(-1) + 1*2^(-2) + 1*2^(-3) + 1*2^(-4) + 1*2^(-5) + 1*2^(-6) + 0*2^(-7) + 1*2^(-8)
               = 1111 1101

The exponent part is easy: we need to express an exponent of 12

    12 = 8 + 4 = 1100

So, putting it all together, we can express

    0.98828125 x 2^(12)  =  1111 1101            1100
                            normalized mantissa  exponent

Gee, 12 bits. That's a lot. Does it actually express the value "4049" accurately? Let's see:

    0.98828125 x 2^(12) = 0.98828125 * 4096 = 4048

Hey, that's not exactly right. But it's as close as we can get. If we increase the value of the mantissa by the smallest possible amount,

    1111 1101  --->  1111 1110
    0.98828125       0.9921875

then we get

    0.9921875 x 2^(12) = 0.9921875 * 4096 = 4064

So, using a floating-point representation with 8 bits for mantissa and 4 bits for exponent, we can't actually encode the number "4049" exactly. The closest we can come is either 4048 (which is pretty close) or 4064 (which misses by quite a bit).
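The whole round trip can be sketched in Python. This is my own illustration of the 8-bit-mantissa, 4-bit-exponent scheme above, limited to positive values of at least 1 and ignoring exponent overflow:

```python
def encode(x):
    """Return (mantissa, exponent) of the closest representable value to x >= 1."""
    exp = 0
    while x >= 1.0:              # normalize so the mantissa lies in [0.5, 1)
        x /= 2.0
        exp += 1
    mantissa = round(x * 256)    # snap to the nearest 8-bit mantissa
    if mantissa == 256:          # rounding overflowed the mantissa: renormalize
        mantissa, exp = 128, exp + 1
    return mantissa, exp

def decode(mantissa, exp):
    """Reconstruct the value from an 8-bit mantissa and an exponent."""
    return (mantissa / 256) * 2**exp

m, e = encode(4049)
print(format(m, "08b"), e)  # 11111101 12
print(decode(m, e))         # 4048.0 -- the closest we can get to 4049
```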

*Challenge*:

You are given a single byte, 8 bits, for storing signed, floating-point numbers. You choose the following system:

- 1 bit for sign, 3 bits for a signed exponent, 4 bits for mantissa

Answer the following questions:

- What is the largest positive value you can express?
- What is the smallest positive value you can express?
- What is the "average" spacing between values in your system? That is, what is the total range of positive values, divided by the number of different values inside that range?
- How many different values does your system have between the numbers 5.1 and 6.8?

*Last modified 3/17/2002 by MWR.*

Copyright © Michael Richmond. This work is licensed under a Creative Commons License.