Tag Archives: IEEE-754

IEEE-754 pt 3 – Increased granularity

In part 2 we showed that the position of the implicit bit i was also the width of the biased exponent. That makes the bias i-1 bits wide. Since all bits of the bias are 1, the value of the bias is 2^(i-1) - 1. With one bit for the sign, that leaves n-i-1 bits for the width of the significand. By specifying i as a function of n the storage format becomes fully defined. This works not only for eight-bit boundaries but also for the six-bit boundaries of the Color My Data eight-bit compressed binary format (CBF8). CBF8 is designed for lossless data exchange using binary formats. One base-64 digit could represent plus or minus sixteenths between 0 and 0.9375; eighths between 1 and 1.875; fourths between 2 and 3.75; infinity; a quiet NaN; or a NaN with up to two signals. That represents a potential compression of 87.5% over double precision WITH NO LOSS OF DATA.
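The relationships above can be sketched in a few lines of Python. The function name `layout` and the dictionary keys are illustrative choices, not something defined in this series; the widths themselves come straight from the text (one sign bit, i bits of biased exponent, n-i-1 bits of significand, and a bias of 2^(i-1) - 1). The last line assumes the 87.5% figure compares one base-64 digit held in an 8-bit character against a 64-bit double.

```python
def layout(n, i):
    """Field widths of an n-bit format whose biased exponent is i bits wide."""
    if not 1 <= i < n - 1:
        raise ValueError("need at least one exponent bit and one significand bit")
    return {
        "sign": 1,                 # always one bit
        "biasExp": i,              # width of the biased exponent
        "significand": n - i - 1,  # whatever is left after sign and exponent
        "bias": 2 ** (i - 1) - 1,  # i-1 bits, all set to 1
    }


print(layout(32, 8))  # single precision
print(layout(6, 2))   # one base-64 digit
print(1 - 8 / 64)     # assumed basis of the compression figure
```

Calling `layout(32, 8)` gives the familiar single-precision split of 8 exponent bits, 23 significand bits and a bias of 127.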
The graph below shows how i varies with n in comparison to the current standard.

ieee-754 graph

The graph was generated from the following tables. Here are the column definitions.
Bytes (Digits):
Data storage size in bytes or base-64 digits
Bits:
Data storage size in bits (8 bits per byte; 6 bits per base-64 digit)
Sign:
The sign is a one-bit value
BiasExp:
The width, i, of the biased exponent in bits
Significand:
The width of the significand in bits
Precision:
The number of decimal digits of precision (Ada digits), i.e. the significand width times log10(2)
Bias:
The value of the bias, 2^(i-1) - 1
DynRange:
Twice the bias converted to decimal digits, i.e. 2 x bias x log10(2)
bytes   bits  sign  biasExp  significand  precision   bias  dynRange
1          8     1        3            4       1.20      3      1.81
2         16     1        5           10       3.01     15      9.03
3         24     1        7           16       4.82     63     37.93
4         32     1        8           23       6.92    127     76.46
5         40     1        9           30       9.03    255    153.53
6         48     1       10           37      11.14    511    307.65
7         56     1       10           45      13.55    511    307.65
8         64     1       11           52      15.65   1023    615.91
9         72     1       11           60      18.06   1023    615.91
10        80     1       12           67      20.17   2047   1232.42
11        88     1       12           75      22.58   2047   1232.42
12        96     1       13           82      24.68   4095   2465.44
13       104     1       13           90      27.09   4095   2465.44
14       112     1       14           97      29.20   8191   4931.47
15       120     1       14          105      31.61   8191   4931.47
16       128     1       15          112      33.72  16383   9863.55
digits  bits  sign  biasExp  significand  precision   bias  dynRange
1          6     1        2            3       0.90      1      0.60
2         12     1        4            7       2.11      7      4.21
3         18     1        5           12       3.61     15      9.03
4         24     1        7           16       4.82     63     37.93
5         30     1        8           21       6.32    127     76.46
6         36     1        8           27       8.13    127     76.46
7         42     1        9           32       9.63    255    153.53
8         48     1       10           37      11.14    511    307.65
9         54     1       10           43      12.94    511    307.65
10        60     1       11           48      14.45   1023    615.91
11        66     1       11           54      16.26   1023    615.91
12        72     1       11           60      18.06   1023    615.91
13        78     1       12           65      19.57   2047   1232.42
14        84     1       12           71      21.37   2047   1232.42
15        90     1       13           76      22.88   4095   2465.44
16        96     1       13           82      24.68   4095   2465.44
17       102     1       13           88      26.49   4095   2465.44
18       108     1       14           93      28.00   8191   4931.47
19       114     1       14           99      29.80   8191   4931.47
20       120     1       14          105      31.61   8191   4931.47
21       126     1       15          110      33.11  16383   9863.55
22       132     1       15          116      34.92  16383   9863.55
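As a cross-check, the derived columns of the tables can be recomputed from just the total width and the exponent width i. This is a minimal sketch: `table_row` is a hypothetical helper, and the precision formula (significand width times log10(2)) is inferred from the tabulated values rather than stated explicitly in the column definitions.

```python
import math


def table_row(bits, i):
    """Recompute one table row from the storage width and exponent width i."""
    significand = bits - i - 1
    bias = 2 ** (i - 1) - 1
    precision = significand * math.log10(2)  # decimal digits carried by the significand
    dyn_range = 2 * bias * math.log10(2)     # twice the bias, in decimal digits
    return (bits, 1, i, significand, round(precision, 2), bias, round(dyn_range, 2))


# A few (bits, i) pairs taken from the tables above.
for bits, i in [(8, 3), (32, 8), (48, 10), (64, 11)]:
    print(*table_row(bits, i))
```

The choice of i for each width still has to come from the tables; the sketch only derives the remaining columns once i is known.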

IEEE-754 pt 2 – Composition

There are five important components to an IEEE-754 floating point number:

  1. sign – a one-bit value indicating whether the value is negative
  2. biased exponent – determines the magnitude of the floating-point value
  3. significand – the precision of the floating point value derives from its width
  4. bias – the dynamic range of the floating point value derives from its width
  5. implicit bit – has an implied weight of 1.

In the following illustrations let us break down the composition of a floating point value n bits wide. Constants are shown as 0 and 1. The variable x represents any 0 or 1; the variable y represents at least one 1 and the variable z represents at least one 0. Note that the position i of the implicit bit (blue) is also the width of the biased exponent (orange) and is the least significant bit (LSB) of both the bias (green) and the biased exponent. The sign (red) is always the most significant bit (MSB) and the significand (yellow) is always aligned on the implicit bit. The weight of each bit of the significand is one half the weight of its predecessor. Thus, bit i+1 has one half the weight of the implicit bit, that is, one half. This means that the implicit bit and the significand together represent a value from 1.0 up to but not including 2.0. The product of this value and 2^(biasedExponent - bias) is the magnitude. Applying the sign to the magnitude yields what is called the normalized value.

ieee-754 comp
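The decomposition described above can be sketched as a small decoder for normalized values. The name `decode_normalized` and its argument order are illustrative assumptions; the bit pattern is passed as an unsigned integer, n is the total width and i the width of the biased exponent. Because the result is a Python float, the sketch is only suitable for formats no wider than double precision.

```python
def decode_normalized(pattern, n, i):
    """Decode an n-bit pattern with an i-bit biased exponent as a normalized value."""
    significand_width = n - i - 1
    bias = 2 ** (i - 1) - 1
    sign = pattern >> (n - 1)                                 # most significant bit
    biased_exp = (pattern >> significand_width) & ((1 << i) - 1)
    fraction = pattern & ((1 << significand_width) - 1)
    if biased_exp == 0 or biased_exp == 2 * bias + 1:
        raise ValueError("reserved biased exponent: not a normalized value")
    # Implicit bit (weight 1) plus the significand bits, each half the weight of the last.
    one_point_x = 1 + fraction / 2 ** significand_width
    magnitude = one_point_x * 2.0 ** (biased_exp - bias)
    return -magnitude if sign else magnitude


print(decode_normalized(0x3FC00000, 32, 8))  # a single-precision pattern
print(decode_normalized(0b00111000, 8, 3))   # an 8-bit pattern from the part 3 tables
```

Both calls decode a biased exponent equal to the bias and a significand whose top bit is set, so they land on the same value in two different storage widths.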

The minimum value for a biased exponent is zero. The IEEE-754 standard reserves this value for zero and subnormal values. The maximum value for the biased exponent is the sum of twice the bias and one, which is the value with all i bits set (i.e. the exponent with bias removed is greater than the bias). The IEEE-754 standard reserves this value for infinity and Not-a-Number (NaN). When the significand is zero, the value is an infinity; when the significand has at least one 1, the result is a NaN. When the 1 is the most significant bit of the significand and the remaining bits are zero, the NaN is called a quiet NaN. When any other bit of the significand is set, the NaN is called a signaling NaN and the set bit is the signal.

ieee-754-cases
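The cases above can be sketched as a classifier. The function name `classify` and the returned strings are hypothetical. The quiet/signaling split follows this post's description literally (only the pattern with the top significand bit set and all others clear counts as quiet); my understanding is that IEEE 754-2008 recommends treating any NaN with that top bit set as quiet, so other software may label some patterns differently.

```python
def classify(pattern, n, i):
    """Name the case an n-bit pattern falls into, given an i-bit biased exponent."""
    significand_width = n - i - 1
    biased_exp = (pattern >> significand_width) & ((1 << i) - 1)
    fraction = pattern & ((1 << significand_width) - 1)
    max_exp = (1 << i) - 1  # twice the bias plus one: all exponent bits set
    if biased_exp == 0:
        return "zero" if fraction == 0 else "subnormal"
    if biased_exp < max_exp:
        return "normalized"
    if fraction == 0:
        return "infinity"
    # Assumption taken from the text: quiet means exactly 100...0 in the significand.
    quiet_pattern = 1 << (significand_width - 1)
    return "quiet NaN" if fraction == quiet_pattern else "signaling NaN"


for p in (0x00000000, 0x00000001, 0x3F800000, 0x7F800000, 0x7FC00000, 0x7F800001):
    print(hex(p), classify(p, 32, 8))
```

The sign bit is deliberately ignored, so negative zero and negative infinity are reported under the same names as their positive counterparts.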

IEEE-754 pt 1 – Floating Point Standard

Floating point arithmetic allows one to deal with numbers that are very large or very small by combining a number with an exponent. In the early 80s there were many approaches to doing floating-point arithmetic. It was like the software equivalent of the Tower of Babel. In 1983 the military’s Ada programming language took the approach of specifying the number of digits of precision and sweeping the implementation details under the rug. Binary interoperability became possible when the IEEE released the IEEE-754 floating point standard. Floating point units (FPUs) that implemented the standard quickly emerged. For binary formats the standard specifies four sizes: 16, 32, 64 and 128 bits. In Ada these would be precisions of 3, 6, 15 or 33 digits. Half-precision is a storage-only format (i.e. it is not used for computation). That raises the question: if the precision requirement is for an in-between value (e.g. 9 or 11 digits), can we conserve memory with storage formats that meet the precision requirement but take less storage? The answer is absolutely yes, but to do that we need to add storage-only binary formats to the IEEE-754 standard and understand the implications of widening a storage format to a computational format and narrowing a computational format to fit within a storage-only format.
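To make widening and narrowing concrete, here is a minimal sketch using the one storage-only format the standard already has. It relies on the `"e"` format code of Python's standard `struct` module, which packs and unpacks IEEE-754 half precision; the helper names `store_as_half` and `load_from_half` are illustrative, not part of any library.

```python
import struct


def store_as_half(value):
    """Narrow a Python float (double precision) to 2 bytes of half-precision storage."""
    return struct.pack("<e", value)


def load_from_half(raw):
    """Widen 2 bytes of half-precision storage back to a Python float for computation."""
    return struct.unpack("<e", raw)[0]


for x in (1.5, 0.1):
    raw = store_as_half(x)
    print(x, len(raw), load_from_half(raw))
```

A value such as 1.5 fits in the narrow significand and survives the round trip unchanged, while 0.1 comes back slightly different. Widening loses nothing; narrowing is where the precision requirement has to be checked.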