There are five important components to an IEEE-754 floating-point number:
- sign – a one-bit value indicating whether the value is negative
- biased exponent – determines the magnitude of the floating-point value
- significand – the precision of the floating-point value derives from its width
- bias – the dynamic range of the floating-point value derives from its width
- implicit bit – a leading significand bit that is not stored; it has an implied weight of 1
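
As a concrete illustration of the stored components, the following minimal Python sketch splits a value into its sign, biased exponent and significand fields. It assumes the binary32 (single-precision) format, with an 8-bit biased exponent and a 23-bit stored significand. The function name `split_binary32` and the constant names are illustrative choices, not part of the standard. The bias is computed as 2^(w-1) - 1 for an exponent field w bits wide, which gives 127 here.

```python
import struct

# Assumed format: IEEE-754 binary32 (single precision).
EXP_BITS = 8                        # width of the biased exponent
SIG_BITS = 23                       # width of the stored significand
BIAS = (1 << (EXP_BITS - 1)) - 1    # 127 for binary32


def split_binary32(value):
    """Return (sign, biased_exponent, significand) of value encoded as binary32."""
    # Reinterpret the 4 bytes of the float as an unsigned 32-bit integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> (EXP_BITS + SIG_BITS)                       # most significant bit
    biased_exponent = (bits >> SIG_BITS) & ((1 << EXP_BITS) - 1)
    significand = bits & ((1 << SIG_BITS) - 1)                 # implicit bit is not stored
    return sign, biased_exponent, significand
```

For example, `split_binary32(1.0)` returns `(0, 127, 0)`: the biased exponent equals the bias, and every stored significand bit is zero because the leading 1 is the implicit bit.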
In the following illustrations, let us break down the composition of a floating-point value n bits wide. Constants are shown as 0 and 1. The variable x represents any 0 or 1, the variable y represents bits of which at least one is 1, and the variable z represents bits of which at least one is 0. Note that the position i of the implicit bit (blue) is also the width of the biased exponent (orange) and is the least significant bit (LSB) of both the bias (green) and the biased exponent. The sign (red) is always the most significant bit (MSB), and the significand (yellow) is always aligned on the implicit bit.
The weight of each bit of the significand is one half the weight of its predecessor. Thus bit i+1 has one half the weight of the implicit bit, that is, 1/2. Together with the implicit bit, the significand therefore represents a value from 1.0 up to but not including 2.0. The product of this value and $2^{\text{biasedExponent} - \text{bias}}$ is the magnitude. Applying the sign to the magnitude yields what is called the normalized value.
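
The computation of a normalized value can be sketched in a few lines of Python. This is a minimal illustration that again assumes the binary32 field widths (8 exponent bits, 23 significand bits, bias 127); the function name `decode_normal` is hypothetical. It accumulates the significand bit by bit so that the halving of the weights is visible, starting from the implicit bit with weight 1.

```python
# Assumed format: IEEE-754 binary32 (single precision).
EXP_BITS = 8
SIG_BITS = 23
BIAS = (1 << (EXP_BITS - 1)) - 1    # 127
MAX_EXP = (1 << EXP_BITS) - 1       # 255, reserved for infinity and NaN


def decode_normal(sign, biased_exponent, significand):
    """Return the value of a normalized binary32 number from its three fields."""
    if not 0 < biased_exponent < MAX_EXP:
        raise ValueError("biased exponent is reserved (zero, subnormal, infinity or NaN)")
    fraction = 1.0     # the implicit bit, weight 1
    weight = 0.5       # weight of the first stored significand bit
    for pos in range(SIG_BITS - 1, -1, -1):   # from the MSB of the significand to its LSB
        if (significand >> pos) & 1:
            fraction += weight
        weight /= 2    # each bit has half the weight of its predecessor
    magnitude = fraction * 2.0 ** (biased_exponent - BIAS)
    return -magnitude if sign else magnitude
```

For example, `decode_normal(1, 128, 0x200000)` yields `-2.5`: the significand contributes 1 + 1/4 = 1.25, the exponent contributes 2^(128 - 127) = 2, and the sign bit makes the result negative.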
The minimum value of the biased exponent is zero. The IEEE-754 standard reserves this value for zero and subnormal values: when the significand is also zero the value is a zero, and otherwise it is subnormal. The maximum value of the biased exponent is twice the bias plus one (i.e. with the bias removed, the exponent is one greater than the bias). The IEEE-754 standard reserves this value for infinity and Not-a-Number (NaN). When the significand is zero, the value is an infinity; when the significand has at least one 1, the value is a NaN. Under the encoding recommended by IEEE 754-2008, a NaN whose most significant significand bit is 1 is a quiet NaN, regardless of the remaining bits. A NaN whose most significant significand bit is 0, and which therefore has at least one other significand bit set, is a signaling NaN.
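
The reserved exponent values can be summarized as a small classification routine. The sketch below assumes binary32 field widths and the quiet/signaling convention recommended by IEEE 754-2008, in which the most significant significand bit distinguishes the two kinds of NaN (some older hardware used the opposite convention). The function name `classify` and the returned labels are illustrative.

```python
# Assumed format: IEEE-754 binary32 (single precision).
EXP_BITS = 8
SIG_BITS = 23
BIAS = (1 << (EXP_BITS - 1)) - 1    # 127
MAX_EXP = 2 * BIAS + 1              # 255, the all-ones biased exponent
QUIET_BIT = 1 << (SIG_BITS - 1)     # most significant bit of the significand


def classify(biased_exponent, significand):
    """Classify a binary32 encoding from its biased exponent and significand fields."""
    if biased_exponent == 0:
        # Minimum biased exponent: zero or subnormal.
        return "zero" if significand == 0 else "subnormal"
    if biased_exponent == MAX_EXP:
        # Maximum biased exponent: infinity or NaN.
        if significand == 0:
            return "infinity"
        # Assumption: IEEE 754-2008 recommended convention for quiet vs signaling.
        return "quiet NaN" if significand & QUIET_BIT else "signaling NaN"
    return "normal"
```

The sign bit plays no part in the classification: it only selects between positive and negative zero, subnormal, normal and infinite values.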