For lossless data compression, only redundant or insignificant data may be discarded. For example, the numbers 45 and 000000045 have the same value, but the second has seven redundant zeros before the significant digits. A policy that aligns data on the least-significant bit (LSB) allows redundant data in the most-significant part to be discovered and discarded. This is the policy best suited to whole numbers.
Fractions can also have redundant data. Consider 0.125 versus 0.1250000000. In this example the seven trailing zeros are insignificant and may be discarded. A policy that aligns data on the most-significant bit (MSB) allows redundant data on the least-significant part to be discovered and discarded. This is the policy best suited to fractions.
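The two policies can be sketched at byte granularity in Python. This is only an illustration under stated assumptions: the function names are hypothetical, values sit big-endian in a fixed 8-byte field, and the fraction is treated as a binary fixed-point value with the binary point to the left of the MSB. It is not the CBF-8 encoding itself.

```python
def trim_lsb_aligned(value: int, width: int = 8) -> bytes:
    """Whole number: drop redundant leading (most-significant) zero bytes."""
    raw = value.to_bytes(width, "big")
    return raw.lstrip(b"\x00") or b"\x00"  # keep one byte so zero stays representable


def restore_lsb_aligned(data: bytes, width: int = 8) -> int:
    """Pad on the left (most-significant side) and decode."""
    return int.from_bytes(data.rjust(width, b"\x00"), "big")


def trim_msb_aligned(raw: bytes) -> bytes:
    """Fraction: drop insignificant trailing (least-significant) zero bytes."""
    return raw.rstrip(b"\x00") or b"\x00"


def restore_msb_aligned(data: bytes, width: int = 8) -> bytes:
    """Pad on the right (least-significant side) back to the full width."""
    return data.ljust(width, b"\x00")


# 45 occupies one byte once the leading zeros are gone.
whole = trim_lsb_aligned(45)

# 0.125 is binary 0.001, so the top byte of the fraction field is 0b00100000.
fraction_field = bytes([0x20, 0, 0, 0, 0, 0, 0, 0])
fraction = trim_msb_aligned(fraction_field)
```

In both cases eight bytes shrink to one, and padding on the correct side restores the original field exactly.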
When a number contains both a whole part and a fraction, the bit on which the two parts align is a fixed number of bits from the most-significant bit, so the number is MSB-aligned. The alignment policies divide as follows:
- MSB Alignment:
64 # IEEE-754 real number policy
65 ^ Angle policy
66 : Date and Timestamp policy
67 ~ Logical set (bit-vector) policy
- LSB Alignment:
68 ? Boolean policy
69 - Twos-complement sign-extended integral value policy
70 + Unsigned integral value policy
71 @ Indexed element policy
72 * Array dimension policy
In my next posts I will describe a novel data transport concept for the efficient transfer of both text and binary formatted data called Compressed Binary Format – eight bit (CBF-8). CBF-8 applies polymorphism to a stream of bytes to yield different types of data. With the CBF-8 encoder one can safely merge UTF-8 text with raw dimensioned byte arrays and binary formatted numbers such as real numbers, sign-extended integral values, unsigned integral values, logical values, Boolean values, angle values, date values, timestamp values and a hierarchy of indexed objects in a common byte stream. The CBF-8 decoder decomposes the byte stream to yield the individual elements of the composition. The purpose of CBF-8 is to reduce the number of bytes transferred in database queries and their responses, thereby reducing required bandwidth and increasing responsiveness.
Floating-point arithmetic allows one to deal with numbers that are very large or very small by combining a number with an exponent. In the early 1980s there were many approaches to doing floating-point arithmetic. It was like the software equivalent of the Tower of Babel. In 1983 the military’s Ada programming language took the approach of specifying the number of digits of precision and sweeping the implementation details under the rug. Binary interoperability became possible when the IEEE released the IEEE-754 floating-point standard. Floating-point units (FPUs) that implemented the standard quickly emerged. For binary formats the standard specifies four sizes: 16, 32, 64 and 128 bits. In Ada these would be precisions of 3, 6, 15 or 33 digits. Half-precision is a storage-only format (i.e. it is not used for computation). That raises the question: if the precision requirement is for an in-between value (e.g. 9 or 11 digits), can we conserve memory with storage formats that meet the precision requirement but take less storage? The answer is absolutely yes, but in order to do that we need to add storage-only binary formats to the IEEE-754 standard and understand the implications of widening a storage format to a computational format and narrowing a computational format to fit within a storage-only format.
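One way to picture narrowing and widening is the Python sketch below. It is an assumption-laden illustration, not a format defined by IEEE-754 or by CBF-8: the hypothetical storage format is simply the most-significant bytes of a 64-bit double (keeping its 1 sign bit and 11 exponent bits), narrowing truncates rather than rounds, and widening pads with zero bytes. The digit estimate uses the usual rule floor(fraction bits × log10 2), which gives 15 for the 64-bit format.

```python
import math
import struct


def narrow(value: float, nbytes: int) -> bytes:
    """Keep the nbytes most-significant bytes of the binary64 encoding.

    Assumption: plain truncation (no rounding); NaN payloads are not handled.
    """
    if not 2 <= nbytes <= 8:
        raise ValueError("nbytes must be between 2 and 8")
    return struct.pack(">d", value)[:nbytes]


def widen(stored: bytes) -> float:
    """Pad the least-significant side with zero bytes and decode as binary64."""
    return struct.unpack(">d", stored.ljust(8, b"\x00"))[0]


def decimal_digits(nbytes: int) -> int:
    """Guaranteed decimal digits for the hypothetical nbytes-wide format."""
    fraction_bits = nbytes * 8 - 12  # total bits minus sign and exponent bits
    return math.floor(fraction_bits * math.log10(2))


third = 1.0 / 3.0
stored = narrow(third, 6)   # 48 bits on the wire instead of 64
restored = widen(stored)    # computational format again
relative_error = abs(restored - third) / third
```

Widening is exact, because no information is added. Narrowing is where precision is lost, so the storage width has to be chosen from the required number of digits, for example with a helper like `decimal_digits`.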