Friday, January 5, 2024

Floating point

Scientific notation writes a number as a single digit to the left of the decimal point, multiplied by a power of ten. For example, 1.23x10^2 is the scientific notation for 123.

A normalised number is one written in scientific notation without a leading zero. For example, 0.123x10^4 is not in normalised scientific notation; its normalised form is 1.23x10^3.
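As a small sketch of normalisation, assuming Python's standard `decimal` module (used so that the decimal digits stay exact rather than being rounded to binary), the hypothetical helper `normalise` below takes a mantissa and a power of ten and shifts the decimal point until exactly one non-zero digit sits to its left:

```python
from decimal import Decimal


def normalise(mantissa: str, exponent: int) -> tuple[Decimal, int]:
    """Rewrite mantissa x 10^exponent with one non-zero digit before the point."""
    value = Decimal(mantissa).scaleb(exponent)  # exact value, e.g. 0.123 x 10^4 -> 1230
    if value == 0:
        raise ValueError("zero has no normalised form")
    new_exponent = value.adjusted()  # power of ten of the leading non-zero digit
    new_mantissa = value.scaleb(-new_exponent)  # shift so one digit sits left of the point
    return new_mantissa, new_exponent


print(normalise("0.123", 4))  # (Decimal('1.23'), 3)
print(normalise("123", 0))    # (Decimal('1.23'), 2)
```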

Floating point encodes a number in normalised binary scientific notation within a single word. The width of the fraction field determines the precision, and the width of the exponent field determines the range. Because a normalised binary number always has a 1 to the left of the binary point, that leading 1 is implicit and is not stored in the encoding.
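The same idea in base 2 can be sketched with Python's `math.frexp`, which returns a significand in [0.5, 1); the hypothetical helper `binary_scientific` doubles it so the significand lands in [1, 2), the form in which the leading 1 is implicit. Python's `float.hex` prints the same normalised form directly, with the fraction in hexadecimal:

```python
import math


def binary_scientific(x: float) -> tuple[float, int]:
    """Return (significand, exponent) with x == significand * 2**exponent and 1 <= |significand| < 2."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("only finite non-zero values have a normalised form")
    m, e = math.frexp(x)  # frexp gives 0.5 <= |m| < 1
    return m * 2, e - 1   # shift one place so the leading bit is the 1 before the point


sig, exp = binary_scientific(6.5)
print(sig, exp)     # 1.625 2  (6.5 = 1.101 in binary x 2^2)
print((6.5).hex())  # 0x1.a000000000000p+2  (hex digit a = binary 1010)
```

Only the bits after the binary point (101 for 6.5) would go into the fraction field; the leading 1 is the part the encoding leaves out.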

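In IEEE 754 single precision (32 bits), the first bit is the sign bit, followed by 8 bits for the exponent and 23 bits for the fraction, so the precision is 24 bits counting the implicit leading 1. In double precision (64 bits), the exponent is 11 bits long and the fraction is 52 bits long, giving 53 bits of precision with the implicit 1.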
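A minimal sketch of pulling the three fields out of a value, assuming Python's `struct` module to reinterpret the bytes as an unsigned integer; the function name `float_fields` is made up for illustration. Note that the exponent field is stored with a bias added (127 for single precision, 1023 for double), so -6.5 = -1.101 (binary) x 2^2 shows up with an exponent field of 129 in single precision:

```python
import struct


def float_fields(x: float, double: bool = False) -> tuple[int, int, int]:
    """Split x into its IEEE 754 (sign, exponent, fraction) bit fields."""
    if double:
        (bits,) = struct.unpack(">Q", struct.pack(">d", x))  # 64-bit pattern
        exp_bits, frac_bits = 11, 52
    else:
        (bits,) = struct.unpack(">I", struct.pack(">f", x))  # 32-bit pattern (x rounded to single)
        exp_bits, frac_bits = 8, 23
    sign = bits >> (exp_bits + frac_bits)                    # top bit
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)  # biased exponent field
    fraction = bits & ((1 << frac_bits) - 1)                 # low bits; implicit leading 1 not stored
    return sign, exponent, fraction


print(float_fields(-6.5))               # (1, 129, 5242880)
print(float_fields(-6.5, double=True))  # (1, 1025, 2814749767106560)
```

The fraction 5242880 is binary 101 followed by 20 zeros, i.e. the .101 of 1.101 padded out to 23 bits.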

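The value represented by a normalised number is (-1)^sign x (1 + fraction) x 2^(exponent - bias), where the fraction field is read as the binary fraction 0.f and the bias is 127 for single precision and 1023 for double precision. An exponent field of all 0s or all 1s is reserved for zero, subnormal numbers, infinity and NaN.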
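To check the formula, the sketch below evaluates it directly from the three fields of a 32-bit pattern (the name `decode_single` is hypothetical, and it assumes a bias of 127 and handles only normalised single-precision patterns), so its result can be compared with Python's own `struct` decoding:

```python
import struct


def decode_single(bits: int) -> float:
    """Evaluate (-1)^sign x (1 + fraction) x 2^(exponent - 127) for a normalised 32-bit pattern."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent in (0, 0xFF):
        raise ValueError("zero, subnormal, infinity and NaN use other rules")
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)


(pattern,) = struct.unpack(">I", struct.pack(">f", -6.5))
print(hex(pattern))            # 0xc0d00000
print(decode_single(pattern))  # -6.5
```

Patterns whose exponent field is all 0s or all 1s are rejected because they do not follow this formula.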
