A **floating-point number** is a digital representation for a number in a certain subset of the rational numbers, and is often used to approximate an arbitrary real number on a computer. In particular, it represents an integer or fixed-point number (the **significand** or, informally, the **mantissa**) multiplied by a base (usually 2 in computers) to some integer power (the **exponent**). When the base is 2, it is the binary analog of scientific notation (in base 10).
A *floating-point calculation* is an arithmetic calculation performed with floating-point numbers; it often involves some approximation or rounding, because the result of an operation may not be exactly representable.

In a floating-point number, it is the number of significant digits (the relative precision) that is bounded, rather than the number of digits after the radix point (the absolute precision), as in fixed-point.


A floating-point number *a* can be represented by two numbers *m* and *e*, such that *a* = *m* × *b*^{e}. In any such system we pick a base *b* (also called the radix, usually 2 on computers) and a precision *p* (the number of digits in the significand *m*).

This scheme allows a large range of magnitudes to be represented within a given size of field, which is not possible in a fixed-point notation.

As an example, a floating-point number with four decimal digits (b=10, p=4) and an exponent range of ±4 could be used to represent 43210, 4.321, or 0.0004321, but would not have enough precision to represent 432.123 and 43212.3 (which would have to be rounded to 432.1 and 43210). Of course, in practice, the number of digits is usually larger than four.
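The four-digit decimal format above can be sketched with Python's standard `decimal` module, which lets the context precision play the role of *p* (the unary `+` applies the context's rounding to a value):

```python
from decimal import Decimal, getcontext

# A toy decimal floating-point format with p = 4 significant digits (b = 10),
# as in the example above.
getcontext().prec = 4

exact = +Decimal("4.321")       # representable exactly
rounded1 = +Decimal("432.123")  # rounds to 432.1
rounded2 = +Decimal("43212.3")  # rounds to 4.321E+4, i.e. 43210

print(exact, rounded1, rounded2)
```

Note that `decimal` is a base-10 format, so it illustrates the precision limit directly in decimal digits rather than in binary.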

Floating-point numbers usually behave very similarly to the real numbers they are used to approximate. However, this can easily lead programmers into over-confidently ignoring the need for numerical analysis. There are many cases where floating-point numbers do not model real numbers well, even in simple cases such as representing the decimal fraction 0.1, which cannot be exactly represented in any binary floating-point format. For this reason, financial software tends not to use a binary floating-point number representation. See: http://www2.hursley.ibm.com/decimal/
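The 0.1 case is easy to demonstrate in Python, whose `float` is a binary (IEEE 754 double) format; converting the stored double to a `Decimal` reveals the exact value it actually holds:

```python
from decimal import Decimal

# 0.1 has no finite binary expansion, so the stored double is only close to 0.1,
# and the small errors accumulate visibly in arithmetic.
equal = (0.1 + 0.2 == 0.3)
stored = Decimal(0.1)   # the exact rational value the double actually holds

print(equal)            # False
print(stored)           # slightly more than 0.1
```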

Errors in floating-point computation can include:

- Rounding
- Non-representable numbers: for example, the literal 0.1 cannot be represented exactly by a binary floating-point number
- Rounding of arithmetic operations: for example 2/3 might yield 0.6666667

- Absorption: 1×10^{15} + 1 = 1×10^{15}
- Cancellation: subtraction between nearly equal operands
- Overflow / Underflow
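Several of these error classes can be reproduced with ordinary Python doubles (IEEE 754 binary64, which carry about 15–16 significant decimal digits, so absorption needs a slightly larger magnitude than the 10^{15} of the illustration above):

```python
# Absorption: adding 1 to 1e16 leaves it unchanged, because 1 is below
# the spacing between adjacent doubles at that magnitude.
absorbed = (1e16 + 1 == 1e16)

# Rounding of operations: 2/3 has no exact binary representation,
# so the quotient is a rounded approximation.
two_thirds = 2 / 3

# Cancellation: subtracting nearly equal values destroys leading digits;
# the result is close to, but not exactly, 1e-15.
diff = (1.0 + 1e-15) - 1.0

print(absorbed, two_thirds, diff)
```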

The IEEE has standardized the computer representation for binary floating-point numbers in IEEE 754. This standard is followed by almost all modern machines. Notable exceptions include IBM Mainframes, which have both hexadecimal and IEEE 754 data types, and Cray vector machines, where the T90 series had an IEEE version, but the SV1 still uses Cray floating-point format.

The IEEE 754 standard is currently (2004) under revision. See: http://grouper.ieee.org/groups/754/
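The IEEE 754 double-precision layout (1 sign bit, 11 exponent bits with a bias of 1023, 52 fraction bits) can be inspected in Python by reinterpreting a double's bytes as an integer; this is a quick sketch, not part of any standard API:

```python
import struct

def ieee754_bits(x: float) -> str:
    """Return the 64 raw bits of x as stored in an IEEE 754 double."""
    (n,) = struct.unpack(">Q", struct.pack(">d", x))
    return format(n, "064b")

bits = ieee754_bits(1.0)
sign, exponent, fraction = bits[0], bits[1:12], bits[12:]

# 1.0 = +1.0 × 2^0: sign 0, biased exponent 0 + 1023 = 1023, fraction all zeros
print(sign, exponent, fraction)
```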

- The value of π = 3.1415926..._{10} in decimal is equivalent to the binary 11.001001000011111..._{2}. When represented in a computer that allocates 17 bits for the significand, it becomes 0.11001001000011111 × 2^{2}. Hence the floating-point representation would start with the bits 01100100100001111 and end with the bits 10 (which represent the exponent 2 in binary). Note: the first zero indicates a positive number, and the ending 10_{2} = 2_{10}.
- The value of −0.375_{10} = −0.011_{2}, or −0.11 × 2^{−1}. In two's complement notation, −1 is represented as 11111111 (assuming 8 bits are used for the exponent). In this floating-point notation, the number will start with a 1 for the sign bit, followed by 110000... and then 11111111 at the end, i.e. 1110...011111111 (where the ... are zeros).
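The −0.375 example can be cross-checked with Python's `float.hex`, which prints the sign, significand, and binary exponent of the IEEE 754 double directly (note that IEEE 754 uses a biased exponent and a normalized 1.xxx significand, unlike the two's-complement toy format above, so the digits are written differently):

```python
# -0.375 = -0.011₂ = -1.1₂ × 2⁻², which hexadecimal float notation
# writes as -0x1.8p-2 (0x1.8 = 1.5 = 1.1₂).
h = (-0.375).hex()
print(h)

# The notation round-trips exactly:
print(float.fromhex(h) == -0.375)
```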

- Kahan, William (2001). How Java's floating-point hurts everyone everywhere. Retrieved Sep. 5, 2003 from http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf
- An edited reprint of the paper *What Every Computer Scientist Should Know About Floating-Point Arithmetic*, by David Goldberg, published in the March 1991 issue of Computing Surveys