In defense of floating point
June 28, 2025 at 3:06 PM by Dr. Drang
I’ve noticed that many programmers have a phobia about floating point numbers. They see something like this (a Python interactive session, but a similar thing could be done in many languages),
python:
>>> sum = 0.0
>>> for i in range(10):
...     sum += 0.1
...
>>> sum
0.9999999999999999
and decide never to trust floating point numbers again. Web pages with titles like “Why Are Floating Point Numbers Inaccurate?” and “What is a floating point number, and why do they suck” help promote the mistrust.1 I fear this post published yesterday by John D. Cook will do the same.
The gist of Cook’s article, which is perfectly correct, is that the overwhelming majority of 32-bit integers cannot be represented exactly by a 32-bit float. And an even greater majority of 64-bit integers cannot be represented exactly by a 64-bit float.
If your response to the previous paragraph is “Well, duh!” you’re my kind of people. The mantissa of a 32-bit float is only 24 bits wide (one of the bits is implicit), so of course you can only represent a small percentage of the 32-bit integers. After accounting for the sign bit, you have a 7-bit deficit.
But here’s the thing: a 32-bit float can represent exactly every integer from -16,777,216 to 16,777,216 (-2²⁴ to 2²⁴). Here’s a quick demonstration in an interactive Python session:
python:
>>> import numpy as np
>>> n = 2**24
>>> ai = np.linspace(-n, n, 2*n+1, dtype=np.int32)
>>> af = np.linspace(-n, n, 2*n+1, dtype=np.float32)
>>> np.array_equal(af.astype(np.int32), ai)
True
As Cook explains, there are actually many more integers that can be represented exactly by a float32, but there are gaps between them. The run from -16,777,216 to 16,777,216 has no gaps.
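You can probe where the gaps begin by looking just past 2²⁴ (a quick check, assuming NumPy is installed):

```python
import numpy as np

n = 2**24
print(np.float32(n) == n)          # True: 16,777,216 is exact
print(np.float32(n + 1) == n)      # True: 16,777,217 rounds down to 16,777,216
print(np.float32(n + 2) == n + 2)  # True: 16,777,218 is exact
```

Odd integers just above 2²⁴ need 25 significant bits, one more than the mantissa holds, so they get rounded to an even neighbor.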
That’s a big range, possibly bigger than you need. And you’re more likely to be using double precision floats than single precision. For float64s, the mantissa is 53 bits (again, one bit is implicit), so they can exactly represent every integer from -9,007,199,254,740,992 to 9,007,199,254,740,992. Yes, as Cook says, that’s a very small percentage of 64-bit integers, but it’s still damned big.
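The same kind of gap appears at 2⁵³ for doubles, and since Python floats are 64-bit, you can check it with no NumPy at all:

```python
n = 2**53
print(float(n) == n)          # True: 9,007,199,254,740,992 is exact
print(float(n + 1) == n)      # True: n + 1 rounds back down to n
print(float(n + 2) == n + 2)  # True: n + 2 is exact
```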
JavaScript programmers understand the practical implications of this. By default, JavaScript stores numbers internally as 64-bit floats, so you’ll run into problems if you need an integer greater than 9 quadrillion. That’s why JavaScript has the Number.isSafeInteger function and the BigInt type.
I guess the main point is to understand the data types you’re using. You wouldn’t use an 8-bit integer to handle values in the thousands, but it’s fine if the values stay under one hundred. The same rules apply to floating point numbers. You just have to know how they work.
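Knowing how they work mostly comes down to not comparing floats for exact equality. Here’s a sketch of the sum from the top of the post, checked with the standard library’s math.isclose and its default relative tolerance:

```python
import math

total = 0.0
for _ in range(10):
    total += 0.1

print(total)                     # 0.9999999999999999
print(total == 1.0)              # False: exact comparison fails
print(math.isclose(total, 1.0))  # True: comparison within tolerance
```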
1. The author of the second piece apparently doesn’t trust question marks, either. ↩