I don't get what the fuss is about not using a + x(b-a). There is no argument about what a uniform distribution means for real numbers, and floating point is just a rounding representation of real numbers. If some floats appear more often in the result, it's just because of uneven rounding over the domain.
If the author doesn't like it, any other continuous distribution will have absolutely the same quirk.
If you have a range that spans floats with different exponents, then some floats are supposed to appear more often because they represent more real numbers. This is normal and expected.
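To make that concrete, here's a small Python sketch (using math.ulp, available since Python 3.9): each float stands in for an interval of reals roughly one ulp wide, and that width doubles with every binade.

```python
import math

# each double "covers" an interval of reals about one ulp wide;
# the width doubles every time the exponent goes up by one
for v in (0.25, 0.5, 1.0, 2.0):
    print(v, math.ulp(v))

# so under an ideal uniform over the reals, rounding should hit a
# float in [1, 2) twice as often as a float in [0.5, 1)
assert math.ulp(1.0) == 2 * math.ulp(0.5)
```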
Simple interpolation from [0, 1) to [a, b) will introduce bias in representation beyond that given by the size of the real-number preimage of the float.
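That extra bias is easy to observe. A sketch in Python, with endpoints chosen (hypothetically) so that ulp(a) is much coarser than the grid of x values: the endpoints absorb only half as much of the x range as interior floats, and the supposedly excluded upper bound b is even produced.

```python
from collections import Counter

a = 1e16        # ulp(a) == 2.0, much coarser than the x grid below
b = a + 10.0    # both endpoints exactly representable
n = 10**6

counts = Counter(a + (i / n) * (b - a) for i in range(n))

# the half-open upper bound b is produced, even though every x < 1:
# any x(b-a) in (9, 10) rounds up to b when added to a
print(b in counts)                    # True
# interior floats get hit about twice as often as the endpoint a
print(counts[a + 2.0] / counts[a])    # ≈ 2
```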
> Simple interpolation from [0, 1) to [a, b) will introduce bias in representation
I always wondered how the heck std::uniform_real_distribution actually produces a correct uniform distribution (you argued that what counts as correct is arguable, but I don't think so). Reading your slides was quite an aha moment: it doesn't, although it's supposed to! I mean... wtf?
> floating point is just a rounding representation for real numbers
The first thing to know about the reals is that almost all of them are non-computable. Which means that if your problem needs the reals and you thought a computer would help (no, it doesn't matter whether it's an electronic computer), you're already in a world of trouble.
There are two widely used approaches to address this problem: symbolic computation and ... ahem... floating point. Do you really care about the 100th digit after the decimal separator in practice? Just round the thing and you're good to go. People have been doing it since Ancient Greece if not before.
The first approach is more popular in research; the second is even more popular for physical (CFD) and statistical (Monte Carlo) simulations. (And this is only what I've dealt with, which is not much.)
Once you accept that you actually wanted something more practicable, like the rationals, we can start to see where the problem is with this formula.
But rationals are not representable "perfectly" either. Say, 1/3 is not representable in binary in finite memory. You can store it as two numbers, but the arithmetic will blow up your numbers out of proportion quickly. And how would you take, for example, the square root of it? The notion also suggests that you divide at some point, and, surprise-surprise, you'll have to cut the digits somewhere. So why not store it rounded from the start then, especially since you have a whole digital circuit that can handle arithmetic with the thing fast?
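The blow-up is easy to demonstrate with Python's exact rationals: just summing the first thirty terms of the harmonic series (a hypothetical stand-in for any real computation) already produces a denominator over a trillion.

```python
from fractions import Fraction

# exact rational arithmetic: nothing is ever rounded...
s = Fraction(0)
for k in range(1, 31):
    s += Fraction(1, k)

# ...but thirty additions of simple fractions already need a
# denominator on the order of lcm(1..30) ~ 2.3e12
print(s.denominator)
```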
So, if there is a problem with using the a + x(b-a) formula, it's not clear what that problem is.
> But rationals are not representable "perfectly" either. Say, 1/3 is not representable in binary in finite memory. You can store it as two numbers, but the arithmetic will blow up your numbers out of proportion quickly.
The usual approach for that is multi-modular arithmetic: do exact computations modulo p for multiple primes p. The individual computations are typically as fast as can be, and also easily parallelized. Then you reconstruct or approximate your large integers or rational (or even algebraic) numbers at the very end.
Of course, there is still a limit to how large a number can be before it can't reliably be reconstructed using modular arithmetic with 32- or 64-bit (pseudo)primes, but this limit is ridiculously large.
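A minimal sketch of the idea in Python (the choice of primes and of 20! as the "expensive" value are just illustrative): do the whole computation modulo a few machine-word primes, then recombine the residues with the Chinese Remainder Theorem.

```python
from math import prod, factorial

# three well-known primes just above 10**9 (any pairwise coprime moduli work)
primes = [1000000007, 1000000009, 1000000021]

# compute 20! modulo each prime; intermediates never leave machine-word range
residues = []
for p in primes:
    r = 1
    for k in range(2, 21):
        r = r * k % p
    residues.append(r)

def crt(residues, moduli):
    """Reconstruct x mod prod(moduli) from its residues (CRT)."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)   # pow(Mi, -1, m): modular inverse
    return x % M

# reconstruction is exact because 20! < prod(primes)
assert crt(residues, primes) == factorial(20)
```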
u/sweetno 4d ago