# Big numbers

November 4, 2014 at 5:18 PM by Dr. Drang

The other day I needed to shuffle a list of about 200 items as part of a random sampling program. The obvious solution in Python is

```
python:
random.shuffle(theList)
```

which shuffles the list in place. But there’s a caveat in the `random`

module’s documentation that gave me pause:

`random`

.shuffle(x[,random])

Shuffle the sequence x in place. The optional argumentrandomis a 0-argument function returning a random float in [0.0, 1.0); by default, this is the function`random()`

.Note that for even rather small

`len(x)`

, the total number of permutations ofxis larger than the period of most random number generators; this implies that most permutations of a long sequence can never be generated.

How small is “even rather small”? Is 200 “a long sequence”? The answer was actually pretty easy to figure out.

First, how many permutations of a 200-item list are there? I fired up IPython and got the number.

```
In [1]: import math
In [2]: math.factorial(200)
Out[2]: 7886578673647905035523632139321850622951359776871732
632947425332443594499634033429203042840119846239041772121389
196388302576427902426371050619266249528299311134628572707633
172373969889439224456214516642402540332918641312274282948532
775242424075739032403212574055795686602260319041703240623517
008587961789222227896237038973747200000000000000000000000000
00000000000000000000000L
```

Python nicely switches to arbitrary precision arithmetic to give the answer. That’s the kind of number that begs to be put in scientific notation, but unfortunately it’s beyond the capacity of `float`

. So how do I get the exponent? The same way I learned back in junior high: count the number of places the decimal point has to move to the left. I converted it to a string and counted how many digits there were to the right of the first one.

```
In [3]: len(str(math.factorial(200))[1:])
Out[3]: 374
```

So the answer is \(7.89 \times 10^{374}\).

How does this compare to the period of the random number generator Python uses? The documentation says

Python uses the Mersenne Twister as the core generator. It produces 53-bit precision floats and has a period of 2**19937-1.

OK, it should be pretty obvious that \(2^{19937}\) is a lot bigger than \(10^{374}\), but I wanted to see the period in scientific notation, too. Applying the same technique, I got

```
In [4]: len(str(2**19937 - 1)[1:])
Out[4]: 6001
In [5]: str(2**19937 - 1)[:5]
Out[5]: '43154'
```

In scientific notation, then, the period of the Mersenne Twister is \(4.32 \times 10^{6001}\), which is about 5600 orders of magnitude bigger than the number of permutations.

Another way to check would be to take the base-2 logarithm of \(200!\):

```
In [6]: math.log(factorial(200), 2)
Out[6]: 1245.3805070592084
```

This means that \(200! \approx 2^{1245}\), which can be compared directly to \(2^{19937}\). Either way, it’s clearly safe to use `random.shuffle()`

on a 200-item list.

How big can a list be before the number of permutations is greater than the period of the Mersenne Twister? I could have dug into SciPy’s root-finding functions to solve

\[\log_2(x!) - 19937 = 0\]but it seemed easier to just use trial and error. I soon found the answer to be 2080:

```
In [13]: math.log(factorial(2080),2)
Out[13]: 19932.556355821675
In [14]: math.log(factorial(2081),2)
Out[14]: 19943.579417071407
```

That doesn’t mean it’s safe to use `random.shuffle()`

on a 2080-item list. I’m not combinatorially sophisticated enough to know how much “cushion” is necessary between the number of permutations and the period of the random number generator, but I do feel safe now using `random.shuffle()`

on lists of several hundred items.