In parallel simulations it’s invaluable to be able to be able to provide independent pseudo-random streams to each thread to guarantee reproducible results. We might try to accomplish this by instantiating generators with pseudo-random states for each thread, but this offers no guarantees about the distribution of states. Given enough threads, it quickly becomes likely that two generators are insufficiently separated across the state space. Another strategy is to provide generators with different parameters to each thread. While feasible for a small number of threads, this is not a sustainable solution for all generators.
Tina’s Random Number Generator Library (TRNG) offers two techniques for constructing parallel pseudo-random number generators. Given threads, each which will consume
pseudo-random values, it is sufficient to construct generators whose initial states are at least
steps apart. Blocksplitting works by instantiating
generators to some initial state and then advancing each by
steps for
. As long as you don’t underestimate the requisite number of pseudo-random values required, the streams will be disjoint. TRNG also utilizes Leapfrogging, a method that decomposes the initial random stream into substreams. Given
streams instantiated from consecutive states from some generator, if the state of each generator is advanced by
steps per invocation, the resulting streams are disjoint. Unfortunately this can bring statistical defects in a generator to the surface.
Linear Congruential Generators (LCG) are a simple and venerable class of PRNG of the form:
. They have some fairly abhorrent properties which were first brought to light in Marsaglia‘s 1968 paper, Random Numbers Fall Mainly in the Planes [pdf]. Nevertheless, they’re simple, fast, and widely used1. Pierre L’Ecuyer‘s 1988 paper Efficient and portable combined random number generators discusses some properties of Multiplicative Linear Congruential Generators (MLCG), a simplified variant of LCGs, with the parameter
. Usefully we can cheaply and arbitrarily advance the state of a MLCG.
Given some state , we can find the next state given the formula
. To advance it by
steps, simply compute
. For large values of
, it is impractical to do naively. Instead we precompute
for all
and advance the generator using these steps. Thus, to advance a generator for
steps we compute:
thus requiring only three additional multiplications (or approximately of the work).
The same method can be used to arbitrarily step Xorshift generators. The Xorshift transition function is just the state vector multiplied with a transition matrix in :
. Thus, to advance the state by an arbitrary step
, we find
.
is computed in the same manner that we computed
above.
I strongly recommend against using either Xorshift generators or MLCGs for any serious simulation purpose. In addition to its underlying weaknesses, a MLCG run in Leapfrog mode will yield a predictable low-order bits, while Xorshift generators fail some rudimentary statistical tests. Still, they have been successfully incorporated into more robust generators, and this ‘jumping’ property may be useful in the construction of better parallel PRNGs.
1. For example java.util.Random implements a 48-bit LCG, truncating the lower 16-bits to produce a 32-bit output. Please don’t use this for any simulation that matters.

, but how long does sorting take? It may be tempting to use radix sort, which takes time
. Since we’ve used 32-bit numbers as our key,
. However, k almost surely exceeds the
factor we would have if we used merge sort or quicksort. We therefore arrive at a complexity of
. As if linearithmic time wasn’t bad enough, there’s another caveat: it’s very difficult to sensibly avoid key collisions. How many 32-bit values can we pick before the probability that two are the same exceeds 0.50?
(half of our key space) the probability of a key collision exceeds 0.50. Assuming the number of pairs is
. If we relax the collision probability to 0.05, we find
. Ack! If you’re performing multiple shuffles, for example in Monte Carlo trials, some bias is inevitable.
equally likely sequences of swaps. Let’s construct a mapping from sequences to permutations. If this algorithm was unbiased, each permutation could be generated by exactly k swaps, where k is equal to the number of swaps divided by the number of permutations. (Note that the existence of some k does not imply an absence of bias.)
is not an integer we conclude that shuffling by swaps does not yield an unbiased shuffle.
bits per element. This is equivalent to generating a sequence of n integers on the interval [0,n) using rejection sampling.
).[1] Here’s a method that works if you can generated unbiased longs. It’s limited by the size of longs on your machine, but you can easily iterate it if this is an issue.


)[2]. Simply change the instantiation of x:


elements and a pseudo-random integer
where
. In other cases it introduces some bias in favor of those elements with a low index. The first
elements will occur with probability
, while the remaining elements occur with probability
. For very large values of
, in this case
, the difference in these probabilities is minuscule and will never be of practical concern. For a given value
(e.g.
, the nearest power of 2 greater than n), the bias introduced by this method is significant and is compounded as a shuffle is executed. I used this method to shuffle the array [1..128] and calculated the mean value at each index over
trials. I normalized the values by subtracting the means generated by an unbiased shuffle and then normalized the data (
.) The method described above is biased so as to be sensitive to the original ordering of a set of elements. This should be enough reason to prefer rejection sampling instead.
, we should reject half of the values we generate, giving an expectation of
bits.