May 042011

In parallel simulations it’s invaluable to be able to be able to provide independent pseudo-random streams to each thread to guarantee reproducible results. We might try to accomplish this by instantiating generators with pseudo-random states for each thread, but this offers no guarantees about the distribution of states. Given enough threads, it quickly becomes likely that two generators are insufficiently separated across the state space. Another strategy is to provide generators with different parameters to each thread. While feasible for a small number of threads, this is not a sustainable solution for all generators.

Tina’s Random Number Generator Library (TRNG) offers two techniques for constructing parallel pseudo-random number generators. Given p threads, each which will consume j pseudo-random values, it is sufficient to construct generators whose initial states are at least j steps apart. Blocksplitting works by instantiating p generators to some initial state and then advancing each by jk steps for k = \{0,1,\ldots,p-1\}. As long as you don’t underestimate the requisite number of pseudo-random values required, the streams will be disjoint. TRNG also utilizes Leapfrogging, a method that decomposes the initial random stream into substreams. Given p streams instantiated from consecutive states from some generator, if the state of each generator is advanced by p steps per invocation, the resulting streams are disjoint. Unfortunately this can bring statistical defects in a generator to the surface.

Linear Congruential Generators (LCG) are a simple and venerable class of PRNG of the form:
s_{i+1} \equiv a * s_i + c \mod m. They have some fairly abhorrent properties which were first brought to light in Marsaglia‘s 1968 paper, Random Numbers Fall Mainly in the Planes [pdf]. Nevertheless, they’re simple, fast, and widely used1. Pierre L’Ecuyer‘s 1988 paper Efficient and portable combined random number generators discusses some properties of Multiplicative Linear Congruential Generators (MLCG), a simplified variant of LCGs, with the parameter c = 0. Usefully we can cheaply and arbitrarily advance the state of a MLCG.

Given some state s_i, we can find the next state given the formula s_{i+1} = a * s_i \mod m. To advance it by j steps, simply compute s_{i+j} = a^j * s_i \mod m. For large values of j, it is impractical to do naively. Instead we precompute a^j \mod m for all j = 2^k and advance the generator using these steps. Thus, to advance a generator for j = 1073741893 steps we compute:
s_{i+j} = a^{1073741893} = a^{2^{30}} * a^{2^6} * a^{2^2} * a^{2^0} * s_i \mod m
thus requiring only three additional multiplications (or approximately 2.8 * 10^{-7}\% of the work).

The same method can be used to arbitrarily step Xorshift generators. The Xorshift transition function is just the state vector multiplied with a transition matrix in GF(2): s_{i+1} = s_i * T. Thus, to advance the state by an arbitrary step j, we find s_{i+j} = s_i * T^j. T_j is computed in the same manner that we computed a^j \mod m above.

I strongly recommend against using either Xorshift generators or MLCGs for any serious simulation purpose. In addition to its underlying weaknesses, a MLCG run in Leapfrog mode will yield a predictable low-order bits, while Xorshift generators fail some rudimentary statistical tests. Still, they have been successfully incorporated into more robust generators, and this ‘jumping’ property may be useful in the construction of better parallel PRNGs.

1. For example java.util.Random implements a 48-bit LCG, truncating the lower 16-bits to produce a 32-bit output. Please don’t use this for any simulation that matters.