Skip to content

Random numbers generation in C++

How to properly generate random numbers in C++11 and onwards.

Random numbers – the rand() way

The most common way of generating random numbers, till C++11 came along,  was something like this:

std::srand(std::time(NULL));

for (int i = 0; i < N; ++i) {
  std::cout << std::rand() % 100;
}

For those who don’t know, std::rand() and std::srand() functions come from <cstdlib> and are with us for a couple of decades now. They were also the only standard-compliant choice for generating random numbers in C++ up to C++98. It was a pity because the std::rand() function was not defined well enough to be portable across different platforms. There are no guarantees with regard to the quality of produced numbers. The algorithm responsible for generation is unspecified and can be different for each C standard library implementation and even for different versions of the same library! Moreover, std::rand() may depend on a global state making it problematic to use in a multi-threaded code. Current standard says that it’s implementation-defined whether the function is thread-safe. Also, this is a low-quality pseudo-random number generator (PRNG), which makes it worthless for cryptography.

Even if  std:rand() produced high quality, uniformly distributed numbers, there is another problem – distribution on arbitrary ranges. Specification of the function says that it computes pseudo-random integers in the range from 0 to RAND_MAX, where RAND_MAX shall be at least 32767. Now, if you want uniformly distributed numbers in a range [0, 100), you might be tempted to use the above method of using modulo operator to do that. That’s a big mistake because you can’t know if RAND_MAX is evenly divisible by 100 or by any other number you might be interested it. That means some numbers in [0, 100) are more likely to appear than others. Also, what if you want numbers in range [0, 65536)?

All of the shortcomings of the std::rand() led to the implementation proposal of an extensible random number facility, now available from the <random> header. Looking at the header can be downright scary, but it doesn’t have to be so, as I’ll try to explain (if you’re not interested in details, please skip to “Examples”).

Characteristics of random number generators

There are a few important characteristics describing each RNG. As such, it’s important to realize what qualities an RNG has before its use. Donald E. Knuth noted that “random numbers should not be generated with a method chosen at random” [1]. We can choose appropriate engine basing on following things [2]:

  • Type of result. The engine returns integers, real numbers or both.
  • Speed. How many numbers can be generated in a second?
  • Quality. There are many methods to measure the quality of generated numbers, like period/cycle length, dimensional distribution, and others.
  • State size. Each engine needs some space to keep its state.
  • Option for independent subsequences. Can we quickly get independent subsequences for use in parallel computation?

Depending on an application, one of those properties can be seen as the most important. One of the most commonly mentioned property is period length, which defines the maximum length of the sequence without repetition. Speed is also very important for PRNGs. Quick, high-quality generators are most desired. If generated numbers are used in multi-dimensional computations, dimensional distribution has to be taken into account. Beautiful presentation of linear congruential method’s drawbacks in this regard can be seen below:

random numbers in 3D
Hyperplanes of a linear congruential generator in three dimensions. (by JN~commonswiki, available under Creative Commons Attribution-ShareAlike)

The <random> header is divided into engines, adaptors, and distributions. This division enables mixing of engines with distributions to get a generator with required properties.

Engines

There are three basic engine templates: ,std::linear_congruential_engine std::mersenne_twister_engine and std::subtract_with_carry_engine. Each of those is named after the algorithm it implements. All three are deterministic. So, if you wonder which one to choose, google the algorithm and read about it pros and cons. For the most common uses, you should be fine with std::mersenne_twister_engine, which is both fast and of high-quality. The other two have other qualities but I can’t delve into more details about each without making this post substantially longer. Anyway, you’ll only use those templates through their specializations, so you don’t have to worry about them.

Beside deterministic engines, the standard also includes the specification for non-deterministic engine called std::random_device. An implementation of this engine should use some external source of entropy to ascertain that generated numbers can’t be realistically predicted. The source can be time between keystrokes, time of incoming network packets, mouse cursor moves or a specialized hardware chip. However, if an available hardware can’t be used for this purpose, the std::random_device may be implemented as a PRNG. Be sure to check your standard library docs before assuming anything.

Adaptors

Adaptors are template classes, which modify the underlying algorithm implemented in the base engines, thus changing the statistical characteristics of generated sequences. For example 24-bit RANLUX algorithm is defined as std::discard_block_engine<std::subtract_with_carry_engine<std::uint_fast32_t, 24, 10, 24>, 223, 23>. Here, std::discard_block_engine is used to modify numbers returned by the base engine, which is std::subtract_with_carry_engine .

There are 3 engine adaptors specified, but of little direct use for an average programmer. It’s just good to know their role when you have to dig into the details.

Engines with predefined parameters

If RANLUX definition didn’t give you a chill, then take a look at std::mt19937 engine definition:

std::mersenne_twister_engine<std::uint_fast32_t, 32, 624, 397, 31, 
                             0x9908b0df, 11, 
                             0xffffffff, 7, 
                             0x9d2c5680, 15, 
                             0xefc60000, 18, 1812433253>

What are those magic numbers? Those are very carefully chosen constants, which are used to calibrate the generator’s algorithm. The standard includes a couple of predefined RNGs so that you don’t have to wrap your head around choosing appropriate values. In fact, “Finding parameters for a given function f that make for good randomness in the resulting engine’s generated numbers x(i) requires extensive and specialized mathematical training and experience. Usually, there are only a few (less than five) well-known good parameterizations for each engine, so it appears feasible to provide these.” [2]. Be safe and use the predefined engines with constants chosen by people, who have given it much more thought and testing than any single person can do on its own.

Ok, ok, but which one to choose? There are 10 predefined RNGs. Unless you do something really advanced or have special needs like feeding parallel algorithm with independent sequences, you’re good and safe with std::mt19937. If you’re curious about other engines, please check “Predefined random number generators” section of the reference.

Distributions

Having a fast, high-quality engine, which produces uniformly distributed random numbers doesn’t solve our original problem. The problem is that we have to fit all of those numbers to a given range with desired distribution. That’s where classes modeling distributions come in.

In the process of designing of the specification for random numbers in C++, it was decided to split distributions into five “families”:

  • Uniform distributions
  • Bernoulli distributions
  • Poisson distributions
  • Normal distributions
  • Sampling distributions

Each distribution is represented by an appropriate class. Objects of those classes take numbers returned by a generator and perform steps required to calculate sequences that will meet the requirements imposed by chosen distribution. They are completely independent of an underlying engine.

Again, as with engines, those distributions should meet the needs of an average user. It’s an even easier choice, because if you’ve had any lessons of statistics you naturally know which one to choose. The complete list can be found here.

Examples

I’m 1000 words already and no examples. Here they come. The first example shows how the code from the beginning of this post should look like:

std::random_device rd;
std::mt19937 gen{rd()};
std::uniform_int_distribution<> dis{0, 99};
 
for (auto i = 0; i < N; ++i) {
  std::cout << dis(gen);
}

We create std::random_device engine (non-deterministic) and then use it to seed Mersenne Twister engine. If you choose a constant instead of that, you’ll get the same result on each run. After that, we create an object dis representing desired distribution. In this case, we want uniformly distributed integers in the range [0, 99].  And that’s it! Use operator() of dis and you’re done. Beautifully, uniformly distributed random numbers go to the output.

Another example. Let’s say you want to simulate a coin toss. If I remember correctly from statistics classes that’s a Bernoulli distribution. The only thing to change from the above is distribution class.

std::random_device rd;
std::mt19937 gen{rd()};
std::bernoulli_distribution dis{0.5};
 
for (auto i = 0; i < N; ++i) {
  std::cout << (dis(gen) ? "head" : "tail") << " ";
}

The number 0.5 passed to the distribution object means that the probability of getting “head” or “tail” is 50%. On the output, you should see about the same number of “head” and “tail” if the sequence is long enough.

The thing you should remember from this is that engine + distribution = your RNG. It’s that simple.

References and further reading

[1] Donald E. Knuth Art of Computer Programming, Volume 2: Seminumerical Algorithms

[2] A Proposal to Add an Extensible Random Number Facility to the Standard Library (N1398)

[3] Absolutely great talk by Stephan T. Lavavej, who talks about everything that’s contained in this post and more.

[4] This paper, which explains which distributions are important for inclusion in the standard, with examples of use.

 

Please leave a comment if you think/know I got something wrong or just want to share an idea. Thanks!


Header photo “Dice”by Daniel Dionne, available under Creative Commons Attribution-ShareAlike license.

 

6 Comments

  1. David Aceituno David Aceituno

    The Bernoulli distribution for a perfectly balanced coin (p=0.5) feels like overkill. A random integer [0,1] should suffice:

    std::uniform_int_distribution dis{0, 1};

    The strength of the Bernoulli distributions is most apparent with “loaded” coins with uneven mass distribution, where heads may turn up with probability 0.6, say.

    std::bernoulli_distribution dis{0.6};

    By the way, is there any performance difference between distributions? Or is it negligible compared to the time taken by the engine itself?

    • kszatan kszatan

      Hi! You’re right, the example for Bernoulli distribution could be better, but I just wanted to show how easy it is to use another distribution.

      It’s hard to say what’s the difference in performance between different distributions because the standard says that their algorithms are implementation-defined. As often with performance, it boils down to checking the docs of particular implementation and profiling. I haven’t checked that myself.

      Thanks for your comment.

  2. “std::rand() and std::rand()”
    I think you didn’t mean to repeat the same function twice.

    • kszatan kszatan

      Fixed. Thanks!

  3. Hi, nice article. But there is problem seeding the mersene twister: you seed it with only one integer, so it will give you only 2^32 different sequences. Its internal state is whopping 624 such integers. Unfortunately, seeding it correctly in std C++ is cumbersome, I believe boost does it better.

    • kszatan kszatan

      Thanks! I wasn’t aware of this at the time of writing this article and it was already pointed out in a discussion on Reddit.

Leave a Reply

Your email address will not be published. Required fields are marked *