*How to properly generate random numbers in C++11 and onwards.*

## Random numbers – the rand() way

The most common way of generating random numbers, until C++11 came along, was something like this:

```cpp
std::srand(std::time(NULL));
for (int i = 0; i < N; ++i) {
    std::cout << std::rand() % 100;
}
```

For those who don’t know, the `std::rand()` and `std::srand()` functions come from `<cstdlib>` and have been with us for a couple of decades now. They were also the only standard-compliant choice for generating random numbers in C++ up to C++98. That was a pity, because `std::rand()` was never defined well enough to be portable across platforms. There are no guarantees with regard to the quality of the produced numbers. The algorithm responsible for generation is unspecified and can differ between C standard library implementations, and even between versions of the same library! Moreover, `std::rand()` may depend on global state, making it problematic to use in multi-threaded code. The current standard says it’s implementation-defined whether the function is thread-safe. Also, it is a low-quality pseudo-random number generator (PRNG), which makes it worthless for cryptography.

Even if `std::rand()` produced high-quality, uniformly distributed numbers, there is another problem: distribution over arbitrary ranges. The specification says the function computes pseudo-random integers in the range from 0 to `RAND_MAX`, where `RAND_MAX` shall be at least 32767. Now, if you want uniformly distributed numbers in the range [0, 100), you might be tempted to use the modulo operator as above. That’s a big mistake, because you can’t know whether `RAND_MAX + 1` is evenly divisible by 100 or by any other number you might be interested in. If it isn’t, some numbers in [0, 100) are more likely to appear than others. Also, what if you want numbers in the range [0, 65536)?

All of the shortcomings of `std::rand()` led to a proposal for an extensible random number facility, now available in the `<random>` header. Looking at the header can be downright scary, but it doesn’t have to be, as I’ll try to explain (if you’re not interested in the details, please skip to “Examples”).

## Characteristics of random number generators

There are a few important characteristics describing each RNG, so it’s important to know what qualities an RNG has before using it. Donald E. Knuth noted that “*random numbers should not be generated with a method chosen at random*” [1]. We can choose an appropriate engine based on the following characteristics [2]:

- **Type of result.** The engine returns integers, real numbers or both.
- **Speed.** How many numbers can be generated in a second?
- **Quality.** There are many methods to measure the quality of generated numbers, like period/cycle length, dimensional distribution, and others.
- **State size.** Each engine needs some space to keep its state.
- **Option for independent subsequences.** Can we quickly get independent subsequences for use in parallel computation?

Depending on the application, any one of those properties can be the most important. One of the most commonly mentioned is period length, which defines the maximum length of the sequence without repetition. Speed is also very important for PRNGs; quick, high-quality generators are the most desired. If the generated numbers are used in multi-dimensional computations, dimensional distribution has to be taken into account: consecutive outputs of the linear congruential method, when plotted as points in three dimensions, famously fall onto a small number of hyperplanes, which beautifully illustrates its drawbacks in this regard.

The `<random>` header is divided into engines, adaptors, and distributions. This division enables mixing engines with distributions to get a generator with the required properties.

## Engines

There are three basic engine templates: `std::linear_congruential_engine`, `std::mersenne_twister_engine`, and `std::subtract_with_carry_engine`. Each is named after the algorithm it implements, and all three are deterministic. If you wonder which one to choose, google the algorithm and read about its pros and cons. For the most common uses, you should be fine with `std::mersenne_twister_engine`, which is both fast and high-quality. The other two have their own strengths, but I can’t delve into the details of each without making this post substantially longer. Anyway, you’ll only use these templates through their specializations, so you don’t have to worry about them.

Besides the deterministic engines, the standard also specifies a non-deterministic engine called `std::random_device`. An implementation of this engine should use some external source of entropy to ensure that the generated numbers can’t realistically be predicted. The source can be the time between keystrokes, the arrival times of network packets, mouse cursor movements, or a specialized hardware chip. However, if no suitable hardware is available, `std::random_device` may be implemented as a PRNG. Be sure to check your standard library’s docs before assuming anything.

## Adaptors

Adaptors are class templates that modify the underlying algorithm implemented in a base engine, thus changing the statistical characteristics of the generated sequences. For example, the 24-bit RANLUX algorithm is defined as `std::discard_block_engine<std::subtract_with_carry_engine<std::uint_fast32_t, 24, 10, 24>, 223, 23>`. Here, `std::discard_block_engine` modifies the numbers returned by the base engine, `std::subtract_with_carry_engine`.

There are three engine adaptors specified, but they are of little direct use to an average programmer. It’s just good to know their role when you have to dig into the details.

## Engines with predefined parameters

If the RANLUX definition didn’t give you a chill, then take a look at the definition of the `std::mt19937` engine:

```cpp
std::mersenne_twister_engine<std::uint_fast32_t, 32, 624, 397, 31,
                             0x9908b0df, 11, 0xffffffff, 7, 0x9d2c5680,
                             15, 0xefc60000, 18, 1812433253>
```

What are those magic numbers? They are very carefully chosen constants used to calibrate the generator’s algorithm. The standard includes a couple of predefined RNGs so that you don’t have to wrap your head around choosing appropriate values. In fact, “*Finding parameters for a given function f that make for good randomness in the resulting engine’s generated numbers x(i) requires extensive and specialized mathematical training and experience. Usually, there are only a few (less than five) well-known good parameterizations for each engine, so it appears feasible to provide these.*” [2]. Be safe and use the predefined engines, with constants chosen by people who have given it much more thought and testing than any single person can do on their own.

OK, but which one to choose? There are 10 predefined RNGs. Unless you’re doing something really advanced, or have special needs like feeding a parallel algorithm with independent sequences, you’re good and safe with `std::mt19937`. If you’re curious about the other engines, please check the “Predefined random number generators” section of the reference.

## Distributions

Having a fast, high-quality engine, which produces uniformly distributed random numbers doesn’t solve our original problem. The problem is that we have to fit all of those numbers to a given range with desired distribution. That’s where classes modeling distributions come in.

When the specification for random numbers in C++ was being designed, it was decided to split distributions into five “families”:

- Uniform distributions
- Bernoulli distributions
- Poisson distributions
- Normal distributions
- Sampling distributions

Each distribution is represented by an appropriate class. Objects of those classes take the numbers returned by a generator and perform the steps required to produce sequences that meet the requirements of the chosen distribution. They are completely independent of the underlying engine.

Again, as with engines, these distributions should meet the needs of the average user. It’s an even easier choice, because if you’ve had any statistics lessons, you’ll naturally know which one to pick. The complete list can be found in the reference.

## Examples

I’m 1000 words in already and still no examples. Here they come. The first example shows what the code from the beginning of this post should look like:

```cpp
std::random_device rd;
std::mt19937 gen{rd()};
std::uniform_int_distribution<> dis{0, 99};
for (auto i = 0; i < N; ++i) {
    std::cout << dis(gen);
}
```

We create a `std::random_device` engine (non-deterministic) and then use it to seed a Mersenne Twister engine. If you seed with a constant instead, you’ll get the same sequence on each run. After that, we create an object `dis` representing the desired distribution; in this case, we want uniformly distributed integers in the range [0, 99]. And that’s it! Call `operator()` of `dis` and you’re done. Beautiful, uniformly distributed random numbers go to the output.

Another example. Let’s say you want to simulate a coin toss. If I remember correctly from statistics classes that’s a Bernoulli distribution. The only thing to change from the above is distribution class.

```cpp
std::random_device rd;
std::mt19937 gen{rd()};
std::bernoulli_distribution dis{0.5};
for (auto i = 0; i < N; ++i) {
    std::cout << (dis(gen) ? "head" : "tail") << " ";
}
```

The number 0.5 passed to the distribution object means that the probability of getting “head” is 50%. In the output, you should see roughly equal numbers of “head” and “tail” if the sequence is long enough.

The thing you should remember from this is that **engine + distribution = your RNG**. It’s that simple.

## References and further reading

[1] Donald E. Knuth, *The Art of Computer Programming, Volume 2: Seminumerical Algorithms*

[2] *A Proposal to Add an Extensible Random Number Facility to the Standard Library* (N1398)

[3] An absolutely great talk by Stephan T. Lavavej, covering everything in this post and more.

[4] A paper explaining which distributions were important to include in the standard, with examples of use.

Please leave a comment if you think/know I got something wrong or just want to share an idea. Thanks!

Header photo “Dice” by Daniel Dionne, available under a Creative Commons Attribution-ShareAlike license.

The Bernoulli distribution for a perfectly balanced coin (p=0.5) feels like overkill. A random integer [0,1] should suffice:

```cpp
std::uniform_int_distribution<> dis{0, 1};
```

The strength of the Bernoulli distribution is most apparent with “loaded” coins with an uneven mass distribution, where heads may turn up with probability 0.6, say:

```cpp
std::bernoulli_distribution dis{0.6};
```

By the way, is there any performance difference between distributions? Or is it negligible compared to the time taken by the engine itself?

Hi! You’re right, the example for Bernoulli distribution could be better, but I just wanted to show how easy it is to use another distribution.

It’s hard to say what the performance difference between distributions is, because the standard leaves their algorithms implementation-defined. As is often the case with performance, it boils down to checking the docs of the particular implementation and profiling. I haven’t checked that myself.

Thanks for your comment.

“std::rand() and std::rand()”

I think you didn’t mean to repeat the same function twice.

Fixed. Thanks!

Hi, nice article. But there is a problem with seeding the Mersenne Twister: you seed it with only one integer, so it will give you only 2^32 different sequences. Its internal state is a whopping 624 such integers. Unfortunately, seeding it correctly in standard C++ is cumbersome; I believe Boost does it better.

Thanks! I wasn’t aware of this at the time of writing the article; it was also pointed out in a discussion on Reddit.