Random number generation in Python and C++

When I was writing my evolutionary computing program in Python, I often wondered what would have been different if I had used C++ instead. Execution would probably have been faster, but development would have been slower. In this post, I’ll compare the performance of the two languages using a program I wrote that simulates 100 million coin flips.

I wrote the first version of the program in Python. Python is a good language for statistical simulation because its standard library includes the random module, which implements the Mersenne Twister. I ran this version under both Python 2.7.2 and PyPy 1.7. PyPy is an alternative Python interpreter with improved performance. The same Python code runs under either interpreter without modification.
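For illustration, here is a minimal sketch of what such a coin-flip simulation could look like. It is not the code from the repository: the function name count_heads and the seed parameter are my own choices, and it assumes Python 3 rather than the Python 2.7 used for the timings.

```python
import random


def count_heads(num_flips, seed=None):
    """Simulate num_flips fair coin flips and return the number of heads."""
    rng = random.Random(seed)  # Mersenne Twister generator with its own state
    heads = 0
    for _ in range(num_flips):
        # random() returns a float in [0.0, 1.0); below 0.5 counts as heads
        if rng.random() < 0.5:
            heads += 1
    return heads


if __name__ == "__main__":
    flips = 1000000  # the post used 100 million; smaller here to keep it quick
    heads = count_heads(flips, seed=42)
    print("%d heads out of %d flips (%.4f)" % (heads, flips, heads / flips))
```

Passing a seed makes a run repeatable, which helps when comparing results across interpreters.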

I wrote the second version of the program using C++. C++ has a simple random number generator, rand(), which should not be used for any serious statistical simulation. I used two libraries that implemented the Mersenne Twister. My first C++ version uses the GNU Scientific Library (GSL). I was helped by the GSL documentation and by the GSL example in this post. My second C++ version uses the new random number capabilities in C++11. I was helped by this example of uniform_real_distribution. The C++ code can be executed using the two different libraries by changing a few declarations and function calls.

Here are the results of running the code on my desktop computer:

Version          Time
Python           23.9 s
PyPy             6.3 s
C++ using GSL    1.8 s
C++11            21.7 s

So, assuming I’m not using C++11 incorrectly, its random number library performs surprisingly poorly: it is only slightly faster than plain Python. PyPy gives a decent boost over the standard Python interpreter (about 3.8 times faster). But GSL is the fastest of all, about 3.5 times faster than PyPy.
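The post does not say how the times were measured. One way to take a wall-clock measurement from inside Python is sketched below; the helper names flip_coins and wall_clock are hypothetical, and time.perf_counter assumes Python 3.3 or later.

```python
import random
import time


def flip_coins(num_flips):
    """Return the number of heads in num_flips simulated coin flips."""
    return sum(1 for _ in range(num_flips) if random.random() < 0.5)


def wall_clock(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    heads, seconds = wall_clock(flip_coins, 1000000)
    print("%d heads in %.2f s" % (heads, seconds))
```

Timing inside the program leaves out interpreter startup, so the numbers will differ somewhat from timing the whole process from the shell.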

The files are available in this GitHub repository.

What I learned from Evolutionary Computing

875 lines of code and 55 hours. Evolutionary Computing was one of the most challenging courses I have taken at SCSU. These statistics come from our final project, which was a month-long research project. Our assignment was to build on previous projects and produce a 5-page paper suitable for publication at a conference.

I learned a lot while completing this project. My project studied spanning trees on complete graphs, and how to evolve them using a genetic algorithm. Maybe in a follow-up post I’ll introduce these concepts. For now, I want to show what I learned while writing those 875 lines of Python code.

  • Write a testing function for non-trivial tests. I was checking something by hand, and thought my program wasn’t working. Then I ended up spending several hours debugging something that was not in fact a bug.
  • Dropbox works as automatic version control. Several times I used it to retrieve an old version of a file that I had not committed. This came in very handy.
  • In vim, you can do :edit! to reload a file from disk. (documentation here)
  • Estimating run time is important when running your jobs on other people’s machines.
  • I learned how to profile code using cProfile. On Ubuntu, cProfile can be found in package python-profiler.
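As a rough sketch of the cProfile workflow mentioned in the last item, the example below profiles a toy function. The functions fitness, evaluate_population, and profile_report are hypothetical stand-ins, not code from my project, and the example assumes Python 3.

```python
import cProfile
import io
import pstats
import random


def fitness(individual):
    """Hypothetical fitness function: the sum of the genes."""
    return sum(individual)


def evaluate_population(size=200, genes=50):
    """Build a random population and return its best fitness."""
    population = [[random.random() for _ in range(genes)] for _ in range(size)]
    return max(fitness(ind) for ind in population)


def profile_report(func, top=5):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report(evaluate_population))
```

A whole script can also be profiled without changing it by running python -m cProfile -s cumulative followed by the script name.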


In git, you can choose to commit only certain changes in a file. One way to do this is with git gui. The GUI basically shows you the output of git diff for the changed files in your repo, and you can right-click on a line in the diff and choose “Stage Line For Commit.”

I only used this a few times, but it is a lot better than doing the following:

$ cp dandelion.tex dandelion.tex.bak
$ git checkout -- dandelion.tex
$ #manually copy over the changes that you do want to commit
$ git add dandelion.tex
$ git commit -m "This is the hard way"


When writing tables in LaTeX, you must put the \label inside (or after) the \caption in order for \ref to refer to the table number. If the label comes before the caption, your \ref will only refer to the number of the section. Here is a working code snippet:

The times can be seen in Table \ref{wallclock}.

\caption{Wall clock times in seconds for a single
run of the genetic algorithm using the PyPy interpreter.
\label{wallclock}}
\begin{tabular}{l r}