Generating true random numbers from bananas

It was a dreary afternoon in Milan last year when, while procrastinating instead of studying, I was struck by a flash of genius: “What would happen if I made a banana-powered random number generator?” I immediately went to tell my roommate, who is also an electronic engineer.
He looked me in the face and burst out laughing. At that moment I realized I had a great project on my hands.

Before I am taken for a fool: it really does make sense, and this post is here to explain why. Let’s start at the root of the problem: computers are deterministic systems. Put less formally, if we always give them the same input data, they always return the same output values. Which is exactly what we expect from a computer. It is clear from the start, however, that determinism and randomness are not great friends. In fact, a computer, per se, is not capable of doing anything random.

If you are one of the lucky ones to have written some C programs, you will surely have run into a rand() function call. I myself have made extensive use of it in posts ([1][2]) about calculating pi with the Monte Carlo method.
For the uninitiated, rand() is the function that (with a lot of imagination in the name) lets you get random numbers in a C program. But we just got done saying that there is nothing random about computers. So?

So we’ve been fooled. Unfortunately, friends, we’ve all been fooled. I take my share of responsibility, too, for flippantly calling them random numbers in the posts about pi. We should have called them pseudo-random numbers, which is the name computer scientists give to sequences of numbers whose distribution matches that of random numbers, but which are not actually random.

To understand this better: to be called random, a set of numbers must have at least these two characteristics (a necessary but not sufficient condition):

  1. Each number must have the same probability as every other number of appearing in our list (within a given reference interval). This is what we call a uniform distribution;
  2. The sequence of numbers must not be predictable in advance.

Clearly, the difficulty with deterministic machines lies in satisfying point 2. Point 1 is easily solved, and gives rise to what we have just called pseudo-random numbers: numbers which respect the uniform distribution, but which are not actually truly random.

But what do bananas have to do with anything?!

Now we are getting there!
When you need to provide a computer with true random numbers, you use hardware systems, called true random number generators (TRNG). There are many types of TRNG, which exploit different physical random quantities and convert them into digital information that is passed to the computer.
The most common ones exploit physical phenomena such as thermal noise of resistors, the avalanche effect of diodes and other chaotic effects. Others exploit more complex quantum phenomena such as shot noise, radioactive decay or photonic effects.

The banana route is radioactive decay. Bananas are famously rich in potassium, and a small but significant fraction of the potassium found in nature is radioactive. Specifically we are talking about the 40K isotope, which makes up about 0.012% of natural potassium. Plus, bananas are delicious with lemon and sugar, which alone would be a great reason to always have one on hand.

Now that the “banana-powered random number generator” statement makes a little more sense or at least has some context, one question remains: what do we do with true random numbers in a computer? My answer would like to be “I don’t care, I want the focus of my project to be limited to generating them”, but I’m sure that would leave many unsatisfied.

So I’ll try to answer this question as well: encryption. This is the main reason why random numbers and their relationship with computers are studied. Random numbers are used to generate cryptographic keys, which are the only factor that determines the effectiveness of an encryption system. As Kerckhoffs’ principle states: “the security of a cryptosystem should not depend on keeping the crypto-algorithm hidden, but only on keeping the key hidden”. (Copy that, NXP?)
It is clear that if an attacker can somehow predict the key, it will no longer be hidden, and we will be looking at a vulnerable system. “Good randomness” therefore lies at the foundation of a good cryptographic system. [3]

Okay but… if random is random, how do we distinguish good random from bad random?

To analyze the quality of random number generators, specially designed software tools are usually used. Two of the most popular are ent [4] and dieharder [5]. The first was designed as a lightweight test for radioactive-decay random number generators: it is very simple and fast and requires little data, but its result is purely indicative.
Dieharder, on the other hand, is a test suite considered the gold standard for random number generators: it performs very thorough tests, but needs gigabytes of samples to run them on.
For now we’ll just focus on the results obtained with ent.

Let’s prepare the data to run a first test with ent.
The generator writes the data to the serial port; we can save it to a file from the Linux console with cat /dev/ttyACM0 >> sampletext.txt. We exploit bash stream redirection in “append” mode, so we can stop the acquisition and resume it later without overwriting the file.

The sample collected over two days consists of 90628 16-bit numbers, one per line. However, the numbers are saved as ASCII text, while ent analyzes binary files. A very short C program can convert them to binary:

#include <stdio.h>
#include <assert.h>
#include <stdlib.h>
#include <stdint.h>
 
int main(int argc, char const *argv[]) {
 
  FILE * lettura = fopen("sampletext.txt", "r");
  assert(lettura != NULL);
 
  FILE * scrittura = fopen("sample.bin", "wb");
  assert(scrittura != NULL);
 
  uint16_t N = 0; // N is 16 bits
  char bytes[2];
  char buffer[6]; // 5 chars + terminator
 
  // checking fscanf's return value (instead of looping on feof) avoids
  // processing the last line twice; %5s also keeps the buffer safe
  while (fscanf(lettura, "%5s", buffer) == 1) {
    N = (uint16_t)atoi(buffer); // from char array to integer
    bytes[0] = (N >> 8);   // take the 8 msb
    bytes[1] = (N & 0xFF); // take the 8 lsb
    fwrite(bytes, 1, sizeof(bytes), scrittura); // output raw msb and lsb
  }
 
  fclose(lettura);
  fclose(scrittura);
  return 0;
}

With that done, we can run the ent test for the first time:

valerio@valerio ~/tests $ ./ent sample.bin
Entropy = 7.997995 bits per byte.

Optimum compression would reduce the size of this 181256 byte file by 0 percent.

Chi square distribution for 181256 samples is 498.15, and randomly would exceed this value less than 0.01 percent of the times.

Arithmetic mean value of data bytes is 127.4942 (127.5 = random).
Monte Carlo value for Pi is 3.138799695 (error 0.09 percent).
Serial correlation coefficient is 0.005408 (totally uncorrelated = 0.0).

Ent gives us several parameters:

  • Entropy: the amount of “randomness” contained in a portion of information. Information theory tells us that the entropy value represents the minimum size theoretically obtainable through lossless compression.
  • Chi-square distribution: this test measures how well the distribution of our values adheres to a theoretical distribution (in our case the uniform distribution). According to the ent manual, this value should be as close as possible to 256, with the associated percentage between 10% and 90%.
  • Arithmetic mean: the simple arithmetic mean of our data bytes. Since our values lie between 0 and 255, it should be about 127.5.
  • Value of pi with the Monte Carlo method: the method should be familiar to readers of my earlier posts, but in this context it is more a curiosity than a useful metric.
  • Serial correlation: represents the dependence between successive values in the series. In the optimal case it is equal to zero.

From this first test, all the values look acceptable except the chi-square. In the next post we will explore why, along with possible solutions. The second part, with a deeper dive into the mathematics, is available at this link.

Notes:
[1] Follia computazionale: pi greco e il metodo Monte Carlo
[2] Follia computazionale: pi greco, i thread e la follia collettiva
[3] inside secure – The importance of true randomness in cryptography – white paper
[4] Ent: A Pseudorandom Number Sequence Test Program 
[5] Dieharder: A Random Number Test Suite – Robert G. Brown 

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


9 thoughts on “Generating true random numbers from bananas”

  1. 8 bit parallel version with smaller tubes ought to be doable. Easier to use a single HV block and split it 8 ways with stepdown diodes and then fine tune bias so the tubes have equal entropy.

  2. Very cool! A good thing 40K has such a long half-life of 1.25 bn years, so the operation of the BRNG isn’t limited by that, only by the lifetime of the banana. As the banana in the picture looks nearly ripe enough for eating, it is a good thing it is a renewable resource. :–)

  3. to a non-code nerd this will sound like a Monty Python joke… but is sincere… those bananas… African or European? which breed of domesticated bananas is better?

    (there’s a bunch of Iceland-based greenhouses raising bananas, so that qualifies as European)

  4. Such a great post! Lots of fun to read but super instructive at the same time!
    Please continue procrastinating from time to time!

  5. The decay of radioactive potassium in the banana may not be enough to significantly raise the signal rate above background noise with the given detector. That may be what the widely off chi value is showing. You might get better results from a more prolific source such as the americium source from a smoke detector.

    1. The decay from the banana is detectable albeit not huge. The reason for the chi value being wrong is explained in the second part and has nothing to do with decay and everything to do with microcontrollers.
