So I have been planning a series of podcasts on Cryptographic Controls. In the process of this planning, I fell into one of the classic traps that crypto-geeks fall into: obsessing about random number generators (RNGs).
(FYI, for the impatient, click here.)
There are two ways to generate random numbers on computers: (1) use a software program called a Pseudorandom Number Generator (PRNG) or (2) use a hardware random number generator. A Pseudorandom Number Generator uses a seed value to generate a sequence of numbers that appear random. The problem is that the same seed generates the same random sequence. The hardware based RNG observes and samples some physical phenomenon which is random, such as cosmic rays, RF noise, etc. (aka Entropy).
RNGs are important in Information Security because they are used to generate encryption keys, salts, etc. Historically, attacking RNGs has proven effective, such as the defeat of Netscape’s HTTPS sessions.
Most operating systems utilize a hybrid approach, implementing a PseudoRandom Number Generator that has a seed that is regularly updated through the collection of random hardware events. This process is called Entropy Collection or Entropy Harvesting. For most applications, this approach should be completely sufficient. However, one of the key assumptions is that the operating system has been up and running long enough for the seed value itself to become hard to predict through the collection of Entropy. Also, many of the Entropy collecting events come from properties of hardware devices, such as the minor variations in hard drive rate of rotation. As such, there are a few circumstances where the OS RNG may not be good enough for strong cryptographic key generation:
- Live Boot CD ( The start state of the RNG may be predictable. )
- Virtualized Hosts ( OS may be dependent on simulated events for randomness. )
( Given the exploding popularity of virtualization, this is an area worthy of research. Stay tuned. )
Design of the Got Entropy Service
Many RNGs (such as the one included in Linux, as well as OpenSSL’s) allow the addition of entropy from outside sources. So I started looking to Entropy sources I could use to bolster the RNGs on my virtual hosts (and other uses…). While I was looking into this, it occurred to me that I had an unused TV tuner card, a PVR-350.
When a TV is tuned to a channel with no local station, the ‘snow’ on the screen is RF noise (the same as the static between stations on AM radios). But, for reasons beyond our scope, you never use a direct physical observation as the RNG. You have to ‘de-skew and whiten’ the data prior to sampling it. Here is the process that I use:
- Collect about 3 minutes of video ( about 130 MB data ).
- Using a random key and IV, encrypt the data ( using openssl & AES-128-CBC ).
- Discard the first 32k of the file.
- Use each of the following 32k blocks as samples.
- Compress each sample with SHA-256.
- Discard the last block.
- Steps 2 and 3 remove any patterns, such as MPEG file formatting, from the data.
- Steps 4 and 5 generate a 32-byte random value ( 1024 to 1 compression in the hash ).
Check it out at http://gotentropy.artofinfosec.com
Can an Attacker Broadcast a Signal to Undermine This?
Such an attacker could not remove RF noise from the received signal. Our eyes and brains are good at filtering out the noise in the TV video, but there is a lot of it. Part of the noise comes from the atmospheric background RF, but there are also flaws (noise) in the tuner’s radio and analog-to-digital capture circuitry.
I think this is a pretty strong RNG, and I have provided an interface for pulling just the values.
Also, I have written a script ( getEntropy.sh ) that will pull Entropy from the service and seed it into /dev/random on Linux.
Results from ENT
Here are results, from a sample run of the Got Entropy, analyzed by ENT ( A Pseudorandom Number Sequence Test Program provided by John Walker of http://www.fourmilab.ch – Thanks, John! ).
- Entropy = 7.999987 bits per byte
- Optimum compression would reduce the size of this 13366112 byte file by 0 percent.
- Chi square distribution for 13366112 samples is 233.85, and randomly would exceed this value 82.48 percent of the time.
- Arithmetic mean value of data bytes is 127.4767 (127.5 = random).
- Monte Carlo value for Pi is 3.143054786 (error = 0.05 percent).
- Serial correlation coefficient is -0.000078 (totally uncorrelated = 0.0).
Resources for the Curious…