Pick a number between one and ten. You picked the number seven. Was I right? Doesn’t matter, the point is there’s only a slim chance you picked one or ten. The reason is simply that humans are bad at generating random numbers. Fear not! Computers are bad at it too, but that’s a problem for a world that revolves around cyber security. Randomness is so important to our daily lives that we don’t even notice it… mostly.
Well, I’m feeling better, and to celebrate I finally have the energy to put some serious effort into today’s post, so I’m going to excite and titillate by talking about randomness. This may seem… random, but as a researcher I’ve been giving it a lot of thought lately, because randomness is important to what we do and humans can’t be trusted to create random numbers on our own. Today we’re talking about why the hell that is good for the IRS, but bad for security and, yes, even research.
Randomized trials are a pain, mostly because we’re stuck randomizing the numbers ourselves. Even if we’re blinded to what the numbers correspond to, leaving a human to shuffle the numbers manually (e.g., by writing them out on a piece of paper) leaves much to be desired. That’s because our idea of randomness and true randomness are two very different things. To us, a perfectly random ordering of the numbers one through ten would look something like this:
8, 3, 7, 1, 9, 5, 2, 4, 10, 6
When true randomness may look closer to:
1, 4, 5, 6, 9, 2, 3, 10, 7, 8
In both cases I did what we would do in research and “randomly” selected numbers, but the second looks more like what a truly random process produces, even though it contains consecutive numbers sitting next to each other. If you put a playlist on shuffle, for example, you may hear music from the same musician repeatedly and wonder why it’s broken. It isn’t. Our definition of random isn’t really random; it’s more like “well mixed,” and ironically, being well mixed actually makes a sequence less random.
Think about it like this: if I know a playlist never plays the same artist back to back, then I know with certainty that the next artist won’t be the same as the current one! Put another way, when we’re talking about something that is truly random, we want a “high entropy” system, or high disorder. If songs are evenly distributed, then that is a lower entropy system, because spreading them out evenly throughout the playlist makes them more ordered than if the same artist were repeated a few times. This disconnect between our brains and the real world has implications that the IRS, for example, takes advantage of to catch tax cheats.
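To check the intuition about consecutive numbers, here is a minimal sketch in Python. The function names and the number of trials are my own choices for illustration; the idea is simply to shuffle the numbers one through ten many times and count how often at least two consecutive numbers end up side by side.

```python
import random


def has_consecutive_neighbors(seq):
    """Return True if any two adjacent items differ by exactly 1."""
    return any(abs(a - b) == 1 for a, b in zip(seq, seq[1:]))


def fraction_with_neighbors(trials=20_000, n=10, seed=None):
    """Estimate how often a uniform shuffle of 1..n puts consecutive numbers side by side."""
    rng = random.Random(seed)
    numbers = list(range(1, n + 1))
    hits = 0
    for _ in range(trials):
        rng.shuffle(numbers)
        if has_consecutive_neighbors(numbers):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # The "well mixed" list from the post has no consecutive neighbors.
    print(has_consecutive_neighbors([8, 3, 7, 1, 9, 5, 2, 4, 10, 6]))  # False
    # The second list does (4, 5, 6 for a start).
    print(has_consecutive_neighbors([1, 4, 5, 6, 9, 2, 3, 10, 7, 8]))  # True
    # Estimated share of shuffles containing at least one such pair.
    print(fraction_with_neighbors())
```

The estimate should come out well above one half, which is the point: a list with no consecutive neighbors at all is the unusual one.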
Enter Benford’s law, which is not about randomness, but about patterns and how we are horrible at seeing them. Benford’s law says that a number is more likely to start with a one than with any other digit. Furthermore, there is a decaying relationship, so two will lead less often than one, and nine will be the least frequent leading digit you should see in a large set of numbers (like on tax forms, for example!). That’s because every time the range of your numbers grows into a new order of magnitude, the numbers starting with one come first. Let’s look at an example, and you can try this at home.
For a small range, say numbers between 0 and 99, 11 percent of the numbers start with 1, and 11 percent start with each digit from 2 to 9. But most of the time, in taxes anyway, we’re not dealing with small numbers like that. So let’s look at something slightly larger now, say between 0 and 199. In this case, over half of the numbers start with 1, and less than 6 percent start with each of the digits 2 to 9. Now a larger range, numbers between 0 and 299: 37 percent start with 1 and 37 percent start with 2, while the digits 3 through 9 each start about 3.7 percent. This goes on and on, so over a large enough data set, the distribution of leading digits follows a predictable pattern. The bigger the digit, the less likely it is to be the first digit of a number in the data set.
Like I said, this isn’t exactly “random” because there is a pattern here. It’s just an example of the fact that we don’t see the pattern, so we create a different one. In fact, most people will try to pick larger numbers, numbers that should appear least frequently, to help break up the pattern, ironically drawing attention to the fact that the numbers have been fudged. It’s this inability to see patterns that makes us horrible at randomness!
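If you want to try this at home without counting by hand, the percentages above can be reproduced with a short Python sketch. The helper names are mine, and I count zero as part of the range (so 0 to 99 is 100 numbers) to match the figures in the text. The last function is the usual statement of Benford’s law, log10(1 + 1/d), which this post doesn’t spell out but which gives the share you would expect for each leading digit in data spanning many orders of magnitude.

```python
import math
from collections import Counter


def leading_digit(n):
    """First digit of a positive integer."""
    return int(str(n)[0])


def leading_digit_shares(upper):
    """Share of the integers 0..upper that start with each digit 1..9."""
    counts = Counter(leading_digit(n) for n in range(1, upper + 1))
    total = upper + 1  # zero is in the range but starts with none of 1..9
    return {d: counts[d] / total for d in range(1, 10)}


def benford_share(d):
    """Expected share of leading digit d under Benford's law."""
    return math.log10(1 + 1 / d)


if __name__ == "__main__":
    for upper in (99, 199, 299):
        shares = leading_digit_shares(upper)
        print(upper, {d: round(100 * s, 1) for d, s in shares.items()})
    print({d: round(100 * benford_share(d), 1) for d in range(1, 10)})
```

Under Benford’s law a leading one shows up about 30 percent of the time and a leading nine under 5 percent, which is why a tax form full of numbers starting with eight and nine stands out.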
So you’re saying humans are horrible at seeing patterns and worse at creating random numbers? Yes, yes I am! In fact, I’ve annoyed my PIs (all of them) by using the term pseudo-random to describe our human-generated “random shuffling,” because it isn’t actually random, we just like to think it is. The same goes for computer-generated shuffling.
While the whole internet is protected by encryption, which relies on strings of random numbers, there is a pattern behind those numbers that is hard for us to understand or “see” because it is complex. At the end of the day, it’s that increasing complexity that keeps us safe. However, that complexity can still be cracked, because deep inside that “random number” is something very algorithm-driven. In short, we would be safer with true randomness, so how do we generate true randomness in a random real world and a pseudo-random, algorithm-driven cyber world? Well, we turn to the real world for help.
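To see just how algorithm-driven a computer’s “random” numbers are, here is a small Python sketch. The function name and the seed value are arbitrary choices of mine. It uses Python’s standard `random` module, which is a pseudo-random generator, and shows that the same seed always gives the same numbers. For contrast it also draws a token from the standard `secrets` module, which asks the operating system for randomness instead of running a seeded algorithm you control.

```python
import random
import secrets


def pseudo_random_picks(seed, count=10):
    """Draw `count` numbers between one and ten from a generator started at `seed`."""
    rng = random.Random(seed)
    return [rng.randint(1, 10) for _ in range(count)]


first = pseudo_random_picks(seed=7)
second = pseudo_random_picks(seed=7)

# Same seed, same "random" numbers, every single time.
print(first == second)  # True

# The operating system's randomness source; no seed for us to replay.
token = secrets.token_hex(16)
print(len(token))  # 32 hex characters for 16 bytes
```

Anyone who learns the seed can replay the whole sequence, which is why where the seed comes from matters so much.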
The most creative (or dare I say random) way I’ve seen random numbers being generated is via lava lamp. Yep, the thing that went out of style faster than bellbottoms is keeping our data safe! Cloudflare (last time I checked) turned to a wall packed full of lava lamps to help generate truly random numbers. Now, even this could theoretically be modeled and blah, blah, blah, but the truth is it would be extremely difficult to do, and it can be assumed to be as random as we can possibly get. No really, lava lamps. Cloudflare isn’t the first either; back in the mid-90s, Lavarand was the first (or probably the first). The video below talks about how and why Cloudflare does this and even mentions Lavarand! There is a longer and more detailed explanation from Cloudflare themselves about how this works (here).
The nice thing about the video is that they also mention other ways to generate randomness using the real world. From accelerometer sensors on a cellphone to radioactive isotopes (take that, Schrödinger), there are ways to generate all sorts of random numbers that don’t rely on (1) the human brain or (2) algorithms. Don’t get me wrong, the photos of the lava lamps are used to seed the algorithm that generates the random values used to protect you or me (even this blog!), but that seed needs to be as random as possible for any of the system to work. If it’s not, well, then it’s predictable and you’re always one person away from the whole system falling apart.
Do we really live in a random world? Who knows, this could all be some complex simulation with a super complex algorithm deciding everything in the background. I myself could just be some not-so-random algorithm typing away on my simulated computer, while somewhere there exist people I can’t even imagine, and they are using us as a seed for a pseudo-random number generator: a super boring, but super practical, version of the Matrix.
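As a rough illustration of what “using a photo as a seed” can mean, here is a Python sketch. To be clear, this is not Cloudflare’s actual pipeline, which I haven’t seen; the function name, the fake “frames,” and the choice of SHA-256 are all my own assumptions. The idea is only that you hash the raw bytes of an unpredictable picture, optionally mixed with other entropy, and use the result as a seed.

```python
import hashlib
import os


def seed_from_frame(frame_bytes, extra_entropy=b""):
    """Hash raw image bytes (plus optional extra entropy) into a 32-byte seed."""
    digest = hashlib.sha256()
    digest.update(frame_bytes)
    digest.update(extra_entropy)
    return digest.digest()


# Hypothetical stand-ins for two camera frames that differ by a single bit.
frame_a = bytes([10, 200, 33, 47] * 256)
frame_b = bytearray(frame_a)
frame_b[0] ^= 1

seed_a = seed_from_frame(frame_a)
seed_b = seed_from_frame(bytes(frame_b))

print(seed_a.hex())
print(seed_b.hex())  # a completely different-looking seed from a one-bit change

# Mixing in the operating system's randomness as a second source.
mixed = seed_from_frame(frame_a, extra_entropy=os.urandom(32))
print(mixed.hex())
```

The hash doesn’t create randomness on its own. It only condenses whatever unpredictability is already in the photo, so a boring, predictable picture still gives a predictable seed.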
Now, pick a number…