Lately I have been thinking about encryption (haven’t we all?) and as an exercise I have written my own encryption algorithm that I’m going to describe in this article. Of course i know rolling your own is a bad idea, but that doesn’t mean its not fun.

I base it on the idea that i want to use the simplicity of a one time pad, but to have a considerably shorter key. If we have a small key we should be able to procedurally generate an infinitely long key from the initial key seed. Note: at this point anyone who knows anything about encryption can see an obvious weakness here: the procedural algorithm will produce a pattern that can be found and used to break the key. True, but that is a mathematical way of thinking about it: a specific algorithm yields a specific pattern. But what if the algorithm isn’t specific? What if the key describes the algorithm? If the key is data that can be used as an algorithm to produce data, we can create a cycle where the algorithm is self modifying and therefor wont create a pattern.

One way of thinking about it is to imagine the encryption algorithm as a virtual machine that produces a one time pad, and new instructions for the virtual machine. All we really need to do is to ensure that the virtual machine never gets stuck in a loop where it produces an output that makes it repeat its previous operations over and over.

That’s pretty much the basic idea, and once you start to think about it you realize that you don’t need a full virtual machine, you can do something much simpler that has similar characteristics.

pos_a = key[0];

pos_b = key[1];

pos_c = key[2];

for(i = 0; i < length; i++)

{

old_a = pos_a;

pos_a = key[pos_b] % key_size;

pos_b = (pos_a + 1 + key[pos_c] % (key_size – 1)) % key_size;

pos_c = (pos_a + 1 + key[old_a] % (key_size – 1)) % key_size;

decrypted[i] = encrypted[i] ^ key[pos_a] ^ key[pos_b];

key[pos_c] = (key[pos_c] << 31) | (key[pos_c] >> 1);

key[pos_a] ^= key[pos_c] ^ i ^ decrypted[i];

}

Lets go over this code and start by first analyzing the key line here:

decrypted[i] = encrypted[i] ^ key[pos_a] ^ key[pos_b];

This is the encryption using a simple XOR. XOR in it self is unbreakable because any input can yeld any output with the right key value. However If we re-use the same XOR key more then once it becomes possible to guess the key. The assumption of any encryption algorithm must always be to make the message unbreakable even if the breaker has a part of the message in plain text. So the first thing we do is to XOR with 2 different parts of the key; key[pos_a] and key[pos_b]. The breaker now knows two numbers if XORed together will produce the XOR difference between the message. if we a working with a 32 bit implementation that means 4 billion combinations. That’s a lot, but its still a clue. So the next thing we do is to destroy that clue:

key[pos_c] = (key[pos_c] << 31) | (key[pos_c] >> 1);

key[pos_a] ^= key[pos_c] ^ i ^ decrypted[i];

Here we take a third portion of the key, key[pos_c], that the adversary still haven got a clue about and use it to destroy one of the two XOR factors. To this we add in the decrypted message and a counter, that will add a poison pill and prevent the algorithm to ever get stuck in a pattern. By adding the decrypted message we also add the same entropy as the message it self has to the possible combinations. To make sure we have good entropy we also shift the key one step, so that we aren’t constantly XORing the same bits. Then finally we get to this:

old_a = pos_a;

pos_a = key[pos_b] % key_size;

pos_b = (pos_a + 1 + key[pos_c] % (key_size – 1)) % key_size;

pos_c = (pos_a + 1 + key[old_a] % (key_size – 1)) % key_size;

Here we simply use the key to recursively select the 3 sections of our key we will use in our above algorithm. Since none of these position values are exposed, they obfuscate how the algorithm work as they will just modify how the algorithm selects its key values, they wont actually be used in the math relating to the message. Since the keys at pos_a and pos_c will be XORed to destroy the key, they cant be the same, and since the key at pos_a and pos_b are used to decrypt the message, they cant be the same. The core idea here is that the adversary can crack the key, but not how the key was generated as that process is fenced of from the encryption process.

I would love to see if anyone can break this. If you want to try here are a few assumptions you can make: The key is only used once and is random, but assume that you have access to both the encrypted and a significant part of the plain text message (The encryption should hold up even if an attacker can accurately guess significant parts of the plain text). I’m very curious as to how the key size and amount of plain text data is given can impact the security of the encryption.

This is one of thous times I wish i was very rich so that i could offer up a big cash price, but maybe i can owe you beer if you break it?

Edit:

The lines:

pos_a = key[0];

pos_b = key[1];

pos_c = key[2];

should obviously be:

pos_a = key[0] % key_size;

pos_b = key[1] % key_size;

pos_c = key[2] % key_size;

See? I already look stupid!

Yeah, what blake said. What happened to Eskil and LOVE?

Yeah, what blake said. What happened to Eskil and LOVE?

Nearly a year since the last update any chance of a post?

not entirely sure i understand the pseudo-code.

let’s say we have 3 keys, each 32 bits long.

key[0] = 0x3243F6A8

key[1] = 0x885A308D

key[2] = 0x313198A2

since we only have 3 keys, we can’t really use the key values themselves to generate new keys.

that is i can’t use key[0x3243F6A8], which you seem to do all over your code.

i did an analysis of a closely related cipher (8 bit integers) at https://github.com/andrewcooke/Stupid.jl

I’ve posted a thread on the xkcd forums to see what they have to say.

http://forums.xkcd.com/viewtopic.php?f=12&t=106073

There are some simple tests you can (and should) do yourself: Run some example data through and measure randomness / entropy of the output. Especially for input with clear patterns (all the same, non-random distribution of values, image data with large areas and few colours, …) you want those patterns not to be in the output; more precisely you don’t want any patterns in the output – measure the entropy of the output.

Also, how easy is it for someone who gets hold of some data and its encrypted counter-part to derive the key from that knowledge?

For anyone interested in learning basic crypto constructions, there is a free, high-quality Crypto I course from Dan Boneh at Stanford.

https://www.coursera.org/course/crypto

You can’t have a one time pad and a shorter key at the same time. The one time pad would only have the entropy of the shorter key and that makes it lose it’s advantages.

However, if you make your algorith ‘hard’ enough, it can have a purpose. What you’re doing is called ‘key stretching’ : https://www.schneier.com/paper-low-entropy.pdf

Great idea, i just did.

Very neat. I wish I had a better understanding of this stuff. I’d recommend submitting to Hacker News (https://news.ycombinator.com/), they’re always happy to point out any mistakes.

Also, your English seems to be improving compared to older blog posts. Congrats on that.