Cryptography

Cryptography, from the Greek kryptós (secret) and graphein (writting) is the technique of transmitting secret information that a recipient can access without other parties being able to. Terminology is now established that we call Alice ("A") the sender, Bob ("B") the recipient and Eve ("E") the eavesdropper. The message is called plaintext and its encrypted version is called cyphertext. Cryptography consists in using a cypher (an algorithm) to turn the plaintext into cyphertext and back:

plaintext 🡒 cyphertext.

Both the destination and a Hacker will reverse this process, one legitimately and accessing the intended information, the other in violation of the intended secrecy.

Caesar's cipher

An example of a cipher is, for instance, the following sequence:

Na rknzcyr bs n pvcure vf, sbe vafgnapr, gur sbyybjvat frdhrapr:

which is a naively encrypted version of the original sentence, using the rot13 algorithm which rotates by 13 the Latin alphabet, i.e., a🡒n, b🡒o, c🡒p, etc. Because 13 is half the alphabet, rotating it twice brings us back to the original message: a🡒n🡒a, b🡒o🡒b, etc. This is a 13-shift version of a cipher actually used by Caesar, who was rotating the alphabet by -3 letter:

a🡒x, b🡒y, c🡒z, d🡒a, e🡒b

Such operations are described mathematically by modular arithmetics, $\oplus$, which equates two numbers $a$ and $b$ modulo $m$, written

$$a\equiv b \pmod m$$

if they differ by a multiple of the modulo $m$, i.e., iff there exists $k\in\mathbb{N}$ such that

$$a=b+km$$

Here is a text Caesar might have encoded:

Yhql, ylgl, ylfl

in rot13, this would have given:

Irav, ivqv, ivpv

in rot1

Wfoj, wjej, wjdj

One problem of such ciphers is that one can notice patterns and break them. Caesar's cipher is particularly fragile because one can try all rotations of the alphabet and see which soup of letter makes sense. This is called a "brute-force attack". Even for such simple algorithms, one would not use brute force but more subtle technique, like frequency analysis, i.e., the fact that the letter 'e' occurs on average 12.702% in the English language, as compared to 'z' which occurs only 0.077% of the time. Between these two extremes, on has for frequent letters 't' (9.356%), 'a' (8.167%), 'o' (7.507%), etc., and, on the rare end, 'j' (0.153%), 'x' (0.150%), 'q' (0.095%). So by counting how many times a letter occurs, and matching it with its frequency, one can make good guesses as to which letter is which. Of course, frequency analysis is not limited to occurences of single letters, but also of combinations of two successive lettes (digrams) and three (trigrams), etc. Note that one can use letters, numbers or basically anything for the encrypted message...

Vigenère cipher

While Caesar's cipher had only one number, the $x$ in rot$x$, to encrypt and decrypt, one can easily make the cipher more robust by using a "key", which encodes the information that rotates the alphabet in a varying way, so that the same letter in the cyphertext can correspond to different letter of the plaintext, and vice-versa. The simplest variation is to decide on a key, for instance:

secret

which translates in arithmetics as:

19.5.3.18.5.20

and that can be repeated to fit the message's length:

secretsecretsecretsecretsecretsecretsecretsecretsecretsecretsecretsecretsecretsecretsecret

This turns Jule Vernes' message

Man, a mere inhabitant of the earth, cannot overstep its boundaries! But though he is confined to its crust, he may penetrate into all its secrets.

into

Ffq, s rykj lfmuunwssn hk wzj ytwwz, hugsrl tpxwvljj byv ttogidjnyl! Gxl ybhzjz my bx fgszbshv yi byv uwoly, kw rur uhfjnkfww nhmt ddq cmx vwhlxyv.

since the plaintext in alphabet arithmetics reads:

13.1.14, 1 13.5.18.5 ...

which yields:

13$\oplus$19.1$\oplus$5.14$\oplus$3.1$\oplus$18.13$\oplus$5.5$\oplus$20.18$\oplus$19.5$\oplus$5

which evaluates to

6.6.17.19.18.25.11.10

since 13$\oplus$19=32=6 mod 26 and 18$\oplus$19=11 mod 26, or, back to alphabet:

Ffq, s rykj ...

This is better but still vulnerable. In particular, if one knows the length of the key, one can then break the cyphertext into as many sub-texts and perform frequency analysis on these, thus reconstructing the key. One way to guess the length of the key is called Kasiski's examination and consists in finding lucky repetitions of patterns, e.g.:

cryptography shall be shortened to crypto
secretsecret secre ts ecretsecr et secret

This encodes by modular arithmetic to:

vwbhyizwdhms lmddq vx xkgwnxshv yi vwbhyi

where one can see the odd repetition of vwbhyi in vwbhyizwdhms lmddq vx xkgwnxshv yi vwbhyi. Since there are 24 letters between these two repetitions, this suggests that the key is a divisor of 24, which mean it has length 1 (Caesar's cipher), 2, 3, 4, 6, 8, 12 or 24. That's a lot of possibilities. In a long text, other occurences would allow to narrow it down, and one does not count on such generous repetitions but look at repetitions even of digrams. In the example above, although a very small sample, we have yi appearing three times (it corresponds to to, which is the 15th most occuring digram in English, after "th", "he", "in", "an", "er", "on", etc.)

There are other tricks, for instance, since modular arithmetics is commutative, subtracting the cipher text from itself offset by the key length produces the plain text encrypted with itself. If any "probable word" in the plain text is known or can be guessed, its self-encryption can be recognized, which allows recovery of the key by subtracting the known plaintext from the cipher text. Key elimination is especially useful against short messages.

One-time pad

It is easy to see what is the problem of the previous method. The key itself has redundancy, which eventually leaks into the cyphertext. Therefore, it should be enough, actually, to simply use a much longer key (one can also use several short keys which relative lenghts are prime and successively encrypt the message). This actually provides perfect cryptography, in the sense that it cannot be broken even with unlimited computing power (this was proven by Claude Shannon). In fact it is easy to show that any cyphertext can be decoded to any plaintext whatsoever using the appropriate key. For instance:

Youshouldknow

decrypts to

We come in peace.

with the key XXX but it decrypts to

We will wipe you.

with the key XXX. So it's easy to ensure perfect security, just use a unique key, as long as the message. The problem with that, though, is that both Alice and Bob must share the key and make sure nobody else has access to it. How do they share this secret? If they exchange long texts, and often, they need a lot of such keys. It is difficult to secure such a fresh, disposable information, so they need a way to generate cryptographic keys, securely. Here comes the concept of "asymmetric keys" (the schemes above are using "symmetric" keys), which involves two keys: a public one, that everybody has access to, even the eavesdropper, and a secret one.

RSA

A reverse application of RSA makes it useful for signing or authentificating a plaintext.

70’s by Ron Rivest, Adi Shamir, and Leonard Adleman, hence RSA

RSA is rather slow so it’s hardly used to encrypt data , more frequently it is used to encrypt and pass around symmetric keys which can actually deal with encryption at a faster speed

Asymmetric systems use a public key to encrypt a message and a private key to decrypt it

Contents

Cryptography

Caesar's cipher

Vigenère cipher

One-time pad

RSA