Contents

Cryptography

Cryptography, from the Greek kryptós (secret) and graphein (writting) is the technique of transmitting secret information that a recipient can access without other parties being able to. Terminology is now established that we call Alice ("A") the sender, Bob ("B") the recipient and Eve ("E") the eavesdropper. The message is called plaintext and its encrypted version is called cyphertext. Cryptography consists in using a cypher (an algorithm) to turn the plaintext into cyphertext and back:

plaintext 🡒 cyphertext.

Both the destination and a Hacker will reverse this process, one legitimately and accessing the intended information, the other in violation of the intended secrecy.

Caesar's cipher

An example of a cipher is, for instance, the following sequence:

Na rknzcyr bs n pvcure vf, sbe vafgnapr, gur sbyybjvat frdhrapr:

which is a naively encrypted version of the original sentence, using the rot13 algorithm which rotates by 13 the Latin alphabet, i.e., a🡒n, b🡒o, c🡒p, etc. Because 13 is half the alphabet, rotating it twice brings us back to the original message: a🡒n🡒a, b🡒o🡒b, etc. This is a 13-shift version of a cipher actually used by Caesar, who was rotating the alphabet by -3 letter:

a🡒x, b🡒y, c🡒z, d🡒a, e🡒b

Such operations are described mathematically by modular arithmetics, $\oplus$, which equates two numbers $a$ and $b$ modulo $m$, written

$$a\equiv b \pmod m$$

if they differ by a multiple of the modulo $m$, i.e., iff there exists $k\in\mathbb{N}$ such that

$$a=b+km$$

Here is a text Caesar might have encoded:

Yhql, ylgl, ylfl

in rot13, this would have given:

Irav, ivqv, ivpv

in rot1

Wfoj, wjej, wjdj

One problem of such ciphers is that one can notice patterns and break them. Caesar's cipher is particularly fragile because one can try all rotations of the alphabet and see which soup of letter makes sense. This is called a "brute-force attack". Even for such simple algorithms, one would not use brute force but more subtle technique, like frequency analysis, i.e., the fact that the letter 'e' occurs on average 12.702% in the English language, as compared to 'z' which occurs only 0.077% of the time. Between these two extremes, on has for frequent letters 't' (9.356%), 'a' (8.167%), 'o' (7.507%), etc., and, on the rare end, 'j' (0.153%), 'x' (0.150%), 'q' (0.095%). So by counting how many times a letter occurs, and matching it with its frequency, one can make good guesses as to which letter is which. Note that one can use letters, numbers or basically anything for the encrypted message...

Dancing-men.jpg

Vigenère cipher

While Caesar's cipher had only one number, the $x$ in rot$x$, to encrypt and decrypt, one can easily make the cipher more robust by using a "key", which encodes the information that rotates the alphabet in a varying way, so that the same letter in the cyphertext can correspond to different letter of the plaintext, and vice-versa. The simplest variation is to decide on a key, for instance:

secret

which translates in arithmetics as:

19.5.3.18.5.20

and that can be repeated to fit the message's length:

secretsecretsecretsecretsecretsecretsecretsecretsecretsecretsecretsecretsecretsecretsecret

This turns Jule Vernes' message

Man, a mere inhabitant of the earth, cannot overstep its boundaries! But though he is confined to its crust, he may penetrate into all its secrets.

into

Ffq, s rykj lfmuunwssn hk wzj ytwwz, hugsrl tpxwvljj byv ttogidjnyl! Gxl ybhzjz my bx fgszbshv yi byv uwoly, kw rur uhfjnkfww nhmt ddq cmx vwhlxyv.

since the plaintext in alphabet arithmetics reads:

13.1.14, 1 13.5.18.5 ...

which yields:

13$\oplus$19.1$\oplus$5.14$\oplus$3.1$\oplus$18.13$\oplus$5.5$\oplus$20.18$\oplus$19.5$\oplus$5

which evaluates to

6.6.17.19.18.25.11.10

since 13$\oplus$19=32=6 mod 26 and 18$\oplus$19=11 mod 26, or, back to alphabet:

Ffq, s rykj ...

This is better but still vulnerable. In particular, if one knows the length of the key, one can then break the cyphertext into as many sub-texts and perform frequency analysis on these, thus reconstructing the key. One way to guess the length of the key is to

cryptography shall be shortened to crypto
secretsecret secre ts ecretsecr et secret

This encodes by modular arithmetic to:

vwbhyizwdhms lmddq vx xkgwnxshv yi vwbhyi

where one can see the odd repetition of vwbhyi in vwbhyizwdhms lmddq vx xkgwnxshv yi vwbhyi. Since there are 24 letters between these two repetitions, this suggests that the key is a divisor of 24, which mean it has length 1 (Caesar's cipher), 2, 3, 4, 6, 8, 12 or 24. That's a lot of possibilities. In a long text, other occurences would allow to narrow it down. Note that frequency analysis is not limited to occurences of single letters, but also of combinations of two successive lettes (digram) and three (treegram), etc. In the example above, although a very small sample, we have yi appearing three times (it corresponds to to, which is the 15th most occuring digram in English, after "th", "he", "in", "an", "er", "on", etc.)

One-time pad

The process of encryption/decryption is typically , thanks to a key (one key for "symmetric" cyphers, two for "asymmetric" ones).


It represents perfect cryptography, in the sense that it cannot be broken even with unlimited computing power (this was proven by Claude Shannon).


Asymmetric systems use a public key to encrypt a message and a private key to decrypt it