Crash Course in Scientific Computing

Computers

I trust my family jewels only to Linux.

Computers are everywhere, nowadays, even in most people's pockets or even hand-wrist. In this Course, we will learn how to tame them and make them obey us so as to be useful companions. For that, we will first have to understand what types of beasts they are, although their proliferation might make us believe we're familiar enough already. But in the jungle of computers, those that we see most often aren't actually the best prototypes to control, as they often represent precisely the species already so much domesticated as to behave more like high-level slaves than low-level machines. At the highest level, we can give as examples your smartphone or an interactive display in the mall or airport. We are interested in the low-level, fundamental, basic aspects of computers, and have them obey us to the letter. We will see that computers are very hierarchical, they build layers over layers over layers. At the highest layers, there is, e.g., skype, allowing you to speak to someone pretty much as if they are in front of you, except that they are on the other side of the planet. At the lowest level, there is the bit: the element of information, that takes the value either 0 or 1. This is useful already. For instance, you could have a bit called sign to store whether a quantity is positive or negative in which case:

sign=1

or negative, in which case

sign=0

means positive. Note that we use 0 or 1 but these are just labels. In fact some people like to use day=TRUE and day=FALSE, meaning that for all practical purposes, TRUE=1 and FALSE=0.

Now the world isn't really this type of place where things are positive or negative, night and day, white or black, true or false, bad or good, 1 or 0, we need more refinement, more complexity. This we can do by bringing more bits. For instance, with two bits, we can store 4 values, namely:

  1. 00 = value 1
  2. 01 = value 2
  3. 10 = value 3
  4. 11 = value 4

That is, if they are unsigned. If we want to keep one bit for the sign, we reduce to four, or, if adhering to strict consistency, three values:

  1. 00 = 0
  2. 01 = 1
  3. 10 = -0
  4. 11 = 2

Believe it or not, computers do differentiate between -0 and +0. That is a big waste when we have two bits only but with $n$ bits, the numbers of possibilities increase exponentially and this one redundancy doesn't matter. Furthermore, it makes the delight of people who discover this fact at the occasion of small-talk when discussing technical topics. With $n$ bits, we can thus describe all the positive integers from 0 to $2^{n}-1$. A first example of layer is that we don't usually deal with bits directly but with groups of bits, such as the byte, which is eight bits:

$$b_7b_6b_5b_4b_3b_2b_1b_0$$

That makes $2^8-1=255$ possible values, which is already quite a lot. Bits are inconvenient while our standard decimal system is not compatible, so when talking to computers, we commonly use instead the hexadecimal system, which is base 16, that covers for half-a-byte, or, the joke goes, a nibble. We use letters, $A$, $B$, ..., $F$ for values 10, 11, ..., 15. Two nibbles give us all the values from 0 to 255, of FF in hexadecimal.

This can be used for fairly rudimentary information encoding, for instance, to encode text, we associate given numbers to letters. The standard is provided by ASCII, which says that the letter A is encoded by the number 65 (or 41 in hexadecimal, which, by the way, we'd write as 0x41), and then we increment: B, C, D go by 66, 67, 68, or 0x42, 0x43, 0x44, etc., till Z 90 (or 0x5A) and then we have six glyphs, namely [, ~, ], ^, _ and @ before turning to lowcase letters, that carry on:

a=97 (0x61), b=98 (0x62), c=99 (0x63), etc., till z=122 (0x7A).

The numbers are found earlier, so that their low bits of the ASCII code correspond to their binary values, namely

0 00110000 (that's 48 or 0x30)
1 00110001 (that's 49 or 0x31)
2 00110010 (that's 50 or 0x32)
3 00110011 (that's 51 or 0x33)
...
9 00111001 (that's 57 or 0x39)

Another famous ASCII code is the one for space " " which is 32 (0x20). Other codes are special such as \0 (00000000) or the line feed \n (10=0x0A) and line break or carriage return \r (13=0x0D) that indicate a newline.

That's for text, but information encoding does not limit to that, numbers and text. It covers also images, sounds, movies, etc. Everything that you find in a computer is ultimately streams of 0 and 1. For instance, if we lay out 64 bits (eight bytes) into an array, we can form patterns, such as:

00011000
00011000
00100100
00111100
01000010
01000010
00000000
00000000

or:

01111000
01000100
01111000
01000100
01000100
01111000
00000000
00000000

which is probably not so clear as such but see if we label 1 with a dark character such as # and 0 with a space, we see (putting them side by side):

   ##     ####   
   ##     #   #  
  #  #    ####   
  ####    #   #  
 #    #   #   #  
 #    #   ####   

If we make the computer switch on a pixel of the screen if the bit is set to 1 and leave it off if set to 0, we thus have a way to convert bits into images. We can also use that not to project on screen but to be stored in a disk, or sent over the internet. That is a fairly simple way to store an image, but one that is actually used. Together with some necessary information such the geometry of the image, which specifies how many bits go in the first line and at which point we should break to start the next line, we then create a so-called "file", namely, a pbm file—PBM stands for portable bitmap format—which has a so-called "header" that follows a pre-ascribed format, which for pbm is:

  • two bytes defining the type of file, in our case P1 for pixmap.
  • A line starting with # allowing for a comment (accessorily).
  • Two bytes giving the padding of the image.

This is a valid bitmap file:

P1
# A
8 8
0 0 0 1 1 0 0 0
0 0 0 1 1 0 0 0
0 0 1 0 0 1 0 0
0 0 1 1 1 1 0 0
0 1 0 0 0 0 1 0
0 1 0 0 0 0 1 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0

It is 138 bits. You can download it and play with it. Of course, more characters would give us more resolution, and we are not limited to letters. Here's a pbm file of your lecturer:

P1
# Created by GIMP version 2.10.8 PNM plug-in
783 966
1111111111111111111111111111111111111111111111111111111111111111111111
1111111111111111111111111111111111111111111111111111111111111111111111
1111111111111111111111111111111111111111111111111111111111111111111111
...(1468 lines of only 1)...
1111111111111111111111111111111111111111111111111111111110111111111111
1111111111111000100111111111111111111111111111111111111111111111111111
...(5778 lines)...
0000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000010111111111111111111111111111111111111111111111
1111111111111111111111111111111111100000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000001111111111111111111111111111111111111111111111
1111111111111111111111111111111111111111111111111111111111111111111111
1111111111111111111111111111111111111111111111111111111111111111111111
1111111111111111111111111111111111111111111111111111111111111111111111
1111111111111111111111111111111111111111111111111111111111111111111111
1111111111111111111111111111111111111100111110000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000001111111111111111111111111111111111
1111111111111111111111111111111111111111111110000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000111111111111111111111111111111111
...(5023 lines)...
1111111111111111111111111111111111111111111111111111111111111111111111
1111111111111111111111111111111111111111111111111111111111111111111111
1111111111111111111111111111111111111111111111111111111111111111111111
1111111111111111111111111111111111111111111111111111111111111111111111
1111111111111111111111111111111111111111111111111111111111111111111111
1111111111111111111111111111111111111111111111111111111111111111111111
1111111111111111111111111111111111111111111111111111111111111111111111
1111111111111111111111111111

which, setting 1 bits to black and 0 to white, gives:

Fabrice.laussy-pbm.jpg

Note that there is a linebreak (and no space) so that the data is not actually stored directly as the grid of pixels itself, mainly for portability, that's what the P of PBM stands for. For this reason, we really need the encoding in the header. For instance, if it would be off by one and read 784 966 instead of 783 966, the image would be reconstructed as:

Fabrice.laussy-squeezed-pbm.jpg

Can you see why?

While at the deepest level, the computer really stores these images as such, it is not convenient for us, human beings, and we would work here too in a different basis than 2. Since we group bits by groups of eight (bytes), the Hexadecimal (base 16) is convenient since $2^4=16$, an hexadecimal represents half-a-byte (a so-called nibble) and two numbers represent a byte. Therefore, the bits for the letter A, which in a single line read:

0001100000011000001001000011110001000010010000100000000000000000

we can convert in hexadecimal as:

18 18 24 3C 42 42 00 00

The same for B:

78 44 78 44 44 78 00 00

etc.

What is the letter 0x3844808044380000? How about

  • 0x3C023E463A0000
  • 0x40407C42625C0000
  • 0x6094681629060000
  • 0x3C449C945C201C?

etc.

If we do not store the data in plain text (ASCII, 0 & 1 only) but in hexademical (with no linebreaks) the same file looks like this:

Raw-pbm.png

where the header now reads P4 (it is left as an exercise for you to decipher the header completely) meaning that encoding is now RAW, not ASCII (0/1).

Instead of using two bits to represent two colours (white & black) we would use, say, 8 bits to represent shades of grey, making a much neater image, or 3 bytes to encode Red, Green and Blue components, making an RGB bitmap:

Laussy-grayscale.png

Still encoding this in ASCII, and with one data per line (as is the [arbitrary] choice of the software used, gimp), this gives the following headers:

P2
# PGM file Created by GIMP version 2.10.8 PNM plug-in
783 966
255
0
1
1
1
2
2
1
1
1
0
0
1
1
0
0
1

and

P3
# PPM file Created by GIMP version 2.10.8 PNM plug-in
783 966
255
1
0
0
4
0
0
4
0
1
5
0
1
5
1
2
5

You see that the header varies with specification P2 & P3 which correspond to ASCII gray & color encoding, respectively. You already know P4, there is also P5 & P6 which are for hexadecimal encodings. This table gives you the fairly complete characterization of [PBM] bitmap encoding.

Type Magic number Extension Colors
ASCII (plain) Binary (raw)
Portable BitMap P1 P4 .pbm 0–1 (white & black)
Portable GrayMap P2 P5 .pgm 0–255 (gray scale), 0–65535 (gray scale), variable, black-to-white range
Portable PixMap P3 P6 .ppm 16777216 (0–255 for each RGB channel), some support for 0-65535 per channel

We could carry on discussing image encoding, for instance introducing more popular formats, but basically equally simple formats, such as Targa (abbreviation tga), which is the simplest widely supported binary image file format, or BMP, using the same encoding (more complex headers) and popular on Microsoft platforms—you might actually have encountered these file extensions before—or even go onto the really famous formats, such as gif, jpg or png, but that is not our direction. We are interested instead in the general system which allows us to deal with collections of bits to turn them into useful things, not specifically images. We have seen texts, images, there is also music, movies, etc. We shall come to that, but one bit at a time!