Revision as of 10:39, 1 March 2021

Crash Course in Scientific Computing

7. Algorithms: the idea

An algorithm is a recipe to solve a problem with a computer, a list of instructions for the later to follow that will bring it to the solution, which it can duly report. Problems are of all types, from scientific problems such as finding roots of complex functions to integrating differential equations passing by everybody's daily problems such as finding the shortest or best route to bring you from A to B in a complex map or sorting the list of top results in an internet query. The set of instructions can be efficient and lead to a rapid, accurate and faithful solution from the computer, or on the opposite, clumsy, redundant and leading it to waste resources and possibly not even the best, or a good, or even any solution at all. The art of finding a good algorithm has been guiding scientists and engineers since antiquity. Any single problem typically comes with a myriad of different algorithms. For many problems, such as integer factorization, we do not have yet, and probably will never have, good algorithms which can perform the job in a reasonable time. This is the basis for today's cryptography.

One of the simplest algorithms is to find the largest number in a list that has no particular order. We will use this as a starting point to illustrate the above concepts. It is a simple algorithm because its best implementation is basically what one would do anyway: go through each element in turn, if the element we currently examine is larger than the largest we found so far, we update our record, if not, we carry on. Here's a Julia implementation:

function myfindmax(lst)
    i=1;
    imax=length(lst);
    lmax=lst[1]
    while i<imax
    i+=1;
        if lst[i]>lmax
            lmax=lst[i]
        end
    end
    lmax
end

Example application:

julia> myfindmax(rand(10))
0.985052934036265

Although in principle it is better to test your algorithm on cases which you know and that you can check yourself by hand:

julia> known=rand(Int8, 5)
5-element Array{Int8,1}:
 -13
 -94
  22
  90
 -86
 
julia> myfindmax(known)
90
 
julia> myfindmax(abs.(known))
94

It is also good to test on particular as well as extreme cases, e.g., when the max element is first or last. For instance, if we would have specified the increment i+=1 after the test, the algorithm would fail if the max would find itself as the last element of the array (which would be the case if the list would be ordered):

julia> myfindmax([i for i=1:10])
10

An important aspect of algorithms is their efficiency, how fast and good they are in performing the job. To benchmark, one can use @time or, if wanting to store the result for later processing, @elapsed (!!):

julia> mytime=zeros(100);
julia> for i=1:100
           lst = rand(10^6)
           mytime[i] = @elapsed myfindmax(lst);
       end
 
julia> comptime=zeros(100);
julia> for i=1:100
           lst = rand(10^6)
           comptime[i] = @elapsed maximum(lst);
       end
 
julia> plot([mytime comptime], label=["us" "julia"])

As an exception to the general rule, we find here that our own implementation is actually faster than the one provided by the system:

julia> mean(mytime)
0.0016102164000000002
 
julia> mean(comptime)
0.0027113203900000003

To do even more advanced benchmarking, one can be

julia> using BenchmarkTools

julia> @benchmark myfindmax(rand(10^6))
BenchmarkTools.Trial: 
  memory estimate:  7.63 MiB
  allocs estimate:  2
  --------------
  minimum time:     2.015 ms (0.00% GC)
  median time:      2.061 ms (0.00% GC)
  mean time:        2.184 ms (4.24% GC)
  maximum time:     42.876 ms (88.56% GC)
  --------------
  samples:          2285
  evals/sample:     1
 
julia> @benchmark maximum(rand(10^6))
BenchmarkTools.Trial: 
  memory estimate:  7.63 MiB
  allocs estimate:  2
  --------------
  minimum time:     3.344 ms (0.00% GC)
  median time:      3.461 ms (0.00% GC)
  mean time:        3.570 ms (3.05% GC)
  maximum time:     44.792 ms (91.19% GC)
  --------------
  samples:          1399
  evals/sample:     1

The fact that our readily-made version is faster than the one provided by the system has been flagged by the Julia community and has been traced to time being lost when checking for NaN numbers.

But let's move on and see other types of algorithms and more sophisticated problems.

This

Historical algorithms:

Division algorithm/Euclidean algorithm

Sieve of Eratosthenes

Muḥammad ibn Mūsā al-Khwārizmī

Links

Table of Contents

Crash Course on Scientific Computing