
Crash Course in Scientific Computing

XVIII. Fitting

So if we have $N$ data points $y_1$, ..., $y_N$, obtained for the values $x_1$, ..., $x_N$, that we expect to be accounted for by a "model" $f$, i.e., a function $f(x;\beta)$ of $\nu$ parameters $\beta_1$, ..., $\beta_\nu$, then the residuals measure how far the model departs from each data point:

$$r_i=f(x_i;\beta)-y_i\,.$$

We want to minimize the sum of the squared residuals:

$$S=\sum_{i=1}^N r_i^2\,.$$
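As a quick illustration of these definitions, here is a minimal Julia sketch with a made-up exponential model and made-up data points (both are assumptions for illustration, not taken from any particular experiment):

<syntaxhighlight lang="julia">
# Hypothetical model f(x; β) = β₁ exp(-β₂ x) with ν = 2 parameters.
f(x, β) = β[1] * exp(-β[2] * x)

# Made-up data points, for illustration only.
xdata = [0.0, 0.5, 1.0, 1.5, 2.0]
ydata = [1.02, 0.78, 0.59, 0.44, 0.35]

β = [1.0, 0.5]                                     # some trial parameters
r = [f(x, β) - y for (x, y) in zip(xdata, ydata)]  # residuals rᵢ
S = sum(r .^ 2)                                    # sum of squared residuals
</syntaxhighlight>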

We minimize $S$ by varying the $\beta$s; the minimum is reached when every partial derivative $\partial S/\partial\beta_j$, for $1\le j\le\nu$, vanishes independently:

$$\frac{\partial S}{\partial\beta_j}=2\sum_i(f(x_i;\beta)-y_i)\frac{\partial f(x_i;\beta)}{\partial\beta_j}=0\,,\quad\text{for $j=1,\dots,\nu$}\,.\tag{1}$$

This is a complicated problem in general, which typically requires iterative, trial-and-error numerical optimization.
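In Julia, such an iterative minimization is usually delegated to a package. A minimal sketch, assuming the LsqFit.jl package (an assumption; any nonlinear least-squares routine would do) and reusing the made-up model and data above:

<syntaxhighlight lang="julia">
using LsqFit   # assumes the LsqFit.jl package is installed

# The model takes the vector of x values and the parameter vector.
model(x, β) = β[1] .* exp.(-β[2] .* x)

xdata = [0.0, 0.5, 1.0, 1.5, 2.0]
ydata = [1.02, 0.78, 0.59, 0.44, 0.35]

p0  = [1.0, 1.0]                         # initial guess for (β₁, β₂)
fit = curve_fit(model, xdata, ydata, p0) # iterative refinement of p0
fit.param                                # parameters found after the iterations
</syntaxhighlight>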


There is a case where Eq. (1) simplifies considerably, namely, when the partial derivatives do not depend on the parameters themselves. This occurs for so-called linear least squares, where the parameters $\beta$ are merely linear coefficients of basis functions (which can be, of course, nonlinear in $x$). That is to say, our test function reads:

$$f(x;\beta)=\sum_{k=1}^\nu\beta_k\phi_k(x)\,,$$

in which case:

$$\frac{\partial f(x_i;\beta)}{\partial\beta_j}=\phi_j(x_i)\,.$$

We put this back into each term of Eq. (1), dropping the overall factor of 2:

$$\sum_{i=1}^N(\sum_{k=1}^\nu\beta_k\phi_k(x_i)-y_i)\phi_j(x_i)=0$$

and now, it remains to rearrange this equality so that we can write it in matrix form. To do so, we have to remember these properties (and notations):

Matrix multiplication:

$$\sum_k A_{ik}B_{kj}=(AB)_{ij}$$

Transposition:

$$A^T_{ij}=A_{ji}$$

so that our equation above becomes:

$$\sum_{i=1}^N\sum_{k=1}^\nu\beta_k\phi_k(x_i)\phi_j(x_i)=\sum_{i=1}^N y_i\phi_j(x_i)\,.$$

We introduce the matrix $\Phi$ such that:

$$\Phi_{ij}=\phi_j(x_i)\tag{2}\,,$$

in which case, with some further grouping and reassembling:

$$\sum_{k=1}^\nu\sum_{i=1}^N\Phi^T_{ji}\Phi_{ik}\beta_k=\sum_{i=1}^N \Phi^T_{ji}y_i\,,$$

so that we can get rid of the sums over $i$:

$$\sum_{k=1}^\nu(\Phi^T\Phi)_{jk}\beta_k=(\Phi^T y)_j\,,$$

where $y=(y_1,\dots,y_N)^T$, and now getting rid of $k$:

$$[(\Phi^T\Phi)\beta]_j=(\Phi^T y)_j\,,$$

with $\beta$ our vector of unknown parameters. As the equality holds for all $1\le j\le\nu$, it holds for the full vector, providing us with the final equation in its matrix form:

$$(\Phi^T\Phi)\beta=\Phi^T y\,.$$
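Before solving this system, note that building the matrix $\Phi$ of Eq. (2) in Julia is direct. A minimal sketch, with a hypothetical basis $\phi_1(x)=1$, $\phi_2(x)=\sin(x)$, $\phi_3(x)=\cos(x)$ and the made-up data above:

<syntaxhighlight lang="julia">
# Hypothetical basis: φ₁(x) = 1, φ₂(x) = sin(x), φ₃(x) = cos(x).
ϕ = [x -> 1.0, x -> sin(x), x -> cos(x)]

xdata = [0.0, 0.5, 1.0, 1.5, 2.0]

# Φᵢⱼ = ϕⱼ(xᵢ): one row per data point, one column per basis function.
Φ = [ϕ[j](xdata[i]) for i in eachindex(xdata), j in eachindex(ϕ)]
</syntaxhighlight>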

Assuming $(\Phi^T\Phi)$ is well conditioned (in particular, not singular), we get the formal solution by brute force inversion:

$$\beta=(\Phi^T\Phi)^{-1}\Phi^T y\,.$$

The matrix $(\Phi^T\Phi)^{-1}\Phi^T$ could be problematic if it is large (many fitting functions and/or many data points) and actually computed in this form; in that case, one should rather read the expression as a notation for the so-called "pseudo-inverse", which dedicated algorithms compute in an efficient and numerically stable way.
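Here is how the various forms could look in Julia (a sketch only; `\` on a rectangular matrix performs a QR-based least-squares solve and, like `pinv`, is generally preferable to forming and inverting $\Phi^T\Phi$ explicitly):

<syntaxhighlight lang="julia">
using LinearAlgebra   # for pinv

# Same made-up data and basis as in the previous snippet.
xdata = [0.0, 0.5, 1.0, 1.5, 2.0]
ydata = [1.02, 0.78, 0.59, 0.44, 0.35]
ϕ = [x -> 1.0, x -> sin(x), x -> cos(x)]
Φ = [ϕ[j](xdata[i]) for i in eachindex(xdata), j in eachindex(ϕ)]

β1 = inv(Φ' * Φ) * (Φ' * ydata)   # brute-force inversion of the normal equations
β2 = (Φ' * Φ) \ (Φ' * ydata)      # solve the normal equations without inverting
β3 = Φ \ ydata                    # QR-based least squares (recommended)
β4 = pinv(Φ) * ydata              # explicit pseudo-inverse (SVD-based)
</syntaxhighlight>

All four give, up to numerical accuracy, the same vector of fitted parameters $\beta$.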

Let us now turn from the general theory to a particular example. A convenient basis of functions is that of polynomials, here of degree $\nu-1$ so that we keep $\nu$ parameters, now labelled $\beta_0$, ..., $\beta_{\nu-1}$:

$$f(x;\beta)\equiv\sum_{k=0}^{\nu-1}\beta_k x^k$$

and our $\Phi$ matrix (which, for this polynomial basis, is known as a Vandermonde matrix; the product $\Phi^T\Phi$ entering the normal equations is a Gram matrix, or Gramian) reads:

$$\Phi=\pmatrix{ 1 & x_1 & x_1^2 & x_1^3 & \dots \\ 1 & x_2 & x_2^2 & x_2^3& \dots \\ \vdots & & & & \vdots \\ 1 & x_N & x_N^2 & x_N^3 & \dots }$$

Let us try with Julia:
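Below is a minimal sketch of how such a polynomial fit could be set up, with synthetic data (a sine sampled with noise, made up for illustration) and an arbitrary polynomial degree:

<syntaxhighlight lang="julia">
# Synthetic data, made up for illustration: a sine sampled with some noise.
N     = 50
xdata = collect(range(0, stop=2π, length=N))
ydata = sin.(xdata) .+ 0.1 .* randn(N)

# Degree of the fitting polynomial, i.e., n + 1 parameters β₀, …, βₙ.
n = 5

# Vandermonde-type matrix: Φᵢⱼ = xᵢ^(j-1).
Φ = [xdata[i]^(j - 1) for i in 1:N, j in 1:n+1]

# Least-squares solution of Φβ ≈ y (QR-based, equivalent to the normal equations).
β = Φ \ ydata

# The fitted polynomial, to be compared with the data.
fitted(x) = sum(β[k+1] * x^k for k in 0:n)
</syntaxhighlight>

One can then compare `fitted.(xdata)` with `ydata`, or plot both (e.g., with Plots.jl) to judge the quality of the fit.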


Weierstrass Approximation Theorem