= Crash Course in Scientific Computing =

== XVIII. Fitting ==
Suppose we have $N$ data points $y_1$, ..., $y_N$ obtained for the values $x_1$, ..., $x_N$, which we expect to be accounted for by a "model" $f$, i.e., a function $f(x;\beta)$ of $\nu$ parameters $\beta_1$, ..., $\beta_\nu$. The residuals measure the departure of the model from the data:
$$r_i=f(x_i;\beta)-y_i\,.$$
We want to minimize the sum of the squared residuals:
$$S=\sum_{i=1}^N r_i^2\,.$$
We do this by varying the $\beta$s until $S$ reaches its minimum, which is achieved when each partial derivative $\partial S/\partial\beta_j$, for $1\le j\le\nu$, vanishes independently:
$$\frac{\partial S}{\partial\beta_j}=2\sum_i(f(x_i;\beta)-y_i)\frac{\partial f(x_i;\beta)}{\partial\beta_j}=0\,,\quad\text{for $j=1,\dots,\nu$}\,.\tag{1}$$
This is a complicated problem in general, which typically requires an iterative, trial-and-error numerical optimization.
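For instance, here is a minimal sketch of such an iterative fit in Julia, assuming the LsqFit.jl package and a made-up two-parameter exponential model (the model, data and starting guess are purely illustrative):

<syntaxhighlight lang="julia">
using LsqFit

# an arbitrary nonlinear model: f(x;β) = β₁ exp(-β₂ x)
model(x, β) = β[1] .* exp.(-β[2] .* x)

# synthetic data (illustrative only)
xdata = range(0, 10, length=50)
ydata = model(xdata, [2.0, 0.5]) .+ 0.1 .* randn(length(xdata))

# iterative least-squares fit starting from the guess β = [1.0, 1.0]
fit = curve_fit(model, xdata, ydata, [1.0, 1.0])
fit.param          # fitted values of β₁, β₂
</syntaxhighlight>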
There is a case where Eq. (1) simplifies considerably, namely, when the partial derivatives no longer involve the parameters themselves. This occurs for so-called linear least squares, where the parameters $\beta$ enter merely as linear coefficients of basis functions (which can, of course, be nonlinear in $x$). That is to say, our test function reads:
$$f(x;\beta)=\sum_{k=1}^\nu\beta_k\phi_k(x)\,,$$
in which case:
$$\frac{\partial f(x_i;\beta)}{\partial\beta_j}=\phi_j(x_i)\,.$$
We substitute this back into each term of Eq. (1) (the overall factor of 2 can be dropped):
$$\sum_{i=1}^N(\sum_{k=1}^\nu\beta_k\phi_k(x_i)-y_i)\phi_j(x_i)=0$$
and it now remains to rearrange this equality so that we can write it in matrix form. To do so, we recall the following properties (and notations):
Matrix multiplication:
$$\sum_k A_{ik}B_{kj}=(AB)_{ij}$$
Transposition:
$$A^T_{ij}=A_{ji}$$
so that our equation above becomes:
$$\sum_{i=1}^N\sum_{k=1}^\nu\beta_k\phi_k(x_i)\phi_j(x_i)=\sum_{i=1}^N y_i\phi_j(x_i)\,,$$
We introduce the matrix $\Phi$ such that:
$$\Phi_{ij}=\phi_j(x_i)\tag{2}\,,$$
in which case, with some further grouping and reassembling:
$$\sum_{k=1}^\nu\sum_{i=1}^N\Phi^T_{ji}\Phi_{ik}\beta_k=\sum_{i=1}^N \Phi^T_{ji}y_i\,,$$
so that we can get rid of the index $i$:
$$\sum_{k=1}^\nu(\Phi^T\Phi)_{jk}\beta_k=(\Phi^T y)_j\,,$$
where $y=(y_1,\dots,y_N)^T$, and now getting rid of $k$:
$$[(\Phi^T\Phi)\beta]_j=(\Phi^T y)_j\,,$$
with $\beta$ our vector of unknown parameters. As the equality holds for all $1\le j\le\nu$, it holds for the full vector, providing us with the final equation in its matrix form:
$$(\Phi^T\Phi)\beta=\Phi^T y\,.$$
Assuming $(\Phi^T\Phi)$ is well conditioned (in particular, not singular), we get the formal solution by brute force inversion:
$$\beta=(\Phi^T\Phi)^{-1}\Phi^T y\,.$$
The quantity $(\Phi^T\Phi)^{-1}\Phi^T$ could be problematic if it is large (many fitting functions and/or many data points) and is actually computed in this form; in that case, one should rather regard it as notation for the so-called "pseudo-inverse", which is evaluated by more efficient and numerically stable means (e.g., through a QR or singular-value decomposition).
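In Julia, for instance, one would not form the inverse explicitly; here is a sketch, with made-up data fitting a straight line $\beta_1+\beta_2 x$ through three points, of the equivalent and more stable alternatives:

<syntaxhighlight lang="julia">
using LinearAlgebra

# made-up example: fit a line β₁ + β₂x through three points
x = [0.0, 1.0, 2.0]
y = [1.0, 2.1, 2.9]
Φ = [ones(3) x]                  # N×ν matrix of Eq. (2), here with ϕ₁(x)=1, ϕ₂(x)=x

β_normal = (Φ' * Φ) \ (Φ' * y)   # solve the normal equations directly
β_pinv   = pinv(Φ) * y           # through the Moore–Penrose pseudo-inverse
β_qr     = Φ \ y                 # least squares via QR factorization (usually preferred)
</syntaxhighlight>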
Let us now turn from the general theory to a particular example. A convenient basis of functions is that of polynomials, that is:
$$f(x;\beta)\equiv\sum_{k=0}^{\nu-1}\beta_k x^k$$
and our $\Phi$ matrix (which, for this particular basis, goes by the name of a Vandermonde matrix; the product $\Phi^T\Phi$ is the Gramian of the basis) reads:
$$\Phi=\pmatrix{ 1 & x_1 & x_1^2 & x_1^3 & \dots \\ 1 & x_2 & x_2^2 & x_2^3& \dots \\ \vdots & & & & \vdots \\ 1 & x_N & x_N^2 & x_N^3 & \dots }$$
Let us try with Julia:
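Here is a minimal sketch of such a fit (the data are synthetic, a noisy sine, and the number of parameters $\nu$ is chosen arbitrarily for the illustration):

<syntaxhighlight lang="julia">
using LinearAlgebra

# synthetic data: a noisy sine sampled on [0, 2π] (illustrative only)
N = 50
x = range(0, 2π, length=N)
y = sin.(x) .+ 0.1 .* randn(N)

# the N×ν matrix Φ of Eq. (2) for the monomial basis ϕ_k(x) = x^k
ν = 6
Φ = [xi^k for xi in x, k in 0:ν-1]

# solve the normal equations (Φ'Φ)β = Φ'y
β = (Φ' * Φ) \ (Φ' * y)

# the fitted polynomial and its sum of squared residuals
f(x) = sum(β[k+1] * x^k for k in 0:ν-1)
S = sum((f.(x) .- y) .^ 2)
</syntaxhighlight>

One can then plot $f$ on top of the data (e.g., with Plots.jl) to judge the quality of the fit.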
Weierstrass Approximation Theorem