= Crash Course in Scientific Computing =

== XVIII. Fitting ==
Say we have $N$ data points $y_1$, ..., $y_N$ obtained for the values $x_1$, ..., $x_N$, which we expect to be accounted for by a "model" $f$, i.e., a function $f(x,\vec\beta)$ of $\nu$ parameters $\beta_1$, ..., $\beta_\nu$ kept in a vector $\vec\beta$. The residuals measure the departure of the model from the data:
$$r_i=f(x_i,\vec\beta)-y_i\,.$$
We want to minimize the sum of the squared residuals:
$$S=\sum_{i=1}^N r_i^2\,.$$
This we do by varying the $\beta$s until $S$ reaches its minimum, which is achieved when all the partial derivatives $\partial S/\partial\beta_j$ for $1\le j\le\nu$ vanish simultaneously:
$$\frac{\partial S}{\partial\beta_j}=2\sum_{i=1}^N(f(x_i,\vec\beta)-y_i)\frac{\partial f(x_i,\vec\beta)}{\partial\beta_j}=0\,,\quad\text{for $j=1,\dots,\nu$}\,.\tag{1}$$
This is, in general, a complicated problem that typically requires an iterative, trial-and-error approach.
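As an illustration of such an iterative approach, here is a minimal sketch in base Julia (the exponential model, data, step size and iteration count are all made up for this example) that minimizes $S$ by plain gradient descent, using the derivatives of Eq. (1):

<syntaxhighlight lang="julia">
# Made-up, noiseless data from the model f(x, β) = β₁ exp(β₂ x) with β = (2, 0.5)
xdata = [0.0, 0.5, 1.0, 1.5, 2.0]
ydata = 2 .* exp.(0.5 .* xdata)

f(x, β) = β[1] * exp(β[2] * x)
S(β) = sum((f.(xdata, Ref(β)) .- ydata) .^ 2)

# Gradient of S, i.e., the left-hand sides of Eq. (1) for j = 1, 2
function ∇S(β)
    r = f.(xdata, Ref(β)) .- ydata        # residuals
    e = exp.(β[2] .* xdata)
    return [2 * sum(r .* e), 2 * sum(r .* β[1] .* xdata .* e)]
end

# Naive fixed-step gradient descent from an initial guess
function descend(β, η, nsteps)
    for _ in 1:nsteps
        β = β - η * ∇S(β)
    end
    return β
end

β = descend([1.0, 1.0], 1e-3, 50_000)     # converges to ≈ (2, 0.5)
</syntaxhighlight>

Smarter schemes (Gauss–Newton, Levenberg–Marquardt) exploit the structure of Eq. (1) to converge much faster, which is what dedicated fitting packages do.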
There is a case where Eq. (1) simplifies considerably, namely, when the partial derivatives do not themselves depend on the parameters, which occurs for the so-called ''linear least squares'', where the parameters $\beta$ enter merely as linear coefficients of functions (which can themselves be, of course, nonlinear). That is to say, our test function reads:
$$f(x;\beta)=\sum_{k=1}^\nu\beta_k\phi_k(x)\,,$$
in which case:
$$\frac{\partial f(x_i;\beta)}{\partial\beta_j}=\phi_j(x_i)\,.$$
We put this back in each term of the differential in $\beta$:
$$\frac{\partial S}{\partial\beta_j}=2\sum_{i=1}^N\left(\left[\sum_{k=1}^\nu\beta_k\phi_k(x_i)\right]-y_i\right)\phi_j(x_i)=0$$
and now, it remains to rearrange this equality so that we can write it in matrix form. To do so, we have to remember these properties (and notations):
Matrix multiplication:
$$\sum_k A_{ik}B_{kj}=(AB)_{ij}$$
Transposition:
$$A^T_{ij}=A_{ji}$$
so that our equation above becomes:
$$\sum_{i=1}^N\sum_{k=1}^\nu\beta_k\phi_k(x_i)\phi_j(x_i)=\sum_{i=1}^N y_i\phi_j(x_i)\,,$$
We introduce the matrix $\Phi$ such that:
$$\Phi_{ij}=\phi_j(x_i)\tag{2}\,,$$
in which case, with some further grouping and reassembling:
$$\sum_{k=1}^\nu\sum_{i=1}^N\Phi^T_{ji}\Phi_{ik}\beta_k=\sum_{i=1}^N \Phi^T_{ji}y_i\,,$$
So that we can get rid of the $i$ indices:
$$\sum_{k=1}^\nu(\Phi^T\Phi)_{jk}\beta_k=(\Phi^T y)_j\,,$$
where $y=(y_1,\dots,y_N)^T$, and now getting rid of $k$:
$$[(\Phi^T\Phi)\beta]_j=(\Phi^T y)_j\,,$$
with $\beta$ our vector of unknown parameters. As the equality holds for all $1\le j\le\nu$, it holds for the full vector, providing us with the final equation in its matrix form:
$$(\Phi^T\Phi)\vec\beta=\Phi^T y\,.$$
Assuming $(\Phi^T\Phi)$ is well conditioned (in particular, not singular), we get the formal solution by brute force inversion:
$$\vec\beta=(\Phi^T\Phi)^{-1}\Phi^T y\,.$$
The $(\Phi^T\Phi)^{-1}\Phi^T$ factor could be problematic if it is large (many fitting functions and/or many data points) and is actually computed in this form, in which case one can see it instead as the notation for a so-called "[https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_pseudoinverse pseudo-inverse]", which deals with such an object in an efficient and stable way.
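As a sketch in base Julia (using the standard-library LinearAlgebra; the data and the basis $\phi_1(x)=1$, $\phi_2(x)=x$ are made up for illustration), both routes can be compared:

<syntaxhighlight lang="julia">
using LinearAlgebra

# Made-up data, close to y = 1 + 2x
xdata = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.1, 4.9, 7.0]

# Basis φ₁(x) = 1, φ₂(x) = x, so that Φᵢⱼ = φⱼ(xᵢ)
Φ = [ones(length(xdata)) xdata]

# Normal equations (ΦᵀΦ)β = Φᵀy, solved without forming an explicit inverse
β = (Φ' * Φ) \ (Φ' * y)

# Equivalent route via the Moore–Penrose pseudo-inverse
β_pinv = pinv(Φ) * y
</syntaxhighlight>

The backslash operator solves the linear system directly, while pinv relies on a singular-value decomposition, which remains stable even when $\Phi^T\Phi$ is poorly conditioned.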
Let us now turn from the general theory to a particular example. A convenient basis of functions is that of the polynomials, that is:
$$f(x;\beta)\equiv\sum_{k=0}^{\nu-1}\beta_k x^k$$
and our $\Phi$ matrix (which, in this polynomial case, is a Vandermonde matrix; the product $\Phi^T\Phi$ goes by the name of a Gramian) reads:
$$\Phi=\pmatrix{ 1 & x_1 & x_1^2 & x_1^3 & \dots \\ 1 & x_2 & x_2^2 & x_2^3& \dots \\ \vdots & & & & \vdots \\ 1 & x_N & x_N^2 & x_N^3 & \dots }$$
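As a sketch in base Julia (made-up data from a quadratic, i.e., $\nu=3$), this matrix can be assembled with a comprehension and the normal equations solved directly:

<syntaxhighlight lang="julia">
using LinearAlgebra

# Made-up data sampled exactly from y = 1 - x + 2x²
xdata = [0.0, 0.5, 1.0, 1.5, 2.0]
y = 1 .- xdata .+ 2 .* xdata .^ 2

# Vandermonde matrix: Φᵢⱼ = xᵢ^(j-1), columns 1, x, x²
Φ = [x^k for x in xdata, k in 0:2]

# Solve the normal equations (ΦᵀΦ)β = Φᵀy
β = (Φ' * Φ) \ (Φ' * y)      # recovers the coefficients (1, -1, 2)
</syntaxhighlight>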
Let us now try a nonlinear fit with Julia, using the LsqFit package:
<syntaxhighlight lang="julia">
using LsqFit

# model with parameters p[1] (amplitude) and p[2] (rate)
m(t, p) = p[1] * exp.(p[2] * t)
p0 = [0.5, 0.5]              # initial guess for the parameters
myfit = curve_fit(m, xdata, ydata, p0)
</syntaxhighlight>
This will return a composite type LsqFitResult, from which we can extract:
<pre>
myfit.dof: degrees of freedom
myfit.param: best fit parameters
myfit.resid: vector of residuals
myfit.jacobian: estimated Jacobian at solution
</pre>
Weierstrass Approximation Theorem