
= Crash Course in Scientific Computing =

== XVIII. Fitting ==

Say we have $N$ data points $y_1$, ..., $y_N$ obtained at the values $x_1$, ..., $x_N$, which we expect to be accounted for by a "model" $f$, i.e., a function $f(x,\vec\beta)$ of $\nu$ parameters $\beta_1$, ..., $\beta_\nu$ kept in a vector $\vec\beta$. The residuals measure the mismatch between the model and each data point:

$$r_i=f(x_i,\vec\beta)-y_i\,.$$

We want to minimize the sum of the squared residuals:

$$S=\sum_{i=1}^N r_i^2\,.$$
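In code, this is straightforward. Here is a minimal sketch, assuming data vectors <code>x</code> and <code>y</code> are at hand and picking a hypothetical model (an exponential, just for illustration):

<syntaxhighlight lang="julia">
f(x, β) = β[1] * exp(β[2] * x)       # a hypothetical model f(x, β)
r(β, x, y) = f.(x, Ref(β)) .- y      # residuals r_i = f(x_i, β) - y_i
S(β, x, y) = sum(abs2, r(β, x, y))   # sum of squared residuals
</syntaxhighlight>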

We minimize $S$ by varying the $\beta_j$, and the minimum is reached when all the partial derivatives $\partial S/\partial\beta_j$ for $1\le j\le\nu$ vanish simultaneously:

$$\frac{\partial S}{\partial\beta_j}=2\sum_{i=1}^N(f(x_i,\vec\beta)-y_i)\frac{\partial f(x_i,\vec\beta)}{\partial\beta_j}=0\,,\quad\text{for $j=1,\dots,\nu$}\,.\tag{1}$$

This is a complicated problem in general, which typically requires an iterative, trial-and-error numerical solution.

There is a case where Eq. (1) simplifies considerably, namely, when the partial derivatives $\partial f/\partial\beta_j$ do not depend on the parameters themselves, so that the system of equations is linear in $\vec\beta$. This occurs for so-called ''linear least squares'', where the parameters are merely linear coefficients of functions (which can, of course, be nonlinear). That is to say, our test function reads:

$$f(x,\vec\beta)=\sum_{k=1}^\nu\beta_k\phi_k(x)\,,$$

in which case:

$$\frac{\partial f(x_i,\vec\beta)}{\partial\beta_j}=\phi_j(x_i)\,.$$

We substitute this back into each partial derivative of Eq. (1):

$$\frac{\partial S}{\partial\beta_j}=2\sum_{i=1}^N\left(\left[\sum_{k=1}^\nu\beta_k\phi_k(x_i)\right]-y_i\right)\phi_j(x_i)=0$$

and now, it remains to rearrange this equality so that we can write it in matrix form. To do so, we have to remember these properties (and notations):

Matrix multiplication:

$$\sum_k A_{ik}B_{kj}=(AB)_{ij}$$

Transposition:

$$A^T_{ij}=A_{ji}$$

so that our equation above becomes:

$$\sum_{i=1}^N\sum_{k=1}^\nu\beta_k\phi_k(x_i)\phi_j(x_i)=\sum_{i=1}^N y_i\phi_j(x_i)\,.$$

We introduce the matrix $\Phi$ such that:

$$\Phi_{ij}=\phi_j(x_i)\tag{2}\,,$$

in which case, with some further grouping and reassembling:

$$\sum_{k=1}^\nu\sum_{i=1}^N\Phi^T_{ji}\Phi_{ik}\beta_k=\sum_{i=1}^N \Phi^T_{ji}y_i\,,$$

so that we can get rid of the index $i$:

$$\sum_{k=1}^\nu(\Phi^T\Phi)_{jk}\beta_k=(\Phi^T y)_j\,,$$

where $y=(y_1,\dots,y_N)^T$, and now getting rid of $k$:

$$[(\Phi^T\Phi)\beta]_j=(\Phi^T y)_j\,,$$

with $\vec\beta$ our vector of unknown parameters. As the equality holds for all $1\le j\le\nu$, it holds for the full vector, providing us with the final equation in its matrix form:

$$(\Phi^T\Phi)\vec\beta=\Phi^T y\,.$$

Assuming $(\Phi^T\Phi)$ is well conditioned (in particular, not singular), we get the formal solution by brute force inversion:

$$\vec\beta=(\Phi^T\Phi)^{-1}\Phi^T y\,.$$

The matrix $(\Phi^T\Phi)^{-1}\Phi^T$ could be problematic if it is large (many fitting functions and/or many data points) and is actually computed in this form, in which case one can instead see the expression as notation for the so-called "[https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_pseudoinverse pseudo-inverse]", which deals with such an object in an efficient and numerically stable way.
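In Julia, the whole procedure takes a couple of lines. Here is a sketch, assuming that vectors <code>x</code> and <code>y</code> hold the data and picking some hypothetical basis functions $\phi_k$; the backslash operator and <code>pinv</code> (from the standard LinearAlgebra library) take care of the pseudo-inverse for us:

<syntaxhighlight lang="julia">
using LinearAlgebra

ϕ = [t -> 1.0, t -> sin(t), t -> exp(-t)]   # some choice of basis functions ϕ_k
Φ = [ϕk(xi) for xi in x, ϕk in ϕ]           # design matrix Φ_ij = ϕ_j(x_i), cf. Eq. (2)

β = (Φ' * Φ) \ (Φ' * y)    # solve the normal equations (Φ'Φ)β = Φ'y directly
β = Φ \ y                  # mathematically equivalent (and more stable), via QR factorization
β = pinv(Φ) * y            # or through the Moore–Penrose pseudo-inverse
</syntaxhighlight>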

Let us now turn from the general theory to a particular example. A convenient basis of functions is that of the polynomials, that is:

$$f(x,\vec\beta)\equiv\sum_{k=1}^\nu\beta_k x^{k-1}$$

and our $\Phi$ matrix (which, for this basis, is a Vandermonde matrix; the product $\Phi^T\Phi$ is what goes by the name of a Gramian) reads:

$$\Phi=\pmatrix{ 1 & x_1 & x_1^2 & x_1^3 & \dots \\ 1 & x_2 & x_2^2 & x_2^3& \dots \\ \vdots & & & & \vdots \\ 1 & x_N & x_N^2 & x_N^3 & \dots }$$

Let us try with Julia:
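For instance, here is a sketch of a cubic fit to some made-up noisy data (both the data and the degree are, of course, placeholders):

<syntaxhighlight lang="julia">
x = range(0, 2, length = 50)
y = 1 .- 2 .* x .+ x.^3 .+ 0.1 .* randn(length(x))   # noisy samples of 1 - 2x + x³

ν = 4                                # number of parameters, i.e., a cubic polynomial
Φ = [xi^k for xi in x, k in 0:ν-1]   # the matrix above, with columns 1, x, x², x³
β = Φ \ y                            # least-squares coefficients β₁, ..., β_ν

fitted(t) = sum(β[k+1] * t^k for k in 0:ν-1)   # the fitted polynomial
</syntaxhighlight>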


A nonlinear fit, on the other hand, can be performed in Julia with the function curve_fit of the LsqFit package:

<syntaxhighlight lang="julia">
using LsqFit                              # provides curve_fit
m(t, p) = p[1] * exp.(p[2] * t)           # the model, with parameters p[1] and p[2]
p0 = [0.5, 0.5]                           # initial guess for the parameters
myfit = curve_fit(m, xdata, ydata, p0)    # xdata, ydata: the data to be fitted
</syntaxhighlight>

This will return a composite type, LsqFitResult, from which we can extract:

<pre>
myfit.dof: degrees of freedom
myfit.param: best fit parameters
myfit.resid: vector of residuals
myfit.jacobian: estimated Jacobian at solution
</pre>
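For instance (still with the hypothetical fit above), the best-fit parameters and the value of $S$ at the solution are recovered as:

<syntaxhighlight lang="julia">
best = myfit.param              # best-fit values of p[1] and p[2]
S = sum(abs2, myfit.resid)      # sum of squared residuals at the solution
println("Best fit: ", best, " with S = ", S, " and ", myfit.dof, " degrees of freedom")
</syntaxhighlight>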

Weierstrass Approximation Theorem