= Crash Course in Scientific Computing =

== XVIII. Fitting ==
Say we have $N$ data points $y_1$, ..., $y_N$ obtained for the values $x_1$, ..., $x_N$, which we expect to be accounted for by a "model" $f$, i.e., a function $f(x,\vec\beta)$ of $\nu$ parameters $\beta_1$, ..., $\beta_\nu$ kept in a vector $\vec\beta$. The residuals measure the departure of the model from the data:
$$r_i=f(x_i,\vec\beta)-y_i\,.$$
We want to minimize the sum of the squared residuals:
$$S=\sum_{i=1}^N r_i^2\,.$$
This we do by varying the $\beta$s until $S$ reaches its minimum, which is achieved when all the partial derivatives $\partial S/\partial\beta_j$ for $1\le j\le\nu$ vanish simultaneously:
$$\frac{\partial S}{\partial\beta_j}=2\sum_{i=1}^N(f(x_i,\vec\beta)-y_i)\frac{\partial f(x_i,\vec\beta)}{\partial\beta_j}=0\,,\quad\text{for $j=1,\dots,\nu$}\,.\tag{1}$$
This is, in general, a complicated problem that typically requires an iterative, trial-and-error approach.
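As an illustration of such an iterative approach, here is a minimal sketch in base Julia (the exponential model, data, step size and iteration count are all made up for this example) that minimizes $S$ by plain gradient descent, using the derivatives of Eq. (1):

<syntaxhighlight lang="julia">
# Made-up, noiseless data from the model f(x, β) = β₁ exp(β₂ x) with β = (2, 0.5)
xdata = [0.0, 0.5, 1.0, 1.5, 2.0]
ydata = 2 .* exp.(0.5 .* xdata)

f(x, β) = β[1] * exp(β[2] * x)
S(β) = sum((f.(xdata, Ref(β)) .- ydata) .^ 2)

# Gradient of S, i.e., the left-hand sides of Eq. (1) for j = 1, 2
function ∇S(β)
    r = f.(xdata, Ref(β)) .- ydata        # residuals
    e = exp.(β[2] .* xdata)
    return [2 * sum(r .* e), 2 * sum(r .* β[1] .* xdata .* e)]
end

# Naive fixed-step gradient descent from an initial guess
function descend(β, η, nsteps)
    for _ in 1:nsteps
        β = β - η * ∇S(β)
    end
    return β
end

β = descend([1.0, 1.0], 1e-3, 50_000)     # converges to ≈ (2, 0.5)
</syntaxhighlight>

Smarter schemes (Gauss–Newton, Levenberg–Marquardt) exploit the structure of Eq. (1) to converge much faster, which is what dedicated fitting packages do.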
There is a case where Eq. (1) simplifies considerably, namely, when the partial derivatives do not themselves depend on the parameters, which occurs for the so-called ''linear least squares'', where the parameters $\beta$ enter merely as linear coefficients of functions (which can themselves be, of course, nonlinear). That is to say, our test function reads:
$$f(x;\beta)=\sum_{k=1}^\nu\beta_k\phi_k(x)\,,$$
in which case:
$$\frac{\partial f(x_i;\beta)}{\partial\beta_j}=\phi_j(x_i)\,.$$
We put this back in each term of the differential in $\beta$:
$$\frac{\partial S}{\partial\beta_j}=2\sum_{i=1}^N\left(\left[\sum_{k=1}^\nu\beta_k\phi_k(x_i)\right]-y_i\right)\phi_j(x_i)=0$$
and now, it remains to rearrange this equality so that we can write it in matrix form. To do so, we have to remember these properties (and notations):
Matrix multiplication:
$$\sum_k A_{ik}B_{kj}=(AB)_{ij}$$
Transposition:
$$A^T_{ij}=A_{ji}$$
so that our equation above becomes:
$$\sum_{i=1}^N\sum_{k=1}^\nu\beta_k\phi_k(x_i)\phi_j(x_i)=\sum_{i=1}^N y_i\phi_j(x_i)\,,$$
We introduce the matrix $\Phi$ such that:
$$\Phi_{ij}=\phi_j(x_i)\tag{2}\,,$$
in which case, with some further grouping and reassembling:
$$\sum_{k=1}^\nu\sum_{i=1}^N\Phi^T_{ji}\Phi_{ik}\beta_k=\sum_{i=1}^N \Phi^T_{ji}y_i\,,$$
So that we can get rid of the $i$ indices:
$$\sum_{k=1}^\nu(\Phi^T\Phi)_{jk}\beta_k=(\Phi^T y)_j\,,$$
where $y=(y_1,\dots,y_N)^T$, and now getting rid of $k$:
$$[(\Phi^T\Phi)\beta]_j=(\Phi^T y)_j\,,$$
with $\beta$ our vector of unknown parameters. As the equality holds for all $1\le j\le\nu$, it holds for the full vector, providing us with the final equation in its matrix form:
$$(\Phi^T\Phi)\vec\beta=\Phi^T y\,.$$
Assuming $(\Phi^T\Phi)$ is well conditioned (in particular, not singular), we get the formal solution by brute force inversion:
$$\vec\beta=(\Phi^T\Phi)^{-1}\Phi^T y\,.$$
The $(\Phi^T\Phi)^{-1}\Phi^T$ factor could be problematic if it is large (many fitting functions and/or many data points) and is actually computed in this form, in which case one can see it instead as the notation for a so-called "[https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_pseudoinverse pseudo-inverse]", which deals with such an object in an efficient and stable way.
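As a sketch in base Julia (using the standard-library LinearAlgebra; the data and the basis $\phi_1(x)=1$, $\phi_2(x)=x$ are made up for illustration), both routes can be compared:

<syntaxhighlight lang="julia">
using LinearAlgebra

# Made-up data, close to y = 1 + 2x
xdata = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.1, 4.9, 7.0]

# Basis φ₁(x) = 1, φ₂(x) = x, so that Φᵢⱼ = φⱼ(xᵢ)
Φ = [ones(length(xdata)) xdata]

# Normal equations (ΦᵀΦ)β = Φᵀy, solved without forming an explicit inverse
β = (Φ' * Φ) \ (Φ' * y)

# Equivalent route via the Moore–Penrose pseudo-inverse
β_pinv = pinv(Φ) * y
</syntaxhighlight>

The backslash operator solves the linear system directly, while pinv relies on a singular-value decomposition, which remains stable even when $\Phi^T\Phi$ is poorly conditioned.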
Let us now turn from the general theory to a particular example. A convenient basis of functions is that of the polynomials, that is:
$$f(x;\beta)\equiv\sum_{k=0}^{\nu-1}\beta_k x^k$$
and our $\Phi$ matrix (which, in this polynomial case, is a Vandermonde matrix; the product $\Phi^T\Phi$ goes by the name of a Gramian) reads:
$$\Phi=\pmatrix{ 1 & x_1 & x_1^2 & x_1^3 & \dots \\ 1 & x_2 & x_2^2 & x_2^3& \dots \\ \vdots & & & & \vdots \\ 1 & x_N & x_N^2 & x_N^3 & \dots }$$
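As a sketch in base Julia (made-up data from a quadratic, i.e., $\nu=3$), this matrix can be assembled with a comprehension and the normal equations solved directly:

<syntaxhighlight lang="julia">
using LinearAlgebra

# Made-up data sampled exactly from y = 1 - x + 2x²
xdata = [0.0, 0.5, 1.0, 1.5, 2.0]
y = 1 .- xdata .+ 2 .* xdata .^ 2

# Vandermonde matrix: Φᵢⱼ = xᵢ^(j-1), columns 1, x, x²
Φ = [x^k for x in xdata, k in 0:2]

# Solve the normal equations (ΦᵀΦ)β = Φᵀy
β = (Φ' * Φ) \ (Φ' * y)      # recovers the coefficients (1, -1, 2)
</syntaxhighlight>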
Let us now try a nonlinear fit with Julia, using the LsqFit package:
<syntaxhighlight lang="julia">
using LsqFit

# model with parameters p[1] (amplitude) and p[2] (rate)
m(t, p) = p[1] * exp.(p[2] * t)
p0 = [0.5, 0.5]              # initial guess for the parameters
myfit = curve_fit(m, xdata, ydata, p0)
</syntaxhighlight>
This will return a composite type LsqFitResult, from which we can extract:
<pre>
myfit.dof: degrees of freedom
myfit.param: best fit parameters
myfit.resid: vector of residuals
myfit.jacobian: estimated Jacobian at solution
</pre>
Weierstrass Approximation Theorem