Lecture 4 Least Square Fit


Given numeric data points, find an equation that approximates the data
with a least square fit. This is one of many techniques for getting
an analytical approximation to numeric data.


 The problem is stated as follows:
   Given measured data for values of Y based on values of X1, X2 and X3, e.g.

    Y_actual         X1      X2     X3    observation, i
    --------       -----   -----  -----
     32.5           1.0     2.5    3.7                 1
      7.2           2.0     2.5    3.6                 2
      6.9           3.0     2.7    3.5                 3
     22.4           2.2     2.1    3.1                 4
     10.4           1.5     2.0    2.6                 5
     11.3           1.6     2.0    3.1                 6

  Find a, b and c such that   Y_approximate =  a * X1 + b * X2 + c * X3
  and such that the sum of (Y_actual - Y_approximate) squared is minimized.
  (We are minimizing RMS error.)

 The method for determining the coefficients a, b and c follows directly
 from the problem definition and the mathematical analysis given below.

 Set up and solve the system of linear equations:
 (Each SUM is for i=1 thru 6 per table above, note symmetry)

 | SUM(X1*X1)  SUM(X1*X2)  SUM(X1*X3) |   | a |   | SUM(X1*Y) |
 | SUM(X2*X1)  SUM(X2*X2)  SUM(X2*X3) | x | b | = | SUM(X2*Y) |
 | SUM(X3*X1)  SUM(X3*X2)  SUM(X3*X3) |   | c |   | SUM(X3*Y) |
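
As a concrete illustration, here is a minimal sketch in C (not the course's
lsfit_lect.c) that builds the 3x3 sums from the table above and solves the
system by Gaussian elimination:

    /* minimal sketch: least square fit of Y = a*X1 + b*X2 + c*X3
       using the 6 observations in the table above */
    #include <stdio.h>

    int main(void)
    {
      double Yd[6]   = {32.5, 7.2, 6.9, 22.4, 10.4, 11.3};
      double X[6][3] = {{1.0,2.5,3.7},{2.0,2.5,3.6},{3.0,2.7,3.5},
                        {2.2,2.1,3.1},{1.5,2.0,2.6},{1.6,2.0,3.1}};
      double A[3][4] = {{0.0}};  /* augmented matrix [ SUM(Xi*Xj) | SUM(Xi*Y) ] */
      int i, j, k;

      for(k=0; k<6; k++)         /* accumulate the sums over the observations */
        for(i=0; i<3; i++)
        {
          for(j=0; j<3; j++) A[i][j] += X[k][i]*X[k][j];
          A[i][3] += X[k][i]*Yd[k];
        }

      for(i=0; i<3; i++)         /* forward elimination with partial pivoting */
      {
        int p = i;
        for(k=i+1; k<3; k++) if(A[k][i]*A[k][i] > A[p][i]*A[p][i]) p = k;
        for(j=0; j<4; j++) { double t = A[i][j]; A[i][j] = A[p][j]; A[p][j] = t; }
        for(k=i+1; k<3; k++)
        {
          double f = A[k][i]/A[i][i];
          for(j=i; j<4; j++) A[k][j] -= f*A[i][j];
        }
      }
      for(i=2; i>=0; i--)        /* back substitution, solution left in column 3 */
      {
        for(j=i+1; j<3; j++) A[i][3] -= A[i][j]*A[j][3];
        A[i][3] /= A[i][i];
      }
      printf("a=%g  b=%g  c=%g\n", A[0][3], A[1][3], A[2][3]);
      return 0;
    }

Any of the simultaneous equation solvers from Lecture 3 could be used in
place of the elimination code shown here.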

Easy to program, though the sample data above does not give a good fit:
lsfit_lect.c
lsfit_lect_c.out
lsfit_lect.java
lsfit_lect_java.out
lsfit_lect.py
lsfit_lect_py.out
lsfit_lect.f90
lsfit_lect_f90.out
lsfit_lect.m similar to c
lsfit_lect_m.out
test_lsfit.py3 Python3 two methods
test_lsfit_py3.out output
test_lsfit2.py3 Python3 two methods
test_lsfit2_py3.out output better
 

Now, suppose you wanted a constant term to make the fit:
  Y_approximate =  Y0 + a * X1 + b * X2 + c * X3
Then the linear equations would be:

 | SUM( 1* 1)  SUM( 1*X1)  SUM( 1*X2)  SUM( 1*X3) |   | Y0 |   | SUM( 1*Y) |
 | SUM(X1* 1)  SUM(X1*X1)  SUM(X1*X2)  SUM(X1*X3) |   | a  |   | SUM(X1*Y) |
 | SUM(X2* 1)  SUM(X2*X1)  SUM(X2*X2)  SUM(X2*X3) | x | b  | = | SUM(X2*Y) |
 | SUM(X3* 1)  SUM(X3*X1)  SUM(X3*X2)  SUM(X3*X3) |   | c  |   | SUM(X3*Y) |

Note the symmetry! Easy to program.

Note the simultaneous equations from Lecture 3:  |A| x |X| = |Y|
|A| and |Y| are easily computed; solve for |X| to get Y0, a, b and c.

We now have a simple equation to compute Y approximately from a reasonable
range of X1, X2, and X3.
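
A minimal sketch of the accumulation step for this four term model (the
array names X1, X2, X3, Y, the observation count m, and the caller-zeroed
A and B are assumptions for illustration, not the course's code):

    /* sketch: treat the constant term as one more "variable" equal to 1.0;
       A[4][4] and B[4] are assumed to be zeroed by the caller */
    void build_normal_equations(int m, const double X1[], const double X2[],
                                const double X3[], const double Y[],
                                double A[4][4], double B[4])
    {
      int i, j, k;
      for(k=0; k<m; k++)
      {
        double t[4] = {1.0, X1[k], X2[k], X3[k]};    /* terms for observation k */
        for(i=0; i<4; i++)
        {
          for(j=0; j<4; j++) A[i][j] += t[i]*t[j];   /* SUM(term_i * term_j) */
          B[i] += t[i]*Y[k];                         /* SUM(term_i * Y)      */
        }
      }
      /* then solve A*X = B for X = (Y0, a, b, c) as in Lecture 3 */
    }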

 Y is called the dependent variable and X1 .. Xn the independent variables.
 The procedures below implement a few special cases and the general case.
    The number of independent variables can vary, e.g. 2D, 3D, etc.
    The approximation equation may use powers of the independent variables.
    The user may create additional independent variables, e.g. X2 = SIN(X1),
    with the restriction that the independent variables are linearly
    independent, i.e. Xi not equal to p*Xj + q for any i, j and constants p, q.


Mathematical derivation

The mathematical derivation of the least square fit is as follows:

Given data for the dependent variable Y in terms of the independent
variables S, T, U and V, consider that there exists a function F
such that  Y = F(S,T,U,V).

The problem is to find coefficients a, b, c and d such that

    Y_approximate = a * S + b * T + c * U + d * V

and such that the sum of (Y - Y_approximate) squared is minimized.
Note: a, b, c, d are scalars.  S, T, U, V, Y, Y_approximate are vectors.

To find the minimum of SUM((Y - Y_approximate)^2) the derivatives with
respect to a, b, c and d must all equal zero simultaneously.
The steps follow:

    SUM((Y - Y_approximate)^2) = SUM((Y - a*S - b*T - c*U - d*V)^2)

    d/da = -2 * SUM( S * (Y - a*S - b*T - c*U - d*V) )
    d/db = -2 * SUM( T * (Y - a*S - b*T - c*U - d*V) )
    d/dc = -2 * SUM( U * (Y - a*S - b*T - c*U - d*V) )
    d/dd = -2 * SUM( V * (Y - a*S - b*T - c*U - d*V) )

Setting each of the above equal to zero (the derivative is zero at the
minimum), dropping the factor -2, multiplying through, and putting the
known term on the right gives:

    SUM( a*S*S + b*S*T + c*S*U + d*S*V ) = SUM( S*Y )
    SUM( a*T*S + b*T*T + c*T*U + d*T*V ) = SUM( T*Y )
    SUM( a*U*S + b*U*T + c*U*U + d*U*V ) = SUM( U*Y )
    SUM( a*V*S + b*V*T + c*V*U + d*V*V ) = SUM( V*Y )

Distributing the SUM inside yields

    a*SUM(S*S) + b*SUM(S*T) + c*SUM(S*U) + d*SUM(S*V) = SUM(S*Y)
    a*SUM(T*S) + b*SUM(T*T) + c*SUM(T*U) + d*SUM(T*V) = SUM(T*Y)
    a*SUM(U*S) + b*SUM(U*T) + c*SUM(U*U) + d*SUM(U*V) = SUM(U*Y)
    a*SUM(V*S) + b*SUM(V*T) + c*SUM(V*U) + d*SUM(V*V) = SUM(V*Y)

To find the coefficients a, b, c and d, solve the linear system of equations

 | SUM(S*S)  SUM(S*T)  SUM(S*U)  SUM(S*V) |   | a |   | SUM(S*Y) |
 | SUM(T*S)  SUM(T*T)  SUM(T*U)  SUM(T*V) | x | b | = | SUM(T*Y) |
 | SUM(U*S)  SUM(U*T)  SUM(U*U)  SUM(U*V) |   | c |   | SUM(U*Y) |
 | SUM(V*S)  SUM(V*T)  SUM(V*U)  SUM(V*V) |   | d |   | SUM(V*Y) |

Some observations:
  S, T, U and V must be linearly independent.
  There must be more data sets (Y, S, T, U, V) than variables.
  The analysis did not depend on the number of independent variables.
  A polynomial fit results from the substitutions S=1, T=x, U=x^2, V=x^3.
  The general case for any order polynomial of any number of variables
    may be used with a substitution, for example S=1, T=x, U=y, V=x^2,
    W=x*y, X=y^2, etc., and may include terms such as exp(x), log(x),
    sin(x), cos(x).  Any number of terms may be used.
    The "1" is for the constant term.

Using the S,T,U,V notation above, fitting Y_approx to find a, b, c, d:

    Y_approx = a*S + b*T + c*U + d*V + ...
    Choose S = 1.0       thus a is the constant term
    Choose T = log(x)    thus b is the coefficient of log(x)
    Choose U = log(x*x)  thus c is the coefficient of log(x*x)
    Choose V = sin(x)    thus d is the coefficient of sin(x)
    see 4 additional terms in lsfit_log.c

Thus our x data, n = 21 samples in the code, is fit to

    Y_approx = a + b*log(x) + c*log(x*x) + d*sin(x) + ...

By putting the terms in a vector, simple indexing builds the matrix:

    A[i][j] = A[i][j] + term_i * term_j    summing over the n samples using k

lsfit_log.c      fitting log and other terms
lsfit_log_c.out
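
The same term-vector idea can be sketched in C (illustrative only, not the
course's lsfit_log.c; the example terms 1, x, log(x) and sin(x) below are
assumptions chosen for the sketch, not the exact terms used in that file):

    #include <math.h>

    #define NTERMS 4       /* number of terms in the model for this sketch */

    /* evaluate the chosen terms at x; here 1, x, log(x), sin(x) with x > 0 */
    static void eval_terms(double x, double t[NTERMS])
    {
      t[0] = 1.0;          /* constant term         */
      t[1] = x;            /* coefficient of x      */
      t[2] = log(x);       /* coefficient of log(x) */
      t[3] = sin(x);       /* coefficient of sin(x) */
    }

    /* build the normal equations A*C = B from n samples (xd[k], yd[k]) */
    void build_system(int n, const double xd[], const double yd[],
                      double A[NTERMS][NTERMS], double B[NTERMS])
    {
      int i, j, k;
      double t[NTERMS];
      for(i=0; i<NTERMS; i++)
      {
        for(j=0; j<NTERMS; j++) A[i][j] = 0.0;
        B[i] = 0.0;
      }
      for(k=0; k<n; k++)
      {
        eval_terms(xd[k], t);
        for(i=0; i<NTERMS; i++)
        {
          for(j=0; j<NTERMS; j++) A[i][j] += t[i]*t[j]; /* term_i * term_j */
          B[i] += t[i]*yd[k];
        }
      }
      /* then solve A*C = B; C[i] is the coefficient of term i */
    }

Adding or changing terms only requires editing eval_terms and NTERMS; the
accumulation and solve steps do not change.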

fitting a simple polynomial, 1D

Now, suppose you wanted to fit a simple polynomial:
Given the value of Y for at least four values of X,

    Y_approximate = C0 + C1 * X + C2 * X^2 + C3 * X^3

Then the linear equations would be A*X=Y:

 | SUM(1  *1)  SUM(1  *X)  SUM(1  *X^2)  SUM(1  *X^3) |   | C0 |   | SUM(1  *Y) |
 | SUM(X  *1)  SUM(X  *X)  SUM(X  *X^2)  SUM(X  *X^3) |   | C1 |   | SUM(X  *Y) |
 | SUM(X^2*1)  SUM(X^2*X)  SUM(X^2*X^2)  SUM(X^2*X^3) | x | C2 | = | SUM(X^2*Y) |
 | SUM(X^3*1)  SUM(X^3*X)  SUM(X^3*X^2)  SUM(X^3*X^3) |   | C3 |   | SUM(X^3*Y) |

Note that the (i,j) entry of the A matrix is SUM(X^i * X^j) for i=0..3, j=0..3.

In C, to build the A matrix and Y vector for solving the simultaneous equations:

    // sample polynomial least square fit, power n, m values of xd and yd
    for(i=0; i<n+1; i++)
    {
      for(j=0; j<n+1; j++)
      {
        A[i][j] = 0.0;
      }
      Y[i] = 0.0;
    }
    for(k=0; k<m; k++)              // loop over the m observations
    {
      y = yd[k];
      x = xd[k];
      pwr[0] = 1.0;                 // pwr[i] = x^i
      for(i=1; i<=n; i++) pwr[i] = pwr[i-1]*x;
      for(i=0; i<n+1; i++)
      {
        for(j=0; j<n+1; j++)
        {
          A[i][j] = A[i][j] + pwr[i]*pwr[j];   // SUM
        }
        Y[i] = Y[i] + y*pwr[i];
      }
    }

Solve the simultaneous equations A*X=Y for X[0]=C0, X[1]=C1, X[2]=C2, X[3]=C3.
Note that the sum is taken over all observations and the "1" is just shown
to emphasize the symmetry.

Sample code in various languages:
least_square_fit.c            really old
least_square_fit_c.out
least_square.py3
least_square_py3.out
test_lsfit.py3                Python3 two methods
test_lsfit_py3.out            output
lsfit.py3                     just 1D version for copying
peval.py3                     just 1D version for copying
least_square_fit.f90
least_square_fit_f90.out
Least_square.rb               Ruby class new
test_least_square.rb          test
test_least_square_rb.out      test output
least_square_fit.java
least_square_fit_java.out
least_square_fit_3d.java
least_square_fit_3d_java.out
least_square_fit_4d.java
least_square_fit_4d_java.out

A specialized version for use later with PDE's:
lsfit.java
test_lsfit.java
test_lsfit_java.out
test_lsfit2.java
test_lsfit2_java.out
test_lsfit3.java
test_lsfit3_java.out
test_lsfit4.java
test_lsfit4_java.out
test_lsfit5.java
test_lsfit5_java.out
test_lsfit6.java
test_lsfit6_java.out
test_lsfit7.java
test_lsfit7_java.out          uses simeq.java

The Makefile entry that makes test_lsfit_java.out:

    test_lsfit_java.out: test_lsfit.java lsfit.java simeq.java
    	javac -cp . simeq.java
    	javac -cp . lsfit.java
    	javac -cp . test_lsfit.java
    	java -cp . test_lsfit > test_lsfit_java.out
    	rm -f *.class

least_square_fit.adb
least_square_fit_ada.out
least_square_fit_2d.adb
least_square_fit_2d_ada.out
least_square_fit_3d.adb
least_square_fit_3d_ada.out
least_square_fit_4d.adb
least_square_fit_4d_ada.out
real_arrays.ads
real_arrays.adb
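
Once the coefficients are known, the fitted polynomial can be evaluated at
any x.  A minimal sketch of the evaluation by Horner's rule (in the spirit
of the polyval/peval files in this lecture, not a copy of them):

    /* evaluate C0 + C1*x + ... + Cn*x^n; c[0..n] hold C0..Cn */
    double peval_sketch(const double c[], int n, double x)
    {
      double y = c[n];
      int i;
      for(i=n-1; i>=0; i--)
        y = y*x + c[i];
      return y;
    }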

A specialized version for use later with PDE's

lsfit.ads                 has 1D through 6D
lsfit.adb
test_lsfit6.adb
test_lsfit6_ada.out
array4d.ads
test_lsfit5.adb
test_lsfit5_ada.out
array5d.ads
test_lsfit4.adb
test_lsfit4_ada.out
array4d.ads
test_lsfit3.adb
test_lsfit3_ada.out
array3d.ads
test_lsfit2.adb
test_lsfit2_ada.out
integer_arrays.ads
integer_arrays.adb
real_arrays.ads
real_arrays.adb

The Makefile entry to make test_lsfit6_ada.out:

    test_lsfit6_ada.out: test_lsfit6.adb lsfit.ads lsfit.adb \
    	array3d.ads array4d.ads array5d.ads array6d.ads \
    	real_arrays.ads real_arrays.adb \
    	integer_arrays.ads integer_arrays.adb
    	gnatmake test_lsfit6.adb
    	./test_lsfit6 > test_lsfit6_ada.out
    	rm -f test_lsfit
    	rm -f *.ali
    	rm -f *.o

similarly for test_lsfit2_ada.out, test_lsfit3_ada.out,
test_lsfit4_ada.out, test_lsfit5_ada.out

Similar code in plain C (up to 7th power in up to 6 or 4 dimensions):
lsfit.h
lsfit.c
test_lsfit.c
test_lsfit_c.out
test_lsfit7.c
test_lsfit7_c.out
test_write_lsfit7.c
test_write_lsfit7_c.out
test_write_lsfit71.c      generated
test_write_lsfit72.c      generated

Simple one variable versions, polyfit and polyval:
polyfit.h
polyfit.c
polyval.h
polyval.c
test_polyfit.c
test_polyfit_c.out
test_polyfit.py3
test_polyfit_py3.out
test_polyfit_py3.png
poly.java
test_poly.java
test_poly_java.out
Poly.scala
Test_poly.scala
Test_poly_scala.out

truncated series vs lsfit for sin(x+y) and sin(x+y+z)

Using the code from above, another test case: fit_sin.adb demonstrates that
a least square fit to a specified power does not give the same coefficients
as a series truncated at that power, and is a more accurate fit.
fit_sin.adb
fit_sin_ada.out

comparison to very small Matlab code

generic_real_least_square_fit.ada
lsfit.m           MatLab source code (tiny!)
lsfit_m.out       MatLab output and plot

comparison to small Python code

test_polyfit.py        Python source code
test_polyfit_py.out    Python output and plot

Terms for fitting two and three variables, 2D and 3D

And to see how polynomials in two and three variables may be fit:

Given a value of Y for each value of X, get a polynomial with X^2:

    Y_approximate = C0 + C1 * X + C2 * X^2

Then the linear equations would be:

 | SUM(1  *1)  SUM(1  *X)  SUM(1  *X^2) |   | C0 |   | SUM(1  *Y) |
 | SUM(X  *1)  SUM(X  *X)  SUM(X  *X^2) |   | C1 |   | SUM(X  *Y) |
 | SUM(X^2*1)  SUM(X^2*X)  SUM(X^2*X^2) | x | C2 | = | SUM(X^2*Y) |

Given a value of Y for each value of X, get a polynomial with X^3:

    Y_approximate = C0 + C1 * X + C2 * X^2 + C3 * X^3

Then the linear equations would be:

 | SUM(1  *1)  SUM(1  *X)  SUM(1  *X^2)  SUM(1  *X^3) |   | C0 |   | SUM(1  *Y) |
 | SUM(X  *1)  SUM(X  *X)  SUM(X  *X^2)  SUM(X  *X^3) |   | C1 |   | SUM(X  *Y) |
 | SUM(X^2*1)  SUM(X^2*X)  SUM(X^2*X^2)  SUM(X^2*X^3) | x | C2 | = | SUM(X^2*Y) |
 | SUM(X^3*1)  SUM(X^3*X)  SUM(X^3*X^2)  SUM(X^3*X^3) |   | C3 |   | SUM(X^3*Y) |

Note that index [i][j] of the matrix has SUM(X^i * X^j) for subscripts 0..3.

Note that the terms for two and three variables in a polynomial are:

      two          three          can optimize
    a^0 b^0     a^0 b^0 c^0      constant, "1"
    a^1 b^0     a^1 b^0 c^0      just a
    a^0 b^1     a^0 b^1 c^0      just b
                a^0 b^0 c^1
    a^2 b^0     a^2 b^0 c^0      just a^2
    a^1 b^1     a^1 b^1 c^0      a*b
    a^0 b^2     a^1 b^0 c^1
                a^0 b^2 c^0
                a^0 b^1 c^1
                a^0 b^0 c^2

Then terms with a sum of the powers equal to 3, 4, 5, 6 are available.

Note that the matrix has the first row as the sum of each term multiplied by
the first term.  The second row is the sum of each term multiplied by the
second term, etc.  The data for the terms is from the raw data sets of
y_actual a b  or  y_actual a b c  being used to determine a fit
y_approx=F(a,b) or y_approx=F(a,b,c).

Terms for the data point  y1 a1 b1  are:
    1  a1  b1  a1^2  a1*b1  b1^2    and the constant term y1
These terms are multiplied by the first term, "1", and added to row 1.
These terms are multiplied by the second term, "a1", and added to row 2, etc.

Then the terms for data point  y2 a2 b2  are:
    1  a2  b2  a2^2  a2*b2  b2^2    and the constant term y2
These terms are multiplied by the first term, "1", and added to row 1.
These terms are multiplied by the second term, "a2", and added to row 2, etc.

Then the simultaneous equations are solved for the coefficients C1, C2, ...
to get the approximating function

    y_approx = F(a,b) = C1 + C2*a + C3*b + C4*a^2 + C5*a*b + C6*b^2

The following sample programs compute the least square fit of internally
generated data for low polynomial powers and compute the accuracy of the fit
in terms of root-mean-square error, average error and maximum error.  Note
that the fit becomes exact when the data is from a low order polynomial and
the fit uses at least that order polynomial.

least_square_fit_2d.c
least_square_fit_2d.out
least_square_fit_3d.c
least_square_fit_3d.out
least_square_fit_4d.c
least_square_fit_4d.out
least_square_fit2d.adb
least_square_fit2d_ada.out
real_arrays.ads
real_arrays.adb

Development of Python 2d and 3d least square fit:
polyfit2.py3
polyfit2_py3.out
polyfit3.py3
polyfit3_py3.out

You can translate the above to your favorite language.
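
A minimal sketch of the 2D quadratic accumulation described above
(illustrative only, not the course's least_square_fit_2d.c; the arrays a, b,
y of length m and the caller-zeroed A and B are assumptions):

    /* accumulate the 6x6 normal equations for a fit quadratic in a and b;
       A[6][6] and B[6] are assumed to be zeroed by the caller */
    void accumulate_2d(int m, const double a[], const double b[],
                       const double y[], double A[6][6], double B[6])
    {
      int i, j, k;
      for(k=0; k<m; k++)
      {
        double t[6] = {1.0, a[k], b[k], a[k]*a[k], a[k]*b[k], b[k]*b[k]};
        for(i=0; i<6; i++)
        {
          for(j=0; j<6; j++) A[i][j] += t[i]*t[j];  /* row i: each term times term_i */
          B[i] += t[i]*y[k];
        }
      }
      /* solve A*C = B; with 0-based C[], the lecture's C1..C6 are C[0]..C[5]:
         y_approx = C[0] + C[1]*a + C[2]*b + C[3]*a^2 + C[4]*a*b + C[5]*b^2 */
    }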

Now, if everything works, a live interactive demonstration of least square fit

The files needed are:
Matrix.java
LeastSquareFit.java
LeastSquareFitFrame.java
LeastSquareFit.out
LeastSquareFitAbout.txt
LeastSquareFitHelp.txt
LeastSquareFitEvaluate.txt
LeastSquareFitAlgorithm.txt
LeastSquareFitIntegrate.txt

The default parameter is n, a fit using all points.
Then set the parameter to 3 for an n=3, third order polynomial fit.
Then set the parameter to 4 and 5 to see the improvement.

Homework 2

Spline fit, possible demonstration

There are many other ways to fit a set of points.  The spline fit smooths
the fit by controlling derivatives.

SplineFrame.java     (Replace "LeastSquareFit" with "Spline" to get the
                      same set of files as above.)

Demonstration to see a different type of fit.  Right click on points,
left to right, increasing X.  Left click to see the plot.

Optional lecture on Taylor fit, to implement math functions that may not be
available in the programming language you are using:
optional lecture 4.5

updated 10/30/2021