## CS455 Selected Lecture Notes

### This is one big WEB page, used for printing

 These are not intended to be complete lecture notes.
Complicated figures or tables or formulas are included here
in case they were not clear or not copied correctly in class.
Computer commands, directory names and file names are included.
Specific help may be included here yet not presented in class.
Source code may be included in line or by a link.

Lecture numbers correspond to the syllabus numbering.


### Contents

• Lecture 1, Introduction
• Lecture 2, Rocket Science
• Lecture 3, Simultaneous Equations
• Lecture 3a, Case Study, Matrix Inversion
• Lecture br, Boundary reduction of equations
• Lecture 4, Least Square Fit
• Lecture 5, Polynomials
• Lecture 6, Curve Fitting
• Lecture 7, Integration
• Lecture 8, Integration
• Lecture 9, Review 1
• Lecture 10, Quiz 1
• Lecture 11, Complex Arithmetic
• Lecture 12, Complex Functions
• Lecture 13, Eigenvalues of a Complex Matrix
• Lecture 14, LAPACK
• Lecture 15, Multiple precision, bignum
• Lecture 16, Finding Roots and Nonlinear Equations
• Lecture 17, Optimization, finding minima
• Lecture 18, FFT, Fast Fourier Transform
• Lecture 18a, Digital Filtering
• Lecture 18b, Molecular frequency response
• Lecture 19, Review 2
• Lecture 20, Quiz 2
• Lecture 21, Benchmarks, time and size
• Lecture 22, Project Discussion
• Lecture 23, Computing Volume and Area
• Lecture 24, Numerical Differentiation
• Lecture 24a, Computing partial derivatives
• Lecture 24b, Computing partial derivatives in polar, cylindrical, spherical
• Lecture 25, Ordinary Differential Equations
• Lecture 26, Ordinary Differential Equations
• Lecture 27, Partial Differential Equations
• Lecture 27a, Differential Equation Definitions
• Lecture ODE PDE Overview
• Lecture 28, Partial Differential Equations
• Lecture 28a, Additional Differential Equations
• Lecture 28d, Biharmonic PDE using higher order
• Lecture 28b, Navier Stokes case study
• Lecture 28e, 5D five dimensions, independent variables
• Lecture 28f, 6D six dimensions, Biharmonic
• Lecture 28g, extended to 7 dimensions
• Lecture 28h, PDE polar, cylindrical, spherical
• Lecture 28j, PDE toroid geometry
• Lecture 29, Review
• Lecture 30, Final Exam
• Supplemental L31, Creating PDE Test Cases
• Supplemental L31a, sparse solution of PDE
• Supplemental L31b, Nonlinear PDE
• Supplemental L31c, Parallel PDE
• Supplemental L31d, Parallel Multiple Precision PDE
• Supplemental L32, Finite Element Method
• Supplemental L33, Finite Element Method, triangle
• Supplemental L33a, Lagrange fit triangle
• Lecture 28c, fem_50 case study
• Supplemental L35, Navier Stokes Airfoil Simulation
• Supplemental L36, Some special PDE's
• Supplemental L36a, Special discretization, non uniform
• Supplemental L37, Vaious utility
• Supplemental L38, Open Tutorial on LaTeX
• L39, Tutorial on Numerical Solution of Differential Equations
• L40, Unique Numerical Solution of Differential Equations
• L41, Numerical solving AC circuits
• ### Lecture 1, Numerical Errors

Introduction:
Hello, my name is Jon Squire and I have been using computers
to solve numerical problems since 1959. I have about 1 million
lines of source code, written over the past 50 years.
How can that be? Check the numerical computation:
1,000,000/50 years is 20,000 lines per year.
20,000/200 working days per year is 100 lines per working day.
With a lot of reuse, cut-and-paste, same programs and
data files including scripts for many languages on many
operating systems, easy.

On a job, 20,000/(50 weeks*5 days per week) is 80 lines per day.
80/8 hours is 10 lines per hour. You can do that.
You may not save every line you type. sad.

Overview:

This lecture will provide the definitions and the intuition for
absolute error
relative error
round off error
truncation error

Then how to call intrinsic function and elementary functions.

Some terms will be used without comment in the rest of the course.
Learn them now. You should run the sample code to increase your
understanding and belief.

"Absolute error" A positive number, the difference between the

"Relative error" A positive number, the Absolute Error divided
by the exact answer. (Not defined if the exact answer is zero.)

Most numerical software is first tested with one or more
known solutions. Then, the software may be used on
problems where the solution is not known.

Given the exact answer is 1000 and the computed answer is 1001:
The absolute error is 1
The relative error is 1/1000 = 0.001

For a set of computed numbers there are three common absolute errors:
"Maximum Error" the largest error in the set.
"Average Error" the sum of the absolute errors divided by the number
in the set.
"RMS Error" the root mean square of the absolute errors.
sqrt(sum_over_set(absolute_error^2)/number_in_set)

The maximum error is 100.0     101.0 - 1.0
The average error is 1.0       100.0/100
The RMS error is 10.0          sqrt(100.0*100.0/100)

In some problems the main concern is the maximum error, yet
the RMS error is often the best intuitive measure of error.
Generally much better intuitive measure than average error.

Almost all Numerical Computation arithmetic is performed using
IEEE 754-1985 Standard for Binary Floating-Point Arithmetic.
The two formats that we deal with in practice are the 32 bit and
64 bit formats. You need to know how to get the format you desire
in the language you are programming. Complex numbers use two values.

older
C       Java    Fortran 95        Fortran     Ada 95        MATLAB    Python
------  ------  ----------------  ----------  ------------  --------  --------
32 bit  float   float   real              real        float         N/A       float
64 bit  double  double  double precision  real*8      long_float    'default' 'default'

complex
32 bit  'none'  'none'  complex           complex     complex       N/A       N/A
64 bit  'none'  'none'  double complex    complex*16  long_complex  'default' 'default'

'none' means not provided by the language (may be available as a library)
N/A means not available, you get the default.

IEEE Floating-Point numbers are stored as follows:
The single format 32 bit has
1 bit for sign,  8 bits for exponent, 23 bits for fraction
about 6 to 7 decimal digits
The double format 64 bit has
1 bit for sign, 11 bits for exponent, 52 bits for fraction
about 15 to 16 decimal digits

Some example numbers and their bit patterns:

decimal
stored hexadecimal sign exponent  fraction                 significand

The "1" is not stored |
|
31  30....23  22....................0  |
1.0
3F 80 00 00          0  01111111  00000000000000000000000  1.0   * 2^(127-127)

0.5
3F 00 00 00          0  01111110  00000000000000000000000  1.0   * 2^(126-127)

0.75
3F 40 00 00          0  01111110  10000000000000000000000  1.1   * 2^(126-127)

0.9999995
3F 7F FF FF          0  01111110  11111111111111111111111  1.1111* 2^(126-127)

0.1
3D CC CC CD          0  01111011  10011001100110011001101  1.1001* 2^(123-127)

63  62...... 52  51 .....  0
1.0
3F F0 00 00 00 00 00 00    0  01111111111  000 ... 000  1.0    * 2^(1023-1023)

0.5
3F E0 00 00 00 00 00 00    0  01111111110  000 ... 000  1.0    * 2^(1022-1023)

0.75
3F E8 00 00 00 00 00 00    0  01111111110  100 ... 000  1.1    * 2^(1022-1023)

0.9999999999999995
3F EF FF FF FF FF FF FF    0  01111111110  111 ...      1.11111* 2^(1022-1023)

0.1
3F B9 99 99 99 99 99 9A    0  01111111011  10011..1010  1.10011* 2^(1019-1023)
|
sign   exponent      fraction                      |
before storing subtract bias

Note that an integer in the range 0 to 2^23 -1 may be represented exactly.
Any power of two in the range -126 to +127 times such an integer may also
be represented exactly. Numbers such as 0.1, 0.3, 1.0/5.0, 1.0/9.0 are
represented approximately. 0.75 is 3/4 which is exact.
Some languages are careful to represent approximated numbers
accurate to plus or minus the least significant bit.
Other languages may be less accurate.

Now for some experiments for you to run an the computer of your choice
in the language of your choice.

In single precision floating point print:
10^10 * sum( 1.0, 10^-7, -1.0)         answer should be  1,000
10^10 * sum( 1.0, 0.5*10^-7, -1.0)     answer should be    500
expression 10^10 * ( 1.0 + 10^-7 -1.0) answer should be  1,000
10000000000.0*(1.0+10.0E-7-1.0)
The order of addition is important.
Adding a small number to 1.0 may not change the value.
This small number is less that what we call "epsilon".

error_demo1.c source code
error_demo1_c.out output

error_demo1.f90 source code
error_demo1_f90.out output

error_demo1.java source code
error_demo1_java.out output

error_demo1.m source code
error_demo1_m.out output

Remember 10^-7 is 0.0000001, not a power of 2.
Thus, can not be stored exactly.

Also, floating point arithmetic is performed in registers with
more bits than can be stored. Thus, as shown below, you may
get more precision than you expect. Do not count on it.
epsilon.c showing more precision source code
epsilon.out showing forced store output

We will cover, in the course,  methods of reducing error in areas:

statistics  sum x^2 - (sum x)^2  vs  sum(x-mean)

polynomial definition, evaluation Horner's rule vs x^N

approximation, e.g. sin(x)  truncation error vs roundoff error

derivatives

indefinite integral, definite integral, area

partial differential equations

and show sample code in many languages:
Ada 95, C, Fortran 95, Java and MatLab

Error accumulation when computing standard deviation.
Subtracting large numbers loses significant digits.
123456 - 123455 = 1.00000  only 1 significant digit
With only 6 digits representable
123456.00 - 123455.99 = 1.00000 yet should be 0.010000

Two ways of computing the standard deviation are shown,
with the errors indicated for various sets of data:
Cases:
stddev(1, 2,..., 100)
stddev(10, 20,..., 1,000)              should just scale
stddev(100, 200,..., 10,000)
stddev(1,000, 2,000,..., 100,000)
stddev(10,000, 20,000,..., 1,000,000)
stddev(10,001, 10,002,..., 10,100)     should be same as first

sigma_error.c
sigma_error_c.out

sigma_error.f90
sigma_error_f90.out

sigma_error.java
sigma_error_java.out

sigma_error.m
sigma_error_m.out

Iteration needing uniform step size should use multiplication,

The elementary functions are sin, cos, tan, exp, sqrt
and inverse forms asin, acos, atan, log, power
and reciprocal forms cosecant, secant, cotangent,
and hyperbolic forms sinh, cosh, tanh,
and inverse hyperbolic forms asinh, acosh, atanh, ...

The intrinsic functions are built into the language, e.g. abs (sometimes)

Note that Fortran 95 and Java need no extra information to get the
real valued elementary functions. "C" needs #include <math.h>

single precision as 'float' and double precision as 'long_float'.

Note that Fortran 95 overloads the function names and provides
single precision as 'real' and double precision as 'double precision'.
ef_f95.f90 Fortran 95, real and double

Note that Java provides only double precision as 'double'.
ef_java.java Java, only double

Note that C only provides double precision as 'double'.
ef_c.c C, only double is available
ef_c.out nan means not a number, bad input

Note that MATLAB has the most functions and all functions are
automatically double or double complex as needed.
ef_matlab.m MATLAB everything
ef_matlab.out automatic complex

Note that Python needs  import math  and can list available functions
test_math.py Python many, automatic conversion
test_math_py.out Python output

Just for fun: Power of 2 "tree"

A small program that prints epsilon, 'eps' and the largest and
smallest floating point numbers, run on one machine, shows:

float_sig.c running, may die
type float has about six significant digits
eps is the smallest positive number added to 1.0f that changes 1.0f
type float eps= 5.960464478E-08
type float small= 1.401298464E-45
this could be about 1.0E-38, above shows un-normalized IEEE floating point
type float large= 1.701411835E+38

type double has about fifteen significant digits
type double eps= 1.11022302462515654E-16
type double small=4.940656458E-324
this could be about 1.0E-304, above shows un-normalized IEEE floating point
type double large=8.988465674E+307

The program is float_sig.c

Using 2^n = 10^k/3.32 and n bits has a largest number 2^n -1 and
k digits has a largest number 10^k -1. 24 bits would be 7 digits.
53 bits would be 16 digits, the more optimistic significant
digits for IEEE floating point.

Notes: I have chosen to keep most WEB pages plain text so that
they may be read by all browsers. Some referenced pages may
be .pdf, .jpg, .gif, .ps or other file types.
Many math books use many Greek letters. You may want to refer
to various alphabets.

If you want to stretch your concept of numbers,
check out the Continuum Hypothesis

Beware of easy methods of printing, related to epsilon
eps.py Python "print"
eps.f90 Fortran "print"



### Lecture 2, Rocket Science

Some physical problems are easy to solve numerically using just
the basic equations of physics. Other problems may be very difficult.

Consider a specific model rocket with a specific engine.
Given all the data we can find, compute the maximum altitude
the rocket can obtain. Yes, this is rocket science.

Estes Alpha III
Length      12.25  inches =  0.311 meters
Diameter     0.95  inches =  0.0241 meters
Body area    0.785 square inches = 0.506E-3  square meters cross section
Cd of body   0.45  dimensionless
Fins area    7.69  square inches = 0.00496  square meters total for 3 fins
Cd of fins   0.01  dimensionless
Weight/mass  1.2   ounce  = 0.0340 kilogram without engine
Engine       0.85  ounce  = 0.0242 kilogram initial engine mass
Engine       0.33  ounce  = 0.0094 kilogram final engine mass

Thrust curve
Total impulse  8.82 newton seconds (area under curve)
Peak thrust   14.09 newton
Average thrust 4.74 newton
Burn time      1.86 second

Initial conditions:
t = 0   time
s = 0   height
v = 0   velocity
a = 0   acceleration
F = 0   total force not including gravity
m = 0.0340 + 0.0242  mass

Basic physics:
t = t + dt  time advances

Fd = Cd*Rho*A*v^2 /2   for both body and fins
Fd is force of drag in newtons in opposite direction of velocity
Cd is coefficient of drag, dimensionless (depends on shape)
Rho is density of air, use 1.293 kilograms per meter cubed
A is total surface area in square meters
v is velocity in meters per second (v^2 is velocity squared)

Fg = m*g  Fg is force of gravity toward center of Earth
m is mass in kilograms
g is acceleration due to gravity, 9.80665 meters per second squared

Ft =     value from thrust curve at this time, you enter this data.
Do not copy! This is part of modeling and simulation.

F = Ft - (Fd body + Fd fins + Fg)   resolve forces

a = F/m   a is acceleration we will compute from knowing
F, total force in newtons and
m is mass in kilograms of body plus engine mass that changes

dv = a*dt dv is velocity change in meters per second in time dt
a is acceleration in meters per second squared
dt is delta time in seconds

v = vp+dv v is new velocity after the dt time step
(v is positive upward, stop when v goes negative)
vp is previous velocity prior to the dt time step
dv is velocity change in meters per second in time dt

ds = v*dt ds is distance in meters moved in time dt
v is velocity in meters per second
dt is delta time in seconds

s = sp+ds s is new position after the dt time step
sp is previous position prior to the dt time step
ds is distance in meters moved in time dt

m = m -0.0001644*Ft  apply each time step

print and go to next time step

Homework Problem 1:
Write a small program to compute the maximum height when
the rocket is fired straight up. Assume no wind.
In order to get reasonable consistency of answers, use dt = 0.1 second
Every student will have a different answer.
Some where near 350 meters that is printed on the box. +/- 30%
Any two answers that are the same, get a zero.

Suggestion: Check the values you get from the thrust curve by
simple summation. Using zero thrust at t=0 and t=1.9 seconds, sampling
at 0.1 second intervals, you should get a sum of about 90 . Adjust
values to make it this value in order to get reasonable consistency of

The mass changes as the engine burns fuel and expels mass
at high velocity. Assume the engine mass decreases from 0.0242 kilograms
to 0.0094 grams proportional to thrust. Thus the engine mass is
decreased each 0.1 second by the thrust value at that time times
(0.0242-0.0094)/90.0 = 0.0001644 . mass=mass-0.0001644*thrust at this time.
"thrust" = 0.0 at time t=0.0 seconds
"thrust" = 6.0 at time t=0.1 seconds.
"thrust" = 0.0 at and after 1.9 seconds. Important, rocket is still climbing.
Check that the mass is correct at the end of the flight. 0.0340+0.0094

Published data estimates a height of 1100 feet, 335  meters to
1150 feet, 350 meters.

Your homework is to write a program that prints every 0.1 seconds:
the time in seconds
height in meters
velocity in meters per second
acceleration in meters per second squared
force in newtons
mass in kilograms (just numbers, all on one line)
and stop when the maximum height is reached.

Think about what you know. It should become clear that at each
time step you compute the body mass + engine mass, the three
forces combined into Ft-Fd_body-Fd_fins-Fg, the acceleration, the velocity
and finally the height. Obviously stop without printing if
the velocity goes negative (the rocket is coming down).

The program has performed numerical double integration. You might
ask "How accurate is the computation?"
Well, the data in the problem statement is plus or minus 5%.
We will see later that the computation contributed less error.
A small breeze would deflect the rocket from vertical and
easily cause a 30% error. We should say that:
"the program computed the approximate maximum height."

Additional cases you may wish to explore.
What is the approximate maximum height without any drag,
set Rho to 0.0 for a vacuum.
What is the approximate maximum height using dt = 0.05 seconds.
What is the approximate maximum height if launched at 45 degrees
rather than vertical, resolve forces in horizontal and vertical.

For this type of rocket to have stable flight the center of gravity
of the rocket must be ahead of the center of pressure. The center
of pressure is the centroid of the area looking at the side of
the rocket. This is why the fins extend out the back and the
engine mass is up in the rocket.

Homework 1 assignment



### Lecture 3 Solving Simultaneous Equations


Simultaneous equations are multiple equations involving the same
variables. In general, to get a unique solution we need the
same number of equations as the number of unknown variables,
and the equations mutually linearly independent.

A sample set of three equations in three unknowns is:

eq1:  2*x + 3*y + 2*z = 13
eq2:  3*x + 2*y + 3*z = 17
eq3:  4*x - 2*y + 2*z = 12

One systematic solution method is called the Gauss-Jordan reduction.
We will reduce the three equations such that eq1: has only x,
eq2: has only y and eq3: has only z making the constant yield
of each equation. The numeric solution will perform the same
operations on a matrix.

Reduce the coefficient of x in the first equation to 1, dividing by 2

eq1:  1*x + 1.5*y + 1*z = 6.5

Eliminate the variable x from eq2: by subtracting eq2 coefficient of
x times eq1:

eq2:  becomes   eq2: - 3* eq1:

eq2:  (3-3)*x + (2-4.5)*y + (3-3)*z = (17-19.5)  then simplifying
eq2:      0*x - 2.5*y + 0*z = -2.5

Eliminate the variable x from eq3: by subtracting eq3 coefficient of
x times eq1:

eq3:  becomes   eq3: - 4* eq1:

eq3:  (4-4)*x (-2 -6)*y + (2-4)*z = (12-26) then becomes
eq3:      0*x - 8*y - 2*z = -14

The three equations are now

eq1:  1*x + 1.5*y + 1*z =   6.5
eq2:  0*x - 2.5*y + 0*z = - 2.5
eq3:  0*x - 8.0*y - 2*z = -14.0

Reduce the coefficient of y in eq2: to 1, dividing by -2.5

eq2:  0*x + 1*y + 0*z = 1

Eliminate the variable y from eq1: by subtracting eq1 coefficient of y times eq2:

eq1:  becomes  eq1: -1.5* eq2:

eq1:  1*x + 0*y + 1*z =  (6.5-1.5)  then becomes
eq1:  1*x + 0*y + 1*z = 5

Eliminate the variable y from eq3: by subtracting eq3 coefficient of y times eq2:

eq3:  becomes  eq3: -8* eq2:

eq3:  0*x + 0*y - 2*z =  (-14 +8)  then becomes
eq3:  0*x + 0*y - 2*z = -6

The three equations are now

eq1:  1*x + 0*y + 1*z =  5
eq2:  0*x + 1*y + 0*z =  1
eq3:  0*x + 0*y - 2*z = -6

Reduce the coefficient of z in eq3: to 1, dividing by -2

eq3:  0*x + 0*y + 1*z = 3

Eliminate the variable z from eq1: by subtracting eq1 coefficient of z times eq2:

eq1:  becomes  eq1: -1* eq2:

eq1:  1*x + 0*y + 0*z =  (5-3)  then becomes
eq1:  1*x + 0*y + 0*z = 2

Eliminate the variable z from eq2: by subtracting eq2 coefficient of z times eq2:

eq2:  becomes  eq3: 0* eq2:

eq2:  0*x + 1*y + 0*z = 1

The three equations are now

eq1:  1*x + 0*y + 0*z = 2   or  x = 2
eq2:  0*x + 1*y + 0*z = 1   or  y = 1
eq3:  0*x + 0*y + 1*z = 3   or  z = 3  the desired solution.

The numerical solution simply places the values in a matrix and uses the same
reductions shown above.

Given the equations:

eq1:  2*x + 3*y + 2*z = 13
eq2:  3*x + 2*y + 3*z = 17
eq3:  4*x - 2*y + 2*z = 12

Create the matrix  | 2  3  2  13 |  having 3 rows and 4 columns
A = | 3  2  3  17 |
| 4 -2  2  12 |

The following code, using n=3 for these three equations, computes the
same desired solution as the manual method above.

for k=1:n
for j=k+1:n
A(k,j) = A(k,j)/A(k,k)
end
for i=1:n
if(i not k)
for j=1:n
A(i,j) = A(i,j) - A(i,k)*A(k,J)

Pivoting to avoid zero on diagonal and improve accuracy

Now we must consider the possible problem of A(k,k) being zero
for some value of k. It is rather obvious that the order of the
equations does not matter. The equations can be given in any
order and we get the same solution. Thus, simply interchange
any equation where we are about to divide by a zero A(k,k)
with some equation below that would not result in a zero A(k,k).
It turns out that we get better accuracy if we always pick
the equation that has the largest absolute value for A(k,k).
If the largest value turns out to be zero then there is no
unique solution for the set of equations.

We generally want numerical code to run efficiently and thus we
will not physically interchange the equations but rather keep
a row index array that tells us where the k th row is now.
The code for the final algorithm is given in the links below.

Working code in many languages

The instructor understands that some students have a strong
prejudice in favor of, or against, some programming languages.
languages, the instructor finds that the difference between
programming languages is mostly syntactic sugar. Yet, since
students may be able to read some programming languages
easier than others, these examples are presented in "C",
Fortran 95, Java and Ada 95. The intent was to do a quick
translation, keeping most of the source code the same,
for the different languages. Style was not a consideration.
Some rearranging of the order was used when convenient.
The numerical results are almost exactly the same.

The same code has been programmed in "C", Fortran 95, Java and Ada 95
as shown below with file types .c, .f90, .java and .adb:

simeq.c "C" language source code
time_simeq.c "C" language source code
time_simeq.out output

simeq.f90 Fortran 95 source code
time_simeq.f90 Fortran 95 source code
time_simeq_f90_cs.out output

simeq.java Java source code
time_simeq.java Java source code
time_simeq_java.out output

simeq.m MATLAB language source code
simeq_m.out MATLAB output

test_solve.py Python language source code
test_solve_py.out Python output

Many methods have been devised for solving simultaneous equations.
A very small sample is LU decomposition and Crout method.
LU decomposition with pivoting:
lup_decomp.f90  source code
lup_decomp_f90.out  output
Crout method without pivoting, uses back substitution:
Back substitution:
The difference from plain Gauss-Jordan with pivoting
and back substitution is less inner reduction,
then use back substitution. Solve A * X = Y
The initial reduction creates a matrix of the form:

| 1  a12  a13  a14  y1 |
| 0   1   a23  a24  y2 |
| 0   0    1   a34  y3 |
| 0   0    0    1   y4 |

Thus we have      x4 = y4
Back substituting x3 = y3 - x4*a34
Back substituting x2 = y2 - x3*a23 - x4*a24
Back substituting x1 = y1 - x2*a12 - x3*a13 - x4*a14

simeq_back.h  source code
simeq_back.c  source code
test_simeq_back.c  source code
test_simeq_back.out  output

Tailoring

Throughout this course you will see variations of this source code,
tailored for specific applications. The packaging will change with
"C" files having code inside with 'static void', Fortran 95 code using
modules and,  Java and Ada code using packages. Python etc. code.

It should be noted that the algorithm is exactly the same for sets
of equations with complex values. The code change is simply
changing the type in Fortran 95, Java, and Ada 95. The Java class
'Complex' is on my WEB page. The "C" code requires a lot of changes.

I wrote the first version of this program for the IBM 650 in assembly
language as an electrical engineering student. The program was for
complex values and solved for node voltages in alternating current
circuits. A quick and dirty version is ac_circuit.java
that needs a number of java packages:
Matrix.java
Complex.java
ComplexMatrix.java
ac_analysis.java an improved version

Then an even more complete version that plots up to eight node voltages.

ac_plot.dat capacitor, inductor and tuned circuits

Output of  java myjava.ac_plot.java < ac_plot.dat

There are systems of equations with no solutions:

eq1:  1*x + 0*y + 0*z = 2
eq2:  2*x + 0*y + 0*z = 2
eq3:  4*x - 2*y + 3*z = 5

Some may ask: What about solving  |A| * |X| = |Y| for X, given A and Y
using  |X| = |Y| * |A|^-1   (inverse of matrix A) ?
The reason this is not a good numerical solution is that slightly
more total error will be in the inverse |A|^-1 and then a little
more error will come from the vector times matrix multiplication.

Matrix Inverse

The code for matrix inverse is very similar to the code for solving
simultaneous equations. Added effort is needed to find the
maximum pivot element and there must be both row and column
interchanges. An example that shows the increasing error with the
increasing size of the matrix, on a difficult matrix, is shown below.
Note that results of a 16 by 16 matrix using 64-bit IEEE Floating
point arithmetic that is ill conditioned may become useless.

inverse.f90
test_inverse.f90
test_inverse_f90.out

Extracted form test_inverse_f90.out is
initializing big matrix, n= 2 , n*n= 4
sum of error= 1.84748050191530E-16 , avg error= 4.61870125478825E-17
initializing big matrix, n= 4 , n*n= 16
sum of error= 2.19971263426544E-12 , avg error= 1.37482039641590E-13
initializing big matrix, n= 8 , n*n= 64
sum of error= 0.00000604139304982709 , avg error= 9.43967664035483E-8
initializing big matrix, n= 16 , n*n= 256
sum of error= 83.9630735209012 , avg error= 0.327980755941020
initializing big matrix, n= 32 , n*n= 1024
sum of error= 4079.56590417946 , avg error= 3.98395107830025
initializing big matrix, n= 64 , n*n= 4096
sum of error= 53735.8765782488 , avg error= 13.1191104927365
initializing big matrix, n= 128 , n*n= 16384
sum of error= 85784.2643647822 , avg error= 5.23585597929579
initializing big matrix, n= 256 , n*n= 65536
sum of error= 1097119.16168229 , avg error= 16.7407098645368
initializing big matrix, n= 512 , n*n= 262144
sum of error= 1.36281435213093E+7 , avg error= 51.9872418262837
initializing big matrix, n= 1024 , n*n= 1048576
sum of error= 1.24247404738082E+9 , avg error= 1184.91558778841

Very similar results from the C version:

inverse.c
test_inverse.c
test_inverse.out

Extracted form test_inverse.out is
initializing big matrix, n=1024, n*n=1048576
sum of error=1.24247e+09, avg error=1184.92

Multiple Precision

A case study using 32-bit IEEE floating point and 50, 100, and 200
digit multiple precision are shown in Lecture 3a

Reducing the number of equation when some values are known

Later, when we study partial differential equations, we will need
a process  for reducing the number
of equations when we know the value of one or more elements
of the unknown vector.

Nonlinear equations and systems of nonlinear equations
are covered in Lecture 16

Really difficult, are systems of nonlinear equations that need
a solution. The following examples have many comments describing
one or more possible methods of solution:
(Later versions have fewer bugs)

simeq_newton.f90  with debug printout
simeq_newton_f90.out
simeq_newton2.f90
simeq_newton2_f90.out
inverse.f90  used by above
udrnrt.f90  used by above
equation_nl.f90  with debug printout
equation_nl_f90.out

simeq_newton.c  with debug printout
simeq_newton.out
simeq_newton2.c
simeq_newton2.out
simeq_newton3.h
simeq_newton3.c
test_simeq_newton3.c
test_simeq_newton3_c.out
simeq_newton4.h
simeq_newton4.c
test_simeq_newton4.c
test_simeq_newton4_c.out
invert.h  used by above
invert.c  used by above
udrnrt.h  used by above
udrnrt.c  used by above
equation_nl.c  with debug printout
equation_nl_c.out

simeq_newton.java  with debug printout
simeq_newton_java.out
simeq_newton2.java
simeq_newton2_java.out
simeq_newton5.java
test_simeq_newton5.java
test_simeq_newton5_java.out
invert.java  used by above
equation_nl.java  with debug printout
equation_nl_java.out
inverse.java  used by above

Accuracy does degrade as the relative size of solution and
matrix elements gets large. Expect similar results with any method.
This program tests 0, 1, 2, 5, ... 1E9, 2E9, 5E9, 1E10 for various n.
simeq_accuracy.c  big range test
simeq_accuracy_c.out  big range results



### Lecture 3a Case Study, Matrix Inversion


The code for matrix inverse is very similar to the code for solving
simultaneous equations. Added effort is needed to find the
maximum pivot element and there must be both row and column
interchanges. An example that shows the increasing error with the
increasing size of the matrix, on a difficult matrix, is shown below.
Note that results of a 16 by 16 matrix using 64-bit IEEE Floating
point arithmetic that is ill conditioned may become useless.

Given a matrix A, computing the inverse AI, then checking that
|A| * |AI| = |II|  is approximately the identity matrix |I|
is useful and possibly very important.

The check used for this case study was to sum the absolute values
of  |II| - |I| and print the sum and also print the sum divided
by  n*n,  the number of elements in the matrix.

This case study uses the classic, difficult to invert,
variation of the Hilbert Matrix, in floating point format,
shown here as rational numbers:

| 1/2  1/3  1/4  1/5 |   using i for the column index and
| 1/3  1/4  1/5  1/6 |   j for the row index,
| 1/4  1/5  1/6  1/7 |   the (i,j) element is  1/(i+j)
| 1/5  1/6  1/7  1/8 |   as a floating point number

A few solutions are (A followed by AI):

| 1/2 |  n=1      | 2 |

| 1/2 1/3 |  n=2      |  18  -24 |
| 1/3 1/4 |           | -24   36 |

| 1/2 1/3 1/4 1/5 | n=4   |   200  -1200   2100  -1120 |
| 1/3 1/4 1/5 1/6 |       | -1200   8100 -15120   8400 |
| 1/4 1/5 1/6 1/7 |       |  2100 -15120  29400 -16800 |
| 1/5 1/6 1/7 1/8 |       | -1120   8400 -16800   9800 |

| 1/2  1/3  1/4  1/5  1/6  1/7  1/8  1/9  |
| 1/3  1/4  1/5  1/6  1/7  1/8  1/9  1/10 |
| 1/4  1/5  1/6  1/7  1/8  1/9  1/10 1/11 |
| 1/5  1/6  1/7  1/8  1/9  1/10 1/11 1/12 |
| 1/6  1/7  1/8  1/9  1/10 1/11 1/12 1/13 |
| 1/7  1/8  1/9  1/10 1/11 1/12 1/13 1/14 |
| 1/8  1/9  1/10 1/11 1/12 1/13 1/14 1/15 |
| 1/9  1/10 1/11 1/12 1/13 1/14 1/15 1/16 |

|     2592     -60480      498960    -1995840      4324320     -5189184      3243240     -823680 |
|   -60480    1587600   -13970880    58212000   -129729600    158918760   -100900800    25945920 |
|   498960  -13970880   128066400  -548856000   1248647400  -1553872320    998917920  -259459200 |
| -1995840   58212000  -548856000  2401245000  -5549544000   6992425440   -454053600   118918800 |
|  4324320  -12972960  1248647400  5549544000  12985932960 -16527551040  10821610800 -2854051200 |
| -5189184  158918760 -1553872320  6992425440 -16527551040  21210357168 -13984850880  3710266560 |
|  3243240 -100900800   998917920 -4540536000  10821610800 -13984850880   9275666400 -2473511040 |
|  -823679   25945920  -259459200  1189188000  -2854051200   3710266560  -2473511040   662547600 |

Note the exponential growth of the size of numbers in the inverse.

One measure of the difficulty of inverting a matrix is the size
of the largest diagonal during each step of the inversion process.
The magnitude of the largest element of the inverse will be
approximately the order of the reciprocal of the smallest
of the largest diagonal.

The smallest of the largest diagonal for a few cases are:

n =   2  .277E-1
n =   4  .340E-4
n =   8  .770E-10
n =  16  .560E-22
n =  32  .358E-46
n =  64  .645E-95
n = 128  .178E-192
n = 256  failed with 200 digit multiple precision

The average error, as computed for various precisions, is

7-digit   15-digit   166-bit    332-bit    664-bit     MatLab
IEEE      IEEE       mpf        mpf        mpf         IEEE
n    32-bit    64-bit     50-digit   100-digit  200-digit   64-bit

2    1.29E-7   4.61E-17   1.99E-59   1.36E-107  6.37E-204   0.00
4    7.84E-5   1.37E-13   1.91E-58   3.05E-106  6.12E-203   2.39E-13
8    3.83E0    9.43E-8    4.44E-50   1.81E-98   3.24E-194   7.31E-7
16    1.23E1    3.27E-1    8.13E-39   2.77E-87   8.56E-183   2.37E2
32    5.20E0    3.98E0     4.72E-14   4.45E-62   1.19E-158   1.46E6
64    6.13E0    1.31E1     1.01E8     2.20E-13   1.20E-109   2.55E8
128    8.39E0    5.23E0     7.79E7     3.46E8     3.83E-12    2.27E10
256    1.14E3    1.67E1     1.76E8     8.51E7     1.05E8      8.38E11

The errors bigger than E-5 are very deceiving
in the first two columns. They indicate failure to invert.
MatLab does indicate failure for n=16 and larger, other codes had
the matrix singular error suppressed.

A reasonable conclusion, for this matrix, is that an n by n matrix
needs more than n bits of floating point precision in order to
get a reliable inverse. Twice the number of bits as n to get
good results.

Before you panic, notice the results for the same test in MatLab
for this hard to invert matrix verses a pseudo random matrix.

n=2,   avgerr=0            , rnderr=4.16334e-017
n=4,   avgerr=2.39808e-013 , rnderr=1.04246e-016
n=8,   avgerr=7.31534e-007 , rnderr=7.86263e-016
n=16,  avgerr=237.479      , rnderr=3.57e-016
n=32,  avgerr=1.46377e+006 , rnderr=7.5899e-016
n=64,  avgerr=2.55034e+008 , rnderr=2.34115e-015
n=128, avgerr=2.2773e+010  , rnderr=6.70795e-015
n=256, avgerr=8.38903e+011 , rnderr=1.9137e-014

Well conditioned matrices may be inverted for n in the range
10,000 to 20,000 with IEEE 64-bit floating point.
Many large matrices are sparse, having many zero elements,
and may have only bands of non zero elements. Unfortunately
the inverse of sparse matrices are not sparse, thus
sparse matrix storage techniques may actually be slower.

The MatLab code is, of course, the shortest (stripped here):
n=1;
for r=1:8
n=2*n;
A=zeros(n,n);
B=rand(n,n);
for i=1:n
for j=1:n
A(i,j)=1.0/(i+j);
end
end
avgerr=sum(sum(abs(A*inv(A)-eye(n,n))))/(n*n)
rnderr=sum(sum(abs(B*inv(B)-eye(n,n))))/(n*n)
end

The detailed code and results are:

invrnd.m actual MatLab code
invrnd_m.out file output
invrnd_mm.out partial screen output

inverse.f90 basic inverse
test_inverse.f90 test program
test_inverse_f90.out output

Extracted form test_inverse_f90.out is
initializing big matrix, n= 2 , n*n= 4
sum of error= 1.84748050191530E-16 , avg error= 4.61870125478825E-17
initializing big matrix, n= 4 , n*n= 16
sum of error= 2.19971263426544E-12 , avg error= 1.37482039641590E-13
initializing big matrix, n= 8 , n*n= 64
sum of error= 0.00000604139304982709 , avg error= 9.43967664035483E-8
initializing big matrix, n= 16 , n*n= 256
sum of error= 83.9630735209012 , avg error= 0.327980755941020
initializing big matrix, n= 32 , n*n= 1024
sum of error= 4079.56590417946 , avg error= 3.98395107830025
initializing big matrix, n= 64 , n*n= 4096
sum of error= 53735.8765782488 , avg error= 13.1191104927365
initializing big matrix, n= 128 , n*n= 16384
sum of error= 85784.2643647822 , avg error= 5.23585597929579
initializing big matrix, n= 256 , n*n= 65536
sum of error= 1097119.16168229 , avg error= 16.7407098645368
initializing big matrix, n= 512 , n*n= 262144
sum of error= 1.36281435213093E+7 , avg error= 51.9872418262837
initializing big matrix, n= 1024 , n*n= 1048576
sum of error= 1.24247404738082E+9 , avg error= 1184.91558778841

Very similar results from the C version:

inverse.c
inverse.h
test_inverse.c
test_inverse.out

Similar results from float rather than double in the C version:

inversef.c
inversef.h
test_inversef.c
test_inversef.out

Exploring results from 50 digit multiple precision arithmetic version:

mpf_inverse.c
mpf_inverse.h
test_mpf_inverse.c
test_mpf50_inverse.out

Changing 'digits' to 100 digit multiple precision arithmetic version:

test_mpf100_inverse.out

Changing 'digits' to 200 digit multiple precision arithmetic version:

test_mpf200_inverse.out

The 200 digit run with more output:

test_mpf_inverse.out

Java version, double

inverse.java
test_inverse.java
test_inverse_java.out

Java version, BigDecimal 300 bits or more

Big_inverse.java
test_Big_inverse.java
test_Big_inverse_java.out

Big_simeq.java
test_Big_simeq.java
test_Big_simeq_java.out
time_Big_simeq.java
time_Big_simeq_java.out

Ada version 50, 100 and 200 digits

Basic gcc stack storage limitation prevented getting all output.
Even happens on 64-bit computer with 8GB or RAM.




We have a number of clusters at UMBC, I happen to use
our Bluegrit cluster and the MPI examples are from it.
For multi core machines, there are Java Threads and
At the end are a few multi core benchmarks for you to run.

NOTE:
MPI is running on a distributed memory system.
Each process may be considered to have local memory and
in general, there is no common shared memory.

Multicore machines are described here as shared memory systems.

Some parallel programming techniques apply to both
distributes and shared memory, some techniques do not
apply to both memory systems.

MPI
MPI stands for Message Passing Interface and is
available on many multiprocessors. MPI may be installed
as the open source version MPICH. There are other
software libraries and languages for multiprocessors,
yet, this lecture only covers MPI.

The WEB page here at UMBC is
www.csee.umbc.edu/help/MPI

Programming in MPI is the SPMD Single Program Multiple data style of
programming. One program runs on all CPU's in the multiprocessor.
Each CPU has a number, called a rank in MPI, called myid in my code and
called node or node number in comments.

"if-then-else" code may be based on node number is used to have
unique computation on specific nodes. There is a master node,
typically the node with rank zero in MPI. The node number may
also be used in index expressions and other computation. Many
MPI programs use the master as a number cruncher along with the
other nodes in addition to the master serving as overall control
and synchronization.

Examples below are given first in "C" and then a few in Fortran.
Other languages may interface with the MPI library.
These just show a simple MPI use, these are combined later for
solving simultaneous equations on a multiprocessor.

Just check that a message can be sent and received from each
node, processor, CPU, etc. numbered as "rank".

roll_call.c

roll_call.out

Just scatter unique data from the "master" to all nodes.
Then gather the unique results from all nodes.

scat.c

scat.out

Here is the Makefile I used.
Makefile for C on Bluegrit cluster

Repeating the "roll_call" just changing the language to Fortran.

roll_call.F

roll_call_F.out

Repeating scatter/gather just changing the language to Fortran.

scat.F

scat_F.out

The Fortran version of the Makefile with additional files I used.

Makefile for Fortran on Bluegrit cluster

my_mpif.h only needed if not on cluster

nodes only needed if default machinefile not used

MPI Simultaneous Equations

Now, the purpose of this lecture, solve huge number of simultaneous
equations on a highly parallel multiprocessor.

Well, start small when programming a multiprocessor and print out
every step to be sure the indexing and communication is exactly
correct.

This is hard to read, yet it was a necessary step.

psimeq_debug.c

psimeq_debug.out

Then, some clean up and removing or commenting out most debug print:

psimeq1.c

psimeq1.out

The input data was created so that the exact answers were 1, 2, 3 ...
It is interesting to note: because the data in double precision floating
point was from the set of integers, the answers are exact for
8192 equations in 8192 unknowns.

psimeq1.out8192

|A| * |X| = |Y|  given matrix |A| and vector |Y| find vector |X|

| 1 2 3 4 5 | |5| | 35|  for 5 equations in 5 unknowns
| 2 2 3 4 5 | |4| | 40|  the solved problem is this
| 3 3 3 4 5 |*|3|=| 49|
| 4 4 4 4 5 | |2| | 61|
| 5 5 5 5 5 | |1| | 75|

A series of timing runs were made, changing the number of equations.
The results were expected to increase in time as order n^3 over the
number of processors being used. Reasonable agreement was measured.

Using 16 processors:
Number of  Time computing  Cube root of
equations  solution (sec)  16 times Time (should approximately double
1024       3.7          3.9           as number of equations double)
2048      17.2          6.5
4096      83.5         11.0
8192     471.9         19.6

More work may be performed to minimize the amount of
data send and received in "rbuf".

Basic primitive barrier in C pthreads

Simultaneous equation solution using AMD 12 core
tsimeq.h
tsimeq.c
time_tsimeqb.c
time_tsimeqb.out

Java Simultaneous Equations

Java threads are demonstrated by the following example.
When run, there are four windows, each showing a dot as that thread runs.
Note that CPU and Wall time are measured and printed. (on some Java versions)

The basic structure of threads needed for my code:
(I still have not figured out why I need the dumb "sleep" in 2)
Barrier2.java
Barrier2_java.out
(OK, several versions later)
CyclicBarrier4.java
CyclicBarrier4_java.out

Simultaneous equation solution with multiple processors in
a shared memory configuration is accomplished with:

psimeq.java
test_psimeq.java test driver
test_psimeq_java.out output
psimeq_dbg.java with lots of debug print
test_simeq_dbg.java with debug
test_psimeq_dbg_java.out output with debug
A better version making better use of threads and cyclic barrier:

And, test results for "diff" the non threaded version
test_simeq_java.out output

Some crude timing tests:
time_simeq.java test driver
time_pimeq_java.out output
time_psimeq.java test driver
time_psimeq_java.out output yuk!

Simultaneous equation solution with multiple processors in
a shared memory configuration is accomplished with:

psimeq_dbg.adb with lots of debug print

Then using Barriers

Python Simultaneous Equations

The basic structure of threads needed for my code:
barrier2.py
barrier2_py.out

Multiprocessor Benchmarks

"C" pthreads are demonstrated by an example that measures the
efficiency of two cores, four cores or eight cores.
time_mp2.c
time_mp2.out

The ratio of Wall time to CPU time indicates degree of parallelism.

time_mp4.c 4 core shared memory
time_mp4.out

time_mp8.c 8 core shared memory
time_mp8_c.out

My AMD 12-core desktop computer July 2010
time_mp12.c 12 core shared memory
time_mp12_c.out

time_mp4.java 4 core shared memory
time_mp4_java.out

time_mp8.java 8 core shared memory
time_mp8_java.out



### Lecture 4 Least Square Fit


Given numeric data points, find an equation that approximates the data
with a least square fit. This is one of many techniques for getting
an analytical approximation to numeric data.

The problem is stated as follows :
Given measured data for values of Y based on values of X1,X2 and X3. e.g.

Y_actual         X1      X2     X3    observation, i
--------       -----   -----  -----
32.5           1.0     2.5    3.7                 1
7.2           2.0     2.5    3.6                 2
6.9           3.0     2.7    3.5                 3
22.4           2.2     2.1    3.1                 4
10.4           1.5     2.0    2.6                 5
11.3           1.6     2.0    3.1                 6

Find a, b and c such that   Y_approximate =  a * X1 + b * X2 + c * X3
and such that the sum of (Y_actual - Y_approximate) squared is minimized.

The method for determining the coefficients a, b and c follows directly
form the problem definition and mathematical analysis given below.

Set up and solve the system of linear equations:
(Each SUM is for i=1 thru 6 per table above)

| SUM(X1*X1)  SUM(X1*X2)  SUM(X1*X3) |   | a |   | SUM(X1*Y) |
| SUM(X2*X1)  SUM(X2*X2)  SUM(X2*X3) | x | b | = | SUM(X2*Y) |
| SUM(X3*X1)  SUM(X3*X2)  SUM(X3*X3) |   | c |   | SUM(X3*Y) |

Easy to program, not good data:
lsfit_lect.c
lsfit_lect.out
lsfit_lect.java
lsfit_lect_java.out
lsfit_lect.py
lsfit_lect_py.out

Now, suppose you wanted a constant term to make the fit:
Y_approximate =  Y0 + a * X1 + b * X2 + c * X3
Then the linear equations would be:

| SUM( 1* 1)  SUM( 1*X1)  SUM( 1*X2)  SUM( 1*X3) |   | Y0 |   | SUM( 1*Y) |
| SUM(X1* 1)  SUM(X1*X1)  SUM(X1*X2)  SUM(X1*X3) |   | a  |   | SUM(X1*Y) |
| SUM(X2* 1)  SUM(X2*X1)  SUM(X2*X2)  SUM(X2*X3) | x | b  | = | SUM(X2*Y) |
| SUM(X3* 1)  SUM(X3*X1)  SUM(X3*X2)  SUM(X3*X3) |   | c  |   | SUM(X3*Y) |

Note the symmetry! Easy to program.

Note the simultaneous equations, from Lecture 3:  |A| x |X| = |Y|
|A| and |Y| easily computable, solve for |X| to get Y0, a, b and c

We now have a simple equation to compute Y approximately from a reasonable
range of X1, X2, and X3.

Y is called the dependent variable and X1 .. Xn the independent variables.
The procedures below implement a few special cases and the general case.
The number of independent variables can vary, e.g. 2D, 3D, etc. .
The approximation equation may use powers of the independent variables
The user may create additional independent variables e.g. X2 = SIN(X1)
with the restriction that the independent variables are linearly
independent.  e.g.  Xi not equal  p Xj + q  for all i,j,p,q

Mathematical derivation

The mathematical derivation of the least square fit is as follows :

Given data for the independent variable Y in terms of the dependent
variables S,T,U and V  consider that there exists a function F
such that     Y = F(S,T,U,V)
The problem is to find coefficients a,b,c and d such that
Y_approximate = a * S + b * T + c * U + d * V
and such that the sum of ( Y - Y_approximate ) squared is minimized.

Note: a, b, c, d are scalars. S, T, U, V, Y, Y_approximate are vectors.

To find the minimum of  SUM((Y - Y_approximate)^2)
the derivatives must be taken with respect to a,b,c and d and
all must equal zero simultaneously. The steps follow :

SUM((Y - Y_approximate)^2) = SUM((Y - a*S - b*T - c*U - d*V)^2)

d/da =  -2 * S * SUM( Y - a*S - b*T - c*U - d*V )
d/db =  -2 * T * SUM( Y - a*S - b*T - c*U - d*V )
d/dc =  -2 * U * SUM( Y - a*S - b*T - c*U - d*V )
d/dd =  -2 * V * SUM( Y - a*S - b*T - c*U - d*V )

Setting each of the above equal to zero (derivative minimum at zero)
and putting constant term on left, the -2 is factored out,
the independent variable is moved inside the summation

SUM( a*S*S + b*S*T + c*S*U + d*S*V = S*Y )
SUM( a*T*S + b*T*T + c*T*U + d*T*V = T*Y )
SUM( a*U*S + b*U*T + c*U*U + d*U*V = U*Y )
SUM( a*V*S + b*V*T + c*V*U + d*V*V = V*Y )

Distributing the SUM inside yields

a * SUM(S*S) + b * SUM(S*T) + c * SUM(S*U) + d * SUM(S*V) = SUM(S*Y)
a * SUM(T*S) + b * SUM(T*T) + c * SUM(T*U) + d * SUM(T*V) = SUM(T*Y)
a * SUM(U*S) + b * SUM(U*T) + c * SUM(U*U) + d * SUM(U*V) = SUM(U*Y)
a * SUM(V*S) + b * SUM(V*T) + c * SUM(V*U) + d * SUM(V*V) = SUM(V*Y)

To find the coefficients a,b,c and d solve the linear system of equations

| SUM(S*S)  SUM(S*T)  SUM(S*U)  SUM(S*V) |   | a |   | SUM(S*Y) |
| SUM(T*S)  SUM(T*T)  SUM(T*U)  SUM(T*V) | x | b | = | SUM(T*Y) |
| SUM(U*S)  SUM(U*T)  SUM(U*U)  SUM(U*V) |   | c |   | SUM(U*Y) |
| SUM(V*S)  SUM(V*T)  SUM(V*U)  SUM(V*V) |   | d |   | SUM(V*Y) |

Some observations :
S,T,U and V must be linearly independent.
There must be more data sets (Y, S, T, U, V) than variables.
The analysis did not depend on the number of independent variables
A polynomial fit results from the substitutions S=1, T=x, U=x^2, V=x^3
The general case for any order polynomial of any number of variables
may be used with a substitution, for example, S=1, T=x, U=y, V=x^2,
W=x*y, X= y^2,  etc to terms such as exp(x), log(x), sin(x), cos(x).
Any number of terms may be used. The "1" is for the constant term.

lsfit_log.c fitting log and other terms
lsfit_log_c.out

fitting a simple polynomial, 1D

Now, suppose you wanted to fit a simple polynomial:
Given the value of Y for at least four values of X,
Y_approximate =  C0 + C1 * X + C2 * X^2 +  C3 * X^3
Then the linear equations would be:

| SUM(1  *1) SUM(1  *X) SUM(1  *X^2) SUM(1  *X^3) | | C0 |   | SUM(1  *Y) |
| SUM(X  *1) SUM(X  *X) SUM(X  *X^2) SUM(X  *X^3) | | C1 |   | SUM(X  *Y) |
| SUM(X^2*1) SUM(X^2*X) SUM(X^2*X^2) SUM(X^2*X^3) |x| C2 | = | SUM(X^2*Y) |
| SUM(X^3*1) SUM(X^3*X) SUM(X^3*X^2) SUM(X^3*X^3) | | C3 |   | SUM(X^3*Y) |

Note that the (i,j) subscript in the matrix has X^(i+j)
for i=0..3, j=0..3

Solve the simultaneous equations for C0, C1, C2, C3
Note that the sum is taken over all observations and the "1" is
just shown to emphasize the symmetry.

Sample code in various languages:

least_square_fit.c
least_square_fit_c.out

least_square_fit.f90
least_square_fit_f90.out

least_square_fit.java
least_square_fit_java.out
least_square_fit_3d.java
least_square_fit_3d_java.out
least_square_fit_4d.java
least_square_fit_4d_java.out
A specialized version for use later with PDE's
lsfit.java
test_lsfit.java
test_lsfit_java.out
test_lsfit2.java
test_lsfit2_java.out
test_lsfit3.java
test_lsfit3_java.out
test_lsfit4.java
test_lsfit4_java.out
test_lsfit5.java
test_lsfit5_java.out
test_lsfit6.java
test_lsfit6_java.out
test_lsfit7.java
test_lsfit7_java.out
uses simeq.java

The Makefile entry that makes  test_lsfit_java.out
test_lsfit_java.out: test_lsfit.java lsfit.java simeq.java
javac -cp . simeq.java
javac -cp . lsfit.java
javac -cp . test_lsfit.java
java  -cp . test_lsfit > test_lsfit_java.out
rm -f *.class

A specialized version for use later with PDE's

The Makefile entry to make  test_lsfit6_ada.out
rm -f test_lsfit
rm -f *.ali
rm -f *.o

Similar code in plain C (up to 7th power in up to 6 or 4 dimensions)
lsfit.h
lsfit.c
test_lsfit.c
test_lsfit_c.out
test_lsfit7.c
test_lsfit7_c.out
test_write_lsfit7.c
test_write_lsfit7_c.out
test_write_lsfit71.c generated
test_write_lsfit72.c generated

truncated series vs lsfit for sin(x+y) and sin(x+y+z)
using code from above, another test case:
fit_sin.adb demonstrates that a lsfit to a specified power,
does not give the same coefficients as a truncated approximation,
to the same power, and is a more accurate fit.

comparison to very small Matlab code

lsfit.m MatLab source code (tiny!)
lsfit_m.out MatLab output and plot

comparison to small Python code

test_polyfit.py Python source code
test_polyfit_py.out Python output and plot

Terms for fitting two and three variables, 2D and 3D
And to see how polynomials in two and three variables may be fit:

Note that the terms for two and three variables in a polynomial are:
a^0 b^0          a^0 b^0 c^0   constant, "1"

a^1 b^0          a^1 b^0 c^0   just a
a^0 b^1          a^0 b^1 c^0   just b
a^0 b^0 c^1

a^2 b^0          a^2 b^0 c^0   just a^2
a^1 b^1          a^1 b^1 c^0   a*b
a^0 b^2          a^1 b^0 c^1
a^0 b^2 c^0
a^0 b^1 c^1
a^0 b^0 c^2

Then terms with a sum of the powers equal to 3, 4, 5, 6 are available

Note that the matrix has the first row as the sum of each term
multiplied by the first term. The second row is the sum of each
term multiplied by the second term, etc.

The data for the terms is from the raw data sets of
y_actual a b    or  y_actual a b c

being used to determine a fit
y_approx=F(a,b) or  y_approx=F(a,b,c)

Terms for the data point y1  a1  b1 are:
1  a1  b1  a1^2  a1*b1  b1^2  the constant term  y1

These terms are multiplied by the first term, "1" and added to row 1.
These terms are multiplied by the second term, "a1" and added to row 2, etc.

Then the terms for data point  y2  a2  b2 are:
1  a2  b2  a2^2  a2*b2  b2^2   the constant term  y2

These terms are multiplied by the first term, "1" and added to row 1.
These terms are multiplied by the second term, "a2" and added to row 2, etc.

Then the simultaneous equations are solved for the coefficients C1, C2, ...
to get the approximating function

y_approx = F(a,b) = C1 + C2*a + C3*b + C4*a^2 + C5*a*b + C6*b^2

The following sample programs compute the least square fit of
internally generated data for low polynomial powers and compute
the accuracy of the fit in terms of root-mean-square error,
average error and maximum error. Note that the fit becomes exact
when the data is from a low order polynomial and the fit uses at
least that order polynomial.

least_square_fit_2d.c
least_square_fit_2d.out

least_square_fit_3d.c
least_square_fit_3d.out

least_square_fit_4d.c
least_square_fit_4d.out

You can translate the above to your favorite language.

Now, if everything works, a live interactive demonstration of
least square fit.

The files needed are:
Matrix.java
LeastSquareFit.java
LeastSquareFitFrame.java
LeastSquareFit.out
LeastSquareFitHelp.txt
LeastSquareFitEvaluate.txt
LeastSquareFitAlgorithm.txt
LeastSquareFitIntegrate.txt

The default parameter is an n, all, point fit.
Then set parameter to 3, for n=3, third order polynomial fit.
Then set parameter to 4 and 5 to see improvement.

Homework 2

Spline fit, possible demonstration
There are many other ways to fit a set of points.
The Spline fit smooths the fit by controlling derivatives.

SplineFrame.java
(Replace "LeastSquareFit" with "Spline" then get same files as above.)
Demonstration to see a different type of fit.



# Lecture 5, polynomials


A polynomial of degree n has the largest exponent value n.
There are n+1 terms and n+1 coefficients.
A normalized polynomial has the coefficient of the
largest exponent equal to 1.0.

An nth order polynomial with coefficients c_0 through c_n.

y = c_0 + c_1 x + c_2 x^2 +...+ c_n-1 x^n-1 + c_n x^n

Horners method for evaluating a polynomial

"y" is computed numerically using Horners method that
needs n multiplications and n additions.

Starting at the highest power, set y = c_n * x
Then combine the next coefficient  y = (c_n-1 + y) * x
Continue for c_n-2 until c_1       y = (c_1 + y) * x
Then finish                        y = c_0 + y

The code in C, Fortran 90, Java and Ada 95 is:

/* peval.c Horners method for evaluating a polynomial */
double peval(int n, double x, double c[])
{ /* an nth order polynomial has n+1 coefficients */
/* stored in c[0] through c[n]                  */
/* y = c[0] + c[1]*x + c[2]*x^2 +...+ c[n]*x^n  */
int i;
double y = c[n]*x;
if(n<=0) return c[0];
for(i=n-1; i>0; i--) y = (c[i]+y)*x;
return y+c[0];
} /* end peval */

! peval.f90 Horners method for evaluating a polynomial
function peval(n, x, c) result (y)
! an nth order polynomial has n+1 coefficients
! stored in c(0) through c(n)
! y = c(0) + c(1)*x + c(2)*x^2 +...+ c(n)*x^n
implicit none
integer, intent(in) :: n
double precision, intent(in) :: x
double precision, dimension(0:n), intent(in) :: c
double precision :: y
integer i

if(n<=0) y=c(0); return
y = c(n)*x
do i=n-1, 1, -1
y = (c(i)+y)*x
end do
y = y+c(0)
end function peval

// peval.java Horners method for evaluating a polynomial
double peval(int n, double x, double c[])
{ // an nth order polynomial has n+1 coefficients
// stored in c[0] through c[n]
// y = c[0] + c[1]*x + c[2]*x^2 +...+ c[n]*x^n

double y = c[n]*x;
if(n<=0) return c[0];
for(int i=n-1; i>0; i--) y = (c[i]+y)*x;
return y+c[0];
} // end peval

-- peval.adb Horners method for evaluating a polynomial
function peval(n : integer; x : long_float; c : vector) return long_float is
-- an nth order polynomial has n+1 coefficients
-- stored in c(0) through c(n)
-- y := c(0) + c(1)*x + c(2)*x^2 +...+ c(n)*x^n
y : long_float;
begin
if n<=0 then return c(0); end if;
y := c(n)*x;
for i in reverse 1..n-1 loop  -- do i:=n-1, 1, -1
y := (c(i)+y)*x;
end loop;
y := y+c(0);
return y;
end peval;

MatLab
y = polyval(c, x)

Python
y = polyval(c, x)

Testing high degree polynomials

Although Horners method is fast and accurate on most polynomials,
the following test programs in C, Fortran 90, Java and Ada95
show that evaluating polynomials of order 9 and 10 can have
significant absolute error.

The test programs generate a set of roots r_0, r_1, ...
and computer the coefficients of a set of polynomials

y = (x-r_0)*(x-r_1)*...*(x-r_n-1)  becomes

y = c_0 + c_1*x + c_2*x^2 +...+ c_n-1*x^n-1 + 1.0*x^n

when x is any of the roots, y should be zero.
Study one of the  .out  files and see how the absolute
error increases with polynomial degree and increases with
the magnitude of the root.
(The source code was run on various types of computers
and various operating systems. Results vary.)

peval.c
peval_c.out

peval.f90
peval_f90.out

peval.java
peval_java.out

Having the coefficients of a polynomial allows creating
the derivative and integral of the polynomial.

Derivative of Polynomial
Given:
y = c_0 + c_1 x + c_2 x^2 +...+ c_n-1 x^n-1 + c_n x^n

dy/dx = c_1 + 2 c_2 x + 3 c_3 x^2 +...+ n-1 c_n-1 x^n-2 + n c_n x^n-1

The coefficients of dy/dx may be called cd and computed:

for(i=0; i<n; i++) cd[i] = (double)(i+1)*c[i+1];
nd = n-1;

The derivative may be computed an any point, example x=a, using

dy_dx_at_a = peval(nd, a, cd);

Integral of Polynomial
Similarly, the integral y(x) dx has coefficients

Given:
y = c_0 + c_1 x + c_2 x^2 +...+ c_n-1 x^n-1 + c_n x^n

int_y_dx = 0 + c_0 x + c_1/2 x^2 +...+ c_n-1/n x^n + c_n/(n+1) x^n+1

The coefficients of int_y_dy may be called ci and computed:

ci[0] = 0.0; /* a reasonable choice */
ni = n+1;
for(i=1; i<=ni; i++) ci[i] = c[i-1]/(double)i;

The integral of the original polynomial y from a to b is computed:

int_y_a_b = peval(ni, b, ci) - peval(ni, a, ci);

Polynomial Arithmetic
The sum, difference, product and quotient of two polynomials p and q are:
(unused locations filled with zeros)

nsum = max(np,nq);
for(i=0; i<=nsum; i++) sum[i] = p[i] + q[i];

ndifference = max(np,nq);
for(i=0; i<=ndifference; i++) difference[i] = p[i] - q[i];

nproduct = np+nq;
for(i=0; i<=nproduct; i++) product[i] = 0.0;
for(i=0; i<=np; i++)
for(j=0; j<=nq; j++) product[i+j] += p[i]*q[j];

/* np > nd */
nquotient = np-nd;
nremainder = np
k = np;
for(j=0; j<=np; j++) r[j] = p[j]; /* initial remainder */
for(i=nquotient; i>=0; i--)
{
quotient[i] = r[k]/d[nd]
for(j=nd; j>=0; j--) r[k-nd+j] = r[k-nd+j] - quotient[i]*d[j]
k--;
}

Polynomial series

Taylor series, for any differentiable function, f(x)

(x-a) f'(a)   (x-a)^2 f''(a)   (x-a)^3 f'''(a)
f(x) = f(a) + ----------- + -------------- + --------------- + ...
1!            2!               3!

MacLaurin series, a=0

x f'(0)   x^2 f''(0)   x^3 f'''(0)
f(x) = f(0) + ------- + ---------- + ----------- + ...
1!        2!           3!

Taylor series, offset

h f'(x)   h^2 f''(x)   h^3 f'''(x)
f(x+h) = f(x) + ------- + ---------- + ----------- + ...
1!        2!           3!

Example f(x) = e^x, thus f'(x) = e^x  f''(x) = e^x  f'''(x) = e^x
f(0) = 1         f'(0) = 1    f''(0) = 1    f'''(0) = 1
substituting in the MacLaurin series

x    x^2   x^3   x^4
f(x) = 1 + -- + --- + --- + --- + ...
1!   2!    3!    4!

Example f(x) = sin(x), f'(x) = cos(x) f''(x) = -sin(x) f'''(x) = -cos(x)
f(0) = 0       f'(0) = 1      f''(0) = 0       f'''(0) = -1
substituting in the MacLaurin series

x    x^3   x^5   x^7
f(x) = -- - --- + --- - --- + ...
1!   3!    5!    7!

Example f(x) = cos(x), f'(x) = -sin(x) f''(x) = -cos(x) f'''(x) = sin(x)
f(0) = 1       f'(0) = 0       f''(0) = -1      f'''(0) = 0
substituting in the MacLaurin series

x^2   x^4   x^6
f(x) = 1 - --- + --- - --- + ...
2!    4!    6!

Many MacLaurin series expansions are shown on Mathworld

Taylor and MacLaurin series can be expanded for two and more variables.
Two dimensional Taylor series expansion is here

Orthogonal Polynomials

For use in integration and fitting data by a polynomial,
orthogonal polynomials are used. The definition of orthogonal
is based on having a set of polynomials from degree 0 to
degree n that are denoted by p_0(x), p_1(x),...,p_n-1(x), p_n(n).
A weighting function is allowed, w(x), that depends on x but
not on the degree of the polynomial. The set of polynomials
is defined as orthogonal over the interval x=a to x=b when
the following two conditions are satisfied:
/ 0 if i != j
integral from x=a to x=b of w(x) p_i(x) p_j(x) dx = |
\ not 0 if i=j

sum from i=1 to i=n c_i *p_i(x) = 0 for all x only when all c_i are zero.

Legendre Polynomials, Chebyshev Polynomials, Laguerre Polynomials and
Lagrange Polynomials are covered below.

Legendre Polynomials P_n(x) are defined as

P_0(x) = 1
P_1(x) = x
P_2(x) = 1/2 (3 x^2 - 1)
P_3(x) = 1/2 (5 x^3 - 3 x)
P_4(x) = 1/8 (35 x^4 - 30 x^2 + 3)

the general recursion formula for P_n(x) is

P_n(x) = 2n-1/n x P_n-1(x) - n-1/n P_n-2(x)

The weight function is  w(x) = 1
The interval is  a = -1  to  b = 1

Explicit expression

P_n(x) = 1/2^n sum m=0 to n/2
(-1)^m (2n-2m)! / [(n-m)! m! (n-2m)!]  x^(n-2m)

P_n(x) = (-1)^n/(2^n n!) d^n/dx^n ((1-x^2)^n)

Legendre polynomials are best known for use in Gaussian
integration.

integrate f(x) for x= -1 to 1

For n point integration, determine w_i and x_i i=1,n
where x_i is the i th root of P_n(x) and
w_i = (x_i)

integral x=-1 to 1 of f(x) = sum i=1 to n of w_i f(x_i)

Change interval from t in [a, b] to x in [-1, 1]
x = (b+a+(b-a)t)/2

see example test program gauleg.c
and results gauleg_c.out

Chebyshev polynomials T_n(x) are defined as

T_0(x) = 1
T_1(x) = x
T_2(x) = 2 x^2 - 1
T_3(x) = 4 x^3 - 3 x
T_4(x) = 8 x^4 - 8 x^2 + 1

the general recursion formula for T_n(x) is

T_n(x) = 2 x T_n-1(x) - T_n-2(x)

The weight function, w(x) = 1/sqrt(1-x^2)
The interval is a = -1 to b = 1

Explicit expression

T_n(x) = n/2 sum m=0 to n/2 (-1)^m (n-m-1)!/(m! (n-2m)!) (2x)^(n-2m)

T_n(x) = cos(n cos^(-1) x)

Chebyshev polynomials are best known for approximating a function
while minimizing the maximum error, covered in a latter lecture.

Fit f(x) with approximate function F(x)

c_i = 2/pi integral x=-1 to 1 f(x) T_i(x) /sqrt(1-x^2) dx
(difficult near x=-1  and  x=1)

F(x) = c_0 /2 + sum i=1 to n  c_i T_i(x)

Change interval from t in [a, b] to x in [-1, 1]
x = (b+a+(b-a)t)/2

see example test program chebyshev.c
and results chebyshev_c.out

Laguerre polynomials  L_n(x) are defined as

L_0(x) = 1
L_1(x) = -x + 1
L_2(x) = x^2 - 4 x + 2
L_3(x) = -x^3 +9 x^2 - 18 x + 6

the general recursion formula for L_n(x) is

L_n(x) = (2n-x-1) L_n-1(x) - (n-1)^2 L_n-2(x)

The weight function, w(x) = e^(-x)
The interval is a = 0 to b = infinity

Explicit expression

L_n(x) = sum m=0 to n  (-1)^m n!/(m! m! (n-m)!)  x^m

L_n(x) = 1/(n! e^-x) d^n/dx^n (x^n e^(-x))

Laguerre polynomials are best known for use in integrating
functions where the upper limit is infinity.

Lagrange Polynomials
These will be used later in Galerkin Finite Element Method
for solving partial differential equations.

Lagrange polynomials  L_n(x) are defined as

L_1,0(x) = 1 - x
L_1,1(x) =     x

L_2,0(x) = 1 -3 x +2 x^2
L_2,1(x) =    4 x -4 x^2
L_2,2(x) =    - x +2 x^2

L_3,0(x) = 1 -5.5 x    +9 x^2   -4.5 x^3
L_3,1(x) =      9 x -22.5 x^2  +13.5 x^3
L_3,2(x) =   -4.5 x   +18 x^2  -13.5 x^3
L_3,3(x) =        x  -4.5 x^2   +4.5 x^3

L_4,1(x) = 16 x -69.33 x^2 + 96 x^3 - 42.33 x^4

L_5,2(x) = -25 x +222.92 x^2 -614.58 x^3 +677.08 x^4 -260.42 x^5

For 0 ≤ x ≤ 1 and equal spaced points in between.

The general recursion formula for L_n+1,j(x) is

unavailable

Explicit expression

L_n,k(x) = product i=0 to n, i/=k of (x-x_i)/(x_k-x_i)

Lagrange polynomials are best known for solving differential
equations with equal and unequal grid spacing.
An nth order Lagrange polynomial will exactly fit n data points.

Fit f(x) with approximate function F(x)

F(x) = sum k=0 to n of f(x_k) L_n,k(x)

Change of coordinates for integration

Change interval from t in [a, b] to x in [0, 1]
x = (t-a)/(b-a)

Change interval from t in [a, b] to x in [-1, 1]
x = 2(t-a)/(b-a)-1

Example plots

see example test program lagrange.c
and results lagrange_c.out

A family of Lagrange Polynomials can be constructed to be
1.0 at x_i and zero at every x_j where i ≠ j
Using the notation above, np=11 and each color is a different k.
Each of the 11 points has one color at 1.0 and all other
colors at 0.0 .

Lagrange Phi shape functions

First derivative of Lagrange Phi shape functions



### Lecture 6, Curve Fitting


A function may be given as an analytic expression
such as sqrt(exp(x)-1.0) or may be given as a set
of points (x_i, y_i).

There are occasions when an efficient and convenient computer
implementation is needed. One of the efficient and convenient
implementations is a polynomial.

Thanks to Mr. Taylor and Mr. Maclaurin we can convert any
continuously differentiable function to a polynomial:

Taylor series, given differentiable function, f(x)

(x-a) f'(a)   (x-a)^2 f''(a)   (x-a)^3 f'''(a)
f(x) = f(a) + ----------- + -------------- + --------------- + ...
1!            2!               3!

Maclaurin series, a=0

x f'(0)   x^2 f''(0)   x^3 f'''(0)
f(x) = f(0) + ------- + ---------- + ----------- + ...
1!        2!           3!

Taylor series, offset

h f'(x)   h^2 f''(x)   h^3 f'''(x)
f(x+h) = f(x) + ------- + ---------- + ----------- + ...
1!        2!           3!

Please use analytic differentiation rather than numerical differentiation.
Programs such as Maple have Taylor Series generation as a primitive.

An example Taylor series is: e^x = 1 + x + x^2/2! + x^3/3! + x^4/4!
Using a fixed number of terms, fourth power in this example,
will result in truncation error. The series has been truncated.
It should be obvious, that for large x, the error will become
very large. Also, this type of series will fail or be very
inaccurate if there are discontinuities in the function being fit.

We often estimate "truncation" error as the next order term
that is not used.
Note the relation of "estimated truncation error" to maximum error
and rms error as more terms are used in the approximation.

TaylorFit.java
TaylorFit_java.out

It is interesting to note that: The truncation error is usually
slightly less than the maximum error, thus a reasonable estimate
of the accuracy of the fit.

Unequally spaced points often use least square fit

For functions given as unequally spaced points, use
the least square fit technique in Lecture 4

Fitting discontinuous data, try Fourier Series

For function with discontinuities the Fourier Series or
Fejer Series may produce the required fit.

The Fourier series approximation f(t) to f(x) is defined as:

f(t) = a_0/2 + sum n=1..N a_n cos(n t) + b_n sin(n t)

a_n = 1/Pi integral -Pi to Pi  f(x) cos(n x) dx

b_n = 1/Pi integral -Pi to Pi  f(x) sin(n x) dx

When given an analytic function, f(x) it may be best to use analytic
evaluation of the integrals. When given just points it may be best
to not use Fourier series, use Lagrange fit.

FourierFit.java
FourierFit.out

Smoothing discontinuous data with Fejer Series

The Fejer series approximation f(t) to f(x) is defined as:

f(t) = a_0/2 + sum n=1..N a_n (N-n+1)/N cos(n t) + b_n (N-n+1)/N sin(n t)

a_n = 1/Pi integral -Pi to Pi  f(x) cos(n x) dx

b_n = 1/Pi integral -Pi to Pi  f(x) sin(n x) dx

Basically the Fourier Series with the contribution of the higher
frequencies decreased. This may give a smoother fit.

FejerFit.java
FejerFit.out

Lagrange Fit minimizes error at chosen points

The Lagrange Fit minimizes the error at the chosen points to fit.
The Lagrange Fit is good for fitting data given at uniform spacing.
The Lagrange fit requires the fewest evaluations of the function
to be fit, convenient if the function to be fit requires
significant computation time.

The Lagrange series approximation f(t) to f(x) is defined as:

L_n(x) = sum j=0..N f(x_j) L_n,j(x)

L_n,j(x) = product i=0..N  i /= j  (x - x_i)/(x_j - x_i)

Collect coefficients, a_n,  of L_n(x) to get

f(t) = sum i=0..N  a_n t^n

LagrangeFit.java
LagrangeFit.out

Legendre Fit minimizes RMS error

The Legendre Fit, similar to the Least Square Fit, minimizes
the RMS error of the fit.

The Legendre series approximation f(t) to f(x) is defined as:

f(t) = a_0 g_0 + sum n=1..N a_n g_n P_n(t)  then combining coefficients can be

f(t) = sum n=0..n b_n t^n                   a simple polynomial

a_n = integral -1 to 1  f(x) P_n(x)  dx

g_n = (2 n + 1)/2

P_0(x) = 1
P_1(x) = x
P_n(x) = (2n-1)/n x P_n-1(x) - (n-1)/n P_n-2(x)

Suppose f(x) is defined over the interval a to b, rather than -1 to 1, then

a_n = (b-a)/2 integral -1 to 1  f(a+b+x(b-a)/2) P_n(x)  dx

LegendreFit.java
LegendreFit.out

Chebyshev Fit minimizes maximum error

The Chebyshev Fit minimizes to maximum error of the fit for
a given order polynomial.

The Chebyshev series approximation f(t) to f(x) is defined as:

f(t) = a_0/2 + sum n=1..N a_n T_n(t)  then combining coefficients can be

f(t) = sum n=0..n b_n t^n             a simple polynomial

a_n = 2/Pi integral -1 to 1  f(x) T_n(x)/sqrt(1-x^2)  dx

T_0(x) = 1
T_1(x) = x
T_n+1(x) = 2 x T_n(x) - T_n-1(x)

for -1 < x < 1  T_n(x) = cos(n acos(x))

When given an analytic function it may be best to use analytic
evaluation of the integrals. When given just points it may be best
to not use Chebyshev fit, use Lagrange fit. When given a
computer implementation of the function, f(x), to be fit,
use a very good adaptive integration.

ChebyshevFit.java
ChebyshevFit.out

Source code and text output for the various fits:

LagrangeFit.java
LagrangeFit.out

LegendreFit.java
LegendreFit.out

FourierFit.java
FourierFit.out

FejerFit.java
FejerFit.out

ChebyshevFit.java
ChebyshevFit.out

You may convert any of these that you need to a language

Interactive Demonstration
Examples of interactive fitting of points may run:
java -cp . LeastSquareFitFrame
java -cp . LagrangeFitFrame
java -cp . SplineFrame

Lagrange.java
TestLagrange.java
TestLagrange.out
LagrangeFitFrame.java
LagrangeHelp.txt
LagrangeAlgorithm.txt
LagrangeIntegrate.txt
LagrangeEvaluate.txt

LeastSquareFit.java
LeastSquareFitFrame.java
LeastSquareFitHelp.txt
LeastSquareFitAlgorithm.txt
LeastSquareFitIntegrate.txt
LeastSquareFitEvaluate.txt

Spline.java
SplineFrame.java
SplineHelp.txt
SplineAlgorithm.txt
SplineIntegrate.txt
SplineEvaluate.txt



### Lecture 7, Integration


Numerical Integration is usually called Numerical Quadrature.

Numerical integration could be simple summation, but
in order to get reasonable step size and good accuracy
we need better techniques.

integral f(x) dx from x=a to x=b with step size h=(b-a)/n
using simple summation would be computed:

area = (b-a)/n * ( sum i=0..n-1  f(a+i*h) )  not using f(b)
or
area = (b-a)/(n+1) * ( sum i=0..n  f(a+i*h) )

"h" has become the approximation to "dx" using "n" steps.
A larger value of "n" gives a smaller value of "h" and
up to where roundoff error grows, a better approximation.
Shown is n=9 indicated by red dots.

Trapezoidal method
The Trapezoidal rule approximates the area between x and x+h
as  h * (f(x)+f(x+h))/2 , base times average height.
Summing the areas we note that f(a) and f(b) are used once
while the other intermediate f(x) are used twice, thus:

area = (b-a)/n * ( (f(a)+f(b))/2 + sum i=1..n-1  f(a+i*h) )

note: i=0 is f(a)  i=n is f(b) thus sum index range 1..n-1
h = (b-a)/n
cutting h in half generally cuts the error in half for large n
Shown is n=8 regions based on 7 sums plus (f(a)+f(b))/2

Python Trapezoidal Integration
output is:
test_passing_function.py running
area under x*x from  1.0  to  2.0  =  2.335

Better accuracy
In order to get better accuracy with fewer evaluations of
the function, f(x), we have found a way to choose the values
of x and to apply a weight, w(x) to each evaluation of f(x).
The integral is evaluated using:

area = sum i=1..n  w(i)*f(x(i))

The w(i) and x(i) are computed using the Legendre polynomials
covered in the previous lecture. The numerical analysis shows
that using n evaluations we obtain accuracy about equal to
fitting the f(x(i)) with an nth order polynomial and accurately
computing the integral using that nth order polynomial.

Some values of weights w(i) and ordinates x(i)  -1 < x < 1 are:

x[1]=  0.0000000000000E+00, w[1]=  2.0000000000000E+00

x[1]= -5.7735026918963E-01, w[1]=  1.0000000000000E-00
x[2]=  5.7735026918963E-01, w[2]=  1.0000000000000E-00

x[1]= -7.7459666924148E-01, w[1]=  5.5555555555555E-01
x[2]=  0.0000000000000E+00, w[2]=  8.8888888888889E-01
x[3]=  7.7459666924148E-01, w[3]=  5.5555555555555E-01

x[1]= -8.6113631159405E-01, w[1]=  3.4785484513745E-01
x[2]= -3.3998104358486E-01, w[2]=  6.5214515486255E-01
x[3]=  3.3998104358486E-01, w[3]=  6.5214515486255E-01
x[4]=  8.6113631159405E-01, w[4]=  3.4785484513745E-01

x[1]= -9.0617984593866E-01, w[1]=  2.3692688505618E-01
x[2]= -5.3846931010568E-01, w[2]=  4.7862867049937E-01
x[3]=  0.0000000000000E+00, w[3]=  5.6888888888889E-01
x[4]=  5.3846931010568E-01, w[4]=  4.7862867049937E-01
x[5]=  9.0617984593866E-01, w[5]=  2.3692688505618E-01

x[1]= -9.3246951420315E-01, w[1]=  1.7132449237916E-01
x[2]= -6.6120938646626E-01, w[2]=  3.6076157304814E-01
x[3]= -2.3861918608320E-01, w[3]=  4.6791393457269E-01
x[4]=  2.3861918608320E-01, w[4]=  4.6791393457269E-01
x[5]=  6.6120938646626E-01, w[5]=  3.6076157304814E-01
x[6]=  9.3246951420315E-01, w[6]=  1.7132449237916E-01

For a range from a to b, new_x[i] = a+(x[i]+1)*(b-a)/2

and  gauleg.java

Using the function  gaulegf  a typical integration could be:

double x[9], w[9]     for 8 points
a = 0.5;  integrate sin(x) from a to b
b = 1.0;
n = 8;
gaulegf(a, b, x, w, n);   x's adjusted for a and b
area = 0.0;
for(j=1; j<=n; j++)
area = area + w[j]*sin(x[j]);

The following programs, gauleg for Gauss Legendre integration,
computes the x(i) and w(i) for various values of n. The
integration is tested on a constant f(x)=1.0 and then on
integral sin(x) from x=0.5 to x=1.0
integral exp(x) from x=0.5 to x=5.0
integral ((x^x)^x)*(x*(log(x)+1)+x*log(x)) from x=0.5 to x=5.0
This is labeled "mess" in output files.

Note how the accuracy increases with increasing values of n,
then, no further accuracy is accomplished with larger n.

Also note, the n where best numerical accuracy is achieved
is a far smaller n than where roundoff error would be
significant.

Choose your favorite language and study the .out file
then look over the source code.

Just testing gaulegf:

gauleg.py  function is gaulegf
test_gauleg.py
test_gauleg_py.out

gaulegf.java
test_gauleg.java
test_gauleg_java.out

gaulegf.h
gaulegf.c
test_gaulegf.c
test_gaulegf_c.out

Look near end of  .out  file, note convergence to exact value at end.
integral sin(x) from x=0.5 to x=1.0  for order 2 through 10
integral exp(x) from x=0.5 to x=5.0  for order 2 through 10
integral ((x^x)^x)*(x*(log(x)+1)+x*log(x)) from x=0.5 to x=5.0
This is labeled "mess" in output files. order 2 through 30

gauleg.c
gauleg_c.out

gauleg.f90
gauleg_f90.out

gauleg.for
gauleg_for.out

gauleg.java
gauleg_java.out

gaulegf.m
gauleg.m
gauleg_m.out

On Linux, you may use  octave  rather than  MatLab.

Higher dimension, more independent variables
Multidimensional integration extends easily by using

gaulegf(ax, bx, xx, wx, nx);   x's adjusted for ax and bx
gaulegf(ay, by, yy, wy, ny);   y's adjusted for ay and by
volume = 0.0;
for(i=1; i<=nx; i++)
for(j=1; j<=ny; j++)
volume = volume + wx[i]*wy[j]*f(xx[i],yy[j]);

gaulegf(ax, bx, xx, wx, nx);   x's adjusted for ax and bx
gaulegf(ay, by, yy, wy, ny);   y's adjusted for ay and by
gaulegf(az, bz, yz, wz, nz);   z's adjusted for az and bz
volume4d = 0.0;
for(i=1; i<=nx; i++)
for(j=1; j<=ny; j++)
for(k=1; k<=nz; k++)
volume4d = volume4d + wx[i]*wy[j]*wz[k]*f(xx[i],yy[j],zz[k]);

Volume code as shown in:

int2d.c
int2d_c.out

"Handbook of Mathematical Functions" by Abramowitz and Stegun
A classic reference book on numerical integration and differentiation.

"Numerical Recipes in Fortran" by Press, Teukolsky, Vetterling and Flannery
There is also a "Numerical recipes in C" yet in my copy, there have
been a number of errors in the C code due to poor translation from Fortran.
Many very good numerical codes for general purpose and special purposes.

Integration of functions over triangles
An impressive example of quadrature over triangles is shown
by using Gauss Legendre integration coordinates rather than
splitting a triangle into smaller congruent triangles.
The routine "tri_split" converts a single triangle into four
congruent triangles that exactly cover the initial triangle.
Linear quadrature then uses just the center point of the
split, four equal area, triangles.
test_triquad.out results, see last two sets

Reducing the step size improves integration accuracy.
In two or more dimensions, reducing area, volume, etc.
improves integration accuracy. Two sets of triangle
splitting are shown below. Using higher order integration
provides more integration accuracy than smaller step size,
smaller area, volume, etc.

From tri_split.java
and  test_tri_split.java

These will appear later in the solution of
partial differential equations using the Finite Element Method.

determine if a point is inside a polygon
point_in_poly.c
point_in_poly_c.out
point_in_poly.java
point_in_poly_java.out

Digitizing curves into computer
Scan the curve you want to digitize, sample curves listed below.
Compile Digitize.java
Run  java Digitize your.jpg > your.out
Edit your.out to suit your needs. Output is scaled to your first 3 points.
(0,0) (1,0) (0,1)
pi.gif
curve.jpg
c6thrust.jpg
chess2.png
Digitize.out some output from curve.jpg

HW3 is assigned


### Lecture 8, Integration

When all else fails, there is adaptive numerical quadrature.
This is cleaver because the integration adjusts to the function
being integrated. And, yes, it can have problems.

The basic principle is to divide up the integration into
intervals. Use two methods of integration in each interval.
If the two methods do not agree in some interval, divide that
interval in half, and repeat. The total integral is the sum
of the integrals of all the intervals.

Consider integrating the function y = 1/x from 0.01 to 2.0,
with a high point at y=100 with a very steep slope and
a low relatively flat area from y=1.0 to y=0.5.

Note that the analytic solution is ln(2.0)-ln(0.01).
Do not try integration from 0 to 2, the integral is infinite.

The integral from 0.1   to 1    of 1/x = 2.302585093
The integral from 0.01  to 0.1  of 1/x = 2.302585093
The integral from 0.001 to 0.01 of 1/x = 2.302585093
Of course, ln(1)-ln(1/10) = 0 + ln(10) = 2.302585093
Thus, as we integrate closer and closer to zero, each factor
of 10 only adds 2.302585093 to the integral.

/*         for a desired accuracy on irregular functions.   */
/* define the function to be integrated as:                 */
/*  double f(double x)                                      */
/*  {                                                       */
/*    // compute value                                      */
/*    return value;                                         */
/*  }                                                       */
/*                                                          */
/* integrate from xmin to xmax                              */
/* approximate desired absolute error in all intervals, eps */
/* accuracy absolutely not guaranteed                       */

double aquad(double f(double x), double xmin, double xmax, double eps)
{
double area, temp, part, h;
double err;
int nmax = 2000;
double sbin[2000]; /* start of bin */
double ebin[2000]; /* end of bin */
double abin[2000]; /* area of bin , sum of these is area */
int    fbin[2000]; /* flag, 1 for this bin finished */
int i, j, k, n, nn, done;
int kmax = 20;     /* maximum number of times to divide a bin */

n=32; /* initial number of bins */
h = (xmax-xmin)/(double)n;
for(i=0; i<n; i++)
{
sbin[i] = xmin+i*h;
ebin[i] = xmin+(i+1)*h;
fbin[i] = 0;
}
k = 0;
done = 0;
nn = n; /* next available bin */
while(!done)
{
done = 1;
k++;
if(k>=kmax) break; /* quit if more than kmax subdivisions */
area = 0.0;
for(i=0; i<n; i++)
{
if(fbin[i]==1) /* this interval finished */
{
area = area + abin[i]; /* accumulate total area each pass */
continue;
}
temp = f((sbin[i]+ebin[i])/2.0); /* two integration methods */
part = f((3.0*sbin[i]+ebin[i])/4.0);
part = part + f((sbin[i]+3.0*ebin[i])/4.0);
abin[i] = (part+2.0*temp)*(ebin[i]-sbin[i])/4.0;
area = area + abin[i];
err = ((temp-part/2.0)<0.0?(-(temp-part/2.0)):(temp-part/2.0));
if(err*1.414 < eps) /* heuristic  */
{
fbin[i] = 1; /* this interval finished */
}
else
{
done = 0; /* must keep dividing */
if(nn>=nmax) /* out of space, quit */
{
done = 1;
for(j=i+1; j<n; j++) area = area + abin[j];
break; /* result not correct */
}
else /* divide interval into two halves */
{
sbin[nn] = (sbin[i]+ebin[i])/2.0;
ebin[nn] = ebin[i];
fbin[nn] = 0;
ebin[i] = sbin[nn];
nn++;
}
}
} /* end for i */
n = nn;
} /* end while */
return area;

The results of integrating 1/x for various xmin to xmax are:

75 intervals, 354 funeval, 6 divides, small=0.101855, maxerr=0.000209298
xmin=0.1, xmax=2, area=2.99538, exact=2.99573, err=-0.000347413

261 intervals, 1470 funeval, 11 divides, small=0.0100607, maxerr=0.000228422
xmin=0.01, xmax=2, area=5.29793, exact=5.29832, err=-0.000390239

847 intervals, 4986 funeval, 16 divides, small=0.00100191, maxerr=0.000226498
xmin=0.001, xmax=2, area=7.6005, exact=7.6009, err=-0.000399734

2000 intervals, 9810 funeval, 18 divides, small=0.000100238, maxerr=0.0141083
xmin=0.0001, xmax=2, area=9.78679, exact=9.90349, err=-0.116702

2000 intervals, 9444 funeval, 17 divides, small=1.04768e-05, maxerr=49.455
xmin=1e-05, xmax=2, area=11.2703, exact=12.2061, err=-0.935768

1.0/((x-0.3)*(x-0.3)+0.01) + 1.0/((x-0.9)*(x-0.9)+0.04) -6
387 intervals, 2226 funeval, 6 divides, small=0.00390625, maxerr=0.000595042
xmin=0, xmax=1, area=29.8582, exact=29.8583, err=-0.000113173

Notice how the method failed with xmin=0.0001, maxed out on storage.
Yes, a better data structure would be a tree or linked list.
It can be made recursive yet that may not be a good idea.
(Programs die from stack overflow!)

On one case the adaptive quadrature used the bins shown in the figure:

On some bigger problems, I had run starting with n = 8;
This missed the area where the slope was large.
Just like using the steepest descent method for finding roots,
you may hit a local flat area and miss the part that should

The above was to demonstrate the "bins" and used way too much
memory. The textbook has pseudo code on Page 301 that uses
much less storage. A modified version of that code is:

/* aquad3.c from book page 301, with modifications */

static double stack[100][7];   /* a place to store/retrieve */
static int top, maxtop; /* top points to where stored */

void store(double s0, double s1, double s2, double s3, double s4,
double s5, double s6)
{
stack[top][0] = s0;
stack[top][1] = s1;
stack[top][2] = s2;
stack[top][3] = s3;
stack[top][4] = s4;
stack[top][5] = s5;
stack[top][6] = s6;
}

void retrieve(double* s0, double* s1, double* s2, double* s3, double* s4,
double* s5, double* s6)
{
*s0 = stack[top][0];
*s1 = stack[top][1];
*s2 = stack[top][2];
*s3 = stack[top][3];
*s4 = stack[top][4];
*s5 = stack[top][5];
*s6 = stack[top][6];
} /* end retrieve */

double Sn(double F0, double F1, double F2, double h)
{
return h*(F0 + 4.0*F1 + F2)/3.0; /* error term 2/90 h^3 f^(3)(c) */
} /* end Sn */

double RS(double F0, double F1, double F2, double F3,
double F4, double h)
{
return h*(14.0*F0 +64.0*F1 + 24.0*F2 + 64.0*F3 + 14.0*F4)/45.0;
/* error term  8/945  h^7 f^(8)(c) */
} /* end RS */

double aquad3(double f(double x), double xmin, double xmax, double eps)
{
double a, b, c, d, e;
double Fa, Fb, Fc, Fd, Fe;
double h1, h2, hmin;
double Sab, Sac, Scb, S2ab;
double tol; /* eps */
double val, value;

maxtop = 99;
top = 0;
value = 0.0;
tol = eps;
a = xmin;
b = xmax;
h1 = (b-a)/2.0;
c = a + h1;
Fa = f(a);
Fc = f(c);
Fb = f(b);
Sab = Sn(Fa, Fc, Fb, h1);
store(a, Fa, Fc, Fb, h1, tol, Sab);
top = 1;
while(top > 0)
{
top--;
retrieve(&a, &Fa, &Fc, &Fb, &h1, &tol, &Sab);
c = a + h1;
b = a + 2.0*h1;
h2 = h1/2;
d = a + h2;
e = a + 3.0*h2;
Fd = f(d);
Fe = f(e);
Sac = Sn(Fa, Fd, Fc, h2);
Scb = Sn(Fc, Fe, Fb, h2);
S2ab = Sac + Scb;
if(((S2ab-Sab)<0.0?(-(S2ab-Sab)):(S2ab-Sab)) < tol)
{
val = RS(Fa, Fd, Fc, Fe, Fb, h2);
value = value + val;
}
else
{
h1 = h2;
tol = tol/2.0;
store(a, Fa, Fd, Fc, h1, tol, Sac);
top++;
store(c, Fc, Fe, Fb, h1, tol, Scb);
top++;
}
if(top>=maxtop) break;
} /* end while */
return value;
} /* end main of aquad3.c */

The same test cases, using aquad3t.c, gave the following result:

xmin=0.1, xmax=2, area=2.99573, exact=2.99573, err=9.25606e-07

xmin=0.01, xmax=2, area=5.29832, exact=5.29832, err=1.50725e-06

xmin=0.001, xmax=2, area=7.6009, exact=7.6009, err=1.65548e-06

xmin=0.0001, xmax=2, area=9.90349, exact=9.90349, err=1.66753e-06

xmin=1e-05, xmax=2, area=12.2061, exact=12.2061, err=1.66972e-06

1.0/((x-0.3)*(x-0.3)+0.01) + 1.0/((x-0.9)*(x-0.9)+0.04) -6
xmin=0, xmax=1, area=29.8583, exact=29.8583, err=-1.6204e-06

For case h, area = h(F0 + F1)/2
error = 1/12 h f''(c)

For case 2h, area = 2h(F0 + 4F1 + F2)/6
error = 2/90 h^3 f''''(c)

For case 3h, area = 3h(F0 + 3F1 + 3F2 + F3)/8
error = ? h^5 f'^(6)(c)

For case 4h, area = 4h(14F0 + 64F1 + 24F2 + 64F3 + 14F4)/180
error = 8/945 h^7 f'^(8)(c)  eighth derivitive, c is largest value

Then, the MatLab version:

% test_aquad.m  1/x  eps = 0.001
eps=0.001;
xmin = 0.1;
xmax = 2.0;
e = log(xmax)-log(xmin);
fprintf(fid,'xmin=%g, mmax=%g, area=%g, exact=%g, err=%g \n', xmin, xmax, e, q, e-q);

xmin = 0.01;
xmax = 2.0;
e = log(xmax)-log(xmin);
fprintf(fid,'xmin=%g, mmax=%g, area=%g, exact=%g, err=%g \n', xmin, xmax, e, q, e-q);

xmin = 0.001;
xmax = 2.0;
e = log(xmax)-log(xmin);
fprintf(fid,'xmin=%g, mmax=%g, area=%g, exact=%g, err=%g \n', xmin, xmax, e, q, e-q);

xmin = 0.0001;
xmax = 2.0;
e = log(xmax)-log(xmin);
fprintf(fid,'xmin=%g, mmax=%g, area=%g, exact=%g, err=%g \n', xmin, xmax, e, q, e-q);

xmin = 0.000001;
xmax = 2.0;
e = log(xmax)-log(xmin);
fprintf(fid,'xmin=%g, mmax=%g, area=%g, exact=%g, err=%g \n', xmin, xmax, e, q, e-q);

fprintf(fid,'1.0/((x-0.3)*(x-0.3)+0.01) + 1.0/((x-0.9).*(x-0.9)+0.04) -6\n');
xmin = 0.0;
xmax = 1.0;
e = 29.8583;
fprintf(fid,'xmin=%g, mmax=%g, area=%g, exact=%g, err=%g \n', xmin, xmax, e, q, e-q);
return;

function y=f1(x)
y=1.0 ./ ((x-0.3) .*(x-0.3)+0.01) + 1.0 ./ ((x-0.9).*(x-0.9)+0.04) -6.0;
return
end

function y=f(x)
y = 1.0 ./ x;
return
end
end

with results:

xmin=0.1, mmax=2, area=2.99573, exact=2.99597, err=-0.000241595
xmin=0.01, mmax=2, area=5.29832, exact=5.29899, err=-0.000668946
xmin=0.001, mmax=2, area=7.6009, exact=7.60218, err=-0.0012822
xmin=0.0001, mmax=2, area=9.90349, exact=9.90415, err=-0.000660689
xmin=1e-06, mmax=2, area=14.5087, exact=14.5101, err=-0.00148357
1.0/((x-0.3)*(x-0.3)+0.01) + 1.0/((x-0.9).*(x-0.9)+0.04) -6
xmin=0, mmax=1, area=29.8583, exact=29.8583, err=-8.17859e-06

The last case used the curve
y = 1.0/((x-0.3)*(x-0.3)+0.01) + 1.0/((x-0.9)*(x-0.9)+0.04) -6
that looks like:

Passing a function as an argument in Python is easy:
xmin = 1.0
xmax = 2.0
eps  = 0.001

def f(x):
return x*x

print "aquad3(f, xmin, xmax, eps) =", area

With Java, some extra is needed to pass a user function to
a pre written integration class.

You need an interface to the function:
lib_funct.java

The user then provides an implementation to the function
to pass to the integration method and may provide values:
pass_funct.java

The pre written integration, trapezoidal method example,
has the numeric code, yet does not know the function (yet):
main_funct.java
main_funct.out

Another example with a two parameter function is:

You need an interface to the function:
RKlib.java

The user then provides an implementation to the function
to pass to the integration method and may provide values:
RKuser.java

The pre written integration method example,
has the numeric code, yet does not know the function (yet):
RKmain.java
RKmain.out

The files are:

aquad.c teaching version, do not use

lib_funct.java
pass_funct.java
main_funct.java

test_passing_function.py
test_passing_function_py.out



### Lecture 9, Review 1


Go over WEB pages Lecture 1 through 8.
Work the homework 1 and homework 2.

Open book, open notes, exam.
No computers, some do not have laptops.
No study guides or copies of study guides.
Follow the instructions, else lose points.
No programming required.
Some code may appear as part of a question.
One hour time limit. Exam will be after lectures.

Example questions and answers will be presented in class.
Some things you should know for the exam:

IEEE float has 6 or 7 decimal digits precision

IEEE double has 15 or 16 decimal digits precision

Adding or subtracting numbers of large differences in magnitude
causes precision lose.

RMS error is root mean square error, a reasonable intuitive measure

Maximum error may be the most important measure for some applications

Average error is not very useful

The machine "epsilon" for a specific floating point arithmetic is
the smallest positive machine number added to exactly 1.0 that
results in a sum that is not exactly 1.0 .

Be careful reading programs that use "epsilon" or "eps". Sometimes
it may be the "machine epsilon" and other times it may be a
tolerance on how close the something should be.

Most languages include the elementary functions sin and cos, yet many
do not include a complete set of reciprocal functions such as cosecant
or inverse or hyperbolic functions such as inverse hyperbolic cotangent.

If given only a natural logarithm function, log2(x) and be computed
as  log(x)/log(2.0) and log10(x) as log(x)/log(10.0)

Homework 1 used a simple approximation for integration:
position s = integral for time=0 to time=t of velocity(t) dt
as s = sum i=1 to n  s_i = s_i-1 + dv_i-1 * dt  where n*dt=t

In order to guarantee a mathematical unique solution to a set of
linear simultaneous equations, two requirements are needed:
There must be the same number of equations as unknown variables and
the equations must be linearly independent.

For two equations to be linearly independent, there must not be
two constants p and q such that equation1 * p + q = equation2

The numerical solution of liner simultaneous equations can fail even
though the mathematical condition for a unique solution exists.

The Gauss Jordan method of solving simultaneous equations A x = y produces
the solution x by performing operations on A y, reducing A to the
identity matrix such that y is replaced by x at the end of the computation

Improved numerical accuracy is obtained in the solution of linear
systems of equations by interchanging rows such that the largest
magnitude diagonal is used as the pivot element.
Some very large systems of equations can be solved accurately.

The system of linear equations is solved by the same method when
the equations have complex values.

It is better to directly solve simultaneous equations rather than
invert the "A" matrix and multiply by the "y" vector to find "x".
There are small matrices that are very hard to invert accurately.
Matrix inversion uses very similar computation to direct solving
of simultaneous equations. The matrix times vector, A * y, introduces
more error.

Given data, a least square fit of the data minimizes the RMS error
for a given degree polynomial at the data points. Between the
data points there may be large variations.

When trying to fit large degree polynomials, there may be numerical
errors such that the approximation becomes worse.

A polynomial of degree n, will exactly fit n+1 points.
It may have extreme humps and valleys between the points.

Mathematically, a least square fit of n data points should be
exactly fit by a n-1 degree polynomial. The numerical computation
may not give this result.

A least square fit may use terms: powers, sine, cosine or any other
smooth function of the data. The basic requirement is that
all functions must be linearly independent of each other.

A least square fit uses a solution of simultaneous equations

Least square fit is the easiest method of fitting data that
is given with unequal spacing.

A polynomial of degree n has n+1 coefficients. Thus n+1 data
points can be exactly fit by an n degree polynomial.

A polynomial of degree n has exactly n roots (possibly with multiplicity)

Given roots r1, r2, ... rn a polynomial with these roots is created by
(x-r1)*(x-r2)* ... *(x-rn)

Finding a root, r, divide the polynomial by (x-r) to reduce one degree.

Newton's method x_next = x - P(x)/P'(x) will have quadratic
convergence except near multiple roots or derivative P'(x)
small. An error of 1/16 will reduce to an error of 1/64
on the next iteration.

Horner's method of evaluating polynomials provides accuracy and
efficiency by never directly computing high powers of the variable.

Given the numerical coefficients of a polynomial, the numerical
coefficients of the integral, derivative are easily computed.

Given the numerical coefficients of two polynomials, the
sum, difference, product and ratio are easily computed.

Any functions that can be continuously differentiated can be
approximated by a Taylor series expansion. It is called
"truncation error" when evaluating a Taylor series with a
specific number of terms.

Orthogonal polynomials are used to fit data and perform numerical
integration. Examples of orthogonal polynomials include:
Legendre, Chebyshev, Laguerre, Lagrange, Fourier.

Chebyshev polynomials are used to approximate smooth functions
while minimizing the maximum error.

Legendre polynomials are used to approximate smooth functions
while minimizing RMS error.

Lagrange polynomials are used to approximate smooth functions
while exactly fitting a specific set of points.

For non smooth functions, including square waves and pulses,
a Fourier series approximation may be used.

The Fejer approximation can be used to eliminate many oscillations
in the Fourier approximation at the expense of a less accurate fit.

Numerical integration is typically called numerical quadrature.

The Trapezoidal integration method requires a small step size
and many function evaluations to get accuracy.
(My step size is uniform and the method is easy to code)

In general a smaller the step size, usually called "h", will
result in a more accurate value for the integral. This
implies that more evaluations of the function to be
integrated are needed.

The Gauss Legendre integration of smooth functions provides
good accuracy with a minimum number of function evaluations.
(The weights and ordinates are needed for the summation)

get reasonable accuracy of functions with large derivatives
or large variation in derivatives.

two methods of integration to determine when the desired
accuracy is achieved. This method, as with all integration
methods, can fail to give accurate results.

Two dimensional and higher dimensions can use simple extensions

The difference between a Taylor Series and a Maclaurin Series
is that the Maclaurin Series always expands a function about
zero.

The two equations for Standard Deviation are mathematically
equivalent, yet may give different numerical results.
sigma = sqrt((sum(x^2)- sum(x)^2/n)/n) = sqrt(sum((x-mean)^2)/n)
For n samples of x with a mean of mean.

RMS error is computed from a set of n error measurements using
rms_error = sqrt(sum(err^2)/n)
Note that this is the same value as Standard Deviation when
the mean value of the errors is zero. Thus, considered a
reasonable intuitive measure of error.



### Lecture 10, Quiz 1


Open book, open note quiz.
No study guides or copies of study guides.
One hour time limit, exam follows lectures.

The exam covers:
lectures 1 through 9
homework 1 and 2

Summer class quiz may be after lecture 11 or 12,
yet does not cover lecture 11 and 12.



### Lecture 11, Complex Arithmetic

There may be times when you have to do numerical computation
on complex values (scalars, vectors, or matrices).

If you are programming in Fortran, no problem, the types
complex and  complex*16 or double complex  are built in.
In Ada 95, the full set of complex arithmetic and functions
come with the compiler as packages. MatLab and Python use
complex as needed, automatically (e.g. output of FFT).

In other programming languages you need to know how to do
complex computation and how to choose the appropriate
numerical method.

Background:
A complex number is stored in the computer as an ordered pair
of floating point numbers, the real part and the imaginary part.
The floating point numbers may be single or double precision as
needed by the application. It is suggested that you use double
precision as the default.

These are called Cartesian coordinates. Polar coordinates are
seldom used for computation, yet, are usually made available by
a conversion function to magnitude and angle. Note that
numerical analyst call the angle by the name "argument".

The naming convention depends on the programming language and
personal (or customer) choice.

Basic complex arithmetic is covered first and then complex
functions: sin, cos, tan, sinh, cosh, tanh, exp, log, sqrt, pow
are covered in the next lecture.

The simplest need for complex numbers is solving for the roots
of the polynomial equation  x^2 + 1 = 0 .
There must be exactly two roots and they are sqrt(-1) and
-sqrt(-1) that are named  "i"  and  "-i".

The quadratic equation for finding roots of a second order
polynomial should use the complex sqrt, even for real coefficients
a, b, and c, because the roots may be complex.

given:  a x^2 + b x + c = 0  find the roots

b +/- sqrt(b^2 - 4 a c)
solution  x = -----------------------
2 a

that computes complex roots if  4 a c > b^2
Of course, the equation correctly computes the roots when
a, b, and c  are complex numbers.

The basic complex arithmetic (including some functions for
the next lecture) are in Complex.java
The automatically generated documentation is Complex.html

The complex arithmetic and functions in C uses "cx" named functions
as shown in the header file complex.h
and the body complex.c
with a test program test_complex.c
with results test_complex_c.out

provides complex arithmetic. Note the many operator definitions.
The use of complex in Ada is shown in this small program:
Complex functions are provided by generic_complex_elementary_functions.ads

The use of complex in Fortran 95 is shown in this small program:
complx.f90

C++ has the STL class Complex and can be used as shown
test_complex.cpp

Cartesian Coordinates
Complex numbers define a plane and are typically Cartesian coordinates.
Polar coordinates also define a plane in terms of radius, r and angle θ.
x = r * cos(θ)       r = sqrt(x*x+y*y)
y = r * sin(θ)       θ = arctan(y/x)  or atan2

Other coordinate systems are:
Cylindrical Coordinates
Cylindrical coordinates in terms of radius r, angle θ and height z.
x = r * cos(θ)       r = sqrt(x*x+y*y)
y = r * sin(θ)       θ = arctan(y/x)  or atan2
z = z                z = z

Spherical Coordinates
Spherical coordinates in terms of radius r, angles θ and φ
x = r * sin(φ) * cos(θ)     r = sqrt(x*x+y*y+z*z)
y = r * sin(φ) * sin(θ)     θ = arctan(y/x)  or atan2
z = r * cos(φ)              φ = arctan(sqrt(x*x+y*y)/z)

Toroidal Coordinates 1
The five independent variables are a, σ, θ, φ and z0

denom = cosh(θ)-cos(σ)
x = a * sinh(θ) * cos(φ) / denom
y = a * sinh(θ) * sin(φ) / denom
z = a * sin(σ)             / denom               optional + z0
-π < σ < π   θ > 0    0 < φ < 2π   a > 0

φ = arctan(y/x)
temporaries r1 = sqrt(x^2 + y^2)
d1 = sqrt((r1+a)^2 + z^2)
d2 = sqrt((r1-a)^2 + z^2)
θ = ln(d1/d2)
σ = arccos((d1^2+d2^2-4*a^2)/(2*d1*d2))

Toroidal Coordinates 2
The five independent variables are r1, r2, θ, φ, and z0

x = (r1 + r2 * sin(φ)) * cos(θ)
y = (r1 + r2 * sin(φ)) * sin(θ)
z = r2 * cos(φ)                        Optional + z0
0 < θ < 2π    0 < φ < 2π  r1 > 0   r2 > 0

θ = arctan(y/x)
φ = arccos(z/r2)
r1 = x/cos(θ) - r2*sin(φ)  or
r1 = y/sin(θ) - r2*sin(φ) no divide by zero

A simple implementation in C is demonstrated in
coordinate.c
coordinate.out

Beware of your choice of angle ranges when converting the above

Cartesian                                   Cylindrical                             Spherical

For Toroidal Coordinates 1:
toroidal_coord.c
toroidal_coord_c.out
toro.dat
toro.sh
toro.plot

For Toroidal Coordinates 2:
toro2r.c
toro2r_c.out
toro2r.dat
toro2r.sh
toro2r.plot

Then, for later, differential operators in three coordinate systems
del and other operators



### Lecture 12, complex functions

Some complex functions may be computed accurately from the basic
definition. But, preserving relative accuracy over the complete
domain of a few complex functions requires special techniques.

Note that the domain and range of complex functions may not be
obvious to the average user.

First, look at the mappings from domain z1 to range z2 for some
complex functions:
With Java applets enabled
(If this does not work, go to my home page and display from there.)

Various identities for elementary functions in the complex plane,
including implementing the complex function using only real functions.

Let   z = x + i y
then  arg z = arctan y/x       real function using signs of x and y
modulus z = sqrt(x*x+y*y)   = |z|  length of complex vector
re z = x
im z = y

SQRT
i (arg z)/2
sqrt z = sqrt(|z|) e

thus yielding half the angle with magnitude sqrt(|z|)

modulus z + re z             modulus z - re z
sqrt z = sqrt ----------------  +/- i sqrt ----------------
2                            2
where the sign of the imaginary part of the result is
the sign of the imaginary part of z

2    3
z   z    z
sqrt 1+z = 1 + - - -- + -- - ...
2   8    16

sqrt( 0 + i0) = 0 + i0
sqrt(-0 + i0) = 0 + i0
sqrt( 0 - i0) = 0 - i0
sqrt(-0 - i0) = 0 - i0
sqrt(z)**2 = z
sqrt(z*z) = z    re z > 0
sqrt(1/z) = 1/sqrt(z)   not for -0 or negative real axis
conjugate(sqrt(z)) = sqrt(conjugate(z))  not for -0 or negative real axis
Branch cut:
The branch cut lies along the negative real axis.
Domain:
Mathematically unbounded
Range:
Right half plane including the imaginary axis. The algebraic sign
of the real part is positive.
The sign of the imaginary part of the result is the same as the
sign of im z.

LOG
log z = ln( modulus z) + i argument z

2    3
z    z
log 1+z = z - -- + -- - ...       for |z| < 1
2    3

log(exp(z)) = z      im z in [-Pi,Pi]
log(1/z) = -log(z)   not on negative real axis
conjugate(log(z)) = log(conjugate(z))  not on negative real axis
Branch cut:
The branch cut lies along the negative real axis.
Domain:
Modulus z /= 0.0
Range:
Real part mathematically unbounded, imaginary part in range - Pi to Pi
The sign of the imaginary part of the result is the same as the
sign of im z.

EXP
z    re z                 re z
e  = e     cos (im z) + i e     sin (im z)

i x
e    = cos(x) + i sin(x)

2    3
z           z    z
e  = 1 + z + -- + -- + ...
2!   3!

z    z+i 2Pi
e  = e                   periodic with period  i 2Pi

exp(-z) = 1/exp(z)       not on negative real axis
exp(log(z)) = z          |z| non zero
conjugate(exp(z)) = exp(conjugate(z))
Branch cut:
None
Domain:
re z < ln 'LARGE
Range:
Mathematically unbounded
For modulus z = 0.0, the result is 1.0 + z

"**"
w    (w * ln z)
z  = e
2           3
w                (w ln z)    (w ln z)
z  = 1 + w ln z + --------- + --------- + ...  for |z| non zero
2!          3!
w
log(z ) = w * log(z)

SIN
sin z = sin(re z) cosh(im z) + i cos(re z) sinh(im z)

iz    -iz
e   - e
sin z = -----------
2i

3    5
z    z
sin z = z - -- + -- - ...
3!   5!

sin z = cos(z - Pi/2)            periodic with period Pi
sin(z) = sin(z+2Pi)
sin z = -i sinh iz
sin(arcsin(z)) = z
sin(z) = -sin(-z)
conjugate(sin(z)) = sin(conjugate(z))
Domain:
|im z| < ln 'LARGE

COS
cos z = cos(re z) cosh(im z) - i sin(re z) sinh(im z)

iz    -iz
e   + e
cos z = -----------
2

2    4
z    z
cos z = 1 - -- + -- - ...
2!   4!

cos z = sin(z + Pi/2)            periodic with period Pi
cos(z) = cos(z+2Pi)
cos(z) = cos(-z)
cos z = cosh iz
cos(arccos(z)) = z
conjugate(cos(z)) = cos(conjugate(z))
Domain:
|im z| < ln 'LARGE

TAN
tan z = sin z / cos z

iz    -iz
e   - e
tan z = -i ----------          limit = -i when im z > ln 'LARGE
iz    -iz          limit =  i when im z < - ln 'LARGE
e   + e

3     5      7
z    2z    17z
tan z = z + -- + --- + ---- + ...              for |z| < 1
3    15    315

sin x  cos x          sinh y  cosh y
tan z = ----------------   +i  ----------------
2         2            2         2
cos  x + sinh  y       cos  x + sinh  y

tan z = cot(Pi/2 - z)            periodic with period Pi
tan z = tan(z+2Pi)
tan z = 1/cot z
tan(z) = -tan(-z)
tan z = -i tanh iz
tan(arctan(z))=z
conjugate(tan(z)) = tan(conjugate(z))
Branch cut:
None
Domain:
Mathematically unbounded
Range:
Mathematically unbounded
For modulus z = 0.0, the result is z

COT
cot z = cos z / sin z

iz    -iz
e   + e
cot z = i ----------          limit = i when im z > ln 'LARGE
iz    -iz          limit = -i when im z < -ln 'LARGE
e   - e

sin x  cos x          sinh y  cosh y
cot z = ----------------   -i  ----------------
2         2            2         2
sin  x + sinh  y       sin  x + sinh  y

3     5
1   z   z    2z
cot z = - - - - -- - --- - ...
z   3   45   945

cot z = tan(Pi/2 - z)            periodic with period Pi
cot(z) = cot(z+2Pi)
cot(z) = -cot(-z)
cot z = 1/tan z
conjugate(cot(z)) = cot(conjugate(z))
Branch cut:
None
Domain:
Mathematically unbounded
Range:
Mathematically unbounded
For modulus z = 0.0, the result is z

ARCSIN
2
arcsin z = - i ln(i z + sqrt(1-z ))

arcsin z = - i ln(i z + sqrt(1-z)*sqrt(1+z))

3     5     7
z    3z    5z
arcsin z = z + -- + --- + ---- + ...        for |z| < 1
6    40    112

1      3      15
arcsin z = -i( ln(2iz) - --- - ---- - ----- - ... )  for |z| > 1
2      4       6
4z    32z    288z

2
arcsin z = arctan(z/sqrt(1-z ))  fix re of result

arcsin z =
2              2 1/2         2              2 1/2
arcsin(1/2 (x  + 2 x + 1 + y )    - 1/2 (x  - 2 x + 1 + y )   )

+ i
2              2 1/2         2              2 1/2
csgn(-i x + y) ln(1/2 (x  + 2 x + 1 + y )    + 1/2 (x  - 2 x + 1 + y )

2              2 1/2         2              2 1/2 2     1/2
+ ((1/2 (x  + 2 x + 1 + y )    + 1/2 (x  - 2 x + 1 + y )   )  - 1)   )

note: The csgn function is used to determine in which half-plane
(left' or   right') the complex-valued expression
or number x lies.  It is defined by

/  1    if Re(x) > 0 or Re(x) = 0 and Im(x) > 0
csgn(x) = <  -1    if Re(x) < 0 or Re(x) = 0 and Im(x) < 0
\  0    if x = 0

arcsin z = pi/2 - arccos z

arcsin z = -i arcsinh iz

arcsin(sin(z)) = z
Branch cut:
The real axis not in  [ -1.0, 1.0 ]
Domain:
Mathematically unbounded
Range:
Imaginary part mathematically unbounded, real part in [ -Pi/2, Pi/2 ]
For modulus z = 0.0, the result is z

ARCCOS
1+z         1-z
arccos z = -i 2 ln( sqrt --- +i sqrt --- )
2           2
2
arccos z = -i ln(z + i sqrt(1-z ) )

3     5
Pi       z    3z
arccos z = -- - z - -- - -- - ...             for |z| < 1
2        6    40

arccos z = -i( ln(2z)                         for |z| > 1/sqrt(epsilon)

arccos z =

2              2 1/2         2              2 1/2
arccos(1/2 (x  + 2 x + 1 + y )    - 1/2 (x  - 2 x + 1 + y )   )

+ i
2              2 1/2         2              2 1/2
csgn(I x - y) ln(1/2 (x  + 2 x + 1 + y )    + 1/2 (x  - 2 x + 1 + y )

2              2 1/2         2              2 1/2 2     1/2
+ ((1/2 (x  + 2 x + 1 + y )    + 1/2 (x  - 2 x + 1 + y )   )  - 1)   )

2
arccos z = arctan(sqrt(1-z )/z)  fix re of result

arccos z = pi/2 - arcsin z

arccos(cos(z)) = z
Branch cut:
The real axis not in  [ -1.0, 1.0 ]
Domain:
Mathematically unbounded
Range:
Imaginary part mathematically unbounded, real part in  [ 0.0, Pi ]

ARCTAN
arctan z = -i ( ln(1 + i z) - ln(1 - i z) )/2

i    i+z
arctan z = - ln ---    must be fixed on slit for  iz < -1
2    i-z

3    5    7
z    z    z
arctan z = z - -- + -- - -- + ...            for |z| < 1
3    5    7

arctan z = Pi/2 - arccot z

arctan z = arccot(1/z)

arctan z = -i arctanh iz

arctan(tan(z)) = z

arctan z = 1/2 arctan(x, 1 - y) - 1/2 arctan(- x, y + 1)

2          2
x  + (y + 1)
+i      1/4 ln(-------------)
2          2
x  + (y - 1)

ARCCOT
i    z-i
arccot z = - ln ---
2    z+i
3    5
Pi       z    z
arccot z = -- - z + -- - -- + ...             for |z| < 1
2        3    5

arccot z = Pi/2 - arctan z

arccot z = arctan(1/z)

arccot(cot(z)) = z

arccot z = 1/2 Pi - 1/2 arctan(x, 1 - y) + 1/2 arctan(- x, y + 1)

2          2
x  + (y + 1)
-i 1/4 ln(-------------)
2          2
x  + (y - 1)

Range:
Imaginary part mathematically unbounded, real part in  [0.0 , Pi]

SINH
sinh z = sinh(re z) cos(im z) + i cosh(re z) sin(im z)

z    -z
e  - e
sinh z = --------
2

3    5
z    z
sinh z = z + -- + -- + ...
3!   5!

sinh z = -i cosh(z +i Pi/2)           periodic with period i Pi

sinh z = -i sin iz

COSH
cosh z = cosh(re z) cos(im z) + i sinh(re z) sin(im z)

z    -z
e  + e
cosh z = --------
2

2    4
z    z
cosh z = 1 + -- + -- + ...
2!   4!

cosh z = -i sinh(z +i Pi/2)         periodic with period  i Pi

cosh z = cos iz

TANH
tanh z = sinh z / cosh z

z    -z
e  - e
tanh z = --------
z    -z
e  + e

3     5     7
z    2z   17z
tanh z = z - -- + -- - ---- + ...             for |z| < 1
3    15   315

tanh z = -i coth(z +i Pi/2)            periodic with period  i Pi

tanh z = 1/coth z

tanh z = -i tan iz

sinh x  cosh x          sin y  cos y
tanh z = ----------------   +i  ----------------
2        2             2        2
sinh  x + cos  y       sinh  x + cos  y

COTH
coth z = cosh z / sinh z

z    -z
e  + e
coth z = --------
z    -z
e  - e
3     5
1   z   z    2z
coth z = - + - - -- + --- - ...            for |z| < 1
z   3   45   945

coth z = -i tanh(z +i Pi/2)

coth z = 1/tanh z

sinh x  cosh x          sin y  cos y
coth z = ----------------   -i  ----------------
2        2             2        2
sinh  x + sin  y       sinh  x + sin  y

ARCSINH
2
arcsinh z = ln(z + sqrt(1 + z ) )

3     5     7
z    3z    5z
arcsinh z = z - -- + --- - --- + ...           for |z| < 1
6    40    112

1     3      15
arcsinh z = ln(2z) + --- - ---- + ----- - ...   for |z| > 1
2      4       6
4z    32z    288z

arcsinh z = -i arcsin iz

arcsinh(sinh(z)) = z

arcsinh z =

2    2           1/2         2    2           1/2
csgn(x + I y) ln(1/2 (x  + y  + 2 y + 1)    + 1/2 (x  + y  - 2 y + 1)

2    2           1/2         2    2           1/2 2     1/2
+ ((1/2 (x  + y  + 2 y + 1)    + 1/2 (x  + y  - 2 y + 1)   )  - 1)   )

+ i
2    2           1/2         2    2           1/2
arcsin(1/2 (x  + y  + 2 y + 1)    - 1/2 (x  + y  - 2 y + 1)   )

ARCCOSH
z+i        z-i
arccosh z = 2 ln( sqrt --- + sqrt --- )
2          2

arccosh z = ln(z + sqrt(z-1) sqrt(z+1) )    not sqrt(z**2-1)

1     3      15
arccosh z = ln(2z) - --- - ---- - ----- - ...    for |z| > 1
2      4       6
4z    32z    288z

arccosh(cosh(z)) = z

arccosh z =
2              2 1/2
- csgn(I - I x + y) csgn(I x - y) ln(1/2 (x  + 2 x + 1 + y )

2              2 1/2
+ 1/2 (x  - 2 x + 1 + y )

2              2 1/2         2              2 1/2 2     1/2
+ ((1/2 (x  + 2 x + 1 + y )    + 1/2 (x  - 2 x + 1 + y )   )  - 1)   )

+ i
csgn(I - I x + y)

2              2 1/2         2              2 1/2
arccos(1/2 (x  + 2 x + 1 + y )    - 1/2 (x  - 2 x + 1 + y )   )

ARCTANH
arctanh z = ( ln(1+z) - ln(1-z) )/2

1    1+z
arctanh z = - ln ---       must fix up on slit for  z > 1
2    1-z

3    5    7
z    z    z
arctanh z = z + -- + -- + -- + ...              for |z| < 1
3    5    7

arctanh z = -i arctan iz

arctanh(tanh(z)) = z

arctanh z = arccoth(z) + i Pi/2

2    2
(x + 1)  + y
arctanh z =                1/4 ln(-------------)
2    2
(x - 1)  + y

+i  1/2 arctan(y, x + 1) - 1/2 arctan(- y, 1 - x)

ARCCOTH
1    z+1
arccoth z = - ln ---      must fix up on slit
2    z-1
3    5
i Pi       z    z
arccoth z = ---- + z + -- + -- + ...               for |z| < 1
2          3    5

arccoth z = arctanh(z) +i Pi/2

arccoth z = arctanh(1/z)

arccoth(coth(z)) = z

2    2
(x + 1)  + y
arccoth z =                1/4 ln(-------------)
2    2
(x - 1)  + y

+i  1/2 Pi + 1/2 arctan(y, x + 1) - 1/2 arctan(- y, 1 - x)

Range:
Real part mathematically unbounded, imaginary part in [0.0 , i Pi]

There are many complex functions provided by Ada

Fortran 90 has a few complex functions
complex_func.f90  source code
complex_func_f90.out  output

Python has many complex functions
test_complex.py  source code
test_complex_py.out  output

test_complex.hs  source code
test_complex_hs.out  output
Complex.hs  Library module

I programmed some complex functions in java.
Complex.java  source code

MatLab overloads all functions that take floating point
to also handle complex numbers.



### Lecture 13, Eigenvalues of a Complex Matrix

Eigenvalue and Eigenvector computation may be the most prolific for
special case numerical computation. Considering the size and speed
of modern computers, I use a numerical solution for a general
complex matrix. Thus, I do not have to worry about the constraints
on the matrix (is it numerically positive definite?)

This type of numerical algorithm, you do not want to develop yourself.
The technique is to find a working numerical code, test it yourself,
then adapt the code to your needs, possibly converting the code to
a different language.

Eigenvalues have application in the solution of some physics problems.
They are also used in some solutions of differential equations.
Some statistical analysis uses eigenvalues.

My interest comes from the vast types of testing that can, and should,
be performed on any code that claims to compute eigenvalues and
eigenvectors. Thorough testing of any numeric code you are going
to use is essential.

Information about eigenvalues, e (no lambda in plain ASCII)
and eigenvectors, v, for arbitrary n by n complex matrix A.

There are exactly n eigenvalues (some may have multiplicity greater than 1)

For every eigenvalue there is a corresponding eigenvector.
For eigenvalues with multiplicity greater than 1,
each has a unique eigenvector.
The set of n eigenvectors form a basis, they are all mutually orthogonal.
(The dot product of any pair of eigenvectors is zero.)

View this page in a fixed width font, else the matrices are shambles.

Vertical bars, | |, in this lecture is not absolute value,
it means vector or matrix

| 1 0 0 |
det|A-eI| = 0   defines e, where I is an n by n identity matrix | 0 1 0 |
zero except  1  on the diagonal      | 0 0 1 |

For a 3 by 3 matrix A:

| a11-e a12   a13   |
det| a21   a22-e a23   | = (a11-e)*(a22-e)*(a33-e)+a12*a23*a31+a13*a32*a21-
| a31   a32   a33-e |   a31*(a22-e)*a13-a21*a12*(a33-e)-a32*a23*(a11-e)

Writing out the above determinant gives the "Characteristic Equation"
of the matrix A.
Combining terms gives  c_n * e^n + ... + c_2 * e^2 + c_1 * e + c_0 = 0
Dividing through by c_n gives exactly n coefficients.
(Special case because of " = 0 " )

There are exactly n roots for an nth order polynomial and the
n roots of the characteristic equation are the n eigenvalues.

The relation between each eigenvalue and its corresponding eigenvector is
Av = ev  where |v| is non zero.
Typically, require the length of v to be 1.

Given a matrix A and a non singular matrix P and P inverse p^-1
B = P A P^-1   the matrix B has the same eigenvalues as matrix A.
The eigenvectors may be different.
B is a similarity transform of A.

The diagonal elements of a diagonal matrix are the eigenvalues of the matrix.
| a11  0   0   0  |
|  0  a22  0   0  |
|  0   0  a33  0  | has eigenvalues a11, a22, a33, a44 and
|  0   0   0  a44 |

corresponding eigenvectors

v1=|1  0  0  0|   v2=|0  1  0  0|  v3=|0  0  1  0|  v4=|0  0  0  1|

Notice that the eigenvectors are not necessarily unique and may be
scaled by an arbitrary, non zero, constant. Normalizing the length
of each eigenvector to 1.0 is common.

The eigenvalues of a 2 by 2 matrix are easily computed as the roots
of a second order equation.
det| a11-e  a12   |=0 or  (a11-e)*(a22-e) - a12*a21 = 0  or
| a21    a22-e |

e^2 - (a11+a22) e + (a11*a22-a12*a21) = 0
Let a=1, b=-(a11+a22), c= (a11*a22-a12*a21)
then e = (-b +/- sqrt(b^2-4*a*c)/2*a  computes the two eigenvalues.
The roots and thus the eigenvalues may be complex.

Note that a matrix with all real coefficients may have complex eigenvalues.

Computing the characteristic equation is usually not a good way
to compute eigenvalues for n greater than 4 or 5.
It becomes difficult to compute the coefficients of the characteristic
equation accurately and it is also difficult to compute the roots accurately.

Note that given a high order polynomial, a matrix can be set up from
the coefficients such that the eigenvalues of the matrix are the roots
of the polynomial.  x^4 + c_3 x^3 + c_2 x^2 + c_1 x + c_0 = 0

| -c_3  -c_2  -c_1  -c_0 |   where c_4 of the polynomial is 1.0
|   1     0     0     0  |
|   0     1     0     0  |
|   0     0     1     0  |

Thus, if you have code that can find eigenvalues, you have code
that can find roots of a polynomial. Not the most efficient method.

The maximum value of the sum of the absolute values of each row and column
is a upper bound on the absolute value of the largest eigenvalue.
This maximum value is typically called the L1 norm of the matrix.

Scaling a matrix by multiplying every element by a constant causes
every eigenvalue to be multiplied by that constant. The constant may
be less than one, thus dividing works the same.

The eigenvalues of the inverse of a matrix are the reciprocals
of the eigenvalues of the original matrix.

A matrix with all elements equal has one eigenvalue equal to
the sum of a row and all other eigenvalues equal to zero.
The eigenvectors are typically chosen as the unit vectors.

| 0 0 0 |
A = | 0 0 0 | has three eigenvalues, all equal to zero
| 0 0 0 |

| 1 1 1 |     | 2 2 2 2 |
| 1 1 1 |     | 2 2 2 2 |  1 has eigenvalues 0, 0, 3
| 1 1 1 |     | 2 2 2 2 |  2 has eigenvalues 0, 0, 0, 8
| 2 2 2 2 |  in general n-1 zeros and a row sum

Each row that increases the singularity of a matrix, increases
the multiplicity of some eigenvalue.

The trace of a matrix is the sum of the diagonal elements.
The trace of a matrix is equal to the sum of the eigenvalues.

In order to keep the same eigenvalues, interchanging two rows
of a matrix, then requires interchanging the corresponding two columns.
The eigenvectors will probably be different.

Testing code that claims to compute eigenvalues
Testing a program that claims to compute eigenvalues and eigenvectors
is interesting because there are many possible tests. All should be
used.

Given A is an n by n complex matrix (that may have all real elements),
using IEEE 64-bit floating point and good algorithms:

1) Evaluate the determinant  det|A-eI| for each eigenvalue, e.
The result should be near zero. 10^-9 or smaller can be expected
when A is small and eigenvalues are about the same magnitude.

2) Evaluate each eigenvalue with its corresponding eigenvector.
Av-ev should be a vector with all elements near zero.
Typically check the magnitude of the largest element.
10^-9 or smaller can be expected when A is small.

3) Compute the dot product of every pair of eigenvectors
and check for near zero.

4) Compute the trace of A and subtract the sum of the eigenvalues.
The result should be near zero. The trace of a matrix is the
sum of the diagonal elements of the matrix.

5) Compute the maximum of the sum of the absolute values of each
row and column of A. Check that the absolute value of every
eigenvalue is less that or equal this maximum.

Generating test matrices to be used for testing.
1) Try matrices with n=1,2,3 first.
All zero matrix, all eigenvalues zero and eigenvectors should
be the unit basis vectors. If the length of the eigenvectors
is not 1.0, then you have to normalize them.

2) Try diagonal matrices with n=1,2,3,4
Typically put 1, 2, 3, 4 on the diagonal to make it easy to
check the values of the computed eigenvalues.

3) Generate a random n by n matrix, P,  with real and imaginary values.
Compute P inverse, P^-1.
Compute matrix B = P A P^-1  for the A matrices in 2)
The eigenvalues of B should be the same as the eigenvalues of A,
yet the eigenvectors may be different.

4) Randomly interchange some rows and corresponding columns of B.
The eigenvalues should be the same yet the eigenvectors
may be different.

5) Choose a set of values, typically complex values e1, e2, ..., en.
Compute the polynomial that has those roots
(x-e1)*(x-e2)*...*(x-en) and convert to the form
x^n + c_n-1 x^n-1 + ... c_2 x^2 + c_1 x + c_0
Create the matrix n by n with the first row being negative c's
and the subdiagonal being 1's.

| -c_n-1  ... -c_2  -c_1  -c_0 |
|   1           0     0     0  |
...
|   0           0     0     0  |
|   0           1     0     0  |
|   0           0     1     0  |

The eigenvalues should be e1, e2, ..., en
Then do a similarity transform with a random matrix
and check that the same eigenvalues are computed.

Yes, this is a lot of testing, yet once coded, it can be used
for that "newer and better" version the "used software salesman"
is trying to sell you.

Sample code in several languages
Now, look at some code that computes eigenvalues and eigenvectors
and the associated test code.

First, MatLab

% eigen.m  demonstrate eigenvalues in MatLab
format compact
e1=1; % desired eigenvalues
e2=2;
e3=3;
e4=4;
P=poly([e1 e2 e3 e4]) % make polynomial from roots
r=roots(P) % check that roots come back
A=zeros(4);
A(1,1)=-P(2); % build matrix A
A(1,2)=-P(3);
A(1,3)=-P(4);
A(1,4)=-P(5);
A(2,1)=1;
A(3,2)=1;
A(4,3)=1
[v e]=eig(A) % compute eigenvectors and eigenvalues
I=eye(4);    % identity matrix
for i=1:4
z1=det(A-I.*e(i,i)) % should be near zero
end
for i=1:4
z2=A*v(:,i)-e(i,i)*v(:,i) % note columns, near zero
end
z3=trace(A)-trace(e) % should be near zero

%  annotated output
%
%P =
%     1   -10    35   -50    24
%r =
%    4.0000         these should be eigenvalues
%    3.0000
%    2.0000
%    1.0000
%A =
%    10   -35    50   -24   polynomial coefficients
%     1     0     0     0
%     0     1     0     0
%     0     0     1     0
%v =
%    0.9683    0.9429    0.8677    0.5000  eigenvectors
%    0.2421    0.3143    0.4339    0.5000  are
%    0.0605    0.1048    0.2169    0.5000  columns
%    0.0151    0.0349    0.1085    0.5000
%e =
%    4.0000         0         0         0  eigenvalues
%         0    3.0000         0         0  as diagonal
%         0         0    2.0000         0  of the matrix
%         0         0         0    1.0000
%z1 =
%   1.1280e-13   i=1 first eigenvalue
%z1 =
%   5.0626e-14
%z1 =
%   1.3101e-14
%z1 =
%   2.6645e-15
%z2 =
%   1.0e-14 *    note multiplier  i=1
%       -0.4441
%       -0.1110
%       -0.0333
%       -0.0035
%z2 =
%   1.0e-14 *
%       -0.4441
%       -0.1554
%       -0.0444
%       -0.0042
%z2 =
%   1.0e-14 *
%       -0.7994
%       -0.2776
%       -0.0833
%       -0.0028
%z2 =
%   1.0e-13 *
%       -0.3103
%       -0.0600
%       -0.0122
%       -0.0033
%z3 =
%  -7.1054e-15     trace check

Now very similar code in MatLab using complex eigenvalues.
A similarity transform is applied and scaling is applied.
One eigenvalue check is now accurate to about 10^-7.

eigen2.m
eigen2_m.out

Using complex similarity transform:

eigen3.m
eigen3_m.out

Note that the MatLab help on "eig" says they use the
LAPACK routines. The next lecture covers some LAPACK.

A Fortran program to compute eigenvalues and eigenvectors from
TOMS, ACM Transactions on Mathematical Software, algorithm 535:

535.for
535.dat
535.out
535_roots.out
535_d.out
535_double.for double precision
535.dat
535_double_f.out
535_2.dat
535_2.out
535b.dat
535b.out
535_double.f90 just ! for C
535.rand
535_rand_f90.out
535.ranA
535_ranA_f90.out

A Java program to compute eigenvalues and eigenvectors is:

Eigen2.java
TestEigen2.java
TestEigen2.out

An Ada program to compute eigenvalues and eigenvectors is:

A rather limited use eigenvalue computation method is the
power method. It works sometimes and may require hundreds
of iterations. The following code shows that is can work
for finding the largest eigenvalue of a small real matrix.

eigen_power.c
eigen_power_c.out

The general eigen problem is A*v = e*B*v
Checks are also det|V|=0  det|(B^-1*A)-eI|=0
Each eigenvalue e has an eigenvector v as a column of V.
A rather quick and dirty translation of 535_double.for to Ada:

In aircraft design, there are stability questions that can be
answered using eigenvalue computations. In Matlab, you can
draw objects, such as this crude airplane, as well as doing
numerical computation.

drawn by plane.m
wing_2d.m
rotx_.m
roty_.m
rotz_.m

OK, long lecture, extra topic:
Generating random numbers, valuable for some testing.

First we generate uniformly distributed random numbers in
the range 0.0 to 1.0. A uniform distribution should have
roughly the same number of values in each range, bin.

Gaussian or Normal distributions should have roughly
a bell shaped curve, based on a given mean and sigma.
Sigma is the standard deviation. Sigma squared is the variance.

udrnrt.h three generators
udrnrt.c three generators code
plot_udrnrt.c test_program
plot_udrnrt1.out udrnrt output and plot

2.0*udrnrt()-1.0 output, -1.0 to 1.0

plot_udrnrt2.out gauss output and plot

plot_udrnrt3.out gauss output and plot

plot_udrnrt4.out sine wave output and plot

test_rand.py uniform and gauss plot

1,000,000 uniformly distributed samples

1,000,000 Gaussian distributed samples, mean=0.0, sigma=3.0



### Lecture 14, LAPACK

Creating a numerical algorithm can take years.
Finding and adapting a numerical algorithm is practical.

LAPACK is a good starting place to find high quality numerical code.
Yes, it is in Fortran 77 and yes much of the code is a very mature
30 years old. The good news is that the code produces correct
numeric results with known accuracy.
see www.netlib.org/lapack

ACM Transactions on Mathematical Software, TOMS is another
source. see www.netlib.org/toms

LAPACK includes prior LINPACK and EISPACK and uses the BLAS,
Basic Linear Algebra Subroutines, for low level operations.

Almost every LAPACK routine is available in four types:
Single precision floating point  prefix "s"
Double precision floating point  prefix "d"
Complex single precision         prefix "c"
Complex double precision         prefix "z"

LAPACK is available locally, may be out of date, and on Internet:

LAPACK FAQ
naming conventions
single
double
complex
complex16

LAPACK Users Guide

The Fortran source code for the Linear Algebra Package, LAPACK
are under the LAPACK directory with subdirectories
SRC  BLAS  TESTING  TIMING  INSTALL
And, for g95 users, the Fortran95 interface lapack95.tgz

Or, on Debian or Ubuntu  sudo apt-get install gfortran
and compile and link LAPACK and all my examples.

On CSEE Help WEB pages:  (www.csee.umbc.edu/help/fortran)

LAPACK installation guide (postscript)
LAPACK quick reference (PostScript)
Raw LAPACK directory use lapack.a and blas.a on Linux on Intel
Raw LAPACK/SRC directory
Raw LAPACK/BLAS directory
Raw LAPACK/TESTING directory
Raw LAPACK/TIMING directory
Raw LAPACK/INSTALL directory
lapcak95.tgz

g95 for Linux, tar -xvzf g95-x86-linux.tgz
Self installing executable,g95 for MS Windows

www.csee.umbc.edu/help/fortran

For Java users:
Java Numerics WEB Site
includes an early version of LAPACK translated to Java plus many other
numeric routines.

For Ada users there is an interface binding

There is a learning curve to using LAPACK.
I suggest finding a routine you need.
Copy the comments from the front of that routine into your program.
Create the necessary declarations needed for calling the routine.
Create a Makefile with the compilation and library commands.
You will need lapack.a and blas.a or equivalent.

An example use on our CSEE system is
Makefile_LAPACK
test_eigen2.f90
test_eigen2_f90.out
This is really old code that still works and uses
invrse.for
eigdet.for
matmul.for
evalvec.for
a02ftext.for



### Lecture 15, Multiple precision


When 64-bit floating point is not accurate enough
When 64-bit integers are way too small

"C" has gmp, gnu multiple precision.

Java has a bignum package.
Java has apfloat.jar

Ada has arbitrary decimal digit precision floating point.

Fortran has a multiple precision library.

Python has arbitrary precision integer arithmetic
Python has mpmath and other packages

Hmmm? Multiple precision must be important and have a use.

Computing Pi to a million places is a demonstration, but there
are more reasonable uses.

Types of multiple precision:
Unlimited length integers
Unlimited length fractions  1/integer
Unlimited length rationals  integer/integer
Unlimited length floating point
Arbitrary yet fixed number of digits floating point

for "C" get gmp, GNU Multiple Precision!
gmp.guide

Here are a few simple gmp samples

test_mpf.c
test_mpf.out
test_mpq.c
test_mpq.out
test_mpz.c
test_mpz.out
gmp fact.c
fact_gmp.out

Java BigDecimal that is BigInteger with a scale factor
Big_pi.java test program
Big_pi_java.out test results
test_apfloat.java test program
test_apfloat.out test results
mytest_apfloat.java test program
mytest_apfloat.out test results
Big_math.java various functions
Big_simeq.java simultaneous equations
Big_inverse.java invert matrix

Fortran 95 module that implements big integers
big_integers_module.f90
test_big.f90  test program
test_big_f90.out  test results

Ada 95 precise, rational and digits arithmetic

Python simple factorial(52) is
factorial.py  program
factorial_py.out  output
Python simple 2^200 integer, yet floating point is IEEE 64 bit
power2.py  program
power2_py.out  output
test_mpmath.py  program
test_mpmath_py.out  output

A quick conversion of simeq.c to mpf_simeq.c solves simultaneous
equations with 50 digits, could be many more digits using gmp mpf_t.

Using the very difficult to get accurate answers matrix:

test_mpf_simeq.c
mpf_simeq.c
mpf_simeq.h
test_mpf_simeq.out
test_mpf_simeq_300.out

Can irrational numbers be combined to produce an integer?

It appears that  e^(Pi*sqrt(163)) is the integer 262,537,412,640,768,744
and that is rather amazing to me.

How might this be validated? It has been tried using higher and
higher precision and the value computes to about
262,537,412,640,768,743.999,999,999,999,999,999,999

We know 1.5 base 10 can be represented as  1.1 base 2 exactly.
We know 1/3 can not be represented exactly in a finite number of
decimal digits or a finite number of binary digits
1/3 = 0.3333333333333333333333333333333  decimal approximately
1/3 = 0.0101010101010101010101010101010  binary approximately
And, for what it is worth
We know 1/3 can be represented exactly as  0.1 base 3.

A quick "C" program gives the first 14 digits, using IEEE 64-bit floating point

/* irrational.c  e^(Pi*sqrt(163)) is an integer?  */
#include <math.h>
#include <stdio.h>

#define e  2.7182818284590452353602874713526624977572
#define Pi 3.1415926535897932384626433832795028841971

int main(int argc, char * argv[])
{
printf("e^(Pi*sqrt(163)) is an integer? \n");
printf("  262537412640768744.0000000  check \n");
printf("%28.7f  using 15 digit arithmetic \n",pow(e,Pi*sqrt(163.0)));
return 0;
}
With output:
e^(Pi*sqrt(163)) is an integer?
262537412640768744.0000000  check
262537412640767712.0000000  using 15 digit arithmetic
262537412640768743.99999999999925007259719818568887935385633733699 using 300 digit

What is needed is to show convergence from above and below,
and it would be nice if the convergence was uniform. This
could use 50, 100, 150, 200 digit precision for the computation.
Pick a number of digits. Increment the bottom digit of e, Pi and 163
then do the computation. Decrement the bottom digit of e, Pi and 163
then do the computation. Check if the upper and lower values of
the fraction are converging toward zero. Check if the convergence
is uniform, balanced, for the upper and lower values.

This is left as an exercise to the student.

Computing Pi to arbitrary precision

One method of computing Pi is to compute 4*atan(1) or
24*atan(b2)  b2=(2*sqrt(b1)-1)/b1  b1=2-sqrt(3)
get_pi.c
get_pi.out
get_pi.c using double and math.h atan
pi=24*atan(b2) b2=0.131652, b1=0.267949
Pi= 3.1415926535897932384626433832795028841971
get_pi= 3.14159265358980333
pi-Pi= 0.00000000000001021

get_mpf_pi.c
get_mpf_pi.out
get_mpf_pi.c using digits=50
b1=    0.26794919243112270647255365849412763305719474618962
b2=    0.13165249758739585347152645740971710359281410222324
mpf_pi= 3.1415926535897932384626433832795028841971693999457
Pi= 3.1415926535897932384626433832795028841971

Now you can do Homework 4

Using gmp to solve higher power methods

For reference in later lectures on PDE, when "double" is
just not accurate enough to allow higher power methods:

Solving a relatively simple partial differential equation

du/dx + du/dy = 0

using a uniform grid on rectangle 0 <= X <= 2Pi, 0 <= Y <= 2Pi
with boundary values (and analytic solution)

u(x,y) = sin(x-y)

Due to the symmetry of the problem, the number of boundary
points in X and Y can not be the same. If nx = ny then
the solution of simultaneous equations becomes unsolvable
because of a singular matrix.

Solutions above nx or ny equal 13 cause unstable values
for the derivative coefficients with standard nderiv
computation. Actually, integer overflow occurs first.

By using gmp with high precision integer, mpz, and
high precision floating point, mpf, then high
accuracy solutions can be computed.

Setting "digits" at 200, "bits" becomes 200*3.32 = 664.
This is used for computing boundary values and the
numeric checking against the analytic solution.

The results are:
nx  ny  maxerror
13  12  1.40e-4     same as nx=12, ny=13, they can not be equal
15  14  4.38e-6
17  16  1.02e-7
19  18  1.84e-9
21  20  2.70e-11
23  22  3.23e-13
25  24  3.23e-15
31  30  1.25e-21
33  32  6.86e-24
35  34  3.33e-26
37  36  1.44e-28

37  36  8.59e-18  0  to  4Pi
37  36  3.40e-7   0  to  8Pi

The basic "C" code with lots of printout is:
source code pde2sin_eq.c
output pde2sin_eq_c.out
source code pde2sin_sparse.c
output pde2sin_sparse_c.out

The same programs, converting "double" to gmp "mpf_t"
and calling gmp versions of functions is:
source code pde2sin_eq_gmp.c
output pde2sin_eq_gmp.out
source code pde2sin_sparse_gmp.c
output pde2sin_sparse_gmp.out
gmp function mpf_deriv.h
gmp function mpf_deriv.c
gmp function mpf_simeq.h
gmp function mpf_simeq.c
gmp function mpf_sparse.h
gmp function mpf_sparse.c

Then, using non uniform derivative (on same uniform grid):
source code pde2sin_nueq_gmp.c
output pde2sin_nueq_gmp.out
source code pde2sin8_nueq_gmp.c
output pde2sin8_nueq_gmp.out
source code pde4sin_nueqsp.c
output pde4sin_nueqsp.out
gmp function mpf_nuderiv.h
gmp function mpf_nuderiv.c
gmp function mpf_inverse.h
gmp function mpf_inverse.c
gmp function mpf_simeq.h
gmp function mpf_simeq.c
gmp function mpf_sparse.h
gmp function mpf_sparse.c



### Lecture 16, Finding Roots and Nonlinear Equations


Finding Roots

First, a special case, the "roots of unity"
x  +1=0  root is x=-1
x^2+1=0  roots are x=i  x=-i  i=sqrt(-1)
x^3+1=0  roots are x=-1 x=1/2 +i sqrt(3)/2 x=1/2 -i sqrt(3)/2
x^4+1=0  roots are x=+sqrt(2)/2 +i sqrt(2)/2
(all four combinations of +/-)

Note that all roots are on the unit circle in the complex plane.

Consider the Maple root finding for x^12+1=0 :

root12g.html

In general the roots of a polynomial are complex numbers.
A general purpose root finder for polynomials is difficult to
develop. Fortunately, some very smart people have written the
function, cpoly, that can take a general polynomial with complex
coefficients and find the complex roots. It also works with
real coefficients and real roots.

MatLab root finding for polynomials is just r = roots(p)
Where p is the vector of polynomial coefficients and
r is the vector of roots. Both r and p may be complex.

Finding roots of a polynomial seems easy:

given  c_n x^n + c_n-1 x^n-1 + ... _ c_1 x + c_0 = 0

find a value, x=r, that satisfies the equation,
divide the polynomial by  (x-r) and thus have a
polynomial of one degree lower. Repeat. Done.

Unfortunately there are a lot of pitfalls, solved by the codes below.
Standard techniques include dividing through by c_n,
if c_0 is zero, there is a root equal zero, divide through by x,
if highest power is 2, directly solve the quadratic equation.

The Fortran code, including the test program (driver) is
419.for in Fortran IV, compiles with g95
419_f.out is the output of many test cases

The Java code, including the test program (driver) is
c419.java in standard Java
c419.out is the output of many test cases

The Ada code, including the test program (driver) is

To understand how some iterative methods can converge quickly,
consider how to compute sqrt when you have no math library:
Basically y=sqrt(x)  guess y, y_next = (x/y+y)/2, repeat.
Do not guess zero for y, x/y does not work, use y = 1 as guess.

sqrt.c
sqrt_c.out

A professional implementation of sqrt would reduce the input, x,
to a range from 0.25 to 1.0 by dividing by an even power of 2.
The sqrt(x^2n) is just x^n. The square root of a product is
the product of the square root of the factors. This can be easy
with IEEE floating point numbers because they are stored with
an exponent of 2.

So, since the sqrt iteration was basically Newtons method,
why not just use Newtons method to find roots?
here is how it works, even for complex roots.

f(x) = 0
guess x
x_next = x - f(x)/f'(x)   f'(x) is the derivative of f(x)
repeat until |x_next-x| < epsilon

newton.c
newton_c.out

Notice the oscillation in the last few steps.
The problem can come from a bad guess that causes a big
oscillation. The problem can come from hitting an x
where f'(x) is zero. Thus, there have been many workarounds.
"cpoly" is one of the most general solutions.

Nonlinear Equations

Other examples: same equation with different constant.

First, expect x=1,  we happen to have at least two possible solutions:
simple_nl.c
simple_nl_c.out

Second, expect x=2, have convergence to expected solution:
simple2_nl.c
simple2_nl_c.out

Systems of Nonlinear Equations

Three equations in three unknowns, not direct convergence.
Experiments show great difference with widely different
initial conditions. Two sets of equations with a choice of
initial condition. x1, x2, x3 are the variables.
equation_nl.c
equation_nl_c.out
inverse.c used by code above

In other languages:
equation_nl.f90
equation_nl_f90.out
inverse.f90
equation_nl.java
equation_nl_java.out
inverse.java

A system of non linear equations with powers 1, 2, 3, -1 and -2.
This example has a lot of debug print and several checks.
Added is also, variable control of fctr, a multiplier on
how much correction is to be made each iteration.
The problem and method are described as comments in the code.

Same example as above, in various languages, with debug removed:
equation2_nl.f90
equation2_nl_f90.out
equation2_nl.c
equation2_nl_c.out
equation2_nl.java
equation2_nl_java.out

A polynomial nonlinear example that converges with no fctr control.
equation4_nl.c
equation4_nl_c.out



### Lecture 17, Optimization, finding minima

A difficult numerical problem is the finding of a global minima
of an unknown function. This type of problem is called an
optimization problem because the global minima is typically
the optimum solution.

In general we are given a numerically computable function of
some number of parameters  v = f(x_1, x_2, ... , x_n)
and must find the values of x_1, x_2, ... , x_n that gives
the smallest value of v. Or, by taking the absolute value,
find the values of x_1, x_2, ... , x_n that give the value
of v closest to zero.

Generally the problem is bounded and there are given maximum and
minimum values for each parameter. There are typically many
places where local minima exists. Thus, the general solution
must include a global search then a local search to find the
local minima. There is no general guaranteed optimal solution.

First consider a case of only one variable on a non differentiable
function, y = f(x) where x has bounds xmin and xmax. There may be
many local minima, valleys that are not the deepest.

*     *       *      *         *
*   * *     * *    *  *      * *   *
* *   *   *   *  *  * *    *   * *
*     * *     **      *  *     *
*                *
*

Consider evaluating f(x) at some initial point x0 and x0+dx.
If f(x0) < f(x0+dx) you might move to x0-dx.
if f(x0) > f(x0+dx) you might move to x0+dx+dx.
The above may be very bad choices!

Here are the cases you should consider:
Compute  yl=f(x-dx)  y=f(x)  yh=f(x+dx)  for some dx

yh      y            yh       y      yl         yl
y           yh    yl         yl             yh       y
yl        yl            y            yh      y            yh

case 1    case 2     case 3     case 4    case 5     case6

For your next three points, always keep best x:
case 1 x=x-dx                possibly dx=2*dx
case 2 x=x-dx  dx=dx/2
case 3 dx=dx/2
case 4 x=x+dx  dx=dx/2
case 5 dx=dx/2
case 6 x=x+dx                possibly dx=2*dx
Then loop.

A simple version in Python is
optm.py
optm_py.out
optm.py running
minimum (x-5)**4 in -10 to +10 initial 2, 0.001, 0.001, 200
xbest= 5.007 , fval= 2.401e-09 ,cnt= 21

minimum (x-5)**4 in -10 to +10 initial 2 using default
xbest= 4.998271 , fval= 8.9367574925e-12 ,cnt= 35

Another version, found on the web, slightly modified
min_search.py
min_search_py.out
min of (x-5)**4, h= 0.1
xmin= 4.9984375
fmin= 5.96046447754e-12
n= 86 , sx= 4.996875 , xx= 5.003125

min of (x-5)**4, h= 0.01   # smaller initial h, not as good
xmin= 4.9975
fmin= 3.90625000037e-11
n= 704 , sx= 4.995 , xx= 5.005

min of (x-5)**4, h= 2.0    # smaller tolerance, better
xmin= 4.99993896484
fmin= 1.38777878078e-17
n= 47 , sx= 4.99987792969 , xx= 5.00012207031

Beware local minima
Both of the above, will find the best of a local minima.

There could be a local minima, thus when dx gets small enough,
remember the best x and use another global search value to
look for a better optimum. Some heuristics may be needed to
increase dx. This is one of many possible algorithms.

Another algorithm that is useful for large areas in two
dimensions for z=f(x,y) is:
Use a small dx and dy to evaluate a preferred direction.
Use an expanding search, doubling dx and dy until no more
progress is made. Then use a contracting search, halving dx and dy
to find the local minima on that direction.
Repeat until the dx and dy are small enough.
The numbers indicate a possible order of evaluation of the points
(in one dimension).

1
2

3

5
6
7
4
8
9

The pseudo derivatives are used to find the preferred direction:
(After finding the best case from above, make positive dx and dy best.)
z=f(x,y)
zx=f(x+dx,y)                    zx < z
zy=f(x,y+dy)                    zy < z
r=sqrt(((z-zx)^2+(z-zy)^2))
dx=dx*(z-zx)/r
dy=dy*(z-zy)/r

This method has worked well on the spiral trough.

The really tough problems have many discontinuities.
I demonstrated a function that was everywhere discontinuous.
The function was f(x)=x^2-1 with f(x)=1 if the bottom bit
of x^2-1 is a one.
discontinuous.c first function
discontinuous.out

Another, recursive, function that is continuous, yet in the
limit of recursing, nowhere differentiable, is:
double f(x){  /* needs a termination condition, e.g count=40 */
if(x<=1.0/3.0) return (3.0/4.0)*f(3.0*x);
else if(x>=2.0/3.0)  return (1.0/4.0)+(3.0/4.0)*f(3.0*x-2.0);
else  /* 1/3 < x < 2/3 */
return (1.0/4.0)+(1.0/2.0)*f(2.0-3.0*x);}
discontinuous.c second function
discontinuous.out

In general, it will not work to use derivatives, or even pseudo

A sample program that works for some functions of three
floating point parameters is shown below. Then, a more general
program with a variable number of parameters is presented
with a companion crude global search program.

Three parameter optimization:

optm3.h
optm3.c
test_optm3.c
test_optm3_c.out

N parameter optimization in C:
This includes a global search routine  srchn.

optmn.h
optmn.c
test_optmn.c
test_optmn_c.out

optmn.java
test_optmn.java
test_optmn_java.out

In MatLab use "fminsearch" see the help file.
Each search is from one staring point.
You need nested loops to try many starting points.
I got error warnings that I ignored, OK to leave them in your output.
optm.m

An interesting test case is a spiral trough:

Possibly a better view:

test_spiral.f90
test_spiral_f90.out
spiral.f90

test_spiral.c
test_spiral_c.out
spiral.h
spiral.c

test_spiral.java
test_spiral_java.out

spiral_trough.py
test_spiral.py
test_spiral_py.out

Your project is a multiple precision minimization



### Lecture 18, FFT Fast Fourier Transform


A basic Fourier transform can convert a function in the time domain
to a function in the frequency domain. For example, consider a sound
wave where the amplitude is varying with time. We can use a discrete
Fourier transform on the sound wave and get the frequency spectrum.

This is short_count.wav  one, two, three
short_count.wav

The basic FFT plotting the magnitude of each bin is:

Using fftshift to center the zero frequency:

Then taking the log of the amplitude:

Of course there is an inverse Fourier transform that converts
the frequency spectrum back to the time domain.

The spectrum of  F(t)=sin(2 Pi f t) t=0..1 is a single frequency f.
The discrete transform assumes the function is repeated for all t.
The amplitude values must be sampled at equal time increments for
the spectrum to be valid.

It turns out that a signal an some frequency has an amplitude and
a phase, or an in-phase and quadrature value. The convenient
implementation is to consider these values as complex numbers.
It turns out that some input can be collected as complex values and
thus most implementations of the FFT take an array of complex numbers
as input and produce the same size array of complex numbers as output.
For technical reasons, a simple FFT requires the size of the input
and output must be a power of 2. Special versions of the FFT as found in
FFTW the fastest Fourier transform in the west
handle arbitrary sizes. We just pad out our input data with zeros to make
the size of the array a power of 2.

Fourier transforms have many other uses such as image filtering and
pattern matching, convolution and understanding modulation.

A few, non fast, programs that compute Fourier Series coefficients for
unequally spaced samples, read from a file, are:
fourier.c
fourier.java
sin(2x) data sin2.dat
test on sine2.dat fourier_java.out

More detail of the basic Fourier transforms and series,
both continuous and discrete is fourier.pdf

square_wave_gen.c using series
square_wave_gen_c.out cos and sin

There are many many implementations of the FFT, the n log_2 n
complexity method of computing the discrete Fourier transform.
Two of the many methods are shown below.
One method has the coefficients precomputed in the source code.
A second method computes the coefficients as they are needed.

Precomputed constants for each size FFT

Precomputed constants of each size inverse FFT

fftc.h

A more general program that can compute the fft and inverse fft for
an n that is a power of two is:
fft.h
fft.c
fftest.c
fftin.dat  binary data
fftin.out  binary data

A more general program for the FFT of large data in Java is:
Cxfft.java
these may be combined for use in HW5.
fft_wav.java transform and inverse
fft_frame.java with a very crude frequency plot

For Matlab, fft_wav.m   may be useful for HW5
fft_wav.m transform and inverse, frequency plot

Another version in "C" and Java, the Java demonstrates convolution
fftd.h
fftd.c
test_fft.c
test_fft_c.out
Cxfft.java
TestCxfft.java
TestCxfft.out

testfft.py
testfft.out

MatLab has fft as well as almost every other function
test_fft.m
test_fft_m.out
test_fft2.m
test_fft2_m.out

In order to get better accuracy on predicted coefficients
for square wave, triangle wave and saw tooth wave, 1024 points.
The series are shown as comments in the code.
Also shown, is the anti aliasing and normalization needed to
get the coefficients for the Fourier series.
test_fft_big.c
test_fft_big_c.out

Here is a program that just reads a .wav file and writes a copy,
and does a little printing. This with extensions can be the basis
of Homework 5. I can only provide a "C" version. You may translate
to suit your desires. These only work for single channel, 8 bit
per sample PCM .wav files. (not all the .wav files listed below)

train1.wav
short_count.wav
rocky4.wav
cat.wav

Splitting out read and write wav:
rw_wav.h
rw_wav.c
test_rw_wav.c
test_rw_wav.out

And here are three .wav files from very small to larger for testing:
ok.wav
train.wav
roll.wav
short_count.wav

Suppose you wanted to compute a sound. Here is a generator for
a simple sine wave. It sounds cruddy.
sine_wav.c
sine_wav.out
sine.wav

For the homework, one example using a 64 point FFT and just doing
the transform and inverse transform, essentially no change, is
fft1_wav.c
fft1_wav.out
trainf.wav
rockey2.wav
count_out.wav

trainf2.wav experiment

Hopefully you can find some more interesting .wav files.

P.S. When using MediaPlayer or QuickTime be sure to close the
file before trying to rewrite it.
Use  file:///  path to your directory /xxx.wav

plays .wav and other files. The driver program is  ClipPlayerTest.java

In Python, with the WX graphics available, play .wav files with
sound.py
or on linux
linux_sound.py
rocky4.wav

In MatLab play  .wav  files with
waveplay.m
short_count.wav

Im MatLab play .dat files that have a value for each
sample.
soundplay.m
short_count.dat

For students running Ubuntu, this sequence of "C" programs
demonstrate direct use of playing sampled amplitude sound.
These do not require generating a .wav or other type sound file.
pcm_min.c
pcm_sin.c
pcm_dat.c
Makefile_sound
short_count.dat
long_count.dat
You may modify any of these to suit your needs.

An interactive WEB site with with a few functions and their spectrum is
heliso.tripod.com/java_hls/gccs/spectrum1.htm

Here is Rocky speaking and the dynamic spectrum
rocky4.wav
This needs modifications for 2 channel and 16 bits per sample.

Now you can do homework 5

optional plots of FFT's using Python and C with gnuplot
plot_fft_py.html
plot_fft_c.html



### Lecture 18a, Digital Filtering


Digital Filtering uses numerical computation rather than analog
components such as resistors, capacitors and inductors to
filter out frequency bands.

A low-pass filter will have a "cutoff frequency" f  such that
frequencies above f will be attenuated and frequencies below f
will be passed. The filter does not produce a sharp dividing
line and all frequencies are changed in both amplitude and
phase angle.

A high-pass filter will have a "cutoff frequency" f such that
frequencies below f will be attenuated and frequencies above f
will be passed. The filter does not produce a sharp dividing
line and all frequencies are changed in both amplitude and
phase angle.

A band-pass filter will pass frequencies between f1 and f2,
attenuating frequencies below f1 and attenuating frequencies
above f2. f1 <= f2.

The signal  v(t)=sin(2 Pi f t) t=t0, t1, ..tn is a single frequency f.
For digital filtering we assume the digital value of v(ti) is
sampled at uniformly spaced times t0, t1,..., tn.

For an order m filter, with one based subscripts:

y(i) = ( a(1)*x(i) + a(2)*x(i-1) + a(3)*x(i-2) + ... + a(m+1)*x(i-m)
- b(2)*y(i-1) - b(3)*y(i-2) - ... - b(m+1)*y(i-m) )/b(1)

With array x and y in memory, the MatLab code for computing y(1:n) could be:

for i=1:n
y(i)=a(1)*x(i);
for j=1:m
if j>=i
break
end
y(i)=y(i) + a(j+1)*x(i-j) - b(j+1)*y(i-j);
end
y(i)=y(i)/b(1);  % not needed if b(1) equals 1.0
end

or, use MatLab   y = filter(a, b, x); % less typing

For an order m filter, with zero based subscripts:

y[i] = ( a[0]*x(i) + a[1]*x[i-1] + a[2]*x[i-2] + ... + a[m]*x[i-m]
- b[1]*y[i-1] - b[2]*y[i-2] + ... - b[m]*y[i-m] )/b[0]

With array x and y in memory, the C code for computing y[0] to y[n-1] could be:

for(i=0; i<n; i++) {
y[i]=a[0]*x[i];
for(j=1; j<=m; j++)
{
if(j>i) break;
y[i]=y[i] + a[j]*x[i-j] - b[j]*y[i-j];
}
y[i]=y[i]/b[0];  /* not needed if b[0]==1 */
}

For reading x samples and writing filtered y values:

read next x sample and compute y and output y
(the oldest x and y in RAM are deleted and the new x and y saved.
Typically use a circular buffer [ring buffer] so that the saved x and y
values do not have to be moved [many uses of modulo in this code].)

Given a periodic input, filters require a number of samples to build up
to a steady state value. There will be amplitude change and phase change.

Typical symbol definitions related to digital filters include:

ω = 2 Π f

z = ej ω t  digital frequency

s = j ω    analog frequency

z-1 is a unit delay (previous sample)

IIR Infinite Impulse Response filter or just recursive filter.

In transfer function form

yi     a0 + a1 z-1 + a2 z-2 + a3 z-3 + ...
---- = -----------------------------------
xi          b1 z-1 + b2 z-2 + b3 z-3 + ...

s = c (1-z-1)/(1+z-1)  bilinear transform

db = 10 log10 (Power_out/Power_in)  decibel for power

db = 20 log10 (Voltage_out/Voltage_in)  decibel for amplitude

(note that db is based on a ratio, thus amplitude ratio may
be either voltage or current. Power may be voltage squared,
current squared or voltage times current. More on db at end.)

A Java program is shown that computes the a and b coefficients for
Butterworth and Chebyshev digital filters. Options include Low-Pass,
Band-Pass and High-Pass for order 1 through 16. Click on the sequence:
Design, Poles/Zeros, Frequency response, coefficients  to get output
similar to the screen shots shown below.

The Pole/Zero plot shows the complex z-plane with a unit circle.
The poles, x, and zeros o may be multiple.

The frequency response is for the range 0 to 4000 Hz based on
the 8000 Hz sampling. 0 db is at the top, the bottom is the
user selected range, -100 db is the default.

The coefficients are given with zero based subscripts.
(These subscripts may be used directly in C, C++, Java, etc.
add one to the subscript for Fortran, MatLab and diagrams above.)

Note that MatLab has 'wavread' and 'wavwrite' functions as well
as a 'sound' function to play  .wav  files. 'wavwrite' writes
samples at 8000 Hz using the default.

The files to compile and run the Digital Filter coefficients are:
IIRFilterDesign.java
IIRFilter.java
PoleZeroPlot.java
GraphPlot.java
make_filter.bat
Makefile_filter
mydesign.out

This is enough for you to be able to program and use simple
digital filters. This lecture just scratched the surface.
There are many types of digital filters with many variations.
There are complete textbooks on just the subject of digital filters.

The output from the low pass filter shown above "builds up and settles"
quickly:

A fourth order band pass filter with band 2000 to 2100 requires
more samples for the center frequency 2050 to build up and settle:

Some notes on db
+3 db is twice the power
-3 db is half the power

+10 db is ten times the power
-10 db is one tenth the power

+30 db is 1000 times the power, 10^3
-30 db is 1/1000 of the power, 10^-3

+60 db is one million times the power, 10^6
-60 db is very small

Sound is measured to average human hearing threshold,
technically 20 micropascals of pressure by definition 0 db.
Some approximate examples for sound from a close source:

0 db  threshold of hearing
20 db  whisper
60 db  normal conversation
80 db  vacuum cleaner
90 db  lawn mover
110 db  front row at rock concert
130 db  threshold of pain
140 db  military jet takeoff, gun shot, fire cracker
180 db  damage to structures
194 db  shock waves begin, not normal sound

The sound drops off with distance by the inverse square law.
Each time you move twice as far away from the source,
the sound is 1/4 as strong.

The effective loudness decreases with very short burst of sound.
160 db for 1/100 second is about a loud as 140 db for 1 second.



### Lecture 18b, Molecular frequency response


Many examples of spectra of molecules are presented on the
following pages. The spectra can be used to identify the
presents of specific molecules. Note that the same chemical
designation may result in more than one molecular structure
and the resulting spectra may be different.

Basic physics:
c = velocity of light = about 300,000,000 meters per second
f = frequency in Hz, Kilo KHz = 1000 Hz, Mega Mhz = 106Hz
Giga GHz = 109Hz Tera THz = 1012Hz
λ = wavelength in meters, μ = 10-6 meters = 1000 nano meters

c = λ * f
λ = c / f
f = c / λ

Example: wavelength  1 μ has frequency, f, computed by
f =  300*106 / 10-6 = 300*1012Hz = 300 THz

The "visible" spectrum, that which can be seen by average people,
is roughly given by wavelength in nm = nanometer or 10-9

The following pages show the spectrum of many molecules.

The Infrared spectrum has wavelength μ in micrometers, 10-6 meters

The notation Frequency, cm-1 must be multiplied by 30*106
to get Hz.

The Nmr, Nuclear magnetic resonance, data is in Nmr units and Hz.

molecular.pdf Molecular frequency response



### Lecture 19, Review 2


Go over WEB pages Lecture 11 through 18.
Open book, open notes, no study guide or copy.
Follow the instructions, else lose points.
The study guide is not allowed during the exam.

Some things you should know:

The roots of a polynomial with real coefficients can be complex.

Some languages have "complex" as a built in type (e.g. Fortran).
Many languages have complex arithmetic available: Ada, Fortran,
MatLab, Mathematica, Python.

Some languages provide standard packages for "complex" (e.g. Ada and C++)

Some languages require you find or make your own "complex" (e.g. C)

All elementary functions are defined and computable for complex arguments.

All complex elementary functions may be computed with only real functions.

Example: sin(z)=sin(re z)*cosh(im z) + i cos(re z)*sinh(im z)

Many languages have either built in or available complex elementary
functions, e.g. Ada, C++, Fortran, Matlab, Mathematica, Java, Python
and some others.

Some complex elementary functions are difficult to compute accurately.

Some elementary functions have a branch cut in the complex plane.

The branch cut is often along the negative real axis.

The derivative is not defined across the branch cut.

Eigenvalues must satisfy several definitions.

There are exactly n eigenvalues for every n by n matrix.

The eigenvalues of a non singular matrix of real elements may be complex.

The determinant of a matrix with a variable subtracted from each
diagonal element yields the characteristic equation of the matrix.
e.g  det|A - eI| = c_n*e^n + ... _ c_1*e + c_0

The roots of the characteristic equation are eigenvalues.

det|A - eI|=0 is one definition of eigenvalues.

Given a polynomial equation, the coefficients of the polynomial may
be placed in a matrix such that the eigenvalues of the matrix are
the roots of the polynomial.

An eigenvector exists for every eigenvalue. Av = ev

It is always possible to find a set of orthogonal eigenvectors
that form the basis of an n dimensional space for an n by n matrix.

Eigenvectors may be complex when the matrix elements and the
eigenvalues are real.

The eigenvectors of a matrix of all ones are usually chosen as the
unit vectors [1 0 ... 0] [0 1 0 ... 0] ... [0 ... 0 1]

The equation  A v = e v must be satisfied for every eigenvalue, e,
and its corresponding eigenvector, v. |v| non zero.

There are many possible test for a program that is supposed to compute
eigenvalues and eigenvectors.

MatLab may not produce unique eigenvectors using [v e]=eig(a)

MatLab may not produce eigenvectors that are mutually orthogonal.

LAPACK is a source of many numerical algorithms implemented in Fortran.

LAPACK code may be converted to any other language.

LAPACK object files may be used with most languages via some interface.

LAPACK source code needs the BLAS, Basic Linear Algebra Subroutines.

The BLAS may be available, optimized, for your computer.

LAPACK is available with files to compile and install on
many computers from  www.netlib.org/lapack

TOMS is numerical source code from the ACM Transactions on Mathematical
software and may be found on www.netlib.org/toms

Some problems require more than double precision floating point.

"gmp" is the gnu multiple precision library of "C" functions.

gmp is available from www.swox.com/gmp or other sites.

gmp provides multiple precision integer, rational and floating point.

Any language can implement multi-precision, Java has it available,
other languages use libraries.

52! has too many significant digits to exactly represent in double
precision floating point.

Solving large systems of equation may require multi-precision in order
to get reasonable results.

A simple example of the need for complex roots is x^2+1 = 0
with roots i and -i .

The roots of high order polynomials may be difficult to compute
accurately, e.g. x^16 + 5 = 0 .

There is no algorithm that can guarantee finding all the
roots of every polynomial to a specified accuracy.

Newtons method works most of the time for finding complex roots.
z_next = z - P(z)/P'(z)
(Using a different initial guess usually fixes the problem.)

Optimization is finding the values of independent variables such
that a function is minimized.

Optimizing the absolute value of a function finds the values of
the independent variables that make the function closest to zero.

There is no algorithm that can guarantee finding the optimum
value of every function to a required accuracy.

There are many methods of implementing code to find local minima.
The code varies depending on the assumption of a differentiable
function and on the assumption of no local minima.

The spiral trough is one test case that can be used to test
optimization code.

A Fourier transform of a finite set of data assumes that the
data is repeated infinitely many times before and after the data.

A FFT, Fast Fourier Transform implements a discrete Fourier transform
using order n log n computations for an n point transform. Typically
n must be a power of 2.

The FFT of real data may be complex.

The inverse FFT of the result of and FFT will be reasonably close
to the original data.

The Discrete Fourier transform and FFT of data sampled at time
steps dt will be the frequency spectrum with highest unaliased
frequency 1/(2*dt), the Nyquist frequency.

Digital Filters can be low-pass, high-pass, or band-pass.
A band-pass filter requires two frequencies.
The cutoff frequency is typically 3db down and
the amplitude decreases faster for higher order filters.

Digital Filters can be described by a set of "a" coefficients
for the numerator and "b" coefficients for the denominator.

Sound power is measured in db, decibel. 0 db for minimum
threshold of hearing. db increases as 10 log10(new_power/old_power)
60 db is 100 times the power of 40 db.

Sound can hurt, 130 db, and greater values may damage buildings.

New applications are happening. Not on the exam,
yet may be of interest:
FFT used on light



### Lecture 20, Quiz 2


Open book, open note quiz.
No computer, no study guide or copies.
One hour time limit.

Summer class quiz will be after Lecture 21, 22 and 23,
yet lecture 21, 22 and 23 are not covered by the quiz.



### Lecture 21, Benchmarks, time and size

A computer benchmark will typically be some code that is executed
and the running time measured.

A few simple rules about benchmarks:

1) Do not believe or trust any person, any company, any data.

2) Expect the same code to give different times on:
different operating systems,
different compilers,
different computers from various manufacturers
(IBM, Sun, Intel, AMD) even at same clock speed,
different languages, even for line by line translation.

3) If you want to measure optimization, turn it on,
otherwise prevent all optimization.

4) You will probably be using your computers clock to measure time.
Test that the clock is giving valid results for the language
you are using. The constant CLOCKS_PER_SEC in the "C" header
file  time.h  has been found to be wrong.
One manufacturer put a capacitor across the clock circuitry
on a motherboard and all time measurements were half the
correct time. See sample test below.

5) For measuring short times you will need to use the
"double difference method". This method can be used to
measure the time of a single instruction. This method
should be used for any benchmark where one iteration of
the code runs in less than a second. See sample test below.

6) Some methods of measuring time on a computer are only
accurate to one second. Generally run enough iterations of
your code in loops to get a ten second measurement.
Some computers provide a real time clock as accurate as
one microsecond, others one millisecond and some poorer than
a fiftieth of a second.

7) Turn off all networking and stop all software that might run
periodically. If possible, run in single user mode. You want to
measure your code, not a virus checker or operating system.
I once did measurement on a Sun computer running Solaris. It
seemed to slow down periodically. I found that the operating
system periodically checked to see if any disk files needed
to be written.

8) If you are interested in how fast your application might run
on a new computer, find reputable benchmarks that are for
similar applications. I do a lot of numerical computation, thus
all my benchmarks are heavily floating point. You may be
more interested in disk performance or network performance.

9) Do not run all all zero data. Some compilers and very smart and
may precompute your result without running you code.
Be sure to use every result. Compilers do "dead code elimination"
that checks for code where the results are not used and just
does not produce instructions for that "dead code." An "if" test
or printing out the result is typically sufficient. For vectors
and arrays, usually printing out one element is sufficient.

10) It helps to be paranoid. Check that you get the same results
by running n iterations, then 2n iterations. If the time did
not double, you do not have a stable measurement. Run 4n and 8n
and check again. It may not be your benchmark code, it may be
an operating system activity.

11) Do not run a benchmark across midnight. Most computers reset
the seconds to zero at midnight.

12) Keep values of time as a double precision numbers.

13) Given an algorithm where you can predict the time increase
as the size of data increases: e.g. FFT is order  n log2 n,
multiplying a matrix by a matrix is order n^3, expect
non uniform results for some values of n.

Consider the case where all your code and all your data fit
in the level one caches. This will be the fastest.

Consider when you data is much larger than the level one cache
yet fits in the level two cache. You are now measuring the
performance of the level two cache.

Consider when your data fits in RAM but is much larger than
your level two (or three) cache. You are now measuring the speed
of your code running in RAM.

Consider when your data is much larger than your RAM, you are
now running in virtual memory from your disk drive. This will
be very slow and you are measuring disk performance.

In order to check the hardware and software time from your computers
clock, run the following two programs. Translate to the language
of your choice. The concept is to have the program display the number
of seconds every 5 seconds for a minute and a half. You check
the display against a watch with a second hand.

The first code uses "clock()" for process time and the second code
uses "time()" for wall clock time.

/* time_cpu.c    see if  C  time.h   clock() is OK */
/*               in theory, this should be only your processes time */
#include <stdio.h>
#include <time.h>
int main()
{
int i;
double now;
double next;

printf("Should be wall time, 5 second intervals \n");
now = (double)clock()/(double)CLOCKS_PER_SEC;
next = now+5.0;

for(i=0; i<90;)
{
now = (double)clock()/(double)CLOCKS_PER_SEC;
if(now >= next)
{
printf("%d\n",i);
i=i+5;
next=next+5.0;
}
}
return 0;
}

/* time_test.c    see if  C  time.h   time(NULL) is OK */
#include <stdio.h>
#include <time.h>
int main()
{

int i;
double now;
double next;

printf("Should be wall time, 5 second intervals \n");
now = (double)time(NULL);
next = now+5.0;

for(i=0; i<90;)
{
now = (double)time(NULL);
if(now >= next)
{
printf("%d\n",i);
i=i+5;
next=next+5.0;
}
}
return 0;
}

The "Double Difference Method" tries to get accurate measurement
for very small times. The code to time a single floating point
add instruction is shown below. The principal is:

measure time, t1

run a test harness with loops that has everything except the code
that you want to time. Count the number of executions as a check.

measure time, t2

measure time, t3

run exactly the same code from the test harness with only the
feature you want to measure added. Count number of executions.

measure time, t4

check that the number of executions is the same.
check that  t2-t1 was more than 10 seconds

the time for the feature you wanted to measure is

t5 = ((t4 - t3) - (t2 - t1))/ number of executions

basically measured time minus test harness time divided by the
number of executions.

/* time_fadd.c  try to measure time of double  A = A + B;         */
/*              roughly time of one floating point add            */
/*              using double difference and minimum and stability */

#include <time.h>
#include <stdio.h>
#include <math.h>

#define dabs(a) ((a)<0.0?(-(a)):(a))
void do_count(int * count_check, int rep, double * B);

int main(int argc, char * argv[])
{
double t1, t2, t3, t4, tmeas, t_min, t_prev, ts, tavg;
double A, B, Q;
int stable;
int i, j;
int count_check, outer;
int rep, min_rep;

t_min = 10.0;    /* 10.0 seconds typical minimum measurement time */
Q  = 5.0;        /* 5.0 typical approximate percentage stability */
min_rep = 32;    /* minimum of 32 typical */
outer = 100000;  /* some big number */

printf("min time %g seconds, min stability %g percent, outer loop=%d\n",
t_min, Q, outer);

stable = 5; /* max tries */
t_prev = 0.0;
for(rep=min_rep; rep<100000; rep=rep+rep) /* increase until good measure */
{
A = 0.0;
B = 0.1;
t1 = (double)clock()/(double)CLOCKS_PER_SEC;
for(i=0; i<outer; i++) /* outer control loop */
{
count_check = 0;
for(j=0; j<rep; j++)   /* inner control loop */
{
do_count(&count_check, rep, &B);
}
}
t2 = (double)clock()/(double)CLOCKS_PER_SEC;
if(count_check != rep) printf("bad count_check_1 %d \n", count_check);

A = 0.0;
t3 = (double)clock()/(double)CLOCKS_PER_SEC;
for(i=0; i<outer; i++) /* outer measurement loop */
{
count_check = 0;
for(j=0; j<rep; j++)   /* inner measurement loop */
{
do_count(&count_check, rep, &B);
A = A + B;   /* item being measured, approximately FADD time */
}
}
t4 = (double)clock()/(double)CLOCKS_PER_SEC;
if(count_check != rep) printf("bad count_check_2 %d \n", count_check);

tmeas = (t4-t3) - (t2-t1); /* the double difference */
printf("rep=%d, t measured=%g \n", rep, tmeas);

if((t4-t3)<t_min) continue; /* need more rep */

if(t_prev==0.0)
{
printf("tmeas=%g, t_prev=%g, rep=%d \n", tmeas, t_prev, rep);
t_prev = tmeas;
}
else /* compare to previous */
{
printf("tmeas=%g, t_prev=%g, rep=%d \n", tmeas, t_prev, rep);
ts = 2.0*(dabs(tmeas-t_prev)/(tmeas+t_prev));
tavg = 0.5*(tmeas+t_prev);
if(100.0*ts < Q)  break; /* above minimum and stable */
t_prev = tmeas;
}
stable--;
if(stable==0) break;
rep = rep/2; /* hold rep constant */
} /* end loop increasing rep */

/* stable? and over minimum */
if(stable==0) printf("rep=%d  unstable \n", rep);
if(tmeas<t_min) printf("time measured=%g, under minimum \n", tmeas);
printf("raw time=%g, fadd time=%g, rep=%d, stable=%g\% \n\n", tmeas,
(tavg/(double)outer)/(double)rep, rep, ts);
return 0;

/* do_count to prevent dead code elimination  */
void do_count(int * count_check, int rep, double * B)
{
(*count_check)++;
/* could change B but probably don't have to. */
}

You might think that I am obsessed with time. :)
time_cpu.c
time_test.c
time_mp2.c
time_mp2.out
time_mp4.c
time_mp4.out
time_mp8.c
time_mp8_c.out
time_mp12.c
time_mp12_c.out
time_of_day.java
time_of_day.out
fft_time.c
fft_time_a533.out

If you have a 64-bit computer and more than 4GB of RAM, here is a
test you may want to run in order to check that your operating
system and compiler are both 64-bit capable:

big_malloc.c

big_malloc.c running, 10 malloc and free 1GB
about to malloc and free 1GB the 1 time
about to malloc and free 1GB the 2 time
about to malloc and free 1GB the 3 time
about to malloc and free 1GB the 4 time
about to malloc and free 1GB the 5 time
about to malloc and free 1GB the 6 time
about to malloc and free 1GB the 7 time
about to malloc and free 1GB the 8 time
about to malloc and free 1GB the 9 time
about to malloc and free 1GB the 10 time
about to malloc and free 2000000000
about to malloc and free 3000000000
about to malloc and free 4000000000
about to malloc and free 5000000000
about to malloc and free 6000000000
about to malloc and free 7000000000
try calloc on 800000000 items of size 8 6.4GB
successful end big_malloc

A 28,000 by 28,000 matrix of double requires about 6.4GB of RAM.

In general, 64-bit is currently supported for the "C" type long
rather than int. Hopefully this will change as most desktop and
laptop computers become 64-bit capable. A small test of various
types and the size in bytes of the types and objects of various
types is:

big.c and its output (older OS)

big.c  compiled  gcc -m64 -o big big.c (64 bit long)
sizeof(int)=4, sizeof(int1)=4
sizeof(long)=8, sizeof(long1)=8
sizeof(long long)=8, sizeof(llong1)=8
sizeof(float)=4, sizeof(fl1)=4
sizeof(double)=8, sizeof(d1)=8
sizeof(size_t)=8, sizeof(sz1)=8
sizeof(int *)=8, sizeof(p1)=8
n factorial with n of type long
0 ! = 1
1 ! = 1
2 ! = 2
3 ! = 6
4 ! = 24
5 ! = 120
6 ! = 720
7 ! = 5040
8 ! = 40320
9 ! = 362880
10 ! = 3628800
11 ! = 39916800
12 ! = 479001600
13 ! = 6227020800
14 ! = 87178291200
15 ! = 1307674368000
16 ! = 20922789888000
17 ! = 355687428096000
18 ! = 6402373705728000
19 ! = 121645100408832000
20 ! = 2432902008176640000
22 ! = -1250660718674968576
23 ! = 8128291617894825984
24 ! = -7835185981329244160

big12.c and its output (newer OS)
Note: 'sizeof' change to long, needs %ld rather than %d

big12.c  compiled  gcc -o big12 big12.c (64 bit long)
-m64 needed on some older OS
sizeof(sizeof(int))=8, sizeof needs pct ld
sizeof(int)=4, sizeof(int1)=4
sizeof(long)=8, sizeof(long1)=8
sizeof(long long)=8, sizeof(llong1)=8
sizeof(float)=4, sizeof(fl1)=4
sizeof(double)=8, sizeof(d1)=8
sizeof(size_t)=8, sizeof(sz1)=8
sizeof(int *)=8, sizeof(p1)=8
n factorial with n of type long (same)
...
trying to malloc 10GB
malloc returned
stored 10GB of 'a'

Some information on the long history of 64-bit computers:

Other code to find out how big an array can be in your

big_mem.f90 and its output
big_mem.java and its output
big_mem.c and its output
Then, more like big_malloc.c, Ada "new"
output from a MAC big_new_mac.out
output from a AMD big_new_amd.out
Yes, 50,000 by 50,000 8-byte floating point, not sparse.
20GB of RAM in one matrix.

Now do HW6



### Lecture 22, Project Discussion


Project writeup

Did you use a plotting utility to look at the shape of the function?

Did you use a grid search to find candidate staring points?
0.0025 works, yet 0.001 will get you into the global minima well.
The global minimum must be between -3 and -4 by analysis of equation.

Did you refine your grid search to get a set of four surrounding
values greater than the lowest minimum value found so far?

*             ^
*               |             |
|               |     *   *=f(x,y)
|     *         |     |
|     |         |     |
|     |    *    |     |
|     |    |    |     |
|     |  (x,y)  |     |
|     |         |     |
| (x,y-dy)   (x,y+dy) |
|                     |
(x-dx,y)               (x+dx,y)

(Refine means closing in on local xmin=x-dx, xmax=x+dx, ymin=y=dy,
ymax=y+dy as well as reducing the step size dx, dy.)

Given a good candidate staring point, use an available minimization
to find the local minimum value that should be the global minimum value.

To get high accuracy, 100 digits, requires that the function be
evaluated using multiple precision, covered in an earlier lecture.

Typically, if 100 digit accuracy is desired, then all computation
would be performed at 110 to 200 digit accuracy. This is slow yet
not difficult for sin(x) and exp(x) for small values of x.

exp(x)=1 + x + x^2/2! + x^3/3! + x^4/4! + ...

For |x|<1 you need the nth term where n! > 10^110 or n=75

Similarly for sin(x) = x - x^3/3! + x^5/5! - x^7/7! + ...
Similarly for cos(x) = 1 - x^2/2! + x^4/4! - x^6/6! + ...
For sqrt(x) make a guess y=1, then iterate y=(y+x/2)/2
until |y-y_previous| < 10^110

Actually, not quite that easy. You must use range reduction
to get |x|<1 and better if |x| < 1/2.
e.g.  for sin(x): if ((x)<0.0?(-(x)):(x))>2Π then x=x rem 2Π
if x>&Pi then x=x-2Π
if x<&Pi then x=x+2Π
if x>Π/2 then x=Π-x
if x<&Pi/2  then x=Π+x  now ((x)<0.0?(-(x)):(x))<Π/2 1.57
use sin(x)=2 sin(x/2) cos(x/2) on x/2<Π/4  0.78

Several of the implementations to 200 digits, that I have programmed:
mpf_sin_cos.c

Obviously, you find a good staring point with global search and
and then use a reasonable optimization method to get a good
starting  x,y  before going to multiple precision computation.

Project writeup



### Lecture 23, Computing Volume and Area


Some volumes and areas you should already know:

x1,y1
/\
/  \
x3,y3 /___ \ x2,y2

area of a triangle is 1/2 base times height

|  1   1   1 |          | 1   x1  y1 |
area of a triangle is 1/2 det| x1  x2  x3 | = 1/2 det| 1   x2  y2 |
| y1  y2  y3 |          | 1   x3  y3 |

area of a triangle is |(V2 - V1) x (V3 - V1)|  length of cross product

area of a rectangle is base times height
area of a parallelogram is base times height
area of a circle is  Pi r^2

Some variations that you may need some day:

You may not know the area of a five point star inscribed in
a unit circle or the area of an arbitrary closed polygon.

Easy to calculate, using the Sailors Algorithm:

Given the n points (x,y) of a closed polygon where no edges cross,
compute the sum i=1..n  (x_i+1 - x_i) * (y_i+1 + y_i)/2
(The first point is repeated as the n+1 point,
add enough to the y's to make them all positive)

(If some y's are negative, this area is subtracted from
the positive y area.)

The intuition for the Sailors Algorithm, on a shape with only
vertical and horizontal edges is:
a vertical line adds no area
a horizontal line to the right adds area (length times average height)
a horizontal line to the left subtracts area.

The computed area will be positive if an upper outside edge is
listed in counter clockwise order, else negative, take absolute value.

A  sample program is:
poly_area.c
poly_area_maze.out
maze.path
poly_area_star.out
star.path

The maze (8 units wide, 7 units high, area is 35) is:

The star (inscribed in a circle with radius 5, area about 28.06) is:

Volume of a sphere  4/3 Pi r^3
Area of a sphere    4 Pi r^2
Volume of a cone, tetrahedron, any solid with a planar base that goes
to a point  1/3 base area times height

| 1  x1  y1  z1 |
Volume of a tetrahedron 1/6 det | 1  x2  y2  z2 |
| 1  x3  y3  z3 |
| 1  x4  y4  z4 |

using just the x,y,z of the four vertices

The general volume computation:
of a closed surface is a little more complicated.
The sailors algorithm is still the basic idea.

Consider a closed surface covered by triangles. Each triangle is
three points and are coded counter clockwise such that the normal
vector to the triangle points out of the volume.

Make all z coordinates positive.
Compute the average z for a triangle, "height".
Compute the area for a triangle, "base".
Compute the z component of the normal vector, znorm.
The volume of this piece of the surface is
height  times  base  times znorm

If znorm is positive, this triangle is on top and contributes
positive volume.
If znorm is negative, this triangle is on the bottom and
contributes negative volume.
The area in the x-y plane is the area of the triangle times znorm.
A vertical triangle has znorm = 0.
A horizontal triangle has znorm = 1 on top and -1 on bottom.
A triangle tipped 45 degrees has a znorm = cos(45 degrees) = 0.7071

Numerical approximation

A program that uses graphics data files to compute the volume and
area of a closed volume is:
volume_dat2.c

from data:
sphere_div.dat
output for a crude, 32 triangle, sphere is:

status=0, zmax=1, points=18, polys=32

xmin=-1.000000, xmax=1.000000, ymin=-1.000000, ymax=1.000000
zmin=-1.000000, zmax=1.000000
enclosing area= 24, enclosing volume= 8

final total area = 10.4178, total volume = 2.94281

should be          12.56              and  4.189

from data:
sphere_div2.dat
output for a better, 128 triangle, sphere is:

status=0, zmax=1, points=66, polys=128

xmin=-1.000000, xmax=1.000000, ymin=-1.000000, ymax=1.000000
zmin=-1.000000, zmax=1.000000
enclosing area= 24, enclosing volume= 8

final total area = 11.9549, total volume = 3.81773

should be          12.56              and  4.189

from data:
sphere_div3.dat
output for a better, 512 triangle, sphere is:

status=0, zmax=1, points=258, polys=512

xmin=-1.000000, xmax=1.000000, ymin=-1.000000, ymax=1.000000
zmin=-1.000000, zmax=1.000000
enclosing area= 24, enclosing volume= 8

final total area = 12.4082, total volume = 4.0916

should be          12.56              and  4.189

from data:
sphere_div4.dat
output for a good, 2048 triangle, sphere is:

status=0, zmax=1, points=1026, polys=2048

xmin=-1.000000, xmax=1.000000, ymin=-1.000000, ymax=1.000000
zmin=-1.000000, zmax=1.000000
enclosing area= 24, enclosing volume= 8

final total area = 12.5264, total volume = 4.16419

should be          12.56              and  4.189

The area of a perfect sphere of radius 1 is about 12.56
The volume of a perfect sphere of radius 1 is about 4.189

from data:
bball.dat Buckminster Fuller Geodesic
output for small, 90 triangle, sphere is:

status=0, zmax=1, points=42, polys=90

xmin=-0.951060, xmax=0.951060, ymin=-1.000000, ymax=1.000000
zmin=-1.000000, zmax=1.000000
enclosing area= 23.217, enclosing volume= 7.60848

final total area = 13.1894, total volume = 3.94159

should be          12.56              and  4.189

No bull? let us compute the volume of this bull

bull.dat data

status=0, zmax=3177.82, points=6202, polys=12398

xmin=-2060.574463, xmax=1978.578857, ymin=-1580.072998, ymax=1429.878052
zmin=356.500702, zmax=3177.816406
enclosing area= 6.40908e+07, enclosing volume= 3.43006e+10

final total area = 2.07025e+07, total volume = 3.64616e+09

That's a lot of bull!

Seems scaled up by 500 relative to feet, in all three dimensions.
Thus, about 29.16 cubic feet of bull.

Changing volume_dat2.c to volume_ucd.c
Just reading a UCD  .inp  file, rather than a .dat file.

volume_ucd.c
blivet.inp
volume_ucd_c.out

determining surface normal vectors
The blue dot is on the surface, the red dot is the
direction of the normal.

normal_dat.c
blivet.dat
star3.dat



### Lecture 24, Numerical Differentiation


Numerical differentiation is typically considered poor at best.
In general, high order techniques are needed to get reasonable
values.

Similar to numerical integration, the more values of the function
that are used, then the more accuracy can be expected.

Similar to numerical integration, given a function that can be
fit accurately by an  n th order polynomial, an equation for
the derivative exists to give very good accuracy.

Consider a function  f(x) that you can compute but do not know
a symbolic representation. To find the derivatives at a few points,
say 4 for example, and computing only 4 evaluations of  f(x):

f(x2)
*     f(x3)
|       *
f(x1)     |       |
*       |       |
|       |       |
f(x0)     |       |       |
*       |       |       |
|       |       |       |
|       |       |       |
x0      x1      x2      x3    h = x3-x2 = x2-x1 = x1-x0

We use f'(x0) to represent the derivative of f(x) at x0.
We looked up the formulas for computing the first derivative
using four equally spaced points, and write the equations:

f'(x0) = 1/6h (-11 f(x0) +18 f(x1)  -9 f(x2)   2 f(x3) )
f'(x1) = 1/6h ( -2 f(x0)  -3 f(x1)   6 f(x2)  -1 f(x3) )
f'(x2) = 1/6h (  1 f(x0)  -6 f(x1)   3 f(x2)   2 f(x3) )
f'(x3) = 1/6h ( -2 f(x0)  +9 f(x1) -18 f(x2)  11 f(x3) )

The error estimate is  1/h^4 f''''(z) for worse z in x0..x3

Thus, if the four points are fit by a third degree polynomial,
f''''(z), the fourth derivative is always zero and the numerical
derivatives are exact within roundoff error.

To check that the coefficients such as  -11 18 -9 2 are correct,
we symbolically fit the 4 points with a third order polynomial,
then symbolically take the derivatives. Too much work to do by
hand, thus I used Maple to check and found the coefficients correct.

Using 5 evaluations of f(x) provides the five derivatives:

f'(x0) = 1/12h (-25 f(x0)  48 f(x1) -36 f(x2)  16 f(x3)  -3 f(x4) )
f'(x1) = 1/12h ( -3 f(x0) -10 f(x1)  18 f(x2)  -6 f(x3)   1 f(x4) )
f'(x2) = 1/12h (  1 f(x0)  -8 f(x1)   0 f(x2)   8 f(x3)  -1 f(x4) )
f'(x3) = 1/12h ( -1 f(x0)   6 f(x1) -18 f(x2)  10 f(x3)   3 f(x4) )
f'(x4) = 1/12h (  3 f(x0) -16 f(x1)  36 f(x2) -48 f(x3)  25 f(x4) )

The above were typed by hand and could have an error.
Some quick checking: If all function evaluations are the same,
then the function is flat and the derivative must be zero at
all points. Thus, the sum of the coefficients must be zero for
every derivative. Note that the first and last coefficients have
the same values, in reverse order with reverse sign.

We will call the "12" in 1/12h the variable b or bb.
We will call the coefficients "-12", "48", "-36" the 'a' array

Always try to cut-and-paste from the computer printout:

Tabulation to compute derivatives at various points
First through sixth order, various number of points
nderiv.out

The above was generated by the computer programs listed below:

nderiv.c computes these cases.
nderiv.f90 in Fortran 95
nderiv.java in Java
deriv.py in Python
nderiv.py in Python

Note that the coefficients can be generated in your program by
copying the function   deriv  and calling it for a specific
derivative order, number of evaluations and point where derivative
is to be computed. For the second derivative using 4 evaluations
and computing the derivative at point zero,
deriv( 2, 4, 0, a, &bb); returns a={2, -5, 4, -1) and bb=1

thus  f''(x0) = 1/bb h^2 ( a[0] f(x0) + a[1] f(x1) + a[2] f(x2) + a[3] f(x3) )
or    f''(x0) = 1/h^2 (2 f(x0) -5 f(x0+h) +4 f(x0+2h) -1 f(x0+3h) )

Note that integers are returned and should be cast to appropriate
floating point type. Various languages require minor changes in the call.

This computation of derivatives will be used extensively in the
lectures that follow on ordinary and partial differential equations.

For non uniformly spaced x values, the program nuderiv.c
demonstrates how to compute the coefficients of f(x0), f(x1), ...
Note that the function nuderiv must be called in the application because
the coefficients depend on the specific problems x values.

The development for non uniformly spaced derivative coefficients is:
given  x0, x1, x2, x3  non uniformly spaced
find the coefficients of y0, y1, y2, y3
in order to compute the derivatives:
y'(x0) = c00*y0 + c01*y1 + c02*y2 + c03*y3
y'(x1) = c10*y0 + c11*y1 + c12*y2 + c13*y3
y'(x2) = c20*y0 + c21*y1 + c22*y2 + c23*y3
y'(x3) = c30*y0 + c31*y1 + c32*y2 + c33*y3

Method:  fit data to  y(x)    = a + b*x + c*x^2 + d*x^3
y'(x)   =     b   + 2*c*x + 3*d*x^2
y''(x)  =           2*c   + 6*d*x
y'''(x) =                   6*d

with y0, y1, y2 and y3 variables:

y0  y1  y2  y3
form:   | 1  x0  x0^2  x0^3   1   0   0   0 | = a
| 1  x1  x1^2  x1^3   0   1   0   0 | = b
| 1  x2  x2^2  x2^3   0   0   1   0 | = c
| 1  x3  x3^2  x3^3   0   0   0   1 | = d

reduce  | 1   0     0     0  a0  a1  a2  a3 | = a
| 0   1     0     0  b0  b1  b2  b3 | = b
| 0   0     1     0  c0  c1  c2  c3 | = c
| 0   0     0     1  d0  d1  d2  d3 | = d

this is just the inverse

y'(x0) =  b0*y0        + b1*y1        + b2*y2        + b3*y3        +
2*c0*y0*x0   + 2*c1*y1*x0   + 2*c2*y2*x0   + 2*c3*y3*x0   +
3*d0*y0*x0^2 + 3*d1*y1*x0^2 + 3*d2*y2*x0^2 + 3*d3*y3*x0^2

or  c00 = b0 + 2*c0*x0 + 3*d0*x0^2
c01 = b1 + 2*c1*x0 + 3*d1*x0^2
c02 = b2 + 2*c2*x0 + 3*d2*x0^2
c03 = b3 + 2*c3*x0 + 3*d3*x0^2

y'(x0) = c00*y0 + c01*y1 + c02*y2 +c03*y3  where  y0 is f(x0) etc.

y'(x1) =  b0*y0        + b1*y1        + b2*y2        + b3*y3        +
2*c0*y0*x1   + 2*c1*y1*x1   + 2*c2*y2*x1   + 2*c3*y3*x1   +
3*d0*y0*x1^2 + 3*d1*y1*x1^2 + 3*d2*y2*x1^2 + 3*d3*y3*x1^2

or  c10 = b0 + 2*c0*x1 + 3*d0*x1^2
c11 = b1 + 2*c1*x1 + 3*d1*x1^2
c12 = b2 + 2*c2*x1 + 3*d2*x1^2
c13 = b3 + 2*c3*x1 + 3*d3*x1^2

cij = bj + 2*cj*xi + 3*dj*xi^2

y'(x1) = c10*y0 + c11*y1 + c12*y2 +c13*y3

y''(x0) = 2*c0*y0    + 2*c1*y1    + 2*c2*y2    + 2*c3*y3    +
6*d0*y0*x0 + 6*d1*y1*x0 + 6*d2*y2*x0 + 6*d3*y3*x0

or  c00 = 2*c0 + 6*d0*x0
c01 = 2*c1 + 6*d1*x0
c02 = 2*c2 + 6*d2*x0
c03 = 2*c3 + 6*d3*x0

y'''(x0) = 6*d0*y0 + 6*d1*y1 + 6*d2*y2 + 6*d3*y3

or  c00 = 6*d0    independent of x
c01 = 6*d1
c02 = 6*d2
c03 = 6*d3

The function prototype is

nuderiv.h
void nuderiv(int order, int npoint, int point, double x[], double c[])

where order is the order of the desired derivative
where npoint is the desired number of points (length of x[] and c[] arrays)
where point is desired x index for the derivative
where x[] is input  x0, x1, ...
where c[] is output  c_point_0, c_point_1, ...

For reasonably smooth functions, using larger numbers of points
allows greater spacing between points and thus fewer equations in
the system of equations to be solved.

Intermediate points may then be found by multidimensional interpolation.
See Lecture 4 for method.

nuderiv.c
nuderiv.h
nuderiv_test.c
nuderiv_test.out

nuderiv.f90
nuderiv_test.f90
nuderiv_test_f90.out
inverse.f90

nuderiv.java
nuderiv_test.java
nuderiv_test_java.out

some definitions and explanation of formulas

A test application of nuderiv.c on a 2D PDE that should be exact is
pdenu22_eq.c
pdenu22_eq.out
A test application of nuderiv.adb on a 2D PDE that should be exact is
A test application of nuderiv.java on a 2D PDE that should be exact is
pdenu22_eq.java
pdenu22_eq_java.out
A test application of nuderiv.f90 on a 2D PDE that should be exact is
pdenu22_eq.f90
pdenu22_eq_f90.out

A non perfect application to a smooth function exp(x*y)*sin(x+y)
pdenu_eq.c
pdenu_eq.out
pdenu_eq.java
pdenu_eq_java.out
pdenu_eq.f90
pdenu_eq_f90.out

A test application of nuderiv.c on a 3D PDE that should be close is
Note how each term of the differential equation is processed.
This case used 1.0, 2.0, ... for coefficients yet any expression
in x,y,z could be used. A term such as u(x,y,z) * du(x,y,z)/dx
would create a non linear system of equations that would be very
difficult to solve.
pdenu23_eq.c
pdenu23_eq.out

A test application of nuderiv.adb on a 3D PDE that should be close is

A test application of nuderiv.f90 on a 3D PDE that should be close is
pdenu23_eq.f90
pdenu23_eq_f90.out

The test code was generated with Maple:
Start by choosing a solution, u(x,y,z) any function of x,y,z.
Symbolically compute the derivatives.
c(x,y,z) is computed as the RHS (right hand side) of the
chosen differential equation, dx(x,y,z) = du(x,y,z)/dx etc.



### Lecture 24a, Computing partial derivatives


Extending derivative computation to partial derivatives uses
the previous work on computing derivatives. Here we consider
the Cartesian coordinate system in 2, 3, and 4 dimensions,
computing second, third and fourth partial derivatives

The algorithms and some code shown below will be used when the
function is unknown, when we are solving partial differential
equations.

Consider a function  f(x,y) that you can compute but do not know
a symbolic representation. To find the derivatives at a point
(x,y) we will use our previously discussed "nuderiv" nonuniform
derivative function.

To compute the numerical derivatives of f(x,y) at a set of points
we choose the number of points in the x and y directions, nx, ny.
Thus we have a grid of cells typically programmed as a
two dimensional array.

The partial derivative with respect to x at any (x1,y1) point
just uses points (x0,y1) (x1,y1) (x2,y1) ... (xn,y1))

The partial derivative with respect to y at any (x1,y1) point
just uses points (x1,y0) (x1,y1) (x1,y2) ... (x1,yn))

Similarly for second, third, fourth derviatives.

The partial derivative with respect to x and y is more complicated.
We first take the derivative with respect to x, same as calculus.
But, numerically, we need the value for every y.
Then, as in calculus, we take the derivative with respect to y
of the previously computed derivative with respect to x.
(Typically we save the x derivatives as a matrix because we
typically need all the partial derivatives with respect to y.)

A code fragment could look like:

static int nx = 11;          /* includes boundary */
static int ny = 10;          /* includes boundary */
static double xg[11] = {-1.0, -0.9, -0.8, -0.6, -0.3, 0.0,
0.35, 0.65, 0.85, 1.0, 1.2};
static double yg[10] = {-1.1, -1.0, -0.9, -0.6, -0.2,
0.3,  0.8,  1.0,  1.2,  1.3};
/* nx-2 by ny-2  internal, non boundary cells */
static double cx[11];
static double cy[10];
static double cxy[11][10]; // d^2 u(x,y)/dx dy  or  Uxy

for(i=1; i<nx-1; i++)
{
for(j=1; j<ny-1; j++)  // computing inside boundary
{
/* d^2 u(x,y)/dxdy */
nuderiv(1,nx,i,xg,cx);
nuderiv(1,ny,j,yg,cy);
for(kx=0; k<nx; kx++)
{
for(ky=0; k<ny; ky++)
{
cxy[kx][ky] = cx[kx] * cy[ky]; // numerical derivative at each point
}
}
}
}   // have cxy inside the boundary



### Lecture 24b, Computing partial derivatives in polar, cylindrical, spherical


Numerical differentiation can be applied to all coordinate systems.
It gets more complicated, yet a reasonable extension of Cartesian
coordinate systems. These are all partial derivatives.

The algorithms and some code shown below will be used when the
function is unknown, when we are solving partial differential
equations in non Cartesian coordinate systems.

Polar coordinates

Consider a function  f(r,theta) that you can compute but do not know
a symbolic representation. To find the derivatives at a point
(r,theta) in a polar coordinate system we will use our previously
discussed "nuderiv" nonuniform Cartesian derivative function.

x = r * cos(Θ)      r = sqrt(x^2 + y^2)      0 < r
y = r * sin(Θ)      Θ = atan2(y,x)           0 ≤ θ < 2Π

^         ^
∇f(r,Θ) = r ∂f/∂r + Θ 1/r ∂f/∂Θ

∇2f(r,Θ) = 1/r ∂/∂r(r*∂f/∂r) + 1/r2  ∂2f/∂Θ2

= 2/r ∂f/∂r + ∂2f/∂r2) + 1/r2  ∂2f/∂Θ2

Neither radius grid "rg[]" nor theta grid "tg[]" need to be uniformly
spaced. To compute the derivatives at the green dot, (rg[3],tg[3])

Use the red circle grid points for ∂/∂r and ∂2/∂r2

Use the blue box grid points for ∂/∂Θ and  ∂2/∂Θ2

The computation, in "C" language, at rg[3],tg[3] would be:
nuderiv(1, nr, 3, rg, cr); /* nr is 5 in this example */
Ur = 0.0;
for(k=0; k<nr; k++) Ur += cr[k]*f(rg[k],tg[3]); /* ∂/∂r */

nuderiv(1, nt, 3, tg, ct); /* nt is 8 in this sample */
Ut = 0.0;
for(k=0; k<nt; k++) Ut += ct[k]*f(rg[3],tg[k]); /* ∂/∂Θ */

^
nab1r = Ur /* r ∇f(r,Θ) */

^
nab1Θ = (1.0/rg[3])*Ut; /* Θ ∇f(r,Θ) */

nuderiv(2, nr, 3, rg, crr); /* nr is 5 in this example */
Urr = 0.0;
for(k=0; k<nr; k++) Urr += crr[k]*f(rg[k],tg[3]); /* ∂2/∂r2 */

nuderiv(2, nt, 3, tg, ctt); /* nt is 8 in this example */
Utt = 0.0;
for(k=0; k<nt; k++) Utt += ctt[k]*f(rg[3],tg[k]); /* ∂2/∂Θ2 */

nab2 = (2.0/rg[3])*Ur + Urr + (1.0/(rg[3]*rg[3]))*Utt; /* ∇2f(r,Θ) */

The actual code to check these equations at all grid points is:

Cylindrical coordinates

Consider a function  f(r,theta,z) that you can compute but do not know
a symbolic representation. To find the derivatives at a point
(r,theta,z) in a cylindrical coordinate system we will use our previously
discussed "nuderiv" nonuniform Cartesian derivative function.

x = r * cos(Θ)      r = sqrt(x^2 + y^2 + z^2)    0 < r
y = r * sin(Θ)      Θ = atan2(y,x)               0 ≤ θ < 2Π
z = z               z = z

^         ^             ^
∇f(r,Θ,z) = r ∂f/∂r + Θ 1/r ∂f/∂Θ + z ∂f/∂z

∇2f(r,Θ,z) = 1/r ∂/∂r(r*∂f/∂r) + 1/r2  ∂2f/∂Θ2 + ∂2f/∂z2

= 2/r ∂f/∂r + ∂2f/∂r2) + 1/r2  ∂2f/∂Θ2 + ∂2f/∂z2

Neither radius grid "rg[]" nor theta grid "tg[]" nor the z grid "zg[]"
need to be uniformly spaced. To compute the derivatives at (rg[0],tg[4],zg[2])

Use the green box grid points for ∂/∂r; and  ∂2/∂2r

Use the blue box grid points for  ∂2/∂z2

Use the red circle grid points for  ∂2/∂Θ2

The computation, in "C" language, would be:
nuderiv(1, nr, 0, rg, cr); /* nr is 3 in this example */
Ur = 0.0;
for(k=0; k<nr; k++) Ur += cr[k]*f(rg[k],tg[4],zg[2]); /* ∂/∂r */

nuderiv(1, nt, 4, tg, ct); /* nt is 7 in this sample */
Ut = 0.0;
for(k=0; k<nt; k++) Ut += ct[k]*f(rg[0],tg[k],zg[2]); /* ∂/∂Θ */

nuderiv(1, nz, 2, zg, cz); /* nz is 5 in this sample */
Uz = 0.0;
for(k=0; k<nz; k++) Uz += cz[k]*f(rg[0],tg[4],zg[k]); /* ∂/∂z */

^
nab1r = Ur /* r ∇f(r,Θ,z) */

^
nab1Θ = (1.0/rg[0])*Ut; /* Θ ∇f(r,&Theta,z) */

^
nab1z = Uz /* z ∇f(r,Θ,z) */

nuderiv(2, nr, 0, rg, crr); /* nr is 3 in this example */
Urr = 0.0;
for(k=0; k<nr; k++) Urr += crr[k]*f(rg[k],tg[4],zg[2]); /* ∂2/∂r2 */

nuderiv(2, nt, 4, tg, ctt); /* nt is 7 in this example */
Utt = 0.0;
for(k=0; k<nt; k++) Utt += ctt[k]*f(rg[0],tg[k],zg[2]); /* ∂2/∂Θ2 */

nuderiv(2, nz, 2, zg, czz); /* nz is 5 in this example */
Uzz = 0.0;
for(k=0; k<nz; k++) Uzz += czz[k]*f(rg[0],tg[4],zg[k]); /* ∂2/∂z2 */

nab2 = (2.0/rg[0])*Ur + Urr + (1.0/(rg[0]*rg[0]))*Utt + Uzz; /* ∇2f(r,Θ,z) */

The actual code to check these equations at all grid points is:

Spherical coordinates
interchange Φ and Θ on diagram to agree with equations

x = r * cos(Θ) * sin(Φ)     r = sqrt(x^2 + y^2 + z^2)  0 < r
y = r * sin(Θ) * sin(Φ)     Θ = atan2(y,x)             0 ≤ Θ < 2Π
z = r * cos(Φ)              Φ = atan2(sqrt(x^2+y^2),z)  0 < Φ < Pi

^         ^                       ^
∇f(r,Θ,Φ) = r ∂f/∂r + Θ 1/r sin(Φ) ∂f/∂Θ + Φ 1/r ∂f/∂φ

∇2f(r,Θ,Φ) = 1/r2 ∂/∂r(r2*∂f/∂r) + 1/(r2 sin(Φ)2) ∂2/∂Θ2f + 1/(r2 sin(Φ)) ∂/∂Φ(sin(Φ) ∂/∂f&Phi);

= 2/r ∂f/∂r + ∂2f/∂r2 + 1/(r2 sin(Φ)2) ∂2f/∂Θ2 + cos(Φ)/(r2 sin(Φ)2 ∂f/∂Φ + 1/(r2  ∂2f/∂Φ2

Partial derivatives in R, Theta, Phi follow their color.
All possible are shown.

Some "C" code to compute the derivatives at (rg[2],tg[3],zg[4])

nuderiv(2, nr, 2, rg, crr); /* nr is 3 in this example */
Urr = 0.0;
for(k=0; k<nr; k++) Urr += crr[k]*f(rg[k],tg[3],pg[4]); /* ∂2/∂r2 */

nuderiv(2, nt, 3, tg, ctt); /* nt is 7 in this example */
Utt = 0.0;
for(k=0; k<nt; k++) Utt += ctt[k]*f(rg[2],tg[k],pg[4]); /* ∂2/∂Θ2 */

nuderiv(2, np, 4, pg, cpp); /* np is 5 in this example */
Upp = 0.0;
for(k=0; k<np; k++) Upp += cpp[k]*f(rg[2],tg[3],pg[k]); /* ∂2/∂Φ2 */

nab2 = (1.0/rg[0])*Ur + Urr + (1.0/(rg[0]*rg[0]))*Utt + Uzz; /* ∇2f(r,Θ,Φ) */

The actual code to check these equations at all grid points is:

Other checking of code and one figure came from
draw_sphere_deriv.java
draw_sphere_deriv_java.out



### Lecture 25, Ordinary Differential Equations

An "ordinary" differential equation has exactly one independent
variable and at least one derivative of that variable.

For notation we use y=f(x) to  say y is a function of x.
For the derivative of y with respect to x, we may write
d f(x)/dx  or we can use  dy/dx  or  y'  when the independent
variable x is understood.

One of the simplest ordinary differential equations is

y' = y

which has the analytic solution y(x) = exp(x)  the exponential function
also written as e^x .

For a second derivative we use notation d^2 y/dx^2 or y''.

Many ordinary differential equations have analytic solutions. Yet,
many do not. We are interested in ordinary differential equations
that do not have analytic solutions and must be approximated by
equation that has an analytic solution so that we can check our
numeric solution method. The test case is the classic beam problem.

given a beam of length L, from 0 < x < L
attached rigidly at both ends
with Young's Modulus of E
with moment of inertia I
with p(x) being the load density  e.g. force per unit length
with both ends fixed, meaning:
y=0 at x=0, y=0 at x=L, dy/dx=0 at x=0, dy/dx=0 at x=L
then the differential equation that defines the y position of
the beam at every x coordinate is

d^4 y
EI ----- = p(x)   with the convention for downward is negative
dx^4

for uniformly distributed force (weight) p(x) becomes  - W/L
This simple case can be integrated and solved analytically:

d^4 y
EI ----- = -W/L
dx^4

d^3 y
EI ----- = -W/L x + A   ( A is constant of integration, value found later)
dx^3

d^2 y
EI ----- = -W/L x^2/2 + A x + B
dx^2

dy
EI -- = -W/L x^3/6 + A x^2/2 + B x + C  we see C=0 because dy/dx=0 at x=0
dx

EI y = -W/L x^4/24 + A x^3/6 + B x^2/2 + D  we see D=0 because y=0 at x=0

substituting y=0 at x=0 in the equation above gives

0 = -W/L L^4/24 + A L^3/6 + B L^2/2

substituting dy/dx=0 at x=L in the equation above the above gives

0 = -W/L L^3/6 + A L^2/2 + B L

solving two equations in two unknowns  A = W/2  B = - WL/12
then substituting for A and B in  EI y = ... and combining terms

W
y = --------  x^2 (x-L)^2
24 L EI

The known solution for a specific case is valuable to check your
programming of a numerical solution before computing a more
general case of p(x) where no closed form solution may exists.

In order to find a numerical solution, set up a system of linear
equations such that the solution of the equations are values
of y(x) at some points. The codes below just use seven points, n=7,
yet could use a much larger number. Because we have a fourth order
differential equation, we will use fourth order difference equations.
The solution is fourth order (or less, actually second order) thus
we will get results with no truncation error and only small
roundoff error. The seven points will be equally spaced by h.
h = L/(n+1). The value of y(0)=0 is given and the value y(L)=0
is given. The unknown points are y(h), y(2h), ... , y(7h).

We need the difference equations for fourth order derivatives from
nderiv.out

From nderiv.out the difference equation for d y^4/dx^4 is:

d y^4/dx^4(x) =(1/h^4)( y(x)-4y(x+h)+6y(x+2h)-4y(x+3h)+y(x=4h) )

Since d y^4/dx^4 = -W/L for uniform load, we can get equations
at x=2h ... x=6h. We will need special equations at x=h
and x=7h that take into account the boundary conditions..

the partial system of equations we want looks like:

|                             |   | y( h) |   |  0   |
| -4   6  -4   1   0   0   0  |   | y(2h) |   | -W/L |
|  1  -4   6  -4   1   0   0  |   | y(3h) |   | -W/L |
|  0   1  -4   6  -4   1   0  | * | y(4h) | = | -W/L |
|  0   0   1  -4   6  -4   1  |   | y(5h) |   | -W/L |
|  0   0   0   1  -4   6  -4  |   | y(6h) |   | -W/L |
|                             |   | y(7h) |   |  0   |

The boundary conditions dy/dx(0)=0, dy/dx(L)=0
must now be applied. Looking again at nderiv.out for a
first derivative that uses 5 points we find:

dy/dx(x) = 1/12h ( -25y(x) +48y(x+h) -36y(x+2h) +16y(x+3h) -3y(x+4h) )

For x=0, and knowing y(0)=0 we get the first row of the matrix

| 48 -36  16  -3   0   0   0  |   | y( h) |   |  0   |

At x=L we find

dy/dx(x) = 1/12h ( 3y(x-4h) -16y(x-3h) +36y(x-2h) -48y(x-h) +25y(x) )

For x=L, working backward,  and knowing y(L)=0 we get
the last row of the system of equations.

| 0   0   0  -3  16 -36  48  |   | y(7h) |   |  0   |

Solve the equations and we have y(h), y(2h), ..., y(7h) as
shown in the output files below in beam2_c.out.

In the code h=1 so that y(h) = y(x=1), y(7h) = y(x=7)
Note that the rather large step size works because we are
using a high enough order method. The analytic solution
is shown with the slopes at each point. The numeric
solution at the end checks very close to the analytic
solution.

beam2.c
beam2_c.out
beam2.f90
beam2_f90.out

For more versions of static and dynamic beam equations
Lecture 28d near the end



### Lecture 26, Ordinary Differential Equations


Quick and dirty numerical solution

Using the notation y for y(x), y' for d y(x)/dx, y'' for d^2 y(x)/dx^2 etc.

To solve  a(x) y''' + b(x) y'' + c(x) y' + d(x) y + e(x) = 0
for y at one or more values of x,
with initial condition constants c1, c2, c3 at x0:
y(x0) = c1  y'(x0) = c2  y''(x0) = c3

Create a function that computes y'''
f(ypp, yp, y, x) = -(b(x)*ypp + c(x)*yp + d(x)*y + e(x))/a(x)

a(x), b(x), c(x), d(x) and e(x) may be any functions of x and
that includes constants and zero. a(x) must not equal zero in
the range of x0 .. desired largest x.

Diagram for y = y + h*yp, similar for yp = yp + h*ypp, etc.

Initialize x = x0
y = c1
yp = c2
ypp = c3
loop:     use some method with the function, f, to compute the next ypp
yppp=f(ypp, yp, y, x)
ypp = ypp + h*yppp     first order or use some higher order
yp = yp + h*ypp        or some higher order method
y  = y  + h*yp         or some higher order method
x  = x  +h             increment x and loop

Quit when you have solved for y at the largest desired x

You may save previous values of x and y, interpolate at desired
value of x to get y.
You may run again with h half as big, keep halving h until you get
approximately the same result.

Quick and dirty finished! This actually works sometimes.

"x" is called the independent variable and
"y" is called the dependent variable.

This is known as an "initial value problem"
The conditions are known at the start and the definition of the
problem is used to numerically compute each successive value of
the solution. This was the method used to compute the flight
of the rocket in homework 1.

Later, we will cover a "boundary value problem" where at least
some information is given at the beginning and end of the
independent variable(s).

Better numerical solutions

To see some more accurate solutions to very simple ODE's
Solve y' = y   y(0)=1   which is just y(x)= exp(x)

ode_exp.c
ode_exp_c.out note very small h is worse

To see some more accurate solutions to simple second order ODE's
Solve y'' = -y   y(0)=1  y'(0)=0  which is just y(x)= cos(x)

ode_cos.c
ode_cos_c.out many cycles

An example of an unstable ODE is Hopf:
p is a bifurcation parameter, constant for each solution.
Given: dx = gx(x,y,p) = p*x - y - x*(x^2+y^2)
dy = gy(x,y,p) = x + p*y - y*(x^2+y^2)
dr = sqrt(dx^2 + dy^2)
z = f(x,y,p)
Integrate from t=0 to t=tfinal
dz(x,y,p)/dt = dx/dr   and
dz(x,y,p)/dt = dy/dr

The plot below uses initial conditions  p=-0.1, x=0.4, y=0.4, t=0.0, dt=0.01
and runs to t=10.0

hopf_ode.c

A three body problem, using simple integration, shows the
sling-shot effect of gravity when bodies get close.
This is numerically very unstable.

body3.c needs OpenGL to compile and link.
Hopefully, it can be demonstrated running in class.

body3.java plain Java, different version.
Hopefully, it can be demonstrated running in class.

Center of Mass, Barycenter, general motion

For more typical ordinary differential equations there are some
classic methods. A common higher order method is Runge-Kutta fourth
order.
Given  y' = f(x,y)  and the initial values of x and y

L:   k1 = h * f(x,y)
k2 = h * f(x+h/2.0,y+k1/2.0)
k3 = h * f(x+h/2.0,y+k2/2.0)
k4 = h * f(x+h,y+k3)
y = y + (k1+2.0*k2+2.0*k3+k4)/6.0
x = x + h
quit when x>your_biggest_x
loop to L:

When the solution  y(x) is a fourth order polynomial, or lower
order polynomial, the solution will be computed with no truncation
error, yet may have some roundoff error. Very large step sizes, h,
may be used for this very special case.

In general, the solution will not be accurately approximated by
a low order polynomial. Thus, even the Runge-Kutta method may
require a very small step size in order to compute an accurate
solution. Because very small step sizes result in long computations
and may have large accumulations of roundoff error, variable
step size methods are often used.

Typically a maximum step size and a minimum step size are set.
Starting with some intermediate step size, h, the computation
proceeds as shown below.

Given  y' = f(x,y)  and the initial values of x and y

L:   y1 = y + dy1    using some method with step size h compute dy1
the value of y1 is at  x = x + h
y2a = y + dy2a  using same method with step size h/2 compute dy2a
the value of y2a is at  x = x + h/2
y2 = y2a + dy2  using same method, from y2a at x = x + h/2
with step size h/2 compute dy2
the value of y2 is at  x = x + h
((y1-y2)<0.0?(-(y1-y2)):(y1-y2))>tolerance  h = h/2, loop to L:

((y1-y2)<0.0?(-(y1-y2)):(y1-y2)) < tolerance/4    h = 2 * h
y = y2
x = x + h
loop to L: until x > final x

There are many variations on the variable step size method.
The above is one simple version, partially demonstrated by:

FitzHugh-Nagumo equations  system of two ODE's
v' = v - v^3/3.0 + r      initial v = -1.0
r' = -(v - 0.2 - 0.2*r)   initial r =  1.0
initial h = 0.001
t in 0 .. 25

fitz.c just decreasing step size
fitz.m easier to understand, MatLab

Of course, we can let MatLab solve the same ODE system

fitz2.m using MatLab ode45

The ode45 in MatLab uses a fourth order Runge-Kutta and then
a fifth order Runge-Kutta and compares the result. A neat formulation
is used to minimize the number of required function evaluations.
The fifth order method reuses some of the fourth order evaluations.

Written in "C" from p355 of textbook
fitz2.c similar to fitz.c
fitz2.out close to same results

Another coding of fourth order Runge-Kutta known as Gill's method
is the (subroutine, method, function) Runge given in a few languages.
The user provided code comes first, then Runge.
This code solves a system of first order differential equations and
thus can some higher order differential equations also, as shown above.

Systems of ordinary differential equations

The system of differential equations is N equations in N dependent
variables, Y, with one independent variable X.
Y_i = Y'_i(X, Y_1, Y_2, ... , Y_N)  for i=1,N
The user provides the code for the functions Y'_i and places
the results in the F array. X goes from initial value, set by user,
to XLIM final value set by user. The user initializes the Y_i.

sim_difeq.f90 Fortran version
sim_difeq_f90.out

sim_difeq.java Java version
sim_difeq_java.out close to same results

sim_difeq.c C version
sim_difeq_c.out close to same results

sim_difeq.m MatLab version (main)
ode_test1.m funct computes derivatives
ode_test4.m funct computes derivatives

A brief look at definitions (we will cover more later)
See  differential equations definitions .



### Lecture 27, Partial Differential Equations


Some partial differential equations are difficult to solve symbolically
and some are difficult to solve numerically.

This lecture covers the numerical solution of simple partial
differential equations where the boundary values are known.

See  simple definitions .

We are told there is an unknown function u(x,y) that satisfies
the partial differential equation:
(read 'd' as the partial derivative symbol, '^' as exponent)

d^2u(x,y)/dx^2 + d^2u(x,y)/dy^2 = c(x,y)

and we are given the function c(x,y)
and we are given the values of u(x,y) on the boundary
of the domain of x,y in xmin,ymin to xmax,ymax

To compute the numerical solution of u(x,y) at a set of points
we choose the number of points in the x and y directions, nx, ny.
Thus we have a grid of cells typically programmed as a
two dimensional array.

The number of solution points, unknowns, is typically called
the degrees of freedom, DOF.

From the above we get a step size  hx = (xmax-xmin)/(nx-1) and
hy = (ymax-ymin)/(ny-1). For our first solution we choose
hx = hy = h and just use a step size of h.

If we need u(x,y) at some point not on our solution grid,
we can use two dimensional interpolation to find the desired value.

Now, write the differential equation as a second order
numerical difference equation.
(Numerical derivatives from nderiv.out order 2, points 3, term 1)

d^2u(x,y)/dx^2 + d^2u(x,y)/dy^2 =
1/h^2(u(x-h,y)-2u(x,y)+u(x+h,y)) +
1/h^2(u(x,y-h)-2u(x,y)+u(x,y+h)) +O(h^2)

Setting the above equal to c(x,y) and collecting terms

d^2u(x,y)/dx^2 + d^2u(x,y)/dy^2 =  c(x,y) becomes

1/h^2(u(x-h,y)+u(x+h,y)+u(x,y-h)+u(x,y+h)-4u(x,y))=c(x,y)

now solve for a new u(x,y), based on neighboring grid points
u(x,y)= (u(x-h,y)+u(x+h,0)+u(x,y-h)+u(x,y+h)-c(x,y)*h^2)/4

The formula for u(x,y) is programmed based on indexes i and j
such that x=xmin+i*h, y=ymin+j*h  for i=1,nx-2  j=1,ny-2

Note that the boundary values are set and unchanged for the four
sides  i=0 for j=0,ny-1    i=nx-1 for j=0,ny-1
j=0 for i=0,nx-1    j=ny-1 for i=0,nx-1
and the interior cells are set to some value, typically zero.

Then iterate, computing new interior cells based on the previous
set of cells including the boundary.

Compute the absolute value of the difference between each new
interior cell and the previous interior cell. A stable problem
will result in this value getting monotonically smaller.

The stopping condition may be based on the above difference
value or on the number of iterations.

To demonstrate the above development, a known u(x,y) is used.
The boundary values are set and the iterations computed.
Since the solution is known, the error can be computed at
each cell and reported for each iteration.

At several iterations, the computed solution and the errors
from the known exact solution are printed.

The same code has been programmed in "C", Java, Fortran 95, Ada 95 and
Matlab as shown below with file types .c, .java, .f90, .adb and .m:

Chose the language you like best and get an understanding of
how the method above was programmed.

"C" source code  pde3.c
output of pde3.c
Fortran 95 source code  pde3.f90
output of pde3.f90
Java source code  pde3.java
output of pde3.java
Matlab 7 source code  pde3.m
output of pde3.m

You may tailor any of the above code to solve second order
partial differential equations in two independent variables.

1) You need a function  c(x,y)  that is the forcing function, the right
hand side of the differential equation.

2) You need a function  u(x,y)  that provides at least the boundary
conditions. If your  u(x,y)  is not a test solution, remove the "check"
code. And a choice of xmin, xmax, ymin, ymax.

3) If your differential equation is different, then use  nderiv.out
to find the coefficients and create the function  u00(i,j) that
computes the next value of a solution point from the surrounding
previous solution points.

The following examples will be covered for a more general case
in the next lecture. The same problem is solved as given above,
using a method of solving simultaneous linear equations, rather
than an iterative method. The files corresponding to the above
examples are with minimum modifications are:

derivation of  pde3_eq.c
"C" source code  pde3_eq.c
output of pde3_eq.c
Matlab 7 source code  pde3_eq.m
output of pde3_eq.m

Extending to four dimensions in four variables causes the
indexing to get more complex, yet the implementation is
still systematic. An example in C, Java, Ada and Fortran is:
Note that a fourth order method is used when the solution
is at most a fourth order polynomial, thus very small
errors are expected.

"C" source code  pde44_eq.c
output is pde44_eq_c.out
"C" source code  simeq.c
"C" source code  simeq.h
"C" source code  deriv.c
"C" source code  deriv.h

"java" source code  pde44_eq.java
output is pde44_eq_java.out
"java" source code  simeq.java
"java" source code  rderiv.java

"Fortran" source code  pde44_eq.f90
output is pde44_eq_f90.out
"Fortran" source code  simeq.f90
"Fortran" source code  deriv.f90

Then, the same set using sparse matrix

"C" source code  pde44_sparse.c
output is pde44_sparse_c.out
"C" source code  ssparse.c
"C" source code  sparse.h
"C" source code  deriv.c
"C" source code  deriv.h

"java" source code  pde44_sparse.java
output is pde44_sparse_java.out
"java" source code  sparse.java
"java" source code  rderiv.java

(be careful: sparse.ads is zero based subscript,

The instructor understands that some students have a strong
prejudice in favor of, or against, some programming languages.
languages, the instructor finds that the difference between
programming languages is mostly syntactic sugar. Yet, since
students may be able to read some programming languages
easier than others, these examples are presented in "C",
Fortran 90, Java, Ada and Matlab. The intent was to do a quick
translation, keeping most of the source code the same,
for the different languages. Style was not a consideration.
Some rearranging of the order was used when convenient.
In Java and Ada the quick and dirty formatting was used
and thus it is suggested to look at the formatted output of
"C" or Fortran. The numerical results are almost exactly
the same although the presentation differs considerably.
The Matlab code is a combination of "C" and Fortran given
as a single file with internal functions. The subscript base
is shifted from 0 to 1 as in Fortran.

All of this source code compiles and runs on Windows, Linux
and MacOSX. You should not care what operating system your
code will be executed on. Make it operating-system independent.



### Lecture 27a, Differential Equations Definitions


Some simplified definitions and terminology related to differential equations:

Given a function  y = f(x)
In plain text, without mathematical symbols, the derivative of
f(x) with respect to x may be written many ways:

df(x)/dx = df/dx = f'(x) = f'

or for y=f(x)  dy/dx = y'(x) = y'

Plain "d" will be used also for the partial derivative symbol.

The fundamental theorem of calculus is:

Given f(x) is continuous on the interval a <= x <= b and
F(x) is the indefinite integral of f(x) then
integral from x=a to x=b of f(x)dx = F(b) - F(a)

Derivative f(x) = dF(x)/dx = F'(x) = F'

Note that any letters, upper or lower case, may be used for any
function or variable.

Second Derivative is simply the derivative of the first derivative

d^2 f(x)/dx^2 = d (df(x)/dx)/dx = f''(x) = f''

Ordinary Differential Equation (ODE)
A differential equation with only one independent variable.
Example:  d^2 f(x)/dx^2 = -f(x)  f(0)=0, f'(0)=1 thus  f(x)=sin(x)

Partial Differential Equation (PDE)
A differential equation with more than one independent variable.
Example: dU/dx + dU/dy = f(x,y) given f(x,y)=x+y then U(x,y)=xy

Dimension of a differential equation is the number of independent
variables. Typically the independent variables are:
x for one dimension
x,y for two dimensions
x,y,z for three dimensions
x,y,z,t for four dimensions, t usually being time.
x,y,z,u,v,w,t seven independent variables for fluid problems

With many combinations, we use a shorthand, for example for
a function U(x,y,z):
Uxxx is notation for third derivative with respect to x
Uxy  is notation for partial derivative with respect to x and
with respect to y
Uyzz is notation for partial derivative with respect to y and
second partial derivative with respect to z

Order of a differential equation is the highest derivative appearing.
Example: First order:  dF/dx + dF/dy + F(x,y)^3 + x^4 + y^5 = 0
Example: Second order: d^2 F/dx^2 = F''(x)     Uxx
Example: Third order:  d^3 F/dx^3 = F'''(x)    Uxxx
Example: Fourth order: d^4 F/dx^4 = F''''(x)   Uxxxx

Degree of a differential equation is the highest power of any derivatives.
Example: First degree(linear):  d^2 F/dx + (dF/dx) = 0
Example: Second degree(quadratic):  (d^3 F/dx^3) + (dF/dx)^2 = 0
Example: Third degree(cubic):  (d^2 F/dx^2)^3 (dF/dx) = 0
Note: A "linear" differential equation has highest degree one
for all orders. No U'*U, U^2, U'*U'' etc.
The solution methods that use solving a linear system of
equations only work with linear differential equations.

Initial value differential equation problems have values given at one end
of the domain.

Boundary value differential equation problems have values given at all ends
of the domain.
One dimensional has two values.
Two dimensional has values on an enclosing path. e.g. square, circle
Three dimensional has values on an enclosing surface. e.g. cube, sphere
Four dimensional has values on an enclosing volume. e.g. hyper cube

Dirichlet boundary values are the values at a point on the boundary.
Neumann boundary values are the derivative at a point on the boundary,
typically the first derivative in the direction of the outward normal.

Mixed value partial differential equation problems may have some variables
initial value and some variables boundary value. Often the time
variable is given only as an initial value.

Types of second order, first degree, partial differential equations
in two variables, very common, thus named:

Given  A d^2 U/dx^2 + 2B d^2 U/dxdy + C d^2 U/dy^2 + other terms = f(x,x)

Parabolic  when B^2 = A C   e.g. Diffusion equation
one unique real characteristic
system has one zero eigenvalue, others
all positive or all negative

Elliptic   when B^2 < A C   e.g. Laplace's equation
two unique complex characteristics
system has eigenvalues all positive or all negative

Hyperbolic when B^2 > A C   e.g. Wave equation
two unique real characteristics
system has no zero eigenvalues and at least one
positive and one negative

Parabolic Diffusion equation: B=0, C=0
k d^2 V/dx^2 - dV/dt = 0

Elliptic Laplace's equation:  B=0, A>0, C>0
d^2 V/dx^2 + d^2 V/dy^2 = 0

Hyperbolic Wave equation:     B=0, A>0, C<0
d^2 V/dx^2 - 1/c^2 d^2 V/dt^2 = 0

The above definitions are motivated by the equation of a cone
cut by a plane at various angles, giving the conic sections:
parabola,  ellipse  and  hyperbola.

The definitions are extended by some authors to first order
equations such as  dV/dt + a dV/dx = 0 to be called a hyperbolic
one-way wave equation.

The three example equations above are called "homogeneous" because
no term has an independent variable.
More general equations have a forcing function that would replace
the "0" with  f(x,y) or f(x,t) and thus have an inhomogeneous equation.

Using web math symbols:

Laplace Equation is  ΔU = 0  or  ∇2U = 0  or

∂2U/∂x2 + ∂2U/∂y2 + ∂2U/∂z2 = 0

Poisson Equation is  ΔU = f  or  ∇2U = f  or

∂2U/∂x2 + ∂2U/∂y2 + ∂2U/∂z2 = f(x,y,z)

Some terminology is used for various methods of numerical solution
of differential equations.

The Finite Difference Method, FDM, replaces the continuous differential
operators with finite difference approximations. The order of
the approximation may be checked by substitution of the Taylor series.
The FDM is explicit if the solution at the next cell can be expressed
entirely in terms of previously computed cells. Our "nderiv".
The FDM is implicit if the solution at the next group of cells must
be represented as a set of simultaneous equations based on a
previous group of cells.

The Finite Element Method, FEM, develops a system of simultaneous
equations for the solution at every cell.
The FEM is explicit if one solution of the simultaneous equations
yields the solution of the differential equation.
The FEM is iterative if each solution of the simultaneous equations
yields the next approximation to the solution of the differential equation.

more definitions on ODE and PDE



### ODE PDE Overview


ODE Ordinary Differential Equation - one independent variable
a(x)*U''(x) + b(x)*U'(x) + c(x)*U(x) = f(x)
x is independent variable
the solution is U(x)
U'(x) is derivative of U(x) with respect to x, written Ux,
functions a(x), b(x), c(x) and f(x) must be known
enough initial conditions must be known for a unique solution

PDE Partial Differential Equation - more than one
independent variable
a(x,y)*Uxx(x,y) + b(x,y)*Uxy(x,y) + c(x,y)*Uyy(x,y) +
d(x,y)*Ux(x,y)  + e(x,y)*Uy(x,y)  + g(x,y)*U(x,y) = f(x,y)
x and y are independent variables
the solution is U(x,y)
Ux(x,y) is partial derivative of U(x,y) with respect to x,
written Ux,
Uxy(x,y) is partial derivative of U(x,y) with respect to x and
partial derivative with respect to y,
written Uxy,
functions a(x,y), b(x,y), c(x,y), d(x,y), e(x,y),
f(x,y) and g(x,y) must be known and computable.
Enough boundary condition must be known for a unique solution.
in three dimensions x,y becomes x,y,z
in four dimensions  x,y becomes x,y,z,t

Methods for computing numerical solutions replace the continuous
variable x with discrete values x1, x2, x3, ... , x_nx
Uniformly spaced values may be used, given xmin, xmax and h,
xmin, xmin+h, xmin+2h, ... , xmax   nx=1+(xmax-xmin)/h
Discrete values are also used for y, z, and t

The numerical solution is computed at the discrete values of
the independent variables  U(x1), U(x2), U(x3), ... , U(x_nx)
or  U(x1,y1), U(x1,y2), U(x2,y2), ... ,  U(x_nx,y_ny), etc.
The solution is a set of numbers, often written as
U1, U2, ... U_xn  or  U[1,1], U[1,2], U[2,2], ... U[nx,ny]

A numerical solution may use a uniform grid or a
nonuniform grid.

A numerical solution method may be iterative, computing
a closer approximation at each step.

A numerical solution may set up a system of linear equations
to solve for the solution values.

A numerical solution may use a combination of iterative and
linear equations to solve for the solution values.
Not that a non linear differential equation will create a
non linear system of equations to solve. Very difficult!

All of the above statements apply to a set of methods,
generally called discretization.

FEM Finite Element Method is a set of methods for finding the
numerical solution of a differential equation. Within FEM
there are various sub methods including the Galerkin method.

FEM may use equally spaced or variable spaced values for
the independent variables.

FEM may also use triangles, quads, or other polygons for two
independent variables.

FEM may also use tetrahedrons or other solids for three
independent variables. And four dimensional objects for four
independent variables.

see FEM lecture for the rest of the explanation.
cs455_l32.html Finite Element Method



### Lecture 28, Partial Differential Equations 2


We now extend the analysis and solution to three dimensions and
unequal step sizes.

We are told there is an unknown function u(x,y,z) that satisfies
the partial differential equation:
(read 'd' as the partial derivative symbol, '^' as exponent)

d^2u(x,y,z)/dx^2 + d^2u(x,y,z)/dy^2 + d^2u(x,y,z)/dz^2 = c(x,y,z)

and we are given the function c(x,y,z)
and we are given the values of u(x,y,z) on the boundary
of the domain of x,y,z in xmin,ymin,zmin to xmax,ymax,zmax

To compute the numerical solution of u(x,y,z) at a set of points
we choose the number of points in the x,y and z directions, nx,ny,nz.
The number of non boundary points is called the degrees of freedom, DOF.

Thus we have a grid of cells typically programmed as a
three dimensional array.

From the above we get a step size  hx = (xmax-xmin)/(nx-1) ,
hy = (ymax-ymin)/(ny-1) and hz = (zmax-zmin)/(nz-1).

If we need u(x,y,z) at some point not on our solution grid,
we can use three dimensional interpolation to find the desired value.

Now, write the differential equation as a second order
numerical difference equation. The analytic derivatives are
made discrete by numerical approximation. "Discretization."

(Numerical derivatives from nderiv.out order 2, points 3, term 1)

d^2u(x,y,z)/dx^2 + d^2u(x,y,z)/dy^2 + d^2u(x,y,z)/d^2z =
1/h^2(u(x-h,y,z)-2u(x,y,z)+u(x+h,y,z)) +
1/h^2(u(x,y-h,z)-2u(x,y,z)+u(x,y+h,z)) +
1/h^2(u(x,y,z-h)-2u(x,y,z)+u(x,y,z+h)) + O(h^2)

Setting the above equal to c(x,y,z)

d^2u(x,y,z)/dx^2 + d^2u(x,y,z)/dy^2 + d^2u(x,y,z)/d^2z =  c(x,y,z) becomes

1/hx^2(u(x-hx,y,z)+u(x+hx,y,z)-2u(x,y,z)) +
1/hy^2(u(x,y-hy,z)+u(x,y+hy,z)-2u(x,y,z)) +
1/hz^2(u(x,y,z-hz)+u(x,y,z+hz)-2u(x,y,z)) = c(x,y,z)

now solve for new u(x,y,z), based on neighboring grid points
u(x,y,z) = ((u(x-hx,y,z)+u(x+hx,y,z))/hx^2 +
(u(x,y-hy,z)+u(x,y+hy,z))/hy^2 +
(u(x,y,z-hz)+u(x,y,z+hz))/hz^2 - c(x,y,z)) /
(2/hx^2 + 2/hy^2 + 2/hz^2)

The formula for u(x,y,z) is programmed based on indexes i,j,k
such that x=xmin+i*hx, y=ymin+j*hy, z=zmin+k*hz
for i=1,nx-2  j=1,ny-2  k=1,nz-2

Note that the boundary values are set and unchanged for the six
sides  i=0 for j=0,ny-1 k=0,nz-1   i=nx-1 for j=0,ny-1 k=0,nz-1
j=0 for i=0,nx-1 k=0,nz-1   j=ny-1 for i=0,nx-1 k=0,nz-1
k=0 for i=0,nx-1 j=0,nx-1   k=nz-1 for i=0,nx-1 j=0,ny-1
and the interior cells are set to some value, typically zero.

Then iterate, computing new interior cells based on the previous
set of cells including the boundary.

Compute the absolute value of the difference between each new
interior cell and the previous interior cell. A stable problem
will result in this value getting monotonically smaller.

The stopping condition may be based on the above difference
value or on the number of iterations.

To demonstrate the above development, a known u(x,y,z) is used.
The boundary values are set and the iterations computed.
Since the solution is known, the error can be computed at
each cell and reported for each iteration.

At several iterations, the computed solution and the errors
from the known exact solution are printed.

The same code has been programmed in "C", Java, Fortran 95 and Ada 95
as shown below with file types .c, .java, .f90 and .adb:

"C" source code  pde3b.c
output of pde3b.c
Fortran 95 source code  pde3b.f90
output of pde3b.f90
Java source code  pde3b.java
output of pde3b.java
Matlab 7 source code  pde3b.m
output of pde3b.m

It should be noted that a direct solution is possible.
The iterative solution comes from a large set of simultaneous
equations that may be able to be solved accurately. For the
same three dimensional problem above, observe the development
of the set of simultaneous equations:

We are told there is an unknown function u(x,y,z) that satisfies
the partial differential equation:
(read 'd' as the partial derivative symbol, '^' as exponent)

d^2u(x,y,z)/dx^2 + d^2u(x,y,z)/dy^2 + d^2u(x,y,z)/dz^2 = c(x,y,z)

and we are given the function c(x,y,z)
and we are given the values of u(x,y,z) on the boundary ub(x,y,z)
of the domain of x,y,z in xmin,ymin,zmin to xmax,ymax,zmax

To compute the numerical solution of u(x,y,z) at a set of points
we choose the number of points in the x,y and z directions, nx,ny,nz.
Thus we have a grid of cells typically programmed as a
three dimensional array.

From the above we get a step size  hx = (xmax-ymin)/(nx-1) ,
hy = (ymax-ymin)/(ny-1) and hz = (xmax-zmin)/(nz-1).

If we need u(x,y,z) at some point not on our solution grid,
we can use three dimensional interpolation to find the desired value.

Now, write the differential equation as a second order
numerical difference equation.
(Numerical derivatives from nderiv.out order 2, points 3, term 1)

d^2u(x,y,z)/dx^2 + d^2u(x,y,z)/dy^2 + d^2u(x,y,z)/d^2z =
1/h^2(u(x-h,y,z)-2u(x,y,z)+u(x+h,y,z)) +
1/h^2(u(x,y-h,z)-2u(x,y,z)+u(x,y+h,z)) +
1/h^2(u(x,y,z-h)-2u(x,y,z)+u(x,y,z+h)) + O(h^2)

Setting the above equal to c(x,y,z)

d^2u(x,y,z)/dx^2 + d^2u(x,y,z)/dy^2 + d^2u(x,y,z)/d^2z =  c(x,y,z) becomes

1/hx^2(u(x-hx,y,z)+u(x+hx,y,z)-2u(x,y,z)) +
1/hy^2(u(x,y-hy,z)+u(x,y+hy,z)-2u(x,y,z)) +
1/hz^2(u(x,y,z-hz)+u(x,y,z+hz)-2u(x,y,z)) = c(x,y,z)

Now, rearrange terms to collect the u(x,y,z) coefficients:

u(x-hx,y,z)/hx^2 + u(x+hx,y,z)/hx^2 +
u(x,y-hy,z)/hy^2 + u(x,y+hy,z)/hy^2 +
u(x,y,z-hz)/hz^2 + u(x,y,z+hz)/hz^2 -
u(x,y,z)*(2/hx^2 + 2/hy^2 + 2/hz^2) = c(x,y,z)

We build a linear system of simultaneous equations M U = V
by computing the values of the matrix M and right hand side vector V.
(The system of equations is linear if the differential equation is linear,
otherwise a system of nonlinear equations must be solved.)
The linear subscripts of U corresponding to u(i,j,k) is
m = (i-1)*(ny-2)*(nz-2) + (j-1)*(nz-2) + k
and the rows and columns and V use the same linear subscripts.

Now, using indexes rather than x, x+hx, x-hx,
solve system of linear equations for u(i,j,k).
u(i,j,k)=u(xmin+i*hx, ymin+j*hz, zmin+k*hz)

u(i-1,j,k)/hx^2 + u(i+1,j,k)/hx^2 +
u(i,j-1,k)/hy^2 + u(i,j+1,k)/hy^2 +
u(i,j,k-1)/hz^2 + u(i,j,k+1)/hz^2 -
u(i,j,k)*(2/hx^2 + 2/hy^2 + 2/hz^2) = c(x,y,z)

for i=1,nx-2 j=1,ny-2 k=1,nz-2 while substituting
u(i,j,k) boundary values when i=0 or nx-1, j=0 or ny-1,
k=0 or nz-1 using x=xmin+i*hx, y=ymin+j*hz, z=zmin+k*hz

Remember: hx, hy, hz are known constants, c(x,y,z) is computable.

The matrix has m = (nx-2)*(ny-2)*(nz-2) rows and columns for
solving for m equations in m unknowns. The right hand side vector
is the c(x,y,z) values with possibly boundary terms subtracted.
The unknowns are u(i,j,k) for i=1,nx-2, j=1,ny-2, k=1,nz-2 .

The first row of the matrix is for the unknown u(1,1,1) i=1, j=1, k=1

ub(xmin+0*hx,ymin+1*hy,zmin+1*hz)/hx^2 + u(2,1,1)/hx^2 +
ub(xmin+1*hx,ymin+0*hy,zmin+1*hz)/hy^2 + u(1,2,1)/hy^2 +
ub(xmin+1*hx,ymin+1*hy,zmin+0*hz)/hz^2 + u(1,1,2)/hz^2 +
u(1,1,1)*(2/hx^2 + 2/hy^2 + 2/hz^2) = c(xmin+1*hx,ymin+1*hy,zmin+1*hz)

Matrix cells in first row are zero except for:
M(1,1,1)(1,1,1) = 1/(2/hx^2 + 2/hy^2 + 2/hz^2)
M(1,1,1)(1,1,2) = 1/hz^2
M(1,1,1)(1,2,1) = 1/hy^2
M(1,1,1)(2,1,1) = 1/hx^2
V(1,1,1) =   c(xmin+1*hx,ymin+1*hy,zmin+1*hz)
-ub(xmin+0*hx,ymin+1*hy,zmin+1*hz)/hx^2
-ub(xmin+1*hx,ymin+0*hy,zmin+1*hz)/hy^2
-ub(xmin+1*hx,ymin+1*hy,zmin+0*hz)/hz^2

Note that three of the terms are boundary terms and thus have a
computed value that is subtracted from the right hand side vector V.

The row of the matrix for the u(2,2,2) unknown has no boundary terms

u(1,2,2)/hx^2 + u(3,2,2)/hx^2 +
u(2,1,2)/hy^2 + u(2,3,2)/hy^2 +
u(2,2,1)/hz^2 + u(2,2,3)/hz^2 +
u(2,2,2)*(2/hx^2 + 2/hy^2 + 2/hz^2) = c(xmin+2*hx,ymin+2*hy,zmin+2*hz)

Matrix cells in row (2,2,2) are zero except for:
M(2,2,2)(1,2,2) = 1/hx^2
M(2,2,2)(3,2,2) = 1/hx^2
M(2,2,2)(2,1,2) = 1/hy^2
M(2,2,2)(2,3,2) = 1/hy^2
M(2,2,2)(2,2,1) = 1/hz^2
M(2,2,2)(2,2,3) = 1/hz^2
M(2,2,2)(2,2,2) = 1/(2/hx^2 + 2/hy^2 + 2/hz^2)
V(2,2,2) = c(xmin+2*hx,ymin+2*hy,zmin+2*hz)

The number of rows in the matrix is equal to the number of
equations to solve simultaneously and is equal to the
number of unknowns, the degrees of freedom.

The matrix is quite sparse and may be considered "band diagonal".
Typically, special purpose simultaneous equation solvers are
used, often with preconditioners.

A PDE that has all constant coefficients, in this case,
there are only four possible values in the matrix. These are
computed only once at the start of building the matrix.
A PDE with functions of x,y,z as coefficients is covered later.

Solving the system of equations for our test problem gave
the error on the order of 10^-12 compared with
the iterative solution error of 10^-4 after 150 iterations.

The same code has been programmed in "C", Java, Fortran 95, Ada 95 and
Matlab 7 as shown below with file types .c, .java, .f90, .adb and .m:

"C" source code  pde3b_eq.c
output of pde3b_eq.c
Fortran 95 source code  pde3b_eq.f90
output of pde3b_eq.f90
Java source code  pde3b_eq.java
output of pde3b_eq.java
Matlab 7 source code  pde3b_eq.m
output of pde3b_eq.m

Please note that the solution of partial differential equations
gets significantly more complicated if the solution is not
continuous or is not continuously differentiable.

You can tailor the code presented above for many second order partial
differential equations in three independent variables.

1) You must provide a function  u(x,y,z)  that computes the solution
on the boundaries. It does not have to be a complete solution and
the "check" code should be removed.

2) You must provide a function  c(x,y,z)  that computes the
forcing function, the right hand side of the differential equation.

3a) If the differential equation is not d^2/dx^2 + d^2/dy^2 +d^2/dz^2
then use coefficients from  nderiv.out  to create a new function
u000(i,j,k) that computes a new value of the solution at i,j,k
from the neighboring coordinates for the next iteration.

3b) If the differential equation is not d^2/dx^2 + d^2/dy^2 +d^2/dz^2
then use coefficients from  nderiv.out  to create a new matrix
initialization function and then solve the simultaneous equations.

3c) There may be constant or function multipliers in the given PDE.
For example  exp(x+y/2) * d^2/dx^2 + 7 * d^2/dy^2  etc.
Constant multipliers are just multiplied when creating the
discrete equations. Functions of the independent variables, x,y
must be evaluated at the point where the discrete derivative is
being applied.

For an extended example of discretization of a fourth order PDE
in two dimensions see discrete2d.txt



### Lecture 28a, Higher Order, Higher Dimension


We can extend solutions to four dimensions, typically the physical
dimensions x,y,z and time t.

We can extend to solutions that have fourth derivatives in all
four dimensions.

A "well posed PDE problem" is defined as a specific PDE with
specific boundary conditions such that:
1) The solution is unique.
2) The solution is continuous inside and on the boundaries.
3) The solution is continuously differentiable.

A well posed problem is ideal for high order methods because
the solution can be approximated by a polynomial. The polynomial
may have to be a high order polynomial.

With a1, a2, ... a5 being computable functions a1(x) etc.
The notation Ux means first derivative of U(x) with respect to x.
Uxxxx means fourth derivative of U(x) with respect to x.
The maximal linear PDE in one dimension is
a fourth order PDE in one dimension:

a1*Uxxxx + a2*Uxxx + a3*Uxx + a4*Ux + a5*U = f(x)

With a1, a2, ... a15 being computable functions a1(x,y) etc.
The maximal linear PDE that can be solved in two dimensions is
The notation Uxy means the derivative of U(x,y)
with respect to x and with respect to y.
The maximal linear PDE in two dimensions is
a fourth order PDE in two dimensions:

a1*Uxxxx + a2*Uxxxy + a3*Uxxyy + a4*Uxyyy + a5*Uyyyy +
a6*Uxxx  + a7*Uxxy  + a8*Uxyy  + a9*Uyyy  +
a10*Uxx  + a11*Uxy  + a12*Uyy  +
a13*Ux   + a14*Uy   + a15*U    = f(x,y)

With a1, a2, ... a70 being computable functions a1(x,y,z,t) etc.
The notation Uxyzt means the derivative of U(x,y,x,t)
with respect to x and with respect to y and with respect to z
and with respect to t.
The maximal linear PDE in four dimensions is
a fourth order PDE in four dimensions:

a1*Uxxxx +  a2*Uxxxy +  a3*Uxxxz +  a4*Uxxxt +  a5*Uxxyy +
a6*Uxxyz +  a7*Uxxyt +  a8*Uxxzz +  a9*Uxxzt + a10*Uxxtt +
a11*Uxyyy + a12*Uxyyz + a13*Uxyyt + a14*Uxyzz + a15*Uxyzt +
a16*Uxytt + a17*Uxzzz + a18*Uxzzt + a19*Uxztt + a20*Uxttt +
a21*Uyyyy + a22*Uyyyz + a23*Uyyyt + a24*Uyyzz + a25*Uyyzt +
a26*Uyytt + a27*Uyzzz + a28*Uyzzt + a29*Uyztt + a30*Uyttt +
a31*Uzzzz + a32*Uzzzt + a33*Uzztt + a34*Uzttt + a35*Utttt +
a36*Uxxx  + a37*Uxxy  + a38*Uxxz  + a39*Uxxt  + a40*Uxyy  +
a41*Uxyz  + a42*Uxyt  + a43*Uxzz  + a44*Uxzt  + a45*Uxtt  +
a46*Uyyy  + a47*Uyyz  + a48*Uyyt  + a49*Uyzz  + a50*Uyzt  +
a51*Uytt  + a52*Uzzz  + a53*Uzzt  + a54*Uztt  + a55*Uttt  +
a56*Uxx   + a57*Uxy   + a58*Uxz   + a59*Uxt   + a60*Uyy   +
a61*Uyz   + a62*Uyt   + a63*Uzz   + a64*Uzt   + a65*Utt   +
a66*Ux    + a67*Uy    + a68*Uz    + a69*Ut    + a70*U = f(x,y,z,t)

Sample code for testing with constant coefficients is:
"C" source code  test_4d.c
rderiv being tested  deriv.c
"C" test code  check_test_4d.c
"C" test result  check_test_4d.out
This includes u(x,y,z,t) and all derivatives through fourth order.

Just changing number of points and limits for more checking
"C" test code  check2_test_4d.c
"C" test result  check2_test_4d.out

Checking non uniform grid
"C" test code  checknu_test_4d.c
"C" test result  checknu_test_4d.out
rderiv being tested  nuderiv.c

This includes u(x,y,z,t) and all derivatives through fourth order.

Fortran module  test_4d.f90
rderiv1 being tested  deriv.f90
Fortran test code  check_test_4d.f90
Fortran test result  check_test_4d_f90.out
This includes u(x,y,z,t) and all derivatives through fourth order.

We may need to extend the computation to use higher order methods
to obtain the needed accuracy.

A few samples include:

Four dimensions, fourth order
"C" source code  pde44_eq.c
output is pde44_eq_c.out
"C" source code  simeq.h
"C" source code  simeq.c
"C" source code  deriv.h
"C" source code  deriv.c

"java" source code  pde44_eq.java
output is pde44_eq_java.out
"java" source code  simeq.java
"java" source code  rderiv.java

Fortran source code  pde44_eq.f90
output is pde44_eq_f90.out
Fortran source code  simeq.f90
Fortran source code  deriv.f90

From Lecture 32, using Finite Element Method, FEM

fem_check44_la.c fourth order, four dimension
fem_check44_la_c.out  output with debug print

fem_check44_la.f90 fourth order, four dimension
fem_check44_la_f90.out  output with debug print

fem_check44_la.java fourth order, four dimension
fem_check44_la_java.out  output with debug print

Other files, that are needed by some examples above:

laphi.c  code through 4th derivative
laphi.f90 module through 4th derivative
laphi.java class through 4th derivative

This example computes the error of the numeric solution
against the definition of the PDE, and against the known solution.
The numeric solution is written to a file and gnuplot is used.
For the simple Laplace Equation, the maxerror verses the
size, degrees of freedom and quadrature order,
xmax, ymax, nx, ny, npx, npy     are shown in:
fem_laplace_la.java various cases
fem_laplace_la_java.out output

Solution U(x,y) = exp(x)*sin(y)
xmin=0.0, ymin=0.0       DOF      quadrature    maximum error against known
xmax=1.0, ymax=1.0, nx= 4, ny= 4, npx=3, npy=3, maxerr=3.88E-4
xmax=1.0, ymax=1.0, nx= 4, ny= 4, npx=4, npy=4, maxerr=2.62E-4
xmax=1.0, ymax=1.0, nx= 4, ny= 4, npx=6, npy=6, maxerr=2.62E-4
xmax=1.0, ymax=1.0, nx= 6, ny= 6, npx=4, npy=4, maxerr=3.94E-6
xmax=1.0, ymax=1.0, nx= 6, ny= 6, npx=6, npy=6, maxerr=1.33E-6
xmax=1.0, ymax=1.0, nx= 6, ny= 6, npx=8, npy=8, maxerr=1.33E-6
xmax=1.0, ymax=1.0, nx= 8, ny= 8, npx=6, npy=6, maxerr=4.64E-9
xmax=1.0, ymax=1.0, nx= 8, ny= 8, npx=8, npy=8, maxerr=2.27E-9
xmax=1.5, ymax=1.5, nx= 8, ny= 8, npx=8, npy=8, maxerr=1.09E-7
xmax=3.0, ymax=3.0, nx= 8, ny= 8, npx=8, npy=8, maxerr=8.71E-5
xmax=3.0, ymax=3.0, nx=10, ny=10, npx=8, npy=8, maxerr=1.38E-6
xmax=3.0, ymax=6.3, nx=10, ny=10, npx=8, npy=8, maxerr=2.06E-4

non uniform distributed vertices

A PDE of fourth order in four dimensions with non uniformly
distributed vertices of arbitrary geometry is difficult to
solve accurately. The building blocks are non uniform
four dimensional discretization, nuderiv4d.java.
The test program to check all 70 possible derivatives is
test_nuderiv4d.java.
A simple PDE
Uxxxx(x,y,z,t) + Uyyyy(x,y,z,t) + Uzzzz(x,y,z,t) + Utttt(x,y,z,t) =
2*(exp(x)+exp(y)+exp(z)+exp(t))
with solution
U(x,y,z,t) = exp(x)+exp(y)+exp(z)+exp(t)
is shown below.
It runs very fast, yet accuracy problems as number of unknown
vertices increases.
This is very sensitive to the independence of the underlying
grid points.

nuderiv4d.java basic discretization
test_nuderiv4d.java test discretization
pde44e_nuderiv4d.java PDE  source code
pde44e_nuderiv4d_1_java.out PDE  solution

A third degree PDE over the more linearly independent 4D grid points
shows the error degradation with more free vertices.
The output has three accuracy tests, for 1, 5 and 10 free nodes.

pde34_nuderiv4d.java PDE  source code
pde34_nuderiv4d_1_java.out PDE  solution

non uniform distributed vertices in Ada

Basically same code as above, in Ada

An extended example of discretization of a fourth order PDE in two
non uniform dimensions is shown in discrete2d.txt

Error increases with higher derivative and fewer points

The minimum number of points to compute a derivative of order n is n+1.
More points used decreases the error until numeric failure.
e.g. without multiple precision, third and fourth order derivative
fail with 11 points. An example of the variation of error from
computing the derivative on a sine wave is shown by

test_deriv.c half and full cycle
deriv.c basic discretization, uniform spacing
test_deriv_c.out output showing errors

Non uniform discretization follows the same error pattern and the maximum
error goes approximately as the largest spacing between points.



### Lecture 28d, Biharmonic PDE using Higher Order


Using Galerkin FEM on a fourth order Biharmonic PDE.
Two and three dimensions and parallel versions for C, Java, Ada and Python

Several versions of the Biharmonic PDE are used. First version is

∂4U(x,y)/∂x4 + 2 ∂4U(x,y)/∂x2∂y2 +  ∂4U(x,y)/∂y4 +
2 ∂2U(x,y)/∂x2 + 2 ∂2U(x,y)/∂y2 + 2 U(x,y) = f(x,y)

First in two dimensions, then three dimensions.
Very few degrees of freedom are needed by using high order
shape functions and high order quadrature.

First, create test case with known solution.
Then, test that the test case is correctly coded in a language of your choice.
source code  test_bihar2d.java
test output  test_bihar2d.out

Plot after clicking "next" a few times.

Next, include your test case in a PDE solver.
This case uses a previously covered Galerkin FEM with Lagrange shape functions.
Several choices higher order integration was used.
Several choices of degrees of freedom was used.

source code  fem_bihar2d_la.java
output  fem_bihar2d_la_java.out
source code  fem_bihar2d_la.c
output  fem_bihar2d_la_c.out

Plot of final solution, file plotted by gnuplot:

A multi language set of fourth order biharmonic PDE solutions.
High order shape function, high order quadrature, small DOF.
Wall time and checking solution against PDE included.
Expected solution is  sin(x)*sin(y)

source code  fem_bihar2dps_la.java
output  fem_bihar2dps_la_java.out
source code  fem_bihar2dps_la.c
output  fem_bihar2dps_la_c.out
source code  fem_bihar2dps_la.f90
output  fem_bihar2dps_la_f90.out
source code  fem_bihar2dps_la.py
output  fem_bihar2dps_la_py.out

Now a three dimensional, fourth order Biharmonic PDE,
solved with few degrees of freedom and high order quadrature.

First, create test case with known solution.
Then, test that the test case is correctly coded in a language of your choice.
source code  test_bihar3d.java
test output  test_bihar3d.out

Plot after clicking "next" a few times.

Next, include your test case in a PDE solver.
This case uses a previously covered Galerkin FEM with Lagrange shape functions.
Several choices higher order integration was used.
Several choices of degrees of freedom was used.

source code  fem_bihar3d_la.java
output  fem_bihar3d_la_java.out
source code  fem_bihar3d_la.c
output  fem_bihar3d_la_c.out

Plot of final solution at z=zmin, file plotted by gnuplot:

Using discretization on a fourth order Biharmonic PDE.

∂4U(x,y,z,t)/∂x4 + ∂4U(x,y,z,t)/∂y4 +  ∂4U(x,y,z,t)/∂z4 +  ∂4U(x,y,z,t)/∂t4 +
2 ∂2U(x,y,z,t)/∂x2 + 2 ∂2U(x,y,z,t)/∂y2 +  2 ∂2U(x,y,z,t)/∂z2 + 2 ∂2U(x,y,z,t)/∂t2 + 2 U(x,y,z,t) = f(x,y,z,t)

source code  pde_bihar44t_eq.java
source code  plot4d.java
output pde_bihar44t_eq_java.out
plot data  pde_bihar44t_eq.dat
output  plot4d.out

Three sets of boundary for homogeneous Biharmonic PDE four dimensions
show variation in accuracy of solution.
This code writes the solution to a file for plotting with plot4d_gl.

Other languages, homogeneous Biharmonic PDE in four dimensions
"C" source code  pde44h_eq.c
output  pde44h_eq_c.out
Fortran source code  pde44h_eq.f90
Java source code  pde44h_eq.java
output  pde44h_eq_java.out

Now, make both the Java, C and Python code parallel, using threads,
for a shared memory computer:
(coming soon for distributed memory)

source code  femth_bihar2dps_la.java
output  femth_bihar2dps_la_java.out
source code  fempt_bihar2dps_la.c
output  fempt_bihar2dps_la_c.out
source code  femth_bihar2dps_la.py
output  femth_bihar2dps_la_py.out

Other files, that are needed by some examples above:

Java

nderiv.java first to very high numerical derivatives
gaulegf.java low to very high order quadrature
laphi.java first through fourth order shape functions and derivatives
fem_bihar2dps_la.plot for gnuplot
femth_bihar2dps_la.plot for gnuplot

rderiv.adb first to very high numerical derivatives
test_deriv_ada.out accuracy limit 4th order, 11 points
laphi.ads first through fourth order shape functions and derivatives

C

nderiv.c first to very high numerical derivatives
nderiv.h
gaulegf.c low to very high order quadrature
gaulegf.h
simeq.h
laphi.c first through fourth order shape functions and derivatives
laphi.h
fem_bihar2dps_la_c.plot for gnuplot
femth_bihar2dps_la_c.plot for gnuplot

Fortran

nderiv.f90 first to very high numerical derivatives
gaulegf.f90 low to very high order quadrature
laphi.f90 first through fourth order shape functions and derivatives
fem_bihar2dps_la_f90.plot for gnuplot

Python

deriv.py first to very high numerical derivatives
gauleg.py low to very high order quadrature
laphi.py first through fourth order shape functions and derivatives
fem_bihar2dps_la_py.plot for gnuplot
femth_bihar2dps_la_py.plot for gnuplot

Using discretization, more difficult to program, yet faster
Works for non uniform grid in both X and Y

pde_bihar2d_eq.java very fast
simeq.java very accurate for reasonable DOF
nuderiv.java very accurate for reasonable grid
pde_bihar2d_eq_java.out output
pde_bihar2d_eq.dat output data
pde_bihar2d_eq.plot plot
pde_bihar2d_eq.sh plot

pde_bihar2d_eq.c very fast
simeq.c
nuderiv.c
pde_bihar2d_eq_c.out output
pde_bihar2d_eq_c.plot plot

A difficult first order PDE in 4 dimensions
pde4sin_eq.dat

Another fourth order PDE
Kuramoto-Sivashinsky equation, is shown in Lecture 31b

More fourth order PDE
The Euler Bernoulli Beam Equation (static)

∂2w/∂x2 ( E I  ∂2w/∂x2 ) = q

With EI constant

∂4w/∂x4 = q(x)

The Euler Lagrange Beam Equation (dynamic)

∂2w/∂x2 ( E I  ∂2w/∂x2 ) = -μ ∂2w/∂t2 = q(x,t)

can be four independent variables w(x,y,x,t)

Many methods are needed for various PDE's
The Laplace Equation

∂2U/∂x2 + ∂2U/∂y2 = 0

Given the boundary conditions for 17 vertices in X and Y on
the unit square, using 17 points for derivatives, efficiently
solves U(x,y)=cos(k*x)*cosh(k*x) where k is about 2Pi.

Absolute error order 10E-5, with solution about -300 to 300

pde_laplace.c very fast
pde_laplace_c.out print output
pde_laplace_c.dat data output
pde_laplace_c.sh gnu plot
pde_laplace_c.plot plot

Another Laplace forth order, four dimension,

This PDE needed 12th order discretization to obtain an
accuracy near 0.0001. Almost all of the execution time
was used solving 10,000 equations in 10,000 unknowns, DOF.
10 cores were used to keep the execution time to below
20 minutes rather than about 3 hours for a single core
execution.

∂4U(x,y,z,t)/∂x4 + ∂4U(x,y,z,t)/∂y4 +  ∂4U(x,y,z,t)/∂z4 +  ∂4U(x,y,z,t)/∂t4 +
2 ∂4U(x,y,z,t)/∂x2∂y2 + 2 ∂4U(x,y,z,t)/∂x2∂z2 +
2 ∂4U(x,y,z,t)/∂x2∂t2 + 2 ∂4U(x,y,z,t)/∂y2∂z2 +
2 ∂4U(x,y,z,t)/∂y2∂t2 + 2 ∂4U(x,y,z,t)/∂z2∂t2 -
16 U(x,y,z,t) = f(x,y,z,t) = 0

pde_bihar44tl_eq.c
pde_bihar44tl_eq_c.out
tsimeq.c

Plotted with plot4d_gl, one static view of the 4D solution is:



### Lecture 28b, Navier Stokes case study

One version of Navier Stokes equation

Each term above has units of length over time squared, acceleration,
in meters per second squared. u is the three component velocity
vector, each component in meters per second, ρ is the fluid density
in kilograms per cubic meter, p is the pressure in newtons per square meter,
μ is the dynamic viscosity newton seconds per square meter.
X is externally applied acceleration, F/m, newtons per kilogram.

Another representation of Navier Stokes equation, multiplying the
above equation by ρ, yields terms that are force per unit volume,
newtons per cubic meter. del  ∇  is the differential operator,
gradient. v is the velocity vector. b is externally applied acceleration.

navier_gl.txt notes
navier_gl.c work in progress
navier_gl.out output attempt

1) You are expected to be intelligent enough to know the practical
limitations of equations that are given to you. When you develop
the equations, you must take into account practical limitations.
Many scholarly textbooks on differential equations make the
statement that a differential equation describing the physical
world is an approximation of what will actually happen. If you
keep stretching a spring using F = k x, the equation only
applies until the spring breaks.

2) We use the statement that "mass is conserved" in many physical
equations. Understand that the limits, usually unstated, are
"in a closed system and with no E = M C^2" Typically fluid equations
do not account for evaporation or fluid loss do to leaks or spillage.
The equations assume an absolutely closed system.

3) We use, as a fact "energy is conserved" in many physical equations.
Energy may take many forms and energy is easily converted from one
form to another and may not be taken into account by a specific equation.
For example, potential energy, a given mass at a given height, may
be released and potential energy is converted to kinetic energy.
Then, splat, the mass hits the ground in an inelastic collision and
has zero potential and zero kinetic energy. Most of the energy is
converted to heat energy with the possible conversion to some
chemical energy and some electrical energy.
Understand that heat and electrical energy can take the form
of radiation and leave what you are considering a "closed system."

4) Momentum is not conserved during a non elastic collision!
Fortunately, molecules of most fluids do not stick very often.
Surface tension must be taken into account when the fluid has
a surface open to the atmosphere. Capillary action affect must
be taken into account when a fluid "wets" a vertical surface.

5) When considering an infinitesimal volume, for example a cube in
a Cartesian measurement system. We generally take z to be the
vertical axis, meaning along a gravitational force line. We
consider a non compressible fluid in this cube with side s, to have
pressure applied to all six faces. Several cases are possible:
a) The fluid is stationary, what are the pressure on each face?
We define pressure as positive force divided by area into the cube.
aa) The cube of fluid has the same fluid on all sides.
Because of gravity, not all faces have the same pressure.
Given pressure P on the more positive X face, there must
be the same force P on the more negative X face else
the cube would accelerate and move. The pressure on both
Y faces must be the same P else fluid would move into or
out of the cube in the Y direction.
But, we are making an assumption that the pressure is constant
all over the X and Y faces yet this is an approximation. The
fluid has density and thus the mass is density times volume.
Mass is attracted by gravity. We can reasonably assume the force
of gravity does not change enough to be considered for the height
s, of our infinitesimal cube. Yet the pressure on the more positive
Z face must be considered different from the pressure on
the more negative Z face. Gravity puts a force of mass times
the gravitational constant g in the negative Z direction
on the fluid in the cube. With g taken as a constant, the
force is linear in the Z direction, thus the pressure P
on the X and Y faces is actually the pressure at the center
of each face. The pressure on the top, most positive Z face
is  P - 1/2 rho * s * g and the pressure on the bottom,
most negative Z face is P + 1/2 rho * s * g. Pressure must
increase as depth increases.

ab) One or more face is against an immovable wall.
Same as aa)

b) The fluid is moving at a constant velocity.
ba) The cube of fluid has the same velocity as fluid on all sides.
Same as aa)
bb) The fluid against some face is moving at a different velocity.
Now we can not ignore viscosity. Viscosity is essentially a
measure of sliding resistance between the fluids at the faces

c) The fluid is accelerating.
All of the above must be taken into account with the addition
of F = m a applied to the center of mass of the cube.
Consider the X direction with pressure P1 on the most negative
X face and a smaller P2 on the most positive X face. We can
apply the force  (P1 - P2)*s^2 in the positive X direction to
the center of mass. Differences the pressure on Y faces are
independently applied to the Y direction. Use caution in
the Z direction, subtracting the difference in pressure due
to gravity because this does not impart an acceleration on
the center of mass when the infinitesimal cube has fluid or
an immovable surface under the cube.

It may be too late, but, don't panic, this was a small part of
Computational Fluid Dynamics, CFD. CFD is a large area and modeling
includes supersonic flight in complex scenarios.



### Lecture 28e, 5D five dimensions, independent variables

Just extending fourth order PDE in four dimensions, to five dimensions

Desired solution is U(w,x,y,z,t), given PDE:

.5  ∂4U(w,x,y,z,t)/∂w4 + 1.5 ∂4U(w,x,y,z,t)/∂w∂x∂y∂z + 2.5 ∂4U(w,x,y,z,t)/∂w2∂t2 +
∂4U(w,x,y,z,t)/∂x4 + 2 ∂4U(w,x,y,z,t)/∂y4 + 3 ∂4U(w,x,y,z,t)/∂z4 + 4 ∂4U(w,x,y,z,t)/∂t4 +
5 ∂3U(w,x,y,z,t)/∂x∂y∂t + 6 ∂4U(w,x,y,z,t)/∂y2∂z2 +
7 ∂3U(w,x,y,z,t)/∂z∂t2 + 8 ∂3U(w,x,y,z,t)/∂t3 + 9 ∂3U(w,x,y,z,t)/∂y2∂t +
10 ∂2U(w,x,y,z,t)/∂z∂t + 11 ∂U(w,x,y,z,t)/∂t + 12 U(w,x,y,z,t) = f(w,x,y,z,t)

With f(w,x,y,z,t) given in pde45.txt, Dirichlet boundary values from U(w,x,y,z,t)

Development requires test cases to verify correctness of code.
Check computing first through fourth order derivative of five variables:

Now, test a fourth order PDE in five dimensions.

The test case used in pde45_eq was created with the help of Maple
pde45.txt text of Maple output

Plotting solution against 5D independent variables
Designed for interactive changing of variables plotted and variables values.
pot5d_gl.c plot program

∇4U + 2 ∇2U + 5 U = 0

Fourth order Biharmonic PDE solution in 5 dimensions in "C"
pde45h_eq.c extended pde44h_eq.c
pde45h_eq_c.out output

Fourth order Biharmonic PDE solution in 5 dimensions in Fortran 90
pde45h_eq.f90 extended pde44h_eq.f90
pde45h_eq_f90.out output

Fourth order Biharmonic PDE solution in 5 dimensions in Java
pde45h_eq.java extended pde44h_eq.java
pde45h_eq_java.out output



### Lecture 28f, 6D six dimensions, Biharmonic

Just extending fourth order PDE in four dimensions, to six dimensions

Desired solution is U(v,w,x,y,z,t), given PDE:

∇4U + 2 ∇2U + 6 U = 0

∂4U(v,w,x,y,z,t)/∂v4 + ∂4U(v,w,x,y,z,t)/∂w4 +
∂4U(v,w,x,y,z,t)/∂x4 + ∂4U(v,w,x,y,z,t)/∂y4 +
∂4U(v,w,x,y,z,t)/∂z4 + ∂4U(v,w,x,y,z,t)/∂t4 +
2 ∂2U(v,w,x,y,z,t)/∂v2 + 2 ∂2U(v,w,x,y,z,t)/∂w2 +
2 ∂2U(v,w,x,y,z,t)/∂x2 + 2 ∂2U(v,w,x,y,z,t)/∂y2 +
2 ∂2U(v,w,x,y,z,t)/∂z2 + 2 ∂2U(v,w,x,y,z,t)/∂t2 +
6 U(v,w,x,y,z,t) = 0

6D cubes and spheres
faces.c program
faces.out Data on 2D to 6D
test_faces.c program
test_faces.out sphere data

Test a fourth order PDE in six dimensions.
∇4U + 2 ∇2U  + 6 U = 0

∇4U + 2 ∇2U  + 6 U = 0
pde46h_eq.c

pde46h_eq_c.out verification output

∇4U + 2 ∇2U  + 6 U = 0
pde46h_eq.java

pde46h_eq_java.out verification output

Plotted with plot6d_gl.c, stopped on one of the moving output
pde46h_eq_java.dat data written for plot

∇4P + 2 ∇2P  = f(x,y,z,t,u,v)
Solving for P = exp(x+y+t*z+u+v) six independent variables, 4th order
pde64eb_eq.c testing source code
pde64eb_eq_c.out verification output
d6p4eb.mw Maple test generation

∇4P + 2 ∇2P  = f(x,y,z,t,u,v)

Solving for P = sin(x*u+y*v+z+t) six independent variables, 4th order
pde64sb_eq.c testing source code
pde64sb_eq_c.out verification output
d6p4sb.mw Maple test generation
d6p4sb_rhs.jpg RHS

Plotting solution against 6D independent variables
Designed for interactive changing of variables plotted and variables values.
pot6d_gl.c plot program

With a small change, we obtain a nonuniform refinement
Now we use "nuderiv" rather than "rderiv" that can use
nonuniform and different grids in each dimension.

Test a fourth order PDE in six dimensions with nonuniform refinement.
∇4U + 2 ∇2U  + 6 U = 0

∇4U + 2 ∇2U  + 6 U = 0
pde46h_nu.java

pde46h_nu_java.out verification output

∇4U + 2 ∇2U  + 6 U = 0
pde46h_nu.c

pde46h_nu_c.out verification output

Plotting solution against 6D independent variables
Designed for interactive changing of variables plotted and variables values.
pot6d_gl.c plot program

Least Square Fit 6 independent variables up to sixth power
lsfit.ads Least Square Fit 6D 6P
lsfit.adb Least Square Fit 6D 6P

Some programs above also need

You won't find many free or commercial 5D and 6D PDE packages
many lesser problems have many opensource and commercial packages
en.wikipedia.org/wiki/list_of_finite_element_software_packages



### Lecture 28g, extending to 7 dimensions

Just extending fourth order PDE in four dimensions, to seven dimensions

Desired solution is U(x,y,z,t,u,v,w), given PDE:

∇4U + 2 ∇2U + 8 U = f(x,y,z,t,u,v,w)

∂4U(x,y,z,t,u,v,w)/∂v4 + ∂4U(x,y,z,t,u,v,w)/∂w4 +
∂4U(x,y,z,t,u,v,w)/∂x4 + ∂4U(x,y,z,t,u,v,w)/∂y4 +
∂4U(x,y,z,t,u,v,w)/∂z4 + ∂4U(x,y,z,t,u,v,w)/∂t4 +
2 ∂2U(x,y,z,t,u,v,w)/∂v2 + 2 ∂2U(x,y,z,t,u,v,w)/∂w2 +
2 ∂2U(x,y,z,t,u,v,w)/∂x2 + 2 ∂2U(x,y,z,t,u,v,w)/∂y2 +
2 ∂2U(x,y,z,t,u,v,w)/∂z2 + 2 ∂2U(x,y,z,t,u,v,w)/∂t2 +
8 U(x,y,z,t,u,v,w) = f(x,y,z,t,u,v,w)

Test a fourth order PDE in seven dimensions.
∇4U + 2 ∇2U  + 8 U = f(x,y,z,t,u,v,w)

Test a fourth order PDE in seven dimensions.
∇4U + 2 ∇2U  + 8 U = f(x,y,z,t,u,v,w)
pde47h_nu.java extended pde46h_nu.java

pde47h_nu_java.out verification output
pde47h_eq.java solver source code
pde47h_eq_java.out verification output

∇4U + 2 ∇2U  + 8 U = f(x,y,z,t,u,v,w)
pde47h_nu.c

pde47h_nu_c.out verification output
pde47h_eq.c solver source code
pde47h_eq_c.out verification output

Plotted with plot7d_gl.c, stopped on one of the moving output
pde47h_nu_c1.dat data written for plot

Boltzmann Equation in seven dimensions, using Galerkin FEM.
∇U(x,y,z,t,p,q,r)  = Sf(x,y,z,t,p,q,r)
boltzmann_7la.java extended fem_check44_la.java
boltzmann_7la_java.out verification output
boltzmann_7la.dat for plot7d_gl and lsfit_file

least square fit of 7 dimensional data
lsfit.adb Least square fit, up to 7 dimensions, 4th order
test_lsfit7.adb Test least square fit, up to 7 dimensions, 4th order
lsfit.java Least square fit, up to 7 dimensions, 4th order
lsfit_file.java fit polynomial solution to PDE numerical solution
lsfit_file_java7.out Test results

Some programs above also need:
nuderiv.java basic non uniform grid derivative
rderiv.java basic uniform grid derivative
simeq.java basic simultaneous equation
deriv.h basic derivatives
deriv.c basic derivatives
plot7d_gl.c plot solution to PDE

You won't find many open source or commercial 7D PDE packages
many lesser problems have many open source and commercial packages
en.wikipedia.org/wiki/list_of_finite_element_software_packages



### Lecture 28h, PDE polar, cylindrical, spherical

Extending PDE's in polar, cylindrical, spherical

solve PDE in polar coordinates
∇2U(r,Θ) + U(r,Θ) = f(r,Θ)
given Dirichlet boundary values and f

Development version with much checking:

solve PDE in cylindrical coordinates
∇2U(r,Θ,z) + U(r,Θ,z) = f(r,Θ,z)
given Dirichlet boundary values and f

reference equations, cylindrical, spherical

Development version with much checking:

solve PDE in spherical coordinates
∇2U(r,Θ,Φ) + U(r,Θ,Φ) = f(r,Θ,Φ)
given Dirichlet boundary values and f

Development versions with much checking:

pde_spherical_eq.java
pde_spherical_eq_java.out

pde_spherical_eq.c
pde_spherical_eq_c.out

pde_spherical_eq.f90
pde_spherical_eq_f90.out

Other utility files needed by sample code above:
simeq.h
simeq.c
nuderiv.h
nuderiv.c
simeq.java
nuderiv.java
nuderiv.f90
inverse.f90

Easily converted to any other of courses languages.
Also possible, toroidal coordinates.



### Lecture 28j, PDE toroid geometry

Extending PDE's to toroid geometry

solve PDE in toroid boundary

First refinement

Another view of refinement

Toroid center at  0,0,0  can be translated and rotated
r1, r2, theta, phi  mapped to x,y,z
x = (r1+r2*sin(phi))*cos(theta)
y = (r1+r2*sin(phi))*sin(theta)
z = r2 * cos(phi)
0 ≤ theta ≤ 2Pi    0 ≤ phi ≤ 2Pi   0 < r2 < r1

Area = (2*Pi*r1)*(2*Pi*r2)     infinitesimal area = dtheta*r1*dphi*r2
Volume = (2*Pi*r1)*(Pi*r2*r2)  infinitesimal volume = dtheta*r1*dphi*r2*dr2
Equation   (r1-sqrt(x^2+y^2))^2 + z^2 = r2^2

r2, x, y, z mapped to r1, theta, phi (r2 taken as constant)
theta = arctan(y/x)
phi = arccos(z/r2)
r1 = x/cos(theta) - r2*sin(phi)   or
r1 = y/sin(theta) - r2*sin(phi)   no divide by zero

r1, x, y, z mapped to r2, theta, phi (r1 taken as constant)
theta = arctan(y/x)    fix angle by quadrant
x1 = r1*cos(theta)
y1 = r1*sin(theta)
phi = arctan(sqrt((x-x1)^2+(y-y1)^2)/z)  fix by quadrant
r2 = sqrt((x-x1)^2+(y-y1)^2+z^2)

PDE for testing: r1 constant, r2 is r, theta is t, phi is p

dU^2(r1,r2,t,p)/dr2^2 + dU^2(r1,r2,t,p)/dt^2 +
dU^2(r1,r2,t,p)/dp^2  = f(r1,r2,t,p)

U(r,t,p):=r*r*(1+sin(t))*(1+cos(p));
2
U(r, t, p) := r  (1 + sin(t)) (1 + cos(p))

Urr(r,t,p):=diff(diff(U(r,t.p),r),r);
Urr(r, t, p) := D[1, 1](U)(r, t . p)

Urr(r,t,p):=diff(diff(U(r,t,p),r),r);
Urr(r, t, p) := 2 (1 + sin(t)) (1 + cos(p))

Utt(r,t,p):=diff(diff(U(r,t,p),t),t);
2
Utt(r, t, p) := -r  sin(t) (1 + cos(p))

Upp(r,t,p):=diff(diff(U(r,t,p),p),p);
2
Upp(r, t, p) := -r  (1 + sin(t)) cos(p)

f(r,t,p):=Urr(r,t,p)+Utt(r,t,p)+Upp(r,t,p);
2
f(r, t, p) := 2 (1 + sin(t)) (1 + cos(p)) - r  sin(t) (1 + cos(p))

2
- r  (1 + sin(t)) cos(p)

simplify(f(r,t,p));
2             2
2 + 2 cos(p) + 2 sin(t) + 2 sin(t) cos(p) - r  sin(t) - 2 r  sin(t) cos(p)

2
- r  cos(p)

f(r2,t,p) =
2.0*(1.0 + cos(p) + sin(t) + sin(t)*cos(p)) -
r2*r2*(cos(p) + sin(t) + 2.0*sin(t)*cos(p))

Ub(r2,t,p):=r2*r2*(1.0+sin(t))*(1.0+cos(p));  Dirichlet boundary



### Lecture 29, Review


Go over WEB pages Lecture 1 through 29, including:
3a, 3b, 28a and 28a.
Open book, open notes, no computer.
Follow the instructions, else lose points.
The study guide is not allowed during the exam.

Go over Quiz1 and Quiz2. Some of those questions may be reused.
Go over Lecture 9 Review, Lecture 19 Review.

Things you should know:

It is best to find working code to do the numerical computation that
you need, rather than to develop the code yourself.

Thoroughly test any numerical code you are planning to use.

Convert existing, tested, numerical code to the language of your choice.

It is usually better to convert working numerical code to the language
of your choice, rather than creating a multi language interface.
(The exception to this suggestion is LAPACK.)

Modify the interfaces as needed for your application.

Do not put trust in benchmarks that others have run. Run your own benchmarks.

It is possible for operating system code or library code to have an
error that causes incorrect time for a benchmark.

A benchmark must run long enough to avoid error due to the hardware
or software quantization of time. Ten seconds has been found acceptable
on most computers. Less that one second has been found to be bad on
a few computers.

Dead code elimination can cause a benchmark to appear to run very fast.
Using an "if" statement of "print" statement can prevent dead code
elimination.

If you are unable to run a benchmark yourself, try to find benchmarks
that resembles what you are interested in.

It is possible to computer derivatives very accurately with high
order equations. A function is available in a few languages to
compute the required coefficients.

Derivatives can be computed for any function that can be evaluated.
By using more function evaluations, better accuracy can be obtained.
Making the step size extremely small may hurt accuracy.

A second derivative can be computed by numerically computing two
successive first derivatives. Yet, accuracy will be better when
using the formulas for second order derivatives.

More function evaluations are required in order to maintain the
same order of accuracy for each higher derivative.

Ordinary differential equations have only one independent variable.

Partial differential equations have more than one independent variable.

The "order" of a differential equation is the highest derivative in
the equation.

Given y=f(x) the first derivatives of  f(x)  may be written as
y'  f'  f'(x)  df/dx .

For partial derivatives, given  z=f(x,y)  dz/dx may be written as  fx,
dz/dy may be written as fy,  d^3z/dx^2 dy man be written  fxxy .

The Runge-Kutta method is a common way to compute an iterative solution
to an ordinary differential equation.

Solving a system of linear equations is a common way to compute a
solution to both ordinary and partial differential equations.

The unknowns in the system of differential equations being used
to solve a differential equations are the values of the unknown
function at specific points. e.g. y(h), y(2h), y(3h), etc.

Given a fourth order ordinary differential equation with only
initial conditions, y'''' = f(x), in order to find a unique
numerical solution, values for  c   must be given for:
c1 = y(x0)  c2 = y'(x0)  c3 = y''(x0)  c4 = y'''(x0)
and all the values must be at the same x0.

A method that might give reasonable results for the above equation can  be:
x     = x0
y     = c1
yp    = c2   yp is for y'
ypp   = c3
yppp  = c4
L:  ypppp = f(x)
yppp  = yppp + h * ypppp
ypp   = ypp  + h * yppp
yp    = yp   + h * ypp
y     = y    + h * yp       this is the solution at  x0+h
x     = x    + h            loop to L: for more values.

Much of Lecture 27 and 28 were from the source code:
pde3.c     A second order partial differential equation with
boundary values, two independent variables, with equal
step size, using an iterative method.

pde3_eq.c  same as pde3.c with the iteration replaced by a call
to simeq and the setup of the matrix in init_matrix.

pde3b.c    A similar second order partial differential equation
with boundary values, three independent variables and
each independent variable can have a unique step size
and a unique number of points to be evaluated.

pde3b_eq.c same as pde3b.c with the iteration replaced by a call
to simeq and the setup of the matrix in init_matrix.

Notice that these programs were also provided in various other
languages and looked very similar.

The problem was provided in the comments and in the code.

These "solvers" had the solution programmed in so that the method
and the code could be tested. In general the solution is not
known. The solution is computed at some specific points. If the
solution is needed at other points, then interpolation is used.

These "solvers" were based on solving a partial differential
equation that was continuous and continuously differentiable.
There are many specialized solvers for specific partial differential
equations. These lectures just covered two of the many methods

You have worked with physical world modeling and numerical
computation. This is a valuable skill in helping your
future employer reach good decisions, as shown in the
diagram from SIAM Review.



### Lecture 30, Final Exam


Open book, open note exam.

All homework and projects must be submitted by
midnight of the final exam.

submit cs455 proj your-files
for the project.



### Lecture 31, Creating PDE Test Cases


Generate a test case for a second order PDE with two independent variables.
Make the PDE general by having it be  elliptic in some regions,
hyperbolic in some regions and passing through parabolic.

Note: This is for the set of solutions that are continuous and
continuously differentiable. There are many special solutions and
corresponding special solvers that should be used for special cases.
This lecture is about a general case and a general solver.

The solution will be u(x,y).
The notation for derivatives will be
uxx(x,y) second derivative of u(x,y) with respect to x
uxy(x,y) derivative of u(x,y) with respect to x and with respect to y
uyy(x,y) second derivative of u(x,y) with respect to
ux(x,y) derivative of u(x,y) with respect to x
uy(x,y) derivative of u(x,y) with respect to y

We will create functions a1(x,y), b1(x,y), c1(x,y), d1(x,y), e1(x,y) and
f1(x,y).

The PDE we intend to solve, compactly, is:

a1 uxx + b1 uxy + c1 uyy + d1 ux + e1 uy + f1 u = c

The PDE written for computer processing is:

a1(x,y)*uxx(x,y) + b1(x,y)*uxy(x,y) + c1(x,y)*uyy(x,y) +
d1(x,y)*ux(x,y) + e1(x,y)*uy(x,y)  + f1(x,y)*u(x,y) = c(x,y)

Yes, we could use algebra to reorganize the PDE, yet we will not.

We choose the functions a1, b1 and c1 such that b1^2 -a1 c1 is
elliptic, hyperbolic and parabolic in various regions.

a1(x,y) = exp(x/2)*exp(y)/2

b1(x,y) = 0.7/(x*x*y*y + 0.5)

c1(x,y) = (4 - exp(x) - exp(y/2))*2

then we plot  b1^2 - a1 c1 over the x-y plane.

abc.m MatLab to plot a1, b1, c1 and b1^2-a1*c1

Below the yellow plane is elliptic, above the yellow plane is
hyperbolic and on the yellow plane is parabolic. This has little
to do with the numerical solution of our PDE. For some classic
PDE's only in one region, there are specialized solutions.

Equations of mixed type:
If a PDE has coefficients which are not constant, as shown above,
it is possible that it will not belong to any of the three categories.
A simple but important example is the Euler-Tricomi equation
uxx = xuyy
which is called elliptic-hyperbolic because it is elliptic in the region
x < 0, hyperbolic in the region x > 0, and degenerate parabolic
on the line x = 0.

The other functions for this test case were chosen as:

d1(x,y) = x^2 y

e1(x,y) = x y^2

f1(x,y) = 3x + 2y

Now we have to pick our solution function u(x,y).
Since we will be taking second derivatives, we need at least a
third order solution in order to be interesting.

Out of the air, I chose:

u(x,y) = x^3 + 2y^3 + 3x^2 y + 4xy^2 + 5xy + 6x + 7y + 8

We do not want the solution symmetric in x and y.
We want a smattering of terms for testing.

Now we have to compute the forcing function   c(x,y)
Doing the computation by hand is too error prone.
I use Maple.

Maple computation of c(x,y)  Look at this!

Maple work sheet

c(x,y) =  0.5*exp(x/2.0)*exp(y)*(6.0*x+6.0*y)                  +
0.7*(6.0*x + 8.0*y + 5.0)/(x*x*y*y+0.5)              +
(8.0 - 2.0*exp(x) - 2.0*exp(y/2.0))*(12.0*y + 8.0*x) +
(x*x+y)*(3.0*x*x + 6.0*x*y + 4.0*y*y + 5.0*y + 6.0)  +
x*y*y*(6.0*y*y + 3.0*x*x + 8.0*x*y + 5.0*x +7.0)     +
(3.0*x + 2.0*y)*(x^3 + 2y^3 + 3x^2 y + 4xy^2 + 5xy + 6x + 7y + 8)

We have to code the function  c(x,y) for our solver.
We have to code the function  u(x,y) for our solver to compute
boundary values and we will use the function to check our solver.

The solution will be u(x,y) and the solver must be able to evaluate
this function on the boundary. The region will be
xmin <= x <= xmax
ymin <= y <= ymax
We will solve for nx points in the x dimension, ny points in y dimension with
hx=(xmax-xmin)/(nx-1)  step size in x
hy=(ymax-ymin)/(ny-1)  step size in y

We will compute the approximate solution at u(i,j) for the point
u(xmin+i*hx,ymin+j*hy). With subscripts running i=0..nx-1 and j=0..ny-1.
Known boundary values are at i=0, i=nx-1, j=0, and  j=ny-1.
These subscripts are for "C" and Java, add one for Ada, Fortran and
MatLab subscripts.

The actual numeric values can be set at the last minute, yet we will
use  xmin = ymin = -1.0    xmax = ymax = 1.0   nx = 7   ny = 7

Now, we have to generate the matrix that represents the system of
linear simultaneous equations for the unknown values of u(x,y)
at xmin+i*hx  for i=1..nx-2    ymin+j*hy  for j=1..ny-2

I am using solution points  1, 2, ... , nx-2 by
1, 2, ... , ny-2 stored in a matrix starting at ut[0][0] for coding in "C".

The solution points will be the same as used in pde3:

The PDE will be made discrete by using unknown values in difference
equations as given in nderiv.out or

For this test case, I choose to use five point derivatives for
both first and second order. Note that the uxy term has the
first derivative with respect to x at five y values then
the first derivative of these values with respect to y.

Taking a term at a time from the PDE and writing the discrete version:

a1(x,y)*uxx(x,y) + b1(x,y)*uxy(x,y) + c1(x,y)*uyy(x,y) +
d1(x,y)*ux(x,y) + e1(x,y)*uy(x,y)  + f1(x,y)*u(x,y) = c(x,y)

At (x,y)  the discrete approximation of  a1(x,y)*uxx(x,y) =

a1(x,y) * (1/(12*hx*hx)) *
(-1*u(x-2hx,y) +16*u(x-hx,y) -30*u(x,y) +16*u(x+hx,y) -1*u(x+2hx,y))

yet, we want to solve for u(i,j) in a matrix,
thus using x=xmin+i*hx and y=ymin+j*hy rewrite the above approximation as

a1(xmin+i*hx,ymin+j*hy) * (1/(12*hx*hx)) *
(-1*u(i-2,j) +16*u(i-1,j) -30*u(i,j) +16*u(i+1,j) -1*u(i+2,j))

At (x,y)  c1(x,y)*uyy(x,y) = the discrete approximation

c1(x,y) * (1/(12*hy*hy)) *
(-1*u(x,y-2hy) +16*u(x,y-hy) -30*u(x,y) +16*u(x,y+hy) -1*u(x,y+2hy))

again, we want to solve for u(i,j) in a matrix,
thus using x=xmin+i*hx and y=ymin+j*hy rewrite the above approximation as

c1(xmin+i*hx,ymin+j*hy) * (1/(12*hy*hy)) *
(-1*u(i,j-2) +16*u(i,j-1) -30*u(i,j) +16*u(i,j+1) -1*u(i,j+2))

What happened to b1(x,y)*uxy(x,y) ?
Well that one is more difficult, thus we now use the combination of
the first derivative with respect to x, then the first derivative
with respect to y, 25 terms.
Writing all the terms using i and j

b1(xmin+i*hx,ymin+j*hy) * (1/(144*hx*hy)) *
( 1*u(i-2,j-2)  -8*u(i-1,j-2) +0*u(i,j-2)  +8*u(i+1,j-2) -1*u(i+2,j-2)
-8*u(i-2,j-1) +64*u(i-1,j-1) +0*u(i,j-1) -64*u(i+1,j-1) +8*u(i+2,j-1)
+0*u(i-2,j  )  +0*u(i-1,j  ) +0*u(i,j  )  +0*u(i+1,j  ) +0*u(i+2,j  )
+8*u(i-2,j+1) -64*u(i-1,j+1) +0*u(i,j+1) +64*u(i+1,j+1) -8*u(i+2,j+1)
-1*u(i-2,j+2)  +8*u(i-1,j+2) +0*u(i,j+2)  -8*u(i+1,j+2) +1*u(i+2,j+2))

Just writing the remaining terms using i and j
using the discrete coefficients.

d1(xmin+i*hx,ymin+j*hy) * (1/(12*hx) *
(1*u(i-2,j) -8*u(i-1,j) +0*u(i,j) +8*u(i+1,j) -1*u(i+2,j))

e1(xmin+i*hx,ymin+j*hy) * (1/(12*hy) *
(1*u(i,j-2) -8*u(i,j-1) +0*u(i,j) +8*u(i,j+1) -1*u(i,j+2))

The u(x,y) term is just

f1(xmin+i*hx,ymin+j*hy)*u(i,j)

The right hand side of every approximation above is just

c(xmin+i*hx,ymin+j*hy)

If any of the u(i+a,j+b) are boundary elements, we plug in the
numeric value of the boundary element, multiply by the
coefficient, and subtract the product from the right hand side.

Now we have many equations for u(i,j) using values for i and j,
and we must compute the coefficients for the set of equations
for  i=2..nx-3  j=2..ny-3  the central case.

Remember, u(i=0,j), u(i=nx-1,j), u(i,j=0) and u(i,j=ny-1) are
known boundary values.

What about u(1,j), u(nx-2,j) for j=1..ny-2 and
u(i,1), u(i,ny-2) for i=2..nx-3  (do not use u(1,1) twice !!!)
(do not use u(nx-2,ny-2) twice)

Fortunately, the coefficients for the discrete derivatives can be

At (1,j) the discrete approximation of
a1(xmin+1*hx,ymin*j*hy)*uxx(xmin+1*hx,ymin+j*hy) =

a1(xmin+1*hx,ymin+j*hy) * (1/(12*hx*hx)) *
(-3*u(i-1,j) -10*u(i,j) +18*u(i+1,j) -6*u(i+2,j) +1*u(i+3,j))

We could not use i-2 because it is outside our region, thus
note "deriv" is called with 'point' set to 1.

At (nx-2,j)  the discrete approximation of
a1(xmin+(nx-2)*hx,ymin*j*hy)*uxx(xmin+(nx-2)*hx,ymin+j*hy) =

a1(xmin+(nx-2)*hx,ymin+j*hy) * (1/(12*hx*hx)) *
(-1*u(i-3,j) +6*u(i-2,j) -18*u(i-1,j) +10*u(i,j) +3*u(i+1,j))

We could not use i+2 or (nx-2)+2 because it is outside our region, thus
note "deriv" is called with 'point' set to 3.

There is nothing special about uxx(i,1) or uxx(nx-2,1) use the general case.

At (i,1)  the discrete approximation of  c1(x,y)*uyy(x,y) =

c1(xmin+i*hx,ymin+1*hy) * (1/(12*hy*hy)) *
(-3*u(i,j-1) -10*u(i,j) +18*u(i,j+1) -6*u(i,j+2) +1*u(i,j+3))

We could not use j-2 because it is outside our region, thus
note "deriv" is called with 'point' set to 1.

At (i,nx-2)  the discrete approximation of  c1(x,y)*uyy(x,y) =

c1(xmin+i*hx,ymin+(nx-2)*hy) * (1/(12*hy*hy)) *
(-1*u(i,j-3) +6*u(i,j-2) -18*u(i,j-1) +10*u(i,j) +3*u(i,j+1))

We could not use j+2 or (ny-2)+2 because it is outside our region, thus
note "deriv" is called with 'point' set to 3.

There is nothing special about uyy(1,j) or uxx(nx-2,j) use the general case.

Oh, and yes, ux(x,y) uy(x,y) uxy(x,y) also have to be shifted for the
"just inside the boundary case." Think of how much easier it is using five
points rather than sever points. With seven points the two rows and
columns inside the boundary are special cases.

Now we have enough equations to exactly compute the approximate solution.
We build a system of linear equations of the form:

| ut00  ut01  ut02  ut03 |   | u(1,1) |   | k0 |
| ut10  ut11  ut12  ut13 | * | u(1,2) | = | k1 |
| ut20  ut21  ut22  ut23 |   | u(2,1) |   | k2 |
| ut30  ut31  ut32  ut33 |   | u(2,2) |   | k3 |

We know the values at the boundary u(0,0), u(0,1), u(0,2), u(0,3)
u(1,0),                 u(1,3)
u(2,0),                 u(2,3)
u(3,0), u(3,1), u(3,2), u(3,3)

For this specific system of equations, nx=4, ny=4 and there are four
internal, non boundary, values to be found. The number of equations
will always be (nx-2)*(ny-2).

The value found for u(1,1) from solving the system of linear equations
is the desired solution at u(xmin+1*hx,ymin+1*hy). u(2,2) is the value of
u(xmin+2*hx,ymin+2*hy). Additional values, not on hx or hy steps,
of u(x,y) may be found by two dimensional interpolation.

The matrix "ut", used in the source code pde_abc_eq.c
is cleared to zero. Then each equation is used and the
coefficient of u(i,j) is added the appropriate ut entry.

There will be (nx-2)*(ny-2) equations that must be used.
The i and j for these equations are  for i=1..nx-2  for j=1..ny-2.
The PDE approximation above must be used for these i,j pairs.
Note that u(i,j), u(i-2,j) etc are variables and it is the
coefficients of these variables that get added into the "ut" matrix.
The first equation will add to entries in the first row of the matrix.
The second equation will add to entries in the second row of the matrix.
The boundary values are not stored in the ut matrix and using algebra,
a boundary value coefficient is multiplied by the boundary value and
subtracted from the constant term. This can cause many special cases
in the solver.
In general many entries in a row will not be changed from zero.
The final matrix will be in the class of a band matrix.

Now the gruesome work of evaluating the u(i,j) coefficients.
We cheat here. Now assume nx>4 and ny>5 and thus
i=2, j=3 is a general center case. From above, we have

a1(x,y)*uxx(x,y) + b1(x,y)*uxy(x,y) + c1(x,y)*uyy(x,y) +
d1(x,y)*ux(x,y) + e1(x,y)*uy(x,y)  + f1(x,y)*u(x,y) = c(x,y)

converted to a discrete approximation for the central elements is

a1(xmin+i*hx,ymin+j*hy) * (1/(12*hx*hx)) *
(-1*u(i-2,j) +16*u(i-1,j) -30*u(i,j) +16*u(i+1,j) -1*u(i+2,j))          +
b1(xmin+i*hx,y+j*hy) * (1/(144*hx*hy)) *
( 1*u(i-2,j-2)  -8*u(i-1,j-2) +0*u(i,j-2)  +8*u(i+1,j-2) -1*u(i+2,j-2)
-8*u(i-2,j-1) +64*u(i-1,j-1) +0*u(i,j-1) -64*u(i+1,j-1) +8*u(i+2,j-1)
+0*u(i-2,j  )  +0*u(i-1,j  ) +0*u(i,j  )  +0*u(i+1,j  ) +0*u(i+2,j  )
+8*u(i-2,j+1) -64*u(i-1,j+1) +0*u(i,j+1) +64*u(i+1,j+1) -8*u(i+2,j+1)
-1*u(i-2,j+2)  +8*u(i-1,j+2) +0*u(i,j+2)  -8*u(i+1,j+2) +1*u(i+2,j+2)) +
c1(xmin+i*hx,ymin+j*hy) * (1/(12*hy*hy)) *
(-1*u(i,j-2) +16*u(i,j-1) -30*u(i,j) +16*u(i,j+1) -1*u(i,j+2))          +
d1(xmin+i*hx,ymin+j*hy) * (1/(12*hx) *
(1*u(i-2,j) -8*u(i-1,j) +0*u(i,j) +8*u(i+1,j) -1*u(i+2,j))              +
e1(xmin+i*hx,ymin+j*hy) * (1/(12*hy) *
(1*u(i,j-2) -8*u(i,j-1) +0*u(i,j) +8*u(i,j+1) -1*u(i,j+2))              +
f1(xmin+i*hx,ymin+j*hy)*u(i,j)                                           =
c(xmin+i*hx,ymin+j*hy)

Collecting terms for u(i,j) we compute the coefficient  ct
a1(xmin+i*hx,ymin+j*hy) * (1/(12*hx*hx)) *(-30) +
b1(xmin+i*hx,ymin+j*hy) * (1/(144*hx*hy)) *(0)     +
c1(xmin+i*hx,ymin+j*hy) * (1/(12*hy*hy)) *(-30) +
d1(xmin+i*hx,ymin+j*hy) * (1/(12*hx) *(0)       +
e1(xmin+i*hx,ymin+j*hy) * (1/(12*hy) *(0)

and the right hand side is  c(xmin+i*hx,ymin+j*hy)

A bit of bizarre subscripting, assuming we are at i=2, j=3 this
is equation number ii=(i-1)+(nx-2)*(j-1), right?
Check i=1,j=1    is equation 0    if i or j equal zero, we have a boundary
i=2,j=1    is equation 1
...
i=nx-2,j=1 is equation nx-3
i=1,j=2    is equation nx-2
i=2,j=2    is equation nx-1
...
i=2,j=3    is equation (2-1)+(nx-2)*(3-1) the row in the ut matrix.

cs, the subscript for the right hand side is  cs=(nx-2)*(ny-2) stored in ut.

Thus we add ct to ut(ii,ii). The second subscript is where u(i,j) is.
Well, to tell the truth, we add each component of the sum in a loop,
directly into ut(ii,ii). If we had  ct  we would just store it in ut(ii,ii).

ut(ii,cs) is set to  c(xmin+i*hx,ymin+j*hy) - f1(xmin+i*hx,ymin+j*hy)

We must, of course, compute all the coefficients, e.g. u(i-1,j), ctm1

a1(xmin+i*hx,ymin+j*hy) * (1/(12*hx*hx)) *(16) +
b1(xmin+i*hx,ymin+j*hy) * (1/(144*hx*hy)) *(0)    +
c1(xmin+i*hx,ymin+j*hy) * (1/(12*hy*hy)) *(0)  +  does not exists
d1(xmin+i*hx,ymin+j*hy) * (1/(12*hx) *(-8)     +
e1(xmin+i*hx,ymin+j*hy) * (1/(12*hy) *(0)         does not exists

ij=(((i-1)-1)+(nx-2)*(j-1) where u(i-1,j) is in the solution vector.
ut(ii,ij) = ctm1

We must, of course, compute all the coefficients, e.g. u(i-2,j), ctmm1

a1(xmin+i*hx,ymin+j*hy) * (1/(12*hx*hy)) *(-1) +
b1(xmin+i*hx,ymin+j*hy) * (1/(144*hx*hy)) *(0)    +
c1(xmin+i*hx,ymin+j*hy) * (1/(12*hy*hy)) *(0)  +  does not exists
d1(xmin+i*hx,ymin+j*hy) * (1/(12*hx) *(1)     +
e1(xmin+i*hx,ymin+j*hy) * (1/(12*hy) *(0)         does not exists

ij=(((i-2)-1)+(nx-2)*(j-1) where u(i-2,j) is in the solution vector.
ut(ii,ij) = ctmm1   Wrong if i is 2!  u(0,j) is a boundary value, thus
ut(ii,cs) = ut(ii,cs) - ctmm1 * u(0,j)

And, one more coefficients, e.g. u(i,j+1), ctp1

a1(xmin+i*hx,ymin+j*hy) * (1/(12*hx*hx)) *(0)  +  does not exists
b1(xmin+i*hx,ymin+j*hy) * (1/(144*hx*hy)) *(0)    +
c1(xmin+i*hx,ymin+j*hy) * (1/(12*hy*hy)) *(16) +
d1(xmin+i*hx,ymin+j*hy) * (1/(12*hx) *(0 )     +  does not exists
e1(xmin+i*hx,ymin+j*hy) * (1/(12*hy) *(8)

"C" and Java subscripting
ij=((i-1)+(nx-2)*((j+1)-1) when u(i,j+1) is in the solution vector.
ut(ii,ij) = ctp1
ij=(i-1)+(nx-2)*((j+1)-2) when u(i,j+1) is in the solution vector.
ut(ii,ij) = ctp1
where uc(ii) is the right hand side.

Yes, the code is nested four levels deep in iterations.

Now do u(1,j), u(nx-2,j) for j=1..ny-2 and
u(i,1), u(i,ny-2) for i=2..nx-3  (do not use u(1,1) twice !!!)
(do not use u(nx-2,ny-2) twice)

Once the "ut" matrix is initialized, solve the simultaneous equations
and print the answers. Now, wasn't that easy.

A crude layout for  nx=9, ny=9 is shown below with subscripts
The  i,j  notation is for a known boundary value,
the number is the ii subscript of the unknown value inside the I====I
I ut I
I====I.

j
0     1     2     3     4     5     6     7     8   ny=9
+-----------------------------------------------------+
0 | 0,0 | 0,1 | 0,2 | 0,3 | 0,4 | 0,5 | 0,6 | 0,7 | 0,8 |
|     I=========================================I     |
1 | 1,0 I  0  |   7 |  14 |  21 |  28 |  35 |  42 I 1,8 |
|     I-----------------------------------------I     |
2 | 2,0 I  1  |   8 |  15 |  22 |  29 |  36 |  43 I 2,8 |
|     I-----------------------------------------I     |
3 | 3,0 I  2  |   9 |  16 |  23 |  30 |  37 |  44 I 3,8 |
|     I-----------------------------------------I     |
i 4 | 4,0 I  3  |  10 |  17 |  24 |  31 |  38 |  45 I 4,8 |
|     I-----------------------------------------I     |
5 | 5,0 I  4  |  11 |  18 |  25 |  32 |  39 |  46 I 5,8 |
|     I-----------------------------------------I     |
6 | 6,0 I  5  |  12 |  19 |  26 |  33 |  40 |  47 I 6,8 |
|     I-----------------------------------------I     |
7 | 7,0 I  6  |  13 |  20 |  27 |  34 |  41 |  48 I 7,8 |
|     I=========================================I     |
8 | 8,0 | 8,1 | 8,2 | 8,3 | 8,4 | 8,5 | 8,6 | 8,7 | 8,8 |
+-----------------------------------------------------+
nx=9
ut(ii) where ii = (i-1)+(nx-2)*(j-1)
These are "C" and Java subscripts.

j
1     2     3     4     5     6     7     8     9   ny=9
+-----------------------------------------------------+
1 | 1,1 | 1,2 | 1,3 | 1,4 | 1,5 | 1,6 | 1,7 | 1,8 | 1,9 |
|     I=========================================I     |
2 | 2,1 I  1  |   8 |  15 |  22 |  29 |  36 |  43 I 2,9 |
|     I-----------------------------------------I     |
3 | 3,1 I  2  |   9 |  16 |  23 |  30 |  37 |  44 I 3,9 |
|     I-----------------------------------------I     |
4 | 4,1 I  3  |  10 |  17 |  24 |  31 |  38 |  45 I 4,9 |
|     I-----------------------------------------I     |
i 5 | 5,1 I  4  |  11 |  18 |  25 |  32 |  39 |  46 I 5,9 |
|     I-----------------------------------------I     |
6 | 6,1 I  5  |  12 |  19 |  26 |  33 |  40 |  47 I 6,9 |
|     I-----------------------------------------I     |
7 | 7,1 I  6  |  13 |  20 |  27 |  34 |  41 |  48 I 7,9 |
|     I-----------------------------------------I     |
8 | 8,1 I  7  |  14 |  21 |  28 |  35 |  42 |  49 I 8,9 |
|     I=========================================I     |
9 | 9,1 | 9,2 | 9,3 | 9,4 | 9,5 | 9,6 | 9,7 | 9,8 | 9,9 |
+-----------------------------------------------------+
nx=9
ut(ii) where ii = (i-1)+(nx-2)*(j-2)
These are Ada, Fortran and Matlab subscripts.

A differential equation that has regions hyperbolic, elliptic and parabolic
defined in the various languages "abc" file.

The basic functions coded in "C" for this test case are:

abc.txt  instructions for user
abc.h  sample test case described above
abc.c  implementation of test case
deriv.c  discretization
pde_abc_eq.c  solver program
pde_abc_eq_c.out  sample output
abc.c  implementation of test case

The basic functions coded in Fortran 90 for this test case are:

abc.f90  implementation of test case
deriv.f90  discretization
simeq.f90  solve simultaneous equations
pde_abc_eq.f90  solver program
pde_abc_eq_f90.out  sample output

The basic functions coded in Java for this test case are:

abc.java  sample test case described above
nderiv.java  discretization
simeq.java  solve simultaneous equations
pde_abc_eq.java  solver program
pde_abc_eq_java.out  sample output

The basic functions coded in Ada 95 for this test case are:

abc.ads  sample test case described above

Sparse matrix storage is needed for the system of linear equations
when nx or ny is significantly greater than the discretization
order, nd.

The same PDE's as above, using the definition in "abc" are
easily modified to use sparse matrix storage and a direct
sparse matrix solution to the linear equations.

The basic functions coded in "C" for this test case are:

abc.txt  instructions for user
sparse.c  sparse matrix for PDE's
deriv.c  discretization
abc.h  sample test case described above
abc.c  implementation of test case
sparse_abc.c  sparse solver program
sparse_abc_c.out  sample output

The basic functions coded in Java for this test case are:

abc.java  sample test case described above
sparse.java  sparse matrix for PDE's
nderiv.java  discretization
sparse_abc.java  solver program
sparse_abc_java.out  sample output

The basic functions coded in Ada 95 for this test case are:

abc.ads  sample test case described above

Some challenging fourth order partial differential equations
including biharmonic equations
are covered in lecture 28d

Solving nonlinear PDE is covered in Lecture 31b.
The specific case for the nonlinear Navier Stokes Equations is
covered in Lecture 28b

For nonlinear PDE to be solved by solving a system of equations,
the simultaneous equations are non linear.
To solve a system of nonlinear equations reasonably efficiently,
use a Jacobian matrix and an iterative solution.

An example that can handle up to third order nonlinear systems of
equations is shown by using a Newton iteration based on the inverse
of the Jacobian:

x_next = x_prev - J^-1 * ( A * x_prev - Y)

simeq_newton3.java basic solver
test_simeq_newton3.java test program
test_simeq_newton3_java.out test_output
on three test cases (minor heuristic used to build J)

Plotting Solutions
Plotting solutions can be easy with Linux and gnuplot.
To demonstrate, I have a program that generates a 2D solution
that needs a 3D plot. make_gnuplotdata.c
The data file, for this case, is gnuplot.dat
The input file to gnuplot is my2.plot
The shell script to make the plot is test2_gnuplot.sh
The plot is run by typing  sh test2_gnuplot.sh
or from inside a program with test2_gnuplot.c
or from command line    gnuplot -persist my2.plot
The result is the plot

A special case is tests for Laplace Equation

ΔU = 0  or  ∇2U = 0  or Uxx(x,y) + Uyy(x,y) = 0  or

∂2U/∂x2 + ∂2U/∂y2 = 0

Many known solutions and thus Dirichlet boundary conditions
for test cases are known:
k>0 is any constant
U(x,y)=k
U(x,y)=k*x
U(x,y)=k*y
U(x,y)=k*x*y
U(x,y)=k*x^2 - k*y^2
U(x,y)=sin(k*x)*sinh(k*y)
U(x,y)=sin(k*x)*cosh(k*y)
U(x,y)=cos(k*x)*sinh(k*y)
U(x,y)=cos(k*x)*cosh(k*y)
U(x,y)=sin(k*x)*exp( k*y)
U(x,y)=sin(k*x)*exp(-k*y)
U(x,y)=cos(k*x)*exp( k*y)
U(x,y)=cos(k*x)*exp(-k*y)

r=sqrt(x^2+y^2)
th=atan2(y,x)          -Pi..Pi principal value is
U(x,y)=r^k * sin(k*th) discontinuous along negative X axis
using atan2 in 0..2Pi discontinuous derivative along positive X axis



### Lecture 31a, sparse solution of PDE


When the number of degrees of freedom gets large, the size of
the matrix for the system of simultaneous equations may get
larger than acceptable for your computer.

For example a four dimensional PDE with 10 DOF in each dimension
has 10^4 = 10,000 DOF. The non sparse simultaneous equations
would require at least 10,000 by 10,000 entries, 10^8 times
8 bytes = 0.8GB.

Much larger PDE's, need to be solved.

A sparse matrix stores only the non zero entries.
A system of simultaneous equations does not have a unique
solution if any row is all zeros, thus a sparse structure
has at least one entry of each row. Then in each row,
the column index and value are stored for the non zero
values. Typical spare data structures use a singly linked
list for each row, keeping the column index in increasing
numerical order.

Optimized code is typically written for inserting and
adding entries, solving simultaneous equations, vector
times matrix multiplication and a few other operations.

For various languages, the sparse packages are shown with
the initial test programs and several PDE solutions.

"C" implementation of sparse
"C" source code sparse.c
"C" test code test_sparse.c
test output test_sparse_c.out
"C" more test code test1_sparse.c
test output test1_sparse_c.out
simple conversion to sparse_abc.c
output of sparse_abc_c.out
"C" source code pde44_sparse.c
test_output pde44_sparse_c.out

For subscripts starting with zero
For subscripts starting with one

Java implementation of sparse
Java source code sparse.java
Java test code test_sparse.java
test output test_sparse_java.out
simple conversion to sparse_abc.java
output of sparse_abc_java.out
Java source code pde44_sparse.java
test_output pde44_sparse_java.out

Fortran implementation of sparse
For subscripts starting with zero
Fortran source code sparse.f90
Fortran test code test_sparse.f90
test output test_sparse_f90.out
For subscripts starting with one
Fortran source code sparse1.f90
Fortran test code test_sparse1.f90
test output test_sparse1_f90.out
Fortran source code pde44_sparse.f90
test_output pde44_sparse_f90.out



### Lecture 31b, Nonlinear PDE


Some of the most difficult to solve PDE's have an
"order" greater than one. These are nonlinear PDE's.

A specific problem, challenge if you prefer, is the
possibility of "non physical solutions" and
"bifurcating solutions" in addition to no possible solution.
A small change in the boundary conditions or a different
starting vector for the nonlinear solver can produce
different results with no way of knowing the result is
ambiguous.

A PDE having any product or ratio of U, Ux, Uxx, etc.
is nonlinear.

Several nonlinear PDE of interest are:

Second order, third degree:
U^2 U'' + 2 U U' + 3U = f(x)

Reciprocal nonlinear:
D U' - E(x) U'/U^2 - F U'' = f(x)

An example that can handle up to third degree nonlinear systems of
equations is shown by using a Newton iteration based on the inverse
of the Jacobian:

x_next = x_prev - J^-1 * ( A * x_prev - Y)

Java nonlinear

simeq_newton5.java basic solver
test_simeq_newton5.java test program
test_simeq_newton5_java.out test_output
on four test cases (automatic adjusting heuristic used)

A linear PDE using conventional Finite Element Method, FEM, works:
source code fem_nl11_la.java
output fem_nl11_la_java.out
A nonlinear PDE using conventional FEM does not work.
source code fem_nl12_la.java
output fem_nl12_la_java.out
(very similar, nonlinear, fails)

Discretization works on nonlinear PDE's:
source code pde_nl21.java
output pde_nl21_java.out
source code pde_nl22.java
output pde_nl22_java.out
solver simeq_newton5.java

on five test cases (automatic adjusting heuristic used)

hard coded nonlinear coefficients, non unique solution

automated computation of nonlinear coefficients, still non unique solution

More examples with many checks:

Third order PDE in 3 dimensions with 3rd degree nonlinearity.
Demonstrates use of least square fit of boundary as an
initial guess to solve nonlinear system of equations.

C nonlinear

source code pde_nl13.c
output pde_nl13_c.out
solver simeq_newton5.c

difficulties

The nonlinear PDE can not be solved by simple application
of Finite Element Method, covered in lecture 32.
For example, the above nonlinear PDE is not solved by:
source code fem_checknl_la.c
output fem_checknl_la_c.out

Using discretization with a nonlinear solver, such as simeq_newton5,
works accurately and efficiently.

Kuramoto-Sivashinsky nonlinear

There seem to be many variations of the Kuramoto-Sivashinsky
non linear, fourth order PDE, depending on the physical problem.

I have seen in various reference papers:

Ut + Uxxxx + Uxx + 1/2 (Ux)^2 = 0  mainly variations in the last term, and

∂U(x,t)/∂t + ∂^4 U(x,t)/∂x^4 + ∂^2 U(x,t)/∂x^2 -(∂U(x,t)/∂x)^2 = 0

In the figure below, read h as U, read y as t.

For nonlinear PDE to be solved by solving a system of equations,
the simultaneous equations are non linear.
To solve a system of nonlinear equations reasonably efficiently,
use a Jacobian matrix and an iterative solution and adaptive
solver such as simeq_newton5, available in a few languages.



### Lecture 31c, Parallel PDE


Now we attack solving the PDE on multicore and multiprocessor
computer architectures.

The basic concept is to use worker task to do the parallel
computation and a master control task, the main procedure,
to keep the workers synchronized. These examples, in several
languages, use the barrier method of synchronization.

The first example is converting  fem_check44_la.adb  to
a parallel version. We developed  psimeq, parallel simultaneous
equations solver in lecture 3b
Thus we need to choose the number of processors, NP, to pass
to psimeq.

Then we need to parallelize the longest computation, creating
the stiffness matrix, as covered in lecture 32

Care must be taken to insure no worker tasks set any variable that
might by modified by any other worker task or the master task.

The other files needed to compile and execute include:

Parallelizing Java, FEM example

The first example is converting  fem_check44_la.adb  to
a parallel version. We developed  psimeq, parallel simultaneous
equations solver in lecture 3b
Thus we need to choose the number of processors, NP, to pass
to psimeq.

Then we need to parallelize the longest computation, creating
the stiffness matrix, as covered in lecture 32

Care must be taken to insure no worker tasks set any variable that
might by modified by any other worker task or the master task.

fem_check44_pla.java
fem_check44_pla_java.out



### Lecture 31d, Parallel Multiple precision PDE


Now we attack solving the PDE on multicore and multiprocessor
computer architectures using multiple precision arithmetic.
Many examples use GMP, Gnu Multiple Precision.

The basic concept is to use worker task to do the parallel
computation and a master control task, the main procedure,
to keep the workers synchronized. These examples, in several
languages, use the barrier method of synchronization.

Using pthreads and gmp to solve large systems of
simultaneous equations.

mpf_tsimeq.h
mpf_tsimeq.c
time_mpf_tsimeqb.c
time_mpf_tsimeqb.out

A sample of using   top -H



### Lecture 32, Finite Element Method


A brief introduction to the "Finite Element Method" FEM,
using the Galerkin Method is galerkin.pdf

There are entire books on FEM and this just covers one small view.

Examples demonstrating the above galerkin.pdf
are shown below. The examples are for Degree 1 (linear), Order 1 through 4,
Dimension 1 through 7. Several programming languages are shown for
some of the examples. Results were close to the same in time and
accuracy for various languages.

The first two examples use three methods of integration:
For these cases Gauss-Legendre was best.

fem_checke_la.c  first order, one dimension
fem_checke_la_c.out  output with debug print

fem_checke_la.f90  first order, one dimension
fem_checke_la_f90.out  output with debug print

fem_check4th_la.c  fourth order, one dimension
fem_check4th_la_c.out  output with debug print

fem_check4th_la.f90  fourth order, one dimension
fem_check4th_la_f90.out  output with debug print

fem_check22_la.c  second order, two dimension
fem_check22_la_c.out  output with debug print

fem_check22_la.f90  second order, two dimension
fem_check22_la_f90.out  output with debug print

fem_check22_la.java  second order, two dimension
fem_check22_la_java.out  output with debug print

with a small study of grid size and integration order, solution with exp(g(x,y))
fem_check22e_la.java  second order, two dimension
fem_check22e_la_java.out  output with debug print

fem_check_abc_la.c  second order, two dimension
fem_check_abc_la_c.out  output with debug print
abc_la.h problem definition
abc_la.c problem definition

fem_check_abc_la.f90  second order, two dimension
fem_check_abc_la_f90.out  output with debug print
abc_la.f90 problem definition

fem_check33_la.c third order, three dimension
fem_check33_la_c.out  output with debug print

fem_check33_la.f90 third order, three dimension
fem_check33_la_f90.out  output with debug print

fem_check33_la.java third order, three dimension
fem_check33_la_java.out  output with debug print
npx=3 gave error 10^-12, npx=4 gave error 10^-13  little better for bigger npx

fem_check44_la.c fourth order, four dimension
fem_check44_la_c.out  output with debug print

fem_check44_la.f90 fourth order, four dimension
fem_check44_la_f90.out  output with debug print

fem_check44_la.java fourth order, four dimension
fem_check44_la_java.out  output with debug print

fem_check47h_la.c fourth order, seven dimension
fem_check47h_la_c.out  output with debug print

fem_check47h_la.java fourth order, seven dimension
fem_check47h_la_java.out  output with debug print

In getting ready for this lecture, I have tried many versions of FEM.
The "historic" or classic methods used very low order orthogonal
polynomials and a piecewise linear approach. The trade-off I found was
the time to solve of very large systems of equations vs the time for
numerical quadrature for a much smaller system of equations.

Another trade-off was the complexity of coding the special cases
that arise in piecewise linear approximations. There are many
published tables and formulas for the general case using many
geometric shapes. What is generally missing is the tables and
formulas for the cells next to the boundary.

I found that developing a library routine for the various derivatives
of phi and using high order Lagrange polynomials led to minimizing
programming errors. Each library routine must, of course, be
thoroughly tested. Here are my Lagrange phi routines and tests.

laphi.c  code through 4th derivative
test_laphi.c  numeric test
test_laphi_c.out  test output
plot_laphi.c  plot test

laphi.f90 module through 4th derivative
test_laphi.f90  numeric test
test_laphi_f90.out  test output

laphi.java class through 4th derivative
test_laphi.java  numeric test
test_laphi_java.out  test output

Other files, that are needed by some examples above:
simeq.f90 solve simultaneous equations
simeq.java solve simultaneous equations



### Lecture 33, Finite Element Method, triangles


A brief introduction to the "Finite Element Method" FEM,
using triangles or tetrahedrons rather than a uniform grid.
The previous Galerkin Method  galerkin.pdf
applies with some changes.

There are entire books on FEM and this just covers one small view.

Examples are shown below. The examples are for Degree 1 (linear),
with Order 1 through 4,
Dimension 2 (triangle)  and Dimension 3 (tetrahedron)

Some specialty stuff that may be used later:
affine_map_tri.txt triangle mapping
test_affine_map.c test program
test_affine_map.out output
test_affine_map2.c test program
test_affine_map2.out output
test_affine_map.html Maple solution
test_affine_map.mw Maple worksheet

Given a set of coordinates and a boundary path, generate a set of
triangles that cover the interior. Four sets are shown below:

The programs are  triangle.c  and  showme.c  run using commands:
triangle A.poly  generates  file  A.1.poly  A.1.node  A.1.ele
showme -p A.1.poly  plots boundary, click on ele to plot triangles

A.poly initial input (simpler ones follow)
A.1.poly generated by program  triangle
A.1.ele generated by program  triangle
A.1.node generated by program  triangle

triangle -D -a0.01 -q30 -p B.poly  generates B.1.*
showme -p B.1.poly   wait for menus at bottom,  click on  ele

B.poly initial input (square)
B.1.poly generated by program  triangle
B.1.ele generated by program  triangle
B.1.node generated by program  triangle

triangle -D -a0.1 -q30 -p C.poly   generates C.1.*
showme -p C.1.poly   wait for menus at bottom,  click on  ele

C.poly initial input (triangle)
C.1.poly generated by program  triangle
C.1.ele generated by program  triangle
C.1.node generated by program  triangle

triangle -D -a2.0 -q45 -p D.poly   generates D.1.*
showme -p D.1.poly   wait for menus at bottom,  click on  ele

D.poly initial input (channels)
D.1.poly generated by program  triangle
D.1.ele generated by program  triangle
D.1.node generated by program  triangle

triangle.c source code
triangle.help documentation
showme.c source code

note: the C.1.node   C.1.ele   C.1.poly  can be used for fem_check below
C.coord    C.tri     C.bound   by deleting sequence numbers
and extra trailing stuff

C.coord  from C.1.node
C.tri from  C.1.ele
C.bound  from C.1.poly
plot_fem_tri.c  plots .coord,.tri,.bound

D.coord  from D.1.node
D.tri from  D.1.ele
D.bound  from D.1.poly

The most basic geometry and several triangle splits:

22_t.coord  3 triangles, 4 vertices
22_t.tri      3 boundary,  1 free
22_t.bound

Any FEM group can be split, every triangle becomes 4 triangles that
are one fourth the area and congruent to the original triangles.
This cuts "h" the longest edge in half and should improve accuracy,
up to some limit, at the cost of longer execution time.
do_tri_split.c

22_ts.coord  12 triangles, 10 vertices
22_ts.tri       6 boundary,   4 free
22_ts.bound

22_tss.coord  48 triangles, 31 vertices
22_tss.tri      12 boundary,  19 free
22_tss.bound

The D.poly from above as original, split once, split again

Program files that produces the images above:
tri_input.java
tri_split.java
test_tri_split.java

The first example of FEM on triangles is a first order in two dimensions.
fem_check21_tri.c source code
fem_check21_tri.out output 22_tri
fem_check21_tria.out output 22_t
fem_check21_tri_C.out output C

additional files needed for execution of the above:
phi_tri.h source code
phi_tri.c source code
phi_tri_cm.h source code
phi_tri_cm.c source code
phi_tri_pts.h source code
phi_tri_pts.c source code

A similar example in Java of FEM on triangles is a first order in two dimensions.
This is an improved version from fem_check21
fem_check21a_tri.java source code
fem_check21a_tri22_t.out output
fem_check21a_tri22_ts.out output
fem_check21a_tri22_tss.out output

additional files needed for execution of the above:
simeq.java source code

the test for accuracy of triquad.java are:
test_triquad.out results, see last two sets

the test for accuracy of triquad_int.c are:

In order to understand the complexity of the code,
the following figures show one vertex, V1, that is a
part of three triangles, T1, T2 and T3.

For vertex V1 there are three test functions, phi1, phi2 and phi3,
represented by blue for the phi1, yellow for phi2 and red for phi3.

These test functions are shown as linear "hat" functions, yet could
be higher order test functions.

The triangles are in the plane Z=0.0.
All three test functions are at Z=1.0 above the vertex of interest,
V1. All three test functions are at Z=0.0 for all other vertices
in the triangles that include vertex V1.

Looking down on the triangles, the three test functions can be seen
to cover the area of the three triangles T1, T2 and T3 with functions
phi1, phi2 and phi3.

The equation

is, in the general case, computed by numerical quadrature
of the linear operator, L, with the dependent variable and
its derivatives replaced by a test function and corresponding
derivatives of the test function.

For the triangulation shown above and for the first equation,
for vertex V1, there are three numerical quadratures that
are summed to get the integral as shown.

The first region, omega is T1 and the test function is phi1.
The second region, omega is T2 and the test function is phi2.
The third region, omega is T3 and the test function is phi3.

In the integral equation, the omega is the sum of T1, T2 and T3.
In the integral equation, the symbol φ1 is phi1 for T1,
phi2 for T2 and phi3 for T3.

In general, the global stiffness matrix is constructed by
processing one triangle at a time. The the integral shown
above accumulates the sum as the three triangles containing
the vertex V1 are processed in some order. In general there
will be more than three triangles that contain a vertex.

A general outline of FEM for triangles is presented in
femtri.pdf

Second Order Basis Functions

There is a large variety of basis functions, test functions,
or "phi" functions using FEM terminology. For triangles,
a family of second order, 6 point, functions is:
tri_basis.h source code
tri_basis.c source code
test_tri_basis.c test code
test_tri_basis.out test output
plot_tri_basis.c plotting code
plot_tri_basis2.c plotting code
plot is dynamic with motion, run it yourself, type "n" for next function

Scaled Lobatto shape functions may be used in conjunction with
other basis functions.
lobatto_shape.h source code
lobatto_shape.c source code
plot_lobatto.c source code

Lobatto  k-2  shape functions

basis functions

Discretization of Triangular Grids

A very different method than FEM to solve a subset of
FEM triangle PDE's, uses discretization of a group
of points, subset of the vertices, and simultaneous
equations to get a solution.

The primary technique is the non uniform discretization of
a set of points, given in two dimensions:
nuderiv2dg.java source code
test_nuderiv2dg_tri.java test code
test_nuderiv2dg_tri_java.out test output
test_nuderiv2dg2_tri.java test code
test_nuderiv2dg2_tri_java.out test output
test_nuderiv2dg3_tri.java test code
test_nuderiv2dg3_tri_java.out test output
pde_nuderiv2dg_tri.java pde code
pde_nuderiv2dg_tri_java.out pde output
simeq.java utility class

A weaker form, note missing 'g' for general, that requires independent
points is:
nuderiv2d.java source code
test_nuderiv2d.java test code
test_nuderiv2d_java.out test output

Tetrahedrons for three dimensions

Then, for three dimensions, we need tetrahedrons in place of triangles.
test_split_tetra.c test code
test_split_tetra.out test output
plot_split_tetra.c plot program

High order basis functions and integration
pde2sin_la.adb code, sin(x-y) 0 to 4Pi
pde2sin4_la.dat data for plot

Challenging Modeling and Simulation

Solving Maxwell's equations for electric fields can be
very challenging. I have not been able to numerically
compute the discharge pattern shown below.

Maxwell's Equations

The numerical solution of Maxwell's Equations for electro-magnetic
fields may use a large four dimensional array with dimensions
X, Y, Z, T. Three spatial dimensions and time.



### Lecture 33a, Lagrange fit triangles


Phi functions for triangles.
Using basic Lagrange polynomials to get a value 1.0 at
the vertex of interest and a value of 0.0 at the other
two vertices.

A "C" implementation is shown by:

phi_tril.c
phi_tril.h

A Matlab implementation with resulting plots is:

phi_tril.m



Recent update.

There are many file formats that may be of interest.
Collected here are a tiny fraction of what is available.
Search the web for code to read and convert formats.

AVS UCD data
AVS UCD data format

UCD cyl1.inp demonstrating boundary conditions
This file defines a rectangular duct with a cylinder inside.
The Dirichlet and Neumann boundary conditions are specified
as well as the geometry.

cyl1.inp boundary definition
plot_ucd.c  for plotting 3D UCD boundary

Extending UCD data to 4 dimensions
Without changing the basic UCD format,
the 4th coordinate can be provided along with
Dirichlet and Neumann boundary conditions.

pde_ucd4.inp  4D geometry

Source code to read UCD format

ucd1.inp  test input data
ucd1.out  test output results
ucd2.inp  test input data
ucd2.out  test output results

Ada Language implementation, including use in a PDE

abc_ucd.inp  UCD file

UCD file with color, mechanical properties, ranges
The UCD File Format is very versatile.
This file is just a demonstration of the flexibility.

# ucd1.inp Sample AVS UCD File
#
9 8 1 3 5                 9 nodes, 8 cells, 1 node data, 3 cell data, 5 model
1 0.000 0.000 1.000       Node coordinates
2 1.000 0.000 1.000
3 1.000 1.000 1.000
4 0.000 1.000 1.000
5 0.000 0.000 0.000
6 1.000 0.000 0.000
7 1.000 1.000 0.000
8 0.000 1.000 0.000
9 0.500 0.600 0.707
1 1 hex 1 2 3 4 5 6 7 8  cell id, material id, cell type, cell node numbers
2 1 tri 1 3 4
3 1 pt  1
4 1 tet 2 3 4 5
5 2 line 7 9
6 2 quad 2 4 6 8
7 2 prism 1 2 3 6 7 8
8 2 pyr   2 3 4 5 6
1 1                      number of node components, size of each component
stress, lb/in**2         Component name, units name
1 4999.9999              Data value for each node component
2 18749.9999
3 37500.0000
4 56250.0000
5 74999.9999
6 93750.0001
7 107500.0003
8 5000.0001
9 5001.0000
1 3                      number of cell components, size of each component
RGB, cell color          component name, units name
2 1.0 0.0 0.0            Data value for each cell component
1 0.0 1.0 0.0            (Do not have to be in order, some may require order)
3 0.0 0.0 1.0
4 1.0 1.0 0.0
5 1.0 0.0 1.0
6 0.0 1.0 1.0
7 0.5 0.5 0.5
8 0.7 0.7 0.7
2 2 2                   number of model components, size of each component
limits, min/max min/max component name, value description
range,  min/max min/max component name, value description
1 -1.0  1.0 -1.0  1.0
2 -1.2  1.2 -1.3  1.3
3 -1.5  1.5 -1.6  1.6
4 -1.9  2.0 -2.0  1.9
5 -2.0  2.0 -2.0  2.0

Just for fun, a real three pronged blivet

UCD format blivet.inp for above labeling

Triangle files
.poly, .ele, .node  triangle and showme

.tri, .bound, .coord

MatLab files
coord.dat, elem3.dat, elem4.dat, dirich.dat, neum.dat



### Lecture 28c, fem_50 case study

A modified version of fem_50, a Matlab program to use
the Finite Element Method, FEM, to solve a specific
partial differential equation is applied to three very
small test cases with full printout to show the details
of one software implementation.

The differential equation is:

d^2 u(x,y)   d^2 u(x,y)
- ---------- - ---------- = f(x,y)   or  -Uxx(x,y) -Uyy(x,y) = F(x,y)
dx^2         dy^2

For testing, the solution is  u(x,y)=1 + 2/11 x^2 + 3/11 y^2

and then  f(x,y)= - 10/11

The modified  fem_50d.m is also shown below

Case 1 used a triangularization with 6 triangles
7 nodes
6 boundary nodes
1 degree of freedom

The blue edges and dots are boundary.
The red edges and dots are internal, free.
The one free node is in the center, node number 7.

The full output is A7_fem_50d.out
The solution for the 7 nodes is at the end.

Input files are:
A7_coord.dat
A7_elem3.dat
A7_dirich.dat
The files A7_elem4.dat  and  A7_neum.dat  exists and are empty.

The solution plot is:

Case 2 used a triangularization with 16 triangles
15 nodes
12 boundary nodes
3 degrees of freedom

The blue edges and dots are boundary.
The red edges and dots are internal, free.
The three free nodes are in the center, number 13, 14, 15

The full output is A15_fem_50d.out
The solution for the 15 nodes is at the end.

Input files are:
A15_coord.dat
A15_elem3.dat
A15_dirich.dat
The files A15_elem4.dat  and  A15_neum.dat  exists and are empty.

The solution plot is:

Case 3 used a triangularization with 18 triangles
16 nodes
12 boundary nodes
4 degrees of freedom

The blue edges and dots are boundary.
The red edges and dots are internal, free.
The four free nodes are in the center, numbered 1, 2, 3, 4

The full output is A16_fem_50d.out
The solution for the 16 nodes is at the end.

Input files are:
A16_coord.dat
A16_elem3.dat
A16_dirich.dat
The files A16_elem4.dat  and  A16_neum.dat  exists and are empty.

The solution plot is:

The original fem_50.m all in one file with lots of output added

%% fem_50d applies the finite element method to Laplace's equation.
function fem_50d
% input  a prefix, e.g. A7_
% files read  A7_coord.dat   one x y pair per line, for nodes in order  1, 2, ...
%             A7_elem3.dat   three node numbers per line, triangles, any order
%                            function stima3 is applied to these cooddinates
%             A7_elem4.dat   four node numbers per line, quadralaterals
%                            function stima4 is applied to these coordinates
%             A7_dirich.dat  two node numbers per line, dirichlet boundary
%                            function u_d is applied to these coordinates
%             A7_neum.dat    two node numbers per line, neumann bounday
%                            function g is applied to these coordinates
%
%  Discussion:
%    FEM_50d is a set of MATLAB routines to apply the finite
%    element method to solving Laplace's equation in an arbitrary
%    region, using about 50 lines of MATLAB code.
%
%    FEM_50d is partly a demonstration, to show how little it
%    takes to implement the finite element method (at least using
%    every possible MATLAB shortcut.)  The user supplies datafiles
%    that specify the geometry of the region and its arrangement
%    into triangular and quadrilateral elements, and the location
%    and type of the boundary conditions, which can be any mixture
%    of Neumann and dirichlet.
%
%    The unknown state variable U(x,y) is assumed to satisfy
%    Laplace's equation:
%      -Uxx(x,y) - Uyy(x,y) = F(x,y) in Omega
%    with dirichlet boundary conditions
%      U(x,y) = U_D(x,y) on Gamma_D
%    and Neumann boundary conditions on the outward normal derivative:
%      Un(x,y) = G(x,y) on Gamma_N
%    If Gamma designates the boundary of the region Omega,
%    then we presume that
%      Gamma = Gamma_D + Gamma_N
%    but the user is free to determine which boundary conditions to
%    apply.  Note, however, that the problem will generally be singular
%    unless at least one dirichlet boundary condition is specified.
%
%    The code uses piecewise linear basis functions for triangular elements,
%    and piecewise isoparametric bilinear basis functions for quadrilateral
%    elements.
%
%    The user is required to supply a number of data files and MATLAB
%    functions that specify the location of nodes, the grouping of nodes
%    into elements, the location and value of boundary conditions, and
%    the right hand side function in Laplace's equation.  Note that the
%    fact that the geometry is completely up to the user means that
%    just about any two dimensional region can be handled, with arbitrary
%    shape, including holes and islands.
%
%  Modified:
%    29 March 2004
%    23 February 2008 JSS
%     3 March 2008 JSS
%  Reference:
%    Jochen Alberty, Carsten Carstensen, Stefan Funken,
%    Remarks Around 50 Lines of MATLAB:
%    Short Finite Element Implementation,
%    Numerical Algorithms,
%    Volume 20, pages 117-137, 1999.
%
clear
format compact
prename = input('enter prefix to file names: ', 's')
diary 'fem_50d.out'
disp([prename 'fem_50d.out'])
debug = input('enter debug level:0=none, 1=input and matrices, 2=assembly ')
%
%  Read the nodal coordinatesinate data file.
%  ordered list of  x  y  pairs, coordinates for nodes
%  nodes must be sequentially numbered 1, 2, 3, ...
if debug>0
disp([prename 'coord.dat ='])
disp(coordinates)
end % debug
%
%  Read the triangular element data file.
%  three integer node numbers per line
if debug>0
disp([prename 'elem3.dat ='])
disp(elements3)
end % debug
%
%  four integer node numbers per line
if debug>0
disp([prename 'elem4.dat ='])
disp(elements4)
end % debug
%
%  Read the dirichlet boundary edge data file.
%  two integer node numbers per line, function u_d sets values
%  there must be at least one pair
%  other boundary edges are set in  neumann
if debug>0
disp([prename 'dirich.dat ='])
disp(dirichlet)
end % debug
%
%  Read the Neumann boundary edge data file.
%  two integer node numbers per line, function g  sets values
if debug>0
disp([prename 'neum.dat ='])
disp(neumann)
end % debug

A = sparse ( size(coordinates,1), size(coordinates,1) );
b = sparse ( size(coordinates,1), 1 );

%
%  Assembly from triangles.
%
for j = 1 : size(elements3,1)
VV = coordinates(elements3(j,:),:);
MM = stima3(coordinates(elements3(j,:),:));
if debug>1
disp(['MM = stima3(VV) at j=' num2str(j) ' nodes(' ...
num2str(elements3(j,1)) ',' num2str(elements3(j,2)) ',' ...
num2str(elements3(j,3)) ')'])
disp('VV=')
disp(VV)
disp('MM=')
disp(MM)
end % debug
A(elements3(j,:),elements3(j,:)) = A(elements3(j,:),elements3(j,:)) + MM;
end
if debug>0
disp('assembly from triangles A=')
disp(A)
end % debug
%
%
for j = 1 : size(elements4,1)
VV = coordinates(elements3(j,:),:);
MM = stima4(coordinates(elements4(j,:),:));
if debug>1
disp(['MM = stima3(VV) at j=' num2str(j) ' nodes(' ...
num2str(elements4(j,1)) ',' num2str(elements4(j,2)) ',' ...
num2str(elements4(j,3)) ',' num2str(elements4(j,4)) ')'])
disp('VV=')
disp(VV)
disp('MM=')
disp(MM)
end % debug
A(elements4(j,:),elements4(j,:)) = A(elements4(j,:),elements4(j,:)) + MM;
end
if debug>0
disp('assembly from triangles and rectangles A=')
disp(A)
end % debug

%
%  Volume Forces from triangles.
%
for j = 1 : size(elements3,1)
b(elements3(j,:)) = b(elements3(j,:)) ...
+ det( [1,1,1; coordinates(elements3(j,:),:)'] ) * ...
f(sum(coordinates(elements3(j,:),:))/3)/6;
end
if debug>0
disp('forces from triangles b=')
disp(b)
end % debug
%
%
for j = 1 : size(elements4,1)
b(elements4(j,:)) = b(elements4(j,:)) ...
+ det([1,1,1; coordinates(elements4(j,1:3),:)'] ) * ...
f(sum(coordinates(elements4(j,:),:))/4)/4;
end
if debug>0
disp('forces from triangles and rectangles b=')
disp(b)
end % debug

%
%  Neumann conditions.
%
for j = 1 : size(neumann,1)
GG = norm(coordinates(neumann(j,1),:) - coordinates(neumann(j,2),:)) * ...
g(sum(coordinates(neumann(j,:),:))/2)/2;
if debug>2
disp(['Neumann at j=' num2str(j)])
disp(GG)
end % debug
b(neumann(j,:)) = b(neumann(j,:)) + GG
end
if debug>0
disp('using g() add neumann conditions b=')
disp(b)
end % debug

%
%  Determine which nodes are associated with dirichlet conditions.
%  Assign the corresponding entries of U, and adjust the right hand side.
%
u = sparse ( size(coordinates,1), 1 );
BoundNodes = unique ( dirichlet );
if debug>1
disp('BoundNodes=')
disp(BoundNodes)
end % debug
u(BoundNodes) = u_d ( coordinates(BoundNodes,:) );
if debug>0
disp('using u_d() add dirichlet conditions u=')
disp(u)
end % debug

b = b - A * u;
if debug>0
disp('apply u to b, dirichlet conditions b=')
disp(b)
end % debug

%
%  Compute the solution by solving A * u = b
%  for the, un bound, remaining unknown values of u.
%
FreeNodes = setdiff ( 1:size(coordinates,1), BoundNodes );

u(FreeNodes) = A(FreeNodes,FreeNodes) \ b(FreeNodes);
if debug>0
disp('solve for A * u = b  u=')
disp(u)
end % debug

%
%  Graphic representation.
%
show ( elements3, elements4, coordinates, full(u), dirichlet );
%
%  check solution
%
for i=1:size(coordinates,1)
disp([num2str(i) '  x= ' num2str(coordinates(i,1)) '  y= '  num2str(coordinates(i,2)) ...
' analytic solution= ' num2str(uana(coordinates(i,1),coordinates(i,2)))])
end
diary off
return % logical end of function  fem_50d

%% uana computes boundary values and analytic solution for checking
%  Discussion:
%    This is generally unknown yet is used here for testing
%  Parameters:
%    Input, real pair  xx,yy are x,y coordinates
%    Output, value of solution at  x,y
%
function ana = uana(xx, yy)
ana= 1.0+(2.0/11.0)*xx*xx+(3.0/11.0)*yy*yy;
end % uana

%% f evaluates the right hand side of Laplace's equation.
%  Discussion:
%    This routine must be changed by the user to reflect a particular problem.
%  Parameters:
%    Input, real U(N,M), contains the M-dimensional coordinates of N points.
%    Output, VALUE(N), contains the value of the right hand side of Laplace's
%    equation at each of the points.
function valuef = f ( uf )
valuef = ones ( size ( uf, 1 ), 1 );
valuef = valuef.*(-10/11.0);
end % f

%% g evaluates the outward normal values assigned at Neumann boundary conditions.
%  Discussion:
%    This routine must be changed by the user to reflect a particular problem.
%  Parameters:
%    Input, real U(N,M), contains the M-dimensional coordinates of N points.
%    Output, VALUE(N), contains the value of outward normal at each point
%    where a Neumann boundary condition is applied.
function valueg = g ( ug )
valueg = zeros ( size ( ug, 1 ), 1 );
end % g

%% u_d evaluates the dirichlet boundary conditions.
%  Discussion:
%    The user must supply the appropriate routine for a given problem
%  Parameters:
%    Input, real U(N,M), contains the M-dimensional coordinates of N points.
%    Output, VALUE(N), contains the value of the dirichlet boundary
%    condition at each point.
function valued = u_d ( ud )
valued = zeros ( size ( ud, 1 ), 1 );
for kk=1:size(ud,1)
% U(x,y) = 1 + 2/11 x^2 + 3/11 y^2 solution values on boundary
valued(kk)=uana(ud(kk,1),ud(kk,2));
end
end % u_d

%% STIMA3 determines the local stiffness matrix for a triangular element.
%  Discussion:
%
%    Although this routine is intended for 2D usage, the same formulas
%    work for tetrahedral elements in 3D.  The spatial dimension intended
%    is determined implicitly, from the spatial dimension of the vertices.
%  Parameters:
%
%    Input, real VERTICES(1:(D+1),1:D), contains the D-dimensional
%    coordinates of the vertices.
%
%    Output, real M(1:(D+1),1:(D+1)), the local stiffness matrix
%    for the element.
function M = stima3 ( vertices )
d = size ( vertices, 2 );
D_eta = [ ones(1,d+1); vertices' ] \ [ zeros(1,d); eye(d) ];
M = det ( [ ones(1,d+1); vertices' ] ) * D_eta * D_eta' / prod ( 1:d );
end % stima3

%% STIMA4 determines the local stiffness matrix for a quadrilateral element.
%  Parameters:
%    Input, real VERTICES(1:4,1:2), contains the coordinates of the vertices.
%    Output, real M(1:4,1:4), the local stiffness matrix for the element.%
function M = stima4 ( vertices )
D_Phi = [ vertices(2,:) - vertices(1,:); vertices(4,:) - vertices(1,:) ]';
B = inv ( D_Phi' * D_Phi );
C1 =  [  2, -2; -2,  2 ] * B(1,1) ...
+ [  3,  0;  0, -3 ] * B(1,2) ...
+ [  2,  1;  1,  2 ] * B(2,2);
C2 =  [ -1,  1;  1, -1 ] * B(1,1) ...
+ [ -3,  0;  0,  3 ] * B(1,2) ...
+ [ -1, -2; -2, -1 ] * B(2,2);
M = det ( D_Phi ) * [ C1 C2; C2 C1 ] / 6;
end % stima4

%% SHOW displays the solution of the finite element computation.
%  Parameters:
%    Input, integer elements3(N3,3), the nodes that make up each triangle.
%    Input, integer elements4(N4,4), the nodes that make up each quadrilateral.
%    Input, real coordinates(N,1:2), the coordinates of each node.
%    Input, real U(N), the finite element coefficients which represent the solution.
%    There is one coefficient associated with each node.
function show ( elements3, elements4, coordinates, us, dirichlet )
figure(1)
hold off
%
%  Display the information associated with triangular elements.
trisurf ( elements3, coordinates(:,1), coordinates(:,2), us' );
%
%  Retain the previous image, and overlay the information associated
hold on
trisurf ( elements4, coordinates(:,1), coordinates(:,2), us' );
%
%  Define the initial viewing angle.
%
view ( -67.5, 30 );
title ( 'Solution to the Problem' )
hold off
figure(2)
for kk=1:size(elements3,1)
plot([coordinates(elements3(kk,1),1)   ...
coordinates(elements3(kk,2),1)   ...
coordinates(elements3(kk,3),1)   ...
coordinates(elements3(kk,1),1)], ...
[coordinates(elements3(kk,1),2)   ...
coordinates(elements3(kk,2),2)   ...
coordinates(elements3(kk,3),2)   ...
coordinates(elements3(kk,1),2)],':r')
hold on
end
for kk=1:size(dirichlet,1)
plot([coordinates(dirichlet(kk,1),1)   ...
coordinates(dirichlet(kk,2),1)], ...
[coordinates(dirichlet(kk,1),2)   ...
coordinates(dirichlet(kk,2),2)],'-b')
hold on
end
title('triangularization')
xlabel('X')
ylabel('Y')
axis tight
axis square
grid off
hold on
xb=coordinates(BoundNodes,1);
yb=coordinates(BoundNodes,2);
plot(xb,yb,'.b')
hold on
xb=coordinates(FreeNodes,1);
yb=coordinates(FreeNodes,2);
plot(xb,yb,'.r')
hold off
end % show
end % fem_50d



### Lecture 35, Navier Stokes Airfoil Simulation


Basically, this is a study to see if the 1933 wind tunnel
measurements of airfoils for coefficient of lift CL and
coefficient of drag CD can be reproduced by computer
simulation.

Airfoil 0015
Testing formulas for 0015 airfoil
rotate_prime.java

Create data file for airfoil at a specific angle and grid:
w0015grid.java

Build a PDE analysis grid around the airfoil:
w0015vert.java

w0015grid_java.out
w0015a22.grid
w0015vert_java.out

w0015b22_grid_java.out
w0015b22.grid
w0015b22_vert_java.out

w0015c22_grid_java.out
w0015c22.grid
w0015c22_vert_java.out

w0015b00.grid
w0015b00_vert_java.out

References:
NACA-460 1933 78 airfoils
NACA-824 1945 Airfoil Summary



### Lecture 36, Some special PDE's


From "A Collection of 2D Elliptic Problems for Testing
Adaptive Algorithms" by William F. Mitchell

1. Solutions of the form
U(x,y)=2^4p x^p (x-1)^p y^p (y-1)^p  for p=4
PDE Uxx(x,y) + Uyy(x,y) = f(x,y)
boundary unit square

pde22_eq.c
pde22_eq_c.out
pde22_eq_c.dat
pde22_eq_c.plot
pde22_eq_c.sh

pde22_eq.java
pde22_eq_java.out
pde22_eq_java.dat
pde22_eq_java.plot
pde22_eq_java.sh

pde22_eq.f90
pde22_eq_f90.out
pde22_eq_f90.dat
pde22_eq_f90.plot
pde22_eq_f90.sh

Then U(x,y)=2^4p x^p (x-1)^p y^p (y-1)^p for  p=10

pde22_eq_10.c
pde22_eq_10_c.out
pde22_eq_10_c.dat
pde22_eq_10_c.plot
pde22_eq_10_c.sh

2. Solutions of the form
alpha = Pi/omega
r = sqrt(x^2+y^2)
theta = atan2(y,x) 0..2Pi
U(x,y)=r^alpha * sin(alpha*theta)
PDE Uxx(x,y) + Uyy(x,y) = 0
boundary x -1..1, y -1..1

Exact solution

nx=11, ny=11 uniformly spaced, computed solution

corner0.c
corner0_c.out
corner0_c.dat
corner0_c.plot
corner0_c.sh

corner0.java
corner0_java.out
corner0_java.dat
corner0_java.plot
corner0_java.sh

corner0.f90
corner0_f90.out
corner0_f90.dat
corner0_f90.plot
corner0_f90.sh

Makefile and other files for programs listed above
Makefile_spde

nuderiv.c
inverse.c
simeq.c
mpf_nuderivd.c
mpf_inverse.c

nuderiv.java
inverse.java
simeq.java

nuderiv.f90
inverse.f90
simeq.f90

Have gnuplot installed to get plots.
Have gcc  installed for .c
Have Java installed for .java
Have gfortran installed for .f90

PDE for Turbulence and Difusion

pde_turbulance.pdf

diffusion_equation

Generic_scalar_transport_equation



### Lecture 36a, Special discretization, non uniform


We have arbitrarily scattered boundary points and DOF.
No structure is provided. The problem is to discretize
and build a system of equations, solve the equations,
and determine the values at the DOF.

So far, only for 2D and 3D geometry, yet could be generalized.
The basic problem is avoiding singular matrices.
The worse case of arbitrarily scattered points is
when they are uniformly spaced.

The basic non uniform discretization:
nuderiv2d.c use all points
nuderiv2d.h
test_nuderiv2d.c
test_nuderiv2d_c.out
nuderiv2dg.c use good points, less singular
nuderiv2dg.h
test_nuderiv2dg.c
test_nuderiv2dg_c.out

nuderiv2d.java use all points
test_nuderiv2d.java
test_nuderiv2d_java.out
nuderiv2dg.java use good points, less singular
test_nuderiv2dg.java
test_nuderiv2dg_java.out

Now a corner geometry
test_nuderiv2dgc.c
test_nuderiv2dgc_c.out

The basic 3D non uniform discretization:
nuderiv3d.c use all points
nuderiv3d.h
test_nuderiv3d.c
test_nuderiv3d_c.out
nuderiv3dg.c use good points, less singular
nuderiv3dg.h
test_nuderiv3dg.c
test_nuderiv3dg_c.out

nuderiv3dg.java use good points, less singular
test_nuderiv3dg.java
test_nuderiv3dg_java.out

Applying non uniform discretization to a PDE

blivet.inp zero based, regular
blivet_nr.inp non regular

pde_blivet.c
pde_blivet_c.out
pde_blivet_nr_c.out
pde_blivet.java
pde_blivet_java.out
pde_blivet_nr_java.out



### Lecture 37, various utility routines


Permutations, Combinations, Heapsort

Some various utility routines that may be used for applications
or generating tests.

Generating all permutations of a vector
permute.h
permute.c
test_permute.c
test_permute_c.out

permute.java
test_permute.java
test_permute_java.out

Generating Combinations of N things taken M at a time
combinations.h
combinations.c
test_combinations.c
test_combinations_c.out

combinations.java
test_combinations.java
test_combinations_java.out

Heap Sort a vector
heapsort.h
heapsort.c
test_heapsort.c int, double, char
test_heapsort_c.out
test2_heapsort.c on struct
test2_heapsort_c.out

heapsort.java
test_heapsort.java
test_heapsort_java.out

A version of the traveling salesman program, tsp.
Once in a while, the heuristic solution O(n^2) comes out
the same as the NP Complete solution.
Every run gets different random numbers.
tsp.c
tsp.out



### Lecture 38, Open Tutorial on LaTeX


You can prepare high quality math and engineering
papers using LaTeX, pronounced lah tech.

This is not intended to cover all or even most of LaTeX.

You will prepare a LaTeX file  foo.tex

running LaTeX
On UMBC GL machines you have several options to get output:

latex foo.tex
latex foo.tex  # this is needed for TOC, bibliography, etc
dvips -o foo.ps foo.dvi  # compiling produces a .dvi and other files
lpr foo.ps     # print high quality PostScript
# your browser can view foo.ps, lower quality print

pdflatex foo.tex  # generate a .pdf
# your browser can view foo.pdf

and, many other auxiliary programs are available.

Use your favorite editor to create foo.tex

Best on-line reference is:
http://en.wikibooks.org/wiki/LaTeX

Linux: sudo apt-get install texlive
MacOSX: get mactex
Windows: get protext or miktex

First example
Here is a simple, complete, .tex file
hello.tex
hello.ps
% hello.tex absolute minimum to get output
\documentclass[12pt]{letter} % could be paper, article, report, book, slides ...
\begin{document} % start of document

Hello world from LaTeX.

\end{document} % end of document, nothing after this line

Note some syntax "\" and "{ }"
"\" starts a command, there are many.
You can define new commands to save on typing.
More commands can be brought in from packages

"{ }" are for grouping or enclosing. Usually REQUIRED!
e.g. \frac {numerator equation} {denominator equation}

Generally, arbitrary nesting is allowed, just use
lots of { { {} {} } { } }

one or mode spaces is a space.
carriage return and tab are a space.
lines automatically formatted unless you interfere
No semicolons at end of lines or statements

Printing equations
equation.tex
equation.ps

% equation.tex just a few samples
\documentclass[12pt]{article}
\begin{document} % start of document
For subscript use underscore

$A_1 \; \; A_i \; \; A_{i,j}$ % \; for more spaces

For power, superscript use hat

$x^2 \; \; x^{p+q} \; \; x^{y^z}$

${n \choose k} = \frac{n!}{k!(n-k!)}$

$z = \sqrt {x^2 + y^2}$

$\int _{i=0} ^{\infty} \frac {1}{{(x+1)}^2} dx$

$$\sum _{j=1} ^{10} A_j$$

\end{document} % end of document, nothing after this line

printing matrices and vectors
Using \matrix

pdemat.tex
pdemat.ps
% pdemat.tex
\documentclass[11pt,fleqn]{article} % preamble follows
\setlength{\textheight}{8.5in}
\setlength{\topmargin}{0.5in}
\setlength{\textwidth}{6.5in}
\setlength{\oddsidemargin}{0in}

\newcommand{\LL}{\mit L}

\begin{document}        % start of document

System of simultaneous equations that are too wide for paper

$$\left| \matrix{ \int_{\Omega} \LL \left( \phi_1(x,y) \right) \phi_1(x,y) dx \, dy \ \int_{\Omega} \LL \left( \phi_2(x,y) \right) \phi_1(x,y) dx \, dy \ \ldots \ \int_{\Omega} \LL \left( \phi_{nxy}(x,y) \right) \phi_1(x,y) dx \, dy \cr \int_{\Omega} \LL \left( \phi_1(x,y) \right) \phi_2(x,y) dx \, dy \ \int_{\Omega} \LL \left( \phi_2(x,y) \right) \phi_2(x,y) dx \, dy \ \ldots \ \int_{\Omega} \LL \left( \phi_{nxy}(x,y) \right) \phi_2(x,y) dx \, dy \cr \ldots \cr \int_{\Omega} \LL \left( \phi_1(x,y) \right) \phi_{nxy}(x,y) dx \, dy \ \int_{\Omega} \LL \left( \phi_2(x,y) \right) \phi_{nxy}(x,y) dx \, dy \ \ldots \ \int_{\Omega} \LL \left( \phi_{nxy}(x,y) \right) \phi_{nxy}(x,y) dx \, dy \cr } \right| \times$$
$$\left| \matrix{ U_1 \cr U_2 \cr \ldots \cr U_{nxy} \cr } \right| = \left| \matrix{ \int_{\Omega} f(x,y) \phi_1(x,y) dx \, dy \cr \int_{\Omega} f(x,y) \phi_2(x,y) dx \, dy \cr \ldots \cr \int_{\Omega} f(x,y) \phi_{nxy}(x,y) dx \, dy \cr } \right|$$

\end{document} % end of document, nothing after this line

Drawings that have a lot of math symbols
put(x,y){\}
note that x and y and dimensions are in "points" 1/72 of inch
\line \vector \circle \oval \qbezier

draw2.tex
draw2.pdf
draw2.ps

% draw2.tex
\documentclass[11pt,fleqn]{article} % preamble follows
\setlength{\textheight}{8.5in}
\setlength{\topmargin}{1in}
\setlength{\textwidth}{7.5in}
\setlength{\oddsidemargin}{0in}

\begin{document}        % start of document
\thicklines

Polynomial Hierarchy based on quantifiers

\begin{picture}(468,576)

% basic hierarchy lines
\put(70,110){\framebox(288,432)} % \hbox overfull, NOT
\put(70,182){\line(4,-1){288}} % slope and length
\put(70,182){\line(4,1){288}}
\put(358,182){\line(-4,-1){288}}
\put(358,182){\line(-4,1){288}}
\put(70,326){\line(4,-1){288}}
\put(70,326){\line(4,1){288}}
\put(358,326){\line(-4,-1){288}}
\put(358,326){\line(-4,1){288}}

% labeling
\Large
\put(210,122){\mbox{${\rm P}$}}
\put(80,140){\mbox{${\Sigma_1^{\rm P}}={\rm NP}$}}
\put(265,140){\mbox{$\Pi_1^{\rm P}={\rm co\mbox{-}NP}$}}
\put(50,140){\mbox{${\exists}$}}
\put(370,140){\mbox{${\forall}$}}
\put(205,176){\mbox{$\Delta_2^{\rm P}$}}

\put(80,212){\mbox{${\Sigma_2^{\rm P}}={\rm DP}$}}
\put(265,212){\mbox{$\Pi_2^{\rm P}={\rm co\mbox{-}DP}$}}
\put(42,212){\mbox{${\exists\forall}$}}
\put(370,212){\mbox{${\forall\exists}$}}
\put(205,248){\mbox{$\Delta_3^{\rm P}$}}

\put(80,284){\mbox{${\Sigma_3^{\rm P}}$}}
\put(265,284){\mbox{$\Pi_3^{\rm P}$}}
\put(34,284){\mbox{${\exists\forall\exists}$}}
\put(370,284){\mbox{${\forall\exists\forall}$}}
\put(205,320){\mbox{$\Delta_4^{\rm P}$}}

\put(80,356){\mbox{${\Sigma_4^{\rm P}}$}}
\put(265,356){\mbox{$\Pi_4^{\rm P}$}}
\put(26,356){\mbox{${\exists\forall\exists\forall}$}}
\put(370,356){\mbox{${\forall\exists\forall\exists}$}}
\put(205,392){\mbox{$\Delta_5^{\rm P}$}}

\put(120,474){\mbox{${\rm PH}={\bigcup_{i=1}^{\infty}{\Sigma_i^{\rm P}}} ={\bigcup_{i=1}^{\infty}{\Pi_i^{\rm P}}}$}}

\normalsize
\end{picture}
\end{document} % end of document, nothing after this line

http://en.wikibooks.org/wiki/LaTeX/Creating_Graphics

Graphs using qbezier
qbezier.tex
qbezier.ps

% qbezier.tex absolute minimum to get a plot
\documentclass[12pt]{article}
\begin{document} % start of document

\setlength{\unitlength}{1mm}
\begin{picture}(90,32)
\put(0,15){\vector(1,0){57}}
\put(60,14){$\chi$ = Rapidit\"at}
\put(25,0){\vector(0,1){32}}
\put(27,29){$\beta=v/c$}
\multiput(0,5)(4,0){13}{\line(1,0){2}}
\multiput(0,25)(4,0){13}{\line(1,0){2}}
\put(60,24){$\displaystyle\beta=\frac{v}{c}=\tanh\chi$}
\qbezier(25,15)(34,24)(45,24.6)
\qbezier(25,15)(16, 6)( 5,5.4)
\end{picture}

\end{document} % end of document, nothing after this line

title page, chapters, sections, appendices.

simplepaper.tex
simplepaper.ps
\documentclass[12pt]{report} % preamble follows
\title{My Title}
\author{Me \\
Other \\
\\
More \\
\\ }
\date{September 15, 1999} % put in or uses date generated

\begin{document} % start of document

\maketitle % uses parameters from above \title \author \date

\begin{abstract}
Just an abstract here.

\end{abstract}

\tableofcontents  % will be generated and put here

\chapter{Introduction}
This is just text in the open.

A blank line makes a new paragraph, indented.
Stuff in this paragraph, automatically fills lines.

Force end of short line \\
Another short line \\

\begin{verbatim}
Verbatim text       left lots of spaces

Indented and skipped two lines
Unindent.
\end{verbatim}

\chapter{More stuff}
\begin{verbatim}
Hi there
Line two, col 1
\end{verbatim}

\newpage

\appendix
\chapter{This appendix covers xyz}

Start of appendix
\section{append section first}
Just next section of appendix

\section{more append sections}
more appendix stuff

end of my text

\end{document} % end of document, nothing after this line

Lecture 39, Tutorial on numerical solution of differential equations

The goal of this lecture is to have the student understand how
to numerically compute the solution of a differential equation.
The numerical solution will be computed by a computer program.
Almost any computer language can be used to compute the numerical
solution of differental equations. Various languages will be
used as examples. Much of the notation will be more computer
programming oriented than mathematical.

This lecture starts with very basic concepts and defines the notation
that will be used.

Expect terms to be defined close to the terms first usage.

partial differential equations.

An ordinary differential equation has exactly one independent
variable, typically  x .

A partial differential equation has more than one independent
variable, typically  x,y  x,y,z  x,y,z,t  etc.

understanding derivatives

From a numerical point of view, the derivative of
a function is the slope at a specific value for the
independent variable.

The second derivative is just the derivative of
the first derivative of a function.

Note***  first a polynomial, then the sine

we know from Calculus:
U(x) = sin(x)
U'(x)   d/dx sin(x) = cos(x)   the first derivative od sine x is cosine x
U''(x)  d/dx cos(x) = -sin(x)  the second derivative of sine x is -sine x
U'''(x) d/dx -sin(x) = -cos(x) the third derivative of sine x is -cosine x

Look at the plot of sin(x) and its derivatives.
Note the slope of each curve at each point, is the derivative
plotted below it. A few are highlighted by red, blue and
green dots.

notation for ordinary differential equations

f(x)  A function of x, typically any function of one independent variable.
x^3 + 7 x^2 + 5 x + 4  is a polynomial function of x,
x cubed plus 7 times x squared plus 5 times x plus 4 .

sin(x) a function of x that computes the sine of x .
All trignometric functions will use x in radians.

y=f(x) an equation, some times written  y = f(x);

y'(x)=3 x^2 + 14 x + 5; y(0)=4;
A differential equation with an initial condition.
The analytic solution is  f(x) the polynomial above.
Given y=f(x) , y'(x) is the notation for taking
the first derivative of f(x),
also written as  d/dx f(x)  or d f(x)/dx .
Typically we are given a starting and ending value
for which the numerical solution is needed.
The result may be a table of  x,y values and
often a graphical plot of the solution.

y''''(x)-y(x)=0; y(0)=0; y'(0)=1; y''(0)=0; y'''(0)=-1;
A fourth order differential equation with initial conditions.
We will see later why four initial conditions are
needed in order to compute the numerical solution.
We know the analytic solution is f(x)=sin(x);
yet we will see that we must be careful of errors
in computing the numerical solution.

a(x)*y'' + b(x)*y' + c(x) = 0 is a second order ordinary
differential equation. The three functions a(x),
b(x), and c(x) must be known and programmable.
In general functions will appear more than
constants in differential equations.

notation for partial differential equations

f(x,y,z) a function of three independent variables.
2*x*y^2*z + 4*x^2*z^2 + 3*y
a polynomial in three variables
2 times x times y squared times z plus
4 times x squared times z squared plus 3 time y

U(x,y,z)  will be the numerical solution of a partial differential
equation giving values of f(x,y,z) in a specified three
dimensional region.

Ux(x,y,z) or  just  Ux  if the independent variables are obvious,
is the first derivative with respect to x of f(x,y,z) .

Uxx       is the second derivative with respect to x of f(x,y,z)

Uxy       is the first derivative with respect to x of f(x,y,z)=Ux
then the first derivative with respect to y of Ux.

Uxyyz     is the first derivative with respect to x, then the
second derivative with respect to y, then the first
derivative with respect to z.

Uxxx + a(x,y,z)*Uyyz + b(x,y,z)*Uz = c(x,y,z);
Is a partial differential equation.
In order to numerically solve this equation we need
the three functions a(x,y,z), b(x,y,z) and c(x,y,z)
to be specified and we must be able to code them
in the programming language we wish to use to
compute the numerical solution.
Typically this type of partial differential equation
has a closed three dimensional volume, called the
boundary, where the solution must be computed.
Enough values of the function f(x,y,z) must be
given, typically on the boundary, to have a
unique solution defined. These values are known
as Dirchilet boundary values. Derivative values
may also be provided on the boundary, these are
known as Neumann boundary values.

required conditions

In order to compute a unique numerical solution we need:
1) The function f that is the solution must be continuous
and continuously differentiable in and on the boundary.
(Solving for discontinuous results, such as shock waves
is beyond the scope of this tutorial.)
2) There must be enough boundary values to make the
solution unique. At least one Dirichilet value must
be given and possibly some number of Neumann values.
3) The boundary must be given. For this tutorial we
cover only one closed boundary with no internal "holes".
4) The coordinate system is assumed Cartesian unless
otherwise specified, Polar, Cylindrical, Spherical,
Elipsoidal, or other.

numerically solving our first ordinary differential equation

Lecture 40, unique numerical solution of differential equations

The goal of this lecture is to show the long development
cycle of a unique numerical solution of a partial differential
equation in the domain of an arbitrary 3D closed polyhedra with
Dirchilet boundary conditions at the vertices.

The development included using several file formats to define
the closed polyhedra and the developing code to be sure that
the polyhedra was really closed. There may be one or more
polyhedra "hole" completely inside the outer closed polyhedra.

Just determining the "refinement", the location of internal
points where the solution is to be computed was needed.

Previous solutions have used FEM, finite element method and
various forms of discretization. Basically building a system
of linear equations containing the internal points and then
solving the system of linear equations to get the solution
at each point. The unique method that came about during this
development is to solve for each internal point, DOF, degree
of freedom serially, independent of any other internal point.
This also makes the solution easily parallizable to any
number of processores.

The intuition came from studying one internal point that
seemed to be surrounded by a randomly scattered bunch of
vertices. Oh! If I could compute the derivatives in the
PDE at that point based on boundary vertices with known
Dirchilet values, the only unknown is the solution at
that internal point. Yes I can. My object of interest
was a Medieval Spiked Flail, the surface was a bunch
of points created out of tetrahedrons.

Now, to the formal development, specifically for three
dimensions, although it does generalize. Also limited
to fourth order derivatives, although that could be
increased.

The series of Makefile runs

sphere_div.c  data generator

spike.dat polyhedra

format changer .dat to .inp UCD

spike.inp polyhedra UCD format

test_pde_inside3_ucd.c point in polyhedra
test_pde_inside3_ucd_c.out point in polyhedra tested

nuderiv3dg.h derivative from random points
nuderiv3dg.c up to 4th order in 3 variables
test_nuderiv3dg.c test every call
test_nuderiv3dg_c.out all OK

lsfit.h  multidimensional polynomial fit,
lsfit.c  fit from random points
test_lsfit7.c  sample test
test_lsfit7_c.out  sample output

simeq.h  solve simultameous equations
simeq.c  solve simultameous equations
test_simeq.c  simple test
test_simeq_c.out  OK

point_in_poly.h determine if a point
point_in_poly.c is in polygon or polyhedra
test_point_in_poly.c  test
test_point_in_poly_c.out results

pde_spike.c  solve 4th order PDE on spike.inp
pde_spike_c.out  test results

Lecture 41, Numerically solving AC circuits

Spice is one of a number of electrical circuit simulation programs.
Our UMBC computers have a version called  ngspice  installed.
This is an example of a transient analysis, computed by integration
in small time steps. Similar to our rocket simulation.

Spice circuit input

capdio.cir voltage multiplier  ngspice -b capdio.cir > capdio.out
VS  99  0  AC  10  SIN(0VOFF 10VPEAK 1KHZ)
R1  99  98  1
C1  98  1  1UF
C2   0  2  1UF
C3   1  3  1UF
C4   2  4  1UF
C5   3  5  1UF
C6   4  6  1UF
D1   0  1
D2   1  2
D3   2  3
D4   3  4
D5   4  5
D6   5  6
.TRAN  100US  50MS
.print tran V(1) V(2) V(3) V(4) V(5) V(6)
.plot  tran V(1) V(2) V(3) V(4) V(5) V(6)
.end

The schematic diagram with node numbers is:

The simulation with time step of 100 microseconds
for 0.01 seconds shows the voltage buildup at
the six nodes 1, 2, 3, 4, 5, 6 in the circuit.

Running the simulation to 0.04 seconds shows
stable voltage at node 6.

Another type of Spice simulation is to simulate
a circuit at a series of frequencies. Type AC.
After setup of a matrix, solve simultaneous
equations to compute results.

Spice circuit input

ac.cir ac tuned circuit   ngspice -b ac.cir > ac.out
I  0 1 ac 1000.0
R1 0 1 0.001
R2 1 2 1000.0
R3 1 3 1000.0
R4 1 4 1000.0
R5 1 5 1000.0
C1 2 0 0.0000001591549431
L1 3 0 0.1591549431
C2 4 0 0.0000001591549431
L2 4 0 0.1591549431
C3 5 6 0.0000001591549431
L3 6 0 0.1591549431
*F1 10.0 100000.0 10.0
.ac DEC 20 100 10k  $100Hz to 10kHz, 20 points per decade .print AC V(1) V(2) V(3) V(4) V(5) .plot AC V(2) V(3) V(4) V(5) .end Some of the output from ac.out showing magnitude of signal at nodes 2, 3, 4, 5 with frequency from 100Hz to 10kHz, note where frequency matches the parallel tuned circuit at node 4, and the series tuned circuit at node 5, at 1kHz. Note node 2, the capacitor C1 has low impedance at high frequency. Note node 3. the inductor L1 has high impedance at high frequency. Circuit: ac.cir ac tuned circuit ngspice -b ac.cir > ac.out selected output from ac.out Legend: + = v(2) * = v(3) X = multiple nodes = = v(4)$ = v(5)
--------------------------------------------------------------------------
frequency  v(2)     0.00e+00  2.00e-01  4.00e-01  6.00e-01  8.00e-01  1.00e+00
----------------------|---------|---------|---------|---------|---------|
1.000e+02  9.901e-01 X         .         .         .         .        X.
1.122e+02  9.876e-01 X         .         .         .         .        X.
1.259e+02  9.844e-01 X         .         .         .         .        X.
1.413e+02  9.804e-01 *=        .         .         .         .       $+. 1.585e+02 9.755e-01 .X . . . . X . 1.778e+02 9.693e-01 .X . . . . X . 1.995e+02 9.617e-01 .*= . . . .$+ .
2.239e+02  9.523e-01 . X       .         .         .         .      X  .
2.512e+02  9.406e-01 . *=      .         .         .         .     $+ . 2.818e+02 9.264e-01 . *= . . . .$+   .
3.162e+02  9.091e-01 .   *=    .         .         .         .   $+ . 3.548e+02 8.882e-01 . * = . . . .$ +     .
3.981e+02  8.632e-01 .     *  =.         .         .         $+ . 4.467e+02 8.337e-01 . * .= . .$ .+        .
5.012e+02  7.992e-01 .         *    =    .         .   $+. . 5.623e+02 7.597e-01 . . * =$.      +  .         .
6.310e+02  7.153e-01 .         .   *     .  $= . + . . 7.079e+02 6.661e-01 . . X . . X . . 7.943e+02 6.131e-01 .$ .        *.         +         .=        .
8.913e+02  5.573e-01 . $. . * + . . = . 1.000e+03 5.000e-01$         .         .   X     .         .        =.
1.122e+03  4.427e-01 . $. . + * . . = . 1.259e+03 3.869e-01 .$ .        +.         *         .=        .
1.413e+03  3.339e-01 .         .     X   .         .  X      .         .
1.585e+03  2.847e-01 .         .   +     .  $= . * . . 1.778e+03 2.403e-01 . . + =$.      *  .         .
1.995e+03  2.008e-01 .         +    =    .         .   $*. . 2.239e+03 1.663e-01 . + .= . .$ .*        .
2.512e+03  1.368e-01 .     +  =.         .         .         $* . 2.818e+03 1.118e-01 . + = . . . .$ *     .
3.162e+03  9.091e-02 .   +=    .         .         .         .   $* . 3.548e+03 7.359e-02 . += . . . .$*   .
3.981e+03  5.935e-02 . +=      .         .         .         .     $* . 4.467e+03 4.773e-02 . X . . . . X . 5.012e+03 3.829e-02 .+= . . . .$* .
5.623e+03  3.065e-02 .X        .         .         .         .       X .
6.310e+03  2.450e-02 .X        .         .         .         .       X .
7.079e+03  1.956e-02 +=        .         .         .         .       \$*.
7.943e+03  1.560e-02 X         .         .         .         .        X.
8.913e+03  1.243e-02 X         .         .         .         .        X.
1.000e+04  9.901e-03 X         .         .         .         .        X.
----------------------|---------|---------|---------|---------|---------|
frequency  v(2)     0.00e+00  2.00e-01  4.00e-01  6.00e-01  8.00e-01  1.00e+00

Then, nodes 2, 3, 4, 5 plotted on linear frequency scale

Syllabus - class dates and subjects, homework dates, reading assignments

Homework assignments  - the details

Projects -

Partial Lecture Notes, one per WEB page

Partial Lecture Notes, one big page for printing

Some brief notes on Matlab

Some brief notes on Python

Some brief notes on Fortran 95

Some brief notes on Ada 95

Finite difference approximations for derivatives

Reference pages on Taylor series, identities,
coordinate systems, differential operators

selected news related to numerical computation

Go to top

Last updated 9/22/13