CMSC 683/691c, Parallel Programming

Project 2: The N-body Problem in MPL

(c) 1996, Howard E. Motteler
Assigned Tues Oct 15
Due Tue Oct 29
100 pts

The Project

You are to write a MasPar mpl program that reads an array specifying the mass, position, and velocity of a large number of objects, computes their motion under the effects of mutual (Newtonian) gravitation, and writes an array representing their positions after a selected time interval. In addition, as a command line option, your program should periodically write a series of bitmaps to the standard output, representing a projection of the positions of the objects.

A MatLab implementation of the n-body problem is available; this linear-algebraic approach may not be the most efficient algorithm, but it can help you to get started. A MatLab program to generate an initial set of bodies is also available. Some thoughts concerning realistic values for the various parameters for n-body problems, courtesey of Jim Scott, are also available.

Specifications

Your program should be able to do the n-body problem for up to as many bodies as there are PEs; for extra credit you can do more bodies, which will probably require some sort of tiling. Your program should accept the following arguments, and be able to perform the described functions.

Basic parameters

  -t n   set delta-t (the time increment) to n  (default 1)
  -g c   specify a gravitational constant of c  (default 1)
  -q m   write output and quit after m steps    (default 1)

Input and output files

  -i fin    read input object file fin (default in.mat)
  -o fout   write output object file fout (default out.mat)
The input and output files are n x 7 arrays of ascii (%g) format floating point numbers, in which each row/line represents a single object. The columns are organized as follows.

                     (position)   (velocity)
              mass    x   y   z    x   y   z

Graphic representation of bodies

  -b k   produce a 256 x 256 bitmap projection of the xy
         plane every k-th step
If selected by the "-b n" flag, bitmap projections are written to standard output every n-th step. The bitmap is a 256 x 256 array of 8-bit pixels; it is written as a row-order stream of bytes, with one pixel per byte. The bitmap projection should be scaled to include all objects initially, as follows. Let

       L = min x val of initial data
       R = max x val of initial data
       B = min y val of initial data
       T = max y val of initial data

This defines a rectangle, which we square off as follows:

       if (R-L) > (T-B) then
           d = (R-L) - (T-B);
           T = T + d/2;
           B = B - d/2
       else
           d = (T-B) - (R-L)
           R = R + d/2
           L = L - d/2
Now, multiply each of the values L, R, B, T by 1.25, to get the initial ``image square'' used to generate the bitmaps. The default action is to use this initial image square for all successive generations, even though ojects may leave the square, or condense to a smaller subregion.

Each of the 256 x 256 pixels corresponds to a small sub-square or tile of the image square. If the x-y coordinates of an object's position fall within a particular pixel's region, we can think of the object as being ``in'' that pixel. A given pixel might have zero, one, or many objects.

We want the bitmap to show the distribution of mass in the image plane. With the colormap "nbody.cmap" pixel values range from 0 to 127, with 0 black, 1 red, and values 2-127 ranging in a spectrum from red to violet. A 0-valued pixel represents no mass, i.e., no body in that pixel. The non-zero pixels correspond to total mass of bodies ``in'' a pixel, with pixel value 1 corresponding to the lightest mass, and pixel value 127 the sum of all masses. If m1 is the smallest mass, m2 the largest, and s is the sum of masses in a particular pixel, then that pixel should have the value

              126 * (s - m1)/(m2 - m1) + 1 .
This will displays the range of masses as a "spectrum", with lighter bodies red, medium bodies green, and heavier bodies blue. It should be truncated so as not to exceed 127, leaving all the really heavy stuff a blue-violet color.

Image client and server programs (courtesy of Amen Zwa, hacked on very slightly by me) are provided; the bitmap output of your n-body solver can be piped into the image client (running on the MasPar host) with the image displayed by the server appearing on your local X-console.

Examples

Suppose your program is called "nbody". To read a file "in.mat", compute one time step with default values dT=1 and G=1, and write the results to "out.mat" you would simply type:

     nbody
To read a file "in8192.mat", compute 1000 time steps, write a xy bitmap projection out each 100th steps, and use "isend" pipe the results to an image server running on toto.cs.umbc.edu you would type:
     nbody -i in8192 -q1000 -b100 | isend -s toto.cs.umbc.edu

Test data

Test data is available as a tar file which includes one dT step of a 100 body problem (in100.mat and out100.mat) and one dT step of an 8192 body problem (in8192.mat and out8192.mat). These were generated using dT=1 and G=1. These files and some additional test data are also available on the MasPar in the ~motteler/proj2.data directory.

You can generate your own test data sets for small numbers of bodies (up to a few hundred) with the MatLab programs mkbod.m and nbody.m. Beware that because of the inherently chaotic nature of the n-body problem, two different but correct implementations may diverge after a sufficiently long sequence of dT steps.

What to turn in

Use email to submit a single mpl source file. Make sure that your name and "project 2" are at the top of the file. At the beginning of your comments you should also mention: (1) how well your project works, including any missing or extra features, and (2) your DPU time on 10 iterations of the in8192.mat test data, excluding read and write time.

Grading

See the general information on programming projects. About 60% of your grade is based on how well your code works, with the remainder of the points divided between design and documentation. Projects are due by midnight of the assigned date; there is a 5% bonus for each day a project is turned in early (for up to two days early), and a 5% penalty for each day the project is turned in late. Extra credit may be given for interesting extra features, including handling more bodies than PEs and improving the bitmap viewer.

Hints and Suggestions

The simplest approach is probably to associate masses with PEs, and to rotate a copy of masses and positions through the processor array, so that each mass gets paired up with each other mass. The MasPar manual section I gave you has an example of a simple "raster" procedure that may be useful for doing such rotations, on page 2-40.

One way to develop you program is to first write a sequential version with three loops: an outer loop of time steps, and two inner loops that compare each body to every other body. On the MasPar, the masses are distributed across the PE array, and this becomes a program with two loops: the outer loop of time steps, and an inner loop that rotates a copy of masses past each PE/body.

If you find that core of your project that actually does the n-body computation is very complicated, you may want to think very carefully about your overall approach; the SIMD version for this project should not be much more complex than a corresponding sequential implementation.

There are separate singular and plural math libraries; plural math functions begin with p_, e.g., p_sqrt(), and single precision plural math functions begin with fp_, e.g., fp_sqrt(). You will probably need the following include files:

#include <mpl.h>
#include <mpml.h>
#include <mp_libc.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <strings.h>
Note that, as discussed in class, reading and writing large arrays from the DPU, i.e., from .m programs with printf() and scanf(), can be very slow. The example ``bp'' program gets around this problem by doing all i/o from the front end, by using block-move procedures to move arrays between the front end and DPU, and by using the somewhat clunky callRequest() protocol to run DPU procedures from the front end. When even faster i/o is needed, the parallel file system, a RAID array of disks, is used. You don't have do do any of these things for your project; this is simply to point out that it's possible to do i/o much faster.