Project 3

Assigned Oct. 16 , 2000
Due Oct. 30, 2000
Preview Sessions WTh, Oct. 18 & 19, 2000


Background

Most modern filesystems use a heirarchical model of data storage. A filesystem contains a top-level directory which may in turn contain subdirectories as well as "plain" (non-directory) files.

Since filesystems manage data on storage devices and storage devices (usually) have a fixed size, it's convenient to be able to see how that space is being used. To do this, we can write a tree-like operation which asks the filesystem how much space is used beneath each subdirectory. Unix programs like du can provide this information for a given set of files. However, when we want to examine the information for a large number of directories, we can become overwhelmed by a sea of numbers. In cases like these, where we need to quickly process large quantities of data, a pictorial representation is often very helpful. This pictorial representation is called a visualization. One way to visualize this sort of heirarchical, quantitative data is to use something called a tree map. For this project, you will be implementing a general tree structure to hold filesystem data as well as a tree map to visualize the use of space between areas of the filesystem.

For example, given the following input file:

/umbc/csee/full 6 /umbc/csee/associate 8 /umbc/csee/assistant 5 /umbc/csee/lecturer 6 /umbc/math/full 6 /umbc/math/assistant 3 /umcp/cs 50 /umcp/math 20 Your program will generate the following image:

Before you panic, let's talk about how the tree map encodes both the size and tree information into the same picture. In addition, let me point out that the graphics primitives will be provided. All you'll need to do is know where to draw the rectangles.

Drawing rectangles is a good place to start. Each directory is represented by a rectangle in the image above. In fact, the entire filesystem is actually a "rooted" tree. The root is a directory called "/". The everything shown in this image is in the tree that starts at /. Your program will always use rooted trees and assumes that all of the paths given start from a common parent directory. This means that you can leave off the first "/" from all of the filenames and you'll get the same result.

Let's look at what happens if we have just one directory. For an empty input file, we have no subdirectories. This means that all we know is that 100% of the space used in the filesystem is under the root directory. The tree map for this case looks like this:

Notice that this shows just that: that 100% of the space is taken up by the directory within.

This next case looks at the following file. Note that there are two sub directories for this directory.

umbc 34 umcp 70

The thing to notice here is that the top level directory has been cut vertically into two rectangles and that the areas of these rectangles are proportional to the percentage of the storage space used beneath them. That means that since the total is 104, the top rectangle is 34/104 = almost 1/3 of the space while the bottom rectangle takes up the remaining 2/3. Note that 74/104 is about 2/3. Make sure that you understand this picture before moving on. Remember, the numbers in the file are sizes in blocks, not percentages.

Adding More Levels
The following file shows what happens when we add subdirectories to just one of these rectangles. umbc 34 umcp/cs 50 umcp/math 20

Notice that the total size (in blocks) is still 104 and that umcp and umbc still take up the same amount of this space. The difference here is that we're explaining where umcp's space is used in more detail. To keep from having an ever finer series of horizontal lines showing this partitioning, the levels in the filesystem alternate (by depth) between horizontal and vertical divisions of the space. Note that the umbc portion of the filesystem is identical to the previous picture. However, the umcp section has been partitioned into two smaller rectangles. One of these is 50/70 (about 70%) and the other is about 20/70 (the remaining 30%) of the area of the umcp rectangle.

Note that this process is recursive. The only thing that differs from level to level is the alternation of the horizontal and vertical partitioning.

Adding extra levels of detail, we end up with the following images as we progress towards the full detail of the first image:

... and finally:

Note also that any heirarchical data which can be given a numeric value can be visualized in this way. For example, the list above could be the number of courses taught by different types of instructors in the UM system.

Objectives

This project will once again deepen your skills in coding with templates. You will
  1. work with the tree ADT and extend it by creating a data type, the TreeMap
  2. practice designing robust classes
  3. Design and implement the required behavior for the KTree, a generic tree class.
In addition, you will continue to practice using the string data class.

What to do

You will implement three classes: The KTreeNode class, the KTree (k-ary tree) class and the TreeMap class.
The main function
Your main function should perform these tasks:
  1. Read the command line to obtain the name of the text file for which it will build the tree and tree map
  2. Build the TreeMap (which involves building the KTree)
  3. Display the original tree on cout (See sample run below)
  4. Calculate the space contained in each subtree and display that on cout (See sample run below)
  5. Use the provided Image class to write the visualization of the tree to a file, named "infile.ppm" where "infile" was the name of the input file name.
The KTreeNode class
Trees are made of nodes and this one is no exception. This file (KTree.H) contains a class definition for a node in a first-child/next-sibling tree that maintains a size value at each node. Note that these nodes can hold any kind of data but that each node will have some size associated with it. Note also that all of the members of the node are private but are accessible to tree and tree iterators through friendship.
The KTree class
The same file contains a definition for class KTree. This is a general tree class with operations as expected.

In addition, a simple tree iterator class is provided for you. Note that this class has no notion of order, but rather can choose to go down() to its first child or over() to its next sibling at any given point. You should feel free to modify the iterator's operation to suit your needs, but the project can be written using only this iterator; you do not have to write your own iterator in order to finish the project.

The input files on specify sizes for leaf nodes (for instance, in a file system, only files really have their own sizes, while directory sizes are the total of the sizes of the files they contain and a little book-keeping space). So, when a tree is initially created, only leaf nodes will ahve sizes. In order to facilitate the calculation of the size requirements of the while subtree rooted at a node, the KTree class provides two utility methods, percolate() and rinse(). percolate() traverses the tree to calculate the total size of all descendents of a node and store that total as the node's size. rinse() does the opposite, traversing the tree and setting all sizes of non-leaf nodes to be zero. percolate() and rinse() are not standard tree operations, but we have chosen to add them here to make your life easier.

The TreeMap class
The real public part of this project is the Treemap class contained in Treemap.H. Your main will have no notion of trees outside of the tree map class.

Treemaps are a subclass of KTree support all of the tree operations. In addition, Treemaps can draw themselves to an image. Note that there are both public and private draw() methods. The private version actually does all of the work (in the same way that the textbook's tree operations did) while the public version simply calls the private one. The private function has signature: draw(Image & i, int xmin,int ymin,int xmax,int ymax, const TreeItr t, int level); where i is the image to draw into, xmin, ymin, xmax, and ymax give the rectangle that the map for this subtree should be drawn into (initially the whole image), t is an iterator pointing to the subtree to be drawn, and level tells which level of the tree is currently being drawn (useful for deciding whether to split this rectangle horizontally or vertically when recursing to children). Your main task in coding the private version is determine the dimensions of the rectangles in the drawing and to draw them to the Image that's provided.

To draw a subtree, draw the whole rectangle. Then, if the subtree is not a leaf, divide the rectangle into parts according to the sizes of the children and call each child recursively to draw itself.

The Provided Image class
The files Image.H and Image.C are provided to you to simplify drawing. Your program will output files in Portable Pixel Map (PPM) format. (See man ppm for details if you like.) On the Xwindows system at UMBC, both xv and xloadimage can be used to view PPM files. Under windows or MacOS, various freeware viewers can be found. In addition, the SGIs at UMBC have a program called ppmtogif installed which will take the uncompressed (read: BIG) PPM files and convert them into the more common GIF format if you prefer. The OIT consultants can provide extra help on finding a viewer for your platform. If not, please complain to their manager.

Grading and Academic Integrity

Project grading is described in the Project Policy handout. Please remember that you must provide good documentation as described in the handout.

Cheating in any form will not be tolerated. Please re-read the Project Policy handout for further details on honesty in doing projects for this course.

Remember, the due date is firm. Submittals made after midnight of the due date will not be accepted. Do not submit any files after that time.


Files To Be Submitted

You should submit only the files you have written, and a makefile. The files to be submitted are: Please do not submit any of the files provided to you such as LinkedList.h, etc.

Submit the files using the procedure given to you for your section of the course.


Sample Output

Sample output is available for your study at
/afs/umbc.edu/users/r/h/rheingan/pub/CMSC341/Proj3/sample_output.txt
It was obtained by executing Proj3 on irix.gl with a command-line argument of small.txt.

A copy of the executable is available for you to try out. It is

/afs/umbc.edu/users/r/h/rheingan/pub/CMSC341/Proj3/Proj3
Other sample input files can be found in
/afs/umbc.edu/users/r/h/rheingan/pub/CMSC341/Proj3/*.txt