CMSC 791A (Spring '98): Project

Project Information

CMSC 791A : File Systems & Mass Storage

Spring 1998

Suggested Topics

You're encouraged to work on a project of your own choosing; as long as it has something to do with file systems or storage, it's probably OK (but check with me first). For those who don't have topics of their own, I have several project suggestions below. NOTE: each project can be done by at most one group, and I'll decide (randomly) which group gets a "contested" topic.

Create a good file system benchmark.
Not much has been done beyond the Andrew benchmark, which simulates compilation, and various micro-benchmarks that stress the file system by reading and writing fixed amounts of data. Your benchmark suite (I'd suggest more than one benchmark) should reflect real workloads but also be portable to a wide range of systems. You can also report on the performance of various file systems at UMBC (and elsewhere, if you have access).
File system tracing.
File system researchers can always use good traces. By gathering long-term usage information (several days' worth) on the CS or (better) GL clusters, we can understand how file system usage has (and hasn't) changed since previous studies. Hopefully, this can be done without too many modifications to IRIX...
File-based redundancy.
Traditionally, redundancy in file systems has resided at the disk level (RAID). This gives the same redundancy to all files, but it isn't very flexible. Small files might be best served by duplication (to reduce the overhead of updates), while large files might use parity computed across many disks. Propose a design for a file system that does this, and simulate it. How well does it perform? What are the limitations?
File system usability.
Traditional file systems have been hierarchical. However, this paradigm doesn't work as well for modern file systems because there are usually many ways to categorize an object. Some file systems (e.g., SGI's XFS) support metadata, but they still can't search through it. Propose a file system design that relies on an underlying flat file system (64-bit file IDs) and supports a variety of indexing schemes, including (at least) a relational database system. Building a simple prototype would be a plus.
File systems using content-derived names.
One approach to building a file system is to name an object by hashing its content. This provides a measure of security (it's easy to detect when a file has changed) but may introduce other problems. Implement a prototype of this file system (possibly with the help of user-level programs to compute names) and see how such a system would change the way programs work.
File systems for mobile computers.
There have been several proposals for mechanisms that allow mobile computers to share files with non-mobile file servers. Many issues must be dealt with, including consistency mechanisms (particularly for files that might be written by more than one mobile user), trust (unlike a machine on a LAN, a mobile computer has to be sure that the data it gets is correct), and caching (a mobile user may want files that are stored across the country). Address one or more of these issues in a file system design and (hopefully) a simulation or implementation.
Wide area file systems.
With the advent of the World Wide Web, people want to build global file systems. Such a system could include files from many local file systems. Propose (and preferably prototype) such a system. Issues include naming of files (different people may want to use different names yet share objects), security, and caching. NOTE: a project similar to this was done in CMSC 621. This might therefore be a good project for students who worked on that one; however, duplicating work that has already been done is not acceptable.
File systems for multimedia.
Multimedia places special demands on file systems because it involves large objects that must be supplied at a guaranteed rate. Start from the relevant papers at the recent SOSP conference (October 1997) and discuss improvements to the file system presented there. In particular, that file system did little load balancing by duplicating popular files.
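To make the content-derived naming suggestion concrete, here is a minimal sketch of how such naming might work. It assumes an in-memory store and SHA-256 as the hash; both are illustrative choices (the class `ContentStore` and its methods are hypothetical), and any collision-resistant hash would serve.

```python
import hashlib


class ContentStore:
    """Hypothetical in-memory object store keyed by content-derived names."""

    def __init__(self):
        self._objects = {}  # name (hex digest) -> contents

    def put(self, data: bytes) -> str:
        """Store data and return its name: the hex SHA-256 digest of the bytes."""
        name = hashlib.sha256(data).hexdigest()
        self._objects[name] = data  # identical contents always map to the same name
        return name

    def get(self, name: str) -> bytes:
        """Fetch data by name, verifying that it still hashes to that name."""
        data = self._objects[name]
        if hashlib.sha256(data).hexdigest() != name:
            raise ValueError("object contents do not match its name")
        return data


store = ContentStore()
n1 = store.put(b"hello, world\n")
n2 = store.put(b"hello, world!\n")
print(n1 != n2)                            # changing the content changes the name
print(store.put(b"hello, world\n") == n1)  # same content, same name
```

One consequence a project could explore is that an edited file gets a new name. Programs that update files in place would instead have to create a new object and update whatever maps human-readable names to hashes.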

Project Schedule

The dates marked with a (*) will require a 10 minute meeting with me to make sure all is going well with your project. These meetings will be scheduled on an online signup sheet. Meeting times will generally be either just before or just after class.

12 Feb 1998 (*): Group & project selection
Each group should turn in a single page listing the group members (with e-mail addresses) and a brief description of the project.
26 Feb 1998: Background research well underway
Each group should turn in a list of 10-12 papers on its topic along with a brief summary of each paper. These papers will form the basis for your project. You need not have read every paper by this date, but you should have read some of them and read the abstract for all of them.
19 Mar 1998 (*): Preliminary plans & design
By this time, your group should have its project planned out and designed. Early results are welcome but not required.
21 Apr 1998: Implementation & coding complete
At this point, you should be done with any programming you need to do. The rest of the time will be spent running experiments and writing up your results. If your code isn't complete by this time, it will be difficult to complete the project.
12 & 14 May 1998: Project presentations
One member of each group will present the group's project in a 25-minute presentation (20 minutes for the talk and 5 minutes for questions). This presentation should focus on quantitative results.
14 May 1998: Written project report due
Each group must hand in a written report at the last class. This report should describe the background material, the project design, and any results. A sample paper is available.


Last updated 5 Feb 1998 by Ethan Miller (elm@umbc.edu)