CMSC 791A : File Systems & Mass Storage
You're encouraged to work on a project of your own choosing. As long
as it has something to do with file systems or storage, it's
probably OK (but check with me first). However, I have
several project suggestions for those who don't have topics of their
own. NOTE: each project can be done by at most one group, and I'll
decide (randomly) which group gets a "contested" topic.
- Create a good file system benchmark.
Not much has been done beyond the Andrew benchmark, which
simulates compilation, and various micro-benchmarks that stress the
file system by reading and writing fixed amounts of data. Your
benchmark (or suite; I'd suggest more than one) should reflect real
workloads but also be portable to a wide range of systems. You can
also report on the performance of various file systems at UMBC
(and elsewhere, if you have access).
- File system tracing.
File system researchers can always use good traces. By
gathering long-term (several days) usage information on the CS or
(better) GL clusters, we can understand how current file system
usage has (and hasn't) changed from previous studies. Hopefully,
this can be done without too many modifications to IRIX....
- File-based redundancy.
Traditionally, redundancy in file systems has resided at the
disk level (RAID). This gives the same redundancy to all files,
but it isn't very flexible. Small files might be best off
duplicated (to reduce the overhead of updates), while large files
might be protected by parity computed across many disks. Propose a design for a
file system that does this, and simulate it. How well does it
perform? What are the limitations?
- File system usability.
Traditional file systems have been hierarchical. However, this
paradigm doesn't work as well in modern file systems because there
are usually many ways to categorize an object. Some file systems
(e.g., SGI's XFS) have support for metadata, but they still can't
search through it. Propose a file system design that relies on an
underlying flat file system (64-bit file IDs) and supports a
variety of indexing schemes, including (at least) a relational
database system. Building a simple prototype would be a plus.
- File systems using content-derived names.
One approach to building a file system is to name an object by
hashing its content. This aids security (it's easy to detect
when a file has changed) but may introduce other problems.
Implement a prototype of this file system (possibly with the help
of user-level programs to compute names) and see how such a system
would change the way programs work.
- File systems for mobile computers.
There have been several proposals for mechanisms to allow mobile
computers to share files with non-mobile file servers. There are
many issues that must be dealt with, including consistency
mechanisms (particularly for files that might be written by more
than one mobile user), trust (unlike a LAN, a mobile computer has
to be sure that the data it gets is correct), and caching (a
mobile user may want files that are stored across the country).
Address one or more of these issues in a file system design and
(hopefully) simulation or implementation.
- Wide area file systems.
With the advent of the World Wide Web, people want to build
global file systems. Such a system could include files from many
local file systems. Propose (and preferably prototype) such a
system. Issues include naming of files (different people
may want to use different names yet share objects), security, and
caching. NOTE: a project similar to this was done in CMSC 621.
This might therefore be a good project for them; however,
duplication of effort that's already been done is not
- File systems for multimedia.
Multimedia places special demands on file systems because it
involves large objects that must be supplied at a
guaranteed rate. Start from the papers in the recent SOSP
conference (October 1997) and discuss improvements to the file
system presented there. In particular, that file system
didn't do much load balancing by duplicating popular files.
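For the benchmark project, the "read and write fixed amounts of data" style of micro-benchmark can be sketched as follows. This is a minimal illustration assuming Python's standard library; the function name and default sizes are hypothetical, and it is exactly the kind of narrow test a realistic benchmark suite should go beyond.

```python
import os
import tempfile
import time


def fixed_size_microbench(total_bytes: int = 16 * 1024 * 1024,
                          block_size: int = 64 * 1024) -> dict:
    """Hypothetical micro-benchmark: write, then read, a fixed amount of data."""
    block = os.urandom(block_size)
    nblocks = total_bytes // block_size
    fd, path = tempfile.mkstemp()
    try:
        # Write phase: sequential block writes, then fsync so the data
        # is pushed to the file system rather than left in memory.
        start = time.perf_counter()
        for _ in range(nblocks):
            os.write(fd, block)
        os.fsync(fd)
        write_secs = max(time.perf_counter() - start, 1e-9)

        # Read phase: sequential reads from the start of the file.
        # These usually hit the OS buffer cache, not the disk.
        os.lseek(fd, 0, os.SEEK_SET)
        start = time.perf_counter()
        read_bytes = 0
        while True:
            chunk = os.read(fd, block_size)
            if not chunk:
                break
            read_bytes += len(chunk)
        read_secs = max(time.perf_counter() - start, 1e-9)
    finally:
        os.close(fd)
        os.unlink(path)

    megabytes = nblocks * block_size / 1e6
    return {
        "bytes_read": read_bytes,
        "write_MBps": megabytes / write_secs,
        "read_MBps": megabytes / read_secs,
    }
```

The cached read phase illustrates why such numbers say little about real workloads: they mostly measure memory bandwidth, not the file system's layout or caching policy.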
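For the file-based redundancy project, a per-file policy might choose mirroring or parity by file size. The sketch below assumes a hypothetical 64 KB cutoff and uses RAID-5 style XOR parity; it shows only the policy decision and parity arithmetic, not block placement across disks.

```python
from functools import reduce

SMALL_FILE_LIMIT = 64 * 1024  # hypothetical cutoff between "small" and "large"


def choose_scheme(file_size: int) -> str:
    """Mirror small files; protect large files with striping plus parity."""
    return "mirror" if file_size <= SMALL_FILE_LIMIT else "parity"


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (RAID-5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover one lost data block by XORing the parity with the survivors."""
    return xor_blocks(surviving + [parity])


if __name__ == "__main__":
    stripe = [b"AAAA", b"BBBB", b"CCCC"]
    parity = xor_blocks(stripe)
    # Pretend the disk holding stripe[1] failed.
    print(rebuild_missing([stripe[0], stripe[2]], parity) == stripe[1])
```

The trade-off motivating the split: updating a mirrored block costs one write per copy, while updating one block of a parity stripe means reading the old data and old parity before writing both back.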
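For the file system usability project, one way to layer a relational index over a flat namespace of 64-bit file IDs is sketched below using Python's built-in `sqlite3` module. The table layout and helper names are hypothetical; SQLite's INTEGER type is a signed 64-bit value, so IDs up to 2^63 - 1 fit directly.

```python
import sqlite3


def make_index() -> sqlite3.Connection:
    """Hypothetical attribute index: files are bare 64-bit IDs with key/value tags."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE attrs (file_id INTEGER, key TEXT, value TEXT)")
    db.execute("CREATE INDEX attrs_kv ON attrs (key, value)")
    return db


def tag(db: sqlite3.Connection, file_id: int, key: str, value: str) -> None:
    """Attach one attribute to a file; a file may carry many categories."""
    db.execute("INSERT INTO attrs VALUES (?, ?, ?)", (file_id, key, value))


def find(db: sqlite3.Connection, **criteria: str) -> list[int]:
    """Return IDs of files matching every key=value pair (an AND query)."""
    if not criteria:
        raise ValueError("at least one criterion is required")
    query = " INTERSECT ".join(
        "SELECT file_id FROM attrs WHERE key = ? AND value = ?" for _ in criteria
    )
    params = [item for pair in criteria.items() for item in pair]
    return [row[0] for row in db.execute(query + " ORDER BY file_id", params)]
```

Because a file can carry any number of tags, the same object can be reached by several "paths" (for example, by type or by project) without the single parent directory a hierarchy would impose.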
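For the content-derived names project, the core idea fits in a few lines. This toy store assumes SHA-256 as the hash (an illustrative choice, not one the project description specifies) and keeps objects in memory; `ContentStore` is a hypothetical name.

```python
import hashlib


class ContentStore:
    """Toy object store where each object's name is the hash of its bytes."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    @staticmethod
    def name_for(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def put(self, data: bytes) -> str:
        name = self.name_for(data)
        # Identical content always maps to the same name, so it is stored once.
        self._objects[name] = data
        return name

    def get(self, name: str) -> bytes:
        data = self._objects[name]
        # Integrity check: the name says what the content must hash to.
        if self.name_for(data) != name:
            raise ValueError("object does not match its name")
        return data
```

One of the "other problems" shows up immediately: editing a file produces a new name, so a program holding the old name keeps seeing the old version unless some separate, mutable naming layer maps human-readable names to the latest hash.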
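For the mobile file systems project, one well-known way to detect conflicting writes by disconnected users is to compare version vectors. The sketch below assumes each replica (server or laptop) counts its own updates to a file; the replica names and function names are hypothetical.

```python
def record_update(vv: dict[str, int], replica: str) -> dict[str, int]:
    """Return a copy of version vector `vv` with `replica`'s counter bumped."""
    updated = dict(vv)
    updated[replica] = updated.get(replica, 0) + 1
    return updated


def compare(a: dict[str, int], b: dict[str, int]) -> str:
    """Classify two version vectors: equal, one newer, or concurrent (conflict)."""
    keys = a.keys() | b.keys()
    a_covers_b = all(a.get(k, 0) >= b.get(k, 0) for k in keys)
    b_covers_a = all(b.get(k, 0) >= a.get(k, 0) for k in keys)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "a newer"
    if b_covers_a:
        return "b newer"
    return "conflict"


if __name__ == "__main__":
    server = {"server": 3}
    laptop1 = record_update(server, "laptop1")  # edited while disconnected
    laptop2 = record_update(server, "laptop2")  # edited while disconnected
    print(compare(laptop1, server))   # laptop1 has seen everything the server has
    print(compare(laptop1, laptop2))  # neither saw the other's write
```

When neither vector covers the other, the two updates happened independently and the system must merge them or ask a user; when one covers the other, the newer copy can simply replace the older.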
The dates marked with a (*) will require a 10 minute meeting with
me to make sure all is going well with your project. These meetings
will be scheduled on an online signup sheet. Meeting times will
generally be either just before or just after class.
- 12 Feb 1998 (*): Group & project
Each group should turn in a single page listing the group members
(with e-mail addresses) and a brief description of the project.
- 26 Feb 1998: Background research well
Each group should turn in a list of 10-12 papers on its topic
along with a brief summary of each paper. These papers will form
the basis for your project. You need not have read every paper by
this date, but you should have read some of them and read the
abstract for all of them.
- 19 Mar 1998 (*): Preliminary plans &
By this time, your group should have its project planned out and
designed. Early results would be welcome, but not required.
- 21 Apr 1998: Implementation & coding complete
At this point, you should be done with any programming you need to
do. The rest of the time will be spent running experiments and
writing up your results. If your code isn't complete by this time,
it will be difficult to complete the project.
- 12 & 14 May 1998: Project presentations
One member of the group will present the project in a 25-minute
presentation (20 minutes for the talk and 5 minutes for
questions). This presentation should focus on quantitative results.
- 14 May 1998: Written project report due
Each group must hand in a written report at the last class. This report
should describe the background material, the project design, and
any results. A sample
paper is available.
5 Feb 1998