CMSC 621 - Fall 2002
Project
Overview
In this project, you will design and implement a simple distributed file system that supports file discovery and sharing. The system should allow files to be shared between users on demand. It will consist of a number of nodes (hosts) that are unreliable, inasmuch as there is no guarantee that a particular node will be operational or reachable at any given time. The usual file system operations, such as creating, deleting, opening, closing, reading, writing, and seeking, should be supported. Files should have access rights associated with them, similar to Unix files, and you may assume that userids are unique across the system.
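Because access rights are meant to resemble those of Unix files, one way to enforce them is the classic owner/group/other permission-bit check. The sketch below is a minimal illustration under stated assumptions: `FileMeta` and `allowed` are hypothetical names, groups are assumed to exist even though the description above only mentions userids, and the Unix superuser is not modeled.

```python
from dataclasses import dataclass
from typing import Set

# Permission bits within one class, in the conventional Unix layout.
READ, WRITE, EXECUTE = 4, 2, 1


@dataclass
class FileMeta:
    """Hypothetical per-file metadata kept alongside each replica."""
    owner: str   # userid; assumed unique across the whole system
    group: str   # assumption: groups are not mentioned in the project text
    mode: int    # e.g. 0o640 = owner rw-, group r--, other ---


def allowed(meta: FileMeta, user: str, user_groups: Set[str], want: int) -> bool:
    """Return True if `user` may perform every access bit in `want`.

    As in Unix, exactly one class applies: owner if the user owns the file,
    otherwise group if the user is in the file's group, otherwise other.
    """
    if user == meta.owner:
        bits = (meta.mode >> 6) & 0o7
    elif meta.group in user_groups:
        bits = (meta.mode >> 3) & 0o7
    else:
        bits = meta.mode & 0o7
    return (bits & want) == want
```

With `FileMeta("alice", "cs621", 0o640)`, alice may read and write, a member of `cs621` may only read, and any other user is denied.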
Documents (10 points)
You are allowed to discuss the project across groups, but you are not
allowed to share solutions. You may also read papers and textbooks in this
area; some pointers are provided in this document. However, you should cite
the sources you have consulted. You are expected to do this project on the
CS/UCS Unix systems. You are free, however, to develop the code on your
home/work machine if you wish, as long as the project runs on CS/UCS
machines.
Design and Implementation: (15 + 35 points)
Details:
Each user of the system owns a set of nodes. Think of them as different devices (PDA, laptop, cell phone, etc.), each with a fixed storage capacity. Each user also owns a set of files. The aim is for users to be able to access their files from any of their devices at all times, no matter which nodes in the system are functional at that time. The location of a file should be transparent to the user. You will need to use some form of replication to achieve this; however, you must keep the replicas as strongly consistent as possible. You can assume that you know the IDs (e.g., IP addresses) of all nodes, that at least one node owned by each user is up at any given time, and that no messages are lost in transit.
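One well-known way to combine replication with strong consistency is majority quorums with version numbers: a write must reach W replicas, a read must consult R replicas, and choosing R + W > n guarantees every read sees the latest write. The sketch below is one possible design, not a required one; `Replica`, `quorum_sizes`, `read`, and `write` are hypothetical names, node availability is simulated by a flag, and writes to a file are assumed to be serialized. Coda, in the reference list, takes the opposite approach and favors availability through optimistic replication with later conflict resolution.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Replica:
    """One copy of a file held by one node (hypothetical structure)."""
    node_id: str
    up: bool = True      # simulated availability of the hosting node
    version: int = 0     # incremented on every successful write
    data: bytes = b""


def quorum_sizes(n: int) -> Tuple[int, int]:
    """Return (R, W) for n replicas with R + W > n and 2 * W > n.

    R + W > n makes every read quorum overlap the latest write quorum;
    2 * W > n makes any two write quorums overlap.
    """
    w = n // 2 + 1
    r = n - w + 1
    return r, w


def write(replicas: List[Replica], data: bytes) -> bool:
    """Write to every reachable replica if at least W of them are up.

    Assumes writes to one file are serialized (e.g. by a lock held by the
    writer); concurrent writers would need extra coordination.
    """
    _, w = quorum_sizes(len(replicas))
    live = [x for x in replicas if x.up]
    if len(live) < w:
        return False  # no write quorum: refuse rather than risk divergence
    # The live set has at least W members, so it overlaps the previous
    # write quorum and its highest version is the current one.
    new_version = max(x.version for x in live) + 1
    for x in live:
        x.version, x.data = new_version, data
    return True


def read(replicas: List[Replica]) -> Optional[bytes]:
    """Return the newest data among any R reachable replicas, or None."""
    r, _ = quorum_sizes(len(replicas))
    live = [x for x in replicas if x.up]
    if len(live) < r:
        return None  # no read quorum: the file is unavailable
    newest = max(live[:r], key=lambda x: x.version)
    return newest.data
```

With three replicas on a user's laptop, PDA, and phone, any two must be reachable. If only one of them is up, which the assumptions above allow, this sketch refuses the request; that is exactly the consistency/availability trade-off the cost model below puts a price on.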
Cost Model:
If a requested file cannot be made available, a cost is incurred as specified below. In addition, network traffic incurs a cost equal to the size of the messages exchanged.
The cost of not finding one's own file is 3 * the size of the file.
The cost of not finding another user's file is 2 * the size of the file.
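The rules above can be summarized as a per-node total. This is one reading of the cost model, with symbols introduced here only for illustration: C_n is the cost charged to node n, M_n is the set of messages charged to n (assumed here to be the messages n sends, since the description does not say how traffic is attributed), U_n is the set of requests issued at n that could not be satisfied, and s_q is the size of the file requested by q.

```latex
C_n \;=\; \sum_{m \in M_n} |m| \;+\; \sum_{q \in U_n} k_q \, s_q,
\qquad
k_q =
\begin{cases}
3 & \text{if the requesting user owns the file requested by } q,\\
2 & \text{otherwise.}
\end{cases}
```

For example, a node that sends 500 bytes of messages and fails on one request for its user's own 1000-byte file and one request for another user's 400-byte file incurs 500 + 3(1000) + 2(400) = 4300.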
Testing and Validation: (20 points)
When the system starts, its initial state is specified by an input file.
In addition, a request set may be specified; if one is,
your system should simulate its execution.
The format of all the test
files mentioned above will be released by October 10. When finished, your
system should report the cost incurred by each node for the given request
set, and which requests could not be satisfied.
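Since the request-set format has not been released yet, the replay loop below is only a sketch under hypothetical names: `Request`, `FileInfo`, and `simulate` are illustrative, and `fetch` stands for whatever routine your file system uses to serve a request, returning whether it succeeded and how many bytes of messages it exchanged. Traffic is charged to the node that issued the request, which is an assumption; background traffic such as replica updates is not captured here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Request:
    """Hypothetical request record; the real format is released separately."""
    req_id: int
    node: str      # node on which the request is issued
    user: str      # user on whose behalf it is issued
    filename: str


@dataclass(frozen=True)
class FileInfo:
    owner: str
    size: int      # bytes


# fetch(req) -> (satisfied?, bytes of messages exchanged while serving req)
Fetch = Callable[[Request], Tuple[bool, int]]


def simulate(requests: List[Request], files: Dict[str, FileInfo],
             fetch: Fetch) -> Tuple[Dict[str, int], List[int]]:
    """Replay a request set; return per-node cost and unsatisfied request ids."""
    cost: Dict[str, int] = defaultdict(int)
    unsatisfied: List[int] = []
    for req in requests:
        ok, traffic = fetch(req)
        cost[req.node] += traffic              # traffic cost = message bytes
        if not ok:
            info = files[req.filename]
            factor = 3 if info.owner == req.user else 2
            cost[req.node] += factor * info.size
            unsatisfied.append(req.req_id)
    return dict(cost), unsatisfied
```

Passing `fetch` in as a callback keeps the cost accounting independent of the replication protocol, so different designs can be compared on the same request set.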
Assumptions: (20 points)
The description of the system above, as in real life, is underspecified. You will need to make additional assumptions to arrive at a concrete design and implementation. Keep in mind that the aim of the system is to minimize the cost incurred by each node, so valid assumptions that help achieve this goal will be given credit. Look at papers on similar systems for ideas. The more general your assumptions, the more credit you will receive.
Demonstration:
You will submit your code
and project report on the due date (
Some general suggestions
As should be evident to most of you, for a project of this complexity that involves teams, it is imperative that you design your system before you code! In your design, you will need to make assumptions as you flesh out the details of the system; please make sure that you state them in your design document. Make a timeline for your work, and try to stick to it. Where you divide tasks, make sure you clearly define the points of articulation and the interfaces between modules. As you form groups, please make sure that you can find a common time to meet. This is especially true for part-time students who hold jobs that will restrict their schedules. Please comment your code well: it will help you figure out code your partners have written, and it will help us grade it. Also, use some form of revision control on your source tree. CS/UCS machines have systems such as CVS and RCS available for your use. This will help if lightning strikes, the UPS fails, or machines/disks crash, making your recent changes disappear! Please do create makefiles as well, or better still, help your instructor learn about ANTS.
References
1. CODA paper, from the course web page
2. RUMOR paper, from the course web page
3. Services such as Napster, Gnutella, Morpheus, etc.
4. PFS (http://www.spa.is.uec.ac.jp/~tate/pfs/)
5. Soft updates ideas in the FreeBSD OS
6. FLAPPS (http://flapps.cs.ucla.edu/)
Last modified: Tue Oct 02