CMSC 421 (Fall 2001) : Project #2

Programming Project #2

CMSC 421, Fall 2001

Assigned: 8 Nov 2001
Due: 5 Dec 2001 at 11:59 PM

Goals

With project 1 under your belt, you should now be comfortable modifying Linux code in general, and adding system calls to Linux in particular. This document describes new functionality that we want you to add to the Linux filesystem.

Most present-day filesystems store the raw data directly on disk. This means that system administrators can see any data you store. In addition, the security of your data is tied to the security of the system as a whole. If miscreants can hack into the system as superuser, or can defeat the protection mechanisms of the OS, or physically steal the disk, then your data is compromised. One way to avoid this is to store the data on the disk in an encrypted format, with decryption possible only with a key that you possess. This project asks you to create such an encrypted filesystem by layering the encryption/decryption process on top of the existing Linux filesystems.

Mechanics, and what to hand in

The project can be done in groups of up to two people, although you are welcome to work alone. If a group works on a project, then in general we will assign a common score to both participants. Please make sure that the group is identified in the README you turn in, and that only one member of the group submits the project!

The submission instructions are the same as for the first assignment. You will turn in a patch file to the kernel, your test programs, any other changes you made, and a README describing your system. Please remember that the patch file should be generated after you remove any object files by using a make clean or make mrproper. Also, make sure you use the right options to diff. Remember that in addition to -rc options, you need to use -P if you have added any new files to the kernel distribution (which you almost certainly will in this case).

In addition, you will turn in a plain text file reporting your tests, especially measurements of performance. We suggest you measure the time taken to read/write files of different sizes with and without encryption to figure out how much overhead the encryption process causes.
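As a starting point for the measurements suggested above, here is a minimal user-space sketch that times one write of a given size using the ordinary open/write/close calls. The function name time_plain_write is hypothetical, and the sketch assumes a POSIX system that provides clock_gettime with CLOCK_MONOTONIC (on older systems gettimeofday could be substituted). To measure encryption overhead you would time the same workload again through your own secopen/secwrite/secclose stubs and compare.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `size` bytes to `path` with the plain
 * system calls and return the elapsed wall-clock time in seconds,
 * or -1.0 on any error.
 */
double time_plain_write(const char *path, size_t size)
{
    struct timespec t0, t1;
    char *buf;
    size_t done = 0;
    int fd;
    int ok = 1;

    assert(path != NULL);
    buf = malloc(size ? size : 1);
    if (buf == NULL)
        return -1.0;
    memset(buf, 'x', size);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        free(buf);
        return -1.0;
    }
    while (done < size) {
        ssize_t n = write(fd, buf + done, size - done);
        if (n <= 0) {
            ok = 0;
            break;
        }
        done += (size_t)n;
    }
    /* Flush to disk so the timing is not just a buffer-cache copy. */
    if (fsync(fd) != 0)
        ok = 0;
    if (close(fd) != 0)
        ok = 0;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    free(buf);
    if (!ok)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Repeating each measurement several times per file size and reporting the average (or median) should make the numbers less sensitive to caching and background activity.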

Your design documentation is due by 11:59 PM on 16 Nov 2001. We are enforcing this deadline to ensure that people don't leave the project until the last minute. You are, of course, welcome to visit either the faculty or TA office hours for help; however, one of the first things we'll ask for is your design documentation (unless you're asking for help with that...). You may make changes to your documentation before the full project hand-in; however, the design portion of your grade will depend heavily on the design document you hand in on November 16th. We will review, grade, and return it to you within a week.

Your design documentation, typically 3-5 pages for a project of this size, should include the basic design of your software (what modules you will write, what their functionality is, where you will make changes to the kernel, etc.), a timeline, and details on the testing that you plan to do to ensure that your code works.

The assignment name to use with submit for the documentation is p2doc, and for the project code is p2code.

Specifics

We ask you to implement several new system calls:

secopen(<parameters of open>, key): This opens a file and returns the file descriptor that subsequent calls to secread, secwrite, and secclose will use. In addition to all the parameters needed for open, it also takes the decryption key. You will probably need to populate some kernel-level data structure with the key information.
secread(<parameters of read>): This call will read the contents from the disk, decrypt them using the key specified in secopen, and return them to the user.
secwrite(<parameters of write>): This call will encrypt the contents using the key specified in secopen, and write them to the disk.
secclose(<parameters of close>): Same as close, but should clean up any data structures you created/populated in secopen.
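One possible shape for the key bookkeeping mentioned under secopen is sketched below in plain user-space C, so it can be compiled and tried outside the kernel. Everything here is an assumption made for illustration: the names (sec_key_store, sec_key_lookup, sec_key_clear), the fixed table size, and the choice of a single global table indexed by descriptor. In a real kernel implementation the state would need to be per process (or attached to the open file) and protected against concurrent access.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SEC_MAX_FDS 256  /* hypothetical limit on tracked descriptors */
#define SEC_MAX_KEY 56   /* Blowfish keys are at most 448 bits (56 bytes) */

struct sec_key_entry {
    int in_use;
    size_t key_len;
    unsigned char key[SEC_MAX_KEY];
};

/* Illustration only: one global table indexed by file descriptor. */
static struct sec_key_entry sec_keys[SEC_MAX_FDS];

/* secopen would call this; returns 0 on success, -1 on bad input. */
int sec_key_store(int fd, const unsigned char *key, size_t key_len)
{
    if (fd < 0 || fd >= SEC_MAX_FDS || key == NULL)
        return -1;
    if (key_len == 0 || key_len > SEC_MAX_KEY)
        return -1;
    memcpy(sec_keys[fd].key, key, key_len);
    sec_keys[fd].key_len = key_len;
    sec_keys[fd].in_use = 1;
    return 0;
}

/* secread/secwrite would call this; NULL means fd has no key. */
const struct sec_key_entry *sec_key_lookup(int fd)
{
    if (fd < 0 || fd >= SEC_MAX_FDS || !sec_keys[fd].in_use)
        return NULL;
    assert(sec_keys[fd].key_len <= SEC_MAX_KEY);
    return &sec_keys[fd];
}

/* secclose would call this; the key is wiped, not just marked free. */
void sec_key_clear(int fd)
{
    if (fd < 0 || fd >= SEC_MAX_FDS)
        return;
    memset(&sec_keys[fd], 0, sizeof sec_keys[fd]);
}
```

Wiping the entry in sec_key_clear, rather than only resetting in_use, keeps a stale key from lingering in memory after the file is closed.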

Helpful Hints

By default, code for Linux exists in /usr/src/linux. If you have multiple versions of the kernel, the code may exist in /usr/src/linux-version instead.
Recall that a system call is a software trap or interrupt. This means that when adding a new system call, you will need to update the system call table (which plays a role similar to an interrupt vector) with a new entry, and generate a stub that will be used by user programs. Look for files arch/i386/kernel/entry.S and include/asm/unistd.h, and look for the macro _syscalln, where n is the number of parameters in your call.
A kernel function is just like any other function. The header declaration has a minor difference -- the keyword asmlinkage precedes the declaration, e.g. asmlinkage void sys_foo(void).
There is a great deal of information available about encryption in Linux. Things to check out include gpg, the Linux encryption HOWTO, and information about the international kernel patches (which add code for many crypto algorithms to the kernel, which you can then use). Most of these are easily discovered using search engines like Google or Yahoo.
For encryption, we suggest you use either the Blowfish or the 3DES algorithm. Beware, though, that many of these operate on fixed-size blocks, so you will need to figure out how to convert arbitrarily sized reads and writes into the blocks needed by these algorithms.
There are four similar implementations out there in the public domain. They mostly do more than what we ask you to do here, or do it in a different way. These include CryptFS, CFS, TCFS, and the Encrypted Loopback filesystem. You are welcome (in fact, encouraged) to read their design and documentation. When you do so, please identify any sources you used in your own design document, especially if you liked some approach they have taken and plan to use it yourself. DO NOT borrow without acknowledgment, and DO NOT borrow code at all. These will both be counted as plagiarism. The only code you are allowed to borrow as is from the net is the code for the encryption algorithms themselves.
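To illustrate the fixed-size block hint above: both Blowfish and 3DES work on 8-byte blocks, so a request for an arbitrary offset and length has to be widened to whole blocks before the cipher can be applied. The sketch below shows only that arithmetic. The struct and function names are hypothetical, and it deliberately says nothing about how a partial final block at end of file should be padded, which is a design decision left to you.

```c
#include <assert.h>
#include <stddef.h>

/* A byte range rounded outward to whole cipher blocks. */
struct block_span {
    size_t start;   /* offset of the first block touched */
    size_t length;  /* bytes to process, a multiple of block_size */
};

/*
 * Hypothetical helper: given a request for `count` bytes at `offset`,
 * return the smallest block-aligned range that covers it.
 */
struct block_span block_span_for(size_t offset, size_t count,
                                 size_t block_size)
{
    struct block_span span;
    size_t end = offset + count;

    assert(block_size > 0);

    /* Round the start down to a block boundary. */
    span.start = offset - (offset % block_size);

    /* Round the end up to a block boundary. */
    if (end % block_size != 0)
        end += block_size - (end % block_size);

    span.length = end - span.start;
    return span;
}
```

For example, a 5-byte request at offset 10 with 8-byte blocks maps to the single block covering bytes 8 through 15. A write that covers only part of a block would then typically need to read and decrypt the whole block, modify the requested bytes, and encrypt and write the block back.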

Grading the Project

We suggest that you do the project in two phases. First, just add in the new functions without doing any complex encryption. Use something simple -- we suggest a substitution cipher, with the key indicating a shift. So a key of 4 will mean that A becomes E, B becomes F, ... Z becomes D and so on. Successfully completing this phase will entitle you to 75% of the implementation credit. The remaining 25% will come from using a "real" crypto algorithm such as Blowfish or 3DES.
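The phase-one shift cipher described above could look roughly like the following user-space sketch. The function names are hypothetical, and two details are assumptions that the assignment text does not specify: lowercase letters are shifted within their own case, and all non-letter bytes pass through unchanged. Since files hold arbitrary bytes, a byte-wise shift modulo 256 would be an equally reasonable reading.

```c
#include <assert.h>
#include <stddef.h>

/* Reduce any integer key to a shift in the range 0..25. */
static int normalize_shift(int key)
{
    return ((key % 26) + 26) % 26;
}

/* Shift letters forward by `shift` (0..25), wrapping within each case. */
static void shift_buffer(char *buf, size_t len, int shift)
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        char c = buf[i];

        if (c >= 'A' && c <= 'Z')
            buf[i] = (char)('A' + (c - 'A' + shift) % 26);
        else if (c >= 'a' && c <= 'z')
            buf[i] = (char)('a' + (c - 'a' + shift) % 26);
        /* Assumption: non-letter bytes are left unchanged. */
    }
}

/* With key 4: A becomes E, B becomes F, ..., Z becomes D. */
void shift_encrypt(char *buf, size_t len, int key)
{
    shift_buffer(buf, len, normalize_shift(key));
}

/* Undo shift_encrypt by shifting forward the remaining distance. */
void shift_decrypt(char *buf, size_t len, int key)
{
    shift_buffer(buf, len, (26 - normalize_shift(key)) % 26);
}
```

Because the cipher works one byte at a time, it needs no block alignment, which makes it a convenient stand-in while the system call plumbing is being debugged.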

There are several extra credit opportunities available, with the extra credit varying from 5 to 25 percent of the total. For a small amount of extra credit, encrypt not just the contents but also the names of the files. For greater extra credit, integrate your encryption with NFS or AFS, or allow the user to specify not just the encryption key but also the encryption algorithm on a per-file basis. These are just examples; you can discuss any other ideas you have with Joshi to see if they would be suitable for extra credit.

The intent of the grading for the project is not to differentiate among those students who do a careful design and implementation of the assignments. Rather, the grading helps us identify those students who (i) don't do the assignments or (ii) don't think carefully about the design, and therefore end up with a messy and over-complicated solution. Remember that you can't pass this course without at least making a serious attempt at each of the assignments. Further, the grading is skewed so that you will get substantial credit, even if your implementation doesn't completely work, provided your design is logical and easy to understand. This means that you should first strive to come up with a clean design of your project on paper. Second, don't try to add fancy features because some other group is!

The grading for the project will be as follows: 40% design, 50% implementation, 10% testing. We have structured the grading in this way to encourage you to think through your solution before you start coding, and realize that testing your implementation is an important part of any software development process. If all you do is to work out a detailed design for what you would do to address the assignment (and if the design would work!), but you write no code, you will still get almost half of the credit for the assignment. Conversely, if you implement correctly, but do not prove that by testing your code, you will still not be given complete credit. Tests should convince us of two things -- firstly that your implementation works and secondly how much overhead it adds to the file operations.

The implementation portion of the grade considers whether you implemented your design and provided documentation that the TA could understand. Part of being a good computer scientist is coming up with simple designs and easy to understand code; a solution which works isn't necessarily the best that you can do. Thus, part of the design and implementation grade will be based on whether your solution is elegant, simple, and easy to understand.

Rules for Collaboration

It is OK for you to discuss general approaches with other groups. It is NOT OK to exchange solutions, whether ideas or code. Please recall that academic dishonesty will be sternly dealt with.