Principles of Operating Systems

CMSC 421 - Spring 2017


Project 2

Part I: Due by 11:59PM EDT on Thursday, April 27, 2017

Part II: Due by 11:59PM EDT on Tuesday, May 16, 2017

Changelog:

Sunday, May 14 — Compression/Encryption described

Friday, April 28 — Part II - Initial version

Sunday, April 23 — Added directory of sample FS images

Friday, April 21 — s/block block/disk block/

Friday, April 21 — Extended due dates

Monday, April 18 — Fixed size of reserved block at 0xB0.

Thursday, April 13 — Part I - Initial version

Introduction/Objectives

In this project, you will take a deeper look at kernel-level code in the Linux kernel and produce a modified version of a driver within the kernel. This project will focus on modifying a filesystem driver to add additional features to the filesystem that it implements. In addition, this project is designed to give you experience with modifying/maintaining a piece of code that was written by someone else, which is an essential tool in the world of a computer programmer.

This project will involve creating a new filesystem driver, which we will call FAT64. This new filesystem driver will be based on a modified version of the code in the Linux kernel that implements the FAT series of filesystems (which were popularized by Microsoft® OSes). You will be extending the FAT32 filesystem with a number of new features (some of which are optional and for extra credit). You will also work to develop a user-space program to create a new FAT64 volume.

Before you begin on each part of the project, you will have to create the GitHub repository for that part (there will be two parts, each with their own repositories). The links to create each repository will be posted on the course Piazza page, as with previous projects. Unlike previous projects, this project will not require you to make any changes directly in the Linux kernel — the filesystem driver you develop in Part II will be done as a loadable kernel module. However, you will still be basing the code on the same version of the kernel used in previous projects. This will be covered in more detail in Part II of the assignment.

To begin, we recommend that you create an appropriate amount of space on your VM to aid in development. At this point, you are unlikely to have received a final grade on Project 1, so we do not recommend deleting your Project 1 code from your VM, however, it would be very helpful to run the make clean command in your /usr/src/project1 directory to remove all the built components of the kernel for that version. You should not delete your /usr/src/vanilla-project1 directory, as that will be the base kernel for your kernel module in Part II!

Incremental Development

One of the nice things about using GitHub for submitting assignments is that it lends itself nicely to an incremental development process. As they say, Rome wasn't built overnight — nor is most software. Part of our goal in using GitHub for assignment submission is to give all of the students in the class experience with using an source control system for incremental development.

You are encouraged in this project to plan out an incremental development process for yourself — one that works for you. There is no one-size-fits-all approach here. You should break the project down into steps that you find logical, and work on the project over time, committing early and often.

As this project is broken into two parts (with Part II not revealed until Part I is due), you cannot complete this entire project in one sitting. We don't want you all waiting until the last minute to even start on the assignment. Waiting until the last minute or attempting to do everything at once leads to all kinds of troubles (including poor grades on the assignment and issues that prove difficult to debug arising in your code). To this end, we are requiring you to make at least 6 non-trivial commits to your GitHub repositories for the assignment. These six commits must be made on different dates. We will total the number of commits you make to the two repositories you will create for this assignment. Also, keep in mind that Part I of the assignment is due much earlier than Part II!

A non-trivial commit is defined for this assignment as one that meets all of these requirements:

Failure to adhere to these requirments will result in a significant deduction in your score for the assignment. This deduction will be applied after the rest of your score is calculated, much like a deduction for turning in the assigment with a late penalty.

The FAT64 Filesystem

The filesystem you will design in this project will follow a very specific format, to ensure that all students should be able to design code that is interoperable with each other. That filesystem is described below. Note that in many parts of this description, we defer to how things are handled in FAT32. Thus, it is imperative that you consult the reference links at the end of the project and build a good understanding of FAT32 to complete this project!

The FAT64 filesystem is an extension of the FAT32 filesystem, breaking many of the boundaries that exist in that filesystem. For instance, file sizes are represented as 64-bit unsigned integers, making the maximum file size on the filesystem an astounding 264 - 1 bytes. This is much larger than any hard drives that are commercially available today, and is large enough that you will inevitably hit many other limits before that one becomes a problem. In addition, a full implementation of FAT64 supports many other useful extensions that make the filesystem much more aligned with international standards than what FAT32 is.

Here is a brief overview of the new features in FAT64 that do not exist in the FAT32 filesystem:

A * represents a feature that is optional (extra credit) for this project.

The FAT64 Superblock Structure

The FAT64 filesystem features a new superblock structure that replaces the BPB of the older FAT filesystems. This superblock does not make any attempt to maintain any sort of backwards compatibility with FAT32 or earlier filesystems. In this way, we ensure that it is not accidentally mistaken for any of those filesystems. The FAT64 superblock is stored starting at byte 1024 of the filesystem (meaning the first two sectors of the filesystem on a disk with 512-byte sectors are ignored). The first 1024 bytes of the volume are completely ignored, and thus can store things like boot code and other useful data. The FAT64 superblock itself is 1024 bytes long (thus it takes up 2 disk sectors). The format of the FAT64 superblock is described below. All multi-byte integers in this table should be stored in little-endian (x86-style) byte order and offsets are relative to the start of the superblock.

Offset Size
(bytes)
Description
0x00 8 Filesystem Signature - ASCII Characters -FAT-64-
0x08 4 Size of each logical disk block, in bytes (must be a power of 2 greater than or equal to 512, generally 512)
0x0C 4 Number of logical disk blocks per cluster (must be a power of 2)
0x10 8 Number of blocks in the filesystem
0x18 2 Filesystem major version (currently 1)
0x1A 2 Filesystem minor version (currently 0)
0x1C 4 Filesystem flags (detailed below)
0x20 8 Cluster number of the root directory (usually 2)
0x28 2 Number of copies of the FAT (usually 1 or 2)
0x2A 2 Logical block where the backup superblock is stored (usually 4)
0x2C 2 Number of reserved sectors at the beginning of the volume (usually 32)
0x2E 2 State of the filesystem (detailed below)
0x30 8 Number of logical blocks per copy of the FAT
0x38 8 Reserved for future extensions, set to 0
0x40 16 Filesystem UUID (May be any version of UUID, easiest to do version 4, variant 1)
0x50 16 Filesystem name (UTF-8)
0x60 8 UNIX timestamp of the last time the volume was mounted
0x68 8 UNIX timestamp of the last time the volume was written to
0x70 4 Number of times the filesystem has been mounted since creation
0x74 4 Reserved, set to 0xFFFFFFFF
0x78 8 Creating tool/OS (ASCII characters, usually "mkfat64")
0x80 8 Number of free clusters
0x88 8 Last allocated cluster, to aid in finding free clusters
0x90 4 Number of clusters to pre-allocate for new directories, if enabled (at least 1, at least 32 KiB of space should be pre-allocated, so this may be greater than 1 if cluster size < 32 KiB)
0x94 4 Compression algorithm used, if enabled (detailed below)
0x98 4 Encryption algorithm used, if enabled (detailed below)
0x9C 4 Reserved for future extensions, set to 0
0xA0 8 Logical block where the FAT encryption key + nonce is stored
0xA8 8 Logical block where the root directory encryption key + nonce is stored
0xB0 48 Reserved for future extensions, set to 0
0xE0 32 SHA-256 hash of the entire superblock (this field should be set to 0 for calculating the hash).
0x100 768 Reserved for future extensions, set to 0
Filesystem flags

The filesystem flags value in the superblock is a 32-bit bitfield for various filesystem flags. Here are the defined flags (all undefined bits must be set to 0):

Notes regarding optional parts of the assignment (you must read this section, even if you do not wish to do the optional portions)

For the features marked as optional for this assignment (compression and encryption), if you encounter a volume that has the flags set at mount time to turn these features on, you must return an error. In addition, if the user attempts to provide a volume key at mount time and you are not supporting encryption, you must return an error (you would likely get an invalid superblock anyway if you tried to mount the volume).

Filesystem state

The filesystem state variable is used to store whether the filesystem was cleanly unmounted the last time it was mounted. All bits should be set to 1 when the filesystem is mounted, and cleared to 0 when cleanly unmounted. If any bits are set in this field before mounting, the filesystem should be treated as potentially corrupted.

Compression algorithms

If enabled, this filesystem supports transparent file compression. For simplicity, only one algorithm is currently supported. This field should be set to the integer value 1 to indicate that the zlib Deflate algorithm is used to compress files, if compression is to be enabled. If you find any other values in this field at mount time, treat them as in error and do not attempt to mount the volume. If compression is not enabled in the filesystem flags, this must be set to 0.

Filesystem compression is an optional part of this assignment. If you choose to not support it, then return error if compression is enabled in the filesystem flags.

Encryption algorithms

If enabled, this filesystem supports transparent encryption of the full volume. Two algorithms are to be supported, detailed below. If you find any other values in the encryption algorithm field of the superblock besides those detailed below, treat them as in error and do not attempt to mount the volume. This value applies only to encryption operations performed after the volume has been mounted. If the volume is encrypted, a volume key is required at mount time to decrypt the superblock and the other keys (FAT, root directory, and individual file/directory keys). The superblock, FAT key, and root directory key are always encrypted with AES-256-ECB. File and directory keys (other than the root directory key) are encrypted with AES-256-CTR in much the same way as file data is encrypted (as described below).

Valid filesystem encryption settings are as follows:

For encrypting the FAT, the nonce (stored in the same block as the key, immediately after it) is to be combined with the logical block number that is being examined in order to form the final counter value. For all other filesystem decryption operations, the nonce (once again, stored immediately after the key) is to be combined with the cluster number to form the counter.

Filesystem encryption is an optional part of this assignment. If you choose to not support it, then return error if the user attempts to mount a volume with a volume key or if encryption is enabled in the filesystem flags. In any case, you must reject any volume where the filesystem flags in the superblock indicate encryption is enabled if a volume key is not provided.

The FAT64 File Allocation Table structure

The FAT64 file allocation table follows the same format as that of earlier FAT filesystems, except that each entry is 64 bits in length. The end of chain marker is any value from 0xFFFFFFFFFFFFFFF8 - 0xFFFFFFFFFFFFFFFF. Bad clusters are those marked by 0xFFFFFFFFFFFFFFF7. The first two entries of the FAT follow the same format as previous filesystems. That is to say that FAT[0] = 0xFFFFFFFFFFFFFFF8 and FAT[1] = 0xFFFFFFFFFFFFFFFF. Parsing the FAT works exactly as it does in earlier versions, other than the difference in entry size. Please refer to the references below if you have any questions on that.

The FAT64 Directory structure

The FAT64 filesystem contains an improved directory entry format, which differs in an incompatible manner with earlier versions of FAT. Most notably, each directory entry is of variable size, depending on the length of the filename. Each directory entry must be aligned to a 4 byte boundary and no directory entry shall span the boundary between a logical filesystem cluster. The structure of a directory entry is described below. As the file name size is limited to 8-bits, the maximum length of a file name in FAT64 is 255 bytes. File names are not necessarily NUL terminated, unless it is needed for padding purposes! File names may contain any characters other than NUL bytes.

Offset Size
(bytes)
Description
0x00 2 Displacement to the next directory entry, in bytes
0x02 1 Length of the name field
0x03 1 File type (0 = regular file, 1 = directory)
0x04 2 UID of the owner of the file
0x06 2 GID of the file
0x08 8 Starting cluster of the file
0x10 8 Length of the file, in bytes
0x18 8 UNIX timestamp of the creation of the file
0x20 8 UNIX timestamp of the last modification of the file
0x28 8 UNIX timestamp of the last access of the file
0x30 2 File mode bits (described below)
0x32 2 Reserved for future expansion, set to 0
0x34 Variable File name, padded to a 4 byte boundary with NUL (0) bytes
The following fields are only included if filesystem encryption is enabled
Varies 0 or 4 bytes Padding (0 bytes) before the key, if needed to put the key at an 8-byte boundary.
Varies 16 or 32 bytes AES encryption key for the file or directory in question.
Varies 16 or 32 bytes Nonce used for encryption. The lower 64 bits of this value should always be zero to allow for concatenation of the counter.

The file mode bits are as described below:

These bits are exactly as they are defined for other filesystems, such as ext2.

If filesystem encryption is enabled, the key and nonce for the encryption algorithm is stored in the cluster chain of the file, at the first byte of the cluster chain. The key is encrypted with the main volume key, with AES-256-CTR, as described above. The file data is encrypted using the key that is decrypted at the start of the cluster chain, and takes up the entire first block of the cluster chain. Thus, file data starts at the second block in the cluster chain.

Part I - mkfat64fs

Part I of this project is to create the user-space program to create a new FAT64 filesystem. The GitHub repository you create for this part of the project will come pre-populated with a tool called mkfatfs.kos, which was written by one of the instructors. This tool creates FAT16 or FAT32 filesystems. This should provide a good reference for how to initialize a new filesystem. You may use this example code as a base for your new mkfat64fs tool if you wish, or you may simply remove it from your repository and create your own tool from scratch.

Regardless of how you choose to begin your tool, your tool must have the following funtionality for this part of the project:

Notably, for Part I of this assignment, you should not implement any of the optional parts of the filesystem. You should, for this part of the assignment, always create filesystems with the encryption/compression options turned off.

If you wish to implement support for passing additional parameters to your tool (for instance, to specify a name for the filesystem), you may do so. Please document any additional flags you accept in a README. However, if you add additional flags, they must be optional! We will be running your program as mkfat64fs filename.img and not passing any additional arguments. Your program must be named mkfat64fs, and should have a Makefile that builds it.

You should create image files to use with your mkfat64fs with the dd tool. For instance, to create an image file with 100,000 512-byte blocks, you could do the following:
dd if=/dev/zero of=filename.img bs=512 count=100000

Example filesystem images may be found in this directory. Use them to compare with your own generated filesystems to check them for consistency.

Part II - FAT64 Filesystem Driver

In Part II of this project, you will be developing a Linux driver to read and write data to a FAT64 volume. Filesystem drivers are a very important piece of an operating system, providing the tools that allow programmers and users to interact with storage devices in a simple manner. There are two primary ways to implement a filesystem driver in Linux, and you will be given the opportunity to choose for yourself which way you wish to implement support for FAT64 among those two methods.

The traditional way of implementing filesystem drivers in an operating system is through writing a kernel-space driver for the filesystem. This driver would be able to be built in to the kernel or built as a loadable kernel module (you will be writing a kernel module, if you choose this approach, but it would be entirely possible to embed the driver in the built kernel directly, if you chose to do so). This can often be a very complex thing to do, since you are still writing kernel-level code. However, this allows you direct, unfettered access to the hardware and gives the best performance of any methods of creating a filesystem driver.

A more recent method of implementing filesystems is through the Filesystem in Userspace (FUSE) interface. This allows programmers to add new filesystems, including completely virtual filesystems such as the LVFS system in use at NASA for certain datasets. The benefits of using FUSE over an in-kernel filesystem driver mainly come in ease of development. FUSE filesystem drivers are user-space programs — they are developed just like any other program. This gives a lot of flexibility in their design, but this comes at the cost of performance. FUSE filesystems are often orders of magnitude slower to access than in-kernel filesystems, since they have all of the overheads associated with user-space programs.

For Part II of the project, you will be given the opportunity to choose whether to develop an in-kernel filesystem driver (based on the FAT driver from the Linux kernel) or a FUSE driver based on a user-space library for reading and writing to FAT volumes. In either case, you will be adapting the existing code to support FAT64, rather than FAT12/FAT16/FAT32 as it already does. Also in both cases you are welcome to completely create your own driver without the starting code, but it is highly recommended that you do not do this.

Details on implementing the optional portions of the assignment (compression and encryption) will be released at a later date. For the time being, you should work on supporting the rest of the FAT64 functionality, including but not limited to supporting multiple file allocation tables and preallocation of directory structures.

Upon creating your repository on GitHub with the links provided on Piazza, you will have both the base code for an in-kernel filesystem and for the FUSE based option. If you wish to pursue using FUSE for your driver, you should install the libfuse-dev, libfuse2, and fuse packages in much the same manner as you installed the packages required for completing Project 0. You should be sure to include a README with your submission detailing which approach you took to the project and any other information you may wish to share with the TAs and/or instructors with regard to your assignment.

Developing a filesystem driver can be a very complex process. It is highly recommended that you start on this as soon as possible. After all, it's much better to finish early than to run out of time while working on a project at the end of the semester.

Compression Structure

If compression is to be enabled on a volume, it is enabled on all files on that volume, regardless of whether the compression is at all effective on those files. This is as a simplifying feature for ease of implementation. Directories shall never be compressed in any way.

Compressed files store the same information in the directory structure that uncompressed files do. The file size in the directory structure is the uncompressed length of the file.

Compressed file data has a small header attached to the beginning of the file, before the compressed data begins. This header is 8 bytes in length and contains the compressed size of the file in bytes. Files are compressed with the algorithm that is specified in the superblock (which, as of this version of the filesystem is always the DEFLATE algorithm, as defined in zlib).

Encryption Structure

If encryption is enabled on the volume, all directories and files are encrypted before they are stored on the disk. When this is the case, each file/directory is given a key that is generated randomly using some pseudorandom number generator that should be cryptographically-secure (so, do not use rand()). Encryption keys are to be stored in the directory entry for the file in question, along with the nonce values. The modifications needed to the directory entry structure are detailed in that section of the project description. Nonce values should also be generated from a cryptographically-secure random number generator.

As encryption keys are part of the directory structure themselves, they are encrypted along with the rest of the directory structure. This should keep the keys safe from prying eyes.

All encryption operations are done with symmetric encryption based on the Advanced Encryption Standard. If in doubt of how this works, please consult relevant documentation available on the Internet.

Nonces used for encryption shall be (key size - 64) bits in length. This allows the counter value to be simply concatenated to the nonce for use in the CTR mode of AES. The counter value, as it is either a block number or a cluster number, is always 64 bits in size, hence the limitation on the size of the nonce. For AES 128, the nonce shall be 64 bits in size, whereas for AES 256, it shall be 192 bits in size.

Cluster numbers used as part of the counter mode are absolute values — that is to say, they are the same values used to look up an entry in the global file allocation table.

Submission Instructions

Submission of this project largely follows the same procedures as for previous projects, with the main difference being that you will have two repositories to commit to. Please be sure to include a README file with each of your repositories, describing what is described in their section of this project document.

Notes on Extra Credit

Extra credit portions of the assignment cannot be used to make up for portions of the assignment that are completely unimplemented. That is to say, extra credit is for boosting the final grade of an assignment where a reasonable attempt was made to complete all the other features. This is not to say that everything must be implemented pefectly, but if an assignment doesn't even attempt to implement the directory structure of FAT64 and uses that of FAT16 or FAT32, then extra credit will not be assigned to make up for that (not that this would be really possible to do in the case of encryption, since it depends on new fields in the directory entry of FAT64).

The TAs judgements on what constitutes a reasonable attempt to implement the filesystem features will be considered final.

References

Below is a list of references that you may find useful in your quest to complete this project:

If in doubt, please refer to the Kernel API (project 1), Linux Cross Reference, Linux Kernel Module Programming Guide, and the Kernel Documentation page for programming questions.


What to do if you want to get a 0 on this project

Any of the following will cause a significant point loss on this project. Items marked with a * will result in a 0.

Please do not make us take off points for any of these things!


Last modified Tuesday, 01-Sep-2020 18:43:36 EDT