Principles of Operating Systems

CMSC 421 — Spring 2020

Project 1

Due by 11:59PM EDT on ~~Sunday, March 15^th~~ Wednesday, March 18^th

Changelog:

March 12: Extended due date until March 18.

February 24: Fixed byte ordering issue with example XOR output.

February 23: Added note clarifying bad pointer cases.

Introduction/Objectives

In this project, you will create a new version of the Linux kernel that adds various system calls that relate to interprocess communication. While the kernel does already provide IPC-related calls, we wish to have a bit more control over the process. To that end, you will be implementing a relatively simple message passing interface that can be queried asynchronously by multiple processes. In addition, this message passing interface will support some basic access control mechanisms to ensure the proper functionality of the message system as an IPC tool.

Before you begin, be sure to create your new GitHub repository for Project 1 by using the link posted on the course Piazza page. Then, follow the same steps you did in Project 0 to clone this new repository (obviously, substitute project1 for project0 from the earlier instructions). Also, you may remove the /usr/src/vanilla-project0 directory and create a new /usr/src/vanilla-project1 directory with the newly checked out code.

As a first step, change the version string of the new kernel to reflect that it is for Project 1. That is, make the version string read 5.5.0-cmsc421project1-USERNAME (substituting your UMBC username where appropriate).

Incremental Development

One of the nice things about using GitHub for submitting assignments is that it lends itself nicely to an incremental development process. As they say, Rome wasn't built in a day, and neither is most software. Part of our goal in using GitHub for assignment submission is to give all of the students in the class experience with using a source control system for incremental development.

You are required in this project to plan out an incremental development process for yourself — one that works for you. There is no one-size-fits-all approach here. One suggested option is to break the assignment down into steps and implement things as you go. For instance, the locking/thread safety portions of the assignment can be easily added after the main functionality is implemented, in most cases. You are also encouraged to seek out the review of your TAs to determine whether an approach might be feasible.

You should not attempt to complete this entire project in one sitting. Also, we don't want you all waiting until the last minute to even start on the assignment. Students who do either of these tend to earn poor grades on the assignment. To this end, we are requiring you to make at least 4 non-trivial commits to your GitHub repository for the assignment. These four commits must be made on different dates and at least one must be done before Wednesday, March 4^th at 11:59 PM EST. You may make more than four commits during the timeline of the project; four is simply the minimum number required for full credit. In addition, you must have made a reasonable attempt at implementing the basic operations on the data structure described later in this document by March 4^th. That is to say that we expect to see some version of code that is capable of initializing the IPC structure, inserting items into it, deleting items from it, and searching for existing items by March 4^th. This may be in the form of kernel-space code or in the form of a user-space prototype (as detailed below).

In addition, as it is significantly easier to build and test code in user-space than it is in the kernel, you are encouraged to build a user-space prototype of at least part of your system before attempting to implement it in the kernel. Please note that we are not suggesting you implement the entire project in user-space, but rather just some of the basic functionality. One approach that many previous students have found to work well is to implement a prototype version of the data structure that you will be using in user-space. This will allow you to ensure that you have the basic algorithms for implementing insertion, deletion, search, etc. working without having to spend a long time waiting on kernel builds. Some portions of the assignment are not feasible to do in a user-space prototype, as the interfaces involved are significantly different in user-space and in kernel-space (like locking, access control, and actual IPC), so we do not necessarily recommend building a full prototype of the whole system in user-space. If you do choose to implement a user-space prototype of your code, please place it in a directory called proj1proto in the root of the Linux kernel source code and be sure to tell us about it in your README.proj1 file (let us know what you did in the prototype, why you chose to do so, how things changed when you ported it over to the kernel, etc).

A non-trivial commit is defined for this assignment as one that meets all of these requirements:

Does not contain only documentation (i.e, just committing a README file does not count).
Does not contain only Makefile modifications or creation.
Modifies/creates at least 10 lines of code in a combination of existing or newly created .c or .h files. That is to say, creating a new file with 10 lines of code counts, but creating a new file with a 10 line comment does not.
Code modifications/creation must be relevant to the project. Creating a bunch of useless files/functions that are unrelated or otherwise superfluous to the assignment does not count. It is ok to reorganize your code after you have started and remove pieces of code, of course, but if you are obviously only adding code to the repository early on that you completely delete later (or that has nothing to do with the assignment), then that commit will not count toward the requirements herein.
Code implementing a prototype version of the assignment (for instance, a user-space version) does count, but any example code or anything else of that nature that you use in that prototype (for instance, a user-space version of the <linux/list.h> header file) does not count.

Failure to adhere to these requirements will result in a significant deduction in your score for the assignment. This deduction will be applied after the rest of your score is calculated, much like a deduction for turning in the assignment with a late penalty.

Access Control

Only the root user may call the creation and deletion system calls below. If a regular user account attempts to call any of the prohibited system calls, then they must be given an error indicating that they have been denied that permission, such as -EPERM.

Encryption

Your IPC mechanism will use one of two simple encryption algorithms. The first of these is a simple XOR Cipher. The second encryption algorithm you will use is known as the Extended Tiny Encryption Algorithm (XTEA). At mailbox creation time, one of these two algorithms will be selected (by the user creating the mailbox) and that algorithm must be used for all messages added to the mailbox.

The XOR cipher provides minimal security, but is a very simple algorithm to understand. You will be implementing an XOR cipher with a 32-bit key (and thus a 32-bit block length).

The Extended Tiny Encryption Algorithm is a simple encryption algorithm that has a key length of 128 bits and operates on blocks of data that are 64 bits in size. As messages can be of arbitrary size in your mailbox system, you will have to apply the XTEA algorithm in a loop in order to process messages larger than 64 bits (8 bytes) in size. To deal with messages that are of a size not divisible evenly by 8 bytes, you should pad the end of the message with zero bytes (you should store the original length with your messages so as not to ever reveal this padding to programs using your system calls).

Each message in the mailbox will have a separate key that is passed in to the kernel when the message is input into the system. In order for the data to come out in the clear, the same key must be passed in when retrieving the message. You must not store this key along with the message in your IPC system's data structures in the kernel. If the mailbox is using the XOR cipher, the key will be passed in as a pointer to a 32-bit integer. If the mailbox is using XTEA, the key will be passed in as an array of 4 32-bit integers. In either case, the key will have to be read from the user space pointer provided before use.

A somewhat more brief explanation of how an XOR cipher works (geared directly to your project) is provided here. Here is an example of how the cipher works using real data:

Data Passed in (6 bytes): 0xDE 0xAD 0xBE 0xEF 0x12 0x34
Data to encrypt (with padding): 0xDE 0xAD 0xBE 0xEF 0x12 0x34 0x00 0x00
Key: 0x1BADC0DE
Data stored: 0x00 0x6D 0x13 0xF4 0xCC 0xF4
Data computed (with padding): 0x00 0x6D 0x13 0xF4 0xCC 0xF4 0xAD 0x1B

As for why this looks a bit odd, this is a result of the little endian byte ordering of the x86 CPU. If you attempt to do your encryption byte-by-byte instead of in 32-bit increments, you will need to adjust for this. If you do your encryption in 32-bit increments you will not need to adjust for endianness in order to match the output shown above. Your encrypted data must match for the example shown above (and in similar situations) in order to receive full credit.

You must write the code to implement this cipher yourself in your kernel code. Your algorithm should encrypt and decrypt in 32-bit blocks, which means that you must either pad messages out to 32-bit increments while encrypting (optimally) or you must do your encryption/decryption on byte-sized increments (which will potentially incur a performance penalty).
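As a rough illustration of the byte-by-byte variant, here is a minimal user-space sketch (not kernel code). The function name xor_crypt is made up for this example. It selects the key byte with shifts, so it gives the same result as a 32-bit XOR on a little-endian CPU no matter what machine it runs on.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * XOR buf[0..len) with a 32-bit key, one byte at a time.
 * Byte i is combined with key byte (i % 4), counting from the least
 * significant byte of the key. That mirrors what a 32-bit XOR does to
 * memory on a little-endian CPU such as x86.
 * Calling the function a second time with the same key undoes it.
 * (xor_crypt is a hypothetical name used only for this sketch.)
 */
static void xor_crypt(unsigned char *buf, size_t len, uint32_t key)
{
    size_t i;

    for (i = 0; i < len; i++)
        buf[i] ^= (unsigned char)(key >> (8 * (i % 4)));
}
```

Applied to the 8 padded bytes of the example above with the key 0x1BADC0DE, this produces 0x00 0x6D 0x13 0xF4 0xCC 0xF4 0xAD 0x1B. Because it works on single bytes, it can also be run on the 6 unpadded bytes directly.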

You may use the code included below in order to implement the XTEA algorithm in your assignment (which is adapted from the public domain source code presented in the Wikipedia article and the original XTEA reference code). These functions will need to be called for every 64 bits of your message data (so, you will have to call them in a loop).

/* Encrypt 64 bits of plaintext. Modifies the message in-place. */
static void xtea_enc(uint32_t *v, uint32_t const key[4]) {
    unsigned int i;
    uint32_t v0 = v[0], v1 = v[1], sum = 0, delta = 0x9E3779B9;

    for (i = 0; i < 32; i++) {
        v0 += (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + key[sum & 3]);
        sum += delta;
        v1 += (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + key[(sum >> 11) & 3]);
    }

    v[0] = v0;
    v[1] = v1;
}

/* Decrypt 64 bits of an encrypted message. Modifies the message in-place. */
static void xtea_dec(uint32_t *v, uint32_t const key[4]) {
    unsigned int i;
    uint32_t v0 = v[0], v1 = v[1], delta = 0x9E3779B9, sum = delta * 32;

    for (i = 0; i < 32; i++) {
        v1 -= (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + key[(sum >> 11) & 3]);
        sum -= delta;
        v0 -= (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + key[sum & 3]);
    }

    v[0] = v0;
    v[1] = v1;
}
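One possible way to drive these functions in a loop over a zero-padded copy of a message is sketched below as a user-space prototype. The helper names (xtea_padded_len, xtea_encrypt_msg, xtea_decrypt_buf) are invented for this example, and malloc-style allocation stands in for whatever kernel allocator you end up using. The two XTEA functions are repeated from above only so that the sketch compiles on its own.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define XTEA_BLOCK 8u /* XTEA block size in bytes (64 bits) */

/* Same as the listing above, repeated so this sketch is self-contained. */
static void xtea_enc(uint32_t *v, uint32_t const key[4]) {
    unsigned int i;
    uint32_t v0 = v[0], v1 = v[1], sum = 0, delta = 0x9E3779B9;

    for (i = 0; i < 32; i++) {
        v0 += (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + key[sum & 3]);
        sum += delta;
        v1 += (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + key[(sum >> 11) & 3]);
    }
    v[0] = v0;
    v[1] = v1;
}

static void xtea_dec(uint32_t *v, uint32_t const key[4]) {
    unsigned int i;
    uint32_t v0 = v[0], v1 = v[1], delta = 0x9E3779B9, sum = delta * 32;

    for (i = 0; i < 32; i++) {
        v1 -= (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + key[(sum >> 11) & 3]);
        sum -= delta;
        v0 -= (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + key[sum & 3]);
    }
    v[0] = v0;
    v[1] = v1;
}

/* Round len up to the next multiple of 8 (hypothetical helper).
 * Note: does not guard against overflow for lengths near SIZE_MAX. */
static size_t xtea_padded_len(size_t len)
{
    return (len + XTEA_BLOCK - 1) / XTEA_BLOCK * XTEA_BLOCK;
}

/*
 * Return a newly allocated, zero-padded, encrypted copy of msg and store
 * the padded length in *plen. Returns NULL if allocation fails.
 * The caller keeps track of the original length and frees the buffer.
 */
static unsigned char *xtea_encrypt_msg(const unsigned char *msg, size_t len,
                                       const uint32_t key[4], size_t *plen)
{
    size_t padded = xtea_padded_len(len), off;
    unsigned char *out = calloc(padded ? padded : 1, 1); /* zero padding */
    uint32_t block[2];

    if (!out)
        return NULL;
    if (len)
        memcpy(out, msg, len);
    for (off = 0; off < padded; off += XTEA_BLOCK) {
        /* memcpy avoids unaligned or aliased access to the byte buffer */
        memcpy(block, out + off, XTEA_BLOCK);
        xtea_enc(block, key);
        memcpy(out + off, block, XTEA_BLOCK);
    }
    *plen = padded;
    return out;
}

/* Decrypt a padded buffer in place; padded must be a multiple of 8. */
static void xtea_decrypt_buf(unsigned char *buf, size_t padded,
                             const uint32_t key[4])
{
    uint32_t block[2];
    size_t off;

    for (off = 0; off < padded; off += XTEA_BLOCK) {
        memcpy(block, buf + off, XTEA_BLOCK);
        xtea_dec(block, key);
        memcpy(buf + off, block, XTEA_BLOCK);
    }
}
```

In this sketch an 11-byte message is stored as 16 encrypted bytes. After decryption, only the first 11 bytes would be handed back, which is why the original length has to be kept alongside the data.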

You must make sure that all encryption and decryption steps take place completely within the kernel's memory space. That is to say that you should never attempt to encrypt or decrypt from or to a user-space buffer directly, nor should you attempt to use the key from user-space without copying it into the kernel's memory first. Doing any of these things would be a serious lapse in security!

New System Calls

You will add a few new system calls for managing mailboxes of IPC messages. The mailboxes and their contents all exist only in the kernel address space. You will develop the system calls specified below so that user processes can access the boxes and their contents.

Each mailbox should be identified by an unsigned long value, which is passed in at creation time. Each mailbox can store an "unlimited" number of messages, each of which can be of "unlimited" length. Messages are binary data and should not be treated as text strings — this means that messages may have zero bytes (also known as NUL terminators) embedded within them. To this end, you should not be using any string related functions on them (such as strlen() or strcpy()).

Each mailbox will store its messages in FIFO order. That is to say that each mailbox should be seen as a queue.
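To make the queue behavior concrete, the following is a minimal user-space prototype of a single mailbox, in the spirit of the prototyping suggestion earlier in this document. All of the names here (struct msg, struct mbox, mbox_push, mbox_pop) are assumptions made for this sketch. It uses a hand-rolled singly linked list and leaves out encryption, locking, and user-space pointer checks entirely, so it is a starting point rather than a solution.

```c
#include <stdlib.h>
#include <string.h>

/* One queued message: raw bytes plus an explicit length (no NUL terminator). */
struct msg {
    struct msg *next;
    size_t len;
    unsigned char *data;
};

/* One mailbox: a FIFO of messages with head (oldest) and tail (newest). */
struct mbox {
    unsigned long id;
    struct msg *head, *tail;
    size_t count;
};

/* Append a copy of data to the tail. Returns 0 on success, -1 if out of memory. */
static int mbox_push(struct mbox *mb, const unsigned char *data, size_t len)
{
    struct msg *m = malloc(sizeof(*m));

    if (!m)
        return -1;
    m->data = malloc(len ? len : 1); /* zero-length messages are valid */
    if (!m->data) {
        free(m);
        return -1;
    }
    if (len)
        memcpy(m->data, data, len);
    m->len = len;
    m->next = NULL;
    if (mb->tail)
        mb->tail->next = m;
    else
        mb->head = m;
    mb->tail = m;
    mb->count++;
    return 0;
}

/*
 * Remove the oldest message and copy at most n of its bytes into out.
 * Returns the number of bytes copied, or -1 if the mailbox is empty.
 * The whole message is discarded even if only part of it was copied.
 */
static long mbox_pop(struct mbox *mb, unsigned char *out, size_t n)
{
    struct msg *m = mb->head;
    size_t copy;

    if (!m)
        return -1;
    copy = m->len < n ? m->len : n;
    if (copy)
        memcpy(out, m->data, copy);
    mb->head = m->next;
    if (!mb->head)
        mb->tail = NULL;
    mb->count--;
    free(m->data);
    free(m);
    return (long)copy;
}
```

Note that lengths are always carried explicitly and copies use memcpy, never string functions, so embedded zero bytes survive intact.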

Please keep in mind that these functions may well be called from multiple different processes simultaneously. You must provide for appropriate locking to ensure concurrent access to these functions works properly.

As this code will be part of the kernel itself, correctness and efficiency should be of primary concern to you in the implementation. Particularly inefficient (memory-wise, algorithmic, or poor locking choices) solutions to the problem at hand may be penalized in grading. In regard to correctness, you will probably find that the majority of your code for this assignment will be spent in ensuring that arguments and other such information passed in from user-space is valid. If in doubt, assume that the data passed in is invalid. Users tend to do a lot of really stupid things, after all. Crashing the kernel because a NULL or otherwise invalid pointer is passed in will result in a significant deduction of points.

Any pointers that cannot be read or written for the amount that is required shall be reported back to the user with a bad pointer error (that is to say -EFAULT) and the call must not affect the state of the mailbox system in any way. For instance, if the user sends a message and says that it is 20000 bytes long, but you can only read 20 bytes of it successfully, you must report an error. The same goes for keys — if you can only read 3 bytes of a key for an XOR mailbox or 7 bytes of a key for an XTEA one (for instance), you must report an error to the user. This also applies in any recv/peek operations. A message must never be put into the system unless you can read the whole message. A message must never be removed from the system by a recv operation unless you can either read the entire message or the length specified by the user (whichever is less).

Finally, you are to implement this system on your own. The IPC systems already within the kernel will not be helpful to you in implementing this assignment. Also, kfifo will not be useful to you at all. Seriously, don't confuse yourself by even looking at kfifo.

The signature and semantics for your system calls must be:

long create_mbox_421(unsigned long id, int crypt_alg): creates a new empty mailbox with ID id, if it does not already exist, and returns 0. If the crypt_alg parameter is 0, the mailbox's messages shall be encrypted with the XOR cipher described above. Otherwise, the messages shall be encrypted with the XTEA algorithm.
long remove_mbox_421(unsigned long id): removes mailbox with ID id, if it is empty, and returns 0. If the mailbox is not empty, this system call shall return an appropriate error and not remove the mailbox.
long count_mbox_421(void): returns the number of existing mailboxes.
long list_mbox_421(unsigned long __user *mbxes, long k): returns a list of up to k mailbox IDs in the user-space variable mbxes. It returns the number of IDs written successfully to mbxes on success and an appropriate error code on failure.
long send_msg_421(unsigned long id, unsigned char __user *msg, long n, uint32_t __user *key): encrypts the message msg (using the correct algorithm), adding it to the already existing mailbox identified. Returns the number of bytes stored (which shall be equal to the message length n) on success, and an appropriate error code on failure. Messages with negative lengths shall be rejected as invalid and cause an appropriate error to be returned, however messages with a length of zero shall be accepted as valid.
long recv_msg_421(unsigned long id, unsigned char __user *msg, long n, uint32_t __user *key): copies up to n bytes from the next message in the mailbox id to the user-space buffer msg, decrypting with the specified key, and removes the entire message from the mailbox (even if only part of the message is copied out). Returns the number of bytes successfully copied (which shall be the minimum of the length of the message that is stored and n) on success or an appropriate error code on failure.
long peek_msg_421(unsigned long id, unsigned char __user *msg, long n, uint32_t __user *key): performs the same operation as recv_msg_421() without removing the message from the mailbox.
long count_msg_421(unsigned long id): returns the number of messages in the mailbox id on success or an appropriate error code on failure.
long len_msg_421(unsigned long id): returns the length of the next message that would be returned by calling recv_msg_421() with the same id value (that is the number of bytes in the next message in the mailbox). If there are no messages in the mailbox, this shall return an appropriate error value.

Remember from Project 0 that system calls are defined in the kernel by way of using a SYSCALL_DEFINE macro. So, for instance, the send_msg_421 syscall from the list above would be defined as follows:

SYSCALL_DEFINE4(send_msg_421, unsigned long, id, unsigned char __user *, msg,
                long, n, uint32_t __user *, key) {
    /* Code goes here */
}

See the <linux/syscalls.h> file for more information about these macros.

Each system call returns an appropriate non-negative integer on success, and a negative integer on failure which is indicative of the error that occurred. See the <errno.h> header file for a list of error codes. Suggested error codes for several error conditions are listed below (this does not necessarily cover all error cases you might encounter):

-EPERM: Permission denied
-ENOMEM: Out of memory during an allocation
-EFAULT: Supplied an invalid pointer or one that cannot be read/written for the entire requested length
-ENOENT: Invalid mailbox id specified
-EEXIST: Mailbox already exists on creation
-ENOTEMPTY: Mailbox not empty on deletion
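When a test program invokes a system call through the C library's syscall() function, a negative return value from the kernel shows up as a return of -1 with errno set to the positive error code. A small helper like the one below (the name mbox_err_text is made up for this sketch) may be handy in a test driver for printing which of the suggested conditions was hit. It assumes the Linux error numbers, where each of these codes has a distinct value.

```c
#include <errno.h>

/*
 * Map a positive errno value to the description used in the list above.
 * Intended for user-space test drivers: call it with errno after a
 * system call returns -1. (Hypothetical helper, not part of any API.)
 */
static const char *mbox_err_text(int err)
{
    switch (err) {
    case 0:         return "Success";
    case EPERM:     return "Permission denied";
    case ENOMEM:    return "Out of memory during an allocation";
    case EFAULT:    return "Invalid or partially accessible pointer";
    case ENOENT:    return "Invalid mailbox id specified";
    case EEXIST:    return "Mailbox already exists on creation";
    case ENOTEMPTY: return "Mailbox not empty on deletion";
    default:        return "Other error";
    }
}
```

A driver could then report failures with something like printf("%s\n", mbox_err_text(errno)) right after the failing call, before any other library call has a chance to overwrite errno.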

The kernel must be very careful with memory access. Remember that users often don't error check their code very well and that malicious users do also exist. You should be very careful in your code to ensure that any pointers that come in from user-space (all marked with __user above) are checked sufficiently. Also, you should ensure that you do not leak private information outside the kernel, such as encrypted messages. All of your encryption and decryption MUST occur in memory within the kernel's memory space, not in any user-space memory. Also, it is up to the user to provide the key for both encryption and decryption. You MUST NOT store the key with the messages in the kernel. If the user specifies a different encryption key for encryption and decryption, the user should expect to get a scrambled message back. Finally, 0 is a perfectly valid encryption key; mathematically, it just won't change the message at all for the XOR cipher.

User-space driver program(s)

You must adequately test your kernel changes to ensure that they work under all sorts of situations (including in error cases). You should build one or more testing drivers and include them in your sources submitted. Create a new directory in the Linux kernel tree called proj1tests to include your test case program(s). Be sure to include a Makefile to build them and instructions on how to run them in a plain-text README file within this directory. Your README for the test programs should also describe your general strategy for testing the system calls. Remember that testing is one of the primary jobs of a developer in the real world!

It is strongly suggested that you additionally build a separate program for each system call to be implemented to simply call that system call with user-provided arguments. For the data to be sent as a message, you might consider allowing the user to specify a file of data to send or a string on the command line. These programs will likely prove to be invaluable in debugging.

Submission Instructions

You should follow the same basic set of instructions for submitting Project 1 that you did for Project 0. That is to say, you should do a git status to ensure that any files you modified are detected as such, then do a git add and a git commit to add each modified/newly created file or directory to the local git repository. Then do a git push origin master to push the changes up to your GitHub account.

Be sure to include not only your modified kernel files, but also your driver program files. The driver should go in a proj1tests directory, in the root of the kernel source tree. You must include a Makefile that can build your test program(s) in this directory as well. You should not attempt to add your test directory to the main kernel Makefile. Also, include a plain-text README file in this directory describing your approach to testing this project. Tell us what your testcases actually test, and why you chose to test those things. If your testcases are supposed to fail at any point, make sure to tell us that in the README (after all, you should not only test your code with good inputs, but with bad ones too — we'll do just that in our testcases).

You must also include a plain-text README.proj1 file in the root directory of the kernel source code that describes anything you might want us to know when we're grading your assignment. This can include an outline of how you implemented the requirements of the project, for instance. This is also where you should cite any references you have used for the assignment other than those given in this assignment description.

You should also verify that your changes are reflected in the GitHub repository by viewing your repository in your web browser.

References

Below is a list of references that you may find useful in your quest to complete this project:

The Linux Kernel API — documentation of the internal API for programming in the Linux kernel (Especially useful is the chapter on user-space memory access and the chapter on linked lists within the kernel). Please note that the user-space memory access chapter of the Kernel API refers to functions that start with __ as being functions that work "with less checking". The versions of these functions that do not do "less checking" do not have the underscores at the beginning. That is to say, if the document refers to a function called __copy_from_user, you should instead call copy_from_user to ensure that all appropriate error checking is done.
The Linux Cross-Reference (for version 5.5.0 of the kernel) — a cross-referenced copy of the Linux kernel source code for relatively easy searching
The Open Group Base Specifications Issue 7/IEEE Std. 1003.1 - 2008, 2018 Edition/POSIX.1-2008
The Unreliable Guide to Locking [in the Linux Kernel]
Kernel Korner: System Calls — old and outdated, but still demonstrates the concepts used for system calls

If in doubt, the Kernel API and Linux Cross Reference should be your ultimate guides.

What to do if you want to lose points on this project

Any of the following will cause a significant point loss on this project.

Excessive unnecessary changes made to the kernel sources.
Extraneous files are included.
Files are missing that needed to be modified.
Hello World system call included, or the system calls required are otherwise out of the order specified.
Failure to follow the requirements in the "Incremental Development" section of the assignment.

Please do not make us take off points for any of these things!

Last modified Thursday, 12-Mar-2020 17:09:56 EDT