Principles of Operating Systems

CMSC 421 — Spring 2021


Project 2

Due by 11:59:00PM EDT on Sunday, March 28, 2021

Changelog:

March 22: Additional guidelines for online research vs. plagiarism
March 6: Clarified what n and m are in the algorithmic performance requirements.
March 4: Exchange user-space list.h source
March 3: A) Syscall function complexity B) User-space version of linux/list.h
March 1: A) Add root requirement to reset. B) Clone instructions
February 28: Initial version

Introduction/Objectives

In this project, you will create a new version of the Linux kernel that adds various system calls that relate to interprocess communication. While the kernel does already provide IPC-related calls, we wish to have a bit more control over the process. To that end, you will be implementing a relatively simple message passing interface that can be queried asychronously by multiple processes. In addition, this message passing interface will support some basic access control mechanisms to ensure the proper functionality of the message system as an IPC tool.

Before you begin, be sure to create your new GitHub repository for Project 2 by using the link posted on the course Piazza page. Then, follow the steps listed below this new repository — they are slightly different than those from Project 0. The build instructions are the same, however starting from step 3 of the building the kernel instructions. Also, you may remove the /usr/src/vanilla-project0 directory and create a new /usr/src/vanilla-project2 directory with the newly checked out code.

Clone Instructions
  cd /usr/src
  git clone git@github.com:umbc-cmsc421-sp2021/linux.git vanilla-project2
  cd vanilla-project2
  git remote remove origin
  git remote add origin git@github.com:umbc-cmsc421-sp2021/project-2-yourGITHUBusername.git
  git push origin main
  cd /usr/src
  cp -ra vanilla-project2 project2

As a first step, change the version string of the new kernel to reflect that it is for Project 2. That is, make the version string read 5.5.0-cmsc421project2-USERNAME (substituting your UMBC username where appropriate).

You must complete this project in the VM that you set up earlier in the semester!

The code of the Linux kernel is written in C89/C90. This means that certain features, like // comments, for loop initial variable declarations (for(int i = 0; i < 10; ++i) -- specifically the initial declaration of int i there), variable declarations in the middle of a scope, etc. are not available. You will get warnings and/or errors for such things in your code. You must clean up any warnings/errors before you turn your code in or ask for help from the TAs with it!

Incremental Development

One of the nice things about using GitHub for submitting assignments is that it lends itself nicely to an incremental development process. As they say, Rome wasn't built in a day — nor is most software. Part of our goal in using GitHub for assignment submission is to give all of the students in the class experience with using an source control system for incremental development.

You are required in this project to plan out an incremental development process for yourself — one that works for you. There is no one-size-fits-all approach here. One suggested option is to break the assignment down into steps and implement things as you go. For instance, the locking/thread safety portions of the assignment can be easily added after the main functionality is implemented, in most cases. You are also encouraged to seek out the review of your TAs to determine whether an approach might be feasible.

You should not attempt to complete this entire project in one sitting. Also, we don't want you all waiting until the last minute to even start on the assignment. Students doing either of these tend to lead to getting poor grades on the assignment. To this end, we are requiring you to make at least 4 non-trivial commits to your GitHub repository for the assignment. These four commits must be made on different dates and at least one must be done before Wednesday, March 10th at 11:59:00 PM EST. You may make more than four commits during the timeline of the project — four is simply a minimum number required for full credit. In addition, you must have made a reasonable attempt at implementing the basic operations on the data structure described later in this document by March 10th. That is to say that we expect to see some version of code that is capable of initializing the IPC structure, inserting items into it, deleting items from it, and searching for existing items by March 10th. This may be in the form of kernel-space code or in the form of a user-space prototype (as detailed below).

In addition, as it is significantly easier to build and test code in user-space than it is in the kernel, you are encouraged to build a user-space prototype of at least part of your system before attempting to implement it in the kernel. Please note that we are not suggesting you implement the entire project in user-space, but rather just some of the basic functionality. One approach that many previous students have found that works for this is to implement a prototype version of the data structure that you will be using in user-space. This will allow you to ensure that you have the basic algorithms for implementing insertion, deletion, search, etc. working without having to spend a long time waiting on kernel builds. In addition, this will allow you to use gdb and valgrind on your prototype to fish out any bugs or memory problems before moving to the kernel where these tools are not readily available. Some portions of the assignment are not feasible to be done in a user-space prototype, as the interfaces for doing so are significantly different in user-space and in kernel-space (like locking, access control, and actual IPC), so we do not necessarily recommend building a full prototype of the whole system in user-space. Here is a version of linux/list.h that you can use in your user-space prototype, if you want to use the kernel's built-in linked list for that purpose. Just copy it into your /usr/src/project2/proj2proto directory (and make sure to add/commit it to github with the rest of your code): https://www.mcs.anl.gov/~kazutomo/list/list.h
If you do choose to implement a user-space prototype of your code, please place it in a directory called proj2proto in the root of the Linux kernel source code and be sure to tell us about it in your README.proj2 file (let us know what you did in the prototype, why you chose to do so, how things changed when you ported it over to the kernel, etc).

Here is a version of linux/list.h that you can use in your user-space prototype, if you want to use the kernel's built-in linked list for that purpose: user-space list

As noted, you may only copy list.h and list_hacks.h into your project (into your proj2proto make sure to add/commit it to github with the rest of your code). You are absolutely not allowed to copy/paste the example code, although you can reference it for an idea of how the list works. You must cite this reference in your README. DO NOT OVERWRITE THE KERNEL'S LINKED LIST WITH THIS VERSION

A non-trivial commit is defined for this assignment as one that meets all of these requirements:

Failure to adhere to these requirements will result in a significant deduction in your score for the assignment. This deduction will be applied after the rest of your score is calculated, much like a deduction for turning in the assigment with a late penalty.

Access Control

Only the root user may call the creation, deletion, and reset system calls below. If a regular user account attempts to call any of the prohibited system calls, then they must be given an error indicating that they have been denied that permission such as -EPERM.

New System Calls

You will add a few new system calls for managing mailboxes of IPC messages. The mailboxes and their contents all exist only in the Kernel address space. You will develop the system calls specified below in order to access the boxes and their contents by user processes.

Each mailbox shall be identified by an unsigned long value, which is passed in at creation time. Each mailbox shall store up to an "unlimited" number of messages, each of which shall be of up to "unlimited" length. Messages are binary data and shall not be treated as text strings — this means that messages may have zero bytes (also known as NUL terminators) embedded within them. To this end, you MUST NOT use any string related functions on them (such as strlen() or strcpy()).

Each mailbox shall store its messages in FIFO order. That is to say that each mailbox shall act as a queue.

Please keep in mind that these functions may be called from multiple different processes simultaneously. You must provide for appropriate locking to ensure concurrent access to these functions works properly.

As this code will be part of the kernel itself, correctness and efficiency must be of primary concern to you in the implementation. Particularly inefficient (memory-wise, algorithmic, or poor locking choices) solutions to the problem at hand may be penalized in grading. In regard to correctness, you will probably find that the majority of your code for this assignment will be spent in ensuring that arguments and other such information passed in from user-space is valid. If in doubt, assume that the data passed in is invalid. Users tend to do a lot of really bad things, after all. Crashing the kernel because a NULL or otherwise invalid pointer is passed in will result in a significant deduction of points.

Any pointers that cannot be read or written for the amount that is required shall be reported back to the user with a bad pointer error (that is to say -EFAULT) and the call must not affect the state of the mailbox system in any way. For instance, if the user sends a message and says that it is 20000 bytes long, but you can only read 20 bytes of it successfully, you must report an error. This also applies in any recv/peek operations. A message must never be put into the system unless you can read the whole message. A message must never be removed from the system by a recv operation unless you can either read the entire message or the length specified by the user (whichever is less).

For all functions and algorithmic complexities listed below, you may assume that the handling of bytes in a message can be done in constant time, regardless of the message length (that is to say, essentially ignore any memory copies, etc. in determining if you meet the requirements). In all cases, n is the number of mailboxes and m is the number of messages in a mailbox. Also, remember that these are just bounds on the average/amortized case performance. You may write code that performs better than what we have given as the bounds, if you so choose.

Finally, you are required to implement this system on your own. The IPC systems within the kernel already will not be helpful to you in implementing this assignment. Also, kfifo will not be useful to you at all — seriously, don't confuse yourself by even looking at kfifo.

Your new code for your system calls shall be placed within one or more .c files in a proj2 directory in the root of your kernel sources (so if you refer back to Project 0, you shall create a proj2 directory instead of a hello one). All of your system calls may be implemented in a single .c file — you do not have to create multiple files to implement your system calls if you do not wish to do so.

The signature and semantics for your system calls must be:

Remember from Project 0 that system calls are defined in the kernel by way of using a SYSCALL_DEFINE macro. So, for instance, the send_msg_421 syscall from the list below would defined as follows:

SYSCALL_DEFINE3(send_msg_421, unsigned long, id, const unsigned char __user *, msg,
                     long, n) {
    /* Code goes here */
}

See the <linux/syscalls.h> file for more information about these macros.

Your system calls must be added to the kernel's system call table in the order we have provided them above. If they are added in a different order to the system call table, you will fail most if not all of our testcases that we run on your code (yes, we will be running our own tests, not just the ones you provide).

You may add additional system calls to the kernel if you find it useful to do so. if you do, they must be added to the system call table after the ones we have required. Also, you can have utility functions in the kernel without making them system calls! System calls must only be made for code that is designed to be called from user-space directly, not from some other portion of the kernel.

Each system call returns an appropriate non-negative integer on success, and a negative integer on failure which is indicative of the error that occurred. See the <errno.h> header file for a list of error codes. Appropriate error codes for several error conditions are listed below (this does not necessarily cover all error cases you might encounter, please detail any further error codes you choose in your README.proj2 file, justifying each one):

The kernel must be very careful with memory access. Remember that users often don't properly error check their code very well and that malicious users do also exist. You should be very careful in your code to ensure that any pointers that come in from user-space (all marked with __user above are checked sufficiently. Also, you should ensure to not leak private information outside the kernel, such as any kernel pointer addresses.

For the print_mbox_421() system call above, we have asked you to print out a hexadecimal dump of the messages in the mailboxes. The reason we have done this is because messages are explicitly described as not being strings, thus printing the messages out as a textual string may well not produce very useful results. To this end, you must print out each byte of each message with the "%02x" format specifier to get a hexadecimal representation of each byte. That means, you will need to do printk() in a loop, printing each element. You should place a space between each byte in the message, and only print a maximum of 16 bytes on one line. For an example of what this might look like for a mailbox with two messages, see below:

00 01 02 03 04 05 06 17 28 09 1a 7b 8c 3d 9e af
77 23 80 44 cd 7f
---
1b ad c0 de ca fe c0 ff ee

User-space driver program(s)

You must adequately test your kernel changes to ensure that they work under all sorts of situations (including in error cases). You must build one or more testing drivers and include them in your sources submitted. Create a new directory in the Linux kernel tree called proj2tests to include your test case program(s). Be sure to include a Makefile to build them and instructions on how to run them in a plain-text README file within this directory. Your README for the test programs should also describe your general strategy for testing the system calls. Remember that testing is one of the primary jobs of a developer in the real world!

It is strongly suggested that you additionally build a separate program for each system call to be implemented to simply call that system call with user-provided arguments. For the data to be sent as a message, you might consider allowing the user to specify a file of data to send or a string on the command line. These programs will likely prove to be invaluable in debugging. If you do implement such programs, place them in a directory called proj2drivers within the root of the kernel source tree, along with a Makefile that can be used to build them.

Submission Instructions

You should follow the same basic set of instructions for submitting Project 2 that you did for Project 0. That is to say, you should do a git status to ensure that any files you modified are detected as such, then do a git add and a git commit to add each modified/newly created file or directory to the local git repository. Then do a git push origin main to push the changes up to your GitHub account. Refer back to project 0 if you have any questions about how this should work.

Be sure to include not only your modified kernel files, but also your testing program files.

You must also include a plain-text README.proj2 file in the root directory of the kernel source code that describes anything you might want us to know when we're grading your assignment. This can include an outline of how you implemented the requirements of the project, for instance. This is also where you should cite any references you have used for the assignment other than those given in this assignment description.

You should also verify that your changes are reflected in the GitHub repository by viewing your repository in your web browser.

To sum up what you are expected to turn in for this assignment:

Do not turn in any compiled versions of code or other build artifacts!

References

Below is a list of references that you may find useful in your quest to complete this project:

If in doubt, the Kernel API and Linux Cross Reference should be your ultimate guides.


Online Research vs. plagiarism

Students are allowed to research operating system concepts online and use the knowledge to write their own code. Any sources must be cited in the readme file.

There are many ways to use different syntax or different logic to achieve the same semantics. Students must create their own code. Any submissions that contain similar code between two or more students, or similar code to an online source will be considered plagiarism.
These cases will be referred to the office of academic integrity.

What to do if you want to lose points on this project

Any of the following will cause a significant point loss on this project.

Please do not make us take off points for any of these things!