Creating a UNIX shell

Due by 11:59 PM on Sunday, Sep 27

Changelog

September 13, 2020: Added clarification for the Makefile
September 11, 2020: Added examples at the end of the page
September 11, 2020: Added clarification for getenv

Introduction

In this assignment, you will be producing a simple *nix shell program. This assignment only requires a few very basic features of a shell, and leaves out much of the functionality that more advanced shells such as Bash include. That is not to say that this assignment will be easy — this is a 400-level course, after all. There are still several parts of this assignment that could trip you up, especially if you are not comfortable with lower-level C programming.

This project is designed as a warm-up with C, including topics that will be extremely important in your later programming projects. This assignment is to be completed entirely in user space: it does not take place within the Linux kernel source code and does not involve recompiling the kernel. You may complete this assignment outside of your VM for the class if you wish. However, the TAs will use a setup like your VM to grade the assignment, so you should compile and run your submission at least once in your VM to ensure it works before submitting it.

You are however not allowed to use the GL system (or any other shared environment) at UMBC to complete this assignment! You can get into real trouble for this!

This assignment must be completed in the C programming language. You can choose to use C89/C90/ANSI C, C99, or C11 (with POSIX extensions) as you see fit; please don't try to torture yourself with pre-ANSI/traditional/K&R C.

What is a shell?

At its core, a shell is a piece of software that will get a string like ls -la from the user, format it, then attempt to execute it. For instance, ls is an actual program located in /bin. When you type ls in your shell, it finds that program and executes it. The shell itself does not execute any logic that says: for each file in directory: print filename or something like that. It just invokes ls to do its own thing. Yes, modern shells have a ton of extra bells and whistles, but you should not worry about them in this assignment. You do not have to create any GUIs, windows, or anything of the sort.

Requirements

For this assignment, you will only need to support a few very basic features of a full-fledged *nix shell. Specifically, you will need to have support for all of the following:

If run with no arguments, the shell shall present the user with a prompt at which they can enter commands. Upon the completion of a command, it should prompt the user for a new command.
If run with any arguments, the shell shall print an error message to stderr explaining that it does not accept any command line arguments and exit with an error return code (1).
The shell shall accept command input of arbitrary length (meaning you cannot set a hard limit on command length).
The shell shall parse command-line arguments from the user's input and pass them along to the program that the user requests to be started. The command to be called will be specified as an absolute path to a program binary (like /bin/ls), as the name of a program that resides in a directory in the user's $PATH environment variable (like gcc), or as a relative path (for instance, if we are in the /usr directory, we could type bin/gcc as a command to run /usr/bin/gcc). In addition, your argument parsing code must properly handle escape sequences and quoting. That is to say that the input /bin/echo Hello\nWorld should be parsed into two pieces: the program name /bin/echo and one argument to that program containing the string Hello World with an actual newline character in place of the space (and no quotes).
The shell shall support reading environment variables with a built-in getenv command. This command shall accept a single command line argument which shall be the name of the environment variable that the user wishes to read. If more than one argument is provided, the command shall print an error message to stderr. Similarly, if no arguments are provided, you should print an error to stderr. If a single argument is provided that names an existing environment variable, the content of that variable shall be printed on a new line on stdout in the shell before returning to the normal prompt. If an argument is provided that does not name an existing environment variable, then a blank line shall be printed to stdout before returning to the normal prompt.
The shell shall support setting environment variables with a built-in setenv command. This command shall accept two command line arguments. The first argument shall be the name of the variable to set and the second shall be what value to set that variable to. The second argument shall properly handle escape sequences and quoting as needed. If a number of arguments other than two is provided to the setenv command, an error shall be printed on stderr. Otherwise, the command shall produce no output and continue with normal command parsing. No escape parsing shall be done on the first argument.
The shell shall support a built-in exit command. This command shall accept zero or one arguments. If provided with zero arguments, the shell shall exit with a normal exit status (that is to say, it shall exit with a status of 0). If provided with one argument, it shall attempt to parse that argument as an integer. If this parsing fails, the command shall be ignored and the shell shall prompt for another command as normal. If the parsing succeeds, the shell shall exit with a status of whatever integer the argument parses as. In either case of the shell exiting, it MUST clean up all memory it has allocated before exiting, along with ensuring that any child processes it has created have exited.
The shell shall not leak memory: everything it allocates must be freed once it is no longer needed. The valgrind program can be your friend while debugging this program (unlike projects that are done in the kernel). We will also be using valgrind to test your implementation.
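One way to meet the arbitrary-length input requirement is to grow a heap buffer while reading one character at a time. The following is only a minimal sketch of that idea, not a required design; read_line is a hypothetical helper name, and the starting capacity of 64 bytes is an arbitrary choice.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Read one line of arbitrary length from `in`.
 * Returns a heap-allocated, NUL-terminated string without the trailing
 * newline, or NULL on end-of-file with no data or on allocation failure.
 * The caller must free() the result.
 * (read_line is a hypothetical helper, not part of the provided code.)
 */
char *read_line(FILE *in)
{
    size_t cap = 64;   /* current buffer capacity; doubles as needed */
    size_t len = 0;    /* number of characters stored so far */
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;

    while ((c = fgetc(in)) != EOF && c != '\n') {
        /* Always keep one byte free for the terminating NUL. */
        if (len + 1 >= cap) {
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);   /* do not leak the old buffer on failure */
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }

    /* EOF before any character was read: nothing to return. */
    if (c == EOF && len == 0) {
        free(buf);
        return NULL;
    }

    buf[len] = '\0';
    return buf;
}
```

A prompt loop could call read_line(stdin) once per command and free the returned string after the command has been handled, which keeps valgrind quiet for this part of the shell.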

You can find examples of multiple commands at the end of this page.
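The integer parsing that the exit requirement describes could be sketched with strtol, as below. parse_exit_status is a hypothetical helper name. Treating trailing characters (as in 12abc) or an out-of-range value as a parse failure is an assumption, since the requirement only says that parsing may fail.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Try to parse `arg` as a base-10 integer exit status.
 * Returns 1 and stores the value in *status on success; returns 0 if the
 * argument has no digits, has trailing characters, or does not fit in an int.
 * (parse_exit_status is a hypothetical helper name.)
 */
int parse_exit_status(const char *arg, int *status)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(arg, &end, 10);

    if (end == arg || *end != '\0')   /* no digits, or junk after them */
        return 0;
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;                     /* assumption: out of range counts as failure */

    *status = (int)value;
    return 1;
}
```

When this returns 0, the shell would simply prompt again. When it returns 1, the shell would free its memory first and only then call exit with the parsed status.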

You are not expected to support any of the following features:

Scripting control features (like if statements or loops)
Use of environment variables in commands (other than getting and setting them as described above)
Support for pipes (including stdin/stdout redirection)
Built-in functionality that is often part of a *nix shell (such as implementations of common utilities like cd), other than what has been outlined above
The ability to change directories or anything else of the like
Running programs in the background or resuming backgrounded programs

To sum up what you are expected to implement in this project:

Present the user with some sort of prompt at which the user may enter a command to execute
Parse out the program the user is attempting to call from its arguments and build an appropriate argument array which can be used to execute the program
Determine if the program specified is a built-in (getenv, setenv, or exit) and handle those functions without creating a new process or attempting to execute another program
If the program specified is not a built-in, your shell must create a new process to execute the new program in, and pass in the correct arguments to one of the exec family of functions to execute the program with the arguments provided. Your shell then must wait for the newly created process to finish executing. Your shell must also handle the case in which a program cannot be executed properly and print out an appropriate error message on the stdout I/O stream.
Once the specified built-in or program has been executed (or failed executing), your shell should prompt the user for another command to run (unless the shell has exited from the exit built-in command)
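The create-a-process, exec, and wait steps in the summary above can be sketched as follows. This is one possible arrangement, not the required one: run_command is a hypothetical helper name, execvp is just one member of the exec family (chosen here because it searches $PATH for names without a slash), and the child exit status of 127 on failure is an assumption borrowed from common shell convention.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] with the given NULL-terminated argument array and wait for it.
 * Returns the child's exit status, or -1 if fork/wait failed or the child
 * was terminated by a signal.
 * (run_command is a hypothetical helper name.)
 */
int run_command(char *const argv[])
{
    int status;
    pid_t pid;

    /* Flush buffered output so the child does not print it a second time. */
    fflush(stdout);

    pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }

    if (pid == 0) {
        /* Child: replace this process image with the requested program. */
        execvp(argv[0], argv);
        /* execvp only returns on failure; the summary asks for stdout. */
        fprintf(stdout, "%s: %s\n", argv[0], strerror(errno));
        fflush(stdout);
        _exit(127);   /* assumption: 127 signals "could not execute" */
    }

    /* Parent: wait for this specific child, retrying if interrupted. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            perror("waitpid");
            return -1;
        }
    }

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In a real shell the child would probably also want to free the shell's own buffers before exiting on an exec failure, so that valgrind does not report them for the child process.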

Dos and Don’ts

Dos

Here is a list of functions that are worth taking a look at. Depending on how you implement your shell, you don’t necessarily have to use all of them:

fgetc
malloc
realloc
free
strtok_r
strchr
isspace
fork
exec (this is a whole family of functions)
fprintf
getenv
setenv

Additionally, you will probably want to use some potentially useful functions that we have provided for this assignment. There are two files: utils.c and utils.h. Particularly useful in this code are the functions unescape (which removes escape sequences and quotes from strings) and first_unquoted_space (which tells you the location of the next space in the string that is not quoted or part of an escape sequence). You are not required to use this code in your shell if you would rather implement this part yourself. If you do use this code in your shell, be sure to add both files to your git repository, just like any other source code you write.

Don’ts

Your shell program is not allowed to use any external libraries other than the system’s C library. Do not try to use libraries like Readline. You will lose points for using external libraries to implement shell functionality! You are not allowed to use any of the following functions in the C library to implement your shell:

system (insecure and can lead to major problems)
scanf (this one is largely to save you trouble)
fscanf (ditto)
popen (there is no reason you should need this, since pipes aren't to be supported)
readline (in case this wasn't obvious from the above ban on external libraries)

You are not allowed to implement any of your shell’s functionality by calling on another shell to do the work. You must do the argument parsing and calling of programs in your own code!

A header file is NOT a library. In order to use an external library, you have to link against it. To link with the threading library, for example, you would have to add -lpthread to your build command. As long as you are not adding an -lsomething to your build or copying code from an external library into your code, you should be OK.

Submission

When submitting your shell program, please be sure to include the source code of the shell program (in one or more C source code files), as well as a Makefile that can be used to build the shell. Your shell must be able to be built and run on a VM as has been set up for this course in your projects. Also, you should include a README file describing your approach to each of the requirements outlined above. Additionally, your program must be compiled to a binary called simple_shell with the Makefile you provide.

If you would like a template for use as a Makefile for your shell, we have provided one here: Makefile.

If you use this Makefile, do NOT add any .h files to the SRCS section.

To submit your project, you must first accept the assignment on GitHub. The link to do so is posted on the course Piazza. Once you have done that, make sure that your project files (any source files and your Makefile) are in their own directory, then run the following commands in that directory (substituting the list of files you need to commit for your_files_go_here and your GitHub username for username, of course):

git init    # You only do this once, not every time.
git remote add origin git@github.com:umbc-cmsc421-fa2020/project1-username.git    # You also do this only once.
git add your_files_go_here
git commit
git push origin master

You only need to submit all the .c and .h files, as well as the Makefile. Yes, that includes the utils files (utils.c and utils.h) we gave you. We do not need any of the compilation artifacts.

Hints

The code that is provided to you is very useful. It is highly suggested that you use it in your shell.

The first_unquoted_space function provided can greatly ease the work of parsing a string into arguments. For instance, on the input /bin/echo "Hello World", the function would return 9, which is the index of the first space. If run on the remainder of the string after that space, it would return -1, telling you that there are no further spaces in the string that are not quoted.

The unescape function allocates memory. If it returns non-NULL, you must free the value that it returns when you no longer need it. Additionally, the second argument to unescape should probably always be stderr.

Yes, you can and will lose points on this assignment for memory leaks. A shell is intended to be a long-running program and thus it is very important not to leak memory. Also, this is meant to provide practice for your later projects in the course, where memory leaks can be very problematic.

Examples

Here are a few commands that you can use and expand on. Please note that these do not test every edge case:

ls
/bin/ls -la
setenv MESSAGE "Hello, world"
getenv MESSAGE
setenv MESSAGE Hello,\ \"Lawrence\",\ How" are you today?"
getenv MESSAGE
getenv PWD
echo \x48\151\x20\157\165\164\040\x74\x68\x65\x72\x65\041
echo Goodbye, \'World\'\a

And these are the expected results of running those in simple_shell. Obviously, there will be some differences when you run them on your machines, such as usernames and directories: