UMBC CMSC421

Projects

CMSC 421, Section 0101 (Fall 1999)

There will be four projects assigned this semester, each due 2-3 weeks after it is assigned. Due dates are listed on the class news and notes page, on the schedule, and in each individual project description.

General Information on `dlxos`

What is `dlxos`?

The projects for this course require you to build a real operating system and then to experiment with it. Our base is a very simple, but functional, operating system called dlxos. The system was written at UMBC, and is based on the DLX instruction set and computer described by Hennessy & Patterson. Over the course of the semester, your job will be to improve the functionality and performance of dlxos.

As far as possible, the assignments are structured so that you will be able to finish the project even if not all of the pieces are working, but you will get more out of the project if you use your own code. An interesting aspect of building this operating system is that you get to "use what you build": if you do a good job in designing and implementing the early phases of the project, it will simplify your task in building later phases.

DLX cross-compiler & assembler

The first piece of your toolkit is a cross-compiler and assembler that translates programs written in C (C++ doesn't work yet) into a format that the DLX emulator can load; this format is described below. The compiler is a version of gcc, so code that works on "generic" GNU C compilers will work here, with one difference: in dlxos, as in the real world, calls to the C library (libc) and to other libraries don't work. We've provided a few calls for you, but most "standard" libc calls are unavailable. In particular, a simplified version of printf is provided as an emulator trap, but memory allocation (malloc/new/delete) doesn't work.

This lack of memory allocation means that you'll have to preallocate any structures your operating system uses or write (borrow) your own memory allocator. We suggest that you simply preallocate a pool of whatever structures you need - it's the method commonly used in many operating systems.
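The preallocation approach suggested above can be sketched roughly as follows. This is only an illustration: `struct pcb`, `POOL_SIZE`, and the `pool_*` functions are hypothetical names that do not come from dlxos, and the sketch uses no libc calls.

```c
#include <stddef.h>

#define POOL_SIZE 4            /* hypothetical capacity, chosen for illustration */

/* Hypothetical structure; dlxos defines its own structures. */
struct pcb {
    int pid;
    struct pcb *next_free;     /* links unused entries into a free list */
};

static struct pcb pool[POOL_SIZE];   /* storage reserved at compile time */
static struct pcb *free_list = NULL;

/* Thread every entry onto the free list; call once at startup. */
void pool_init(void)
{
    int i;
    free_list = NULL;
    for (i = POOL_SIZE - 1; i >= 0; i--) {
        pool[i].next_free = free_list;
        free_list = &pool[i];
    }
}

/* Take one entry from the pool, or return NULL if none are left. */
struct pcb *pool_alloc(void)
{
    struct pcb *p = free_list;
    if (p != NULL) {
        free_list = p->next_free;
        p->next_free = NULL;
    }
    return p;
}

/* Return an entry to the pool so it can be reused. */
void pool_free(struct pcb *p)
{
    p->next_free = free_list;
    free_list = p;
}
```

Allocation and release are both constant-time, and running out of entries shows up as a NULL return that the caller has to handle.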

The DLX C compiler is located in /afs/umbc.edu/users/e/t/etmiller/pub/dlx/bin/gcc-dlx. It uses files that have been hardcoded to live under /afs/umbc.edu/users/e/t/etmiller/pub/dlx, so it won't necessarily work if you copy it off the gl cluster to your machine at home. However, you can add a soft link to this file (using ln -s) or make an alias to it, and all will be happy.

You use the DLX C compiler much as you would any "normal" C compiler, but with a few small changes. First, you must use the -mtraps option to the compiler to tell it not to expect to find the C library. Second, you shouldn't use the -g flag to produce debugging output. Third, it's a good idea to always use -O3 to produce faster code. A sample compile line looks like this:
gcc-dlx -mtraps -O3 -c synch.c
This will compile the file synch.c into synch.o, which is just a DLX assembly file. You can link together several files using:
gcc-dlx -mtraps -O3 synch.o process.o -o os.dlx
This will combine the code in synch.o and process.o into os.dlx, which is (also) a DLX assembly file. You can then assemble the file into an "object file" that can be loaded by the DLX simulator using:
dlxasm -i _osinit -l os.lst os.dlx
This produces a simulator-loadable file in os.dlx.obj, and a listing file in os.lst. The listing file isn't necessary, but may be useful for debugging. The -i option tells the assembler that the first routine to be executed should be _osinit, rather than the default _main.

The assembler takes other options as well, summarized in this table:

-output outputfile Produce a simulator-loadable file in outputfile rather than the default, which is the input file name with .obj appended (os.dlx becomes os.dlx.obj, as in the example above).

-list listfile Create a listing file in listfile. This listing file may be very useful when you're debugging your code. It's an assembly language listing with addresses and instructions.

-sym symfile Produce a symbol table list into symfile. The table is sorted alphabetically by symbol name. It can be sorted into numerical order using the UNIX sort command.

-init initroutine Tell the assembler to use initroutine as the procedure to call first when running the code. This doesn't change the code itself, but does change the line in the output file that tells the simulator where to start execution.

-debug Turn on debugging for the assembler. The code output isn't changed.

The options may be abbreviated by using just the first letter of the option.

Examples of how to use the compiler and assembler may be found in the Makefile supplied with the first project.

IMPORTANT: The tools (compiler & emulator) only run on machines using the Linux operating system. There are many PCs on campus that can be dual-booted with either Linux or Windows NT; we suggest you use one of them. There's also a Linux server on campus — linux.gl.umbc.edu — but it's likely to be heavily loaded, so we strongly suggest you log into a workstation directly.

DLX object files

DLX object files are actually just assembly language files that have to be assembled by dlxasm. dlxasm produces a file that contains binary data in an easy-to-read text-based format. A piece of a sample file is shown below:

start:000088a8 00013830 00001000 00007c00 00009000 0000a830
00009000:3a207472
:6170732e
:632c7620
:312e3220
:31393939
:2f30332f
:30372032
:313a3238
:3a303520
:656c6d20
:45787020
:656c6d20
:303030
00001000:afbefffc
:001df020
:afbffff8
:2fbd0010
:afa20000

The first line of the file provides six numbers:
<start location> <highest address in file> <text start address> <text length> <data start address> <data length>
All of these numbers are in hexadecimal. The "start location" is the address at which the simulator will start executing this file. The other numbers are self-explanatory — keep in mind that "text" is the same thing as "program code".
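As a rough illustration of the header layout, the sketch below reads the six values from a first line shaped like the one in the sample. It assumes the line begins with the literal prefix `start:`, as the sample does. The struct and function names are made up for this example. It also relies on `sscanf` from a host C library, so it is suited to inspecting an object file on a regular Unix machine rather than to code running inside dlxos, where libc is unavailable.

```c
#include <stdio.h>

/* Field names are illustrative; they mirror the six values described above. */
struct dlx_header {
    unsigned int start;       /* address at which execution begins */
    unsigned int high_addr;   /* highest address in the file */
    unsigned int text_start;  /* where the program code begins */
    unsigned int text_len;    /* length of the program code */
    unsigned int data_start;  /* where the data begins */
    unsigned int data_len;    /* length of the data */
};

/* Parse the first line of an object file. Returns 1 on success, 0 otherwise. */
int parse_header(const char *line, struct dlx_header *h)
{
    int n = sscanf(line, "start:%x %x %x %x %x %x",
                   &h->start, &h->high_addr, &h->text_start,
                   &h->text_len, &h->data_start, &h->data_len);
    return n == 6;
}
```

For the sample file, the data start plus the data length (0x9000 + 0xa830) equals the highest address, 0x13830.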

The following lines in the file (as many as necessary) are in the format
<address>:<data>
If the address is missing, the data in the line immediately follows the data in the preceding line. For example, in the sample above, the word 0x3a207472 is stored at location 0x9000, and the next word, 0x6170732e, is stored four bytes later at 0x9004.

There's sample code to read object files in process.c. You'll need this information for Project #2, but can safely ignore it until then.
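Independently of the code in process.c, the address-carrying rule for data lines can be sketched as below. The function names are hypothetical, and the sketch assumes that each line carries one 4-byte word, so the running address advances by 4 per line. It uses no libc calls, so the same approach would work inside dlxos.

```c
/* Convert one hex digit to its value, or return -1 if c is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Read consecutive hex digits starting at *s and advance *s past them.
 * Returns the number of digits consumed. */
static int parse_hex(const char **s, unsigned int *value)
{
    int digits = 0, d;
    *value = 0;
    while ((d = hex_digit(**s)) >= 0) {
        *value = (*value << 4) | (unsigned int)d;
        (*s)++;
        digits++;
    }
    return digits;
}

/*
 * Parse one "<address>:<data>" line. *addr is the running address: it is
 * replaced when the line names an address, and advanced by 4 once the word
 * has been read. *where receives the address the word belongs at.
 * Returns 1 on success, 0 on a malformed line.
 */
int parse_data_line(const char *line, unsigned int *addr,
                    unsigned int *where, unsigned int *word)
{
    unsigned int a;
    if (parse_hex(&line, &a) > 0)
        *addr = a;                 /* explicit address on this line */
    if (*line != ':')
        return 0;
    line++;
    if (parse_hex(&line, word) == 0)
        return 0;
    *where = *addr;
    *addr += 4;                    /* assumption: one 4-byte word per line */
    return 1;
}
```

Feeding it the first two data lines of the sample places 0x3a207472 at 0x9000 and 0x6170732e at 0x9004, matching the description above.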

DLX emulator

The second part of your toolkit is a software emulator for the DLX instruction set. This emulator completely defines the computer that your operating system will run on. This includes both the instruction set and the actions of hardware: page faults, interrupts, and physical devices such as a disk and the console. You aren't allowed to modify the emulator to do things that might make your life as an OS programmer easier, though you can fix bugs and borrow code from it if you like.

The emulator takes the following options:

-x execfile Load execfile into the simulator's memory, and execute it (jump to the start address) when the simulator runs. If this option is specified more than once, only the last file specified actually is run (though the rest are loaded).

-l loadfile Load loadfile into the simulator's memory. This file will not be executed, though code in it may be called from the file specified with -x.

-s startaddr Specify startaddr as the starting address for the simulator to execute at. This option overrides the address specified by a file loaded with -x. The starting address is that provided by the last -s or -x option on the command line.

-m memorysize Set the memory size of the simulated DLX processor to memorysize. The default memory size is 4 MB.

-k stackaddr Set the initial stack address of the simulated DLX processor to stackaddr. The default stack address is the top of memory (normally 4 MB, but the stack address will adjust automatically if the -m option is used).

-t instrexectime Make instrexectime the simulated time (in microseconds) taken to execute a single DLX instruction. This only has an effect on operations that involve simulated time, such as timer interrupts and the length of time that disk I/Os require. It's not necessary to make this number match the actual time it takes to simulate one DLX instruction. Default time is 1 microsecond per instruction.

-a arg1 arg2 ... Pass the remaining arguments on the command line to the program being simulated by the DLX simulator. The arguments aren't parsed in any way - they're passed the same way that UNIX passes arguments to user programs. In other words, argc and argv for your operating system will be set correctly.

-I Turn on instruction tracing. Print a list of the addresses at which instructions are executed. The list is in the form addr:numinstrs, which means that the simulator executed numinstrs consecutive instructions starting at address addr.

-M Turn on memory access tracing. Print the operation type (instruction), address and value moved for each load and store executed by the simulator.

-F tracefile Make tracefile the file to which instruction and memory traces are written. The default is standard output, which may also be specified by "-".

-D debugstr Turn on debugging in the simulator for the options listed in debugstr. This is only available if the simulator is compiled with debugging code enabled (and the default version you'll be using isn't). Normally, the simulator should be run without debugging code because the debugging code slows things down even if it's not turned on (all those debugging checks).

The options to list memory accesses and instruction traces can be useful for debugging. Feel free to use printf() for debugging as well.

Using the DLX tools off-campus

The emulator and compiler/assembler should be capable of running on any Unix-like workstation. We've tested them under Linux (both on-campus and off-campus), but can't make guarantees for any other operating system. The dlxos code, however, runs only in the emulator, and will thus run anywhere the emulator can run.

It's important to realize that while you run dlxos on top of this emulation as a user program on UNIX, all of the code you write is exactly the same as if dlxos were running on bare hardware. The emulator runs as a user program for convenience: multiple students can run dlxos at the same time on the same physical machine. Similar reasoning applies in industry, where it's usually a good idea to test out system code in a simulated environment before running it on potentially flaky hardware.

In real life, you are not allowed to throw out a running machine and ask for a CPU with different features before your code will work. Thus, you are not permitted to change any of the CPU emulation code, although you are permitted to change any of the dlxos code that runs on top of the emulation. dlxos is coded in C; if you know C++, you should have little trouble with the language. If you don't know C, you're in the wrong course....

Finding the tools

There are three pieces of code you need: the operating system code, the DLX emulator, and the DLX compiler/assembler. If you want to do your project work on gl.umbc.edu (or associated workstations), you only need the code for the operating system itself. The DLX emulator, compiler, and assembler are available in the directory /afs/umbc.edu/users/e/t/etmiller/pub/dlx/bin on all UCS machines that can mount AFS. You need not copy files from this directory; instead, make links to the executables (gcc-dlx, dlxasm, and dlxsim). Of course, your operating system development should be done in your home directory.

If you want to run the system off-campus on your own Linux box, you'll need to build your own version of the compiler and emulator (the assembler is written in perl, so you can just copy it from the AFS directory to your own machine). You'll need the following packages from /afs/umbc.edu/users/e/t/etmiller/pub/dlx/bin:
dlxmd.tgz gcc-2.7.2.3.tgz dlxsim.tgz

Building the simulator is relatively straightforward: unpack the compressed tar file, enter the directory, and type make. The compiler is a little more difficult, so follow these steps:

Unpack gcc-2.7.2.3.
Unpack dlxmd.tgz
Move the dlx directory (created by unpacking dlxmd.tgz) into gcc-2.7.2.3/config.
Change directories into gcc-2.7.2.3.
Patch configure by running "patch < config/dlx/configure.diff"
Patch config.sub by running "patch < config/dlx/config.sub.diff"
Change directories to config/dlx.
Modify the Makefile in this directory by setting the PREFIX variable to be the directory that you want to install gcc-dlx into.
Build the compiler by typing make.
Install the compiler by typing make install.

dlxos Details

On any machine with AFS directories mounted, you can get a copy of dlxos from /afs/umbc.edu/users/e/t/etmiller/pub/dlx/dlxos.tgz. Copy this file to your home directory and unpack it using the command "gtar xzf dlxos.tgz". The resulting directory contains a Makefile and the C and assembly files needed to build the operating system. There may be upgrades to the OS during the semester, which may be picked up from this location.

Debugging `dlxos`

The dlxos code is documented quite well, and part of your job over the semester is to figure out exactly how the pieces fit together. To help you, we've included an easy way to print debugging statements. Recall that the simulator can pass arguments to programs run in it just like argc and argv work in regular Unix. This can be used to implement debugging statements by passing the -D debugflags argument to the operating system using the -a option to the simulator. For example, you could turn on all dbprintf statements whose first argument is 'p' with the simulator line:
dlxsim -x os.dlx.obj -a -D p
This would tell the simulator to execute os.dlx.obj and to pass the arguments -D p to os.dlx.obj. os.dlx.obj is then free to interpret those arguments as it desires; the current OS code treats the arguments as a way to specify which debugging statements to print out. Feel free to add additional debugging flags. Also, a '+' in the debugging string means that all debugging statements are enabled. Otherwise, just those statements mentioned in the string are turned on. For example, -D abc means that only debugging statements that specify a, b, or c will be printed; the rest won't be.
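The flag-matching rule described above can be sketched as follows. This is not the dlxos implementation of dbprintf; `debug_enabled` and `find_debug_flags` are hypothetical helpers written only to illustrate the '+' and per-letter behavior, and they avoid libc so they could be adapted for use inside the OS.

```c
/*
 * Return 1 if debugging statements tagged with `flag` should print, given
 * the string that followed -D. A '+' anywhere in the string enables all.
 */
int debug_enabled(const char *debugstr, char flag)
{
    const char *p;
    if (debugstr == 0)
        return 0;                  /* no -D argument: nothing is printed */
    for (p = debugstr; *p != '\0'; p++) {
        if (*p == '+' || *p == flag)
            return 1;
    }
    return 0;
}

/*
 * Scan argv for "-D <flags>" and return the flags string, or 0 if absent.
 * Scanning starts at index 0 so it works whether or not argv[0] holds a
 * program name.
 */
const char *find_debug_flags(int argc, char *argv[])
{
    int i;
    for (i = 0; i + 1 < argc; i++) {
        if (argv[i][0] == '-' && argv[i][1] == 'D' && argv[i][2] == '\0')
            return argv[i + 1];
    }
    return 0;
}
```

With the arguments -D abc, a statement tagged 'a' would print and one tagged 'p' would not; with -D +, both would.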

Grading the Projects

The intent of the grading for the project is not to differentiate among those students who do a careful design and implementation of the assignments. Rather, the grading helps us identify those students who (i) don't do the assignments or (ii) don't think carefully about the design, and therefore end up with a messy and over-complicated solution. Remember that you can't pass this course without at least making a serious attempt at each of the assignments. Further, the grading is skewed so that you will get substantial credit, even if your implementation doesn't completely work, provided your design is logical and easy to understand. This means that you should first strive to come up with a clean design of your project on paper. Second, don't try to add fancy features just because some other group is!

The grading for the project will be as follows: 40% design, 60% implementation. We have structured the grading in this way to encourage you to think through your solution before you start coding. If all you do is to work out a detailed design for what you would do to address the assignment (and if the design would work!), but you write no code, you will still get almost half of the credit for the assignment. The implementation portion of the grade considers whether you implemented your design, ran reasonable test cases, and provided documentation that the TA could understand. Part of being a good computer scientist is coming up with simple designs and easy to understand code; a solution which works isn't necessarily the best that you can do. Thus, part of the design and implementation grade will be based on whether your solution is elegant, simple, and easy to understand.

Additional Resources

Tutorial on programming with threads from DEC SRC

Last updated 3 Dec 1999 by Ethan Miller (elm@csee.umbc.edu)
