CMSC-421 Operating Systems

Fall 99

Project II

Adapted from Dr. Howard Motteler

Assigned: 15 Nov 99
Due: 13 Dec 99

Update 17 Nov

Memory must be allocated in continuous blocks of 4MB at a time, so if a process needs 19MB, you must allocate 20BM.

When you take something off of one queue and put it on another one, you must be it at the end of the queue.

Purpose

To gain experience with process & memory management and data structures.

Background

A time sharing system, such as UNIX has multiple users running multiple processes concurrently. A simplified process control system can be seen as a state diagram with three states. A process is either running, ready to run, or waiting for an I/O request to be filled. This makes sense. Even though there are multiple jobs, a CPU can only ever be doing one thing at one time, so it can only run one process. Since it can only run one and this is a time sharing system, there must be a number of processes that are ready to run. Since we want to utilize the CPU to the fullest extent we can and it takes time to fulfill an I/O request, we want to put currently running processes that request I/O to sleep until the I/O completes. These states are usually called "running", "ready to run", and "sleeping on I/O". The simple state diagram is shown below.

It is important to note that process scheduling gets done between chunks of "real work". The CPU happily chugs away running a single process for a specific amount of time. This is "real work". The system keeps time with a hardware clock that interrupts the CPU at a fixed, hardware-dependent rate, typically between 50 and 100 times a second. Each occurrence of a clock interrupt is called a clock tick. The system allows a process to execute on the specific number of clock ticks before it gets moved to another state. This number is called the system's timeslice. After each clock tick, the CPU begins to execute operating system code, part of which is job scheduling. In the diagram above each circle contains none, one, or more processes and the processes are transitioned between circles during the operating system code that does job scheduling. The transitions are explained as follows:

Fork (new job): A new job is created by a process calling the fork() system call and is put in the ready to run state. (For this project, this can not occur, unless there is sufficient unallocated memory for the new process to run. If not, the process can not be fork'ed and must sit in the queue until there is sufficient memory!)
Schedule Process: A lucky process has been chosen to execute for the next timeslice.
Preempt: An unlucky process has run for it's entire allocated timeslice, so it is put back into the ready to run state.
Sleep: The currently running process needs to do I/O before it can get any more work done. So the process is put to sleep until the I/O subsystem can retrieve the requested data chunk from an external source. (disk drive, keyboard, etc.)
Wakeup: The data chunk the process was waiting for has arrived; the process can now continue to work, so it is moved into the ready to run state.
Exit: The process has finished its job.

Project

Your project is to implement this simplified process management scheme using 2 different scheduling techniques: prioritized round robin and shortest remaining time. You also need to keep track of various statistical data for each scheduling technique and job: total time each process was in each state, number of jobs run, total elapsed time, total elapsed time per job (throughput), and longest and shortest time taken for any job in the system. You must also keep track of the number of times a process can not be started because of lack of memory.

The system will have the following behavior. You must have a way to keep time intervals (clock ticks) during the execution of the system. A simple approach to this problem might be to use a loop and, for each iteration, increment the clock. Seven steps should be done in the scheduling loop during each cycle, the first six of which map directly into the transitions of the diagram above. Your main loop will do something like the following:

If there are new jobs and there is sufficient memory, then put them in "ready to run" state. Otherwise, make notice of an attempt to start failed due to lack of memory.
If the job currently running has used up its timeslice, then put it back into the "ready to run" state.
If the job currently running has requested I/O, then move it to "sleeping on I/O" state.
If a job was just swapped out, then choose a new job to swap in.
For all jobs waiting for I/O, check if the request has been satisfied.
If a job's I/O has completed, move it to "ready to run" state.
Keep track of statistics.

For this simulation, jobs will do no "real" work; i.e., your simulation will implement what happens when the CPU invokes the operating system process scheduler. You can think of this project as a function call inside the operating system. A function call that updates all job states and sets up the next job to run, so that when the function call completes, the CPU has the next job and can begin to run it.

Since our simulated jobs are doing no "real work", we have a little problem. If they are not actually executing code, they can not request I/O. If they cannot request I/O, we can't transition from running to sleeping on I/O. And since there is no real I/O going on, I/O can not complete, so we cannot transition from sleeping on I/O to ready to run. This is a problem. But we have a solution. When a job is in the running state, you will have a function that randomly decides whether the job has requested I/O. When a job is in the "waiting for I/O" state, you will have a function that randomly decides whether or not I/O has completed for that job. These functions are given below.

#define CHANCE_OF_IO_REQUEST	10
#define CHANCE_0F_IO_COMPLETE	 4

int IO_request( ) 
{

    if ( ( rand( ) % CHANCE_OF_IO_REQUEST ) == 0 )
    {
        return 1;
    }
    else
    {
	  return 0;
    }

int IO_complete( ) 
{
    if ( ( rand( ) % CHANCE_OF_IO_COMPLETE ) == 0 )
    {
        return 1;
    }
    else
    {
        return 0;
    }
}

(Note: In order to use the rand() function, your simulator needs to seed the random number generator. Also, you must use #include <stdlib.h>)

Jobs in this simulation will be simulated using a job data file named process.dat that contains 4 columns. The format is:

Process Id: Arrival time: Service time: Priority:Memory requirement (in MB)

For example the data file:

123:0:10:1:4
124:1:20:0 :8

describes a simulation that has two jobs. The first has a PID of 123, an arrival time of zero (when the simulation starts), needs to do 10 clock ticks of work, and has a priority of 1. The second job has a PID of 124, arrives at clock time 1, needs to do 20 clock ticks of work, and has a priority of 0.

The behavior of all jobs that your simulator will handle is described in the data file attached. Associated with every job is:

a unique process id (PID),
the arrival time for the given job,
a service time ( how long a job has until completion, i.e., how long the jobs "real work" will take),
a priority rating (for this simulation there will be five priorities, numbered 0-4, with 0 being the highest priority), and
A memory allocation of some size.

Please do not change this data file, your grade will be based on specific output generated by the given job attributes.

Implementation Structure

Since jobs can have differing priorities (0, 1, 2, 3, 4 ), it is suggested use five queues, numbered by priority. The jobs with priority 0 have higher precedence than those with priority 1, same with the queues. That means in order to start a job in queue 1, there shouldn't be any jobs currently in Queue 0. Keep in mind that this should be a time-sharing system. This means that after a certain time, no matter if the current job has finished or not, that job should be put back on the line and the job next in line should begin. The timeslice for this project is 3 clock ticks. Also, for each job that the scheduler works on, that job's time remaining should decrease by one. The scheduler should never be "idle" if there is an existing job in line. (There might be a situation when all of the jobs that arrived before a specific time are finished and there are jobs that have not "arrived" yet.) When jobs are completed, update any statistics and get the next one immediately. If a job is swapped back to "ready to run", it should be put on the queue assigned to its priority.

Your project's main function (without checking for available memory) could look something like this:



clock = 0;
seed the random number generator
add new incoming jobs to the ready to run state

while (there are jobs in the system ) 
{
    current_job = choose job to execute
    while (1) /* this loop is the main process management loop */
    {
        Add new incoming jobs to the ready to run state
        while there are jobs waiting for I/O
        {
            /* check if this job's I/O is complete */
            status = IO_complete()
            if status == 1 then
            { 
                put the job whose I/O has completed on the
                    ready to run queue
            }
        }

        if this job has finished its work (if time remaining is 1 ) then
        { 
            mark current_job as swapped out (job complete, exit)
        }
        else if there are now higher priority jobs on the ready to run state then 
        {
            mark current_job as swapped out (preempted)
        }
        /* check for I/O request from current_job */
        else if the job is not complete then
        { 
            status = IO_request()
            if status == 1 then /* need to do I/O */
            {
                mark current_job as swapped out (sleeping on I/O)
            }
            else if this job has been on the CPU for an entire timeslice then 
            {
                mark the job as swapped out (end of timeslice ) 
                and move it to the ready to run state
            }
        }

        do bookkeeping and statistics
        clock++

        if current_job was swapped out then 
        {
            move the job to the appropriate queue and break from this loop
        }
    }
}

Note: The choose job to execute line will be your 2 different scheduling functions. (If the inner loop above is well thought out, it should not have to change when your scheduling algorithm changes.) Also make sure you account for the case that no processes will run. (All jobs are blocked for I/O, or all jobs that have arrived up to this point have finished processing.) In this case, run an idle process (no statistics need to be generated for this pseudo process.)

Implementation Notes

The project needs to be written in C.

The timeslice is 3 clock ticks.

The random seed is 1.

Assume your system has 32MB of memory.

Your program must specify which scheduling algorithm to use on the command line: "-P" is prioritized round robin, and "-L" is least time remaining (shortest job first) without considering priority.

The simulation needs to be deterministic so the TA can grade it without having to check everyone's output by hand. In order for this to happen, the calls to rand() inside IO_request and IO_complete need to be called in the same order in everyone's implementation. The order of the calls is given in the above pseudocode for your main. For each iteration of the inner loop you must first call IO_complete, in a FIFO fashion, for all currently waiting jobs in the I/O queue, then call IO_request for the current job. Your implementation must follow this order. Also everyone's implementation needs to call srand() before the main loop is started. To be deterministic everyone will use "1" as the seed to srand().

The output should be standard and follow the format given below:

     | Total time      | Total time     | Total time | Number
Job# | in ready to run | in sleeping on | in system  | of times
     | state           | I/O state      |            | could not
     |                 |                |            | start process
=====+=================+================+============+=========
pid0 | XXX             | XXX            | XXX        | XXXX
pid1 | XXX             | XXX            | XXX        | XXXX
pid2 | XXX             | XXX            | XXX        | XXXX
pid3 | XXX             | XXX            | XXX        | XXXX
...
...
...
pid0 | XXX             | XXX            | XXX        | XXXX
 
Total simulation run time: XXX
Total number of jobs: XXX
Shortest job completion time: XXX
Longest job completion time: XXX
Average job completion time: XXX

Your program is required to produce verbose output if given the "-v" flag on the command line. The verbose output will be written to stderr every clock tick and consist of the following one line format.

#while the system has jobs

For example, the job file above would produce the following verbose output to stderr:

0:123:9:false:none:preempted
1:124:19:false:none:still running
2:124:18:false:none:still running
3:124:17:false:none:still running
4:124:16:false:true:none:sleeping
5:123:8:false:none:still running
6:123:7:false:0:preempted
7:124:15:true:none:sleeping
...
...

For example, assume the following things happen during clock tick 45: job 123 is running, it does not request I/O, it has 15 clock ticks until it exits, it has not used its entire timeslice, and the I/O completes for jobs 234, 345, and 456. The verbose output for that cycle would be:

45:123:15:false:234,345,456:still running

If no I/O had been completed, it would be

45:123:15:false:none:still running

Your simulator must read the input file from stdin and write its output to stdout. So the program will be run as follows:

% my_simulator < input_file.txt > output_file.txt

It would be faster and easier to test your simulator, if all the inputs can be changed from the command line, i.e.,

% my_simulator -v -ts 3 -s 1 -r 10 -c 4 < input_file.txt > output_file.txt

Where ts is the time slice, s is seed value, r is I/O request chance, c is I/O complete chance, and verbose output is on. But by no means is this required.

process.dat

The following data should be put into a file named process.dat and your simulator will have to read in and parse the data. You may want to simply read in the entire file before you start the simulation. It is possible that you will have one set of data to use for development and another set when running your program for submission, so don't make any assumptions about the values in the data.

100:0:7:3:20
101:1:3:2:6
102:3:7:0:16
103:6:15:4:9
104:8:4:1:28
105:15:2:4:16
106:15:1:3:4
107:21:12:2:8
108:22:6:2:19
109:23:7:2:8
110:24:2:1:15
111:25:3:0:30
112:28:21:3:12
113:29:7:2:4

An additional data set to be used in this file is:

100:2:6:2:10
101:2:1:2:16
102:4:5:4:26
103:6:25:4:9
104:8:20:1:28
105:15:3:15:16
106:15:10:1:24
107:21:12:0:18
108:22:6:2:19
109:23:7:2:8
110:24:2:1:15
111:25:3:0:30
112:28:21:3:12
113:29:7:2:4

Submitting the Project

The project will be submitted as before, using the script command. You will have to run the program four times, once with -L option and once with the -P option for each of the two data sets.

Due Date

The project is due on 14 July. Projects must be submitted before midnight of the due date. (That is, the project must be submitted by 11:59:59 p.m. on 13 December.)

Extra Credit

There is an extra credit bonus of fifteen (15) percent if you are finished by 1 December. There is an extra credit bonus of five (5) percent if your are finished by 8 December.