CS611 Details of homework assignments

    The most important item on all homework is YOUR NAME!
    No name, no credit. ALSO, put last 4 digits of SS#.
    Staple or clip pages together.

Homework must be submitted when due. You lose 10% (one grade) the first day homework is late, then 10% each week thereafter, up to a maximum of 50% off. A zero really hurts your average! Paper or EMail to squire@cs.umbc.edu is acceptable. If I cannot read or understand your homework, you do not get credit. Type or print if your handwriting is bad. Homework is always due on a scheduled class day, within 15 minutes after the start of the class. If class is canceled, homework is due the next time the class meets.

  EMail only plain text! No word processor formats.
       You may use a word processor or other software tools and
       print the results and turn in paper.
       Put CS611 and HW number in subject line.

Some homework must be "submitted"

 The "submit" facility only works on the "irix.gl.umbc.edu"
 and "linux.gl.umbc.edu" machines.

 The student commands are:
    submit   cs611 HW6 file   puts your "file" into cs611 HW6
    submitrm cs611 HW6 file   removes your "file" from cs611 HW6
    submitls cs611 HW6        lists your files in cs611 HW6

    Note: "HW" is upper case
       a) you must have your userid registered for "submit"
          send mail from a gl machine to squire if your submit fails
       b) you have to be logged onto a gl machine, SSH or telnet are OK
       c) everything is case sensitive, remember the uppercase HW.

Do your own homework!

You can discuss homework with other class members but DO NOT COPY!

Contents

  • Homework 1
  • Homework 2
  • Homework 3
  • Homework 4
  • Homework 5
  • Homework 6
  • Midterm Exam
  • Homework 7
  • Homework 8
  • Homework 9
  • Final Exam
  • Other Links

    HW1 Amdahl's Law 25 points

      You must show your work, not just the answer.
      Book Page 60, Exercise 1.2
      Book Page 62, Exercise 1.7
    

    HW2 CPI 25 points

      You must show your work, not just the answer.
      Book Page 121, Exercise 2.11
      Book Page 122, Exercise 2.12 
    

    HW3 Pipelines 25 points

      You must show your work, not just the answer.
      Book Page 214, Exercise 3.1
      Book Page 219, Exercise 3.12 
    
    

    HW4 SuperScalar DLX 25 points

      1. Does a DLX sequence of instructions exist that must
         stall in a "scoreboard" machine, Figure 4.3, Page 244,
         yet the same sequence will not stall in a Tomasulo machine,
         Figure 4.8, Page 253? (yes or no)
    
      2. Does a DLX sequence of instructions exist that must
         stall in a Tomasulo machine, Figure 4.8, Page 253,
         yet the same sequence will not stall in a "scoreboard"
         machine, Figure 4.3, Page 244? (yes or no)
    
      3. Using the paper "Combining Branch Predictors" by Scott
         McFarling, use the instruction trace given below and
         update a Bimodal Predictor, Figure 1, for the case:
         two PC bits, four entries in the count vector, two
         bit counts as described in the paper.
         Initialize all counts to 10 base 2 (different from paper)
    
         a) show the four count values at the end of the trace.
         b) keep a count of correct predictions as you update
            the counts in order to give the percent predicted
            correctly.
    
         The following trace, as decimal integers, represents the
         sequence of PC values (low order bits) of conditional branches.
         The letter T following a PC value indicates the branch was
         taken; a PC value without a T indicates it was not taken.
         1T, 1T, 1T, 1, 3T, 2, 1T, 1T, 1T, 1, 3, 2, 1T, 1T, 1T, 1,
         0T, 0T, 0T, 0, 3T, 2, 1T, 1T, 1T, 1, 2, 3
    
         (Sample output with dummy answers should look like:
               0  00    <-- initially  10  in all cases
               1  11
               2  01
               3  11
                  50%)
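
         The update rule in part 3 can be sketched in Python. This is a
         minimal sketch, not the required hand-worked answer. It assumes
         the usual two-bit saturating counter: predict taken when the
         count is 10 or 11, add one on a taken branch, subtract one on a
         not-taken branch, and clamp to the range 00..11. The function
         name simulate_bimodal and the short trace are hypothetical; the
         trace is deliberately not the homework trace.

```python
def simulate_bimodal(trace, entries=4, initial=2):
    """Run a bimodal predictor over (pc, taken) pairs.

    Returns the final counter values and the number of correct predictions.
    """
    counts = [initial] * entries          # every counter starts at 10 base 2
    correct = 0
    for pc, taken in trace:
        i = pc % entries                  # two low-order PC bits select a counter
        predicted_taken = counts[i] >= 2  # 10 or 11 means "predict taken"
        if predicted_taken == taken:
            correct += 1
        if taken:
            counts[i] = min(counts[i] + 1, 3)   # saturate at 11
        else:
            counts[i] = max(counts[i] - 1, 0)   # saturate at 00
    return counts, correct


# Hypothetical trace, NOT the homework trace: (PC low bits, branch taken?)
example_trace = [(0, True), (0, True), (1, False),
                 (1, False), (0, False), (1, True)]

counts, correct = simulate_bimodal(example_trace)
for i, c in enumerate(counts):
    print(i, format(c, "02b"))
print(f"{100 * correct / len(example_trace):.0f}%")
```

         The output has the same layout as the sample above: one line
         per counter, then the percent predicted correctly.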
    

    HW5 Cache 25 points

     
    Draw the diagram and compute values for the cache system
    described below. The diagram can be drawn freehand yet
    needs to be neat enough to be read. Use a level of detail similar
    to the handout given in class on caches. Show at least the
    tag comparators and "and" gate with the valid bit. Show all
    four rows of the L1 cache, and about 8 rows of the 65,536 rows
    of the L2 cache. Use this diagram to hand simulate the caches'
    actions when running the address sequence below.
    
      L1 instruction cache for a DLX machine,
      2-way associative, block size is 4 words (16 bytes),
      index field in PC is two bits (i.e. 4 blocks long)
      LRU (Least recently used) replacement policy.
    
      Thus PC bits are  +--------------+-------+---------+----------+
                        |  tag         | index | word    | byte     |
                        |              |       | select  | select   |
                        +--------------+-------+---------+----------+
        bit number       31          6  5    4  3       2 1        0
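
      The field layout above can be expressed as shifts and masks. The
      following is a small Python sketch; the function name split_pc and
      the example address 0x00000074 are hypothetical and are not part
      of the assignment.

```python
def split_pc(pc):
    """Split a 32-bit PC into the L1 cache fields shown in the diagram."""
    byte_select = pc & 0x3          # bits 1..0
    word_select = (pc >> 2) & 0x3   # bits 3..2
    index = (pc >> 4) & 0x3         # bits 5..4
    tag = pc >> 6                   # bits 31..6
    return tag, index, word_select, byte_select


# Hypothetical address, not one from the homework sequence.
tag, index, word_select, byte_select = split_pc(0x00000074)
print(f"tag={tag:#x} index={index} word={word_select} byte={byte_select}")
```

      For the L2 cache the same idea applies with a 16 bit index field
      in place of the 2 bit one.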
    
      Timing: the instruction is delivered in 1 ns for a hit,
      a miss requires that a block be filled from the L2 cache.
      (The 1 ns is still used by the L1 cache, even on a miss.)
    
      L2 general cache, direct mapped, block size is 4 words (16 bytes)
      index field is 16 bits (i.e. 65,536 blocks long)
    
      Timing: four words are delivered to L1 in 8 ns for a hit,
      a miss requires that a block be filled from RAM.
      Assume the 8 ns includes time to get the address,
      put the four words on the bus into the L1 cache and
      raise the "L2 hit" signal. (The data from RAM flows
      through the L2 cache on the way to the L1 cache,
      thus the 8 ns is used by the L2 cache, even on a miss.) 
    
    
      RAM, 128 bit bus, (16 bytes) (4 words) delivered to L2 in 20 ns
      Assume the 20 ns includes time to get the address, fetch
      the data, put the data on the bus and raise the "data_ready"
      signal for the L2 cache.
    
      All "valid" bits are initially zero.
      From the above, the first instruction takes 1 + 8 + 20 = 29 ns.
      Other facts: The memory to L2 cache bus is 128 bits wide.
                   The L2 to L1 bus is 128 bits wide, thus no word
                   select multiplexer on the L2 cache.
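
      The timing rules above add up per access as follows. This is a
      minimal Python sketch; the function name access_time_ns and the
      list of hit/miss outcomes are hypothetical, and the outcomes are
      not the answer to the address sequence below.

```python
L1_NS = 1    # always spent in L1, hit or miss
L2_NS = 8    # spent in L2 whenever L1 misses, hit or miss
RAM_NS = 20  # spent in RAM whenever L2 also misses


def access_time_ns(l1_hit, l2_hit=False):
    """Time to deliver one instruction; l2_hit is ignored on an L1 hit."""
    time_ns = L1_NS
    if not l1_hit:
        time_ns += L2_NS
        if not l2_hit:
            time_ns += RAM_NS
    return time_ns


# Hypothetical outcomes: (L1 hit?, L2 hit?) for three accesses.
outcomes = [(False, False), (True, False), (False, True)]
total_ns = sum(access_time_ns(l1, l2) for l1, l2 in outcomes)
print(total_ns, "ns total,", total_ns / len(outcomes), "ns average")
```

      The first entry reproduces the 1 + 8 + 20 = 29 ns case stated
      above for an access that misses in both caches.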
    
      Given the sequence of PC addresses below,
      1) What is the total time to deliver all instructions? (ns)
      2) What is the average time to deliver an instruction? (ns)
      3) What is the L1 cache miss rate? (fraction)
      4) What is the L2 cache miss rate, looking at only the L2 cache?
         (fraction)
      5) Assume one clock per nanosecond (ns). What is the average CPI?
         (xx.xx clocks per instruction)
    
      6) Show hit or miss on each cache for each PC.
         (Do this first, of course!)
    
                        L1   L2 
      PC:  00000000                <--- show H for hit, M for miss
           00000010                     blank for unused.
           00000020
           00000030                     for each address for L1 and L2
           00000040
           00000004
           00000008                         * change, was 6
           00000080
           00000044
           00000008
           440001F4
           110003B8                         * change, 8 was 6
           00000038
    

    HW6 VHDL 25 points

     
    Write the VHDL code to perform an IEEE 754 Floating point add.
    You are given two 32-bit floating point numbers that are to
    be added to produce a third floating point number.
    
    Simplifications you may use include:
      Input numbers are normalized.
      No overflow or underflow or denormalization will occur.
      No rounding is necessary (either use truncation or round toward zero.)
      Use VHDL add, subtract, shift and other operators as needed.
      You do not have to go to the gate level.
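
      The steps of the add under these simplifications can be sketched
      in Python before writing the VHDL. This is only an illustration of
      the algorithm, not the VHDL to be turned in. It assumes both
      inputs are normalized and nonzero, and it truncates the bits
      shifted out during alignment. The names float_to_bits,
      bits_to_float, unpack_fields and fp_add are hypothetical.

```python
import struct


def float_to_bits(x):
    """32-bit IEEE 754 pattern of x, as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_to_float(bits):
    """Float value of a 32-bit IEEE 754 pattern."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def unpack_fields(word):
    """Return sign, biased exponent, and 24-bit mantissa with hidden 1."""
    return word >> 31, (word >> 23) & 0xFF, (word & 0x7FFFFF) | 0x800000


def fp_add(a_bits, b_bits):
    """Add two normalized 32-bit floats given as bit patterns (truncating)."""
    sa, ea, ma = unpack_fields(a_bits)
    sb, eb, mb = unpack_fields(b_bits)
    # Make "a" the operand with the larger magnitude.
    if (eb, mb) > (ea, ma):
        sa, ea, ma, sb, eb, mb = sb, eb, mb, sa, ea, ma
    mb >>= ea - eb                    # align the smaller operand (truncates)
    m = ma + mb if sa == sb else ma - mb
    if m == 0:
        return 0                      # exact cancellation gives +0.0
    e = ea
    if m & 0x1000000:                 # carry out of bit 23: shift right once
        m >>= 1
        e += 1
    while not (m & 0x800000):         # renormalize after a subtraction
        m <<= 1
        e -= 1
    return (sa << 31) | (e << 23) | (m & 0x7FFFFF)


result = bits_to_float(fp_add(float_to_bits(1.5), float_to_bits(2.25)))
print(result)
```

      The same unpack, align, add or subtract, normalize, repack order
      is one reasonable way to structure the architecture body.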
    
      Use fp_add_test.vhdl as a start.
      Fill in the  architecture behavior of fp_add  to do the
      IEEE floating point add.
    
      Choose some reasonable test data for "a" and "b" in the test bench.
    
      The handout in class shows the commands needed on sunserver1.cs.umbc.edu
    
      A previous handout shows commands for linux.gl.umbc.edu but you
      will have to delete a few words to make VHDL-87 rather than VHDL-93 
    
      Look at VHDL help for more information.
    
      Compile and run. When the result is reasonably correct, submit it
      from a gl.umbc.edu machine:
    
      submit cs611 HW6 fp_add_test.vhdl
    
    
    

    Midterm exam. 15% of course grade

      Closed book. Short answer, Numeric problems and some Multiple choice.
      Numerical problems will be on CPI, Amdahl's Law, Pipelining,
      Branch Prediction, Cache and IEEE Floating Point.
    
      Exam covers book:     1.5, 1.6,
                            2.3, 2.8,
                            3.1-3.5, 3.7, 3.9,
                            4.2, 4.4
                            5.1-5.5
                  lectures: 1 through 14 excluding Introduction and VHDL
                  homework: 1 through 5 
                  papers:   McFarling "Combining Branch predictors"
    

    HW7 I/O Timing 25 points

     
    Based on the textbook and lectures, answer the following:
    Show your work.
    
    Q1. Given a PCI bus running at 66MHz, 64 bits wide,
        what is the maximum bandwidth in MHz?
        (I could have asked for Mb/sec, same number, yet MB/sec is wrong!)
    
    Q2. Given an Ultra SCSI 160MB/s controller and a disk drive that
        spins at 10,000 rpm and has an average seek of 6ms, 160MB/s
        transfer rate, defragmented.
    
        How long does it take, in seconds, to transfer 3.2MB where
        each disk transfer is a 32KB block?
    
    a)  assuming 1/3 average seek time for first block and average
        rotational delay for all blocks.
    
    b)  like a) but assuming a 4MB internal disk buffer so that only
        the first block has a rotational delay penalty.
    
        All numbers in decimal, K=1,000, M=1,000,000
        Defragmented disks will typically only pay the seek penalty
        on the first block, the internal disk buffer can prevent
        any rotational delay penalty assuming there is room for
        read-ahead. Assume ideal timing.
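
        The timing model described above can be sketched in Python.
        This is a sketch under stated assumptions: the seek charge is
        paid once, the average rotational delay is half a rotation, and
        the transfer itself runs at the full rate. The function name
        transfer_time_s and all the numbers in the example are
        hypothetical; they are deliberately not the Q2 values.

```python
import math


def transfer_time_s(total_bytes, block_bytes, rpm, seek_s,
                    rate_bytes_per_s, rotation_every_block):
    """Seconds to read total_bytes in blocks from a defragmented disk."""
    blocks = math.ceil(total_bytes / block_bytes)
    avg_rotational_delay_s = 0.5 * 60.0 / rpm      # half a rotation
    # Without a buffer every block waits for rotation; with read-ahead
    # into a buffer only the first block does.
    delays = blocks if rotation_every_block else 1
    return (seek_s
            + delays * avg_rotational_delay_s
            + total_bytes / rate_bytes_per_s)


# Hypothetical drive: 7200 rpm, 3 ms seek charge, 50 MB/s,
# reading 1 MB in 10 KB blocks (K=1,000, M=1,000,000).
unbuffered = transfer_time_s(1_000_000, 10_000, 7200, 0.003, 50e6, True)
buffered = transfer_time_s(1_000_000, 10_000, 7200, 0.003, 50e6, False)
print(f"{unbuffered:.4f} s without buffer, {buffered:.4f} s with buffer")
```

        Comparing the two results shows how much of the total time is
        rotational delay when every block pays it.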
    
    Q3. How many raw bytes must be stored to have one hour of
        music played at 44.1 kHz with 16 bits coming from each of
        two channels?
    
    Q4. What bandwidth in MHz (one bit per clock) is needed to continuously
        read 4.7GB of digital data from a 12X DVD in 10 minutes?
        (ignore seek and rotational delay, G=1,000,000,000)
    
    

    HW8 Little's Law 25 points

     
    Given a queue/server model M/M/4
    Given an average-arrival-rate  20 tasks per second
    
    Q1.   Given a single-server-utilization of 80%
      a)  What is the average-single-server-rate in tasks per second ?
      b)  What is the average-time-in-queue for a task ?
      c)  What is the average-time-in-system for a task ?
      d)  What is the average-tasks-in-queue ?
    
    Q2.  What is the maximum-tasks-in-queue ?
         Possible short answers:
           average-tasks-in-queue / single-server-utilization
           about ten times the average-tasks-in-queue
           unbounded
    
    Q3.  Given that we want the average-tasks-in-queue to be 10 tasks,
         (Still M/M/4 and average-arrival-rate of 20 tasks per second)
         What single-server-utilization is needed?
         (Answer as a percentage within 2% gets full credit.)
    
    Comments: None should be needed and these may be redundant,
              yet, in order to prevent long lines asking questions:
              M/M/4 technically stands for
              First M means memory-less random distribution of arrivals
              Second M means memory-less random distribution of service times
              4 means the single queue is feeding four servers
    
              For a server problem it is reasonable to assume an
              exponential probability distribution for each M.
              It is reasonable to assume the four servers equally
              divide the workload of one server that is four times as fast.
              It is reasonable to assume all servers have the same utilization.
    
              The equations in the textbook on pages 509 and 510 apply.
              The equations given in class represent the same equations
              that are in the textbook.
    
              As always, do not plug numbers into randomly selected equations.
              Try to understand what equations apply to the problem
              and check if your answers are intuitively reasonable.
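
              As a check on intuition, the relations that do not depend
              on the textbook's queueing-time formula can be sketched in
              Python. This sketch assumes equal utilization on all
              servers and uses Little's Law (tasks = arrival rate x
              time). The function names and all numbers are
              hypothetical, the time in queue is simply assumed rather
              than derived, and none of the values are the HW8 values.

```python
def server_utilization(arrival_rate, servers, service_rate_per_server):
    """Fraction of time each server is busy, assuming an even split."""
    return arrival_rate / (servers * service_rate_per_server)


def tasks_in_queue(arrival_rate, time_in_queue_s):
    """Little's Law applied to the queue alone."""
    return arrival_rate * time_in_queue_s


def time_in_system_s(time_in_queue_s, service_rate_per_server):
    """Waiting time plus one service time."""
    return time_in_queue_s + 1.0 / service_rate_per_server


# Hypothetical M/M/3 system: 30 tasks/s arriving, 15 tasks/s per server,
# and an ASSUMED average time in queue of 0.05 s.
utilization = server_utilization(30.0, 3, 15.0)
queued = tasks_in_queue(30.0, 0.05)
in_system = time_in_system_s(0.05, 15.0)
print(f"utilization={utilization:.3f} queued={queued:.2f} "
      f"time in system={in_system:.4f} s")
```

              The average time in queue itself still has to come from
              the equations in the textbook or from class.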
    
    

    HW9 25 points

      
    
      Not assigned in Fall 2000
    
    

    Final Exam 50 points

     
      Comprehensive, about 1/3 pre-midterm, 2/3 post-midterm
      True/False, multiple choice, short answer
      In the range of 25 to 50 questions.
    
      The exam covers:
        Lectures 3 - 13, 16 - 28
        Homework 1 - 8
        McFarling Paper through bimodal
        IEEE 754 paper, floating point add, sub, mul, div
        Textbook  1.5
                  3.2 - 3.7
                  4.1 , 4.2 , 4.4 , 4.8
                  5.2 - 5.7
                  6.2 - 6.5
                  7.2 , 7.5
                  8.2 - 8.6
                  A.3 - A.5
                  B.3
                  E.1
    
    
    

    Other Links


     Last updated 12/12/00