UMBC CMSC 313 -- Writing Subprograms


Writing Subprograms

Higher-level languages use subprograms all the time. Some languages have two types of subprograms, procedures (or subroutines) and functions, while others, like C, have only functions. Normally, the difference between the two is that a function returns a value (only one!) and a procedure does not. Assembly language has only "subprograms," and a subprogram typically returns its value in the AX register. In C, we invoke a function by writing its name followed by parentheses. In assembly language, we use the instruction:

  call SubProg

In C, the end of a function is marked by the optional return statement. In assembly, the ret instruction is required.

Rules

To understand how subprograms actually work, we first need to understand the stack in detail.

The Stack

The hardware stack and the stack pointer register, sp, are necessary to get subprograms to work properly. Early CPUs did not have them and, as a result, were more limited than what we have today. For instance, recursion was not easily possible.

A stack is a Last-In, First-Out (LIFO) data structure: data can be added or removed at only one end. There are special instructions built into the CPU to work with the stack. The sp register points to the newest 16-bit value on the stack (or 32-bit double word when using the extended registers in the later models), which is the next item to be removed. The stack starts at the highest address and grows downward as items are put on it.

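The instructions are push, which puts its operand on the stack, and pop, which removes the newest item from the stack and stores it in its operand.

The operand can be a memory location (word or double word only, as appropriate), a register, or, for push only, a constant. If we have set up the data segment as: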

X DW 1111
Y DW 2222
Z DW ?
We can have a stack that looks like:
?  
?  
?  
In this case, the SP register really points to what is above the first location. If we execute:
  push X
We now have a stack that looks like:
1111 <- SP
?  
?  
If we then execute:
  push Y
We now have a stack that looks like:
1111  
2222 <- SP
?  
If we finally execute:
  pop Z
We now have a stack that looks like:
1111 <- SP
2222  
?  
The location for the variable Z is set to 2222 and the SP register points to the previous entry. Note that the stack location holding 2222 is now considered unused and will be overwritten by the next push instruction. (You cannot count on items popped from the stack remaining in unused stack memory, because the operating system also uses the stack.)

If you do push X followed by pop X, nothing is changed: both X and SP end up with the values they started with.
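The push and pop sequence above can be traced with a small C model. This is only a sketch under assumptions: the stack is a three-word array, sp is an array index rather than a real address (index 3 stands for the position above the first location), and the names stack_mem, push16, pop16, and trace_demo are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 3            /* assumed size, matching the three-slot diagram */

uint16_t stack_mem[STACK_WORDS]; /* higher index = higher address */
int sp = STACK_WORDS;            /* empty stack: SP is "above" the first location */

/* push: move SP down one word, then store the value there */
void push16(uint16_t value)
{
    assert(sp > 0);              /* model only: refuse to run off the array */
    sp--;
    stack_mem[sp] = value;
}

/* pop: read the word SP points to, then move SP back up */
uint16_t pop16(void)
{
    assert(sp < STACK_WORDS);    /* model only: refuse to pop an empty stack */
    return stack_mem[sp++];
}

/* Replays push X, push Y, pop Z and returns the final value of Z. */
uint16_t trace_demo(void)
{
    uint16_t X = 1111, Y = 2222, Z;

    sp = STACK_WORDS;            /* start from an empty stack */
    push16(X);                   /* SP -> 1111 */
    push16(Y);                   /* SP -> 2222 */
    Z = pop16();                 /* Z = 2222, SP -> 1111 again */
    return Z;
}
```

In this model, the popped 2222 is still sitting in stack_mem[1] after trace_demo returns, but the next push16 would overwrite it, which is the behavior described above.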

Saving and restoring registers is particularly important when using subprograms. You are responsible for saving important data in the registers before you call a subprogram and for restoring those registers afterwards.

Well-behaved subprograms should save and restore any registers that they use, unless they are returning values in certain registers! However, not all subprograms written by others are well-behaved. This means you have to write the instructions to do it yourself in order to make sure the program works correctly.
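The save-and-restore discipline can be sketched in C as well. Everything here is an assumption made for illustration: the registers are modeled as global variables ax, bx, and cx, the stack is a small array, and triple_bx is a hypothetical subprogram that uses BX and CX as scratch space and returns its result in AX. Note that the registers are restored in the reverse of the order in which they were saved, because the stack is LIFO.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file: three 16-bit registers modeled as globals. */
uint16_t ax, bx, cx;

#define SAVE_WORDS 8             /* assumed stack size for the model */
uint16_t save_area[SAVE_WORDS];
int save_sp = SAVE_WORDS;        /* empty stack */

void save_reg(uint16_t value)    /* plays the role of push */
{
    assert(save_sp > 0);
    save_area[--save_sp] = value;
}

uint16_t restore_reg(void)       /* plays the role of pop */
{
    assert(save_sp < SAVE_WORDS);
    return save_area[save_sp++];
}

/* A well-behaved subprogram: computes 3 * BX and returns it in AX.
   BX and CX are clobbered along the way, so both are saved and restored. */
void triple_bx(void)
{
    save_reg(bx);                /* push bx */
    save_reg(cx);                /* push cx */

    cx = bx;                     /* scratch use of CX */
    bx = (uint16_t)(bx + bx);    /* scratch use of BX */
    ax = (uint16_t)(bx + cx);    /* return value goes in AX */

    cx = restore_reg();          /* pop cx: reverse order of the pushes */
    bx = restore_reg();          /* pop bx */
}
```

Only AX differs after the call, since it carries the return value. A caller that needs the old AX would have to save it before calling.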

Separately Translating Subprograms

Putting subprograms into separate files lets you do things better and faster. Better, because you can use the subprograms in more than one program -- software reuse is good! Faster, because you only have to assemble the files that have changed. Makefiles come in handy here.

Rules

How the Linker Works

We have used the linker (ld, and gcc as a linker) and have used library procedures such as printf and scanf. These and many others have been assembled separately and stored in a library created by compiler developers. How does the linker handle that?

When the assembler translates a source file into object code, it creates a symbol table of all the names and attributes of symbols defined in the file. When it is done, that symbol table is thrown away. However, since GLOBAL symbols may be defined in one file and referenced in other files, the assembler saves two tables in the .o file: a table of EXTERN symbols with a list of the places where each symbol is referenced, and a table of GLOBAL symbols with the unique place where each is defined.

The unresolved external references must be resolved by the linker. Once the linker knows where one of the GLOBAL symbols is stored, it goes back and patches each location that references that symbol as EXTERN with the now-known address.
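The resolution step can be modeled roughly in C. This is a simplified sketch, not how ld is actually implemented: the struct layouts, the function names find_global and resolve_refs, and the idea of treating the program image as an array of address-sized slots are all assumptions made for illustration. Real object files record more detail than this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One GLOBAL entry: a name and the unique address where it is defined. */
struct global_sym {
    const char *name;
    unsigned    address;
};

/* One EXTERN reference: a name and the slot in the image that uses it. */
struct extern_ref {
    const char *name;
    size_t      offset;
};

/* Look up a name in the GLOBAL table; returns NULL if it is not defined. */
const struct global_sym *find_global(const struct global_sym *tab,
                                     size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/* Patch every EXTERN reference with the address of the matching GLOBAL.
   Returns the number of references left unresolved. */
size_t resolve_refs(unsigned *image, size_t image_len,
                    const struct extern_ref *refs, size_t nrefs,
                    const struct global_sym *globals, size_t nglobals)
{
    size_t unresolved = 0;

    for (size_t i = 0; i < nrefs; i++) {
        const struct global_sym *g =
            find_global(globals, nglobals, refs[i].name);
        if (g == NULL) {         /* no file defines this symbol */
            unresolved++;
            continue;
        }
        assert(refs[i].offset < image_len);
        image[refs[i].offset] = g->address;   /* fill in the known address */
    }
    return unresolved;
}
```

A nonzero return value corresponds to the case where a symbol is declared EXTERN but no file defines it as GLOBAL, which a real linker reports as an error.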



©2004, Gary L. Burt