Writing Subprograms

UMBC

CMSC 313 -- Writing Subprograms


Higher-level languages use subprograms all the time. Some languages have two types of subprograms, procedures (or subroutines) and functions, while others, like C, have only functions. Normally, the difference between the two is that a function returns a value (only one!) and a procedure does not. Assembly language has only "subprograms," which typically return a value in the AX register. In C, we invoke a function by using its name followed by parentheses. In assembly language we use the instruction:

call SubProg

In C, the end of the function is the optional return statement. In assembly, there is the required ret instruction.

Rules

Subprograms can be included in the same file or stored in a separate file.
If they are in the same file, the ordering and naming are immaterial.
There is no "main()", unless you are using gcc to link the program.
There can be more than one data and bss section.
Data defined in the data and bss sections are global.
Local variables defined in C are really stored on the stack. That is why they no longer exist after returning from the subprogram.
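
The rules above can be sketched with a small program. This is a minimal example, assuming NASM syntax on 32-bit Linux (so EAX plays the role of AX); the names value and AddOne are made up for illustration. It could be built with nasm -f elf32 demo.asm followed by ld -m elf_i386 -o demo demo.o.

```nasm
; Assumption: 32-bit Linux, NASM syntax. The names are hypothetical.

section .data
value:  dd 20               ; global data, visible to every subprogram in the file

section .text
global _start               ; entry point used by ld (no main() needed)

_start:
        call    AddOne      ; pushes the return address, then jumps to AddOne
        mov     ebx, eax    ; use the returned value (21) as the exit status
        mov     eax, 1      ; Linux sys_exit
        int     0x80

; AddOne: returns [value] + 1 in EAX.
; It is defined after _start; the order within the file does not matter.
AddOne:
        mov     eax, [value]
        inc     eax
        ret                 ; pops the return address and jumps back to the caller
```

Running the program and then echo $? in the shell should print 21, the value returned in EAX.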

To understand how subprograms actually work, we first need to understand the stack in detail.

The Stack

The hardware stack and the stack pointer sp are necessary to get subprograms to work properly. In fact, early CPUs did not have them and, as a result, were more limited than what we have today. For instance, recursion was not easily possible.

A stack is a Last-In, First-Out (LIFO) data structure: data can only be added or removed at one end. There are special instructions built into the CPU to work with the stack. The sp register points to the newest 16-bit value on the stack (or 32-bit double word when using the extended registers in the later models), which is the next item to be removed. The stack starts at the highest address and grows down toward lower addresses as items are put on it.

The instructions are:

push (16-bit or 32-bit, depending on the register specified. If it is a memory location or constant, you will have to specify whether it is a WORD or DWORD.)
pusha (16-bit, pushes AX, CX, DX, BX, SP, BP, SI, DI)
pushad (32-bit, pushes EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI)
pushf (16-bit, pushes the flag register)
pushfd (32-bit)
pop (16- or 32-bit, based on the register specified; avoid popping into the sp register! If it is a memory location, you will have to specify whether it is a WORD or DWORD.)
popa (16-bit, pops in reverse order of pusha)
popad (32-bit)
popf (16-bit)
popfd (32-bit)

For push, either a memory location (word or double word only, as appropriate), a constant, or a register can be specified; pop takes a memory location or a register. If we have set up the data segment as:

X	DW	1111
Y	DW	2222
Z	DW	?

We can have a stack that looks like:

?
?
?

In this case the stack is empty, so the SP register really points just above the first location. If we execute:

push X

We now have a stack that looks like:

1111	<- SP
?
?

If we then execute:

push Y

We now have a stack that looks like:

1111
2222	<- SP
?

If we finally execute:

pop Z

We now have a stack that looks like:

1111	<- SP
2222
?

The location for the variable Z is set to 2222 and the SP register points to the previous entry. Note that the stack location holding 2222 is now considered unused and will be overwritten by the next push instruction. You cannot count on items popped from the stack remaining in unused stack memory, because the operating system also uses the stack.

If you do push X followed by pop X, nothing is changed.
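
The push/pop walk-through can be written as a complete program. This is a minimal sketch, assuming NASM syntax on 32-bit Linux; NASM requires square brackets and an explicit size for memory operands, so push X is written push word [X], and Z is reserved in a bss section rather than declared with "?".

```nasm
; Assumption: 32-bit Linux, NASM syntax.
; Build: nasm -f elf32 stack.asm && ld -m elf_i386 -o stack stack.o

section .data
X:      dw 1111
Y:      dw 2222

section .bss
Z:      resw 1              ; uninitialized word

section .text
global _start

_start:
        push    word [X]    ; stack holds 1111           (ESP decreases by 2)
        push    word [Y]    ; stack holds 1111, 2222     (ESP decreases by 2)
        pop     word [Z]    ; Z = 2222, 1111 is on top   (ESP increases by 2)
        add     esp, 2      ; discard the remaining 1111 so the stack is balanced

        movzx   ebx, word [Z]   ; exit status = Z (the shell keeps only the low 8 bits)
        mov     eax, 1          ; Linux sys_exit
        int     0x80
```

Because an exit status is a single byte, echo $? after running this reports 2222 mod 256 = 174 rather than 2222 itself.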

Saving and restoring registers is particularly important when using subprograms. You are responsible for saving any important data held in registers before you call a subprogram and for restoring those registers afterwards.

Well-behaved subprograms should save and restore any registers that they use, unless they are returning values in those registers! However, not all subprograms written by others are well-behaved. This means you have to write the instructions to save and restore the registers yourself in order to make sure the program works correctly.
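
A well-behaved subprogram might look like the following sketch, assuming NASM syntax on 32-bit Linux. The subprogram SumTo10 is hypothetical; it uses ECX and EBX as scratch registers, so it pushes them on entry and pops them in reverse order before returning. EAX is not saved because it carries the return value.

```nasm
; Assumption: 32-bit Linux, NASM syntax. SumTo10 is a made-up example.

section .text
global _start

_start:
        mov     ecx, 5      ; caller's value that must survive the call
        mov     ebx, 7      ; another value the caller still needs
        call    SumTo10     ; result comes back in EAX (55)
        add     ebx, ecx    ; EBX and ECX are unchanged: 7 + 5 = 12
        add     ebx, eax    ; 12 + 55 = 67
        mov     eax, 1      ; Linux sys_exit, status in EBX
        int     0x80

; SumTo10: returns 1 + 2 + ... + 10 in EAX.
SumTo10:
        push    ecx         ; save the registers used below
        push    ebx
        xor     ebx, ebx    ; running total = 0
        mov     ecx, 10     ; loop counter
.next:
        add     ebx, ecx
        loop    .next       ; decrements ECX and repeats until it reaches 0
        mov     eax, ebx    ; return value goes in EAX, so EAX is not restored
        pop     ebx         ; restore in reverse order of the pushes
        pop     ecx
        ret
```

If the push and pop instructions were removed from SumTo10, the caller would find ECX equal to 0 and EBX equal to 55 after the call, and its own arithmetic would silently go wrong.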

Separately Translating Subprograms

Putting subprograms into separate files lets you do things better and faster. Better, because you can use the subprograms in more than one program -- software reuse is good! Faster, because you only have to assemble the files that have changed. Makefiles come in handy here.

Rules

If you call a subprogram that is in another file, you must declare it with an EXTERN statement in the calling file, and the file that defines it must declare it GLOBAL.
Data that is defined in one file and used in another must also have the EXTERN/GLOBAL pair, but notice that when you use it, you must provide the size (for example, BYTE or WORD). Normally, this is not a good way to do things because it creates a global variable. Use the stack instead if possible.
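
The EXTERN/GLOBAL pairing can be sketched with two small files. This assumes NASM syntax on 32-bit Linux; the file names main.asm and triple.asm and the subprogram Triple are hypothetical. The argument is passed on the stack rather than through a global variable. The calling file:

```nasm
; main.asm (hypothetical file name)
extern  Triple              ; Triple is defined in another file

section .text
global _start

_start:
        push    dword 14    ; pass the argument on the stack
        call    Triple      ; the linker fills in the address of Triple
        add     esp, 4      ; caller removes the argument
        mov     ebx, eax    ; returned value (42) becomes the exit status
        mov     eax, 1      ; Linux sys_exit
        int     0x80
```

The file that defines the subprogram:

```nasm
; triple.asm (hypothetical file name)
global  Triple              ; make the symbol visible to other files

section .text

; Triple: returns three times its stack argument in EAX.
Triple:
        mov     eax, [esp+4]        ; argument sits just above the return address
        lea     eax, [eax+eax*2]    ; EAX = EAX * 3
        ret
```

Each file is assembled on its own (nasm -f elf32 main.asm and nasm -f elf32 triple.asm) and the two object files are then linked together (ld -m elf_i386 -o prog main.o triple.o). After a change to only one source file, only that file needs to be reassembled before relinking.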

How the Linker Works

We have used the linker (ld, and gcc as a linker) and have used library procedures such as printf and scanf. These and many others have been assembled separately and stored in a library created by the compiler developers. How does the linker handle that?

When the assembler translates a source file into object code, it creates a symbol table of all the names and attributes of the symbols defined in the file. When it is done, that symbol table is thrown away. However, since GLOBAL symbols may be defined in one file and referenced in other files, the assembler saves two tables in the .o file: a table of EXTERN symbols with a list of the places where each symbol is referenced, and a table of GLOBAL symbols with the unique place where each is defined.

The unresolved external references must be resolved by the linker. Once the linker knows where a GLOBAL symbol is stored, it goes back and patches every location that references the matching EXTERN symbol with the now-known address.
