CMSC 313 Lecture 3


Lecture 3 Registers, syntax, sections

The Intel 80x86 has many registers and named sub-registers.
Here are some that are used in assembly language programming
and debugging (the "dash number" gives the number of bits):

 +---------------------------+  EAX extended accumulator
 | EAX-32 +-----------------+|  (lower part of dividend)
 |        |       AX-16     ||  (quotient after division)
 |        |+--------+------+||  (lower part of product)
 |        ||  AH-8  | AL-8 |||
 |        |+--------+------+||
 |        +-----------------+|
 +---------------------------+

 +---------------------------+  EBX extended base register
 | EBX-32 +-----------------+|  (BX in DS segment)
 |        |       BX-16     ||  
 |        |+--------+------+||
 |        ||  BH-8  | BL-8 |||
 |        |+--------+------+||
 |        +-----------------+|
 +---------------------------+

 +---------------------------+  ECX extended counter
 | ECX-32 +-----------------+|  (string and loop operations)
 |        |       CX-16     ||  (CX is a 16 bit counter)
 |        |+--------+------+||
 |        ||  CH-8  | CL-8 |||
 |        |+--------+------+||
 |        +-----------------+|
 +---------------------------+

 +---------------------------+  EDX extended data register
 | EDX-32 +-----------------+|  (I/O port address for IN and OUT)
 |        |       DX-16     ||  (remainder after divide)
 |        |+--------+------+||  (upper part of dividend)
 |        ||  DH-8  | DL-8 |||  (upper part of product)
 |        |+--------+------+||
 |        +-----------------+|
 +---------------------------+

 +---------------------------+  ESP extended stack pointer
 | ESP-32     +-------------+|  SP  stack pointer
 |            | SP-16       ||  (used by PUSH and POP)
 |            +-------------+|
 +---------------------------+

 +---------------------------+  EBP extended base pointer
 | EBP-32     +-------------+|  (by convention, stack frame pointer)
 |            | BP-16       ||  (BP in SS segment)
 |            +-------------+|
 +---------------------------+

 +---------------------------+  ESI extended source index
 | ESI-32     +-------------+|  SI  source index
 |            | SI-16       ||  (in DS segment)
 |            +-------------+|
 +---------------------------+

 +---------------------------+  EDI extended destination index
 | EDI-32     +-------------+|  
 |            | DI-16       ||  (DI in ES segment)
 |            +-------------+|
 +---------------------------+

 +---------------------------+  EIP extended instruction pointer
 | EIP-32     +-------------+|  IP  instruction pointer
 |            | IP-16       ||  
 |            +-------------+|
 +---------------------------+

 +---------------------------+   EFLAGS extended flags
 | EFLAGS-32  +-------------+|   or just  flags
 |            | FLAGS-16    ||   (not a register name!)
 |            +-------------+|   (must use PUSHF and POPF)
 +---------------------------+

 For 32-bit "C" compatible programming, stop here.
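The register roles noted beside the diagrams above can be sketched in a
few instructions. This is only an illustrative 32-bit NASM fragment, not
one of the course files: the labels "roles" and "again" and the constants
are made up, and the usual prologue and epilogue are left out.

```nasm
; Illustrative sketch only: labels and constants are hypothetical.
        section .text
roles:
        mov     eax, 100000     ; multiplicand goes in EAX
        mov     ebx, 70000      ; multiplier
        mul     ebx             ; unsigned: EDX:EAX = EAX * EBX
                                ; 7,000,000,000 = 0x1A13B8600
                                ; EDX = 1 (upper part of product)
                                ; EAX = 0xA13B8600 (lower part of product)
        div     ebx             ; unsigned: divides EDX:EAX by EBX
                                ; EAX = 100000 (quotient)
                                ; EDX = 0 (remainder)

        mov     ecx, 5          ; ECX is the counter used by LOOP
        xor     eax, eax        ; EAX = 0
again:  add     eax, ecx        ; adds 5, 4, 3, 2, 1
        loop    again           ; ECX = ECX - 1, jump if ECX is not 0
                                ; here EAX = 15 and ECX = 0

        pushfd                  ; EFLAGS is not an operand name, so push it
        pop     edx             ; EDX now holds a copy of EFLAGS
        ret
```

Note that DIV raises a divide error if the quotient does not fit in EAX,
so the upper part of the dividend in EDX must be smaller than the divisor.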

              +-------------+   CS code segment
              | CS-16       |
              +-------------+

              +-------------+   SS stack segment
              | SS-16       |
              +-------------+

              +-------------+   DS data segment
              | DS-16       |   (current module)
              +-------------+

              +-------------+   ES extra segment
              | ES-16       |   (calling module, destination string)
              +-------------+

              +-------------+   FS heap segment
              | FS-16       |
              +-------------+

              +-------------+   GS global segment
              | GS-16       |   (shared)
              +-------------+

There are also 80-bit floating point registers ST0 .. ST7
There are also 64-bit MMX registers MM0 .. MM7
There are also control registers CR0 .. CR4
There are also debug registers DR0 .. DR3, DR6, DR7
There are also test registers TR3 .. TR7 (80386 and 80486 only)

A dumb program to test register names is testreg.asm

Another dumb program to test al,ah,ax,eax is regeax.asm
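
As a rough sketch of how the named sub-registers overlap (this is not the
content of regeax.asm, just a made-up fragment with hypothetical values),
writing AL, AH or AX changes only that part of EAX:

```nasm
; Illustrative sketch only: the values are hypothetical.
        section .text
        mov     eax, 0x12345678 ; EAX = 0x12345678
                                ; AX  = 0x5678, AH = 0x56, AL = 0x78
        mov     al, 0xFF        ; EAX = 0x123456FF  (only the low byte changes)
        mov     ah, 0x00        ; EAX = 0x123400FF  (only bits 8..15 change)
        mov     ax, 0xABCD      ; EAX = 0x1234ABCD  (upper 16 bits unchanged)
```

The upper 16 bits of EAX have no name of their own; to reach them, shift
or rotate EAX.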


The basic syntax for a line in NASM is:

label:  opcode  operand(s) ; comment

The "label" is a case sensitive user name, followed by a colon.
The label is optional and when not present, indent the opcode.
The label should start in column one of the line.
The label may be on a line with nothing else or a comment.

The "opcode" is not case sensitive and may be a machine instruction
or an assembler directive (pseudo operation) or a macro call.
Typically, all "opcode" fields are neatly lined up starting in the
same column. Use of "tab" is OK.
Machine instructions may be preceded by a "prefix" such as:
a16, a32, o16, o32, and others.

"operand(s)" depend on the choice of "opcode".
The operand field may have several parts separated by commas.
The parts may be a combination of register names, constants,
and memory references in brackets [ ], or the field may be empty.

Comments are optional, yet encouraged.
Everything from the semicolon to the end of the line is
a comment, ignored by the assembler.
The semicolon may be in column one, making the entire line
a comment.
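
The syntax rules above can be seen together in a short fragment. This is
a sketch for illustration only; the labels "start", "next" and "count"
are hypothetical names, not from the course files.

```nasm
; a whole-line comment: the semicolon is in column one
        section .text
start:                          ; a label on a line with only a comment
        mov     eax, 1          ; no label, so the opcode is indented
next:   add     eax, [count]    ; label, opcode, two operands (one in memory)
        MOV     EBX, EAX        ; opcodes and register names ignore case
        ret                     ; an opcode with no operands

        section .data
count:  dd      41              ; "dd" is an assembler directive, not
                                ; a machine instruction
```

Labels are case sensitive, so "Count" and "count" would be two different
names.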

Sections or segments:
One specific assembler directive is the "section" or "SECTION"
directive. Four types of section are predefined for ELF format:

        section  .data    ; initialized data
                          ; writeable, not executable
                          ; default alignment 4 bytes

        section  .bss     ; uninitialized space for data
                          ; writeable, not executable
                          ; default alignment 4 bytes

        section  .rodata  ; initialized data
                          ; read only, not executable
                          ; default alignment 4 bytes

        section  .text    ; instructions (code)
                          ; not writeable, executable
                          ; default alignment 16 bytes

        section  other    ; any name other than .data, .bss,
                          ; .rodata, .text
                          ; your stuff
                          ; not executable, not writeable
                          ; default alignment 1 byte
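
A minimal sketch of one source file using the four predefined sections,
assuming it is assembled for 32-bit ELF (for example with nasm -f elf).
The names "counter", "buffer", "msg" and "bump" are made up for this
example.

```nasm
; Illustrative sketch only: all names are hypothetical.
        section .data           ; initialized, writeable
counter: dd     0               ; one 32-bit word, initial value 0

        section .bss            ; space reserved, no initial value stored
buffer: resb    64              ; 64 bytes

        section .rodata         ; initialized, read only
msg:    db      "hello", 10, 0  ; text, newline, zero terminator

        section .text           ; instructions
        global  bump            ; make the label visible to the linker
bump:   inc     dword [counter] ; allowed, .data is writeable
        mov     eax, [counter]  ; return the new count in EAX
        ret
```

A store into msg would assemble, but would typically fault at run time
because .rodata is loaded read only.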

A few comments on efficiency:
My experience is that a good assembly language programmer
can make a small (about 100 lines) "C" program more
efficient than the  gcc  compiler. But, for larger
programs, the compiler will be more efficient.

Exceptions are, for example, the SGI IRIX  cc  compiler
that has super optimization for that specific machine.

For the Intel 80x86 here are some samples in nasm and from gcc
(different syntax but you should be able to recognize the instructions)
Focus on the loop. The prologue and epilogue code that should
be included was omitted. Note the test has "check" values
at each end of the array. There is no range testing in
either "C" or assembly language.

A simple loop loopint.asm
Same code from gcc  loopint.s
Hex machine code generated by nasm loopint.lst

Most efficient loop loopint2.asm
Same code from gcc  loopint2.s
Hex machine code generated by nasm loopint2.lst

Speed considerations must take into account cache and virtual memory
performance, the number of bytes transferred from RAM, and clock cycles.
On modern computer architectures, predicting these by reading the code
is almost impossible. For example, the Pentium 4 translates the 80x86
code into RISC-like micro-operations for its pipeline and is actually
executing instructions that are different from the assembly language.
Carefully benchmarking complete applications is about the only
conclusive measure of efficiency.
