Addressing

UMBC

CMSC 313 -- Addressing

If you are familiar with C pointers and pointer arithmetic, much of what follows will look familiar. In fact, C/C++ pointers were designed to mimic assembly language so that they would be efficient to implement. In other words, if you never really understood pointers in C, seeing how they actually work here will probably make them click at last. This makes it imperative that you understand how to form good addresses and the many ways to address memory.

Memory Address versus Memory Contents

In high-level languages, such as C, an identifier refers to the contents of a memory location. Thus, when you write:

int age;
int junk;
int *pAge;

age = 21;	 /* assigns the value twenty-one to the memory location you
			named age */
junk = age;      /* the memory location junk gets the value stored at location age,
			or twenty-one */
pAge = &age;     /* pAge now holds the address of the memory location you
			named age */

Unless you use the & symbol, you are referring to the contents. The contents of that location may change during the execution of your program, but the address stays the same throughout.
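
The distinction can be checked directly in C. The following is a minimal sketch; the function name address_is_stable is made up for this illustration and is not part of the course code.

```c
#include <stdint.h>

/* Returns 1 if the address of age is unchanged after its contents change.
   address_is_stable is a hypothetical name used only for this sketch. */
int address_is_stable(void)
{
    int age = 21;
    int *pAge = &age;                    /* pAge holds the address of age */
    uintptr_t before = (uintptr_t)pAge;  /* remember the address as a number */

    age = 22;                            /* the contents change ... */

    /* ... but the address does not, and *pAge sees the new contents */
    return before == (uintptr_t)&age && *pAge == 22;
}
```

Calling address_is_stable() should return 1: the contents went from 21 to 22 while the address stayed put.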

As you remember, we can write something like this:

    A     DB    78h
    B     DW    1234h
    C     DD    0FEDCBA89h

Now, to get the contents or the address, we write:

    mov al, [ A ]     ; moves the contents of A to al
    mov eax, A        ; moves the address A to eax

    mov bx, [ B ]     ; moves the contents of B to bx
    mov ebx, B        ; moves the address of B to ebx

    mov ecx, [ C ]    ; moves the contents of C to ecx
    mov ecx, C        ; moves the address of C to ecx

Notice that when we move the address of a variable, we are dealing with a 32-bit address and must select a 32-bit register to hold it. The size and value of the variable do not matter: C could hold the value 1 and its address would still be a 32-bit number!
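
The same point can be seen from C. This is a sketch under an assumption: on mainstream flat-memory platforms every object pointer has the same size (4 bytes on the 32-bit system used in these notes, typically 8 bytes on a 64-bit host). The variable and function names are invented for the illustration.

```c
#include <stddef.h>

/* Three variables of different sizes, mirroring A (DB), B (DW) and C (DD).
   The names byteA, wordB and dwordC are hypothetical. */
unsigned char  byteA  = 0x78;
unsigned short wordB  = 0x1234;
unsigned int   dwordC = 0xFEDCBA89u;

/* Size in bytes of the container needed to hold each address. */
size_t addr_size_a(void) { return sizeof(&byteA); }
size_t addr_size_b(void) { return sizeof(&wordB); }
size_t addr_size_c(void) { return sizeof(&dwordC); }
```

All three functions should return the same number, no matter how large the variable is or what value it holds.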

Address Arithmetic

In C, we have something like:

char	name[10];

This reserves ten bytes of memory, and the first one is called name[0]. In assembly language we can do it a number of ways. The first thing to ask is whether the data is going to be initialized or not. Initialized data can be declared on one line or spread over several:

A       DB       0Ah, 1Ah, 2Ah, 3Ah, 4Ah, 5Ah, 6Ah, 7Ah, 8Ah, 9Ah

ages    DB       0Ah
        DB       1Ah
        DB       2Ah
        DB       3Ah
        DB       4Ah
        DB       5Ah
        DB       6Ah
        DB       7Ah
        DB       8Ah
        DB       9Ah

Now the location [ A ] holds 0Ah, etc. Notice that the other locations do not have a name as such, but we can get to them with address arithmetic. The location [ A + 3 ] refers to the byte containing 3Ah.

        mov al, [ ages ]      ; moves the contents of ages
                              ; to al (which will hold 0Ah)
        mov bl, [ ages + 3 ]  ; moves the contents of ages plus 3
                              ; to bl (which will hold 3Ah!)

Additionally, if pAges is a pointer variable in C, we can say that [pAges] is the equivalent of *pAges.

If you want to get the address of the byte holding the value 3Ah, you would use:

        mov eax, A + 3  ; moves the address of the
                        ; byte holding 3Ah into eax

Once again, remember the address is a 32-bit number.
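
For comparison, here is how the same two operations look in C. This is only a sketch; contents_at and address_of are hypothetical helper names, and the array repeats the ten bytes of ages from above.

```c
#include <stddef.h>

/* The same ten initialized bytes as the ages array above. */
const unsigned char ages[10] = {
    0x0A, 0x1A, 0x2A, 0x3A, 0x4A, 0x5A, 0x6A, 0x7A, 0x8A, 0x9A
};

/* Contents at ages + n, like  mov bl, [ ages + n ]  */
unsigned char contents_at(size_t n)
{
    return *(ages + n);          /* identical to ages[n] */
}

/* Address ages + n, like  mov eax, ages + n  */
const unsigned char *address_of(size_t n)
{
    return ages + n;             /* identical to &ages[n] */
}
```

contents_at(3) gives the byte 3Ah, while address_of(3) gives the location where that byte lives.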

Suppose we have the following definitions of arrays:

AA      DB  0Ah, 1Ah, 2Ah, 3Ah, 4Ah, 5Ah, 6Ah, 7Ah
BB      DB  0Bh, 1Bh, 2Bh, 3Bh
CC      DB  0Ch, 1Ch, 2Ch, 3Ch, 4Ch, 5Ch

In memory we would have:

Label   Contents   From AA   From BB   From CC
AA:     0A         AA        BB-8      CC-12
        1A         AA+1      BB-7      CC-11
        2A         AA+2      BB-6      CC-10
        3A         AA+3      BB-5      CC-9
        4A         AA+4      BB-4      CC-8
        5A         AA+5      BB-3      CC-7
        6A         AA+6      BB-2      CC-6
        7A         AA+7      BB-1      CC-5
BB:     0B         AA+8      BB        CC-4
        1B         AA+9      BB+1      CC-3
        2B         AA+10     BB+2      CC-2
        3B         AA+11     BB+3      CC-1
CC:     0C         AA+12     BB+4      CC
        1C         AA+13     BB+5      CC+1
        2C         AA+14     BB+6      CC+2
        3C         AA+15     BB+7      CC+3
        4C         AA+16     BB+8      CC+4
        5C         AA+17     BB+9      CC+5

It is important to notice that the offset can be a positive or a negative number, and nothing prevents you from reaching outside the array you named. Array-bound checking exists only in high-level languages that insert extra code you normally don't see! C does not insert such checks, which is why exceeding the boundary of an array in C produces no error message unless you attempt to use memory that is not allocated to your process.
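
The memory picture above can be modeled in C. This sketch deliberately puts all 18 bytes into one array and treats AA, BB and CC as pointers into it, so that negative offsets stay inside a single object and remain well-defined C (indexing outside a separately declared array would be undefined behavior). The names mem and peek are assumptions made for the illustration.

```c
/* One contiguous block of 18 bytes laid out like AA, BB and CC. */
const unsigned char mem[18] = {
    0x0A, 0x1A, 0x2A, 0x3A, 0x4A, 0x5A, 0x6A, 0x7A,   /* AA */
    0x0B, 0x1B, 0x2B, 0x3B,                           /* BB */
    0x0C, 0x1C, 0x2C, 0x3C, 0x4C, 0x5C                /* CC */
};

/* Labels are just addresses inside that block. */
const unsigned char *const AA = mem;
const unsigned char *const BB = mem + 8;
const unsigned char *const CC = mem + 12;

/* Contents at label + offset; the offset may be negative.
   No bounds check is made, just as in assembly language. */
unsigned char peek(const unsigned char *label, int offset)
{
    return label[offset];
}
```

For example, peek(BB, -8) reads the first byte of AA, and peek(AA, 13) reads the second byte of CC.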

An Issue With Addressing

The Program

section .data
wArray	DW	1234h, 2345h, 3456h, 4567h, 5678h, 6789h, 789Ah, 89ABh, 9ABCh, 0ABCDh
section .bss

section .text
    global main                       ;must be declared for linker (ld)

main:                                 ;tell linker entry point

	nop

	mov	eax, 0
	mov	ebx, 0
        mov	esi, wArray
	mov	ax, [ wArray ]

	;; What will we get when we use wArray + 3?

        mov	bx, [ wArray + 3 ]




        mov     ebx,0   ;successful termination of program
        mov     eax,1   ;system call number (sys_exit)
        int     0x80    ;call kernel

The Results

If we run this and after the instruction mov bx, [ wArray + 3 ] we look at the registers we see:

(gdb) break *main
Breakpoint 1 at 0x8048300: file addr.asm, line 10.
(gdb) run
Starting program: /home/burt/courses/umbc/CMSC313/spring04/lectures/Lect08/addr/addr 

Breakpoint 1, main () at addr.asm:10
(gdb) step
(gdb) step
(gdb) step
(gdb) step
(gdb) step
(gdb) step
(gdb) info registers
eax            0x1234	4660
ecx            0x42015554	1107383636
edx            0x40016bc8	1073834952
ebx            0x5623	22051
esp            0xbffff32c	0xbffff32c
ebp            0xbffff348	0xbffff348
esi            0x80493e8	134517736
edi            0x804835c	134513500
eip            0x804831d	0x804831d
eflags         0x346	838
cs             0x23	35
ss             0x2b	43
ds             0x2b	43
es             0x2b	43
fs             0x0	0
gs             0x33	51
(gdb)

Something does not look right:

eax            0x1234	4660
ebx            0x5623	22051

There is no 5623h in the definition of wArray! How did that happen? Let's dump the word array wArray and see for ourselves!

(gdb) x/10hx &wArray
0x80493e8 :	0x1234	0x2345	0x3456	0x4567	0x5678	0x6789	0x789a	0x89ab
0x80493f8 :	0x9abc	0xabcd

That did not help. Let's look at it in byte order:

(gdb) x/20bx &wArray
0x80493e8 :	0x34	0x12	0x45	0x23	0x56	0x34	0x67	0x45
0x80493f0 :	0x78	0x56	0x89	0x67	0x9a	0x78	0xab	0x89
0x80493f8 :	0xbc	0x9a	0xcd	0xab
(gdb)

What we see is that wArray + 3 is not the fourth word of the array (index 3); it is the byte at offset 3 from the start! Remember that the bytes of each word are stored in memory in little-endian order, so the two bytes at offsets 3 and 4 are 23h and 56h, and loading them as a word gives 5623h, the value in the register. The assembler does not scale the offset by the element size the way a C compiler does for array indexing!
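
The difference between a byte offset and an array index can be reproduced in C. The sketch below builds the little-endian byte layout from the gdb dump explicitly, so it gives the same answer on any host. The function names word_at_byte_offset and word_at_index are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define WORD_COUNT 10

static const uint16_t wArray[WORD_COUNT] = {
    0x1234, 0x2345, 0x3456, 0x4567, 0x5678,
    0x6789, 0x789A, 0x89AB, 0x9ABC, 0xABCD
};

/* Read the 16-bit word that starts at a given BYTE offset, using the
   little-endian layout shown in the byte dump.
   The caller must keep byte_offset + 1 below 2 * WORD_COUNT. */
uint16_t word_at_byte_offset(size_t byte_offset)
{
    uint8_t bytes[2 * WORD_COUNT];
    size_t i;

    /* Lay each word out low byte first, as x86 does in memory. */
    for (i = 0; i < WORD_COUNT; i++) {
        bytes[2 * i]     = (uint8_t)(wArray[i] & 0xFF);
        bytes[2 * i + 1] = (uint8_t)(wArray[i] >> 8);
    }
    return (uint16_t)(bytes[byte_offset] | (bytes[byte_offset + 1] << 8));
}

/* What a C compiler does for wArray[index]: scale by the element size. */
uint16_t word_at_index(size_t index)
{
    return word_at_byte_offset(index * sizeof(uint16_t));
}
```

word_at_byte_offset(3) yields 5623h, the surprising value in bx, while word_at_index(3) yields 4567h. In assembly language the equivalent of the second call is [ wArray + 3*2 ]: you must do the scaling yourself.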

What Does The Code Look Like, Really

(gdb) set disassembly-flavor intel
(gdb) disassemble &main
Dump of assembler code for function main:
0x08048300 :	nop    
0x08048301 :	mov    eax,0x0
0x08048306 :	mov    ebx,0x0
0x0804830b :	mov    esi,0x80493e8
0x08048310 :	mov    ax,ds:0x80493e8
0x08048316 :	mov    bx,ds:0x80493eb
0x0804831d :	mov    ebx,0x0
0x08048322 :	mov    eax,0x1
0x08048327 :	int    0x80
0x08048329 :	nop    
0x0804832a :	nop    
0x0804832b :	nop    
End of assembler dump.
(gdb)

Rules for Address Expressions

An address and a number are two different types of objects. An address represents a physical location in memory. A number is simply a bit pattern that has no inherent data type! We don't know if it represents years, days, minutes, seconds, oranges, airplanes, characters or whatever.

The most important difference is that when a program is loaded into a different location in memory, the addresses change but the numbers do not!

Legal Address Arithmetic

Symbols (identifiers) are addresses if they label memory locations. Normally, that is when they are in front of something like DB or DW in the data segment or when they are followed by a colon in the code segment. Additionally, symbols EQUated to addresses are addresses. (However, symbols EQUated to constant numbers are simply ordinary numbers.)

If A and B are addresses and n is an ordinary number, then we can legally do:

A + n yields an address
A - n yields an address
A - B yields an ordinary number
(A) yields an address
Any expression involving only ordinary numbers yields an ordinary number.

Every address has a particular data type (word or byte), and every address expression retains the data type of the base address.

Some examples are:

A + 14	address
B - A	number
CW - AW	number
AW + ( B - A )	address. Remember, B - A is a number, which puts this into the form A + n.
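
C pointer arithmetic follows the same rules, and the compiler enforces them: a pointer plus or minus an integer is a pointer, the difference of two pointers is an integer, and adding two pointers is rejected outright. The sketch below uses char pointers so that C's element-sized steps coincide with the byte-sized steps of assembly language. All names in it are hypothetical.

```c
#include <stddef.h>

/* A + n : address plus number yields an address. */
const char *addr_plus(const char *a, ptrdiff_t n)  { return a + n; }

/* A - n : address minus number yields an address. */
const char *addr_minus(const char *a, ptrdiff_t n) { return a - n; }

/* A - B : address minus address yields an ordinary number.
   Both pointers must point into the same array. */
ptrdiff_t addr_diff(const char *a, const char *b)  { return a - b; }

/* Checks the examples from the table using one 32-byte block. */
int rules_hold(void)
{
    static const char block[32] = {0};
    const char *A  = block;
    const char *B  = block + 14;
    const char *AW = block + 4;
    ptrdiff_t n = addr_diff(B, A);            /* B - A is a number: 14 */

    return n == 14
        && addr_plus(A, 14) == B              /* A + 14 is an address  */
        && addr_minus(B, 14) == A             /* B - 14 is an address  */
        && addr_plus(AW, n) == block + 18;    /* AW + ( B - A )        */
}
```

rules_hold() should return 1.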

The Dollar Sign

There is also a special assembler symbol, the dollar sign. Remember that in project 0 we had:

msg:    db "Hello World",10     ; the string to print, 10 = LF (newline)
len:    equ $-msg               ; "$" means "here"

NOTE: This is $ without quotes; written as '$', the quotes would make it a character. The assembler symbol $ represents the next location that code or data will be assembled into, so $-msg is the number of bytes between msg and "here".
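
C has no direct counterpart to $, but the closest analog for this particular idiom is sizeof, which the compiler also evaluates at build time. This is offered as an analogy only, not the same mechanism; msg_len is a hypothetical name.

```c
#include <stddef.h>

/* 11 characters, a newline (10), and the terminating 0 that C appends. */
const char msg[] = "Hello World\n";

/* Length of the message in bytes, like  len: equ $-msg  */
size_t msg_len(void)
{
    return sizeof msg - 1;   /* drop the terminating 0 that C adds */
}
```

msg_len() returns 12, the same value the assembler computes for len.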

Byte Swapping

When we write code that will store a word, such as:

AWord   DW   1234h

it produces memory that looks like this:

Address    Contents
AWord      34h
AWord+1    12h

When moving data from memory to a register (or in the opposite direction), the bytes are put into the correct order.

Words in assembly language source code files and words in registers have bytes in their normal order. Words in memory have their bytes swapped!

Moving a word to or from memory automatically swaps bytes.
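
You can inspect the byte order of whatever machine you are on from C. The sketch below copies the in-memory bytes of a 16-bit word and looks at them one at a time. The function names are made up for the example, and the result depends on the host: an x86 machine is little-endian and stores the low byte first.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns byte i (0 or 1) of the word as it is laid out in memory. */
uint8_t byte_in_memory(uint16_t word, size_t i)
{
    uint8_t bytes[2];
    memcpy(bytes, &word, sizeof word);   /* copy the raw memory image */
    return bytes[i];
}

/* Returns 1 on a little-endian host such as x86, 0 otherwise. */
int host_is_little_endian(void)
{
    return byte_in_memory(0x1234, 0) == 0x34
        && byte_in_memory(0x1234, 1) == 0x12;
}
```

On a little-endian host, byte_in_memory(0x1234, 0) is 34h and byte_in_memory(0x1234, 1) is 12h, matching the memory picture above, even though the value in the variable is still 1234h.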


©2004, Gary L. Burt