A Program
There is a book entitled Algorithms + Data Structures = Programs by Niklaus Wirth.
Essentially, that describes every program in every language. The code section implements
the algorithm, and the data sections implement the data structures.
We need to have the assembler help us as much as possible, so we don't have to work as hard!
The first trick is to use the include directive, just as you did in C/C++ and Java (different
directive, same effect!). You can include each of the files that you need one by one, or you can use the following:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
include \masm32\include\masm32rt.inc
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
That is an include that includes the rest!
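As a minimal sketch of what that single include buys you, the program below assumes MASM32 is installed in \masm32 on the current drive. The print, chr$, and exit macros are not defined here; they come from the MASM32 macro file that masm32rt.inc pulls in.

```asm
include \masm32\include\masm32rt.inc    ; processor, model, includes, and libraries in one line

.code
start:                                  ; entry point named on the END line
    print chr$("One include was enough.",13,10)
    exit                                ; MASM32 macro that ends the process
end start
```

Build it the same way as any console program (for example with "Console Assemble and Link" in the MASM32 editor).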
Sections (Segments) of a program
Some of the sections of a program include:
.Model | The only model we will use is FLAT, because it gives us one linear 4GB address space |
.Stack | This is the amount of stack space you want to reserve. 1KB is the default |
.Data | Actually, we can have .DATA, .DATA?, .CONST, .FARDATA, and .FARDATA? You can get away with using only .DATA |
.Code | This is your algorithm, in syntactically correct assembly language instructions! |
You can have more than one .DATA segment and more than one .CODE segment. The assembler and linker will combine them for you.
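The layout described above can be sketched as a small program. This is only an illustration: the variable names (bonusPoints, examScore, finalScore) are made up for the example, the .STACK directive is left out because the default is enough, and MASM32 is assumed to be installed in \masm32.

```asm
.486                                ; target processor
.model flat, stdcall                ; flat memory model, stdcall calling convention
option casemap :none                ; case sensitive

include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

.const                              ; read-only data
    bonusPoints dd 5

.data                               ; initialized data
    examScore   dd 90

.data?                              ; uninitialized data
    finalScore  dd ?

.code                               ; the algorithm
start:
    mov eax, examScore              ; EAX = 90
    add eax, bonusPoints            ; EAX = 95
    mov finalScore, eax             ; store the result
    invoke ExitProcess, 0           ; return control to Windows
end start
```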
.Model
When we are writing 32-bit code, the easiest way is to use the "FLAT" memory model, because the addressing is linear. It
starts at address 00000000h and goes to 0FFFFFFFFh (simply 4GB!). The alternative is to use the segmented model, which is
far more complicated!
Details About Using Hexadecimal Numbers
All numbers must begin with a symbol in the range of '0' to '9', otherwise the assembler does not know it is a number!
The easy way to handle that is to put a zero on the left side of the number, which does not change the value. Then
we have the problem of which base the number is in. Unless we specify, it is decimal. To use a hexadecimal number, it
must end with an 'h', for hex. Then there is the question of which case the hex letters should be in. It does not matter,
but the accepted style is to use uppercase. There is no penalty for using lower case, though.
(To use binary, use a 'b' after the number.)
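A short sketch of the three notations, assuming the MASM32 runtime include and its print and str$ macros are available. The values are kept in EBX, ESI, and EDI because the print macro calls a library function that is free to change EAX, ECX, and EDX.

```asm
include \masm32\include\masm32rt.inc

.code
start:
    mov ebx, 255                ; decimal, the default base
    mov esi, 0FFh               ; hexadecimal: leading 0 because it starts with a letter
    mov edi, 11111111b          ; binary
    ; "mov esi, FFh" would not assemble: FFh looks like a name, not a number

    print str$(ebx), 13, 10     ; prints 255
    print str$(esi), 13, 10     ; prints 255
    print str$(edi), 13, 10     ; prints 255
    exit
end start
```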
.stack
The stack is a data structure we use for a number of purposes. It is very useful for us and we will talk about it later. So far,
the default has been sufficient for our needs.
.data And .data?
In your previous programs in the different languages, you normally had to declare your variables. Declaring a variable involves
reserving an appropriately sized memory location, giving it a name (or label or identifier) and possibly an initial value. Use
.data? for uninitialized variables and .data for initialized data.
Declaring data can be interesting. You must keep track of the constraints of your data:
what is the minimum and maximum size, is it signed or unsigned, is it real or whole?
The machine does not keep track of whether a variable is signed or unsigned. You,
the programmer, must keep that straight! Also, today, the tendency is to make everything a double word,
because a 32-bit processor handles its natural 32-bit size most efficiently.
When we talk about type, what we really mean is how many bits are in the data. If you try to put one size
of variable into a container (memory or register) of another size, the assembler will flag it as an error.
For the time being, we are only going to talk about integer data. Real numbers come later.
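The declarations described above might look like the following sketch. All of the variable names are invented for illustration, and the MASM32 runtime include is assumed for the print, str$, and exit macros.

```asm
include \masm32\include\masm32rt.inc

.data                               ; initialized data: name, size, initial value
    nrStudents  dd 25               ; 32-bit double word
    maxScore    dw 100              ; 16-bit word
    letterGrade db 'A'              ; 8-bit byte
    courseName  db "Assembly", 0    ; zero-terminated string of bytes

.data?                              ; uninitialized data: name and size only
    totalScore  dd ?
    inputBuffer db 64 dup(?)        ; 64 bytes reserved, contents not specified

.code
start:
    mov eax, nrStudents             ; both sides are 32 bits, so this assembles
    mov totalScore, eax
    print str$(totalScore), 13, 10  ; prints 25
    exit
end start
```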
Intrinsic Whole Data Types
These are some of the built-in data types:
Size | Name1 | Name2 |
8-bit | BYTE, SBYTE | db |
16-bit | WORD, SWORD | dw |
32-bit | DWORD, SDWORD | dd |
64-bit | QWORD | dq |
Notes:
- Name1 shows a TYPE, but it can also be used as an initializer. The TYPE must be used when the instruction could
apply to any size, such as INC DWORD PTR [EBX]. Without the TYPE, this could refer to an 8-bit, 16-bit,
or 32-bit memory location.
- Name2 is an initializer, used in the .DATA and .DATA? segments
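The sketch below shows both uses of the type names, under the assumption that the MASM32 runtime include is available. The variable names are illustrative only.

```asm
include \masm32\include\masm32rt.inc

.data
    roomNumber  BYTE   200          ; type name used as an initializer
    temperature SBYTE  -40          ; signed 8-bit
    nrStudents  DWORD  30           ; same storage as "dd 30"
    nrTeachers  dd     2            ; short initializer form

.code
start:
    mov ebx, OFFSET nrStudents      ; EBX holds an address, not a size
    inc DWORD PTR [ebx]             ; the type tells the assembler to use 32 bits
    inc nrStudents                  ; size already known from the declaration

    print str$(nrStudents), 13, 10  ; prints 32
    exit
end start
```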
.Code
An instruction or a line of code has up to four parts:
- label
- mnemonic
- operand(s)
- comment
All four are optional, depending on what you are doing. A line of code can
consist of just a label, just a mnemonic, or just a comment. The operand(s)
cannot be present without a mnemonic.
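The combinations listed above can be seen in this small countdown sketch. The label countDown is a made-up name, and the exit macro is assumed to come from the MASM32 runtime include.

```asm
include \masm32\include\masm32rt.inc

.code
start:                          ; a label and a comment, nothing else
    mov ebx, 3                  ; mnemonic, two operands, comment
countDown:
    dec ebx                     ; mnemonic, one operand, comment
    jnz countDown               ; jump back to the label until EBX reaches zero
    nop
    ; a line with only a comment
    exit
end start
```

The line holding only nop has a mnemonic and nothing else, and the loop body runs three times before EBX reaches zero.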
Labels
Labels are identifiers. They are used to identify data and variables or locations in the code.
They allow us to refer to locations and data in a symbolic or meaningful way. That is why you
should use care in selecting your labels. It also helps document your code.
Mnemonics
"Mnemonic" is defined by Merriam-Webster OnLine Search as "assisting or intended to assist memory".
These are normally abbreviations of the instruction in a form that is supposed to help you remember
what the instruction is and does. MOV is for "Move", or transfer data from one point to another.
JMP is to jump to a location in the program. Some people (including authors of books) incorrectly
call them "opcodes", but opcodes are the numeric versions that only the computer really understands.
The MOV instruction can be a number of different opcodes, depending on the addressing mode, etc.
Operands
Operands are the extra information that the instruction needs to do its work. Of course,
not all instructions use operands. Some instructions have one operand, others have two operands.
Operands have no meaning without an instruction mnemonic, and
cannot exist in isolation.
When an instruction has two operands, they are in the format of:
destination, source
although the first one can also be a source as well as a destination. The destination and the source
must be the same size! You cannot mix data of two different sizes in one of these instructions. You must first convert
the smaller one to match the larger one.
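One way to do that conversion, shown here as a sketch rather than the only approach, is the MOVZX (zero-extend) and MOVSX (sign-extend) pair of instructions, which are built to take a smaller source than destination. The variable classSize is a made-up name, and the MASM32 runtime include is assumed.

```asm
include \masm32\include\masm32rt.inc

.data
    classSize db 200            ; 8-bit value, bit pattern 0C8h

.code
start:
    ; mov ebx, classSize        ; rejected: 32-bit destination, 8-bit source
    movzx ebx, classSize        ; unsigned view: EBX = 000000C8h (200)
    movsx esi, classSize        ; signed view:   ESI = 0FFFFFFC8h (-56)

    print str$(ebx), 13, 10     ; prints 200
    exit
end start
```

The same byte gives two different 32-bit values, which is why you must know whether your data is signed before you widen it.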
If you want to put a copy of the data in the EAX register into the EBX register, it is:
mov EBX, EAX
EAX is the source and EBX is the destination.
If I want to add the values in the EAX and EBX registers together and store the result in
EAX, it would be in the form of:
add EAX, EBX
NOTE: The destination must be one of the two registers holding the data. The Intel
chip does not allow you to put the result into a third register. You would have to do the
addition and then have another instruction to move the sum to the third register.
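The two-step pattern from the note can be sketched as follows. ESI is used as the third register only because the print macro (assumed from the MASM32 runtime include) may change EAX, ECX, and EDX.

```asm
include \masm32\include\masm32rt.inc

.code
start:
    mov eax, 10
    mov ebx, 32
    add eax, ebx                ; EAX = EAX + EBX = 42, EBX unchanged
    mov esi, eax                ; second instruction copies the sum to a third register

    print str$(esi), 13, 10     ; prints 42
    exit
end start
```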
Comments
The comments do not affect the performance of the program. They exist to communicate.
This is where you tell the reader what you are doing. If others cannot understand your
code, your program is worthless. Comments should add value to the code, not
just repeat verbatim what the instruction does!
Bad Comment
inc EAX ;increment the EAX register
Good Comment
inc EAX ;increment the number of students processed
Notes
In the .code section, we must call the function to exit the process. If we do not, the computer will continue to
execute whatever comes next in memory. This is not normally a good thing and usually crashes the program.
In some programs, there is a way to create what looks like one instruction that is really many instructions.
These are called macros. We will be using ones that come with MASM32. In C and C++,
you can create a macro with the "#define" directive.
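In MASM, you can also write your own macros with the MACRO and ENDM directives. The macro below, addThree, is a hypothetical example and not part of MASM32; it shows one line in the source expanding into three instructions.

```asm
include \masm32\include\masm32rt.inc

; Hypothetical macro: dest = first + second + third
addThree MACRO dest, first, second, third
    mov dest, first
    add dest, second
    add dest, third
ENDM

.code
start:
    addThree ebx, 10, 20, 12    ; expands to one MOV and two ADDs
    print str$(ebx), 13, 10     ; prints 42
    exit
end start
```

The print and exit lines are themselves macros supplied by MASM32, so this short program uses both kinds.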
At other times, we will be using library functions to do some of our work, especially input and output!
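As a sketch of library input and output, the program below calls the StdOut and StdIn procedures from the MASM32 library through invoke. The data names are invented for the example, and the 64-byte buffer size is an arbitrary choice.

```asm
include \masm32\include\masm32rt.inc

.data
    namePrompt  db "What is your name? ", 0
    helloText   db "Hello, ", 0
    lineBreak   db 13, 10, 0

.data?
    nameBuffer  db 64 dup(?)            ; room for the typed text

.code
start:
    invoke StdOut, ADDR namePrompt      ; write a zero-terminated string
    invoke StdIn, ADDR nameBuffer, 64   ; read a line into the buffer
    invoke StdOut, ADDR helloText
    invoke StdOut, ADDR nameBuffer      ; echo what was typed
    invoke StdOut, ADDR lineBreak
    exit
end start
```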
Type Of Program
There are two types of programs, console and Windows. MASM32 came with samples of each.
Console (hello.asm)
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
; Build this with the "Project" menu using
; "Console Assemble and Link"
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
.486 ; create 32 bit code
.model flat, stdcall ; 32 bit memory model
option casemap :none ; case sensitive
include \masm32\include\windows.inc ; always first
include \masm32\macros\macros.asm ; MASM support macros
; -----------------------------------------------------------------
; include files that have MASM format prototypes for function calls
; -----------------------------------------------------------------
include \masm32\include\masm32.inc
include \masm32\include\gdi32.inc
include \masm32\include\user32.inc
include \masm32\include\kernel32.inc
; ------------------------------------------------
; Library files that have definitions for function
; exports and tested reliable prebuilt code.
; ------------------------------------------------
includelib \masm32\lib\masm32.lib
includelib \masm32\lib\gdi32.lib
includelib \masm32\lib\user32.lib
includelib \masm32\lib\kernel32.lib
.code ; Tell MASM where the code starts
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start: ; The CODE entry point to the program
print chr$("Hey, this actually works.",13,10)
exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start ; Tell MASM where the program ends
Batch File To Create A Console Program
@echo off
if not exist rsrc.rc goto over1
\masm32\bin\rc /v rsrc.rc
\masm32\bin\cvtres /machine:ix86 rsrc.res
:over1
if exist "hello.obj" del "hello.obj"
if exist "hello.exe" del "hello.exe"
\masm32\bin\ml /c /coff "hello.asm"
if errorlevel 1 goto errasm
if not exist rsrc.obj goto nores
\masm32\bin\Link /SUBSYSTEM:CONSOLE "hello.obj" rsrc.res
if errorlevel 1 goto errlink
dir "hello.*"
goto TheEnd
:nores
\masm32\bin\Link /SUBSYSTEM:CONSOLE "hello.obj"
if errorlevel 1 goto errlink
dir "hello.*"
goto TheEnd
:errlink
echo _
echo Link error
goto TheEnd
:errasm
echo _
echo Assembly Error
goto TheEnd
:TheEnd
pause
Windows Program (minimum.asm)
; #########################################################################
.386
.model flat, stdcall
option casemap :none ; case sensitive
; #########################################################################
include \masm32\include\windows.inc
include \masm32\include\user32.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\user32.lib
includelib \masm32\lib\kernel32.lib
; #########################################################################
.code
start:
jmp @F
szDlgTitle db "Minimum MASM",0
szMsg db " --- Assembler Pure and Simple --- ",0
@@:
push MB_OK
push offset szDlgTitle
push offset szMsg
push 0
call MessageBox
push 0
call ExitProcess
; --------------------------------------------------------
; The following are the same function calls using MASM
; "invoke" syntax. It is clearer code, it is type checked
; against a function prototype and it is less error prone.
; --------------------------------------------------------
; invoke MessageBox,0,ADDR szMsg,ADDR szDlgTitle,MB_OK
; invoke ExitProcess,0
end start
Batch File To Create Windows Program
@echo off
if not exist rsrc.rc goto over1
\masm32\bin\rc /v rsrc.rc
\masm32\bin\cvtres /machine:ix86 rsrc.res
:over1
if exist "minimum.obj" del "minimum.obj"
if exist "minimum.exe" del "minimum.exe"
\masm32\bin\ml /c /coff "minimum.asm"
if errorlevel 1 goto errasm
if not exist rsrc.obj goto nores
\masm32\bin\Link /SUBSYSTEM:WINDOWS "minimum.obj" rsrc.res
if errorlevel 1 goto errlink
dir "minimum.*"
goto TheEnd
:nores
\masm32\bin\Link /SUBSYSTEM:WINDOWS "minimum.obj"
if errorlevel 1 goto errlink
dir "minimum.*"
goto TheEnd
:errlink
echo _
echo Link error
goto TheEnd
:errasm
echo _
echo Assembly Error
goto TheEnd
:TheEnd
pause