CMSC 313 Lecture 1


Lecture 1 Introduction and Number systems


You should be familiar with programming.
You edit your source code and keep it on the disk.
A compiler reads your source code and typically converts
high level language to assembly language as another file on the disk.
The assembler reads the assembly language and produces a
binary object file with machine instructions.
The linker combines object files into an executable file, and the
loader reads that file into memory to create the executable image.

This course is to provide a basic understanding of how computers
operate internally, e.g. computer architecture and assembly language.
Technically: The computer does not run a "program", the computer
has an operating system that runs a "process". A process starts
with loading the executable image of a program in memory.
A process sets up "segments" of memory with:
A ".text"   segment with computer instructions
A ".data"   segment with initialized data
A ".rodata" segment with initialized data, read only
A ".bss"    segment for uninitialized variables and arrays, block started by symbol
A "stack"   for pushing and popping values
A "heap"    for dynamically getting more memory
And then the process is executed by having the program
address register set to the first executable instruction
in the process. You will be directly using segments in
your assembly language programs.

Computers store bits, binary digits, in memory and we usually
read the bits, four at a time as hexadecimal. The basic unit
of storage in the computer is two hex digits, eight bits, a byte.
The data may be integers, floating point or characters.
We start this course with a thorough understanding of numbers.
For Intel assembly language: two bytes are a word,
four bytes are a double word, eight bytes are a quad word.
Today's computers are almost all 64 bit machines, so
we will be programming using 64 bit quad words.

Numbers are represented as the coefficients of powers of a base.
(in plain text, we use "^" to mean, raise to power or exponentiation)

With no extra base indication, expect decimal numbers:

         12.34   is a representation of

  1*10^1 + 2*10^0 + 3*10^-1 + 4*10^-2  or

     10
      2
       .3
    +  .04
    ------
     12.34


Binary numbers, in NASM assembly language, have a trailing B or b.

     101.11B  is a representation of

  1*2^2 + 0*2^1 + 1*2^0 + 1*2^-1 + 1*2^-2   or

     4
     0
     1
      .5        (you may compute 2^-n or look up in table below)
   +  .25
   ------
     5.75
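
The two expansions above follow one rule: each digit is multiplied by
the base raised to the digit's position. A minimal sketch in Python
(the language of factorial.py later in this lecture); the function name
positional_value is made up for this example and is not one of the
course files. Fraction is used so the sum is exact.

```python
from fractions import Fraction

def positional_value(digits, base):
    """Sum digit * base^position for a digit string such as '101.11'."""
    whole, _, frac = digits.partition(".")
    total = Fraction(0)
    # integer digits: the rightmost digit has power 0
    for power, ch in enumerate(reversed(whole)):
        total += int(ch, base) * Fraction(base) ** power
    # fraction digits: the first digit after the point has power -1
    for power, ch in enumerate(frac, start=1):
        total += int(ch, base) * Fraction(base) ** -power
    return total

print(float(positional_value("12.34", 10)))   # the decimal example
print(float(positional_value("101.11", 2)))   # the binary example, 5.75
```

The same function works for hexadecimal digit strings by passing base 16.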

Converting a decimal number to binary may be accomplished as follows:

   Convert  12.34  from decimal to binary

   Integer part                      Fraction part
        quotient remainder                integer fraction
   12/2 =   6       0              .34*2.0 =      0.68  use fmul, fistp get 0
    6/2 =   3       0              .68*2.0 =      1.36  use fmul, fistp get 1
    3/2 =   1       1              .36*2.0 =      0.72
    1/2 =   0       1              .72*2.0 =      1.44
    done                           .44*2.0 =      0.88
    read up  1100                  .88*2.0 =      1.76
                                   .76*2.0 =      1.52
                                   .52*2.0 =      1.04
                                   quit
                                   read down   .01010111
    answer is  1100.01010111
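
The table above can be sketched in Python. This is only an illustration,
not the course's convert.c; the name to_binary and the max_bits limit
are assumptions made for this example. The fraction is passed as an
exact Fraction (34/100) so that no floating point error creeps in.

```python
from fractions import Fraction

def to_binary(integer_part, fraction_part, max_bits=8):
    """Convert integer_part + fraction_part (0 <= fraction_part < 1) to binary text."""
    # Integer part: divide by 2, collect the remainders, read them upward.
    int_bits = ""
    n = integer_part
    while n > 0:
        n, remainder = divmod(n, 2)
        int_bits = str(remainder) + int_bits
    # Fraction part: multiply by 2, the integer that appears is the next bit.
    frac_bits = ""
    f = Fraction(fraction_part)
    while f != 0 and len(frac_bits) < max_bits:
        f *= 2
        bit = int(f)              # truncate, like fistp in truncate mode
        frac_bits += str(bit)
        f -= bit
    return (int_bits or "0") + "." + (frac_bits or "0")

print(to_binary(12, Fraction(34, 100)))   # 1100.01010111
```

Stopping after max_bits bits corresponds to the "quit" line in the table;
0.34 has no finite binary expansion, so the answer is an approximation.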

convert.c "C" program sample to do conversions
"C" program output


	      ; in nasm assembly language make fistp truncate rather than round
	fstcw  WORD [cwd]               ; store the FPU control word
	or     WORD [cwd],0x0c00        ; set rounding mode to "truncate"
	fldcw  WORD [cwd]               ; load updated control word

hint: nasm  storing a floating point fraction into an integer location, fistp
  0.68  becomes 0
  1.36  becomes 1
  0.72  becomes 0
  1.44  becomes 1
                  integer 34 to 34.0,  34.0/100.0 = .34 floating point
                  then multiply by 2.0


  Powers of 2
                   Decimal
                 n         -n
                2    n    2
                 1   0   1.0 
                 2   1   0.5 
                 4   2   0.25 
                 8   3   0.125 
                16   4   0.0625 
                32   5   0.03125 
                64   6   0.015625 
               128   7   0.0078125 
               256   8   0.00390625
               512   9   0.001953125
              1024  10   0.0009765625 
              2048  11   0.00048828125 
              4096  12   0.000244140625 
              8192  13   0.0001220703125 
             16384  14   0.00006103515625 
             32768  15   0.000030517578125 
             65536  16   0.0000152587890625 

For binary to decimal:

   2^3  2^2  2^1  2^0  2^-1  2^-2  2^-3
    1    1    1    1 .  1     1     1

    8 +  4 +  2 +  1 + .5 +  .25 + .125 = 15.875

 
                   Binary
                 n         -n
                2    n    2
                 1   0   1.0 
                10   1   0.1
               100   2   0.01 
              1000   3   0.001 
             10000   4   0.0001 
            100000   5   0.00001 
           1000000   6   0.000001 
          10000000   7   0.0000001 
         100000000   8   0.00000001
        1000000000   9   0.000000001
       10000000000  10   0.0000000001 
      100000000000  11   0.00000000001 
     1000000000000  12   0.000000000001 
    10000000000000  13   0.0000000000001 
   100000000000000  14   0.00000000000001 
  1000000000000000  15   0.000000000000001 
 10000000000000000  16   0.0000000000000001 


                  Hexadecimal
                 n         -n
                2    n    2
                 1   0   1.0 
                 2   1   0.8
                 4   2   0.4 
                 8   3   0.2 
                10   4   0.1 
                20   5   0.08 
                40   6   0.04 
                80   7   0.02 
               100   8   0.01
               200   9   0.008
               400  10   0.004 
               800  11   0.002 
              1000  12   0.001 
              2000  13   0.0008 
              4000  14   0.0004 
              8000  15   0.0002 
             10000  16   0.0001 

Decimal to Hexadecimal to Binary, 4 bits per hex digit
   0         0            0000
   1         1            0001
   2         2            0010
   3         3            0011
   4         4            0100
   5         5            0101
   6         6            0110
   7         7            0111
   8         8            1000
   9         9            1001
  10         A            1010
  11         B            1011
  12         C            1100
  13         D            1101
  14         E            1110
  15         F            1111
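
Because each hex digit stands for exactly four bits, conversion between
hexadecimal and binary is digit by digit. A small Python sketch; the
function names hex_to_binary and binary_to_hex are made up for this
example.

```python
def hex_to_binary(hex_digits):
    """Expand each hex digit into its 4-bit group."""
    return " ".join(format(int(d, 16), "04b") for d in hex_digits)

def binary_to_hex(bits):
    """Group bits four at a time from the right and name each group."""
    bits = bits.zfill((len(bits) + 3) // 4 * 4)   # pad on the left to a multiple of 4
    return "".join(format(int(bits[i:i + 4], 2), "X")
                   for i in range(0, len(bits), 4))

print(hex_to_binary("3F"))        # 0011 1111
print(binary_to_hex("1100101"))   # 65
```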
             
        n                       n
    n  2  hexadecimal          2  decimal  approx  notation
   10             400               1,024   10^3   K kilo
   20          100000           1,048,576   10^6   M mega
   30        40000000       1,073,741,824   10^9   G giga
   40     10000000000   1,099,511,627,776   10^12  T tera

The three representations of negative numbers that have been
used in computers are  twos complement,  ones complement  and
sign magnitude. In order to represent negative numbers, it must
be known where the "sign" bit is placed. All modern binary
computers use the leftmost bit of the computer word as a sign bit.

The examples below use a 4-bit register to show all possible
values for the three representations.

 decimal   twos complement  ones complement  sign magnitude
       0      0000            0000             0000
       1      0001            0001             0001
       2      0010            0010             0010
       3      0011            0011             0011
       4      0100            0100             0100
       5      0101            0101             0101
       6      0110            0110             0110
       7      0111            0111             0111 all same for positive
      -7      1001            1000             1111
      -6      1010            1001             1110
      -5      1011            1010             1101
      -4      1100            1011             1100
      -3      1101            1100             1011
      -2      1110            1101             1010
      -1      1111            1110             1001
          -8  1000        -0  1111         -0  1000
                  ^           /                ^||| 
                   \_ add 1 _/          sign__/ --- magnitude

To get the sign magnitude, convert the decimal to binary and
place a zero in the sign bit for positive, place a one in the
sign bit for negative.

To get the ones complement of a negative number, convert its magnitude
to binary, including leading zeros, then invert every bit. 1->0, 0->1.

To get the twos complement, get the ones complement and add 1.
(Throw away any bits that are outside of the register)
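
The three recipes above can be sketched in Python for a 4-bit register.
The function names and the bits parameter are made up for this example;
each function returns the bit pattern as text.

```python
def sign_magnitude(value, bits=4):
    """Sign bit on the left, then the magnitude in the remaining bits."""
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), "0{}b".format(bits - 1))

def ones_complement(value, bits=4):
    """Magnitude with leading zeros; every bit inverted if negative."""
    pattern = format(abs(value), "0{}b".format(bits))
    if value < 0:
        pattern = "".join("1" if b == "0" else "0" for b in pattern)
    return pattern

def twos_complement(value, bits=4):
    """Ones complement plus 1, throwing away bits outside the register."""
    if value >= 0:
        return format(value, "0{}b".format(bits))
    inverted = int(ones_complement(value, bits), 2)
    return format((inverted + 1) % (1 << bits), "0{}b".format(bits))

for v in (7, -7, -1):
    print(v, twos_complement(v), ones_complement(v), sign_magnitude(v))
```

The sketch does not check that the value fits in the register; -8 fits
only in twos complement.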

It may seem silly to have a negative zero, but twos complement has
its own oddity: negating -8 in a 4-bit register gives -8 again,
and -(-8) = -8 is mathematically incorrect.

Then, if you must use Roman Numerals roman.shtml
or  roman_numeral.shtml

Size in bytes, names and power of 10 approximate power of 2 power.shtml

IEEE Floating point formats
Almost all Numerical Computation arithmetic is performed using
IEEE 754-1985 Standard for Binary Floating-Point Arithmetic.
The two formats that we deal with in practice are the 32 bit and
64 bit formats.

IEEE Floating-Point numbers are stored as follows:
The single format 32 bit has
    1 bit for sign,  8 bits for exponent, 23 bits for fraction
The double format 64 bit has
    1 bit for sign, 11 bits for exponent, 52 bits for fraction

There is actually a '1' in the 24th and 53rd bit to the left
of the fraction that is not stored. The fraction including
the non stored bit is called a significand.

The exponent is stored as a biased value, not a signed value.
The 8-bit has 127 added, the 11-bit has 1023 added.
A few values of the exponent are "stolen" for
special values, +/- infinity, not a number, etc.

Floating point numbers are sign magnitude. Invert the sign bit to negate.

Some example numbers and their bit patterns:

   decimal
stored hexadecimal sign exponent  fraction                 significand 
                   bit                                     in binary
                                 The "1" is not stored 
                                 |                                   biased    
                    31  30....23  22....................0            exponent
   2.0
40 00 00 00          0  10000000  00000000000000000000000  1.0   * 2^(128-127) 

   1.0
3F 80 00 00          0  01111111  00000000000000000000000  1.0   * 2^(127-127) 

   0.5
3F 00 00 00          0  01111110  00000000000000000000000  1.0   * 2^(126-127)

   0.75
3F 40 00 00          0  01111110  10000000000000000000000  1.1   * 2^(126-127)

   0.99999994
3F 7F FF FF          0  01111110  11111111111111111111111  1.1111* 2^(126-127)

   0.1
3D CC CC CD          0  01111011  10011001100110011001101  1.1001* 2^(123-127)
 

                                          The "1" is not stored 
                                          | 
                          63  62...... 52  51 .....  0
   2.0
40 00 00 00 00 00 00 00    0  10000000000  000 ... 000  1.0    * 2^(1024-1023)

   1.0
3F F0 00 00 00 00 00 00    0  01111111111  000 ... 000  1.0    * 2^(1023-1023)

   0.5
3F E0 00 00 00 00 00 00    0  01111111110  000 ... 000  1.0    * 2^(1022-1023)

   0.75
3F E8 00 00 00 00 00 00    0  01111111110  100 ... 000  1.1    * 2^(1022-1023)

   0.9999999999999999
3F EF FF FF FF FF FF FF    0  01111111110  111 ...      1.11111* 2^(1022-1023)

   0.1
3F B9 99 99 99 99 99 9A    0  01111111011  10011..1010  1.10011* 2^(1019-1023)
                                                                           |
                        sign   exponent      fraction                      |
                                   subtract the bias from the stored exponent
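
The bit patterns in the tables above can be checked with Python's
standard struct module. A minimal sketch; the function names
decode_single and double_hex are made up for this example.

```python
import struct

def decode_single(x):
    """Store x as IEEE single precision and split the 32 bits into fields."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31                   # bit 31
    exponent = (word >> 23) & 0xFF      # bits 30..23, stored with bias 127
    fraction = word & 0x7FFFFF          # bits 22..0, the leading 1 is not stored
    return word, sign, exponent, fraction

def double_hex(x):
    """The 64-bit pattern of x as 16 hexadecimal digits."""
    return struct.pack(">d", x).hex().upper()

for x in (2.0, 1.0, 0.5, 0.75, 0.1):
    word, sign, exponent, fraction = decode_single(x)
    print("{:<5} {:08X}  {}  {:08b}  {:023b}  2^({}-127)".format(
        x, word, sign, exponent, fraction, exponent))
    print("      " + double_hex(x))
```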

Note that in the single format an integer in the range 0 to 2^24 may be
represented exactly, because the significand has 24 bits. A power of two
times such an integer may also be represented exactly, as long as the
result stays within the exponent range -126 to +127. Numbers such as
0.1, 0.3, 1.0/5.0, 1.0/9.0 are represented approximately.
0.75 is 3/4 which is exact.
Some languages are careful to represent approximated numbers
accurate to plus or minus the least significant bit.
Other languages may be less accurate.

The operations of add, subtract, multiply and divide are defined as:

  Given   x1 = 2^e1 * f1
          x2 = 2^e2 * f2  and e2 <= e1

  x1 + x2 = 2^e1 *(f1 + 2^-(e1-e2) * f2)  f2 is shifted then added to f1

  x1 - x2 = 2^e1 *(f1 - 2^-(e1-e2) * f2)  f2 is shifted then subtracted from f1

  x1 * x2 = 2^(e1+e2) * f1 * f2

  x1 / x2 = 2^(e1-e2) * (f1 / f2)

  an additional operation is usually needed, normalization.
  if the resulting "fraction" has digits to the left of the binary
  point, then the fraction is shifted right and one is added to
  the exponent for each bit shifted until the result is a fraction.
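
The add and multiply formulas, with the normalization step, can be
sketched in Python on (exponent, fraction) pairs. This is an
illustration only: the names normalize, fp_add and fp_mul are made up,
the fraction is an exact Fraction, and rounding, left shifts after
cancellation and special values are all ignored.

```python
from fractions import Fraction

def normalize(e, f):
    """Shift f right, adding one to e per shift, until f is a fraction below 1."""
    while abs(f) >= 1:
        f /= 2
        e += 1
    return e, f

def fp_add(x1, x2):
    (e1, f1), (e2, f2) = x1, x2
    if e2 > e1:                        # arrange that e2 <= e1
        (e1, f1), (e2, f2) = (e2, f2), (e1, f1)
    shifted = f2 / 2 ** (e1 - e2)      # f2 is shifted, then added to f1
    return normalize(e1, f1 + shifted)

def fp_mul(x1, x2):
    (e1, f1), (e2, f2) = x1, x2
    return normalize(e1 + e2, f1 * f2)

a = (1, Fraction(3, 4))    # 2^1 * 0.75 = 1.5
b = (0, Fraction(1, 2))    # 2^0 * 0.5  = 0.5
print(fp_add(a, b))        # 1.5 + 0.5 = 2.0 = 2^2 * 0.5
print(fp_mul(a, a))        # 1.5 * 1.5 = 2.25 = 2^2 * 0.5625
```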

IEEE 754 Floating Point Standard

Strings of characters
We will use one of many character representations for
character strings, ASCII, one byte per character in a string.

symbol or name                            symbol or key stroke
    key stroke
       hexadecimal                            hexadecimal
          decimal                                 decimal
  
NUL ^@ 00   0   Spc 20  32   @   40  64   `   60  96
SOH ^A 01   1   !   21  33   A   41  65   a   61  97
STX ^B 02   2   "   22  34   B   42  66   b   62  98
ETX ^C 03   3   #   23  35   C   43  67   c   63  99
EOT ^D 04   4   $   24  36   D   44  68   d   64  100
ENQ ^E 05   5   %   25  37   E   45  69   e   65  101
ACK ^F 06   6   &   26  38   F   46  70   f   66  102
BEL ^G 07   7   '   27  39   G   47  71   g   67  103
BS  ^H 08   8   (   28  40   H   48  72   h   68  104
TAB ^I 09   9   )   29  41   I   49  73   i   69  105
LF  ^J 0A  10   *   2A  42   J   4A  74   j   6A  106
VT  ^K 0B  11   +   2B  43   K   4B  75   k   6B  107
FF  ^L 0C  12   ,   2C  44   L   4C  76   l   6C  108
CR  ^M 0D  13   -   2D  45   M   4D  77   m   6D  109
SO  ^N 0E  14   .   2E  46   N   4E  78   n   6E  110
SI  ^O 0F  15   /   2F  47   O   4F  79   o   6F  111
DLE ^P 10  16   0   30  48   P   50  80   p   70  112
DC1 ^Q 11  17   1   31  49   Q   51  81   q   71  113
DC2 ^R 12  18   2   32  50   R   52  82   r   72  114
DC3 ^S 13  19   3   33  51   S   53  83   s   73  115
DC4 ^T 14  20   4   34  52   T   54  84   t   74  116
NAK ^U 15  21   5   35  53   U   55  85   u   75  117
SYN ^V 16  22   6   36  54   V   56  86   v   76  118
ETB ^W 17  23   7   37  55   W   57  87   w   77  119
CAN ^X 18  24   8   38  56   X   58  88   x   78  120
EM  ^Y 19  25   9   39  57   Y   59  89   y   79  121
SUB ^Z 1A  26   :   3A  58   Z   5A  90   z   7A  122
ESC ^[ 1B  27   ;   3B  59   [   5B  91   {   7B  123
FS  ^\ 1C  28   <   3C  60   \   5C  92   |   7C  124
GS  ^] 1D  29   =   3D  61   ]   5D  93   }   7D  125
RS  ^^ 1E  30   >   3E  62   ^   5E  94   ~   7E  126
US  ^_ 1F  31   ?   3F  63   _   5F  95   DEL 7F  127
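
The table can be explored in Python with ord. A small sketch; the
function names are made up for this example. The control key column
follows a simple rule: holding Ctrl keeps only the low 5 bits of the
key's code.

```python
def ascii_codes(ch):
    """Return (hexadecimal, decimal) for one character, as in the table."""
    code = ord(ch)
    return format(code, "02X"), code

def control_code(key):
    """^key is the key's code with only the low 5 bits kept, e.g. ^J is LF."""
    return ord(key) & 0x1F

def c_string_bytes(text):
    """One byte per character, plus the terminating 0 that a C string needs."""
    return text.encode("ascii") + b"\x00"

print(ascii_codes("A"))                      # ('41', 65)
print(format(control_code("J"), "02X"))      # 0A
print(c_string_bytes("Hello world").hex())   # 48656c6c6f20776f726c6400
```

The last line gives the same bytes that the hello_64.lst listing later
in this lecture shows for msg.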



Optional future installation on your personal computer
Throughout this course, we will be writing some assembly language.
This will be for an Intel or Intel compatible computer, e.g. AMD.
The assembler program is "nasm" and can be run on
linux.gl.umbc.edu or on your computer.
If you are running linux on your computer, the command

sudo apt-get install nasm

will install nasm on your computer.

Throughout this course we will work with digital logic and
cover the basic VHDL and verilog languages for describing
digital logic. There are free simulators that will
simulate the operation of your digital logic for both languages,
and a graphical input simulator, logisim.
The commands for installing these on linux are:

sudo apt-get install freehdl
or use Makefile_vhdl from my download directory on linux.gl.umbc.edu

sudo apt-get install iverilog
or use Makefile_verilog from my download directory on linux.gl.umbc.edu

from  www.cburch.com/logisim/index.html   get logisim 
or use Makefile_logisim from my download directory on linux.gl.umbc.edu

These or similar programs may be available for installing
on some versions of Microsoft Windows or Mac OSX.

We will use 64-bit in this course, to expand your options.
In "C", int remains a 32-bit number, even on 64-bit computers with
64-bit operating systems. Some 64-bit computers are still
programmed as 32-bit computers.
test_factorial.c uses int, outputs:
test_factorial.c using int, note overflow
 0!=1 
 1!=1 
 2!=2 
 3!=6 
 4!=24 
 5!=120 
 6!=720 
 7!=5040 
 8!=40320 
 9!=362880 
10!=3628800 
11!=39916800 
12!=479001600 
13!=1932053504   BAD
14!=1278945280 
15!=2004310016 
16!=2004189184 
17!=-288522240 
18!=-898433024 
test_factorial_long.c uses long int, outputs:
test_factorial_long.c using long int, note overflow
 0!=1 
 1!=1 
 2!=2 
 3!=6 
 4!=24 
 5!=120 
 6!=720 
 7!=5040 
 8!=40320 
 9!=362880 
10!=3628800 
11!=39916800 
12!=479001600 
13!=6227020800 
14!=87178291200 
15!=1307674368000 
16!=20922789888000 
17!=355687428096000 
18!=6402373705728000 
19!=121645100408832000 
20!=2432902008176640000 
21!=-4249290049419214848  BAD 
22!=-1250660718674968576 

Well, 13! wrong vs 21! wrong may not be a big deal.
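
The wrong values above are what remains when only the low 32 or 64 bits
of the true product are kept and read as a twos complement number.
A Python sketch that imitates this; the names wrap_signed and
factorial_wrapped are made up for this example. (The C standard does not
promise this wraparound for signed overflow; it is simply what the
outputs above show.)

```python
def wrap_signed(value, bits):
    """Keep the low `bits` bits and read them as a twos complement number."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):     # sign bit is set
        value -= 1 << bits
    return value

def factorial_wrapped(n, bits):
    """n! computed in a register that is only `bits` wide."""
    result = 1
    for i in range(2, n + 1):
        result = wrap_signed(result * i, bits)
    return result

print(factorial_wrapped(12, 32))   # 479001600, still correct
print(factorial_wrapped(13, 32))   # 1932053504, BAD
print(factorial_wrapped(21, 64))   # -4249290049419214848, BAD
```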

factorial.py by default, outputs:
factorial(0)= 1
factorial(1)= 1
factorial(2)= 2
factorial(3)= 6
factorial(4)= 24
factorial(52)= 80658175170943878571660636856403766975289505440883277824000000000000

Yet, 32-bit signed numbers can only index 2GB of RAM; 64-bit numbers are
needed for computers with 4GB, 8GB, 16GB, 32GB etc. of RAM, available today.
95% of all supercomputers, called HPC, are 64-bit running Linux.

A first quick look at assembly language:

In a high level language, A = B + C;
You think of adding the value of C to B and storing the result in A.

In assembly language you think of A, B, and C as addresses.
You load the contents of address B into a register.
You add the contents of address C to that register.
You store that register at address A.
Well, Intel uses the word  move, typed  mov, for load and store.
   mov rax,[B]    ; semicolon  starts a comment  (no end of statement symbol)
   add rax,[C]    ; the [ ] says  "contents of address"
   mov [A],rax    ; the  mov  is from second field to first field


; hello_64.asm    print a string using printf
; Assemble:	  nasm -f elf64 -l hello_64.lst  hello_64.asm
; Link:		  gcc -m64 -o hello_64  hello_64.o
; Run:		  ./hello_64 > hello_64.out
; Output:	  cat hello_64.out

; Equivalent C code
; // hello.c
; #include <stdio.h>
; int main()
; {
;   char msg[] = "Hello world";
;   printf("%s\n",msg);
;   return 0;
; }
	
; Declare needed C  functions
        extern	printf		; the C function, to be called

        section .data		; Data section, initialized variables
msg:	db "Hello world", 0	; C string needs 0
fmt:    db "%s", 10, 0          ; The printf format, "\n",'0'

        section .text           ; Code section.

        global main		; the standard gcc entry point
main:				; the program label for the entry point
        push    rbp		; set up stack frame, must be aligned
	
	mov	rdi,fmt         ; address of format, standard register rdi
	mov	rsi,msg         ; address of first data, standard register rsi
	mov	rax,0		; or can be  xor  rax,rax
        call    printf		; Call C function
                                ; printf can mess up many registers
                                ; save and reload registers with debug print

	pop	rbp		; restore stack

	mov	rax,0		; normal, no error, return value
	ret			; return

hello_64.lst  with many comments removed
    20                                          section .data		; Data section, initialized variables
    21 00000000 48656C6C6F20776F72-     msg:	db "Hello world", 0	; C string needs 0
    22 00000009 6C6400             
    23 0000000C 25730A00                fmt:    db "%s", 10, 0          ; The printf format, "\n",'0'
    24                                  
    25                                          section .text           ; Code section.
    26                                  
    27                                          global main		; the standard gcc entry point
    28                                  main:				; the program label for the entry point
    29 00000000 55                              push    rbp		; set up stack frame, must be aligned
    30                                  	
    31 00000001 48BF-                   	mov	rdi,fmt         ; address of format, standard register rdi
    32 00000003 [0C00000000000000] 
    33 0000000B 48BE-                   	mov	rsi,msg         ; address of first data, standard register rsi
    34 0000000D [0000000000000000] 
    35 00000015 B800000000              	mov	rax,0		; or can be  xor  rax,rax
    36 0000001A E8(00000000)                    call    printf		; Call C function
    37                                  
    38 0000001F 5D                      	pop	rbp		; restore stack
    39                                  
    40 00000020 B800000000              	mov	rax,0		; normal, no error, return value
    41 00000025 C3                      	ret			; return



You do not need C, computers come with BIOS.

; bios1.asm  use BIOS interrupt for printing
; Compiled and run using one Linux command line   
;  nasm -f elf64 bios1.asm && ld bios1.o && ./a.out   
	global  _start        ; standard ld main program

	section .text     
_start:

print1: mov rax,[ahal]
	int 10h		; write character to screen.
	mov rax,[ret]
	int 10h		; write new line '\n'
	mov rax,0
	ret
ahal:	dq 0x0E28	 ; output to screen ah has 0E
ret:	dq 0x0E0A	 ; '\n'
; end bios1.asm



First homework assigned
on web, www.cs.umbc.edu/~squire/cs313_hw.shtml
Due in one week. Best to do right after lecture.
