UMBC CMSC313

Project Extra Credit: Character Counting Due: 7 May 4 May

Requirements Specification

Create the source file jdoe1pec.asm, assemble it, link it, and run it. When it runs correctly, submit it using Blackboard. (Remember that jdoe1 is supposed to be the ID that you use to log onto the GL computers, so that each file name is unique! Do not use jdoe1 unless that is YOUR ID!!)

Description

For this project you will be working with a text file of unknown size. You are to read the file into a buffer, count how many times each character occurs, and print the counts for the letters of the alphabet and the digits. Convert all lowercase letters to uppercase before counting them. You get the name of the file from the user.

Set up a buffer of 8192 bytes (8K), since this is more efficient for the operating system. Each read from the file returns the number of bytes actually read. If it is less than 8192, you have reached the end of the file; a return value of 0 means there was nothing left to read.
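
For reference, here is a minimal C sketch of the read loop described above. It is not the assembly you must write, only the same logic in C. The helper name total_bytes is hypothetical, and the sketch only totals the bytes so that the loop structure is easy to see.

```c
#include <fcntl.h>
#include <unistd.h>

#define BUFSIZE 8192   /* the 8K buffer size from the assignment */

/* Hypothetical helper: returns the total number of bytes in the file,
   or -1 if the file cannot be opened or a read fails. */
long total_bytes(const char *path)
{
    static unsigned char inBuff[BUFSIZE];
    long total = 0;
    ssize_t n;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* read() returns the number of bytes actually read; 0 means end of file */
    while ((n = read(fd, inBuff, BUFSIZE)) > 0)
        total += n;          /* your program would count characters here */

    close(fd);
    return (n < 0) ? -1 : total;
}
```

Looping until read returns 0 also handles a file whose size is an exact multiple of 8192, at the cost of one extra read.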

Hint: Set up an array of 128 integers and use the ASCII code as an index into the array. (If you wish, you can have an array of 256, if you feel that is a better value.) Make sure your program will work for counts that are over 1000!
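
The hint can be sketched in C as follows. This is only an illustration of the idea, assuming an array of 256 counters and plain ASCII text; the function name count_chars is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: tally each byte of buf into counts[], folding
   lowercase ASCII letters into their uppercase slots first. */
void count_chars(const unsigned char *buf, size_t n, unsigned int counts[256])
{
    for (size_t i = 0; i < n; i++) {
        unsigned char c = buf[i];
        if (c >= 'a' && c <= 'z')
            c -= 'a' - 'A';      /* 'a'..'z' -> 'A'..'Z' (subtract 32) */
        counts[c]++;             /* the character code is the array index */
    }
}
```

Because the counters are unsigned int rather than bytes, counts well over 1000 are not a problem.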

Since you will have counts that go over 255, you cannot store the counts in an array of bytes. You will need an array of words or double words. To get the address of the count to increment, you need the effective address:

eff = base + offset

The offset is calculated as index times the size of the element in the array. Assuming you declared an array named counts:

int counts[ 256 ];

Let's assume the base address of the array is 10000 and that an int is 4 bytes.
The effective address of counts[ 0 ] is 10000 + 0 * 4 or 10000
The effective address of counts[ 1 ] is 10000 + 1 * 4 or 10004
The effective address of counts[ 2 ] is 10000 + 2 * 4 or 10008
The effective address of counts[ 3 ] is 10000 + 3 * 4 or 10012
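
The same arithmetic can be written as a small C sketch. Both function names are hypothetical; the second one simply checks that the formula agrees with the address the compiler computes for counts[ i ].

```c
#include <stddef.h>
#include <stdint.h>

/* eff = base + index * size, the same arithmetic as the examples above */
uintptr_t effective_address(uintptr_t base, size_t index, size_t size)
{
    return base + index * size;
}

/* Hypothetical check: does the formula agree with the compiler's &counts[i]? */
int matches_compiler(const int *counts, size_t i)
{
    return effective_address((uintptr_t)counts, i, sizeof counts[0])
           == (uintptr_t)&counts[i];
}
```

For example, effective_address(10000, 3, 4) gives 10012, matching the last line above.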

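If ebx holds the effective address, you can increment the count of that character with a single instruction. NASM needs the dword size specifier here, because a bare memory operand does not say how large the value is:

inc dword [ ebx ]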

You can use printf, scanf, and any other function in the C library for this project.

Question: Will the counts be the same for text files that contain a language other than English? Why?

Filename Input

Scanf Version

        push    dword prompt
        call    printf
        add     esp, 4

        push    dword fname
        push    dword spec
        call    scanf
        add     esp, 8


(gdb) x/10bc &fname
0x80495ec <fname>:      97 'a'  98 'b'  99 'c'  100 'd' 101 'e' 102 'f' 0 '\0'  0 '\0'
0x80495f4 <fname+8>:    0 '\0'  0 '\0'

System Call Version
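        mov     eax, READ               ; System Call -- read
        mov     ebx, STDIN              ; File descriptor for standard input
        mov     ecx, fname              ; Pointer to the filename buffer
        mov     edx, 25                 ; Read at most 25 bytes
        int     80h

        mov     ebx, eax                ; eax = bytes read, including the nl
        dec     ebx                     ; index of the nl character
        mov     byte [ fname + ebx ], 0 ; get rid of nl character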


(gdb) x/10bc &fname
0x80495d4 <fname>:      97 'a'  98 'b'  99 'c'  100 'd' 101 'e' 102 'f' 10 '\n' 0 '\0'
0x80495dc <fname+8>:    0 '\0'  0 '\0'
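
The newline handling in the system call version can be sketched in C like this. The helper name strip_newline is hypothetical, and the sketch is slightly more cautious than the assembly above: it replaces the last byte only if at least one byte was read and that byte really is a newline.

```c
#include <stddef.h>

/* Hypothetical helper: n is the byte count returned by read(). */
void strip_newline(char *buf, long n)
{
    if (n > 0 && buf[n - 1] == '\n')
        buf[n - 1] = '\0';   /* overwrite the nl with the string terminator */
}
```

This assumes the buffer was zero-filled beforehand, so the name is still terminated when no newline is present.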

Sample output

The output lists the letters and digits with their counts (the user's input here is foo.txt):
Enter filename:    foo.txt
A          413
B          398
C          381
D          379
.
.
.
7          123
8           14
9         1789
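
One way to produce lines in this layout is a printf-style format with a fixed field width. The sketch below is in C; the helper name format_count is hypothetical, and the width of 13 is an assumption chosen to resemble the sample, not a required value.

```c
#include <stdio.h>

/* Hypothetical helper: formats one output line into out (size len).
   The count is right-aligned in an assumed field of 13 characters. */
int format_count(char *out, size_t len, char ch, unsigned int count)
{
    return snprintf(out, len, "%c%13u", ch, count);
}
```

In your assembly program the same format string can be pushed as an argument to printf.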

System Calls Of Interest

;; Under the conventions used in Linux 2.2, the system call number goes in EAX
;; and the parameters are stored, in left-to-right order, in the registers
;; EBX, ECX, EDX, ESI, and EDI respectively.
;; The man pages show:
;;       int creat(const char *pathname, mode_t mode);
;;        mov      eax, dword CREATE    ; System Call -- creat
;;        mov      ebx, myFile1         ; Pointer to filename
;;        mov      ecx, 01EDh           ; Mode -- 755 in octal
;;        int      80h
;;        mov      dword [ fdWrite ], eax ; save the file descriptor

;;       int open(const char *pathname, int flags, mode_t mode);
;;        mov      eax, dword OPEN      ; System Call -- open
;;        mov      ebx, myFile          ; Pointer to filename
;;        mov      ecx, O_RDONLY        ; Flag for Read Only
;;        mov      edx, 0               ; Mode -- not used to open existing file
;;        int      80h
;;        mov      dword [ fdRead ], eax ; save the file descriptor

;;       ssize_t read(int fd, void *buf, size_t count);
;;        mov      eax, READ            ; System Call -- read
;;        mov      ebx, dword [ fdRead ]; File descriptor opened for reading
;;        mov      ecx, dword inBuff    ; Pointer to input buffer
;;        mov      edx, dword BUFSIZE   ; Number of bytes to read
;;        int      80h
;;        mov      dword [ bytesRead ], eax ; Actual amount read

;;       ssize_t write(int fd, const void *buf, size_t count);
;;        mov      eax, WRITE            ; System Call -- write
;;        mov      ebx, dword [ fdWrite ]; File descriptor opened for writing
;;        mov      ecx, dword outBuff    ; Pointer to output buffer
;;        mov      edx, dword [ bytesWritten ]      ; Number of bytes to write
;;        int      80h

;;       int close(int fd);
;;        mov      eax, CLOSE
;;        mov      ebx, dword [ fdRead ]
;;        int      80h
;;      

;; What is this size_t/ssize_t/mode_t stuff?????
;;    /usr/src/linux/include/asm-i386/posix_types.h:typedef unsigned short   __kernel_mode_t;
;;    /usr/src/linux/include/linux/types.h:         typedef __kernel_mode_t  mode_t;

;;    /usr/src/linux/include/asm-i386/posix_types.h:typedef int              __kernel_ssize_t;
;;    /usr/src/linux/include/linux/types.h:         typedef __kernel_ssize_t ssize_t;

;;    /usr/src/linux/include/asm-i386/posix_types.h:typedef unsigned int     __kernel_size_t;
;;    /usr/src/linux/include/linux/types.h:         typedef __kernel_size_t  size_t;
;;

;; in /usr/include/bits/fcntl.h:
;;
;;     O_RDONLY  00
;;     O_WRONLY  01
;;     O_RDWR    02

%define  O_RDONLY 00
%define  O_WRONLY 01
%define  O_RDWR   02

More Defines

Once again, I prefer to give these system call numbers a constant name:
%define READ                 3
%define WRITE                4
%define OPEN                 5
%define CLOSE                6
%define CREATE               8
OPEN is easier to remember than 5.

Program Header Comment Block

Use the following comment block at the beginning of your source code:
;; Filename:       jdoe1pec.asm
;; Name:           Ima Student
;; email:          jdoe1@umbc.edu  
;; Date:           18 Oct 2005
;; Course:         CMSC313 
;; Description:    (Your pseudocode goes here.  Must be detailed)
;; Notes:          (As needed, such as how to assemble and link)

©2006, Gary L. Burt