Cover page images (keyboard)

Proj1 - Stacks and Queues


design1.txt due on Sunday, 11/21/10, before 11:59 PM
proj1.py due on Sunday, 11/28/10, before 11:59 PM

Sue Evans

Hit the space bar for next page

What's In a Design Document ?




Background

XHTML

Hypertext Markup Language (HTML) is a notation used to describe how to display the contents of web pages and to focus on how the page looks. Extensible Markup Language (XML) is a notation used to describe data on web pages and to focus on what the data mean. A fairly recent HTML standard is XHTML, which combines both HTML and XML. Web browsers read HTML or XHTML to determine how web pages should be displayed. Other software can be used to read the same webpages and interpret the information contained there, if XML tags that program recognizes have been used in the document. HTML tags are enclosed in angle brackets (<>). In XHTML, tags generally appear in start-tag, end-tag combinations.

A start tag has the form: <name attributes>. The matching end tag contains that name preceeded by a "/". For example, a paragraph of text might be formatted as follows:

<p>This is a paragraph. </p>

XHTML also allows self-closing tags of the form: <name attributes />.

In a proper XHTML file, the tags will occur in properly nested pairs. Each start tag is matched by a corresponding end tag, and one structure may be embedded inside another, but they cannot overlap. This is like the matching of curly braces in a Java or C program.

For example:
<p>...<ol>...</ol>...</p> is OK, but
<p>...<ol>...</p>...</ol> is not.
A self-closing tag acts as a self-contained start-end pair.

The Task

You should become familiar with both HTML and XHTML first. You can use the w3schools.com website for both the HTML and XHTML tutorials, or search for other tutorials that you like better.

Your task is to write a program that checks html files, (web pages) to determine whether the embedded XHTML tags are balanced. Balanced tags are necessary for the file to be a valid xhtml file as explained above.

Your program will read from the file and print an analysis of the file to the screen. The sequence of tags in the file will be echoed to the output; any other text in the file is to be ignored. If there is a tag balancing error or the program reaches the end of the file while in the middle of a tag, the program must quit and print an error message. If the end of file is reached without any errors, print a message to that effect.

Program Requirements

The Phases

You must use both a stack and a queue to solve this problem. To divide up the effort of this problem, consider the program as consisting of two phases.

  1. The first phase takes care of reading the file and finding all of the tags from the file. The first phase will print out the tags as it finds them and also save them into a queue of tags. If the input file ends in the middle of a tag, the program will report the error and end.
  2. The second phase of the program takes the queue of tags generated in phase one and analyzes the sequence of tags to make sure that they are properly balanced. This phase will make use of the queue and a stack, as described on the previous slide. As the tags are matched, you should print the matching tags. If the tag is self-closing, you should print it and also that it is self-closing.

More Details

Data File &
Sample Run

The data file used to create the sample output for this project is called xhtml.dat. You may get a copy of this file from my pub directory. Do NOT cut and paste from this link.

Here's the data file

This file doesn't really test your program very well, it's just to give you an idea of what the output should look like. Your program should properly handle files with mismatched tags, EOF before a complete tag, etc., as described above.

Here's the sample run

As always, your output need not look exactly like mine, but it should contain all of the same information.

Submitting your work

You must be logged into your account and in the same directory as the file you are trying to submit.

To submit your design, type the following at the linux prompt:

submit cs201 Proj1 design1.txt

To submit your homework, type the following at the linux prompt:

submit cs201 Proj1 proj1.py

To verify that your file was submitted, you can execute the following command at the Unix prompt. It will show all files that you submitted in a format similar to the Unix 'ls' command.

submitls cs201 Proj1