CMSC 202/Advanced Section Project 3

You've Got Mail, Round 2!
Assigned Thu Oct 21
Class Design Plan Due In class Tue Oct 26
Program Due 9:00AM Thu Nov 4
Weight TBD
Objectives
Description

In this second part of writing a mail client application, you will now replace your "client programmer" hat with your "library designer" hat. Your tasks for this second phase are:

The additional resources provided to you for Project 3 will be minimal: you have most of what it takes already. You will use the EmailClient class you created in Project 2 to instantiate and test the classes you will be creating in Project 3. (If you failed miserably in Project 2, we can provide a basic EmailClient implementation for you, but you would learn a lot more if you did your own, and it would feel much more satisfying!)

In this project, you will also do most of your own file I/O, including opening files, reading from files, and being responsible for closing them. (You will not have to write to files for this project, however.)

The small set of support code we will provide is just what is necessary to overcome a couple of issues with handling IMAP-standard mail files. First, parsing these files requires "look-ahead": you don't have to know what this is, just that it makes interpreting input more complicated. Second, recognizing the boundary between messages requires some slightly complex regular expression-based string matching. Both of these challenges are somewhat peripheral to the objectives of this project, so we are providing code to you for handling them. However, for extra credit, you can try to solve these tasks for yourself--see the "Extra Credit" description below.

This project requires a bit more OO design, especially as it is meant to lead up to Project 4. This is compounded by the fact that we cannot reveal the details for Project 4, yet! We want to simulate a real-world design scenario, where you would be asked to come up with a design based upon requirements that you know will be added to at the last minute by the client. In fact, your trying to anticipate what the next project will require is an essential part of this project. One clue that we can give you is that we will be asking much more of you in the network side of mail-reading.

Because good planning and design is an important part of what we want to teach in this project, we are breaking up the assignment into two separate "deliverables":

  1. By Tuesday (i.e., in 5 days), you should submit a design proposal for your hierarchy of classes. This can be scribbled in crayon on a napkin, as long as it is legible. You should diagram the set of classes you will be defining, the important instance variables (public and private), and the public methods. You do not need to include any details about the method implementation--just what you intend for them to do.

    This part will not receive a separate grade, but it is mandatory, and there will be a 10-point deduction for not handing in something that shows at least some effort.

  2. By the final due date (two weeks total from when this is released), you should have the implementation done. Note that some of your classes will be "stubs": compilable, but incomplete. This is detailed below.

In Project 2, we said you had to learn to walk before you can run. Now, you are running, but this is still just a short race: kind of like a 5K Run for Some Charity Benefit, with people on the sidelines shouting encouragement and providing drinks. Think of it as training for Project 4: the Boston Marathon.

Project Requirements and Specification
As outlined above, there are two parts to this project:

Phase I: Initial Class Hierarchy Design

By Tuesday, I would like you to turn in a class hierarchy diagram. Your goal is to design a set of related classes that follow the design philosophies we've introduced in class, including using parent classes to hold common attributes and methods, creating the right intermediate classes, etc. I also want you to apply what you've learned about abstract classes (recall the reason we turned Animal into an abstract class).

At the top of your hierarchy will be a general MailRepository class. If you recall, that was the top class I provided for Project 2. For Project 3, it will serve only as a parent class from which other classes are derived (Hint: can you say "abstract class"?).

From the MailRepository class, you will derive subclasses for the various types of specific mail repository types you think are appropriate. As a basic dichotomy, mail can exist as local files, or it can be managed by a server somewhere on the Internet. So, at the least, you will have a subclass for simple file-based mail repositories, as well as one or more subclasses for network-based repositories.

You should gather together all the instance variables that you think would be needed to store state across any of the general kinds of mail repositories, and place them in the parent MailRepository class. You should also think about the kinds of behaviors that would be universal, and put declarations for methods to cover these in this parent class as well.

Next, in each of the subclasses, you should add those instance variables and methods that would only be applicable to that specific subclass, and not to the parent or other "sibling" classes.

It is important to keep in mind here the role of abstract methods and abstract classes. When you put an abstract method declaration in a parent class, you are not making the claim that there is a common way to implement that method: just that what such a method would do applies to all subclasses, but not necessarily the how.
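As a reminder of the "what, not how" idea, here is a minimal sketch built on the Animal class mentioned earlier. Only the name Animal comes from lecture; the Dog and Cat subclasses and the speak() and getName() methods are assumptions made up for this illustration, and nothing here is meant to suggest a design for your mail classes.

```java
// Hypothetical example: Dog, Cat, speak() and getName() are illustrative names only.
abstract class Animal {
    private final String name;

    protected Animal(String name) {
        this.name = name;
    }

    // Behavior with one shared implementation lives in the parent.
    public String getName() {
        return name;
    }

    // Abstract: every Animal can speak (the "what"),
    // but each subclass decides the "how".
    public abstract String speak();
}

class Dog extends Animal {
    Dog(String name) {
        super(name);
    }

    @Override
    public String speak() {
        return "Woof";
    }
}

class Cat extends Animal {
    Cat(String name) {
        super(name);
    }

    @Override
    public String speak() {
        return "Meow";
    }
}

class AbstractDemo {
    public static void main(String[] args) {
        // The client code only needs to know the parent type.
        Animal[] pets = { new Dog("Rex"), new Cat("Tom") };
        for (Animal pet : pets) {
            System.out.println(pet.getName() + " says " + pet.speak());
        }
    }
}
```

The loop in main() never asks which subclass it is holding: the abstract declaration in the parent is what makes the call legal, and each subclass supplies its own body.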

To start you off on thinking about what additional factors might apply to network-based mail repositories, think about what fields you have had to enter when you were configuring a real mailer (e.g., Thunderbird or Outlook Express) to fetch your UMBC mail, for instance.

Phase II: Project 3 Implementation

By the final due date (two weeks from initial assignment, and 9 days after the design is due), you must turn in a set of class implementations that defines Java classes for all of the classes in your design above. The classes on the branch of the hierarchy leading down to the file-based repository class must be completely implemented, including fully-functional method bodies. In other words, a client programmer (e.g., someone writing the EmailClient class) should be able to use your class implementations to fully read file-based mail.

For the other branches--mainly, the network-based repository (or repositories)--you only need stub methods; that is, methods with full headers and descriptive header comments, but mostly-empty bodies. Note that "stub methods with empty bodies" is different from abstract method headers. The latter has the form:
public abstract boolean myMethod(int param1);
... while the former would look like:
public boolean myMethod(int param1) { return false; }
In other words, stub methods should be syntactically "complete", even though they don't perform any real functions.
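To see the difference in a complete, compilable setting, here is a small sketch. Only myMethod(int param1) comes from the text above; the class names ParentExample, WorkingChild and StubChild, and the "is it positive" logic, are hypothetical and chosen only to keep the example short.

```java
// Hypothetical names throughout; only myMethod(int) comes from the handout.
abstract class ParentExample {
    // Abstract header: no body at all, so ParentExample cannot be instantiated.
    public abstract boolean myMethod(int param1);
}

class WorkingChild extends ParentExample {
    /** Fully implemented: reports whether param1 is positive. */
    @Override
    public boolean myMethod(int param1) {
        return param1 > 0;
    }
}

class StubChild extends ParentExample {
    /**
     * Stub: intended to do the same job as WorkingChild.myMethod() some day.
     * For now it has a full header and comment, compiles, and returns a
     * placeholder value.
     */
    @Override
    public boolean myMethod(int param1) {
        return false; // placeholder until the real logic is written
    }
}

class StubDemo {
    public static void main(String[] args) {
        ParentExample[] children = { new WorkingChild(), new StubChild() };
        for (ParentExample child : children) {
            // Both calls compile and run; only the first does real work.
            System.out.println(child.getClass().getSimpleName()
                    + ".myMethod(5) returned " + child.myMethod(5));
        }
    }
}
```

Because StubChild supplies a body, it is a concrete class that can be instantiated and called, even though the answer it gives is not yet meaningful.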

In order for us to test your classes effectively, we would like you to use specific names for some of your classes:

As always, also pay attention to good procedural programming principles! You should still write clear, modular code. This means breaking up oversized, monolithic methods into smaller, logically divided submethods, abstracting out common, repeated chunks of logic into private helper methods, giving your variables meaningful names, etc.--you know the drill.

You are free (and encouraged) to add any helper classes you think would be useful and reflective of good OOP design, but only if it really makes your code "better", by which I mean "clearer" or "easier to understand" or "more logical". Recall what I said in lecture: performance is definitely a lesser consideration than clarity for any application that we will assign in this course, and even in the real world, it is rare that sacrificing clarity for efficiency is a Good Thing. Also note that trying to show off by making your code unnecessarily clever/complex just annoys and antagonizes the graders.

You are not to use any classes other than those in the standard Java libraries, and those specifically provided by me, without first checking with me.

Important: As with the prime number project, there are probably many implementations on the web of projects similar to this one. It would be best not to refer to those: this project should be simple enough that you don't need outside help at the design level, and you will only risk inadvertently copying too much of someone else's idea.

Provided Classes and Sample Files
I will provide a library with some simple classes and methods that should help you parse the file-based mail repository and pull out messages and headers from the RFC 5322/822-formatted input file (no need to understand those numbers: I just wanted you to know you were being fully standards-compliant!).

The general format of a mail file is a concatenated set of individual mail items, each with the following form: (note that the text at the left margin is my annotation, and only the indented text is the real mail text)

An IMAP preamble header:
                From park@cs.umbc.edu Sun Sep  6 22:30:32 2010
Followed by 1 or more header lines:
                Received: from [192.168.1.153] (pool-173-79-24-32.washdc.fios.verizon.net [173.79.24.32])
	                (authenticated bits=0)
	                by mail.cs.umbc.edu (8.14.3/8.14.3) with ESMTP id o8R2UVV0022172
	                (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	                for ; Sun,  6 Sep 2010 22:30:31 -0400 (EDT)
                Message-ID: <4CA001CE.9070505@cs.umbc.edu>
                Date: Sun,  6 Sep 2010 22:30:38 -0400
                From: John Park 
                User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
                MIME-Version: 1.0
                To: park@cs.umbc.edu
                Subject: Test 1
                Content-Type: text/plain; charset=ISO-8859-1; format=flowed
                Content-Transfer-Encoding: 7bit
                X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0 (mail.cs.umbc.edu [130.85.36.69]); Sun,  6 Sep 2010 22:30:32 -0400 (EDT)
                Content-Length: 13
                Status: RO
                X-Status: 
                X-Keywords:                 
                X-UID: 1
Followed by a blank line, then the body:
                
                This is the first line of the body.
Here are the other rules you need to know:

The provided library has two classes you can use (any other classes/methods in the library are helpers for the internal implementation, and are not documented). The classes are all in the package proj3MailUtil. The two classes' public interfaces are:

The ReadaheadReader class:

    No public instance variables.

    Public constructors:

	/**
	* returns an instance of a readahead-enabled reader with input set
	* to file named "inFileName". This is the version you are most
	* likely to use.
	*/
	ReadaheadReader(String inFileName);

	/**
	* Alternative version if you want to provide your own
	* BufferedReader instance
	*/
	ReadaheadReader(BufferedReader in);

    Public methods:

	/* Reads the next line of text from the input, breaking at the next
	* newline/carriage return.  The terminating NL/CR is stripped off.
	*/
	String     readLine();

	/* Method to return a just-read line back into the input buffer.
	* This "unread" line will then be the next item returned by readLine()
	* You can sequentially "unread" multiple items, in which case
	* they will be returned by readLine() in reverse order
	* (i.e., last-unread is first-read).
	*/
	String     unreadLine(String line);


The P3Util class:

	/*
	* This class only has a few STATIC public helper methods:
	*/
    Public static methods:

	/*
	* This method takes a line of text, and does a regular expression
	* based match to see if it fits the pattern for an IMAP file's
	* per-message header line.  These specially-formatted lines
	* mark the boundary between the mail items in a mail file.
	* (A big name, for a little method.)
	*
	* Returns true if the argument fits the pattern
	*/
	boolean isImapMessageHeaderFlagLine(String line);

	/*
	* Method to help you read in email header specifications.
	* This method takes care of header fields that are continued
	* across multiple lines, returning the entire field as one
	* String.  It will remove the final NL/CR, like readLine(),
	* but will leave intermediate line breaks intact.
	* Note that unlike readLine(), this method is static, so
	*   a) it takes the reader as a parameter; and
	*   b) it has no "unread" capabilities.
	*/
	String readContinuedLine(ReadaheadReader reader);

As in Project 2, this library class will be provided as a JAR file, and so will a sample mail file. These will be available soon, to download at these links:

Once you've downloaded these two files, follow the same instructions as in Project 2 to copy the files to the correct directories and configure Eclipse to use the libraries.

Project Hints and Notes
  1. You do not have enough information yet to really implement the sendMessage() method properly. So, the method will not actually deliver any mail, nor even store it in any file. However, it will soon, in Project 4, so your method should at least do all the error-checking outlined in the description for this method in Project 2: for example, the header's toAddr field must be non-null and non-empty.
  2. Other hints to come...
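For hint 1, the check on toAddr might look something like the sketch below. The real sendMessage() signature is defined in Project 2 and is not reproduced here, so the parameter list, the boolean return type, and the helper name hasRecipient() are all assumptions made for illustration.

```java
// Hypothetical sketch: the actual sendMessage() signature comes from Project 2.
class SendCheckSketch {
    /** Returns true only if toAddr is non-null and non-empty. */
    static boolean hasRecipient(String toAddr) {
        return toAddr != null && !toAddr.isEmpty();
    }

    /**
     * Validates its input but delivers nothing and stores nothing.
     * Returns false when the recipient check fails.
     */
    static boolean sendMessage(String toAddr, String body) {
        if (!hasRecipient(toAddr)) {
            return false;
        }
        // Delivery is deliberately left out for now.
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sendMessage("park@cs.umbc.edu", "Test 1"));
        System.out.println(sendMessage("", "Test 2"));
        System.out.println(sendMessage(null, "Test 3"));
    }
}
```

Checking for null before calling isEmpty() matters: reversing the order would throw a NullPointerException on a null address.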
Sample Output
Since your EmailClient class will not have changed much, the output should be much as it was in Project 2.

Extra Credit Option
For extra credit, you can try to create your own implementations of the classes and methods provided for you to help with parsing the RFC5322-format mail file.

There are three issues with parsing that file, two of which were mentioned earlier. First, you have to know how to open files and create buffered streams so that you can get a line at a time. To do this, use:

    inReader = new BufferedReader(new FileReader(inFileName));
BufferedReader instances provide a line-at-a-time reading method analogous to a Scanner object's "nextLine()", so in the example above, you could invoke inReader.readLine() to read Strings nicely broken up at end-of-line points from the input file. You can then layer some kind of internal storage scheme on top of this to provide an "unreadLine()" method.

Note that BufferedReader objects are much more primitive than Scanner objects. You just keep reading until readLine() returns null. Also note that you will have to wrap this in the appropriate try-catch structure, since the FileReader() constructor might throw a "FileNotFoundException" and readLine() might throw an "IOException" -- you should know how to handle those now.
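Putting the pieces above together, a read loop might be sketched as follows. The class name LineReadingSketch and the method readAllLines() are hypothetical, and collecting lines into a list is just one way to show the loop; your own code will probably process each line as it arrives.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class LineReadingSketch {
    /** Reads every line of the named file; returns an empty list if it cannot be opened. */
    static List<String> readAllLines(String inFileName) {
        List<String> lines = new ArrayList<>();
        BufferedReader inReader = null;
        try {
            inReader = new BufferedReader(new FileReader(inFileName));
            String line;
            // readLine() returns null once the end of the file is reached.
            while ((line = inReader.readLine()) != null) {
                lines.add(line);
            }
        } catch (FileNotFoundException e) {
            System.err.println("Cannot open " + inFileName);
        } catch (IOException e) {
            System.err.println("Error reading " + inFileName + ": " + e.getMessage());
        } finally {
            // We opened the file, so we are responsible for closing it.
            if (inReader != null) {
                try {
                    inReader.close();
                } catch (IOException e) {
                    System.err.println("Error closing " + inFileName);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        if (args.length == 1) {
            System.out.println(readAllLines(args[0]).size() + " lines read");
        }
    }
}
```

FileNotFoundException is caught before IOException because it is a subclass of IOException; listing them the other way around will not compile.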

Another problem is that the "grammar" for the file format requires that you do some look-ahead: i.e., you need to have already read in some of the next part before you know you have all of the current part. This makes it difficult to write modular code, since some method like getNextHeaderField() would have possibly already read in a few characters of the next header, and there is no easy way to "put it back."

So, you will have to devise a way to accomplish exactly this: to "unread" some text in a controlled fashion.
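One possible approach, sketched here under stated assumptions, is to keep a stack of "unread" lines in front of the BufferedReader. This is not the provided ReadaheadReader (its internals are not documented); the class name PushbackLineReader is made up, and this unreadLine() returns nothing, unlike the library version.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of line-level "unread"; not the provided library class.
class PushbackLineReader {
    private final BufferedReader in;
    // Used as a stack: the most recently unread line comes back first.
    private final Deque<String> unread = new ArrayDeque<>();

    PushbackLineReader(BufferedReader in) {
        this.in = in;
    }

    /** Returns the next line, preferring unread lines; null at end of input. */
    String readLine() throws IOException {
        if (!unread.isEmpty()) {
            return unread.pop();
        }
        return in.readLine();
    }

    /** Puts a line back so that the next readLine() returns it. */
    void unreadLine(String line) {
        // ArrayDeque rejects null, and null only means "end of input" anyway.
        if (line != null) {
            unread.push(line);
        }
    }

    public static void main(String[] args) throws IOException {
        PushbackLineReader reader = new PushbackLineReader(
                new BufferedReader(new StringReader("Subject: Test 1\nStatus: RO\n")));
        String peeked = reader.readLine();
        reader.unreadLine(peeked);             // look-ahead done; put it back
        System.out.println(reader.readLine()); // same line again
        System.out.println(reader.readLine()); // the following line
    }
}
```

With this in place, a method can read one line too many, decide it belongs to the next part of the file, and hand it back for the next method to find.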

The last problem is that detecting the inter-message boundary in mail files requires recognizing a line of text with a specific, slightly flexible format. This is best handled by calling the regular expression matcher that is part of the Java library and giving it the appropriate regular expression, neither of which is part of the syllabus. This part, I will provide for you: I will give you 2-3 lines of code that you can embed into a method to detect the inter-message marker.

First, in the class that will be doing the inter-message marker recognition, you will need to import the right package:

    import java.util.regex.*;
Then, you should insert into the class:
    private static final String MESSAGE_HEADER_PATTERN = "^From (\\S+@\\S+|MAILER-DAEMON) [SMTWF][a-z][a-z] [JFMASOND][a-z][a-z] (\\d| )\\d \\d\\d:\\d\\d:\\d\\d \\d\\d\\d\\d( [+\\-]?\\d\\d\\d\\d)?$";
    private static final Pattern pattern = Pattern.compile(MESSAGE_HEADER_PATTERN);
This will generate a regular expression engine that will recognize that specific pattern. (Make sure you get the pattern string exactly right! I worked hard on that, and every double-slash counts!) Finally, to use this engine, do the following, where the variable "line" contains the String to be tested:
    Matcher matcher = pattern.matcher(line);
    if (matcher.find()) {
        // Code here to handle case where line is inter-message marker.
        // --probably just set a flag.
    }
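Assembled into one compilable unit, those pieces might look like the sketch below. The pattern string is copied verbatim from above; the class name BoundaryMatchSketch and the method name isMessageBoundary() are hypothetical, and returning a boolean is just one way to "set a flag".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class; only the pattern string comes from the handout.
class BoundaryMatchSketch {
    private static final String MESSAGE_HEADER_PATTERN = "^From (\\S+@\\S+|MAILER-DAEMON) [SMTWF][a-z][a-z] [JFMASOND][a-z][a-z] (\\d| )\\d \\d\\d:\\d\\d:\\d\\d \\d\\d\\d\\d( [+\\-]?\\d\\d\\d\\d)?$";
    private static final Pattern PATTERN = Pattern.compile(MESSAGE_HEADER_PATTERN);

    /** Returns true if line has the form of an inter-message marker. */
    static boolean isMessageBoundary(String line) {
        if (line == null) {
            return false;
        }
        Matcher matcher = PATTERN.matcher(line);
        return matcher.find();
    }

    public static void main(String[] args) {
        // First line is the preamble from the sample mail item; the second is an ordinary header.
        String[] samples = {
            "From park@cs.umbc.edu Sun Sep  6 22:30:32 2010",
            "Subject: Test 1"
        };
        for (String sample : samples) {
            System.out.println(isMessageBoundary(sample) + " <- " + sample);
        }
    }
}
```

Note that an ordinary "From:" header does not match, because the pattern requires a space, not a colon, immediately after "From".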

So, if you choose to do the extra credit, other than inserting the above few lines of code into your program, you will have built a complete top-to-bottom working application using only Java standard libraries that can parse and interpret RFC-5322-standard mail files. Good job!


Grading
The standard grading rules will apply to this project.


Project Submission
Before submitting your project, be sure to compile and test your code on the GL system. See the Project Compiling section of the course projects page for details on how to run and execute your project on GL.

(Note that the above section has been augmented to cover how to get files to the GL filesystem, and how to unpack JAR files to test your program.)

To submit your project, type the command

submit cs202 Proj3a MailRepository.java FileMailRepository.java <any other .java files you created>

See the Project Submission page for detailed instructions.

You may resubmit your files as often as you like, but only the last submittal will be graded and will be used to determine if your project is late. For more information, see the projects page on the course web site.

Do not submit the provided library or test input file--we have those already :-)

More complete documentation for submit and related commands can be found here.

Remember -- if you make any change to your program, no matter how insignificant it may seem, you should recompile and retest your program before submitting it. Even the smallest typo can cause errors and a reduction in your grade.