UMBC CMSC 491/691-I Fall 2002
Last updated: 21 October 2002

Homework 5

Assignment: Search UMBC for relevant documents for nine search topics.

Goal: To turn umbc-crawl from a web crawl into a usable test collection. The relevant documents you find will serve as the relevance judgments that, together with the topics written in Homework 4, we will use to measure our search systems' performance.

Due date: Tuesday, October 29, 2002.

Description

As part of Homework 4, you will bring to class on 10/22 three copies each of three search topics that you wrote. In class, we will exchange topics, so that at the end of class, you will have nine new search topics from nine of your fellow students. Nine students (possibly different ones) will each have received one of your three topics. In class, make sure you understand what the author of each topic wanted to find.

For each of the nine topics you have, your task is to search the UMBC web space and find as many relevant documents as you can. To do this, use the UMBC site search system, a whole-web search engine such as Google or AltaVista limited to the umbc.edu domain, or both. Execute at least two different queries per topic. You should also feel free to browse from your search results to hunt down more relevant documents. Record the searches you perform (e.g., "Google site:umbc.edu search for 'self defense martial arts'"), the URLs of the documents you deem relevant, and how you found each relevant URL (e.g., "Search result 3" or "Browsed from result 5").
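When browsing from search results, it is easy to wander off the UMBC web space. As a minimal sketch, the check below tests whether a URL's host is umbc.edu or one of its subdomains. The function name in_umbc_web_space is hypothetical and not part of the assignment, and treating "UMBC web space" as "any host under umbc.edu" is an assumption.

```python
from urllib.parse import urlparse

def in_umbc_web_space(url, domain="umbc.edu"):
    """Return True if the URL's host is `domain` or a subdomain of it.

    Assumption: "UMBC web space" means any host under umbc.edu.
    """
    host = urlparse(url).hostname
    if host is None:
        return False
    host = host.lower()
    # Match the domain itself or any host ending in ".umbc.edu",
    # so that a host like "notumbc.edu" is rejected.
    return host == domain or host.endswith("." + domain)

print(in_umbc_web_space("http://sta.umbc.edu/orgs/tkdo/"))
print(in_umbc_web_space("http://www.example.com/umbc.edu"))
```

Comparing the parsed host rather than searching the whole URL string avoids false matches on paths that merely mention umbc.edu.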

The end product will be a log of your search, which should be sufficient for someone else to know what you did and how you found the pages that you discovered. Use the following format:

Searcher: Ian Soboroff <ian@cs.umbc.edu>
Search topic title: "Martial arts clubs at UMBC"
Google search: self defense martial arts
  • http://sta.umbc.edu/orgs/jujitsu/clubOverview.html (found at rank 3)
  • ...
Google search: taekwondo aikido
  • http://userpages.umbc.edu/~jlow1/martialarts (sole hit)
  • http://sta.umbc.edu/orgs/tkdo/ (browsed from above)
  • http://userpages.umbc.edu/~jrose1/aikido.html (browsed from above)
  • ...
When you give this list to the author of the search topic, he or she can easily understand how you searched and what you found. Moreover, since your name is at the top, the two of you can discuss any disagreements you may have.
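If you keep your logs as plain text in the format above, they are also easy to process by program. The following is a rough sketch of a parser, assuming one header or bullet per line exactly as in the example. The function name parse_search_log and the dictionary layout are hypothetical choices, not something the assignment requires.

```python
import re

SAMPLE_LOG = """\
Searcher: Ian Soboroff <ian@cs.umbc.edu>
Search topic title: "Martial arts clubs at UMBC"
Google search: self defense martial arts
  • http://sta.umbc.edu/orgs/jujitsu/clubOverview.html (found at rank 3)
  • ...
Google search: taekwondo aikido
  • http://userpages.umbc.edu/~jlow1/martialarts (sole hit)
  • http://sta.umbc.edu/orgs/tkdo/ (browsed from above)
"""

def parse_search_log(text):
    """Parse a search log in the format shown above into a dictionary."""
    log = {"searcher": None, "topic": None, "searches": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Searcher:"):
            log["searcher"] = line[len("Searcher:"):].strip()
        elif line.startswith("Search topic title:"):
            title = line[len("Search topic title:"):].strip()
            log["topic"] = title.strip('"')
        elif line.startswith("•"):
            # A bullet has the form "URL (how it was found)".
            # Elided bullets such as "..." do not match and are skipped.
            match = re.match(r"•\s*(\S+)\s*\((.*)\)\s*$", line)
            if match and log["searches"]:
                url, how = match.groups()
                log["searches"][-1]["relevant"].append({"url": url, "how": how})
        elif ":" in line:
            # Any other "Engine search: query" line starts a new search.
            engine, query = line.split(":", 1)
            log["searches"].append(
                {"engine": engine.strip(), "query": query.strip(), "relevant": []}
            )
    return log

log = parse_search_log(SAMPLE_LOG)
for search in log["searches"]:
    print(search["query"], len(search["relevant"]))
```

Each search keeps its own list of relevant URLs together with the note on how each was found, which mirrors what the topic author needs to see.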

Additionally, for each topic, print the raw search results (top 20) from your best search, and mark the relevant documents found. We will aggregate these scored searches in order to measure the engines' performance.
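The assignment does not say which measure will be computed from the scored searches. One common choice for a marked top-20 list is precision at 20, the fraction of the top 20 results marked relevant. The sketch below assumes that measure and assumes each scored search is represented as an engine name plus a rank-ordered list of True/False marks. Both function names are hypothetical.

```python
from collections import defaultdict

def precision_at_k(marks, k=20):
    """Fraction of the top-k results that were marked relevant.

    marks: list of booleans in rank order (True = marked relevant).
    Results missing from a short list count as not relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(marks[:k]) / k

def mean_precision_by_engine(scored_searches, k=20):
    """Average precision-at-k over all scored searches for each engine.

    scored_searches: iterable of (engine_name, marks) pairs.
    """
    per_engine = defaultdict(list)
    for engine, marks in scored_searches:
        per_engine[engine].append(precision_at_k(marks, k))
    return {engine: sum(vals) / len(vals) for engine, vals in per_engine.items()}

# Illustrative marks only, not real judgments.
example = [
    ("Google", [True] * 5 + [False] * 15),
    ("Google", [True] * 15 + [False] * 5),
]
print(mean_precision_by_engine(example))
```

Dividing by k rather than by the number of results returned penalizes a search that returns fewer than 20 hits, which is the usual convention for this measure.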

Your standard of relevance will be as follows. A web page is relevant if it contains some topical content on that page which you might use to write an article or report on the topic. If a page merely links to relevant information, but doesn't contain any relevant content itself, it is not relevant. Topical content is still relevant even if it appears on multiple web pages. Try to judge each page independently and fairly.

What to turn in

You will give a copy of each of your nine search logs to the author of that topic. It is between you and each topic author whether to exchange hard copy or email. If you do not receive a set of search results from someone, let me know.

You will also hand in to me a copy of each of your nine search logs, stapled together with your name clearly indicated. Lastly, you will hand in to me your scored searches for each topic.

NOTE: This can be a time-consuming assignment. Please give yourself sufficient time to complete it. The quality of our search collection directly depends on this!