Computer Science and Electrical Engineering
University of Maryland Baltimore County
Fall 1999 CS Graduate Seminar

Experiments with Haircut

James Mayfield
Johns Hopkins Applied Physics Laboratory

2:00 pm, Friday, October 1, 1999
Lecture Hall V, ECS

The Hopkins Automated Information Retriever for Combing Unstructured Text (HAIRCUT) system is an experimental Java-based text retrieval engine. HAIRCUT uses both n-grams (fixed-length character sequences) and words as indexing terms, and both a vector space model and a hidden Markov model for ranking documents. In this talk, I will describe how HAIRCUT achieves better retrieval results by combining these techniques than by using any one of them alone. I will also report on a sequence of experiments that compare the efficacy of words and n-grams as indexing terms. Finally, I will touch on the use of these techniques for cross-language retrieval, in which a query is posed in English and documents in one or more other languages are retrieved.

For more information, see http://www.csee.umbc.edu/events, call 410-455-3500, or contact jklabrou@csee.umbc.edu