Yahoo! as an Ontology -- Using Yahoo! Categories to Describe Documents
Yannis K. Labrou
Computer Science and Electrical Engineering
University of Maryland Baltimore County
2:00pm Friday September 10, 1999
Lecture Hall V, ECS
We suggest that the names of one or more Yahoo! categories (or
those of any other WWW indexer) can be used to describe the
content of a document. Such categories offer a standardized and
universal way of referring to or describing the nature of
real-world objects, activities, documents and so on, and may be
used to semantically characterize the content of documents. WWW
indices such as Yahoo! provide a huge hierarchy of categories
(topics) that touch every aspect of human endeavor. Such topics
can be used as descriptors, much as librarians use, for example,
the Library of Congress cataloging system to annotate and
categorize books. In the course of investigating this idea, we
address the problem of automatic categorization of webpages in
the Yahoo! directory.
We use Telltale as our classifier; it uses n-grams to compute
the similarity between documents. We experiment with various
types of descriptions for the Yahoo! categories and the webpages
to be categorized. Our findings suggest that the best results
occur when using the very brief descriptions of the
Yahoo!-categorized entries; these brief descriptions are provided
either by the entries' submitters or by the Yahoo! human indexers
and accompany most Yahoo!-indexed entries.
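
As a rough illustration of n-gram-based categorization, the sketch
below ranks categories by the cosine similarity between character
trigram counts of a webpage and of the brief entry descriptions
filed under each category. It is not Telltale's actual algorithm;
the trigram size, the cosine measure, the function names, and the
sample categories and descriptions are all assumptions made for
this example.

```python
from collections import Counter
from math import sqrt


def ngram_profile(text, n=3):
    """Count overlapping character n-grams in lowercased, whitespace-normalized text."""
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_categories(page_text, category_descriptions, n=3):
    """Return (category, score) pairs, most similar category first."""
    page = ngram_profile(page_text, n)
    scores = {
        name: cosine_similarity(page, ngram_profile(" ".join(descs), n))
        for name, descs in category_descriptions.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical categories, each represented by brief entry descriptions.
categories = {
    "Recreation/Sports/Soccer": [
        "Soccer league scores, fixtures and team news.",
        "Youth soccer club with match schedules and coaching tips.",
    ],
    "Computers and Internet/Programming Languages": [
        "Tutorials and reference for programming languages and compilers.",
        "Source code examples and language documentation for programmers.",
    ],
}

ranking = rank_categories(
    "Weekly soccer match results and team standings.", categories
)
best_category = ranking[0][0]
```

Each category here is represented by the concatenation of its
entries' short descriptions, mirroring the setup that the findings
above favor; full page text or category names could be substituted
by changing what is passed to ngram_profile.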
Background reading:
http://www.cs.umbc.edu/~finin/papers/cikm99.pdf.