Multiresolution Text Analysis
Published in the
Workshop on New Paradigms in Information Visualization and Manipulation
at the CIKM '96 Conference.
- Amen Zwa,
Computer Science and Electrical Engineering Department,
University of Maryland Baltimore County.
- David S. Ebert,
Computer Science and Electrical Engineering Department,
University of Maryland Baltimore County.
- Ethan L. Miller,
Computer Science and Electrical Engineering Department,
University of Maryland Baltimore County.
Abstract
The n-gram analysis technique breaks a text document into
its unique n-character grams and produces a vector
whose components are the counts of these grams.
A typical corpus contains hundreds of thousands of such grams.
Wavelet compression reduces the dimension of the n-gram vectors, and
speeds up document query operations.
Document vectors reduced to four components
are readily represented in a three-dimensional volume.
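The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which wavelet basis is used or which coefficients are retained, so the sketch assumes an orthonormal Haar transform that keeps only the coarsest approximation coefficients. The function names (ngram_counts, to_vector, haar_reduce), the zero padding, and the sample strings are all hypothetical.

```python
from collections import Counter
import math


def ngram_counts(text, n=3):
    """Count every overlapping n-character gram in `text`."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def to_vector(counts, vocabulary):
    """Lay the counts out in a fixed gram order so documents are comparable."""
    return [float(counts.get(gram, 0)) for gram in vocabulary]


def haar_reduce(vector, keep=4):
    """Reduce `vector` to `keep` Haar approximation coefficients.

    Assumption: the vector is zero-padded to keep * 2**k entries, and the
    detail coefficients of every level are discarded.
    """
    size = max(keep, 1)
    while size < len(vector):
        size *= 2
    data = list(vector) + [0.0] * (size - len(vector))
    while len(data) > keep:
        # One Haar averaging step: combine neighbouring pairs, halving the length.
        data = [(data[i] + data[i + 1]) / math.sqrt(2)
                for i in range(0, len(data), 2)]
    return data


# Usage: two toy "documents" mapped to four-component vectors.
docs = ["multiresolution text analysis", "wavelet text compression"]
counts = [ngram_counts(d, n=3) for d in docs]
vocabulary = sorted(set().union(*counts))
reduced = [haar_reduce(to_vector(c, vocabulary), keep=4) for c in counts]
```

Each entry of reduced is a four-component vector. One plausible way to display it, again an assumption rather than something the abstract states, is to use three components as a position in the volume and the fourth as a visual attribute such as color.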
- Download
the compressed PostScript file of the paper text.
Related Publications
- Interactive Volumetric Information Visualization for
Document Corpus Management.
Ebert, D., Shaw, C., Zwa, A., Miller, E., Roberts, A.
- Two-handed Interactive Stereoscopic Visualization.
Ebert, D., Shaw, C., Zwa, A., Starr, C.
Amen Zwa
(zwa@cs.umbc.edu)
Last modified: 21 November 1996