Ph.D. Dissertation Defense

Identifying and Isolating Text Classification Signals
from Domain and Genre Noise for Sentiment Analysis

Justin Martineau

1:30-4:00 Friday, 18 November 2011, ITE 325b, UMBC

Sentiment analysis is the automatic detection and measurement of sentiment in text segments. This thesis provides methods to identify, characterize, and isolate sentiment-bearing terms in order to improve textual sentiment classification when there is little or no labeled data for the domain.

We introduce a new theoretical framework that explains the different sources of noise that affect term-level sentiment bias. This noise comes from the genre the author communicates in and from the domain, or general topic, that the author is writing about. To understand the effects of domain noise, we defined sentimental domain independence and statistically described it in the multi-domain product review data set. This allowed us to design a Domain Independence Verification Algorithm (DIVA) that eliminates this noise and produces a domain-independent sentiment model using data drawn from a variety of different domains. This model is the most accurate method for classifying documents in the 25-category product review data set.

Committee:

  • Dr. Tim Finin (chair)
  • Dr. Marie desJardins
  • Dr. Akshay Java
  • Dr. James Mayfield
  • Dr. Tim Oates