MS defense: Sawhney on Analyzing the Growth of Hoeffding Trees

MS Thesis Defense

Analyzing the Growth of Hoeffding Trees

Mayank Sawhney
12:00-1:30pm Thursday 1 December 2011, ITE 346

Mining high-speed data streams has become a necessity because of the enormous growth in the volume of electronic data. In the past decade, researchers have suggested various models for learning in both stationary and concept-drifting data streams. Hoeffding Trees (Domingos & Hulten 2000) are one such model for mining stationary data streams. Several modifications of the naive Hoeffding Tree algorithm have been proposed to study data streams.

Our work analyzes the behavior of Hoeffding Trees when they are trained on infinite data streams. Through experiments, we show that the Hoeffding bound suffers from an inherent shortcoming. Even after reaching a stage where accuracy asymptotes, Hoeffding Trees continue to grow. We examine this behavior in data streams with both nominal and numeric attributes. We also study enhancements made to the naive Hoeffding Tree algorithm and evaluate different discretization methods.

In our work, we analyze how the Hoeffding bound relates to the information gain when splits are made and also when we send a random distribution as a data stream. We conclude that this behavior is a result of decisions made for the early growth of Hoeffding Trees and the induced randomness in an online setting. We also argue that the presence of this behavior will impact the use of Hoeffding algorithms in real world online applications.
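For readers unfamiliar with the bound the abstract refers to, here is a minimal sketch of the standard Hoeffding split test described by Domingos & Hulten (2000). It is not code from the thesis; the function names, the example gain values, and the choice of delta are illustrative assumptions.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Return epsilon such that, with probability 1 - delta, the true mean
    of a variable with the given range lies within epsilon of the mean
    observed over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, n_classes, delta, n):
    """Split a leaf when the best attribute beats the runner-up by more
    than the Hoeffding bound. Information gain ranges over
    [0, log2(n_classes)], which is used as the value range here."""
    value_range = math.log2(n_classes)
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Hypothetical gains for a two-class problem with delta = 1e-7.
clear_winner = should_split(0.30, 0.10, n_classes=2, delta=1e-7, n=1_000)
near_tie_early = should_split(0.30, 0.29, n_classes=2, delta=1e-7, n=1_000)
near_tie_late = should_split(0.30, 0.29, n_classes=2, delta=1e-7, n=1_000_000)
```

Because the bound shrinks as n grows, the near-tie that fails the test after 1,000 examples passes it after 1,000,000, so a leaf that keeps receiving examples can eventually split on a very small difference in gain.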

Committee Members

  • Dr. Tim Oates (Chair)
  • Dr. Tim Finin
  • Dr. Kostas Kalpakis

talk: Wolfson on Intelligent Transportation Systems, 1pm Fri 12/2, ITE 227

Silence of the labs: Why are we still commuting
the way we did 40 years ago?

Professor Ouri Wolfson
University of Illinois at Chicago

1:00pm Friday 2 December 2011, ITE 227

Intelligent Transportation Systems (ITS) have been in research and development since the 70's but their impact so far has been relatively small. In this talk I will argue that this is about to change, and that these systems will soon revolutionize the way we commute. I will describe research issues and Information Technology approaches related to ITS. I will focus on urban transportation, and discuss novel applications enabled by mobile wireless technologies. Such applications have the potential to improve safety, mobility, environmental impact, and energy efficiency of urban transportation. The applications are based on vehicle-to-vehicle and vehicle-to-infrastructure communication, and they epitomize ITS efforts currently undertaken throughout the world, particularly the IntelliDrive initiative of the US Department of Transportation. I will also relate these efforts to our NSF-sponsored IGERT PhD program in Computational Transportation Science.

Ouri Wolfson is the Richard and Loan Hill Professor of Computer Science at the University of Illinois at Chicago, and an Affiliate Professor in the Department of Computer Science at the University of Illinois at Urbana Champaign. He is the sole founder of Mobitrac, a venture-funded high-tech startup that was acquired by Fluensee Co. in 2006.

Ouri Wolfson has authored over 180 publications and holds seven patents. He is a Fellow of the Association for Computing Machinery, a Fellow of the American Association for the Advancement of Science (AAAS), a University of Illinois Scholar for 2009, and serves on the editorial boards of several journals. He co-authored three award-winning papers, served as a Distinguished Lecturer for the Association for Computing Machinery during 2001-2003, and participated in numerous conferences as a keynote speaker, general chairman, program committee chairman or member, tutorial presenter, session chairman, and panelist. Most recently he was the keynote speaker at the Mobilware 2010 Conference, and the general chair of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS 2009). His research has been funded by the National Science Foundation (NSF), Air Force Office of Scientific Research (AFOSR), Defense Advanced Research Projects Agency (DARPA), NATO, US Army, NASA, the New York State Science and Technology Foundation, Hughes Research Laboratories, Informix Co., Accenture Co., and Hitachi Co.

Wolfson’s main research interests are in database systems, distributed systems, and mobile/pervasive computing. Before joining the University of Illinois he was on the computer science faculty at the Technion and Columbia University, and was a Member of Technical Staff at Bell Labs.

Host: Yelena Yesha

MS defense: Pilz on Approximation of Nonintegral Frequency Moments, 11/30

Masters Thesis Defense

Approximation of Nonintegral Frequency Moments

Brian Pilz

10:00am 30 November 2011, ITE325b

Let a data stream have length m over an alphabet of n letters, with letter i occurring m_i times for i = 1,…,n. For any k, define the frequency moments F_k as F_k = sum_{i=1}^n m_i^k. Alon, Matias, and Szegedy showed how to estimate F_k for integers k>0 with a one-pass algorithm using O(n^{1-1/k}log n) space for given length m, accuracy, and confidence. Here we extend those results to non-integral k obtaining bounds on the variance giving accuracy and confidence estimates, and giving quantitative results on the algorithm’s space requirements with particular interest to when k is near 1. We also give some performance statistics of the algorithm for these cases and consider an application to entropy estimation. This algorithm is known as a sketching algorithm. Sketching algorithms are probabilistic algorithms generally requiring sublinear space vs. a "classical" O(n) (linear) space requirement, and may have applications for anomaly detection of systems or networks.
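To make the definition of F_k concrete, here is a minimal sketch of the exact moment alongside the basic Alon-Matias-Szegedy estimator for a known stream length m. It is not the algorithm developed in the thesis, and all function names are hypothetical. For simplicity it keeps the whole stream in memory, which a real one-pass implementation would avoid by sampling the position on the fly.

```python
import random
from collections import Counter


def exact_moment(stream, k):
    """F_k = sum over letters i of m_i ** k, using a full frequency table."""
    return sum(count ** k for count in Counter(stream).values())


def ams_basic_estimate(stream, k, position):
    """Basic AMS estimator X = m * (r**k - (r-1)**k), where r counts the
    occurrences of stream[position] from that position to the end.
    Averaged over a uniformly random position, X has expectation F_k,
    because the terms r**k - (r-1)**k telescope for each letter. The
    telescoping argument does not require k to be an integer."""
    m = len(stream)
    letter = stream[position]
    r = stream[position:].count(letter)
    return m * (r ** k - (r - 1) ** k)


def averaged_estimate(stream, k, trials, seed=0):
    """Average several independent basic estimates to reduce variance.
    The seed parameter is an assumption added for reproducibility."""
    rng = random.Random(seed)
    samples = [
        ams_basic_estimate(stream, k, rng.randrange(len(stream)))
        for _ in range(trials)
    ]
    return sum(samples) / trials
```

Averaging `ams_basic_estimate` over every position of a short stream reproduces `exact_moment` up to floating-point error, which is a quick way to check the unbiasedness claim for a non-integral k such as 1.5.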

Committee:

  • Drs. Samuel Lomonaco
  • Brooke Stephens
  • Kostas Kalpakis (chair)
  • Larry Wagoner

talk: Rutledge on multichannel amplitude compression for speech processing, 11/18

EE Graduate Seminar

Time-Varying Amplitude Compression Processing to
Preserve and Enhance Spectral Contrast in Speech Signals

Dr. Janet C. Rutledge
Dean, UMBC Graduate School
Vice-Provost for Graduate Education
Affiliate Associate Professor of Electrical Engineering

11:30-12:45 Friday, 18 November 2011, ITE 231

Multichannel amplitude compression processing is used to reduce the level variations of speech to fit the reduced dynamic ranges of listeners with sensorineural hearing loss. This processing, however, can result in smearing of temporal information, artifacts due to spectral discontinuities at fixed channel edges, and spectral flattening due to reduced peak-to-valley ratios. Presented here is an implementation of a time-varying compression processing algorithm based on a sinusoidal speech model. The algorithm operates on a time-varying, stimulus-dependent basis to adjust to the speech variations and the listener's hearing profile. The algorithm provides fast-acting compression with minimal artifact, has time-varying frequency channels, is computationally inexpensive, and preserves the important spectral peaks in speech.

This method has been extended to provide real-time enhancement of spectral peaks and valleys. This work is also related to processing audio signals that will be transmitted over amplitude-limited noisy channels or for listeners in a noisy environment.

Dr. Janet Rutledge is Dean of the Graduate School and Affiliate Associate Professor in the CSEE Department at UMBC. She received the BS in electrical engineering from Rensselaer Polytechnic Institute and the MS and Ph.D. in electrical engineering from Georgia Tech. Prior to coming to UMBC in 2001, she was a faculty member at Northwestern University, and program director at the National Science Foundation.

Host: Prof. Joel M. Morris

Ph.D. Defense: Justin Martineau on Sentiment Analysis, 1:30pm Fri 11/18

Ph.D. Dissertation Defense

Identifying and Isolating Text Classification Signals
from Domain and Genre Noise for Sentiment Analysis

Justin Martineau

1:30-4:00 Friday, 18 November 2011, ITE 325b, UMBC

Sentiment analysis is the automatic detection and measurement of sentiment in text segments by machines. This thesis provides methods to identify, characterize, and isolate the sentiment bearing terms to improve textual sentiment classification when there is little or no labeled data for the domain.

We introduce a new theoretical framework that explains the different sources of noise that affect term level sentiment bias. This noise comes from the genre the author communicates in and the domain or general topic that the author is writing about. To understand the effects of domain noise we defined sentimental domain independence and statistically described it in the multi-domain product review data set. This allowed us to design a Domain Independence Verification Algorithm (DIVA) to eliminate this noise and produce a domain-independent sentiment model using data drawn from a variety of different domains. This model is the most accurate method to classify documents in the 25 category product review data set.

Committee:

  • Dr. Tim Finin (chair)
  • Dr. Marie desJardins
  • Dr. Akshay Java
  • Dr. James Mayfield
  • Dr. Tim Oates

Talk: Stochastic Graph Grammars, Oates, 11/11/11

EE Graduate Seminar

Stochastic Graph Grammars

Prof. Tim Oates
Associate Professor of Computer Science
Computer Science and Electrical Engineering, UMBC

11:30am Friday November 11, ITE 231, UMBC

Many important domains are naturally described relationally, often using graphs in which nodes correspond to entities and edges to relations. Stochastic graph grammars compactly represent probability distributions over graphs and can be learned from data, such as a set of graphs corresponding to proteins that have the same function.

In this talk we consider the problem of learning the parameters (i.e., the production probabilities) of stochastic graph grammars and the structure of the grammar (i.e., the productions) given a representative sample of graphs taken from the underlying distribution. We also present efficient algorithms for computing properties of the distribution over graphs defined by a graph grammar such as expectations of graph size, node degree, and number of edges.

Dr. Tim Oates is an Associate Professor in the CSEE Department at UMBC. He received B.S. degrees in Computer Science and Electrical Engineering from North Carolina State University in 1989, and M.S. and PhD degrees from the Univ of Massachusetts Amherst in 1997 and 2000, respectively. Prior to coming to UMBC in Fall 2001, Prof. Oates spent a year as a postdoc in the Artificial Intelligence Lab at MIT.

Host: Prof. Joel M. Morris

talk: Cyber Security Situation Awareness and Impact Assessment, 10:30am Tue 11/8

Cyber Security Situation Awareness and Impact Assessment:
Issues, Models and Applications

Dr. Gabriel Jakobson
Altusys Corporation, Princeton NJ

10:30-11:30am 8 November 2011, ITE 325

Cyber attacks committed against IT networks and services have a profound impact on both ongoing and future missions whose operations are based on these networks and services. By exploiting the vulnerabilities of software assets, the attacks can push their impact through the Cyber Terrain, a dependency network of structural, spatial, functional and other domain-specific dependencies that exist among software assets and services, and reach the missions. In this presentation we will introduce a novel approach to assessing the impact of cyber attacks on missions (business processes) and describe the basic models and algorithms of the approach.

Dr. Gabriel Jakobson is the VP and Chief Scientist at Altusys Corp., a consulting firm specializing in the development of intelligent situation management technologies for defence and cyber security applications. During his more than 20-year tenure at Verizon he held increasing responsibility for leading advanced database, expert systems, artificial intelligence, and telecommunication network management programs. He has authored (and co-authored) more than 100 technical papers and is principal author of 5 US patents in situation management and event correlation. He received a PhD degree in Computer Science from the Institute of Cybernetics, Estonia. Dr. Jakobson holds the honorary degree of Doctor Honoris Causa from the Tallinn Technical University, Estonia, and is a Distinguished IEEE Lecturer. Dr. Jakobson is a member of the Board of Governors of the IEEE Communications Society; Director of the IEEE ComSoc North America Region; co-chair of the Tactical Communications and Operations Technical Committee of IEEE ComSoc; and chair of the IEEE ComSoc Sub-Committee on Situation Management.

Host: Anupam Joshi

talk: Marti Hearst on Natural Search User Interfaces, 12pm Fri 11/18, ITE 459, UMBC

Human-Centered Computing Speaker Series
UMBC Information Systems Department

'Natural' Search User Interfaces

Professor Marti Hearst
School of Information
University of California, Berkeley

12:00-1:00pm Friday 18 November 2011, ITE 459

What does the future hold for search user interfaces? Following on a recently completed book on this topic, this talk identifies some important trends in the use of information technology and suggests how these may affect search in the future. These include a notable trend towards more “natural” user interfaces, a trend towards social rather than solo usage of information technology, and a trend in technology advancing the integration of massive quantities of user behavior and large-scale knowledge bases. These trends are, or will be, interweaving in various ways, which will have some interesting ramifications for search interfaces, and should suggest promising directions for research.

Dr. Marti Hearst is a professor in the School of Information at the University of California, Berkeley. She received BA, MS, and PhD degrees in Computer Science from UC Berkeley and was a Member of the Research Staff at Xerox PARC from 1994 to 1997. A primary focus of Dr. Hearst's research is user interfaces for search.

She just completed the first book on the topic of Search User Interfaces and she has invented or participated in several well-known search interface projects, including the Flamenco project, which investigated and promoted the use of faceted metadata for collection navigation. Professor Hearst's other research areas include computational linguistics, information visualization, and analysis of social media.

Prof. Hearst has received an NSF CAREER award, an IBM Faculty Award, a Google Research Award, an Okawa Foundation Fellowship, two Excellence in Teaching Awards, and has been principal investigator for more than $3M in research grants.

See M. Hearst, 'Natural' Search User Interfaces, CACM, v54n11, pp. 60-67, 2011.

Host: Professor Anita Komlodi

CSEE students to present research at upcoming symposiums

CSEE students Joe Tuzo (CMSC), JJ Seymour (CMSC) and Varish Mulwad (CMSC) will present papers at the AAAI Fall Symposium on November 4-6 in Arlington, Virginia. Yasaman Haghpanah (CMSC) will present her research at the Grace Hopper Celebration for Women in Computing on November 9-12 in Portland, Oregon.

Joe Tuzo (CMSC BS '11) and JJ Seymour (CMSC BS '12) will be presenting papers at the upcoming AAAI Fall Symposium on Complex Adaptive Systems. Joe's paper (co-authored with JJ and Prof. Marie desJardins), "Using a Cellular Automaton Simulation to Determine an Optimal Lane Changing Strategy on a Multi-Lane Highway," and JJ's paper (co-authored with Joe and Prof. desJardins), "Ant Colony Optimization in a Changing Environment," were both based on the students' class projects for Prof. desJardins's "Computation, Complexity, and Emergence" course in Spring 2011.

Computer Science Ph.D. candidate Varish Mulwad will present a paper on his dissertation research at the AAAI Fall Symposium on Open Government Knowledge.  He is developing a system that can automatically infer the meaning of information in a spreadsheet or table and publish it as linked Web data using semantic web languages. Varish's advisor, Dr. Tim Finin, is one of the organizers of the symposium.

Yasaman Haghpanah (CMSC Ph.D. Candidate) will present her research on "A Trust and Reputation Model for Supply Chain Management" in the PhD Forum at the Grace Hopper Celebration for Women in Computing, which will be held November 9-12, 2011, in Portland, Oregon.  Yasaman's advisor, Dr. Marie desJardins, will also attend the conference as a participant in the Grace Hopper Senior Women's Summit.

Dr. Tinoosh Mohsenin to speak at National Electronics Museum

On November 15, Dr. Tinoosh Mohsenin, assistant professor of Computer Science and Electrical Engineering, will speak at the National Electronics Museum. Dr. Mohsenin runs UMBC's Energy Efficient and High Performance Computing Lab, where she works on developing highly accurate, low-power communication and healthcare devices. The talk, which is hosted by the Baltimore Chapter of IEEE Electron Devices & Solid-State Circuits, will discuss efficient algorithms and architectures for communication applications.

What: "Energy Efficient and High Performance Architectures for Communication Applications"

Who: Dr. Tinoosh Mohsenin

Where: The National Electronics Museum, Pioneer Hall

Time: 5:30-7:30 p.m.

 

 

 
