PhD proposal: Finding Story Chains in Newswire Articles

Ph.D. Dissertation Proposal

Finding Story Chains in Newswire Articles

Xianshu Zhu

1:30pm Thursday 2 June 2011, ITE 325B

Huge amounts of information are shared on the Internet every day through online newspapers, digital libraries, blogs, and social network messages. While excellent search engines such as Google retrieve information from simple keyword queries, the large volumes of unstructured results they return make it hard to keep a clear picture of the evolution of an event. Moreover, in addition to the events themselves, people may be more interested in finding the hidden relationships among different events or the causes and effects of an event. Traditional search engines provide limited support for these sophisticated search tasks. In this dissertation, we try to enrich the search options of existing search engines and organize search results in a more structured and meaningful way.

More specifically, we propose to develop a News Story Reader, with functionality similar to Google Maps, that has the following characteristics: (1) search results are organized into groups of causes and impacts of events, helping web users navigate the search results in a more directed and efficient way; (2) enriched search options allow users to search for correlations between two stories by selecting two articles as start and end points, producing a coherent story chain as output; (3) an interactive user interface provides the functionality to zoom in and out and to add via points to the search result.

In our preliminary work, we start with a relatively simple problem: given a start and an end article, we want to find a chain of articles that coherently connects them. We developed a random-walk-based algorithm that finds story chains that are coherent, relevant, and low in redundancy. We applied two intelligent pruning methods to reduce the size of the graph so that the algorithm is efficient. Our next goal is to find hierarchical story chains that show the evolution of stories at different levels of granularity. To that end, we extended our current algorithm by using random walks on the word-document co-clustering graph, with weights biased toward named entities, to find hierarchical story chains.
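As a rough illustration of how a random walk can rank articles by their closeness to a start article, here is a minimal sketch of a random walk with restart on a small weighted graph. The restart variant, the toy graph, and all names are assumptions made for illustration; the abstract does not specify the details of the proposed algorithm.

```python
def random_walk_with_restart(graph, start, restart=0.15, iterations=100):
    """Return visit probabilities for a walker that keeps restarting at `start`.

    graph: dict mapping node -> dict of neighbor -> positive edge weight.
    The graph, the restart probability and the iteration count are
    illustrative assumptions, not values taken from the proposal.
    """
    scores = {node: (1.0 if node == start else 0.0) for node in graph}
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for node, neighbors in graph.items():
            total = sum(neighbors.values())
            if total == 0:
                # A node without outgoing edges sends its mass back to the start.
                nxt[start] += (1 - restart) * scores[node]
                continue
            for neighbor, weight in neighbors.items():
                nxt[neighbor] += (1 - restart) * scores[node] * weight / total
        # With probability `restart` the walker jumps back to the start node.
        nxt[start] += restart * sum(scores.values())
        scores = nxt
    return scores


# Hypothetical chain of four articles: a - b - c - d.
toy_graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
scores = random_walk_with_restart(toy_graph, "a")
```

In this toy graph, articles closer to the start article "a" receive higher scores than articles further along the chain, which is the kind of signal a chain-finding method could use to pick intermediate articles.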

The contributions of this dissertation include (1) a News Story Reader system that can help alleviate the information overload problem; (2) the design and development of two story chain finding algorithms; (3) an exploration of methods that find story chains in which news articles are connected via causes and impacts; (4) an exploration of methods for story chain visualization.


  • Dr. Tim Oates (chair)
  • Dr. Charles Nicholas
  • Dr. Tim Finin
  • Dr. Sergei Nirenburg

PhD Proposal: Generating Linked Data by inferring the semantics of tables

Ph.D. Preliminary Examination

Generating Linked Data by inferring the semantics of tables

Varish Mulwad

9:30am Wednesday, 25 May 2011, ITE 325B

A vast amount of information is encoded in tables on the web, in spreadsheets, and in databases. Considerable work has focused on exploiting unstructured free text; however, techniques that are effective for documents and free text do not work well with tables. In this research we present techniques to generate high quality linked data from tables by jointly inferring the semantics of column headers, table cell values (e.g., strings and numbers), and relations between columns, augmented with background knowledge from open data sources such as the Linked Open Data cloud. We represent a table's meaning by mapping columns to classes from an appropriate ontology, linking cell values to literal constants or entities in the linked data cloud (existing or new), and discovering or identifying relations between columns. The interpreted meaning is represented as linked RDF assertions. An initial evaluation of our preliminary baseline system demonstrates the feasibility of tackling the problem. Based on this work and its evaluation, we are further developing our framework, grounded in the theory of graphical models and probabilistic reasoning.
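To make the idea of mapping a column to an ontology class concrete, the following minimal sketch picks a class for a column by majority vote over the candidate classes of its cell values. This is a deliberately simple stand-in, not the joint inference described above; the lookup table, class names, and function name are all hypothetical.

```python
from collections import Counter

# Hypothetical lookup from cell strings to candidate ontology classes.
# A real system would query a knowledge base; these entries are made up.
CANDIDATE_CLASSES = {
    "Baltimore": ["City"],
    "Boston": ["City"],
    "Washington": ["City", "Person", "State"],
    "Annapolis": ["City"],
}


def infer_column_class(cells, candidates=CANDIDATE_CLASSES):
    """Return the class proposed by the most cells, or None if no cell matches."""
    votes = Counter()
    for cell in cells:
        for cls in candidates.get(cell, []):
            votes[cls] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


column_class = infer_column_class(["Baltimore", "Boston", "Washington"])
```

The ambiguous value "Washington" is resolved toward "City" here only because the other cells in the column vote for that class, which hints at why inferring column and cell semantics together can help.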

Committee members:

  • Dr. Tim Finin (chair)
  • Dr. Anupam Joshi
  • Dr. Tim Oates
  • Dr. Yun Peng
  • Dr. L V Subramaniam (IBM Research India)
  • Dr. Indrajit Bhattacharya (Indian Institute of Science)

Context-Aware Middleware for Activity Recognition, MS defense, Radhika Dharurkar, 10:30am 5/19

MS Thesis Defense

Context-Aware Middleware for Activity Recognition

Radhika Dharurkar

10:30am Thursday, 19 May 2011, ITE 325B

Smartphones and other mobile devices have a simple notion of context largely restricted to temporal and spatial coordinates. Service providers and enterprise administrators can deploy systems incorporating activity and relations context to enhance the user experience, but this raises considerable collaboration, trust and privacy issues between different service providers. Our work is an initial step toward enabling devices themselves to represent, acquire and use a richer notion of context that includes functional and social aspects such as co-located social organizations, nearby devices and people, typical and inferred activities, and the roles people fill in them.

We describe a system that learns to recognize richer contexts using sensor data from a person's Android phone along with annotations on her calendar and general background knowledge. Geo-social locations include the concepts of 'home' and 'school' and can be extended to others like 'work' or 'a restaurant'.

Our framework combines data from the phone's sensors (GPS, Wi-Fi, Bluetooth, acceleration, proximity, etc.) with data mined from applications (e.g., calendar) to produce features that can be used in a machine learning system. Training data from several university students and staff were collected using a system that periodically prompted the user for her true geo-social location and activity. The resulting classifier models were used to predict the individual user's context from new sensor data. The data from a set of users were combined to create a generic model.

We report on an evaluation of the individual and generic models for predicting context in the university setting. Finally, we discuss how our extended notion of context can be applied to many interesting applications for smartphone users.


  • Dr. Tim Finin (chair)
  • Dr. Anupam Joshi
  • Dr. Yelena Yesha
  • Dr. Laura Zavala

Cybersecurity Webinar, 1pm Thur June 9

Dr. Rick Forno will discuss UMBC’s Cybersecurity programs and give updated details about the upcoming Maryland Cyber Challenge at a UMBC Cybersecurity Webinar at 1:00pm on Thursday June 9.

The webinar will describe the UMBC Cybersecurity programs, covering:

  • Master of Professional Studies and graduate certificate program details
  • Innovative curriculum highlights
  • Convenient and flexible class schedules
  • Opportunities for career development and professional advancement

Dr. Forno will also discuss the Maryland Cyber Challenge and Conference:

  • Participate in a competition to find Maryland’s best minds in Cybersecurity
  • Details will be given about the October 2011 Baltimore Conference
  • How to get involved as a sponsor or partner and promote cybersecurity in Maryland!

The webinar is free but requires registration.

Equation Modeling in Resting State Motor Network in Healthy Subjects, MS defense, Tejaswini Kavallappa

MSEE Thesis Defense

Reliability of Structural Equation Modeling in Examining Resting State Motor Network in Healthy Subjects

Tejaswini Kavallappa

3pm Monday, 16 May 2011, ITE 325

Resting state connectivity studies are of growing significance and interest in the current neuroimaging literature due to their potential in explaining various underlying brain mechanisms and, therefore, their utility in clinical applications. While functional connectivity has been extensively examined in the human brain, effective connectivity is a burgeoning field in functional neuroimaging studies, and there is an increased interest in quantifying effective connectivity that takes into account the directional influences of various brain regions active in a particular functional network. Studies have shown the presence of multiple functional networks in the resting state, which have been shown to be consistent across subjects and between sessions. However, this is not the case with resting state effective connectivity.

In this thesis we evaluate effective connectivity of the resting state motor network in normal subjects using structural equation modeling (SEM), a linear statistical analysis method. It has been shown that signals related to cardiac pulsatility and respiration effects can confound functional MRI results. Thus, we have investigated the effect of various filtering strategies on the reliability of effective connectivity measurements. Our thesis compared unfiltered preprocessed data with four methods of physiological filtering of resting state data:

  • preprocessed data without any filtering,
  • removal of prospectively recorded cardiac and respiratory fluctuations using RETROICOR,
  • removal of the global average signal from all brain voxel time series,
  • regressing out average signal of the white matter (WM) and cerebrospinal fluid (CSF), and
  • temporal filtering to remove frequencies pertinent to cardiac and respiratory sources.

The resulting effect of each of these methods on the estimation of resting state motor network effective connectivity was examined in this thesis.
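As a small illustration of what "regressing out" a nuisance signal means, the sketch below removes the least-squares fit of a single nuisance time series (for example, an average white matter or global signal) from one voxel time series. It is a simplified, assumed version with one regressor and an intercept; the function name and the toy numbers are hypothetical and are not taken from the thesis.

```python
def regress_out(signal, nuisance):
    """Return `signal` minus its least-squares fit on `nuisance` plus an intercept."""
    n = len(signal)
    if n == 0 or n != len(nuisance):
        raise ValueError("signal and nuisance must be non-empty and of equal length")
    mean_s = sum(signal) / n
    mean_n = sum(nuisance) / n
    var_n = sum((x - mean_n) ** 2 for x in nuisance)
    if var_n == 0:
        # A constant nuisance series can only explain the mean of the signal.
        return [y - mean_s for y in signal]
    cov = sum((x - mean_n) * (y - mean_s) for x, y in zip(nuisance, signal))
    beta = cov / var_n  # slope of the least-squares fit
    return [(y - mean_s) - beta * (x - mean_n) for x, y in zip(nuisance, signal)]


# Toy example: the voxel signal is a scaled copy of the nuisance series plus a
# component that is uncorrelated with it.
nuisance = [1.0, 2.0, 3.0, 4.0, 5.0]
component = [1.0, -2.0, 2.0, -2.0, 1.0]
voxel = [3.0 * x + 5.0 + e for x, e in zip(nuisance, component)]
cleaned = regress_out(voxel, nuisance)
```

In the toy example the cleaned series recovers the uncorrelated component, since everything that can be explained linearly by the nuisance series has been removed. Practical pipelines typically fit several regressors at once.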


  • Dr. Joel M. Morris (chair)
  • Dr. Rao P. Gullapalli (co-advisor)
  • Dr. Tulay Adali
  • Dr. Alan B. McMillan

Group Recognition in Social Networking Systems, MS Defense by Nagapradeep Chinnam

MS Thesis Defense

Group Recognition in Social Networking Systems

Nagapradeep Chinnam

1:30pm Tuesday, 17 May 2011, ITE 325

Recent years have seen an exponential growth in the use of social networking systems, enabling their users to easily share information with their connections. A typical Facebook user, as an example, might have 300-400 connections, which include relatives, friends, business associates and casual acquaintances. Sharing information with such a large and diverse set of people without violating social norms or privacy can be challenging. Allowing users to define groups and restrict information sharing by group reduces the problem but introduces new ones: managing groups and their members, relations and information sharing policies. This thesis addresses the problem of maintaining group membership.

We describe a system that learns to classify a user's new connections into one or more existing groups based on the connection's attributes and relations. We demonstrate the approach using data collected from real Facebook users. The two major tasks are identifying the relevant features for the classification and selecting the learning mechanism that best suits the task. Another significant challenge is posed by hierarchical and overlapping groups. We show that our system classifies new connections into these groups with high accuracy even with only 10-20% of labeled data.


  • Dr. Tim Finin (chair)
  • Dr. Anupam Joshi
  • Dr. Tim Oates

Community Detection in Twitter, MS defense by Mohit Kewalramani, 1pm Mon 5/16

MS Thesis Defense

Community Detection in Twitter

Mohit Kewalramani

1:00pm Monday, 16 May 2011, ITE 346

Twitter has evolved into a source of social, political and real time information in addition to being a means of mass communication and marketing. Monitoring and analyzing information on Twitter can lead to invaluable insights, which might otherwise be hard to get using conventional media resources. An important task in analyzing highly networked information sources like Twitter is to identify the communities that are formed. A community on Twitter can be defined as a set of users that are more similar to other members than to non-members.

We present a technique to devise a similarity metric between any two users on Twitter based on the similarity of their content, links and metadata. The link structure on Twitter can be characterized using the Twitter notions of followers and being followed, and the @Mentions, @Reply and @RT tags in tweets. Content similarity is characterized by the words in the tweets combined with the hashtags they are annotated with. Metadata similarity includes similarity based on other sources of user information such as location, age and gender. We then use this similarity metric to cluster users into communities using spectral and bottom-up agglomerative hierarchical clustering. We evaluate the performance of clustering using different similarity measures on different types of datasets. We also present a heuristic for finding communities in Twitter that takes advantage of the network characteristics of Twitter.
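One simple way to combine the three kinds of similarity described above is a weighted sum of a content score, a link score and a metadata score. The sketch below is an assumed illustration only: the choice of cosine and Jaccard measures, the weights, and the field names "words", "follows" and "meta" are hypothetical and are not the metric defined in the thesis.

```python
import math
from collections import Counter


def cosine(words_a, words_b):
    """Cosine similarity between two bags of words (tweet words and hashtags)."""
    ca, cb = Counter(words_a), Counter(words_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(items_a, items_b):
    """Overlap between two sets, e.g. the accounts each user follows."""
    a, b = set(items_a), set(items_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def metadata_match(meta_a, meta_b):
    """Fraction of shared metadata fields (e.g. location) with equal values."""
    keys = set(meta_a) & set(meta_b)
    if not keys:
        return 0.0
    return sum(meta_a[k] == meta_b[k] for k in keys) / len(keys)


def user_similarity(u, v, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of content, link and metadata similarity (weights assumed)."""
    w_content, w_link, w_meta = weights
    return (w_content * cosine(u["words"], v["words"])
            + w_link * jaccard(u["follows"], v["follows"])
            + w_meta * metadata_match(u["meta"], v["meta"]))


# Hypothetical users.
alice = {"words": ["#umbc", "defense", "thesis"], "follows": ["x", "y"],
         "meta": {"location": "Baltimore"}}
bob = {"words": ["soccer", "#goal"], "follows": ["z"],
       "meta": {"location": "Boston"}}
```

A pairwise score like this could then be arranged into a similarity matrix and passed to a clustering routine; the clustering step itself is not shown here.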


  • Dr. Tim Finin (chair)
  • Dr. Anupam Joshi
  • Dr. Tim Oates

MS defense: Lohr on Semantic Light, 2:15 Thu 5/12

MS Thesis Defense

Semantic Light: Building Blocks

Charles Lohr

2:15pm Thursday, 12 May 2011, ITE 346

The concept of Semantic Light is simply that lighting systems can be aware of what they are lighting. This offers a number of potential advantages over conventional lighting in quality and efficiency. Semantic Light requires fine grained control of the output of many lights and requires sensors to take in information about what is being lit. It uses this information to control the output lighting in great detail. By running various algorithms, Semantic Light can provide information to the user and has a number of applications including augmented reality.

Traditional lighting that is currently in wide use has limited control of quality and quantity of the light produced. Few lights for large-scale use are intended to control their output in any kind of detailed manner. Most area lighting only has a switch that must be manually turned on or off. While there are many commercial systems that allow for more fine grained control, they are typically limited to remote control, motion control and extra manual controls. These systems can be wasteful, or they may provide inappropriate amounts of light, or they may be on when no one is using them.

While other Semantic Lighting systems may focus on "green" or power-saving aspects, we concentrate instead on innovative roles Semantic Light could play, as well as on the technology that makes it possible to fill those roles. By emphasizing new utility and maximizing our speed to prototype, we have made several tradeoffs that will cause our system to be less efficient than it could be, even less efficient than traditional lighting systems. The ideas and concepts covered, however, could be adapted to different underlying technologies to produce a product that could provide considerable power savings over conventional lighting.

It is important to think of the many concepts covered as primary building blocks, rather than a complete commercial system. A number of refinements and extensions will be needed to produce a commercially viable product. We demonstrate all of the needed building blocks in a concise, prototyped system.


  • Mark Olano
  • Yelena Yesha
  • Zary Segall (advisor)

CSEE Research Review awards and pictures

2011 CSEE Research Review


The 2011 research review event was the largest to date, with more than eighty people attending. You can see pictures from the poster session and some of the presentations online.

The CRR-11 program committee selected students for best research based on submitted papers.

CSEE faculty who attended used range voting to honor three students for best poster presentations.

MS defense: Mahale on Group Centric Information Sharing, 10am Tue

MS Thesis Defense

Group Centric Information Sharing
using Hierarchical Models

Amit Mahale

10:00am Tuesday, 10 May 2011, ITE 346, UMBC

Traditional security policies are often based on the concept of “need to know” and are typified by predefined and often rigid specifications of which principals and roles are pre-authorized to access what information. One recommendation of the 9/11 Commission was to find ways to move from this traditional perspective toward one that emphasizes the “need to share”. Ravi Sandhu and his colleagues have developed the group-centric secure information sharing (gSIS) model as a new model that is more adaptable to highly dynamic situations requiring information sharing. We present an implementation of gSIS and demonstrate its usefulness for use cases in information sharing in social media. Our contributions include the prototype implementation, extensions to the model such as hierarchical groups and necessary and sufficient conditions, and the use of the Semantic Web language OWL for representing the central gSIS concepts and associated data. Our framework takes a pragmatic approach, using Semantic Web technology to represent and reason about the hierarchy and a procedural method to compute access decisions relying on the gSIS semantics.

Thesis Committee:

  • Dr. Tim Finin (chair)
  • Dr. Anupam Joshi
  • Dr. Yelena Yesha
  • Dr. Laura Zavala