From dralansherman@starpower.net Thu May 1 01:24:08 2008
Date: Thu, 1 May 2008 00:34:04 -0400
From: Dr. Alan T. Sherman
To: CSEE ALL
Subject: [Csee-faculty-lecturer] CSEE Research Review - Poster Abstracts

CSEE Research Review - Poster Abstracts
Friday, May 2, 2008
Department of Computer Science and Electrical Engineering
University of Maryland, Baltimore County (UMBC)

BS Students

1. Stephen Sullivan, Ebiquity
Polvox: Identifying Political Affiliations within the Blogosphere

The Polvox project aims to develop tools and compare techniques for predicting political affiliations, as well as for finding memes and communities in the blogosphere, through the application of semantic analysis. The focus of our research, a part of the larger Polvox effort, has been to use machine learning to identify the political leanings of blogs. For this effort we consider only Democratic versus Republican bias. In future work, we plan to extend our research to explore other characteristics such as political issues, candidates, geographical locations, and races. Our system has several components: a Greasemonkey script that people use to tag a site as Democratic or Republican, a component to parse and index blogs, and a classifier. We are investigating several kinds of classification techniques and feature combinations, such as bag-of-words, n-grams, and hyperlinks. To train the classifiers we are using data that humans have collected and marked as either Democratic-leaning or Republican-leaning. We will report on the results obtained and discuss which features seem to be the most discriminative for identifying political blogs.

MS Students

2. Jonathan Bronson (Advisor: Rheingans), VANGOGH Lab
Statistically Weighted Visualization Hierarchies

We are beginning to see an overload in the amount of information packed into a given visualization.
In many cases, it is no longer possible to look at a single level of detail and obtain from it the answers we are looking for. This problem is especially relevant to datasets of high dimensionality. Not only does it become difficult to home in on a particular dimension of possible interest, but it becomes even more difficult to find and understand the relationships between dimensions. In computer graphics, varying orders of magnitude have traditionally been addressed by image hierarchies known as MipMaps. These hierarchies are extremely fast and provide a seamless transition from one level of detail to the next. Unfortunately, this approach does not carry over to textures full of scientific data. It introduces a series of errors that not only misrepresent and corrupt the underlying data as seen by the viewer, but also hide interesting features that warrant further investigation. We propose an alternative hierarchical approach, using statistical analysis to generate more representative macroscopic views of extremely detailed data fields.

3. Richard T. Carback III (Advisor: Alan Sherman), Cyber Defense Lab
Scantegrity: Post-Election Voter Verifiable Optical-Scan Voting

Scantegrity is a security enhancement for optical scan voting systems. It is part of an emerging class of post-election "end-to-end" (E2E) independent election verification systems that permit each voter to verify that her ballot was correctly recorded and counted. On the Scantegrity ballot, each candidate position is paired with a confirmation code that is shown to the voter after she marks her ballot. Election officials confirm receipt of the ballot by posting the confirmation code that is adjacent to the marked position. Scantegrity is the first voting system to offer strong post-election independent verification without changing the way voters mark optical scan ballots, and it complies with legislative proposals requiring "unencrypted" paper audit records.

4. Sheetal Gupta, Ebiquity
Query Distribution Estimation and Predictive Caching in Mobile Ad Hoc Networks

The problem of data management has been studied widely in the field of mobile ad hoc networks and pervasive computing. The issue we address is that finding the data a device requires depends on a chance encounter with the source of that data. Most existing research has focused on acquiring the required data by specifying the user or application intentions. These approaches take the semantics of data into account while caching data onto mobile devices from the wired sources. We propose a scheme by which mobile devices proactively increase the availability of data by pushing and caching the most popular data in the network. The scheme involves a local, distributed technique for estimating the global query distribution in the network. The mobile devices have a finite-sized cache to store the pushed data and use their estimate of the query distribution to prioritize the data to cache. We implement this technique in the GloMoSim network simulator and show that our scheme improves both data availability and response latency.

PhD Students

5. Jesus J. Caban (Advisor: Rheingans), VANGOGH Lab
Generating and Visualizing Statistical Volume Models

Large digital repositories of volumetric data continue to raise questions about how differences, relationships, variability, and data uncertainty are best discovered and visualized. Understanding the structural and statistical properties of a collection of 3D volumes is a difficult task due to the large amount of data involved. Comparing specific members of the group or visualizing each member independently does not provide the means required to effectively learn the statistical properties of a given population.

We introduce statistical volumes and present a framework to generate and visualize statistical volumetric models.
Our technique loads a population of volumetric data, aligns the volumes in a common coordinate system, and uses a volumetric decomposition to generate a hierarchical representation of the statistical volume. The hierarchical model effectively captures the statistical properties of the input data by creating a set of probability density functions for each voxel or region of interest. Visualization techniques are then used to show statistical properties, to illustrate structural attributes, to generate new instances of the group, and to effectively display characteristic regions of the collection under consideration.

6. Lushan Han (Advisor: Tim Finin), Ebiquity Lab
Predicting Appropriate Semantic Web Terms from Words

The Semantic Web language RDF was designed to unambiguously define and use ontologies to encode data and knowledge on the Web. Many people find it difficult, however, to write complex RDF statements and queries because doing so requires familiarity with the appropriate ontologies and the terms they define. We describe a system that suggests appropriate RDF terms given semantically related English words and general domain and context information. We use the Swoogle Semantic Web search engine to provide RDF term and namespace statistics, the WordNet lexical ontology to find semantically related words, and a naïve Bayes classifier to suggest terms. A customized graph data structure of related namespaces is constructed from Swoogle's database to speed up classifier model learning and prediction.

7. Akshay Java (Advisor: Tim Finin)
Approximating the Community Structure of the Long Tail

Communities are central to online social media systems, and detecting their structure and membership is critical for many applications. The large size of the underlying graphs makes community detection algorithms very expensive. We describe an approach to reducing the cost by estimating the community structure from only a small fraction of the graph.
Our approach is based on an important assumption that large, scale-free networks are often very sparse. Such networks consist of a small but high-degree set of core nodes and a very large number of sparsely connected peripheral nodes (Borgatti & Everett 2000). The insight behind our technique is that the community structure of the overall graph is very well represented in the core. The community membership of the long tail can be approximated by first using the subgraph of the small core region and then analyzing the connections from the long tail to the core. A set of vertices can constitute a community if they

8. Palanivel Kodeswaran, Ebiquity
Utilizing Semantic Policies for Managing BGP Route Dissemination

Policies in BGP are implemented as routing configurations that determine how route information is shared among neighbors to control traffic flows across networks. This process is generally template-driven, device-centric, limited in its expressibility, time-consuming, and error-prone, which can lead to configurations where policies are violated or there are unintended consequences that are difficult to detect and resolve. In this work, we propose an alternative mechanism for policy-based networking that relies on additional semantic information associated with routes, expressed in an OWL ontology. Policies are expressed using SWRL to provide fine-grained control, whereby the routers can reason over their routes and determine how they need to be exchanged. In this paper, we focus on security-related BGP policies and show how our framework can be used to implement them. Additional contextual information, such as affiliations and route restrictions, is incorporated into our policy specifications, which can then be reasoned over to infer the correct configurations that need to be applied, resulting in a process that is easy to deploy, manage, and verify for consistency.

9. John Krautheim (Advisors: Dhananjay Phatak, Alan T. Sherman), Cyber Defense Lab
Identifying Trusted Virtual Machines

Software operating on a physical computer derives its identity from the underlying hardware components. When the same software is operating within a virtualized environment, the identity loses its binding to the hardware due to the interaction of the virtual machine monitor. We show that once software has been virtualized, it loses its unique identity, exposing it to a reincarnation attack that allows licensed-software and digital-rights-management content protections to be subverted. We propose to develop a mechanism to uniquely identify instances of software running within virtualized environments.

A typical mechanism to identify a platform configuration is to utilize a Trusted Platform Module (TPM) to provide a unique identity. A virtual machine (VM) operating on that same platform no longer has the ability to uniquely identify itself, because the virtual machine monitor, or hypervisor, adds a layer of uncertainty to the trust layer of the computing platform. The hypervisor operates below the virtual machine at the highest privilege in the system; therefore, it has the ability to subvert the normal protection mechanisms of typical operating systems and application software running within the virtual machine.

This project proposes to develop an architecture and protocol for enabling the identity normally bound to the hardware to be extended to the virtual machine. Recent advances in VM technology, including Intel's Virtualization Technology (VT) and AMD's Pacifica, have moved virtualization functions into hardware. Additionally, Intel Trusted Execution Technology (TXT) provides mechanisms for verifiably reporting platform identity and configuration. By leveraging Intel VT and TXT, the identity of the hardware can be lifted to the VM presentation layer through a virtualized trusted platform module.
The identity can then be used to determine the trust level of the virtual machine through remote attestation of the platform configuration to a policy decision point or third party authenticator.

10. Wenjia Li, Ebiquity
Gossip-Based Outlier Detection for Mobile Ad Hoc Networks

It is well understood that Mobile Ad Hoc Networks (MANETs) are extremely susceptible to a variety of attacks. Many security schemes have been proposed that depend on identifying nodes that are exhibiting malicious behavior such as packet dropping, packet modification, and packet misrouting. We argue that in general, this problem can be viewed as an instance of detecting nodes whose behavior is an outlier when compared to others. In this paper, we propose a gossip-based outlier detection algorithm for MANETs. The algorithm leads to a common outlier view amongst distributed nodes with a limited communication overhead. Simulation results demonstrate that the proposed algorithm is efficient and accurate.

11. Justin Martineau (Advisor: Tim Finin), Ebiquity
Blog Link Classification

Blog links raise three key questions: Why did the author make the link, what exactly is he pointing at, and what does he feel about it? In response to these questions we introduce a link model with three fundamental descriptive dimensions where each dimension is designed to answer one question. We believe the answers to these questions can be utilized to improve search engine results for blogs. While proving this is outside the scope of this paper, we do prove that knowing the rhetorical role of a link helps determine what the author was pointing at and how he feels about it.

12. Don Miner (Advisor: Marie desJardins), MAPLE
Learning Abstract Rules for Swarm Systems

Rule abstraction is an intuitive new tool that we propose for implementing swarm systems.
The methods presented in this poster encourage a new paradigm for designing swarm applications: engineers can interact with a swarm at the abstract (swarm) level instead of the individual (agent) level. This is made possible by modeling and learning how particular swarm-level properties arise from low-level agent behaviors. We have developed a procedure for building abstract rules and discuss how they can be used. We also provide experimental results showing that abstract rules can be learned by observation.

The contribution of this work is the method of using rule abstraction and rule hierarchies to intuitively control groups of agents at the swarm level. We discuss how the connections between abstract rules and low-level rules can be defined and learned. Also, since rule abstraction is a feature of our Swarm Application Framework, we give background on this development platform. Finally, we describe a sample application that demonstrates the use of abstract rules.

13. Michael Oehler (Advisor: Dhananjay S. Phatak), Cyber Defense Lab
Secret Key Authentication Using a Context Free Representation for Secure VoIP Communication

This research defines a context-free representation, one that is independent of a spoken language, to authenticate a Diffie-Hellman negotiated secret. This context-free approach authenticates the negotiated key by presenting an image in the VoIP user agent; the callers simply describe what they see. If they agree, the key is authenticated and the secure media session continues. The strength of the approach lies in the callers' recognition of each other's voices and their ability to confer about the image displayed by their systems. The necessary degree of visual recognition is achieved by using basic shapes, color, and count. People, regardless of language, age, and culture, will have little difficulty identifying these images and can communicate them with little effort.
We believe that this approach reverses the current trend in security of divesting users from the underlying cryptographic principles supporting secure systems, by abstracting these principles to a comprehensible form. This research demonstrates that the human factor can play a pivotal role in establishing a secure link and that a single system can be employed by people speaking many different languages. In this sense, the approach improves VoIP security, and does so without a significant infrastructure for authentication. Our approach descends from the English-specific approach found in ZRTP, and could be incorporated into ZRTP. Integration into other VoIP key agreement systems is also possible. We have named this approach the Short Authentication SymbolS VisuallY (SASSY).

14. Randy Schauer, Ebiquity
A Probabilistic Approach to Distributed System Management

The management of large-scale distributed systems is a critical consideration when focusing on system reliability. As the number of commodity components within clusters continues to grow, it becomes increasingly difficult to track the multitude of parameters that must be checked regularly to ensure optimal performance from the system. In this paper, we discuss a distributed multi-agent system that utilizes statistical inference to provide the most effective means of managing these parameters. This solution uses Markov Logic Networks as the inference technique to validate configurations and operating environments. We showcase two preliminary examples, permission validation and temperature monitoring, of how this approach resolves differences between compute nodes.

15. Zareen Saba Syed (Advisor: Tim Finin), Ebiquity
Wikipedia as an Ontology for Describing Documents

Identifying topics and concepts associated with a set of documents is a task common to many applications.
It can help in the annotation and categorization of documents and be used to model a person's current interests for improving search results, supporting business intelligence, or selecting appropriate advertisements. One approach is to associate a document with a set of topics selected from a fixed ontology or vocabulary of terms. We have investigated using Wikipedia's articles and associated pages as a topic ontology for this purpose. The benefits of this approach are that the ontology terms are developed through a social process, are maintained and kept current by the Wikipedia community, represent a consensus view, and have meaning that can be understood simply by reading the associated Wikipedia page. We use Wikipedia articles and the category and article link graphs to predict concepts common to a set of documents. We describe several algorithms that we implemented and evaluated to aggregate and refine results, including the use of spreading activation to select the most appropriate terms. While the Wikipedia category graph can be used to predict generalized concepts, the article link graph helps by predicting more specific concepts and concepts not in the category hierarchy. Our experiments show that it is possible to suggest new category concepts identified as a union of pages from the page link graph. Such predicted concepts can be used to define new categories or sub-categories within Wikipedia.

_______________________________________________
Csee-faculty-lecturer mailing list
Csee-faculty-lecturer@cs.umbc.edu
http://www.cs.umbc.edu/mailman/listinfo/csee-faculty-lecturer