2013 Research Review Posters

BS Students

BS1. Bradley Potteiger and Stephen Dibenedetto, Embedded Systems and Networks Lab, Underwater Signal Reflection Enabled Localization Scheme

Traditional underwater localization relies on line-of-sight (LOS) links to properly utilize ranging information. Unfortunately, the accuracy of ranging techniques such as time of arrival (TOA), time difference of arrival (TDOA) and angle of arrival (AOA) can be significantly degraded by LOS instabilities in the underwater medium due to increased multipath effects. This project proposes a novel underwater signal reflection-enabled acoustic-based localization scheme (UNREAL) that employs both LOS and surface-reflected non-line-of-sight (NLOS) ranging information to locate a node that has drifted away. The LOS and NLOS links are classified by incorporating a surface-based recovery mechanism, which recovers the channel impulse response information through homomorphic deconvolution. A closed-form least-squares method is developed to use this classification to locate the node by using either the LOS AOA measurements or the NLOS AOA from the estimated water surface reflection point. Every node in the network can be used as a reference point to locate the lost node when LOS AOAs are available. The AOAs are a collection of elevation and azimuth angles for each reference node in the 3D underwater environment. To validate the approach, a 3D camera was used to measure the water surface in a controlled tank, and the measured water surface was then used in a simulated environment.

MS/MPS Students

MS1. Neha Sardesai (Advisor: J. Morris), Communication and Signal Processing Lab, Develop System Analysis Model for Trace Gas Detection Using a Pulsed Laser and QEPAS

This poster presents the development of a system analysis model for a QEPAS (Quartz Enhanced Photo Acoustic Spectroscopy) based trace gas detection system with pulsed-laser input and sample-mean test-statistic output. For this QEPAS-based system the input laser signal is pulsed, and its beam is centered between the tines of a quartz tuning fork placed in a gas cell. If the trace gas is present, it absorbs the energy from the beam, resulting in the generation of an acoustic pressure wave P(r,t). This pressure wave strikes the fork tines causing them to vibrate at a resonant frequency. These vibrations result in the flow of current within the tines, due to the piezoelectric effect. On obtaining the current flow signal waveform in the fork and using statistical signal processing of multiple samples of this waveform, the trace gas can be detected and its concentration estimated. This analysis model will allow one to assess the performance of such a system for various parameter values that define a given system. This is analogous to our earlier work on a photodetector-based gas-detection system analysis model [1]. The project involves the generation and solution of differential equations for the pulsed-laser signal intensity, the corresponding acoustic pressure wave, and the resulting tuning-fork tines vibration and generated current. From these equations we are assessing how well the QEPAS current waveform matches a desired ideal current waveform for detection and estimation signal processing: a sinusoidal signal with a periodic on-off magnitude corresponding to the laser pulses. Our work is a different, but related, approach to that of N. Petra, et al., which considered a QEPAS-based system with a continuous-wave wavelength-sinusoidal-modulated laser signal and corresponding detection/estimation scheme [2,3].

MS2. Shayna Weinstein, Gleaning Bat Wing Morphology Using a Bag-of-Patterns Representation

We present research exploring the suitability of representing bat flight data as a bag-of-patterns for the purpose of training machine learning classifiers to predict bat flight speed. Bats are the only mammals capable of sustained flight and they make use of efficient flight strategies different from other flying animals. Today, a great effort is being made to understand the aerodynamic characteristics of the wing as they may provide novel inspiration for aircraft design [1]. For this purpose we use the Symbolic Aggregate approXimation (SAX) algorithm. SAX produces a discrete representation of a time series, generating a bag (collection) of SAX words (local patterns) [2]. We applied this algorithm to recordings of 3D locations of sensors placed on joints along the bat wing in flight in wind tunnels. Our experiments revealed that the sensors along the leading edge of the wing are most significant in determining the speed of the flight.
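The bag-of-patterns idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes an alphabet of four symbols, a sliding window whose length is divisible by the number of segments, and a plain list of numbers standing in for one sensor coordinate; the function names and parameters are illustrative and are not taken from the poster.

```python
from collections import Counter
from statistics import mean, pstdev

# Breakpoints that cut a standard normal distribution into 4 equiprobable
# regions (the usual SAX choice for an alphabet of size 4).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def sax_word(window, n_segments):
    """Convert one window of a time series into a SAX word."""
    mu, sigma = mean(window), pstdev(window)
    # z-normalize; a flat window maps to all zeros
    z = [0.0] * len(window) if sigma == 0 else [(v - mu) / sigma for v in window]
    seg_len = len(z) // n_segments  # assumes len(window) % n_segments == 0
    letters = []
    for s in range(n_segments):
        # Piecewise Aggregate Approximation: mean of each segment
        level = mean(z[s * seg_len:(s + 1) * seg_len])
        # Index of the region the segment mean falls into
        letters.append(ALPHABET[sum(level > b for b in BREAKPOINTS)])
    return "".join(letters)


def bag_of_patterns(series, window_size, n_segments):
    """Count the SAX words produced by sliding a window over the series."""
    bag = Counter()
    for start in range(len(series) - window_size + 1):
        bag[sax_word(series[start:start + window_size], n_segments)] += 1
    return bag
```

The resulting word counts can serve as a fixed-length feature vector for a classifier; how the poster's authors built their features from the bags is not stated in the abstract.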

MS3. Arnav Joshi, ebiquity, Generating a linked data resource for software security concepts and vulnerability descriptions

There exist several software security advisory sources and vulnerability enumeration schemes which report security related software flaws, detect the latest threat trends, and include summaries, remediation information and list of affected products. The National Vulnerability Database is such a repository of standards-based vulnerability management data. However, the data representation and interpretation pose certain limitations on the automation of vulnerability management, and obtaining further contextual information from other related resources. This poster demonstrates the generation and publishing of a quality linked data resource for software security concepts and vulnerability descriptions. The linked data resource will make it possible for applications to look up metadata, facilitate searching, information classification, and references; as well as incorporating the standards for vulnerability identification, mitigation and prevention efforts. A linked data representation will be an important and much needed resource for domain experts.

MS4. Deepal Dhariwal (Advisors: Anupam Joshi, Michael Grasso), ebiquity, Text and Ontology Driven Clinical Decision Support System

Vast amounts of information are present in unstructured format in physicians' notes. Text processing techniques can be used to extract clinically relevant entities from such data. The extracted entities can then be mapped to concepts from medical ontologies to generate a structured Knowledge Base (KB) of patient facts. Clinical rules written over this KB could then be used to develop systems that can help with a variety of clinical tasks, such as decision support alerts in the diagnostic process. We propose a generic text and ontology driven information extraction framework. In the first phase, preprocessing techniques such as section tagging, dependency parsing, and gazetteer lists are used to filter clinical terms from the raw data. The clinical records are parsed using the Clinical Text Analysis and Knowledge Extraction System to extract prior medical history, medications, observations, laboratory results, etc. For every concept we consider its polarity, the section in which the concept occurs, the associated numerical value, synonyms, etc. In the second phase, a domain-specific medical ontology is used to establish relations between the extracted clinical terms. The output of this phase is a KB that stores medical facts about the patient. In the final phase, an OWL reasoner and clinical rules are used to infer additional facts about the patient and generate a richer KB, which can then be queried for a variety of clinical tasks. To demonstrate a proof of concept, we use discharge summaries from the cardiovascular domain to determine the TIMI Risk Score and San Francisco Syncope score for a patient.

MS5. Sandhya Krishnan (Advisor: Anupam Joshi), ebiquity, Social Media Analytics: Digital Footprints

Social media has greatly impacted the way we communicate today. With approximately 3000 tweets/sec and 55 million FB status updates a day, it is a great way to disseminate information to users across the world. However, such a tool can also be used to disseminate misinformation quickly and efficiently, which can have a harmful impact in scenarios such as national security or business and marketing, and hence needs to be kept in check. Our approach involves creating a social footprint of users, which can be used to distinguish real accounts from imposter or compromised accounts on social media. In this paper, we build the signature or profile of users claiming to be the same entity on social media, beginning with "famous" personalities (here we assume that spreading malicious content through such "famous accounts" would have more impact and thus pose a higher threat). This signature is built from content and network analysis of such users on social media. We analyze the real-time content of users (tweets, Facebook posts, etc.) and compare it with information about the user from reliable sources on the web (newspapers, news channels, etc.) in order to compute a similarity metric between content from the two sources. We also compute a metric based on the social network analysis of the users. Once the validity of such a user account is established based on these two metrics, we filter down the social media network and apply the same technique to authenticate the accounts of less famous people (laypersons).

MS6. Primal Pappachan and Prajit Das, ebiquity, Place Ontology based Context Generation Engine

We propose a context generation engine which defines the notion of an activity-based context from the user's location. The context generation depends on user input and previous activity. The idea of the project is to provide the most accurate possible context data using the least costly sensor available on a smartphone. The selection of a sensor depends on the current activity of the user. The Platys ontology describes the place based on the location and activity of the user [1]. This project aims to improve upon the Place ontology to incorporate additional features of context information as well as activities and places based on them. Using the indoor location tagging engine tagin! [2], which generates unique IDs for a place based on its Wi-Fi fingerprints, labels can be associated with places/locations and added to the Place ontology. This knowledge base can be further used for inferring context-level information for the user, either locally or globally through collaboration. Additionally, information from other sensors such as Bluetooth, GPS, accelerometer, and gyroscope will be explored to see whether these additional sources would be useful in inferring the context of the user.

MS7. Puneet Sharma (Advisor: Anupam Joshi), ebiquity, A multilayer framework to catch data exfiltration

The globalization of the hardware manufacturing supply chain has raised a new cybersecurity issue: that of trojaned hardware. Given that almost all existing security work assumes that the hardware is trusted, this is a significant research challenge. Hardware trojan detection research has generally concentrated on testing the circuit to check that it matches expectations. This approach can be subverted by a malicious circuit which does everything that is expected, thus passing all tests, while adding malicious logic of its own. We propose a multi-layered approach with monitoring pieces covering the entire system stack from hardware up to software. We intend to take a complementary approach to detecting hardware trojans by studying the device's behavior and seeing if we can detect it behaving in an anomalous manner. At the hardware level, this includes power consumption patterns and changes in the instruction set execution patterns. On the software side, we would be monitoring low-level features at the network, system, process and kernel level. Significant events that may be early indicators of an impending intrusion, such as the insertion of a new hardware device or the loading of new loadable kernel modules, will be logged. If our monitors show that, within a short period of time, a new network card was inserted in the system, which led to the loading of a kernel module and also produced an anomalous power consumption pattern, all of this information hints at an intrusion taking place.

MS8. Asmita Korde and Damon Bradley (Advisor: Tinoosh Mohsenin), Energy Efficient High Performance Computing Lab, Detection performance of radar compressive sensing in noisy environments

In this paper, radar detection via compressive sensing is explored. Compressive sensing is a new theory of sampling which allows the reconstruction of a sparse signal by sampling at a much lower rate than the Nyquist rate. By using this technique in radar, the use of a matched filter can be eliminated and high-rate sampling can be replaced with low-rate sampling. In this paper, compressive sensing is analyzed under varying factors such as noise and different measurement matrices. Different reconstruction algorithms are compared by generating ROC curves to determine their detection performance. We conduct simulations for a 64-length signal with 3 targets to determine the effectiveness of each algorithm under varying SNR. We also propose a simplified version of Orthogonal Matching Pursuit (OMP). Through numerous simulations, we find that this simplified version can give better results than the original OMP in noisy environments when sparsity is highly overestimated, but does not work as well in low-noise environments.
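For readers unfamiliar with OMP, the following is a minimal pure-Python sketch of the standard algorithm, not the simplified variant proposed in the poster (whose details the abstract does not give). It assumes the measurement matrix is supplied as a list of nonzero columns and that the requested sparsity does not exceed the number of columns; the helper names are illustrative.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(G, b):
    """Solve G x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(G)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def omp(columns, y, sparsity, tol=1e-9):
    """Greedy sparse recovery: returns (support indices, coefficients)."""
    support, coeffs = [], []
    residual = list(y)
    for _ in range(sparsity):
        if math.sqrt(dot(residual, residual)) < tol:
            break  # measurements already explained
        candidates = [j for j in range(len(columns)) if j not in support]
        # Pick the column most correlated with the residual (norm-scaled).
        best = max(
            candidates,
            key=lambda j: abs(dot(columns[j], residual))
            / math.sqrt(dot(columns[j], columns[j])),
        )
        support.append(best)
        # Least-squares fit of y on the selected columns (normal equations).
        G = [[dot(columns[i], columns[j]) for j in support] for i in support]
        rhs = [dot(columns[i], y) for i in support]
        coeffs = solve(G, rhs)
        residual = [
            y[k] - sum(c * columns[j][k] for c, j in zip(coeffs, support))
            for k in range(len(y))
        ]
    return support, coeffs
```

Passing a `sparsity` larger than the true number of nonzero entries corresponds to the overestimated-sparsity setting the abstract discusses: in the noise-free case the loop exits early, while with noise the extra iterations fit noise.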

MS9. Adam Page (Advisor: Tinoosh Mohsenin), Energy Efficient High Performance Computing Lab, An Efficient & Reconfigurable FPGA and ASIC Implementation of a Spectral Doppler Ultrasound Imaging System

Pulsed Wave (PW) Doppler ultrasound is an important technique commonly used for making non-invasive velocity measurements of blood flow in the human body. The technique makes use of what is known as the Doppler effect, a phenomenon in which there is a change in frequency of a wave for an observer moving relative to its source. Using the Doppler effect relationship between velocity and frequency, it is possible to determine the velocity of an object by measuring the change of the object's frequency relative to the medium in which the waves are transmitted. In order for PW Doppler ultrasound systems to measure blood velocity, they must be able to analyze the change in the observed frequency relative to the emitted frequency while filtering out noise. Therefore, these systems rely heavily on the use of digital signal processing (DSP) techniques. Most common PW Doppler ultrasound imaging systems use fixed DSP hardware to accomplish this. As a consequence, these systems have limited target frequency ranges. In this paper, we propose an FPGA-based PW spectral Doppler ultrasound imaging system that is both highly efficient and versatile. The design is implemented in a Virtex-5 FPGA using the Xilinx ISE design suite. There are currently only a few studies available on the implementation of an efficient, reconfigurable FPGA-based PW Doppler ultrasound system. These studies mainly discuss a few variations of the system but fail to discuss reconfigurability, accuracy, and performance details of the system. The proposed design addresses all of these issues. Each of the main components constituting the proposed design is discussed in detail, including the reconfigurability aspect. The accuracy of the system was determined by constructing a similar design in MATLAB using 64-bit floating-point precision. For performance comparisons, the design was also implemented as a 65 nm CMOS ASIC.
The Virtex-5 design requires 1,159 of 17,280 slice resources and consumes 1.128 W of power when running at its maximum clock speed of 333 MHz. The ASIC design has an area of 0.573 mm² and consumes 41 mW of power at a maximum clock speed of 1 GHz.
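The velocity-frequency relationship this abstract relies on can be written out as the standard Doppler ultrasound equation. The symbols are assumptions introduced here for illustration and do not come from the poster: f_0 is the emitted frequency, f_r the received frequency, v the blood velocity, θ the angle between the ultrasound beam and the flow direction, and c the speed of sound in tissue. The expression holds for v much smaller than c.

```latex
f_d = f_r - f_0 = \frac{2 f_0 \, v \cos\theta}{c}
\qquad\Longrightarrow\qquad
v = \frac{c \, f_d}{2 f_0 \cos\theta}
```

The factor of 2 arises because the wave is shifted once on the way to the moving scatterer and once more on reflection back to the transducer.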

PhD Students

PHD1. Yue Hu (Advisor: Curtis Menyuk), Computational Photonics Lab, Computational modeling of nonlinearities in a PIN photodetector

One-dimensional and two-dimensional drift-diffusion models were created to investigate the source of nonlinearities in a high-current p-i-n photodetector. Incomplete ionization, an external circuit, impact ionization, and the Franz-Keldysh effect are all included in the model. We obtained good agreement with the experimental data. We found that impact ionization is the dominant source of nonlinearity at high voltage (above 10 V).

PHD2. Lisa Mathews, ebiquity, A Collaborative Approach to Situational Awareness for CyberSecurity

Traditional intrusion detection and prevention systems (IDPSs) have well-known limitations that decrease their utility against many kinds of attacks. Current state-of-the-art IDPSs are point-based solutions that perform a simple analysis of host or network data and then flag an alert. Only known attacks whose signatures have been identified and stored in some form can be discovered by most of these systems. They cannot detect "zero day" type attacks or attacks that use "low-and-slow" vectors. Many times an attack is only revealed by post facto forensics after some damage has already been done. To address these issues, we are developing a semantic approach to intrusion detection that uses traditional as well as nontraditional sensors collaboratively. Traditional sensors include hardware or software such as network scanners, host scanners, and IDPSs like Snort and Norton AntiVirus. Potential nontraditional sensors include sources such as online forums, blogs, and vulnerability databases, which contain textual descriptions of proposed attacks or discovered exploits. After analyzing the data streams from these sensors, the information extracted is added as facts to a knowledge base using a W3C standards-based ontology that our group has developed. We have also developed rules/policies that can identify the situation or context in which an attack can occur. By having different sources collaborate to discover potential security threats and create additional signatures, the resulting situationally aware IDPS is better equipped to stop creative attacks such as those that follow a low-and-slow intrusion pattern. A preliminary version of this system has been developed.

PHD3. Prajit Das and Dibyajyoti Ghosh (Advisors: Anupam Joshi, Tim Finin), ebiquity, Energy efficient semantic context model for managing privacy on smartphones

Modern smartphones' ability to gather massive amounts of data about a user and her context allows the provision of a variety of services adapted to the user, but leakage of user data and context can have disastrous results. This is particularly true for enterprises that are adopting a Bring-Your-Own-Device model. In our work we have demonstrated application- and user-context-dependent information sharing policies that dynamically control data flow among applications at a fine-grained level. Using semantically rich policies, and reasoning over them and the user and application context, we either release or obfuscate the sensor/context data being shared with the application, providing fine-grained, context-dependent control over sensitive user data. Unfortunately, the process of gathering context has a significant impact on energy consumption, since we need to refresh sensor data frequently. Current work on energy consumption focuses on the battery utilization of specific applications, but has not dealt with creating an energy-efficient context inference system that can be used for security. Our approach addresses this problem using three methods. First, enable only the sensors required to satisfy the antecedents of relevant policy rules, and use one sensor reading for multiple rules. Second, use the sensor with the lowest energy footprint. Third, reorder the conditions in a rule's antecedent in order of their energy usage. In our research, we observed that GPS consumed battery faster than Wi-Fi while providing more precise location updates. We take advantage of this trade-off to optimize our energy efficiency algorithm. We are currently adding the energy optimization modules to the Android framework.
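The first and third methods in this abstract (sharing one sensor reading across rules, and evaluating antecedent conditions cheapest-first) can be sketched as follows. This is only an illustration under stated assumptions: the relative energy costs, the sensor names, and the `SensorCache` and `rule_holds` helpers are hypothetical and do not reflect the authors' Android implementation.

```python
# Hypothetical relative energy costs per sensor reading (illustrative only).
COST = {"accelerometer": 0.5, "wifi": 1.0, "gps": 5.0}


class SensorCache:
    """Reads each sensor at most once so one reading serves many rules."""

    def __init__(self, read_fn):
        self.read_fn = read_fn      # callable: sensor name -> reading
        self.values = {}
        self.energy_spent = 0.0

    def get(self, sensor):
        if sensor not in self.values:
            self.values[sensor] = self.read_fn(sensor)
            self.energy_spent += COST[sensor]
        return self.values[sensor]


def rule_holds(conditions, cache):
    """Evaluate a rule's antecedent, cheapest sensor first.

    `conditions` is a list of (sensor, predicate) pairs. Because all()
    short-circuits on a generator, costly sensors are never read once a
    cheaper condition has already failed.
    """
    ordered = sorted(conditions, key=lambda cond: COST[cond[0]])
    return all(pred(cache.get(sensor)) for sensor, pred in ordered)
```

For example, a rule that requires both a home Wi-Fi access point and a GPS fix is rejected after the Wi-Fi check alone when the user is elsewhere, so the GPS is never powered on for that rule.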

PHD4. Shaokang Wang and Brian Marks (Advisor: Curtis Menyuk), Computational Photonics Lab, A Dynamical Approach To Determine the Stability Region of Modelocked Laser Systems

The Haus modelocking equation (HME) is the most widely used model for passively modelocked lasers, but it predicts a substantially smaller range of stable operating parameters than experiments show. To obtain better agreement with experiments, models of saturable nonlinearity have been proposed that include higher-order nonlinearity. However, these models have only been solved to date for a restricted range of parameters, and the stable operating range as parameters vary has not been determined. Here, we develop computational algorithms that can rapidly determine the stability boundaries over a two-dimensional parameter space. We demonstrate the applicability of this approach to several different models of fast saturable absorption, indicating that it should be useful for practical laser design. We show that while the behavior is qualitatively similar for the different models, there are significant quantitative differences.

PHD5. Clare Grasso (Advisors: Anupam Joshi, Jeffrey Fink), ebiquity, Identifying Safety Risks Due to Medical Treatment in Patients with Chronic Kidney Disease using KDD

Patient safety is an important measure of the quality of medical care. Adverse safety events are unintended injuries or complications resulting in death, disability or prolonged hospital stay that arise from health care management rather than from the patient's underlying disease process. People with chronic kidney disease (CKD) are at especially high risk for adverse outcomes due to these safety events. Medical treatment that may be routine for others may permanently hasten the decline of kidney function in these people and may result in hospitalization or even death. The Safe Kidney Care Cohort Study is seeking to find out which adverse medical events in patients with kidney disease might be preventable, as well as what other personal history might increase a patient's chances of suffering a medical injury. This study is collecting heterogeneous data for 350 patients, including ICD-9 hospital codes, medications, laboratory results, patient diaries, demographic information, and clinical office visits over a period of 4 years. It will use the data to research methods to classify and find associations between safety events and adverse outcomes in people with chronic kidney disease. Currently, this data is being analyzed using standard statistical methods which rely on a priori knowledge. This study will extend this work by applying supervised and unsupervised learning methods such as Naïve Bayes, SVM, k-means clustering, hierarchical clustering, and the expectation-maximization algorithm.

PHD6. Guohao Zhang (Advisor: Jian Chen), VANGOGH Lab, Render Vis Real: Effects of Visual Realism on Three-Dimensional Streamtube Visualization of Diffusion MRI Datasets

We present a head-to-head comparison of five visualization techniques in three-dimensional (3D) streamtube visualizations generated from diffusion magnetic resonance imaging (DMRI). Our goal is to determine the effect of rendering visual realism on neurologists' discovery tasks in DMRI brain tractography, which consists of tens of thousands of fiber tracts within the volume of a human head. Interpreting this amount of information is visually challenging. We hypothesize that (1) realism, though preferred by neurologists, may not improve the overall accuracy of their judgment; on the contrary, interactive and clearly depicted shading or artistic hatching is sufficient to convey the spatial structure of the complex brain tractography, and (2) distance-based shadow, such as ambient occlusion, could support brain lesion detection. Sixteen participants from a neurology department will use 3D streamtube visualizations to accomplish five representative tasks related to the structure of a human brain. The effectiveness of the rendering algorithms will be analyzed based on task completion time, accuracy, and subjective comments. Our work will be among the first to balance realism and interaction to provide practical design guidelines for dense tube renderings.

PHD7. Jennifer Sleeman (Advisor: Tim Finin), ebiquity, Online Coreference Resolution for Semi-Structured Heterogeneous Data

Semantic Web data is represented as Resource Description Framework (RDF) graphs. A pair of RDF instances are said to corefer when they are intended to denote the same thing in the world. This problem is central to integrating and inter-linking semi-structured datasets. Existing research tends to be based on the premise that data is processed as a batch and that the ontologies that define the data are known. In real world systems often data is not processed as a batch and the ontologies may not be known in advance. We are developing an online, unsupervised coreference resolution framework for heterogeneous semi-structured data. The online aspect requires us to process new instances as they appear. The instances are heterogeneous in that they may contain terms from different ontologies whose alignments are not known in advance. Our framework includes a two-phased clustering approach that is distributable, an attribute model to support robust schema mappings, and an instance consolidation algorithm to improve accuracy rates over time. In this paper we describe our preliminary research that supports the development of our attribute model and our approach to performing entity type recognition given this context.

PHD8. Amey Kulkarni, Smriti Prathapan (Advisor: Tinoosh Mohsenin), Energy Efficient High Performance Computing Lab, Orthogonal Matching Pursuit Algorithm for Compressive Sensing on Many-core Platform

The poster presents the implementation of the Orthogonal Matching Pursuit (OMP) reconstruction algorithm for Compressive Sensing on a many-core platform. Compressive Sensing is a novel scheme in which a signal that is sparse in a known transform domain can be reconstructed using fewer samples. However, the signal reconstruction is computationally intensive, power consuming and typically slow when implemented in software. The reconfigurable many-core platform performs fixed-point DSP applications and supports up to 64 cores routed in a hierarchical network. By reducing communication between the processor and main memory, a reduction in execution time and power can be achieved. Therefore, each core has an instruction and data memory of only 128 words. Despite the restriction in memory, performing fixed-point DSP is possible through parallelization of the cores. It is more advantageous when each core operates at a different clock rate through a Globally Asynchronous Locally Synchronous (GALS) architecture, thereby eliminating global clock gating. Using the GALS paradigm, cores that are not configured for an application will have their local clock disabled, which will turn 'off' any unused cores. In this poster, OMP is mapped onto the processor while considering a parallel computing model. The results show that, when OMP for a measurement matrix of size 5×10 is mapped onto 44 cores running in parallel and operating at 1.18 GHz, the algorithm takes 1936 cycles. Hence, it takes approximately 2 µs to execute one iteration of the reconstruction algorithm and consumes 32.23 mW. Our estimate shows that for a measurement matrix of size 85×256, the OMP algorithm takes 23571 cycles, or 20.14 µs, to finish.

PHD9. Vlad Korolev, ebiquity, On Use of Machine Learning Techniques and Genotypes for Prediction of Chronic Diseases

Recently NIH has conducted a number of Genome-Wide Association Studies (GWAS) that resulted in massive datasets containing subjects' genetic makeup in the form of single nucleotide polymorphisms (SNPs), labeled with clinical data. This has led to the emergence of personalized medicine. Our work addresses the problem of predicting an individual's predisposition towards certain chronic diseases based on their genetic makeup. This has evident benefits in tailoring treatment and testing. Attempts to employ traditional statistical modeling to build computer-based predictors did not show any improvement over using standard clinical data. This is attributed to the curse of dimensionality: the number of features greatly exceeds the number of samples in such datasets. Attempts to combat this problem focused on manually selecting the features based on a literature search; these showed only marginal improvement in predictive power. Our work attempts to solve this problem by taking the entire available genetic profile into consideration and seeing if analytics can be used to refine the hypothesis generation. To achieve this goal, we propose to employ state-of-the-art machine learning techniques on a modern big data platform. Our processing pipeline consists of the following steps. First, we perform an initial filtering to delete obviously irrelevant features, such as markers common to the entire population or markers that occur only in a tiny fraction of the samples. Next, the filtered data is passed through our forward feature selection algorithm, which builds a number of classifiers. The classifiers are ranked based on their predictive power and then the top N classifiers are selected for further use. These selected classifiers are combined to build a super classifier using a boosted ensemble of classifiers. The final ensemble is used as a predictor for future patients.
Another goal of this work is to ensure the repeatability of the experiments and the flexibility to run with any similar dataset from other studies. To fulfill this goal we have developed an experiment management system based on Jenkins and Hamake. This system keeps track of changes to the incoming datasets, as well as the code and parameters used in processing. Using such a system, it is possible to repeat any experiment that was run in the past. It also makes it easier to collaborate with third parties by giving them access to our system.
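The initial filtering step of the pipeline described in this abstract can be sketched in a few lines. The thresholds, the 0/1 encoding of marker presence, and the `filter_markers` name are assumptions made for illustration; the abstract does not state the cutoffs or data format the author used.

```python
def filter_markers(genotypes, min_fraction=0.05, max_fraction=0.95):
    """Drop markers that are nearly universal or nearly absent.

    `genotypes` maps a marker ID to a list with one entry per sample,
    truthy when the sample carries the marker. The default thresholds
    are hypothetical.
    """
    kept = {}
    for marker, calls in genotypes.items():
        fraction = sum(1 for c in calls if c) / len(calls)
        if min_fraction <= fraction <= max_fraction:
            kept[marker] = calls
    return kept
```

Markers that survive this filter would then be handed to the feature selection stage; uninformative columns are removed first so that later stages work on fewer features.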

PHD10. Keqin Wu, VANGOGH Lab, PathBubbles: An Interactive Tool for Multi-Scale Biological Pathway visualization

We present PathBubbles, a new pathway analysis approach designed to accept data from high-throughput gene and protein expression experiments. It employs a metaphorical interface of bubbles that supports multiple views of heterogeneous data and a fluent workflow in large-variable-space exploration. A fully editable bubble presents a pathway, a fragment of a pathway, or its related information. It can split into multiple bubbles, join and synchronize with other bubbles, and be removed or recovered, providing an intuitive way for biologists to explore their data. We will validate the effectiveness of our tool in user studies with domain experts who will use it to compare pathways, trace information flows between pathways and cellular compartments, query genes of interest, and knock down downstream genes.

PHD11. Lushan Han and Abhay Lokesh Kashyap (Advisors: Tim Finin, Anupam Joshi), ebiquity, Semantic Textual Similarity

We describe a semantic text similarity system developed for the STS shared task of the *SEM 2013 conference and the results of three evaluation runs of different versions of our system. The task asks automatic systems to compute sentence similarity on a scale ranging from 5 to 0, with 5 indicating identical sentences and 0 indicating unrelated sentences. Our system used a word similarity model which combined LSA word similarity and WordNet knowledge as a building block to estimate the sentence similarity scores. We show a simple baseline system and a machine learning approach using SVM regression, both of which pair words with the highest semantic similarity and predict similarity scores for the given sentence pair. The systems were evaluated on 4 datasets from different domains comprising a total of about 2300 sentences. Our team ranked first out of the 36 participating teams, and the three system runs were ranked first, second and fourth out of the 88 submitted runs.
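The word-pairing idea behind the baseline can be illustrated with a toy sketch. It is an assumption-laden stand-in, not the authors' system: the tiny `TOY_SIM` table replaces their LSA and WordNet word similarity model, tokenization is a plain whitespace split, and the symmetric averaging is one plausible way to combine the two directions.

```python
# Hypothetical word-pair similarities standing in for a real word model.
TOY_SIM = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("a", "an")): 0.0,
}


def word_sim(a, b):
    """Similarity in [0, 1]; identical words score 1."""
    return 1.0 if a == b else TOY_SIM.get(frozenset((a, b)), 0.0)


def directed(src, dst):
    """Average, over words in src, of the best match found in dst."""
    return sum(max(word_sim(w, v) for v in dst) for w in src) / len(src)


def sentence_similarity(s1, s2):
    """Score a sentence pair on the 0-5 scale used by the STS task."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    if not t1 or not t2:
        return 0.0
    # Align in both directions so neither sentence's extra words are ignored.
    return 5.0 * (directed(t1, t2) + directed(t2, t1)) / 2.0
```

With this table, identical sentences score 5 and sentences with no related words score 0, matching the two ends of the scale described in the abstract.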

Department of Computer Science and Electrical Engineering
