A Semantically Rich Approach to Automating Big Data and Cloud

Dr. Karuna Joshi
University of Maryland, Baltimore County

12:00pm Monday, 20 February 2017, ITE 325b, UMBC

With the explosion of Big Data and the growth of data science, there is an urgent need to automate the data lifecycle: generation, ingestion, analytics, knowledge extraction, archival, and deletion. Because they promise rapid provisioning, scalability, and high computing capability, cloud-based services are being adopted as the default computing environment for Big Data analytics.

To manage their data on the cloud effectively, organizations need to continuously monitor the rules, constraints, and performance metrics listed in a variety of legal contracts. However, these documents, such as Service Level Agreements (SLAs), privacy policies, and regulatory documents, are currently managed as plain text files meant principally for human consumption. In addition, providers often define their own performance metrics for their services. These factors hinder automation of the data lifecycle, lead to inefficient use of the dynamic and elastic elements of the Data+Cloud ecosystem, and force organizations to monitor service performance manually. Moreover, cloud-based service providers are collecting large amounts of data about their consumers, including Personally Identifiable Information (PII) such as contact addresses, credit card details, and bank account details. They also offer customized service level agreements that indicate how such data will be handled. Determining whether these agreements meet individual or corporate requirements, or comply with statutory constraints, currently requires significant human effort.

In this talk, we present a semantically rich approach that we have developed to automatically extract knowledge from large textual datasets, especially legal documents, using text analytics and Semantic Web technologies. We describe the OWL ontologies that we have developed and the techniques we use to extract key terms and rules from textual legal documents. We also illustrate applications of our work in domains such as education, healthcare, and cybersecurity.

Karuna P. Joshi is a Research Assistant Professor of Computer Science and Electrical Engineering at the University of Maryland, Baltimore County. Her research focuses on Data Science and Big Data Analytics, especially legal text analytics; knowledge representation and reasoning; privacy and security of Big Data and Cloud; and cloud-enabled Health IT services. She has published over 30 papers, in journals such as IEEE Transactions on Services Computing and at conferences such as IEEE Big Data and IEEE CLOUD. Her research is supported by organizations such as DoD, ONR, NIST, NSF, GE, and IBM. She received the TEDCO MII award for exploring the commercialization of her research, and she is a recipient of the prestigious IBM PhD Fellowship. She also has over 15 years of industrial experience, primarily as an IT project manager, including nearly a decade at the International Monetary Fund. Her managerial experience includes portfolio, program, and project management across various domains. She received MS and PhD degrees in Computer Science from UMBC and a bachelor’s degree in Computer Engineering from the University of Mumbai, India.