Information Systems Eminent Scholar Talk
Big Microbiome Data
10:00am Tuesday, 2 May 2017, ITE 459, UMBC
We know little about the microbial world. Microbiome sequencing (e.g., metagenomic and 16S rRNA sequencing) extracts DNA directly from a microbial environment without culturing any species. Recently, huge amounts of data have been generated by many microbiome projects, such as the Human Microbiome Project (HMP) and Metagenomics of the Human Intestinal Tract (MetaHIT), among others. Analyzing these data will help us better understand the function and structure of the microbial communities of the human body, the earth, and other environmental ecosystems. However, the huge data volume, the complexity of microbial communities, and the intricate properties of the data have created many opportunities and challenges for data analysis and mining. For example, it is estimated that the microbial ecosystem of the human gut contains about 1000 kinds of bacteria, with ten billion bacteria and more than four million genes in more than 6000 orthologous gene families. The challenges stem from the complex properties of microbiome data: large scale, complexity, diversity, correlation, composition, hierarchy, incompleteness, etc.
Current microbiome data analysis methods seldom consider these data properties and often make assumptions, such as linearity, Euclidean space, metric space, and continuous data types, that conflict with the true properties of the data. For example, some similarity measures are non-metric because of the prevalence of some species, and the interactions among species and the environment are complex and of high order. Thus it is urgent to develop novel computational methods that overcome these assumptions and take the properties of microbiome data into account in the analysis procedure. In this talk, we will discuss some computational methods for analyzing and visualizing microbiome big data. Our studies focus on 1) novel machine learning and computational technologies for dimension reduction and visualization of microbiome data based on non-Euclidean spaces (manifold learning), which discover nonlinear intrinsic features and patterns in these data and so overcome the linear assumptions, and 2) novel statistical methods for variable selection in microbiome data that integrate group information among variables.
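As a small illustration of the non-metric point above, the sketch below computes the Bray-Curtis dissimilarity, a measure widely used in ecology, on three tiny abundance vectors and shows that it can violate the triangle inequality. The choice of Bray-Curtis and the sample counts are assumptions made for illustration only; the abstract does not name a specific measure, and this is not the speaker's method.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den

# Hypothetical counts of two species in three samples (illustrative only).
x = [1, 0]  # only species A present
y = [0, 1]  # only species B present
z = [1, 1]  # both species present

d_xy = bray_curtis(x, y)
d_xz = bray_curtis(x, z)
d_zy = bray_curtis(z, y)

# A metric would require d(x, y) <= d(x, z) + d(z, y).
violates_triangle = d_xy > d_xz + d_zy
print(d_xy, d_xz + d_zy, violates_triangle)
```

Here the direct dissimilarity between x and y exceeds the sum of the two dissimilarities through z, so methods that assume a metric space do not strictly apply to this measure.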
Xiaohua Tony Hu is a full professor and the founding director of the data mining and bioinformatics lab at the College of Computing and Informatics. He also serves as the founding Co-Director of the NSF Center on Visual and Decision Informatics, IEEE Computer Society Bioinformatics and Biomedicine Steering Committee Chair, and IEEE Computer Society Big Data Steering Committee Chair. He joined Drexel University in 2002. He founded the International Journal of Data Mining and Bioinformatics, the IEEE International Conference on Big Data, and the IEEE International Conference on Bioinformatics and Biomedicine. In 2001, he founded DMW Software in Silicon Valley, California. He has received many awards, including the NSF CAREER Award and the IEEE Data Mining Outstanding Service Award. Tony’s current research interests are in data/text/web mining, big data, bioinformatics, information retrieval and information extraction, social network analysis, healthcare informatics, and rough set theory and its applications. He has published more than 270 peer-reviewed research papers in various journals, conferences, and books. He has obtained more than US$8.5 million in research grants in the past ten years as PI or Co-PI. He graduated 19 Ph.D. students from 2006 to 2017 and is currently supervising nine Ph.D. students.