Mining social media data for health, public health, and popular events

Anietie Andy, University of Pennsylvania

1:00-2:00 pm ET, Friday, 2 April 2021

online via WebEx


Increasingly, individuals are turning to social media and online forums such as Twitter and Reddit to communicate about a range of issues, including their health and well-being, public health concerns, and large public events such as the presidential debates. These user-generated social media data are prone to noise and misinformation. Developing and applying Artificial Intelligence (AI) algorithms can enable researchers to glean pertinent information from social media and online forums for a range of uses. For example, patients’ social media data may include information about their lifestyle that might not typically be reported to clinicians; however, this information may allow clinicians to provide individualized recommendations for managing their patients’ health. Separately, insights obtained from social media data can help government agencies and other relevant institutions better understand the concerns of the populace as they relate to public health issues such as COVID-19 and its long-term effects on the well-being of the public. Finally, insights obtained from social media posts can capture how individuals react to an event and can be combined with other data sources, such as videos, to create multimedia summaries. In all these examples, there is much to be gained by applying AI algorithms to user-generated social media data.

In this talk, I will discuss my work in creating and applying AI algorithms that harness data from various sources (online forums, electronic medical records, and health care facility ratings) to gain insights about health, well-being, and public health. I will also discuss the development of an algorithm for resolving pronoun mentions in event-related social media comments and a pipeline of algorithms for creating a multimedia summary of popular events. I will conclude by discussing my current and future work on creating and applying AI algorithms to (a) gain insights about county-level COVID-19 vaccine concerns, (b) detect, reduce, and mitigate misinformation in text and online forums, and (c) understand the expression and evolution of bias in text over time.


Anietie Andy is a senior data scientist at the Penn Medicine Center for Digital Health. His research focuses on developing and applying natural language processing and machine learning algorithms to health care, public health, and well-being. He is also interested in developing natural language processing and machine learning algorithms that use multimodal sources (text, video, images) to summarize and gain insights about events and online communities.