Ph.D. Dissertation Defense
Joint Inference for Extracting Soft Biometric
Text Descriptors from Patient Triage Images
10:00am Friday, 24 August 2012, ITE 325b, UMBC
Disaster events can result in mass casualties and missing persons, giving rise to a need to provide information about victims to the public. This can be achieved by digitally documenting information available at emergency medical care centers in the form of pictures. The images and other identifying information, such as fingerprints, cannot be broadcast due to privacy concerns, leading to a need to extract appearance-related non-unique features from this data to facilitate locating missing persons. Using humans and machines to compare images is not feasible due to the scale of the situation and the nature (presence of blood and debris) of the images. Extracting a soft biometric text descriptor (text labels describing different soft biometric features) makes it possible to organize information about individuals from these images in a searchable format without revealing the person's identity. The main aim of this thesis is to extract soft biometric features from person images to label appearance-related information and make it available as a text descriptor.
We begin by presenting soft biometric feature detectors for patient images that include an ensemble-based face detection algorithm, template-based eye detection, and eyeglasses, hair color, and skin color detection. We also present a facial hair detector that uses a combination of face and hair information. The feature detection results indicate a need to combine and exploit feature relationships for better performance. We propose a novel probabilistic graphical model that consists of different feature detectors and exploits relationships between these features using a message-passing inference algorithm to build a coherent text descriptor. Further, to understand the utility and the nature of the text descriptors, we present a study based on human descriptions that aims at extracting order and structure information about the features.
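To illustrate how message passing can let one detector's output inform another, here is a minimal sketch of sum-product inference on a two-node graph. It is not the model from the thesis: the node names (beard, mustache), the detector scores, and the compatibility table are hypothetical values chosen only for illustration.

```python
# Minimal sum-product sketch on a two-node graph A - B.
# All labels and numbers below are hypothetical, not taken from the thesis.

LABELS = ["present", "absent"]


def normalize(values):
    """Scale non-negative values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def beliefs(unary_a, unary_b, pairwise):
    """Return the marginal beliefs for nodes A and B.

    unary_a, unary_b: detector scores per label (unary potentials).
    pairwise[i][j]: compatibility of A taking label i with B taking label j.
    """
    n, m = len(unary_a), len(unary_b)
    # Message from A to B: sum out A's labels.
    msg_a_to_b = [sum(unary_a[i] * pairwise[i][j] for i in range(n))
                  for j in range(m)]
    # Message from B to A: sum out B's labels.
    msg_b_to_a = [sum(unary_b[j] * pairwise[i][j] for j in range(m))
                  for i in range(n)]
    # Belief = local evidence times the incoming message, normalized.
    belief_a = normalize([unary_a[i] * msg_b_to_a[i] for i in range(n)])
    belief_b = normalize([unary_b[j] * msg_a_to_b[j] for j in range(m)])
    return belief_a, belief_b


# An uncertain beard detector, a confident mustache detector, and a
# compatibility table that favors the two features agreeing.
beard_scores = [0.5, 0.5]
mustache_scores = [0.9, 0.1]
agreement = [[0.8, 0.2],
             [0.2, 0.8]]

beard_belief, mustache_belief = beliefs(beard_scores, mustache_scores, agreement)
for name, belief in (("beard", beard_belief), ("mustache", mustache_belief)):
    best = LABELS[belief.index(max(belief))]
    print(f"{name}: {best} ({max(belief):.2f})")
```

In this toy setting the uninformative beard detector inherits evidence from the confident mustache detector through the pairwise term, which is the kind of feature relationship the abstract describes exploiting.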
We evaluate the performance of individual feature detectors for standard and triage images and establish the challenges posed by the latter. Further, our text analysis shows extreme variability in human descriptions. However, we succeed in extracting some insights about the order of a natural text description. Through our evaluations of the graphical model, we show that for different feature detectors, datasets, and graph sizes the graphical model helps improve the accuracy of the text output. We also show that the performance of the graphical model depends on the individual nodes (feature detectors) and that the model can be used to improve the performance of the individual feature detectors. This thesis illustrates the whole process from images to text descriptors while evaluating components as we proceed. This work presents an approach to extract text labels from images using computer vision, a probabilistic graphical model, and natural language processing techniques.
Committee: Drs. Tim Oates (Chair), Marie desJardins, Tim Finin, Glenn Pearson, and Jesus Caban