PhD Dissertation Defense
Deploying Deep Neural Networks in Real-Time Embedded Systems
10:00am Monday, 21 November 2016, ITE 325b
Deep neural networks have been shown to outperform prior state-of-the-art solutions that rely heavily on hand-engineered features coupled with simple classification techniques. In addition to achieving orders-of-magnitude improvements, they offer a number of additional benefits, such as the ability to perform end-to-end learning by combining hierarchical feature abstraction and inference. Furthermore, their success continues to be demonstrated in a growing number of fields and a wide range of applications, including computer vision, speech recognition, biomedical analysis, and model forecasting. As this area of machine learning matures, a major remaining challenge is efficiently deploying such deep networks in embedded, resource-bound settings with strict power and area budgets. While GPUs have been shown to improve throughput and energy efficiency over traditional computing paradigms, they still impose a significant power burden in such low-power embedded settings. To further reduce power while still achieving the desired throughput and accuracy, classification-efficient networks are required, along with optimal deployment onto embedded hardware.
In this dissertation, we target both of these objectives. For the first, we analyze simple, biologically inspired reduction strategies applied both before and after training. The central theme of these techniques is introducing sparsification to dissolve the dense connectivity often found at different levels of neural networks. The sparsification techniques developed include feature compression partition, structured filter pruning, and dynamic feature pruning.
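The dissertation's exact pruning algorithms are not detailed in this abstract. As an illustrative sketch only, structured filter pruning is commonly approximated by ranking each convolutional filter by its L1 norm and retaining the strongest ones; the function name and the `keep_ratio` parameter below are hypothetical, not taken from the dissertation:

```python
import numpy as np

def prune_filters(weights, keep_ratio=0.5):
    """Illustrative structured filter pruning: rank each output filter of a
    convolutional layer by its L1 norm and keep only the strongest fraction.
    `weights` has shape (num_filters, in_channels, kH, kW)."""
    # L1 norm of each filter across all of its weights
    norms = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(round(keep_ratio * weights.shape[0])))
    # Indices of the n_keep filters with the largest norms, kept in order
    keep = np.sort(np.argsort(norms)[::-1][:n_keep])
    return weights[keep], keep

# Example: prune half of 8 random 3x3 filters over 3 input channels
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))
pruned, kept = prune_filters(w, keep_ratio=0.5)
print(pruned.shape)  # (4, 3, 3, 3)
```

Pruning whole filters, rather than individual weights, keeps the remaining computation dense and regular, which is what makes this form of sparsification attractive for hardware accelerators.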
For the second objective, we propose scalable, hardware-based accelerators that enable deploying networks in such resource-bound settings, both by exploiting efficient forms of parallelism inherent in convolutional layers and by applying the proposed sparsification and approximation techniques. In particular, we developed SPARCNet, an efficient and scalable hardware convolutional neural network accelerator, along with a corresponding resource-aware API to reduce, translate, and deploy a pre-trained network. The SPARCNet accelerator has been fully implemented in FPGA hardware, successfully employed in a number of case studies, and evaluated in real time against several existing state-of-the-art embedded platforms, including the NVIDIA Jetson TK1/TX1. A full hardware demonstration with the developed API will showcase selecting among hardware platforms and state-of-the-art vision datasets while performing real-time power, throughput, and classification analysis.
Committee: Drs. Tinoosh Mohsenin (chair), Anupam Joshi, Tim Oates, Mohamed Younis, Farinaz Koushanfar