## High Performance Architectures for OMP Compressive Sensing Reconstruction Algorithm Amey Kulkarni and Tinoosh Mohsenin Energy Efficient High Performance Computing Lab, CSEE Department, University of Maryland, Baltimore County

### **Abstract and Contributions**

•Compressive Sensing (CS) has emerged as a novel technique which enables reconstruction of sparse signals sampled at sub-Nyquist rates.

•The signal reconstruction techniques are computationally intensive and have sluggish performance, which make them impractical for real-time processing applications. •In this paper, we present both algorithmic modification and novel hardware implementation on FPGA, ASIC and on a custom many-core platform for Orthogonal Matching Pursuit (OMP), one of the popular CS reconstruction algorithms. •For FPGA and ASIC implementation, a novel thresholding method is used to reduce the processing time for the optimization problem by at least 25%. Whereas, for the custom many-core platform, efficient parallelization techniques are applied, to reconstruct signals with variant signal lengths of N and sparsity of m.

•For the demonstration purpose, all architectures reconstruct a 256-length signal with maximum sparsity of 8 using 64 measurements. Implementation on Xilinx Virtex-5 FPGA, requires 27.14 µs to reconstruct the signal using basic OMP. Whereas, with thresholding method it requires 18 µs.

•ASIC implementation reconstructs the signal in 13 µs. Our custom many-core, operating at 1.18 GHz, takes 18.28 µs to complete.

•Our results show that compared to the previous published work of the same algorithm and matrix size, proposed architectures for FPGA and ASIC implementations perform 1.3x and 1.8x respectively faster. Also, the proposed many-core implementation performs 3000x faster than the CPU and 2000x faster than the GPU.

## **Compressive Sensing Applications and OMP Reconstruction Algorithm**

•Applications : MRI, Radar Imaging (ISAR and TWR)

•Though the CS has several advantages, reconstruction of CS is very complex and computational intensive.

•OMP is partitioned into three main kernels: Dot product, sort and least square (which involves matrix inversion).



### **OMP Reconstruction Algorithm**



**Basic Block Diagram for OMP Reconstruction Algorithm** 

# **Proposed Work and Results**

•Parallel and Reconfigurable Architectures by using Blocked algorithms •Thresholding Method to reduce execution time of Dot Product calculations





**(B)** 





**Reconstruction algorithm.** 



**Proposed FPGA and ASIC architecture diagram of OMP reconstruction algorithm.** (a) Dot product and Sort block. Operates at 85 MHz. (b) finds inverse of a 8x8 matrix. **Operates at 69 MHz. Both blocks iterate 8 times to solve optimization problem** 



Detailed analysis on the work performed by [1] shows that the increased data path latency was due to the dot product calculation and also due to the fixed point division operations. Therefore, the architecture proposed here has highly optimized data path logic which nearly doubles the performance of the hardware. •The main change in the improved algorithm is the calculation of average and eliminating the columns of Ø whose dot product is below a predefined threshold.

**Post-layout view of OMP Compressive Sensing Reconstruction in 65 nm CMOS** 

| Design                     | Signal Length (N) | Sparsity | Cycles | Time(µs) |  |  |  |
|----------------------------|-------------------|----------|--------|----------|--|--|--|
| Without<br>Threshold (α)   | 256               | 8        | 2100   | 27.14    |  |  |  |
| Without<br>Threshold (α)   | 128               | 5        | 820    | 10       |  |  |  |
| With Threshold<br>(α=0.25) | 256               | 8        | 1280   | 18       |  |  |  |
| With Threshold<br>(α=0.25) | 128               | 5        | 444    | 7.13     |  |  |  |

| Program                   | Cycles | Time(µs) |
|---------------------------|--------|----------|
| Dot Product (R*Ø)         | 340    | 0.28     |
| Sort                      | 1870   | 1.58     |
| Least Square Minimization | 19366  | 16.41    |
| Total                     | 21576  | 18.28    |



**Proposed Many-Core Mapping of OMP** algorithm on the custom many-core platform

## **Comparison with Others**

| Device                | Signal Length | Sparsity | Frequency | Time to Execute |
|-----------------------|---------------|----------|-----------|-----------------|
| FPGA (Virtex5)[7]     | 128           | 5        | 39MHz     | 24µs            |
| Intel Core i7[2]      | 128           | -        | 2.6GHz    | 68ms            |
| CUBLAS(GPU)[9]        | 128           | -        | -         | 37.5ms          |
| This Work (Many-Core) | 256           | 8        | 1.18GHz   | 18.28µs         |
| This Work (FPGA)      | 256           | 8        | 165MHz    | 18µs            |
| This Work (ASIC)      | 256           | 8        | 165MHz    | 13.7µs          |

**Comparison Of OMP Reconstruction Time For The Previous Implementations** References

- Symposium on , vol., no., pp.29,32, 20-23 May 2012
- Sensing, May 2013.
- Sensing, May 2014.



**FPGA Implementation Summary For OMP Reconstruction Hardware On Virtex-5** 

**Cycle Count For OMP Algorithm** With 256x256 and Sparsity = 8 On **Custom Many-Core Platform**, **Operating At 1.18GHz** 

pipelined by using tree multipliers. Whereas, sort algorithm is parallelized using binary trees. Third kernel is Least Square Minimization which consists of matrix inversion. For matrix inversion, block LU decomposition algorithm is used. •Large matrix size leads to increased complexity with memory architecture. Since, our many-core has limited data memory (128 words each core) and matrix cannot be stored entirely on-chip, we take advantage of parallelism, blocked algorithms are used for implementing matrix inversion.

•First kernel is implemented in parallel and

Stanislaus, J.L.V.M.; Mohsenin, T., "High performance compressive sensing reconstruction hardware with QRD process," Circuits and Systems (ISCAS), 2012 IEEE International

A. Korde, D. Bradley, and T. Mohsenin. "Detection performance of radar OMP compressive sensing in noisy environnements", International SPIE Conférence on Defense, Security, and

A. Kulkarni and T. Mohsenin. "Parallel heterogeneous architectures for efficient OMP compressive sensing reconstruction", International SPIE Conference on Defense, Security, and