Dissertation Defense

The Lightweight Virtual File System

Navid Golpayegani

10:00-12:00 Thursday, 20 July 2017, ITE 325, UMBC

 

A data center today is responsible for safely managing big data volumes and balancing the complex needs of data producers and consumers. This balance often involves reconciling the consumers' needs for easy access and rapid retrieval with the producers' needs for long-term availability, reliability, and expandability. The long-term continuous support of data storage adds another layer of complexity for the file system. As storage architectures and big data volumes evolve, existing file systems focus primarily on performance, while less attention is paid to the long-term servicing needs of their clients described above.

I have developed the Lightweight Virtual File System (LVFS) to address these problems through the unique conceptual approach of separating the most common tasks involved in a file system, namely storing data, locating data, and organizing data. Standard file systems are developed as single monolithic systems that perform all three tasks. LVFS instead provides an architecture that enables the dynamic combination of different algorithms for each of those tasks. Using this approach, LVFS can construct a storage system that allows for ready availability, reliability, expandability, and long-term support while simultaneously assuring the performance of a stable system customizable to meet the needs of data consumers.

After successful development and testing to allow for merging decades-old storage architectures with new and incompatible ones, such as the HGST Active Archive System, NASA Goddard Space Flight Center’s Terrestrial Information Systems Laboratory adopted LVFS for their production environment to create a single, integrated storage system without any software modifications. UMBC’s Center for Hybrid Multicore Productivity Research deployed an instance on the IBM iDataPlex ‘BlueWave’ cluster to utilize Seagate’s Active Drive systems as a storage and on-disk compute platform. With LVFS, we show that MapReduce computation can be performed directly on the drive with performance comparable to Hadoop running on BlueWave. We also show a significant reduction in the data leaving the active drive during computation, thereby significantly increasing throughput.

Committee Members: Drs. Milton Halem (Advisor), Yelena Yesha, John Dorband, Charles Nicholas, Curt Tilmes