CSEE Colloquium

A Novel Dynamic Task Scheduling Environment
for High Performance Distributed Systems

Tyler Simon

Faculty Research Assistant, UMBC

1:00pm Friday, 21 September 2012, ITE 227, UMBC

The number of concurrently executing tasks required for a single application to perform at the petascale is on the order of hundreds of thousands. Given current manycore hardware trends, future peta- and exa-scale class systems will require applications to run tasks on the order of hundreds of millions to billions. To address the problem of creating, running and managing jobs of this scale, from both a system user and an administration perspective, we have developed ARRIA, an Autonomic Runtime for Resource Intensive Applications. ARRIA uses a decentralized bag of tasks and workload scheduler that increases individual job priorities based on weighted factors that are of interest to the application programmer or the system administrator. ARRIA is designed to run millions of independent tasks reliably and efficiently without explicit message passing from the user. In previous work, using the ARRIA scheduler for scientific MapReduce workloads, we have shown a 2.1x speedup over the Hadoop Fair Share scheduler. We investigate novel scheduling parameters and strategies that guarantee efficient job execution for a wide range of realistic and simulated workloads with both user and administrator objectives, such as increased throughput and maximized utilization with minimal wait times for specific job classes. Finally, our experiments investigate the long tail phenomenon for mixed workloads and the overheads incurred for increased system size.

Mr. Simon has undergraduate degrees in Computer Science and Philosophy and a Master of Science in Computer Science from the University of Mississippi. He is currently pursuing a PhD in Computer Science at the University of Maryland Baltimore County. Mr. Simon has worked professionally in the high performance computing (HPC) field for over a decade. In 2005 he earned a Department of Energy graduate research fellowship at Oak Ridge National Laboratory, where he worked in the Computer Science and Mathematics Division developing and implementing the Freeloader distributed storage system. Mr. Simon has worked as a computational scientist for the Department of Defense High Performance Computing Modernization Office based at the U.S. Army Engineer Research and Development Center in Vicksburg, MS, evaluating both current and future HPC system requirements for applications of interest to the Department of Defense. Since 2009 Mr. Simon has been a computational scientist and manager of HPC user services at the NASA Center for Climate Simulation at Goddard Space Flight Center, and he is currently a Faculty Research Assistant at the University of Maryland Baltimore County working at the NSF Center for Hybrid Multicore Productivity Research. Mr. Simon’s research involves the study of dynamic distributed runtime environments, parallelization strategies and scheduling of large-scale scientific applications for current petascale and future HPC architectures.

For more information and directions see http://bit.ly/UMBCtalks.