The Power of Many: Abstractions, Models and Tools for Scalable Approaches for Many Simulations
There are several important science and engineering problems that require the coordinated execution of multiple high-performance simulations. Common scenarios include, but are not limited to, "an ensemble of tasks", "loosely-coupled simulations of tightly-coupled simulations" and "multi-component multi-physics simulations". We posit that the tools and capabilities available to support the scaling requirements of multiple simulations are limited. A promising way to overcome this surprisingly common limitation is the use of "Pilot-Jobs", which can be defined as container jobs that provide multi-level scheduling capabilities via an application-level overlay on system schedulers. We discuss both the theory and practice of Pilot-abstractions: specifically, we introduce the P* Model of Pilot-abstractions and present "RADICAL-Pilot", a SAGA-based, extensible, interoperable and scalable implementation of the P* Model. We will present several science problems that have used, or are using, RADICAL-Pilot to execute multiple simulations at unprecedented scales on a range of supercomputers and distributed supercomputing infrastructures, such as the US NSF's XSEDE. In the process, we will convey some of the cyberinfrastructure research questions and "practical" challenges that we have addressed in order to support science at scale.
Shantenu is an Assistant Professor at Rutgers University, and a Visiting Scientist at the School of Informatics (University of Edinburgh) and at University College London. Before moving to Rutgers, he was the lead for Cyberinfrastructure Research and Development at the Center for Computation & Technology (CCT) at Louisiana State University. His research interests lie at the triple point of Applied Computing, Cyberinfrastructure R&D and Computational Science. Shantenu is the lead investigator of the SAGA project (http://www.saga-project.org), which is a community standard and is part of the official middleware/software stack of most major production distributed cyberinfrastructures, such as the US NSF's XSEDE and the European Grid Infrastructure. His research is funded by multiple NSF and US Department of Energy grants, more recently by the US National Institutes of Health (NIH), and by the UK EPSRC (the OMII-UK project and a Research Theme at the e-Science Institute). He received an NSF CAREER Award in 2013 and has won several prestigious awards at ACM/IEEE Supercomputing and the International Supercomputing conference series. He seeks fearless and revolutionary young minds to join the RADICAL (thinking) group! Away from work, Jha tries middle-distance running and biking, tends to be an economics junkie, enjoys reading and writing random musings, and tries to use his copious amounts of free time with a conscience.