Paper to be published: Server-Side Workflow Execution using Data Grid Technology for Reproducible Analyses of Data-Intensive Hydrologic Systems

A paper describing the federation of the DataNet Federation Consortium (DFC), the Sustainable Environment-Actionable Data (SEAD), and the Terra Populus (TerraPop) DataNet projects will be published in Earth and Space Science. The title and authors of the paper are:

Server-Side Workflow Execution using Data Grid Technology for Reproducible Analyses of Data-Intensive Hydrologic Systems

Bakinam T. Essawy (orcid.org/0000-0003-2295-7981)
Department of Civil and Environmental Engineering,
University of Virginia, Charlottesville, Virginia

Jonathan L. Goodall* (orcid.org/0000-0002-1112-4522)
Department of Civil and Environmental Engineering,
University of Virginia, Charlottesville, Virginia

Hao Xu (orcid.org/0000-0001-6659-6511)
School of Library and Information Science,
University of North Carolina, Chapel Hill, NC

Arcot Rajasekar (orcid.org/0000-0003-2280-386X)
School of Library and Information Science,
University of North Carolina, Chapel Hill, NC

James D. Myers (orcid.org/0000-0001-8462-650X)
Inter-university Consortium for Political and Social Research,
University of Michigan, Ann Arbor, MI

Tracy Kugler (orcid.org/0000-0002-3427-9789)
Minnesota Population Center,
University of Minnesota, Minneapolis, MN

Mirza M. Billah (orcid.org/0000-0002-9716-4102)
Department of Biological Systems Engineering,
Virginia Tech, Blacksburg, Virginia

Mary C. Whitton (orcid.org/0000-0003-2880-2550)
Renaissance Computing Institute (RENCI),
University of North Carolina, Chapel Hill, NC

Reagan W. Moore (orcid.org/0000-0003-2363-413X)
School of Library and Information Science,
University of North Carolina, Chapel Hill, NC

Abstract:

Many disciplines in the geosciences utilize complex computational models for advancing understanding and sustainable management of Earth systems. Executing such models and their associated data pre- and post-processing routines can be challenging for a number of reasons including (1) accessing and pre-processing the large volume and variety of data required by the model, (2) post-processing large data collections generated by the model, and (3) orchestrating data processing tools, each with unique software dependencies, into workflows that can be easily reproduced and reused. To address these challenges, the work reported in this paper leverages the Workflow Structured Object (WSO) functionality of the Integrated Rule-Oriented Data System (iRODS) and demonstrates how it can be used to access distributed data, encapsulate hydrologic data processing as workflows, and federate with other community-driven cyberinfrastructure. The scientific contribution is a methodology for creating reproducible scientific workflows where computation routines are co-located with distributed reference data collections. The methodology is demonstrated through an example that leverages both the Terra Populus (TerraPop) and Sustainable Environment-Actionable Data (SEAD) cyberinfrastructure projects for data input, management, and publication. The work is part of a larger effort under the DataNet Federation Consortium (DFC) project that aims to demonstrate data and computational interoperability across cyberinfrastructure developed independently by scientific communities.

Resources:

TerraPop Data: http://dx.doi.org/10.5967/M08P5XH5

VIC Output for Carolina, 1998-2007: http://dx.doi.org/10.5967/M0DF6P6F

WSO: http://dx.doi.org/10.5967/M0J67DXR

WSO_OuputViz: http://dx.doi.org/10.5967/M0513W51
