The DataNet Federation Consortium provides a highly extensible environment that can be tuned to support a wide variety of use cases. Each collaborating science and engineering discipline has its own data management applications, which are enabled by data grid software:
- Science Observatory Network: Implement a data grid to manage sensor data from observatories. Examples include oceanography data, seismic data, and environmental data.
- Hydrology initiative: Automate data retrieval, transformation, and analysis within hydrology research workflows. Register and share analysis workflows.
- Information-based engineering: Build a digital library of engineering documents, support transformation between engineering data formats, and archive engineering records.
- Plant biology data sharing environment: Scale the iPlant Collaborative environment to support research collaborations across 20,000 plant biologists and 100 million data sets.
- Cognitive Science research networks: Support data sharing within the Temporal Dynamics of Learning Center, and enable formation of research collaborations.
- Social Science: Implement preservation policies appropriate for social science data collections, and automate the enforcement of the policies.
A national network is being created to manage real-time sensor data acquired through the Antelope Real Time System (ARTS). The data sources are highly distributed and include multiple types of sensors. The storage systems are also distributed, across sites linked by an iRODS data grid. The combined system enables extraction of sensor data, application of workflows to analyze the data, and archiving of the analysis products.
Hydrology initiative:
Hydrologic scientists have posed the “grand challenge” of creating a National Water Model to better address future water resource challenges. A critical component in achieving this grand challenge will be the data infrastructure required to support national-scale modeling activities. The DataNet Federation Consortium (DFC) will explore a hydrologic modeling use case in which data is accessed from multiple sources (CUAHSI-HIS, NOAA, USGS, NASA, etc.), transformed for use by a hydrologic model, and used by that model to make predictions. The entire data preparation and model execution workflow is documented, sharable, and reproducible. This use case will demonstrate how data grid technology enables collaboration by researchers across multiple institutions while automating analysis procedures.
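The access, transform, and predict sequence described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Workflow` class, the step functions, the sample precipitation values, and the toy runoff coefficient are hypothetical, and no real data grid, CUAHSI-HIS, NOAA, USGS, or NASA interface is called. It only shows one way a workflow could record provenance (a checksum of each step's input and output) so that a run is documented and can be repeated and compared.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable, List


def checksum(obj) -> str:
    """Stable SHA-256 digest of any JSON-serializable value."""
    encoded = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


@dataclass
class Workflow:
    """Hypothetical workflow: ordered named steps plus a provenance log."""
    steps: List = field(default_factory=list)
    provenance: List = field(default_factory=list)

    def add(self, name: str, func: Callable) -> "Workflow":
        self.steps.append((name, func))
        return self

    def run(self, data):
        # Start a fresh log so each run documents itself completely.
        self.provenance = []
        for name, func in self.steps:
            result = func(data)
            self.provenance.append(
                {"step": name, "input": checksum(data), "output": checksum(result)}
            )
            data = result
        return data


# Hypothetical stand-ins for the three stages of the use case.
def fetch_sources(_):
    """Pretend to gather daily precipitation (mm) from two sources."""
    return {"source_a": [2.0, 4.0, 6.0], "source_b": [1.0, 3.0, 5.0]}


def transform(raw):
    """Average the sources day by day into one model input series."""
    return [sum(values) / len(values) for values in zip(*raw.values())]


def toy_model(precipitation):
    """Toy model (assumption): a fixed fraction of precipitation becomes runoff."""
    runoff_coefficient = 0.5
    return [runoff_coefficient * p for p in precipitation]


workflow = (
    Workflow()
    .add("fetch", fetch_sources)
    .add("transform", transform)
    .add("model", toy_model)
)
prediction = workflow.run(None)
print(prediction)
print(json.dumps(workflow.provenance, indent=2))
```

Because each log entry stores the checksum of a step's input and output, two researchers who run the same registered workflow can compare logs to confirm they obtained identical intermediate and final results.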
Information-based engineering relies upon the formation of a digital library that organizes engineering documents, supports discovery, and manages long-term retention. Given an appropriate organization, a community will be able to retrieve data sets for analyses, publish results, and share analyses with other engineers.
Science of Learning initiative:
The Temporal Dynamics of Learning Center (TDLC) requires support for geographically distributed collaborations that share large datasets across tasks and species, coordinate joint analyses of data, share novel stimulus sets, and support computational modeling, while ensuring that IRB, HIPAA, and IACUC data restrictions are maintained.
Social Science Initiative:
Use Case: A hydrologist using the Dataverse’s multidisciplinary search to look for impacts of drought may find that researchers down the watershed are engaged with similar issues, from coastal estuary damage and its effect on fisheries to changes in oceanographic currents and their effect on weather patterns. The DFC infrastructure will act as a catalyst for innovative combinations of digital data. Seamless discovery of data from diverse disciplines could prove to be a fundamental shift in the way researchers gather and use digital data.
Plant Biology Initiative:
It is common for iPlant Collaborative users to have 100 GB to 2 TB of data. Team science and virtual organizations are often global and highly distributed, and consortia have 10-100 TB of data to share. Moving data, sharing it, and leveraging compute resources, which are provided by multiple partners, are all necessities. Data-driven teams need modern ways to share and collaborate. The goal is to establish an iPlant Data Commons and improve discoverability.
To access the DataNet Federation Consortium data grid, click on
This provides a link to a public directory on the DataNet Federation Consortium federation hub that contains papers, posters, presentations, tutorials, and videos.