Jonathan Crabtree and Christopher (Cal) Lee to present workshop on “Introduction to Data Curation”.
The CyVerse discovery environment handles the technology of collaborative research so researchers can focus on the science.
As part of its outreach and educational activities, the DFC has developed a video to introduce potential new users to the DataNet Federation Consortium.
This material is based upon work supported by the National Science Foundation under Grant Number OCI 0940841
DFC’s Mike Conway participated in the EarthCube Architecture Workshop, May 2016
Reagan Moore presented at the Best Practices in Data Infrastructure Workshop, Pittsburgh, PA, May 17-18
Hao Xu presented on “Bidirectional Integration of Multiple Metadata Sources” at the iRODS User Group Meeting, June 9.
Mike Conway presented on “CyVerse Discovery Environment and Cloud Browser” at the iRODS User Group Meeting, June 9.
Presenters: Jonathan Crabtree, UNC Odum Institute; Thu-Mai Christian, UNC Odum Institute
Tuesday, May 31, 09:00-11:00
Abstract: The workshop will highlight work of the Odum Institute as part of the DataNet Federation Consortium’s effort to join the Odum Institute’s archive platform with the Integrated Rule-Oriented Data System (iRODS). Participants will see how archive workflows within the Dataverse platform can be connected to iRODS and can leverage the policy-based rule enforcement capabilities of iRODS. Participants will be able to create working Dataverse virtual archives that are integrated with iRODS storage grid technology. The workshop will describe and use policy sets selected from the new ISO 16363 audit standard for trustworthy digital repositories. These policies are written as iRODS rules that can be enforced by machine. These data management and preservation rules will enforce and monitor a wide range of policies:
- Number of preservation copies
- Checksum calculations
- Frequency of integrity checks
- Creation of preservation formats
- Verification of preservation formats
- Movement of digital objects through a secure firewall
- Scans for sensitive information to protect human subjects
- Reporting of preservation status
- Verification of geographically distributed copies
- Enforcement and reporting of access controls
Participants will see machine-actionable rules in practice and be introduced to an environment where written policies are expressed in a form whose enforcement an archive can automate.
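In the workshop these policies are enforced as iRODS rules. As a rough illustration of what "machine actionable" means, the Python sketch below checks three of the listed policies (number of preservation copies, checksum verification, and geographically distributed copies) against an in-memory list of replicas. It is not iRODS rule language and does not call any iRODS API; the `Replica` class, the `check_preservation_policies` function, the choice of SHA-256, and the default thresholds are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Replica:
    """One stored copy of a digital object (hypothetical model, not an iRODS API)."""
    resource: str   # storage resource name (illustrative)
    location: str   # geographic site of that resource (illustrative)
    data: bytes     # replica contents, held in memory for this sketch


def sha256_hex(data: bytes) -> str:
    """Checksum used by this sketch's integrity policy (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def check_preservation_policies(replicas, registered_checksum,
                                min_copies=2, min_sites=2):
    """Evaluate three example policies and return a list of violation messages.

    The thresholds are hypothetical defaults; a real archive would take
    them from its written preservation policy.
    """
    violations = []
    # Policy: number of preservation copies.
    if len(replicas) < min_copies:
        violations.append(
            f"copy count {len(replicas)} is below the required {min_copies}")
    # Policy: geographically distributed copies.
    sites = {r.location for r in replicas}
    if len(sites) < min_sites:
        violations.append(
            f"copies span {len(sites)} site(s); policy requires {min_sites}")
    # Policy: integrity check of every replica against the registered checksum.
    for r in replicas:
        if sha256_hex(r.data) != registered_checksum:
            violations.append(f"checksum mismatch on {r.resource}")
    return violations


if __name__ == "__main__":
    original = b"survey responses, wave 1"
    checksum = sha256_hex(original)
    replicas = [
        Replica("unc-disk1", "Chapel Hill", original),
        Replica("unc-tape1", "Chapel Hill", b"corrupted bytes"),
    ]
    for message in check_preservation_policies(replicas, checksum):
        print(message)
```

Run as a script, the example reports two violations: both copies sit at one site, and `unc-tape1` fails its checksum. Returning every violation instead of stopping at the first one suits the reporting policy, since a scheduled job (run at whatever frequency the integrity-check policy sets) could log all problems found in a single pass.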
NDS Labs aims to further the mission of the NDSC (http://www.nationaldataservice.org), enabling a National Data Service by bringing together the data cyberinfrastructure R&D efforts currently under way. To this end, NDS Labs lets users deploy containerized versions of these actively developed data technologies elastically on a number of cloud resources. Here we show the deployment of the data infrastructure for the Odum effort to preserve social science data (http://arc.irss.unc.edu/dvn). Specifically, we deploy iRODS (http://irods.org), a federated archive technology used within the DataNet Federation Consortium (http://datafed.org); BitCurator (http://www.bitcurator.net), to identify sensitive information such as social security numbers within uploaded data; and Dataverse (http://dataverse.org), a data publishing service from Harvard that is being linked with the previously deployed iRODS instance to recreate the Odum archive.
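BitCurator's own scanning tools are not reproduced here. To illustrate the kind of check involved in flagging social security numbers in uploaded data, the Python sketch below combines a simple regular expression with the Social Security Administration's rules for numbers that are never issued. `SSN_PATTERN` and `find_possible_ssns` are hypothetical names, and a production scanner would cover many more formats (for example, unhyphenated nine-digit numbers) and other kinds of personal information.

```python
import re

# Matches the common NNN-NN-NNNN layout (hypothetical rule for illustration;
# real PII scanners apply many more patterns and validity checks).
SSN_PATTERN = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")


def find_possible_ssns(text):
    """Return SSN-like strings, skipping numbers that are never issued as SSNs.

    Area numbers 000, 666, and 900-999, group number 00, and serial
    number 0000 are never assigned, so matches using them are ignored.
    """
    hits = []
    for match in SSN_PATTERN.finditer(text):
        area, group, serial = match.groups()
        if area in ("000", "666") or area.startswith("9"):
            continue
        if group == "00" or serial == "0000":
            continue
        hits.append(match.group(0))
    return hits


if __name__ == "__main__":
    sample = "Respondent 17: 123-45-6789; ref 000-12-3456"
    print(find_possible_ssns(sample))
```

Filtering out never-issued numbers cuts down false positives from reference codes that merely share the hyphenated layout; flagged files would still need human review before release.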
Essawy, B. T., J. L. Goodall, H. Xu, A. Rajasekar, J. D. Myers, T. Kugler, M. M. Billah, M. C. Whitton, and R. W. Moore (2016), Server-side workflow execution using data grid technology for reproducible analyses of data-intensive hydrologic systems, Earth and Space Science, 3, doi:10.1002/2015EA000139.
A recent CyVerse Webinar (March 25) provided an introduction to Docker technology and showed how to use Docker to bring tools into the Discovery Environment.
A paper describing the federation of the DataNet Federation Consortium (DFC), Sustainable Environment-Actionable Data (SEAD), and Terra Populus (TerraPop) DataNet projects will be published in Earth and Space Science. The title and authors of the paper are:
Server-Side Workflow Execution using Data Grid Technology for Reproducible Analyses of Data-Intensive Hydrologic Systems
Bakinam T. Essawy (orcid.org/0000-0003-2295-7981)
Department of Civil and Environmental Engineering,
University of Virginia, Charlottesville, Virginia
Jonathan L. Goodall* (orcid.org/0000-0002-1112-4522)
Department of Civil and Environmental Engineering,
University of Virginia, Charlottesville, Virginia
Hao Xu (orcid.org/0000-0001-6659-6511)
School of Information and Library Science,
University of North Carolina, Chapel Hill, NC
Arcot Rajasekar (orcid.org/0000-0003-2280-386X)
School of Information and Library Science,
University of North Carolina, Chapel Hill, NC
James D. Myers (orcid.org/0000-0001-8462-650X)
Inter-university Consortium for Political and Social Research,
University of Michigan, Ann Arbor, MI
Tracy Kugler (orcid.org/0000-0002-3427-9789)
Minnesota Population Center,
University of Minnesota, Minneapolis, MN
Mirza M. Billah (orcid.org/0000-0002-9716-4102)
Department of Biological Systems Engineering,
Virginia Tech, Blacksburg, Virginia
Mary C. Whitton (orcid.org/0000-0003-2880-2550)
Renaissance Computing Institute (RENCI)
University of North Carolina, Chapel Hill, NC
Reagan W. Moore (orcid.org/0000-0003-2363-413X)
School of Information and Library Science,
University of North Carolina, Chapel Hill, NC
Abstract:
Many disciplines in the geosciences utilize complex computational models for advancing understanding and sustainable management of Earth systems. Executing such models and their associated data pre- and post-processing routines can be challenging for a number of reasons including (1) accessing and pre-processing the large volume and variety of data required by the model, (2) post-processing large data collections generated by the model, and (3) orchestrating data processing tools, each with unique software dependencies, into workflows that can be easily reproduced and reused. To address these challenges, the work reported in this paper leverages the Workflow Structured Object (WSO) functionality of the Integrated Rule-Oriented Data System (iRODS) and demonstrates how it can be used to access distributed data, encapsulate hydrologic data processing as workflows, and federate with other community-driven cyberinfrastructure. The scientific contribution is a methodology for creating reproducible scientific workflows where computation routines are co-located with distributed reference data collections. The methodology is demonstrated through an example that leverages both the Terra Populus (TerraPop) and Sustainable Environment-Actionable Data (SEAD) cyberinfrastructure projects for data input, management, and publication. The work is part of a larger effort under the DataNet Federation Consortium (DFC) project that aims to demonstrate data and computational interoperability across cyberinfrastructure developed independently by scientific communities.
Resources:
TerraPop Data: http://dx.doi.org/10.5967/M08P5XH5
VIC Output for Carolina, 1998-2007: http://dx.doi.org/10.5967/M0DF6P6F
WSO: http://dx.doi.org/10.5967/M0J67DXR
WSO_OuputViz: http://dx.doi.org/10.5967/M0513W51
Reagan Moore has developed an online course on policy-based data management. The course material was developed through the DataNet Federation Consortium for three audiences: master’s students at the School of Information and Library Science (SILS) at the University of North Carolina at Chapel Hill (UNC-CH); academic partners of the DFC federation hub; and users of the Integrated Rule-Oriented Data System (iRODS). The iRODS middleware is a policy-based data management system that is used internationally to manage distributed data. The course materials, including the workbook, videos, rules, slides, and syllabus, are organized into folders of downloadable files.
DFC poster recently published on ESIP Commons.
iPres 2015, the 12th International Conference on Digital Preservation, selected “Preservation Policy Toolkit” as its best paper. The DataNet Federation Consortium uses a policy-based data management system to apply and enforce preservation requirements. The paper describes the Preservation Policy Toolkit developed by the consortium: the infrastructure needed for preservation, examples of computer-actionable forms of policies, and a generic template for designing actionable preservation policies.
The Royal Swedish Academy of Sciences has announced that T2K collaboration member Takaaki Kajita will be awarded the 2015 Nobel Prize in Physics. Prof. Kajita, Director of the Institute for Cosmic Ray Research (ICRR), University of Tokyo, shares the award with Prof. Arthur McDonald (Queen’s University, Canada) “for the discovery of neutrino oscillations, which shows that neutrinos have mass.” The T2K experiment has used iRODS since 2010, as described in the paper “First iRODS Experience in a Neutrino Experiment,” presented at the 2011 iRODS User Group Meeting.
The Research Data Alliance Outputs include a report of the Metadata Standards Working Group (Jane Greenberg, Co-Chair). The Metadata Standards Working Group report is found on pages 16-17 of the RDA Outputs document.
The Research Data Alliance Outputs include a report of the Practical Policy Working Group (Reagan Moore, Co-Chair). The Practical Policy Working Group report is found on pages 10-11 of the RDA Outputs document.
The iRODS data management platform and the iRODS Consortium that works to sustain it are making waves well beyond their home base in Chapel Hill, NC.
The iRODS consortium recently posted information on installing Cyberduck.
DFC partner, iPlant Collaborative, recently announced that “Transferring data into and out of the iPlant Collaborative’s scalable data-management platform, the Data Store, is now easier and faster than ever before, thanks to new capabilities of the popular data transfer application Cyberduck.”
The iRODS User Group Meeting 2015 was held at the William and Ida Friday Center in Chapel Hill, North Carolina, Wednesday, June 10th and Thursday, June 11th, 2015.
Research Data Alliance
Repository requirements from 25 science and engineering domains are supported by data grids. A high-level categorization of the requirements is provided.
By Reagan Moore
Jane Greenberg presented on the topic of “Assigning Metadata: A Key Responsibility in Enhancing Discovery of Research Data” at the NFAIS Hybrid One-Day Workshop (Mastering the Curation, Integrity and Citation of Quality Research Data: Research Data Publication, Part II). Jane referred to the DFC and HIVE and noted current work in materials science.
CardioVascular Research Grid, DataBridge, Arcot Rajasekar
Drexel’s Metadata Research Center Debuts at Official Opening and Area Conferences (College of Computing & Informatics, Drexel University)
GABBS to collaborate with the DataNet iRODS project.
Libraries and command-line scripts for performing ecohydrology data preparation workflows.
iPlant Collaborative News: The iPlant Collaborative at the University of Arizona’s BIO5 Institute hosted a three-day hackathon session for the iRODS Consortium.
The Information Association for the Information Age (ASIS&T) Bulletin
Digital Curation Centre News: Mary Whitton presented an update on the DFC, including a description of “What our users want”.
By Reagan Moore, Scientific Computing World
UNC School of Information and Library Science News: The Assembly of the Federation of Earth Science Information Partners (ESIP Federation) elected four new member organizations, two of which have an affiliation with the University of North Carolina at Chapel Hill’s School of Information and Library Science (SILS), bringing total membership to more than 140 organizations.
ASIS&T 75th Anniversary Bulletin
The Data Management Plans and Policies panel discussed scientific research data support and implementation of NSF data management plans. The following specific topics were discussed:
- Ryan Steans from the Texas Digital Library described organization and management plans for a regional digital library.
- Peter Wittenburg represented the European Union Data Infrastructure and discussed organization and strategy for building a data grid.
- Suzie Allard of the NSF DataONE project presented on organizational and management approaches for registering and accessing distributed collections.
- Dave Fellinger from DataDirect Networks talked about integrating collection processing into the storage system.
- Carol Beaton Meyer of Earth Science Information Partners discussed building community consensus on data sharing.
- Aletia Morgan from Rutgers University Community Resource detailed implementing an institutional repository.
- Reagan Moore of the NSF DataNet Federation Consortium closed the panel with a discussion of implementing a community-based collection lifecycle.
By Gregory Goth, Communications of the ACM
doi: 10.1145/2133806.2133811
This article cites iRODS (the foundation of the DFC infrastructure) as a “standout example” of ways to “share data across domains and disciplines.”
The Daily Tar Heel: Researchers in the School of Information and Library Science have become part of a multi-million dollar effort to create a national data network. Plans for the infrastructure are in their earliest stages, and researchers say they don’t know exactly how far their research could take them. “The significance of the grant is bigger than the actual monetary award as it positions Carolina as a leader in data management,” said Karen Green, communications director of the Renaissance Computing Institute, which will be involved in the research.
RENCI News: The National Science Foundation has funded the University of North Carolina at Chapel Hill to lead a multi-institutional team that will build and deploy a prototype national data management infrastructure that addresses some of the key data challenges facing scientific researchers in the digital age. The infrastructure will support collaborative multidisciplinary research through shared collections, data publication within digital libraries and reference collections within persistent archives.