Architecture

The DFC is based on a bottom-up federation of existing national data management systems through use of the iRODS policy-based data grid technology. A sharable collection is created from data sets that are located at remote sites. The collection is managed by policies that control data ingestion, data access, and collection properties. The policies automate administrative tasks, validate assessment criteria, and control federation between independent domain data grids.

Each data grid contains peer-to-peer servers that manage interactions with external resources. An iRODS data grid consists of clients and middleware. The clients that may be used to access the data grid include web browsers, web services, digital libraries, workflow systems, scripting load libraries, I/O libraries, file system interfaces, shell commands, grid tools, dropbox interfaces, and portals. Actions by clients are identified and controlled by policy enforcement points within the iRODS middleware.

The peer-to-peer servers include:
iRES – storage resource server that translates from standard operations to the storage protocol. Each iRES server includes a local rule engine that applies policies from a local rule base.
iCAT – metadata catalog interface that manages interactions with a relational database.
iSEC – rule scheduler that manages execution of deferred and periodic rules.
iXMS – message system for tracking progress and supporting distributed debugging of the system.
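
The way a policy enforcement point consults a local rule base can be illustrated with a minimal Python sketch. This is not the iRODS rule engine or its rule language. The class `RuleEngine`, the enforcement point name `pre_ingest`, and both example rules are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A rule receives the request context and returns True to allow the action.
Rule = Callable[[dict], bool]


@dataclass
class RuleEngine:
    """Hypothetical local rule engine holding a local rule base."""
    rule_base: Dict[str, List[Rule]] = field(default_factory=dict)

    def register(self, enforcement_point: str, rule: Rule) -> None:
        # Attach a rule to a named policy enforcement point.
        self.rule_base.setdefault(enforcement_point, []).append(rule)

    def enforce(self, enforcement_point: str, context: dict) -> bool:
        # The action proceeds only if every attached rule accepts it.
        # An enforcement point with no rules allows the action.
        rules = self.rule_base.get(enforcement_point, [])
        return all(rule(context) for rule in rules)


def max_size_rule(context: dict) -> bool:
    # Hypothetical ingestion policy: reject files larger than 1 MB.
    return context["size_bytes"] <= 1_000_000


def has_checksum_rule(context: dict) -> bool:
    # Hypothetical ingestion policy: require a checksum on every file.
    return bool(context.get("checksum"))


engine = RuleEngine()
engine.register("pre_ingest", max_size_rule)
engine.register("pre_ingest", has_checksum_rule)

ok = engine.enforce("pre_ingest", {"size_bytes": 512, "checksum": "abc"})
rejected = engine.enforce("pre_ingest", {"size_bytes": 5_000_000, "checksum": "abc"})
```

In this sketch each client action would call `enforce` with the name of the enforcement point it passes through, so changing a policy means changing the rule base rather than the client.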

The data grids can be federated to assemble a national data infrastructure. A snowflake federation mechanism is used to establish trust between each pair of data grids. Users are authenticated by their home data grid, and authorized by the data grid they are accessing.
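
The split between authentication at the home data grid and authorization at the accessed data grid can be sketched as follows. This is a simplified model under stated assumptions, not the iRODS implementation: the `DataGrid` class, the plain-text password table, the per-path access list, and the zone names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class DataGrid:
    """Hypothetical data grid with its own users, trust list, and ACLs."""
    zone: str
    passwords: Dict[str, str] = field(default_factory=dict)
    trusted_zones: Set[str] = field(default_factory=set)
    # Maps a collection path to the (user, home zone) pairs allowed to read it.
    acl: Dict[str, Set[Tuple[str, str]]] = field(default_factory=dict)

    def authenticate(self, user: str, password: str) -> bool:
        # Only the home grid knows the user's credentials.
        return user in self.passwords and self.passwords[user] == password

    def authorize(self, user: str, home_zone: str, path: str) -> bool:
        # The accessed grid checks pairwise trust, then its own permissions.
        if home_zone != self.zone and home_zone not in self.trusted_zones:
            return False
        return (user, home_zone) in self.acl.get(path, set())


def access(grids: Dict[str, DataGrid], user: str, home_zone: str,
           password: str, target_zone: str, path: str) -> bool:
    # Step 1: the home grid authenticates. Step 2: the target grid authorizes.
    if not grids[home_zone].authenticate(user, password):
        return False
    return grids[target_zone].authorize(user, home_zone, path)


grids = {
    "zoneA": DataGrid("zoneA", passwords={"alice": "secret"}),
    "zoneB": DataGrid(
        "zoneB",
        trusted_zones={"zoneA"},
        acl={"/zoneB/home/shared": {("alice", "zoneA")}},
    ),
}

granted = access(grids, "alice", "zoneA", "secret", "zoneB", "/zoneB/home/shared")
bad_password = access(grids, "alice", "zoneA", "wrong", "zoneB", "/zoneB/home/shared")
no_permission = access(grids, "alice", "zoneA", "secret", "zoneB", "/zoneB/home/private")
```

The accessed grid never sees the password in this model; it only decides what an already authenticated remote user may do.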

Each data grid is identified by a “Zone” name and a port number that is used to communicate with the metadata catalog “iCAT”. Each data grid may provide access to multiple storage resources located anywhere on the network. A user can log on to a data grid and store files at any storage location for which they have been given access permission.
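
As an illustration of zone-based addressing, the sketch below resolves a logical path to the catalog endpoint of its zone. It assumes the iRODS convention that a logical path begins with the zone name. The zone names, host names, and the lookup table `ZONES` are hypothetical, and 1247 is used as an assumed default port.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ZoneEndpoint:
    """Where the metadata catalog of a zone can be reached (hypothetical)."""
    host: str
    port: int


# Hypothetical registry of known zones; real deployments configure this per grid.
ZONES: Dict[str, ZoneEndpoint] = {
    "dfcZone": ZoneEndpoint("icat.dfc.example.org", 1247),
    "partnerZone": ZoneEndpoint("icat.partner.example.org", 1247),
}


def split_logical_path(path: str) -> Tuple[str, str]:
    """Split '/zone/rest/of/path' into ('zone', '/rest/of/path')."""
    if not path.startswith("/"):
        raise ValueError("logical path must be absolute")
    parts = path.split("/", 2)  # '/zone/rest' -> ['', 'zone', 'rest']
    zone = parts[1]
    if not zone:
        raise ValueError("logical path must start with a zone name")
    rest = "/" + parts[2] if len(parts) > 2 else "/"
    return zone, rest


def resolve_catalog(path: str) -> ZoneEndpoint:
    # Look up the catalog endpoint for the zone named in the path.
    zone, _ = split_logical_path(path)
    return ZONES[zone]


endpoint = resolve_catalog("/dfcZone/home/alice/data.csv")
```

The physical storage location does not appear in the path, which is consistent with a single zone exposing storage resources located anywhere on the network.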

Client Graph

The clients that can be used to access the federated environment include development libraries that support the porting of new clients, a DFC surface that defines the protocols that can be used to interact with the data grid, and community-specific clients tuned to the needs of each discipline.

The DFC has implemented three types of federation mechanisms:

  1. Tightly coupled federations, in which name spaces for users, collections, and files are shared.  This is usually done through federation of iRODS data grids.  This method is used in the DFC Federation Hub to federate with the Temporal Dynamics of Learning Center, the HydroShare project, and the iPlant Collaborative testbed.
  2. Loosely coupled federations, in which the protocol for interacting with a remote repository is captured in a micro-service.  Interactions are usually through web services that support discovery and retrieval of data sets.  This method is used to federate the DataNet Partners with the DFC Federation Hub, including DataONE, SEAD, and TerraPoP.
  3. Asynchronous federations, in which communication is done through a third party.  There is no direct interaction between the remote repositories.  Instead, a message queue is used for posting queries and tracking responses.  This method is used to federate the DFC Federation Hub with external indexing systems, the DataBridge project, and the DataBook technology.
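
The asynchronous mechanism in item 3 can be sketched with an in-process queue standing in for the third-party message system. This is a minimal illustration, not the DFC implementation: the `MessageBroker` class, the `run_indexer` function, and the sample index are hypothetical, and a real deployment would use a networked message queue.

```python
import queue
import uuid
from typing import Dict, List, Optional, Set, Tuple


class MessageBroker:
    """Hypothetical third party: holds posted queries and tracked responses."""

    def __init__(self) -> None:
        self.queries: "queue.Queue[Tuple[str, str]]" = queue.Queue()
        self.responses: Dict[str, List[str]] = {}

    def post_query(self, keyword: str) -> str:
        # The requester posts a query and keeps the id to track the response.
        query_id = str(uuid.uuid4())
        self.queries.put((query_id, keyword))
        return query_id

    def next_query(self) -> Optional[Tuple[str, str]]:
        # The remote system pulls work when it is ready; None means no work.
        try:
            return self.queries.get_nowait()
        except queue.Empty:
            return None

    def post_response(self, query_id: str, result: List[str]) -> None:
        self.responses[query_id] = result

    def poll_response(self, query_id: str) -> Optional[List[str]]:
        # Returns None until the remote system has answered.
        return self.responses.get(query_id)


def run_indexer(broker: MessageBroker, index: Dict[str, Set[str]]) -> None:
    """Hypothetical external indexer: drains the queue and posts answers."""
    item = broker.next_query()
    while item is not None:
        query_id, keyword = item
        matches = sorted(name for name, keywords in index.items() if keyword in keywords)
        broker.post_response(query_id, matches)
        item = broker.next_query()


broker = MessageBroker()
query_id = broker.post_query("hydrology")
pending = broker.poll_response(query_id)  # no answer yet

run_indexer(broker, {
    "rivers.csv": {"hydrology", "usa"},
    "census.csv": {"population"},
})
answer = broker.poll_response(query_id)
```

Neither side calls the other directly in this sketch. The requester and the indexer interact only with the broker, which matches the absence of direct interaction between the remote repositories.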