Exposing Data to DataONE

This checklist assumes you have at least one permanent, immutable data set that you would like to expose through the DataONE member node catalog system, and that you have the requisite resources (e.g., expertise, time, hardware) to manage the data. If you answer YES to all of the questions below, you may be ready to become a DataONE member node. Otherwise, consider handing the data off to a recognized scientific data repository.

  • Do you have the expertise in-house to implement the DataONE API at the desired tier of service?
  • Do you have the capability to use and administer an event logger (e.g., Log4J)?
  • DataONE accepts earth science data. Can your data be situated within the earth science domain?
  • Are you using persistent, globally unique identifiers for data objects (e.g., DOIs, or handles using a handle server)?
  • Do metadata documents exist alongside the data objects?
  • DataONE requires Resource Maps that describe the relationship between a metadata object and the data object(s) to which it refers. Do you have a Resource Map?
  • Is the data ready for publication?
    • Have you determined how best to assemble the data files into discrete packages for easier search and retrieval? For example, is your data best organized by topic, time, location, or some other theme? If by time, have you decided which temporal units to divide it into?
    • Are you using a consistent naming scheme?
    • Do you have the requisite permissions to share the data? Consider confidentiality and privacy agreements, data use agreements, and regulatory requirements (e.g., HIPAA, FISMA, Institutional Review Board).
    • Is the data in a non-proprietary format (e.g., text, NetCDF, comma-delimited file)?
    • Have you included descriptions and contextual documentation that explain what your data mean, how they were collected, the methods used to create them, and the terminology and variable names used?
    • Have you selected an interoperable metadata scheme (e.g., Dublin Core) to describe your data? Have you described the data in adherence to the scheme’s standards?
    • Have you prepared any other documentation needed for sharing the data (e.g., a data dictionary)?
  • Are you regularly backing up the data to guard against loss from technical failures (e.g., disk failure, data corruption) or disasters (e.g., fire or flooding of a data center)?
  • If you are using iRODS, do you have the expertise in-house to write/edit iRODS rules?
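
The Resource Map question above asks for a description of how a metadata object relates to the data object(s) it refers to. As a rough illustration of what that relationship looks like, the sketch below lists the statements such a map might contain. It assumes the OAI-ORE and CiTO vocabularies (ore:describes, ore:aggregates, cito:documents, cito:isDocumentedBy); the function name and all identifiers are hypothetical, and the exact serialization DataONE expects should be taken from the developer resources rather than from this example.

```python
# Illustrative sketch only: the helper name and identifiers are hypothetical.
# Vocabulary namespaces assumed here: OAI-ORE terms and CiTO.
ORE = "http://www.openarchives.org/ore/terms/"
CITO = "http://purl.org/spar/cito/"


def resource_map_triples(map_id, metadata_id, data_ids):
    """Return (subject, predicate, object) triples for one data package.

    The package links a single metadata object to one or more data objects.
    """
    if not data_ids:
        raise ValueError("a package needs at least one data object")

    # The resource map describes an aggregation, which groups the members.
    aggregation_id = map_id + "#aggregation"
    triples = [(map_id, ORE + "describes", aggregation_id)]
    for member in [metadata_id, *data_ids]:
        triples.append((aggregation_id, ORE + "aggregates", member))

    # Record the metadata/data relationship in both directions.
    for data_id in data_ids:
        triples.append((metadata_id, CITO + "documents", data_id))
        triples.append((data_id, CITO + "isDocumentedBy", metadata_id))
    return triples


if __name__ == "__main__":
    # Hypothetical identifiers; real ones would be persistent (e.g., DOIs).
    for triple in resource_map_triples("rm.1", "meta.1", ["data.1", "data.2"]):
        print(triple)
```

If you can already produce a list like this for each package, that is, you know which metadata document covers which data files, then generating a formal Resource Map is mostly a matter of serialization.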

For more information, see https://www.dataone.org/best-practices and https://www.dataone.org/developer-resources