Note: ROLite and DataCrate have been merged into RO-Crate. The new homepage is http://researchobject.org/ro-crate/
A Research Object (RO) provides a machine-readable mechanism to communicate the diverse set of digital and real-world resources that contribute to an item of research. The aim of an RO is to replace the traditional academic publication, a PDF with a couple of supplementary materials, with a structured archive of all the items that contributed to the research outcome, including their identifiers, provenance, relations and annotations.
This is of particular importance as all domains of research and science increasingly rely on computational analysis, yet we face a reproducibility crisis because key components are not sufficiently tracked, archived or reported.
This project defines Research Object Crate (or RO-Crate for short), an emerging lightweight approach to packaging research data with its structured metadata, based on schema.org annotations in a formalized JSON-LD format that can be used independently of infrastructure to encourage FAIR sharing of reproducible datasets and analytical methods.
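As an illustration of what "structured metadata based on schema.org annotations in JSON-LD" could look like in practice, the following minimal Python sketch builds a manifest for a small folder of files. It is not taken from the RO-Crate specification, which is still being drafted: the function name build_manifest, the example file paths, and the choice of the schema.org types Dataset and MediaObject for the root folder and its files are assumptions made only for this example.

```python
import json


def build_manifest(name, description, files):
    """Return a JSON-LD document describing a dataset and its files.

    `files` maps a relative path to a dict of extra schema.org properties.
    Hypothetical helper for illustration; not part of any RO-Crate tooling.
    """
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # The root dataset aggregates its files by reference (@id only),
        # so each file's metadata is stated once, in its own entity.
        "hasPart": [{"@id": path} for path in sorted(files)],
    }
    parts = [
        {"@id": path, "@type": "MediaObject", **props}
        for path, props in sorted(files.items())
    ]
    return {"@context": "https://schema.org/", "@graph": [root] + parts}


manifest = build_manifest(
    name="Example survey results",
    description="Hypothetical dataset used only to illustrate the layout.",
    files={
        "data/results.csv": {"name": "Survey results", "encodingFormat": "text/csv"},
        "scripts/analyse.py": {"name": "Analysis script"},
    },
)

# Serialise to text; a packaging tool would write this next to the data files.
manifest_json = json.dumps(manifest, indent=2)
```

Because the result is plain JSON, it can be produced and read with a standard JSON library alone; a full JSON-LD processor is only needed by consumers that want to treat the metadata as Linked Data.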
Examples of items that should be included in a Research Object:
Many data packaging initiatives arrive at similar principles: simple folder structure; JSON-LD manifest; schema.org for core metadata; BagIt for fixity; OAI-ORE for aggregation. This points to: a) an appetite for a general package/folder-oriented approach in different contexts; b) the fact that a single generic solution won’t work for all, so any solution needs to be domain-extensible; c) a tendency to re-invent the wheel, leading to sub-optimal interoperability and duplication of effort. We have identified a gap for a solid base format for data packaging that also allows communities to build domain-specific solutions.
Our proposal is to build on DataCrate to evolve RO-Crate, based around these principles: a) metadata as Linked Data, using schema.org as much as possible; b) extensible for different domains; c) retain the core Research Object principles of Identity, Aggregation and Annotation; d) inferred metadata rather than repetition; e) “just-enough” provenance; f) layered validation; g) archivable with BagIt; h) hooks to reuse existing domain formats; i) lightweight programmatic generation and consumption. Similar to the approach of BioSchemas, rather than building new specifications from scratch, we aim to build best-practice guides and validatable profiles for building rich research data packages with existing standards, without requiring expert knowledge to develop producers and consumers.
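To make the "archivable with BagIt" principle more concrete, here is a rough Python sketch of BagIt-style fixity: it records a SHA-256 checksum for every file under a data/ payload folder and later checks whether any file has changed. It is a simplification using only the standard library, not a complete BagIt implementation (it omits bag-info.txt, tag manifests and the path encoding rules), and the helper names sha256_of, write_fixity, verify_fixity and demo are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_fixity(bag_dir):
    """Write bagit.txt and manifest-sha256.txt for files under data/."""
    bag_dir = Path(bag_dir)
    payload = sorted(p for p in (bag_dir / "data").rglob("*") if p.is_file())
    # One line per payload file: checksum, whitespace, path relative to the bag.
    lines = [f"{sha256_of(p)}  {p.relative_to(bag_dir).as_posix()}" for p in payload]
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(lines) + "\n", encoding="utf-8"
    )
    return lines


def verify_fixity(bag_dir):
    """Return the payload paths whose checksum no longer matches the manifest."""
    bag_dir = Path(bag_dir)
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    failed = []
    for line in manifest.splitlines():
        if not line.strip():
            continue  # tolerate blank lines, e.g. for an empty payload
        checksum, rel_path = line.split(None, 1)
        if sha256_of(bag_dir / rel_path) != checksum:
            failed.append(rel_path)
    return failed


def demo():
    """Create a throwaway bag, verify it, modify a file, and verify again."""
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp)
        (bag / "data").mkdir()
        target = bag / "data" / "results.csv"
        target.write_text("id,value\n1,42\n", encoding="utf-8")
        lines = write_fixity(bag)
        intact = verify_fixity(bag)
        target.write_text("id,value\n1,43\n", encoding="utf-8")
        changed = verify_fixity(bag)
        return lines, intact, changed
```

Calling demo() returns the manifest lines, an empty list for the untouched bag, and the path of the modified file after the edit. In a real archive an established BagIt library would be the safer choice; the sketch only shows that the fixity layer can sit beside the metadata without either depending on the other.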
The RO-Crate specification is currently a work in progress, drafted in Google Docs before being added to our GitHub repository.
Historical note: After the initial draft, the community decided to base the specification on DataCrate, and changed the name from ROLite to RO-Crate.
The RO-Crate team is:
To suggest changes or improvements, or to raise issues, use the GitHub repository https://github.com/ResearchObject/ro-crate. If you are new to GitHub or Open Source, you may appreciate the GitHub guides such as Hello World, Markdown and How to contribute to open source.
The RO-Crate team tries to meet in a monthly telcon; see the rolling agenda for the schedule, call-in details and minutes.
Eoghan Ó Carragáin; Carole Goble; Peter Sefton; Stian Soiland-Reyes (2019): A lightweight approach to research object data packaging. Bioinformatics Open Source Conference (BOSC2019). https://doi.org/10.5281/zenodo.3250687