Research Object Crate

View the Project on GitHub ResearchObject/ro-crate

Research Object Crate (RO-Crate)

  1. Research Object Crate (RO-Crate)
    1. What is a Research Object
      1. RO-Crate
      2. Motivation
    2. Drafts
    3. Use cases
    4. Contribute
      1. Meetings
    5. Cite RO-Crate

Note: ROLite and DataCrate have been merged into RO-Crate. This is the new homepage.

What is a Research Object

A Research Object (RO) provides a machine-readable mechanism to communicate the diverse set of digital and real-world resources that contribute to an item of research. The aim of an RO is to replace the traditional academic publication, a PDF with a couple of supplementary materials, with a structured archive of all the items that contributed to the research outcome, including their identifiers, provenance, relations and annotations.

This is of particular importance because all domains of research and science increasingly rely on computational analysis, yet we are facing a reproducibility crisis because key components are not sufficiently tracked, archived or reported.


RO-Crate

This project defines Research Object Crate (RO-Crate for short), an emerging lightweight approach to packaging research data together with its structured metadata. It is based on annotations in a formalized JSON-LD format that can be used independently of infrastructure, to encourage FAIR sharing of reproducible datasets and analytical methods.
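As a rough illustration of what structured metadata in JSON-LD can look like, the following minimal Python sketch builds a metadata document describing the files in a folder. The function name `build_metadata`, the schema.org context and the `Dataset`, `MediaObject`, `hasPart` and `contentSize` terms are assumptions chosen for illustration; they are not taken from the RO-Crate specification, which is still being drafted.

```python
import json
import tempfile
from pathlib import Path


def build_metadata(folder: Path, name: str) -> dict:
    """Describe every file under `folder` as a flat JSON-LD graph.

    Assumption: schema.org terms are used purely for illustration;
    the vocabulary chosen by the RO-Crate specification may differ.
    """
    files = sorted(p for p in folder.rglob("*") if p.is_file())
    # Relative POSIX paths serve as identifiers, so the package stays
    # independent of where it happens to be stored.
    ids = [p.relative_to(folder).as_posix() for p in files]
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "hasPart": [{"@id": i} for i in ids],
    }
    parts = [
        {"@id": i, "@type": "MediaObject", "contentSize": str(p.stat().st_size)}
        for i, p in zip(ids, files)
    ]
    return {"@context": "https://schema.org/", "@graph": [root, *parts]}


if __name__ == "__main__":
    # Build a throwaway folder with two files and print its metadata.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "results.csv").write_text("sample,value\na,1\n")
        (folder / "analysis.py").write_text("print('hello')\n")
        print(json.dumps(build_metadata(folder, "Example dataset"), indent=2))
```

The graph is kept flat (one entry per file, linked from the root by identifier) so that a consumer can read it with an ordinary JSON parser, without a Linked Data toolkit.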

Examples of items that should be included in a Research Object:

See background for how this work builds on existing Research Object specifications.


Motivation

Many data packaging initiatives arrive at similar principles: a simple folder structure; a JSON-LD manifest; schema.org for core metadata; BagIt for fixity; OAI-ORE for aggregation. This points to: a) an appetite for a general package/folder-oriented approach in different contexts; b) the fact that a generic solution won’t work for all and needs to be domain-extensible; c) a tendency to re-invent the wheel, leading to sub-optimal interoperability and duplication of effort. We have identified a gap for a solid base format for data packaging that also allows communities to build domain-specific solutions.

Our proposal is to build on DataCrate to evolve RO-Crate, based around these principles: a) metadata as Linked Data, using schema.org as much as possible; b) extensible for different domains; c) retain the core Research Object principles of Identity, Aggregation and Annotation; d) inferred metadata rather than repetition; e) “just-enough” provenance; f) layered validation; g) archivable with BagIt; h) hooks to reuse existing domain formats; i) lightweight programmatic generation and consumption. Similar to the approach of BioSchemas, rather than building new specifications from scratch, we aim to build best-practice guides and validatable profiles for building rich research data packages with existing standards, so that developing producers and consumers does not require expert knowledge.
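Principle g) relies on BagIt, whose payload manifests record one checksum and one path per file. The sketch below computes such manifest lines with the Python standard library. The helper name `manifest_lines` and the choice of SHA-256 are assumptions for illustration, and the code does not produce a complete bag: it neither writes the `bagit.txt` declaration nor moves files into a `data/` directory.

```python
import hashlib
from pathlib import Path


def manifest_lines(payload_dir: Path) -> list[str]:
    """Return '<sha256> data/<relative path>' lines for files in payload_dir.

    Assumption: the files are destined for the payload directory of a
    bag, which BagIt names 'data/'; hence the prefix on each path.
    """
    lines = []
    for path in sorted(p for p in payload_dir.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        rel = path.relative_to(payload_dir).as_posix()
        lines.append(f"{digest} data/{rel}")
    return lines
```

Recomputing the lines later and comparing them with the stored manifest is enough to detect files that were changed, added or removed after packaging.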


Drafts

The RO-Crate specification is currently a work in progress, drafted in Google Docs before being added to our GitHub repository.

Historical note: After the initial draft, the community decided to base the specification on DataCrate, and changed the name from ROLite to RO-Crate.

Use cases

We are also gathering use cases; please help us by adding more.


Contribute

The RO-Crate team is:

To suggest changes or improvements, or to report issues, use the GitHub repository. If you are new to GitHub or open source, you may appreciate GitHub guides such as Hello World, Markdown and How to contribute to open source.

You are welcome to join us! Contributors are expected to comply with our Code of Conduct to ensure an open and inclusive environment.

This specification and documentation are open source, licensed under the Apache License, version 2.0.


Meetings

The RO-Crate team tries to meet in a monthly teleconference; see the rolling agenda for the schedule, call-in details and minutes.

Cite RO-Crate

Eoghan Ó Carragáin; Carole Goble; Peter Sefton; Stian Soiland-Reyes (2019): A lightweight approach to research object data packaging. Bioinformatics Open Source Conference (BOSC2019).