DiSSCo PID Infrastructure Gets an Upgrade: Introducing our (test) DOI Infrastructure!

Bringing together collections data from hundreds of sources requires sophisticated coordination. How do we track provenance, annotate, and reference all these Digital Specimens? Using globally resolvable Persistent Identifiers, or PIDs. We’ve discussed our PID infrastructure before on this blog. In this blog post, we will describe an exciting new update: the potential to upgrade our identifiers to DOIs. We’ll discuss what DOIs are, what our technical stack looks like, and what this means for the future of Digital Specimens.

Image by pikisuperstar on Freepik

Persistent Identifiers are a key component of the FAIRness of the DiSSCo infrastructure, allowing machines and humans to consistently reference, annotate, and cite Digital Specimens from natural history collections across the world. These identifiers are globally unique and assigned to a specimen upon ingestion into DiSSCo – no need to worry if two institutions both call one of their specimens “REPTILE.1”. In DiSSCo, a machine or human user can unambiguously identify any specimen. PIDs also provide stable references to a given resource – even if the resource in question has moved.

Handles and DOIs

So far, we’ve used Handles as PIDs for all our digital objects. The Handle System is a tool developed by CNRI for managing and resolving PIDs. It’s a distributed system, meaning DiSSCo is part of a global PID infrastructure. We manage our own handle infrastructure that communicates with the Global Handle Registry, which controls the resolution of all Handles.
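To make resolution concrete, here is a minimal sketch of how a client might look up a Handle through the public Handle proxy's REST interface and read the returned values. The handle `20.500.99999/EXAMPLE-1`, the target URL, and the sample response are made up for illustration and are not DiSSCo records; the helper names are our own.

```python
from urllib.parse import quote

# Public Handle proxy REST endpoint; a GET on this URL returns JSON.
HANDLE_PROXY = "https://hdl.handle.net/api/handles/"


def resolution_url(handle: str) -> str:
    """Build the proxy URL for a handle, keeping the prefix/suffix slash."""
    return HANDLE_PROXY + quote(handle, safe="/")


def values_by_type(response: dict) -> dict:
    """Map each value type (e.g. "URL") in a proxy response to its data value."""
    # responseCode 1 signals success in the Handle proxy's JSON responses.
    if response.get("responseCode") != 1:
        raise LookupError(f"handle not resolved: {response.get('handle')}")
    return {v["type"]: v["data"]["value"] for v in response.get("values", [])}


# Illustrative payload in the shape the proxy returns (hypothetical content).
sample_response = {
    "responseCode": 1,
    "handle": "20.500.99999/EXAMPLE-1",
    "values": [
        {
            "index": 1,
            "type": "URL",
            "data": {"format": "string", "value": "https://example.org/specimen/EXAMPLE-1"},
            "ttl": 86400,
        }
    ],
}

print(resolution_url(sample_response["handle"]))
print(values_by_type(sample_response))
```

In practice the JSON would come from an HTTP GET on the URL built by `resolution_url`; the sample dictionary stands in for that call so the sketch runs offline.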

The Handle System is a powerful tool to create globally resolvable identifiers, but Handles alone aren’t persistent. To put the “P” in PID, we need DOIs, or Digital Object Identifiers. DOIs are Handles with guaranteed persistence, which makes them a reliable tool for citation and provenance. You can recognise a DOI by its prefix, which always begins with “10.” (for example, “10.XXX”).
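As a small illustration of the prefix rule, the sketch below splits an identifier into its prefix and suffix and checks whether the prefix starts with “10.”. The function names and the example identifiers are hypothetical, and the check is only syntactic: it says nothing about whether the identifier is actually registered.

```python
def split_handle(identifier: str) -> tuple[str, str]:
    """Split "prefix/suffix" at the first slash; both parts must be non-empty."""
    prefix, sep, suffix = identifier.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix identifier: {identifier!r}")
    return prefix, suffix


def is_doi(identifier: str) -> bool:
    """Return True when the identifier's prefix begins with "10."."""
    prefix, _ = split_handle(identifier)
    return prefix.startswith("10.")


# Made-up identifiers, for illustration only.
print(is_doi("10.99999/EXAMPLE-1"))      # DOI-style prefix
print(is_doi("20.500.99999/EXAMPLE-1"))  # plain Handle prefix
```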

The DOI Foundation governs the DOI system, ensuring that once a DOI is minted, it stays resolvable, regardless of whether the original institution still exists. We want to use DOIs as identifiers for Digital Specimens and Media Objects to facilitate citation and persistent referencing of our resources. There’s a catch, however: only trusted organisations, called Registration Agencies (RAs), can mint DOIs. These organisations form a community dedicated to persistent, reliable identifiers. While it’s unclear whether DiSSCo will establish a new RA or partner with an existing one, we do know that we’ll need to set up comprehensive DOI infrastructure.

DOI Infrastructure

In this section, we’ll go over our preliminary DOI infrastructure. All code referenced in this section is available on our GitHub.

Infrastructure as Code Using Terraform

Our first step was to provision resources on AWS. To do so, we used Terraform, an Infrastructure as Code (IaC) tool that lets us describe exactly what resources we want deployed on the cloud. By using IaC, we can easily manage complex infrastructure and track any changes through version control.

For the DOI infrastructure, our setup is relatively simple: a compute instance connected to a PostgreSQL database, all within a controlled private network. By design, the DOI infrastructure is completely independent from the rest of the DiSSCo infrastructure; this will allow whichever RA mints DOIs for Digital Specimens to serve the wider collections community, rather than just DiSSCo. Because we’ve used Infrastructure as Code, we can easily transfer ownership to another organisation if the need arises.

Containerized Applications

Once our cloud infrastructure was set up, we containerized the applications we needed to run our DOI system. We set up a Handle server, which acts as the resolution system for our DOIs. We also containerized our Identifier Manager API, which allows us to quickly create batches of identifiers. The DiSSCo infrastructure connects to the API via a secured public endpoint. The API and the Handle server use the same PostgreSQL database as storage. Essentially, the API acts as our writer, and the Handle server exposes the contents of this database to the Global Handle Registry.
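The writer/reader split can be sketched in a few lines. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for PostgreSQL, the `handles` table layout is hypothetical (a real Handle server's SQL storage uses its own schema), and `mint_batch` and `resolve` are invented names standing in for the API and the Handle server respectively.

```python
import sqlite3
import uuid


def create_store() -> sqlite3.Connection:
    """Create the shared store (SQLite here; PostgreSQL in the real setup)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE handles ("
        " handle TEXT, idx INTEGER, type TEXT, data TEXT,"
        " PRIMARY KEY (handle, idx))"
    )
    return conn


def mint_batch(conn: sqlite3.Connection, prefix: str, target_urls: list[str]) -> list[str]:
    """Writer role (the API): create one identifier per target URL."""
    minted = []
    for url in target_urls:
        # Hypothetical suffix scheme: 12 hex characters from a random UUID.
        handle = f"{prefix}/{uuid.uuid4().hex[:12].upper()}"
        conn.execute("INSERT INTO handles VALUES (?, ?, ?, ?)", (handle, 1, "URL", url))
        minted.append(handle)
    conn.commit()
    return minted


def resolve(conn: sqlite3.Connection, handle: str) -> dict:
    """Reader role (the Handle server): return the stored values for a handle."""
    rows = conn.execute(
        "SELECT type, data FROM handles WHERE handle = ? ORDER BY idx", (handle,)
    ).fetchall()
    return dict(rows)


conn = create_store()
ids = mint_batch(conn, "10.99999", ["https://example.org/a", "https://example.org/b"])
print(resolve(conn, ids[0]))
```

Because both roles share one store, anything the writer commits is immediately visible to the reader, which is the property the shared PostgreSQL database provides in the setup described above.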

With all the containerized applications (plus a handy Nginx proxy setup and Let’s Encrypt-issued SSL certificates), the system looks like this:

An actor looking to publish a Digital Specimen may do so through DiSSCo services. DiSSCo services call our DOI API endpoint, which publishes the record (including the FDO Profile!) to the DOI database. Anything published to that database can be looked up by the global handle infrastructure, allowing the identifiers to be resolved.

What’s Next?

This test setup demonstrates that DiSSCo is capable of developing and maintaining reliable DOI infrastructure. While we’re not quite ready for permanent specimen data, this is a step in the right direction towards FAIR and FAIR Digital Objects implementation. As our infrastructure develops and our data model matures, we’re getting closer to linking and annotating collections data across DiSSCo partners.

Image by rawpixel.com on Freepik
