ELViS 1.0.0 is here: An important milestone for DiSSCo

On March 18, 2021, a new deployment of ELViS (European Loans and Visits System) became available. ELViS 1.0.0 is currently being used to facilitate the 3rd Transnational Access call for SYNTHESYS+ (to fund short-term research visits to consortium institutions) and the 2nd Virtual Access call (to fund digitisation-on-demand requests).

Preseucoela imallshookupis is a species of gall wasp. The genus name, Preseucoela, is named after Elvis Presley. Image source: Berenbaum, M., 2010. Preseucoela imallshookupis has left the building. American Entomologist, 56(4), pp. 196–197. https://doi.org/10.1093/ae/56.4.196

The current version of ELViS is an important milestone in the SYNTHESYS+ project and also towards building DiSSCo — a new world-class Research Infrastructure for natural science collections. We have come a long way since mid-2019, when we started gathering user surveys and requirements with our development partner Picturae. The surveys, workshops and weekly meetings contributed to user stories. Here is a collection of user stories that have been addressed in this version of ELViS.

GitHub project board from the ELViS repo

ELViS is a great example of a tool that is being built together with the community and the user base that will ultimately use it. As members of the SYNTHESYS+ project are based in different parts of Europe, we were already holding regular Zoom meetings to facilitate the development process. GitHub was used extensively to create wireframes and guide the sprint activities. Although not all the efforts of the SYNTHESYS+ WP6 partners and the talented developers at Picturae are reflected in the following chart, you can see activities based on issues submitted over the past few months during our test and development process.

Chart generated in https://jerrywu.dev/github-issue-visualizer/

We still have a long way to go to support the loans and visits transactions, but we are excited about the launch of the 3rd Transnational Access call and the future of ELViS and DiSSCo.

The DiSSCo Knowledgebase

Authors: Mareike Petersen*, Julia Pim Reis*, Sabine von Mering*, Falko Glöckler*
* Museum für Naturkunde Berlin, Germany


As an initiative formed by public research institutions, DiSSCo is committed to Open Science. We believe that Open Science not only makes scientific work more transparent and accessible but also enables a whole new set of collaborative and IT-based scientific methods. Therefore, the outputs of our common research projects are openly available as much as possible, and research data are easily Findable, Accessible, Interoperable and Reusable (the FAIR principles).

DiSSCo Prepare (DPP), the preparatory project phase of DiSSCo, will build on profound technical knowledge from various sources and initiatives. In order to allow for efficient knowledge and technology transfer for partners building the DiSSCo technical backbone, a central and freely accessible DiSSCo Knowledgebase will be designed and implemented within the project. The conceptual and developmental work is done under the Work Package “Common Resources and Standards” and the Task “DiSSCo Knowledgebase for technical development” (both led by the Museum für Naturkunde Berlin). This hub for knowledge management relevant within the DiSSCo context will not only store all research outputs from DiSSCo-linked projects in one place but also act as a reference for further building blocks relevant for the DiSSCo Research Infrastructure (RI).


As a first step, the extent of information types expected to be stored in the knowledgebase was collected. To get the most complete picture, we discussed this topic within the respective project task group and work package, but also together with project-overarching bodies such as the DiSSCo Technical Team. As a last preparatory step, we sent a survey to all task and work package leads of DPP to evaluate which information types partners are planning to make available via the knowledgebase. The feedback was included in the discussions and planning steps. The latest overview of desired information types is given in Figure 1.

Figure 1: Information Types in the DiSSCo Knowledgebase. Expected cluster of information categories (blue dots) based on DPP Project outcomes and relevant external resources. The format of resources varies within and among information types.

As the term knowledgebase has traditionally been used in the context of providing machines with a database of facts for reasoning processes, the partners agreed to use the term with a primary focus on human readability in the DiSSCo Knowledgebase in the first place. The importance of machine readability varies among the different information types. The metadata, however, will be machine-readable in a consistent manner.

According to our findings, the different information types vary in formats and system requirements and cannot be stored in one single system. Whereas for some information types the target system is more or less set (e.g. GitHub for software code), for others a well-considered decision is necessary. Task partners focused on the decision about a software system for the most common information type, “Public Documents and External Resources”, in order to aggregate references to distributed documents and sources in a single point of entry. A comprehensive landscape analysis with short presentations of each system took place during two task group meetings. For the decision process, requirements of the knowledgebase were collected and prioritized.

Top-priority criteria for choosing an appropriate component of the knowledgebase to serve the information type “Public Documents and External Resources” were:

  • Capability of storing documents and free text for referencing deliverables, publications and Questions and Answers / FAQs
  • Extensibility & customization (plugins or extensions)
  • Comprehensive public technical documentation and user documentation
  • Comprehensive REST API
  • Mechanisms for stable versioning of content
  • Search index (including the capability of indexing of customizable metadata)
  • Hierarchical structuring of pages and other entities
  • Capability of structuring the content by categories, tags or labels
  • File upload, storage and download
  • User-friendly search functionality
  • Regular security updates
  • View and download functionality for common document and image file formats
  • Option to run an instance in a cloud environment (rather than a Software as a Service approach)
  • Sustainability of the software product (e.g. organisation in place to support and maintain)

Based on the requirements, the most promising systems were DSpace, CKAN, and Alfresco. All three products meet the prioritized criteria for the information type “Public documents and external resources”. The following additional aspects with respect to implementation and maintenance were therefore included in the decision process: latest releases, size of the user community, regular support, good software maintenance allowing the correction of possible bugs, and regular security updates. The team chose DSpace, an open-source repository software package with rich and powerful features that focus on long-term storage, access and preservation of digital content. It is available as free software under an open-source license in a public GitHub repository and has a huge user community and a very active group of developers. It offers customizable interfaces and a full-text search in which the provided metadata for content is indexed, and its REST API makes the content accessible to machines, enabling the data to be FAIR. A reliable search functionality allows end users to find content without delay even for huge amounts of data, which is essential for scalability as the amount of linked information grows. A list of further key features of DSpace can be found on the official website.
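To make the REST API point concrete, here is a minimal sketch of building a search request against a DSpace Discovery endpoint. This assumes a DSpace 7-style API path and a hypothetical base URL; the actual DiSSCo instance may expose a different version or path.

```python
from urllib.parse import urlencode

# Hypothetical base URL for a DiSSCo Knowledgebase DSpace instance.
BASE = "https://know.dissco.eu/server/api"

def search_url(query, page=0, size=20):
    """Build a DSpace 7-style Discovery search request URL."""
    params = urlencode({"query": query, "page": page, "size": size})
    return f"{BASE}/discover/search/objects?{params}"

print(search_url("digitisation"))
```

The endpoint returns paginated JSON, so a client can harvest all indexed metadata records by walking the `page` parameter.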

First Version

The implementation of DSpace as a first version of the DiSSCo Knowledgebase core will have a customized layout with the DiSSCo branding. It will allow the creation of a hierarchy of DiSSCo-linked projects and their respective collections of documents and references. In order to store content like Frequently Asked Questions (FAQs), best practices, guidelines, recommendations and documented decisions on the RI, the DiSSCo partners will be able to extend the knowledgebase with their content (free text or files) with the help of easy-to-use web forms that include a rich text editor. An editorial workflow modelled in the system will allow the platform administrators to review the content prior to publication via role-based access. This also allows documents to be prepared privately before publication and thorough quality assurance to be conducted.
The first version of the DiSSCo Knowledgebase will be launched by the end of January 2021 at http://know.dissco.eu

Next Steps

As a next step, the current results of the implementation of the DiSSCo Knowledgebase will be presented at the first All Hands (virtual) Meeting of DiSSCo Prepare (18 – 22 January 2021). This event will bring together leaders and partners of the project, with the objective of presenting, discussing and producing key elements of what will become Europe’s leading natural science collections Research Infrastructure, DiSSCo RI. In a dedicated session, the participants will have the opportunity to test the first version of the knowledgebase by browsing the software and testing the features, allowing us to collect feedback and requirements from the project partners.

The DiSSCo Knowledgebase, in its final version, will provide structured technical documentation of identified DiSSCo technical building blocks, such as web services, PID systems, controlled vocabularies, ontologies and data standards for bio- and geo-collection objects, collection descriptions, digital assets standards as well as domain-specific software products for quality assurance and monitoring; an assessment of their technical readiness for DiSSCo as well as specifications on their relevance for the overall DiSSCo technical infrastructure and the DiSSCo data model.


DiSSCo uses a DOI namespace provided by DataCite for assigning DOIs to documents like public deliverables and reports. This process will be automated with the help of a DSpace plugin upon submission of a document. In addition, depositing and linking documents on Zenodo will be integrated.

To increase the findability of content, the metadata will be linked and enriched by cross-references to related content and external resources (e.g. ORCID). To optimize findability beyond the knowledgebase’s own search interface, JSON-LD will be embedded in the landing pages, so that the visibility of DiSSCo outcomes and knowledge is maximized in the major search engines.
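As an illustration, a landing page could embed a schema.org description of a deliverable as a JSON-LD block roughly like the following. The document title, DOI and ORCID below are invented placeholders, not real records.

```python
import json

# Invented placeholder metadata for one knowledgebase document.
doc = {
    "@context": "https://schema.org",
    "@type": "CreativeWork",
    "name": "Example DiSSCo deliverable",
    "identifier": "https://doi.org/10.1234/example",
    "author": {
        "@type": "Person",
        "name": "Jane Example",
        "identifier": "https://orcid.org/0000-0000-0000-0000",
    },
}

# The block search engines would read from the landing page's HTML:
snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(doc, indent=2)
    + "\n</script>"
)
print(snippet)
```

Crawlers parse such blocks directly, so outputs become discoverable without anyone using the knowledgebase search at all.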

Over the course of 2021, all the other information types will be accommodated or linked in the DiSSCo Knowledgebase. This will be ensured by submitting at least a metadata description for information that will be managed outside of DSpace (e.g. software code on GitHub or controlled vocabularies in Wikibase). By providing machine-readable formats, custom DSpace plugins will allow even richer connections between the different components of the DiSSCo Knowledgebase.

Want to get involved? Feel free to check our remote repository on GitHub or contact us here! 

Debunking reliability myths of PIDs for Digital Specimens

In this post I address an erroneous assertion – a myth, perhaps – that the proposed Digital Specimen Architecture relies heavily on a centralized resolver and registry for persistent identifiers (PIDs) that is inherently not distributed, and that this makes the proposed PIDs for Digital Specimens unreliable. By unreliable, I mean link rot (‘404 not found’) and/or content drift (the content today is not the same as the content yesterday).

Continue reading “Debunking reliability myths of PIDs for Digital Specimens”

Reflections on TDWG 2020 Virtual sessions and other thoughts on long term data infrastructures

This year the annual conference of Biodiversity Information Standards (historically known as the Taxonomic Databases Working Group — TDWG) is virtual and is happening in two parts. The working sessions concluded a few weeks ago and are separated from the virtual conference, which will be held on October 19–23. All the recordings of the working sessions are now available on YouTube.

As several people have already mentioned on Twitter (#TDWG2020), the single track and the virtual format allowed participation from around the world, which generated a wide range of discussions not just on data standards but also on data curation, attribution, annotation, integration, publication and, most importantly, the human efforts behind the data and systems.

It is this human aspect, in the midst of our current data-intensive approach, that got me thinking about several contrasting aspects of biodiversity informatics and natural science collections management. Thinking about these two aspects together should be at the forefront of our data and infrastructure discussions.

One contrast that lurks behind the “data-intensive” approach is the mix of structured collection of items (such as databases, spreadsheets) with narratives. This is what Lev Manovich called in his 1999 article the “database/narrative” opposition:

“As a cultural form, database represents the world as a list of items and it refuses to order the list. In contrast, a narrative creates a cause-and-effect trajectory of seemingly unordered items (events). Therefore, database and narrative are natural enemies. Competing for the same territory of human culture, each claims an exclusive right to make meaning out of the world.”

The physical objects stored and curated by natural history museums and other institutes — elements for the scientific meaning-making of the world — provide an interesting opportunity to explore this contrast further. On the one hand, we have data collected about specimens and related objects stored in different formats (databases, spreadsheets, etc.). Most often there is some structure to these datasets, for instance this snippet from a GBIF occurrence record:

Branta bernicla nigricans (Lawrence, 1846)
North America, United States, Alaska, Aleutians West Census Area
Saint Paul Island, Pribilof Islands
G. Hanna
NMNH Extant Biology

With the help of APIs and parsing tools, we can figure out the structure of this snippet and arrive at the assessment that it contains a species name, a collector name, a place, and a specimen identifier. On the other hand, we find snippets like the following hidden among the structured elements. This one is from the European Nucleotide Archive (ENA) accession data derived from the above specimen:

DT 02-JUL-2002 (Rel. 72, Created) 
DT 29-NOV-2004 (Rel. 81, Last updated, Version 4)
note="obtained from preserved Brant goose (Branta bernicula)
specimen from the Smithsonian Institution's Museum of Natural History; 
specimen was originally collected by G.D. Hanna from St. Paul Island, 
Pribilof Islands, Alaska, on Sept. 9, 1917

Here we find a narrative — an ordered list of events — describing who collected what, when, and where. Of course, from the linked data, semantic interoperability, machine readability, actionability and FAIR points of view, there are plenty of issues here that the community is struggling with. But let’s focus on what it means when our systems and workflows encounter these two very different types of data.
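As a toy sketch of the structured side, the GBIF snippet above can be mapped onto Darwin Core-style terms. The term assignments here are my own guesses for illustration, not GBIF's actual API response:

```python
# The five display lines from the GBIF occurrence snippet above.
lines = [
    "Branta bernicla nigricans (Lawrence, 1846)",
    "North America, United States, Alaska, Aleutians West Census Area",
    "Saint Paul Island, Pribilof Islands",
    "G. Hanna",
    "NMNH Extant Biology",
]

# Guessed Darwin Core terms for each line (illustrative only).
terms = ["scientificName", "higherGeography", "locality",
         "recordedBy", "collectionCode"]

record = dict(zip(terms, lines))
print(record["recordedBy"])  # → G. Hanna
```

The ENA note, by contrast, resists this kind of positional mapping: its who, when and where are woven into free text.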

First of all, with tools and APIs, these two datasets (GBIF and ENA) can eventually be linked and made interoperable and FAIR — a definitely useful endeavour. But it is much harder to study and understand the theoretical underpinning and the context of these data. From several publications related to this specimen (mentioned in the GBIF snippet above), we learn that it was used in research related to the 1918 pandemic virus (the Smithsonian has several thousand such specimens from the early part of the 20th century). As we live through another pandemic, one might wonder: what were the historical, social, and political contexts of collecting and preserving these specimens? Who are the people behind these collection events? (See the Bionomia profile of G.D. Hanna.)

Scientists and data engineers might not be interested in these questions. Still, we often overlook that there is no such thing as raw data, and that context and history influence scientific reasoning and the direction of research. This is echoed by different philosophers and historians of science, most recently by Sabina Leonelli in the context of big data biology, where she says that the “increasing power of computational algorithms requires a proportional increase in critical thinking”. And the more data-intensive and automated our research becomes, the more we need to seriously look at the:

“value- and theory-laden history of data objects. It also promotes efforts to document that history within databases, so that future data users can assess the quality of data for themselves and according to their own standards.”

The second point pertains to this aspect of history — in particular, when data moves from one system to another. As data are collected at field sites, added to spreadsheets, imported into a database and then published to an aggregator, they get denormalized and decontextualized, and then normalized and contextualized again. An API endpoint might provide some provenance information and a summary, but the narrative and the “data events” are usually missing. We probably do not expect all systems to capture all these events. But these practices, events and data migrations leave traces of prior use that have impacts on later workflows (see the article by Andrea Thomer et al. on the data ghosts that haunt Collection Management System (CMS) data migrations).

As we build and work on data infrastructures to support scientists and, eventually, society, we should take a pragmatic and holistic approach to understanding the database/narrative mix. Amid our unbridled euphoria about all things machine learning, automation and AI, we should be cautious about the long-term implications and build something that is here to last.

This brings us back to the human aspect of the data. I will end the article with a quote by historian Mar Hicks. Recently, COBOL (designed in 1959) became the scapegoat as U.S. unemployment insurance systems were overwhelmed during the pandemic. It turns out the issue was not with COBOL but with the web front end that people used to file the claims (written in Java). Her article talks about the notion of the “labor of care” — the engineers and people behind COBOL, and the care and effort that go into maintaining large, complicated software and infrastructures — especially the ones that are needed during a crisis. Our tech innovation culture is too focused on the speed and structure side of things instead of the narrative. I leave you with her concluding words:

If we want to care for people in a pandemic, we also have to be willing to pay for the labor of care. This means the nurses and doctors who treat COVID patients; the students and teachers who require smaller, online classes to return to school; and the grocery workers who risk their lives every day. It also means making long-term investments in the engineers who care for the digital infrastructures that care for us in a crisis.

When systems are built to last for decades, we often don’t see the disaster unfolding until the people who cared for those systems have been gone for quite some time. The blessing and the curse of good infrastructure is that when it works, it is invisible: which means that too often, we don’t devote much care to it until it collapses.

Natural Science Identifiers & CETAF Stable Identifiers

The DiSSCo Technical Team gets asked a lot about Natural Science Identifiers (NSId). What are they? Why do we need them in addition to CETAF Stable Identifiers? Are they just for DiSSCo/Europe or are they global? In this post we answer those questions.

Q1. What is a Natural Science Identifier (NSId)?

A Natural Science Identifier (NSId) is a universal, unique persistent identifier for digitised natural science specimens (i.e., Digital Specimens) and other associated object types. An NSId will help you unambiguously refer to a specimen you are working with or will help to find a specimen that someone else has told you about by giving you the NSId e.g., as a reference in a journal article.

Continue reading “Natural Science Identifiers & CETAF Stable Identifiers”

Importance of FAIR data mobilization for taxonomic research

As the world deals with the outbreak of SARS-CoV-2, a great deal of effort has been focused on data sharing, linking and open science in general. For instance, the community around taxonomic research (see the Joint CETAF-DiSSCo COVID-19 Task Force) is focusing on international collaborations to make use of data that can help us understand the origin and distribution of the disease agent, host and vector. Initiatives like the Virus Outbreak Network are working on FAIR data exchange. The research infrastructures that provide access to biodiversity, taxonomic, genomic and biotic interaction data are thus important technical components for researchers. However, in order for a comprehensive picture to emerge from the existing and new datasets, we also need to assess the current status of data repositories and research infrastructures. All this momentum around pandemic-related research gives us a unique opportunity to reflect on the current scope and future possibilities.

In this post, I comment on a recent paper (“Repositories for Taxonomic Data: Where We Are and What is Missing”; published online 16 April 2020) in Systematic Biology that brings attention to the lack of effort in “safeguarding and subsequently mobilizing the considerable amount of original data generated during the process of naming 15–20,000 species every year”. This is a timely article that also aligns with various discussions around Advancing the Catalogue of the World’s Natural History Collections.

The article does an excellent job of first quantitatively assessing the number of alpha-taxonomic studies. It then presents an impressive tabulation of items from manually screening 4178 alpha-taxonomic works (published in 2002, 2010, and 2018) to conclude:

..images are the most universal data type produced in alpha-taxonomic work. This is true of all regions of the world. As a conservative estimate, ten images may typically be produced of the holotype and paratypes of a new species and published as part of the taxonomic study. Mostly, these are photographs and drawings, sometimes scanning electron microscopy (SEM). We may assume that in comprehensive revisionary studies, up to 100 images (of comparative voucher specimens, or of different morphological characters) will be produced per newly named species. Most are probably neither published nor submitted to repositories.

Miralles, Aurélien, et al. “Repositories for Taxonomic Data: Where We Are and What is Missing.” Systematic Biology (2020).

Compared to the abundance of feline images and videos on social media, this is a very manageable volume for modern data infrastructures. But how are taxonomic data infrastructures FAIR-ing? The article provides some valuable information and guidance.

Surveying a number of generalist and specialist repositories (the list is in the appendix of the article), the article proposes “criteria for taxonomic data repositories” with the FAIR principles in mind:

Taxonomic data, repositories should be (i) free of charge for data contributors, (ii) user-friendly, with a low-complexity submission workflow, not requiring affiliation to academic institutions and not requiring cumbersome registration or login procedures, and (iii) including careful and prompt quality-checks of submissions by dedicated data curators.

The article also highlights the importance of specimen identifiers; however, it points out various technical and organizational issues that still need to be addressed by the community. For instance, “…the International Code of Zoological Nomenclature does not require individual identifiers for type specimens.”

Polyphylla decemlineata https://en.wikipedia.org/wiki/June_beetle#/media/File:Ten-Line-June-Beetle-2.JPG

Some of these issues around alpha-taxonomy data mobilization have been raised before (see this 2019 opinion paper in the European Journal of Taxonomy). However, the article argues that:

Despite innovations such as semantic markup or tagging, a method that assigns markers, or tags, to taxonomic names, gene sequences, localities, designations of nomenclatural novelties and so on (Penev et al. 2018), standardization and sharing of raw data is far from being widely implemented in taxonomy.

This provides a fruitful avenue for thinking about data infrastructure design and operation around the alpha-taxonomy workflow. From the perspective of DiSSCo (where specimen-based data, metadata, and persistent identifiers are core elements), the survey and conclusions presented in the article will be immensely useful for understanding pluralistic and pragmatic user requirements as we work with European museums and natural science collections to create a data-driven research infrastructure (where alpha-taxonomy data and related workflows are invaluable components).

The proposed concept of the Digital Specimen builds on and extends existing ideas (the cyberspecimen, cybertaxonomy and the Extended Specimen Network mentioned in the article) by bringing FAIR Digital Objects and machine-actionable services to the forefront. The goal is to augment and support taxonomic and other research capabilities by creating and maintaining a sustainable infrastructure (the article mentions the importance of perpetual data storage, but also its associated carbon footprint).


A few more links and recommendations for further reading. The article emphasises the centrality of images generated in alpha-taxonomy research. In this regard, interoperability frameworks such as IIIF are important initiatives for workflows and data repositories (check also the SYNTHESYS+ projects on IIIF). We also need to think about data contextualization and re-contextualization not just from a technical perspective but along other dimensions as well. I recommend Data-Centric Biology: A Philosophical Study by Sabina Leonelli for a comprehensive philosophical context.

As the article aptly points out, “taxonomic assignments are quasi-facts for most biological disciplines”; thus data re-use and linking need to accommodate not just innovative approaches such as integrative taxonomy and machine learning techniques but also a pluralistic, human-centric understanding. Works by Beckett Sterner and Nico Franz speak to this. Also check out the inaugural issue of Megataxa, in particular the editorial entitled “Taxonomy needs pluralism, but a controlled and manageable one”.

Identifiers for our institutes – GRID and ROR

DiSSCo aims to describe relationships between specimens and, for example, the collections in which they are curated, the collection-holding institutes, contributors, contributions, funders and scholarly publications. All these objects need to be uniquely identified in order to connect their information.

To show the importance of connecting specimens with their institutes, we can look at the European Loans and Visits System, ELViS. This is a DiSSCo service under development to provide physical and virtual access to the specimens that are curated and preserved in the collections held in our institutes. Many of the user stories collected for this service require the ability to uniquely identify institutes.

To uniquely identify an institute, it needs a persistent identifier (PID). The Persistent Identifier policy for the European Open Science Cloud (EOSC) lists several requirements for a good persistent identifier: “A Persistent Identifier that supports and enables research that is FAIR is one that is globally unique, persistent, and resolvable”. It also describes what resolvable means: a PID is resolvable when it allows both human and machine users to access an object or its representation, and its Kernel Information. Kernel Information is a structured record that contains information (metadata) about the referred object, like a pointer to the location where the data (bit sequence) for the object can be found.

When an object or its representation is no longer available, the PID still needs to resolve; in other words, resolution to Kernel Information must still be possible. The record will then contain some ‘tombstone’ information about the object. The PID will thus need to remain forever, something which is very hard to achieve. It requires robust governance structures for PID registries, in which multiple organisations share the responsibility.
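As a sketch of what this can look like in practice for a Handle-based PID, the Handle System exposes a public REST endpoint that returns a PID's metadata record even when the object itself is gone. The handle value and tombstone fields below are invented for illustration:

```python
def kernel_info_url(handle):
    """Handle System REST endpoint returning the record behind a PID."""
    return f"https://hdl.handle.net/api/handles/{handle}"

# An invented 'tombstone' Kernel Information record for an object that
# is no longer available: the PID still resolves, but to metadata only.
tombstone = {
    "pid": "20.5000.1025/EXAMPLE",  # hypothetical handle
    "status": "gone",
    "note": "Object withdrawn; metadata retained so citations keep resolving.",
}

print(kernel_info_url(tombstone["pid"]))
```

A resolver that always returns at least this Kernel Information is what separates a persistent identifier from an ordinary URL.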

In order to be able to connect scientific data on a global level, DiSSCo aims to make use of a global PID registry for research institutes, rather than to create its own local solution for only the DiSSCo institutes or to use a registry that lists only natural history collection institutes. Since all DiSSCo institutes participate in the DiSSCo Research Infrastructure, they are a research organisation by default. Our natural history museums have no properties that require them to have a PID that is different from other research organisations.

There are several existing registries that provide PIDs for organisations. An overview of organisation identifier providers can be found in the report on a survey carried out by ORCID. It lists, for example, ISNI (International Standard Name Identifier), LEI (Legal Entity Identifier) and GRID (Global Research Identifier Database). Which one to choose? For selecting an organisation identifier provider, the DiSSCo technical team had several requirements:

  • it should meet the requirements outlined in the EOSC PID Policy (see above)
  • it should have an established registry with enough research organisations already (“critical mass”), and all DiSSCo institutes should fit in its scope
  • the PIDs with kernel data should be public domain (Creative Commons Zero)
  • it should have transparent, non-profit governance
  • it should offer organisations the ability to manage their own records, if possible without significant costs
  • the PIDs should have appropriate metadata associated with them (e.g. a name as a human-understandable label, and relationship metadata to be interoperable with other identifiers)
  • the PIDs should resolve to HTTP(S) URIs to allow easy access by both humans and machines

None of the existing systems fits all the requirements, but GRID, the Global Research Identifier Database, fits them best. It meets all the requirements above except one important one: it is managed by a commercial company, Digital Science, and therefore does not meet the requirement of transparent, non-profit governance. However, there is a community effort to fix that: ROR, the Research Organization Registry. This is a community-led project to develop an open, sustainable, usable, and unique identifier for every research organization in the world. Its steering group contains members from organisations like DataCite, Crossref, the California Digital Library and Digital Science. There is currently a 1:1 relationship between GRID and ROR identifiers, and both refer to each other in their metadata.

ROR has only just started, publishing its minimum viable product in early 2019: a registry seeded with data from GRID. At the moment it contains the same number of organisations but less metadata than GRID provides, and the ROR organisation is seeking funding to become sustainable. Therefore we will use both, with GRID as the primary system during the development of the ELViS service. It contains some metadata that is not yet present in ROR but is useful for DiSSCo, like the geolocation of an institute.

GRID and ROR currently have PIDs for almost 100,000 research organisations in 217 countries. The data is public domain and contains information about an organisation like its name, alternate names, and location. This data is extracted from research funding grants and research paper affiliations. Source data is associated manually with the corresponding GRID record in a process called mapping. Whenever a source data row cannot be mapped to a GRID record, a new record is created. This process of manual curation ensures that each organisation has only one record. Records are named using the generally recognised name of the institution, which is determined by consulting the official website, encyclopaedic records and other trusted data sources.

Since institutes in DiSSCo often participate in research grants and produce research papers, most of them likely already have GRID and ROR identifiers. In SYNTHESYS+ we are piloting the use of these identifiers. As the ELViS Minimum Viable Product, a system has been developed that supports a Virtual Access pilot provided by 22 institutes. Most of these institutes indeed turned out to already have GRID and ROR identifiers. The ones that did not were part of a university that had an identifier; GRID supports such relationships, like child institutes and related institutes. The institutes without an identifier applied for one through a form supplied at https://www.grid.ac/institutes. This proved to be an easy process, which can also be carried out through ROR: https://ror.org/curation/. There are no costs involved: getting a GRID or ROR identifier is free. A few minor errors in the metadata were also fixed through the GRID form.

The use of GRID and ROR provides several benefits for DiSSCo. RORs are supported in version 4.3 of the DataCite Metadata Schema, making it easy to connect with datasets published through DataCite, such as the GBIF datasets. GRIDs are supported in ORCID, which provides identifiers for researchers, making it easy to connect people with ORCID iDs to their institutes. Both registries support not only the institute name as the Name label, but also alternative name labels in the form of aliases, language variants and acronyms, which are commonly used in our community.
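To make the DataCite link concrete, the sketch below shows how a creator affiliation could carry a ROR identifier under the version 4.3 schema. This is an illustrative fragment, not output from DataCite: the field names follow the schema's affiliation attributes, but the exact JSON layout should be treated as an assumption.

```python
# Illustrative fragment: a creator affiliation carrying a ROR identifier,
# shaped after the affiliation attributes in DataCite Metadata Schema 4.3.
# Treat the exact JSON layout as an assumption, not real DataCite output.
import json

affiliation = {
    "name": "National Museum of Natural History",
    "affiliationIdentifier": "https://ror.org/03wkt5x30",
    "affiliationIdentifierScheme": "ROR",
    "schemeUri": "https://ror.org",
}

print(json.dumps(affiliation, indent=2))
```

Because the affiliation carries the ROR identifier rather than just a free-text name, a dataset published through DataCite can be linked unambiguously to the institute's PID.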

Let’s look at an example for the National Museum of Natural History in Paris:

The GRID identifier is grid.410350.3 and the ROR identifier is 03wkt5x30. Since the identifier records contain name labels, a DiSSCo service could display the organisation to a human user as a label plus a link to the identifier's landing page: as National Museum of Natural History or as MNHN, depending on the need.
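As an illustration (not code from ELViS), the sketch below builds such a label-plus-link from a hand-written record that stands in for the real ROR metadata of the museum; the field names mimic the ROR record format but should be treated as assumptions.

```python
# Hypothetical sketch: rendering a PID as "label plus link" for a human user.
# The record below is a trimmed, hand-written stand-in for the real ROR
# metadata; its field names mimic the ROR record format (an assumption).

record = {
    "id": "https://ror.org/03wkt5x30",
    "name": "National Museum of Natural History",
    "acronyms": ["MNHN"],
}

def display_label(record: dict, prefer_acronym: bool = False) -> str:
    """Build an HTML link whose text is the full name or an acronym."""
    label = record["name"]
    if prefer_acronym and record.get("acronyms"):
        label = record["acronyms"][0]
    return f'<a href="{record["id"]}">{label}</a>'

print(display_label(record))                       # full name as label
print(display_label(record, prefer_acronym=True))  # acronym as label
```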

Using content negotiation, a machine will see the data differently from what a user sees in a web browser. To see what a machine or a piece of software sees, you can use, for instance, a cURL command:

curl -L -H "Accept: application/rdf+xml" https://www.grid.ac/institutes/grid.410350.3

This returns the record's metadata in RDF/XML.

GRID and ROR unfortunately do not use the Handle system, as e.g. DOI and ePIC do. So you need to know the URI (https://www.grid.ac/institutes/grid.410350.3 or https://ror.org/03wkt5x30), and the PID URIs cannot move to another location without breaking things, for instance if the ror.org or grid.ac websites cease to exist in the future. So although ROR and GRID provide the best solution so far for research organisation PIDs, there are still some improvements to make.
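The consequence of not using the Handle system can be sketched as follows: a client has to hard-code the registry base URLs to turn a bare identifier into a resolvable URI. A minimal, hypothetical sketch:

```python
# Minimal sketch: turning bare GRID / ROR identifiers into resolvable URIs.
# Because these PIDs do not use the Handle system, the resolver location
# (the base URL) is baked into the URI and must be known by every client.

GRID_BASE = "https://www.grid.ac/institutes/"
ROR_BASE = "https://ror.org/"

def to_uri(identifier: str) -> str:
    """Map an identifier to a landing-page URI based on its syntax."""
    if identifier.startswith("grid."):
        return GRID_BASE + identifier
    return ROR_BASE + identifier

print(to_uri("grid.410350.3"))  # resolves via www.grid.ac
print(to_uri("03wkt5x30"))      # resolves via ror.org
```

If either base URL ever changes, every client holding such hard-coded prefixes breaks, which is exactly the fragility a Handle-based resolver would avoid.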

What is a Digital Specimen?

With projected lifespans of many decades, infrastructure initiatives such as Europe’s Distributed Systems of Scientific Collections (DiSSCo), USA’s Integrated Digitized Biocollections (iDigBio), National Specimen Information Infrastructure (NSII) of China and Australia’s digitisation of national research collections (NRCA Digital, available through the Atlas of Living Australia) aim at transforming today’s slow, inefficient and limited practices of working with natural science collections. The need to borrow specimens (plants, animals, fossils or rocks) or physically visit collections, and the absence of linkages to other relevant information represent significant impediments to answering today’s important scientific and societal questions.

Continue reading “What is a Digital Specimen?”

Designing and building the European Loans and Visits System for natural science collections

Along with physical access to the natural science collections, “Virtual Access” (VA) is becoming more important as it can provide much wider access to the data stored in the collections. Within SYNTHESYS+ (a DiSSCo linked project) the Virtual Access program is aiming to remove the reliance on physical access by piloting a “Digitisation on Demand” service model. One of the first steps towards this is to create a portal for the VA proposal submission and review process. We have been busy working on this for a while and the Virtual Access Applications portal has been ready for use since Feb 20, 2020.

Continue reading “Designing and building the European Loans and Visits System for natural science collections”

Fundamentals of Digital Specimen Architecture

We call the architecture we will use for DiSSCo the “Digital Specimen Architecture”, or “DSArch” for short. It has three fundamental components:

  • Digital Object Architecture (DOA) as its core basis
  • Built-in support for the FAIR Guiding Principles
  • Evolutionary with Protected Characteristics

Here we explain why each component has been chosen and brought together in DSArch.

Continue reading “Fundamentals of Digital Specimen Architecture”