We call the architecture we’re going to use for DiSSCo the “Digital Specimen Architecture”, or “DSArch” for short. It has three fundamental components:
- Digital Object Architecture (DOA) as its core basis
- Built-in support for the FAIR Guiding Principles
- Evolutionary with Protected Characteristics
Here we explain why each component has been chosen and brought together in DSArch.
Digital Object Architecture (DOA) as the core basis
Several technical approaches were considered as the basis for the DiSSCo ICT infrastructure, including the Semantic Web (Linked Open Data, RDF, triples) and Object Reuse and Exchange (OAI-ORE, aggregations of Web resources described by resource maps). Historical and current patterns of infrastructure development suggest that these are evolutionary steps in the technology of Web infrastructure. Much more interesting is the emergence of new data architectures. These include decentralised applications (d-apps) enabled by blockchain technologies, data-intensive federations and marketplaces, and Digital Object Architecture (DOA). From the research data perspective, the last of these is considered a new kind of data fabric: the Internet of FAIR Data and Services. Unlike the other alternatives, it is a fundamental extension of the basic Internet architecture, responding to the ‘Big Data’ explosion in scientific research that has been in progress for the past two decades. With its own communication protocol, the Digital Object Interface Protocol (DOIP), DOA sits alongside Web approaches [Kahn 2006, Weigel 2017]. It is attracting strong interest from multiple ESFRI research infrastructures across Europe as the means of implementing the European Open Science Cloud (EOSC). DOA is the principal component of DSArch. DiSSCo’s choice of it reflects this trend and DiSSCo’s basic need to manage research data about natural science specimens efficiently as ‘specimens on the Internet’.
Built-in support for the FAIR Guiding Principles
Two decades of scientific research based on ‘Big Data’, coupled with political movements towards open access to publicly funded research, have led to recognition of the need to make scientific data increasingly ‘findable, accessible, interoperable and re-usable’ (FAIR). These four attributes form the basis of the now widely adopted FAIR Guiding Principles. DiSSCo intends to take an active approach to data management planning and stewardship, focusing on maximum accessibility and reusability of data according to these core principles, on longevity and preservation of data, and on community curation, linking to third-party information and reproducible science. The FAIR Guiding Principles and their intrinsic support by DOA are manifested as FAIR Digital Objects (FDO) through a Joint Statement on a FAIR Digital Object Framework, which the DiSSCo Coordination and Support Office (DiSSCo CSO) and the DiSSCo Technical Team have both endorsed. Built-in support for FAIR is thus the second principal component of DSArch.
Evolutionary with Protected Characteristics
Several characteristics of DiSSCo data management must be protected throughout, and ultimately beyond, the lifetime of the DiSSCo data infrastructure in order to engender community trust in the value, veracity and reliability of the data to be managed. These are listed below and described in detail in the provisional Data Management Plan for the DiSSCo infrastructure:
- Centrality of the Digital Specimen
- Accuracy and authenticity of the Digital Specimen
- Protection of data
- Preserving readability and retrievability
- Traceability (provenance) of specimens
- Annotation history
- Determinability (status and trends) of digitization
Nevertheless, considering the expected lifetime of the DiSSCo ICT infrastructure, it’s inevitable that the infrastructure, its design, implementation and operation will evolve over that lifetime, both to meet new needs from users and organisations and to keep pace as underlying technologies change. The ‘evolutionary architecture’ approach recognises and addresses such evolution by assigning protected status to the dimensions considered essential to the integrity of the infrastructure over the very long term (the characteristics listed above). This is the third principal component of DSArch.
Protecting the essential characteristics means that proposals for design decisions and changes (technical, procedural and organisational) must be assessed for their effect on those characteristics. Ideally, no design decision or change should destroy or lessen any of the protected characteristics, and each should aim to enhance one or more of them.
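As an illustration only, the assessment rule described above could be sketched as a simple check in Python. The names used here (`Characteristic`, `Effect`, `ChangeProposal`, `assess`) are hypothetical and not part of any DiSSCo specification. The sketch assumes that reviewers have already judged the effect of a proposal on each characteristic, and that any characteristic not mentioned is unaffected.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple


class Characteristic(Enum):
    """The protected characteristics listed in this section."""
    CENTRALITY = "Centrality of the Digital Specimen"
    ACCURACY = "Accuracy and authenticity of the Digital Specimen"
    PROTECTION = "Protection of data"
    READABILITY = "Preserving readability and retrievability"
    TRACEABILITY = "Traceability (provenance) of specimens"
    ANNOTATION = "Annotation history"
    DETERMINABILITY = "Determinability (status and trends) of digitization"


class Effect(Enum):
    """Hypothetical three-level judgement made by a reviewer."""
    LESSENS = -1
    NEUTRAL = 0
    ENHANCES = 1


@dataclass
class ChangeProposal:
    """A proposed design decision or change (hypothetical structure)."""
    title: str
    # Assumption: characteristics missing from this mapping are unaffected.
    effects: Dict[Characteristic, Effect] = field(default_factory=dict)


def assess(
    proposal: ChangeProposal,
) -> Tuple[bool, List[Characteristic], List[Characteristic]]:
    """Return (acceptable, lessened, enhanced) for a proposal.

    A proposal is acceptable only if it lessens no protected
    characteristic; enhanced characteristics are reported separately
    because enhancement is a goal rather than a requirement.
    """
    lessened = [
        c for c in Characteristic
        if proposal.effects.get(c, Effect.NEUTRAL) is Effect.LESSENS
    ]
    enhanced = [
        c for c in Characteristic
        if proposal.effects.get(c, Effect.NEUTRAL) is Effect.ENHANCES
    ]
    return (not lessened, lessened, enhanced)
```

A proposal that lessens, say, traceability would be flagged as not acceptable and returned for redesign, while one that leaves every characteristic neutral or enhanced would pass. In practice the judgement of each effect is a human review step; the code only records and aggregates the outcome.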
 See the ‘Berlin presentation’ here: https://www.rd-alliance.org/rda-11th-plenary-joint-meeting-ig-data-fabric-wg-research-data-collections under agenda item ‘2.3 Digital Object Principles’ by Wittenburg/Strawn.