DataFinder Technical Architecture

DataFinder holds two types of record for the data objects that it catalogues. Which type applies depends on the source of the record, which is in turn determined partly by the location of the data object:

Location of Data Object | Harvested metadata record (green) | Human curated metadata record (blue)

Machine-readable repository with metadata capability that meets DataFinder requirements

Location described in a record from a secondary (more local) DataFinder instance

Machine-readable repository with metadata capability that does NOT meet DataFinder requirements

Non-machine readable repository

Non-digital data

DataBank is a data archiving service – it uses an identical data model to DataFinder (and consequently shares some code elements) but also stores the data objects themselves. As a result, DataBank records can be ingested and used by DataFinder without further processing.

Harvested records are not modified by DataFinder; requests to edit such metadata direct the potential editor to the source system. A human curated record is used when the source system cannot meet the DataFinder metadata requirements, when the source system does not support corrections, or when there is no source system. The human curated record supersedes any machine harvested information (although both are indexed and made available to users). For pragmatic reasons, DataFinder will support OAI-PMH as a harvest protocol in the current release (and consequently will also be OAI-PMH harvestable in turn).
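The precedence rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DataFinder's actual code: the `effective_record` function and the dictionary layout are hypothetical, and "supersedes" is interpreted here as a field-by-field override.

```python
from typing import Optional


def effective_record(harvested: Optional[dict], curated: Optional[dict]) -> dict:
    """Combine the two record types for display.

    Assumption: "supersedes" is read as a field-by-field override, so a
    human curated value wins wherever both records supply the same field.
    """
    merged = dict(harvested or {})  # copy, so the harvested record is untouched
    for field, value in (curated or {}).items():
        # Empty curated fields do not blank out harvested values.
        if value not in (None, "", []):
            merged[field] = value
    return merged


# Hypothetical example records.
harvested = {"title": "Rainfall 2010", "creator": "", "publisher": "Source Repo"}
curated = {"title": "Oxford Rainfall Measurements 2010", "creator": "A. Researcher"}

record = effective_record(harvested, curated)
```

Because the function works on a copy, the harvested record stays exactly as it was received, which matches the rule that corrections to harvested metadata belong in the source system.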

DataFinder assigns DOIs to data objects, provided that they meet DataCite minimum metadata requirements and do not already have one. Similarly, funding bodies such as research councils have minimum metadata requirements for data whose production they have funded. If harvested metadata does not meet these requirements, it can be augmented via a human curated record.
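A rough sketch of the eligibility check might look like the following. The field names are assumptions loosely modelled on DataCite's mandatory properties (the authoritative list lives in the DataCite metadata schema), and `missing_fields` and `can_assign_doi` are hypothetical helpers rather than part of DataFinder.

```python
# Assumed field names; the real minimum set is defined by the DataCite schema.
REQUIRED_FIELDS = ("creator", "title", "publisher", "publication_year")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in the record."""
    return [f for f in REQUIRED_FIELDS if not record.get(f)]


def can_assign_doi(record: dict) -> bool:
    """A DOI is assigned only if none exists and the minimum metadata is present."""
    return not record.get("doi") and not missing_fields(record)


# Hypothetical harvested record that falls short of the minimum ...
harvested = {"title": "Rainfall 2010", "publisher": "Source Repo"}
# ... and the same record after augmentation from a human curated record.
augmented = {**harvested, "creator": "A. Researcher", "publication_year": 2010}
```

Reporting which fields are missing, rather than a bare yes or no, would tell a curator exactly what the human curated record needs to supply. The same pattern could be reused with a different field list for a funder's requirements.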

DOIs also need to resolve to a web landing page for each data object. This will be handled by Oxford’s PURL Resolver service. If the source repository meets DataFinder requirements, requests for a landing page are proxied through to the source repository. Otherwise, DataFinder will provide such a page using the metadata that it holds. DataFinder search results will reference DOIs so that they can be used to generate data citations directly.
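The landing page decision described above could be expressed as a small routing function. This is a sketch only: the record keys (`source_meets_requirements`, `source_url`, `metadata`) and the example URL are invented for illustration and say nothing about how the PURL Resolver is actually configured.

```python
def landing_page_action(record: dict) -> tuple:
    """Decide how a landing-page request for a data object is served.

    Returns ("proxy", url) when the source repository meets DataFinder
    requirements, otherwise ("render", metadata) for a locally built page.
    """
    if record.get("source_meets_requirements") and record.get("source_url"):
        return ("proxy", record["source_url"])
    return ("render", record.get("metadata", {}))


# Hypothetical records: one in a compliant repository, one without a source system.
compliant = {
    "source_meets_requirements": True,
    "source_url": "https://repository.example.org/objects/123",
}
local_only = {"metadata": {"title": "Field notebooks, 1987"}}

proxy_action = landing_page_action(compliant)
render_action = landing_page_action(local_only)
```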

DataFinder can also obtain additional metadata from other systems. Oxford DMPOnline provides a tool for the online creation of data management plans in support of funding proposals. This tool can provide information to create a stub record in DataFinder or DataBank for future data deposit, so that a depositor need not re-enter details such as Project Name, Funder and Project Description. Oxford University Research Archive (ORA) holds research publications that will reference data catalogued in DataFinder, which should automatically generate a reciprocal link in the corresponding DataFinder record. Finally, databases held in Oxford’s Online Research Database Services (ORDS) include metadata in a format readily compatible with DataFinder.
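As an illustration of the stub record idea, the sketch below copies the three details named above from a data management plan into a placeholder record. The `stub_from_plan` function, the key names and the `status` marker are all hypothetical; the actual exchange format between DMPOnline and DataFinder is not described here.

```python
def stub_from_plan(plan: dict) -> dict:
    """Build a placeholder record from a data management plan.

    Only the details named in the text are carried over; everything else
    is left for the depositor to complete at deposit time.
    """
    carried = ("project_name", "funder", "project_description")
    stub = {field: plan.get(field, "") for field in carried}
    stub["status"] = "stub"  # hypothetical marker for "awaiting data deposit"
    return stub


# Hypothetical plan; fields not carried over (e.g. budget) are simply ignored.
plan = {
    "project_name": "Upland Rainfall Survey",
    "funder": "Example Research Council",
    "project_description": "Ten-year rainfall monitoring programme.",
    "budget": 250000,
}

stub = stub_from_plan(plan)
```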

ColWiz is an online research environment that can interact with DataFinder in a number of ways:

  1. During the research process, ColWiz will consume DataFinder metadata to allow researchers to find data and build data citation lists.
  2. If researchers come across data that has been omitted from DataFinder then they can submit a record to DataFinder (subject to editorial review).
  3. At the end of the process, users can submit manual records for their own data to DataFinder (or deposit the data itself into DataBank, from where it reaches DataFinder via harvesting).

DataReporter harvests metadata and access statistics from DataFinder to provide a variety of reports, both for internal consumption and to inform the REF process and funding bodies seeking to validate their data management mandates.

Under the hood, DataBank, DataFinder and DMPOnline share an underlying object storage platform (ASTROS – A Streamlined RDF Object Store) based on the use of RDF to describe complex objects and the relationships between them. This means that DataFinder metadata records are compatible with Open Linked Data initiatives.
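To make the RDF idea concrete, here is a minimal sketch that describes a data object and its reciprocal link to a publication as subject, predicate, object triples, serialised in N-Triples syntax. The internals of ASTROS are not described here, so nothing below reflects its real schema: the example URIs are invented and the choice of Dublin Core terms (`title`, `isReferencedBy`, `references`) is an assumption.

```python
DCTERMS = "http://purl.org/dc/terms/"


def uri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str) -> str:
    # Escape backslashes, double quotes and newlines for an N-Triples literal.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples) -> str:
    """Serialise (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical identifiers for a data object and a publication citing it.
dataset = uri("https://example.org/datafinder/records/1")
paper = uri("https://example.org/ora/publications/42")

triples = [
    (dataset, uri(DCTERMS + "title"), literal("Upland rainfall series")),
    (dataset, uri(DCTERMS + "isReferencedBy"), paper),
    (paper, uri(DCTERMS + "references"), dataset),
]

print(to_ntriples(triples))
```

Expressing relationships as triples with shared vocabulary terms is what lets other linked data consumers follow the link between a data object and its publication without knowing anything about DataFinder itself.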
