A EuroCRIS Task Groups meeting [doc], hosted by the University of Bath, was held on 10th February 2012. EuroCRIS is the professional not-for-profit association that develops and maintains the CERIF (Common European Research Information Format) data format. The meeting, which was attended by about 40 representatives from a wide variety of European and international research organizations, councils, and companies, was intended as an open forum for discussion of a number of topics that are covered by the CERIF, CRIS-IR (Institutional Repositories), Architecture, Best Practice/DRIS (Directory of Research Information Systems), and LOD (Linked Open Data) task groups. The meeting had the #cerifbath Twitter hashtag.
The workshop was largely discussion-focussed and examined individual entities and attributes in the CERIF data model in considerable detail. The agenda of the morning session included new entities to be considered (identifiers, geo-location, education, person titles), modelling issues (dates, class scheme IDs), person name entities, and several cross-task group activities (CASRAI, VIVO, LOD, Architecture, CRIS-IR, DRIS). It closed with a brief presentation on the new CERIF Toolkit, which has been designed to improve the development workflows of the CERIF data model and the deliverables that are produced with each release, thus enabling quicker and more frequent update cycles.
Among other things, the group discussed the introduction of Federated Identifiers in the form of a URI, which would be used e.g. for the hesaStaffID, in addition to the entities and properties already available (cfInstanceID, e.g. 12345 (string); cfEntity, e.g. cfPerson (string); cfFederatedIdentifier (URI), e.g. http://www.,hesaStaffID (string); cfClassID, cfClassSchemeID; cfStartDate, cfEndDate). In the context of research infrastructure, geolocation was also discussed, particularly the bounding box as opposed to other types of representation.
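The attribute list above can be pictured as a simple record. The following is a minimal Python sketch, not the CERIF schema itself: the class name `FederatedIdentifier`, the snake_case field names, the validation rules, and all example values (including the `example.org` URI and the classification names) are assumptions made for illustration. Only the cf-prefixed attribute names in the comments come from the discussion.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class FederatedIdentifier:
    """Hypothetical record mirroring the attributes discussed at the meeting."""
    instance_id: str                   # cfInstanceID, e.g. "12345"
    entity: str                        # cfEntity, e.g. "cfPerson"
    federated_identifier: str          # cfFederatedIdentifier, expected to be a URI
    class_id: str                      # cfClassID
    class_scheme_id: str               # cfClassSchemeID
    start_date: Optional[date] = None  # cfStartDate
    end_date: Optional[date] = None    # cfEndDate

    def __post_init__(self):
        # Assumed rule: the identifier must be an absolute URI.
        parsed = urlparse(self.federated_identifier)
        if not (parsed.scheme and parsed.netloc):
            raise ValueError("federated identifier must be an absolute URI")
        # Assumed rule: a validity period cannot end before it starts.
        if self.start_date and self.end_date and self.end_date < self.start_date:
            raise ValueError("end date precedes start date")


# All values below are made up for the example.
staff_id = FederatedIdentifier(
    instance_id="12345",
    entity="cfPerson",
    federated_identifier="https://example.org/hesaStaffID/0000000000000",
    class_id="hesa-staff-id",
    class_scheme_id="identifier-types",
    start_date=date(2012, 2, 10),
)
```

Keeping the identifier as a URI alongside a class and class scheme lets the same structure carry identifiers from different issuing bodies without adding a new attribute for each one.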
It was agreed that the bounding box is a useful concept for the purpose, but that other forms, ranging from postal address to lat/long combinations, should be possible alternatives. The group also discussed person names, particularly the relationship between preferred form, name variant, and other names. At the moment, the current preferred name is required and is linked to the cfPerson. Additionally, name variant entities are possible; these are linked to the preferred form.
An important use case for name variants is, for example, a female academic whose name changes after marriage but who continues to use her maiden name in her publications. The topic will be discussed further in upcoming meetings and on the mailing list.
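The relationship between a person, the required preferred name, and optional name variants can be sketched as follows. This is an illustrative Python model only: the class names `Person` and `PersonName`, their fields, and the example names are hypothetical and are not taken from the CERIF data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PersonName:
    """One form of a person's name (hypothetical structure)."""
    family_name: str
    first_names: str
    # Variants hang off the preferred form, as described in the discussion.
    variants: List["PersonName"] = field(default_factory=list)

    def display(self) -> str:
        return f"{self.first_names} {self.family_name}"


@dataclass
class Person:
    """A person must have exactly one current preferred name."""
    person_id: str
    preferred_name: PersonName  # required, no default

    def all_names(self) -> List[str]:
        # Preferred form first, then every variant linked to it.
        preferred = self.preferred_name
        return [preferred.display()] + [v.display() for v in preferred.variants]


# Made-up example: the preferred name changed, the earlier name is kept as a variant.
preferred = PersonName(family_name="Smith", first_names="Jane")
preferred.variants.append(PersonName(family_name="Jones", first_names="Jane"))
person = Person(person_id="p-1", preferred_name=preferred)

print(person.all_names())
```

Linking variants to the preferred form, rather than directly to the person, means a search on any recorded variant can be resolved to the one name the person currently uses.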
The morning session ended with a brief presentation on the new CERIF Toolkit. The toolkit is intended to simplify updates to the CERIF data model and the creation of the associated deliverables, and to improve their quality. These deliverables include the old style of XML schema, the new unified style of XML schema, XML examples, the Semantics XML document, and the SQL DDL scripts that are used for implementation of the CERIF system. The toolkit is now in place to generate all these deliverables from the CERIF data model, in order to enable smaller and more frequent changes to the model.
All the XML schemas and examples are created from the SQL, which is based on the database modeller that implements the CERIF data model. The software is a project on SourceForge and is currently being finalized for release, licensed under the EUPL. It is a Java 6-based tool built on XSLT. A functional command line interface already exists, and a GUI is currently being developed.
The afternoon session saw separate meetings of the Architecture, Best Practice/DRIS, and LOD task groups. In the LOD session, Jon Corson-Rikert, chief software developer, introduced the VIVO software, a tool to build a “Semantic Web” with LOD. VIVO includes an OWL ontology editor, an RDF content editor, HTML Web display and search, and an LOD facility. The tool focuses on publicly available data. It offers a single point of discovery as well as integrated, filterable feeds. VIVO models types and relationships in the world in an ontology, and the VIVO software expresses this model. Vitro is a tool built on VIVO, which can be used to model classes and properties; it is available on SourceForge for download. CERIF and VIVO have a number of common classes, which can be used to align the two. LOD has requirements of its own that are not always identical to those for the same data published on Web pages. For example, there may be multiple URIs for the same data, which can be dealt with via “sameAs” relations. Rich Exports may be a way of simplifying the LOD experience, e.g. a rich export would export from LOD all the information typically found on a CV. VIVO is embedded in a wider information ecosystem and leverages existing repositories of bibliographic metadata and other publicly available LOD. VIVO runs annual workshops for implementers.
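To illustrate how “sameAs” relations can reconcile multiple URIs for the same thing, the following Python sketch groups URIs into equivalence classes using a union-find structure. It is not how VIVO implements this; the function name `merge_same_as` and all example URIs are assumptions made for the illustration.

```python
def merge_same_as(pairs):
    """Group URIs connected by sameAs statements into equivalence classes.

    `pairs` is an iterable of (uri_a, uri_b) tuples, each meaning
    "uri_a sameAs uri_b". Returns a sorted list of sorted URI lists.
    """
    parent = {}

    def find(uri):
        # Register unseen URIs as their own root, then walk up to the root.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two classes

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return sorted(sorted(group) for group in groups.values())


# Made-up URIs: three descriptions of one person, two of another.
same_as_statements = [
    ("https://a.example/person/1", "https://b.example/staff/jsmith"),
    ("https://b.example/staff/jsmith", "https://c.example/id/77"),
    ("https://a.example/person/2", "https://c.example/id/88"),
]
groups = merge_same_as(same_as_statements)
for group in groups:
    print(group)
```

Because sameAs is symmetric and transitive, the first and third URIs end up in the same group even though no statement links them directly.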
David Shotton presented his work on CERRO: the CERIF Roles and Relationships Ontology [ppt]. Time-constrained roles and relationships are often necessary to increase the specificity of RDF triples. The Publishing Roles ontology allows for the representation of roles in relation to time, while the Publishing Status Ontology represents the status of a publication in time. CERRO follows this approach to model roles and publications in CERIF. CERRO is available on SourceForge.
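The idea of a time-constrained role can be shown with a small Python sketch. It does not reproduce the classes or properties of CERRO or the Publishing Roles ontology; the class `TimedRole`, its fields, and the example data are all hypothetical and only illustrate why attaching an interval to a role makes a statement more specific.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TimedRole:
    """A role held by a person with respect to some target, within a period."""
    person: str
    role: str
    target: str
    start: date
    end: Optional[date] = None  # None means the role is still held

    def active_on(self, day: date) -> bool:
        # The role applies from `start` up to and including `end`.
        return self.start <= day and (self.end is None or day <= self.end)


# Made-up example: the same person holds different roles at different times.
roles = [
    TimedRole("person-1", "author", "publication-9", date(2010, 1, 1), date(2010, 12, 31)),
    TimedRole("person-1", "editor", "journal-3", date(2011, 6, 1)),
]

on_day = date(2012, 2, 10)
print([r.role for r in roles if r.active_on(on_day)])
```

A plain triple such as "person-1 is editor of journal-3" cannot say when the statement holds, whereas the interval allows a query for the roles that were valid on a given date.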
The LOD Task Group aims to leverage CRIS data by exposing it through Web technology so that it can be interlinked and become more useful. It identifies use cases and scenarios for LOD, addresses ethical issues with LOD, produces recommendations for publishing CERIF data as LOD, and informs the CERIF Task Group about useful changes to the data model. The Task Group will produce a CERIF LD recommendation as well as a prototype implementation. It limits itself to CERIF 1.3 for 2012 and focuses on easy transition/extension of existing CERIF deployments. The LD recommendations are restricted to the semantics of the CERIF version used, and enrichments via additional ontologies should be restricted to optional modules. The plan for 2012 is to produce the CERIF use cases for LD, study the technical and managerial challenges of LCD (Linked Closed Data), produce recommendations for exposing CRIS data using the CERIF model, produce a mapping between VIVO and CERIF, and produce documentation on the Website for vendors and other interested parties. The chosen LD approach is to use RDF to expose CRIS data in XML format, to enhance the data with links to controlled vocabularies and other datasets, to publish guidelines, and to provide SPARQL and Web interfaces. Stakeholders include institutions, funding bodies, and researchers, but also the general public, which funds much of the research activity. The LD implementation will require an extension to the CERIF data model, as a link needs to be created from an internal entity within the CERIF system to an external entity published as LOD.