This is a summary of some of the key outcomes of the Discovering Babel project, with links to where you can find out more.
If you are looking for electronic literary and linguistic resources, please visit the Oxford Text Archive (OTA) and the CLARIN Virtual Language Observatory (VLO). The OTA will shortly relaunch with a new look and feel, and many new resources. The VLO is under continuous development and improvement.
If you are creating and sharing language resources, please join the CLARIN-UK mailing list. This list is a forum where creators and users of linguistic resources and tools can discuss how to develop better facilities and shared services, and where we can gather user requirements.
Evidence of reuse
The metadata that has been made available as part of the Discovering Babel project is being harvested by the CLARIN Virtual Language Observatory, and can be viewed on their portal. At the moment, we still have some performance issues with delivering the files via OAI-PMH, so there may only be a few records listed there, but we have identified the problem and will be fixing it in the next few days!
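For readers unfamiliar with OAI-PMH harvesting, the following Python sketch shows roughly what a harvester such as the VLO does: it requests a ListRecords page, reads the records, and follows a resumption token if there are more. It is an illustration only, not the code used by the OTA or the VLO. The base URL, the record identifier and the sample response are made up for the example.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH a resumptionToken is an exclusive argument, so when one is
    supplied the metadataPrefix is left out.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return ([(identifier, [titles])...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{{{OAI_NS}}}record"):
        ident = rec.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        titles = [t.text for t in rec.iter(f"{{{DC_NS}}}title")]
        records.append((ident, titles))
    token = root.findtext(f".//{{{OAI_NS}}}resumptionToken")
    return records, (token or None)


# Hypothetical response, shaped like an OAI-PMH ListRecords reply.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:0001</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample text</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://example.org/oai"))  # hypothetical endpoint
    print(parse_list_records(SAMPLE))
```

A real harvester would fetch each URL over HTTP and repeat the request until no resumption token is returned. Slow delivery of those pages is the kind of performance problem that can leave only a few records visible in a portal.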
The work in Discovering Babel has contributed to an enhanced Oxford Text Archive, with more reliable and more easily discovered catalogue records, and with open access texts at persistent locations. This is designed to allow others to build services on top of our data, in a distributed environment. It has already helped to make possible the JISC-funded Great Writers project, which will, among other things, link to source texts in various formats, including epub, in the OTA.
The OTA is now also working together with the creators of Voyant at the University of Alberta, who have under development exactly the sort of tools that we imagined would bring our texts alive. Visit http://voyeurtools.org/ and paste in the following URI to get a flavour of what will be possible:
You can see more about this text at http://www.ota.ox.ac.uk/desc/3253. At the beginning of 2011, texts from the OTA were only available for download on request. Now, thanks in large part to Discovering Babel, we are already seeing on our desktops the emergence of seamless access to distributed texts with remote tools in a service-oriented architecture.
Work done in Discovering Babel will also underpin, in part, several further collaborations: working with the National Grid Service to host language resources in the Cloud for UK researchers, developing a cross-repository search service for CLARIN, and shared services in Project Bamboo.
Skills needed for the project
The basic technical skills needed were XML processing (e.g. XSLT 1.0 and 2.0) and the installation of modules in an Apache server, including the Shibboleth access and identity management software. Various Perl scripts were also deployed. Nobody in the team had done these things before in the particular circumstances in which we were working. For example, we had to read about and learn the specifications for the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and the element set for describing language resources from the Open Language Archives Community, as well as the Shibboleth software. We were able to call on expertise in the Oxford University Computing Services for the fundamental technical areas and administrative procedures, and on experts in the CLARIN network across Europe for guidance on implementation in the specific scenarios for sharing language resources. Perhaps more than technical skills, knowledge of the work going on in the relevant areas in our institution, nationally, and around Europe was key to the success of the project.
Most significant lessons learned
- don’t build a digital silo: engage with infrastructure initiatives, such as CLARIN, and find out about recommendations for good practice in connecting resources, such as the Resource Discovery Task Force, and avoid building an online resource which is difficult to find and unconnected to other data and tools;
- at the technical level, be flexible. This work touched on fast-changing fields, and we needed to be prepared to learn about new things, and to change the technological solutions which we deployed. This also meant planning for future change in order to make services sustainable;
- keep it simple: our successes were not the result of great leaps forward, or of building complex and flashy front-ends and tools. Instead, we applied good practice in a systematic way in order to provide reliable services that underpin and fit into a shared services infrastructure. Simply providing crosswalks from our metadata to Dublin Core and establishing an OAI-PMH service opened many doors. Putting the resource files at accessible URIs on the web allows new types of service to be developed, with much easier access and more powerful functionality.
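
To give a sense of how simple a crosswalk to Dublin Core can be, here is a minimal Python sketch. The project itself worked with XSLT, so Python is used here only for illustration. The local field names in the mapping (title, author, language_code, licence, uri) are hypothetical and do not come from the OTA catalogue.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Hypothetical local field names mapped to unqualified Dublin Core elements.
CROSSWALK = {
    "title": "title",
    "author": "creator",
    "language_code": "language",
    "licence": "rights",
    "uri": "identifier",
}


def to_dublin_core(record):
    """Convert a dict of local metadata fields into an oai_dc:dc element.

    Fields with no entry in CROSSWALK are ignored; a field may hold a
    single string or a list of strings (Dublin Core elements repeat).
    """
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for local_field, dc_element in CROSSWALK.items():
        values = record.get(local_field, [])
        if isinstance(values, str):
            values = [values]
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{dc_element}").text = value
    return root


if __name__ == "__main__":
    example = {
        "title": "A sample text",
        "author": ["A. Writer", "B. Editor"],
        "shelfmark": "not mapped, so dropped",
    }
    print(ET.tostring(to_dublin_core(example), encoding="unicode"))
```

The mapping table is the whole crosswalk. Once records are available in this common form, an OAI-PMH service can expose them to any harvester that understands Dublin Core.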