A workshop on ‘How to make your language resources discoverable’ was held at Oxford University Computing Services on Friday June 24th, as part of the JISC-funded Discovering Babel project.
Ylva Berglund-Prytz from OUCS welcomed the participants, who introduced themselves and revealed that they came from numerous universities, representing teachers, researchers, post-graduate students and archivists, from the UK and abroad. See slides (pptx).
Andy McGregor introduced the work of the Resource Discovery Task Force and the JISC programme ‘Infrastructure for Resource Discovery’, with a refreshing willingness to acknowledge the different standards and practices in different disciplines. See slides (pptx).
Martin Wynne then spoke about Discovering Babel, the project within the programme which relates to language resources, focussing on the issues relating to the different ways of describing and cataloguing language corpora (and other resources) and making those descriptions available to users in a variety of ways. See slides (pdf).
Alexander König of the Max Planck Institute for Psycholinguistics then gave a demonstration of the CLARIN Virtual Language Observatory, which is collecting and making available to users in a single place the information about language resources from all around Europe. Most impressive was the overlay of the geographical data on Google Earth, allowing users to find resources via the map. See slides (ppt).
James Wilson then spoke about the suite of projects (many of them JISC-funded) in OUCS which are addressing the more general data management needs of researchers. After the discipline-based and pan-European scope of the CLARIN initiative, it was fascinating to compare it with the kind of service provision which we might hope to find within a single institution. See slides (pptx).
In the afternoon, a ‘show-and-tell’ session allowed participants to present the resources and services that they were making available to other researchers. This whirlwind tour gave a snapshot of the resources available in the UK and showed us all what a variety of extremely valuable datasets continue to be created.
The presentations included:
- David Nathan (SOAS) on the Endangered Languages Archive (ELAR) – see slides (ppt)
- Ylva Berglund-Prytz on the British National Corpus (BNC) – see slides (ppt)
- John Coleman on the spoken BNC – see slides (pdf) and accompanying audio files one and two (wav)
- Wafya Hamouda on the Nafs Arabic corpus – see slides (ppt)
- Hermann Moisl on the North-East English dialect corpora – see slides (ppt)
- Nancy Tracy-Ventura on French and Spanish learner corpora – see slides (pptx)
- Slides were also provided by Jean Anderson from Glasgow – see slides (ppt)
The final session was a discussion which went beyond concerns about discovering resources, and focussed more on the re-use of resources, and on ways in which they can be exploited online, cross-searched, combined, and connected with online tools and services.
From a very open and frank discussion about our needs, concerns and frustrations there emerged a strong feeling that a UK network was needed to express our requirements more forcefully to funders and other relevant organisations who can help us to build the kind of services that we need.
Recent informal meetings with a partially overlapping set of people in Glasgow, Newcastle and Oxford have reinforced my impression that there is a strong desire to form a UK network of researchers interested in language data and tools. The motivations and proposed activities are to:
- find ways to find, share and reuse resources;
- develop joint projects to build resources and services;
- promote interoperability of resources so that they can more easily be used with generic tools, and combined with each other;
- lobby for UK funders to invest in infrastructure for creating and using language resources;
- lobby for language data and tools to be included in national computing infrastructure;
- lobby for UK participation in the European CLARIN infrastructure;
- provide channels of communication between UK researchers and CLARIN (e.g. to feed in our requirements, get access to services, participate in technical discussions, etc.).
Clearly this meeting was only a starting point!