A hackfest to open up encoded texts

Freely available, yet difficult to access

In January 2015, over 25,000 texts from the Early English Books Online Text Creation Partnership (EEBO-TCP) were made freely available as open data under a Creative Commons (CC 0) licence. This unique corpus represents a history of the printed word in England from the birth of the printing press to the reign of William and Mary, and contains texts of major significance for research in a range of academic disciplines.

However, the texts are encoded in formats (.epub and .xml) that make it difficult for users to manipulate or query them without the necessary technical expertise. Furthermore, without several good examples of usage, it is not easy for experts to demonstrate the value of such a large corpus.
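To give a sense of the technical step involved, here is a minimal Python sketch of pulling readable text out of an XML-encoded document. The sample markup and the function name `extract_paragraphs` are illustrative assumptions; real EEBO-TCP files are much larger and use a richer set of elements, so this is not a description of the corpus's actual structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified sample document (an assumption for
# illustration; it is not taken from the EEBO-TCP corpus).
SAMPLE_XML = """<TEI>
  <teiHeader><title>A Sample Tract</title></teiHeader>
  <text>
    <body>
      <p>The red rose and the white.</p>
      <p>A green field.</p>
    </body>
  </text>
</TEI>"""


def extract_paragraphs(xml_string):
    """Return the plain text of every <p> element, in document order."""
    root = ET.fromstring(xml_string)
    paragraphs = []
    for element in root.iter():
        # Strip any namespace prefix of the form "{uri}p" before comparing.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "p":
            # itertext() also gathers text from nested child elements;
            # split/join collapses runs of whitespace.
            text = " ".join("".join(element.itertext()).split())
            paragraphs.append(text)
    return paragraphs


if __name__ == "__main__":
    for paragraph in extract_paragraphs(SAMPLE_XML):
        print(paragraph)
```

Even this small example requires some familiarity with XML parsing, which illustrates why the encoded texts can be hard to approach without technical support.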

The EEBO homepage at http://eebo.chadwyck.com/home

To provide a hands-on opportunity to access the documents, and to find creative ways of working with them, the Bodleian Libraries hit on the idea of a hackfest. A hackfest, or hackathon, is an event that brings together a number of experts to work intensively on a particular project or issue. The concept originated in the field of software development, but has since expanded to other areas.

A plethora of creative solutions

The Bodleian hosted its EEBO-TCP Hackfest at the Weston Library in March 2015. Participants were invited to demonstrate innovative and creative approaches to either the full data set or a number of subsets (provided by project staff), and to apply imaginative methodologies that might include an element of ‘surprise’.

Facsimile view of a text in EEBO

The hackfest attracted people from a wide range of disciplines, including typography, linguistics, computer science and business history. The day began with a ‘speed dating’ exercise, which provided an opportunity for them to pitch their ideas and find collaborators. Groups then worked intensively on their respective ideas for the next four hours or so. EEBO-TCP and Bodleian Libraries staff provided expertise and technical know-how as required. At the end of the day, representatives from each group presented their work.

The outputs from the hackfest were varied. They included a visualisation of the relative frequency of colour terminology in the full data set, an examination of the ratio of Latinate to Germanic words used in the full data set, and an analysis of the structural features in fictional and alchemical works. Ideas developed in response to an exercise to identify an ideal public (as opposed to academic) interface for EEBO-TCP included an interactive narrative game based around the transcript of a witch’s trial.
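As an illustration of the first kind of output, the relative frequency of colour terms in a text could be computed along the following lines. This is a sketch only: the term list and the function name `colour_frequencies` are assumptions made for this example, and it does not reproduce the method the hackfest group actually used. A real analysis of early modern texts would also need to account for variant spellings.

```python
import re
from collections import Counter

# Illustrative term list only (an assumption); the list used at the
# hackfest is not given in this case study.
COLOUR_TERMS = {"red", "green", "blue", "white", "black", "yellow"}


def colour_frequencies(text, terms=COLOUR_TERMS):
    """Return {term: occurrences / total words} for each term found in text."""
    # Lower-case the text and split it into runs of letters.
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return {}
    counts = Counter(word for word in words if word in terms)
    return {term: count / len(words) for term, count in counts.items()}


if __name__ == "__main__":
    sample = "The red rose, the white rose and the red cloak"
    for term, share in sorted(colour_frequencies(sample).items()):
        print(f"{term}: {share:.2f}")
```

Dividing by the total word count, rather than reporting raw counts, is what makes the figures comparable across texts of different lengths.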

For those unable to attend the hackfest itself, the Bodleian Libraries ran an ‘ideas hack’ competition over a period of two months. This encouraged students, researchers and members of the public to explore creative approaches to the data and identify potential paths for future activity. The winning entries are listed on the Text Creation Partnership website.

Discoveries and lessons for the future

The hackfest provided an opportunity to bring the open content in EEBO-TCP to the attention of researchers and the public, and allowed them to access skill sets and collaborators outside their usual fields. For the academics present, the day revealed the enormous potential of data-mining and computing tools for research and analysis.

Following the event, there was an increased interest in the texts from the corpus linguistics community and there were plans to include the material in British History Online.

Another important, but unplanned, outcome of the hackfest was the identification of requirements for a user-friendly interface for accessing the texts and similar material. As a consequence, the organisers intended to include this aspect in the planning of future events.

Tips for organising your own hackfest

Liz McCarthy, Communications & Social Media Officer at the Bodleian Libraries, has this advice for would-be organisers:

  • Hackfests can be simple to run. All you need are a good theme and content, space to work in, a robust wifi connection and food.
  • It is helpful to ‘seed’ the event with a few experts, to ensure that people with useful technical skills are present.
  • Hackfest ‘speed dating’ works really well as both an ice-breaker activity and a way to help people figure out what others are doing and where their interests and skills overlap.

Further information

  • Read more about the EEBO Hackfest on the Bodleian Digital Libraries blog.
  • The Research Support Team in Academic IT Services offers specialist advice to researchers who are looking for help with digitization, text encoding, text analysis and visualization. Training in these topics is also available and can be booked through the ITLP team in IT Services.
  • The Education Enhancement Team in Academic IT Services offers specialist advice in outreach and public engagement through its annual #OxEngage programme and associated Engage website.

Runner-up, OxTALENT 2015 award for open practices. The text and images in this case study have been adapted from Liz McCarthy’s entry for the OxTALENT competition.
