The Historic Hospital Admission Registers Project (HHARP) is a useful example of how volunteers can help make material that would otherwise be hidden from most users widely available online. As further shown below, it is also a good case study for how to manage certain issues that will come up in digitisation and community contribution projects.
About the project
The HHARP database contains almost 120,000 records that have been transcribed from Victorian and Edwardian patient admission registers by a set of volunteers. The records all relate to children and cover periods between February 1852 and December 1914. The project, which started in 2001, now makes available material from four hospitals: the Hospital for Sick Children (Great Ormond Street), the Evelina Hospital (now part of Guy’s and St Thomas’s NHS Trust), the Alexandra Hospital for Children with Hip Disease, and the Royal Hospital for Sick Children in Glasgow.
a portal into the world of Victorian and Edwardian children’s hospitals (from the HHARP website)
The HHARP database is a useful resource for family historians but also for anyone interested in the period. The website offers access to the database but also articles about the hospitals and ‘pen portraits’ and images of some of the people featuring in the database. Access is free and anyone can search the collection. Registered users get access to advanced search options and more detailed records and can download and print the records.
Although HHARP does not use an online community of volunteers for its transcription work, the project is a useful example for other community collection projects. It describes its methodology in some detail on its website. In outline, the workflow has been:
- The original records were microfilmed and digital photocopies were made.
- Batches of the records were issued to volunteers.
- Volunteers transcribed the records exactly as they appeared. Standardised versions of the key data elements were added.
- A set of indexes was developed to facilitate searching.
- The database was expanded to allow entry of new data elements as additional hospitals, with slightly different records, were added to the project.
Some points that any project may benefit from considering include:
- Comparison between data sets of different origin: Use the same methodology for each set of records to enable meaningful comparisons.
- Keep track of the originals: Number and label each batch of data so that you can keep track of it, make sure every batch gets done, and avoid unnecessary duplication.
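As an illustration of the batch-tracking point, the following is a minimal sketch of a batch register. The `Batch` fields, the batch identifiers, and the volunteer names are all hypothetical and are not taken from the HHARP project; a spreadsheet with the same columns would serve equally well.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Batch:
    batch_id: str                      # label written on the batch itself
    register: str                      # which source register the pages come from
    assigned_to: Optional[str] = None  # volunteer currently holding the batch
    done: bool = False                 # set once the transcription is returned


def assign(batches, batch_id, volunteer):
    """Issue a batch to a volunteer, refusing to issue the same batch twice."""
    batch = batches[batch_id]
    if batch.assigned_to is not None:
        raise ValueError(f"{batch_id} is already assigned to {batch.assigned_to}")
    batch.assigned_to = volunteer


def outstanding(batches):
    """List the batches that still need to be completed."""
    return sorted(b.batch_id for b in batches.values() if not b.done)


# Invented example data.
batches = {
    "B001": Batch("B001", "Register 1"),
    "B002": Batch("B002", "Register 1"),
}
assign(batches, "B001", "volunteer-1")
batches["B001"].done = True
remaining = outstanding(batches)
```

Refusing a second assignment of the same batch is what prevents accidental duplication, and the outstanding list shows at a glance what is left to do.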
Balance authenticity and standardisation
Historical records and material created by hand by a large number of different contributors (such as hospital admission records) will display a number of inconsistencies, errors, and variant spellings. That means that it can be difficult to cross-search the collection, as the same person, disease, place, etc. can be written in different ways. One way to get around that is to standardise spellings: to decide that every instance of a particular word is always entered in the same way. Doing that will facilitate cross-searching, but at the cost of authenticity. It may also require more training, to make sure that transcribers choose the right version and that users know what to search for.
Each project needs to decide which approach is right for it, but as HHARP shows, it is possible to have your cake and eat it. HHARP has chosen to preserve the authenticity of the records and transcribe any variants and errors as they occur in the original. In addition, it added standardised versions of some elements. That means that it is possible to search the database and retrieve all relevant records without losing the original variation.
Standardised versions of the key data elements enabled such original errors to be corrected while maintaining the integrity of the source material. (from the HHARP website)
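The idea of keeping the verbatim transcription alongside a standardised version can be sketched as follows. This is only an illustration under stated assumptions: the field names, the lookup table, and the sample spellings are invented and do not reflect HHARP's actual database design.

```python
from dataclasses import dataclass

# Hypothetical lookup table mapping variant spellings to one agreed form.
# The entries are invented for illustration.
STANDARD_DISEASE_TERMS = {
    "hooping cough": "whooping cough",
    "whooping-cough": "whooping cough",
    "pthisis": "phthisis",
}


@dataclass(frozen=True)
class AdmissionRecord:
    disease_as_written: str    # exact transcription, errors included
    disease_standardised: str  # normalised form used only for searching


def normalise(text):
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def make_record(disease_as_written):
    """Keep the original text and derive a searchable standard form."""
    key = normalise(disease_as_written)
    return AdmissionRecord(disease_as_written, STANDARD_DISEASE_TERMS.get(key, key))


def search(records, term):
    """Match on the standardised field; the original text is returned intact."""
    wanted = normalise(term)
    return [r for r in records if r.disease_standardised == wanted]


records = [make_record(t) for t in ["Hooping Cough", "whooping cough", "Pthisis"]]
matches = search(records, "Whooping cough")
```

Here a search for the standard term finds both the variant and the regular spelling, while each record still shows what was actually written in the register.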
A further way to facilitate searching of the data is to create indexes. HHARP have produced indexes for various elements, including doctor’s name, disease, and street name, which means that users of the database can identify relevant records more easily.
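A simple index of this kind maps each distinct value of a field to the records that contain it. The sketch below assumes invented sample rows and a hypothetical `build_index` helper; it is not a description of how the HHARP indexes are implemented.

```python
from collections import defaultdict

# Invented sample rows: (record id, doctor name, street name).
rows = [
    (1, "Dr Smith", "High Street"),
    (2, "Dr Jones", "Mill Lane"),
    (3, "Dr Smith", "Mill Lane"),
]


def build_index(rows, field_position):
    """Map each lower-cased field value to the ids of the records holding it."""
    index = defaultdict(list)
    for row in rows:
        index[row[field_position].lower()].append(row[0])
    return dict(index)


doctor_index = build_index(rows, 1)
street_index = build_index(rows, 2)
```

Looking up `doctor_index["dr smith"]` then gives the ids of every record naming that doctor without scanning the whole collection.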
Volunteer training and support
Supporting and training volunteers is important for ensuring that the output is as good as possible. HHARP notes that because the same group of volunteers has worked on the project throughout, they are now very skilled.
as they gained experience their ability to transcribe the specialised content accurately grew (from the HHARP website)
Not all projects have the luxury of a set group of volunteers, which means that it is even more important to offer the contributors as much help and support as possible. One useful tool is crib sheets. HHARP used lists of common disease terms, terms used in post mortem reports, and 19th century therapeutics and also provided their volunteers with some additional reference material such as indexes of old street maps to help identify street names and addresses.
Initially, although most volunteers had experience of the Victorian hand, very few had medical knowledge or were familiar with the streets of Victorian London. (from the HHARP website)
Quality control to check performance and ensure consistency can be done in many ways. HHARP makes sure their records are proofread and checked and they also use computerised validations. Another approach is to have the same operation performed by more than one volunteer. The output can then be compared and where there are differences, further checks or calibrations can be used to consolidate the result. This method is used by, for example, transcription services where each item is transcribed twice, compared automatically and any divergences are flagged and investigated.
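The double-transcription approach can be sketched as a field-by-field comparison of two independent versions of the same record. The field names and values below are invented, and `compare_transcriptions` is a hypothetical helper rather than the tool any particular transcription service uses.

```python
def compare_transcriptions(first, second):
    """Return the fields where two independent transcriptions disagree."""
    differences = {}
    for field in sorted(set(first) | set(second)):
        a = first.get(field)   # None if this volunteer left the field out
        b = second.get(field)
        if a != b:
            differences[field] = (a, b)
    return differences


# Invented example: two volunteers transcribe the same register entry.
volunteer_a = {"surname": "Smith", "age": "4", "disease": "Scarlatina"}
volunteer_b = {"surname": "Smyth", "age": "4", "disease": "Scarlatina"}

flagged = compare_transcriptions(volunteer_a, volunteer_b)
```

Only the fields that differ are flagged, so a checker needs to go back to the original only for those.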
Another approach is to have each operation performed by a large number of volunteers and then use the average or most frequent value. This is done in the Galaxy Zoo project, where volunteers classify galaxies. By having each image looked at by several people, the project can be more certain that the right option has been applied. The variation between different classifications can also be taken into account, for example to find borderline cases or when designing user training and documentation.
Having multiple classifications of the same object is important, as it allows us to assess how reliable each one is. (from the Galaxy Zoo website)
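A minimal sketch of the "most frequent value" idea follows. It takes the most common label for each item and uses the share of volunteers who chose it as a rough agreement measure. The item names, labels, and the 0.75 threshold are assumptions made for illustration; this is not Galaxy Zoo's actual method, and ties are resolved arbitrarily here.

```python
from collections import Counter


def consensus(labels):
    """Return the most frequent label and the share of votes it received."""
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)


def borderline(classifications, threshold=0.75):
    """List items whose top label received less than `threshold` of the votes."""
    return [
        item
        for item, labels in classifications.items()
        if consensus(labels)[1] < threshold
    ]


# Invented example data: item id -> labels given by different volunteers.
classifications = {
    "img-001": ["spiral", "spiral", "spiral", "elliptical"],
    "img-002": ["spiral", "elliptical", "elliptical"],
}

top_label, agreement = consensus(classifications["img-001"])
needs_review = borderline(classifications)
```

Items with low agreement are natural candidates for expert review, or for examples to include in training material.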
It may not always be feasible to have different people re-do the same work or check everything that is produced by a project. One option is to make random checks where a small, random set of data is looked at in more detail. Another option is to have a small proportion of the work be done more than once. By then looking at the items that have been transcribed, classified, or tagged more than once the project can get an idea of how great the variation may be and decide whether that is acceptable or if something needs to be done about it. It may not be necessary to fundamentally change the method for a project even if large variation is discovered in its output. In some cases more careful user instruction or training can help improve performance. Needless to say it is easier to make changes to improve performance at the beginning of a project, before all the work is done.
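Both options can be sketched in a few lines: drawing a small random sample for detailed checking, and measuring how often repeated work disagrees. The 5% fraction, the record ids, and the function names are hypothetical choices for illustration only.

```python
import random


def pick_spot_check(record_ids, fraction, seed=None):
    """Choose a random subset of records to examine in more detail."""
    ids = list(record_ids)
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    sample_size = max(1, round(len(ids) * fraction))
    return sorted(rng.sample(ids, sample_size))


def disagreement_rate(pairs):
    """Share of doubly processed items where the two results differ."""
    if not pairs:
        return 0.0
    differing = sum(1 for first, second in pairs if first != second)
    return differing / len(pairs)


# Invented example: check 5% of 100 records.
to_check = pick_spot_check(range(1, 101), 0.05, seed=1)

# Invented example: four items that were each transcribed twice.
rate = disagreement_rate([("a", "a"), ("a", "b"), ("c", "c"), ("d", "d")])
```

The resulting rate is only an estimate from a small sample, but it gives a project a basis for deciding whether the variation is acceptable.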
More information about the HHARP project can be found on their website http://hharp.org/. JISC Digital Media provides advice and guidance on the creation and use of digital media collections in learning, teaching and research. Although no longer updated, the material on the AHDS Advice on Creating Digital Resources page still offers some useful information and case studies illustrating good practice for digitisation projects.