From 1 January 2015 the books transcribed in the first phase of EEBO-TCP (Early English Books Online – Text Creation Partnership) entered the public domain. They join those created by ECCO-TCP (Eighteenth Century Collections Online – Text Creation Partnership) and Evans-TCP (Evans Early American Imprints – Text Creation Partnership). The goal of the Text Creation Partnership is to create accurate XML/SGML encoded electronic text editions of early printed books. It transcribes and encodes the page images of books from ProQuest’s Early English Books Online, Gale Cengage’s Eighteenth Century Collections Online, and Readex’s Evans Early American Imprints. The TCP’s work, and hence the resulting transcriptions, is jointly funded and owned by more than 150 libraries worldwide. Eventually all of the TCP’s work will be placed into the public domain for anyone to use, and the release of Phase 1 of EEBO-TCP is a milestone in this process.
The TCP began in 1999 as a partnership among the libraries of the University of Michigan and the University of Oxford, ProQuest, and the Council on Library and Information Resources (CLIR). As and when TCP texts have entered the public domain, we have made them available at the Oxford Text Archive. The archive was already distributing the public domain copies of ECCO-TCP, and now adds Phase 1 of EEBO-TCP and Evans-TCP to this collection. The hard work of managing the creation, encoding, checking, and provision of the texts has been done by the Bodleian Library at the University of Oxford and the University of Michigan Library, while the Academic IT group of IT Services at the University of Oxford has undertaken the task of bringing the encoding into full conformance with the Text Encoding Initiative P5 Guidelines and making the results available in various forms.
The Academic IT group of IT Services at the University of Oxford has made use of these texts for a number of projects and so wanted to make sure that the texts were easily available now that they have entered the public domain. To do so we have placed them in a special collection at the Oxford Text Archive (OTA), which displays the metadata (stored in a PostgreSQL database) as a jQuery DataTable, enabling sorting and filtering by any column. This table currently lists 61315 texts, but this includes 28462 texts which are ‘restricted’. These are not yet in the public domain, but are available to those at the University of Oxford to use in the meantime. The remaining 32853 texts are freely available to the public. You can see only the free ones by filtering by ‘Free’ in the availability column. Each entry in the table provides basic metadata: the TCP ID, links, the title, availability, date, other IDs associated with the text, the keyword terms the TCP assigned to it, and a rough page count. The links provided are to:
- Web: This is a basic HTML rendering using the XSLT Stylesheets of the Text Encoding Initiative Consortium
- ePub: This is a basic conversion to ePub format, as above using the XSLT Stylesheets of the TEI Consortium, for reading on mobile and tablet devices which support this format
- Images: This link is only present for certain texts and takes you to the JISC Historical Texts Platform entry for this text. Historical Texts is a JISC-funded service available via subscription to UK HE and FE institutions and Research Councils who are full Jisc Collections members. We recognise that this is not useful for those at institutions that do not subscribe to this service or are not in the UK. It may also be possible to go back to ProQuest’s EEBO and find the page images directly if your institution subscribes to that. We decided that it was better to include the link for the benefit of users at subscribing UK institutions than to omit it.
- Source: In the case of public domain texts we have created a GitHub repository per text, and a couple of additional ones. These are all part of the Text Creation Partnership organization on GitHub, which has representatives from the libraries at both Oxford and Michigan. This is located at https://github.com/textcreationpartnership/ and the repositories take the form of https://github.com/textcreationpartnership/TCP-ID where TCP-ID is the identifier assigned to the work by the TCP (for example, https://github.com/textcreationpartnership/A00021 ). We hope that the TEI P5 XML provided in such repositories will serve as the base for enhancements and corrections.
- Each repository has a readme.md Markdown file which provides various metadata, the revision history of the file, a content summary, and a tag summary. We expect that updates to individual texts will be submitted as pull requests (from forked repositories), and that versions modified for particular uses could be submitted back to the repository under a different name.
- There are two additional repositories, https://github.com/textcreationpartnership/TCPTools and https://github.com/textcreationpartnership/Texts , which have additional tools or information about the texts. For example, there is a CSV file listing all the metadata for the texts at https://raw.githubusercontent.com/textcreationpartnership/Texts/master/TCP.csv (JSON also available), and a Linux shell script to simplify cloning all 32853 unrestricted repositories at https://raw.githubusercontent.com/textcreationpartnership/Texts/master/cloneall.sh
- Analysis: Currently there are no links to text analysis engines, but we are considering adding them for engines that can be invoked through a simple link containing the URL of a source. This will only be possible for freely available texts.
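The repository naming scheme described under ‘Source’ above lends itself to simple scripting. As a rough sketch only, the following Python builds repository URLs from TCP IDs and picks out the freely available texts from CSV metadata. The column names `TCP` and `Status`, and the sample rows, are assumptions made up for illustration; we have not reproduced the real header of TCP.csv here, so check it and adjust the parameters before use.

```python
import csv
import io

# Base URL of the Text Creation Partnership organization on GitHub.
GITHUB_ORG_URL = "https://github.com/textcreationpartnership"


def repo_url(tcp_id):
    """Return the repository URL for a TCP ID such as 'A00021'."""
    return f"{GITHUB_ORG_URL}/{tcp_id.strip()}"


def free_text_ids(csv_text, id_column="TCP", status_column="Status",
                  free_value="Free"):
    """Return the IDs of rows whose availability equals free_value.

    The default column names are hypothetical; pass the names that
    actually appear in the header of the metadata CSV.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row[id_column].strip()
        for row in reader
        if row[status_column].strip() == free_value
    ]


# Made-up sample rows standing in for the real metadata file.
# A99999 is a hypothetical ID used only to show a restricted entry.
SAMPLE_CSV = """TCP,Status,Title
A00021,Free,Example title one
A99999,Restricted,Example title two
"""

for tcp_id in free_text_ids(SAMPLE_CSV):
    print(repo_url(tcp_id))
```

To use this against the real metadata, you would read the contents of TCP.csv into a string and pass it to `free_text_ids` with the appropriate column names. For bulk cloning, the cloneall.sh script mentioned above is the simpler route.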
A lot of the work to make these texts available via the Oxford Text Archive, after they were created by the TCP, has been done by Sebastian Rahtz, Magdalena Turska, and James Cummings. The research support team at IT Services can be reached at: email@example.com. You can read more about TCP and EEBO at http://www.textcreationpartnership.org/tcp-eebo/ and http://www.bodleian.ox.ac.uk/eebotcp/.