This post is the fifth in a series offering a reading course of the TEI Guidelines. The earlier posts are:
- a basic one on Introducing XML and Markup
- an Introduction to the Text Encoding Initiative Guidelines
- one on the TEI Default Text Structure
- one on TEI Core Elements
None of these is really complete in itself, and each barely scratches the surface, but they are offered up as a help should people think them useful.
This fifth post looks at The TEI Header.
The <teiHeader> is an essential part of every TEI file; it is where you record metadata for the digital text you are creating, document what you have done and why, as well as put additional information which may be useful in understanding or interrogating this file.
The <teiHeader>, often just casually referred to as ‘the header’, is in some ways the most important part of your TEI file. Without it we can’t know what the file consists of, what you were trying to do when you created it, what we are allowed to do with it, or anything else about this electronic file. A digital file without proper metadata is of very limited use. However, providing basic metadata need not be an onerous task completed only by well-qualified librarians and bibliographers: you too can provide decent metadata for your digital text.
At a minimum, the TEI requires that the header have a <fileDesc> element and that this in turn have child elements for a <titleStmt> (information about the title of the digital file), a <publicationStmt> (information about the publication of the digital file), and a <sourceDesc> (information about the source of the digital file, even if newly created).
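As a minimal sketch of those requirements, the header below contains only the three required children of <fileDesc>, each given as prose. The title and the wording of the paragraphs are invented for illustration, and the fragment assumes it sits inside a <TEI> element that declares the TEI namespace.

```xml
<teiHeader>
  <fileDesc>
    <!-- Required: the title of the digital file -->
    <titleStmt>
      <title>A Sample Text: a digital edition</title>
    </titleStmt>
    <!-- Required: how the digital file is published or distributed -->
    <publicationStmt>
      <p>Unpublished draft, for illustration only.</p>
    </publicationStmt>
    <!-- Required even when the text has no earlier source -->
    <sourceDesc>
      <p>Born digital: no previous source exists.</p>
    </sourceDesc>
  </fileDesc>
</teiHeader>
```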
As siblings to the <fileDesc> one could also have the elements <encodingDesc> (to store information about the encoding of the digital text), <profileDesc> (a text profile of additional information), or <revisionDesc> (to store information about major revisions).
The <fileDesc> Element
Inside <fileDesc> you can store all sorts of information about the file. The RelaxNG Compact Syntax for this content model (excluding its membership in attribute classes) is:
(titleStmt, editionStmt?, extent?, publicationStmt, seriesStmt?, notesStmt?), sourceDesc+
This means that there is:
- a required <titleStmt> which allows you to record one or more <title> (required) and responsibilities such as <author>, <editor>, <funder>, <meeting>, <principal>, <sponsor>, or general purpose <respStmt> followed by
- an optional <editionStmt>, to record information about this digital edition followed by
- an optional <extent> element to give a place for information about size followed by
- a required <publicationStmt> to record necessary information about the publication of the digital file either as prose paragraphs or structured information on the <distributor>, <authority>, <availability>, <address>, <date>, <publisher>, <pubPlace> or one or more <idno> element. This is followed by
- an optional <seriesStmt> to relate this digital file to a series of any sort of which it might be a part, followed by
- an optional <notesStmt> to hold any notes relating to the file that are not encoded elsewhere
- and after all of this at least one <sourceDesc> is required to record information concerning one or more sources for this electronic file. This can contain either prose paragraphs or more structured information about the bibliographic sources in a variety of formats.
And that is it! That is all that is required for a valid and useful <teiHeader>.
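To see how the optional parts fit around the required ones, here is a fuller sketch of a <fileDesc> using structured elements rather than prose, in the order the content model demands. Every name, place, date, and identifier in it is hypothetical; only the element names and their ordering come from the TEI.

```xml
<fileDesc>
  <titleStmt>
    <title>Letters of an Imaginary Author: a digital edition</title>
    <author>Imaginary Author</author>
    <respStmt>
      <resp>TEI encoding</resp>
      <name xml:id="ED1">Example Encoder</name>
    </respStmt>
  </titleStmt>
  <editionStmt>
    <edition n="1">First digital edition</edition>
  </editionStmt>
  <extent>approximately 12,000 words</extent>
  <publicationStmt>
    <!-- In the structured form, a publisher, distributor, or authority comes first -->
    <publisher>Example University Library</publisher>
    <pubPlace>Exampletown</pubPlace>
    <date when="2013">2013</date>
    <idno type="local">example-0001</idno>
    <availability status="free">
      <p>Freely available for non-commercial use.</p>
    </availability>
  </publicationStmt>
  <seriesStmt>
    <title>Example Digital Editions</title>
  </seriesStmt>
  <notesStmt>
    <note>Encoded from a microfilm copy of the printed source.</note>
  </notesStmt>
  <sourceDesc>
    <bibl>
      <author>Imaginary Author</author>,
      <title>Letters</title>.
      <pubPlace>Exampletown</pubPlace>:
      <publisher>Example Press</publisher>,
      <date when="1850">1850</date>.
    </bibl>
  </sourceDesc>
</fileDesc>
```

The <sourceDesc> here uses a loosely structured <bibl>; a prose <p> would be equally valid.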
The <encodingDesc> Element
But of course, sometimes we don’t want to record only the minimal amount of information; we may wish to record other things. As mentioned above, after the <fileDesc> we can also have an <encodingDesc> (to store information about the encoding of the digital text), <profileDesc> (a text profile of additional information), or <revisionDesc> (to store information about major revisions).
The <encodingDesc> element is where one can store information about what decisions were made in the encoding of the text. Like many metadata categories in the TEI this can either be given as prose paragraphs or more structured forms concentrating on the following:
- when the header module (required) is loaded:
  - information about an application which has edited the TEI file: <appInfo>
  - taxonomies defining any classificatory codes used elsewhere in the text: <classDecl>
  - details of editorial principles and practices applied during the encoding of a text: <editorialDecl>
  - a geographic coordinates declaration: <geoDecl>
  - a list of definitions of prefixing schemes used in data.pointer values: <listPrefixDef>
  - a project description: <projectDesc>
  - a declaration specifying how canonical references are constructed for this text: <refsDecl>
  - a description of the rationale and methods used in sampling texts in the creation of a corpus or collection: <samplingDecl>
  - information about the language in which style information used to describe the original object is supplied: <styleDefDecl>
  - detailed information about the tagging applied to a document: <tagsDecl>
- when the gaiji module is loaded:
  - information about nonstandard characters and glyphs: <charDecl>
- when the iso-fs module is loaded:
  - a feature system declaration: <fsdDecl>
- when the tagdocs module is loaded:
  - a specification of the schema the document is intended to validate against: <schemaSpec>
- when the textcrit module is loaded:
  - a declaration of the method used to encode text-critical variants: <variantEncoding>
- when the verse module is loaded:
  - a metrical notation declaration: <metDecl>
Of course, all of these are optional. Instead of using structured elements you can just use the <p> element (or, if the linking module is loaded, the <ab> element) to provide one or more prose paragraphs.
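As a hedged illustration of the structured form, the sketch below combines a <projectDesc> with an <editorialDecl> recording two editorial decisions. The wording of the paragraphs and the particular decisions are invented; a real project would describe its own practice.

```xml
<encodingDesc>
  <projectDesc>
    <p>Encoded as a practice exercise while following a reading course
      of the TEI Guidelines.</p>
  </projectDesc>
  <editorialDecl>
    <!-- method="silent" means the change is not marked up in the text -->
    <correction method="silent">
      <p>Obvious printing errors have been corrected without comment.</p>
    </correction>
    <normalization method="silent">
      <p>Long s has been normalised to s throughout.</p>
    </normalization>
  </editorialDecl>
</encodingDesc>
```

The same information could instead be given as one or more <p> elements directly inside <encodingDesc>, at the cost of being harder to process automatically.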
The <profileDesc> Element
After the <encodingDesc> it is possible to have a <profileDesc> element to record various non-bibliographic aspects of a text. The information recorded again depends on what modules are loaded when creating your schemas. This allows metadata categories including:
- when the header module (required) is loaded:
  - a record of the calendaring system used in the dating elements: <calendarDesc>
  - information about the creation of a text: <creation>
  - a description of the languages, sublanguages, registers, or dialects, represented within a text: <langUsage>
  - a collection of information describing the nature or topic of a text in terms of a standard classification or keywords scheme: <textClass>
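- when the corpus module is loaded:
  - a description of the identifiable speakers, voices, or other participants in a linguistic interaction: <particDesc>
  - a description of the setting or settings within which a language interaction takes place: <settingDesc>
  - a description of a text in terms of its situational parameters: <textDesc>
- when the transcr module is loaded:
  - one or more descriptions of the different hands distinguished in the source: <handNotes>
  - a list of transpositions: <listTranspose>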
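A small <profileDesc> drawing only on the header module might look like the sketch below. The date, the languages, and the keywords are hypothetical values chosen for illustration.

```xml
<profileDesc>
  <creation>
    <date when="1850-03-14">14 March 1850</date>
  </creation>
  <langUsage>
    <!-- ident takes a language tag such as "en" or "la" -->
    <language ident="en">English</language>
    <language ident="la">Latin</language>
  </langUsage>
  <textClass>
    <keywords>
      <term>letters</term>
      <term>correspondence</term>
    </keywords>
  </textClass>
</profileDesc>
```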
The <revisionDesc> Element
The final component of the <teiHeader> is a single optional <revisionDesc>, which summarises the revision history of the file. Inside <revisionDesc> you usually place a series of <change> elements, ordered so that the most recent is at the top. The <change> element has both dating attributes, such as @when to provide the date of the change, and a @who attribute to point to information about the person responsible (such as an <author>, <editor>, or more general <respStmt> in the <titleStmt>).
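A minimal sketch of such a revision history follows. The dates and descriptions are invented, and the @who values assume that some element in the <titleStmt> (for example a <name> inside a <respStmt>) carries xml:id="ED1".

```xml
<revisionDesc>
  <!-- Most recent change first -->
  <change when="2013-06-02" who="#ED1">Added an encodingDesc describing
    editorial practice.</change>
  <change when="2013-05-20" who="#ED1">First draft of the transcription.</change>
</revisionDesc>
```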
And that is the <teiHeader>!
OK, there is indeed a lot more that can be said about each of those individual grandchildren in the XML hierarchy, and some aspects, such as the description of manuscripts and early printed books (using <msDesc>), even get a chapter of their very own (Manuscript Description) that I’ll cover in another post. But this is meant to be a series of blog posts offering a reading course of the TEI Guidelines. So below are some basic questions you should be able to answer if you’ve read the TEI Header chapter.
Questions About the <teiHeader> Chapter
- What are the four major components of the <teiHeader>?
- Inside <titleStmt> inside a <fileDesc> what element would you use to record who transcribed a manuscript?
- What is the difference between a new edition of your file and a revision of it? How would you document each of these?
- Where would you put general notes about your text?
- What element would you use inside <sourceDesc> to provide a manuscript description? What about a script for a spoken text? What about the recordings used to produce a transcription?
- Inside the <editorialDecl> how do you indicate whether end-of-line hyphenation has been retained in a text?
- What is the <rendition> element used to describe? What global attribute do you use to reference it from the text?
- What elements do you need to construct an arbitrarily-deeply nested taxonomy?
- If you were writing a computer program which modified a TEI file, where in the <teiHeader> would you store information about how your program had modified the file?
- How (and where) would you indicate that approximately 80% of a text was in Latin and 20% was in English?
- How do you provide information about a date that is in a non-Gregorian calendaring system?
- The TEI Guidelines cannot enforce the provision of all possible metadata. What information do you think should be provided as a minimum? What would you include as recommended components of the <teiHeader> for your own project? How might this differ if you aren’t encoding just one document but hundreds or thousands of them?
Encoding Your Own Material
Continue encoding your own material, but this time return to the <teiHeader> and improve it as much as you can. Think about which aspects would be useful to encode so that you can find this text amongst many others, and which would help those who wish to study texts like this in large collections by examining their metadata through (semi)automated means. Hopefully by doing so you’ll make better use of the <teiHeader>.