Self Study (part 6) Primary Sources


This post is the sixth in a series of posts providing a reading course of the TEI Guidelines. It starts with

  1. a basic one on Introducing XML and Markup, then
  2. one on the Introduction to the Text Encoding Initiative Guidelines, then
  3. one on the TEI Default Text Structure, then
  4. one on the TEI Core Elements, then
  5. one looking at The TEI Header.

None of these is really complete in itself, and they barely scratch the surface, but they are offered as a help should people find them useful. This sixth post looks at how to represent primary source documents, including transcription, linking transcriptions to facsimiles, and genetic editing. The core module of the TEI already defines a number of elements specifically for encoding primary sources. If you’ve got this far then you’ve already read about those, for example unclear, or the choice element and its component parts abbr/expan, sic/corr, orig/reg. Some of these are further supplemented with additional elements if the ‘transcr’ module (the ‘Primary Sources’ chapter) is included in your schema: for example, am inside abbr to record the abbreviation marker, and ex inside expan to mark an editorial expansion.

Other elements provided if the ‘transcr’ module is included in the TEI ODD file that created your schema include:

addSpan am damage damageSpan delSpan ex facsimile fw handNotes handShift
line listTranspose metamark mod redo restore retrace sourceDoc space subst substJoin
supplied surface surfaceGrp surplus transpose undo zone

Annotating the activities of transcription, and the relationship of the transcript to the original source document, is at the heart of this chapter. This has several aspects, including more detailed encoding of the act of transcription, the creation of digital facsimiles, and recording the writing process.


As you already know from reading about it, the choice element is a way of encoding multiple transcriptional interpretations at a single point in a text: for example, an abbreviation with abbr and its expansion with expan, an apparent error with sic and its editorial correction with corr, or an original reading with orig and a regularised form with reg. These children of choice are repeatable, so it is possible to encode an abbreviation with multiple possible expansions (hint: one could use the @cert attribute to indicate which of these expansions is more certain).
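
To make the hint about @cert concrete, here is a minimal sketch of a choice with two competing expansions. The abbreviation and both expansions are invented for illustration; @cert takes values such as high, medium, low and unknown.

```xml
<!-- Hypothetical example: 'Dr.' could plausibly expand in two ways -->
<choice>
  <abbr>Dr.</abbr>
  <expan cert="high">Doctor</expan>
  <expan cert="low">Drive</expan>
</choice>
```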

So an abbreviation that could be encoded with abbr on its own could also be encoded as a choice containing both the abbr and its expan.
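
As an illustration of the two options, assuming the abbreviation in question is ‘N.A.T.O.’ (the form discussed in the next paragraph) and that its expansion is spelled as shown, the encodings might look like this:

```xml
<!-- Option 1: mark only the abbreviation -->
<abbr>N.A.T.O.</abbr>

<!-- Option 2: offer the abbreviation and its expansion as alternatives -->
<choice>
  <abbr>N.A.T.O.</abbr>
  <expan>North Atlantic Treaty Organization</expan>
</choice>
```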

As mentioned above this chapter also adds am to abbr to record the abbreviation marker and ex inside expan to mark an editorial expansion. The abbreviation marker may or may not be present but is the thing in the original text which indicates to you that the word should be interpreted as an abbreviation. In this case, although NATO is more commonly abbreviated as an initialism with no ‘.’ marking the individual letters, here it has these. We could encode this, using ex inside expan to mark the expanded portions of text as:
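
A sketch of that encoding, assuming the source reads ‘N.A.T.O.’ and treating each full stop as an abbreviation marker (the spelling of the expansion is an assumption):

```xml
<choice>
  <!-- am wraps each mark that signals the abbreviation -->
  <abbr>N<am>.</am>A<am>.</am>T<am>.</am>O<am>.</am></abbr>
  <!-- ex wraps the letters supplied by the editor when expanding -->
  <expan>N<ex>orth</ex> A<ex>tlantic</ex> T<ex>reaty</ex> O<ex>rganization</ex></expan>
</choice>
```

Note that the am content appears only on the abbr side and the ex content only on the expan side, so a processor can show either reading.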

This form of markup is different from the superficially similar subst (added in this module), which contains add and del to record additions and deletions. The prime difference is that whereas in choice only one of the child elements is truly transcribing something in the text, with subst both the deletion and the addition are present to be transcribed.

There are also elements to record damage to the source (damage), text that has been supplied by the editor (supplied), text that the editor considers superfluous (surplus), and unusual space in the document (space). A change of scribal hand can be noted using the handShift element.
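
A short sketch of several of these elements in use. The sentence itself, the attribute values, and the hand identifier ‘hand2’ (which would need a matching handNote in the header) are all invented for illustration.

```xml
<p>The
  <!-- both the deleted and the added word are visible in the source -->
  <subst><del>quick</del><add place="above">swift</add></subst>
  fox jumped over the
  <!-- text lost to damage and supplied by the editor -->
  <supplied reason="damage">lazy</supplied> dog.
  <!-- an unusual gap, roughly three characters wide -->
  <space quantity="3" unit="chars"/>
  <!-- from here on a different scribe is writing -->
  <handShift new="#hand2"/>And a second scribe continues here.</p>
```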

Digital Facsimiles

In many scholarly editions the digital images provided act as a facsimile or surrogate of the original document, to such a degree that they enable primary source research without recourse to the original object. Although TEI stands for the Text Encoding Initiative, it is indeed possible to have a TEI document which does not contain a text element. Inside the TEI element, after the teiHeader, there must be either a facsimile, sourceDoc, or text element. So you could have a document which at this point has only a facsimile and no transcribed text.

The facsimile element contains images rather than text, and these can appear either directly as graphic elements, or be organised by surface elements, one for each surface, with zone elements to specify sections on those surfaces. The surfaceGrp element can be used to group multiple surfaces together (e.g. recto and verso of a folio, or indeed gatherings).

A basic facsimile element might look like:
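
As a minimal sketch, with hypothetical image file names:

```xml
<facsimile>
  <!-- one graphic per page image, with no further structure -->
  <graphic url="page1.png"/>
  <graphic url="page2.png"/>
</facsimile>
```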

These instead could be grouped as individual surfaces:
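
For instance, assuming the same hypothetical image files and invented @xml:id values:

```xml
<facsimile>
  <!-- each surface represents one written surface, here one page -->
  <surface xml:id="page1">
    <graphic url="page1.png"/>
  </surface>
  <surface xml:id="page2">
    <graphic url="page2.png"/>
  </surface>
</facsimile>
```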

or with zones and coordinates:
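
A sketch with invented pixel coordinates and identifiers; @ulx/@uly give the upper-left corner and @lrx/@lry the lower-right corner of each box:

```xml
<facsimile>
  <!-- the surface defines the coordinate space for its zones -->
  <surface xml:id="page1" ulx="0" uly="0" lrx="2000" lry="3000">
    <graphic url="page1.png"/>
    <zone xml:id="page1-title" ulx="200" uly="150" lrx="1800" lry="450"/>
    <zone xml:id="page1-body" ulx="200" uly="500" lrx="1800" lry="2800"/>
  </surface>
</facsimile>
```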

These can be given as x/y coordinates for the upper left and lower right to draw a rectangular bounding box. The @xml:id attributes in these examples can be pointed to from the page breaks in the transcription of the text (if provided):
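
For example, assuming the facsimile contains a surface whose @xml:id is ‘page1’ (a hypothetical identifier), a page break could point to it with @facs:

```xml
<!-- the page break points at the surface that images this page -->
<pb n="1" facs="#page1"/>
```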

Or from any other element, such as a division:
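
A sketch of this, assuming a zone with the hypothetical @xml:id ‘page1-body’ exists in the facsimile and using placeholder text:

```xml
<!-- the division points at the zone of the image that it transcribes -->
<div type="section" facs="#page1-body">
  <p>Text of the first section.</p>
</div>
```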

Recording the Writing Process

Linking a transcription to a zone in a facsimile by pointing to it with the @facs attribute is one way to relate the text to images. Another approach does not prioritise the final text, but the process which was undertaken to create it. In this case the transcription can be made in the sourceDoc element. When surface, zone and line (for transcription of topographic lines on the document) are used inside sourceDoc, they hold transcriptions of the text as it appears in units on the physical document, without the semantic interpretation that we find in transcriptions that use the text element (for example, deciding that the lines form paragraphs, or speeches by particular characters).
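
As a rough sketch of such an embedded transcription (the identifier, coordinates and wording are invented for illustration), the text sits directly inside line elements within a zone on a surface:

```xml
<sourceDoc>
  <surface xml:id="leaf1r" ulx="0" uly="0" lrx="2000" lry="3000">
    <zone ulx="200" uly="150" lrx="1800" lry="900">
      <!-- each line is a topographic line on the page, not a verse line -->
      <line>First line as written on the page</line>
      <line>second line, with <del>a deletion</del> <add>an addition</add></line>
    </zone>
  </surface>
</sourceDoc>
```

Nothing here says whether these two lines belong to a paragraph, a heading or a speech; that interpretation is left to a transcription in text, if one is made.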

Of course, all of the surfaces, zones and lines can have coordinates on them, or use the @points attribute to give a series of coordinates for non-rectangular areas.

There are other elements for recording the text which also relate to the process of writing. These include metamark, which records any symbol that indicates how the text should be read rather than forming part of its content (for example, an arrow ‘moving’ a paragraph above the one which precedes it in the document). A general mod element can also be used to record a modification in the document without the semantic interpretation of some of the other transcriptional elements.
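
A sketch combining both ideas: a non-rectangular zone outlined with @points, and a metamark recording an arrow that applies to it. The coordinates, identifiers, the value of @function and the wording are all assumptions for illustration.

```xml
<surface ulx="0" uly="0" lrx="2000" lry="3000">
  <!-- an arrow in the source indicating that the second paragraph moves up -->
  <metamark function="transposition" target="#para2">↑</metamark>
  <!-- a four-cornered, non-rectangular area given as x,y pairs -->
  <zone xml:id="para2" points="220,1000 1800,1040 1790,1400 210,1360">
    <line>The paragraph that the arrow relocates</line>
  </zone>
</surface>
```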

Additional elements are available that help to record the process of writing the document. The restore element indicates that text has been restored to an earlier state by cancelling some textual intervention, typically a deletion. This is used for comparatively simple cases, whereas the more general undo element can be used to indicate any form of cancellation. If a cancellation is then marked as being reaffirmed or reasserted in some manner, a redo element can be used. There is also a way to record the act of transposition, by using a transpose element, sometimes gathered in a listTranspose, to point to the elements that are transposed.
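
Two small sketches of these elements, with invented text and identifiers. In the second, the ptr elements inside transpose are listed in the order in which the segments are meant to be read.

```xml
<!-- a deletion that was itself cancelled, e.g. by a marginal 'stet' -->
<line>For I hate this <restore type="marginalStetNote"><del>my</del></restore> body.</line>

<!-- two words marked in the source for transposition -->
<line><seg xml:id="w1">quickly</seg> <seg xml:id="w2">ran</seg></line>
<listTranspose>
  <transpose>
    <ptr target="#w2"/>
    <ptr target="#w1"/>
  </transpose>
</listTranspose>
```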

A retrace element can be used where writing has been overwritten, usually with the intention of clarifying or fixing the text. This is sometimes a distinct phase in the production of the text. Any distinct stages in the text, such as campaigns of revision or editing phases, can be recorded using the listChange and change elements. These are not defined in this chapter but in the one on the Header, where they are used in revisionDesc to record stages of revision in the creation of the electronic file. When used inside the creation element in the header, they instead record phases in the development of the text itself.
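
A minimal sketch of how stages might be declared in the header and then referenced from the transcription with the @change attribute. The stage identifiers and descriptions are invented for illustration.

```xml
<!-- in the teiHeader: declare the stages of the text's development -->
<profileDesc>
  <creation>
    <listChange>
      <change xml:id="stage1">First draft, written in ink</change>
      <change xml:id="stage2">Later revision, in pencil</change>
    </listChange>
  </creation>
</profileDesc>

<!-- in the transcription: assign interventions to a stage -->
<line>A <del change="#stage2">first</del> <add change="#stage2">second</add> thought,
  with one <retrace change="#stage2">word</retrace> gone over again</line>
```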

Questions about Encoding Primary Sources

As usual I’ve got some self-assessment questions for you to test that you’ve read the chapter carefully.

  1. What is the difference between a surface and a zone?
  2. What are the options for child elements of the TEI element?
  3. Can a zone be larger than its parent surface?
  4. How can you point from a surface element to a page break rather than the other way around?
  5. What do you use to break up textual transcription inside a line element?
  6. Think of an example of metamark used in documents you’re familiar with. How would you encode it?
  7. Why might you use a g element inside an am element?
  8. What is a substJoin element used for?
  9. What is the difference between using damage and unclear with textual content?
  10. How is line different from zone? When would you use line?
  11. How would you record how large an unexpected space was?
  12. Can you think of a reason why recording stages of production of the texts you are interested in might benefit your own work?

You may wish to look at the Image Markup Tool, written by Martin Holmes from the University of Victoria in Canada. This uses the facsimile, zone and surface elements to record the coordinates of the annotation and links the transcription to these.
