This post is the fourth in a series offering a reading course on the TEI Guidelines. The series starts with:
- a basic one on Introducing XML and Markup
- an Introduction to the Text Encoding Initiative Guidelines
- and one on the TEI Default Text Structure
None of these is really complete in itself, and they barely scratch the surface, but they are offered as a help should people find them useful.
This fourth post looks at Chapter 3 of the TEI P5 Guidelines: Elements Available in All TEI Documents. This is, of course, a terrible name for this chapter. It has been called this, or something similar, for quite a number of versions of the TEI, so it is probably not worth campaigning to change it until a major new version of the TEI is on the cards. The reason it is a bad name is that you cannot guarantee that the elements listed in this chapter are available in all TEI documents. Every use of the TEI is through some form of customisation. Even if someone uses one of the pre-prepared schemas generated by the TEI Consortium, that schema is itself the result of a TEI customisation stored in a TEI ODD file.
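To make the point about customisation concrete, here is a minimal sketch of the part of an ODD file that selects modules. The identifier tei_reading_course and the choice of which elements to exclude are assumptions made purely for illustration; they are not taken from any TEI Consortium customisation.

```xml
<!-- Fragment of an ODD customisation; it belongs inside the <body> of a TEI ODD file. -->
<!-- The ident "tei_reading_course" and the excluded elements are hypothetical. -->
<schemaSpec ident="tei_reading_course" start="TEI">
  <!-- Infrastructure, header, and default text structure -->
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <!-- The core module (chapter 3), minus two elements this imagined project does not want -->
  <moduleRef key="core" except="said mentioned"/>
</schemaSpec>
```

A schema generated from a customisation like this would reject <said> and <mentioned>, even though both are described in the chapter, which is why "available in all TEI documents" overstates things.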
Go and read the chapter Elements Available in All TEI Documents and answer the following questions to make sure you have really read it. While you do so, imagine how you might use these common elements in a document you would like to encode. This will be useful because the assignment after reading the chapter will be to encode a small amount of material from your chosen document.
Elements Available in All TEI Documents: Questions
- What is the difference between paragraphs, ‘phrase-level’ elements, ‘chunks’ and ‘inter-level’ elements? (Give an example element name of each!) Is this way of describing elements useful?
- Think about the hyphens (and hyphen-like symbols) that occur in the documents you are interested in. How do they function? Is there a difference in how you might encode them if they are at the end of a line?
- Highlighting is often how texts indicate that a segment of text has a feature or characteristic that is different, in some way, from the surrounding text. Think about the way the text you are interested in is highlighted: what does it highlight using colour, special marks, or characters? List what the text is trying to convey through this highlighting and how you would mark it using TEI elements.
- How do you mark a bit of Lingua Latina that appears in the middle of some English text?
- Quotation marks, another form of highlighting, are used to indicate a wide variety of things. What is the difference between <q>, <said>, <mentioned>, <soCalled>, <quote>, and <cit>? Can you think of instances in the material you are interested in where you might use these?
- The <term> element is used to mark technical terms. Why might you wish to mark technical terms? Which might be useful to mark in your material?
- When might the @cert attribute be useful in your encoding?
- The <choice> element enables you to present two or more conflicting editorial choices at the same time. Does this mean that software processing it always needs to choose just one of them? The <choice> element enables us to group: <abbr> (abbreviations) with their <expan>, a <sic> (apparent error) with a <corr> (corrected form), and an <orig> (original form) with a <reg> (regularization). When might it be useful to have multiple <expan>, <corr>, or <reg>? Is there something fundamentally different about abbreviations and expansions, compared to the other two pairs of elements, when it comes to which one is the original?
- How do you indicate that some material is missing because you cannot read it? What if you want to provide your guess as to what the material is?
- We’ll skip looking at the <name> element in detail because there is a whole chapter on names. However, the <name> element has a @type attribute: what types of name occur to you? When there are specialised forms of this, such as <persName> (personal name, which will be introduced in a later blog post), why might you want to use the simpler <name> element?
- How would you encode your own address using the more semantically-rich forms rather than <addrLine>?
- As with names, there are more complex discussions to have about <date> elements in a later post. For now, using attributes that are as precise as possible, how would you encode the following dates? (Try it out in oXygen, making sure your document remains valid.)
  - The date text: “17 March 1999”
  - From 17 March 1999 to April 2013
  - The phrase “the thirteenth century”
  - A single date where you know it did not occur before 1971 and certainly could not have happened after the 1st of January
  - The 17th March when you do not know the year
- What is the difference between a <ptr/> and a <ref> and why might you prefer one over the other?
- Lists are ubiquitous in texts of most periods and cultures: How would you encode a list from a text of your choice? When might you encounter nested lists?
- The <note> element can appear in many places: what different types of notes can you envision using if you were encoding a modern edition of your favourite text?
- The <graphic/> element enables you to point to an image to include at this point. Why might this be a bit limited? (Hint: The <figure> element is defined in chapter 14.)
- What are milestone elements? What is the main difference between <milestone/> and <pb/>, <gb/>, <cb/>, <lb/>? Can you think of instances when you would use <milestone/>? What might you do if you want to record that a line-break is artificially breaking a word?
- There are three main forms of bibliographic citation <bibl>, <biblFull>, and <biblStruct>: Why might you choose <bibl> over <biblStruct> (<biblFull> is used a lot less frequently)? What kind of elements are allowed inside them (compare using their reference pages) and how might that inform your decision to use them? Try to encode the bibliographic reference for an academic journal article of your choice using both <bibl> and <biblStruct> … of these which do you prefer and why?
- What is the difference between <biblScope> and <citedRange>?
- How would you mark up this simple piece of drama from Hamlet?
QUEEN GERTRUDE: Came this from Hamlet to her?
LORD POLONIUS: Good madam, stay awhile; I will be faithful.
Doubt thou the stars are fire;
Doubt that the sun doth move;
Doubt truth to be a liar;
But never doubt I love.
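If you get stuck, it can help to see a few of these elements side by side. The paragraph below is a rough sketch on an invented sentence (the names, the date, and the wording are all made up for illustration), and it shows only one possible encoding among several rather than model answers to the questions above. Do check it in oXygen against your own schema rather than taking its validity on trust.

```xml
<!-- Invented sample text; meant to sit inside the <body> of a TEI document. -->
<p>
  On <date when="2001-05-04">4 May 2001</date>
  <choice><abbr>Dr</abbr><expan>Doctor</expan></choice> Hale
  <choice><sic>recieved</sic><corr>received</corr></choice> the
  <soCalled>lost</soCalled> letter, which she called her
  <foreign xml:lang="la">magnum opus</foreign>.
  It begins <gap reason="illegible" quantity="3" unit="chars"/>
  and ends, perhaps, with <unclear cert="low">farewell</unclear>.
  <note type="editorial">The date is taken from the postmark.</note>
</p>
```

Notice that each <choice> keeps both readings in the file, so the decision about which one to display can be left to whatever processes the document.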
Encoding Your Own Material
That is an awful lot of questions above! Sorry! If you still have time left, then try to encode a small amount of material that you are interested in, creating a valid TEI XML file. (If you don’t, well, do it next time you get a chance before moving on to the next blog post!) Where appropriate encode:
- The structure of the text including any paragraphs and lists
- Forms of highlighting, colours, or what you feel the highlighting is indicating
- Quotations and citations if they exist
- Notes, both existing, and editorial notes you wish to make
- Expand some abbreviations, if there are any, using <choice>, and correct any errors
- Mark any page breaks, line-breaks (if not encoding metrical lines), gathering breaks, column breaks, etc.
- In a <p> inside the <sourceDesc> element in the header, make a note of the date of the material, using as precise a date as you can
- The list of elements defined by the core module (and chapter) is at the bottom of the chapter; are there features you want to encode which are not covered by these? Make a list of them and think about which chapter may enable you to encode these.
- What other problems or limitations in encoding your text do you find? Are these problems likely to be unique? Try to find a good TEI way of solving them!
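As a starting point for the assignment, here is a minimal skeleton you could adapt. Everything in it is placeholder content (the title, the pamphlet, the date range, and the sample text are assumptions for illustration only), and it is a sketch rather than a recommended template. It should validate against a general-purpose schema such as tei_all, but check that in your own editor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sample encoding: replace with your own title</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished exercise file.</p>
      </publicationStmt>
      <sourceDesc>
        <!-- The source and date range here are placeholders, not a real document. -->
        <p>Transcribed from a printed pamphlet dated
          <date notBefore="1850" notAfter="1859">the 1850s</date>.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <!-- Page break before the first paragraph of the transcription -->
      <pb n="1"/>
      <p>First line of the first paragraph<lb/>
        and its second line, with a <hi rend="italic">highlighted</hi> word.
        <note type="editorial">An editorial note of your own.</note></p>
      <list>
        <item>First item</item>
        <item>Second item</item>
      </list>
    </body>
  </text>
</TEI>
```

Start by replacing the placeholder text with your own material, then add the other features from the list above one at a time, validating as you go.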
Next time, we’ll move on to looking at the <teiHeader> element and how to make better use of it.