Self Study (Part 7) Customising the TEI
This post is the seventh in a series of posts providing a reading course of the TEI Guidelines. It starts with
- a basic one on Introducing XML and Markup then
- on Introduction to the Text Encoding Initiative Guidelines then
- one on the TEI Default Text Structure then
- one on the TEI Core Elements then
- one looking at The TEI Header then
- a sixth one on transcribing primary sources.
None of these is really complete in itself, and each barely scratches the surface, but they are offered up as a help should people find them useful. This seventh post looks at customising the TEI for your own uses.
The TEI has many different modules and lots of elements that you may or may not need for your project. One of the strongest aspects of the TEI Guidelines compared to other standards is that any project is able to constrain, customise and extend the Guidelines. One reason for customising is that most projects do not need the vast array of elements the Guidelines provide; offering encoders less choice reduces human error and speeds up encoding. The generalised Guidelines need to provide as much choice and flexibility as possible, in order to cope with the different needs of projects and the intellectual methods to be captured, and yet I’d not be surprised if the consistency of a project is proportional to how much it constrains that same flexibility.
The TEI Consortium provides a (quite dated) web interface to customise the TEI. This allows you to do some sorts of customisation. This is available at: http://www.tei-c.org/Roma/. You should explore this. It is fairly straightforward. I recommend doing the following:
- Visit http://www.tei-c.org/Roma/ and notice that you have various options on how to start your customisation, including being able to upload a customisation you had saved earlier.
- Choose the ‘Build Up’ method. This takes you to a screen which allows you to change some basic metadata about the customisation. If you change anything click ‘Save’.
- Click on the ‘Modules’ tab to see a list of modules on the left, which you can ‘add’ to the customisation, and a list of modules on the right which have already been added to your customisation. Notice that the core, tei, header, and textstructure modules are already selected.
- Add a few more modules, maybe manuscript description, names and dates, critical apparatus, and transcription of primary sources.
- Clicking on any individual module name in the customisation you are making takes you to the list of elements in that module. For example, click on ‘Core’.
- Clicking on ‘Core’ takes you to a list of the elements in the ‘Core’ module. You can choose to ‘Include’ or ‘Exclude’ elements from your customisation (by clicking the radio buttons for each element or clicking the ‘Include’ or ‘Exclude’ at the top to include/exclude all of the elements).
- Choose to exclude certain elements from the Core module. For example, you may wish to remove ‘analytic’, ‘biblStruct’, ‘binaryObject’, ‘imprint’, ‘monogr’, ‘series’ and then click ‘Save’ at the bottom of the screen.
- Choose the ‘Schema’ tab and look at the options for generating a schema. I recommend a Relax NG schema (compact or XML syntax). Choose one of these and click ‘Generate’ to create and download your schema.
- In an XML editor like oXygen you can associate a document with the schema that you’ve just generated. (Or take an existing TEI document associated with a schema and change the association to point to your new schema.) Hopefully this uses an xml-model processing instruction at the top of the file. Maybe try this out! You should find that you are unable to use any of the elements you excluded!
- Back in Roma (you didn’t shut the browser down did you? If so you’d have to go and do the above again!) you should be able to return to the ‘Modules’ tab, click on the ‘textstructure’ module, and then notice that where the ‘div’ element is listed there is a ‘Change Attributes’ link on the far right-hand side. Click on it!
- This lists all the attributes available on div (sometimes provided directly on the element, sometimes by an attribute class it is a member of).
- Scroll down to the ‘type’ attribute and click on it. This takes you to some settings you can change about this attribute. Some of the things you can change include:
- You can say that it is not optional (i.e. it is required)
- You can say whether it is a closed list (whether the values you provide are the only ones)
- You can provide a list of comma-separated values
I suggest that you say that it is not optional, a closed list, and give “chapter,section,other” as values. Remember to click save.
- You could now go back to the ‘Schema’ tab, generate and download a schema, and re-associate it in your document (your operating system will most likely name it something different if there is already a file there…another option is to move the previous schema out of the way).
- Something else you should do is click on the ‘Save Customization’ tab. This should download an XML file (‘myTEI.xml’ if you didn’t change the filename on the ‘Customize’ tab).
- Open that ‘myTEI.xml’ customisation in your XML editor and have a look at it. This records all of the details of your customisation.
It should look something like:
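A minimal sketch of the kind of file Roma saves, assuming the modules and changes chosen in the steps above. The header text and the exact attributes written on <schemaSpec> are assumptions here, and your own file will differ in detail:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <!-- Placeholder title: Roma records whatever you entered on the first screen -->
        <title>My TEI Customisation</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished project customisation</p>
      </publicationStmt>
      <sourceDesc>
        <p>Created with Roma</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <schemaSpec ident="myTEI" docLang="en" xml:lang="en">
        <!-- One moduleRef per selected module; 'except' lists the excluded elements -->
        <moduleRef key="core" except="analytic biblStruct binaryObject imprint monogr series"/>
        <moduleRef key="tei"/>
        <moduleRef key="header"/>
        <moduleRef key="textstructure"/>
        <moduleRef key="msdescription"/>
        <moduleRef key="namesdates"/>
        <moduleRef key="textcrit"/>
        <moduleRef key="transcr"/>
        <!-- The change to div: 'type' becomes required, with a closed list of values -->
        <elementSpec ident="div" module="textstructure" mode="change">
          <attList>
            <attDef ident="type" mode="change" usage="req">
              <valList type="closed" mode="replace">
                <valItem ident="chapter"/>
                <valItem ident="section"/>
                <valItem ident="other"/>
              </valList>
            </attDef>
          </attList>
        </elementSpec>
      </schemaSpec>
    </body>
  </text>
</TEI>
```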
Here a <schemaSpec> element contains <moduleRef> elements for each of the modules you included. In this case Roma defaulted to an ‘exclusion’ method of referencing the elements, using the ‘except’ attribute. (So: “give me all elements from the ‘core’ module except this list of elements”.) Using the ‘include’ attribute instead would have had us give a list of specific elements to include. The difference between these is that if you save this customisation and come back to Roma at some point in the future, with the exclusion method you will get any new elements added by the TEI, whereas with the inclusion method you would never get any new elements. Both approaches have their uses. Below that you have documented a change to the <div> element (using the <elementSpec>) where the <attDef> element records that the ‘type’ attribute is required, and has a closed <valList> replacing the existing one.
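To see the effect of these changes, a document associated with the generated schema might look like the following sketch. The schema file name ‘myTEI.rng’ and the sample content are assumptions (a compact-syntax schema would need a different association). With the customisation described above, a <div> without a ‘type’ attribute, or with a value outside the closed list, should be reported as invalid:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed file name: point href at wherever you saved the schema generated by Roma -->
<?xml-model href="myTEI.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample document</title></titleStmt>
      <publicationStmt><p>Unpublished test file</p></publicationStmt>
      <sourceDesc><p>Born digital</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <!-- Valid: 'type' is present and is one of chapter, section, other -->
      <div type="chapter">
        <head>Chapter One</head>
        <p>Some text.</p>
      </div>
    </body>
  </text>
</TEI>
```

Changing the value to something like ‘appendix’, or deleting the attribute, is a quick way to check that your editor really is validating against the new schema.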
Your customisation is written using the TEI ODD language, a part of the TEI Guidelines for describing markup. This is ‘One Document Does-it-all’, so named because from this one document you can generate both a schema and project-specific documentation (see the ‘Documentation’ tab in Roma). There are elements in the TEI for phrase-level discussion of markup (the <gi>, <att>, and <val> elements) as well as ways to document the customisation or extension of a schema (e.g. the <schemaSpec>).
Read more about these documentation elements at: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/TD.html. When you have done so, you should have no problem answering these questions (to check for yourself that you have read it):
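As a small illustration of the phrase-level elements, the prose part of an ODD for the customisation in this post might contain a paragraph like the sketch below. The wording is invented for illustration only:

```xml
<p xmlns="http://www.tei-c.org/ns/1.0">In this project every <gi>div</gi> element must carry a
  <att>type</att> attribute, and its value must be one of <val>chapter</val>,
  <val>section</val>, or <val>other</val>.</p>
```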
- What is the difference between a <gi> element and a <tag>?
- What does the ‘atts’ attribute on <specDesc> record?
- What is the difference between <gloss>, <desc>, and <remarks>?
- How does one use the <equiv/> element?
- What is the difference between an <eg> and an <egXML>?
- What is a <content> element used for?
- Why might you want to use a <constraintSpec>?
- How do you provide a <gloss> for an <attDef>?
- What is a <classSpec> for?
There are many other important parts of this chapter, but if you understand the above that is a good start.
An example ODD showing some of the basic techniques (with a lot of documentation) from the LEAP project is available at: https://github.com/jamescummings/LEAP-ODD/blob/master/leap.odd.xml