<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>In my &#60;element/&#62;</title>
	<atom:link href="http://blogs.it.ox.ac.uk/jamesc/feed/" rel="self" type="application/rss+xml" />
	<link>http://blogs.it.ox.ac.uk/jamesc</link>
	<description>Work-Related Unkempt Thoughts</description>
	<lastBuildDate>Mon, 20 May 2013 21:37:20 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Self Study (part 6) Primary Sources</title>
		<link>http://blogs.it.ox.ac.uk/jamesc/2013/05/20/self-study-part-6-primary-sources/</link>
		<comments>http://blogs.it.ox.ac.uk/jamesc/2013/05/20/self-study-part-6-primary-sources/#comments</comments>
		<pubDate>Mon, 20 May 2013 19:47:46 +0000</pubDate>
		<dc:creator>James Cummings</dc:creator>
				<category><![CDATA[SelfStudy]]></category>
		<category><![CDATA[TEI]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://blogs.it.ox.ac.uk/jamesc/?p=407</guid>
		<description><![CDATA[Self Study (Part 6) Primary Sources This post is the sixth in a series of posts providing a reading course of the TEI Guidelines. It starts with a basic one on Introducing XML and Markup then on Introduction to the Text Encoding Initiative &#8230; <a href="http://blogs.it.ox.ac.uk/jamesc/2013/05/20/self-study-part-6-primary-sources/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<h1>Self Study (Part 6) Primary Sources</h1>
<p>This post is the sixth in a series of posts providing a reading course of the TEI Guidelines. It starts with</p>
<ol>
<li>a basic one on <a href="http://blogs.it.ox.ac.uk/jamesc/2012/03/15/self-study-introducing-xml-and-markup/">Introducing XML and Markup</a> then</li>
<li>on <a href="http://blogs.it.ox.ac.uk/jamesc/2013/01/23/self-study-part-2-introduction-to-the-text-encoding-initiative-guidelines/">Introduction to the Text Encoding Initiative Guidelines</a> then</li>
<li>one on the <a href="http://blogs.it.ox.ac.uk/jamesc/2013/01/31/self-study-part-3-the-tei-default-text-structure/">TEI Default Text Structure</a> then</li>
<li>one on the <a href="http://blogs.it.ox.ac.uk/jamesc/2013/02/23/self-study-part-4-tei-core-elements/">TEI Core Elements</a> then</li>
<li>one looking at <a href="http://blogs.it.ox.ac.uk/jamesc/2013/04/20/self-study-part-5-the-tei-header/">The TEI Header</a>.</li>
</ol>
<p>None of these are really complete in themselves and they barely scratch the surface, but they are offered up as a help should people find them useful. This sixth post looks at <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/PH.html">how to represent primary source documents</a>, including transcription, linking transcriptions to facsimiles, and genetic editing. The <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/CO.html">core module of the TEI</a> already defines a number of elements specifically for encoding primary sources. If you&#8217;ve got this far then you&#8217;ve already read about those, for example <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-unclear.html">unclear</a> or the <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-choice.html">choice</a> element and its component parts <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-abbr.html">abbr</a>/<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-expan.html">expan</a>, <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-sic.html">sic</a>/<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-corr.html">corr</a>, <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-orig.html">orig</a>/<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-reg.html">reg</a>. Some of these are supplemented with additional elements if the &#8216;transcr&#8217; module (the &#8216;Primary Sources&#8217; chapter) is included in your schema: for example, <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-am.html">am</a> inside <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-abbr.html">abbr</a> to record the abbreviation marker, and <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-ex.html">ex</a> inside <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-expan.html">expan</a> to mark an editorial expansion. Other elements provided when the &#8216;transcr&#8217; module is included in the TEI ODD file that generated your schema include:</p>
<p><a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-addSpan.html">addSpan</a> <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-am.html">am</a> <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-damage.html">damage</a> <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-damageSpan.html">damageSpan</a> <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-delSpan.html">delSpan</a> <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-ex.html">ex</a> <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-facsimile.html">facsimile</a> <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-fw.html">fw</a> <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-handNotes.html">handNotes</a> <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-handShift.html">handShift</a><br />
<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-line.html">line</a> <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-listTranspose.html">listTranspose</a> <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-metamark.html">metamark</a> <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-mod.html">mod</a>  <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-redo.html">redo</a> <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-restore.html">restore</a> <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-retrace.html">retrace</a> <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-sourceDoc.html">sourceDoc</a> <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-space.html">space</a> <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-subst.html">subst</a> <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-substJoin.html">substJoin</a><br />
<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-supplied.html">supplied</a> <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-surface.html">surface</a> <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-surfaceGrp.html">surfaceGrp</a> <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-surplus.html">surplus</a> <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-transpose.html">transpose</a> <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-undo.html">undo</a> <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-zone.html">zone</a></p>
<p>Annotating the activities of transcription and the relationship of the transcript to the original source document is at the heart of this chapter. This has several aspects, including: more detailed encoding of the act of transcription, the creation of digital facsimiles, and recording the writing process.</p>
<h2>Transcription</h2>
<p>As you already know from reading about it in the core chapter, the <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-choice.html">choice</a> element is a way of encoding multiple transcriptional interpretations at a single point in a text: for example, an abbreviation with <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-abbr.html">abbr</a> and its expansion with <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-expan.html">expan</a>, an apparent error with <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-sic.html">sic</a> and its editorial correction with <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-corr.html">corr</a>, or an original reading with <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-orig.html">orig</a> and a regularised form with <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-reg.html">reg</a>. These children of <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-choice.html">choice</a> are repeatable, so it is possible to encode an abbreviation with multiple possible expansions (hint: one could use the <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.responsibility.html#tei_att.cert">@cert</a> attribute to indicate which of these expansions is more certain).</p>
<p>What could be encoded as:</p>
<p><a href="http://blogs.it.ox.ac.uk/jamesc/2013/05/20/self-study-part-6-primary-sources/choice1/" rel="attachment wp-att-410"><img class="aligncenter size-full wp-image-410" src="http://blogs.it.ox.ac.uk/jamesc/files/2013/05/choice1.png" alt="" width="787" height="149" /></a></p>
<p>could also be encoded as:</p>
<p><a href="http://blogs.it.ox.ac.uk/jamesc/2013/05/20/self-study-part-6-primary-sources/choice2/" rel="attachment wp-att-411"><img class="aligncenter size-full wp-image-411" src="http://blogs.it.ox.ac.uk/jamesc/files/2013/05/choice2.png" alt="" width="987" height="165" /></a></p>
<p>As mentioned above, this chapter also adds <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-am.html">am</a> to <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-abbr.html">abbr</a> to record the abbreviation marker, and <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-ex.html">ex</a> inside <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-expan.html">expan</a> to mark an editorial expansion. The abbreviation marker, which may or may not be present, is whatever in the original text indicates to you that the word should be interpreted as an abbreviation. In this case, although NATO is more commonly written as an initialism with no &#8216;.&#8217; after the individual letters, here it has them. We could encode this, using <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-ex.html">ex</a> inside <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-expan.html">expan</a> to mark the expanded portions of text, as:</p>
<p><a href="http://blogs.it.ox.ac.uk/jamesc/2013/05/20/self-study-part-6-primary-sources/choice3/" rel="attachment wp-att-412"><img class="aligncenter size-full wp-image-412" src="http://blogs.it.ox.ac.uk/jamesc/files/2013/05/choice3.png" alt="" width="1009" height="226" /></a></p>
<p>This form of markup is different from the superficially similar <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-subst.html">subst</a> element (added in this module), which contains <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-add.html">add</a> and <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-del.html">del</a> to record additions and deletions. The main difference is that in <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-choice.html">choice</a> only one of the child elements actually transcribes something in the text, whereas with <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-subst.html">subst</a> both the deletion and the addition are present in the source and so both are transcribed. There are also elements to record <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-damage.html">damage</a> to the source, text that has been <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-supplied.html">supplied</a> by the editor, text the editor considers <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-surplus.html">surplus</a> (present in the source but superfluous or redundant), and unusual <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-space.html">space</a> in the document. A change of scribal hand can be recorded with the <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-handShift.html">handShift</a> element.</p>
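<p>As a rough illustration only, the following invented sentence (not taken from the Guidelines) sketches how some of these elements might combine. It assumes the &#8216;transcr&#8217; module is in your schema; the pointer #hand2 is a hypothetical identifier that would need a matching handNote inside handNotes in the header:</p>
<blockquote>
<pre>&lt;p&gt;The &lt;subst&gt;&lt;del rend="strikethrough"&gt;old&lt;/del&gt;&lt;add place="above"&gt;ancient&lt;/add&gt;&lt;/subst&gt; house
  stood &lt;damage agent="water"&gt;&lt;supplied reason="damage"&gt;by&lt;/supplied&gt;&lt;/damage&gt; the
  river &lt;surplus reason="repeated"&gt;river&lt;/surplus&gt;&lt;space quantity="3" unit="chars"/&gt;
  &lt;!-- hypothetical pointer to a handNote declared in the header --&gt;
  &lt;handShift new="#hand2"/&gt;and these words were added in a second hand.&lt;/p&gt;</pre>
</blockquote>
<p>The values given for @place, @agent, @reason, @quantity and @unit here are only illustrative; a project would normally settle and document its own vocabulary for them.</p>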
<h2>Digital Facsimiles</h2>
<p>In many scholarly editions digital images act as a facsimile or surrogate of the original document, to such a degree that they enable primary source research without recourse to the original object. Although TEI stands for the <strong>Text</strong> Encoding Initiative, it is indeed possible to have a TEI document which does not contain a <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-text.html">text</a> element. Inside the <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-TEI.html">TEI</a> element, after the <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-teiHeader.html">teiHeader</a>, there must be at least one <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-facsimile.html">facsimile</a>, <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-sourceDoc.html">sourceDoc</a>, or <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-text.html">text</a> element, so you could have a document which at this point had only a <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-facsimile.html">facsimile</a> and no transcribed <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-text.html">text</a>. The <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-facsimile.html">facsimile</a> element contains images rather than text, and these can appear either directly as graphic elements, or be organised in <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-surface.html">surface</a> elements for each surface, with <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-zone.html">zone</a> elements to specify sections on those surfaces. The <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-surfaceGrp.html">surfaceGrp</a> element can be used to group multiple surfaces together (e.g. the recto and verso of a folio, or indeed gatherings). A basic <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-facsimile.html">facsimile</a> element might look like:</p>
<p><a href="http://blogs.it.ox.ac.uk/jamesc/2013/05/20/self-study-part-6-primary-sources/facsimile1/" rel="attachment wp-att-413"><img class="aligncenter size-full wp-image-413" src="http://blogs.it.ox.ac.uk/jamesc/files/2013/05/facsimile1.png" alt="" width="792" height="244" /></a></p>
<p>Alternatively, these could be organised as individual surfaces:</p>
<p><a href="http://blogs.it.ox.ac.uk/jamesc/2013/05/20/self-study-part-6-primary-sources/facsimile2/" rel="attachment wp-att-414"><img class="aligncenter size-full wp-image-414" src="http://blogs.it.ox.ac.uk/jamesc/files/2013/05/facsimile2.png" alt="" width="866" height="319" /></a></p>
<p>or with zones and coordinates:</p>
<p><a href="http://blogs.it.ox.ac.uk/jamesc/2013/05/20/self-study-part-6-primary-sources/facsimile3/" rel="attachment wp-att-415"><img class="aligncenter size-full wp-image-415" src="http://blogs.it.ox.ac.uk/jamesc/files/2013/05/facsimile3.png" alt="" width="1004" height="388" /></a></p>
<p>Coordinates can be given as x/y values for the upper-left and lower-right corners (the @ulx, @uly, @lrx and @lry attributes) to draw a rectangular bounding box. The @xml:id attributes in these examples can be pointed to, using the @facs attribute, from the page breaks in the transcription of the text (if provided):</p>
<p><a href="http://blogs.it.ox.ac.uk/jamesc/2013/05/20/self-study-part-6-primary-sources/pb/" rel="attachment wp-att-416"><img class="aligncenter size-full wp-image-416" src="http://blogs.it.ox.ac.uk/jamesc/files/2013/05/pb.png" alt="" width="510" height="59" /></a></p>
<p>Or from any other element, such as a division:</p>
<p><a href="http://blogs.it.ox.ac.uk/jamesc/2013/05/20/self-study-part-6-primary-sources/divfacs/" rel="attachment wp-att-417"><img class="aligncenter size-full wp-image-417" src="http://blogs.it.ox.ac.uk/jamesc/files/2013/05/divFacs.png" alt="" width="445" height="122" /></a></p>
<h2>Recording the Writing Process</h2>
<p>Linking a transcription to a zone in a facsimile by pointing to it with the @facs attribute is one way to relate the text to images. Another approach prioritises not the final text but the process which was undertaken to create it. In this case the transcription can be made in the <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-sourceDoc.html">sourceDoc</a> element. When <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-surface.html">surface</a>, <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-zone.html">zone</a> and <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-line.html">line</a> (for the transcription of topographic lines on the document) are used inside <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-sourceDoc.html">sourceDoc</a>, they transcribe the text as it appears in units on the physical document, without the semantic interpretation found in transcriptions that use the <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-text.html">text</a> element (for example, deciding that the lines form paragraphs or speeches by particular characters).</p>
<p><a href="http://blogs.it.ox.ac.uk/jamesc/2013/05/20/self-study-part-6-primary-sources/sourcedoc/" rel="attachment wp-att-418"><img class="aligncenter size-full wp-image-418" src="http://blogs.it.ox.ac.uk/jamesc/files/2013/05/sourceDoc.png" alt="" width="776" height="295" /></a></p>
<p>Of course, all of the surfaces, zones and lines can carry coordinates, or use the @points attribute to give a series of coordinates for non-rectangular areas. Other elements for transcribing the text also relate to the process of writing. These include <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-metamark.html">metamark</a>, which records any mark or symbol indicating how the text should be read rather than forming part of its content (for example, an arrow &#8216;moving&#8217; a paragraph above the one which precedes it in the document). A general <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-mod.html">mod</a> element can also be used to record a modification in the document without the semantic interpretation of some of the other transcriptional elements.</p>
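<p>A minimal sketch of a sourceDoc transcription using @points and metamark might look like the following. The coordinates, the xml:id values and the function value &#8216;transposition&#8217; are all invented for illustration, not prescribed by the Guidelines:</p>
<blockquote>
<pre>&lt;sourceDoc&gt;
  &lt;surface ulx="0" uly="0" lrx="2000" lry="3000"&gt;
    &lt;!-- a non-rectangular area given as a series of x,y pairs --&gt;
    &lt;zone xml:id="z1" points="120,300 1850,280 1870,900 100,940"&gt;
      &lt;line&gt;This paragraph was written first on the page&lt;/line&gt;
    &lt;/zone&gt;
    &lt;!-- a rectangular area given by its upper-left and lower-right corners --&gt;
    &lt;zone xml:id="z2" ulx="100" uly="1000" lrx="1900" lry="1400"&gt;
      &lt;line&gt;&lt;metamark function="transposition" target="#z2"&gt;&#8593;&lt;/metamark&gt;
        but this one is meant to be read first&lt;/line&gt;
    &lt;/zone&gt;
  &lt;/surface&gt;
&lt;/sourceDoc&gt;</pre>
</blockquote>
<p>Here the arrow is transcribed as a metamark rather than as part of the text, and its @target points at the zone it is taken to act on.</p>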
<p>Additional elements help to record the process of writing the document, including the <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-restore.html">restore</a> element, which indicates that a deletion has been cancelled, restoring the text to an earlier state (for instance, where a deleted word is marked &#8216;stet&#8217;). This is used for comparatively simple cases, whereas the more general <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-undo.html">undo</a> element can be used to indicate any form of cancellation. If a cancellation is then marked as being reaffirmed or reasserted in some manner, a <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-redo.html">redo</a> element can be used. The act of transposition can be recorded with a <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-transpose.html">transpose</a> element, sometimes gathered in a <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-listTranspose.html">listTranspose</a>, which points to the elements that have been transposed.</p>
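<p>The following is a rough sketch, not an example from the Guidelines, of a cancelled deletion and a transposition. It assumes seg (from the &#8216;linking&#8217; module) is available to give the transposed words identifiers; w1 and w2 are hypothetical. The order of the two ptr elements is meant to reflect the intended reading, but check the Guidelines&#8217; description of transpose before relying on that:</p>
<blockquote>
<pre>&lt;p&gt;I hate &lt;restore type="stet"&gt;&lt;del&gt;this&lt;/del&gt;&lt;/restore&gt; weather:
  &lt;seg xml:id="w1"&gt;bitter&lt;/seg&gt; &lt;seg xml:id="w2"&gt;cold&lt;/seg&gt;.&lt;/p&gt;

&lt;!-- placed wherever your schema allows --&gt;
&lt;listTranspose&gt;
  &lt;transpose&gt;
    &lt;!-- the two segments marked for transposition --&gt;
    &lt;ptr target="#w2"/&gt;
    &lt;ptr target="#w1"/&gt;
  &lt;/transpose&gt;
&lt;/listTranspose&gt;</pre>
</blockquote>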
<p>A <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-retrace.html">retrace</a> element can be used where writing has been traced over (for example, over-inked), usually with the intention of clarifying or fixing the text. This is sometimes a distinct phase in the production of the text. Any distinct stages in the text, such as campaigns of revision or editing phases, can be recorded using the <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-listChange.html">listChange</a> and <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-change.html">change</a> elements. These are not defined in this chapter but in the chapter on <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/HD.html">the Header</a>, where they are used in <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-revisionDesc.html">revisionDesc</a> to record stages of revision in the creation of the electronic file. When used in the <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-creation.html">creation</a> element in the header they instead record phases in the development of the text itself.</p>
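<p>As a hedged sketch, stages of writing might be declared in creation and then referred to from the transcription. This assumes a version of the Guidelines in which the @change attribute is available on transcriptional elements; the identifiers stage1 and stage2 and their descriptions are hypothetical:</p>
<blockquote>
<pre>&lt;profileDesc&gt;
  &lt;creation&gt;
    &lt;!-- ordered="true": the changes are listed in the order they happened --&gt;
    &lt;listChange ordered="true"&gt;
      &lt;change xml:id="stage1"&gt;First draft, in pencil&lt;/change&gt;
      &lt;change xml:id="stage2"&gt;Revision campaign, in ink&lt;/change&gt;
    &lt;/listChange&gt;
  &lt;/creation&gt;
&lt;/profileDesc&gt;

&lt;!-- later, in a sourceDoc transcription: a pencilled word inked over in stage 2 --&gt;
&lt;line&gt;the &lt;retrace change="#stage2"&gt;faint&lt;/retrace&gt; pencilled word&lt;/line&gt;</pre>
</blockquote>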
<h2>Questions about Encoding Primary Sources</h2>
<p>As usual I&#8217;ve got some self-assessment questions for you to test that you&#8217;ve read the chapter carefully.</p>
<ol>
<li>What is the difference between a surface and a zone?</li>
<li>What are the options for child elements of the TEI element?</li>
<li>Can a zone be larger than its parent surface?</li>
<li>How can you point from a surface element to a page break rather than the other way around?</li>
<li>What do you use to break up textual transcription inside a line element?</li>
<li>Think of an example of a metamark used in documents you&#8217;re familiar with. How would you encode it?</li>
<li>Why might you use a g element inside an am element?</li>
<li>What is a substJoin element used for?</li>
<li>What is the difference between using damage and unclear with textual content?</li>
<li>How is line different from zone? When would you use line?</li>
<li>How would you record how large an unexpected space was?</li>
<li>Can you think of a reason why recording stages of production of the texts you are interested in might benefit your own work?</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://blogs.it.ox.ac.uk/jamesc/2013/05/20/self-study-part-6-primary-sources/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Self Study (part 5) The TEI Header</title>
		<link>http://blogs.it.ox.ac.uk/jamesc/2013/04/20/self-study-part-5-the-tei-header/</link>
		<comments>http://blogs.it.ox.ac.uk/jamesc/2013/04/20/self-study-part-5-the-tei-header/#comments</comments>
		<pubDate>Sat, 20 Apr 2013 19:19:39 +0000</pubDate>
		<dc:creator>James Cummings</dc:creator>
				<category><![CDATA[SelfStudy]]></category>
		<category><![CDATA[TEI]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://blogs.oucs.ox.ac.uk/jamesc/?p=389</guid>
		<description><![CDATA[This post is the fifth in a series of posts providing a reading course of the TEI Guidelines. It starts with a basic one on Introducing XML and Markup, an Introduction to the Text Encoding Initiative Guidelines, and one on the TEI Default &#8230; <a href="http://blogs.it.ox.ac.uk/jamesc/2013/04/20/self-study-part-5-the-tei-header/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>This post is the fifth in a series of posts providing a reading course of the TEI Guidelines.  It starts with</p>
<ol>
<li>a basic one on <a href="http://blogs.it.ox.ac.uk/jamesc/2012/03/15/self-study-introducing-xml-and-markup/">Introducing XML and Markup</a></li>
<li>an <a href="http://blogs.it.ox.ac.uk/jamesc/2013/01/23/self-study-part-2-introduction-to-the-text-encoding-initiative-guidelines/">Introduction to the Text Encoding Initiative Guidelines</a></li>
<li>and one on the <a href="http://blogs.it.ox.ac.uk/jamesc/2013/01/31/self-study-part-3-the-tei-default-text-structure/">TEI Default Text Structure</a></li>
<li>and one on <a title="TEI Core Elements" href="http://blogs.it.ox.ac.uk/jamesc/2013/02/23/self-study-part-4-tei-core-elements/">TEI Core Elements</a></li>
</ol>
<p>None of these are really complete in themselves and barely scratch the surface but are offered up as a help should people think them useful.</p>
<p>This fifth post is looking at <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/HD.html">The TEI Header</a>.</p>
<p>The &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-teiHeader.html">teiHeader</a>&gt; is an essential part of every TEI file; it is where you record metadata for the digital text you are creating, document what you have done and why, as well as put additional information which may be useful in understanding or interrogating this file.</p>
<p>The &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-teiHeader.html">teiHeader</a>&gt;, often just casually referred to as &#8216;the header&#8217;, is in some ways the most important part of your TEI file. Without it we can&#8217;t know what the file consists of, what you were trying to do when you created it, what we are allowed to do with it, or anything else about this electronic file. A digital file without proper metadata is only of very limited use. However, the provision of basic metadata need not be an onerous task only completed by well qualified librarians and bibliographers: you too can provide decent metadata for your digital text.</p>
<p>At a minimum, the TEI requires that the header have a &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-fileDesc.html">fileDesc</a>&gt; element, and that this in turn have child elements for a &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-titleStmt.html">titleStmt</a>&gt; (information about the title of the digital file), a &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-publicationStmt.html">publicationStmt</a>&gt; (information about the publication of the digital file), and a &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-sourceDesc.html">sourceDesc</a>&gt; (information about the source of the digital file, even if it is newly created).</p>
<p><a href="http://blogs.it.ox.ac.uk/jamesc/2013/04/20/self-study-part-5-the-tei-header/teiheader-min/" rel="attachment wp-att-393"><img class="aligncenter size-full wp-image-393" src="http://blogs.it.ox.ac.uk/jamesc/files/2013/04/teiHeader-min.png" alt="Minimal teiHeader Element" width="875" height="489" /></a></p>
<p>As siblings to the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-fileDesc.html">fileDesc</a>&gt; one could also have the elements &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-encodingDesc.html">encodingDesc</a>&gt; (to store information about the encoding of the digital text), &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-profileDesc.html">profileDesc</a>&gt; (a text profile of additional information), or &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-revisionDesc.html">revisionDesc</a>&gt; (to store information about major revisions).</p>
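<p>Putting these together, the overall shape of a fuller header can be sketched as an outline like the one below. It is not valid as it stands, since each element is left empty apart from a comment:</p>
<blockquote>
<pre>&lt;teiHeader&gt;
  &lt;fileDesc&gt;&lt;!-- required: titleStmt, publicationStmt, sourceDesc --&gt;&lt;/fileDesc&gt;
  &lt;encodingDesc&gt;&lt;!-- optional: how the text was encoded --&gt;&lt;/encodingDesc&gt;
  &lt;profileDesc&gt;&lt;!-- optional: a text profile --&gt;&lt;/profileDesc&gt;
  &lt;revisionDesc&gt;&lt;!-- optional: major revisions --&gt;&lt;/revisionDesc&gt;
&lt;/teiHeader&gt;</pre>
</blockquote>
<p>The &lt;fileDesc&gt; must come first, and the &lt;revisionDesc&gt;, if present, must come last.</p>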
<h2>The &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-fileDesc.html">fileDesc</a>&gt; Element</h2>
<p>Inside &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-fileDesc.html">fileDesc</a>&gt; you can store all sorts of information about the file. The RELAX NG compact syntax for this content model (excluding its membership in attribute classes) is:</p>
<blockquote>
<pre>(titleStmt, editionStmt?, extent?, publicationStmt,
seriesStmt?, notesStmt?), sourceDesc+</pre>
</blockquote>
<p>This means that there is:</p>
<ul>
<li>a <strong>required</strong> &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-titleStmt.html">titleStmt</a>&gt; which allows you to record one or more  &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-title.html">title</a>&gt; (<strong>required</strong>) and responsibilities such as &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-author.html">author</a>&gt;, &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-editor.html">editor</a>&gt;, &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-funder.html">funder</a>&gt;, &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-meeting.html">meeting</a>&gt;, &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-principal.html">principal</a>&gt;, &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-sponsor.html">sponsor</a>&gt;, or general purpose &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-respStmt.html">respStmt</a>&gt; followed by</li>
<li>an <em>optional</em> &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-editionStmt.html">editionStmt</a>&gt;, to record information about this digital edition followed by</li>
<li>an <em>optional</em> &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-extent.html">extent</a>&gt; element to give a place for information about size followed by</li>
<li>a <strong>required</strong> &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-publicationStmt.html">publicationStmt</a>&gt; to record necessary information about the publication of the digital file either as prose paragraphs or structured information on the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-distributor.html">distributor</a>&gt;, &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-authority.html">authority</a>&gt;,  &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-availability.html">availability</a>&gt;, &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-address.html">address</a>&gt;, &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-date.html">date</a>&gt;, &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-publisher.html">publisher</a>&gt;, &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-pubPlace.html">pubPlace</a>&gt; or one or more &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-idno.html">idno</a>&gt; element. This is followed by</li>
<li>an <em>optional</em> &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-seriesStmt.html">seriesStmt</a>&gt; gives a place for relating this digital file to a series of any sort of which it might be a part</li>
<li>an <em>optional</em> &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-notesStmt.html">notesStmt</a>&gt; gives a place for any notes relating to the file not encoded elsewhere</li>
<li>and after all of this at least one &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-sourceDesc.html">sourceDesc</a>&gt; is <strong>required</strong> to record information concerning one or more sources for this electronic file. This can contain either prose paragraphs or more structured information about the bibliographic sources in a variety of formats.</li>
</ul>
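<p>As an illustration, a &lt;fileDesc&gt; using each of these children in the required order might look like the following. All titles, names and places are invented, and the optional elements could simply be left out:</p>
<blockquote>
<pre>&lt;fileDesc&gt;
  &lt;titleStmt&gt;
    &lt;title&gt;A Sample Transcription&lt;/title&gt;
    &lt;author&gt;Jane Example&lt;/author&gt;
    &lt;respStmt&gt;
      &lt;resp&gt;Encoded by&lt;/resp&gt;
      &lt;name&gt;A. N. Encoder&lt;/name&gt;
    &lt;/respStmt&gt;
  &lt;/titleStmt&gt;
  &lt;editionStmt&gt;&lt;edition&gt;First digital edition&lt;/edition&gt;&lt;/editionStmt&gt;
  &lt;extent&gt;approximately 2,000 words&lt;/extent&gt;
  &lt;publicationStmt&gt;
    &lt;!-- an agency (publisher) first, then the details --&gt;
    &lt;publisher&gt;Example Project&lt;/publisher&gt;
    &lt;pubPlace&gt;Anytown&lt;/pubPlace&gt;
    &lt;date when="2013"&gt;2013&lt;/date&gt;
    &lt;availability&gt;&lt;p&gt;Freely available for non-commercial use.&lt;/p&gt;&lt;/availability&gt;
  &lt;/publicationStmt&gt;
  &lt;seriesStmt&gt;&lt;title&gt;Example Project Texts&lt;/title&gt;&lt;/seriesStmt&gt;
  &lt;notesStmt&gt;&lt;note&gt;Page images were not consulted for this sample.&lt;/note&gt;&lt;/notesStmt&gt;
  &lt;sourceDesc&gt;
    &lt;p&gt;Transcribed from a hypothetical manuscript.&lt;/p&gt;
  &lt;/sourceDesc&gt;
&lt;/fileDesc&gt;</pre>
</blockquote>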
<p>Note that the required elements inside &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-fileDesc.html">fileDesc</a>&gt; are &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-titleStmt.html">titleStmt</a>&gt; (itself with a required title), &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-publicationStmt.html">publicationStmt</a>&gt;, and &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-sourceDesc.html">sourceDesc</a>&gt;.</p>
<p>And that is it! That is all that is required for a valid and useful &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-teiHeader.html">teiHeader</a>&gt;.</p>
<h2>The &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-encodingDesc.html">encodingDesc</a>&gt; Element</h2>
<p>But of course, sometimes we want to record more than the minimal amount of information. As mentioned above, after the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-fileDesc.html">fileDesc</a>&gt; we can also have an &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-encodingDesc.html">encodingDesc</a>&gt; (to store information about the encoding of the digital text), &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-profileDesc.html">profileDesc</a>&gt; (a text profile of additional information), or &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-revisionDesc.html">revisionDesc</a>&gt; (to store information about major revisions).</p>
<p>The &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-encodingDesc.html">encodingDesc</a>&gt; element is where one can store information about what decisions were made in the encoding of the text. Like many metadata categories in the TEI this can either be given as prose paragraphs or more structured forms concentrating on the following:</p>
<ul>
<li>when the <strong>header </strong>module (required) is loaded:
<ul>
<li>information about an application which has edited the TEI file: &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-appInfo.html">appInfo</a>&gt;</li>
<li>taxonomies defining any classificatory codes used elsewhere in the text: &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-classDecl.html">classDecl</a>&gt;</li>
<li>details of editorial principles and practices applied during the encoding of a text: &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-editorialDecl.html">editorialDecl</a>&gt;</li>
<li>a geographic coordinates declaration: &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-geoDecl.html">geoDecl</a>&gt;</li>
<li>a list of definitions of prefixing schemes used in <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-data.pointer.html">data.pointer</a> values: &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-listPrefixDef.html">listPrefixDef</a>&gt;</li>
<li>a project description: &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-projectDesc.html">projectDesc</a>&gt;</li>
<li>a declaration specifying how canonical references are constructed for this text: &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-refsDecl.html">refsDecl</a>&gt;</li>
<li>a description of the rationale and methods used in sampling texts in the creation of a corpus or collection: &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-samplingDecl.html">samplingDecl</a>&gt;</li>
<li>information about the language in which style information used to describe the original object is supplied: &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-styleDefDecl.html">styleDefDecl</a>&gt;</li>
<li>detailed information about the tagging applied to a document: &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-tagsDecl.html">tagsDecl</a>&gt;</li>
</ul>
</li>
<li>when the <strong>gaiji </strong>module is loaded:
<ul>
<li>information about nonstandard characters and glyphs: &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-charDecl.html">charDecl</a>&gt;</li>
</ul>
</li>
<li>when the <strong>iso-fs </strong>module is loaded:
<ul>
<li>a feature system declaration comprising one or more feature structure declarations: &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-fsdDecl.html">fsdDecl</a>&gt;</li>
</ul>
</li>
<li>when the <strong>tagdocs </strong>module is loaded:
<ul>
<li>a specification of the schema the document is intended to validate against: &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-schemaSpec.html">schemaSpec</a>&gt;</li>
</ul>
</li>
<li>when the <strong>textcrit </strong>module is loaded:
<ul>
<li>a declaration of the method used to encode text-critical variants: &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-variantEncoding.html">variantEncoding</a>&gt;</li>
</ul>
</li>
<li>when the <strong>verse </strong>module is loaded:
<ul>
<li>a metrical notation declaration: &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-metDecl.html">metDecl</a>&gt;</li>
</ul>
</li>
</ul>
<p>Of course, these are all optional, and instead of using structured elements you can simply use the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-p.html">p</a>&gt; element (or, if the <strong>linking</strong> module is loaded, the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-ab.html">ab</a>&gt; element) to provide one or more prose paragraphs.</p>
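<p>For example, an &lt;encodingDesc&gt; combining a prose project description with a slightly more structured editorial declaration might look like the sketch below. The wording is invented, and this is only one of many possible arrangements; &lt;correction&gt; and &lt;normalization&gt; are just two of the children that &lt;editorialDecl&gt; allows, each containing prose paragraphs:</p>
<pre><code>&lt;encodingDesc&gt;
  &lt;!-- hypothetical example: the prose is placeholder text --&gt;
  &lt;projectDesc&gt;
    &lt;p&gt;Created as an exercise in a self-study reading course of the TEI Guidelines.&lt;/p&gt;
  &lt;/projectDesc&gt;
  &lt;editorialDecl&gt;
    &lt;correction&gt;
      &lt;p&gt;Apparent errors have been corrected, and the original readings retained.&lt;/p&gt;
    &lt;/correction&gt;
    &lt;normalization&gt;
      &lt;p&gt;Original spelling has not been normalized.&lt;/p&gt;
    &lt;/normalization&gt;
  &lt;/editorialDecl&gt;
&lt;/encodingDesc&gt;</code></pre>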
<h2>The &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-profileDesc.html">profileDesc</a>&gt; Element</h2>
<p>After the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-encodingDesc.html">encodingDesc</a>&gt; it is possible to have a &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-profileDesc.html">profileDesc</a>&gt; element to record various non-bibliographic aspects of a text. Again, the information you can record depends on which modules were loaded when creating your schema. The metadata categories include:</p>
<ul>
<li>when the <strong>header </strong>module (required) is loaded:
<ul>
<li>a record of the calendaring system used in the dating elements:  &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-calendarDesc.html">calendarDesc</a>&gt;</li>
<li>information about the creation of a text: &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-creation.html">creation</a>&gt;</li>
<li>a description of the languages, sublanguages, registers, or dialects, represented within a text: &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-langUsage.html">langUsage</a>&gt;</li>
<li>a collection of information describing the nature or topic of a text in terms of a standard classification or keywords scheme: &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-textClass.html">textClass</a>&gt;</li>
</ul>
</li>
<li>when the <strong>corpus </strong>module is loaded:
<ul>
<li>information about identifiable speakers or other participants (of any sort) in the text: &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-particDesc.html">particDesc</a>&gt;</li>
<li>a record of the setting(s) within which a language interaction takes place:  &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-settingDesc.html">settingDesc</a>&gt;</li>
<li>a description of a text in terms of its situational parameters: &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-textDesc.html">textDesc</a>&gt;</li>
</ul>
</li>
<li>when the <strong>transcr </strong>module is loaded:
<ul>
<li>documentation of the different hands identified within the source texts: &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-handNotes.html">handNotes</a>&gt;</li>
<li>a list of transpositions, each of which is indicated at some point in a document typically by means of metamarks: &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-listTranspose.html">listTranspose</a>&gt;</li>
</ul>
</li>
</ul>
<p>Unlike &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-encodingDesc.html">encodingDesc</a>&gt;, &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-profileDesc.html">profileDesc</a>&gt; cannot contain just prose paragraphs; however, many of its child elements can.</p>
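<p>A small &lt;profileDesc&gt; might, for instance, record when a text was created and which language it is in. In this sketch the date and the language are invented for illustration; the @when attribute takes a normalized (ISO-style) date, and the @ident attribute on &lt;language&gt; takes a language code:</p>
<pre><code>&lt;profileDesc&gt;
  &lt;!-- hypothetical example: date and language are placeholders --&gt;
  &lt;creation&gt;
    &lt;date when="1854-10-25"&gt;25 October 1854&lt;/date&gt;
  &lt;/creation&gt;
  &lt;langUsage&gt;
    &lt;language ident="en"&gt;English&lt;/language&gt;
  &lt;/langUsage&gt;
&lt;/profileDesc&gt;</code></pre>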
<h2>The &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-revisionDesc.html">revisionDesc</a>&gt; Element</h2>
<p>The final component of the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-teiHeader.html">teiHeader</a>&gt; is an optional single &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-revisionDesc.html">revisionDesc</a>&gt;, which summarises the revision history of the file. Inside &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-revisionDesc.html">revisionDesc</a>&gt; you usually place a series of &lt;change&gt; elements, ordered so that the most recent is at the top. The &lt;change&gt; element has dating attributes, such as @when to give the date of the change, as well as a @who attribute to point to information about the person responsible (such as an author, editor, or more general respStmt in the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-titleStmt.html">titleStmt</a>&gt;).</p>
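<p>Putting this together, a sketch of a revision history with two changes might look like the following. The name, the dates, and the xml:id value &#8216;jd&#8217; are invented; the @who attribute points, using &#8216;#&#8217;, at the xml:id of an &lt;editor&gt; recorded in the &lt;titleStmt&gt;:</p>
<pre><code>&lt;!-- hypothetical example: name, dates, and IDs are placeholders --&gt;
&lt;!-- in the titleStmt (inside fileDesc) --&gt;
&lt;titleStmt&gt;
  &lt;title&gt;A sample transcription&lt;/title&gt;
  &lt;editor xml:id="jd"&gt;Jane Doe&lt;/editor&gt;
&lt;/titleStmt&gt;

&lt;!-- at the end of the teiHeader: most recent change first --&gt;
&lt;revisionDesc&gt;
  &lt;change when="2013-05-20" who="#jd"&gt;Added an encodingDesc and a profileDesc.&lt;/change&gt;
  &lt;change when="2013-04-20" who="#jd"&gt;Created a minimal header.&lt;/change&gt;
&lt;/revisionDesc&gt;</code></pre>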
<p>And that is the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-teiHeader.html">teiHeader</a>&gt;!</p>
<p>OK, there is indeed a lot more that could be said about each of those individual grandchildren in the XML hierarchy, and some aspects, such as the description of manuscripts and early printed books (using &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-msDesc.html">msDesc</a>&gt;), even get a chapter of their very own (Manuscript Description), which I&#8217;ll cover in another post. But this series of blog posts is meant to be a reading course of the TEI Guidelines, so below are some basic questions you should be able to answer if you&#8217;ve read <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/HD.html">the TEI Header chapter</a>.</p>
<h2>Questions About the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-teiHeader.html">teiHeader</a>&gt; Chapter</h2>
<ol>
<li>What are the four major components of the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-teiHeader.html">teiHeader</a>&gt;?</li>
<li>Inside &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-titleStmt.html">titleStmt</a>&gt; inside a &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-fileDesc.html">fileDesc</a>&gt; what element would you use to record who transcribed a manuscript?</li>
<li>What is the difference between a new edition of your file and a revision of it? How would you document each of these?</li>
<li>Where would you put general notes about your text?</li>
<li>What element would you use inside &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-sourceDesc.html">sourceDesc</a>&gt; to provide a manuscript description? What about a script for a spoken text? What about the recordings used to produce a transcription?</li>
<li>Inside the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-editorialDecl.html">editorialDecl</a>&gt; how do you indicate whether end-of-line hyphenation has been retained in a text?</li>
<li>What is the rendition element used to describe? What global attribute do you use to reference it from the text?</li>
<li>What elements do you need to construct an arbitrarily-deeply nested taxonomy?</li>
<li>If you were writing a computer program which modified a TEI file, where in the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-teiHeader.html">teiHeader</a>&gt; would you store information about how your program had modified the file?</li>
<li>How (and where) would you indicate that approximately 80% of a text was in Latin and 20% was in English?</li>
<li>How do you provide information about a date that is in a non-Gregorian calendaring system?</li>
<li>The TEI Guidelines cannot enforce the provision of all possible metadata. What information do you think should be provided as a minimum? What would you include as recommended components of the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-teiHeader.html">teiHeader</a>&gt; for your own project? How might this differ if you aren&#8217;t encoding just one document but hundreds or thousands of them?</li>
</ol>
<h2>Encoding Your Own Material</h2>
<p>Continue encoding your own material, but this time return to the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-teiHeader.html">teiHeader</a>&gt; and improve it as much as you can. Think about those aspects that might be useful to encode so that you can find this text amongst many others; think about those aspects of the text that it might be helpful to encode for those who wish to study texts like this in large collections by examining their metadata through (semi-)automated means. Hopefully, by doing so, you&#8217;ll make better use of the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-teiHeader.html">teiHeader</a>&gt;.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.it.ox.ac.uk/jamesc/2013/04/20/self-study-part-5-the-tei-header/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Self Study (part 4) TEI Core Elements</title>
		<link>http://blogs.it.ox.ac.uk/jamesc/2013/02/23/self-study-part-4-tei-core-elements/</link>
		<comments>http://blogs.it.ox.ac.uk/jamesc/2013/02/23/self-study-part-4-tei-core-elements/#comments</comments>
		<pubDate>Sat, 23 Feb 2013 18:23:34 +0000</pubDate>
		<dc:creator>James Cummings</dc:creator>
				<category><![CDATA[SelfStudy]]></category>
		<category><![CDATA[TEI]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://blogs.oucs.ox.ac.uk/jamesc/?p=363</guid>
		<description><![CDATA[This post is the fourth in a series of posts providing a reading course of the TEI Guidelines.  It starts with a basic one on Introducing XML and Markup an Introduction to the Text Encoding Initiative Guidelines and one on the TEI &#8230; <a href="http://blogs.it.ox.ac.uk/jamesc/2013/02/23/self-study-part-4-tei-core-elements/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>This post is the fourth in a series of posts providing a reading course of the TEI Guidelines.  It starts with</p>
<ol>
<li>a basic one on <a href="http://blogs.it.ox.ac.uk/jamesc/2012/03/15/self-study-introducing-xml-and-markup/">Introducing XML and Markup</a></li>
<li>an <a href="http://blogs.it.ox.ac.uk/jamesc/2013/01/23/self-study-part-2-introduction-to-the-text-encoding-initiative-guidelines/">Introduction to the Text Encoding Initiative Guidelines</a></li>
<li>and one on the <a href="http://blogs.it.ox.ac.uk/jamesc/2013/01/31/self-study-part-3-the-tei-default-text-structure/">TEI Default Text Structure</a></li>
</ol>
<p>None of these is really complete in itself, and they barely scratch the surface, but they are offered up as a help should people think them useful.</p>
<p>This fourth post looks at Chapter 3 of the TEI P5 Guidelines: <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/CO.html">Elements Available in All TEI Documents</a>. This is, of course, a terrible name for this chapter. It has been called this, or something similar, for quite a number of versions of the TEI, so it is probably not worth campaigning to change it until a major new version of the TEI is on the cards. The reason it is a bad name, of course, is that you cannot guarantee that the elements listed in this chapter are available in <strong>all</strong> TEI documents. <strong>Every</strong> use of the TEI is through some form of customisation. Even the pre-prepared schemas generated by the TEI Consortium are all the result of a TEI customisation stored in a TEI ODD file.</p>
<p>Go and read the chapter <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/CO.html">Elements Available in All TEI Documents</a> and answer the following questions to force yourself to make sure you&#8217;ve read it.  While you do so imagine how you might use these common elements in a document you would like to encode.  This will be useful because the assignment following reading the chapter will be to encode a small amount of material from your chosen document.</p>
<h2>Elements Available in All TEI Documents: Questions</h2>
<ol>
<li>What is the difference between paragraphs, &#8216;phrase-level&#8217; elements, &#8216;chunks&#8217; and &#8216;inter-level&#8217; elements? (Give an example element name for each!) Is this way of describing elements useful?</li>
<li>Think about the hyphens (and hyphen-like symbols) that occur in the documents you are interested in. How do they function? Is there a difference in how you might encode them if they are at the end of a line?</li>
<li>Highlighting is often how texts indicate that a segment of text has a feature or characteristic that is different, in some way, from the surrounding text. Think about the way the text you are interested in is highlighted: what does it highlight using colour, special marks, or characters? List what the text is trying to convey through this highlighting and how you would mark it using TEI elements.</li>
<li>How do you mark a bit of <em>Lingua Latina</em> that appears in the middle of some English text?</li>
<li>Quotation marks, another form of highlighting, are used to indicate a wide variety of things.  What is the difference between &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-q.html">q</a>&gt;, &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-said.html">said</a>&gt;, &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-mentioned.html">mentioned</a>&gt;, &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-soCalled.html">soCalled</a>&gt;, &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-quote.html">quote</a>&gt;, and &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-cit.html">cit</a>&gt;? Can you think of instances in the material you are interested in where you might use these?</li>
<li>The &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-term.html">term</a>&gt; element is used to mark technical terms. Why might you wish to mark technical terms? Which might be useful to mark in your material?</li>
<li>When might the @cert attribute be useful in your encoding?</li>
<li>The &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-choice.html">choice</a>&gt; element enables you to present two or more conflicting editorial choices at the same time &#8212; does this mean that software processing this needs always to choose just one of these? The &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-choice.html">choice</a>&gt; element enables us to group: &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-abbr.html">abbr</a>&gt; (abbreviations) with their &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-expan.html">expan</a>&gt;, a &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-sic.html">sic</a>&gt; (apparent error) with a &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-corr.html">corr</a>&gt; (corrected form), and an &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-orig.html">orig</a>&gt; (original form) with a &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-reg.html">reg</a>&gt; (regularization). When might it be useful to have multiple &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-expan.html">expan</a>&gt;, &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-corr.html">corr</a>&gt;, or &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-reg.html">reg</a>&gt;? Is there something fundamentally different between abbreviations and expansions compared to the other two sets of elements concerning which is the original?</li>
<li>How do you indicate that some material is missing because you cannot read it? What if you want to provide your guess as to what the material is?</li>
<li>We&#8217;ll skip looking at the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-name.html">name</a>&gt; element in detail because there is a whole chapter devoted to names; but the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-name.html">name</a>&gt; element has a @type attribute: what <strong>types</strong> of name occur to you? When there are specialised forms of this, such as &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-persName.html">persName</a>&gt; (personal name, which will be introduced in a later blog post), why might you want to use the simpler &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-name.html">name</a>&gt; element?</li>
<li>How would you encode your own address using the more semantically-rich forms rather than &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-addrLine.html">addrLine</a>&gt;?</li>
<li>As with names, there are more complex discussions to have about &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-date.html">date</a>&gt; elements in a later post, but, using attributes that are as precise as possible, how would you encode the following dates? (Try it out in oXygen, making sure your document remains valid.)
<ul>
<li>The date text: &#8220;17 March 1999&#8221;</li>
<li>From 17 March 1999 to April 2013</li>
<li>The phrase &#8220;the thirteenth century&#8221;</li>
<li>A single date where you know it did not occur before 1971 and certainly could not have happened after the 1st of January</li>
<li>The 17th March when you do not know the year</li>
</ul>
</li>
<li>What is the difference between a &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-ptr.html">ptr/</a>&gt; and a &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-ref.html">ref</a>&gt; and why might you prefer one over another?</li>
<li>Lists are ubiquitous in texts of most periods and cultures: how would you encode a list from a text of your choice? When might you encounter nested lists?</li>
<li>The &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-note.html">note</a>&gt; element can appear many places: what different types of notes can you envision using if you were encoding a modern edition of your favourite text?</li>
<li>The &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-graphic.html">graphic/</a>&gt; element enables you to point to an image to include at this point. Why might this be a bit limited? (Hint: The &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-figure.html">figure</a>&gt; element is defined in chapter 14.)</li>
<li>What are milestone elements? What is the main difference between &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-milestone.html">milestone/</a>&gt; and &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-pb.html">pb/</a>&gt;, &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-gb.html">gb/</a>&gt;, &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-cb.html">cb/</a>&gt;, &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-lb.html">lb/</a>&gt;? Can you think of instances when you would use &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-milestone.html">milestone/</a>&gt;? What might you do if you want to record that a line-break is artificially breaking a word?</li>
<li>There are three main forms of bibliographic citation &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-bibl.html">bibl</a>&gt;, &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-biblFull.html">biblFull</a>&gt;, and &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-biblStruct.html">biblStruct</a>&gt;: Why might you choose &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-bibl.html">bibl</a>&gt; over &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-biblStruct.html">biblStruct</a>&gt; (&lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-biblFull.html">biblFull</a>&gt; is used a lot less frequently)? What kind of elements are allowed inside them (compare using their reference pages) and how might that inform your decision to use them?  Try to encode the bibliographic reference for an academic journal article of your choice using both &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-bibl.html">bibl</a>&gt; and &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-biblStruct.html">biblStruct</a>&gt; &#8230; of these which do you prefer and why?</li>
<li>What is the difference between &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-biblScope.html">biblScope</a>&gt; and &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-citedRange.html">citedRange</a>&gt;?</li>
<li>How would you mark up this simple piece of drama from Hamlet?<br />
<blockquote><p><strong>QUEEN GERTRUDE</strong>: Came this from Hamlet to her?<br />
<strong>LORD POLONIUS</strong>: Good madam, stay awhile; I will be faithful.<br />
[<span style="color: red"><em>Reads</em></span>]<br />
Doubt thou the stars are fire;<br />
Doubt that the sun doth move;<br />
Doubt truth to be a liar;<br />
But never doubt I love.</p></blockquote>
</li>
</ol>
<h2>Encoding Your Own Material</h2>
<p>That is an awful lot of questions above! Sorry! If you still have time left, then try to encode a small amount of the material you are interested in, creating a valid TEI XML file. (If you don&#8217;t, well, do it next time you get a chance before moving on to the next blog post!) Where appropriate, encode:</p>
<ul>
<li>The structure of the text including any paragraphs and lists</li>
<li>Forms of highlighting, colours, or what you feel the highlighting is indicating</li>
<li>Quotations and citations if they exist</li>
<li>Notes, both existing, and editorial notes you wish to make</li>
<li>If there are abbreviations, expand some of them using &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-choice.html">choice</a>&gt;, and correct any errors</li>
<li>Mark any page breaks, line-breaks (if not encoding metrical lines), gathering breaks, column breaks, etc.</li>
<li>In a &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-p.html">p</a>&gt; in the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-sourceDesc.html">sourceDesc</a>&gt; element in the header make a note of the date of the material using as precise a date as you can</li>
<li>The list of elements created by the core module (and chapter) is at the bottom of the chapter; are there features you want to encode which are not covered by these? Make a list of them and think about which chapter may enable you to encode them.</li>
<li>What other problems or limitations in encoding your text do you find? Are these problems likely to be unique? Try to find a good TEI way of solving them!</li>
</ul>
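<p>As a rough sketch of several of these points together, a short stretch of an invented letter might be encoded as below. The text, the page number, and the @type value on &lt;note&gt; are all made up for illustration; your own material will need its own decisions:</p>
<pre><code>&lt;!-- hypothetical example: the letter text is invented --&gt;
&lt;p&gt;&lt;pb n="12"/&gt;We received your letter of the
  &lt;choice&gt;&lt;abbr&gt;inst.&lt;/abbr&gt;&lt;expan&gt;instant&lt;/expan&gt;&lt;/choice&gt;
  and read it with &lt;choice&gt;&lt;sic&gt;grate&lt;/sic&gt;&lt;corr&gt;great&lt;/corr&gt;&lt;/choice&gt;
  pleasure.&lt;note type="editorial"&gt;The letter referred to has not survived.&lt;/note&gt;&lt;/p&gt;</code></pre>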
<p>Next time, we&#8217;ll move on to looking at the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-teiHeader.html">teiHeader</a>&gt; element and how to make better use of it.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.it.ox.ac.uk/jamesc/2013/02/23/self-study-part-4-tei-core-elements/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Self Study (part 3): The TEI Default Text Structure</title>
		<link>http://blogs.it.ox.ac.uk/jamesc/2013/01/31/self-study-part-3-the-tei-default-text-structure/</link>
		<comments>http://blogs.it.ox.ac.uk/jamesc/2013/01/31/self-study-part-3-the-tei-default-text-structure/#comments</comments>
		<pubDate>Thu, 31 Jan 2013 13:04:18 +0000</pubDate>
		<dc:creator>James Cummings</dc:creator>
				<category><![CDATA[SelfStudy]]></category>
		<category><![CDATA[TEI]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://blogs.oucs.ox.ac.uk/jamesc/?p=339</guid>
		<description><![CDATA[This (long) post follows on from posts on a basic one Introducing XML and Markup, and one on an Introduction to the Text Encoding Initiative Guidelines. Neither of these are really complete in themselves and barely scratch the surface, but &#8230; <a href="http://blogs.it.ox.ac.uk/jamesc/2013/01/31/self-study-part-3-the-tei-default-text-structure/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>This (long) post follows on from posts on a basic one <a href="http://blogs.it.ox.ac.uk/jamesc/2012/03/15/self-study-introducing-xml-and-markup/">Introducing XML and Markup</a>, and one on an <a href="http://blogs.it.ox.ac.uk/jamesc/2013/01/23/self-study-part-2-introduction-to-the-text-encoding-initiative-guidelines/">Introduction to the Text Encoding Initiative Guidelines</a>. Neither of these are really complete in themselves and barely scratch the surface, but are offered up as a help should people think them useful.</p>
<p>In this post we look at the overall basic structure of a TEI File. In many ways this is much more concrete than the infrastructure of the TEI where it is possible to get lost in the differences between TEI ODD files and the schemas generated from them, or modules, model classes, and attribute classes. Instead here we&#8217;re looking at the markup that is part of almost every TEI file, its default text structure. Readers may notice that the &#8216;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/DS.html">Default Text Structure&#8217;</a> chapter of the TEI Guidelines comes after two that I&#8217;ve skipped: &#8216;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/HD.html">The TEI Header</a>&#8216; (chapter 2) and the slightly inaccurately named &#8216;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/CO.html">Elements Available in All TEI Documents</a>&#8216; (chapter 3). Have no fear if you are following this set of blog posts, I will be returning to chapter 3 next and then chapter 2, I just feel it is good to get a sense of a TEI file as a whole before learning about all the core elements and metadata.</p>
<h2><strong>A Basic TEI File Structure</strong></h2>
<p>A basic TEI file might look like the image below.</p>
<p><a href="http://blogs.it.ox.ac.uk/jamesc/2013/01/31/self-study-part-3-the-tei-default-text-structure/tei-file-2/" rel="attachment wp-att-359"><img class="aligncenter size-large wp-image-359" src="http://blogs.it.ox.ac.uk/jamesc/files/2013/01/tei-file1-1024x779.png" alt="" width="640" height="486" /></a></p>
<p>In this image the element names are in blue and XML comments (delimited by &lt;!-- and --&gt;) are in green.</p>
<p>An XML file should always start with an XML declaration (here at the top in purple). After that we have a &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-TEI.html">TEI</a>&gt; element in the TEI namespace (<a href="http://www.tei-c.org/ns/1.0">http://www.tei-c.org/ns/1.0</a>). Inside every &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-TEI.html">TEI</a>&gt; element the TEI Guidelines require a &lt;teiHeader&gt; element. In order for this to be a real and valid TEI P5 file, there are some elements which would need to appear inside the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-teiHeader.html">teiHeader</a>&gt; element, but I&#8217;ll talk about those in another post.</p>
<p>After the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-teiHeader.html">teiHeader</a>&gt; element you can have one or more optional &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-facsimile.html">facsimile</a>&gt; or &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-sourceDoc.html">sourceDoc</a>&gt; elements. These are for recording image facsimile information, or for a non-interpretative transcription method sometimes used for creating genetic editions.</p>
<p>After these we have a &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-text.html">text</a>&gt; element. Technically this is optional if you have &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-facsimile.html">facsimile</a>&gt; or &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-sourceDoc.html">sourceDoc</a>&gt; elements but really for most introductory uses of the TEI it is probably a good idea to have a &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-text.html">text</a>&gt; element. If you do use one it has to come last.</p>
<p>Inside a &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-text.html">text</a>&gt; element you can optionally have a &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-front.html">front</a>&gt; element. This is for containing front matter like title pages or prefaces: anything that comes before the main body of the text.</p>
<p>The &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-body.html">body</a>&gt; element is required, because whatever text you are creating (whether a transcription of ancient clay tablets, medieval manuscripts, modern web-pages or teaching slides) it will have a body of some sort. Inside &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-body.html">body</a>&gt; you might get divisions (the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-div.html">div</a>&gt; element) or just paragraphs (the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-p.html">p</a>&gt; element) or a wide variety of other things. (We&#8217;ll talk more about these in a bit.)</p>
<p>The &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-back.html">back</a>&gt; element, which follows the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-body.html">body</a>&gt; element, is optional (as with &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-front.html">front</a>&gt;) and is intended for back matter such as indexes, appendices, bibliographies, addenda, etc.</p>
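<p>As a rough sketch of the structure just described (my own illustration, not taken from the Guidelines), a skeleton TEI file might look like the following. The required header content is deliberately left out, so this is well-formed but not yet valid TEI P5:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;TEI xmlns=&quot;http://www.tei-c.org/ns/1.0&quot;&gt;
  &lt;teiHeader&gt;
    &lt;!-- required header elements (e.g. fileDesc) omitted here --&gt;
  &lt;/teiHeader&gt;
  &lt;!-- optional: one or more facsimile or sourceDoc elements --&gt;
  &lt;text&gt;
    &lt;front&gt;
      &lt;!-- optional front matter: title pages, prefaces, ... --&gt;
    &lt;/front&gt;
    &lt;body&gt;
      &lt;p&gt;The required body holds the main text.&lt;/p&gt;
    &lt;/body&gt;
    &lt;back&gt;
      &lt;!-- optional back matter: indexes, appendices, ... --&gt;
    &lt;/back&gt;
  &lt;/text&gt;
&lt;/TEI&gt;
</pre>
<p>Both &lt;front&gt; and &lt;back&gt; can simply be left out when a text has no front or back matter.</p>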
<p>Now, one of the things you might notice about this is that it brings to bear certain assumptions of the TEI. This default text structure reflects the assumption that most text-bearing objects can be transcribed and edited in a way which resembles something we might usually associate with a codex-like structure (e.g. front matter, the main body stuff, then stuff that comes after). Our association of this with an assumed codex structure is probably a bit misplaced, however. Manuscript rolls, for example, often have optional &#8216;stuff at the top&#8217;, then &#8216;the main body stuff&#8217;, then optional &#8216;stuff at the end&#8217;, and many other cultures and methods of writing text on objects also have such systems. People have used the TEI to successfully encode a huge variety of texts from different times and cultures, so it is unlikely that this structure will impose too much of a semantic burden on your own use of it.</p>
<h2> <strong>The TEI Default Text Structure Chapter</strong></h2>
<p>This is a long chapter which covers a lot of ground. It looks at the default text structure of the TEI (which I&#8217;ve tried to explain briefly above), and then investigates the kinds of things that happen inside the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-text.html">text</a>&gt; element. This includes looking at the types of divisions available inside the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-body.html">body</a>&gt;, &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-front.html">front</a>&gt; and &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-back.html">back</a>&gt; elements and the elements available inside these divisions. It also covers ways of encoding groups of texts (such as anthologies and collections) and virtual divisions that can be generated automatically, such as tables of contents. It also looks at the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-front.html">front</a>&gt; element, title pages, and the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-back.html">back</a>&gt; element.</p>
<p>Read this chapter and, to make sure you have understood it, answer these questions:</p>
<ul>
<li>How might you decide whether a text is <em>unitary</em> or <em>composite</em>?</li>
<li>Personally I have a strong preference for almost always using un-numbered divisions &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-div.html">div</a>&gt; rather than numbered ones &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-div1.html">div1</a>&gt;. In what circumstances might numbered ones be more appropriate to use?</li>
<li>Why does the TEI not use numbered headings (cf. HTML, where there are elements &lt;h1&gt;, &lt;h2&gt;, &lt;h3&gt;, etc.) but just a &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-head.html">head</a>&gt; element?</li>
<li>If you were digitising my love letters (who knows why?!), how would you mark up the closing bit at the end of a letter where I say:</li>
</ul>
<blockquote>
<pre><strong><span style="color: #c53a42">With love and cuddles,</span></strong>
<strong><span style="color: #c53a42">James</span></strong>
<strong><span style="color: #c53a42">xxx</span></strong></pre>
</blockquote>
<ul>
<li>When would you use the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-group.html">group</a>&gt; element rather than have separate TEI files?</li>
<li>What is a &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-floatingText.html">floatingText</a>&gt; element used to indicate? Try to think of examples from your own area of work.</li>
<li>Do the texts you work with have front matter that you would encode in the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-front.html">front</a>&gt; element? How would you encode it? How do you decide to encode something as front matter rather than as the body of the file?</li>
<li>On a title page how would you encode a title that has several parts to it?</li>
<li>Are there differences between what is allowed in &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-front.html">front</a>&gt; and what is allowed in &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-back.html">back</a>&gt;? Why is this the case?</li>
</ul>
<h2><strong>Try it out</strong></h2>
<p>I always think, if possible, it is good to have practical exercises to reinforce things you have learned. If you have time try this:</p>
<ul>
<li>Start up the oXygen editor</li>
<li>Create a new document by going to File &#8594; New and double-clicking to expand &#8216;Framework templates&#8217;; scroll down inside it and do the same to open &#8216;TEI P5&#8217;. Inside this select &#8216;All&#8217;, and click on &#8216;Create&#8217; to open a new document.</li>
<li>Ignoring the schema declarations at the top you should get a file which looks something like this:</li>
</ul>
<p style="text-align: center"><a href="http://blogs.it.ox.ac.uk/jamesc/2013/01/31/self-study-part-3-the-tei-default-text-structure/basicteitemplate/" rel="attachment wp-att-341"><img class="aligncenter  wp-image-341" src="http://blogs.it.ox.ac.uk/jamesc/files/2013/01/basicTEITemplate.png" alt="" width="465" height="370" /></a></p>
<ul>
<li>Assuming you&#8217;ve not turned off automatic document checking, you should have a happy green square in the upper right-hand corner of the editor, near where a scrollbar would appear if the document were longer. This tells you that the document is not only well-formed but also valid according to the rules of the tei_all schema.</li>
<li>Delete the entire paragraph element (including &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-p.html">p</a>&gt; and &lt;/<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-p.html">p</a>&gt; tags) that says:</li>
</ul>
<blockquote>
<pre>&lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-p.html">p</a>&gt;Some text here.&lt;/<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-p.html">p</a>&gt;</pre>
</blockquote>
<ul>
<li>Does that happy green square disappear? Is it angry and red? If document checking is turned on the opening &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-body.html">body</a>&gt; tag should be underlined in red, that happy green square should now be red and there should be a red line part way down the right-hand side indicating where the error is in the document.</li>
<li>At the bottom of the screen there will be an error message, in this case saying &#8216;element “body” incomplete&#8217;, because the schema expects at least one of a number of possible child elements there.</li>
<li>Instead of replacing this paragraph, let&#8217;s add a division. Move the cursor inside the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-body.html">body</a>&gt; element, between the opening tag and the closing &lt;/<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-body.html">body</a>&gt; tag where the paragraph was previously. Press the &lt; key and wait a second; oXygen should helpfully give a drop-down list of the elements allowed by the TEI at this point. Scrolling up and down this list can give you a sense of the vast array of things you could be encoding here, but the list is also a bit of a mixture because at this point a text can either have divisions or go without them. Select the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-div.html">div</a>&gt; element and notice what oXygen does.</li>
<li>oXygen should have added both an opening and a closing division tag: &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-div.html">div</a>&gt;&lt;/<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-div.html">div</a>&gt;. Move the cursor between these two tags and press Enter a couple of times to get some space.</li>
<li>Add a &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-head.html">head</a>&gt; element and inside it put the text content “My First Heading”.</li>
<li>After the closing &lt;/<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-head.html">head</a>&gt; tag, add a paragraph using the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-p.html">p</a>&gt; element and the text “My first paragraph.”</li>
<li>In all cases make sure you only stop when you have a happy green square indicating that your document is well-formed and valid.</li>
<li>Your &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-body.html">body</a>&gt; element should now look something like:</li>
</ul>
<p style="text-align: center"><a href="http://blogs.it.ox.ac.uk/jamesc/2013/01/31/self-study-part-3-the-tei-default-text-structure/bodywithdiv/" rel="attachment wp-att-342"><img class="aligncenter  wp-image-342" src="http://blogs.it.ox.ac.uk/jamesc/files/2013/01/bodyWithDiv.png" alt="" width="385" height="130" /></a></p>
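<p>As text rather than a screenshot, the &lt;body&gt; built in the steps above should be roughly equivalent to:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;body&gt;
  &lt;div&gt;
    &lt;head&gt;My First Heading&lt;/head&gt;
    &lt;p&gt;My first paragraph.&lt;/p&gt;
  &lt;/div&gt;
&lt;/body&gt;
</pre>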
<ul>
<li>Add at least one more division after this. (If you had a document with only one division, you don&#8217;t really need to use the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-div.html">div</a>&gt; element at all.) Inside this second division, try nesting a sub-division!</li>
<li>If you do, your &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-body.html">body</a>&gt; element might look something like:</li>
</ul>
<div><a href="http://blogs.it.ox.ac.uk/jamesc/2013/01/31/self-study-part-3-the-tei-default-text-structure/nesteddiv/" rel="attachment wp-att-343"><img class="aligncenter  wp-image-343" src="http://blogs.it.ox.ac.uk/jamesc/files/2013/01/nestedDiv.png" alt="" width="552" height="325" /></a></div>
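<p>As a purely illustrative example (the headings and text are made up), a second division containing a nested sub-division might be written like this, placed after the first &lt;div&gt; inside &lt;body&gt;:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;div&gt;
  &lt;head&gt;My Second Heading&lt;/head&gt;
  &lt;p&gt;A paragraph in the second division.&lt;/p&gt;
  &lt;!-- a division may contain paragraphs followed by further divisions --&gt;
  &lt;div&gt;
    &lt;head&gt;A Sub-Heading&lt;/head&gt;
    &lt;p&gt;A paragraph in the nested sub-division.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
</pre>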
<ul>
<li>Save your document.</li>
<li>The oxygen-tei framework comes complete with some transformations to other formats. From the oXygen menus choose Document &#8594; Transformation &#8594; Configure Transformation Scenario(s) and select &#8216;TEI P5 XHTML&#8217; and click on &#8216;Apply associated&#8217; (though this may be slightly different if you are using a different version of oXygen).</li>
<li>You should get a minimal HTML rendering of your file appearing in a browser. Note some of the information that the transformation has added. Try some other transformations or changing the document and seeing the effect.</li>
<li>Think about the nature of your own materials and how you might structure them if encoding them according to the default text structure of the TEI!</li>
</ul>
<p>I&#8217;ve intentionally glossed over the introduction of many of the core TEI elements (such as &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-p.html">p</a>&gt;), but don&#8217;t worry we will survey these next time!</p>
<p>Go on to <a href="http://blogs.it.ox.ac.uk/jamesc/2013/02/23/self-study-part-4-tei-core-elements/">Self Study (part 4) TEI Core Elements</a> next!</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.it.ox.ac.uk/jamesc/2013/01/31/self-study-part-3-the-tei-default-text-structure/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Self Study (part 2): Introduction to the Text Encoding Initiative Guidelines</title>
		<link>http://blogs.it.ox.ac.uk/jamesc/2013/01/23/self-study-part-2-introduction-to-the-text-encoding-initiative-guidelines/</link>
		<comments>http://blogs.it.ox.ac.uk/jamesc/2013/01/23/self-study-part-2-introduction-to-the-text-encoding-initiative-guidelines/#comments</comments>
		<pubDate>Wed, 23 Jan 2013 22:57:54 +0000</pubDate>
		<dc:creator>James Cummings</dc:creator>
				<category><![CDATA[SelfStudy]]></category>
		<category><![CDATA[TEI]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://blogs.oucs.ox.ac.uk/jamesc/?p=325</guid>
		<description><![CDATA[Quite a while ago I posted http://blogs.it.ox.ac.uk/jamesc/2012/03/15/self-study-introducing-xml-and-markup/ as a list of reading and steps I would recommend someone follow if they were wanting to learn TEI XML and related technologies. This first step was to learn a little bit about XML &#8230; <a href="http://blogs.it.ox.ac.uk/jamesc/2013/01/23/self-study-part-2-introduction-to-the-text-encoding-initiative-guidelines/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Quite a while ago I posted <a href="http://blogs.it.ox.ac.uk/jamesc/2012/03/15/self-study-introducing-xml-and-markup/">http://blogs.it.ox.ac.uk/jamesc/2012/03/15/self-study-introducing-xml-and-markup/</a> as a list of reading and steps I would recommend someone follow if they were wanting to learn TEI XML and related technologies. This first step was to learn a little bit about XML and markup languages like HTML, to get a bit of background.</p>
<p>The next step I&#8217;d recommend  is to learn a bit more about the Text Encoding Initiative and the Guidelines it produces.</p>
<ul>
<li>Start with:  <a href="http://www.tei-c.org/Support/Learn/intro.xml">http://www.tei-c.org/Support/Learn/intro.xml</a>. This has some basic concepts that are useful in understanding the TEI and the Guidelines that are the main output of the TEI.</li>
</ul>
<p>Questions:</p>
<ol>
<li>What markup language did documents using TEI P1 to TEI P3 use?</li>
<li>How was this changed for TEI P4 and then TEI P5?</li>
<li>In what way is the TEI &#8216;extensible&#8217;?</li>
</ol>
<ul>
<li>Next, read The TEI Infrastructure chapter: <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ST.html">http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ST.html</a>. This is the first main chapter of the TEI P5 Guidelines and describes the infrastructure and framework on which the whole of the TEI is based.</li>
</ul>
<p>Questions:</p>
<ol start="4">
<li>What does &#8216;ODD&#8217; stand for? What can one generate from a TEI ODD file?</li>
<li>What is a TEI module? What is the relationship between modules and chapters?</li>
<li>What language does one use to define a TEI schema?</li>
<li>Why might a single project use more than one schema at different stages in their project workflow?</li>
<li>What is an attribute class? The att.global attribute class provides @xml:id and @n attributes to every element in the TEI; what is the difference between these two attributes? When might it be useful to use @n to number verse lines? When might this be a silly waste of time?</li>
<li>What is the @xml:lang attribute for?</li>
<li>What is the difference between the @rend, @style, and @rendition attributes?</li>
<li>What is @xml:space for?</li>
<li>What is a TEI model class, and what do members of the same class share?</li>
<li>Why are model and attribute classes a good idea?</li>
<li>What is a TEI datatype?</li>
</ol>
<p>Note: If you are confused about modules vs model classes vs attribute classes, the following blog post might help: <a href="http://blogs.it.ox.ac.uk/jamesc/2008/09/01/modules-vs-model-classes-vs-attribute-classes/">http://blogs.it.ox.ac.uk/jamesc/2008/09/01/modules-vs-model-classes-vs-attribute-classes/</a></p>
<ul>
<li>Next, familiarise yourself with the table of contents of the TEI Guidelines: <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/index-toc.html">http://www.tei-c.org/release/doc/tei-p5-doc/en/html/index-toc.html</a></li>
<li>And then browse <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/REF-ELEMENTS.html">http://www.tei-c.org/release/doc/tei-p5-doc/en/html/REF-ELEMENTS.html</a> which contains a complete list of elements provided by the TEI.</li>
<li>Choose a couple of elements whose purpose you think you might know, and click on them to explore their reference pages. For example, <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-address.html">http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-address.html</a></li>
<li>The information on this page may seem confusing, but it lists:</li>
<ul>
<li>the element&#8217;s definition; which &#8216;module&#8217; it comes from</li>
<li>what attributes it has (and if they come from attribute classes)</li>
<li>what model classes the element might claim membership of (which controls where it is allowed to appear in your document)</li>
<li>a list (by module) of those elements which are allowed to contain this element</li>
<li>a list (by module) of which elements this element is allowed to contain</li>
<li>a declaration of the content model of the element (which can be toggled between Relax NG compact syntax and XML syntax)</li>
<li>one or more examples</li>
<li>possibly some additional notes on usage.</li>
</ul>
</ul>
<p>Questions:</p>
<ul>
<li>The <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-address.html">address</a> element does not define any attributes of its own. How does this compare in layout to the <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-availability.html">availability</a> element? What attribute does this (at time of writing) define for itself rather than getting it from a class?</li>
<li>The <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-address.html">address</a> element has two examples; what is the difference between them?</li>
<li>If you click on the &#8216;Show all&#8217; link in one of the examples, what do you get? Notice, for example, how <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-address.html">address</a> is used inside the <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-publicationStmt.html">publicationStmt</a> element to give the address of the publisher of the electronic text.</li>
</ul>
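<p>A small sketch of that pattern, with &lt;address&gt; inside &lt;publicationStmt&gt;, is given below; the publisher name and address lines are invented placeholders, not the Guidelines&#8217; own example:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;publicationStmt&gt;
  &lt;!-- hypothetical publisher and address, for illustration only --&gt;
  &lt;publisher&gt;Example Publishing Project&lt;/publisher&gt;
  &lt;address&gt;
    &lt;addrLine&gt;1 Example Street&lt;/addrLine&gt;
    &lt;addrLine&gt;Oxford&lt;/addrLine&gt;
  &lt;/address&gt;
&lt;/publicationStmt&gt;
</pre>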
<p>This is a very basic survey of some of the initial things you might want to learn before diving into the Guidelines in more detail.  I plan to continue this with similar directed reading and questions on some of the topics they cover in the future. In fact, the next post in this series is <a href="http://blogs.it.ox.ac.uk/jamesc/2013/01/31/self-study-part-3-the-tei-default-text-structure/">http://blogs.it.ox.ac.uk/jamesc/2013/01/31/self-study-part-3-the-tei-default-text-structure/</a> which looks at the TEI&#8217;s Default Text Structure.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.it.ox.ac.uk/jamesc/2013/01/23/self-study-part-2-introduction-to-the-text-encoding-initiative-guidelines/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tokenizing and grouping rhyme schemes with XSLT functions</title>
		<link>http://blogs.it.ox.ac.uk/jamesc/2012/11/28/tokenizing-and-grouping-rhyme-schemes-with-xslt-functions/</link>
		<comments>http://blogs.it.ox.ac.uk/jamesc/2012/11/28/tokenizing-and-grouping-rhyme-schemes-with-xslt-functions/#comments</comments>
		<pubDate>Wed, 28 Nov 2012 18:45:56 +0000</pubDate>
		<dc:creator>James Cummings</dc:creator>
				<category><![CDATA[TEI]]></category>
		<category><![CDATA[XML]]></category>
		<category><![CDATA[XSLT]]></category>

		<guid isPermaLink="false">http://blogs.oucs.ox.ac.uk/jamesc/?p=310</guid>
		<description><![CDATA[There is a project I work for which has encoded rhyme schemes in TEI using the @rhyme attribute on &#60;lg&#62; elements.  This contains some complex strings as they have used parentheses to indicate an internal rhyme and asterisks to indicate &#8230; <a href="http://blogs.it.ox.ac.uk/jamesc/2012/11/28/tokenizing-and-grouping-rhyme-schemes-with-xslt-functions/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>There is a project I work for which has encoded rhyme schemes in TEI using the @rhyme attribute on &lt;lg&gt; elements.  This contains some complex strings as they have used parentheses to indicate an internal rhyme and asterisks to indicate whether a particular rhyme is a feminine (multi-syllable) rhyme. Rhymes are also marked   So for example you get values that look like:</p>
<blockquote><p>rhyme="(a*)a*(a*)b(c*)c*(c*)bddee(f)fg(h)hg/"</p></blockquote>
<p>But at any particular point I need to be able to get four things from this string:</p>
<ol>
<li>The rhyme documented in the scheme above for the current &lt;rhyme&gt; element that I&#8217;m processing</li>
<li>Whether the current rhyme is an internal (parentheses) or a feminine (asterisk) rhyme or not.</li>
<li>The set of rhymes for the current line</li>
<li>Whether the current line has any internal (parentheses) or feminine (asterisk) rhymes or not.</li>
</ol>
<p>So the first step is to tokenize the given rhyme scheme. I do this with an XSLT function; if I want to output the result, I could have something like:</p>
<pre class="brush: xml; title: ; notranslate">
 &lt;xsl:variable name=&quot;rhyme&quot;&gt;
(a*)a*(a*)b(c*)c*(c*)bddee(f)fg(h)hg/
&lt;/xsl:variable&gt;
&lt;tokenized-rhymes&gt;
  &lt;xsl:copy-of select=&quot;jc:tokenizeRhymes($rhyme)&quot;/&gt;
&lt;/tokenized-rhymes&gt;
</pre>
<p>Here, inside some unseen template, I&#8217;ve got a variable with the rhyme scheme in it, and I&#8217;m getting a copy-of the output of a function I&#8217;ve created called jc:tokenizeRhymes(). This isn&#8217;t a very difficult XSLT function; it just uses xsl:analyze-string, like so:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;xsl:function name=&quot;jc:tokenizeRhymes&quot; as=&quot;item()*&quot;&gt;
&lt;xsl:param name=&quot;rhyme&quot;/&gt;
&lt;xsl:variable name=&quot;rhymes&quot;&gt;
&lt;list&gt;
    &lt;xsl:analyze-string select=&quot;$rhyme&quot; regex=&quot;\(*[a-zA-Z]\**\)*&quot;&gt;
        &lt;xsl:matching-substring&gt;
            &lt;item&gt;
                &lt;xsl:value-of select=&quot;.&quot;/&gt;
            &lt;/item&gt;
        &lt;/xsl:matching-substring&gt;
        &lt;xsl:non-matching-substring/&gt;
    &lt;/xsl:analyze-string&gt;
&lt;/list&gt;
&lt;/xsl:variable&gt;
&lt;xsl:copy-of select=&quot;$rhymes&quot;/&gt;
&lt;/xsl:function&gt;
</pre>
<p>All this does is define a function which takes a single parameter (rhyme) and creates a variable containing a list with a bunch of items inside. To do this it uses xsl:analyze-string with a regular expression which looks for zero or more opening parentheses \(*, then a single letter from a-zA-Z, then zero or more asterisks \**, followed by zero or more closing parentheses \)* &#8230; see, simple. The output from this looks like:</p>
<pre class="brush: xml; title: ; notranslate">

  &lt;list&gt;
         &lt;item&gt;(a*)&lt;/item&gt;
         &lt;item&gt;a*&lt;/item&gt;
         &lt;item&gt;(a*)&lt;/item&gt;
         &lt;item&gt;b&lt;/item&gt;
         &lt;item&gt;(c*)&lt;/item&gt;
         &lt;item&gt;c*&lt;/item&gt;
         &lt;item&gt;(c*)&lt;/item&gt;
         &lt;item&gt;b&lt;/item&gt;
         &lt;item&gt;d&lt;/item&gt;
         &lt;item&gt;d&lt;/item&gt;
         &lt;item&gt;e&lt;/item&gt;
         &lt;item&gt;e&lt;/item&gt;
         &lt;item&gt;(f)&lt;/item&gt;
         &lt;item&gt;f&lt;/item&gt;
         &lt;item&gt;g&lt;/item&gt;
         &lt;item&gt;(h)&lt;/item&gt;
         &lt;item&gt;h&lt;/item&gt;
         &lt;item&gt;g&lt;/item&gt;
      &lt;/list&gt;
</pre>
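<p>The snippets in this post assume an XSLT 2.0 stylesheet in which the jc: and xs: prefixes are declared. A minimal test harness might look like the following; the jc namespace URI and the template name main are my own assumptions (the full stylesheet linked at the end of this post may differ). With Saxon, a named initial template like this can be run using the -it:main option.</p>
<pre class="brush: xml; title: ; notranslate">
&lt;xsl:stylesheet version=&quot;2.0&quot;
    xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;
    xmlns:xs=&quot;http://www.w3.org/2001/XMLSchema&quot;
    xmlns:jc=&quot;http://example.org/ns/jc&quot;
    exclude-result-prefixes=&quot;xs jc&quot;&gt;

  &lt;xsl:output method=&quot;xml&quot; indent=&quot;yes&quot;/&gt;

  &lt;!-- paste the jc:tokenizeRhymes function (and the other jc: functions) here --&gt;

  &lt;!-- hypothetical entry point: outputs the tokenized rhyme scheme --&gt;
  &lt;xsl:template name=&quot;main&quot;&gt;
    &lt;xsl:variable name=&quot;rhyme&quot; select=&quot;'(a*)a*(a*)b(c*)c*(c*)bddee(f)fg(h)hg/'&quot;/&gt;
    &lt;tokenized-rhymes&gt;
      &lt;xsl:copy-of select=&quot;jc:tokenizeRhymes($rhyme)&quot;/&gt;
    &lt;/tokenized-rhymes&gt;
  &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;
</pre>
<p>Running this should output the same &lt;list&gt; of items shown above, wrapped in a &lt;tokenized-rhymes&gt; element.</p>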
<p>Getting the current rhyme when I&#8217;m processing a rhyme is then fairly easy. I just create a variable $rhymePosition (the number of the rhyme I&#8217;m currently on) and can then call another function, jc:getCurrentRhyme, with that and the rhyme variable.</p>
<pre class="brush: xml; title: ; notranslate">
&lt;xsl:variable name=&quot;currentRhyme&quot;&gt;
  &lt;xsl:value-of select=&quot;jc:getCurrentRhyme($rhyme, $rhymePosition)&quot;/&gt;
&lt;/xsl:variable&gt;
</pre>
<p>The jc:getCurrentRhyme function is fairly straightforward as well.  It looks like:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;xsl:function name=&quot;jc:getCurrentRhyme&quot; as=&quot;item()*&quot;&gt;
   &lt;xsl:param name=&quot;rhyme&quot;/&gt;
   &lt;xsl:param name=&quot;currentRhyme&quot; as=&quot;xs:integer&quot;/&gt;
   &lt;xsl:variable name=&quot;rhymes&quot; select=&quot;jc:tokenizeRhymes($rhyme)&quot;/&gt;
   &lt;xsl:copy-of select=&quot;$rhymes/list/item[$currentRhyme]&quot;/&gt;
&lt;/xsl:function&gt;
</pre>
<p>It takes two parameters: $rhyme, and $currentRhyme (an integer giving how many rhymes there have been so far in the &lt;lg&gt;, including the one we are processing). It then creates a new variable $rhymes holding the output of jc:tokenizeRhymes above. Getting the current rhyme from the list is then easy: we know its number, so we just copy the &lt;item&gt; we&#8217;ve created in that variable using xsl:copy-of, filtering it by the number $currentRhyme. (This is why we made sure that this parameter was an integer: a numeric predicate selects an item by its position, whereas a non-numeric value such as a node would be treated as a true/false test.)</p>
<p>Checking whether this is an internal or a feminine rhyme is now very straightforward: we just test the $currentRhyme we&#8217;ve created above with contains($currentRhyme, ')') or contains($currentRhyme, '*').</p>
<p>To get all the rhymes for this line, we need to re-process this tokenized list somewhat. We want to group those items which have parentheses together with the letter which follows them, splitting after each non-parenthesised letter (optionally followed by an asterisk). It took me a while to get my brain around that, but eventually I came up with:</p>
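<p>If these checks are needed in several places, it may be tidier to wrap them in small functions. The names jc:isInternalRhyme and jc:isFeminineRhyme below are hypothetical (they are not in the original stylesheet), and this is only a sketch:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;!-- Hypothetical helpers: each takes one tokenized rhyme such as '(c*)' --&gt;
&lt;xsl:function name=&quot;jc:isInternalRhyme&quot; as=&quot;xs:boolean&quot;&gt;
  &lt;xsl:param name=&quot;rhymeToken&quot; as=&quot;xs:string&quot;/&gt;
  &lt;!-- internal rhymes are wrapped in parentheses --&gt;
  &lt;xsl:sequence select=&quot;contains($rhymeToken, ')')&quot;/&gt;
&lt;/xsl:function&gt;

&lt;xsl:function name=&quot;jc:isFeminineRhyme&quot; as=&quot;xs:boolean&quot;&gt;
  &lt;xsl:param name=&quot;rhymeToken&quot; as=&quot;xs:string&quot;/&gt;
  &lt;!-- feminine rhymes carry an asterisk --&gt;
  &lt;xsl:sequence select=&quot;contains($rhymeToken, '*')&quot;/&gt;
&lt;/xsl:function&gt;
</pre>
<p>Because the value passed in is atomized and converted to a string, the $currentRhyme variable can be used directly, e.g. test="jc:isInternalRhyme($currentRhyme)".</p>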
<pre class="brush: xml; title: ; notranslate">
&lt;xsl:function name=&quot;jc:groupRhymes&quot; as=&quot;item()*&quot;&gt;
&lt;xsl:param name=&quot;rhyme&quot;/&gt;
&lt;xsl:variable name=&quot;rhymes&quot; select=&quot;jc:tokenizeRhymes($rhyme)&quot;/&gt;
&lt;xsl:variable name=&quot;groupedRhymes&quot;&gt;
  &lt;list&gt;
   &lt;xsl:for-each-group select=&quot;$rhymes/list/item&quot;
      group-ending-with=&quot;*[matches(., '^[a-zA-Z]\**$')]&quot;&gt;
     &lt;item&gt;
      &lt;list&gt;
       &lt;xsl:for-each select=&quot;current-group()&quot;&gt;
        &lt;item&gt;
         &lt;xsl:value-of select=&quot;.&quot;/&gt;
        &lt;/item&gt;
       &lt;/xsl:for-each&gt;
      &lt;/list&gt;
     &lt;/item&gt;
    &lt;/xsl:for-each-group&gt;
  &lt;/list&gt;
&lt;/xsl:variable&gt;
&lt;xsl:copy-of select=&quot;$groupedRhymes&quot;/&gt;
&lt;/xsl:function&gt;
</pre>
<p>This function takes the parameter $rhyme and tokenizes it using the earlier function, so we now have a list with individual items in it. It then creates a new list and uses xsl:for-each-group to select all the tokenized items. It creates groups ending with any item whose entire content is a single letter optionally followed by asterisks (that is, a rhyme not in parentheses). This means each group ends with a normal rhyme letter, and any internal rhymes (in parentheses) before it are included in that group. For each group it outputs a new item with a nested list, and makes each rhyme in that line an item in that nested list. This might seem overkill to some, but having the extra nesting, regardless of whether there are 1, 2, or 20 rhymes in the line, just makes things easier. So the output from this looks like:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;list&gt;
&lt;item&gt;
    &lt;list&gt;
        &lt;item&gt;(a*)&lt;/item&gt;
        &lt;item&gt;a*&lt;/item&gt;
    &lt;/list&gt;
&lt;/item&gt;
&lt;item&gt;
    &lt;list&gt;
        &lt;item&gt;(a*)&lt;/item&gt;
        &lt;item&gt;b&lt;/item&gt;
    &lt;/list&gt;
&lt;/item&gt;
&lt;item&gt;
    &lt;list&gt;
        &lt;item&gt;(c*)&lt;/item&gt;
        &lt;item&gt;c*&lt;/item&gt;
    &lt;/list&gt;
&lt;/item&gt;
&lt;item&gt;
    &lt;list&gt;
        &lt;item&gt;(c*)&lt;/item&gt;
        &lt;item&gt;b&lt;/item&gt;
    &lt;/list&gt;
&lt;/item&gt;
&lt;item&gt;
    &lt;list&gt;
        &lt;item&gt;d&lt;/item&gt;
    &lt;/list&gt;
&lt;/item&gt;
&lt;item&gt;
    &lt;list&gt;
        &lt;item&gt;d&lt;/item&gt;
    &lt;/list&gt;
&lt;/item&gt;
&lt;item&gt;
    &lt;list&gt;
        &lt;item&gt;e&lt;/item&gt;
    &lt;/list&gt;
&lt;/item&gt;
&lt;item&gt;
    &lt;list&gt;
        &lt;item&gt;e&lt;/item&gt;
    &lt;/list&gt;
&lt;/item&gt;
&lt;item&gt;
    &lt;list&gt;
        &lt;item&gt;(f)&lt;/item&gt;
        &lt;item&gt;f&lt;/item&gt;
    &lt;/list&gt;
&lt;/item&gt;
&lt;item&gt;
    &lt;list&gt;
        &lt;item&gt;g&lt;/item&gt;
    &lt;/list&gt;
&lt;/item&gt;
&lt;item&gt;
    &lt;list&gt;
        &lt;item&gt;(h)&lt;/item&gt;
        &lt;item&gt;h&lt;/item&gt;
    &lt;/list&gt;
&lt;/item&gt;
&lt;item&gt;
    &lt;list&gt;
        &lt;item&gt;g&lt;/item&gt;
    &lt;/list&gt;
&lt;/item&gt;
&lt;/list&gt;
</pre>
<p>This is, admittedly, fairly verbose. But you can now write a function that gets just the items for the individual line you are interested in, which would look something like:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;xsl:function name=&quot;jc:getCurrentLineRhymes&quot; as=&quot;item()*&quot;&gt;
  &lt;xsl:param name=&quot;rhyme&quot;/&gt;
  &lt;xsl:param name=&quot;currentLine&quot; as=&quot;xs:integer&quot;/&gt;
  &lt;xsl:variable name=&quot;rhymes&quot; select=&quot;jc:groupRhymes($rhyme)&quot;/&gt;
  &lt;xsl:copy-of select=&quot;$rhymes/list/item[$currentLine]&quot;/&gt;
&lt;/xsl:function&gt;
</pre>
<p>When called with something like:</p>
<pre class="brush: xml; title: ; notranslate">
 &lt;xsl:copy-of select=&quot;jc:getCurrentLineRhymes($rhyme, 4)&quot;/&gt;
</pre>
<p>(where &#8216;4&#8217; would usually be a variable containing the current line number) it will produce something like:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;item&gt;
 &lt;list&gt;
  &lt;item&gt;(c*)&lt;/item&gt;
  &lt;item&gt;b&lt;/item&gt;
 &lt;/list&gt;
&lt;/item&gt;
</pre>
<p>A simple string test using contains() can again tell you whether this line has any feminine (asterisk) or internal (parentheses) rhymes, etc.</p>
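<p>Following the same pattern, a hypothetical jc:lineRhymeTypes function (again not part of the original stylesheet) could report which kinds of rhyme occur in a given line, building on jc:getCurrentLineRhymes:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;!-- Hypothetical: returns 'internal' and/or 'feminine' for the given line,
     or an empty sequence if the line has neither --&gt;
&lt;xsl:function name=&quot;jc:lineRhymeTypes&quot; as=&quot;xs:string*&quot;&gt;
  &lt;xsl:param name=&quot;rhyme&quot;/&gt;
  &lt;xsl:param name=&quot;currentLine&quot; as=&quot;xs:integer&quot;/&gt;
  &lt;!-- string value of the line's item, e.g. '(c*)b' for line 4 --&gt;
  &lt;xsl:variable name=&quot;lineRhymes&quot;
      select=&quot;string(jc:getCurrentLineRhymes($rhyme, $currentLine))&quot;/&gt;
  &lt;xsl:sequence select=&quot;(if (contains($lineRhymes, ')')) then 'internal' else (),
                         if (contains($lineRhymes, '*')) then 'feminine' else ())&quot;/&gt;
&lt;/xsl:function&gt;
</pre>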
<p>Hurrah! See, that wasn&#8217;t so difficult after all. This makes a good example of using XSLT 2.0 functions that call other functions to break the overall task down into manageable, more object-oriented tasks which can be re-used for a variety of purposes. (There are a lot of efficiencies which could be implemented here&#8230; for example, jc:getCurrentLineRhymes and jc:getCurrentRhyme are almost identical, except that one uses jc:groupRhymes() and the other uses jc:tokenizeRhymes(). These could be one function which tests a parameter to see which is intended.)</p>
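<p>As a sketch of that suggestion, the two functions could be merged into one that takes an extra boolean parameter. The function name jc:getRhymeItem and the $grouped parameter are my own invention, not part of the original stylesheet:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;!-- Hypothetical merged function: $grouped = true() returns the $index-th
     line (via jc:groupRhymes); false() returns the $index-th individual
     rhyme (via jc:tokenizeRhymes) --&gt;
&lt;xsl:function name=&quot;jc:getRhymeItem&quot; as=&quot;item()*&quot;&gt;
  &lt;xsl:param name=&quot;rhyme&quot;/&gt;
  &lt;xsl:param name=&quot;index&quot; as=&quot;xs:integer&quot;/&gt;
  &lt;xsl:param name=&quot;grouped&quot; as=&quot;xs:boolean&quot;/&gt;
  &lt;xsl:variable name=&quot;rhymes&quot;
      select=&quot;if ($grouped) then jc:groupRhymes($rhyme)
              else jc:tokenizeRhymes($rhyme)&quot;/&gt;
  &lt;xsl:copy-of select=&quot;$rhymes/list/item[$index]&quot;/&gt;
&lt;/xsl:function&gt;
</pre>
<p>Called as jc:getRhymeItem($rhyme, 4, true()) it should return the same item as jc:getCurrentLineRhymes($rhyme, 4), and with false() it behaves like jc:getCurrentRhyme.</p>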
<p>The whole XSLT stylesheet is available from <a href="https://github.com/jamescummings/conluvies/blob/master/xslt-misc/tokenize-rhyme-test.xsl">https://github.com/jamescummings/conluvies/blob/master/xslt-misc/tokenize-rhyme-test.xsl</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.it.ox.ac.uk/jamesc/2012/11/28/tokenizing-and-grouping-rhyme-schemes-with-xslt-functions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Teaching the TEI-Panel</title>
		<link>http://blogs.it.ox.ac.uk/jamesc/2012/11/22/teaching-the-tei-panel/</link>
		<comments>http://blogs.it.ox.ac.uk/jamesc/2012/11/22/teaching-the-tei-panel/#comments</comments>
		<pubDate>Thu, 22 Nov 2012 13:56:12 +0000</pubDate>
		<dc:creator>James Cummings</dc:creator>
				<category><![CDATA[Conference]]></category>
		<category><![CDATA[TEI]]></category>

		<guid isPermaLink="false">http://blogs.oucs.ox.ac.uk/jamesc/?p=304</guid>
		<description><![CDATA[As part of the Text Encoding Initiative Consortium&#8217;s annual conference I participated in a panel organised by Elena Pierazzo called “Teaching the TEI: from training to academic curricula” see http://idhmc.tamu.edu/teiconference/program/papers/#teach for the abstract. Florence Clavaud and Susan Schreibman were unable &#8230; <a href="http://blogs.it.ox.ac.uk/jamesc/2012/11/22/teaching-the-tei-panel/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>As part of the Text Encoding Initiative Consortium&#8217;s annual conference I participated in a panel organised by Elena Pierazzo called “Teaching the TEI: from training to academic curricula”; see <a href="http://idhmc.tamu.edu/teiconference/program/papers/#teach">http://idhmc.tamu.edu/teiconference/program/papers/#teach</a> for the abstract. Florence Clavaud and Susan Schreibman were unable to attend, and so at the very last moment Julia Flanders from Brown University graciously agreed to join the panel. The panel consisted of: Elena Pierazzo, Marjorie Burghart, James Cummings, and Julia Flanders.</p>
<p>Elena Pierazzo started off the panel by introducing what it would cover. It looked at the differences and similarities in teaching the TEI in a range of contexts: from a dedicated intensive workshop targeted at professionals to the teaching of TEI as part of a related academic course. These differ in aims, methodologies, and overall coverage, and the syllabus of each of these types of teaching might cover different chapters of the TEI Guidelines.</p>
<p>The panel discussed which of these approaches seemed most successful, and what was meant by success when teaching the TEI. A central question was whether the TEI works better as a tool to solve a problem researchers are currently facing (e.g. a digital edition of a manuscript, a dictionary, a corpus…) or as a method for approaching analysis and a tool for modelling concepts. Throughout the panel these two types of teaching were contrasted to see what lessons from each might benefit the other.</p>
<p>Marjorie Burghart compared the BA and MA level training provided in Lyon with Elena’s examples. She insisted on the importance of embedding TEI teaching in other disciplines, giving the example of one of her courses where students are taught editorial techniques as a whole, from the historical development of diplomatics and philology to their digital translation. Her central message was that not all TEI teaching is done in “TEI courses”, or even in courses specifically about digital technology; some of it occurs in subject-specific academic courses that happen to include sections about the TEI.</p>
<p>James Cummings briefly surveyed the types of TEI training provided at the University of Oxford, noting that the Digital.Humanities@Oxford Summer School (<a href="http://digital.humanities.ox.ac.uk/dhoxss/">http://digital.humanities.ox.ac.uk/dhoxss/</a>) evolved out of many years of TEI Summer Schools and now always includes a week-long TEI workshop. The introductory TEI workshop in such a context tends to cover a large amount of the TEI Guidelines, giving a broad, shallow, and intensive overview. He mentioned that they also provide bespoke training for individual research projects, where the whole of the TEI is not taught; instead a brief overview is followed by specific training in the aspects that project will be using.</p>
<p>Julia Flanders described the workshops they teach at Brown University and those provided at DHSI.org, and how these compare to those in Oxford and differ from those that form part of larger academic courses. She discussed various approaches to teaching the underlying concepts and how existing tools such as Roma might be improved to facilitate this. She suggested that introductory tools allowing ‘Finger painting with semantics’ should be created to let people play with the concepts of data modelling in a user-friendly manner.</p>
<p>There was much wide-ranging discussion with the audience, with many interesting points made and questions raised. Several participants mentioned that they used text encoding generally, and the TEI in particular, to teach different things. That is, the process of learning the TEI helps students to understand more about other topics (e.g. the nature of text). Michael Gavin commented that it would be nice to have a survey of both TEI courses and courses which include the TEI in higher education. Marjorie mentioned that Florence Clavaud (EnC) had started a similar survey for France / Europe, and that it would be good to get in touch with her.</p>
<p>The TEI is taught in a variety of different ways, and the more teaching of it the better, but what has to be closely examined by providers is why any particular course is being offered. Is it to induct a large number of people into a basic understanding of the scope and coverage of the TEI Guidelines? Is it to teach them the practical skills to undertake the work on one particular research project? Or is it to teach them other, more ethereal concepts, of which the TEI is one practical and concrete example? The teachers on this panel had all taught in a variety of these kinds of contexts, and the differences in approach and coverage made for an interesting comparison. As the TEI continues to grow and be used more pervasively as the de facto standard for the encoding of digital texts (especially in academic contexts), the community will need to continue to improve its teaching and the organization of that teaching. One promising sign is the network of Digital Humanities training institutes (see those referenced at <a href="http://digital.humanities.ox.ac.uk/dhoxss/">http://digital.humanities.ox.ac.uk/dhoxss/</a>), which are slowly cooperating to produce a consistent pedagogical basis while retaining their unique character and experiences.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.it.ox.ac.uk/jamesc/2012/11/22/teaching-the-tei-panel/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>More about @rend</title>
		<link>http://blogs.it.ox.ac.uk/jamesc/2012/03/26/more-about-rend/</link>
		<comments>http://blogs.it.ox.ac.uk/jamesc/2012/03/26/more-about-rend/#comments</comments>
		<pubDate>Sun, 25 Mar 2012 23:08:43 +0000</pubDate>
		<dc:creator>James Cummings</dc:creator>
				<category><![CDATA[TEI]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://blogs.oucs.ox.ac.uk/jamesc/?p=294</guid>
		<description><![CDATA[Lou Burnard has provided a technical summary of some of the recently discussed issues concerning @rend, but I thought I might provide some more explanation for those not as familiar with the technical background to the discussion. I would have &#8230; <a href="http://blogs.it.ox.ac.uk/jamesc/2012/03/26/more-about-rend/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Lou Burnard has provided <a href="http://tei-l.970651.n3.nabble.com/CSS-in-rend-was-Re-Why-no-space-etc-in-DIV-s-type-tp3850195p3852304.html">a technical summary of some of the recently discussed issues concerning @rend</a>, but I thought I might provide some more explanation for those not as familiar with the technical background to the discussion. I would have done so sooner, but I was driving around on too-narrow farm roads in Cornwall on holiday without much reception on my phone. What follows are my own opinions and interpretations of the TEI Guidelines, which are continually evolving based on community consensus.</p>
<h2>The @rend attribute</h2>
<p>The <a href="http://www.tei-c.org/">TEI</a> provides a <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.global.html">@rend</a> attribute which indicates an interpretation of how the element in question was rendered or presented in the source text. It has nothing to say about what should be done with the element in any particular output from processing or displaying the TEI text. Many people assume that processing TEI means outputting HTML designed to help you read the text, but this is not necessarily the case. A TEI text might have any number of outputs: just for reading, it might be HTML, ePub, PDF, DOCX, and many more. Moreover, those encoding the texts might not intend to read them at all, but to process them for other forms of text analysis in any number of formats. While individual projects can provide documentation on how they intend certain elements to be presented in particular forms of output, other people processing those texts could choose to do something completely different.</p>
<h2>@type and @rend values and their whitespace</h2>
<p>During a TEI-L discussion concerning <a href="http://tei-l.970651.n3.nabble.com/Why-no-space-etc-in-DIV-s-type-td3840709.html">why the @type attribute did not allow spaces</a>, it was explained that this is because the @type attribute does not contain free text, but a special token that categorises the element in some way. Moreover, the recommended practice is for projects to customise the TEI to constrain the choices available for the value of the @type attribute on some elements, and to document in their customisation exactly what those special tokens mean. @type attribute values are of the datatype data.enumerated, which means that they are &#8220;expressed as a single XML name taken from a list of documented possibilities&#8221;. That means that this value has to obey the rules of what it means to be an XML name, and it should be from a set list that the project has documented (preferably in its TEI customisation, but possibly just in prose documentation preserved with the TEI file). Most elements that have a @type attribute get it from their membership in the att.typed attribute class, and, if a secondary type classification is allowed, they also get @subtype.</p>
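<p>As an illustration only, a project that wanted to constrain @type on &lt;div&gt; might include something like the following in its ODD customisation. The values &#8216;chapter&#8217; and &#8216;letter&#8217; are hypothetical, and the @mode combinations accepted may depend on the ODD processor and TEI release in use, so the generated schema is worth checking:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;elementSpec ident=&quot;div&quot; mode=&quot;change&quot;&gt;
  &lt;attList&gt;
    &lt;attDef ident=&quot;type&quot; mode=&quot;change&quot;&gt;
      &lt;!-- A closed list: only these documented tokens will be valid --&gt;
      &lt;valList type=&quot;closed&quot; mode=&quot;replace&quot;&gt;
        &lt;valItem ident=&quot;chapter&quot;&gt;
          &lt;desc&gt;A chapter of the work.&lt;/desc&gt;
        &lt;/valItem&gt;
        &lt;valItem ident=&quot;letter&quot;&gt;
          &lt;desc&gt;A letter embedded in the narrative.&lt;/desc&gt;
        &lt;/valItem&gt;
      &lt;/valList&gt;
    &lt;/attDef&gt;
  &lt;/attList&gt;
&lt;/elementSpec&gt;
</pre>
<p>Each &lt;desc&gt; is where the project records what its special token means, which is exactly the documentation the TEI-L discussion recommended.</p>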
<p>The discussion moved on (possibly because I referenced <a href="http://blogs.it.ox.ac.uk/jamesc/2011/12/01/rend-and-the-war-on-text-bearing-attributes/">my earlier post on @rend</a>) to <a href="http://tei-l.970651.n3.nabble.com/CSS-in-rend-was-Re-Why-no-space-etc-in-DIV-s-type-td3850195.html">the @rend attribute and the use of CSS inside it</a>. With the @rend attribute, however, the situation is slightly more confusing. It allows one or more occurrences of the datatype <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-data.word.html">data.word</a>. A data.word datatype &#8220;defines the range of attribute values expressed as a single word or token.&#8221; As I&#8217;ve discussed elsewhere, this means that if someone marks up a text using:</p>
<blockquote><p>&lt;hi rend=&quot;It looks a bit like that other one&quot;&gt;text&lt;/hi&gt;</p></blockquote>
<p>This actually has 8 tokens: “It”, “looks”, “a”, “bit”, “like”, “that”, “other”, “one”. The point is that the whitespace between these words in the attribute makes each of them a separate value or token, not a phrase. The encoder might just as well have written:</p>
<blockquote><p>&lt;hi rend=&quot;big bold beautiful&quot;&gt;text&lt;/hi&gt;</p></blockquote>
<p>or indeed</p>
<blockquote><p>&lt;hi rend=&quot;largeStyle42&quot;&gt;text&lt;/hi&gt;</p></blockquote>
<p>The data.word datatype says that &#8220;Attributes using this datatype must contain a single ‘word’ which contains only letters, digits, punctuation characters, or symbols: thus it cannot include whitespace.&#8221;</p>
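<p>To see how a processor reads such values, here is a small XSLT 2.0 sketch that splits @rend on whitespace, the way a list of data.word tokens is read. It assumes the stylesheet declares the usual xs prefix and sets xpath-default-namespace to http://www.tei-c.org/ns/1.0 so that match=&quot;hi[@rend]&quot; finds TEI elements. For the first example above it would report eight tokens:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;xsl:template match=&quot;hi[@rend]&quot;&gt;
  &lt;!-- normalize-space() trims leading/trailing whitespace so no empty tokens appear --&gt;
  &lt;xsl:variable name=&quot;tokens&quot; as=&quot;xs:string*&quot;
    select=&quot;tokenize(normalize-space(@rend), '\s+')&quot;/&gt;
  &lt;!-- Report how many separate tokens the attribute really contains --&gt;
  &lt;xsl:message select=&quot;count($tokens), 'token(s):', string-join($tokens, ' | ')&quot;/&gt;
  &lt;xsl:apply-templates/&gt;
&lt;/xsl:template&gt;
</pre>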
<p>Some encoders believe that the TEI should reverse its decision on free text in attributes and allow @rend to contain &#8220;It looks like that other one&#8221; and this not to be a set of discrete tokens. Personally, I disagree and feel that would be a retrograde step.</p>
<h2>@rend values and their order</h2>
<p>Other than defining it as a set of data.word occurrences, the TEI does not dictate what the @rend values should look like. In my opinion it would be wrong if the TEI tried to codify all the possible rendition values that appear in every sort of text. Moreover, describing the way something appears in a text is always an interpretative process, and two separate encoders looking at the same text, or looking at it for different reasons, might perceive it in very different ways. In fact the Guidelines explicitly say:</p>
<blockquote><p>&#8220;These Guidelines make no binding recommendations for the values of the rend attribute; the characteristics of visual presentation vary too much from text to text and the decision to record or ignore individual characteristics varies too much from project to project.&#8221; (<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.global.html">http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.global.html</a>)</p></blockquote>
<p>Some encoders believe that it is a shame that the TEI has not defined a syntax by which they should specify the @rend attribute values. I disagree, because I feel that the greatest flexibility should be given to projects and sub-communities to customise and constrain such values for themselves. It could be argued that the TEI has indeed provided a syntax, but in a very general way: these are whitespace-separated tokens containing only letters, digits, punctuation characters or symbols. The point is that these are intended as magic tokens whose meaning individual projects decide (and document) for their own use. If I put in the magic token &#8216;bold&#8217; it might mean something different in my project than it means in yours.</p>
<p>It came out in the TEI-L discussion that some encoders believe that the order of @rend values provided should be important, as if they are making a phrase. Others tend to put the most important rendition classification first, and still others always provide different types of classification in the same order. I find these all prone to human inconsistency and so I choose to believe that they are an unordered set of values that could be entered in any order.  i.e. that:</p>
<blockquote><p>&lt;hi rend=&quot;big bold beautiful&quot;&gt;text&lt;/hi&gt;</p></blockquote>
<p>should be understood to be semantically equivalent to:</p>
<blockquote><p>&lt;hi rend=&quot;beautiful big bold&quot;&gt;text&lt;/hi&gt;</p></blockquote>
<p>My beliefs here are, perhaps unduly, influenced by long and painful experience in processing hand-encoded texts (which also influences my beliefs on the value of automatic and semi-automatic up-converting markup). In my encoding projects I recommend that no special significance be granted based on the order of the tokens present in the @rend value. The TEI, I think sensibly, allows individual projects to do what they want but does specify that these are individual tokens.</p>
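<p>If processing code follows this approach, @rend values should be compared as unordered sets of tokens rather than as strings. A minimal XSLT 2.0 sketch of such a comparison follows; the function name jc:sameRendTokens is hypothetical, and the jc and xs prefixes must be declared in the stylesheet:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;xsl:function name=&quot;jc:sameRendTokens&quot; as=&quot;xs:boolean&quot;&gt;
  &lt;xsl:param name=&quot;a&quot; as=&quot;xs:string?&quot;/&gt;
  &lt;xsl:param name=&quot;b&quot; as=&quot;xs:string?&quot;/&gt;
  &lt;!-- Split each value on whitespace and drop repeated tokens --&gt;
  &lt;xsl:variable name=&quot;ta&quot; as=&quot;xs:string*&quot;
    select=&quot;distinct-values(tokenize(normalize-space($a), '\s+'))&quot;/&gt;
  &lt;xsl:variable name=&quot;tb&quot; as=&quot;xs:string*&quot;
    select=&quot;distinct-values(tokenize(normalize-space($b), '\s+'))&quot;/&gt;
  &lt;!-- Same size and every token of one found in the other: equal as sets --&gt;
  &lt;xsl:sequence select=&quot;count($ta) = count($tb) and (every $t in $ta satisfies $t = $tb)&quot;/&gt;
&lt;/xsl:function&gt;
</pre>
<p>With this, jc:sameRendTokens('big bold beautiful', 'beautiful big bold') returns true(), so the two &lt;hi&gt; examples above are treated as equivalent.</p>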
<p>Some projects decide to put various standard presentation-description formats, e.g. Cascading Style Sheets, into the @rend attribute. I personally feel that this is misguided and sloppy. Partly this is because I suspect that some of them are actually encoding for a particular output format (rather than documenting what the original source looked like), and this is the wrong place to store this information. Partly this is because such presentation-description formats often use significant whitespace (which then means abusing the data.word datatype). And partly this is because I feel there is a better and easier way to do this more consistently using the <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.global.html">@rendition</a> attribute.</p>
<h2>@rendition and &lt;rendition&gt; really aren&#8217;t extreme</h2>
<p>As with many other things in the TEI, the Guidelines provide a simple use-case (@rend&#8217;s magic tokens) and a more complex system (@rendition). The @rendition attribute allows you to point to a &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-rendition.html">rendition</a>&gt; element up in the header where you can use any form of free text to describe how this was rendered in the original source. This means that instead of putting in a set of magic tokens or classifications like &#8220;largeStyle42&#8221;, an encoder can completely transparently point to a fuller description using the standard URI fragment pointing mechanism that is common throughout the TEI recommendations. Thus instead of writing:</p>
<blockquote><p>&lt;hi rend=&quot;largeStyle42&quot;&gt;text&lt;/hi&gt;</p></blockquote>
<p>and documenting somewhere else what this means, the encoder can point to a &lt;rendition&gt; element by its @xml:id attribute and give a fuller description there. For example this could be:</p>
<blockquote><p>&lt;hi rendition=&quot;#largeStyle42&quot;&gt;text&lt;/hi&gt;</p></blockquote>
<p>and while that doesn&#8217;t look much different, the URL fragment &#8216;#largeStyle42&#8217; points to a place inside the TEI file&#8217;s &lt;teiHeader&gt; (specifically inside the &lt;tagsDecl&gt; element) where there is a better description:</p>
<blockquote><p>&lt;rendition scheme=&quot;free&quot; xml:id=&quot;largeStyle42&quot;&gt;This text is really big, bold, and beautiful&lt;/rendition&gt;</p></blockquote>
<p>Okay, admittedly that might not be a very useful description. But the point of the &#8216;free&#8217; scheme is that it is free text: it can be any prose, in any language, and any way of describing it. The @scheme attribute also allows &#8216;css&#8217; for those wishing to use Cascading Style Sheets, &#8216;xslfo&#8217; for those wanting to use Extensible Stylesheet Language Formatting Objects, and &#8216;other&#8217; for those using some other defined rendition description language. So &#8216;#largeStyle42&#8217; could point to something using CSS that looked like:</p>
<blockquote><p>&lt;rendition scheme=&quot;css&quot; xml:id=&quot;largeStyle42&quot;&gt;<br />
font-weight:bold;<br />
font-size: 75pt;<br />
font-family:&quot;brushstroke&quot;, fantasy;<br />
color:#002147;<br />
&lt;/rendition&gt;</p></blockquote>
<p>If a more precise description (in whatever language) can be provided for &#8216;largeStyle42&#8217;, then this can be changed at a later date. Equally, this could be broken up into multiple &lt;rendition&gt; elements, so you could have:</p>
<blockquote><p>&lt;rendition scheme=&quot;css&quot; xml:id=&quot;bold&quot;&gt;font-weight:bold;&lt;/rendition&gt;<br />
&lt;rendition scheme=&quot;css&quot; xml:id=&quot;big&quot;&gt;font-size:75pt;&lt;/rendition&gt;<br />
&lt;rendition scheme=&quot;css&quot; xml:id=&quot;beautiful&quot;&gt;font-family:&quot;brushstroke&quot;, fantasy;&lt;/rendition&gt;<br />
&lt;rendition scheme=&quot;css&quot; xml:id=&quot;oxBlue&quot;&gt;color:#002147;&lt;/rendition&gt;</p></blockquote>
<p>and in the text:</p>
<blockquote><p>&lt;hi rendition=&quot;#big #oxBlue #bold #beautiful&quot;&gt;text&lt;/hi&gt;</p></blockquote>
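<p>As one example of what processing might do with this (a processing choice, not something the TEI prescribes), an XSLT 2.0 stylesheet producing HTML could look up each local pointer in @rendition and copy the CSS of any matching &lt;rendition scheme=&quot;css&quot;&gt; into a style attribute. This sketch only handles same-document pointers beginning with &#8216;#&#8217;, and it assumes the xs prefix is declared and xpath-default-namespace is set to the TEI namespace; the key name rendition-by-id is invented for the example:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;!-- Index every rendition element in the document by its @xml:id --&gt;
&lt;xsl:key name=&quot;rendition-by-id&quot; match=&quot;rendition&quot; use=&quot;@xml:id&quot;/&gt;

&lt;xsl:template match=&quot;hi[@rendition]&quot;&gt;
  &lt;!-- Resolve each '#id' pointer and keep only the CSS descriptions --&gt;
  &lt;xsl:variable name=&quot;css&quot; as=&quot;xs:string*&quot;
    select=&quot;for $ref in tokenize(normalize-space(@rendition), '\s+')[starts-with(., '#')]
            return key('rendition-by-id', substring($ref, 2))[@scheme = 'css']/normalize-space(.)&quot;/&gt;
  &lt;span style=&quot;{string-join($css, ' ')}&quot;&gt;
    &lt;xsl:apply-templates/&gt;
  &lt;/span&gt;
&lt;/xsl:template&gt;
</pre>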
<p>Moreover, because @rendition is one of the TEI&#8217;s many pointing attributes, it does not need to point to a &lt;rendition&gt; element in the very same file! Instead a project could centralise all its rendition information in a single place. That might look like:</p>
<blockquote><p>&lt;hi rendition=&quot;renditionFile.xml#largeStyle42&quot;&gt;text&lt;/hi&gt;</p></blockquote>
<p>or indeed</p>
<blockquote><p>&lt;hi rendition=&quot;http://www.example.com/renditionFile.xml#largeStyle42&quot;&gt;text&lt;/hi&gt;</p></blockquote>
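<p>What such a central file contains is up to the project. One possibility, sketched here as an assumption rather than a TEI requirement, is a minimal TEI document named renditionFile.xml (as in the pointers above) whose header holds nothing but the shared &lt;rendition&gt; definitions inside &lt;encodingDesc&gt;/&lt;tagsDecl&gt;. The title and publication details are placeholders:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;TEI xmlns=&quot;http://www.tei-c.org/ns/1.0&quot;&gt;
  &lt;teiHeader&gt;
    &lt;fileDesc&gt;
      &lt;titleStmt&gt;&lt;title&gt;Shared rendition definitions&lt;/title&gt;&lt;/titleStmt&gt;
      &lt;publicationStmt&gt;&lt;p&gt;Internal project file.&lt;/p&gt;&lt;/publicationStmt&gt;
      &lt;sourceDesc&gt;&lt;p&gt;Born digital.&lt;/p&gt;&lt;/sourceDesc&gt;
    &lt;/fileDesc&gt;
    &lt;encodingDesc&gt;
      &lt;tagsDecl&gt;
        &lt;!-- Targets of pointers such as renditionFile.xml#largeStyle42 --&gt;
        &lt;rendition scheme=&quot;free&quot; xml:id=&quot;largeStyle42&quot;&gt;This text is really big, bold, and beautiful&lt;/rendition&gt;
        &lt;rendition scheme=&quot;css&quot; xml:id=&quot;oxBlue&quot;&gt;color:#002147;&lt;/rendition&gt;
      &lt;/tagsDecl&gt;
    &lt;/encodingDesc&gt;
  &lt;/teiHeader&gt;
  &lt;!-- A TEI document needs a text; this one is deliberately empty --&gt;
  &lt;text&gt;&lt;body&gt;&lt;p/&gt;&lt;/body&gt;&lt;/text&gt;
&lt;/TEI&gt;
</pre>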
<p>Some encoders feel that pointing to a &lt;rendition&gt; element is a lot harder than just sticking some tokens into the @rend attribute. Others argue that as part of the process of hand encoding users should be able to add whatever they want to @rend, and for this to be valid because rationalising these in advance is more difficult than doing so afterwards. Or indeed that it is more convenient to encode unusual variants &#8216;in-line&#8217; rather than pointing back to the header. Both of these are good points, and have some truth to them. In the first case, it depends on the level of specification needed. Most encoders in my experience use very general and imprecise @rend categorisations. That is, they could have a rend value of &#8216;big72pt&#8217; but they tend to just use &#8216;big&#8217; (or small/medium/large/x-large).</p>
<p>How much time and energy one wants to spend worrying about specifying @rend and/or @rendition values depends on how important it is to your project that this information is documented, and documented in a consistent manner. If you just want to record whether something is in one of a handful of different colours, sizes, or styles, then you probably just want to agree a project specification of @rend values (and what they mean) for your TEI customisation.</p>
<h2>Other @rend issues</h2>
<p>Some encoders believe that there is no formal way of indicating what syntax you have used for your @rend values. I disagree because I believe these are magic tokens which are most properly documented in the TEI customisation. This enables an encoder to give a free text description for every magic token used in @rend attribute values, and moreover if they wish it enables a project to constrain it to be just this set of values. If a project is using a specified syntax inside their @rend attribute values (so-called &#8216;rendition ladders&#8217; are one such format) then this should be documented inside the &lt;<a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-encodingDesc.html">encodingDesc</a>&gt;, perhaps in prose or perhaps the TEI will add a mechanism in response to the TEI-L discussion which enables categorisation and description of the taxonomy of @rend attribute values.</p>
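<p>As a sketch of what documenting @rend tokens in a customisation might look like, the following ODD fragment gives each permitted token on &lt;hi&gt; a prose description. The tokens shown are hypothetical, and whether a closed list is enforced for each whitespace-separated token depends on how the ODD processor generates the schema, so the result is worth testing:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;elementSpec ident=&quot;hi&quot; mode=&quot;change&quot;&gt;
  &lt;attList&gt;
    &lt;attDef ident=&quot;rend&quot; mode=&quot;change&quot;&gt;
      &lt;!-- Each valItem documents one magic token and what it means --&gt;
      &lt;valList type=&quot;closed&quot; mode=&quot;add&quot;&gt;
        &lt;valItem ident=&quot;bold&quot;&gt;
          &lt;desc&gt;Noticeably heavier than the surrounding type.&lt;/desc&gt;
        &lt;/valItem&gt;
        &lt;valItem ident=&quot;large&quot;&gt;
          &lt;desc&gt;Larger than the body text, regardless of exact point size.&lt;/desc&gt;
        &lt;/valItem&gt;
      &lt;/valList&gt;
    &lt;/attDef&gt;
  &lt;/attList&gt;
&lt;/elementSpec&gt;
</pre>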
<h2>Changing @rend</h2>
<p>My arguments here are based on my own views and understanding of the current (P5 2.0.2) version of the <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/index-toc.html">TEI Guidelines</a>. However, these are subject to change (both my views and the Guidelines). I&#8217;ve often been told that the TEI recommendations seem like dictates coming down from on high saying &#8220;do it this way&#8221;, but that is really not how I view the TEI Guidelines or the community that creates them. The TEI is an open source project which welcomes bug reports and feature requests from anyone and everyone. These can come from someone encoding their very first TEI document or reading the Guidelines for the first time, or from those with a long history of experience with the TEI. Each and every bug and feature request should be considered on its own merits by the TEI Technical Council elected by the TEI community. [Note: there is scope for electoral reform, but this is a very different topic.] The recommendations of the TEI are not a fixed quantity but an evolving record of the concerns and experience of the community that produces it. In many ways, hearing what users new to the TEI have difficulty with, or where they find the Guidelines confusing, is more valuable in the long run than some of the more arcane technical discussions.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.it.ox.ac.uk/jamesc/2012/03/26/more-about-rend/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Self Study (part 1): Introducing XML and Markup</title>
		<link>http://blogs.it.ox.ac.uk/jamesc/2012/03/15/self-study-introducing-xml-and-markup/</link>
		<comments>http://blogs.it.ox.ac.uk/jamesc/2012/03/15/self-study-introducing-xml-and-markup/#comments</comments>
		<pubDate>Thu, 15 Mar 2012 21:24:40 +0000</pubDate>
		<dc:creator>James Cummings</dc:creator>
				<category><![CDATA[SelfStudy]]></category>
		<category><![CDATA[TEI]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://blogs.oucs.ox.ac.uk/jamesc/?p=284</guid>
		<description><![CDATA[I&#8217;m occasionally asked what people should read and do if they want to teach themselves TEI P5 XML. Where should they start? This depends, obviously, on what time they have and what resources. I tend to recommend directed intensive training &#8230; <a href="http://blogs.it.ox.ac.uk/jamesc/2012/03/15/self-study-introducing-xml-and-markup/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m occasionally asked what people should read and do if they want to teach themselves <a href="http://www.tei-c.org" target="_blank">TEI P5 XML</a>. Where should they start? This depends, obviously, on what time and resources they have. I tend to recommend directed intensive training such as the <a href="http://digital.humanities.ox.ac.uk/dhoxss/">Digital.Humanities@Oxford Summer School</a> as a good way to get an introduction to such topics.</p>
<p>However, some people are unable to participate in such training and prefer self-directed learning. What should they do? There are lots of resources online such as <a href="http://tbe.kantl.be/TBE/">TEI By Example</a> and the <a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/index.html">TEI Guidelines</a>. Where to start?</p>
<p>When people are taking an Introduction to TEI workshop, I usually introduce markup but move on to TEI and XML very quickly, because time is limited in such intensive workshops. When people are undertaking self-directed learning, however, I think they should use the time they have to learn more about HTML and then XML before starting to learn about the TEI vocabulary of XML itself.</p>
<p>There is a great deal of reading one could suggest for an initial exploration of XML and markup. I would suggest at least looking at:</p>
<ul>
<li><a href="http://www.w3schools.com/html/">http://www.w3schools.com/html/</a> (basic section and xhtml section)</li>
<li><a href="http://www.w3schools.com/xml/">http://www.w3schools.com/xml/</a> (basic section and page on namespaces)</li>
<li><a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/SG.html">http://www.tei-c.org/release/doc/tei-p5-doc/en/html/SG.html</a></li>
<li><a href="http://en.wikipedia.org/wiki/XML">http://en.wikipedia.org/wiki/XML</a></li>
<li><a href="http://www.tei-c.org/About/Archive_new/ETE/Preview/guidelines.xml">http://www.tei-c.org/About/Archive_new/ETE/Preview/guidelines.xml</a></li>
<li><a href="http://www.tei-c.org/About/Archive_new/ETE/Preview/principles.xml">http://www.tei-c.org/About/Archive_new/ETE/Preview/principles.xml</a></li>
<li><a href="http://www.youtube.com/watch?v=NLlGopyXT_g">http://www.youtube.com/watch?v=NLlGopyXT_g</a></li>
<li><a href="http://en.wikipedia.org/wiki/RSS">http://en.wikipedia.org/wiki/RSS</a></li>
</ul>
<p>as a good start.</p>
<p>If I were to suggest a series of assignments someone might undertake based on this reading it would be to do the following, writing up answers to the questions.</p>
<ol>
<li>Read the W3Schools HTML basic section and XHTML section, do the HTML and XHTML quizzes</li>
<li>Read the W3Schools XML basic section and XML Namespaces page, do the XML quiz</li>
<li>Read the TEI Guidelines Gentle Introduction to XML; and the wikipedia article on XML.</li>
<li>How does XML differ from HTML? Why might it be more powerful to describe what some piece of data is, rather than say how it should be presented?</li>
<li>Download and install the <a href="http://www.oxygenxml.com">oXygen XML editor</a> (you can get a one-month free trial license; otherwise it costs $64 USD)</li>
<li>Choose a very short (1 page) sample of a document you are interested in.</li>
<li>Create a list of the overall structural aspects you feel define this sort of document. Create a list of any data-like entries (like names or dates) in the document. Create a list of presentational aspects of the document that you think are important to record.</li>
<li>Funding challenge part 1: Hypothetically, imagine you had funding to mark up several thousand pages of this material. Look at the list of aspects you would like to record. Why is each one important? What benefit does recording each of these things give those wanting to use or understand the text (or culture from which it originates)? Which would you choose to markup? How consistently can you mark up this feature? Such document analysis should be done long before any project starts (or asks for funding).</li>
<li>Funding challenge part 2: An uncaring government has slashed its funding for higher education research projects and has reduced your project&#8217;s funding by 50%! What would you do? Will you mark up only 50% of the material? If so, how do you decide which parts? Will you only mark up certain aspects? If so, which ones and why?</li>
<li>Using the &#8216;Text&#8217; (code view) mode of the  oXygen XML editor create a well-formed XML file of your sample document with elements and attributes that you have invented yourself. What difficulties do you encounter doing this?</li>
<li>Why might it be better for communities of users to agree on elements, what they mean, and how they should be used?</li>
<li>What are the central ideas of Michael Wesch&#8217;s youtube video? How do they relate to the nature of XML and how it is used?</li>
<li>Read the wikipedia article on RSS, and find an RSS feed to subscribe to in google reader to see its application.</li>
<li>Does order really matter in an XML document?  What is the difference between:<br />
<blockquote><p>&lt;list&gt;&lt;item n=&quot;1&quot;&gt;item 1&lt;/item&gt;&lt;item n=&quot;2&quot;&gt;item number 2&lt;/item&gt;&lt;/list&gt; and<br />
&lt;list&gt;&lt;item n=&quot;2&quot;&gt;item number 2&lt;/item&gt;&lt;item n=&quot;1&quot;&gt;item 1&lt;/item&gt;&lt;/list&gt;</p></blockquote>
<p>And how much difference does this make when viewing XML as a data storage format rather than a presentational one?</p></li>
<li>Join the <a href="http://www.tei-c.org/Support/index.xml#tei-l">TEI-L mailing list</a> and start lurking.</li>
</ol>
<p>This certainly isn&#8217;t exhaustive, but with a bit of support, I suggest someone undertaking this would be much better placed to start learning about TEI P5 XML from the online sources available.</p>
<p>The next post in this series is an <a href="http://blogs.it.ox.ac.uk/jamesc/2013/01/23/self-study-part-2-introduction-to-the-text-encoding-initiative-guidelines/">Introduction to the Text Encoding Initiative Guidelines</a>. </p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.it.ox.ac.uk/jamesc/2012/03/15/self-study-introducing-xml-and-markup/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>@rend and the war on text-bearing attributes</title>
		<link>http://blogs.it.ox.ac.uk/jamesc/2011/12/01/rend-and-the-war-on-text-bearing-attributes/</link>
		<comments>http://blogs.it.ox.ac.uk/jamesc/2011/12/01/rend-and-the-war-on-text-bearing-attributes/#comments</comments>
		<pubDate>Thu, 01 Dec 2011 21:25:44 +0000</pubDate>
		<dc:creator>James Cummings</dc:creator>
				<category><![CDATA[TEI]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://blogs.oucs.ox.ac.uk/jamesc/?p=280</guid>
		<description><![CDATA[In discussing that the TEI attribute @rend from att.global although it allows you to type just about anything in it, doesn&#8217;t actually allow anything more that a set of single tokens. I recently explained to John, Paul, George, or Ringo &#8230; <a href="http://blogs.it.ox.ac.uk/jamesc/2011/12/01/rend-and-the-war-on-text-bearing-attributes/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I recently explained to John, Paul, George, or Ringo (can&#8217;t remember which) that although the <a href="http://www.tei-c.org/">TEI</a> attribute @rend from <a href="http://www.tei-c.org/Vault/P5/current/doc/tei-p5-doc/en/html/ref-att.global.html">att.global</a> lets you type just about anything into it, it doesn&#8217;t actually allow anything more than a set of single tokens. Being able to type spaces doesn&#8217;t mean that spaces are allowed within a value; it simply means that whitespace is the delimiter between the tokens in the attribute value.</p>
<p>The definition of @rend is &#8220;(rendition) indicates how the element in question was rendered or presented in the source text.&#8221; However, some encoders very often use it to tell their processing how they want the <strong>output</strong> to appear. In the remarks on the values allowed for the attribute, it says:</p>
<blockquote><p>may contain any number of tokens, each of which may contain letters, punctuation marks, or symbols, but not word-separating characters.</p></blockquote>
<p>The point here is the &#8216;word-separating characters&#8217; part. So although you can write &lt;hi rend="It looks a bit like that other one"&gt;text&lt;/hi&gt;, this value actually contains 8 tokens: &#8220;It&#8221;, &#8220;looks&#8221;, &#8220;a&#8221;, &#8220;bit&#8221;, &#8220;like&#8221;, &#8220;that&#8221;, &#8220;other&#8221;, and &#8220;one&#8221;.</p>
<p>Sometimes people put CSS or CSS-like rendition information into @rend, and so have values like &#8220;text-align: right&#8221;. I would say this is wrong&#8230; or at least that it is saying there are two classifications applicable to the element&#8217;s rendition in the source material, one that it is &#8220;text-align:&#8221; and another that it is &#8220;right&#8221;. Of course, they could solve this just by deleting the space: &#8220;text-align:right&#8221; would be better, or even &#8220;text-align:right; font-size:large;&#8221; if you wanted to add another token. Even better, however, would be to use @rendition to point to the @xml:id of one or more &lt;rendition&gt; elements in the header. This allows you to specify exactly which scheme you are using (e.g. CSS) and to give multiple statements for one classification.</p>
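<p>To make the token counting concrete, here is a minimal Python sketch. It is an illustration rather than TEI tooling: the hypothetical rend_tokens function splits an attribute value on the four XML whitespace characters (space, tab, carriage return, line feed), which is how a processor reading @rend as a list of data.word values would see it.</p>

```python
import re

# XML's whitespace characters (space, tab, CR, LF) delimit the tokens
# in a list-valued attribute such as @rend.
_XML_WS = re.compile(r"[ \t\r\n]+")


def rend_tokens(value: str) -> list[str]:
    """Split a @rend value into its whitespace-separated tokens (hypothetical helper)."""
    # Drop the empty strings produced by leading/trailing whitespace.
    return [token for token in _XML_WS.split(value) if token]


examples = [
    "It looks a bit like that other one",
    "text-align: right",
    "text-align:right",
    "text-align:right; font-size:large;",
]
for value in examples:
    tokens = rend_tokens(value)
    print(len(tokens), tokens)

# Prints:
# 8 ['It', 'looks', 'a', 'bit', 'like', 'that', 'other', 'one']
# 2 ['text-align:', 'right']
# 1 ['text-align:right']
# 2 ['text-align:right;', 'font-size:large;']
```

<p>Deleting the space turns &#8220;text-align: right&#8221; into a single token, while &#8220;text-align:right; font-size:large;&#8221; deliberately remains two.</p>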
<p>Why does this matter, you might ask? Well, of course, it doesn&#8217;t really: they are all magic tokens of one sort or another, to be interpreted (or not) by your processing for whatever purpose you are undertaking the encoding. The &lt;rendition&gt; method is simply the most detailed at documenting precisely how you are interpreting the rendition in the original document.</p>
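<p>The @rendition approach can be sketched in Python too. Here a plain dictionary stands in for the &lt;rendition&gt; elements in the header (the xml:id values, schemes, and CSS statements are invented for illustration), and a hypothetical resolve_rendition function follows each local pointer such as "#right" to its definition. A real processor would read these definitions from the TEI header rather than from a hard-coded table.</p>

```python
# Hypothetical stand-in for rendition elements in the TEI header:
# xml:id -> (scheme, statement). The ids and CSS here are invented examples.
RENDITIONS = {
    "right": ("css", "text-align:right;"),
    "large": ("css", "font-size:large;"),
}


def resolve_rendition(pointers: str) -> list[tuple[str, str]]:
    """Follow each whitespace-separated local pointer in a @rendition value."""
    resolved = []
    for pointer in pointers.split():
        if not pointer.startswith("#"):
            # This sketch only handles same-document pointers like '#right'.
            raise ValueError(f"unsupported pointer: {pointer!r}")
        resolved.append(RENDITIONS[pointer[1:]])
    return resolved


def css_for(pointers: str) -> str:
    """Concatenate the CSS statements referenced by a @rendition value."""
    return "".join(
        statement
        for scheme, statement in resolve_rendition(pointers)
        if scheme == "css"
    )


print(css_for("#right #large"))  # text-align:right;font-size:large;
```

<p>Because each pointer names a whole &lt;rendition&gt; definition, one token can carry several CSS statements without any spaces leaking into the attribute&#8217;s token list.</p>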
<p>However, the reason it matters to me is that there are NO attributes in the TEI which allow free text.</p>
<p>By that I mean that every attribute is assigned to one datatype or another, and in none of them can you just type sentences of prose and have them be semantically meaningful. This is a result of the long <strong>War on Text-Bearing Attributes</strong> that was waged in the run-up to the first release of TEI P5. One of its many principles was that <strong>any</strong> bit of free text <em>might</em> need a non-Unicode character; since the TEI&#8217;s method for documenting non-Unicode characters was its &lt;<a href="http://www.tei-c.org/Vault/P5/current/doc/tei-p5-doc/en/html/ref-g.html">g</a>&gt; element, and you can&#8217;t put an element inside an attribute value, free-text attributes could not be allowed. This is the reason for the creation of many new child elements like &lt;desc&gt;, which are intended to contain free text about the nature of the element that contains them.</p>
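<p>The structural reason shows up in any XML API. In this small Python sketch using the standard library&#8217;s xml.etree.ElementTree, a &lt;desc&gt; holds text mixed with a &lt;g&gt; child element, whereas an element&#8217;s attributes are just a mapping from names to plain strings, with nowhere to put a child element. The figure, the ligature description, and the "#char1" pointer are invented for illustration.</p>

```python
import xml.etree.ElementTree as ET

# A hypothetical figure whose description needs a non-Unicode character.
figure = ET.Element("figure")
desc = ET.SubElement(figure, "desc")
desc.text = "A drawing of the "
# The g element points at a character declared elsewhere ('#char1' is invented).
g = ET.SubElement(desc, "g", {"ref": "#char1"})
g.tail = " ligature"

# Element content can mix text with child elements such as g ...
print([child.tag for child in desc])  # ['g']

# ... but attribute values are plain strings: there is no slot for an element.
figure.set("rend", "It looks a bit like that other one")
print(all(isinstance(v, str) for v in figure.attrib.values()))  # True
```
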
<p>In the case of the @rend attribute, it allows one or more values (with no upper limit) of the <a href="http://www.tei-c.org/Vault/P5/current/doc/tei-p5-doc/en/html/ref-data.word.html">data.word</a> datatype. Even in <a href="http://www.tei-c.org/Vault/P5/1.0.0/doc/tei-p5-doc/en/html/ref-data.word.html">P5 1.0.0</a>, this datatype &#8220;defines the range of attribute values expressed as a single word or token.&#8221; Thus when people put space-separated characters into it, they are really putting in multiple tokens. The War on Text-Bearing Attributes attempted to limit the places where people could do this through the use of <a href="http://www.tei-c.org/Vault/P5/1.0.0/doc/tei-p5-doc/en/html/REF-MACROS.html">datatypes</a> and the removal of free text from attribute values.</p>
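<p>To see why a value containing spaces is several data.word tokens rather than one, here is a rough Python approximation of the datatype&#8217;s remark (letters, punctuation marks, or symbols, but not word-separating characters). It is an assumption, not the TEI&#8217;s own schema definition: the hypothetical is_single_data_word function simply rejects any character whose Unicode general category is a separator (Z*) or a control/other character (C*).</p>

```python
import unicodedata


def is_single_data_word(value: str) -> bool:
    """Rough approximation of one data.word token (hypothetical helper).

    Rejects empty strings and any character in Unicode general category
    Z* (separators, e.g. the space) or C* (controls and other non-printing
    characters, e.g. tab or newline).
    """
    return bool(value) and all(
        unicodedata.category(ch)[0] not in "ZC" for ch in value
    )


for candidate in ["text-align:right", "text-align: right", "right", "It looks"]:
    print(repr(candidate), is_single_data_word(candidate))

# Prints:
# 'text-align:right' True
# 'text-align: right' False
# 'right' True
# 'It looks' False
```

<p>Under this check &#8220;text-align: right&#8221; fails as a single token, yet each piece passes once the value is split at the space, which is exactly the two-token reading.</p>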
<p>This helps to highlight the difference between syntactic and semantic validity. Just because your document validates against a schema does not mean that it is semantically valid. You can put the text of a title inside an &lt;<a href="http://www.tei-c.org/Vault/P5/current/doc/tei-p5-doc/en/html/ref-author.html">author</a>&gt; element and vice versa, and there is no way your schema can know that you have done this.</p>
<p>So really, I&#8217;ve written this post so I can point to it later when people ask me about spaces in @rend and similar datatype kerfuffles.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.it.ox.ac.uk/jamesc/2011/12/01/rend-and-the-war-on-text-bearing-attributes/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
