More about @rend

Lou Burnard has provided a technical summary of some of the issues recently discussed concerning @rend, but I thought I might provide some more explanation for those not as familiar with the technical background to the discussion. I would have done so sooner but was on holiday in Cornwall, driving around farm roads that were far too narrow, without much reception on my phone. What follows are my own opinions and interpretations of the TEI Guidelines, which are continually evolving based on community consensus.

The @rend attribute

The TEI provides a @rend attribute which indicates an interpretation of how the element in question was rendered or presented in the source text. It has nothing to say about what should be done with the element in any particular output from processing or displaying the TEI text. Many people assume that processing TEI means outputting HTML designed to help you read the text, but this is not necessarily the case. The TEI text might have any number of outputs. Just for reading it might be HTML, ePub, PDF, DOCX, and many more. Moreover, those encoding the texts might not intend to read them at all, but to process them for other forms of text analysis in any number of formats. While individual projects can provide project documentation on how they intend certain elements to be presented in particular forms of output, other people processing those texts could choose to do something completely different.

@type and @rend values and their whitespace

During a TEI-L discussion concerning why the @type attribute did not allow spaces, it was explained that this is because the @type attribute does not contain free text, but a special token that categorises the element in some way. Moreover, the recommended practice is for projects to customise the TEI to constrain the choices available for the value of the @type attribute on some elements, and to document in their customisation exactly what those special tokens mean. The values of @type have the datatype data.enumerated, which means that they are “expressed as a single XML name taken from a list of documented possibilities”. That means that this value has to obey the rules of what it means to be an XML name, and it should be from a set list that the project has documented (preferably in its TEI customisation, but possibly just in prose documentation preserved with the TEI file). Most elements that have a @type attribute get it from claiming membership in the att.typed attribute class, and if a secondary type classification is allowed they also get @subtype.

The discussion moved on (possibly because I referenced my earlier post on @rend) to how the @rend attribute differs, and to using CSS inside it. With the @rend attribute the situation is slightly more confusing. It allows one or more occurrences of the datatype data.word in it. A data.word datatype “defines the range of attribute values expressed as a single word or token.” As I’ve discussed elsewhere, this means if someone marks up a text using:

<hi rend="It looks a bit like that other one">text</hi>

This actually has 8 tokens: “It”, “looks”, “a”, “bit”, “like”, “that”, “other”, “one”. The point is that the whitespace between these words in the attribute makes each of them a separate value or token, not part of a phrase. The encoder might just have written:

<hi rend="big bold beautiful">text</hi>

or indeed

<hi rend="largeStyle42">text</hi>

The data.word datatype says that “Attributes using this datatype must contain a single ‘word’ which contains only letters, digits, punctuation characters, or symbols: thus it cannot include whitespace.”
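
To make the tokenisation concrete, here is a minimal sketch in Python of how a processor might read a @rend value. The function name rend_tokens is my own invention for illustration, and Python's split() treats a slightly wider range of characters as whitespace than XML does, so take it as an approximation rather than a definitive implementation.

```python
def rend_tokens(value: str) -> list[str]:
    """Split a @rend value into its whitespace-separated tokens."""
    # split() with no argument splits on any run of whitespace and
    # ignores leading and trailing whitespace.
    return value.split()


# The "phrase" is really eight separate tokens.
print(rend_tokens("It looks a bit like that other one"))
# Three tokens.
print(rend_tokens("big bold beautiful"))
# One token.
print(rend_tokens("largeStyle42"))
```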

Some encoders believe that the TEI should reverse its decision on free text in attributes and allow @rend to contain “It looks like that other one” as a single phrase rather than a set of discrete tokens. Personally, I disagree and feel that would be a retrograde step.

@rend values and their order

Other than defining it as a set of data.word occurrences, the TEI does not dictate what the @rend values should look like. In my opinion it would be wrong if the TEI tried to codify all the possible rendition values that appear in every sort of text. Moreover, describing the way something appears in a text is always an interpretative process, and two separate encoders looking at the same text, or looking at it for different reasons, might perceive it in very different ways. In fact the Guidelines explicitly say:

“These Guidelines make no binding recommendations for the values of the rend attribute; the characteristics of visual presentation vary too much from text to text and the decision to record or ignore individual characteristics varies too much from project to project.” (http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.global.html)

Some encoders believe that it is a shame that the TEI has not defined a syntax by which they should specify the @rend attribute values. I disagree because I feel that the greatest flexibility should be given to projects and sub-communities to customise and constrain such values for themselves. It could be argued that the TEI has indeed provided a syntax, but in a very general way: these are whitespace-separated tokens containing only letters, digits, punctuation characters or symbols. The point is that these are intended as magic tokens whose meaning individual projects can decide (and document) for their own use. If I put in the magic token ‘bold’ it might mean something different in my project than it means in yours.

It came out in the TEI-L discussion that some encoders believe that the order of @rend values provided should be important, as if they are making a phrase. Others tend to put the most important rendition classification first, and still others always provide different types of classification in the same order. I find these all prone to human inconsistency and so I choose to believe that they are an unordered set of values that could be entered in any order. i.e. that:

<hi rend="big bold beautiful">text</hi>

should be understood to be semantically equivalent to:

<hi rend="beautiful big bold">text</hi>

My beliefs here are, perhaps unduly, influenced by long and painful experience in processing hand-encoded texts (which also influences my beliefs on the value of automatic and semi-automatic up-converting markup). In my encoding projects I recommend that no special significance be granted based on the order of the tokens present in the @rend value. The TEI, I think sensibly, allows individual projects to do what they want but does specify that these are individual tokens.
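
A processor that follows my recommendation could compare @rend values as sets rather than as strings. This is only a sketch of that one reading (the name same_rend is hypothetical, and a project that does treat order as significant would need something different):

```python
def same_rend(a: str, b: str) -> bool:
    """Treat two @rend values as equal if they hold the same tokens in any order."""
    # Sets ignore both order and repetition, so "bold bold" equals "bold".
    return set(a.split()) == set(b.split())


print(same_rend("big bold beautiful", "beautiful big bold"))
print(same_rend("big bold beautiful", "big bold"))
```

With this reading the first comparison is true and the second is false, whatever order the encoder happened to type the tokens in.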

Some projects decide to put various standard presentation-description formats, e.g. Cascading Style Sheets, into the @rend attribute. I personally feel that this is misguided and sloppy. Partly this is because I suspect that some of them are actually encoding for a particular output format (rather than documenting what the original source looked like) and this is the wrong place to store this information. Partly this is because such presentation-description formats often use significant whitespace (which then means an abuse of the data.word datatype). And partly this is because I feel there is a better and easier way to do this more consistently using the @rendition attribute.

@rendition and <rendition> really aren’t extreme

As with many other things in the TEI, the Guidelines provide a simple use-case (@rend’s magic tokens) and a more complex system (@rendition). The @rendition attribute allows you to point to a <rendition> element up in the header where you can use any form of free text to describe how this was rendered in the original source. This means that instead of putting a set of magic tokens or classifications like “largeStyle42” an encoder can completely transparently point to a fuller description using the standard URI fragment pointing mechanism that is common throughout the TEI recommendations. Thus instead of writing:

<hi rend="largeStyle42">text</hi>

and having it documented somewhere what this meant, the encoder can point to a <rendition> element by its @xml:id attribute and have a fuller description there. For example this could be:

<hi rendition="#largeStyle42">text</hi>

and while that doesn’t look much different, the URI fragment ‘#largeStyle42’ points to a place inside the TEI file’s <teiHeader> (specifically inside the <tagsDecl> element) where there is a better description:

<rendition scheme="free" xml:id="largeStyle42">This text is really big, bold, and beautiful</rendition>

Okay, admittedly that might not be a very useful description. But the point with the ‘free’ scheme is that it is free text. It can be any prose, in any language, and any way of describing it. The @scheme attribute also allows for ‘css’ for those people wishing to use the Cascading Style Sheets language, ‘xslfo’ for those wanting to use Extensible Stylesheet Language Formatting Objects, and ‘other’ for those using another rendition description language. So ‘#largeStyle42’ could point to something using CSS that looked like:

<rendition scheme="css" xml:id="largeStyle42">
font-weight:bold;
font-size: 75pt;
font-family:"brushstroke", fantasy;
color:#002147;
</rendition>

If a more precise description (in whatever language) can later be provided for ‘largeStyle42’, then this can be changed at that point. Equally this could be broken up into multiple <rendition> elements and you can have:

<rendition scheme="css" xml:id="bold">font-weight:bold;</rendition>
<rendition scheme="css" xml:id="big">font-size:75pt;</rendition>
<rendition scheme="css" xml:id="beautiful">font-family:"brushstroke", fantasy;</rendition>
<rendition scheme="css" xml:id="oxBlue">color:#002147;</rendition>

and in the text:

<hi rendition="#big #oxBlue #bold #beautiful">text</hi>
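
For those wondering how much work it is to follow such pointers when processing, here is a rough sketch using Python's standard xml.etree.ElementTree module. The sample document is a cut-down fragment of my own (it is not a complete or schema-valid TEI file), the function name resolve_renditions is hypothetical, and it only handles same-document pointers of the form "#id".

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Illustrative fragment only: a real TEI file needs a full <teiHeader>.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <encodingDesc>
      <tagsDecl>
        <rendition scheme="css" xml:id="bold">font-weight:bold;</rendition>
        <rendition scheme="css" xml:id="big">font-size:75pt;</rendition>
        <rendition scheme="css" xml:id="oxBlue">color:#002147;</rendition>
      </tagsDecl>
    </encodingDesc>
  </teiHeader>
  <text><body><p><hi rendition="#big #oxBlue #bold">text</hi></p></body></text>
</TEI>"""


def resolve_renditions(root):
    """For each <hi>, look up the (scheme, description) behind every local pointer."""
    by_id = {
        r.get(XML_ID): (r.get("scheme"), (r.text or "").strip())
        for r in root.iter(f"{{{TEI_NS}}}rendition")
    }
    results = []
    for hi in root.iter(f"{{{TEI_NS}}}hi"):
        pointers = hi.get("rendition", "").split()
        # Strip the leading "#" to get the xml:id; unknown ids resolve to None.
        results.append([by_id.get(p[1:]) for p in pointers if p.startswith("#")])
    return results


for descriptions in resolve_renditions(ET.fromstring(SAMPLE)):
    print(descriptions)
```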

Moreover, because @rendition is one of the TEI’s many pointing attributes it does not need to point to a <rendition> element in the very same file! Instead a project could centralise all their rendition information in a single place. So that might look like:

<hi rendition="renditionFile.xml#largeStyle42">text</hi>

or indeed

<hi rendition="http://www.example.com/renditionFile.xml#largeStyle42">text</hi>
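
All three forms of pointer can be taken apart in the same way. As a small sketch, Python's standard urllib.parse.urldefrag separates the document part from the fragment; the wrapper name split_pointer is just my own label, and actually fetching and parsing the external file is left out here.

```python
from urllib.parse import urldefrag


def split_pointer(pointer: str) -> tuple[str, str]:
    """Return (document, fragment) for a single @rendition pointer."""
    document, fragment = urldefrag(pointer)
    # An empty document part means "look in the current file".
    return document, fragment


for pointer in (
    "#largeStyle42",
    "renditionFile.xml#largeStyle42",
    "http://www.example.com/renditionFile.xml#largeStyle42",
):
    print(split_pointer(pointer))
```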

Some encoders feel that pointing to a <rendition> element is a lot harder than just sticking some tokens into the @rend attribute. Others argue that, as part of the process of hand encoding, users should be able to add whatever they want to @rend and for this to be valid, because rationalising these in advance is more difficult than doing so afterwards, or indeed that it is more convenient to encode unusual variants ‘in-line’ rather than pointing back to the header. Both of these are good points, and have some truth to them. In the first case, it depends on the level of specification needed. Most encoders in my experience use very general and imprecise @rend categorisations. That is, they could have a rend value of ‘big72pt’ but they tend to just use ‘big’ (or small/medium/large/x-large).

How much time and energy one wants to spend worrying about specifying @rend and/or @rendition values depends on how important it is to your project that this information is documented, and documented in a consistent manner. If you just want to record whether something is in one of a handful of different colours, sizes, or styles, then you probably just want to agree a project specification of @rend values (and what they mean) for your TEI customisation.

Other @rend issues

Some encoders believe that there is no formal way of indicating what syntax you have used for your @rend values. I disagree because I believe these are magic tokens which are most properly documented in the TEI customisation. This enables an encoder to give a free text description for every magic token used in @rend attribute values, and moreover, if they wish, it enables a project to constrain @rend to just this set of values. If a project is using a specified syntax inside their @rend attribute values (so-called ‘rendition ladders’ are one such format) then this should be documented inside the <encodingDesc>, perhaps in prose. Or perhaps the TEI will, in response to the TEI-L discussion, add a mechanism which enables categorisation and description of the taxonomy of @rend attribute values.
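
The proper place to enforce such a list is the TEI customisation itself, but the idea can be sketched independently of any schema. In the Python below the vocabulary and its descriptions are entirely made up for illustration, as is the name undocumented_tokens; it simply reports any token in a @rend value that the project has not documented.

```python
# Hypothetical project vocabulary: token -> free text description.
DOCUMENTED_REND = {
    "big": "Noticeably larger than the surrounding text",
    "bold": "Heavier weight than the surrounding text",
    "small": "Noticeably smaller than the surrounding text",
}


def undocumented_tokens(rend_value: str, vocabulary=DOCUMENTED_REND) -> list[str]:
    """List the tokens in a @rend value that the project has not documented."""
    return [token for token in rend_value.split() if token not in vocabulary]


print(undocumented_tokens("big bold"))
print(undocumented_tokens("big bold beautiful"))
```

Here the first value passes with nothing to report, while the second flags ‘beautiful’ as a token nobody has explained yet.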

Changing @rend

My arguments here are based on my own views and understanding of the current (P5 2.0.2) version of the TEI Guidelines. However, these are subject to change (both my views and the Guidelines). I’ve often been told that the TEI recommendations seem like dictates coming down from on high saying “do it this way”, but that is really not how I view the TEI Guidelines or the community that creates them. The TEI is an open source project which accepts bug reports and feature requests from anyone and everyone. These can come from someone encoding their very first TEI document, reading the Guidelines for the first time, or from those with a long history of experience with the TEI. Each and every bug report and feature request should be considered on its own merits by the TEI Technical Council elected by the TEI community. [Note: there is scope for electoral reform, but this is a very different topic.] The recommendations of the TEI are not a fixed quantity but an evolving record of the concerns and experience of the community that produces them. In many ways hearing what users new to the TEI have difficulty with, or where they find the Guidelines confusing, is more valuable in the long run than some of the more arcane technical discussions.

One Response to “More about @rend”

  1. Piotr says:

    Thanks for your posts and the links — they have been very useful to me.

    One thing I was missing is a demonstration of where data.word might actually harm CSS. I’ve found these in one of my stylesheets:

    publicationStmt > date:before {content: "Publication date: "}
    pubPlace:before {content: "Published at: "}

    rend='content:"Publication date: "' would be split in an ugly manner indeed. Would the ugliness harm, though? Note that I am assuming, with you, that it may be better not to go back to free text in attributes, but now my question is rather “so what if it’s data.word?”. Well, for the case at hand, we either reassemble the tokens upon processing, or we admit that whenever a space has to be used, <rendition> is the way to go (and this is the non-fuzzy threshold for such cases).

    Suppose that someone does exploratory markup. They want to use @rend, because it’s much handier in this sort of enterprise. Well, they use “magic tokens” in @rend, they have to, because that is the datatype. But what if their “magic tokens” just happen to be simple CSS statements? They are still magic, and the encoders may just be used to express themselves in CSS magic. And note that, in exploratory markup, we are talking of the activity of “recording the rendering of the source”, so everything is kosher.

    This last item is again something that I find itchy. In an earlier post, you say:

    >>The definition of @rend is “(rendition) indicates how the element in question was rendered or presented in the source text.” but it is very often used by some encoders to signal to processing how you want the *output* to appear.<<

    To which my reaction is: well, high time the definition got changed, why is it so totalitarianistically monodirectional? Not everyone uses the TEI to only encode the past — look at the Guidelines themselves…

    I started to rehash these issues in the context of my tracker item on expressing simple rendering information in the Guidelines themselves:
    https://sourceforge.net/tracker/?func=detail&atid=644062&aid=3519682&group_id=106328

    What worries me here is that, well, suppose we get rid of @rend because it's so horribly bad (well, suppose). We have the powerful <rendition> instead and still, we are not willing to express the simple fact that “this little piece of the Guidelines should be rendered in monospaced font”?? If we admit to that, I think we will be admitting to quite a lot, and in fact losing part of the potential target audience. Thanks again for your posts, I feel I should explore this place further.