Rehdon asked me about giving @xml:id attributes to things, so I whipped up this quick XSLT stylesheet. Some people prefer to use generate-id() to get an opaque, processor-generated ID that is unique within the document and carries no semantic baggage. In many cases, where IDs are exposed to the public, I prefer to use ones that make sense and are human-readable.

Warning: there is a distinct flaw, which is that I don’t test for existing IDs before applying the @xml:id. If something other than a <p> element already has xml:id="p5" then it will still add ‘p5’ as an @xml:id to the fifth paragraph. The result is still well-formed XML, but it breaks the rule that an @xml:id value must be unique within the document, so it will fail validation. It also numbers paragraphs in other namespaces; each namespace gets its own count, which is another way to end up with duplicate IDs. (This may be a bug or a feature depending on your outlook.) It numbers from tei:text, so if you don’t have that in your document you should change the from attribute on the xsl:number inside the ‘num’ variable.

The XSLT stylesheet takes a parameter ‘e’, to which you can pass the local-name of the element in question. It assumes ‘p’ otherwise, but you could use it to number div, head, w, or really any element just by passing it e=w (or whatever).

Update: Rehdon asked about a configurable optional prefix to the ID and a 4-digit zero-padded number for it. So I changed the script to do that.

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- The tei prefix is needed for the from="tei:text" pattern below; nothing here goes beyond XSLT 1.0 -->
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:tei="http://www.tei-c.org/ns/1.0" version="1.0">
        <!-- Parameter to pass to the stylesheet, assumes 'p' if nothing given -->
        <xsl:param name="e" select="'p'"/>
        <!-- If it exists, a prefix string: include a separator, like 'text1_' to get 'text1_p0005' -->
        <xsl:param name="pre"/>
        <!-- typical copy-all template -->
        <xsl:template match="@*|node()|comment()|processing-instruction()" priority="-1">
            <xsl:copy><xsl:apply-templates select="@*|node()|comment()|processing-instruction()"/></xsl:copy>
        </xsl:template>
        <!-- higher priority one to match elements -->
        <xsl:template match="*">
            <xsl:copy>
                <!-- If the local-name is the element we've passed it, and there is not an @xml:id attribute -->
                <xsl:if test="local-name() = $e and not(@xml:id)">
                    <!-- make a variable numbering current nodes at any level from tei:text, zero-padded to 4 digits -->
                    <xsl:variable name="num"><xsl:number level="any" from="tei:text" format="0001"/></xsl:variable>
                    <!-- Then create an @xml:id attribute with the prefix, the name and the number concatenated -->
                    <xsl:attribute name="xml:id"><xsl:value-of select="concat($pre, local-name(), $num)"/></xsl:attribute>
                </xsl:if>
                <!-- apply any other templates (i.e. copy other stuff) -->
                <xsl:apply-templates select="@*|node()|comment()|processing-instruction()"/>
            </xsl:copy>
        </xsl:template>
    </xsl:stylesheet>
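
The duplicate-ID flaw from the warning above can at least be caught after the fact. What follows is only a rough sketch, not part of the original stylesheet: a separate checking stylesheet (the key name ‘ids’ and the message wording are my own choices) that you would run over the output. It prints one line for each @xml:id value that occurs more than once, and prints nothing if all values are unique.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="text"/>
    <!-- Index every xml:id attribute by its value ('ids' is just an illustrative name) -->
    <xsl:key name="ids" match="@xml:id" use="."/>
    <xsl:template match="/">
        <!-- Select xml:id attributes whose value is shared by more than one attribute,
             keeping only the first attribute of each group so every value is reported once -->
        <xsl:for-each select="//@xml:id[count(key('ids', .)) &gt; 1][generate-id() = generate-id(key('ids', .)[1])]">
            <xsl:text>duplicate xml:id: </xsl:text>
            <xsl:value-of select="."/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

This only reports clashes; it does not repair them. Fixing one would still mean renaming the existing ID or choosing a different ‘pre’ prefix and running the numbering again.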

Hope that is useful. I’ll try to remember to add it to the TEI wiki as well.


2 Responses to “addingIDs”

  1. robertordt says:

    Great job as usual, James, should rightly be posted on the TEI wiki. Of course I already have two enhancement requests ;) :

    – customizable prefix, i.e. text1_p300 instead of simply p300
    – 0 padding lower numbers, i.e. w00001 instead of w1

    Keep up the good work! :)


  2. James Cummings says:

    Updated stylesheet to do this. Pass a ‘pre’ parameter of ‘text1_’ to get text1_p0300.

