TEI P4 Support, Survey Results


This post contains the results of a survey that collected information which the TEI Technical Council will use to assess the need for ongoing support for the TEI P4 version of its Guidelines. TEI P4 has largely been replaced by the TEI P5 Guidelines since November 2007. At that point it was promised that support for TEI P4 would continue for 5 years, until November 2012. As that is just over a year away, we are starting a slow process of phasing out support, and the TEI Technical Council is planning to de-emphasize TEI P4 as an offering. We will continue to support it over the next year but may take steps to stop it being indexed by search engines or make it less prominent on the website. The results of the survey follow; I’ve also transformed them to TEI P5 XML at http://users.ox.ac.uk/~jamesc/SurveySummary.tei.xml.

1. Are you involved with projects that are still using TEI P4?

Answers to question 1

My reading of these results is that many people are either not using TEI P4 or planning to migrate to TEI P5. I suspect, given the other answers, that those with TEI P4 projects probably do not rely on a lot of support from the TEI Consortium.

2. How important is ongoing TEI P4 support to you?

Answers to question 2

This seems fairly clear: out of 54 respondents, 44 said it was not important, unnecessary, or that we should get rid of it. But the fact that it is important or very important for 18.5% of respondents is still significant and must be remembered when making decisions concerning ongoing support for TEI P4.
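
As a quick check of the arithmetic for this question, here is a minimal Python sketch. It assumes that all 54 respondents answered the question, so that the "important or very important" group is simply the remainder after the 44; the helper name `share` is my own and purely illustrative.

```python
def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


total_respondents = 54  # figure quoted in the post
not_important = 44      # "not important", "unnecessary" or "get rid of it"

# Assumption: everyone answered, so the rest chose "important" or "very important".
important = total_respondents - not_important

print(f"important or very important: {important} ({share(important, total_respondents)}%)")
print(f"not important or similar: {not_important} ({share(not_important, total_respondents)}%)")
```

Under that assumption the remainder is 10 respondents, which matches the 18.5% figure quoted for this question.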

3. How much should the TEI Consortium begin to de-emphasize TEI P4 on its website before November 2012?

Answers to question 3

There seems to be a strong vote for making TEI P4 available only from the TEI Vault and making sure existing links redirect.

4. Should search engines be dissuaded from indexing TEI P4 materials?

Answers to question 4

This result is less clear-cut, with some people feeling it shouldn’t be indexed and some thinking it should be (with slightly more weight on it being indexed than not).

5. Approximately how many TEI P4 projects have you been involved with?

Answers to question 5

This is simply a statistical question (and of course depends on how the respondent interprets ‘projects’). It is interesting that the majority of people seem to be involved with more than one project, but that is hardly unexpected. More were involved with 6-15 projects than I had expected.

6. Approximately how many TEI P5 projects have you been involved with?

Answers to question 6

It is interesting that the percentages are vaguely the same as with TEI P4 projects, though slightly higher overall.

7. What amount of TEI P4 data do your projects have? (In documents, number of files, how many megabytes, or whatever convenient measure makes sense for your project)

This was a free-text question, attempting to get a measure of how much TEI P4 material people have. It was deliberately left vague as to how the amount should be expressed, partly because I was interested to see how people would quantify their TEI P4 data, and partly because I recognise that it would be difficult for everyone to provide the same form of measurement. I was interested to see that the answers ranged more widely than I had expected.

  • 0
  • none
  • zero
  • Several hundred files.
  • I have about 500 texts
  • 3,200 files, 170Mb.
  • nil
  • Very roughly: 60,000 books = 5 million pages = 10 GB of marked-up text.
  • 40 megabytes in the one P4 project I still manage; a bunch more in ones I’m no longer involved in.
  • This varies a lot, but projects range from 3-150 MB In practice, the TEI files are a small part of the overall operation, which includes authority information usually in non-TEI format, and various generated TEI XML files used for web publication only
  • 50 files
  • Appx. 7000 files, 29 MB total data
  • Appr. 6500 documents (mostly letters)
  • 0
  • less than 10%
  • 0
  • about 3,000 XML files currently in P4.
  • in summa: about 4 Mb
  • All of the [Institution]’s projects are in migration from p4 to p5, so this is a snapshot of the migration process. The data is migrated, but the sites are not all rewritten yet. My hope is that by May of 2012, all of the current [Institution] sites will be serving out texts based on p5.
  • 0
  • Help files used by about 1000 Modes users.
  • 5 text-critical editions
  • 7000+ [P4 Customization] encoded letters
  • Main current project: several dozen megabytes including a few large files but mostly 10-20 kb: roughly 3000 files.
  • Roughly twelve published electronic editions, with at least a dozen more in the pipeline, in process of being finished (though they now have to be migrated to be published).
  • I have no clue, but it’s a lot.
  • The [Institution] has 113MB bytes of P4 documents, of archival interest only.
  • None, since we upgraded.
  • I’m not sure. I think I might have one project that is in TEI P4, but it’s a legacy project and I’m actually not positive. I haven’t looked at it in a while.
  • 2.5 million text pages
  • zero
  • None
  • Between 300 and 600 files.
  • ca. 70 files
  • dozens of documents.
  • Lots. Can’t access the figures quickly.
  • 700MB

This ranges from zero to multiple gigabytes of TEI text. What I should have asked was “And is all the TEI freely available for download?” as, of course, that is something I’d like to encourage.

8. Please list the URLs of any TEI P4 projects you want us to know about.

I’ve decided not to provide these in this summary. If projects wish to provide samples, they should add them to http://wiki.tei-c.org/index.php/Samples and/or describe their projects on the wiki.

9. Please list the URLs of any TEI P5 projects you want us to know about.

I’ve decided not to provide these in this summary. If projects wish to provide samples, they should add them to http://wiki.tei-c.org/index.php/Samples and/or describe their projects on the wiki.

10. Have you submitted a Bug or Feature Request to the TEI Technical Council?

Answers to question 10

Lots of people have submitted bug reports or feature requests, but most people have either only contributed to discussion or not submitted any. We should, of course, strive to increase feedback from the TEI community. I’d be interested in any ideas on how to make it easier for the community to participate.

11. Where do you think the TEI Technical Council should expend its time and effort?

Answers to question 11

This is also an interesting result. Scoring highest on ‘top priority’ is the idea that the TEI Technical Council should spend its time fixing bugs and implementing feature requests from the community. Analysing where the TEI Guidelines could be improved and undertaking those improvements was also ranked highly, along with developing the infrastructural basis for future versions of the TEI Guidelines. What scored lower was the idea of the TEI Technical Council setting up a repository of TEI texts, or developing software to make publication of TEI texts easier. I suspect that this is because maintaining the Guidelines is the central mandate of the TEI Technical Council, and looking for ways to improve them is related to that, while the creation of repositories is already done better by people who focus on those activities. Although maintenance is a community-based activity, only the TEI is really in charge of maintaining the Guidelines, whereas any third party can develop software or archives. We should certainly encourage those activities and implement community suggestions which facilitate the greater development of community software.

12. Any other comments?

Here are the comments that I received (lightly edited), with my personal responses:
For people with large repositories of transcriptions (where the text content will never be updated), markup stability is essential. P4 to P5 is not essential but recommended, but it’s going to mean a huge effort. My worry is that there will be a far too rapid succession to P6, P7, P8, etc which adds bells and whistles but does not contribute anything meaningful to static repositories.
There is not necessarily any reason to migrate if your systems are set up and working fine with P4. I would, personally, recommend using P5 in any new project. You will then probably reach a state where it is easier to migrate the P4 material to P5 than to support multiple systems, but different people’s experiences will vary. The Birnbaum Doctrine suggested that the TEI Council should only move to new major versions (P6 etc.) when a large external technological change meant that it would be beneficial (e.g. SGML to XML) or a large internal infrastructural change (e.g. development of the P5 class system) was deemed significantly beneficial. I personally do not believe that we are at a juncture which would necessitate development of P6; I’d prefer to see P5 2.5, P5 4.5, P5 35.2, etc. than have people feel they need to move major versions. This has its own challenges, of course, but your project can point, in its TEI ODD, to the very specific version of TEI P5 that it uses.
Yes – thanks for doing such a great service to the community!
You’re welcome, it was my pleasure. Although I know filling in surveys can be annoying I think it is a quick and easy way to get at least a vague indication of the community’s feeling on certain issues.
I think that lack of easy tools for presentation / publication of TEI documents is a serious drawback. Many of my younger colleagues would learn (or actually have learned) the TEI editing in Oxygen, but they are unable (and not willing!) to learn XSLT for the presentation of their texts (not to mention the publication: servers etc.). An average user who is not able to modify Sebastian’s stylesheets for his edition is left completely alone with his/her TEI document (only *exceptionally*, an XSL-expert is available for help in big institutions). As for now, the TEI is an ideal tool for only one part of the communication chain, but not for the whole …
This is of course difficult, but so is the publication of research in print or other media. Usually these forms of publication involve the work of other people, for which researchers pay in one way or another. Perhaps it is because I happen to help manage a service, InfoDev, which would be more than happy to undertake paid work in this area for you and other external institutions, but I don’t see this as much of a hurdle. If the research is worthwhile, then hopefully funding is available, and some of this could be budgeted for technical development. That said, researchers often spend years learning ancient languages or obscure discipline-based technicalities, and arguably they should be able to learn some basic XSLT and HTML with a very small dedication of their time. Whether they should and could do this is, of course, a personal decision, but these are just more tools in a toolbox that might also include knowledge of how to write complex statistical queries or how to collaborate using version control systems. But again, we’re happy to undertake work, especially TEI-related work, on any part of a research project, from digitization to publication, analysis and visualization.
Perhaps, a marketing campaign would help.
This would perhaps help get more people involved in the TEI. I suggest we would want anyone applying for funding for a humanities text project to feel (or be advised) that they should be using the TEI (or at least justifying why they are using some other open standard instead). I feel this is probably more in the mandate of the TEI Board than the TEI Technical Council, but would encourage SIGs and indeed individuals to undertake whatever outreach activities are feasible.
about question 11 : it would be interesting to relate software/tools development and training/workshop. offering training sessions dedicated to one tool or category of tools, and looking at how people use tools IRL during the training sessions to get a better idea of need specifications… ?
This would be interesting, though those who have only just been trained in tools are likely to perceive different needs from those who use them on a daily basis. But I do wonder whether this should be a priority for the TEI Technical Council, which has its hands full maintaining, improving, and extending the Guidelines themselves. We should encourage tool development by third parties, and facilitate this development where it is in our power.
Please, please, please don’t spend time and money on building a TEI-wide repository. Instead, convince Google to recognize the TEI format so that one can easily do a web search for TEI texts. Then, get people to put their texts on the web. I think the building of publishing tools and education are very important, but that they shouldn’t be Council functions per se. Similarly, I think the interchange question is very, very important, but Council’s role in it should be limited. This is the kind of thing a SIG (or SIGs) should tackle, and Council should be involved in blessing/criticizing their output.
Personally, I agree with you about building repositories. I feel there are more than enough people with a lot more experience in undertaking this kind of activity. There has already been discussion and work with Google regarding exporting from Google Books in TEI P5 XML format, which is promising. I agree the community, potentially through SIGs, can handle a lot of these issues. I worry about the idea of the Council “blessing/criticizing” the output of SIGs, rather than just being on hand to provide support and implement changes recommended by them.
Creating and managing a content repository is vastly different from developing and maintaining markup guidelines, and would require a serious redirection of TEI-c’s resources. Let others who are already in the repo business (e.g., HathiTrust, OTA) take care of that.
I would agree with this, and it is what I would recommend to the TEI.
Thank you for undertaking this survey.

You’re welcome, it was my pleasure. I’m always interested in getting a sense of where the TEI community agrees on certain issues.

13. You may optionally include your email address so we can contact you if (and only if) we have any follow-up questions concerning your responses.

I’m certainly not going to provide these for spam-bots!


My recommendation to the TEI Council is going to be that we slowly start phasing out TEI P4 support. Closer to the end-of-support date (November 2012) we should move the TEI P4 materials to the TEI Vault and redirect existing links there. I think this survey bears out my belief that the TEI Technical Council should focus on the maintenance and improvement of the Guidelines, and on looking for ways to improve these in the future.


5 Responses to “TEI P4 Support, Survey Results”

  1. Dot Porter says:

    Regarding question 4, I’m pretty sure I misunderstood that when I completed the survey, and given the results in comparison with question 3 I wonder if others did as well. I was thinking of whether the TEI P4 documents in our projects should be indexed, not the TEI P4 materials on the TEI website. It seems obvious now, but I’m pretty sure that I answered “yes” to that, and I would now answer “no”.

  2. James Cummings says:

    Ah, I see. Apologies if the question was confusing. I meant only the materials on the TEI-C website, and specifically the TEI P4 Guidelines. I have no power over whether Google indexes your own projects. ;-) (And if I did I would have it index them. :-) And I would also have it recognise TEI as a filetype: in searches.)

  3. Hugh Cayless says:

    Re: “blessing/criticizing” (which was mine): I really only meant that the SIGs should drive the effort and the Council have an advisory/implementing role, which is, I think, what you advocate.

  4. Michelle Dalmau says:

    Do you feel that the response rate is representative enough both in terms of just numbers (number of TEI-L subscribers v. respondents), but also with respect to institutional affiliations? I should have forwarded the survey post to TEILIB-L, for instance, so I wonder if Libraries are well represented. I, at any rate, am in agreement with the survey findings and your assessment of those. And I understood question #4 as you intended :-).

    Finally, depending on how Dot responded, here’s an example where you may have received duplicate information.

  5. James Cummings says:

    Hugh: You are right, of course, that is what I would advocate. The general approach has been to have Council members participate in particular SIGs, so that the SIGs have not only a way to feed back developments but also a source of advice. (This has been more or less successful in some SIGs, but with turnover of Council members it doesn’t always continue.)

    Michelle: Yes, it would potentially have been a good idea to get more institutions targeted to give a broader sample. I believe the IP addresses might be recorded, so I contemplated doing some basic statistics on location, especially for those who felt P4 support was important. But you are right, that it is a smaller sample so extrapolating is fraught with dangers.
