Introduction
This post contains the results of a survey that collected information the TEI Technical Council will use to assess the need for ongoing support for the TEI P4 version of its Guidelines. TEI P4 has been largely superseded by the TEI P5 Guidelines since November 2007. At that point it was promised that support for TEI P4 would continue for 5 years, until November 2012. As that date is just over a year away, we are starting a slow process of phasing out support for the TEI P4 Guidelines, and the TEI Technical Council is planning to de-emphasize TEI P4 as an offering. We will continue to support it over the next year, but may take steps to stop it being indexed by search engines or to make it less prominent on the website. These are the results of the survey, which I've also transformed to TEI P5 XML at http://users.ox.ac.uk/~jamesc/SurveySummary.tei.xml.
1. Are you involved with projects that are still using TEI P4?
My reading of these results is that many people are either not using TEI P4 or planning to migrate to TEI P5. I suspect, given the other answers, that those with TEI P4 projects probably do not rely on a lot of support from the TEI Consortium.
2. How important is ongoing TEI P4 support to you?
This seems fairly clear: out of 54 respondents, 44 said it was not important, unnecessary, or that we should get rid of it. Nevertheless, the fact that it is important or very important for 18.5% of respondents is still significant, and must be remembered when making decisions concerning ongoing support for TEI P4.
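As a quick check that these two figures fit together, assuming the 44 "not important" responses and the "important or very important" responses together account for all 54 respondents (an assumption, since the full breakdown of answer options is not reproduced here):

$$54 - 44 = 10, \qquad \frac{10}{54} \approx 0.185 = 18.5\%$$

So the 18.5% corresponds to roughly 10 people.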
3. How much should the TEI Consortium begin to de-emphasize TEI P4 on its website before November 2012?
There seems to be a strong vote for making TEI P4 available only from the TEI Vault and making sure existing links redirect.
4. Should search engines be dissuaded from indexing TEI P4 materials?
This result is less clear-cut, with some people feeling it shouldn't be indexed and others thinking it should be (with slightly more weight on it being indexed than not).
5. Approximately how many TEI P4 projects have you been involved with?
This is simply a statistical question (and of course depends on how the respondent interprets 'projects'). It is interesting that the majority of people seem to be involved with more than one project, but that is hardly unexpected. More respondents were involved with 6-15 projects than I had expected.
6. Approximately how many TEI P5 projects have you been involved with?
It is interesting that the percentages are roughly the same as for TEI P4 projects, though slightly higher overall.
7. What amount of TEI P4 data do your projects have? (In documents, number of files, how many megabytes, or whatever convenient measure makes sense for your project)
This was a free-text question, attempting to get a measure of how much TEI P4 material people have. It was deliberately left vague as to how the answer should be expressed, partly because I was interested to see how people would quantify their TEI P4 data, and partly because I recognise that it would be difficult for everyone to provide the same form of measurement. I was interested to see that the answers ranged more widely than I had expected.
- 0
- none
- zero
- Several hundred files.
- I have about 500 texts
- 3,200 files, 170Mb.
- nil
- Very roughly: 60,000 books = 5 million pages = 10 GB of marked-up text.
- 40 megabytes in the one P4 project I still manage; a bunch more in ones I’m no longer involved in.
- This varies a lot, but projects range from 3-150 MB In practice, the TEI files are a small part of the overall operation, which includes authority information usually in non-TEI format, and various generated TEI XML files used for web publication only
- 50 files
- Appx. 7000 files, 29 MB total data
- Appr. 6500 documents (mostly letters)
- 0
- less than 10%
- 0
- about 3,000 XML files currently in P4.
- in summa: about 4 Mb
- All of the [Institution]’s projects are in migration from p4 to p5, so this is a snapshot of the migration process. The data is migrated, but the sites are not all rewritten yet. My hope is that by May of 2012, all of the current [Institution] sites will be serving out texts based on p5.
- 0
- Help files used by about 1000 Modes users.
- 5 text-critical editions
- 7000+ [P4 Customization] encoded letters
- Main current project: several dozen megabytes including a few large files but mostly 10-20 kb: roughly 3000 files.
- Roughly twelve published electronic editions, with at least a dozen more in the pipeline, in process of being finished (though they now have to be migrated to be published).
- I have no clue, but it’s a lot.
- The [Institution] has 113MB bytes of P4 documents, of archival interest only.
- None, since we upgraded.
- I’m not sure. I think I might have one project that is in TEI P4, but it’s a legacy project and I’m actually not positive. I haven’t looked at it in a while.
- 2.5 million text pages
- zero
- None
- Between 300 and 600 files.
- ca. 70 files
- dozens of documents.
- Lots. Can’t access the figures quickly.
- 700MB
This ranges from zero to multiple gigabytes of TEI text. What I should have asked was “And is all the TEI freely available for download?” as, of course, that is something I’d like to encourage.
8. Please list the URLs of any TEI P4 projects you want us to know about.
I've decided not to provide these in this summary; if projects wish to provide samples, they should add them to http://wiki.tei-c.org/index.php/Samples and/or describe their projects on the wiki.
9. Please list the URLs of any TEI P5 projects you want us to know about.
I've decided not to provide these in this summary; if projects wish to provide samples, they should add them to http://wiki.tei-c.org/index.php/Samples and/or describe their projects on the wiki.
10. Have you submitted a Bug or Feature Request to the TEI Technical Council?
Lots of people have submitted bug reports or feature requests, but most people have either only contributed to the discussion or not submitted any at all. We should, of course, strive to increase feedback from the TEI community. I'd be interested in any ideas on how to make it easier for the community to participate.
11. Where do you think the TEI Technical Council should expend its time and effort?
This is also an interesting result. Scoring highest on 'top priority' is the idea that the TEI Technical Council should spend its time fixing bugs and implementing feature requests submitted by the community. Analysing where the TEI Guidelines could be improved, and undertaking those improvements, was also ranked highly, along with developing the infrastructural basis for future versions of the TEI Guidelines. What scored lower was the idea of the TEI Technical Council setting up a repository of TEI texts, or developing software to make publication of TEI texts easier. I suspect this is because maintaining the Guidelines is the central mandate of the TEI Technical Council, and looking for ways to improve them is closely related to that, while creating repositories is already done better by people who focus on those activities. Although it is a community-based activity, only the TEI is really in charge of maintaining the Guidelines, whereas any third party can develop software or archives. We should certainly encourage those activities, and implement community suggestions that facilitate the development of community software.
12. Any other comments?
For people with large repositories of transcriptions (where the text content will never be updated), markup stability is essential. P4 to P5 is not essential but recommended, but it’s going to mean a huge effort. My worry is that there will be a far too rapid succession to P6, P7, P8, etc which adds bells and whistles but does not contribute anything meaningful to static repositories.
Yes – thanks for doing such a great service to the community!
I think that lack of easy tools for presentation / publication of TEI documents is a serious drawback. Many of my younger colleagues would learn (or actually have learned) the TEI editing in Oxygen, but they are unable — and not willing! — to learn XSLT for the presentation of their texts (not to mention the publication – servers etc.). An average user who is not able to modify Sebastian's stylesheets for his edition is left completely alone with his/her TEI document (only *exceptionally*, an XSL-expert is available for help in big institutions). As for now, the TEI is an ideal tool for only one part of the communication chain — but not for the whole …
Perhaps, a marketing campaign would help.
about question 11 : it would be interesting to relate software/tools development and training/workshop. offering training sessions dedicated to one tool or category of tools, and looking at how people use tools IRL during the training sessions to get a better idea of need specifications… ?
Please, please, please don’t spend time and money on building a TEI-wide repository. Instead, convince Google to recognize the TEI format so that one can easily do a web search for TEI texts. Then, get people to put their texts on the web. I think the building of publishing tools and education are very important, but that they shouldn’t be Council functions per se. Similarly, I think the interchange question is very, very important, but Council’s role in it should be limited. This is the kind of thing a SIG (or SIGs) should tackle, and Council should be involved in blessing/criticizing their output.
Creating and managing a content repository is vastly different from developing and maintaining markup guidelines, and would require a serious redirection of TEI-c’s resources. Let others who are already in the repo business (e.g., HathiTrust, OTA) take care of that.
Thank you for undertaking this survey.
You’re welcome, it was my pleasure. I’m always interested in getting a sense of where the TEI community agrees on certain issues.
13. You may optionally include your email address so we can contact you if (and only if) we have any follow-up questions concerning your responses.
I’m certainly not going to provide these for spam-bots!
Conclusion
My recommendation to the TEI Council is going to be that we slowly start phasing out TEI P4 support. Closer to the end-of-support date (November 2012) we should move the TEI P4 materials to the TEI Vault and redirect existing links there. I think this survey bears out my belief that the TEI Technical Council should focus on the maintenance and improvement of the Guidelines, and on looking for ways to improve them in the future.
Regarding question 4, I'm pretty sure I misunderstood that when I completed the survey, and given the results in comparison with question 3 I wonder if others did as well. I was thinking of whether the TEI P4 documents in our projects should be indexed, not the TEI P4 materials on the TEI website. It seems obvious now, but I'm pretty sure that I answered "yes" to that, and I would now answer "no".
Ah, I see. Apologies if the question was confusing. I did mean entirely the materials on the TEI-C website, and the TEI P4 Guidelines in particular. I have no power over whether Google indexes your own projects. ;-) (And if I did I would have it index them. :-) And I would also have it recognise TEI as a filetype: in searches.)
Re: “blessing/criticizing” (which was mine): I really only meant that the SIGs should drive the effort and the Council have an advisory/implementing role, which is, I think, what you advocate.
Do you feel that the response rate is representative enough both in terms of just numbers (number of TEI-L subscribers v. respondents), but also with respect to institutional affiliations? I should have forwarded the survey post to TEILIB-L, for instance, so I wonder if Libraries are well represented. I, at any rate, am in agreement with the survey findings and your assessment of those. And I understood question #4 as you intended :-).
Finally, depending on how Dot responded, here’s an example where you may have received duplicate information.
Hugh: You are right, of course; that is what I would advocate. We have generally tried to have Council members participate in SIGs, so that they have not only a way to feed back developments but also a source of advice. (This has been more or less successful in some SIGs, but with turnover of Council members it doesn't always continue.)
Michelle: Yes, it would probably have been a good idea to target more institutions to get a broader sample. I believe the IP addresses might be recorded, so I contemplated doing some basic statistics on location, especially for those who felt P4 support was important. But you are right that it is a small sample, so extrapolating is fraught with danger.