Requirements for a publication infrastructure: Project Description
Magdalena Turska ER3
The original DiXiT project bid briefly described my work as ‘a requirements study for a publication architecture targeting multiple media, not only web and paper, but also mobile devices (EPUB). Progress in this field is especially important to projects without access to large supporting technical staff.’ My primary tasks were to develop a model of reusable components for a publication infrastructure by surveying the existing tools and frameworks, requesting improvements and implementing new components.
Objectives:
• Creation of an index of tools for all stages of the production of digital editions, to be completed by M18
• The submission of several feature requests for improvements to the TEI ODD meta-schema language where appropriate by M15
• The creation of a proof-of-concept digital edition that evinces the use of re-usable components for edition production and publication by M24
• Documentation of improvements to the oXygen-TEI framework completed by M27
Tasks and methodology:
• Survey the community for existing tools and publication frameworks, coupled with user requirements gathering
• Develop a model of reusable components for a publication infrastructure
• Where feasible implement proof-of-concept components for use in publishing digital editions
• Document and openly request improvements in the TEI ODD meta-schema language
• Further develop the oXygen-TEI framework
• Create a proof-of-concept digital edition based on the defined reusable components (in collaboration with SyncRO)
• Document and disseminate conclusions through knowledge exchange activities
Table 1. ER3 Objectives and Tasks
The specific objectives and tasks listed above strongly stress the need for appropriate software tools, but they carry an even stronger implicit assumption: software plays only a partial role within an editorial workflow. The actual methods of creating a digital edition differ as much as the editions themselves and are guided by multiple factors: source material, type of edition, available human resources and infrastructure, to name just a few. Any workflow nevertheless consists of distinguishable stages or components, even if they often do not follow the linear succession of a waterfall production model but are undertaken concurrently or revisited between iterations. It is not hard to enumerate at least some conceptual tasks that form part of creating and publishing a scholarly digital edition. Each of these tasks should correspond to at least one tool or agent capable of performing it. Modelling the editorial workflow as a pipeline of such tasks, with a built-in possibility for iteration, provides a framework for pairing the tasks with appropriate software tools.
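The pairing of tasks with tools or agents can be sketched as a small data structure. What follows is only a minimal illustration in Python, not part of the project deliverables; the class names, task names and tool assignments are hypothetical placeholders chosen to show the shape of such a model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    """One conceptual step of an editorial workflow."""
    name: str
    # Tools or agents (human or software) able to perform the task.
    tools: list[str] = field(default_factory=list)


@dataclass
class Workflow:
    """An ordered pipeline of tasks; the order is a default, not a constraint."""
    tasks: list[Task]

    def unpaired(self) -> list[str]:
        """Names of tasks with no tool or agent assigned."""
        return [t.name for t in self.tasks if not t.tools]

    def iterate_from(self, name: str) -> list[str]:
        """Names of the tasks to redo when backtracking to the given task."""
        names = [t.name for t in self.tasks]
        return names[names.index(name):]


# Hypothetical example data: placeholders, not results of any survey.
workflow = Workflow([
    Task("transcription", ["human editor"]),
    Task("encoding", ["XML editor"]),
    Task("proofing"),
    Task("publication", ["XSLT stylesheet"]),
])

print(workflow.unpaired())
print(workflow.iterate_from("encoding"))
```

Keeping the pipeline as plain ordered data, and expressing iteration as "resume from a named task", is one simple way to allow backtracking without abandoning the overall sequence.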
The following diagram illustrates a top-level approximation of the stages in the creation of a digital edition that I have identified. Stages in the diagram are color-coded to suggest the core activity: green symbolizes the incorporation of knowledge in an explicit form, yellow the visualization, and blue the conversion between formats. Under this scheme publication is both yellow, as it ultimately re-presents the data visually, whether in print or online, and blue, as on the data level it converts between specific formats such as TEI and HTML or PDF. The crucial steps are the ‘green’ phases, as this is where the injection of knowledge occurs; yet without the complementary conversion and visualization steps that translate and decode the plethora of formats, the data itself, however beautifully modelled, will remain unusable.
The aim here was not to include every tiny distinguishable procedure or step but rather to construct a very general pipeline schema that could be applied to all, or at least most, digital editorial workflows. The linear sequence of the diagram may not represent real-life practice, which often includes backtracking after trial-and-error attempts at specific tasks. Some phases may be absent from specific projects; for example, the digitisation phase would be irrelevant for projects building upon born-digital data or the outcomes of previous projects. As mentioned earlier, each of the stages needs to be further divided into smaller tasks, commonly with intermediary conversion between the output and input formats of the applicable tools. Only at this more detailed level can actual pairings between tasks and tools be made.
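The need for intermediary conversion between the output format of one tool and the input format of the next can be made explicit once each tool declares what it reads and writes. The sketch below, in Python, is an assumption-laden illustration: the `Tool` type, the `conversion_gaps` function and the tool and format names are all hypothetical and do not describe any surveyed software.

```python
from typing import NamedTuple


class Tool(NamedTuple):
    name: str
    reads: str   # input format the tool expects
    writes: str  # output format the tool produces


def conversion_gaps(chain):
    """Return (producer, consumer, from_format, to_format) for each mismatch."""
    gaps = []
    for producer, consumer in zip(chain, chain[1:]):
        if producer.writes != consumer.reads:
            gaps.append(
                (producer.name, consumer.name, producer.writes, consumer.reads)
            )
    return gaps


# Hypothetical chain of tools; names and formats are placeholders.
chain = [
    Tool("transcription tool", "image", "plain text"),
    Tool("encoding tool", "TEI", "TEI"),
    Tool("publication tool", "TEI", "HTML"),
]

for producer, consumer, src, dst in conversion_gaps(chain):
    print(f"{producer} -> {consumer}: needs {src} to {dst} conversion")
```

Each reported mismatch marks a point in the workflow where a conversion step, and therefore a converter, has to be inserted.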
I believe that such a framework could present editors with a map of the options they have when undertaking a scholarly edition, thus cutting down on headaches and facilitating adherence to good practices in the field (or, in fact, identifying what the common practices actually are). It also has the potential to identify the critical gaps that will require new software to fill them. Moreover, facilitating the automated generation of the required outputs, and thus freeing very substantial human and financial resources, is probably the only way to shift the focus onto the completeness, consistency, quality and depth of the underlying data.
Because of this latter concern I am investigating not only existing publication platforms and components but also the possibility of defining the intended outputs and publishing requirements in a formal manner and attaching this information to the encoded document or schema, so that the document can be converted automatically into the desired publication form.
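One way to picture such a formal definition is a declarative table that maps elements to abstract behaviours, which each output format then realises in its own way. The following Python sketch is purely illustrative: the rule table, the behaviour names and the templates are invented for this example and do not reproduce the syntax of any existing specification, and namespaces and escaping are deliberately ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical declarative rules: element name -> abstract behaviour.
RULES = {"head": "heading", "p": "paragraph", "hi": "emphasis"}

# Each output format realises the same behaviours with its own templates.
TEMPLATES = {
    "html": {
        "heading": "<h1>{}</h1>\n",
        "paragraph": "<p>{}</p>\n",
        "emphasis": "<em>{}</em>",
    },
    "markdown": {
        "heading": "# {}\n\n",
        "paragraph": "{}\n\n",
        "emphasis": "*{}*",
    },
}


def render(element, fmt):
    """Recursively render an element according to RULES for the given format."""
    content = element.text or ""
    for child in element:
        content += render(child, fmt) + (child.tail or "")
    behaviour = RULES.get(element.tag)
    if behaviour is None:
        return content  # no rule for this element: pass its content through
    return TEMPLATES[fmt][behaviour].format(content)


# Simplified, namespace-free sample document (an assumption for brevity).
source = "<div><head>Title</head><p>Some <hi>marked</hi> text.</p></div>"
root = ET.fromstring(source)

print(render(root, "html"))
print(render(root, "markdown"))
```

Because the rules name behaviours rather than output markup, supporting a further output format in this sketch means adding one more entry to `TEMPLATES` while the rules stay untouched.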
My work is now twofold. One aspect is breaking down the general production stages outlined in the diagram above into the smallest possible tasks, creating pairings between the tasks and the tools that can perform them, and dealing with the necessary conversion issues. This activity will result in updates to the relevant TEI wiki pages, a series of blog articles and possibly a larger publication. The other aspect is the more hands-on development of particular tools that can address one or more of the identified gaps. My main project is the implementation of the processing model for TEI Simple in XSLT, with output to HTML, Markdown and possibly LaTeX. Smaller projects are conversions between TEI and Markdown and customizations of the oXygen framework that facilitate the editing and proofing of TEI documents.
Poster for the presentation is available from here.