Last Friday the UMF-funded VIDaaS and DataFlow Projects staged their joint workshop at the Saïd Business School, Oxford. For DataFlow it was a chance to officially launch their DataStage software; for us in VIDaaS it was a chance to publicly show the DaaS software running on the Oxford cloud infrastructure for the first time and to discuss the shape of the Online Research Database Service (ORDS) that the software will underpin.
The demonstration of the DaaS software threw up a couple of interesting moments – such as the data interface not launching correctly on the first attempt, although it did make amends for this disobedience by working correctly a couple of minutes later. Fortunately we did not have to resort to screenshots at any point despite the high-risk strategy of running a new software build and creating a new virtual machine on the fly.
The afternoon session was divided into a technical strand, for those interested in installing the software themselves or getting involved in development, and a research strand for those interested more in how the software and service would function from a user’s perspective. The aspect of the service that provoked the most interest was undoubtedly that of cost and sustainability – perhaps unsurprisingly given the number of institutions currently wrestling with such issues in the face of new or more stringent data preservation requirements from the research funding agencies.
We’re designing ORDS very much with funders’ requirements in mind, and our price model reflects this, offering a price per year for a regular or large virtual machine, which will be easy for researchers to calculate and then cost into any funding application. However, this model is less well suited to researchers undertaking ‘unfunded’ research that is simply part of the normal expectations of their employment, as is often the case in arts and humanities disciplines. Obviously this is a concern. One solution might be for departments to subscribe to the service in order to cover the costs of its use by their researchers – but many departments are unlikely to be willing to subscribe to new services at the expense of existing commitments.
The underlying problem is that a costed service such as ORDS makes visible costs that were previously hidden. At present, most researchers run their databases on their ‘own’ hardware and host their data either on their own storage media or on their departmental servers, where they consume power and require a certain level of upkeep from IT support officers or in some instances the researchers themselves. The costs of the hardware, software, infrastructure, hosting, and support for data management are not stripped out of broader overheads or salaries, and are therefore hidden from bean-counters and budget-setters. Even if switching to a service such as ORDS means that the overall costs of data management are reduced, the fact that these costs can for the first time be clearly seen and accounted for makes them look like additional costs, and furthermore costs which somebody (whether the researcher, the department, the institution, or the funding agency) has to take responsibility for paying.
At present, the Funding Councils are generally demonstrating a willingness to pay the data management costs that their requirements stipulate,* but this largesse has limits. During the workshop Mark Thorley from NERC pointed out that it’s part of the business of running a research institution to ensure one’s data is protected – so to a large extent the long-term preservation of data is the institution’s job. At present most institutions seem unwilling to add the costs of data preservation and curation to their existing commitments.
*See both the final RCUK principle on data policy (http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx) and the Digital Curation Centre’s summaries of funders’ data policies (http://www.dcc.ac.uk/resources/policy-and-legal/funders-data-policies).