A Farewell to VIDaaS, or, What VIDaaS Did Next

The VIDaaS Project is at an end. All that remains now is for the team to review the draft final report and then check that the JISC are happy with our conclusions and recommendations. Although the project itself has ended, our work improving the Database as a Service (DaaS) software and creating the Online Research Database Service (ORDS) is still very much ongoing.

Our work over the next three or four months will focus on migrating the DaaS software from the JBoss 5 application server to a more secure platform. Unfortunately, whilst JBoss 5 greatly benefited the rapid development of the prototype DaaS system back in the days of the Sudamih Project, it is no longer being actively supported, so security vulnerabilities are no longer being fixed. We cannot, of course, launch a production service hosting valuable research data on a platform with known security issues.

After this work is complete we will begin a follow-on project, funded by the University of Oxford, which will oversee the official launch of ORDS during the summer of 2012 and further develop the functionality of the DaaS thereafter to enable it to meet a wider range of researchers’ requirements. This project will be called the ORDS Maturity Project and will run for twelve months, by which time ORDS should be able to cater for databases involving multimedia content and allow the publication of subsets of otherwise restricted databases to support research publications.

Additionally, we hope to be able to secure funding to enable early adopters of the service to use it free of charge, to encourage uptake and help us identify areas for further improvement. Expect news about this over the coming month.

I’d like to thank all of the members of the VIDaaS team, who have each put a lot of effort into getting our virtual infrastructure up and running, adapting and improving the DaaS, and producing the supporting materials and service documents that the ORDS will require in the future. Thanks also to the members of the Steering Group who have helped us to focus on our priorities and ensure that what we do will actually be of benefit to Oxford and universities throughout the UK. And finally thanks to all of the researchers and support staff who have been involved in testing, telling us their requirements, or helping us understand current practices. Their input has been essential.

Several members of the VIDaaS Project Team will now jump over to the Data Management Rollout at Oxford (DaMaRO) Project to join our colleagues from the Bodleian Libraries in the task of integrating our existing data management tools and services. Others will return to their ‘day jobs’ managing the infrastructure and services that the Computing Services provide. Some of our developers working on the DaaS will remain involved in its migration to a new platform, where they will be joined by a few fresh faces. We will all be involved in some way, shape, or form with the continued drive to improve the research data management infrastructure at the University.

Once the VIDaaS Final Report is finalized and agreed for publication I shall add a link to it from the website and blog, but in the meantime enjoy the Spring and manage your research data well!


The VIDaaS / DataFlow workshop, and the visibility of data management costs

Last Friday the UMF-funded VIDaaS and DataFlow Projects staged their joint workshop at the Saïd Business School, Oxford. For DataFlow it was a chance to officially launch their DataStage software; for us in VIDaaS it was a chance to publicly show the DaaS software running on the Oxford cloud infrastructure for the first time and to discuss the shape of the Online Research Database Service (ORDS) that the software will underpin.

You can obtain copies of the workshop presentations via the DataFlow Project blog or the VIDaaS website.

The demonstration of the DaaS software threw up a couple of interesting moments – such as the data interface not launching correctly on the first attempt, although it did make amends for this disobedience by working correctly a couple of minutes later. Fortunately we did not have to resort to screenshots at any point despite the high-risk strategy of running a new software build and creating a new virtual machine on the fly.

The afternoon session was divided into a technical strand, for those interested in installing the software themselves or getting involved in development, and a research strand for those interested more in how the software and service would function from a user’s perspective. The aspect of the service that provoked the most interest was undoubtedly that of cost and sustainability – perhaps unsurprisingly given the number of institutions currently wrestling with such issues in the face of new or more stringent data preservation requirements from the research funding agencies.

We’re designing ORDS very much with funders’ requirements in mind, and our pricing model reflects this, offering a price per year for a regular or large virtual machine, which will be easy for researchers to calculate and then cost into any funding application. This model is less well suited to researchers undertaking ‘unfunded’ research that is simply a part of the normal expectations of their employment, however, as is often the case in arts and humanities disciplines. Obviously this is a concern. One solution might be for departments to subscribe to the service in order to cover the costs of its use by their researchers – but many departments are unlikely to be willing to subscribe to new services at the expense of existing commitments.

The underlying problem is that a costed service such as ORDS makes costs visible which were previously hidden. At present, most researchers run their databases on their ‘own’ hardware and host their data either on their own storage media or on their departmental servers, where they consume power and require a certain level of upkeep from IT support officers or in some instances the researchers themselves. The costs of the hardware, software, infrastructure, hosting, and support for data management are not stripped out of broader overheads or salaries, and are therefore hidden from bean-counters and budget-setters. Even if switching to a service such as ORDS means that the overall costs of data management are reduced, the fact that these costs can for the first time be clearly seen and accounted for makes them look like additional costs, and furthermore costs which somebody – whether the researcher, the department, the institution, or the funding agency – has to take responsibility for paying.

At present, the Funding Councils are generally demonstrating a willingness to pay the data management costs that their requirements stipulate,* but this largesse has limits. During the workshop Mark Thorley from NERC pointed out that it’s part of the business of running a research institution to ensure one’s data is protected – so to a large extent the long-term preservation of data is the institution’s job. At present most institutions seem unwilling to add the costs of data preservation and curation to their existing commitments.

*See both the final RCUK principle on data policy (http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx) and the Digital Curation Centre’s summaries of funders’ data policies (http://www.dcc.ac.uk/resources/policy-and-legal/funders-data-policies).


VIDaaS Project survey: creating research databases

The VIDaaS Project has just launched another survey, this time aimed at researchers (from all disciplines) who have been involved in setting up research databases – particularly relational ones. We hope that the information this provides will help us to assess the potential benefits of an online database service such as ORDS, which we’re currently developing.

We estimate that completing the survey will take 15-20 minutes. All researchers who complete it (and supply a valid email address) will be entered into a prize draw for a £100 Amazon voucher.

The survey URL is:

https://www.survey.bris.ac.uk/oxford/creating_databases/

Please feel free to circulate this information to any colleagues to whom it may be relevant.


VIDaaS/DataFlow workshop on the 2nd March

On Friday 2nd March the VIDaaS Project will be staging a joint workshop with our colleagues from the DataFlow Project at the Saïd Business School in Oxford. The day will run from 10:30am until 5pm, and feature demonstrations of the database-as-a-service software developed by the VIDaaS Project and the DataStage software that forms the centrepiece of the DataFlow Project. Delegates will also get to look at the DataBank data repository system that Oxford is introducing, and hear about the cloud infrastructure that the University has built – partly in order to host the outputs of VIDaaS. There will also be plenty of time to ask questions, discuss developments, and get to know the other delegates.

In the afternoon the workshop will split into two groups – a user-focused break-out, and a smaller technical session for those interested in learning how to install the software in their own institution or contributing to future development and customization.

If you’re interested in attending, please register at http://www.eventbrite.co.uk/event/2804728017. The workshop is completely free, and the venue is immediately opposite Oxford Rail Station, so you won’t even need to walk past any dreaming spires on your way there.


VIDaaS Project Update, 24th January 2012

After meeting our Project Steering Group for the final time yesterday, now seems like a good time to give a brief update of where VIDaaS is at, and what we’ll have in place when we go to service in April.

The virtual infrastructure side of the project is going well, with the Oxford private cloud now largely in place apart from some networking kit which has apparently fallen foul of Customs and Excise. Likewise, the work on Identity and Access management, which took a long time to get off the ground, now seems to be progressing nicely. The DaaS part of the deal is taking shape, and we’ve received some useful feedback from our test users regarding the user interface and current functionality. We have a good idea of the costs of the future Online Research Database Service (ORDS), and a sense of the staffing levels required to offer the service within the University.

Encouragingly, we are getting a steady trickle of enquiries about the forthcoming service from researchers currently planning data-based research projects, and there seems to be growing concern about good research data management more broadly, which the ORDS helps to address.

It has become clear during the course of the last nine months that we will not be able to get every feature that our researchers have requested into the ORDS service by its April launch (writing a new database management system from scratch is no small task!) but we do now have a clear sense of what will be in place come the launch, and what functionality will have to wait until we’re in service. We’ll be publishing a ‘roadmap’ in due course to give people an indication of when the more advanced features of the system will be ready to use.

We have already tried accessing the underlying system using Microsoft Access as a front-end interface, and this seems to work just fine – now it’s a question of polishing our native interfaces.

One final thing to note – we are planning on staging our concluding project workshop on Friday 2nd March at the Saïd Business School in Oxford. Expect a more detailed announcement shortly, but, for now, block out that day!


Naming the ‘Online Research Database Service’

Just a quick announcement that the service name for the DaaS at the University of Oxford will be the ‘Online Research Database Service’, or ORDS for short. This was felt to be reasonably descriptive of what we’re offering whilst not treading on the toes of any other systems or services within the University.

So when we refer to the ORDS in this blog or on the VIDaaS website in future you will know that we are referring to the local service, built upon the DaaS software, that we will be offering our staff and students from April 2012 onwards. Other institutions wishing to adapt the DaaS for their own use will need to consider something similar. Whilst we will be adding ORDS branding to the DaaS user interfaces, we will do so in a way that makes it very simple for other institutions to replace this with graphics and text more to their own tastes.

Maybe in the not-too-distant future we can create a single national service around the DaaS, which would almost certainly enable greater economies of scale, but for the time being we’re taking things one step at a time.


International Data Curation Conference 2011

Maybe I’m just becoming increasingly specialised, but this year’s International Data Curation Conference seemed more varied than ever. Last year’s divide between delegates interested in improving library practices and delegates interested in supporting researchers was less in evidence, with research data management seeming increasingly like the continuum of processes that it should be. Themes this year included institutional and funding council policies, legal risks, rewards for researchers, the ethics of openness, research reproducibility, preserving software and scientific workflows, the costs and benefits of data curation, Freedom of Information requests, tweet preservation in the name of social science, new tools for data management, sharing, and curation, and the visualisation and communication of data to the public at large, doubtless along with various other bits and bobs that I didn’t get to hear about due to the usual restrictions of corporeal vestiture.

Ruth McNally explained that ‘Data that doesn’t flow is dead data’, whereas Jeff Heywood reminded us that it is storage that is ‘the itch that researchers really want scratched’; Andrew Charlesworth told us not to ignore the legal problems that can be associated with data, whilst Ellen Collins (Research Information Network) worried us with talk of how Freedom of Information requests can have unintended consequences on researcher behaviour even though the threat of being forced to reveal one’s data should in theory encourage better data management.

Thinking about how all this relates to the VIDaaS Project, as I’m supposed to, a number of things occur to me. Firstly, there is the encouraging sense that the project, and the Database as a Service tool it is creating, should address several of the concerns raised: it will make it much easier to open up data for public inspection, should the data creator wish to do this (or be forced to); it should assist the citation of data and the ability to link datasets to publications; and finally it should open up some possibility of a sort of data reincarnation – it is very straightforward to import old Access and other databases that may have been lying around in a drawer gathering dust for several years, and which may now get the chance to ‘flow’ again. I was also prompted to consider how we could capture richer metadata about the processes by which the data we serve was gathered – something to occupy my mind whilst others enjoy their festive breaks.

Anyone interested in viewing the non-award-winning VIDaaS and DaMaRO posters I exhibited at the conference can find digital representations of them via the links below:

VIDaaS poster

DaMaRO poster


A four country data management action plan

The report “A surfboard for riding the wave: towards a four country action programme on research data” was published recently by the Knowledge Exchange (KE) and builds on the “Riding the wave” report. The KE is formed by partners from Denmark, the Netherlands, Germany and the UK and aims to create a layer of openly available scholarly and scientific content in which research data plays a key role. The vision set out in the document is that of a collaborative infrastructure that supports seamless access, use, re-use and trust of data.

In order to achieve the vision, four key drivers are identified: incentives, training, infrastructure and funding. These four elements are an excellent framework for analysing the RDM challenge, as researchers are put at the core. It’s crucial to incentivize researchers to re-use and share data through recognition, and they need to be equipped with the data skills needed in their research domain.

Other stakeholders with prominent roles include libraries, scientific organizations, funders and journals. Libraries are positioning themselves through the emerging data libraries support services to help researchers access secondary datasets, and to manage and share primary data. This will result in libraries absorbing some of the costs. National and international scientific organizations should issue rules of scientific conduct specific to data to stimulate researchers. Funding agencies need to set data management requirements as part of grant applications. Editorial boards of journals have to press authors to provide access to replication data with the articles.

The report acknowledges the existence of a diverse data infrastructure with two levels: institutional and domain-specific. Data management could initially be carried out at the local level (researcher, institute) and the curation at higher levels (domain archives). In spite of this there are still many “orphaned datasets” without appropriate repositories, and researchers’ workflows tend not to be integrated with institutional services.

The action plan outlines a range of possible actions with long-term objectives: making data sharing part of the academic culture, making data logistics an integral component of scientific professional life, and building an infrastructure that is operationally and financially sound.

It’s remarkable to see such an international collaborative effort in this field; this may help to avoid reinventing the wheel, and provide more coherent frameworks to address the data management challenge.


VIDaaS on tour – Copenhagen.

In mid-October VIDaaS continued its world publicity tour. Following on from its appearance at VMworld Las Vegas in September, it was the turn of VMworld Europe in Copenhagen.

Participating in Europe’s biggest IT conference (with over 7000 delegates) was always going to be a daunting prospect, but the VIDaaS team was well represented with Stuart Lee, Adrian Parks and myself all doing our bit to get the message across.

Adrian presented as part of a Colt session on the hybrid cloud, Stuart fielded questions from other interested parties in an executive briefing, and I sat on a stool on stage in a press briefing event with four business leaders to talk about our cloud experiences and our “Journey to the Cloud”. It must have looked like a bad episode of Blind Date – I wasn’t aware until we started that we were the warm-up for Paul Maritz (the CEO of VMware), so although I’d like to imagine the 100 journalists in the audience were all there to hear about Oxford, the hybrid cloud and VIDaaS, I suspect we were a sideshow for them. Nevertheless, the combined forces of the Oxford VIDaaS team on tour managed to gain some publicity for the project and what we are doing, including the lead story on Computing Weekly’s website for that day.

More importantly, we formed some useful links with teams in VMware who are looking at similar challenges to the VIDaaS project, and learnt a good deal about the VMware vision for the future – all of which should help the service in the long term.


Cloud Infrastructure news

The ‘VI’ part of the VIDaaS project is now well underway. Partnering with VMware and using loan hardware kindly supplied by Cisco and EMC (specifically, a UCS blade centre and a VNX 5100 SAN), an initial implementation of the Oxford private cloud is now complete. The virtualisation platform is VMware vSphere, with vCloud Director and vShield running on top to provide the cloud abstraction layer. In the cloud layer we are running several prototype VIDaaS VMs, based on our chosen technologies of Debian Linux, PostgreSQL and JBoss.

While development of the DaaS software continues in parallel, the virtual infrastructure team has been investigating best practice for design and implementation of a production private cloud for Oxford. We will shortly have to return the loan equipment on which we have developed the prototype cloud, but the deployment of our live environment is already underway. For this we’ve selected Dell as our hardware provider, and we’ll be using both their blade and SAN technology (the storage element being provided by Compellent Storage Center).

We’ve also been doing some investigative work into how we might move workloads between the private and public cloud. Workloads in VIDaaS are primarily based on what we are calling project nodes, which are essentially Linux VMs running Postgres and JBoss, and it is these nodes that we have been moving between cloud services. For this initial testing, public cloud facilities have been offered by Colt Technology Services. Colt is a leading cloud service provider, certified by VMware through their vCloud Datacenter Services program. Our collaboration with Colt and VMware has been very fruitful and we have successfully migrated several prototype DaaS nodes between the Oxford and Colt clouds.
