On the 22nd of September, I was invited to a Jisc Co-Design event on “Research at Risk”, with participants from organisations such as UCISA, RLUK, RUGIT and DCC, and of course some universities. I was there representing the University of Oxford and, as a special bonus, the University of Bolton.

What follows are my completely informal and unofficial notes of the event.

Looking for the Gaps

This was about the need to properly map the entire architecture for RDM to identify where the gaps and joins are to inform decision making at different levels.

One issue we face is that many research data management solutions are barely past the prototype stage. Rather than build completely new services, it would make more sense to look at the solutions that are closest to matching requirements, such as CKAN and HYDRA, and work together to make them complete. The OSS Watch report on RDM tools highlighted the fact that many of the tools developed had very poor sustainability prospects, linked to the fact that they were developed with a small local user base and without long term sustainability planning. The next step could be to focus on a few solutions and ensure they are fit for purpose and sustainable.

Likewise, on the storage side there is already OwnCloud, which several institutions are interested in developing further. As it is an open source project, we can work on it collaboratively to ensure we have a good solution, while Jisc can work on a matching service offering for institutions that don’t have their own data centre. Anyway, more on this later.

At a higher level, this whole area seems to be about taking stock of where we are now, which seems a pretty sensible thing to do.

What we know

Similar to the previous topic, but really about putting together the advice, guidance and lessons learned. UCISA were very keen on this one.

An interesting thing I learned about here was the “4Cs” cost exchange project that Jisc (or DCC, I wasn’t sure which) are engaged in, which seems to be principally about baselining IT costs against peers, including in areas such as RDM.

The Case for RDM

There seemed to be consensus that there is a gap between ideology and practice, and that while there is plenty of talk around mandates from the Research Councils and journals, there hasn’t really been very much from the researcher perspective, and this is something that needs to be addressed. So the task is to make the case not from a mandate perspective, but from a benefits-to-researchers perspective.

One issue here is the different demands on research data depending on whether the intent is compliance, validation, reuse, or engagement. Making data truly reusable requires more effort than simply dumping it in an archive, but can also yield more benefits to researchers.

Changing Culture

This was seen as probably a good thing longer-term, but it wasn’t clear exactly what it would involve, or what role Jisc would play. For example, the previous three items taken together might constitute actions leading towards culture change. This also encompassed areas such as treating RDM as a professional skill and providing support for developing its practice. Another practical area is information sharing between institutions.

Making data count

This idea was all to do with metrics and measures, though it wasn’t clear what those metrics might look like. There could be some progress by combining existing measures and sources, such as DataCite, and then seeing where that leads.

Simplifying Compliance

There was an amusing comparison between RDM compliance and Health and Safety. However, we have the current situation where compliance is not standardised between the Research Councils, or between the Councils and the journals that mandate RDM. Help and support on compliance is also outdated, or difficult to find.

Another topic we discussed was something I’ve dubbed (wearing my University of Bolton hat) “barely adequate research infrastructure for institutions that only give half a toss” – basically, many universities are not research intensive and do not have dedicated resource in either Library or IT Services to support RDM, or even Open Access. For them, a simple hosted solution with a reasonable subscription rate would be absolutely fine.

What was interesting is that some of the research intensive universities were also keen on this idea – can we just have ePrints+CKAN+DataCite+etc all set up for us, hosted, Shibbolized, configured to meet whatever the Research Councils want, and ready to just bung on a University logo?

Simplifying Data Management Plans (DMP)

There seemed to be a general feeling that it isn’t clear who should be writing DMPs, or why they should be doing it. In some cases it seems that research support staff are producing these instead of researchers, which seems sensible. The general feeling is that creating a DMP is something you do for someone else’s benefit.

Some institutions have been customising DMPOnline. Interestingly, one area that gets explored is “model DMPs” or “copy and paste”. I somewhat cheekily suggested a button that, once pressed, generates a plausible-sounding DMP that doesn’t actually commit you to anything.

In any case, if compliance requirements are simplified and standardised (see above) then this would also in effect simplify the needs for DMPs.

Other ideas explored included being able to export a DMP as a “data paper” for publication and peer review, though I’m not sure exactly how that contributes to knowledge.

So again we have the issue of what’s in it for researchers, and the tension between treating RDM as a hoop to jump through, or something with intrinsic benefit for researchers.

Metadata

There was a case made for this by DCC (Correction – Actually it was Neil Jacobs – Thanks Rachel!), which is basically around standardising the metadata profile for archiving research data, working on DataCite, CRIS, PURE, ORCID, achieving consensus on a core schema and so on.

This sparked off a debate, my own contribution being “it may be important for some, but don’t start here” which seemed to resonate with a few people.

There was also the interesting area of improving the metadata within the data itself – for example making the labels within data tables more explanatory to support reuse – rather than just adding more citation or discovery metadata.

Storage as a service

This was the only major “techie” discussion, and it was interesting to see how much convergence there was between the Universities present at the event. So we had the issue of how we work with Dropbox (which many researchers really like), through to how we make best use of cloud storage services as infrastructure.

I asked whether Jisc had met with Dropbox to discuss potential collaboration and apparently they have, though it seems not with great success. This is a pity as one potential “win” would be for researchers to be able to make use of the Dropbox client tools, but synchronised with a UK data centre, or even institutional data centres.

Another interesting dimension was that several institutions have been looking into OwnCloud as a Dropbox replacement, and there was strong interest in collaborating to add any missing capabilities to OwnCloud (it’s open source) to bring it up to parity. Maybe that’s something Jisc could invest in.

Preservation

I hadn’t met Neil Grindley before, and was surprised to see he bore more than a passing resemblance to the late SF author Philip K Dick. But anyway, onto the topic.

Preservation (and managed destruction) is one of those topics that people are either passionate about, or that sends them into a kind of stupefied trance. I’m one of the latter, I’m afraid. It’s probably very important.

The only thing I can add to this is that the issue of preserving not just the data, but the software needed to process it, is not something that has been considered as part of the scope of this programme by Jisc.

It’s also nice that they are considering using hashes to verify data integrity.
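For anyone wondering what that looks like in practice, here is a minimal sketch of the general idea. It is not anything Jisc described at the event: the choice of SHA-256, the manifest layout and the function names `file_sha256` and `verify_manifest` are all my own assumptions for illustration. The principle is simply to record a digest for each file at deposit time, then recompute and compare later.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest, base_dir="."):
    """Compare recorded digests with freshly computed ones.

    `manifest` (a hypothetical structure) maps relative file names to the
    hex digest recorded when the data was deposited. Returns the names of
    files that are missing or whose contents no longer match.
    """
    failures = []
    for name, recorded in manifest.items():
        path = Path(base_dir) / name
        if not path.is_file() or file_sha256(path) != recorded:
            failures.append(name)
    return failures
```

An empty list from `verify_manifest` means every file still matches its recorded digest. Note that this only detects change or loss; it does nothing to repair it, which is where the rest of preservation comes in.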

The Voting

Using the ultra-scientific method of putting numbered post-it notes onto sheets of paper, the ranking of ideas looked like this:

Activity area	(Raw data)	Number of votes	1	2	3	4	5
Looking for the gaps	224535343	9	0	2	3	2	2
What we know so far	5245154	7	1	1	0	2	3
Case for sharing research data	1144221211	10	5	3	0	2	0
Changing the culture of research	4	1	0	0	0	1	0
Measuring the impact	215125	6	2	2	0	0	2
Simplifying compliance	34232333411	11	2	2	5	2	0
Simplifying data management planning	255355213	9	1	2	2	0	4
Data about data	35525	5	0	1	1	0	3
Sharing the costs of data storage	32444	5	0	1	1	3	0
Data for the future	12541143	8	3	1	1	2	1
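If you want to check my arithmetic, the columns on the right are just counts of the digits in the “Raw data” column. A small sketch of that tally follows; the function name `tally` is mine, and it assumes each digit in the raw string is one post-it numbered 1 to 5.

```python
from collections import Counter


def tally(raw_votes):
    """Return (number of votes, [count of 1s, 2s, 3s, 4s, 5s])."""
    counts = Counter(raw_votes)
    return len(raw_votes), [counts[str(number)] for number in range(1, 6)]


# Three rows from the table above, keyed by activity area.
raw_data = {
    "Looking for the gaps": "224535343",
    "Case for sharing research data": "1144221211",
    "Simplifying compliance": "34232333411",
}

for area, digits in raw_data.items():
    total, per_number = tally(digits)
    print(area, total, per_number)
```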

Interestingly enough, although “Storage” wasn’t ranked highly, it was the topic that seemed to spark the most discussion amongst the university representatives after the event closed, and several of us pledged to collaborate in future on our various approaches to solving these issues.

Funding?

Of course, it being a Jisc event, we wanted to know if there was going to be any funding!

The good news is that, as well as funding a number of larger projects already through capital funding (e.g. BrissKit), there are plans afoot for a “Research Data Spring” competition for innovation projects, I guess following a similar pattern to the successful Summer of Student Innovation competition but targeted at researchers and IT staff in universities.

More!

If you’d like to know more about this event, and read the “official” notes, then just get in touch with us at researchsupport@it.ox.ac.uk.

Rachel Bruce says:

October 16, 2014 at 7:09 pm

Hi Scott, Thanks for writing this up, it’s really helpful to get this out there, and especially as at the moment our research data blog is being set up. One point – you said that the preservation of software hadn’t been considered as part of the scope of this co-design work by Jisc, well that’s not strictly true. At the start of the meeting I mentioned that the issue of sharing software and methods had been raised on occasion in the consultation. These were not explicitly mentioned in the synthesised set of ideas, but the issues are still valid and people were invited to raise them – they didn’t feature in the ideas as it was not an issue raised very much in any of the consultation that had taken place. Given the structure of the meeting it didn’t come up, guess there wasn’t time, but I don’t think it is out of scope. The software and the equipment are very important and are something we have regularly raised in discussions on research data management and access in the past. I think that we do need to bring these elements in as best we can; for example they probably come under meeting compliance and also support access and re-use of data? Also Scott thank you *very* much for your input to the discussions at the meeting. Really great. Cheers Rachel

October 16, 2014 at 7:21 pm

Oh and Scott you mention the 4Cs project – well it is both Jisc and DCC in a team with others. The idea was something that Neil Grindley came up with a while back, building on lots of people’s prior work, so it built on earlier work on the economics of preservation – e.g. the Blue Ribbon Taskforce work [see here: http://brtf.sdsc.edu/ and http://www.jisc.ac.uk/news/task-force-to-address-sustainability-in-digital-preservation-25-sep-2007] and also work that was done through projects like Keeping Research Data Safe [see: http://www.jisc.ac.uk/publications/reports/2010/keepingresearchdatasafe2.aspx], LIFE [see: http://www.jisc.ac.uk/whatwedo/programmes/reppres/life2.aspx] etc. Sorry, that was a bit of a history lesson. Anyway …here is the current project’s URL: http://www.4cproject.eu/
and at the moment they are looking for feedback on their roadmap ….see here: http://www.4cproject.eu/roadmap
and to see the costs exchange – see here: http://www.4cproject.eu/news-and-comment/current-news/129-curation-cost-exchange-beta-release-understanding-and-comparing-digital-curation-costs-to-support-smarter-investments
or go straight to it here: http://www.curationexchange.org/
…cheers, Rachel

IT Services Research Support team

Research at Risk: Report from the Jisc Co-Design Workshop