On the 22nd of September, I was invited to a Jisc Co-Design event on “Research at Risk”, with participants from organisations such as UCISA, RLUK, RUGIT, DCC and, of course, some universities, including yours truly representing both the University of Oxford and, as a special bonus, the University of Bolton.
What follows are my completely informal and unofficial notes of the event.
Looking for the Gaps
This was about the need to properly map the entire RDM architecture, identifying where the gaps and joins are, so as to inform decision-making at different levels.
One issue we face is that many research data management solutions are barely past the prototype stage. Rather than build completely new services, it would make more sense to look at the solutions that come closest to matching requirements, such as CKAN and HYDRA, and work together to make them complete. The OSS Watch report on RDM tools highlighted that many of the tools developed had very poor sustainability prospects, linked to their having been developed for a small local user base and without long-term sustainability planning. The next step could be to focus on a few solutions and ensure they are fit for purpose and sustainable.
Likewise, on the storage side there is already OwnCloud, which several institutions are interested in developing further. As it is an open source project, we can work on it collaboratively to ensure we have a good solution, while Jisc can develop a matching service offering for institutions that don’t have their own data centre. Anyway, more on this later.
At a higher level, this whole area seems to be about taking stock of where we are now, which seems a pretty sensible thing to do.
What we know
Similar to the previous topic, but really about putting together the advice, guidance and lessons learned. UCISA were very keen on this one.
An interesting thing I learned about here was the “4Cs” cost exchange project that Jisc (or DCC, I wasn’t sure which) are engaged in, which seems to be principally about baselining IT costs against peers, including in areas such as RDM.
The Case for RDM
There seemed to be consensus that there is a gap between ideology and practice: while there is plenty of talk about mandates from the Research Councils and journals, there hasn’t really been much from the researcher perspective, and this needs to be addressed. So the idea is to make the case not from a mandate perspective, but from the perspective of the benefits to researchers.
One issue here is that the demands on research data differ depending on whether the intent is compliance, validation, reuse, or engagement. Making data truly reusable requires more effort than simply dumping it in an archive, but it can also yield more benefits for researchers.
Changing the research culture was seen as probably a good thing in the longer term, but it wasn’t clear exactly what it would involve, or what role Jisc would play. For example, the previous three items taken together might constitute actions leading towards culture change. This area also encompassed treating RDM as a professional skill and providing support for developing its practice. Another practical area is information sharing between institutions.
Making data count
This topic was all to do with metrics and measures, though it wasn’t clear what those metrics might look like. Some progress could be made by combining existing measures and sources, such as DataCite, and then seeing where that leads.
There was an amusing comparison between RDM compliance and Health and Safety. However, compliance is currently not standardised across the Research Councils, or between the Councils and the journals that mandate RDM. Help and support on compliance is also outdated or difficult to find.
Another topic we discussed was something I’ve dubbed (wearing my University of Bolton hat) “barely adequate research infrastructure for institutions that only give half a toss”. Basically, many universities are not research intensive and do not have dedicated resource in either the Library or IT Services to support RDM, or even Open Access. For them, a simple hosted solution at a reasonable subscription rate would be absolutely fine.
What was interesting is that some of the research-intensive universities were also keen on this idea – can we just have EPrints+CKAN+DataCite+etc. all set up for us, hosted, Shibbolized, configured to meet whatever the Research Councils want, and ready to just bung on a University logo?
Simplifying Data Management Plans (DMP)
There seemed to be a general feeling that it isn’t clear who should be writing DMPs, or why. In some cases research support staff are producing them instead of researchers, which seems sensible. Overall, creating a DMP is seen as something you do for someone else’s benefit.
Some institutions have been customising DMPonline. Interestingly, one area that gets explored is “model DMPs” or “copy and paste”. I somewhat cheekily suggested a button that, once pressed, generates a plausible-sounding DMP that doesn’t actually commit you to anything.
In any case, if compliance requirements are simplified and standardised (see above) then this would also in effect simplify the needs for DMPs.
Other ideas explored included being able to export a DMP as a “data paper” for publication and peer review, though I’m not sure exactly how that contributes to knowledge.
So again we have the issue of what’s in it for researchers, and the tension between treating RDM as a hoop to jump through and treating it as something with intrinsic benefit for researchers.
There was a case made for better “data about data” by DCC (Correction – Actually it was Neil Jacobs – Thanks Rachel!), which is basically around standardising the metadata profile for archiving research data: working on DataCite, CRIS, PURE and ORCID, achieving consensus on a core schema, and so on.
This sparked off a debate, my own contribution being “it may be important for some, but don’t start here”, which seemed to resonate with a few people.
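To make the idea of a “core schema” a little more concrete: current versions of the DataCite Metadata Schema define a small set of mandatory properties (Identifier, Creator, Title, Publisher, PublicationYear and ResourceType). The Python sketch below checks a record, represented as a plain dictionary, against that set. The dictionary keys, the example record and the `missing_core_fields` helper are all hypothetical; this is not the DataCite API, and any profile actually agreed across CRIS systems would likely include more.

```python
# Mandatory properties in the DataCite Metadata Schema (4.x). The key names
# are a hypothetical simplification, not DataCite's own XML/JSON field names.
CORE_FIELDS = ("identifier", "creators", "title", "publisher",
               "publication_year", "resource_type")


def missing_core_fields(record: dict) -> list[str]:
    """Return the core fields that are absent or empty in a metadata record."""
    return [field for field in CORE_FIELDS if not record.get(field)]


if __name__ == "__main__":
    # Hypothetical record with no resource type given.
    record = {
        "identifier": "10.1234/example",  # made-up DOI for illustration
        "creators": ["Researcher, A."],
        "title": "Example dataset",
        "publisher": "University of Example",
        "publication_year": 2020,
    }
    print(missing_core_fields(record))  # -> ['resource_type']
```

The point of a check like this is that agreeing the field list is the hard (and political) part; enforcing it once agreed is trivial.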
There was also the interesting area of improving the metadata within the data itself – for example making the labels within data tables more explanatory to support reuse – rather than just adding more citation or discovery metadata.
Storage as a service
This was the only major “techie” discussion, and it was interesting to see how much convergence there was between the universities present. The discussion ranged from how we work with Dropbox (which many researchers really like) through to how we make best use of cloud storage services as infrastructure.
I asked whether Jisc had met with Dropbox to discuss potential collaboration and apparently they have, though it seems not with great success. This is a pity, as one potential “win” would be for researchers to be able to use the Dropbox client tools, but synchronised with a UK data centre, or even institutional data centres.
Another interesting dimension was that several institutions have been looking into OwnCloud as a Dropbox replacement, and there was strong interest in collaborating to add any missing capabilities to OwnCloud (it’s open source) to bring it up to parity. Maybe that’s something Jisc could invest in.
I hadn’t met Neil Grindley before, and was surprised to see he bore more than a passing resemblance to the late SF author Philip K Dick. But anyway, on to the topic of preservation.
Preservation (and managed destruction) is one of those topics that people are either passionate about or that sends them into a kind of stupefied trance. I’m one of the latter, I’m afraid. It’s probably very important.
The only thing I can add is that preserving not just the data but also the software needed to process it has not been considered by Jisc as part of the scope of this programme.
It’s also nice that they are considering using hashes to verify data integrity.
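As a minimal illustration of what hash-based integrity (fixity) checking involves, the sketch below records a SHA-256 digest when a file is deposited and recomputes it later to detect any change. It uses only Python’s standard `hashlib`; the function names and the choice of SHA-256 are my own assumptions, not anything Jisc specified.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large research datasets need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, recorded: str) -> bool:
    """Hypothetical fixity check: True if the file still matches its recorded digest."""
    return sha256_of(path) == recorded


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "results.csv"
        data.write_text("sample,value\nA,1.0\n")
        recorded = sha256_of(data)      # stored alongside the file at deposit time
        print(verify(data, recorded))   # unchanged file -> True
        data.write_text("sample,value\nA,1.1\n")
        print(verify(data, recorded))   # modified file -> False
```

In a real archive the recorded digest would be kept somewhere separate from the data itself, and the check re-run on a schedule.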
Using the ultra-scientific method of putting numbered post-it notes onto sheets of paper, the ranking of ideas looked like this (the columns headed 1 to 5 count how many post-its bearing each number an area received):
| Activity area | Raw data | Number of votes | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|---|
| Looking for the gaps | 2 2 4 5 3 5 3 4 3 | 9 | 0 | 2 | 3 | 2 | 2 |
| What we know so far | 5 2 4 5 1 5 4 | 7 | 1 | 1 | 0 | 2 | 3 |
| Case for sharing research data | 1 1 4 4 2 2 1 2 1 1 | 10 | 5 | 3 | 0 | 2 | 0 |
| Changing the culture of research | 4 | 1 | 0 | 0 | 0 | 1 | 0 |
| Measuring the impact | 2 1 5 1 2 5 | 6 | 2 | 2 | 0 | 0 | 2 |
| Simplifying data management planning | 2 5 5 3 5 5 2 1 3 | 9 | 1 | 2 | 2 | 0 | 4 |
| Data about data | 3 5 5 2 5 | 5 | 0 | 1 | 1 | 0 | 3 |
| Sharing the costs of data storage | 3 2 4 4 4 | 5 | 0 | 1 | 1 | 3 | 0 |
| Data for the future | 1 2 5 4 1 1 4 3 | 8 | 3 | 1 | 1 | 2 | 1 |
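Since the per-number columns are just a tally of the raw data column, they can be re-derived mechanically. The short Python sketch below does this with `collections.Counter`, using the raw digit sequences from the table; it is purely an arithmetic check, not how the votes were actually counted on the day.

```python
from collections import Counter

# Raw post-it numbers per activity area, copied from the table.
RAW_VOTES = {
    "Looking for the gaps": "224535343",
    "What we know so far": "5245154",
    "Case for sharing research data": "1144221211",
    "Changing the culture of research": "4",
    "Measuring the impact": "215125",
    "Simplifying data management planning": "255355213",
    "Data about data": "35525",
    "Sharing the costs of data storage": "32444",
    "Data for the future": "12541143",
}


def tally(raw: str) -> tuple[int, list[int]]:
    """Return (total votes, counts of post-its numbered 1..5) for one area."""
    counts = Counter(int(digit) for digit in raw)
    return len(raw), [counts.get(n, 0) for n in range(1, 6)]


if __name__ == "__main__":
    for area, raw in RAW_VOTES.items():
        total, per_number = tally(raw)
        print(f"{area}: {total} votes, {per_number}")
```

Running it reproduces every row of the table, and summing the totals gives 60 post-its in all.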