During the RunCoCo Meeting on 5th May 2010 (blog entry 1, project website) we recorded the audio from the presentations and the discussions, and filmed the presentations; we will publish some of this on the RunCoCo website once all consent forms have been completed. We also have the presenters' slides to prepare and then publish. In the meantime, here is what our colleague Emily McLoughlin recorded for us in writing during the day's proceedings:
Exercise 1. Introductions and Challenges
At the start of the day Alun welcomed the meeting’s attendees, and noted that the vast majority represented a project or department based in Oxford. He went on to briefly outline the aims of the meeting – to discuss community collections, and to introduce and promote RunCoCo – before moving on quickly to the first exercise. Here, each attendee was asked to introduce themselves to the room, and raise a question or issue connected with community collections. Questions/concerns follow [in order of delivery with no attempt to prioritise or group by theme]:
- does it matter that as collectors or cataloguers we can’t always ‘trust’ what we receive from the public – there are issues of validity and relevance to be considered each time an object is submitted to our collection
- the need for easy-to-use guides to tell the public how to go about involvement, particularly scanning
- how we move participants from simple engagement to advanced involvement, such as in archiving
- interest in lateral uses for community collections, and in finding out how they may relate to more formalised collections
- what do you include in a community collection (and are there things you wouldn’t), and how do you assess quality
- how can my project be more easily facilitated by technology
- how to encourage members of the public to submit material
- how to assess the authenticity of materials. Moreover, if items are not strictly authentic, but still have relevance, what do you do?
- what responsibility the University of Oxford, as a well-respected higher education organisation, has to make its research available, how that responsibility can be translated into practice, and how it might be funded
- what evidence do we have that new research opportunities and methods are being created out of community collections
- the issue of the future – how institutions will have evolved in 30 years' time, and the part to be played by communities
- how to sustain a community collection once the funding has stopped
- the quality of metadata, trust issues, and how people don’t like to bother entering much cataloguing information
- discussion of any tools or frameworks, or online systems
- how to manage, audit and control quality without being too exclusive
- how to ensure a sustainable long-term outcome, which is important to funders
- to know more about user-generated content
- where to go to learn about how to run a community project
- how to ensure quality, and in particular how much work going down the road of a community collection was likely to entail for a small team
- incentives to be involved – whether intrinsic incentives are enough, or whether extrinsic incentives are necessary too – and if so, what would those extrinsic incentives be
Case-study: Chris Lintott on Galaxy Zoo
Chris introduced Galaxy Zoo, a project which has already been up and running for some time, and which differs from many of the collections being mulled over by the group in that Galaxy Zoo is a community built around a collection that already existed, and to which participants were not expected to contribute directly. The project uses a series of pictures taken through a telescope over the course of about ten years, recording 1 million galaxies. The aim is to catalogue the shapes of those galaxies. Participants who go to the project's website are presented with a randomly selected image of a galaxy, alongside buttons with shape options. Clicking on a button loads a different image with the same options. There is an emphasis on simplicity in the process: participants are put through a simple test before starting, and the actual participation is very straightforward. Chris went on to discuss some of the issues that had arisen during Galaxy Zoo.
One of his first points was that the project had been hugely successful, unexpectedly so. Thanks to coverage in the press, people had logged on in such great numbers that the server 'caught fire'. Chris emphasised that getting the website back online swiftly was absolutely vital: if it had taken more than a couple of hours, the boost in popularity from the press coverage would have been wasted.
His next point was that the results are of really good quality. One of the strengths of crowdsourcing is the ability to take an average; in Galaxy Zoo, for example, each galaxy has been viewed about 80 times. Users are weighted by how well they perform on particular tasks: if they agree with an expert's classification, their choices are given more weight. There were also interesting discoveries about participants' skills; in particular, the project team found that people who were good at identifying one type of galaxy were not necessarily as useful for others. Moreover, participants could be useful in ways that had not been anticipated: participants who shared a common misunderstanding of the term 'merger' picked out a set of galaxies that had not previously been grouped together.
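Galaxy Zoo's actual weighting scheme was not described in detail at the meeting, so the following is only a minimal sketch of the general idea, assuming a simple scheme: each volunteer's weight is their smoothed rate of agreement with expert classifications on the galaxies an expert has also classified, and each galaxy's label is the one with the largest share of the weighted vote. The function names, the (user, galaxy, label) tuple format and the +1/+2 smoothing are illustrative assumptions, not the project's method.

```python
from collections import defaultdict


def user_weights(votes, expert_labels):
    """Weight each volunteer by how often they agree with an expert.

    votes: iterable of (user, galaxy_id, label) tuples (hypothetical format).
    expert_labels: dict of galaxy_id -> expert label, for a subset of galaxies.
    Assumed smoothing: (agreements + 1) / (comparisons + 2), so a volunteer
    with no overlap with the expert set starts at a neutral weight of 0.5.
    """
    agree = defaultdict(int)
    compared = defaultdict(int)
    users = set()
    for user, galaxy, label in votes:
        users.add(user)
        if galaxy in expert_labels:
            compared[user] += 1
            if label == expert_labels[galaxy]:
                agree[user] += 1
    return {u: (agree[u] + 1) / (compared[u] + 2) for u in users}


def weighted_consensus(votes, weights):
    """Return {galaxy_id: (winning_label, share_of_weighted_vote)}."""
    tallies = defaultdict(lambda: defaultdict(float))
    for user, galaxy, label in votes:
        tallies[galaxy][label] += weights[user]
    result = {}
    for galaxy, counts in tallies.items():
        total = sum(counts.values())
        best = max(counts, key=counts.get)
        result[galaxy] = (best, counts[best] / total)
    return result


if __name__ == "__main__":
    votes = [
        ("ann", "g1", "spiral"), ("bob", "g1", "spiral"), ("cat", "g1", "elliptical"),
        ("ann", "g2", "elliptical"), ("bob", "g2", "spiral"), ("cat", "g2", "spiral"),
    ]
    weights = user_weights(votes, {"g1": "spiral"})
    for galaxy, (label, share) in sorted(weighted_consensus(votes, weights).items()):
        print(f"{galaxy}: {label} ({share:.0%} of weighted vote)")
```

In this toy run, "cat" disagrees with the expert on g1 and so ends up with half the weight of "ann" and "bob"; g1 resolves to spiral with 80% of the weighted vote. With around 80 views per galaxy, as reported for Galaxy Zoo, the averaging has far more votes to work with than this example.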
Volunteers have an advantage over computer tools in that they will spot anomalies that a computer will miss, particularly the weird and wonderful. Chris’ example of this was Hanny’s Voorwerp – a visually-surprising entity shaped like Kermit the Frog that appeared in the same frame as a galaxy. If they are interested, volunteers will also go above and beyond the call of duty, and organise themselves into interest groups to explore figures and trends. One group of 30-50 people set themselves up as hunters of galaxies that looked like peas. They built a website and examined their findings, discovering that the pea-shaped galaxies shared a particular spectrum. Their findings have been reported in scientific papers. The Galaxy Zoo team is careful to ensure that their participants are credited as authors in those papers, where appropriate.
Chris recommended setting up a forum so that participants could answer their own questions. For Galaxy Zoo, the team made sure that the forum was manned by volunteer moderators, to keep a distinction between moderators and the professional authority of the project team.
The motivations of contributors are varied, but in a survey of 10,000 of Galaxy Zoo’s most active users, about half said that they contributed because they wanted to help with research.
From their experience, Chris’ team came up with some basic rules for running a project like Galaxy Zoo:
- tell people about why you want them to contribute, and what you’re trying to do;
- contributors should be seen as collaborators;
- design your project carefully, and don’t waste people’s time.
There are opportunities for expansion: Galaxy Zoo has been used as a springboard to create a platform called the Zooniverse for other applications. These include a project to transcribe ships' logs, and another to transcribe ancient documents. Each project develops individually but follows a common pattern: for the ships' logs exercise, Chris mentioned three levels of engagement – the primary task (what participants will do on every page, e.g. record a date), then 'known unknowns' (events that will recur, e.g. the death of an officer), and finally 'unknown unknowns' (special or one-off events, such as doodles or lines of poetry).
Funding was touched on briefly – Galaxy Zoo is funded, and now has a full-time software developer. Each project has a team.
One of Chris' final thoughts addressed the question: what do you do if there are users who aren't very good? Do you tell them and/or train them? When Chris' team told people, they found that they either lost them altogether or the users improved. Equally, however, Galaxy Zoo lost really good people when those people were praised. Chris said his suspicion was that the really good people found the task boring. He suggested that one way to overcome this, which Galaxy Zoo has tried, would be to make it more obvious that contributors are working alongside others, for example by showing how many other people are currently reviewing or interacting with the same item, creating a sense of team or community. He did warn that there was a danger of such feedback interfering with data gathering.
Case-study: Alun Edwards on The Great War Archive
The Great War Archive aimed to digitise the public’s artefacts from the First World War. Alun explained that community involvement seemed likely to succeed from the outset because of the particular public interest in this topic. In fact it proved very popular, producing all kinds of artefacts, from one-offs, such as a tiny matchbox thrown from a train with a soldier’s wife’s name on it that found its way back to her, to more obvious and well-known items.
Alun made the point that the project was funded only for a couple of months alongside the First World War Poetry Digital Archive, and so could provide an interesting contrast with traditional, very expensive digitisation projects.
With stylised, period-effect advertising material, the project was probably aimed at a relatively elderly audience. There was also an effort to make submission very easy: The Great War Archive asked participants for only four pieces of information – a title, a description, where they thought the item related to, and when.
Legal and re-use issues were considered from the outset: contact details of participants were recorded and the website gave details of the agreement under which users could recycle the information, and the circumstances under which people could download the material for teaching and learning. The agreement followed the JISC/HEFCE model licence.
Alun explained that the team built in an admin feature that allowed them to check the validity of submissions and to correct typos and the metadata generally. The feature also included additional fields, particularly to control how items would be browsed and to flag things of interest or importance.
Particularly successful was the project's outreach: submission days were held across the UK, drawing contributions from as far away as the Orkneys. The submission days followed preparation and promotion in the form of broadcasts on local radio and articles in the local press. The event itself would usually be set up in a museum or library, although in Hull the city council was also found to be really helpful. Items were digitised on the spot. Moreover, the one-to-one time spent with contributors meant that they could be asked additional questions that didn't appear in the online submission form, and this often revealed important and relevant information. One of the outcomes of the 'road-show' was guidance on how museums or similar institutions may be able to host such events themselves.
Alun went on to discuss some of the project's findings. One significant point was the contrast that could be drawn with another Oxford University Computing Services (OUCS) project, aimed at digitising First World War poetry. That project was run in a very different manner, with professional involvement at each stage. One of the most striking differences was in cost, which Alun estimated at around £40 per item for the poetry project, versus £3.50 per item for The Great War Archive community collection.
A high number of items were recorded, including some that might otherwise have been lost: items that had apparently been rescued from the bin or even a builder's skip. The project team needed to reject only one item, and even then not because it was inappropriate: it was about the wrong war.
Alun recognised the problem of what to do after community collection projects officially close, remarking that in the case of the Great War Archive, the team were still being approached with important items after the end of the submission period. Their solution was to set up a Flickr group, which turned out to be well used, now holding over 2,500 items. Promotion was mainly by word of mouth and through links on the project website. Alun commented on the advantages of Flickr, including the presence of enthusiasts and knowledgeable people who will freely share information with the owner of a photo and other viewers. Further, Flickr opened the project up to people outside the country, such as those in the USA, Australia, Canada, and Europe, which is particularly useful for families living far from each other who want to share their findings.
Alun went on to give an overview of crowdsourcing, principally by giving examples of where it has been used well. His examples included UK Biobank, which provides community information for healthcare, and the BBC's So You Want To Be A Scientist, where the public submit ideas for research, from looking at profile photos on Facebook to investigating the activities of garden snails.
Alun mentioned the particularly well-known crowdsourcing tools Flickr and Wikipedia, and, more interestingly still, Wikipedia Projects and Flickr's features that allow individuals and organisations to release content under Creative Commons (CC) licences and let the maker of an image choose whether to allow others to tag it.
It was emphasised how museums have benefited from crowdsourcing. Some images on the Staffordshire Hoard website were released under a CC licence, allowing material published online to be used immediately by the press and public. The Science Museum in London is currently asking for pictures of people on the theme of ‘couples’ as part of a new exhibition. The Natural History Museum is running a cherry tree blossom survey this spring with public participation, and last year the Science of Ghosts blog, run by psychologists, recorded people’s reactions to ghostly images and reports.
An important part of several projects is learning from or interacting with experts – where the ‘expert’ is the member of the public, such as the University of Leeds’ patients’ voice team, which uses public involvement to teach communication skills, or HealthTalkOnline, which facilitates choice for patients in how they seek support and treatment.
Alun explained that community projects such as Galaxy Zoo and the Great War Archive come under the banner of crowdsourcing, in that they appeal for members of the public to create content and generally interact with them.
The next issue raised was the potential danger that the public's 'goodwill may be misappropriated' (Jonathan Zittrain) in crowdsourcing. One recent questionable use of the public's goodwill was the Guardian's expenses claim project, in which participants were asked to read and analyse their MP's expenses records. There were two main problems with the project. Firstly, interest dwindled after the initial spike of enthusiasm, highlighting the now well-understood difficulty of sustaining involvement, and in this case raising the question of whether producing an incomplete project devalues the work done by participants. Secondly, the Guardian ran a story accusing an MP of spending money on tanning; in fact a volunteer had misread the handwriting ('training') on the expenses claim, and the newspaper had not verified the finding before publishing the story, raising questions about the responsibility of the body managing the project.
The questions raised at this point were directed towards both of Alun’s sessions:
- One of the points brought up was whether professionals were needed in the Great War Archive to provide supporting information. Alun’s answer was that this could be done if there was funding available at the time, but there was not.
- Another issue discussed was how the Great War Archive coped with being inundated with material. This was flagged as a possible problem, as the enthusiasm of the public’s response was unforeseen, and dealing with it put other project work back. There’s also the problem of wanting to go back and add further tags to the material collected to make this more useful.
- Museums were discussed further, particularly those that had opened up their whole catalogues online. Asked whether anyone had done this successfully, Chris Batt gave the example of the Powerhouse Museum in Australia.
- There was interest in the topic of validity, and a fear raised that material in public collections may be less ‘believable’. No particular conclusion was reached here, although it was mentioned that it is up to the public to decide how they use community archives.
Strategic involvement of universities, Chris Batt
Chris' opening point was that the future is engagement. This includes the possibility of breaking down the barriers between academic staff and others with 'hidden' experience and skills. Chris' example of the latter was the Phantom Carnation Grower, a gardening enthusiast who had developed outstanding plants through trial and error. Chris' point was that an integrated network could help people such as these to develop their skills and take them further, via access to professional guidance and resources.
In August 2009 Chris authored a report now published on the JISC website: ‘Digitisation, Curation and Two-Way Engagement’. The paper was a study on co-creation, and creating dialogue, and it was the findings in this paper that Chris went on to discuss.
His interests were to find terms of reference – to identify subject areas and to look at how a programme surrounding them might be instigated – and to examine policy, strategy, impact and value, in order to answer the question of whether any underlying principles were in existence for this kind of interactivity.
On the question of policy, Chris largely confined himself to higher-level strategies, reporting, for instance, on the objectives of the HEFCE policy 2006-11 and the JISC strategy 2009-11.
With regard to strategy, the National Co-ordinating Centre for Public Engagement was one of the organisations highlighted. Chris commented on the availability of a toolkit from this source, and its concept of creating beacons for public engagement. He urged others to look at its results if considering any kind of community collection. This was also true of the Community University Partnership Programme, which has provided guidelines and understanding about what works and doesn’t.
Also discussed was the JISC BCE, how it has started by looking at techniques, and has now moved on in collaboration with Alistair Dunning’s digitisation team (JISC) to implement programmes for the co-creation of materials. Chris emphasised its usefulness in sharing the reality of many professional experiences.
For strategy Chris urged looking at the Department of Communities and Local Government, Arts Council, and Heritage Lottery Fund, and also community groups developing community projects – including Commanet, and NIACE.
One of the points stressed with respect to impact and value was that community engagement is not necessarily about doing or making something; it can be about building social capital. That is, the positive outcome need not be a comprehensive history or collection, but 'creating the opportunity for people to work together'. Chris recognised, though, that success can be measured in a variety of ways, and that we should consider the concept of success in some depth. In terms of value to the institution, for example, we may think about collecting, interpreting, augmenting, and delivering on the mission of that institution.
The need to measure success was raised, and it was recognised that this is required partly as a means of explaining and promoting the usefulness of proposed projects to sponsors. Some means of measuring success were proposed. One might consider for instance whether a given project changed behaviours and/or actions, and remember to look at the value to the individual, as well as the institution. There is some guidance available for weighing impact on participants in community projects, one of the examples given by Chris being MLA’s ‘Inspiring Learning For All’, which has made a toolkit available. Some warning was given against the old medium of flowcharts, which can be difficult to understand, with non-specific wording and no capacity to really measure anything.
Chris and his group looked at evidence from UK universities and other organisations and came up with some recommendations:
- It is critical for all parties, including the public, to know what’s expected of them;
- There was at the time of the report’s publication a need for greater cooperation in JISC and understanding of what’s happening, although Chris clarified that this was now improving;
- There should be a national forum for consultation and development, which has not yet happened.
He also offered some project subjects to consider, these being shared public memory experience (e.g. immigration, conflicts, national events) and academic subjects with community interest (astronomy, media studies, digital humanities).
With regard to the future, Chris was particularly eager to promote a long-term view, in addition to a short term one. Therefore his questions were how to respond long term to the economy, digital determinism, and public policy. In terms of the economy, it is clear that cuts in spending are approaching, but the severity of those cuts is an unknown. Perhaps, then, a radical rethinking about future funding is required, bearing in mind sustainable development, substitution funding (e.g. could libraries get rid of physical books if their pages are all digitised), and how co-creation can be embedded (by creating the digital objects).
Digital determinism describes the move from physical objects to digital objects that can be transformed in different ways. This creates new and more fragmented ways of doing things, and raises the issue of trust: in particular, if material can be reused, its provenance is more difficult to trace.
Chris brought attention to the relationship between the professional and amateur, which may be shifting, perhaps unlocking untapped skills as mentioned at the outset of his talk. He ended by stating that knowledge is the raw material of the future, with access to knowledge a basic right, requiring a new institutional architecture. He mentioned that Paul Wildman had expressed this as moving from ‘monophonic universities to polyphonic universities’, so moving from classrooms where only a finite number of people may be present toward iTunes-U and YouTube (Open View – the Open University’s presence there), where the numbers may be much greater, and the points of contact with others much more varied.
Exercise – community collection priorities
The voting exercise after lunch entailed the meeting being split into groups of two. Each group was given a handheld voting pod and asked to discuss and vote on questions that will feed into RunCoCo's activities. Some of the results were:
- RunCoCo should concentrate on community archives AND the wider issues of user generated content, e.g. tags, comments.
- RunCoCo should not write guidelines where these already exist; rather, it should ensure that projects know where to find well-written guides and resources. Perhaps RunCoCo needs to provide a 'highway code' of methods, stating the results that can be anticipated if (for example) cataloguing is done according to a particular rule. A different point of view was that there is a danger with guidelines that one doesn't know how they will be received or treated once they have left the hands of the author: people sometimes write guidelines which popularly become rules or canon without any consultation. The group seemed to conclude that this is something to be aware of, but that it should not in itself be sufficient to prevent one from publishing advice which may be useful. It was also mentioned that the JISC Strategic Content Alliance and other bodies already produce background material of this kind.
Exercise – discussion groups
The meeting was again split up, this time into informal groups to discuss the following topics:
- Sustainability of community collections
- Trust, in terms of authenticity, and accuracy
- IPR, Creative Commons, and copyright
Discussion group feedback
The groups met up again afterwards to share their thoughts with the whole meeting.
The ‘Trust’ group, facilitated by Stuart Lee (OUCS/English Faculty) came up with the following themes:
- Is it right to use the word trust, when we really mean authenticity?
- Trust operates in two directions – whether we as project groups trust what the public submits to us, and also whether they trust us, our story telling, moderation and data management.
- Winning approval (from the public and from funders). Some projects require this more than others.
- The risks and challenges to be considered: inconsistent data; irrelevance; distinguishing quality; maintaining trust; maintaining data; actually using the information; preventing the service being hijacked for political or other purposes
- How the risks can be mitigated: flagging content for review or release; using communities to guard against misuse; providing examples; providing fields; being as clear as possible with the community as to what is expected of them; using existing tools; focusing on the main aim of the project.
The ‘Sustainability’ group facilitated by Alistair Dunning (JISC) came up with the following themes:
- There is no ‘silver bullet’ to solve every sustainability problem, different options need to be considered in different situations.
- Consider the issue openly at the start of, or before, the project.
- Prioritise goals: for example, will the overriding aim be to create a community or content? The answer will differ for different types of project.
- Some specific questions that might be considered are: how easy will it be to maintain a community; can it be outsourced; can you get software to do it; what would be ideal; what would be practical; what technically do you need; does your department or institution have an interest in maintaining it.
- It may also be appropriate to ask larger, institutional policy questions, although it was acknowledged that this may not be relevant to RunCoCo.
The ‘IPR’ group facilitated by Peter Robinson (OUCS) came up with the following themes:
- The legal aspect of project creation is often viewed as a dull topic, making it tempting to duck the issues early on. However, to be most effective policies must be thought through clearly from the start – retrofitting can be expensive and embarrassing.
- Choose the licence under which content will be released early. Creative Commons is fit for purpose and easy to understand.
- Get help from national institutions, and the legal arm of your institution.
- Consider privacy and personal data protection. Have a ‘take-down’ policy.
- Be aware that there are particular issues associated with the age of material. So, with contemporary material, you may need to be particularly careful that you find out who owns the copyright over it. With older material copyright may have expired, but the material and its rights may have more of a history to be aware of.
- Mixed ownership is difficult; make sure you have the time and ability to deal with the issues.
- It may be helpful to think up different scenarios at the outset.
RunCoCo – community involvement and demo of CoCoCo software
Alun used this last session to define RunCoCo and CoCoCo, and to demonstrate CoCoCo, an incarnation of which is being released as open-source software.
He began with a brief note on how community research should benefit not just the institution, but also the volunteers and community involved in the project. He gave the example of community archaeology, in which the University of Birmingham archaeology department was using volunteers from a particular Scottish locality to help with digs, as reported by a BBC Alba programme. The programme interviewed all those involved and so explored their different viewpoints. The local community said that they felt more knowledgeable about archaeology, and were learning professional skills. The local development officer gave the county council's view: it was pleased that a much larger number of sites had been explored than would otherwise have been possible, and moreover that it was great for the local archaeology interest groups, which hadn't previously had the opportunity to dig on that scale. Students from the Archaeology department commented that it seemed good for the community, and raised the local tourism profile.
Alun went on to introduce some of the technicalities of CoCoCo as a work in progress. CoCoCo, as an open-source web application which runs on a server, needs some specialisation and server facilities. Stuart Lee’s project Woruldhord will use this software, providing a chance to experiment, and see what changes need to be made. Once this has been explored the RunCoCo team will be writing guidelines to assist others in using the software.
At the moment CoCoCo is set up with a very small number of required fields; however, Alun demonstrated how more could be added ad infinitum. The record creation page is straightforward, and includes a Google map, giving an opportunity to record the location of the submitted material accurately, as longitude and latitude are calculated automatically. Other standard fields include name, title, and description. Users attach their digitised material and must view a draft of their record before submitting, which provides a check of sorts.
In the final version some fields may include specific vocabularies, to save time and effort going over submitted information to correct typos and so on. Whether contributors will be able to add their own tags is also still being considered.
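CoCoCo's own code was not shown, so the following is only a hypothetical sketch of the submission flow described above, assuming a record with the standard name, title and description fields, an optional location, a controlled vocabulary for one field, at least one attached file, and a draft that must be viewed before submission. The `Submission` class, the `ITEM_TYPES` vocabulary and the `validate`, `draft_preview` and `submit` functions are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical controlled vocabulary; CoCoCo's real vocabularies were
# still being decided at the time of the meeting.
ITEM_TYPES = {"letter", "photograph", "diary", "object", "recording"}

# The standard text fields mentioned in the demo.
REQUIRED_TEXT_FIELDS = ("name", "title", "description")


@dataclass
class Submission:
    name: str
    title: str
    description: str
    item_type: str = ""
    latitude: Optional[float] = None   # assumed to come from the map pin
    longitude: Optional[float] = None
    attachments: List[str] = field(default_factory=list)


def validate(sub: Submission) -> List[str]:
    """Return a list of problems; an empty list means the record is acceptable."""
    problems = []
    for name in REQUIRED_TEXT_FIELDS:
        if not getattr(sub, name).strip():
            problems.append(f"missing required field: {name}")
    # A controlled vocabulary saves cataloguers correcting free-text typos later.
    if sub.item_type and sub.item_type.lower() not in ITEM_TYPES:
        problems.append(f"item type not in vocabulary: {sub.item_type}")
    if (sub.latitude is None) != (sub.longitude is None):
        problems.append("location needs both latitude and longitude")
    elif sub.latitude is not None:
        if not (-90 <= sub.latitude <= 90 and -180 <= sub.longitude <= 180):
            problems.append("location out of range")
    if not sub.attachments:
        problems.append("no digitised file attached")
    return problems


def draft_preview(sub: Submission) -> str:
    """Plain-text draft the contributor reviews before submitting."""
    lines = [f"{label}: {value}" for label, value in (
        ("Title", sub.title),
        ("Contributor", sub.name),
        ("Description", sub.description),
        ("Type", sub.item_type or "(not given)"),
    )]
    if sub.latitude is not None:
        lines.append(f"Location: {sub.latitude:.5f}, {sub.longitude:.5f}")
    lines.append(f"Files: {', '.join(sub.attachments) or '(none)'}")
    return "\n".join(lines)


def submit(sub: Submission, draft_viewed: bool) -> List[str]:
    """Accept the record only if it validates and the draft has been viewed."""
    problems = validate(sub)
    if not draft_viewed:
        problems.append("draft must be viewed before submitting")
    return problems


if __name__ == "__main__":
    rec = Submission("A. Smith", "Field postcard", "Postcard sent home",
                     item_type="Letter", latitude=51.752, longitude=-1.258,
                     attachments=["postcard_front.jpg"])
    print(draft_preview(rec))
    print(submit(rec, draft_viewed=True))  # prints []
```

Keeping the required set small, and adding fields only when a project needs them, mirrors the approach Alun demonstrated; the vocabulary check addresses the typo-correction burden mentioned above.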
The Woruldhord team is still working on who will hold the rights, although it will probably be the contributor, and whether the contributor will be named alongside the material submitted. They are also considering the question of who will be the publisher – the project or the university?
The look of the application will be alterable to fit in with, say, promotional material and website look-and-feel. There will also be features for cataloguers, including reviewing records with thumbnail images.
The software is available to download now, although it is still a work in progress and is due to be completed in the near future.
Some questions followed Alun’s session:
- One delegate wanted to know more about how the software was presented or accessed, particularly whether one needed to link to it from a website. Alun explained that CoCoCo is a web application, and so is wrapped up inside your website; the intention is that you will be able to make it look and feel like the rest of your website.
- Another issue raised was where one would be able to study the data collected. Alun answered that the software will include a search and browse interface, but that you can export the data into your own search engine if you prefer.
- There was also some desire to clarify the distinction between this software and existing, older cataloguing programs such as FileMaker. FileMaker was distinguished from CoCoCo as an older, commercial piece of software. CoCoCo is intended as a prototype, an example that will save developers a good deal of time and preparation. The advantage of this software over older software is that it is open-source and specifically made with public access in mind, and also that it takes interactivity into consideration, in the form of comments, tags and ratings.
- The group moved on to discuss file formats and memory requirements, which will largely depend on bandwidth and server capabilities, although in theory the software will certainly accommodate video-sized files. This led on to the question of which player videos would run on in practice. Although the Great War Archive was set up to use QuickTime, videos accessible via a CoCoCo system would play according to the browser and client-side setup.
- At this point Alun commented on some other features, including the ability to download, print, and navigate search results (previous, next, etc).
- Delegates were interested to know whether the software would include some templates for those setting up for the first time – the answer to which was absolutely, yes.
In his closing comments, Alun thanked all of the attendees, and encouraged them to look at the How To Run a Community Collection Online Google group, http://groups.google.com/group/runcoco