Open Spires – A new view of Oxford’s Open projects

We’ve made it easier to see Oxford’s Open Educational projects – they are now all clearly grouped together and linked from a new site with updated introductory videos and search options:

http://openspires.it.ox.ac.uk/

A great variety of open projects are going on within the University of Oxford, from podcasts to crowdsourcing schemes, educational materials to whole digital archives. You can now see all the projects grouped together in one overview site and watch short introductory videos explaining their background. Browse the projects below to discover a wide range of open resources, materials and initiatives!

The new portal:

http://openspires.it.ox.ac.uk/

[Screenshot: the new OpenSpires portal, March 2015]

Posted in Content, Creative Commons, Oxford, partnerships

Making learning available for everyone 

The University of Oxford’s iTunes U site goes from strength to strength, making sure that learning has a real impact in society. The site was launched in October 2008 and now, as it approaches its sixth anniversary, it can celebrate a remarkable landmark: more than 22.5 million downloads so far. Given that access to the lectures, teaching materials and interviews with leading academics on iTunes U is completely free, the City of Dreaming Spires is at the forefront of making learning available to as wide an audience as possible, opening its virtual doors to everyone.
 
The Open Spires project had its roots in 2008, when the University entered into a partnership with Apple, which had developed a simple mechanism at Duke University in the United States to push content onto students’ iPods. The marriage of Apple and Oxford’s academics was immediately fruitful, giving students, staff and the public any-time access to lectures. By 2010, some 60% of freshers revealed that they had viewed material via what became known as ‘the University of iPod’ before beginning their degrees.
 
Today, the statistics speak for themselves. Over 6,400 podcast items have been processed via iTunes U. The site reaches a worldwide audience across 185 countries, with 31% of users from the USA, 17% from the UK and 7% from China. It boasts over 5,500 hours of material from 4,000 academic speakers and contributors.
 
The sheer variety of content available on iTunes U is one reason for its success. Viewers can experience a typical Oxford Humanities tutorial, they can enter the complex worlds of chemistry and physics, they can enjoy the wit of Oscar Wilde and they can find out about Alumni events. They can also discover how to write the perfect business plan, ponder challenges to the Western canon and get to grips with Shakespeare thanks to a unique lecture series, with each talk covering one play and including an eBook. Whatever they download, users will find high quality and accessible learning materials. 

Oxford on iTunesU site

We had a feeling we were onto something when we set up iTunes U, but its success has been very rewarding. We’ve now formalised the methods of acquiring and publishing content, around 55% of which is openly licensed by the speakers under Creative Commons so that it can be reused in education worldwide. It is fantastic to see Oxford playing a leading role in such a cutting-edge and altruistic education programme, and to see researchers’ work reach such wide audiences.
 
The University encourages other learning institutions to use the materials in their own teaching: other universities, colleges and schools can now use the online lectures in their own lessons at no cost. The material can be downloaded from iTunes U or from the parallel University website.

Website details: http://itunes.ox.ac.uk and http://podcasts.ox.ac.uk/open
 

Posted in Content, Creative Commons, impact, iTunes U, ukoer

Open Science, Open Data and Policies on Open Access

Open Science 2013 event – http://podcasts.ox.ac.uk/series/open-science

In April 2013, Oxford hosted a conference, ‘Rigour and Openness in 21st Century Science’. Over 30 speakers gathered to discuss the cutting edge of digital innovations in publishing, how openness is set to improve standards in science, and the British government’s new policies on open access.

All the talks are now available to watch freely on Oxford on iTunes U and at http://podcasts.ox.ac.uk. Hear about exciting new startups like figshare, hear how established industry players such as Elsevier are evolving, and hear what academics are doing to bring openness into their daily work.

Keynote speakers included Chief Scientific Adviser Mark Walport, with a closing keynote speech by Rt Hon David Willetts, Minister for Universities and Science.

All talks are released with a Creative Commons licence.

Links
=====
Open Science podcasts:
http://podcasts.ox.ac.uk/series/open-science
Open Science podcasts in Oxford on iTunesU:
https://itunes.apple.com/gb/itunes-u/open-science/id509508541
Further Info:
http://rigourandopenness.com/

Posted in copyright, Creative Commons, dissemination, events, Oxford, ukoer

OER International Case Study Published

Further to the recent post about Oxford’s OER International project funded by the HEA, you can now read the full case study.

The purpose of Oxford OER International was to identify suitable elements of the University of Oxford’s existing OER collection to be showcased internationally. By improving the web presence of Oxford’s OER outputs, designed with the international user in mind, the project was able to promote a selection of resources hand-picked for their suitability for an international audience.

The project enhanced the potential for engagement with international audiences by ensuring that the selected content was more easily discoverable through improved descriptions and additional metadata to indicate level (introductory, intermediate, advanced). Advocacy from world-class academics and appreciative users, clear routes to Oxford’s other OER projects, and the inclusion of other links focussed on international admissions were all included to present a true showcase of Oxford’s best international outputs. The project evaluated strategies to improve discoverability of content by a global audience and investigated a range of tracking and feedback methods for understanding their use.

This case study highlights successful approaches to understanding the needs of an international audience: exploring how improved cataloguing metadata can enhance discoverability, demonstrating how targeted promotion of relevant content through better visibility and marketing can lead to higher usage, and introducing a tracking analytics strategy to evaluate usage and search behaviour. It also includes a simple 5-step methodology, offered as a model for other OER creators to follow.

 The 5 steps to gaining global reach

1. Getting a feel for your audience

  • focussing on your target audience
  • understanding some key aspects of their data, for example: most popular pages, best traffic sources, most popular countries and languages (see the illustrative sketch after step 5).

2. Framing your objectives

  • avoiding vanity metrics
  • working out metrics for your key stakeholders.

3. Auditing where you are in relation to your objectives

  • what are your primary traffic sources?
  • what behaviour can you observe from your visitors?
  • what can you tell about your geographical visitors?

4. Revising your objectives, planning and implementing some improvements

  • increase your traffic sources
  • optimise your content for searches.

5. Evaluating and repeating steps 3 and 4.
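As a purely illustrative sketch of step 1, here is how a hypothetical web analytics export could be summarised in Python to find the most popular pages and countries. The file name and column names (page, country, sessions) are assumptions for the example, not the schema of any particular analytics product:

# Illustrative only: summarise a hypothetical analytics export (step 1).
# The CSV layout used here is an assumption, not a real product's schema.
import csv
from collections import Counter

page_views = Counter()
country_views = Counter()

with open("visits.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):  # assumed columns: page, country, sessions
        sessions = int(row["sessions"])
        page_views[row["page"]] += sessions
        country_views[row["country"]] += sessions

print("Most popular pages:", page_views.most_common(5))
print("Most popular countries:", country_views.most_common(5))

A summary like this gives the ‘feel for your audience’ that step 1 asks for, before you frame your objectives in step 2.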

Posted in Content, dissemination, impact, Oxford, ukoer, Uncategorized

Oxford OER International

We have recently received funding for a short project under the OER International strand of the HEA/JISC Open Educational Resources Phase 3 Programme. As part of this very quick turn-around project we have just made live our new open content page on the University’s podcasting website, http://podcasts.ox.ac.uk/open. Huge thanks go to Steve Pierce for making the vision come alive.

The purpose of Oxford OER International was to identify suitable elements of the University of Oxford’s existing OER collection to be showcased internationally. By improving the web presence of Oxford’s OER outputs, designed with the international user in mind, the project was able to promote a selection of resources hand-picked for their suitability for an international audience. The project enhanced the potential for engagement with international audiences by ensuring that the selected content was more easily discoverable through improved descriptions and additional metadata to indicate level (introductory, intermediate, advanced). Advocacy from world-class academics and appreciative users, clear routes to Oxford’s other OER projects, and the inclusion of other links focussed on international admissions were all included to present a true showcase of Oxford’s best international outputs. The project briefly explored strategies to improve the discoverability of content by an international audience, along with tracking methods for understanding its use; these will be included in the final case study. The case study will highlight successful approaches, for example by describing how metadata can be used to enhance discoverability and demonstrating how tracking methods can support international promotion.

Details of the final case study will be posted here when it is published.

Posted in dissemination, Oxford, podcasting, ukoer

SPINDLE – Speech to Text to Keywords to Captions – The Grand Finale

SPINDLE: Increasing OER discoverability by improved keyword metadata via automatic speech to text transcription.

A summary of the project, using the voice-over script that accompanies the SPINDLE overview video documenting the project.

1. Aim – Generate keywords automatically from recorded lectures

2. Spindle was funded by JISC through the “Open Educational Resources – Rapid Innovation” strand. – http://www.jisc.ac.uk/whatwedo/programmes/ukoer3/rapidinnovation.aspx

3. Spindle was a technical project whose key objective was to explore generating cataloguing keywords from recorded lectures.

4. Spindle reviewed the accuracy of “speech to text” tools available to media producers for automatically generating a text transcript from a recording file.

5. Spindle created a program that automatically filters the uncorrected transcript to a set of statistically interesting keywords. The program analyses the lecturer’s words and compares them with the spoken section of the British National Corpus.

Better keywords improve the discoverability of open content!

6. Spindle went much further than the initial plan, going on to create a “captioning” toolset to help media producers deal with cataloguing media.

With this toolkit, a media service can now:

– batch process recordings to create transcripts automatically (using the free toolset CMU Sphinx)

– generate keywords

– correct any transcript errors while listening to the media

– and export into time-coded captioning and archive formats

7. The Spindle captioning toolset was written in Python using the Django framework

8. The Spindle code is publicly available to re-use in an online repository under an open source licence – [ GitHub code repository – https://github.com/ox-it/spindle-code hashtag #spindle #OERRI ]

9. All reports and further information are available through the Spindle blog – http://blogs.it.ox.ac.uk/openspires/category/spindle – hashtag #spindle

SPINDLE Overview Movie

Watch the two-minute SPINDLE overview video, which uses the above text as its voice-over, at:

http://media.podcasts.ox.ac.uk/oucs/spindle/spindle_overview.mp4

The SPINDLE Workflow and Caption Editor Toolkit

Posted in dissemination, grandfinale, Spindle

SPINDLE – Benefits and Impact

Project SPINDLE is about to end. As lead on the project here at Academic IT Services, I’ve tried to summarise the main impact and benefits of the work:

  • Training – improved skills within the OpenSpires and Media teams
  • Discoverability – making media more discoverable and accessible
  • Content – the creation of better cataloguing resources, tools and data
  • Knowledge exchange – through the documentation of the workflow and the creation of free-to-use open source tools, helping others to build on our work
  • Community building – working with others to explore ideas for time-coded texts and media

The project was funded by the JISC to innovate rapidly around technical issues that support the release of Open Educational Resources. The single biggest benefit of the project has been in training and skills acquisition for our media production team, by allowing time and funding to foster a multidisciplinary collaboration across linguistics, phonetics and computer science to research and create the prototype service. The fast-paced five-month project has achieved all of its original aims and, by combining our summer intern programmer with an expert in speech-to-text software, we have managed to move beyond the area of keyword cataloguing and create a more complex prototype web application to process transcripts as media is created. This captioning toolkit will speed up work, be very cost-effective and allow crowd-sourced corrections to be exported into emerging HTML5 captioning and archival formats.

Here is a list of the substantial benefits of the project:

  • SPINDLE developed a round-trip workflow for transcription correction and created over 20 blog reports evaluating this work.
  • SPINDLE researched the use of automatic speech to text programs to generate transcriptions automatically. This automatic transcription serves as a starting point to create manual transcriptions and captions, as well as the base to generate keywords automatically.
  • SPINDLE documented how to use Adobe Premiere to make transcripts and how a media unit might install the research toolkit CMU SPHINX 4 to transcribe podcasts – https://github.com/ox-it/spindle-code/tree/master/speechToText
  • A large corpus of text – SPINDLE proved that the workflow could generate keywords automatically for 3,426 podcasts. Once these keywords are migrated into our delivery channels they will lead to better indexing and cataloguing, and better discoverability of our Open Educational Resources (OER) by search engines.
  • Accessibility – We generated unchecked and uncorrected caption file data in WebVTT time-coded format for our OER video podcasts.
  • Archival formats – We investigated an archival format for the keywords and transcripts using the Text Encoding Initiative encoding format, which also includes OER licence information.

We developed code:

  • Programming scripts for finding non-common keywords from text transcripts – http://github.com/ox-it/spindle-code
  • A new prototype online transcription editor – a toolkit that aids captioning work – freely available in a GitHub code repository – http://github.com/ox-it/spindle-code
  • Integration of the SPINDLE Caption Editor with CMU Sphinx and with Adobe Premiere XMP transcript files, plus investigation of an API to the Koemi commercial web service
  • Export to plain text, HTML, WebVTT and a data RSS feed, to help accessibility via text and video caption formats (see the sketch below)
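To illustrate the WebVTT export step, here is a minimal Python sketch of writing timed caption segments to a .vtt file. The segment data and function names are hypothetical examples, not the actual SPINDLE exporter:

# Minimal sketch: write (start, end, text) caption segments to a WebVTT file.
# The segment data below is made up for illustration.

def format_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return "%02d:%02d:%06.3f" % (hours, minutes, secs)

def write_webvtt(segments, path):
    """Write caption segments to a WebVTT file."""
    with open(path, "w", encoding="utf-8") as out:
        out.write("WEBVTT\n\n")  # required file header
        for start, end, text in segments:
            out.write("%s --> %s\n%s\n\n" % (
                format_timestamp(start), format_timestamp(end), text))

segments = [
    (0.0, 4.2, "Welcome to this lecture on the global recession."),
    (4.2, 9.8, "We will start by looking at the role of the banks."),
]
write_webvtt(segments, "lecture.vtt")

The same time-coded segments can be serialised to plain text, HTML or an RSS feed in much the same way, which is what makes a single round-trip transcript model so useful.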
We improved speech-to-text skills across the OpenSpires and media services team, and hence the University of Oxford, by fostering a multidisciplinary collaboration across academic IT services, linguistics, phonetics and computer science to create the prototype service. We also developed expertise in other areas, including research tools (CMU Sphinx), text encoding (TEI, XML and HTML5), programming (Django, web services), accessibility formats (WebVTT) and automatic speech-to-text alignment.
The next technical steps are to
  • Test the prototype software in a day to day production server environment
  • Review and reduce any minor keyword cataloguing errors
  • Ingest the cataloguing data into our main databases
  • Expose the new cataloguing keywords on the 4,000+ media items delivered by Academic IT Services in feeds and web pages – primarily Oxford on iTunes U and http://podcasts.ox.ac.uk
The next research work:
  • Explore ways of filtering the keywords even further, by ranking and removing words that are unlikely to be used in online searches
  • Explore the practicalities and costs of crowd-sourcing the correction of raw automatic transcriptions of the lectures with the new caption software
  • Explore the benefits and weaknesses of using automatic draft transcripts for full-text search
  • Compare the costs of managing volunteers correcting automatic transcripts to the cost and accuracy of using a professional transcription service.
Further work with academic authors:
  • Attitudes to OER text transcript release – information on contributor attitudes to displaying texts alongside a lecture
  • Policy for approval of texts
  • Investigating storing a voice bank or a database of key subject terms to help the software improve routine transcription
Future research ideas
The project also offers many future benefits and avenues to explore for researchers and HE services:
  • Corpus linguistics and language – SPINDLE offers a unique snapshot of text representing academic language at Oxford over a four-year period.
  • English as a foreign language – There has been interest and debate by the language learning community on SPINDLE and captioning lectures here – http://chirpstory.com/li/25724
  • Media Production Services – there is interest in using the SPINDLE work within automatic lecture capture solutions – http://opencast.org
  • Translation of texts to foreign languages
  • Data mining – research across the disciplines
Posted in dissemination, impact, oerri, Spindle

Navigating Open Oxford: the new OpenSpires Mind Map

Are you interested in seeing the bigger picture of Open Oxford? Try the new interactive OpenSpires Mind Map, freely available online. This new map is designed as a gateway into Open Educational practice at the University of Oxford. Here, you can explore the story and achievements of OpenSpires, read how the openness initiative can benefit academic practice and find ways to get involved at the University of Oxford.

As a part of the OER revolution, OpenSpires has now overseen a number of major OER projects at the University of Oxford, and is still growing. This new interactive map showcases all the diverse projects under the OpenSpires umbrella since it was established in 2009. It is a useful starting point for beginners, including key definitions and how-to guidance, as well as answers to some FAQs. It also goes deeper, offering information about the strategies behind OpenSpires projects like Ripple, Triton and Great Writers Inspire. It is hoped that this map will be a multi-faceted tool to help explain and celebrate various aspects of OpenSpires.

For more information explore the Mind Map or the OpenSpires homepage, or read the LTG Case Studies blogpost.

The OpenSpires Mind Map was created by Alexandra Paddock as part of a summer internship at IT Services.

Posted in Oxford, ukoer

Great Writers – taking stock

With a fast-paced one-year project it is easy to forget some of the interesting bits along the way. As we write our final report we have taken the opportunity to reflect on all aspects of the project, and this has been made easier by the excellent blogging of our student team and our academic supporters. The final report will be available in mid-October, but until then here are some mini-reports and reflective posts which give a taste of our outputs and findings.

Ebooks

http://writersinspire.wordpress.com/2012/05/10/the-ipad-in-the-library/, http://writersinspire.wordpress.com/2012/04/19/engage-event-ebooks-ereaders-elearning/,

Teaching case study (video)

http://writersinspire.org/content/teaching-shakespeare-schools

Engagement with schools

http://writersinspire.wordpress.com/2012/07/17/schools-engagement-at-cheney-teachers-comments/, http://writersinspire.wordpress.com/2012/07/17/schools-engagement-at-cheney-oxford/

Engagement with the wider community

http://writersinspire.wordpress.com/category/events/engage-events/

How to inspire students

http://writersinspire.wordpress.com/2012/04/20/engage-and-inspire/

Copyright/CC

http://writersinspire.wordpress.com/2012/04/19/engage-event-copyright-and-licencing/ , http://writersinspire.wordpress.com/2012/04/17/copyright/, http://writersinspire.wordpress.com/2012/03/28/releasing-and-reusing-creative-commons-material/, http://writersinspire.wordpress.com/2012/02/09/who-owns-scholarship/, http://writersinspire.wordpress.com/2012/01/25/creative-commons/

Digital literacy

http://writersinspire.wordpress.com/2012/08/17/down-the-rabbit-hole-discovering-open-educational-resources/, http://writersinspire.wordpress.com/2012/05/01/the-satisfaction-of-a-reliable-and-interesting-source/

Posted in Content, dissemination, Great Writers, ukoer

SPINDLE Automatic Keyword Generation: Step by Step

In this post we are going to show the automatic generation of keywords from the automatic transcription of a podcast. First of all, please find below a figure showing the main workflow of the SPINDLE project.

From our podcasts, we obtain an automatic transcription by using CMU Sphinx or the Speech Analysis Tool from Adobe Premiere Pro. Alternatively, a podcast could be transcribed by our media team or by using an external transcription service.

Once we have a transcription, how can we obtain the most relevant words? We use the Log-likelihood method. This method compares the frequency of a word in the transcription with the frequency of the same word in a large reference corpus. For example, the word “banks” occurs 17 times in the automatic transcription of this podcast, Global Recession: How Did it Happen?, and 201 times in a large corpus. Why is the word “banks” relevant?

Collecting word frequencies from a large corpus

First of all we need a reference corpus against which we can compare our automatic transcriptions. This corpus should be large enough to contain most words and general enough to be representative of the language. For our experiments we chose the spoken part of the British National Corpus (BNC) as our reference corpus.

The characteristics of the spoken part of the BNC corpus can be found below:

  • 589,347 sentences
  • 11,606,059 words

So, now we know we have more than 11 million words in our reference corpus. Taking into account that the word “banks” occurs 201 times out of 11.6 million words in the corpus and 17 times out of the 5,439 words in our transcription, how do we calculate the relevance of the word “banks”?

Step 1

  1. Use Natural Language Processing techniques to normalise the corpus (remove punctuation and stopwords)
  2. Calculate, for each word in the British National Corpus, how many times that word occurs in the corpus (a)
  3. Calculate the total number of words in the corpus (c)
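A minimal Python sketch of this counting step is shown below. It assumes a plain-text dump of the spoken BNC and uses NLTK’s English stopword list; the file names are placeholders and this is not the actual SPINDLE code:

# Sketch: count word frequencies in a reference corpus, dropping
# punctuation and stopwords. Assumes a plain-text corpus file.
import re
from collections import Counter

from nltk.corpus import stopwords  # requires nltk.download('stopwords')

STOPWORDS = set(stopwords.words("english"))

def word_frequencies(path):
    """Return a Counter of normalised, non-stopword tokens in the file."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            # lowercase and keep alphabetic tokens only (drops punctuation)
            for word in re.findall(r"[a-z]+", line.lower()):
                if word not in STOPWORDS:
                    counts[word] += 1
    return counts

corpus_counts = word_frequencies("bnc_spoken.txt")  # placeholder file name
with open("bnc_frequencies.txt", "w", encoding="utf-8") as out:
    for word, count in corpus_counts.most_common():
        out.write("%s: %d\n" % (word, count))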

The final file is composed of 56,029 words and the number of occurrences of each word. An extract of that file can be found below:

  • banks: 201
  • crisis: 195
  • companies: 758
  • ….

Generating relevant keywords and bigrams

Step 2

  1. Use Natural Language Processing techniques to normalise the transcription (remove punctuation if necessary and stopwords)
  2. Calculate, for each word in the transcription, how many times that word occurs in the transcription (b)
  3. Calculate the total number of words in the transcription (d)

Step 3

  1. Calculate the Log-likelihood, G², of each individual word (the formula is given below)
  2. Sort the words by Log-likelihood value (the higher the better)
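For reference, the Log-likelihood statistic used here follows the standard corpus-comparison form, with a, b, c and d as defined in Steps 1 and 2, and with E1 and E2 the expected counts of the word in the corpus and in the transcription:

E_1 = \frac{c\,(a+b)}{c+d}, \qquad E_2 = \frac{d\,(a+b)}{c+d}, \qquad G^2 = 2\left( a \ln\frac{a}{E_1} + b \ln\frac{b}{E_2} \right)

A word scores highly when its observed frequency in the transcription (b) is much larger than its expected frequency (E2), which is exactly the case for a topic word such as “banks”.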

Step 4

  1. Calculate frequent bigrams by counting the number of occurrences of each pair of adjacent words (see the sketch below)
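Putting Steps 2–4 together, here is a minimal Python sketch of the scoring. It is not the actual SPINDLE program, but it computes the same Log-likelihood statistic; plugging in the “banks” counts quoted above gives a value close to the 141.12 reported below (small differences come from tokenisation details):

# Sketch: score transcription words by Log-likelihood against the reference
# corpus, and count bigrams. Not the actual SPINDLE code.
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """a, b: counts of a word in corpus/transcription; c, d: total words in each."""
    e1 = c * (a + b) / (c + d)   # expected count in the reference corpus
    e2 = d * (a + b) / (c + d)   # expected count in the transcription
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

# Worked example with the "banks" counts quoted above: roughly 141
print(log_likelihood(a=201, b=17, c=11606059, d=5439))

def keywords_and_bigrams(words, corpus_counts, corpus_total):
    """words: normalised transcription tokens (Step 2). Returns ranked keywords and bigrams."""
    transcript_counts = Counter(words)
    transcript_total = sum(transcript_counts.values())
    scores = {w: log_likelihood(corpus_counts.get(w, 0), n, corpus_total, transcript_total)
              for w, n in transcript_counts.items()}
    keywords = sorted(scores, key=scores.get, reverse=True)   # Step 3
    bigrams = Counter(zip(words, words[1:])).most_common()    # Step 4
    return keywords, bigrams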

Example of Automatic Keywords Generation

We used the keyword generation tool to generate the relevant keywords and bigrams from the automatic transcription of the podcast Global Recession: How Did it Happen? (Correct Words = 32.9%). We deliberately selected a poor automatic transcription to show that even with a low proportion of correct words we can extract some relevant keywords and bigrams automatically.

Keywords Generated (word: Log-likelihood)

banks : 141.12175627
crisis : 73.3976004078
companies : 67.8498685789
assets : 61.8910800051
haiti : 47.7956942776
interest : 41.3390170289
credit : 39.6149918395
crunch : 35.9334074944
senate : 32.4501608202
profited : 30.625124757
sitcom : 30.625124757
ansa : 30.625124757
nineteen : 29.0864140753
economy : 28.6440250819
nineties : 27.5138518651
haitian : 26.8069860979
sanctioning : 26.8069860979
center : 26.8069860979
regulate : 25.4923775621
hashing : 25.0818400138
haitians : 25.0818400138
stimulus : 24.5089608603
united : 24.1102094531
successful : 21.8091735308
financial : 21.7481087661
key : 21.6791751296
caught : 21.1648006228
eases : 21.0970376283
bankruptcy : 21.0970376283
rates : 21.0105869453
kind : 20.8040324729
cited : 20.6246470912
backs : 19.9877139071
borrowing : 19.9877139071
crimes : 19.5817617075
countries : 19.5490491082
essentially : 19.334521352
fiscal : 19.1532240523

Collocations Generated (collocation: #occurrences)

[interest rates] : 5
[financial crisis] : 4
[wall street] : 3
[nineteen nineties] : 3
[credit crunch] : 3
[british government] : 3

Word Cloud (using Wordle)


Conclusion

We should note that we are generating keywords from automatic transcriptions and not from human transcriptions. Therefore, alongside relevant keywords and bigrams we also obtain some that are less relevant or simply off-topic. However, through the SPINDLE project we have automatically generated thousands of relevant keywords and bigrams for our collection of podcasts, and these will soon increase the discoverability and accessibility of the collection.

Posted in oerri, Spindle, ukoer