<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>OpenSpires</title>
	<atom:link href="http://blogs.it.ox.ac.uk/openspires/feed/" rel="self" type="application/rss+xml" />
	<link>http://blogs.it.ox.ac.uk/openspires</link>
	<description>Inspirational Open Content from Oxford University - Open Spires</description>
	<lastBuildDate>Tue, 09 Apr 2013 08:09:44 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>OER International Case Study Published</title>
		<link>http://blogs.it.ox.ac.uk/openspires/2013/04/09/oer-international-case-study-published/</link>
		<comments>http://blogs.it.ox.ac.uk/openspires/2013/04/09/oer-international-case-study-published/#comments</comments>
		<pubDate>Tue, 09 Apr 2013 08:09:44 +0000</pubDate>
		<dc:creator>Lisa Mansell</dc:creator>
				<category><![CDATA[Content]]></category>
		<category><![CDATA[dissemination]]></category>
		<category><![CDATA[impact]]></category>
		<category><![CDATA[Oxford]]></category>
		<category><![CDATA[ukoer]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blogs.oucs.ox.ac.uk/openspires/?p=1291</guid>
		<description><![CDATA[Further to the recent post about Oxford&#8217;s OER International project funded by the HEA, you can now read the full case study. The purpose of Oxford OER International was to identify suitable elements of the University of Oxford’s existing OER &#8230; <a href="http://blogs.it.ox.ac.uk/openspires/2013/04/09/oer-international-case-study-published/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Further to the recent post about Oxford&#8217;s OER International project funded by the HEA, you can now read the full <a title="Oxford OER International Case Study" href="http://www.heacademy.ac.uk/projects/detail/oer/OER_int_005_ox1" target="_blank">case study</a>.</p>
<p>The purpose of Oxford OER International was to identify suitable elements of the University of Oxford’s existing OER collection to be showcased internationally. By improving the web presence of Oxford’s OER outputs, designed with the international user in mind, the project was able to promote a selection of resources hand-picked for their suitability for an international audience.</p>
<p>The project enhanced the potential for engagement with international audiences by ensuring that the selected content was more easily discoverable through improved descriptions and additional metadata to indicate level (introductory, intermediate, advanced). Advocacy from world-class academics and appreciative users, clear routes to Oxford’s other OER projects, and the inclusion of other links focussed on international admissions were all included to present a true showcase of Oxford’s best international outputs. The project evaluated strategies to improve discoverability of content by a global audience and investigated a range of tracking and feedback methods for understanding their use.</p>
<p>This case study highlights successful approaches to understanding the needs of an international audience: exploring how improved cataloguing metadata can enhance discoverability, demonstrating how targeted promotion of relevant content through better visibility and marketing can lead to higher usage, and introducing a tracking analytics strategy to evaluate usage and search behaviour. It also includes a simple 5-step methodology, offered as a model for other OER creators to follow.</p>
<p><strong> The 5 steps to gaining global reach</strong></p>
<p>1. Getting a feel for your audience</p>
<ul>
<li>focussing on your target audience</li>
<li>understanding some key aspects of their data, for example most popular pages, best traffic sources, and most popular countries and languages.</li>
</ul>
<p>2. Framing your objectives</p>
<ul>
<li>avoiding vanity metrics</li>
<li>working out metrics for your key stakeholders.</li>
</ul>
<p>3. Audit where you are in relation to your objectives</p>
<ul>
<li>what are your primary traffic sources?</li>
<li>what visitor behaviour can you observe?</li>
<li>what can you tell about where your visitors are located?</li>
</ul>
<p>4. Revising your objectives, planning and implementing some improvements</p>
<ul>
<li>increase your traffic sources</li>
<li>optimise your content for searches.</li>
</ul>
<p>5. Evaluate and repeat steps 3 and 4.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.it.ox.ac.uk/openspires/2013/04/09/oer-international-case-study-published/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Oxford OER International</title>
		<link>http://blogs.it.ox.ac.uk/openspires/2012/12/19/oxford-oer-international/</link>
		<comments>http://blogs.it.ox.ac.uk/openspires/2012/12/19/oxford-oer-international/#comments</comments>
		<pubDate>Wed, 19 Dec 2012 10:37:37 +0000</pubDate>
		<dc:creator>Lisa Mansell</dc:creator>
				<category><![CDATA[dissemination]]></category>
		<category><![CDATA[Oxford]]></category>
		<category><![CDATA[podcasting]]></category>
		<category><![CDATA[ukoer]]></category>

		<guid isPermaLink="false">http://blogs.oucs.ox.ac.uk/openspires/?p=1279</guid>
		<description><![CDATA[We have recently received funding for a short project by the OER International strand of the HEA/JISC Open Educational Resources Phase 3 Programme. As part of this very quick turn-around project we have just made live our new open content &#8230; <a href="http://blogs.it.ox.ac.uk/openspires/2012/12/19/oxford-oer-international/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://blogs.it.ox.ac.uk/openspires/files/2012/12/New-open-page.jpg"><img class="alignnone  wp-image-1280" src="http://blogs.it.ox.ac.uk/openspires/files/2012/12/New-open-page.jpg" alt="" width="672" height="504" /></a></p>
<p>We have recently received funding for a short project by the OER International strand of the <a href="http://www.heacademy.ac.uk/resources/detail/oer/oer-phase-3">HEA/JISC Open Educational Resources Phase 3 Programme</a>. As part of this very quick turn-around project we have just made live our new open content page on the University’s podcasting website <a href="http://podcasts.ox.ac.uk/open">http://podcasts.ox.ac.uk/open</a>. Huge thanks go to Steve Pierce for making the vision come alive.</p>
<p>The purpose of <em>Oxford OER International</em> was to identify suitable elements of the University of Oxford’s existing OER collection to be showcased internationally. By improving the web presence of Oxford’s OER outputs, designed with the international user in mind, the project was able to promote a selection of resources hand-picked for their suitability for an international audience. The project enhanced the potential for engagement with international audiences by ensuring that the selected content was more easily discoverable through improved descriptions and additional metadata to indicate level (introductory, intermediate, advanced). Advocacy from world-class academics and appreciative users, clear routes to Oxford’s other OER projects, and the inclusion of other links focussed on international admissions were all included to present a true showcase of Oxford’s best international outputs. The project briefly explored strategies to improve discoverability by an international audience and methods for understanding their tracking and use, and these are to be included in the final case study. The case study will highlight successful approaches, for example by describing how metadata can be used to enhance discoverability and demonstrating how tracking methods can support international promotion.</p>
<p>Details of the final case study will be posted here when it is published.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.it.ox.ac.uk/openspires/2012/12/19/oxford-oer-international/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SPINDLE &#8211; Speech to Text to Keywords to Captions  &#8211; The Grand Finale</title>
		<link>http://blogs.it.ox.ac.uk/openspires/2012/10/05/spindle-speech-to-text-to-keywords-to-captions-the-grand-finale/</link>
		<comments>http://blogs.it.ox.ac.uk/openspires/2012/10/05/spindle-speech-to-text-to-keywords-to-captions-the-grand-finale/#comments</comments>
		<pubDate>Fri, 05 Oct 2012 17:36:58 +0000</pubDate>
		<dc:creator>Peter Robinson</dc:creator>
				<category><![CDATA[dissemination]]></category>
		<category><![CDATA[grandfinale]]></category>
		<category><![CDATA[Spindle]]></category>

		<guid isPermaLink="false">http://blogs.oucs.ox.ac.uk/openspires/?p=1228</guid>
		<description><![CDATA[SPINDLE: Increasing OER discoverability by improved keyword metadata via automatic speech to text transcription. A summary of the project  using the words of the voice-over that accompanies the SPINDLE overview video that documents the project. 1. Aim &#8211; Generate keywords automatically &#8230; <a href="http://blogs.it.ox.ac.uk/openspires/2012/10/05/spindle-speech-to-text-to-keywords-to-captions-the-grand-finale/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>SPINDLE: Increasing OER discoverability by improved keyword metadata via automatic speech to text transcription.</p>
<p>A summary of the project  using the words of the voice-over that accompanies the <a title="SPINDLE - A Video Introduction to the project" href="http://media.podcasts.ox.ac.uk/oucs/spindle/spindle_overview.mp4" target="_blank">SPINDLE overview video</a> that documents the project.</p>
<p><em>1. Aim &#8211; Generate keywords automatically from recorded lectures</em></p>
<p>2. Spindle was funded by JISC through the “Open Educational Resources &#8211; Rapid Innovation” strand. - <a title="JISC OERRI Strand" href="http://www.jisc.ac.uk/whatwedo/programmes/ukoer3/rapidinnovation.aspx">http://www.jisc.ac.uk/whatwedo/programmes/ukoer3/rapidinnovation.aspx</a></p>
<p>3. Spindle was a technical project whose key objective was to explore generating cataloguing keywords from recorded lectures.</p>
<p>4. Spindle reviewed the accuracy of &#8220;speech to text&#8221; tools available to media producers for automatically generating a text transcript from a recording file.</p>
<p>5. Spindle created a program that automatically filters the uncorrected transcript to a set of statistically interesting keywords. The program analyses the lecturer’s words and compares them with the British National Corpus of Spoken Words.</p>
<p><em>Better keywords improve the discoverability of open content!</em></p>
<p>6. Spindle went much further than the initial plan, creating a &#8220;captioning&#8221; toolset to help media producers deal with cataloguing media.</p>
<p>With this toolkit, a media service can now:</p>
<p>- batch process recordings to create transcripts automatically (using the free toolset CMU Sphinx)</p>
<p>- generate keywords</p>
<p>- correct any transcript errors while listening to the media</p>
<p>- and export into time-coded captioning and archive formats</p>
<p>7. The Spindle captioning toolset was written in Python using the Django framework.</p>
<p>8. The Spindle code is publicly available to re-use in an online  repository under an open source licence  - [ Github code repository - <a href="https://github.com/ox-it/spindle-code" target="_blank">https://github.com/ox-it/spindle-code</a> hashtag #spindle #OERRI ]</p>
<p>9. All reports and further information are available through the Spindle blog &#8211; <a title="SPINDLE project blog" href="http://blogs.it.ox.ac.uk/openspires/category/spindle">http://blogs.it.ox.ac.uk/openspires/category/spindle</a> &#8211; hashtag #spindle</p>
<div id="attachment_1235" class="wp-caption alignleft" style="width: 243px"><a href="http://media.podcasts.ox.ac.uk/oucs/spindle/spindle_2min_pam_october.mp4"><img class="size-full wp-image-1235" src="http://blogs.it.ox.ac.uk/openspires/files/2012/10/Screen-shot-2012-10-05-at-18.22.18.png" alt="SPINDLE Overview Movie" width="233" height="174" /></a><p class="wp-caption-text">SPINDLE Overview Movie</p></div>
<p>Watch the SPINDLE 2 minute overview video using the above text as the voice-over at:</p>
<p><a href="http://media.podcasts.ox.ac.uk/oucs/spindle/spindle_overview.mp4"> http://media.podcasts.ox.ac.uk/oucs/spindle/spindle_overview.mp4</a></p>
<div id="attachment_1234" class="wp-caption aligncenter" style="width: 623px"><a href="http://blogs.it.ox.ac.uk/openspires/files/2012/10/Caption-Editor-grand-finale.png"><img class=" wp-image-1234 " src="http://blogs.it.ox.ac.uk/openspires/files/2012/10/Caption-Editor-grand-finale.png" alt="" width="613" height="430" /></a><p class="wp-caption-text">The SPINDLE Workflow and Caption Editor Toolkit</p></div>
]]></content:encoded>
			<wfw:commentRss>http://blogs.it.ox.ac.uk/openspires/2012/10/05/spindle-speech-to-text-to-keywords-to-captions-the-grand-finale/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://media.podcasts.ox.ac.uk/oucs/spindle/spindle_overview.mp4" length="7117914" type="video/mp4" />
		</item>
		<item>
		<title>SPINDLE &#8211; Benefits and Impact</title>
		<link>http://blogs.it.ox.ac.uk/openspires/2012/10/03/spindle-benefits-and-impact-at-oxford-and-beyond/</link>
		<comments>http://blogs.it.ox.ac.uk/openspires/2012/10/03/spindle-benefits-and-impact-at-oxford-and-beyond/#comments</comments>
		<pubDate>Wed, 03 Oct 2012 07:19:55 +0000</pubDate>
		<dc:creator>Peter Robinson</dc:creator>
				<category><![CDATA[dissemination]]></category>
		<category><![CDATA[impact]]></category>
		<category><![CDATA[oerri]]></category>
		<category><![CDATA[Spindle]]></category>

		<guid isPermaLink="false">http://blogs.oucs.ox.ac.uk/openspires/?p=1211</guid>
		<description><![CDATA[Project SPINDLE is about to end. As lead on the project here at the Academic IT services I&#8217;ve tried to summarise the main impact and benefits of the work: Training &#8211; improved skills within the OpenSpires and Media teams Discoverability -making &#8230; <a href="http://blogs.it.ox.ac.uk/openspires/2012/10/03/spindle-benefits-and-impact-at-oxford-and-beyond/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Project SPINDLE is about to end. As lead on the project here at the Academic IT services I&#8217;ve tried to summarise the main impact and benefits of the work:</p>
<ul>
<li>Training &#8211; improved skills within the OpenSpires and Media teams</li>
<li>Discoverability &#8211; making media more discoverable and accessible</li>
<li>Content &#8211; the creation of better cataloguing resources, tools and data</li>
<li>Knowledge exchange &#8211; through the documentation of the workflow and the creation of free to use open source tools helping others to build on our work</li>
<li>Community building &#8211; working with others to explore ideas for time-coded texts and media</li>
</ul>
<p>The project was funded by JISC to rapidly innovate around technical issues that support the release of Open Educational Resources. The single biggest benefit of the project has been in training and skills acquisition for our media production team &#8211; by allowing time and funding to foster a multidisciplinary collaboration across linguistics, phonetics and computer science to research and create the prototype service. The fast-paced, short five-month project has achieved all of its original aims. By combining the efforts of our summer intern programmer with those of an expert in speech-to-text software, we have managed to move beyond the area of keyword cataloguing and create a more complex prototype web application to process transcripts as media is created. This captioning toolkit will speed up work, be very cost-effective and allow crowd-sourced corrections to be exported into emerging HTML5 captioning and archival formats.</p>
<p>Here is a list of the substantial benefits of the project:</p>
<ul>
<li>SPINDLE developed a round-trip workflow for transcription correction and created over 20 <a title="Spindle Blog reports" href="http://blogs.it.ox.ac.uk/openspires/spindle/">blog reports</a> evaluating this work.</li>
<li>SPINDLE researched the use of automatic speech-to-text programs to <a href="http://blogs.it.ox.ac.uk/openspires/2012/06/08/automatic-speech-to-text-transcription-preliminary-results/">generate transcriptions automatically</a>. This automatic transcription serves as a starting point for creating manual transcriptions and captions, as well as the basis for generating keywords automatically.</li>
<li>SPINDLE documented how to use Adobe Premiere to make transcripts and how a media unit might install the research toolkit CMU SPHINX 4 to transcribe podcasts - <a href="https://github.com/ox-it/spindle-code/tree/master/speechToText" target="_blank">https://github.com/ox-it/spindle-code/tree/master/speechToText</a></li>
<li>A large corpus of text &#8211; SPINDLE proved that the workflow could <a href="http://blogs.it.ox.ac.uk/openspires/2012/06/29/automatic-keyword-generation-from-automatic-speech-to-text-transcriptions/">generate keywords automatically</a> for 3,426 podcasts. Once these keywords are migrated into our delivery channels they will lead to better indexing and cataloguing, and better discoverability of our Open Educational Resources (OER) by search engines.</li>
<li>Accessibility  &#8211; We generated unchecked and uncorrected caption file data in WebVTT timecoded format for our OER video podcasts</li>
<li>Archival formats &#8211; We investigated an archival format for the keywords and transcripts using the Text Encoding Initiative encoding format, which also includes OER licence information</li>
</ul>
<p>We developed code:</p>
<ul>
<li>Programming scripts for finding non-common keywords from text transcripts - <a href="http://github.com/ox-it/spindle-code" target="_blank">http://github.com/ox-it/spindle-code</a></li>
<li> A new prototype online transcription editor &#8211; A toolkit that aids captioning work &#8211; freely available in a github code repository - <a href="http://github.com/ox-it/spindle-code" target="_blank">http://github.com/ox-it/spindle-code</a></li>
<li>Integration of the SPINDLE Caption Editor with CMU Sphinx, import of Adobe Premiere XMP transcript files, and an investigation of an API to the Koemi commercial web service</li>
<li>Export to plain text, HTML, WebVTT and a data RSS feed, to help accessibility via text and video caption formats</li>
</ul>
<div></div>
<div>We improved speech-to-text skills across the OpenSpires and media services team, and hence the University of Oxford, by fostering a multidisciplinary collaboration across academic IT services, linguistics, phonetics and computer science to create the prototype service. We also developed expertise in other subjects such as research tools (CMU Sphinx), text encoding (TEI, XML and HTML5), programming (Django, web services), accessibility formats (WebVTT) and automatic speech-to-text alignment.</div>
<div></div>
<div>The next technical steps are to</div>
<ul>
<li>Test the prototype software in a day to day production server environment</li>
<li>Review and reduce any minor keyword cataloguing errors</li>
<li>Ingest the cataloguing data into our main databases</li>
<li>Expose the new cataloguing keywords on the 4,000+ media items delivered by the Academic IT Services in feeds and web pages  &#8211; primarily <a title="Oxford on iTunesU" href="http://itunes.ox.ac.uk/"> Oxford on iTunesU</a> and <a title="Podcasts at Oxford University" href="http://podcasts.ox.ac.uk">http://podcasts.ox.ac.uk</a></li>
</ul>
<div></div>
<div>The next research work:</div>
<ul>
<li> Explore ways of filtering even further the keywords by ranking and removing words that are unlikely to be used in online searches</li>
<li> Explore the practicalities and costs of crowd-sourcing the correction of raw automatic transcriptions of the lectures with the new caption software</li>
<li> Explore the benefits and weaknesses of using automatic draft text for full-text search</li>
<li> Compare the costs of managing volunteers correcting automatic transcripts to the cost and accuracy of using a professional transcription service.</li>
</ul>
<div></div>
<div>Further work with academic authors:</div>
<div></div>
<ul>
<li>Attitudes to OER text transcript release -  information on contributor attitudes to displaying texts alongside a lecture.</li>
<li>Policy for approval of texts</li>
<li>Investigating storing a voice-bank or key subject terms database to help the software improve regular transcription</li>
</ul>
<div>Future research ideas</div>
<div></div>
<div>The project also offers many future benefits and avenues to explore for researchers and HE services:</div>
<ul>
<li>Corpus Linguistics and language  &#8211; SPINDLE offers a unique snapshot of text representing the academic language over a four year period at Oxford.</li>
<li>English as a foreign language &#8211; There has been interest and debate by the language learning community on SPINDLE and captioning lectures here &#8211; http://chirpstory.com/li/25724</li>
<li>Media Production Services &#8211; there is interest in using the SPINDLE work within automatic lecture capture solutions- http://opencast.org</li>
<li>Translation of texts to foreign languages</li>
<li>Data mining &#8211; research across the disciplines</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blogs.it.ox.ac.uk/openspires/2012/10/03/spindle-benefits-and-impact-at-oxford-and-beyond/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Navigating Open Oxford: the new OpenSpires Mind Map</title>
		<link>http://blogs.it.ox.ac.uk/openspires/2012/09/17/navigating-open-oxford-the-new-openspires-mind-map-2/</link>
		<comments>http://blogs.it.ox.ac.uk/openspires/2012/09/17/navigating-open-oxford-the-new-openspires-mind-map-2/#comments</comments>
		<pubDate>Mon, 17 Sep 2012 13:50:00 +0000</pubDate>
		<dc:creator>Lisa Mansell</dc:creator>
				<category><![CDATA[Oxford]]></category>
		<category><![CDATA[ukoer]]></category>

		<guid isPermaLink="false">http://blogs.oucs.ox.ac.uk/openspires/?p=1206</guid>
		<description><![CDATA[Are you interested in seeing the bigger picture of Open Oxford? Try the new interactive OpenSpires Mind Map, freely available online. This new map is designed as a gateway into Open Educational practice at the University of Oxford. Here, you can &#8230; <a href="http://blogs.it.ox.ac.uk/openspires/2012/09/17/navigating-open-oxford-the-new-openspires-mind-map-2/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Are you interested in seeing the bigger picture of Open Oxford? Try the new interactive <a title="Go to OpenSpires Mind Map" href="http://www.mindmeister.com/195576975/openspires">OpenSpires Mind Map</a>, freely available online. This new map is designed as a gateway into Open Educational practice at the University of Oxford. Here, you can explore the story and achievements of OpenSpires, read how the openness initiative can benefit academic practice and find ways to get involved at the University of Oxford.</p>
<p>As a part of the OER revolution OpenSpires has now overseen a number of major OER projects at the University of Oxford, and is still growing. This new interactive map showcases all the diverse projects under the OpenSpires umbrella since it was established in 2009. It is a useful starting point for beginners, including Key Definitions, and How To, as well as answering some FAQs. It also goes deeper, offering information about the strategies behind OpenSpires projects like <a href="http://openspires.oucs.ox.ac.uk/ripple/index.html">Ripple</a>, <a href="http://openspires.oucs.ox.ac.uk/triton/index.html">Triton</a> and <a href="http://writersinspire.org/">Great Writers Inspire</a>. It is hoped that this map will be a multi-faceted tool to help explain and celebrate various aspects of OpenSpires.</p>
<p>For more information explore the <a title="OpenSpires Mind Map" href="https://www.mindmeister.com/195576975/openspires">Mind Map</a> or the <a title="OpenSpires" href="http://openspires.oucs.ox.ac.uk/">OpenSpires</a> homepage, or read the LTG Case Studies <a href="http://blogs.it.ox.ac.uk/ltg-casestudies/2012/08/23/navigating-open-oxford-openspires-mind-map/">blogpost</a>.</p>
<p>The OpenSpires Mind Map was created by Alexandra Paddock as part of a summer internship at IT Services.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.it.ox.ac.uk/openspires/2012/09/17/navigating-open-oxford-the-new-openspires-mind-map-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Great Writers &#8211; taking stock</title>
		<link>http://blogs.it.ox.ac.uk/openspires/2012/09/14/great-writers-taking-stock/</link>
		<comments>http://blogs.it.ox.ac.uk/openspires/2012/09/14/great-writers-taking-stock/#comments</comments>
		<pubDate>Fri, 14 Sep 2012 09:33:17 +0000</pubDate>
		<dc:creator>Lisa Mansell</dc:creator>
				<category><![CDATA[Content]]></category>
		<category><![CDATA[dissemination]]></category>
		<category><![CDATA[Great Writers]]></category>
		<category><![CDATA[ukoer]]></category>

		<guid isPermaLink="false">http://blogs.oucs.ox.ac.uk/openspires/?p=1196</guid>
		<description><![CDATA[With a fast-paced 1-year project it is easy to forget some of the interesting bits along the way. As we write our final report we have taken the opportunity to reflect on all aspects of the project and this &#8230; <a href="http://blogs.it.ox.ac.uk/openspires/2012/09/14/great-writers-taking-stock/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>With a fast-paced 1-year project it is easy to forget some of the interesting bits along the way. As we write our final report we have taken the opportunity to reflect on all aspects of the project and this has been made easier by the excellent blogging of our student team and our academic supporters. The final report will be available in mid-October but until then here are some mini-reports and reflective posts which give a taste of our outputs and findings.</p>
<p>Ebooks</p>
<p><a href="http://writersinspire.wordpress.com/2012/05/10/the-ipad-in-the-library/">http://writersinspire.wordpress.com/2012/05/10/the-ipad-in-the-library/</a>, <a href="http://writersinspire.wordpress.com/2012/04/19/engage-event-ebooks-ereaders-elearning/">http://writersinspire.wordpress.com/2012/04/19/engage-event-ebooks-ereaders-elearning/</a></p>
<p>Teaching case study (video)</p>
<p><a href="http://writersinspire.org/content/teaching-shakespeare-schools">http://writersinspire.org/content/teaching-shakespeare-schools</a></p>
<p>Engagement with schools</p>
<p><a href="http://writersinspire.wordpress.com/2012/07/17/schools-engagement-at-cheney-teachers-comments/">http://writersinspire.wordpress.com/2012/07/17/schools-engagement-at-cheney-teachers-comments/</a>, <a href="http://writersinspire.wordpress.com/2012/07/17/schools-engagement-at-cheney-oxford/">http://writersinspire.wordpress.com/2012/07/17/schools-engagement-at-cheney-oxford/</a></p>
<p>Engagement with the wider community</p>
<p><a href="http://writersinspire.wordpress.com/category/events/engage-events/">http://writersinspire.wordpress.com/category/events/engage-events/</a></p>
<p>How to inspire students</p>
<p><a href="http://writersinspire.wordpress.com/2012/04/20/engage-and-inspire/">http://writersinspire.wordpress.com/2012/04/20/engage-and-inspire/</a></p>
<p>Copyright/CC</p>
<p><a href="http://writersinspire.wordpress.com/2012/04/19/engage-event-copyright-and-licencing/">http://writersinspire.wordpress.com/2012/04/19/engage-event-copyright-and-licencing/</a> , <a href="http://writersinspire.wordpress.com/2012/04/17/copyright/">http://writersinspire.wordpress.com/2012/04/17/copyright/</a>, <a href="http://writersinspire.wordpress.com/2012/03/28/releasing-and-reusing-creative-commons-material/">http://writersinspire.wordpress.com/2012/03/28/releasing-and-reusing-creative-commons-material/</a>, <a href="http://writersinspire.wordpress.com/2012/02/09/who-owns-scholarship/">http://writersinspire.wordpress.com/2012/02/09/who-owns-scholarship/</a>, <a href="http://writersinspire.wordpress.com/2012/01/25/creative-commons/">http://writersinspire.wordpress.com/2012/01/25/creative-commons/</a></p>
<p>Digital literacy</p>
<p><a href="http://writersinspire.wordpress.com/2012/08/17/down-the-rabbit-hole-discovering-open-educational-resources/">http://writersinspire.wordpress.com/2012/08/17/down-the-rabbit-hole-discovering-open-educational-resources/</a>, <a href="http://writersinspire.wordpress.com/2012/05/01/the-satisfaction-of-a-reliable-and-interesting-source/">http://writersinspire.wordpress.com/2012/05/01/the-satisfaction-of-a-reliable-and-interesting-source/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.it.ox.ac.uk/openspires/2012/09/14/great-writers-taking-stock/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SPINDLE Automatic Keyword Generation: Step by Step</title>
		<link>http://blogs.it.ox.ac.uk/openspires/2012/09/12/spindle-automatic-keyword-generation-step-by-step/</link>
		<comments>http://blogs.it.ox.ac.uk/openspires/2012/09/12/spindle-automatic-keyword-generation-step-by-step/#comments</comments>
		<pubDate>Wed, 12 Sep 2012 14:05:57 +0000</pubDate>
		<dc:creator>Sergio Grau Puerto</dc:creator>
				<category><![CDATA[oerri]]></category>
		<category><![CDATA[Spindle]]></category>
		<category><![CDATA[ukoer]]></category>
		<category><![CDATA[SPINDLE]]></category>
		<category><![CDATA[UKOER]]></category>

		<guid isPermaLink="false">http://blogs.oucs.ox.ac.uk/openspires/?p=1014</guid>
		<description><![CDATA[In this post we are going to show the automatic generation of keywords from the automatic transcription of a podcast. First of all, please find below a figure showing the main workflow of the SPINDLE project. From our podcasts, we &#8230; <a href="http://blogs.it.ox.ac.uk/openspires/2012/09/12/spindle-automatic-keyword-generation-step-by-step/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>In this post we are going to show the automatic generation of keywords from the automatic transcription of a podcast. First of all, please find below a figure showing the main workflow of the SPINDLE project.</p>
<p style="text-align: center"><a href="http://blogs.it.ox.ac.uk/openspires/files/2012/09/example_keywords.jpg.png"><img class="aligncenter  wp-image-1176" src="http://blogs.it.ox.ac.uk/openspires/files/2012/09/example_keywords.jpg-797x1024.png" alt="" width="512" height="658" /></a></p>
<p>From our podcasts, we obtain an automatic transcription by using CMU Sphinx or the Speech Analysis Tool from Adobe Premiere Pro. Alternatively, a podcast could be transcribed by our media team or by using an external transcription service.</p>
<p>Once we have a transcription, how can we obtain the most relevant words? By using the <a href="http://ucrel.lancs.ac.uk/llwizard.html">Log-likelihood</a> method, which compares the frequency of a word in the transcription with the frequency of the same word in a large corpus. For example, the word &#8220;banks&#8221; occurs 17 times in the automatic transcription of the podcast <a href="http://podcasts.ox.ac.uk/global-recession-how-did-it-happen-audio">Global Recession: How Did it Happen?</a> and 201 times in a large corpus. Why is the word &#8220;banks&#8221; relevant?</p>
<h2>Collecting word frequencies from a large corpus</h2>
<p>First of all we need a reference corpus against which we can compare our automatic transcriptions. This corpus should be large enough to contain most words and general enough to be representative of the language. For our experiments we chose the spoken part of the <a title="British National Corpus" href="http://www.natcorp.ox.ac.uk/">British National Corpus</a> (BNC) as our reference corpus.</p>
<p>The size of the spoken part of the BNC is given below:</p>
<ul>
<li>589,347 sentences</li>
<li>11,606,059 words</li>
</ul>
<p>So our reference corpus contains more than 11 million words. Given that the word &#8220;banks&#8221; occurs 201 times out of 11.6 million words in the corpus and 17 times out of 5,439 words in our transcription, how do we calculate the relevance of the word &#8220;banks&#8221;?</p>
<h3>Step 1</h3>
<ol>
<li>Use Natural Language Processing techniques to normalise the corpus (remove punctuation and stopwords)</li>
<li>For each word in the British National Corpus, count how many times it occurs in the corpus (a)</li>
<li>Calculate the total number of words in the corpus (c)</li>
</ol>
<p>The final file lists 56,029 distinct words, each with its number of occurrences. An extract of that file can be found below:</p>
<ul>
<li>banks: 201</li>
<li>crisis: 195</li>
<li>companies: 758</li>
<li>&#8230;.</li>
</ul>
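<p>The counting in Step 1 can be sketched in a few lines of Python. This is a minimal illustration and not the SPINDLE code itself: the function name <code>count_words</code> and the tiny stopword list are assumptions made for the example, and we assume here that the total is taken after stopwords have been removed.</p>

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real run would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}

def count_words(text, stopwords=STOPWORDS):
    """Lowercase the text, strip punctuation, drop stopwords and count the rest."""
    # Keep runs of letters and apostrophes; everything else acts as a separator.
    tokens = re.findall(r"[a-z']+", text.lower())
    kept = [t for t in tokens if t not in stopwords]
    # Return the per-word counts and the total number of words kept.
    return Counter(kept), len(kept)

counts, total = count_words("The banks and the crisis: banks lend to companies.")
print(counts["banks"], total)
```

<p>The same function can be applied unchanged to a transcription, which gives the per-word counts and the total needed later for the comparison with the corpus.</p>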
<h2>Generating relevant keywords and bigrams</h2>
<h3>Step 2</h3>
<ol>
<li>Use Natural Language Processing techniques to normalise the transcription (remove punctuation if necessary and stopwords)</li>
<li>For each word in the transcription, count how many times it occurs in the transcription (b)</li>
<li>Calculate the total number of words in the transcription (d)</li>
</ol>
<h3>Step 3</h3>
<ol>
<li>Calculate the Log-likelihood, G2, of each individual word<a href="http://blogs.it.ox.ac.uk/openspires/files/2012/09/lleq1.gif"><img class="aligncenter size-full wp-image-1159" src="http://blogs.it.ox.ac.uk/openspires/files/2012/09/lleq1.gif" alt="" width="125" height="44" /></a><a href="http://blogs.it.ox.ac.uk/openspires/files/2012/09/lleq2.gif"><img class="aligncenter size-full wp-image-1160" src="http://blogs.it.ox.ac.uk/openspires/files/2012/09/lleq2.gif" alt="" width="127" height="44" /></a><a href="http://blogs.it.ox.ac.uk/openspires/files/2012/09/lleq3.gif"><img class="aligncenter size-full wp-image-1161" src="http://blogs.it.ox.ac.uk/openspires/files/2012/09/lleq3.gif" alt="" width="269" height="41" /></a></li>
<li>Sort the words by Log-likelihood value (the higher the better)</li>
</ol>
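<p>As a hedged sketch of Step 3, the function below follows the Log-likelihood calculation described on the UCREL page linked earlier, using the same letters as the steps above (a and c for the corpus, b and d for the transcription). We assume this matches the equations pictured in the step; the function name <code>log_likelihood</code> is our own.</p>

```python
import math

def log_likelihood(a, b, c, d):
    """G2 for a word seen a times in a corpus of c words
    and b times in a transcription of d words."""
    e1 = c * (a + b) / (c + d)  # expected count in the reference corpus
    e2 = d * (a + b) / (c + d)  # expected count in the transcription
    g2 = 0.0
    # A zero observed count contributes nothing (0 * ln 0 is taken as 0).
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2

# Counts for "banks" quoted in this post.
g2_banks = log_likelihood(a=201, b=17, c=11606059, d=5439)
print(round(g2_banks, 2))
```

<p>With the rounded totals quoted in this post the result lands a little above 141, in line with the value listed for &#8220;banks&#8221; in the example; a small difference is to be expected because the exact totals after normalisation are not given here.</p>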
<h3>Step 4</h3>
<ol>
<li>Find frequent bigrams (pairs of adjacent words) by counting the number of occurrences of each pair</li>
</ol>
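<p>Step 4 can be illustrated with a short Python sketch that counts adjacent word pairs in an already normalised list of words. The function name <code>frequent_bigrams</code> and the threshold <code>min_count</code> are hypothetical choices for this example, not settings taken from the SPINDLE tool.</p>

```python
from collections import Counter

def frequent_bigrams(tokens, min_count=3):
    """Return (bigram, count) pairs seen at least min_count times, most frequent first."""
    # zip pairs each word with the word that follows it.
    pairs = Counter(zip(tokens, tokens[1:]))
    return [(pair, n) for pair, n in pairs.most_common() if n >= min_count]

tokens = "interest rates rose interest rates fell interest rates".split()
result = frequent_bigrams(tokens)
print(result)
```

<p>Counting raw occurrences is the simplest possible approach; it works reasonably well once stopwords have been removed, because frequent but uninformative pairs have already been filtered out.</p>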
<h2>Example of Automatic Keyword Generation</h2>
<p>We used the <a href="https://github.com/ox-it/spindle-code/tree/master/keywords">keyword generation tool</a> to generate the relevant keywords and bigrams of the automatic transcription of the podcast <a href="http://podcasts.ox.ac.uk/global-recession-how-did-it-happen-audio">Global Recession: How Did it Happen?</a> (Correct Words = 32.9%). We deliberately selected a poor automatic transcription to show that, even with a low proportion of correct words, we can extract some relevant keywords and bigrams automatically.</p>
<h3>Keywords Generated (word: Log-likelihood)</h3>
<p>banks : 141.12175627<br />
crisis : 73.3976004078<br />
companies : 67.8498685789<br />
assets : 61.8910800051<br />
haiti : 47.7956942776<br />
interest : 41.3390170289<br />
credit : 39.6149918395<br />
crunch : 35.9334074944<br />
senate : 32.4501608202<br />
profited : 30.625124757<br />
sitcom : 30.625124757<br />
ansa : 30.625124757<br />
nineteen : 29.0864140753<br />
economy : 28.6440250819<br />
nineties : 27.5138518651<br />
haitian : 26.8069860979<br />
sanctioning : 26.8069860979<br />
center : 26.8069860979<br />
regulate : 25.4923775621<br />
hashing : 25.0818400138<br />
haitians : 25.0818400138<br />
stimulus : 24.5089608603<br />
united : 24.1102094531<br />
successful : 21.8091735308<br />
financial : 21.7481087661<br />
key : 21.6791751296<br />
caught : 21.1648006228<br />
eases : 21.0970376283<br />
bankruptcy : 21.0970376283<br />
rates : 21.0105869453<br />
kind : 20.8040324729<br />
cited : 20.6246470912<br />
backs : 19.9877139071<br />
borrowing : 19.9877139071<br />
crimes : 19.5817617075<br />
countries : 19.5490491082<br />
essentially : 19.334521352<br />
fiscal : 19.1532240523</p>
<h3>Collocations Generated (collocation: #occurrences)</h3>
<p>[interest rates] : 5<br />
[financial crisis] : 4<br />
[wall street] : 3<br />
[nineteen nineties] : 3<br />
[credit crunch] : 3<br />
[british government] : 3</p>
<h3>Word Cloud (using <a href="http://www.wordle.net/">Wordle</a>)</h3>
<p><a href="http://blogs.it.ox.ac.uk/openspires/files/2012/09/keywords_script_example_snapshot.jpg"><img class="aligncenter size-large wp-image-1173" src="http://blogs.it.ox.ac.uk/openspires/files/2012/09/keywords_script_example_snapshot-1024x457.jpg" alt="" width="640" height="285" /></a></p>
<h2>Conclusion</h2>
<p>We should note that we are generating keywords from automatic transcriptions and not from human transcriptions. Therefore, alongside relevant keywords and bigrams, we also obtain some that are less relevant or simply off topic. However, through the SPINDLE project we have automatically generated thousands of relevant keywords and bigrams for our collection of podcasts, which should improve its discoverability and accessibility in the near future.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.it.ox.ac.uk/openspires/2012/09/12/spindle-automatic-keyword-generation-step-by-step/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>SPINDLE Frequently Asked Questions</title>
		<link>http://blogs.it.ox.ac.uk/openspires/2012/09/11/spindle-frequently-asked-questions/</link>
		<comments>http://blogs.it.ox.ac.uk/openspires/2012/09/11/spindle-frequently-asked-questions/#comments</comments>
		<pubDate>Tue, 11 Sep 2012 15:52:50 +0000</pubDate>
		<dc:creator>Sergio Grau Puerto</dc:creator>
				<category><![CDATA[oerri]]></category>
		<category><![CDATA[Spindle]]></category>
		<category><![CDATA[ukoer]]></category>
		<category><![CDATA[faq]]></category>
		<category><![CDATA[SPINDLE]]></category>
		<category><![CDATA[UKOER]]></category>

		<guid isPermaLink="false">http://blogs.oucs.ox.ac.uk/openspires/?p=1085</guid>
		<description><![CDATA[What is Spindle? SPINDLE has been a project funded by JISC as part of their &#8220;Rapid Innovation in Open Educational Resources&#8221; programme. The project experimented with speech-to-text technologies to automatically create transcripts of Open Educational Resources (OER), and develop new &#8230; <a href="http://blogs.it.ox.ac.uk/openspires/2012/09/11/spindle-frequently-asked-questions/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<h2>What is Spindle?</h2>
<p>SPINDLE has been a project funded by JISC as part of their &#8220;<a href="http://www.jisc.ac.uk/whatwedo/programmes/ukoer3/rapidinnovation.aspx">Rapid Innovation in Open Educational Resources</a>&#8221; programme. The project experimented with speech-to-text technologies to automatically create transcripts of Open Educational Resources (OER), and develop new tools to generate better <strong>keywords</strong> to help with the indexing and description of OER.</p>
<h2>How do I transcribe automatically from speech to text?</h2>
<p>We investigated three options for automatic transcription of podcasts:</p>
<ul>
<li><a href="http://cmusphinx.sourceforge.net/wiki/download/">CMU Sphinx</a></li>
<li><a href="http://www.adobe.com/uk/products/premiere.html">Adobe Premiere Pro</a> CS5 and CS6 Speech Analysis Tool</li>
<li>Commercial services (<a href="http://www.koemei.com/">Koemei</a>, etc.)</li>
</ul>
<p>Adobe Premiere Pro is excellent for video editing, but not for transcribing thousands of podcasts automatically. If you only need the automatic transcription of one or a few audio or video podcasts, then the Speech Analysis tool of Adobe Premiere Pro can be helpful, but it cannot be used for batch processing. CMU Sphinx, on the other hand, allowed us to run batch transcriptions of thousands of podcasts efficiently.</p>
<h2>How accurate is automatic Speech to Text?</h2>
<p>It depends. The key factor in our experience is the quality of the recording &#8211; a professional recording using a good tie-clip microphone gives the best results. A microphone far away from the speaker in a noisy room with echoes gives the worst results. It also depends, of course, on the clarity and accent of the speaker. In the very best situation we have had results where 6 out of every 10 words are transcribed correctly. In this case the gist of the lecture is obvious. This can drop to much lower results of, say, 3 out of 10 words with poor quality recordings. In this case the results are probably too confused to read as normal English and are too poor to generate a good range of keywords. It&#8217;s important to realise that all automatic transcripts will need significant editing and checking, particularly to insert correct punctuation, in order to make them readable for human users.</p>
<h2>How do I generate keywords automatically from transcriptions?</h2>
<p>We used two methods:</p>
<ul>
<li><a href="http://www.antlab.sci.waseda.ac.jp/software.html">Antconc</a></li>
<li>Keyword Generation Tool (<a href="https://github.com/ox-it/spindle-code/">source code</a>)</li>
</ul>
<p>Antconc is a desktop application (which works on Windows, Mac and Linux), and generating keywords involves starting the programme, loading the text and the reference word-list, and manually running the function to generate the keywords. The user has the opportunity to adjust various parameters, and change the reference corpus, so we found this useful when we were investigating the best ways to generate relevant keywords. But the nature of this interactive application meant that it couldn&#8217;t be deployed in an automated workflow to generate keywords from multiple podcasts.</p>
<p>So, instead, we wrote a script to generate the keywords, which could be inserted into our automated workflow, and could be invoked and run programmatically without human intervention.</p>
<h2>How does the algorithm for keyword filtering work?</h2>
<p>We compared the words in the automatic transcription with the speech transcribed in a large corpus of English called the <a href="http://www.natcorp.ox.ac.uk/">British National Corpus</a>. Words that are repeated much more often than in normal speech are likely to be keywords.</p>
<h2>Where can I download the code generated during the SPINDLE project?</h2>
<p>The code is available from <a href="https://github.com/ox-it/spindle-code/">https://github.com/ox-it/spindle-code/</a>.</p>
<h2>How can I align audio and transcription automatically?</h2>
<p>We used the <a title="Penn Phonetics Lab Forced Aligner" href="http://www.ling.upenn.edu/phonetics/p2fa/">Penn Phonetics Lab Forced Aligner</a> (P2FA), an application which has emerged from academic research in phonetics. The staff at the Phonetics Laboratory at the University of Oxford had identified this in <a title="Mining a Year of Speech" href="http://www.phon.ox.ac.uk/mining">an earlier JISC-funded research project</a> as the state of the art for the automatic alignment of everyday contemporary English speech, and had gained expertise in using it. P2FA is free to download and use, and doesn&#8217;t have any licence conditions attached to it.</p>
<p>P2FA is a Python script which interfaces with the Hidden Markov Model Toolkit (HTK) aligner, and with a set of good quality acoustic models. It is necessary to install HTK, and to use it according to the HTK End User Licence Agreement, which is not restrictive in terms of how the software is used. HTK is usually available from <a title="HTK" href="http://htk.eng.cam.ac.uk" target="_blank">http://htk.eng.cam.ac.uk</a>, but the site was not accessible on 12-09-2012.</p>
<h2>What formats did you use for caption work?</h2>
<p>We used WebVTT, an easy-to-understand HTML5 web format for presenting timed groups of words over a video.</p>
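<p>To give an idea of what a WebVTT file looks like, the following Python sketch writes a list of timed captions in that format. It is an illustration only and not the exporter from the SPINDLE code: the function names <code>format_timestamp</code> and <code>to_webvtt</code>, and the sample captions, are assumptions made for this example.</p>

```python
def format_timestamp(seconds):
    """Render a time in seconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    millis = int(round(seconds * 1000))
    hours, rest = divmod(millis, 3600000)
    minutes, rest = divmod(rest, 60000)
    secs, millis = divmod(rest, 1000)
    return "%02d:%02d:%02d.%03d" % (hours, minutes, secs, millis)

def to_webvtt(cues):
    """Build a WebVTT document from (start_seconds, end_seconds, text) tuples."""
    # The file starts with the WEBVTT header followed by a blank line.
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        # Each cue is a timing line, the caption text, then a blank line.
        lines.append("%s --> %s" % (format_timestamp(start), format_timestamp(end)))
        lines.append(text)
        lines.append("")
    return "\n".join(lines)

# Made-up captions for illustration.
vtt = to_webvtt([
    (0.0, 2.5, "how did the recession happen"),
    (2.5, 5.0, "the banks and the credit crunch"),
])
print(vtt)
```

<p>Each cue is simply a start time, an end time and the words to display, which is why the format is easy to generate from time-coded transcripts.</p>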
]]></content:encoded>
			<wfw:commentRss>http://blogs.it.ox.ac.uk/openspires/2012/09/11/spindle-frequently-asked-questions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SPINDLE Project Outputs</title>
		<link>http://blogs.it.ox.ac.uk/openspires/2012/09/11/spindle-project-outputs/</link>
		<comments>http://blogs.it.ox.ac.uk/openspires/2012/09/11/spindle-project-outputs/#comments</comments>
		<pubDate>Tue, 11 Sep 2012 14:52:37 +0000</pubDate>
		<dc:creator>Sergio Grau Puerto</dc:creator>
				<category><![CDATA[oerri]]></category>
		<category><![CDATA[Spindle]]></category>
		<category><![CDATA[ukoer]]></category>
		<category><![CDATA[outputslist]]></category>
		<category><![CDATA[SPINDLE]]></category>
		<category><![CDATA[UKOER]]></category>

		<guid isPermaLink="false">http://blogs.oucs.ox.ac.uk/openspires/?p=1118</guid>
		<description><![CDATA[SPINDLE set up and documented a workflow to generate the automatic transcription of future open access audio and video podcasts using an online platform concentrating on generating automatic keyword extraction for better cataloguing. SPINDLE tested and documented this workflow by: &#8230; <a href="http://blogs.it.ox.ac.uk/openspires/2012/09/11/spindle-project-outputs/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<ul>
<li>SPINDLE set up and documented a workflow, built around an online platform, to generate automatic transcriptions of future open access audio and video podcasts, concentrating on automatic keyword extraction for better cataloguing.<a href="http://blogs.it.ox.ac.uk/openspires/files/2012/08/exampleSpindle2.jpg"><img class="aligncenter  wp-image-999" src="http://blogs.it.ox.ac.uk/openspires/files/2012/08/exampleSpindle2.jpg" alt="" width="432" height="324" /></a></li>
</ul>
<ul>
<li>SPINDLE tested and documented this workflow by:</li>
<ul>
<li>developing a method to generate keywords and relevant word pairs automatically</li>
<li>generating in a batch process automatic speech-to-text keywords and timecoded transcripts from a database of over 3,400 podcasts</li>
<li>documenting the problems of accuracy in automatic transcriptions by testing and reporting the results of using two commonly used speech to text tools and services against baseline hand-transcribed transcripts</li>
<li>investigating the use of Automatic Speech-to-Phoneme alignments for our existing manual transcriptions that did not already include time-code information</li>
</ul>
</ul>
<ul>
<li>SPINDLE also successfully designed and documented a filtering program for automatically extracting  keywords and relevant word pairs from uncorrected time-coded transcripts by selecting non-common words.</li>
</ul>
<ul>
<li>SPINDLE extended the functionality of the keyword extraction tool by creating an online web application to manage the transcription of online media podcasts. The main functionality of this online platform is:</li>
</ul>
<blockquote>
<ul>
<li>Caption editor:</li>
<ul>
<li>to edit time-coded transcripts whilst reviewing against the original online media file</li>
<li>to allow registered users to transcribe in parallel, with support for crowd-sourcing corrections</li>
<li>to import time-coded transcriptions in XMP, srt or CMU Sphinx formats into the Caption editor</li>
<li>to edit transcriptions to provide corrections, punctuation, caption length chunking, speaker labels, etc.</li>
</ul>
<li>Batch converter:</li>
<ul>
<li>Create automatic transcriptions from an online media file using a CMU Sphinx installation</li>
<li>Create batches of media for automatic transcription</li>
<li>Create a list of automatic keywords with relevance statistics</li>
</ul>
<li>Export tool:</li>
<ul>
<li>Support for media metadata and Open Educational Resource (OER) licences</li>
<li>Support for exporting time-coded transcriptions in multiple formats:</li>
<ul>
<li>human readable:  plain text and HTML</li>
<li>HTML5 compatible captions: online media caption format (webVTT)</li>
<li>XML format suitable for archiving and preservation</li>
</ul>
</ul>
<li>Data feed in RSS format to facilitate online visibility</li>
</ul>
<p>All SPINDLE code is available from the open repository <a href="https://github.com/ox-it/spindle-code/">https://github.com/ox-it/spindle-code/</a></p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://blogs.it.ox.ac.uk/openspires/2012/09/11/spindle-project-outputs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SPINDLE project: Lessons Learnt</title>
		<link>http://blogs.it.ox.ac.uk/openspires/2012/09/11/spindle-project-lessons-learned/</link>
		<comments>http://blogs.it.ox.ac.uk/openspires/2012/09/11/spindle-project-lessons-learned/#comments</comments>
		<pubDate>Tue, 11 Sep 2012 12:19:40 +0000</pubDate>
		<dc:creator>Sergio Grau Puerto</dc:creator>
				<category><![CDATA[oerri]]></category>
		<category><![CDATA[podcasting]]></category>
		<category><![CDATA[Spindle]]></category>
		<category><![CDATA[ukoer]]></category>
		<category><![CDATA[lessonslearnt]]></category>
		<category><![CDATA[SPINDLE]]></category>
		<category><![CDATA[UKOER]]></category>

		<guid isPermaLink="false">http://blogs.oucs.ox.ac.uk/openspires/?p=1082</guid>
		<description><![CDATA[The SPINDLE project is wrapping up and will end in September 2012. Please find below some of the lessons learnt during the project. We can obtain good keywords even if the automatic transcription has got lots of errors. You do &#8230; <a href="http://blogs.it.ox.ac.uk/openspires/2012/09/11/spindle-project-lessons-learned/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://openspires.oucs.ox.ac.uk/spindle/">SPINDLE</a> project is wrapping up and will end in September 2012. Please find below some of the lessons learnt during the project.</p>
<ul>
<li>We can <a href="http://blogs.it.ox.ac.uk/openspires/2012/06/29/automatic-keyword-generation-from-automatic-speech-to-text-transcriptions/">obtain good keywords</a> even if the automatic transcription has got lots of errors.</li>
<li>You do not need perfect automatic transcription to implement word search for your Open Educational Resources.</li>
<li>Timecoded transcriptions are important for creating captions, chapters or marks for your Open Educational Resources. <a href="http://blogs.it.ox.ac.uk/openspires/2012/05/30/automatic-speech-to-text-alignment-for-audio-indexing/">Automatic Speech-to-Text alignment</a> can help you if you already have a manual transcription.</li>
</ul>
<p style="text-align: center"><a href="http://blogs.it.ox.ac.uk/openspires/files/2012/05/paaatcapture3globalisation2.jpg"><img class="wp-image-746" src="http://blogs.it.ox.ac.uk/openspires/files/2012/05/paaatcapture3globalisation2.jpg" alt="" width="616" height="377" /></a></p>
<ul>
<li>Adobe Premiere Pro is excellent for video editing, but not for automatically transcribing thousands of podcasts. If you only need the automatic transcription of one or a few audio or video podcasts, then the Speech Analysis tool of Adobe Premiere Pro can be helpful, but it is not suitable for batch processing. In contrast, CMU Sphinx allowed us to run batch transcriptions of thousands of podcasts efficiently.</li>
<li>The Pareto principle (or 80/20 rule) applies to automatic keyword generation from automatic transcriptions. We would need to dedicate 80% extra time to generate accurate automatic keywords for 20% of our podcasts (difficult recording conditions, long distance microphones, multiple speakers, specialised vocabulary, multiple accents, etc.). We were able to generate accurate keywords for a majority of our podcasts without having to deal with those issues. The podcasts that are difficult to transcribe automatically could be transcribed manually in the future, or could wait for further funding.</li>
<li>The use of a <a href="http://blogs.it.ox.ac.uk/openspires/2012/08/23/project-spindle-update-condor-cluster/">High Throughput Computing</a> cluster (Condor) was extremely beneficial for the project. We could submit all the transcription jobs to the cluster and get the results in a timely manner. Usually there were up to 60 transcription jobs running in parallel in the cluster.</li>
<li>The combination of skills of the project members was an important factor in the success of this short project. We had a diversity of skills in our team, from open educational resources to natural language processing, automatic speech recognition and web development.</li>
<li>The variety of <a href="http://blogs.it.ox.ac.uk/openspires/2012/08/03/pdf-xml-textgrid-xmp-txt-and-then/">representations of timecoded transcripts</a> was also a subject of discussion during the project. In the end, we decided on a TEI/XML representation of the automatic or manual transcription, including the time information and the automatic keywords. In addition, a transcription can be exported into a variety of formats (text, HTML, srt, webVTT, XML) from the online caption editor platform we developed.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blogs.it.ox.ac.uk/openspires/2012/09/11/spindle-project-lessons-learned/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
