The SPINDLE project is wrapping up and will end in September 2012. Please find below some of the lessons learnt during the project.
- We can obtain good keywords even if the automatic transcription contains many errors.
- You do not need perfect automatic transcription to implement word search for your Open Educational Resources.
- Timecoded transcriptions are important for creating captions, chapters or markers for your Open Educational Resources. Automatic Speech-to-Text alignment can help you if you already have a manual transcription.
- Adobe Premiere Pro is excellent for video editing, but not for automatically transcribing thousands of podcasts. If you need an automatic transcription of one or a handful of audio or video podcasts, the Speech Analysis tool of Adobe Premiere Pro can be helpful, but it is not suited to batch processing. In contrast, CMU Sphinx allowed us to run batch transcriptions of thousands of podcasts efficiently.
- The Pareto principle (or 80/20 rule) applies to automatic keyword generation from automatic transcriptions. We would need to dedicate 80% extra time to generate accurate automatic keywords for 20% of our podcasts (difficult recording conditions, distant microphones, multiple speakers, specialised vocabulary, multiple accents, etc.). We were able to generate accurate keywords for the majority of our podcasts without having to deal with those issues. The podcasts that are difficult to transcribe automatically could be transcribed manually in the future, or left until further funding is available.
- The use of a High Throughput Computing cluster (Condor) was extremely beneficial for the project. We could submit all the transcription jobs to the cluster and get the results in a timely manner. Typically, up to 60 transcription jobs ran in parallel on the cluster.
- The combination of skills among the project members was an important factor in the success of this short project. Our team had a diverse range of skills, from open educational resources to natural language processing, automatic speech recognition and web development.
- The variety of representations for timecoded transcripts was also a subject of discussion during the project. In the end, we decided on a TEI/XML representation of the automatic or manual transcription that includes the time information and the automatic keywords. In addition, a transcription can be exported to a variety of formats (text, HTML, SRT, WebVTT, XML) from the online caption editor platform we developed.
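
To illustrate why timecoded transcripts are so convenient for captioning, here is a minimal Python sketch of exporting the same timecoded segments to SRT and WebVTT. It is not the code of the SPINDLE caption editor: the `Segment` structure and the function names are hypothetical, and we assume each segment simply carries a start time, an end time (both in seconds) and its text.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timecoded piece of a transcript (hypothetical structure)."""
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    text: str


def format_timestamp(seconds: float, decimal_mark: str) -> str:
    """Format seconds as HH:MM:SS<mark>mmm (SRT uses ',', WebVTT uses '.')."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_mark}{ms:03d}"


def to_srt(segments):
    """Render segments as SRT: numbered cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg.start, ",")
        end = format_timestamp(seg.end, ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg.text}\n")
    return "\n".join(blocks)


def to_webvtt(segments):
    """Render segments as WebVTT: a 'WEBVTT' header followed by cues."""
    blocks = ["WEBVTT\n"]
    for seg in segments:
        start = format_timestamp(seg.start, ".")
        end = format_timestamp(seg.end, ".")
        blocks.append(f"{start} --> {end}\n{seg.text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    # Made-up example segments, for illustration only.
    example = [
        Segment(0.0, 2.5, "Welcome to the lecture."),
        Segment(2.5, 6.0, "Today we look at open educational resources."),
    ]
    print(to_srt(example))
    print(to_webvtt(example))
```

Because both exporters read from the same list of segments, adding another output format only requires one more small rendering function. This is the main argument for keeping a single timecoded master representation, such as the TEI/XML one mentioned above.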