Automatic Keyword Generation: Human Transcriptions vs Automatic Transcriptions

As part of the SPINDLE project we are producing a set of automatic transcriptions for the university podcasts. We will then use these automatic transcriptions to automatically produce a set of keywords that increase the discoverability of our podcasts as OER. For a small subset of the podcasts we already have human transcriptions. We will compare these human transcriptions with the automatic ones, both at transcription level and at keyword level.

Below is a snapshot of the human transcription for the podcast Globalisation and the effect on economies.

We could use this transcription to generate a list of keywords automatically. However, when no human transcription is available (remember that they are expensive to produce) or, as in our case, when we want something to compare against the human transcription, we create an automatic transcription of the podcast using Large Vocabulary Continuous Speech Recognition (LVCSR) software.

Using Adobe Premiere Pro we obtain an automatic transcription with a Word Accuracy (WA) of 66.08%. Below is the automatic transcription of the first two paragraphs produced by Premiere Pro.

Using Sphinx-4 we get an automatic transcription with a Word Accuracy of 46.56%. This could be improved using acoustic adaptation or by extending the language model (work in progress!). Below is the automatic transcription of the first two paragraphs produced by Sphinx-4.
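For readers who want to reproduce a figure like this, here is a minimal sketch of how Word Accuracy can be computed against a human reference. It assumes the usual definition, WA = (N - S - D - I) / N, where N is the number of words in the reference and S, D and I are the substitutions, deletions and insertions found by a word-level edit-distance alignment. The function name and the simple lowercase, whitespace tokenisation are our own illustrative choices and not necessarily what the scoring tools do.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word Accuracy of a hypothesis transcription against a reference.

    Assumes WA = (N - S - D - I) / N, computed from the word-level
    Levenshtein distance (which equals S + D + I for the best alignment).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcription is empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far (excluding the current one) and the first j
    # hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr

    errors = prev[-1]
    return (len(ref) - errors) / len(ref)
```

Note that under this definition WA can become negative when the recogniser inserts many extra words, which is why it is sometimes reported alongside the simpler percentage of correct words.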

We would then like to compare the keywords generated from these transcriptions, so we scored each individual word with the log-likelihood measure (as explained in this post) and plotted the results using Wordle: the larger the log-likelihood, the bigger the word in the plot.

Human Transcription (WA = 100%)

Automatic Transcription: Adobe Premiere Pro (WA = 66.08%)

Automatic Transcription: CMU Sphinx (WA = 46.56%)
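The kind of log-likelihood scoring behind word clouds like these can be sketched as follows. This is a common corpus-linguistics formulation that compares a word's frequency in the transcription with its frequency in a larger reference corpus; we assume it here for illustration, and the exact variant is the one described in the earlier post. The function names, the `top_n` parameter and the rule of keeping only over-represented words are hypothetical choices of ours.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood of one word.

    a: frequency of the word in the transcription
    b: frequency of the word in the reference corpus
    c: total number of words in the transcription
    d: total number of words in the reference corpus
    """
    # Expected frequencies if the word were equally common in both texts.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(transcript_words, reference_words, top_n=10):
    """Return the top_n words ranked by log-likelihood, highest first."""
    t = Counter(transcript_words)
    r = Counter(reference_words)
    c, d = sum(t.values()), sum(r.values())
    if c == 0 or d == 0:
        raise ValueError("both word lists must be non-empty")

    scores = {}
    for word, a in t.items():
        b = r.get(word, 0)
        # Keep only words that are relatively more frequent in the
        # transcription than in the reference corpus.
        if a / c > b / d:
            scores[word] = log_likelihood(a, b, c, d)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

The scores returned by `log_likelihood` are what would drive the font size in the plot. Common function words such as "the" tend to drop out because they are no more frequent in a lecture than in general text.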

We can see that words like borders, political, transnational, decisions, transactions, finance, crisis and many others are prominent in all three word clouds. Unfortunately, globalisation was not recognised by the Sphinx-4 LVCSR software. This suggests that automatic tools can produce keywords similar to those that would be obtained from a human-generated transcription. However, we would like to measure this similarity, and we will report back on how to quantify it in following posts.
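As a placeholder until then, one simple candidate (not necessarily the measure we will end up using) is the Jaccard overlap between two keyword sets: the number of keywords they share divided by the number of distinct keywords across both. The function name below is hypothetical.

```python
def keyword_overlap(keywords_a, keywords_b) -> float:
    """Jaccard overlap between two keyword lists, ignoring order and case."""
    a = {w.lower() for w in keywords_a}
    b = {w.lower() for w in keywords_b}
    if not a and not b:
        return 1.0  # two empty lists are treated as identical
    return len(a & b) / len(a | b)
```

A value of 1.0 means the two lists contain exactly the same keywords and 0.0 means they share none. This ignores the log-likelihood ranking, so a rank-aware measure might be a better fit for the word clouds.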

In the next blog post we would like to discuss how to store the information generated by the automatic transcription tool and the keyword generation tool. Should we use TEI/XML, or a captioning format such as TTML or WebVTT? Stay tuned!
