To assess impact as measured quantitatively through technical processes such as log file analysis, web analytics and referrer analysis, it helps to understand the podcast distribution process and what data can be collected during it.
Podcasting in its strictest sense is the automated distribution of media files (audio, video, PDF, etc.) to a “subscriber”. That subscription is set up once, by a visitor viewing information about the podcast collection (the “podcast feed”) and using a piece of software to subscribe to the RSS URL advertised. RSS, which stands for Really Simple Syndication, is an XML-based format. I’ll walk through the subscription model further down. However, through time and misuse, the term podcasting has come to be understood simply as being able to download media files from the internet. This broader definition somewhat muddies the scope of monitoring impact, because we now have to account for a much wider range of use cases and the likely motivations behind them.
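As a rough illustration of what a podcast feed contains, the sketch below parses a small RSS 2.0 document and lists the media files it advertises. The feed content, the media.example.org URLs and the function name `list_enclosures` are all made up for this example; it is not a copy of any real Oxford feed, only a minimal picture of how an RSS-aware application might find the files to download.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed for illustration only; titles and URLs are invented.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Lecture Series</title>
    <item>
      <title>Lecture 1</title>
      <enclosure url="http://media.example.org/lecture1.mp3"
                 length="1234567" type="audio/mpeg"/>
    </item>
    <item>
      <title>Lecture 2</title>
      <enclosure url="http://media.example.org/lecture2.mp4"
                 length="7654321" type="video/mp4"/>
    </item>
  </channel>
</rss>"""


def list_enclosures(feed_xml):
    """Return (item title, media URL, MIME type) for each item with an enclosure."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            # Items without an attached media file are not podcasts; skip them.
            continue
        results.append((
            item.findtext("title", default=""),
            enclosure.get("url"),
            enclosure.get("type"),
        ))
    return results


for title, url, mime_type in list_enclosures(SAMPLE_FEED):
    print(title, url, mime_type)
```

Note that the media URLs in a feed can point at a different server from the one hosting the feed itself, which matters later when we look at where requests get logged.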
For the purpose of understanding monitoring techniques we can boil the process down to two approaches: downloading a single podcast, and being subscribed to a podcast feed.
The above diagram (fig 1) attempts to illustrate the steps that occur in downloading a podcast from Oxford. There are two scenarios covered above:
- Locating and downloading a file via the Oxford Web Portal for Podcasts in a web browser (shown as the upper image on the visitor’s computer)
- Locating and downloading a file via the iTunes U store using the iTunes application (shown as the lower image on the visitor’s computer)
In scenario 1, a visitor uses their computer to open a web browser and visits http://podcasts.ox.ac.uk/ (shown as the upper of the two green/initial request arrows). This results in a webpage listing our podcasts being returned from that web server. The visitor then browses the page and clicks on a download link for a podcast of interest. That action sends a request directly to the web server hosting that podcast, which sends the media file back to the visitor’s computer (shown as the blue/media download arrow). What the visitor’s computer does with that file depends entirely on how it is configured, and is a complete mystery to the systems serving out the podcasts.
In scenario 2, we have a subtly different approach. Whilst on the surface the iTunes application is little more than a glorified web browser, the setup is not exactly the same as in scenario 1. Here the visitor opens their iTunes application and browses to the Oxford on iTunes U page (or to any part of the iTunes site that shows Oxford content); this is the initial request, shown as the lower of the two green arrows. Once the visitor has made a selection, they click to “get” the podcast and the software then contacts the Apple-controlled iTunes web servers again. Those servers register the action and then send a redirection message back, pointing the iTunes application at the relevant Oxford web server. This redirection is invisible to the visitor and is shown as the second request/red dashed arrow. The final action is very similar to scenario 1 in that the Oxford web server sends the media file back to the visitor’s computer, but the file is then handled by the iTunes application, which typically puts it into its library and facilitates playback within the application. In some respects this scenario is a much simpler one for the visitor, but its merits or otherwise are beyond our scope here.
In both of these scenarios there are a few overlaps:
- Both Oxford on iTunes U and Oxford’s Web Portal for Podcasts communicate periodically with the Oxford RSS Server (OxItems) to update their internal catalogues, which are what they then present to the visitor.
- Who the visitor is, what they do with the media they have downloaded, and a range of other visitor-specific questions are beyond the scope of what can be addressed with the data logged from the above processes. This is highlighted with the broken arrow between the Visitor and the Visitor’s Computer. To answer these sorts of questions we need to do various forms of qualitative analysis.
- Understanding that what can be monitored during this process is merely the actions and fingerprints of the technologies involved, and not necessarily by implication the people, is critical to understanding the limitations of this form of quantitative analysis. We’ll talk more about this in the post on “Sifting signal from noise”.
Our second approach involves understanding the technical differences involved when a listener is subscribed to a podcast feed.
In this second diagram the entities are all the same, but the direction and timing of the steps are a little different. Again there are two scenarios here, but this time there is more of an overlap in approaches. These scenarios are:
- Using a generic RSS enabled application to subscribe to a podcast feed (this also includes using the iTunes application)
- Using the iTunes application to subscribe to a podcast, but to subscribe via a link on the iTunes U store.
In scenario 1, our visitor/listener has visited a website (such as the Oxford Web Portal for Podcasts), made a note of the RSS URL and given it to their RSS-aware media application. That application will read the RSS feed from the RSS Server to locate all the podcasts described by that feed. Typically it will offer to download the most recent few podcasts in the feed (usually chosen by publish date). The key point is that the application will then periodically reread the RSS feed from the RSS Server to determine whether any further episodes/podcasts are available, and if so, it will typically download those without further intervention. The visitor is then free to listen to or view these podcasts anytime, anywhere and anyhow.
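The subscription behaviour described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not how any particular application is implemented: the `Subscription` class, the `read_episodes` helper and the `initial_limit` parameter are hypothetical names, and real applications differ in how many episodes they fetch at first and how often they recheck.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def read_episodes(feed_xml):
    """Return (published datetime, media URL) pairs from an RSS feed, newest first."""
    episodes = []
    for item in ET.fromstring(feed_xml).iter("item"):
        enclosure = item.find("enclosure")
        pub_date = item.findtext("pubDate")
        if enclosure is None or pub_date is None:
            continue  # not a downloadable, dated episode
        # RSS 2.0 publish dates use the RFC 822 style, e.g. "Mon, 01 Mar 2010 09:00:00 GMT".
        episodes.append((parsedate_to_datetime(pub_date), enclosure.get("url")))
    return sorted(episodes, reverse=True)


class Subscription:
    """Hypothetical model of an RSS-aware media application's state for one feed."""

    def __init__(self, initial_limit=3):
        self.initial_limit = initial_limit  # assumed "last X" setting
        self.seen = None                    # None until the feed is first read

    def poll(self, feed_xml):
        """Read the feed once and return the media URLs that would be downloaded."""
        urls = [url for _, url in read_episodes(feed_xml)]
        if self.seen is None:
            # First read after subscribing: fetch only the most recent few.
            to_download = urls[:self.initial_limit]
            self.seen = set(urls)
        else:
            # Later reads: fetch anything that has appeared since last time.
            to_download = [url for url in urls if url not in self.seen]
            self.seen.update(urls)
        return to_download
```

Calling `poll` repeatedly with the current feed contents mimics the periodic recheck: the first call returns the newest few episodes, later calls return nothing until a new item appears in the feed. From the server's side, each call corresponds to one request for the feed, followed by zero or more requests for media files.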
Scenario 2, using the iTunes U store via the iTunes application, is simpler from the user’s perspective, but slightly more complicated from the technical perspective. As in scenario 1, the visitor locates a podcast feed they like, and can click on the “subscribe” button within the interface. This instructs the iTunes software to remember the RSS feed URL and to check it on a periodic basis. However, as far as we are aware, this means the iTunes application is checking against the iTunes U store, not against the original RSS Server. The iTunes U store will periodically check the RSS Server for any updates (and can be manually made to do so by our podcast administration team), but this is done independently of any specific visitor’s actions.
Common features to look for in a subscription use case are:
- Periodic checks to the RSS Server from each visitor’s computer.
- Initial feed downloads, likely signified by a number of related podcasts being downloaded in a smallish timeframe by the same computer.
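To make these two features more concrete, here is a minimal sketch of how they might be looked for in a web server log. It rests on several assumptions that are not taken from our actual setup: that log lines follow the Apache-style Common Log Format, that feed requests can be recognised by a ".xml" suffix and media requests by ".mp3"/".mp4", and that "smallish timeframe" means three or more distinct files within ten minutes. The function names and thresholds are illustrative only.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Assumed format: host ident user [time] "METHOD path protocol" status bytes
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Return (host, timestamp, path), or None if the line does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return match.group("host"), when, match.group("path")


def count_feed_checks(lines, feed_suffix=".xml"):
    """Count feed requests per host; repeated checks hint at a subscription."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed and parsed[2].lower().endswith(feed_suffix):
            counts[parsed[0]] += 1
    return dict(counts)


def find_download_bursts(lines, media_suffixes=(".mp3", ".mp4"),
                         window=timedelta(minutes=10), min_files=3):
    """Return {host: [paths]} for hosts fetching min_files distinct media files within one window."""
    by_host = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        host, when, path = parsed
        if path.lower().endswith(media_suffixes):
            by_host[host].append((when, path))

    bursts = {}
    for host, hits in by_host.items():
        hits.sort()
        start = 0
        for end in range(len(hits)):
            # Slide the start forward until the span fits inside the window.
            while hits[end][0] - hits[start][0] > window:
                start += 1
            paths = {path for _, path in hits[start:end + 1]}
            if len(paths) >= min_files:
                bursts[host] = sorted(paths)
                break
    return bursts
```

Both functions group by the requesting host, which is only a fingerprint of a computer or network and not of a person, so several visitors behind one address would be merged and one visitor on the move would be split.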
Of interest to us is finding data that supports the theory that podcasts do better when in a feed of related items, because the success of one item can easily lead to the rest being downloaded thereafter.
I hope this guide has helped you to connect the dots between the various entities being observed by our LfI project. If you have any queries, please do leave a comment below. For those of you more comfortable with technical details, we will explore the details of the data trail left by these processes in “Sifting signal from noise”.
- Some of you may be wondering why our podcast files are not stored on the same server as our podcast portal(s). Whilst many smaller sites use the same web server for serving HTML pages, documents and similar as well as their audio and video files, this is a little inefficient in terms of setup and a little more complicated to scale and grow. It is more accurate to note that in Oxford this separation is largely a result of the decision, early on in the development of the podcasting service, to devolve hosting of Oxford podcasting content, rather than of any technical efficiency. Hindsight and further development of the service are now allowing us to reconsider this model, but that is beyond the scope of this project. It is worth highlighting this separation of media content from the portals that advertise it because it has an impact on the data we can collect: the data is stored in separate silos, and when we do have access to multiple silos, extra effort and time are required to join up related data points to give a more complete picture. More on this in the post on “Fishing with a broken net”.