User Agent Analysis – Part 2: Name those agents

In writing a system to parse the myriad User Agent strings that appear in our podcast hosting logs, I have come across a number of interesting (on a very geeky level) observations. The short background is that I sampled three random log files (one from each year of operation), manually extracted the user-agent strings, collated them, and then began writing a parser to handle this sample of data. I’m writing in Python, and unfortunately there is no existing system that really works for our use case. The biggest source of help so far has been http://user-agent-string.info/ (honourable mention to http://www.useragentstring.com/). UAS-info has an extensive database of systems, browsers and OSes, and a working library of regular expressions to parse for these, with an implementation available in Python.

However, their library is rather short of “Multimedia Player” references – and specifically for the iTunes application. Unfortunately, a very large proportion of our podcasts are downloaded with a user agent identifying as iTunes. I am in the process of modifying their library to include matching for these agent strings, but I still have a few strings that need some further analysis. The following is a quick look at my current (work in progress) challenges.

iOS devices and iTunes

The following is a list (not 100% complete) of user agent strings that appear to be iOS devices. The pattern is fairly obvious: Application-Device/ios_version (device_version?; memory_size)

  • iTunes-iPad-M/3.2 (16GB)
  • iTunes-iPad-M/3.2 (32GB)
  • iTunes-iPad-M/3.2 (64 GB)
  • iTunes-iPad-M/3.2 (64GB)
  • iTunes-iPad-M/3.2.1 (32GB)
  • iTunes-iPad-M/3.2.1 (64GB)
  • iTunes-iPad-M/3.2.2 (16GB)
  • iTunes-iPad-M/3.2.2 (32 GB)
  • iTunes-iPad-M/3.2.2 (32GB)
  • iTunes-iPad-M/3.2.2 (64 ??)
  • iTunes-iPad-M/3.2.2 (64 GB)
  • iTunes-iPad-M/3.2.2 (64GB)
  • iTunes-iPad-M/4.2.1 (16GB)
  • iTunes-iPad-M/4.2.1 (32GB)
  • iTunes-iPad-M/4.2.1 (64GB)
  • iTunes-iPad/3.2 (16GB)
  • iTunes-iPad/3.2 (64 ??)
  • iTunes-iPad/3.2 (64GB)
  • iTunes-iPad/3.2.1 (16GB)
  • iTunes-iPad/3.2.1 (32 ??)
  • iTunes-iPad/3.2.2 (16 GB)
  • iTunes-iPad/3.2.2 (16GB)
  • iTunes-iPad/3.2.2 (32GB)
  • iTunes-iPad/3.2.2 (64GB)
  • iTunes-iPad/4.2 (16GB)
  • iTunes-iPad/4.2.1 (16GB)
  • iTunes-iPad/4.2.1 (32GB)
  • iTunes-iPad/4.2.1 (64GB)
  • iTunes-iPad/4.3 (16GB)
  • iTunes-iPhone/3.0 (2)
  • iTunes-iPhone/3.0 (3)
  • iTunes-iPhone/3.0.1
  • iTunes-iPhone/3.1 (3)
  • iTunes-iPhone/3.1.2
  • iTunes-iPhone/3.1.2 (2)
  • iTunes-iPhone/3.1.2 (3)
  • iTunes-iPhone/3.1.3
  • iTunes-iPhone/3.1.3 (2)
  • iTunes-iPhone/3.1.3 (3)
  • iTunes-iPhone/4.0 (2; 8GB)
  • iTunes-iPhone/4.0 (3; 16GB)
  • iTunes-iPhone/4.0 (3; 8GB)
  • iTunes-iPhone/4.0 (4; 16GB)
  • iTunes-iPhone/4.0.1 (2; 16GB)
  • iTunes-iPhone/4.0.1 (2; 8GB)
  • iTunes-iPhone/4.0.1 (3; 16GB)
  • iTunes-iPhone/4.0.1 (3; 8GB)
  • iTunes-iPhone/4.0.1 (4; 16GB)
  • iTunes-iPhone/4.0.1 (4; 32GB)
  • iTunes-iPhone/4.0.2 (3; 16GB)
  • iTunes-iPhone/4.0.2 (3; 32GB)
  • iTunes-iPhone/4.0.2 (4; 16GB)
  • iTunes-iPhone/4.1 (2; 16GB)
  • iTunes-iPhone/4.1 (2; 8GB)
  • iTunes-iPhone/4.1 (3; 16GB)
  • iTunes-iPhone/4.1 (3; 32GB)
  • iTunes-iPhone/4.1 (3; 8GB)
  • iTunes-iPhone/4.1 (4; 16GB)
  • iTunes-iPhone/4.1 (4; 32GB)
  • iTunes-iPhone/4.2.1 (2; 16GB)
  • iTunes-iPhone/4.2.1 (2; 8GB)
  • iTunes-iPhone/4.2.1 (3; 16GB)
  • iTunes-iPhone/4.2.1 (3; 32GB)
  • iTunes-iPhone/4.2.1 (4; 16GB)
  • iTunes-iPhone/4.2.1 (4; 32GB)
  • iTunes-iPod/3.1.1 (2)
  • iTunes-iPod/3.1.2
  • iTunes-iPod/3.1.2 (2)
  • iTunes-iPod/3.1.2 (3)
  • iTunes-iPod/3.1.3 (2)
  • iTunes-iPod/3.1.3 (3)
  • iTunes-iPod/4.0 (2; 8GB)
  • iTunes-iPod/4.0 (3; 32GB)
  • iTunes-iPod/4.0.2 (3; 32GB)
  • iTunes-iPod/4.1 (2; 8GB)
  • iTunes-iPod/4.1 (3; 32GB)
  • iTunes-iPod/4.1 (4; 32GB)
  • iTunes-iPod/4.1 (4; 64GB)
  • iTunes-iPod/4.1 (4; 8GB)
  • iTunes-iPod/4.2.1 (2; 16GB)
  • iTunes-iPod/4.2.1 (2; 32GB)
  • iTunes-iPod/4.2.1 (2; 8GB)
  • iTunes-iPod/4.2.1 (3; 32GB)
  • iTunes-iPod/4.2.1 (4; 32GB)
  • iTunes-iPod/4.2.1 (4; 64GB)
  • iTunes-iPod/4.2.1 (4; 8GB)

There are a few oddities here though…

  1. What is the difference between iPad and iPad-M? My current hypothesis is that iPad-M refers to iPads with GSM (mobile phone) connectivity.
  2. Why do some identify with nn?? rather than nnGB? Again, my hypothesis is that these might be jailbroken devices.
  3. Is that a device version number in the brackets for the iPhones and iPods? For iOS 3.x systems this isn’t always present (inconsistent). Also, I was under the impression that iOS 4.x would not run on 2nd generation iPhones, and if so, what explains “iTunes-iPhone/4.2.1 (2; 8GB)” for example?

If anyone can answer these points, please do get in touch.
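
In the meantime, the pattern described above can be sketched as a regular expression. This is only a rough sketch based on the sample strings in this post, not the UASparser implementation: the function name parse_itunes_ios and the field names are hypothetical, and the bracketed number is stored as device_version purely because that is my current guess at its meaning. It tolerates the optional -M suffix, the optional bracketed details, the stray space in "64 GB" and the "??" unit.

```python
import re

# Hypothetical pattern: Application-Device/ios_version (device_version?; memory_size)
ITUNES_IOS = re.compile(
    r"^iTunes-(?P<device>iPad|iPhone|iPod)(?P<variant>-M)?"
    r"/(?P<ios_version>\d+(?:\.\d+)*)"
    r"(?:\s+\((?P<details>[^)]*)\))?$"
)
# Capacity such as "16GB", "64 GB" or the odd "64 ??".
CAPACITY = re.compile(r"^(?P<size>\d+)\s*(?P<unit>GB|\?\?)$")


def parse_itunes_ios(ua):
    """Return a dict of fields, or None if the string does not fit the pattern."""
    m = ITUNES_IOS.match(ua.strip())
    if m is None:
        return None
    result = {
        "device": m.group("device"),
        "variant": m.group("variant"),          # "-M" or None
        "ios_version": m.group("ios_version"),
        "device_version": None,                 # meaning unconfirmed (assumption)
        "capacity_gb": None,
        "unit_garbled": False,                  # True when the unit shows as "??"
    }
    details = m.group("details")
    if details:
        for part in (p.strip() for p in details.split(";")):
            cap = CAPACITY.match(part)
            if cap:
                result["capacity_gb"] = int(cap.group("size"))
                result["unit_garbled"] = cap.group("unit") == "??"
            elif part.isdigit():
                result["device_version"] = int(part)
    return result
```

Keeping unit_garbled as a separate flag means the "??" records can be counted and compared against the others later, which may help test the jailbreak hypothesis.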

NSPlayer

There is another class of common User Agent strings that I have yet to identify. These look like:

  • NSPlayer/10.0.0.4072 WMFSDK/10.0
  • NSPlayer/12.00.7600.16385 WMFSDK/12.00.7600.16385
  • NSPlayer/4.1.0.3856
  • NSPlayer/12.00.7600.16597 WMFSDK/12.00.7600.16597
  • NSPlayer/9.0.0.4506 WMFSDK/9.0
  • NSPlayer/11.0.5721.5145 WMFSDK/11.0
  • NSPlayer/11.0.5721.5275 WMFSDK/11.0
  • NSPlayer/9.0.0.4504 WMFSDK/9.0
  • NSPlayer/9.0.0.3268 WMFSDK/9.0
  • NSPlayer/11.0.5721.5145 WMFSDK/11.0

My initial thought is that these are some form of Windows Media based application (WMFSDK presumably being the Windows Media Format SDK), and that they perhaps represent users who are accessing podcasts directly via Windows Media Player. Again, this is under investigation, and help is desired.
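
Whatever the application turns out to be, the strings themselves are regular enough to split into a player version and an optional SDK version. A minimal sketch, assuming only the shapes seen in the list above (the name parse_nsplayer and the returned keys are made up for illustration):

```python
import re

# Hypothetical pattern: "NSPlayer/<version>" optionally followed by "WMFSDK/<version>".
NSPLAYER = re.compile(
    r"^NSPlayer/(?P<player_version>\d+(?:\.\d+)+)"
    r"(?:\s+WMFSDK/(?P<sdk_version>\d+(?:\.\d+)+))?$"
)


def parse_nsplayer(ua):
    """Return version fields for an NSPlayer string, or None if it does not match."""
    m = NSPLAYER.match(ua.strip())
    if m is None:
        return None
    player_version = m.group("player_version")
    return {
        "player_version": player_version,
        # Leading component only, handy for grouping records by major release.
        "player_major": int(player_version.split(".")[0]),
        "sdk_version": m.group("sdk_version"),  # None when no WMFSDK token is present
    }
```

Grouping on player_major should make it easier to see whether the traffic clusters around particular releases.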

Apple Core Media

Similarly to the NSPlayer set, we have a large number of accesses from agents showing the following strings:

  • Apple Mac OS X v10.6.3 CoreMedia v1.0.0.10D2322a
  • Apple Mac OS X v10.6.5 CoreMedia v1.0.0.10H574
  • Apple Mac OS X v10.6.6 CoreMedia v1.0.0.10J567

These are the slightly odd ones. I have many examples of QuickTime user agent accesses (e.g. QuickTime/7.6.6 (qtver=7.6.6;cpu=IA32;os=Mac 10.6.6) ), which suggests that these are not from QuickTime Player. CoreMedia is a framework within OS X, which suggests these may come from some form of custom application. Beyond that (and the OS version), I can’t say much about these, but I would like to know more.

This second (longer) list is a little clearer, I feel.

  • Apple iPhone OS v2.2 CoreMedia v1.0.0.5G77a
  • Apple iPhone OS v3.1.2 CoreMedia v1.0.0.7D11
  • Apple iPhone OS v3.1.3 CoreMedia v1.0.0.7E18
  • AppleCoreMedia/1.0.0.7B367 (iPad; U; CPU OS 3_2 like Mac OS X)
  • AppleCoreMedia/1.0.0.7B405 (iPad; U; CPU OS 3_2_1 like Mac OS X)
  • AppleCoreMedia/1.0.0.7B500 (iPad; U; CPU OS 3_2_2 like Mac OS X)
  • AppleCoreMedia/1.0.0.8A293 (iPhone; U; CPU OS 4_0 like Mac OS X; en_us)
  • AppleCoreMedia/1.0.0.8A293 (iPod; U; CPU OS 4_0 like Mac OS X; en_us)
  • AppleCoreMedia/1.0.0.8A306 (iPhone; U; CPU OS 4_0_1 like Mac OS X; en_us)
  • AppleCoreMedia/1.0.0.8A306 (iPhone; U; CPU OS 4_0_1 like Mac OS X; zh_cn)
  • AppleCoreMedia/1.0.0.8A400 (iPhone; U; CPU OS 4_0_2 like Mac OS X; en_us)
  • AppleCoreMedia/1.0.0.8B117 (iPhone; U; CPU OS 4_1 like Mac OS X; en_us)
  • AppleCoreMedia/1.0.0.8B117 (iPhone; U; CPU OS 4_1 like Mac OS X; ru_ru)
  • AppleCoreMedia/1.0.0.8B117 (iPhone; U; CPU OS 4_1 like Mac OS X; zh_cn)
  • AppleCoreMedia/1.0.0.8B117 (iPod; U; CPU OS 4_1 like Mac OS X; en_us)
  • AppleCoreMedia/1.0.0.8B118 (iPod; U; CPU OS 4_1 like Mac OS X; en_us)
  • AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; de_de)
  • AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; en_gb)
  • AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; en_us)
  • AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; he_il)
  • AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; ja_jp)
  • AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; pt_br)
  • AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; ru_ru)
  • AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; sv_se)
  • AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; zh_cn)
  • AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; zh_tw)
  • AppleCoreMedia/1.0.0.8C148 (iPhone; U; CPU OS 4_2_1 like Mac OS X; de_de)
  • AppleCoreMedia/1.0.0.8C148 (iPhone; U; CPU OS 4_2_1 like Mac OS X; en_gb)
  • AppleCoreMedia/1.0.0.8C148 (iPhone; U; CPU OS 4_2_1 like Mac OS X; en_us)
  • AppleCoreMedia/1.0.0.8C148 (iPhone; U; CPU OS 4_2_1 like Mac OS X; fr_fr)
  • AppleCoreMedia/1.0.0.8C148 (iPhone; U; CPU OS 4_2_1 like Mac OS X; ko_kr)
  • AppleCoreMedia/1.0.0.8C148 (iPhone; U; CPU OS 4_2_1 like Mac OS X; zh_cn)
  • AppleCoreMedia/1.0.0.8C148 (iPod; U; CPU OS 4_2_1 like Mac OS X; de_de)
  • AppleCoreMedia/1.0.0.8C148 (iPod; U; CPU OS 4_2_1 like Mac OS X; en_gb)
  • AppleCoreMedia/1.0.0.8C148 (iPod; U; CPU OS 4_2_1 like Mac OS X; en_us)
  • AppleCoreMedia/1.0.0.8C148 (iPod; U; CPU OS 4_2_1 like Mac OS X; ko_kr)
  • AppleCoreMedia/1.0.0.8C148 (iPod; U; CPU OS 4_2_1 like Mac OS X; pt_br)
  • AppleCoreMedia/1.0.0.8C148 (iPod; U; CPU OS 4_2_1 like Mac OS X; zh_cn)
  • AppleCoreMedia/1.0.0.8C148a (iPhone; U; CPU OS 4_2_1 like Mac OS X; en_us)
  • AppleCoreMedia/1.0.0.8C148a (iPhone; U; CPU OS 4_2_1 like Mac OS X; fr_fr)

These are user agent strings for the media player application on iOS based devices, as far as I can see. Our working understanding is that they are triggered when a user clicks on a link to a podcast file (e.g. in a browser, or in an email): the media player loads and then proceeds to download the file, which is where we see these entries. Again, UASparser isn’t entirely lost, but it doesn’t recognise the application as such. A typical return would be:

In [146]: s="AppleCoreMedia/1.0.0.8C148a (iPhone; U; CPU OS 4_2_1 like Mac OS X; fr_fr)"
In [147]: p.parse(s)
Out[147]:
{'os_company': 'Apple Inc.',
'os_company_url': 'http://www.apple.com/',
'os_family': 'iPhone OS',
'os_icon': 'iphone.png',
'os_name': 'iPhone OS',
'os_url': 'http://developer.apple.com/iphone/',
'typ': 'unknown',
'ua_company': 'unknown',
'ua_company_url': 'unknown',
'ua_family': 'unknown',
'ua_icon': 'unknown.png',
'ua_info_url': 'unknown',
'ua_name': 'unknown',
'ua_url': 'unknown'}

I would like to be able to confirm my hypothesis on these before I write the relevant regex to cope with identifying these entries. I note as a minor bonus, there appears to be some locale information in the string. It might be nice to compare that with the GeoIP information for these records.
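
As a starting point for that regex, here is a rough sketch covering both shapes of CoreMedia string listed above. It is an assumption built from these samples only, not something taken from UASparser, and the function name parse_coremedia and its keys are hypothetical. The locale is optional because the iPad OS 3.2.x strings do not carry one.

```python
import re

# Newer shape: AppleCoreMedia/<build> (<device>; U; CPU OS 4_2_1 like Mac OS X; en_us)
NEW_STYLE = re.compile(
    r"^AppleCoreMedia/(?P<coremedia>[\w.]+)\s+"
    r"\((?P<device>iPad|iPhone|iPod); U; CPU OS (?P<os_version>\d+(?:_\d+)*)"
    r" like Mac OS X(?:; (?P<locale>[a-z]{2}_[a-z]{2}))?\)$"
)
# Older shape: Apple <OS name> v<version> CoreMedia v<build>
OLD_STYLE = re.compile(
    r"^Apple (?P<os_name>Mac OS X|iPhone OS) v(?P<os_version>\d+(?:\.\d+)*)\s+"
    r"CoreMedia v(?P<coremedia>[\w.]+)$"
)


def parse_coremedia(ua):
    """Return platform, OS version, CoreMedia build and locale, or None."""
    ua = ua.strip()
    m = NEW_STYLE.match(ua)
    if m:
        return {
            "platform": m.group("device"),
            # Underscores in the newer shape are normalised to dots.
            "os_version": m.group("os_version").replace("_", "."),
            "coremedia_build": m.group("coremedia"),
            "locale": m.group("locale"),  # None when absent
        }
    m = OLD_STYLE.match(ua)
    if m:
        return {
            "platform": m.group("os_name"),
            "os_version": m.group("os_version"),
            "coremedia_build": m.group("coremedia"),
            "locale": None,  # the older shape carries no locale
        }
    return None
```

The locale value could then be set alongside the GeoIP country for each record to see how often the two agree.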

Miscellaneous Unknowns

There is a small subset of strings that seem a little more unusual. The following set can be parsed to get some OS information, but not much else:

  • CORE/6.506.4.1 OpenCORE/2.02 (Linux;Android 2.2)
  • DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)
  • podcaster/3.7.3 CFNetwork/485.10.2 Darwin/10.3.1

…and then this set are completely beyond UASparser’s understanding…

  1. AppEngine-Google; (+http://code.google.com/appengine; appid: lfe-alpo-gm)
  2. FDM 3.x
  3. gsa-crawler (Enterprise; S5-JE6K2P2TH8JAA; oxsearch@oucs.ox.ac.uk)
  4. HTC Streaming Player vodafone_uk / 1.0 / htc_buzz / 2.2.1
  5. Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)
  6. Mozilla/5.0 (compatible; YodaoBot/1.0; http://www.youdao.com/help/webmaster/spider/; )
  7. Openstat/0.1
  8. pvConnect DLNADOC/1.50
  9. SapphireWebCrawler/Nutch-1.0-dev (Sapphire Web Crawler using Nutch; http://boston.lti.cs.cmu.edu/crawler/; mhoy@cs.cmu.edu)
  10. TencentTraveler 4.0
  11. VeryCD \xb5\xe7\xc2\xbf v1.1.15 Build 110125 BETA
  12. vlc/1.1.4 LibVLC/1.1.4
  13. Xenu’s Link Sleuth 1.1a
  14. Zune/4.2
  15. Zune/4.7

14 & 15 seem fairly obvious – Microsoft’s ill-fated Zune devices. 11 (VeryCD) crops up a few times, often with some rather unusual log entries. I’m not sure whether there is some form of strange character encoding going on that isn’t being interpreted correctly, or whether these are deliberately crafted requests designed to cause problems in some systems. 3 is an obvious one: it is our Google Search Appliance, hosted locally. However, the rest are largely unknowns. If you’ve got any light you can shed on these or any of the above UA strings, please do contact us (or leave it in a comment below).
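
One way to test the character-encoding idea for the VeryCD string is to turn the escaped bytes back into raw bytes and try a few candidate encodings. This is a sketch under assumptions: the candidate list is a guess (GBK is included only because VeryCD is a Chinese file-sharing service), and the helper name decode_escaped_ua is hypothetical. A successful decode shows the bytes are valid in that encoding, not that it is definitely the encoding the client used.

```python
def decode_escaped_ua(escaped, encodings=("utf-8", "gbk")):
    """Try each candidate encoding on a UA string logged with literal \\xNN escapes.

    Returns (encoding, decoded_text) for the first encoding that decodes cleanly,
    or (None, None) if none of them do.
    """
    # Turn the literal "\xNN" sequences back into the raw bytes they represent.
    raw = escaped.encode("ascii").decode("unicode_escape").encode("latin-1")
    for enc in encodings:
        try:
            return enc, raw.decode(enc)
        except UnicodeDecodeError:
            continue
    return None, None


# The string exactly as it appears in the log sample (escapes kept literal).
sample = r"VeryCD \xb5\xe7\xc2\xbf v1.1.15 Build 110125 BETA"
encoding, text = decode_escaped_ua(sample)
print(encoding, text)
```

If a legacy encoding decodes the bytes into a sensible product name, the plain encoding explanation becomes more likely than the deliberately crafted request one.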

Thanks,
Carl

Posted in Tech-Heavy, WP2: Initial Rapid Analysis, WP3: Website Enhancement | 4 Comments

4 Responses to “User Agent Analysis – Part 2: Name those agents”

  1. Laimonas says:

    AppEngine-Google; (+http://code.google.com/appengine; appid: lfe-alpo-gm)

    These come from applications hosted on Google’s App Engine platform (their “cloud” hosting – http://code.google.com/appengine/). Anyone can write Python/Java apps and host them on that infrastructure. Note the “appid:” portion: you can access the app that made a request by going to the http://[appid].appspot.com URL, which in this case would be lfe-alpo-gm.appspot.com (this app requires a login, and I did not log in to check what the app is all about).

    Thanks for sharing this by the way, I run into similar situations as yourself, also for podcasting use cases. There are a lot of user agent strings out there! Let me know if you run into any more that are not clear, I might be able to help.

  2. Rick Hoffman says:

    I have a similar issue
    NSPlayer/10.0.0.4072 WMFSDK/10.0 is responsible for 25% of the activity on my website. I have about 50 of my original songs loaded there, and every month at least 10 receive in excess of 100 hits or downloads. Meanwhile no one ever sends me an email, and it seems that 10 songs are singled out each month and played excessively. I would think once would get you a download, and I can’t believe 100 different users want those particular songs. Anyway, this is a mystery to me.

    Top 15 of 244 Total User Agents
    # Hits User Agent
    1 2172 25.63% NSPlayer/10.0.0.4072 WMFSDK/10.0
    2 804 9.49% msnbot/2.0b (+http://search.msn.com/msnbot.htm)._
    3 642 7.58% Mozilla/5.0 (Windows NT 5.1) AppleWebKit/534.24 (KHTML, like
    4 354 4.18% Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Mozilla/4.
    5 295 3.48% Mozilla/5.0 (compatible; AutomaticSiteMap)

    I think someone is using my music in some way to get access to something else, or for some reason that has nothing to do with my music.
    RH

  3. tconnolly says:

    Carl,

    I admit I’m guessing, but I suspect that “Apple Mac OS X v10.6…” are all the new Quicktime X Player that Apple began shipping with Snow Leopard.

    TC

  4. @Rick – I haven’t had chance to revisit the data and test my current hypothesis, but I suspect these are some form of custom scripted application, perhaps trying to parse the site for content or local usage. I would expect to find these accesses coming from a small number of IP addresses, and in fairly close batches, perhaps spread over a range of RSS grouped files.

    @TC – It’s a fair guess, and one I’ll try and test when I next revisit the Log Analysis software. My other premise is that it’s a custom app/script (perhaps an AppleScript iterating over media links) based on Mac OS, similar to my premise for NSPlayer on Windows machines.