Limitations of using WordPress as an aggregator
Part of the Triton Project was to develop “dynamic collections”: collections of open educational resources built around particular topics. Initial development used the Jorum Widget, the OER Recommender Widget, and an RSS Widget modified slightly to use Xpert’s RSS Search feature. At the CETIS OER Hack Day in March this plugin was extended to allow multiple OER RSS feeds to be aggregated, although the code only ran as part of the page (what WordPress calls a “widget”) and could not be the page’s content itself. Each of these technologies (Jorum, OER Recommender and Xpert) could provide content to augment a blog post, but could not directly generate the content for something akin to a blog post (i.e. an entire page on a WordPress site).
The steps beyond aggregation
WordPress already has a default set of functions for bringing in RSS feeds. In an earlier OER project, the University of Lincoln used a plugin to turn RSS feeds into WordPress blog posts. An approach such as this could be useful for building a collection of OER content in a WordPress-hosted webspace. However, the posts would not be presented directly to the end user as a collection, and they would not be arranged according to terms or keywords; they would simply be a collection en masse.
Turning OER content into posts would also produce highly textual pages, unless a source of videos or podcasts could be set up so that some content was guaranteed to be presented visually. This drawback perhaps outweighs the direct benefit of commenting that would come with an OER-as-blog-post approach.
What features do we want?
The dynamic collection was specified to let users “like” the content it returned, which would make it possible to create a section marked “popular content”. There was also a desire for a section entitled “recent content”, which required either recording the date an item was first found or relying on date information in its metadata. These features, together with the possible absence of relevant metadata, meant that a separate system was likely to be required, rather than something akin to FeedPress.
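The two sections described above can be sketched in a few lines. This is only an illustration in Python, not the project's code: the `CollectionItem` record, its field names and the two helper functions are all hypothetical. It assumes the harvester stamps each item with the date it was first found and falls back on that date whenever the feed's own metadata carries no publication date.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class CollectionItem:
    # Hypothetical record for one harvested resource.
    url: str
    title: str
    first_seen: datetime                   # stamped by the harvester on first sight
    likes: int = 0                         # incremented when a user "likes" the item
    published: Optional[datetime] = None   # from feed metadata; may be absent


def effective_date(item: CollectionItem) -> datetime:
    # Prefer the metadata date, fall back to the date first found.
    return item.published or item.first_seen


def recent_content(items: List[CollectionItem], limit: int = 5) -> List[CollectionItem]:
    # Newest first, by effective date.
    return sorted(items, key=effective_date, reverse=True)[:limit]


def popular_content(items: List[CollectionItem], limit: int = 5) -> List[CollectionItem]:
    # Most-liked first.
    return sorted(items, key=lambda item: item.likes, reverse=True)[:limit]
```

Storing `first_seen` for every item means that “recent content” still works for sources that supply no dates at all, which is the case the paragraph above worries about.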
How to harvest
WordPress supports only one form of scheduling, which it calls cron. Although it shares its name with the Unix command, WordPress cron is triggered by certain events (such as a page being loaded), and without that trigger nothing happens, unlike a Unix cron job, which is triggered by a clock. So whereas a keyword search across 1-2 repositories could be done on demand, a search across 3 repositories, multiple RSS feeds, 4 Wikipedia API calls and Mendeley (for journals) would prove slow, and possibly too slow for PHP to execute safely (PHP usually has a 30 second limit on how long one script can run). An asynchronous approach such as Ajax would work, but would still lead to inconsistent performance and make it difficult to present data in a controlled way (content from slower sites would still appear last, even if it was popular).
It is also hard to model when a site will be accessed. If we rely on the site being accessed to update part of the collection (that is, to trigger aggregation), then the more popular the site, the less often we would need to trigger the aggregator, but if the site went through quiet periods the collections would not update. Actions on the admin side also generate the event that triggers an update, and this avenue could be explored further, but given that we do not know, and cannot limit, the size of a collection’s feeds, this approach still runs into the limits of what PHP can handle.
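The difference between the two kinds of trigger can be shown with a toy model. The sketch below is Python, not WordPress code, and every name in it is hypothetical; in particular, its rescheduling rule is a deliberate simplification and is not claimed to match WordPress's own. It contrasts a job that runs only when a page load happens to arrive after its due time with one that runs on a clock.

```python
class PageLoadScheduler:
    """Toy scheduler that only checks for due work when a page is requested."""

    def __init__(self, interval, first_due=0):
        self.interval = interval    # minutes between intended runs
        self.next_due = first_due   # minute at which the job next becomes due
        self.runs = []              # minutes at which the job actually ran

    def on_page_load(self, now):
        # Nothing happens unless a visitor arrives after the due time.
        if now >= self.next_due:
            self.runs.append(now)
            self.next_due = now + self.interval   # simplified rescheduling


def clock_runs(interval, end, first_due=0):
    # A clock-driven job runs at every due time, visitors or not.
    return list(range(first_due, end + 1, interval))


scheduler = PageLoadScheduler(interval=60)
for minute in [5, 20, 70, 400]:   # a quiet spell between minute 70 and minute 400
    scheduler.on_page_load(minute)

page_load_runs = scheduler.runs    # the job ran only at minutes 5, 70 and 400
clock_driven = clock_runs(60, 400) # a clock would have run it every hour
```

Over the same 400 minutes the clock-driven job runs seven times and the page-load-driven one three times, with a gap of more than five hours during the quiet spell.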
As such, we decided to schedule a trigger once every half hour (by setting hourly events in WordPress and delaying one by half an hour). In initial testing this does not seem to cause any noticeable problems for end users. Even with only our four collections, this means that the site takes approximately one week to refresh fully, so an issue of scalability remains. Given the time remaining on the project, there isn’t scope to experiment with this and assess a generic approach.
You can see the four collections (representing the four main topics covered by the subject) here:
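The half-hourly trigger built from hourly events can be illustrated with a little arithmetic. The Python sketch below is not the plugin's code and its function names are hypothetical. It merges two hourly schedules, one offset by 30 minutes, and counts how many triggers fit into a week. How the harvesting work is divided between triggers is not described here, so the sketch says nothing about the size of each step.

```python
def hourly_times(start_minute, end_minute):
    # Minutes at which one hourly event fires, starting from start_minute.
    return list(range(start_minute, end_minute, 60))


def merged_triggers(end_minute):
    # Two hourly events, the second delayed by half an hour.
    on_the_hour = hourly_times(0, end_minute)
    on_the_half_hour = hourly_times(30, end_minute)
    return sorted(on_the_hour + on_the_half_hour)


triggers = merged_triggers(180)    # first three hours: 0, 30, 60, 90, 120, 150
gaps = [later - earlier for earlier, later in zip(triggers, triggers[1:])]

MINUTES_PER_WEEK = 7 * 24 * 60
triggers_per_week = len(merged_triggers(MINUTES_PER_WEEK))   # 336
```

Every gap comes out at 30 minutes, and a week holds 336 triggers, so a refresh cycle of about a week corresponds to a few hundred aggregation steps shared between the four collections.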
Political Theory – http://politicsinspires.org/dynamic_collection/political-theory/
Comparative Government – http://politicsinspires.org/dynamic_collection/comparative-government/
International Relations – http://politicsinspires.org/dynamic_collection/international-relations/
European politics and society – http://politicsinspires.org/dynamic_collection/european-politics-and-society/