IT Services has just launched a brand-new project aimed at improving the resilience of WebLearn. The project, which started in June 2014 and will run for nine months, aims to improve the infrastructure that underpins WebLearn.
WebLearn is now classified by the University as a Tier 1 service. This means that it is essential to the business of the University and must suffer as little unplanned downtime as possible.
Even though WebLearn has been very reliable of late, there are still improvements that can be made. WebLearn runs on a physical cluster of 4 machines: one back-end server, which hosts the MySQL database and the SOLR search platform, and a further 3 worker nodes, which host the rest of WebLearn.
On a couple of occasions, a problem with SOLR has put a strain on the back-end server and, because the database runs on the same machine, caused the database to slow down. To an end user, this makes WebLearn appear sluggish, with slow response times.
On other occasions, we have noticed ‘runaway threads’ on one of the worker nodes. For end users whose sessions are served by that node, the service again appears to be labouring.
The objectives of WeRP are to improve robustness and to shorten recovery time. This will mean less unplanned downtime for users and should also give faster response times overall.
Certain kinds of problem will be identified more quickly, and users will be less inconvenienced when a problem does occur, because it will be easier and quicker to take a node out of service in order to repair it. Improved monitoring and access to logs will allow IT Services both to diagnose past issues and to respond to user queries.
The project is at a very early stage and is still in the planning phase. Keep an eye on this blog for progress updates as and when they happen.