Here “Large Installation” means in the order of dozens up to hundreds of systems, rather than thousands to hundreds of thousands as in specialised application areas like physics and web search.
But then, typical installations with dozens to hundreds of systems, like Sysdev's, use them for a rather diverse set of applications, while installations with thousands to hundreds of thousands of systems tend to use them as fairly uniform compute clusters, effectively running a single application: the cluster job scheduler.
The challenges in administering an installed base of dozens to hundreds of servers across a rather diverse set of applications are significant, and require a careful balance between the simplification that uniformity brings and the need to run diverse applications.
This was reflected in several of the presentations at the workshop: for example in the number of talks about monitoring systems, and in a different way in those about specific parts of the infrastructure that supports applications. My impressions of the presentations I liked best follow.
Ansible was presented in both a tutorial and a talk, and I liked both. Ansible is a configuration delivery system: customised configuration files are transported to and installed onto target systems. It is usually coupled with a configuration building system based on the Jinja2 templating engine.
Its distinctive design goals, strongly emphasised by the presenters, are minimal dependencies and in particular no need for explicit installation on the target hosts. This is achieved by:
- implementing the system in Python;
- using SSH to get a shell prompt on the target hosts;
- copying over SSH to the target hosts small Python programs that run locally there to perform preparatory actions, such as host profiling;
- uploading the host profile, if any, to the Ansible server, and generating the templated configuration files there;
- downloading the generated configuration files to the target hosts.
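The steps above can be sketched in Python under loose assumptions: this is not Ansible's code or module protocol, just plain `ssh` invoked through `subprocess`, with the standard library's `string.Template` standing in for Jinja2. The profiling one-liner, the `deploy` helper and its injectable `run` parameter are all hypothetical.

```python
import json
import shlex
import subprocess
from string import Template
from typing import Callable, List, Optional

# Hypothetical profiling program, run on the target by its own Python
# interpreter; it prints a small host profile as JSON on stdout.
PROFILE_SCRIPT = (
    "import json, os, platform; "
    "print(json.dumps({'hostname': platform.node(), 'ncpu': os.cpu_count()}))"
)

Runner = Callable[..., str]


def ssh_command(host: str, remote_cmd: str) -> List[str]:
    """Argument vector that runs remote_cmd in a shell on host via plain ssh."""
    return ["ssh", host, remote_cmd]


def run_local(argv: List[str], stdin: Optional[str] = None) -> str:
    """Default runner: execute argv locally and return its standard output."""
    result = subprocess.run(argv, input=stdin, capture_output=True, text=True, check=True)
    return result.stdout


def deploy(host: str, template_text: str, dest_path: str, run: Runner = run_local) -> str:
    # Ship the profiling program over SSH and collect the host profile.
    profile_cmd = "python3 -c " + shlex.quote(PROFILE_SCRIPT)
    facts = json.loads(run(ssh_command(host, profile_cmd)))
    # Generate the configuration file on the server from the profile.
    rendered = Template(template_text).substitute(facts)
    # Deliver the generated file to the target (here simply via `cat`).
    run(ssh_command(host, "cat > " + shlex.quote(dest_path)), stdin=rendered)
    return rendered
```

Passing a fake `run` function makes the flow testable without any target host; with the default runner the target needs nothing beyond an SSH server and a Python interpreter, which is the point of the design.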
Overall the idea is a generalisation of the very useful FISH (FIles transferred over SHell) file transfer protocol.
It is, however, harder to compare with client-based systems of a very different flavour, like CFEngine or Sysdev’s own RB3/ConfigTool pair, which are mostly aimed at the issues around generating configurations rather than distributing them.
Among the monitoring systems there were some impressive and well delivered presentations about the availability monitor Icinga, a fork and mostly improved version of Nagios, which is particularly suitable for smaller installations (dozens of hosts) out of the box and can scale to larger ones (hundreds of hosts) with a bit of planning.
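Icinga, like Nagios, runs external check plugins and reads their exit status (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and their first line of output, optionally followed by performance data after a `|`. A minimal disk-usage check in Python might look like the following sketch; the thresholds and the `check_disk_usage` name are hypothetical, and a real deployment would more likely use the stock `check_disk` plugin.

```python
import shutil
from typing import Tuple

# Standard plugin exit codes understood by Nagios and Icinga.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_pct: float, warn: float, crit: float) -> Tuple[int, str]:
    """Map a usage percentage to an exit code and a one-line status with perfdata."""
    if used_pct >= crit:
        code = CRITICAL
    elif used_pct >= warn:
        code = WARNING
    else:
        code = OK
    line = (f"DISK {LABELS[code]} - {used_pct:.1f}% used"
            f" | used_pct={used_pct:.1f}%;{warn};{crit}")
    return code, line


def check_disk_usage(path: str = "/", warn: float = 80.0, crit: float = 90.0) -> int:
    """Print the status line and return the exit code for the monitor."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:  # e.g. the path does not exist
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    code, line = evaluate(100.0 * usage.used / usage.total, warn, crit)
    print(line)
    return code
```

Run as a script, it would end with `sys.exit(check_disk_usage())` so that Icinga sees the exit code.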
Another good talk was about progress with the OpenNMS availability monitor, which includes a network discovery system, and has been designed for scaling to hundreds of monitored hosts.
On the overall monitoring problem there was a very informative, well presented and candid report on the history of monitoring at a web hosting company. They went through several iterations of both their monitoring infrastructure and their reaction processes as business growth pushed up the number of installed hosts. They made some quite interesting points:
- The most efficient performance monitor is collectd, and they display the performance graphs using Graphite.
- Ganglia is almost as efficient.
- They use Icinga for availability monitoring.
- A 10-second collection interval is essential to spot transient performance issues, and gives a much better feel for system behaviour than a longer interval.
- Writing to dozens of RRD files per system for hundreds of systems can hit a storage system hard, so putting the RRD archive on a RAM disk and periodically copying it to disk, or using a purpose-written RRD caching tool, is a good idea.
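Both points can be checked with a little arithmetic. In the sketch below, which uses made-up numbers, two minutes of 1-second CPU samples contain a 10-second spike to 100%: averaged over 10-second windows the spike is fully visible, while a 60-second average flattens it to 25%. The last line estimates the RRD update rate for a hypothetical 200 hosts with 30 RRD files each at a 10-second interval, the kind of small random write load that motivates a RAM disk or a caching tool.

```python
from statistics import mean
from typing import List


def window_means(samples: List[float], window: int) -> List[float]:
    """Average consecutive, non-overlapping windows of per-second samples."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]


def rrd_updates_per_second(hosts: int, files_per_host: int, interval_s: float) -> float:
    """Each RRD file is updated once per collection interval."""
    return hosts * files_per_host / interval_s


# Hypothetical workload: 10% CPU baseline with a 10-second spike to 100%.
cpu = [10.0] * 120
cpu[40:50] = [100.0] * 10

print(max(window_means(cpu, 10)))           # 100.0
print(max(window_means(cpu, 60)))           # 25.0
print(rrd_updates_per_second(200, 30, 10))  # 600.0
```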
One of the best presentations was an enthusiastic one by a maintainer about the 9.2 release of PostgreSQL, a very robust DBMS which Sysdev advisedly uses extensively. It is also widely used by “cloud database” companies like Heroku, which reports running 1.5 million PostgreSQL databases. The major advances in the 9.2 release are:
- Even better handling of highly variable workloads (for cloud databases) and of highly parallel ones (for transactional systems).
- Foreign tables, which are read-only views over tables held in other DBMSes.
- Much faster spatial index access, and queries that can be answered entirely from a covering index no longer need to access the table (index-only scans).
- Even better support for non-tabular data such as key-value and textual data, and improved type handling, including range/interval types.
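Range types deserve a short illustration. In PostgreSQL, `int4range(1, 5) && int4range(5, 8)` is false, because ranges are by default stored with an inclusive lower bound and an exclusive upper bound. The Python sketch below mirrors only those overlap semantics for integer ranges, with `None` for an unbounded end; the `Range` class and `overlaps` function are hypothetical and are not a PostgreSQL client API.

```python
from typing import NamedTuple, Optional


class Range(NamedTuple):
    lower: Optional[int]  # inclusive; None means unbounded below
    upper: Optional[int]  # exclusive; None means unbounded above


def is_empty(r: Range) -> bool:
    """[a, a) contains no values (PostgreSQL rejects inverted bounds; treated as empty here)."""
    return r.lower is not None and r.upper is not None and r.lower >= r.upper


def overlaps(a: Range, b: Range) -> bool:
    """True if the two half-open ranges share at least one value, like &&."""
    if is_empty(a) or is_empty(b):
        return False
    a_starts_before_b_ends = a.lower is None or b.upper is None or a.lower < b.upper
    b_starts_before_a_ends = b.lower is None or a.upper is None or b.lower < a.upper
    return a_starts_before_b_ends and b_starts_before_a_ends
```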
The 9.3 release is progressing well, and its new features have already been written and are being tested, among them:
- Event triggers, 64-bit object addressing, table snapshots.
- Materialised views, and the ability to update foreign tables.
PostgreSQL is probably the relational DBMS that comes closest to the ideal described in Codd's 12 rules, and even in the more controversial third relational manifesto. Generalised handling of views is one of the most important and least commonly implemented aspects of the relational model, and is particularly important for realising a three-schema design in a relational database.
In some release after 9.3, PostgreSQL will have multi-master (active/active) clustering similar to Apache Cassandra's; it is already being tested.
A presentation on alternative DNS servers was also quite useful, as relatively recent updates to the DNS protocols offer considerable flexibility, provided the server makes it easy to take advantage of them. It was given by the author of the good book Alternative DNS Servers.
One of its themes was that, as DNS is a truly critical service for the Internet, many core DNS service providers have felt the need to create independent DNS server implementations, so that a diverse code base reduces the chance of a single bug affecting all or most of them.
Of the various DNS servers:
- Unbound, a recursive caching resolver, was highly recommended for its efficiency, robustness and extensibility in Python.
- NSD was also recommended as a reliable, complete authoritative-only implementation for serving large numbers of domains at high speed, in part thanks to compiled zone files. NSD version 4 is about to add dynamic addition and deletion of zones too.
- PowerDNS was recommended even more strongly: it has a number of interesting zone database back-ends, a very good DNSSEC implementation, and zone transfer via database replication. It is also fairly easy to use Dynamic DNS updates with DNSSEC to maintain the zone database, and there are several tools to edit zones directly in the zone database.
DNS server flexibility can help a lot by providing a layer of indirection in service provision, and by making it possible to delegate updates of sub-zones to the relevant administrators.
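Both uses can be sketched as zone-file fragments generated from Python. Everything here is hypothetical: names are relative to an assumed `$ORIGIN example.org.`, addresses come from the 192.0.2.0/24 documentation range, and the helper functions are illustrative only. Clients use `www`, a CNAME pointing at whichever host currently provides the service, so moving the service means changing one record; the `lab` sub-zone is delegated with NS records, plus a glue A record for the in-zone name server, to servers run by the lab's own administrators.

```python
from typing import List, Optional, Tuple


def service_alias(service: str, host: str, ttl: int = 300) -> str:
    """CNAME giving clients a stable service name that can be repointed."""
    return f"{service}\t{ttl}\tIN\tCNAME\t{host}"


def delegation(subzone: str, nameservers: List[Tuple[str, Optional[str]]],
               ttl: int = 3600) -> List[str]:
    """NS records delegating subzone, plus glue A records where an address is given."""
    lines = [f"{subzone}\t{ttl}\tIN\tNS\t{ns}" for ns, _ in nameservers]
    lines += [f"{ns}\t{ttl}\tIN\tA\t{addr}" for ns, addr in nameservers if addr]
    return lines


print(service_alias("www", "web-03"))
for record in delegation("lab", [("ns1.lab", "192.0.2.10"), ("ns.example.net.", None)]):
    print(record)
```

The short TTL on the alias is a deliberate choice in this sketch, so that repointing the service takes effect quickly.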
Another interesting short presentation was about storage caching. Its premise was the common problem that consolidating storage onto dedicated shared storage servers subjects them to widely differing workloads, which results in either lower performance or higher costs to accommodate the demand for high IOPS (I/O operations per second).
This can be counteracted by caching application-specific data on the systems local to the application: the storage for those systems can be customised for the demands of the application, and the demand for high IOPS can be met by local caching on flash SSDs.
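The effect of a local cache on the shared servers can be sketched with a toy least-recently-used block cache in Python. It is a model under simplifying assumptions, not how any of the systems below is implemented: a small hot working set read repeatedly costs the shared storage one read per block, and every later read is served locally.

```python
from collections import OrderedDict
from typing import Callable, List


class LRUBlockCache:
    """Toy read cache, standing in for a local SSD in front of shared storage."""

    def __init__(self, capacity_blocks: int, backend_read: Callable[[int], bytes]):
        self.capacity = capacity_blocks
        self.backend_read = backend_read
        self.blocks: "OrderedDict[int, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block_no: int) -> bytes:
        if block_no in self.blocks:
            self.blocks.move_to_end(block_no)  # mark as most recently used
            self.hits += 1
            return self.blocks[block_no]
        self.misses += 1
        data = self.backend_read(block_no)  # costs an I/O on the shared server
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data


backend_reads: List[int] = []


def shared_storage_read(block_no: int) -> bytes:
    backend_reads.append(block_no)
    return b"x" * 4096


cache = LRUBlockCache(capacity_blocks=8, backend_read=shared_storage_read)
for _ in range(25):
    for block in range(4):  # hypothetical hot set of 4 blocks
        cache.read(block)
print(cache.hits, cache.misses, len(backend_reads))  # 96 4 4
```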
There are 3 major open-source caching systems:
- Flashcache was developed by Facebook as an extension to DM (Device Mapper). It handles retries, but needs careful consideration because it operates in write-back mode, so recently written data may exist only on the SSD until it is flushed to the backing store.
- bcache is an independent kernel module which comes as an intrusive kernel patch, and requires a dedicated partition for caching. It has specific support for SSD caching.
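- ZFS is a well-known filesystem which supports SSD caching in two modes: L2ARC, a second-level read cache, and slog, a separate intent log device for synchronous writes. Cache devices can be added or removed dynamically, but the L2ARC contents are not persistent across reboots.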
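The caveat about write-back mode can be made concrete with a second toy model, again hypothetical and not any of these systems' code: in write-through mode every write reaches the backing store before it is acknowledged, while in write-back mode writes are acknowledged once they are on the cache device, so losing that device before a flush loses them.

```python
from typing import Dict, Set


class WriteCache:
    """Toy cache in front of a backing store, in write-through or write-back mode."""

    def __init__(self, backend: Dict[int, bytes], write_back: bool):
        self.backend = backend  # stands in for the slow shared storage
        self.write_back = write_back
        self.cache: Dict[int, bytes] = {}
        self.dirty: Set[int] = set()

    def write(self, block_no: int, data: bytes) -> None:
        self.cache[block_no] = data
        if self.write_back:
            self.dirty.add(block_no)  # acknowledged before reaching the backend
        else:
            self.backend[block_no] = data  # write-through: backend updated at once

    def flush(self) -> None:
        for block_no in sorted(self.dirty):
            self.backend[block_no] = self.cache[block_no]
        self.dirty.clear()

    def lose_cache_device(self) -> Set[int]:
        """Simulate SSD failure; return blocks whose latest data is now lost."""
        lost = set(self.dirty)
        self.cache.clear()
        self.dirty.clear()
        return lost
```

Write-back gives the best write latency, which is why it is attractive, but it makes the SSD part of the durable storage path.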