By now people are probably getting bored of hearing about the webcache, so you’ll be glad to know that this should be the last post on the subject: the webcache was successfully enabled for IPv6 this morning.
Note that the following is a “warts and all” description of the deployment. I was pressed for time and it could have been better, but this is not a best practice guide; it’s about what a similar system administrator might face, in case the experience helps others.
Pre-Deployment
It didn’t go entirely to plan. I worked through constructing and testing a formal configuration checklist for the service. Yesterday I made an announcement to our university IT support staff that there would be service downtime for the host (there would at least be a reboot to apply the IPv6 connection tracking enabled kernel, as discussed in the previous post here), and then a little later (cue various interruptions and help tickets with exclamation marks and ‘ASAP’ in them) I discovered that I had no method of applying the preferred_lft 0 setting discussed here previously, which is required to make the IPv6 interfaces use the expected source addresses.
Under Debian it had been a simple case to add a pre-up command that, when a sub-interface was brought up, would apply the preferred_lft 0 setting, essentially telling it to prefer another address for outgoing traffic but otherwise use the interface as normal. Under CentOS I couldn’t manually issue a command to alter it (as far as I could see, ‘ip add…’ rejected the preferred_lft option as junk and ‘ip change’ was not supported) and needed to update the iproute package. This was fairly painless (download the latest source, butcher a copy of the previous package’s .spec file and then rpmbuild the package on our development host) but it is yet another custom package to look after – I’ll be glad to redeploy with CentOS 6 when it is released and so have a dedicated package maintainer rather than the extra work ourselves.
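For anyone facing the same thing, here’s a minimal sketch of both approaches; the addresses and interface names below are placeholders, not our actual configuration:

# Manual fix (needs an iproute new enough to understand 'change' and 'preferred_lft'):
ip -6 addr change 2001:db8::2/64 dev eth0.10 preferred_lft 0

# On Debian the same thing can be hung off the stanza in /etc/network/interfaces
# with a pre-up/up line along these lines (again illustrative only):
#     up ip -6 addr change 2001:db8::2/64 dev eth0.10 preferred_lft 0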
As I’d already announced the service would be going down, I either had to stay late or lose a little face and do the work the following week (delaying the webcache IPv6 enabling by yet another week). It’s important to have a work/life balance, but this was one occasion where I decided to stay late.
Aspects of the (re)deployment for IPv6 were:
- ip6tables configuration (a rough sketch follows this list)
- Squid reconfiguration (IPv6 ACLs etc.)
- Apache reconfiguration (it serves a .pac file, used to supply webcache information, to some clients)
- making tests to check the service configuration after the change
- installing the new kernel, mcelog and iproute packages
- interface configuration
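To give a flavour of the first item, the ip6tables side amounts to the usual stateful ruleset (which is why the connection-tracking-enabled kernel from the previous post matters). A rough sketch only – the proxy port and prefix here are placeholders, not our actual ruleset:

ip6tables -A INPUT -i lo -j ACCEPT
ip6tables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
ip6tables -A INPUT -p ipv6-icmp -j ACCEPT
# Allow university clients to the proxy port (8080 and 2001:db8::/32 are placeholders):
ip6tables -A INPUT -p tcp --dport 8080 -s 2001:db8::/32 -j ACCEPT
ip6tables -A INPUT -j DROP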
Prior to this work the service didn’t have a formal checklist. I constructed one in our team’s documentation and wrote a script to conduct the tests in succession (currently only six tests for the main functionality, but I’ll add more).
I was able to test the Squid and Apache configurations in advance on the hosts with static commands (e.g. squid -k parse -f /etc/squid/squid.conf) but (due to time) there is currently no identical webcache test host, so there is room for improvement.
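The script itself is nothing clever; a sketch of the sort of checks it runs is below. The proxy port and test URLs are illustrative rather than our real values:

#!/bin/bash
PROXY=http://wwwcache.ox.ac.uk:8080    # placeholder port

# Configuration parse checks (safe to run before the change):
squid -k parse -f /etc/squid/squid.conf || echo "FAIL: squid.conf does not parse"
apachectl configtest                    || echo "FAIL: httpd configuration"

# Functional checks through the cache, reaching it over IPv4 and then IPv6:
curl -4 -s -o /dev/null -x $PROXY http://www.ox.ac.uk/ || echo "FAIL: fetch via IPv4 to the proxy"
curl -6 -s -o /dev/null -x $PROXY http://www.ox.ac.uk/ || echo "FAIL: fetch via IPv6 to the proxy"
# And a fetch of a site that is only reachable over IPv6:
curl -s -o /dev/null -x $PROXY http://ipv6.google.com/ || echo "FAIL: fetch of an IPv6-only site"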
Deployment Day
The testing work paid off: the existing IPv4 service was down for about two and a half minutes shortly after 7am, with a minor outage of about the same duration a little later. The full IPv6 service was up before 8am.
There were a few hiccups:
- The workaround to apply preferred_lft 0 to IPv6 sub-interfaces didn’t work; I’ve applied the setting manually for now and will make a ticket in our team’s RT ticket system.
- Sometimes really simple issues slip through: due to an oversight the IPv6 firewall wasn’t set to apply on boot. I applied it and fixed the boot commands.
- One of my squid.conf IPv6 ACLs was valid syntax but wrong for service operation (an illustrative sketch of both fixes follows this list)
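For completeness, the firewall-at-boot fix on CentOS was along these lines:

service ip6tables save      # record the running IPv6 ruleset
chkconfig ip6tables on      # and apply it at every boot

and the ACL that was syntactically valid but operationally wrong was of this general shape (the prefix is a placeholder, not our real range):

acl localnet_v6 src 2001:db8::/32
http_access allow localnet_v6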
The script for service testing was useful and sped up testing greatly. I’ll aim to incorporate the tests into our service monitoring software.
End Result and Service Behaviour
The important results of this work are:
- The service is now reachable via IPv6
- It’s now possible to use wwwcache.ox.ac.uk from a University of Oxford host to visit an IPv6-only website even if your host is IPv4-only.
- The opposite is also true. If for some reason a host is IPv6-only, the webcache can be used to visit IPv4-only websites.
Someone queried how the service would behave if the destination is available via IPv4 and IPv6 (or, more accurately, has both A and AAAA DNS records): the answer is that IPv6 will be attempted first. This is typically the default behaviour for modern operating systems and, while it’s possible to alter this, we will be leaving it as the expected default.
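If you’re curious what a given name publishes, a couple of dig queries will show you; for example, using the cache itself, which at the time of writing has both record types:

dig +short wwwcache.ox.ac.uk A
dig +short wwwcache.ox.ac.uk AAAA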
Related to this, if you have a good memory, I stated in a previous post that we wouldn’t add an AAAA record for a service but would make the service use a slightly different name, e.g. ntp6.oucs.ox.ac.uk instead of ntp.oucs.ox.ac.uk for the stratum 3 IPv6 provision. We also wanted to add IPv6 cautiously, enabling a service for IPv6 but making IPv4 the default where possible. For this service we’ve seemingly done an about-face and added AAAA records for the same IPv4 service name and associated interfaces. My reasoning is subjective but based on the following:
- If a user has an issue contacting wwwcache.ox.ac.uk we’re likely to get a support ticket complaining and so be aware of the issue quickly. Compare that with a user’s computer not accessing an NTP service correctly, in which case (in my experience) the machine’s clock silently drifts out over the course of weeks or months, and when this is noticed the user wrongly assumes that the entire university NTP service must be out by the same amount of time as their local clock.
- I don’t want to have end users as my experiment subjects as such; however, wwwcache.ox.ac.uk is a less heavily used system than – for example – the main university mail relay. Hence it’s a more suitable place to use an AAAA record for the first time on a main service.
- We’re getting a bit more confident with the IPv6 deployment and as a result are revisiting some previous opinions.
Remember that formal tests were run on the service; the end users are not being used as the test. However, I am aware of Google stating that 1 in 1000 IPv6 users had misconfigured connectivity, so I’m still keeping an eye out for odd reports that might be related to strange behaviour from a particular device or in a given network situation.
The service is lightly used and so, due to a funding decision (remember the current UK academic budget cuts), it is not fault tolerant. That is, the host is in warranty, has dual power supplies and RAID (and is powerful), but there is only one host.
Performance, Ethics and Privacy
In terms of performance, the host has 8GB of RAM; a quick check before the work revealed 7.5GB was in use (caching by Squid and by the operating system itself), so the service is making good use of the hardware. The CPU is the low-energy version, which is powerful enough for the task, and the disks are RAID1 (no, RAID5 would not be a good idea with Squid). I believe I’ve covered in a previous post what we would have purchased if there had been more budget, so I won’t dwell on it further.
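(That “quick check” was nothing cleverer than the standard tools; something along the lines of:)

free -m                   # overall memory use - remember 'used' includes the page cache
ps aux | grep '[s]quid'   # the RSS column shows the squid processes' resident memory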
In terms of ethics, the networks team and security team have access to the logs, which are preserved for 90 days but (without getting into an entire post on the subject) under the same ethical and conduct rules as the mail relay logs. Specifically, if a person were to request logs for disciplinary procedures (or to see how hard coworker X is working) they are directed to the University Proctors’ office, who would scrutinise the request. Most frivolous requesters, including (on occasion) loud, angry and forceful managers demanding access or metadata about an account, give up at this point. I can’t speak for the security team, but in 3 years I’ve only had 2 queries from the Proctors’ office relating to the mail logs, and in both cases this was with regard to an external [ab]user sending unwanted mail into the university.

This whole subject area might deserve a page on the main OUCS site, but in short I use the webcache myself and consider it private. We supply logs to the user for connectivity issues and have to be careful of fuzzy areas when troubleshooting with unit IT support staff on behalf of a user, but personal dips into the logs are gross misconduct. We process the logs for tasks relating to the service: for example we might make summaries from the logs (“we had X users per day for this service in February”) or process them for troubleshooting (“the summary shows one host has made 38 million queries in one day, all the other hosts are less than 10k queries, I suspect something is stuck in a loop”).
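(Those summaries are just standard log processing; a rough sketch, assuming Squid’s default native access.log format where the client address is the third field:)

# Requests per client - enough to spot a host making 38 million queries in a day:
awk '{print $3}' /var/log/squid/access.log | sort | uniq -c | sort -rn | head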
There’s no pornography, censorship or similar filtering on the webcache; people do research in areas that cover this as part of the university, and frankly there’s nothing to be gained from filtering it. If there is a social problem in a unit with an employee viewing pornography (and so generating a hostile working environment for other employees) then it is best dealt with via the local personnel/HR/management as a social/disciplinary issue, not a technical one – no block put in place on the webcache will cure the employee of inappropriate behaviour. On a distantly related note, we haven’t been asked by JANET to implement the IWF filter list, and I have strong opinions on its uselessness. I don’t think I’m giving away any of the security team’s secrets if I reveal they have fewer than 10 regular expression blocks in place on the webcache, which target specific virus executables and appear to have been added almost a decade ago – these won’t interfere with normal browsing (unless you really need to run a Visual Basic script called “AnnaKournikova.jpg.vbs” – remember that?). There are no other filters. Network access to the webcache is restricted to university IP address ranges.
This is an old service, with some of the retained configuration referring to issues raised 10 years ago. I believe all the above is correct, but if you think I’ve made an error please raise it with me (either by email to our group or in the comments here) and assume the error is the result of inheriting a service that’s over a decade old rather than deliberate.