Global IPv6 Day

On the 8th of June, for 24 hours, the major names that make up the web experience for a large proportion of users of the Internet will be enabling IPv6 on their services.

The announcement: http://isoc.org/wp/worldipv6day/

What does this mean?

Up until now there’s been an argument made by some network administrators that there’s no point deploying IPv6 because the home Internet Service Providers haven’t; the ISPs might say there’s no point because a lot of websites aren’t IPv6 enabled; and the website owners are worried that 1 in 2000 of their visitors might have IPv6 issues and go to a competitor instead. The network hardware vendors have a similar opinion, and so you risk a monotonous stalemate, with the occasional voice of ‘have we run out of addresses yet?’.

This date means all of the above groups joining in at once, all taking the same risks on the same day.

This is great as it means actual progress now, rather than a panic later. It means ISPs, website owners and even end users[1] taking notice.

[1] Perhaps ideally they shouldn’t know anything has happened but if they’re seeing the publicity and putting pressure on ISPs, vendors and websites then that’s fine.

What about Oxford?

  • With regards to www.ox.ac.uk, I’ve had no involvement in running it, but I believe it’s maintained by several teams from different parts of the university. I think that by June it will be running on hardware from a non-OUCS section of the university (currently I believe it is NSMS, later it will be BSP), the backend is written by a contracted company and political control of the website content sits with a dedicated team at the Public Affairs Directorate. This makes it all slightly tricky, but I’ll begin prodding the contacts involved tomorrow.
  • For smaller university websites hosted by OUCS or via NSMS the outlook is much better, the technical and political challenges are much smaller and we’d like to get as many sites on a AAAA for the date as possible. The systems development team in OUCS have already started deploying sites (such as this blog) with a AAAA.
  • As our first test unit the Maths Institute already has IPv6 connectivity and I’ll be trying to assist them to get their websites IPv6 enabled (if they need my help of course; they might not).
  • For units themselves: (if you aren’t from the university it may help to first explain that the networks team doesn’t supply networking to the end user; we supply networking to the ‘front door’ of each department/college/unit, and each unit has its own politically separate IT staff who maintain it)
  1. For IPv6 connectivity look at the checklist, then get in contact when ready. If in doubt you can phone me.
  2. You can start today – when someone asks how your IPv6 deployment preparation is going, don’t say that you can’t do anything because OUCS haven’t yet given you IPv6 connectivity. Do an audit of switch hardware, check your firewall’s IPv6 support, make a list of the services you run, plan how you will lay out your network (these tasks may take months whilst doing your normal duties, please start now).
  3. Please listen to the technical advice given and remain professional. 128-bit numbers are long and no one expects you to be perfect: humans make mistakes. We don’t mind mistakes and the move to IPv6 is tricky; we’ll assist and, providing you don’t expect us to configure your hardware for you, we’ll give advice when asked. As time allows we do go out of our way for approachable IT staff, but please don’t refuse to listen to the advice given.

What about the Networks Team?

You might remember from previous posts that our three main issues were/are:

  1. The firewall: It’s always dangerous to suggest dates in a blog but the IPv6 firewall should be replaced with something more sturdy in late February. The replacement should be quite straightforward and should be transparent to most users (we’ll see how it goes, but at worst IRC server users might notice a disconnection at some dark hour of the morning).
  2. The IPAM (DNS and DHCP management for units): We had a lot of discussions with the vendor late last year about our replacement system; publicly, I’m expecting it to be early May before I can state anything. In the meantime our existing system requires entries to be made to the forward and reverse zones by hand (the example below this list gives a flavour). This isn’t so bad for individual website entries, so for the June 8th date it should be survivable.
  3. Security blocking: We’ve some code to re-write, I think we can have it done by June.
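
To give a flavour of what maintaining those entries by hand means, here’s a sketch of the forward and reverse records for a single host, borrowing the irc.ox.ac.uk address that appears elsewhere on this blog (illustrative only, not a statement of our actual zone contents):

; forward (ox.ac.uk) zone:
irc.ox.ac.uk.   IN AAAA 2001:630:440:129::407

; reverse (ip6.arpa) zone - every nibble of the address, reversed:
7.0.4.0.0.0.0.0.0.0.0.0.0.0.0.0.9.2.1.0.0.4.4.0.0.3.6.0.1.0.0.2.ip6.arpa.   IN PTR irc.ox.ac.uk.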

With the delay in the IPAM I’m thinking about sacrificing some time to modify one of the shorter scripts that pushes out configurations to the existing DNS infrastructure. The current script can’t deal with both an IPv4 and an IPv6 address being pushed to a host’s DNS service configuration, although the hosts themselves (resolver and authoritative DNS) have working IPv6 connectivity. It might be that on the 8th of June we can get the authoritative and resolver DNS systems to have IPv6 service addresses.
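
For the curious, the change amounts to getting the script to emit both address families in each host’s named.conf, along these lines (a sketch; the IPv6 address here is illustrative, not a real service address):

options {
    listen-on port 53 { 163.1.2.1; };                  // existing IPv4 service address
    listen-on-v6 port 53 { 2001:630:440:129::d1; };    // what the modified script would add (illustrative)
};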

I’ll need to consult with my teammates, but it might be that with reasonably little pain we can get eduroam and/or the VPN network to have IPv6 client connectivity, since they are self-contained networks for which we administer the service.

I should stop now and make no more promises, but I’m glad there’s a firm date and I’m looking forward to this.


Early 2011 Work

So an overview of my own individual tasks for early 2011 looks like:

Replacing the DHCP servers for the university

This was scheduled for last year, but the sequence of events needed to free up hardware for the service to move to has been more awkward than expected, so instead we’re going to purchase two low-end fault-tolerant servers. Hopefully the order, delivery and base install will take place in January, with testing at the end of the month. Actual deployment may take place either at the immediate start of a Tuesday standard Janet at-risk period (e.g. 7am) or on a weekend; we’ll decide closer to the time and make an announcement to the IT officers beforehand.

This is important because it’s already behind schedule and the hardware it’s replacing is out of warranty. Essentially, if a DHCP server’s hardware failed now, although the service would fail over to the other server, we’d be redeploying the nearest development server rather quickly as the replacement. The new hardware will be under a 5-year warranty, which should last well past the point when the system is either replaced by an integrated IPAM (DNS/DHCP) system or virtualised.

On the virtualisation note, and before there are any comments of ‘why don’t you put this service on a virtual host?’: I believe there’s a university virtualisation service in the works from other sections but I don’t know enough detail to talk about it. NSMS currently have a smaller service, but we’ll be keeping the DHCP changeover simple for now, given the high number of people who would be affected if there were an issue with moving the service to an offering our own team isn’t familiar with. We do virtualise the majority of our development hosts, but our own team doesn’t currently have a public service virtualised – we will in the future, probably as the warranty runs out on more minor services.

ASA IPv6 firewall

The second project in January is to set up and test the intended IPv6 firewall configuration on an ASA 5510 platform that’s currently available for testing here. The decision on purchasing isn’t until the end of the month; if it goes ahead I’d expect deployment near the end of February.

This is important in order to replace the temporary IPv6 firewall we currently have; it also means we can get on with deploying websites in OUCS onto IPv6 (e.g. with a AAAA) and (hopefully) websites in Maths. The Mathematical Institute has capable IT staff of its own, but I’m keen on seeing some things deployed before others so have offered to assist.

LMS

At the start of February I’d like to spend some time trialling Cisco LMS and, if this goes well, perhaps the Cisco Security Manager. Specifically, instead of developing our own in-house scripts to manage IPv6 network restrictions (via a Perl Expect module and similar), we might get better visibility and fewer maintenance issues with the Cisco tools.

We also have our own in house inventory and network monitoring systems, with various overlapping reporting – I’d like to check that we aren’t needlessly making our lives hard.

Aside from saving on maintenance and misunderstandings, an important aspect I’m interested in is problem visibility in a disaster. Specifically, if something that should never happen does, I’d like a magical arrow that points to the exact issue. From experience I think we currently have the information needed, but it takes some time to work out where to dig it out from and what to compare it against; the integration and usability are low.

DNS warmspares

February should also see the deployment of two DNS warmspare hosts, to replace a host lost to hardware failure. These will be the old DHCP servers, since the hardware need not be in warranty. This will start as soon as the hardware is available and the new DHCP service has been running a couple of days.

Other

I’ve planned beyond this, however with an upcoming change in management it could well be that my priorities change. Well-laid plans are also vulnerable to unrelated work suddenly cropping up halfway through the timeframe with a high technical or political priority, requiring all other projects to be postponed.

We’ll also be continuing normal duties, so for 2 days a week I’m on the support queue for our team.

There’s also been progress on the new IPAM system over December however I’m not keen on making promises with regards to this project. We’re hoping for a significant development from the vendor involved in April.


BBC iPlayer and the University VPN

[edit] Since writing this, an iPlayer developer has passed on via informal channels that they’re using the Quova geolocation service. In this database part of our VPN address range was designated an ‘international proxy’ – whether or not that’s a fair description for restricted-access VPN clients, I simply wanted a decision, so I’ve contacted Quova stating that users in that range are told their network will behave as if in Oxford. They appear to accept this, so as of ~15th of January this issue may be fixed, unless there’s also an additional system the BBC use to override this.

Note that I’m not arguing whether the VPN range should or shouldn’t be able to use the iPlayer; I simply wanted a response from the BBC via their iPlayer support channel so that I could give users a definitive answer about iPlayer access without having to vaguely reverse engineer the way iPlayer works. There are more important things on the backbone network for our team to be working on than iPlayer access.

The BBC appear to have blocked the university VPN address range from iPlayer: you will get a message stating that content is not available for your region, no matter where you are when connected to the VPN.

This was originally reported to our team in August 2010. We were asked to look at an issue with the BBC iPlayer service from central VPN service connections. The original user reports made quite a few claims but I’ll stick to what we were able to verify, since it seems there were some changes and maintenance at the iPlayer end affecting results at the time of the initial reports.

The initial reports suggested that all users of the VPN were affected but we were unable to reproduce the issue – it then transpired that the requestor was a university member who was abroad, and that the issue existed only for them. At this point our interest waned and it was pointed out to the user that BBC policy is not to provide content to overseas users. It’s not an especially good/sane (legal?) use of our resources to try to get around the BBC content restrictions, so we were not interested. If this had been the end of it, that would have been fine.

A few more queries followed, however, with the occasional suggestion by requestors that it must be something ‘special’ about the VPN service we provide that was causing the problem. So I contacted the BBC to ask what their position was and for clarification on the technicalities, to ensure that what was seen was expected and not in fact the symptom of a technical issue rather than the assumed intended restriction. As part of this request we provided our VPN address range, access to which is restricted to university members, as technical background. What we were hoping for was a BBC statement which we’d pass on to the users (I seem to recall that the BBC policy documentation at the time wasn’t quite as well explained as the current BBC link I’ve given in the paragraph above, but I could be wrong). In hindsight this was not such a good idea.

It now appears that, although there was no email response, the VPN client address range we provided was added by the BBC to some form of iPlayer blacklist – but note that this is an educated guess based on the evidence, as we have no response from the BBC nor do we have visibility into the access control mechanism of the BBC iPlayer. All users of the university VPN service, whether inside or outside the UK, are now denied content by the iPlayer application with a message that content is not available in their region.

The knock on effect is that this also stops access to iPlayer for university members using the OWL campus wireless service.

The OWL campus wireless service (which pre-dates WPA technology in consumer devices) uses an unauthenticated/unencrypted network that (to simplify) has destination access restricted to the university VPN service and hence clients make encrypted VPN connections across the unencrypted network to the VPN server in order to provide ‘normal’ and secure network access. The eduroam wireless service also offered in the university is based on WPA enterprise and so needs no VPN connection, leaving it unaffected.

  1. Hence if you’re on campus using the wireless services, connect to eduroam rather than OWL wherever possible (there are also other reasons to prefer the WPA-based service, but I don’t want to drift off topic). If your site only offers OWL, ask your local IT support if/when they are hoping to deploy eduroam – they will contact our team when they need assistance with doing this. The BBC have affected the service; it’s not something we have implemented, nor can we affect it, so complaints should be directed to the BBC (feel free to link to this post).
  2. If you are outside the UK using the VPN – the BBC policy is not to provide service to you, write to the BBC if this annoys you.
  3. If you are inside the UK using our VPN for internet connectivity but not on wireless (which is an odd situation), then you’ll need to find a different mechanism for internet connectivity that doesn’t use our VPN.

My only comment to the BBC would be that the restriction initially in place worked fine – users abroad couldn’t access iPlayer – but the new restriction is over the top.


AOL mail

Just a minor post about an issue some people might have seen (things are fairly quiet in the run-up to Christmas).

If you had an issue delivering mail to or from an aol.com address today this post explains why. I don’t currently see anything on AOL’s postmaster blog with regards to the outage.

At approximately 07:00 GMT today AOL appear to have removed the MX record for aol.com.

Here we look up their nameservers – the servers that hold all the DNS records for their domains:

$ dig NS aol.com +short
dns-02.ns.aol.com.
dns-01.ns.aol.com.
dns-06.ns.aol.com.
dns-07.ns.aol.com.

So (during the outage) let’s ask one of those DNS servers where the mailserver for the domain aol.com is – we’re querying their nameserver directly:

$ dig MX aol.com @dns-02.ns.aol.com.

; <<>> DiG 9.7.2-P3-RedHat-9.7.2-1.P3.fc13 <<>> MX aol.com @dns-02.ns.aol.com.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48542
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;aol.com. IN MX

;; AUTHORITY SECTION:
aol.com. 300 IN SOA dns-02.ns.aol.com. hostmaster.aol.net. 304268691 43200 60 1209600 300

;; Query time: 115 msec
;; SERVER: 205.188.157.232#53(205.188.157.232)
;; WHEN: Tue Dec 21 10:07:02 2010
;; MSG SIZE rcvd: 89

So in short there’s nothing – nowhere to deliver mail to – and so the domain will not handle mail. This means mail to @aol.com addresses was returned as unroutable, and mail from that domain was rejected by basic sender verification (e.g. does the domain you’re claiming to be from actually exist as a mail domain?).
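
As a rough illustration of the sort of check involved (a sketch only, not our actual relay configuration – a real check would typically also consider falling back to an A record):

# does the sender's claimed domain have anywhere to route mail to?
$ test -n "$(dig MX aol.com +short)" || echo "no MX for aol.com"
no MX for aol.com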

It appears to have been fixed at 10:30 GMT, the mailservers are now listed:

$ dig MX aol.com @dns-02.ns.aol.com. +short
0 mailin-04.mx.aol.com.
0 mailin-01.mx.aol.com.
0 mailin-02.mx.aol.com.
0 mailin-03.mx.aol.com.


Webcache IPv6 enabled

By now people are probably getting bored of hearing about the webcache so you’ll be glad to know that this should be the last post on the subject, the webcache having been successfully enabled for IPv6 this morning.

Note that the following is a “warts and all” description of the deployment. I was pressed for time and it could have gone better, but this is not a best practice guide; it’s an account of what a similar system administrator might face, in case the experience helps others.

Pre Deployment

It didn’t go entirely to plan. I worked through constructing and testing a formal configuration checklist for the service. Yesterday I made an announcement to our university IT support staff that there would be service downtime for the host (there would at least be a reboot to apply the IPv6 connection-tracking-enabled kernel as discussed in the previous post here) and then a little later (cue various interruptions and help tickets with exclamation marks and ‘ASAP’ in them) I discovered that I had no method of applying the preferred_lft 0 settings discussed here previously, which are required to make the IPv6 interfaces use the expected source addresses.

Under Debian it had been a simple case of adding a pre-up command so that when a sub-interface was brought up the preferred_lft 0 setting would be applied, essentially telling the host to prefer another interface for outgoing traffic but otherwise use the interface as normal. Under Centos I couldn’t manually issue a command to alter it (as far as I can see ‘ip add…’ rejected the preferred_lft option as junk and ‘ip change’ was not supported) and needed to update the iproute package. This was fairly painless (download the latest source, butcher a copy of the previous package’s .spec file and then rpmbuild the package on our development host) but it is yet another custom package to maintain – I’ll be glad to redeploy when Centos 6 is released and so have a dedicated package maintainer rather than the extra work ourselves.
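
For anyone in the same position, the workaround I was after boils down to something like this in /sbin/ifup-local, which the Centos network scripts call after bringing an interface up (a sketch with an illustrative address; it assumes an iproute new enough to understand ‘ip addr change’):

#!/bin/sh
# mark the service address as deprecated so outgoing traffic prefers the management address
if [ "$1" = "eth0" ]; then
    ip -6 addr change 2001:630:440:129::d1/64 dev eth0 preferred_lft 0   # illustrative service address
fi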

As I’d already announced the service would be going down I either had to stay late or lose a little face and do the work the next week (delaying the webcache IPv6 enabling yet another week). It’s important to have a work/life balance but this was one occasion I decided to stay late.

Aspects of the (re)deployment for IPv6 were

  • ip6tables configuration
  • Squid reconfiguration (ipv6 acls etc)
  • Apache reconfiguration (it serves a .pac file to some clients used to supply webcache information)
  • making tests to check the service configuration after change
  • install the new kernel, mcelog and iproute packages
  • interface configuration

Prior to this work the service didn’t have a formal checklist. I constructed one in our team’s documentation and wrote a script to conduct the tests in succession (currently only 6 tests for the main functionality, but I’ll add more).
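
Conceptually the script is just a handful of checks of this shape (a sketch only; the proxy port and test URLs here are illustrative rather than our real configuration):

#!/bin/sh
# crude post-change checks for the webcache
PROXY=wwwcache.ox.ac.uk:8080   # port assumed for illustration

check() {
    desc=$1; shift
    code=$("$@")
    [ "$code" = "200" ] && echo "PASS: $desc" || echo "FAIL: $desc (got $code)"
}

check "proxy over IPv4, IPv4 site"      curl -4 -x "$PROXY" -s -o /dev/null -w '%{http_code}' http://www.ox.ac.uk/
check "proxy over IPv6, IPv4 site"      curl -6 -x "$PROXY" -s -o /dev/null -w '%{http_code}' http://www.ox.ac.uk/
check "proxy over IPv4, IPv6-only site" curl -4 -x "$PROXY" -s -o /dev/null -w '%{http_code}' http://ipv6.google.com/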

I was able to test the Squid and Apache configurations in advance on the hosts with static commands (e.g. squid -k parse -f /etc/squid/squid.conf) but (due to time) there is currently no identical webcache test host so there is room for improvement.

Deployment Day

The testing work paid off: the existing IPv4 service was down for about two and a half minutes shortly after 7am, with a minor outage of about the same duration a little later. The full IPv6 service was up before 8am.

There were a few hiccups:

  • The workaround to apply preferred_lft 0 to IPv6 sub-interfaces didn’t work; I’ve applied the setting manually for now and will make a ticket in our team’s RT ticket system.
  • Sometimes really simple issues slip through: due to an oversight the IPv6 firewall wasn’t set to apply on boot, so I applied it and fixed the boot commands.
  • One of my squid.conf IPv6 ACLs was syntactically valid but wrong for service operation (a sketch of the sort of ACL involved follows below)
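
For reference, the ACLs in question are nothing exotic – something along these lines, using the University’s 2001:630:440::/44 allocation (a sketch, not our actual squid.conf, and it assumes a Squid version with IPv6 support, i.e. 3.1 or later):

# allow clients from the University IPv6 range
acl oxford_ipv6 src 2001:630:440::/44
http_access allow oxford_ipv6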

The script for service testing was useful and speeded up testing greatly. I’ll aim to incorporate the tests into our service monitoring software.

End Result and Service Behaviour

The important results of this work are:

  • The service is now reachable via IPv6
  • It’s now possible to use wwwcache.ox.ac.uk from a University of Oxford host to visit an IPv6 only website even if your host is IPv4 only.
  • The opposite is also true. If for some reason a host is IPv6 only, the webcache can be used to visit IPv4 only websites.

Someone queried how the service would behave if the destination is available via both IPv4 and IPv6 (or more accurately, has both A and AAAA DNS records); the answer is that IPv6 will be attempted first. This is typically the default behaviour for modern operating systems and, while it’s possible to alter it, we will be leaving it as expected.

Related to this, those with a good memory will recall that I stated in a previous post that we wouldn’t add a AAAA to an existing service name, but would give the IPv6 service a slightly different name, e.g. ntp6.oucs.ox.ac.uk instead of ntp.oucs.ox.ac.uk for the stratum 3 IPv6 provision. We also wanted to add IPv6 cautiously, enabling a service for IPv6 but making IPv4 the default where possible. For this service we’ve seemingly done an about-face and added AAAA records for the same IPv4 service name and associated interfaces. My reasoning is subjective but based on the following:

  • If a user has an issue contacting wwwcache.ox.ac.uk we’re likely to get a support ticket complaining and so be aware of the issue quickly, compared to a user’s computer not accessing an NTP service correctly, in which case (in my experience) the machine’s clock silently drifts out over the course of weeks or months and, when it is noticed, the user wrongly assumes that the entire university NTP service must be out by the same amount of time as their local clock.
  • I don’t want to use end users as experiment subjects as such, however wwwcache.ox.ac.uk is a less used system than – for example – the main university mail relay. Hence it’s a more suitable place to use a AAAA for the first time on a main service.
  • We’re getting a bit more confident with the IPv6 deployment and as a result changing some previous opinions.

Remember that formal tests were run on the service; the end users are not being used as the test. However, I am aware of Google stating that roughly 1 in 1000 IPv6 users had misconfigured connectivity, so I’m still keeping an eye out for reports that might relate to odd behaviour in a particular device or network situation.

The service is lightly used and, due to a funding decision (remembering the current UK academic budget cuts), it is not fault tolerant. That is, the host is in warranty, has dual power supplies and RAID (and is powerful), but there is only one host.

Performance, Ethics and Privacy

In terms of performance the host has 8GB of RAM; before the work a quick check revealed 7.5GB was in use (caching via squid and the operating system itself), so the service is making good use of the hardware. The CPU is the low-energy version, which is powerful enough for the task, and the disks are RAID1 (no, RAID5 would not be a good idea with squid). I believe I’ve covered in a previous post what we would have purchased had there been more budget, so I won’t dwell on it further.

In terms of ethics, the networks team and security team have access to the logs, which are preserved for 90 days but handled (without getting into an entire post on the subject) within the same ethical and conduct rules as the mail relay logs. Specifically, if a person were to request logs for disciplinary procedures (or to see how hard coworker X is working) they are directed to the University Proctors’ office, who would scrutinise the request. Most frivolous requesters, including (on occasion) loud, angry and forceful managers demanding access or metadata about an account, give up at this point. I can’t speak for the security team, but in 3 years I’ve only had 2 queries from the Proctors’ office relating to the mail logs and in both cases this was with regard to an external [ab]user sending unwanted mail into the university. This whole subject area might deserve a page on the main OUCS site, but in short I use the webcache myself and consider it private. We supply logs to the user for connectivity issues and have to be careful of fuzzy areas when troubleshooting with unit IT support staff on behalf of a user, but personal dips into the logs are gross misconduct. We process the logs for tasks relating to the service: for example we might make summaries from the logs (“we had X users per day for this service in February”) or process them for troubleshooting (“the summary shows one host has made 38 million queries in one day, all the other hosts are less than 10k queries, I suspect something is stuck in a loop.”).

There’s no pornography, censorship or similar filtering on the webcache; people do research in areas that cover this as part of the university, and frankly there’s nothing to be gained from filtering it. If there is a social problem in a unit with an employee viewing pornography (and so generating a hostile working environment for other employees) then it is best dealt with via the local personnel/HR/management as a social/disciplinary issue, not a technical one – no block put in place on the webcache will cure the employee of inappropriate behaviour. On a distantly related note, we haven’t been asked by JANET to implement the IWF filter list, and I have strong opinions on the uselessness of that list. I don’t think I’m giving away any of the security team’s secrets if I reveal they have fewer than 10 regular expression blocks in place on the webcache, which target specific virus executables and appear to have been added almost a decade ago – these won’t interfere with normal browsing (unless you really need to run a Visual Basic script called “AnnaKournikova.jpg.vbs” – remember that?). There are no other filters. Network access to the webcache is restricted to university IP address ranges.

This is an old service, with some of the retained configuration referring to issues raised 10 years ago. I believe all of the above is correct, but if you think I’ve made an error please raise it with me (either by email to our group or in the comments here) and assume the error is the result of inheriting a service that’s over a decade old rather than deliberate.


Kernel for the webcache

Mathematical Institute

The initial switched /64 connection to the Mathematical Institute was converted to a routed firewall connection this week and as a result I’ve routed the entire /56 that was set aside for them. I think we’ll have to require units to have a routing firewall for a /56 and limit non-routed units to a maximum of 3 /64s until they have a firewall in place, otherwise we’ll have too much configuration to do with so many individual /64s routed on the backbone. With the work complete my contact in the Institute has gone on holiday for a fortnight, which is perfect as I’ve some catching up to do in order to improve their service. The Institute is the first unit in the university to receive a /56 IPv6 connection.

OUCS offices

We deployed IPv6 without router advertisements, set up a single host that was used for normal purposes (generating network traffic) and then completed a security response to a pseudo-report of the host being compromised. The test highlighted some behavioural differences between our response tools for IPv4 and those for IPv6, which caused minor confusion (the differences were already documented but we could improve the situation). We also completed our migration of Netdisco to the latest CVS source, which means IPv6 tracking to a local switchport is working without issue; that is, we can track a compromised host to an office wallport inside our department building. Outside OUCS we can only track a host to the final connection to the college/department; from there the tracking must be internal and is handled by the independent IT staff at the unit – Netdisco is a perfect solution for this if you’re not using it already.

I’m going to talk to the security team again and then should be able to give out some addresses to internal teams in order to encourage some interest in IPv6 enabling other services.

Webcache/Centos Kernel

In order to add IPv6 connection tracking support I’ve built a new kernel for our Centos hosts, using the latest stable version at www.kernel.org. I believe I mentioned in a previous post that IPv6 connection tracking wasn’t present on the stock 2.6.18 kernel (any kernel prior to 2.6.20 I believe) and I’m somewhat reluctant to use more complex rules or wait for RHEL/Centos 6 so a new kernel seems a reasonable solution for us as long as it’s only in place until Centos 6 is available.

The main changes I made over the stock kernel config are

  • General setup ->
    • “enable deprecated sysfs features to support old userspace tools” -> I understand that you must enable this in order for the new kernel to boot on the current Centos
  • Networking Support ->
    • Network Options
      • enable “the IPv6 protocol” and enable “source address based routing” in the submenu (I would make all the others in the submenu available at least as modules)
      • enable “Network packet filtering framework (netfilter)”
        • “IPv6 Netfilter configuration”
          • enable “IPv6 connection tracking support”
          • enable “IP6 tables support”
          • I specifically selected “packet filtering” “reject” and “log”, and made all others available as modules

There’s quite a few other changes made (if you’re going to the trouble of packaging your own kernel you might as well tailor it to your needs) but they aren’t relevant to the IPv6 connection tracking.
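
For reference, the menu options above correspond roughly to these .config entries (a sketch – option names can shift between kernel versions, so check your own menuconfig):

CONFIG_SYSFS_DEPRECATED_V2=y    # needed for the new kernel to boot on current Centos userspace
CONFIG_IPV6=y                   # "The IPv6 protocol"
CONFIG_IPV6_SUBTREES=y          # "source address based routing"
CONFIG_NETFILTER=y
CONFIG_NF_CONNTRACK=y
CONFIG_NF_CONNTRACK_IPV6=y      # "IPv6 connection tracking support"
CONFIG_IP6_NF_IPTABLES=y        # "IP6 tables support"
CONFIG_IP6_NF_FILTER=y          # packet filtering
CONFIG_IP6_NF_TARGET_REJECT=y
CONFIG_IP6_NF_TARGET_LOG=y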

I didn’t quite adhere to the official Centos kernel rebuilding guide as I found it rather awkward to follow. From a documentation and testing point of view I think it’s always a good idea to test your instructions on a clean machine and on a person who knows little about the subject area (loved ones, or a coworker from another section who happens to be passing through…) in case you’re making subconscious omissions of steps that are obvious to yourself. You may need to bribe the person for their time if you find your subject area is utterly dull to the majority of the population.

The other oddity is that the newer kernel uses more bits for the storage of MCE logs, so the Centos mcelog package needs updating, otherwise it will complain each hour. I used the spec file from the existing .src.rpm and the more recent source to build a new package. Note that stock 64-bit hosts will have mcelog but 32-bit hosts will not; I believe the new kernel requires mcelog for both architectures.

I’ve tested it on our development host, and I’ll be making formal plans to deploy it on the university webcache on Tuesday, which will make the service dual stacked. Testing the service in advance is problematic so I’ll need to formalise and scrutinise the deployment steps.

IPv6 University Firewall

We want to replace our current development IPv6 firewall with a production service. Of interest to us here is that Cisco released the 5585-X ASAs this last week. These run the same software and commands as the ASA 5510 and upwards (the 5505 differs in some aspects from the rest of the range, I believe) but outperform our FWSM modules for IPv6. So we could deploy a couple of ASA 5500 series devices, develop any in-house scripts/software needed to work with them and then move to the 5585-X when the main university IPv4 firewall service is due to be replaced in 2 years’ time.

My concern would be that we deploy a lower-capacity ASA, enable a service such as the HFS backup service for IPv6, then watch as hordes of workstations without a native IPv6 connection use the JANET 6to4 service, sending encapsulated traffic out onto JANET and resulting in high IPv6 traffic in through the new firewall – an internet connection that was barely used becomes swamped overnight, resulting in user complaints. I think as long as we’re careful and monitor the volumes of traffic we should be able to predict and respond without issue, but it’s one reason why better performance monitoring tools for the firewall might be of interest.

I’ve had an initial discussion with Cisco about the above and have spent some time in the past week reading through the ASA manuals and listening to the Cisco product podcasts (whilst doing other tasks). I don’t use iTunes much but I believe from memory it was simple to select the podcasts in the store section and search for ‘Cisco’, subscribe to the channels shown and then ‘show previous episodes’ for each channel and ‘get all’.

Next week

Next week I’ll concentrate on the Webcache IPv6 deployment, which will be early Tuesday during the JANET at risk period (e.g. 7am).

You may be aware that we’ve this week lost a prized team member (Oliver Gorwits) to another employer, have also recently re-employed for the wireless position and are seeing the retirement of our senior manager, so I don’t want to make too many other predictions about next week – I enjoy working on the IPv6 deployment issues but we have many other projects and support calls.

If I’m lucky I might have a deployment plan for an ASA solution to the development firewall at the end of next week.


IPv6 to client networks, the raw first attempt

All New Authoritative and Resolver DNS servers

The last DNS resolver and all of the DNS authoritative servers have now been migrated to new hardware, the resolver having been migrated at approximately 6:10am this morning. There’s other work I’d like to do on the DNS, such as internally documenting any differences between our configuration and the Team Cymru Secure Bind template, or adjusting our configuration to match – at first glance perhaps specifically with regard to some of the more minor types of logging (lame servers etc.) – but this is a minor secondary project that will have to wait (I keep a list of these for when a miracle occurs and I have spare time).

The question of how to create IPv6 DNS records for clients was a concern, but interestingly a draft RFC was published in September that covers exactly this topic. There’s also work to be done with regards to IPv6 enabling the authoritative servers and resolvers, however this week I’ll talk about something a bit more exciting, namely our first IPv6 user networks for testing.

IPv6 on the OUCS Offices Network

The political process for enabling IPv6 on the OUCS Offices network is coming to an end. The ICTST team provide the unit-level IT support for OUCS, and I’ve been seeking approval for a limited IPv6 deployment on the OUCS offices network. My hope is that enabling this early on will encourage IPv6 testing among some of the core services teams – I’ve already had queries from NSMS and the Systems Development team. It sounds as if this will be approved, so I’ve also been checking with the security team to ensure they’re happy with the deployment going ahead, since on the local network we have to be able to track misuse down to a host/person just as any other unit would.

I’m not sure full blown IPv6 on the network would cause much issue in terms of application/service support since at least some of the Windows workstations appear to be already using the JANET 6to4 service to connect to IPv6 enabled services without either our assistance or the users knowledge. While there are firewalling considerations to be careful of, this connectivity is something I’m not going to complain about since it’s quite useful (in terms of testing/verifying) to have IPv6 based connections occurring from our users to IPv6 enabled services.

The plan is that, if ICTST give approval, we will first enable IPv6 for only one machine (the network will not have router advertisements on it initially, so it will be statically configured) and then generate some standard harmless network traffic from it. At this point we stage a security response in which the machine is said to have been compromised, and the networks and security teams check our normal logging and security tools to ensure we can track the host down and take action to block it. This should uncover any quirks or deficiencies in our logging or toolsets. To make the test valid the machine will be single stack (IPv6 only), since it’s the IPv6 logging and tools we need to check.

In preparation we’ve already upgraded Netdisco to the latest CVS version which can handle IPv6. IPv6 name resolution doesn’t appear to work in the latest CVS but this doesn’t hinder us at all and looks fairly simple to fix. If I get a chance (another task for the list) I will investigate and may submit a minor patch for consideration.

IPv6 on the Mathematical Institute Network

I believe it’s now been two years since I approached the Mathematical Institute and asked if they’d be willing to take part in an IPv6 trial. They’ve been far too polite to point out the passage of time (I originally, optimistically, stated it would be months), but today we’ve supplied them with an actual native IPv6 connection for testing.

This is quite early for us in terms of the services we provide being IPv6 capable, so there are a fair few limitations; as a result this is clearly not a production deployment. However, I wanted to get at least one interested customer on the IPv6 service as soon as possible so that they can start bringing up issues that might be obvious to them but that we haven’t thought of – for instance due to local contact with common applications that might have odd quirks.

So the provided connection in brief:

  • Initially it’s just a /64 out of the eventual /56; it’s suitable for familiarisation with IPv6 until an IPv6 unit firewall is prepared
  • We don’t have a clean solution for DNS changes at the moment, this is being worked on
  • As mentioned last week, the university IPv6 firewall needs replacing with a production system
  • stateless DHCP and Router advertisements are provided
  • Advertised via the stateless DHCP, a single IPv6 DNS resolver is provided (later this will be replaced with the production service)

In security terms, and from our immediate team’s perspective, things are politically easier than if it were the OUCS offices network – if a compromised address can’t be tracked to a host/user by the Mathematical Institute, we’ll have to cut off the development IPv6 feed. This rather simple situation for us isn’t especially helpful for the local ITSS however, so I’ll prepare a suggested toolset for IPv6 adoption to assist. For instance the already mentioned Netdisco (CVS version) will provide the information required. Not everyone has the time to set up new services, so I’ll also check if perhaps NSMS could offer a local unit-controlled version of the toolset on their new ‘FiDo’ device, since it would have network presence in a unit as part of the wake-on-LAN facility/green IT. It might be too heavy a load for that device, but Netdisco can be split into parts (web interface, database, probe) so we’ll see.

Next months work

It’s probably time to redo my predicted IPv6-related work for the next month, since for one reason or another it’s drifting off course from last month’s prediction.

  • Subject to ICTST approval, deploy IPv6 to teams that request it for desktop systems in OUCS
  • Assist the Mathematical institute with their IPv6 connectivity
  • Deploy IPv6 on the new Networks Technology Group Service Network in order to enable more services (this is complex and may take most of the month)

And the 101 more minor tasks that don’t seem IPv6 related at first but are part of the critical path and are large tasks:

  • Finish migrating our network monitoring host (Nagios etc) to new hardware/network
  • Migrate our internal database server to new hardware
  • Come up with a solution for a production IPv6 capable university firewall for the interim period between now and the scheduled backbone upgrade in 2 years time

DNS resolvers and (unrelated) IPv6 progress

I thought I’d cover our IPv6/server replacement progress this week but also describe some IPv6 issues we bumped into in case it assists other IT Officers in the University.

DNS Resolvers

Firstly, we’ve replaced the second of the three DNS resolvers this morning, it seems with less than 30 seconds’ downtime for the individual resolver being replaced. The process (which happens about once every five years) is now more mature – the deployment/migration instructions I created for the first migration were tested again with the second deployment, with only two minor corrections. I’ve also created a formal test plan for pre- and post-migration, which I’ve applied – the previous migration had an odd logging issue that worked in testing but whose configuration was overwritten in production due to my own human error and had to be corrected. By formalising the testing process it should now be impossible for this to crop up again.

The load on the resolvers is quite low compared to what the hardware can cope with. Prior to the replacement hardware arriving I ran a script to show the top 10 hosts in the University making DNS queries (no other information, simply the number of queries per host, cropped at a limit of, say, X million queries per day). The top 5 were guaranteed to be misconfigurations; for example the top host, at 38.5 million queries a day, was asking hundreds of times a second, endlessly, for the same individual DNS record. I contacted the sysadmins for the five hosts involved, which reduced the queries per day by roughly 20%, but even with these hosts the query load would be manageable on lesser hardware. We’ve already used the lowest power consumption CPUs we can in the server range as part of the University’s energy initiative. Perhaps the next hardware refresh will see virtualisation of the service, however this year’s work is a simple warranty refresh and there are many other services our team would virtualise first, to ensure our chosen virtualisation environment was mature before the (high downtime impact) DNS service was migrated.
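
The script itself is little more than a pipeline over the BIND query logs, roughly of this shape (a sketch assuming the stock querylog line format, not our exact script):

# count queries per client for one day's querylog and show the top ten talkers
awk '$3 == "client" { split($4, a, "#"); print a[1] }' query.log | sort | uniq -c | sort -rn | head -10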

The low load means the speed of response in production can be considered simply a measure of what’s been cached. If the record being queried is in the cache then the response will be near-instant, with the only delays coming from lookups to external DNS servers; there’s no CPU load worth mentioning.

e.g. in this example, using dig, we get 12ms for the uncached query. In the examples that follow we then get 1 ms for the second (cached) query and 0ms for a host on the same network

$ dig www.bbc.co.uk @163.1.2.1
[...]
;; ANSWER SECTION:
www.bbc.co.uk.        161    IN    CNAME    www.bbc.net.uk.
www.bbc.net.uk.        161    IN    A    212.58.244.68
[...]
;; Query time: 12 msec

[…]

Now the query has been cached:

$ dig www.bbc.co.uk @163.1.2.1
[…]
;; Query time: 1 msec[…]

And even then, using linux.ox.ac.uk on the same network as the DNS server, it appears the 1ms delay might well be the network from my host to the server, or possibly my workstation, but for 1ms or less I’m not going to investigate too hard.

@raven:~$ dig www.bbc.co.uk @163.1.2.1
[…]
;; Query time: 0 msec
[…]

So how do our (caching resolver) nameservers compare to others? Well, using namebench this morning (the results can vary a little, but I’ll explain):

Our servers are closer to hosts on our network than external servers are, so they give the quickest responses (the ‘SYS-$address’ servers) in the first summary:

Fastest individual response (in milliseconds):
----------------------------------------------
SYS-129.67.1.1   ######### 1.86205
SYS-163.1.2.1    ######### 1.86491
SYS-129.67.1.180 ########## 1.94716
Hurricane Electr ################### 3.84903
Norton DNS US    #################### 4.03595
OpenDNS          ##################### 4.39906
Cable & Wireless ########################## 5.29504
DynGuide         ########################## 5.35607
BT-70 GB         ########################## 5.39613
Google Public DN ################################################## 10.44893
UltraDNS-2       ##################################################### 11.17086

In terms of our servers the above test is fairly typical/consistent – the University servers should always be the fastest in the list.

For the second test however, there are external DNS services which I’d suggest are receiving more queries and hence having a larger cache of queries at any point in time and so have a faster average response:

Mean response (in milliseconds):
--------------------------------
BT-70 GB         ############### 28.51
Google Public DN ##################### 39.61
OpenDNS          ############################ 54.62
Cable & Wireless ############################## 58.24
SYS-129.67.1.1   ################################### 68.50
SYS-163.1.2.1    #################################### 70.55
SYS-129.67.1.180 ###################################### 73.40
Norton DNS US    ######################################### 81.05
Hurricane Electr ########################################## 82.70
DynGuide         ################################################### 99.75
UltraDNS-2       ##################################################### 104.77

So based on the above we should all use the BT, Google or OpenDNS servers rather than the University DNS, right? Well, there are a couple of reasons why that might end up being slower. Firstly, using the default testing methodology of namebench, this latter test is more variable: running the test the next day/hour/minute might give quite different results, so don’t jump to conclusions. For example the above might suggest that the two new DNS servers we’ve deployed are somehow faster, whereas the (currently) older 129.67.1.180 is slower, but the next test suggests the opposite.

Mean response (in milliseconds):
--------------------------------
BT 41 GB         ############## 50.04
OpenDNS-2        ############## 50.21
SYS-129.67.1.180 ############### 50.70
OpenDNS          ################ 57.20
Google Public DN ################## 64.75
SYS-163.1.2.1    ################### 67.80
UltraDNS         #################### 70.04
Hurricane Electr ##################### 74.04
SYS-129.67.1.1   ##################### 75.29
DynGuide         ############################# 104.22
Fast GB          ##################################################### 191.12

Hence namebench is a handy test, but relax and don’t panic about the results (or, if you’re dishonest, simply run the test a number of times until you get the result you want to show your boss). Secondly, each of our resolvers also carries a local copy of the ox.ac.uk zone, so lookups for this will be near-instant (even if this weren’t the case, the authoritative servers for ox.ac.uk are also on the immediate network, so I’d expect them to be faster than an external lookup to a host that then contacts our authoritative servers, but this isn’t important). e.g.

$ dig www.oucs.ox.ac.uk @163.1.2.1
[…]
;; Query time: 1 msec

The last resolver will be replaced next week; it’s already prepared so I’ll finish testing it today. The authoritative server replacements will be quite painless and not as potentially exciting.

IPv6 Work

A few issues cropped up. I mention them here not because they aren’t known, but because if you’re an average everyday sysadmin (like I am – I’m no IPv6 expert, I just happen to be tasked with implementing it on our services) you might not be aware of them.

Firstly, for our IPv4 based servers we tend to have a management interface (that you might ssh to) separate from the host’s service addresses. We use virtual interfaces (eth0:1, eth0:2) to provide these in most cases. Under IPv6, as you may know, you don’t use virtual interfaces, so your configuration might look something like:

# don't use this example, read the explanation
iface eth0 inet6 static
 address [% ipv6_management_interface %]
 gateway [% ipv6_gateway %]
 netmask 64
 mtu 1280
 post-up /sbin/ifconfig eth0 inet6 add [% ipv6_service_X  %]/64
 [...more service addresses..]

That’s fine, except the traffic from the host (e.g. making database connections) may well come from any of the service addresses, which caused an issue when the webserver for IT Support Staff was IPv6 enabled. There are roughly 10 rules set out in an RFC to define how the source address should be chosen; this article is already rather long so I’m only discussing the solution (there’s a better article on the Linux implementation), but in brief here’s what I’ve done:

iface eth0 inet6 static
 address [% ipv6_management_interface %]
 gateway [% ipv6_gateway %]
 netmask 64
 mtu 1280

 pre-up ip -6 addr add [% ipv6_service_X  %]/64 dev eth0
 pre-up ip -6 addr change [% ipv6_service_X %]/64 dev eth0 preferred_lft 0

I found documentation on this general area and on preferred_lft to be a little sparse (but please correct me in the comments if you know of a link to an article with any real meat to it). Going by the relevant section of RFC 2461, it’s the length of time the prefix is valid for the purpose of on-link determination. We’ve set it to zero, which results in the address being marked as deprecated (the interface still works fine). We’ve also altered the interfaces’ defined order so the management interface is the last to be initialised.
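
You can see the effect with iproute; the output looks something like this (abridged, addresses illustrative), with the ‘deprecated’ flag being the interesting part:

$ ip -6 addr show dev eth0
    inet6 2001:630:440:129::d1/64 scope global deprecated
       valid_lft forever preferred_lft 0sec
    inet6 2001:630:440:129::d2/64 scope global
       valid_lft forever preferred_lft forever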

Of unrelated interest is the mtu specified which is explained in detail by Geoff Huston so read his notes for this.

The final host for our NTP round robin is a somewhat quirky machine which (for historical reasons that pre-date my joining the team) has an interface on both physical OUCS machine room networks, a practice we ask others not to adopt and don’t follow on any other service we have. Under IPv4 a single gateway is defined and the host responds correctly to a ping or other traffic on either interface. Under IPv6 the host receives traffic on the secondary interface and replies out of the primary, causing the packets to be dropped at the network’s border. Adding a second gateway to anywhere via the secondary connection fixes ICMPv6 so it behaves as expected, however ntpd replies out of the opposite interface to the one that received the query. From what I can find this appears to be a known problem with ntpd, and since the host is about to be migrated to a single-homed host I’ve simply removed the interface from the ntp6 round robin and will allow the host’s decommissioning to fix the issue – if I had more time I might investigate further, but we are short on time compared to outstanding tasks. Sadly this host is also a component of our Nagios monitoring, so we may have to postpone the IPv6 service monitoring and perhaps speed up this base host’s migration to new hardware/software.

Lastly, there was an issue last week on the (separate, IPv6 only) University firewall for (if I recall correctly) roughly 40 minutes, which was my own human error and embarrassing. Although the IPv6 deployment is currently considered a non-production service, the distinction weakens as we enable more production services on IPv6, accessible either externally or via tunnelled internal hosts. The issue was a configuration management and testing one, hampered by there currently being only one firewall device for the IPv6 connection (and no test equivalent). We discussed in our team meeting yesterday contacting our switch/router hardware vendor about a more mature (and upgradable) interim solution instead of waiting 2 years for the backbone upgrade project. We also need a solution for the firewall management itself – adding and removing webserver exemptions, for example. We have an existing system which manages the main firewall and IPv4 exemptions, but some work and research will be needed as the IPv6 exemptions are currently handled by hand and so don’t scale.

Progress

In short we’re about a week behind the original plan and I may insert an additional week’s breathing space into the schedule in order to address minor issues that have come up during the work. Specifically, looking at last week’s targets:

  • I stated I’d be building a Centos5 custom kernel, which is required for the webcache to be IPv6 enabled. I didn’t have time for this last week but aim to revisit it this week.
  • The final host was added to the ntp6 stratum3 and had an issue as discussed above, it was removed and the present ntp6.oucs.ox.ac.uk service will be regarded as complete for now
  • This also affects the Nagios network monitoring, which is hence delayed
  • The expected DNS resolver deployment has gone fine, the next one will be Tuesday 5th October, when all 3 resolvers are replaced they can be IPv6 enabled.
  • I haven’t replaced any Authoritative DNS servers yet but hope to replace at least one this week

In addition

  • I’m looking at how we’ll handle DNS for the units that want to take part in early adoption of IPv6, prior to our team having an IPv6-capable DNS management interface available for IT officers – we may use wildcards in the initial period (not generate statements, which are different)
  • I’ll try to get a public update on whether the Network Security team are ready for a unit to have IPv6 (if interested, note that our own team has basic local network sanity requirements for taking part in the early adoption testing)
  • As discussed we’ll be looking at making the IPv6 firewall a production quality service

Surprise! You have IPv6 connectivity!

I bet you didn’t think you had IPv6 connectivity yet (certainly not in any University department). After all, we’re still working through our plan to light up IPv6 services in the core. Well, news flash: if you’re running Windows 7 in the University it’s likely you can already access IPv6 services.

How so? By means of what we call tunnelling mechanisms. The Internet standards designers realised that during the transition to full IPv6 access there would be islands of IPv4-only and IPv6-only systems. The idea was to create some simple transitioning mechanisms by which systems in these islands could still talk to the rest of the Internet, be it IPv4 or IPv6.

Windows 7 ships with a number of tunnelling mechanisms enabled by default, and they pretty much all work in the same way. Your client wraps up the IPv6 packet inside an IPv4 packet and fires it off to a Relay Server out on the Internet somewhere. The Relay Server has both IPv4 and IPv6 connectivity so extracts the IPv6 content and sends it natively to the target server. The reply is somewhat similar, and usually some fancy IP addressing rules are used to allow traffic to find its way back to your client.

Note that depending on local department and college firewall configurations, some of the tunnelling mechanisms may not work.

Teredo is a common tunnelling mechanism which is used when the client is on an RFC1918 IPv4 address (sometimes called private addressing), often what you get behind NAT. 6to4 is another mechanism, this time one which requires a publicly routable IPv4 address (so is common at our institution as most clients have that configuration).

There are a few issues with tunnelling mechanisms, however:

  1. They don’t promote setting up native IPv6 connectivity
  2. There are security concerns because your traffic (potentially local traffic between two University systems) goes via a Relay Server on the Internet
  3. Performance may be poor because of the latency introduced by relaying out to the Internet or because the Relay Server is congested

The second point is particularly interesting to the Network Development and Network Security teams in OUCS. We’d much rather traffic local to the University didn’t relay via some untrusted server potentially on the other side of the world, and the tunnelling also makes it difficult for us to monitor for and catch malware-infected clients like we can do for IPv4.

Confider for example an IPv4 workstation in a department connecting to the IRC service irc.ox.ac.uk, which has now been IPv6 enabled. The Windows 7 client is on IPv4 and the server is on IPv6. Due to the default configuration of Windows and Microsoft’s interpretation of the RFCs a tunnelled IPv6 connection will be preferred to a native IPv4 connection (the IRC service still runs on IPv4, too!).

  1. Client is on IPv4 and asks a DNS server for the IP(s) of irc.ox.ac.uk
  2. DNS resolver replies with both 129.67.1.25 (A record) and 2001:630:440:129::407 (AAAA record used for IPv6 addresses)
  3. Windows 7 spots it’s on a publicly routable IPv4 address so starts up a 6to4 tunnel
  4. A connection to 2001:630:440:129::407 is made over the 6to4 tunnel, via a Relay Server on the Internet

By the way, one mitigation technique is to use an SSL connection to the IRC server, which we support 🙂

To protect collegiate University interests, OxCERT took the decision to place a block on Teredo traffic (udp/3544) at the University JANET-connection firewall. We’ve recently also been looking at the 6to4 mechanism — but then hit a stumbling block…

Ideally we’d like to run a local 6to4 relay so that University systems can still use this transition mechanism. I set one up on a development server at OUCS. This server needs, due to the way 6to4 works, to be able to send IPv6 traffic with source addresses in the IPv6 range 2002::/16. Fair enough, our backbone can deal with that.

However at our connection to JANET it turns out that the JANET-UK engineers implement some IP address filters (and quite rightly so). Only IPv6 addresses in Oxford’s 2001:630:440::/44 range are permitted onto JANET, and the 2002::/16 packets are dropped on the floor 🙁

The net effect is that sadly I can’t run a local 6to4 relay for us in Oxford, at least not without persuading JANET-UK to change their configuration policy for connected institutions. I’d quite like to do that: it seems a little unfortunate to assume we won’t want to use these transition mechanisms (and various other IPv6 goodies which use non site-specific addresses). However as I know what they’re likely to be doing in the configuration (uRPF) and I understand its limitations, I agree it would be more work for them, although not impossible by any means, to implement smarter access control lists.

Curiously, JANET-UK does not filter what we receive, only what we send. For instance we receive lots of spoofed packets from the Internet which are dropped by our JANET-connection backbone router. I suppose it’s their implementation of “be liberal in what you receive, conservative in what you send.”

The upshot of all this is that from a Windows 7 box on a publicly routed IPv4 address you probably can already ping6 ipv6.google.com, or visit http://www.kame.net/ and see a dancing turtle. I’m glad for that – IPv6 isn’t scary, and is here and alive and working well. However I’d much prefer us to be able to provide an improved and safer experience when you are inadvertently using IPv6, as will become all the more common in the future.

If we accept that use of 6to4 is inevitable then we (JANET-connected institutions) should also be able to run a local 6to4 relay for:

  1. Network security – to monitor for and catch malware-infected clients
  2. Network performance – to avoid “tromboning” local traffic via a remote (and untrusted/congested/etc) server on the Internet

Do any other institutions feel the same way? Are we barking up the wrong tree? I’d appreciate feedback in the comments section below. Regardless, good luck with your IPv6 transition — exciting times!

Posted in Backbone Network, IPv6 | 2 Comments

New DNS servers

Completed

The main progress on the IPv6 and server deployments this week:

  • This morning we’ve deployed a new DNS resolver to replace our oldest in service host. It was due to be done last week but I spent a little longer on testing. This has made the deployment a lot smoother than it would otherwise have been if rushed through last week. The DNS resolver service itself is made up of 3 servers with only one server being migrated. Due to this and because the changeover period was going to be short and the time for the change early morning a general announcement to IT staff was not made (there are other social reasons – too many announcements has the effect of crying wolf and then staff stop reading them). One address would have been unreachable for roughly 90 seconds during the changeover at ~7:14am (I think we can make it faster for the two that follow).
  • I’ve IPv6 enabled another couple of minor servers and as a result added another host to our IPv6 stratum 3 NTP round robin DNS record
  • I’ve done the majority of work in preparation for IPv6 enabling our webserver that IT Support Staff use for our web based network management tools, there isn’t time to make this live in the at-risk slot today but it’s a service we can deploy on another early morning this week without causing issues.

Webcache

The main setback has been the webcache. It’s based on a Centos 5 host, which is using the standard 2.6.18 series Linux kernel. The issue is that IPv6 connection tracking is broken on kernels prior to 2.6.20. Some Linux distributions can have slightly misleading kernel version numbers since the distribution maintainers backport certain select newer fixes and features to the older kernel version they ship, so I tested the IPv6 connection tracking on our centos 5 development host in case. Sadly testing confirmed there were issues.This has further implications since oxmail.ox.ac.uk, our ntp stratum 2 and smtp.ox.ac.uk are among our Redhat/Centos based services. It’s quite a shame that I didn’t pick this up when researching/auditing our services, and in hindsight I believe a second mistake was that our IPv6 test network was Debian only hence I didn’t spot it in testing. The Debian hosts had a kernel new enough not to suffer the issue (On Debian this is “etch and a half” kernel onwards).

What’s the fuss about? Connection tracking means that you can make a statement in your firewall rules along the lines of “allow in traffic from anyone who’s replying to my attempt at contacting them”. To simplify things: if you have broken connection tracking then your firewall rules either don’t work or you have to make them more primitive and yet more complex to configure. The possible solutions include running a custom kernel, which if possible I’d like to avoid on a production system since we’ll have to track kernel security announcements and do all the actions that would normally be done for you by a distribution package manager. We could also reinstall the system, perhaps to Debian, but this is a time consuming and service affecting solution. Redhat6/Centos6 should solve the issue but might not be released until January. I took a little look on a development host at putting the redhat6 beta 2 kernel on to Centos 5 but met a dependency chain that suggested this was not the way forward. Building complex firewalls based on connectionless rules fells like a step backwards, I’d like to avoid this.

I suspect we’ll use a custom kernel for a few months (e.g. our own package from the latest stable version at kernel.org), then make the webcache one of the first hosts upgraded when Centos6 is released. Once this is done we might take stock and think about the other Centos based services.

DNS

It was also planned to replace one authoritative and one resolver DNS service this week. The resolver is completed, as mentioned earlier, but the auth service hasn’t been done due to time constraints. The auth service typically has a much lighter load so it was more important to replace the resolver. We might replace the (one of three making up the service) auth server outside of the JANET at risk period since queries tend to come from other DNS servers which have better caching and failover behaviour than end user clients which use the resolvers, hence 60 seconds of one auth DNS server being down out of the three shouldn’t have a noticeable effect, especially if the work is done in the early hours.

It’s taken a fair time to deploy one resolver, but this has been due to integrating the older DNS configuration management system with our newer system used for our other hosts (we use cfengine). It’s not possible to totally turn off the old configuration system at this point, which is responsible for pushing new DNS configurations across the dns servers but now that the configuration templates and integration are done (and tested) the remaining DNS servers should be easy and quick to configure and hence faster to deploy. As far as IPv6 goes, once all the resolvers or all the authoritative DNS servers are using the new configuration system it’s a simple matter to enable it – I was able to complete the configuration templates and testing for this last week.

To Follow

The rest of this week will involve:

  • Enabling the webserver that IT Support Staff use for our web based network management tools to support access via IPv6, possibly tomorrow morning
  • Adding the final host to the ntp stratum 3 IPv6 round robin
  • Prepare the two new hosts that will replace our other two older DNS resolvers next Tuesday
  • Building and packaging a working Centos5 kernel from the latest stable version at kernel.org and testing for stability, then considering deployment on the webcache
  • (if time allows) replace the DNS auth servers one at a time
  • (if time allows) setting up monitoring of the IPv6 based services
Posted in IPv6 | Comments Off on New DNS servers