Machine Room IPv6 and checklist for early IPv6 adopters

We’ve made some progress: we’ve reached agreement with the security team that we can enable IPv6 on the OUCS machine room hardware starting 7th September, which means we will be able to start making our core network services IPv6 capable. We already have political approval for this, so this week we’re testing ahead of the change.

I’m currently working on a hardware refresh of the core DNS/DHCP service so there’s a good chance that when the new DNS servers come online they will be able to handle queries over IPv6. The other core network services might not take long to follow as we’ve already done a lot of the background work needed. By the time this is done we can start deploying IPv6 to units that are interested in early testing.

I did an IPv6 talk today for roughly 40-50 of those not able to make the recent talk at the IT support staff conference. I gave a handout to anyone interested in taking part in early testing, a copy of which is below. It’s important to remember that the university is made up of roughly 200 politically independent units, most of which have their own local IT support staff. There’s a lot of variety in skill sets/specialities and this is compounded when either managers or IT staff could be the audience so it’s sometimes tricky to get the technical level right in the handout.

So the following are the steps we are asking units to have taken before taking part as IPv6 early adopters/testers.

1. Are you running Spanning Tree?

Please deploy Spanning Tree Protocol (STP) on your switches. This prevents loops in your network, the symptoms of which can include any of:

  • excessive broadcast traffic

  • unexpected traffic arriving at interfaces

  • lower than expected network throughput

This is important because network loops are a fundamental issue in a complex network which STP resolves. Your switch vendor support channels may refuse to assist you with other issues you raise while looping is evident on your network.

Rapid Spanning Tree Protocol (RSTP) can be viewed as a newer implementation of STP. This is also fine and, if your switches support it, will give a much shorter ‘repair’ time (convergence) after repatching.

This isn’t our only switching advice but it’s the bare minimum you should have for a sane network.
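As a rough illustration of what STP achieves, the sketch below takes a small switch topology containing a loop and works out which links stay active and which would be blocked to leave a loop-free tree. It is not how the protocol itself works (real STP elects a root bridge and exchanges BPDUs between switches); it is only a breadth-first search over a graph. The switch names are made up for the example.

```python
from collections import deque


def spanning_tree(links, root):
    """Return (active, blocked) link sets for an undirected switch topology.

    links is a list of (switch_a, switch_b) pairs; root is the switch the
    tree is grown from. Each link is stored as a frozenset so that
    (a, b) and (b, a) count as the same cable.
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    visited = {root}
    active = set()
    queue = deque([root])
    while queue:
        switch = queue.popleft()
        # Sorted so the result is the same on every run.
        for peer in sorted(neighbours.get(switch, [])):
            if peer not in visited:
                visited.add(peer)
                active.add(frozenset((switch, peer)))
                queue.append(peer)

    # Any link not needed to reach a switch is redundant and would loop.
    blocked = {frozenset(link) for link in links} - active
    return active, blocked


# Hypothetical topology: core, sw1 and sw2 form a triangle (a loop).
links = [("core", "sw1"), ("core", "sw2"), ("sw1", "sw2"), ("sw2", "sw3")]
active, blocked = spanning_tree(links, "core")
print("active: ", sorted(sorted(link) for link in active))
print("blocked:", sorted(sorted(link) for link in blocked))
```

Here the sw1 to sw2 link is the one left out of the tree. With that link blocked there is exactly one path between any two switches, so broadcasts cannot circulate.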

Links:

http://packetlife.net/blog/2009/oct/15/stp-your-friend/

2. Do you have a helpdesk ticketing system?

You should be able to track a user’s open issues, track new issues, see what work has been done on troubleshooting an issue and who is dealing with it. If there is some dispute or recurrence and a user queries what happened to their previous issue, you can then look it up and see what the last correspondence was and why it was resolved.

If you don’t have one of these the ‘RT’ system is recommended and widely used.

Links:

http://bestpractical.com/rt/

[edit] I should have noted in the original handout that NSMS can offer RT as a hosted service to units; they have price details online.

3. Do you have a network diagram?

You should have or be able to produce a physical and logical network diagram for your network. It doesn’t have to have every workstation you own but you should be able to diagram your core infrastructure.

This will save you time when planning network changes and IPv6 deployment. It may also prevent design mistakes that are expensive in man-hours or equipment.

Ask another team member to independently check the diagram or separately produce their own to test it is correct.

4. Do you have a central documentation system for the IT team?

Can your IT staff record information about a service, server or similar that other staff can then look up? It could be a wiki or a content management system like Plone or even a shared folder full of MS Word documents (not a recommendation but better than nothing), as long as it’s actually usable. Does your team find it easy to use, are they using it?

This is important because it reduces duplication of effort. Odd corner cases and workarounds are troubleshot/discovered and documented once. From then on staff need only implement the solution that was previously found, no matter if the original staff member is ill, has left or worse.

If server/service X goes down then the documentation should let junior members of staff bring it back up.

5. Are your hosts time synchronised?

You can use the NTP servers that make up ntp.ox.ac.uk and ntp.oucs.ox.ac.uk. This means that when you are troubleshooting you can know for sure that a time period in the logs on one server matches the same period on another. At least have your servers and key network devices (firewalls etc) synchronised with a working NTP service.
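For a quick spot check of how far a host’s clock has drifted, something like the following minimal SNTP client can be used. This is a sketch under simplifying assumptions, not a replacement for a proper NTP daemon: it sends a single request, reads only the server’s transmit timestamp and compares it with the midpoint of the local send and receive times. The function names are my own.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_request():
    """Build a 48-byte SNTP client request (LI=0, version=4, mode=3)."""
    return b"\x23" + 47 * b"\x00"


def parse_transmit_time(packet):
    """Extract the server transmit timestamp as Unix time in seconds."""
    if len(packet) < 48:
        raise ValueError("NTP packet must be at least 48 bytes")
    # Bytes 40-47: 32-bit seconds and 32-bit fraction, network byte order.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def clock_offset(server, timeout=2.0):
    """Estimate (server time - local time) in seconds from one exchange."""
    # getaddrinfo lets the same code work over IPv4 or IPv6.
    family, socktype, proto, _, address = socket.getaddrinfo(
        server, 123, type=socket.SOCK_DGRAM
    )[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        sent = time.time()
        sock.sendto(build_request(), address)
        reply, _ = sock.recvfrom(512)
        received = time.time()
    # Compare against the midpoint to cancel out roughly half the round trip.
    return parse_transmit_time(reply) - (sent + received) / 2
```

Calling clock_offset("ntp.ox.ac.uk") on a host that can reach the service should return a value close to zero if the host is synchronised. A result of several seconds or more suggests the log timestamps on that host cannot be trusted when comparing against other machines.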

6. Are you on top of your IT in general?

1) Check your unit doesn’t have unhandled security blocks on hosts pre-dating 2010

[internal only link, sorry] https://networks.oucs.ox.ac.uk/webauth/blocks

If it does, chase each one up with security@oucs.ox.ac.uk , investigate and clear them up.

2) Check your unit’s university firewall webserver exemption details are correct and don’t include dead hosts, unexpected internal webservers, unexpected items of critical network infrastructure that happen to have a web interface, and similar.

[internal only link] https://networks.oucs.ox.ac.uk/webauth/firewall?showog=http_servers

To change entries email networks@oucs.ox.ac.uk

3) Look at your internal helpdesk queue: are you coping? If you can’t tell what is going on, or there are more than perhaps 30 open tickets per supporting staff member, then it might be time for some form of intervention.
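The helpdesk rule of thumb above is simple arithmetic and can be sketched as below. The threshold of 30 is the ‘perhaps 30’ figure from the handout, so treat it as a rough guide rather than a hard limit. The function names are made up for the example.

```python
def tickets_per_staff(open_tickets, staff_count):
    """Average number of open tickets per member of support staff."""
    if staff_count <= 0:
        raise ValueError("staff_count must be positive")
    return open_tickets / staff_count


def needs_intervention(open_tickets, staff_count, threshold=30):
    """True when the average load is above the rule-of-thumb threshold."""
    return tickets_per_staff(open_tickets, staff_count) > threshold


# Example: 100 open tickets shared between 3 staff is about 33 each.
print(needs_intervention(100, 3))
```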

The aim of the above wasn’t to put up barriers to adoption but to ensure that a basic level of network sanity is present in the networks taking part. I could have made the list longer (ideas welcome in the comments); however, some of the above are quite large projects to undertake for a small unit that currently lacks them and is facing a 10% budget cut plus an employment freeze. Aside from IPv6, perhaps our team could produce a best practice checklist that units can audit themselves against (keeping the results to themselves if they wish).

