Kerberos upgrades: kdc-admin

kdc-admin is the master server in our Kerberos realm – it’s the server that account changes happen on, and where password resets happen. The data is then propagated to the slave KDCs every 5 minutes. Upgrading this critical system will be the first stage in improving our Kerberos infrastructure.

Note that Kerberos-specific terms and acronyms should be covered in the first blog post in this series. If there’s anything that’s not explained there, please do leave a comment and I’ll try and explain things better.

It’s about time!

kdc-admin is rather overdue an upgrade for various reasons. Its operating system is not as supported as we would like, and to get various features we need we’ve had to backport a version of Kerberos from a newer version of Debian. Maintaining this is extra overhead we’d be quite happy to get rid of.  There are also newer versions of Kerberos available that would give us other features we would like (indeed, one of them we require for our DES deprecation plan). The current kdc-admin also lives in our Banbury Rd data centre, rather than the University’s shared data centre (an altogether nicer space for servers, not to mention the fact that the Banbury Rd data centre is likely to be replaced in the next 2 years or so).

In many ways, upgrading the Kerberos servers is fairly easy; kdc-admin is definitely the most complex.  This is because all the others (kdc0.ox.ac.uk, kdc1.ox.ac.uk, kdc2.ox.ac.uk, kdc3.ox.ac.uk) are read-only slaves, and unless you have only one of them defined in your krb5.conf (which the Kerberos libraries use to work out which host(s) to connect to) we can take one offline with no-one noticing.  Kerberos will happily accept multiple KDCs defined in krb5.conf, or will look at SRV records in DNS, to find which KDC to talk to, and it will iterate through until it finds one that works.  (I haven’t heard of any implementations that only take a single KDC, although I fear there may be such creatures out there somewhere.)

What’s involved?

So, what is involved in upgrading kdc-admin?  Well, we first need to build a test server and run it against our TEST.OX.AC.UK Kerberos realm.  This lets us check some useful things such as whether our tools still work with the upgraded version of Kerberos (have any arguments changed names?  Are we explicitly specifying encryption types that don’t exist?); whether configuration files will need updating; whether packages have changed name or dependency, and so on.  For example, for various reasons we synchronize passwords to Nexus using the krb5-sync plugin[1].  Since the currently-running kdc-admin was installed, the plugin has been packaged for Debian and is supported by the kadmin daemon.  This means that we can drop our custom packaging of it, and simply make sure it gets installed on the new system, and the appropriate snippet of new configuration is in place.

We’ve built the test server, and ironed out a few problems that we discovered (mostly relating to configuration and packages changing).  There were a few issues with replicating to the test slave, but after we built a new slave that was more consistent with the existing ones we found they disappeared.

We’ve also tested the password synchronization – right now I know that when I reset a password on a test account in the TEST.OX.AC.UK realm it is propagated to Nexus[2].

Going live

Once we’re happy with the testing, we can think about installing the live server.  Normally when we run services, we add them as extra interfaces on the server (so we might have charlotte.oucs.ox.ac.uk as the server, with an extra IP to host www.oucs.ox.ac.uk). Generally we’ll install a new server and migrate the service interface across when we’re ready to go live.  Unfortunately, Kerberos service operation is inextricably linked to the name of the host – in this case, kdc-admin.ox.ac.uk – so we have to keep the name of the server the same.  (This is because the server name gets encoded in various places, and Kerberos doesn’t really do multiple interfaces with different names very well, so odd things break.)  This means that we will actually have to install the server with a test name, but have all the kdc-admin configuration (including Kerberos principals) also in place on the server.  When it comes to time to go live, we simply rename the server.

For those who like sysadmin checklists, the general process will look something like:

  • Install new server with temporary name on new IP address (kdc-admin-new.it.ox.ac.uk, 163.1.221.7)
  • Ensure TTLs on kdc-admin are low (300s)
  • Ensure server has appropriate kdc-admin configuration
  • Ensure server has appropriate kdc-admin Kerberos keytabs (by copying from the existing kdc-admin[3])
  • Securely[4] copy the Kerberos stash file[5] to the server
  • Configure kdc-admin to treat the new KDC admin as a slave and replicate changes to it
  • In an announced window (probably a Tuesday morning at 7am), stop the Kerberos daemons on kdc-admin and the new kdc-admin.  Also put webauth.ox.ac.uk into maintenance mode.
  • Take a final dump of the Kerberos database from old kdc-admin, and copy it to kdc-admin-new
  • Disconnect old kdc-admin from the network
  • Rename kdc-admin-new to kdc-admin (this involves some twiddling with configuration management and a reboot, and possibly also lying to the sysadmin’s desktop using /etc/hosts)
  • Test password changes via kadmin.local
  • Get networks to update DNS
  • Run manual propagation pushes to each of the slaves
  • Take webauth.ox.ac.uk out of maintenance mode
  • Check that password changes via webauth.ox.ac.uk work
  • Check that password resets using the security question via webauth.ox.ac.uk work
  • Continue to monitor
  • Celebrate with pastries

What if it all goes wrong?

We roll back.  If we haven’t got as far as the DNS update, it’s as simple as turning old kdc-admin back on; if we have, we’ll need to follow the above procedure somewhat in reverse (disable access via webauth, turn off daemons, manual dump and propagate database to old kdc-admin, get networks to update DNS, turn everything back on).

What if a compromise is discovered and OxCERT need to randomize passwords really urgently?

We can perform this manually for them. But we’d really rather not have to do that.

When?

This should be done at the end of July.  This should be a quieter time (being in the vacation), and it won’t affect people being able to log in – it will simply affect changes to accounts (so password resets, etc).

Notes

[1]  In an ideal world we’d be using a cross-realm trust, as there are various downsides to this sync method.

[2] We have a test account for this purpose, and it’s the only account TEST.OX.AC.UK can change the password of – so even if things go horribly wrong, we can’t inadvertently reset everyone’s live password from the test system!

[3] Normally we’d generate new keytabs as part of the system install (or hostname takeover).  Unfortunately, we’re working on the service that’s used to create keytabs, so we can’t do that here.

[4] This involves GPG, an encrypted USB key, and sneakernet.

[5] The stash file, per the previous blog post, contains the key used to encrypt the Kerberos database entries.  Without this, the server can’t read any of the data about principals (such as even whether they exist).

Posted in Service Improvement | Tagged , | Leave a comment

Kerberos upgrades

Prompted by our colleagues in the Networks team, we will be posting a series of blog entries about the work that we’re currently undertaking to improve the University’s Kerberos infrastructure.

What is Kerberos?

I’m not going to try and explain Kerberos here – it’s been explained elsewhere on the internet far better than I can.  I recommend the “Explain like I’m 5” guide.  Wikipedia’s description is a bit technical.  And MIT have a dialogue about designing a system like Kerberos which is very readable.

The first (public) version of Kerberos was Kerberos 4.  The current version is Kerberos 5, which expands on the previous version and improves security, and this is the version we’re using.

Where do we use Kerberos?

Kerberos underpins Oxford’s Single Sign-On stack.  Your Oxford username is actually a Kerberos principal.  Kerberos underpins WebAuth (which is where most people will be familiar with it) – WebAuth basically implements the ticket part of Kerberos using cookies, and authentication is done against the Kerberos infrastructure.

Kerberos is also used by Nexus (although indirectly; it uses Active Directory which uses Kerberos, but it doesn’t hook directly into Oxford SSO. Instead passwords are synchronized), as well as for cross-realm trusts.

Other things Kerberos is used for in the University include:

  • LDAP
  • NFS
  • AFS
  • GSSAPI-protected web services
  • SSH
  • SMTP

What are some of the advantages of using Kerberos?

Single password

For each account, you only have a single password.  Note that you might have more than one account – for example, I might have a ‘ouit0144’ account for normal work, and an ‘ouit0144/itss’ account for carrying out more privileged tasks.

Single sign-on

Kerberos uses a ticket cache, so once you’ve authenticated to the KDC you can keep getting tickets to connect to new services without reauthenticating, as long as you’re within a given time period.  (For anyone thinking this sounds like WebAuth, that works in the same way – as I said above, it basically implements that aspect of Kerberos but using cookies.)

Your password never goes over the network

If you’re using kerberized software, you run ‘kinit’ or similar on your desktop.  which uses your password generate a key to (symmetrically) encrypt a timestamp, and sends it to the KDC.  The KDC validates that, and uses the same key (which it has stored in a database) to encrypt a response.  This means that your password never has to go over the network.

What will this work involve?

We’re dividing the work up into a series of individual tasks, both to make it more manageable and to make it easier for us to be confident that our changes aren’t breaking things (and if something does break, we only have a single change to look at, rather than a collection of changes).

In (approximate) order the tasks are:

  1. Upgrading kdc-admin to new hardware, new software and a different data centre
  2. Rekeying some “high-value” Kerberos principals (such as krbtgt/OX.AC.UK) to remove single DES
  3. Dropping single DES from the default list of encryption types offered
  4. Upgrading slave KDCs to newer software (and possibly new hardware)
  5. Enable incremental database propagation
  6. Wait for all user principals to be updated to not have single DES portions
  7. Disable single DES support entirely

We will be writing further blog posts to explain more these steps more fully.

Why are we doing it?

Single DES has been deprecated for a long time (NIST withdrew it as a standard in 2005), but in 2012 RFC6649 was published explicitly stating that DES should be considered weak.  MIT have also updated their Kerberos advice to say that DES is deprecated and should not be used.  Given all that, we’re looking to retire single DES across the OX.AC.UK realm.

Rekeying the krbtgt/OX.AC.UK principal will allow easier cross-realm trusts with Windows domains.  Currently, while krbtgt/OX.AC.UK supports stronger encryption types than single DES, they’re not supported by Windows.  Recent versions of Windows have become increasingly less happy about talking single DES, and rekeying will allow us to add strong encryption types supported by all systems (such as AES256).

kdc-admin is running software that is less supported than we would like, and so we want to take this opportunity to upgrade it to the latest version of Debian.  It’s also running a backported and patched version of Kerberos, which was required to get some functionality we needed – functionality that’s built-in in the latest version of Kerberos in Debian, so we won’t have the overhead of maintaining a separate package.

Currently when changes are made to principals they are pushed in bulk from kdc-admin to the slave KDCs.  This bulk dump and propagation runs every 5 minutes, and locks the database for a brief period while it takes place.  This causes problems for users and systems attempting to carry out actions on those principals (eg setting passwords, changing expiration dates, etc), and requires retrying the action.  Moving to incremental propagation should reduce the amount that the database needs to be locked, reducing the number of failures end-users see, and also reducing the number of incidents where we have to get involved to reconcile data. It should also lead to faster password changes.

When are we doing it?

Starting as soon as we can after the end of Trinity term.  While we are doing thorough testing, there will need to be downtime of services during maintenance periods while we carry out some of the work, and we would rather avoid that during busy times of the year (such as exam season).  We also need to give ITSS enough notice about major changes  to allow them to check their systems and make changes if required.

I think my services may require single-DES, what do I do?

Drop us a line at iam@it.ox.ac.uk.  We can have a look at the logs and the principals in question and tell you if you are currently using it, and work with you to mitigate the impact of the work.

Another reason for annual password resets

Annual password resets of user principals aren’t just for the reasons listed here – they also allow us to add new encryption types and remove deprecated encryption types from users’ Kerberos principals.  If we didn’t have the annual expiry policy in place, we would have to have more disruptive procedures when we needed to modify encryption types, such as forcing mass password expiries – not a move likely to make us popular!

Glossary

This is not by any means a comprehensive list of terms related to Kerberos; however, it should help explain most of the terms mentioned here. See the links in the first section of this blog post for deeper explanation of the terms.

  • DES: Data Encryption Standard.  A type of encryption, now generally considered weak.
  • 3DES: encryption that applies DES three times to each block.  Stronger than DES.
  • KDC: Key Distribution Centre.  Generates Kerberos tickets and authenticates users.
  • kadmin: the Kerberos administrative daemon – the software that clients talk to in order to create, delete or modify principals (including password changes)
  • MIT: Original developers of Kerberos.  There are now more variants available (Heimdal is another open-source implementation, and Windows AD uses its own implementation it under the hood), but in Oxford we use MIT Kerberos.
  • Principal: an entry in the authentication database.  The most obvious case is usernames – eg ouit0144@OX.AC.UK – but there are other principals too, such as host/webauth1.ox.ac.uk@OX.AC.UK.  They are qualified with a realm, and if you don’t have a realm the default realm is assumed (so here, OX.AC.UK).
  • Realm: an authentication administrative domain.  By convention, it is an upper-cased version of an organization’s DNS name.  In Oxford’s case, we use OX.AC.UK.  (Note that the realm is case-sensitive.)

Further posts

Posted in Service Improvement | Tagged , | Leave a comment

Infrastructure and Hosting team leader vacancy

The Infrastructure and Hosting team is looking for a new team leader.  This post is open to both internal and external applicants, and is a permanent appointment.

The IAH team leader is responsible for providing technical leadership for the Infrastructure and Hosting team, so strong Linux/UNIX systems administration skills are a must, as well as the ability to line manage the team.

More details are available from the University’s recruitment site, and the closing date is the 4th March.

Posted in Vacancies | Leave a comment

Return of the Advent

Regular readers may be unsurprised to learn that Sysdev have once more acquired a Lego Star Wars advent calendar, to remember a good friend by. We’ll be updating this blog post each day with our adventures in model building (with possible delays at weekends). The first update should appear very shortly…

Continue reading

Posted in Star Wars Advent | Tagged | Leave a comment

Team vacancies

Once more it’s time to alert readers of vacancies in our team. This time we have both a sysadmin and team leader position vacant (although the latter is only available to applicants internal to the University and its constituent colleges).

The sysadmin post is similar to those we’ve listed before on this blog, but with a focus on Drupal deployment – as IT Services builds a large-scale Drupal deployment to be offered as a central service across the University. This is a Grade 8 post which closes on 30th October.

The IAM team leader post is a permanent appointment at Grade 9 and is responsible for the management and development of new and existing IAM services offered by the team. Closing date: 20th October.

Posted in Vacancies | Leave a comment