Kerberos upgrades: rekeying the krbtgt

Kerberos is the University’s Single Sign On system, which underpins other services such as WebAuth and Shibboleth.  Most members of the University don’t use it directly, but indirectly use it every day.

After something of a delay, we are continuing with our Kerberos upgrades, as previously described.

Having successfully upgraded kdc-admin, it’s on to the krbtgt/OX.AC.UK principal – that is, step 2 (and then step 3) of the “What will this work involve” section.

While we are announcing the work to IT Support Staff (ITSS) in the University, this blog post is to provide more background, and explain why we’ve made some of the decisions we have.

What is the krbtgt?

When you successfully authenticate to a KDC, you are given a TGT (Ticket Granting Ticket).  This is passed back to the KDC when you want a ticket for another service, and proves that you are who you say you are.  This ticket is encrypted with the key of the krbtgt principal for a realm – so in our case, that’s the krbtgt/OX.AC.UK@OX.AC.UK principal.

The only systems that know the password for the krbtgt are the KDCs.

What are we doing?

At the moment, the krbtgt/OX.AC.UK principal only supports DES and 3DES encryption types.  DES has been deprecated for years, and as of 2015, MIT (who develop the version of Kerberos we use) have removed it from the default supported encryption type list.

We are going to rekey the krbtgt to add RC4 and AES encryption types.

We will continue to support DES and 3DES on the principal until we have established that no-one is using them.

Due to an interesting quirk, our krbtgt actually has two DES keys: des-cbc-md5 and des-cbc-crc.  When we rekey, we will drop des-cbc-md5, as it is not possible to add multiple keys with the same encryption type.  We have established (via the logs on the KDCs) that no-one is currently using des-cbc-md5.

Our plan is pretty much the MIT DES retirement plan.  We will keep the old keys, so existing sessions continue to work.

Why has it taken so long?

Back when we tried this in 2015, we discovered an oddity while doing some final testing.  As we can’t easily roll back once we’ve gone live, and we didn’t understand what was happening, we decided to roll back and investigate further.

It took a while, but we tracked down the issue.  If you get a ticket before rekeying, rekey, then forward your ticket (e.g. via SSH) and try to use it, you get a “bad encryption type” error.  (There is more detail in the mailing list post I wrote about it.)

The MIT Kerberos developers replied to say that this was a new manifestation of a known bug, that was fixed in Kerberos 1.14.  (It has since been confirmed that the same thing will happen if you have a renewable ticket, and try and renew it rather than getting an entirely new ticket.)

(Just to note here, a renewable ticket is a specific type of ticket that you can present back to the KDC to extend its lifetime.  Normally, you would have to re-authenticate (with a password or keytab), and get a new ticket (which is the behaviour of tools like k5start).  If you use krenew, this will affect you.)

The problem here is that we are using Kerberos 1.12, which is the version currently in Debian stable (jessie), and upstream suggested that it would be difficult to backport the patch.  That’s not too much of an issue, though – Debian testing (stretch) is close enough to release that we can backport the 1.14 libraries from it, and use them.

We did this, and in November we rolled out some new KDCs running Kerberos 1.14 (these replaced KDCs running the very old 1.8).

Unfortunately, by the end of the day we had 4 or 5 reports from ITSS with cross-realm trusts to their Windows domains that users could no longer access file stores when using the new KDCs.  As we had only upgraded some KDCs, they were able to test against the new and old KDCs, and identified that the problem was with the new KDCs.

After some head-scratching, we rolled the new KDCs back to 1.12, and suddenly all the problems went away.

With the generous assistance of Simon Wedge at St Antony’s, who had a system that was consistently failing, we got some packet dumps, and were able to analyse them.

It seems that between 1.12 and 1.14 MIT Kerberos changed the way it responded to initial authentication requests.  In 1.12 and earlier, it would return a list of all encryption types supported by the principal for which authentication was being tried (which included DES and 3DES).  However, for 1.14, it only responds with a single encryption type (generally the strongest – which is not DES or 3DES!).

Windows was somehow caching this result – but not using it initially.  Instead, it would use the full list of encryption types for the complete initial authentication, and get a valid ticket.  It would then attempt to re-authenticate to access a file share.  At this point, instead of sending the normal list of encryption types, it would send the one that was returned by the KDC earlier, and its own list – which included RC4 and some custom Microsoft RC4 encryption types, but not DES.  This then failed, because the krbtgt didn’t have any keys of that type.
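To make the failure mode concrete, here is a minimal Python sketch of the negotiation described above. It is a toy model rather than MIT Kerberos code: it assumes the KDC simply picks the first encryption type in the client's list for which the krbtgt holds a key, and both client lists and the exact post-rekey key set are hypothetical illustrations, not values taken from the packet dumps.

```python
# Toy model of Kerberos encryption-type (enctype) negotiation.
# Assumption: the KDC uses the first enctype in the client's request
# for which the krbtgt principal holds a key. Real KDCs are more involved.

# Keys held by the krbtgt before the rekey (DES and 3DES only).
KRBTGT_BEFORE = {"des-cbc-crc", "des-cbc-md5", "des3-cbc-sha1"}

# Assumed keys after the rekey: RC4 and AES added, des-cbc-md5 dropped.
KRBTGT_AFTER = {
    "aes256-cts-hmac-sha1-96",
    "aes128-cts-hmac-sha1-96",
    "arcfour-hmac",
    "des3-cbc-sha1",
    "des-cbc-crc",
}

# Hypothetical client requests (illustrative only).
# Initial authentication: the client's full list, which includes DES.
FULL_LIST = ["aes256-cts-hmac-sha1-96", "aes128-cts-hmac-sha1-96",
             "arcfour-hmac", "des-cbc-crc"]
# Re-authentication: the enctype hinted by the 1.14 KDC plus RC4 variants, no DES.
CACHED_LIST = ["aes256-cts-hmac-sha1-96", "arcfour-hmac", "arcfour-hmac-exp"]


def choose_enctype(requested, krbtgt_keys):
    """Return the first requested enctype the krbtgt has a key for, else None."""
    for enctype in requested:
        if enctype in krbtgt_keys:
            return enctype
    return None


print(choose_enctype(FULL_LIST, KRBTGT_BEFORE))    # des-cbc-crc
print(choose_enctype(CACHED_LIST, KRBTGT_BEFORE))  # None: no common enctype
print(choose_enctype(CACHED_LIST, KRBTGT_AFTER))   # aes256-cts-hmac-sha1-96
```

In this model the second request fails only because the krbtgt has no RC4 or AES key, which is consistent with the problem disappearing once the krbtgt is rekeyed.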

Rather confusingly, we only saw errors from some cross-realm trusts – we know of at least 3 or 4 other Windows cross-realm trusts that worked fine.

Now, there is a work-around suggested by the developers – unfortunately, this effectively makes all krbtgt tickets DES, even those that could be 3DES.  This is something that we are keen to avoid.

Ironically, once we rekey the krbtgt, the 1.14 problem goes away, as we will have the full set of encryption types supported by the krbtgt.

Rock and a hard place?

So, we find ourselves with a choice – stick with 1.12 (which we know has issues with renewable tickets) or upgrade to 1.14 (which will break cross-realm trusts for a time).

The risk of upgrading to 1.14 is that if things break we can’t necessarily easily tell whether it’s caused by the rekey or 1.14.  With 1.12 we have been running it for over a month, and have a good feeling for what is ‘normal’.

1.12 is also the version currently in Debian stable – 1.14 would require us to track Debian testing and backport any appropriate fixes (made more interesting by the fact that since we started this, 1.15 has moved into testing – so we’d have to backport and test that).

We have therefore decided that we will stick with 1.12, and accept the risk of renewable/forwardable tickets not working.

When exactly is this happening?

Our standard maintenance period is 7am-9am on a Tuesday morning.  This is partly because it coincides with the Janet maintenance period, and partly because if anything goes wrong staff are available during the day to fix problems.

We expect any issues to fall into one of two groups:

  • ‘transient’ issues with sessions, where sessions created before the rekey do not work post-rekey
  • ‘permanent’ issues, where systems do not work with the rekeyed krbtgt

The default Kerberos ticket lifetime is 10 hours, so the permanent issues (with new sessions) may well only become apparent some time after we make the change – if we’re unlucky, about the time everyone is going home.

For this reason, we have decided to make the change at 9pm on Monday evening.  This should minimize the number of people who see issues with existing sessions, purely because there are fewer people using the system at night out of term.  It also means that if permanent issues appear we can work with ITSS colleagues to identify and fix them during normal working hours.  (It also means that we shouldn’t end up working a 16-hour day – towards the ends of those, troubleshooting gets very hard.)
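The timing argument can be checked with a little date arithmetic. The sketch below assumes every ticket lives for exactly the default 10 hours and is never renewed, and it uses Tuesday 10th January as a hypothetical date for the standard maintenance window; the function name is ours, purely for illustration.

```python
from datetime import datetime, timedelta

# Default ticket lifetime quoted above; renewals are ignored in this sketch.
TICKET_LIFETIME = timedelta(hours=10)


def last_old_ticket_expiry(rekey_time):
    """A ticket issued just before the rekey expires one lifetime later."""
    return rekey_time + TICKET_LIFETIME


standard_window = datetime(2017, 1, 10, 7, 0)  # hypothetical: Tuesday 7am slot
chosen_slot = datetime(2017, 1, 9, 21, 0)      # Monday 9pm, as announced

print(last_old_ticket_expiry(standard_window))  # 2017-01-10 17:00:00
print(last_old_ticket_expiry(chosen_slot))      # 2017-01-10 07:00:00
```

With the standard window, the last pre-rekey tickets would run out at about 5pm, just as people are leaving. With the 9pm slot they run out by 7am, so problems with new sessions should surface at the start of the working day.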

We are doing this on Monday 9th January 2017, which is Monday of 0th week.  This is less notice than we would ideally like, but the date is a compromise: it is when we expect the fewest users to be actively using systems.

What impact will I see?

Hopefully, none.

We have tested that WebAuth works fine (unless you’ve done something very non-standard to your server).  Shibboleth will also work.  So, most people shouldn’t notice.

We have tested cross-realm trusts (a simple case with Server 2008R2 and Windows 7, and Server 2012 and Windows 10), and they work in testing.  However, given the different setups across the University, this is in no way a comprehensive test (as we saw from the 1.14 upgrade – a handful of units had issues where most were fine).

What if I do see problems?

If you are an end user, please talk to your local IT Support Staff.  They will be able to assist you in identifying the issue, and should be able to assist you with initial investigations.

If you run a service that is affected, we recommend you restart the affected service, or, in the worst case, reboot the systems.  While this sounds a very stereotypical answer, it is for good reason – it will clear any state that may have been using the old encryption types, and also fix any renewable tickets, if they existed.

If that doesn’t work, please email us, giving as much detail as possible.  We will be able to review the logs on our side, and help troubleshoot and fix your problems.

What if it all goes wrong?

If everything goes pear-shaped, we will be able to roll back.

Unfortunately, this will invalidate the sessions of everyone who has got a new ticket since the rekey.

This could potentially have a large impact – while WebAuth should be ok (people will be asked to re-authenticate), other services will likely experience issues until they get new tickets.  This includes many IT-Services run systems (including anything backed by Oak LDAP, the Registration service, the mirror service, mailing lists, CUD).

It is possible that the rollback may also roll back all changes since the rekey – including account creation and deletion, and password changes.

The impact of this is likely to be so large that we would prefer to work with ITSS to fix problems, rather than roll back and then deal with restarting services.

What’s next?

If this works, we expect most principals to move to using AES256 tickets immediately.  Once things have settled down we will follow up with owners of principals that are still using DES, and help them move to a stronger encryption type.

Once we have no users of DES, we will be able to rekey again (which will be much less painful, as we’re not changing the strongest enctype) and remove DES entirely.

Posted in Service Improvement

SysDev Star Wars: Attack of the Advent

In a galaxy quite close to home…

Regular readers should be encouraged to discover that our adherence to this great tradition has not wavered, as we not only enjoy some advent fun in the run up to the Christmas holidays, but also remember a good friend and colleague, despite being gripped by the closing in of the dark side – until Spring at least.

The rules remain the same: each day one door will be opened, and one brave Jedi will do battle with fiddly little pieces of plastic, and then triumphantly append a photo of their finished masterpiece to this post.

We’ll be nagging team members to update this blog post daily with their latest Lego adventures (with possible delays over the weekends). The first update should appear very shortly…

1st December 2016 (Adrian)


Many many aeons after Jango Fett had stolen the patrol craft and taken it as his own and named it Slave, a strange event occurred. A new venture to explore the outer reaches of the galaxy was about to commence. Then, out of the early frosty and wintry morning a new ship appeared. But not one of these new-fangled jobs. It was an absolute clone of the Slave. Where it came from, no-one knew. But it was seen hovering in front of a set of old blue-prints of the Slave, and the myth arose that it had spontaneously auto-reassembled from thin air. It henceforth became known as henceforth-the-auto-reassembled.

2nd December 2016 (Alex)


This poor Bespin guard bursts from his plastic and cardboard prison, takes a moment to compose himself — luckily he is of few parts — and looks around. This was not Bespin, as evidenced by there being a ground to stand upon. Turning, he discovers an abandoned patrol craft and wonders whether he might fit inside. Thinking it unlikely and unsure of what else to do, he resolves that he should do what he knows best. He stands and guards it, waiting.

3rd December 2016 (Christopher H)

Fly, my pretty! Fly!

4th December 2016 (ChrisF)

A passing Imperial Navy Trooper swaggers on to the scene and sees a bored looking Bespin Guard who seems to have developed a fondness for a piece of nearby space junk. Naturally the Bespin Guard takes offence at this description of the recently discovered patrol craft. Honour must be satisfied!

So, the duel begins… Ten paces, turn and vapourise! But who will be the victor?

5th December 2016 (Chux)

A DF.9 gun turret guarding a precious Christmas tree

Trying to make the most of the temporary refuge which Echo Base had become, General Rieekan orders the installation of DF.9 gun turrets at vantage points around the base.

This particular DF.9 was hastily installed to contain a clutch of Imperial AT-AT walkers and ground troops spotted making a bee-line for a particularly precious Christmas tree.

And just in time too, the gun turret helps defeat the Imperial forces and saves Christmas at Echo Base!!!

6th December (Dameon)

And now there are three of them, it turns from a casual duel, to an official Mexican standoff!

“Get off my snowfield, Bespin scum!” yells the Snow Trooper, “I don’t even know what a Mexican is!”

“I don’t think we have them in this far far away galaxy” replies the Navy Trooper, before turning his blaster on the cloud-city native, all the while keeping one eye on the distant gun turret.

7th December (Jim)

Olaf the Snowtrooper Snowman has just come down the mountain.

But where are Anna and Kristoff? We were due to go up the mountain to find Elsa!

And which bastard took my stick arms?

He then looks to his left and sees the HX400-UT Imperial Blaster Cannon and ejaculates:

Bollocks! I’m in the wrong movie!

8th December (Dave)

A resupply mission gives the Mexican standoff a turn to the unfair when the Imperial Navy Trooper finds himself armed with a dish cannon.

“Feel the POWER of MY Force! Yeah, baby! This is what I’M TALKIN’ ABOUT!”

“Will? Will Smith? Is that you?” asks the Bespin Guard.

“CUT CUT CUT CUT CUT!” screams the director. “Will you guys PLEASE stick to the script? You ain’t the Fresh Prince of nothin’ out here and your ad libs ARE NOT better than the lines we gave ya. Okay, ready people? Let’s make some MAGIC today! From the top, ACTION!”

9th December (DR)

Where the hell did I put my goggles?

10th December (Jim)

Newly assigned to his post as Admiral of “Constipation”, the Galactic Republic’s new Venator-class Star Destroyer, Lar Jarse, stood on the bridge and surveyed the life teeming on the planet below.

Well, we can’t have this! All these creepy-crawly things squirming around to no good! It just isn’t British!

As his crew desperately scrambled for their scanners, the new Admiral vociferated and the unfortunate planet’s fate was sealed.

Unleash the death-ray!

11th December (Julian)


Back on the other nearby planet an Armoured Assault Tank swings into view. Its clone pilots are briefly confused not to see the Gungans that they were originally pursuing. Ah, well, let’s take out that Bespin Guard first and then see whether we get any further orders from our leader – hang on, maybe that carrot-nosed trooper is our leader?

12th December (Ken)


“Mayday! Mayday! Mayday!”


13th December (Robert)

“Hold it right there!” yelled the battle droid, as it came across Olaf.  “Give me the battle cannon or else!”

Olaf sighed, and continued to stare into the distance as he wondered just where he had ended up.  “Just let it go…” he muttered…

14th December (Michael)


As Obi-Wan takes a sharp right around the festively-decorated turret, he idly wonders how he’s being pursued by a ship that wouldn’t be invented for many years.  The magic of Christmas?

15th December (Dameon)

“What do you mean ‘The damage doesn’t look so bad from out there’?” exclaims the increasingly pessimistic Captain Antilles.
“‘Oh, there’s the Tantive IV’ they say, ‘Let’s pull them over and check their systems for secret plans again. That’s always a good laugh’ … damned imperial speed cops, don’t they have anything better to do with their time?”
“Oh well, brace for impact … again…”


16th December (Robert)

Meanwhile, E-3P0 was admiring the Empire’s latest Star Destroyer, “Constipation”.  Whilst he was well aware of its ability to destroy entire planets with a single shot, from here it looked small; almost insignificant compared to some of the other ships he had seen lately.  Such a powerful ship deserved a name with distinction, a name with gravitas.  “Constipation”? That has zero gravitas, he thought.

17th December (Stu)


A GNK Power Droid arrives to charge up the parked Starship Enterprise*. “It’s a lot smaller than it looks on the telly.”, it gonks.

[* Pesky wormholes, mixing up the universes.]




18th December (Adrian)

Of course, all along Jabba (the Hutt) had been up in his palace atop Mt Pannatooine, observing the arrival of lots of goodies for the plunder down in the valley below. He was licking his slobbery chops and anticipating a good festive season. What he didn’t know though, was that his musings were to be rudely interrupted by a swarm of raisin-bots which had scaled the steep cliffs and were intent on some plundering of their own.

19th December (Alex)

Luke who it is! The Death Star Trooper tries to take aim but finds himself lifted into the air as the protocol droid looks on. Better style this one out, he thinks. “Thanks, Matilda!” says Luke to the small girl hiding in the Gonk droid.




20th December (ChrisF)


Fast forward… and our once heroic Imperial Navy Trooper, having eradicated every last trace of Rebel nonsense in the neighbourhood (especially that Bespin rogue with poor dress sense) is now bored witless and is reduced to handing out on the spot fines to illegally parked Desert Skiffs for a living. It was either that or a job grilling rontoburgers at the local “StarChow! All the nutrients you can chew, suck or absorb for only 99 Imperial Credits!”

21st December (Christopher H)

Lounging on his sun-bed, the tired soldier saw an approaching airborne object. “Is it a bird? Is it a plane? I’ll shoot it down anyway,” he thought.

22nd December (Stu)


An Imperial Sentinel Class Landing Craft cruises around with Luke roof-surfing, before doing what it does best, and landing. “Look at me! Olaf. Look at me! Olaf! OLAF! YOU’RE NOT LOOKING!!”






23rd December (Nigel)

Having not only gained Olaf’s attention, but also an impromptu lecture about how “Star Wars” is “so last millennium, baby” and the future is in Ice and crossover movies, “like, um, ‘Star Wars Frozen'”, Luke trades his father’s lightsaber for a pair of ice skates and a hockey stick, determined to hone his Jedi Hockey skills (puck telekinesis, anyone?) for the musical “Disney On Ice: Star Wars” that Olaf makes him certain is just around the corner.

Feel the freeze, Luke, feel the freeze.


24th December (Dave H)

The Albino Chewbacca is decked with festive decoration – his bandolier is painted red and green. He comes with a snazzy new bowcaster which fires off snowballs (1×1 studs) which is always a great weapon to have. He also comes with two miniature pine trees and some spare snowballs.

A great way to finish the 2016 Star Wars Advent calendar.

Happy Christmas!

Posted in Star Wars Advent

Shibboleth Identity Provider upgrades

After some slight prompting by both the Networks team and colleagues in Sysdev, the IAM team felt that we should write some blog posts of our own about our own work to upgrade the University’s authentication infrastructure.  The first of these is on our work to upgrade the Shibboleth service.  This work ensures that we are running a fully-supported version of Shibboleth, as well as enabling new features in the future, such as single log-out.  The upgrade will also make our Shibboleth servers highly available, which should improve service reliability, and allow us to consolidate our existing servers to an extent.

The upgraded service will go live on the 5th April after much testing over the past few months.  No Shibboleth-protected services should be affected by this work, and the upgrade should be transparent to end users.

What is Shibboleth?

Shibboleth currently sits at the top of Oxford’s Single Sign-On (SSO) stack, on top of both Kerberos and Webauth.  The original purpose of Shibboleth was to extend SSO to services outside the University, such as journal access.  However, Shibboleth is also frequently used for services within the University, not least to provide SSO to systems that lack support for Webauth.  Although Windows servers are the most common case of servers without Webauth support, other systems such as Bradford Campus Manager also fall within this group.  Shibboleth is based on the idea of “claims-based authentication” using SAML, where a Service Provider (or SP) is given a signed “assertion” from a trusted Identity Provider (IdP).  This assertion contains details (known as attributes) about the end-user such as username, name and email address that can then be used to make decisions about access.

For Shibboleth to work, the IdP and SP need to know certain details about each other, such as where they may be found and the certificates used for signing assertions.  This information is known as the metadata for a given server.  It is possible to share this manually between the two servers if needed, but when this is scaled up to a large number of services and identity providers it becomes unwieldy to manage the metadata swapping.  To solve this problem, Shibboleth servers generally have their metadata published by one or more “federations”, which act as a single trusted source of metadata.  The individual Shibboleth servers then fetch signed metadata from the federations they trust.

Since Shibboleth may be used to log in to many different SPs with varying levels of trust, the software is privacy-preserving by default.  Attributes that could be used to identify end users must be explicitly “released” to a given service provider.  As a result, instead of a normal username, services are typically presented with an opaque persistent ID, which is generated by a one-way hash of the service provider’s “entityID” (an identifier for that particular identity or service provider) and the Oxford SSO username.  This prevents separate SPs working together to de-anonymise users.
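As a rough illustration of how such an ID can be derived, here is a minimal Python sketch. The field order, the “!” separator, the use of SHA-1 with Base64 output, and the entityIDs, username and salt shown are all assumptions for the example; they are not a statement of exactly what our IdP does.

```python
import base64
import hashlib


def persistent_id(sp_entity_id: str, username: str, salt: bytes) -> str:
    """Derive an opaque per-SP identifier from a one-way hash.

    Assumed layout: entityID + "!" + username + "!" + salt, hashed with SHA-1.
    """
    digest = hashlib.sha1()
    digest.update(sp_entity_id.encode("utf-8"))
    digest.update(b"!")
    digest.update(username.encode("utf-8"))
    digest.update(b"!")
    digest.update(salt)  # secret value known only to the IdP
    return base64.b64encode(digest.digest()).decode("ascii")


# Hypothetical values, for illustration only.
SALT = bytes.fromhex("00ff10a7c3d2e5f60718293a4b5c6d7e")
id_journal = persistent_id("https://journal.example.org/shibboleth", "abcd1234", SALT)
id_helpdesk = persistent_id("https://helpdesk.example.com/sp", "abcd1234", SALT)

print(id_journal != id_helpdesk)  # True
```

The same user gets a different, stable ID at each SP, so two SPs cannot match their records by comparing IDs, and without the salt neither can test guesses for the username behind an ID.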

Why upgrade Shibboleth?

About a year ago, we received the news that updates and support for version 2 of the Shibboleth Identity Provider (IdP) server would be discontinued by July 2016.  This meant that we had to start work on migrating to the new version of the software (IdP v3), since running supported software is a good idea.

In addition to the obvious desire to run a supported version of the IdP software, the upgrade also means we can make resiliency improvements.  At present, almost all Oxford Shibboleth authentication is handled by a single server.  This is mostly down to the difficulties in setting up an IdP v2 cluster, but is also down to avoidance of load-balancers in the past.  (For historical reasons, there is also a completely separate IdP pair that is used for some internal business systems, with manual switching between the two servers.)  However, the popularity of Shibboleth for new services means that the current single point of failure is no longer a sensible option today.  The IdP v3  software is also rather easier to cluster than the previous version, and no longer requires a complicated state-sharing mechanism for clustering.

Finally, the upgrade process provides an opportunity to consolidate our existing Shibboleth environments.  Currently, we have three environments, which look like the following:

  • Main IdP
    • Live (1 server)
    • Test IdP (1 server)
    • Development (1 server)
  • Business Systems IdP
    • Live (2 servers)
    • Test (1 server)
  • IAM test stack (1 server)

As mentioned earlier, we have historically run a separate IdP for business systems that required a high-availability authentication service.  However, as the upgrade will bring high-availability features to the main IdP, we should be able to remove the additional environment:

  • Main IdP
    • Live (3 servers)
    • Test IdP (2 servers)
    • Development (1 server)
  • IAM test stack (1 server)

While the total number of servers is identical, the elimination of the two business systems environments improves manageability of the service.

Load balancing and improving resiliency

The new service uses the Netscaler load-balancing device run by the Business Systems Operations Team, which is also used by WebLearn and other services.  The Netscaler supports both session stickiness (necessary for avoiding server switches mid-authentication) and content-based switching, which is useful for allowing users to choose between old and new servers as well as separating out SAML1 and SAML2 requests for testing.  For services using SAML2, the attributes are transferred between the IdP and SP via the end-user’s browser.  However, in the case of SPs using SAML1, the SP must contact the IdP directly via a back-channel to obtain attributes.  All the necessary state is stored on the client side, so no shared server state is required.  The only exception to this is the authentication process, which must be performed on a single server.

One interesting question is how the IdP maps a SAML1 attribute query arriving on the back-channel to the authentication request that was made on the front-channel.  The answer is that the front-channel returns a transient ID which is reversibly encrypted.  The back-channel process then decrypts this transient ID to find out which user the request applies to.

Problems we saw

While the process of upgrading was slow, there were relatively few problems during the upgrade process.  In several cases, the upgrade to IdP v3 improved compatibility with external services.  For example, some service providers require particular types of authentication or require certain forms of user identifier to define the “subject” of an assertion.  However, there were some problems that we saw during the upgrade.


The first problem was how to test service providers that still use the old SAML1 protocol.  Because these servers communicate directly with the IdP to retrieve attributes, it is generally difficult to test whether these behave as intended with the new service.  The solution we came up with was to test specific development servers against the new IdP cluster, before testing external systems later in the rollout process.  Ideally, we would have tested external sites with a separate test IdP.  Unfortunately, some providers set strict limits on the number of IdPs that can be trusted (often 1) for a given organization, which makes this impossible.

Assertion signature algorithm

Another problem we saw was a lack of support in some Service Providers for assertion signatures based on SHA-2.  This is fairly rare, but affected one relatively important Service Provider: the cloud-based software used by our centralised helpdesk.  While some may consider a lack of visible queries to answer a good thing at times, the Service Desk team may beg to differ!  We fixed this by modifying relying-party.xml to sign assertions for that Service Provider with SHA-1, as documented in the Shibboleth wiki:

<!-- SHA-1 support bean -->
<bean id="SHA1SecurityConfig" parent="shibboleth.DefaultSecurityConfiguration"
  p:signatureSigningConfiguration-ref="shibboleth.SigningConfiguration.SHA1" />

<util:list id="shibboleth.RelyingPartyOverrides">
  <!-- Replace "entityID here" with the entityID of the affected Service Provider -->
  <bean parent="RelyingPartyByName" c:relyingPartyIds="entityID here">
    <property name="profileConfigurations">
      <list>
        <bean parent="SAML2.SSO" p:securityConfiguration-ref="SHA1SecurityConfig" />
      </list>
    </property>
  </bean>
</util:list>

Persistent IDs

The third issue we saw concerned our generation of opaque persistent IDs, which include an IdP-specific salt value.  This is needed so that SPs cannot trivially reverse the persistent ID by brute force.  For historical reasons, we use a random binary salt as opposed to the text-based salt more typically used, and accommodating this required some minor modifications to the IdP software.

Additional Verification

The final problem we saw was with the Additional Verification service, which provides multi-factor authentication.  Although this service is rather limited at present, Additional Verification is currently used by WebLearn to protect examination setting and marking.  The service is currently based on a custom-written Java servlet that sends one-time codes via text message.  As the new IdP version changed the authentication interfaces used, the servlet required some modifications to work correctly.  As a side-effect, the service was also restyled to match the current Webauth service.

The roll-out process

We started the process on the 8th March by placing our existing IdP behind the Netscaler load balancer.  The existing server kept its IP address, but the DNS entries were modified to point at the load balancer.  The reason we did this was to avoid problems with SPs that use the older SAML1 protocol, which include several journals and library resources, along with the Bodleian’s SOLO portal and this blog.  Since some SPs cache DNS responses for up to seven days, a grace period is needed to make sure that the back-channel and front-channel connections both use the load balancer.

Netscaler setup before IdP v3 go-live (courtesy of Julian)

The next step was to test that services using the old SAML1 protocol still worked using the new servers.  On the 22nd March, we temporarily switched requests for SAML1 authentication (including back-channel requests) to the new servers during the maintenance window.  This let us test that the new servers worked as intended with external journals, and confirmed that the sites worked.
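The content-based switching used for this test can be pictured with a small Python sketch. It is only a model of the routing decision, not Netscaler configuration: the pool names are hypothetical, and the path prefixes assume the IdP's default endpoint layout.

```python
# Model of content-based switching: route SAML1 traffic (front-channel SSO
# and back-channel SOAP queries) to the new servers, everything else to the old.
# Path prefixes assume the Shibboleth IdP's default endpoint layout.
SAML1_PREFIXES = ("/idp/profile/Shibboleth/", "/idp/profile/SAML1/")

NEW_POOL = "new-idp-cluster"  # hypothetical pool name
OLD_POOL = "old-idp"          # hypothetical pool name


def choose_pool(path: str) -> str:
    """Pick a server pool from the request path alone."""
    if path.startswith(SAML1_PREFIXES):
        return NEW_POOL
    return OLD_POOL


print(choose_pool("/idp/profile/Shibboleth/SSO"))             # new-idp-cluster
print(choose_pool("/idp/profile/SAML1/SOAP/AttributeQuery"))  # new-idp-cluster
print(choose_pool("/idp/profile/SAML2/Redirect/SSO"))         # old-idp
```

Routing on the request path keeps a SAML1 login and its follow-up attribute query on the same set of servers, while SAML2 traffic is left untouched.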

The final step will be to switch traffic from the old IdP server to the new cluster.  Barring any last-minute problems, this will happen on the 5th April during the 7 a.m.-9 a.m. maintenance window, which will allow us time to test the new service and revert back if anything does go wrong. The resulting Netscaler setup will look like this:

Netscaler setup after IdP v3 go-live (courtesy of Julian)

Posted in Service Improvement

Stardev Syswars: The Advent Menace

The days have been glorious; the force lit the world and provided bounty and delight for all. All things have their time. The evenings have turned to the darkside and enforced merriment pervades the air at Tesco. It can only mean that winter is upon us once again.

As regular readers will remember, it is a long standing tradition that sysdev gets a Lego StarWars advent calendar. This year is no different (and, actually, the first door was opened by Adrian yesterday — more on that later). The rules remain the same, one day, one door, one brave Jedi to do battle with fiddly little pieces of plastic, and one photo of their finished masterpiece appended to this post.

So, onwards to victory …

1st December 2015.

Adrian’s landing ablationator is first ashore in a heroic rush, but he’s torn between the red-wrapped treat on the right and the orange one on the left. Adrian will learn to pay attention to the road ahead…


2nd December 2015

Word reached us that an army of Trandoshan bounty hunters were on the prowl .. and could very well reach Camp Gaderffii within hours .. We quickly readied the  Multi-Projectile OzBlop J-24 and beamed it up to a vantage position with its 4 x ZZ14 Rocket Systems facing 4 possible approach directions

Multi-Projectile OzBlop J-24

3rd December 2015


4th December 2015 (Dameon)

Utini! It looks like the pesky little Jawas have turned up to try and steal all the presents … luckily they haven’t been bought yet!


5th December 2015 (Robert)

This is not the sand crawler you are looking for…


6th December 2015

My first Lego Star Wars calendar – ever! I’m told this is probably a Ewok turret of some kind.

7th December 2015

No idea what I have built here, I’m confused but then that’s war for you. Perhaps it will cause the Trandoshan bounty hunters to think on’t.  I’ll tell them it’s a present from Darth Santa and leg it to the Pub, I’m in need of some Dagobah swamp water.

8th December 2015

This is an Ewok! The first piece apart from the Jawa that I actually recognise despite having watched all three original films.

9th December 2015

The Ewok is happy now he’s got his boulder catapult loaded. Don’t let those glazed eyes deceive you – he can take out a Stormtrooper’s helmet at 30 Wookiee strides. Once he’s run out of boulders the catapult will make an excellent present delivery mechanism to the tree tops.


10th December 2015

This Stormtrooper seems to have stolen a bow and arrow and is creeping up on an oblivious Ewok.


11th December 2015

A Star Destroyer ‘jacked by joyriders hurtles past the diorama. Remember kids, Speed Kills. And Merry Christmas!


12th December 2015 (Robert)

You can never have too many weapons racks…

Weapons rack

13th December 2015

Bender makes a cameo appearance and cannot resist goosing a stormtrooper. “Ooh ooh ooh ooh. Can you feel the Force?”  That stormtrooper soon will.


14th December 2015

If you look very carefully, you’ll see the millenium vulcan swooping in at 3 parsecs below the speed of light, about to wreak havoc upon the land.
(cameraman: adrian. director, producer and pilot-stuntman: dave).
Hang on, wasn’t vulcan from another story? Dave, I thought you said vulcan!


15th December 2015

Small 20151223_163109.mp4

16th December 2015

A trigger happy Sleazebaggano on watch duty and sat in a grounded Alderaan Y-Wing Starfighter  blasts out a wrapped xmas present with a proton torpedo when the electronic toy inside makes a squeak .. oops!!! someone will be without a xmas present .. Too right, no more watch duty for Sleazebaggano  ..

Alderaan Y-Wing Starfighter


17th December

The Hoth Rebel Trooper looks rather bemused at the lack of snow – clearly, Hoth is also having a very mild winter.


18th December

Imperial Walker just minding its own business…


19th December

Rebel Ion Cannon

Traverse Right!  Steady on!  1 … 500!  Oops, wrong film.  I’m sure the Beatles don’t mind.  A rebel ion cannon looses a volley at the Imperial Walker.

20th December

I don’t know what it is but it doesn’t look good.  We need courage, pass the Dagobah swamp water.


Ah!  That’s better, I think I may ask it if it would care to dance.

21st December

Activity is seen in the hills.  All guns!  All guns!  Fire at will!  No, no, Fire at will, not Fire at Will!  Stop firing!  Stop firing!

Look carefully and you can just make out the shot being fired off.  I should have made an annoying animated GIF instead, I know


22nd December

R2-Deer2 and his mate turn up late for the party. The Dagobah swamp fumes have livened their circuits and they start a bout of rutting around the Pit (of Carkoon) whilst the Stormtrooper cheers on.


23rd December 2015 (Dameon)

Swoosh! This guy simply HAS to be the droid I’m looking for … Help me R2-Deer2, Vader must be near, I can feel his presents!

24th December 2015

And finally! Santa3P0, with a sack containing a taun-taun of presents, lands at last in his sleigh, pulled by the red-nosed R2Deer2.

(mutters) “It’s against my programming to impersonate Santa Claus”

All the boys and girls wait patiently, keeping Luke warm near the fire.

“Hello, boys and girls, I’m terribly sorry but you’re doomed, the possibility of a child being good all year round is approximately 32,700 to 1.”

(cue sad faces) 

“But here are your presents anyway”

(Fade to cheers from the assembled throng)


Posted in Star Wars Advent

Kerberos upgrades: kdc-admin

kdc-admin is the master server in our Kerberos realm – it’s the server that account changes happen on, and where password resets happen. The data is then propagated to the slave KDCs every 5 minutes. Upgrading this critical system will be the first stage in improving our Kerberos infrastructure.

Note that Kerberos-specific terms and acronyms should be covered in the first blog post in this series. If there’s anything that’s not explained there, please do leave a comment and I’ll try and explain things better.

It’s about time!

kdc-admin is rather overdue an upgrade for various reasons. Its operating system is not as supported as we would like, and to get various features we need we’ve had to backport a version of Kerberos from a newer version of Debian. Maintaining this is extra overhead we’d be quite happy to get rid of.  There are also newer versions of Kerberos available that would give us other features we would like (indeed, one of them we require for our DES deprecation plan). The current kdc-admin also lives in our Banbury Rd data centre, rather than the University’s shared data centre (an altogether nicer space for servers, not to mention the fact that the Banbury Rd data centre is likely to be replaced in the next 2 years or so).

In many ways, upgrading the Kerberos servers is fairly easy; kdc-admin is definitely the most complex.  This is because all the others are read-only slaves, and unless you have only one of them defined in your krb5.conf (which the Kerberos libraries use to work out which host(s) to connect to), we can take one offline without anyone noticing.  Kerberos will happily accept multiple KDCs defined in krb5.conf, or will look up SRV records in DNS, to find which KDC to talk to, and it will iterate through them until it finds one that works.  (I haven’t heard of any implementations that only take a single KDC, although I fear there may be such creatures out there somewhere.)
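
The "iterate until one works" behaviour can be sketched roughly as follows. This is a minimal illustration in Python, not what the Kerberos libraries actually do internally; the host names are placeholders, and the probe function is a stand-in for sending a real request to the KDC port (88).

```python
import socket
from typing import Callable, Iterable, Optional


def tcp_probe(host: str, port: int = 88, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the KDC port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_working_kdc(
    kdcs: Iterable[str], probe: Callable[[str], bool] = tcp_probe
) -> Optional[str]:
    """Try each KDC in order and return the first one that answers."""
    for host in kdcs:
        if probe(host):
            return host
    return None


# Placeholder names (assumptions, not our real hosts).
KDCS = ["kdc0.example.org", "kdc1.example.org", "kdc2.example.org"]

# Simulate kdc0 being offline for an upgrade: the client moves on to kdc1.
offline = {"kdc0.example.org"}
chosen = first_working_kdc(KDCS, probe=lambda host: host not in offline)
print(chosen)
```

A client configured with only one KDC has nothing to fall back on, which is why the single-entry krb5.conf case is the one that would notice a slave going offline.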

What’s involved?

So, what is involved in upgrading kdc-admin?  Well, we first need to build a test server and run it against our TEST.OX.AC.UK Kerberos realm.  This lets us check some useful things such as whether our tools still work with the upgraded version of Kerberos (have any arguments changed names?  Are we explicitly specifying encryption types that don’t exist?); whether configuration files will need updating; whether packages have changed name or dependency, and so on.  For example, for various reasons we synchronize passwords to Nexus using the krb5-sync plugin[1].  Since the currently-running kdc-admin was installed, the plugin has been packaged for Debian and is supported by the kadmin daemon.  This means that we can drop our custom packaging of it, and simply make sure it gets installed on the new system, and the appropriate snippet of new configuration is in place.

We’ve built the test server, and ironed out a few problems that we discovered (mostly relating to configuration and packages changing).  There were a few issues with replicating to the test slave, but after we built a new slave that was more consistent with the existing ones we found they disappeared.

We’ve also tested the password synchronization – right now I know that when I reset a password on a test account in the TEST.OX.AC.UK realm it is propagated to Nexus[2].

Going live

Once we’re happy with the testing, we can think about installing the live server.  Normally when we run services, we add them as extra interfaces on the server (so a server has its own host name, with an extra IP address to host the service name).  Generally we’ll install a new server and migrate the service interface across when we’re ready to go live.  Unfortunately, Kerberos service operation is inextricably linked to the name of the host, so we have to keep the name of the server the same.  (This is because the server name gets encoded in various places, and Kerberos doesn’t really do multiple interfaces with different names very well, so odd things break.)  This means that we will actually have to install the server with a temporary name, but with all the kdc-admin configuration (including Kerberos principals) also in place on the server.  When the time comes to go live, we simply rename the server.
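
To see why the name matters: Kerberos service principals conventionally take the form service/hostname@REALM, so the keys in a keytab are tied to one specific host name. A rough sketch follows; the host names are placeholders, and `service_principal` is a helper made up for this illustration rather than part of any Kerberos library.

```python
def service_principal(service: str, hostname: str, realm: str) -> str:
    """Build a host-based principal name of the form service/hostname@REALM."""
    return f"{service}/{hostname.lower()}@{realm.upper()}"


REALM = "TEST.OX.AC.UK"

# Placeholder host names (assumptions for illustration only).
live = service_principal("host", "kdc-admin.example.org", REALM)
temp = service_principal("host", "kdc-admin-new.example.org", REALM)

print(live)
print(temp)
# The two names differ, so a key issued for one is no use under the other.
print(live == temp)
```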

For those who like sysadmin checklists, the general process will look something like:

  • Install new server with temporary name (kdc-admin-new) on new IP address
  • Ensure TTLs on kdc-admin are low (300s)
  • Ensure server has appropriate kdc-admin configuration
  • Ensure server has appropriate kdc-admin Kerberos keytabs (by copying from the existing kdc-admin[3])
  • Securely[4] copy the Kerberos stash file[5] to the server
  • Configure kdc-admin to treat the new kdc-admin as a slave and replicate changes to it
  • In an announced window (probably a Tuesday morning at 7am), stop the Kerberos daemons on kdc-admin and the new kdc-admin.  Also put the self-service password pages into maintenance mode.
  • Take a final dump of the Kerberos database from old kdc-admin, and copy it to kdc-admin-new
  • Disconnect old kdc-admin from the network
  • Rename kdc-admin-new to kdc-admin (this involves some twiddling with configuration management and a reboot, and possibly also lying to the sysadmin’s desktop using /etc/hosts)
  • Test password changes via kadmin.local
  • Get networks to update DNS
  • Run manual propagation pushes to each of the slaves
  • Take the self-service password pages out of maintenance mode
  • Check that password changes via the self-service pages work
  • Check that password resets using the security question via the self-service pages work
  • Continue to monitor
  • Celebrate with pastries
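
The "final dump" and "manual propagation" steps above might look something like the sketch below if driven from Python. It only builds and prints the command lines rather than running them. `kdb5_util dump`, `kdb5_util load` and `kprop -f` are the standard MIT Kerberos tools for this kind of job, but the checklist doesn't say exactly how we invoke them, so treat the sequence as an assumption; the file paths and slave names are placeholders.

```python
import shlex
from typing import List

# Assumed values for illustration only.
FINAL_DUMP = "/var/tmp/kdc-admin-final.dump"
PROP_DUMP = "/var/tmp/kdc-admin-prop.dump"
SLAVES = ["kdc0.example.org", "kdc1.example.org"]


def old_master_commands(final_dump: str) -> List[List[str]]:
    """On the old kdc-admin: dump the whole database to a file."""
    return [["kdb5_util", "dump", final_dump]]


def new_master_commands(
    final_dump: str, prop_dump: str, slaves: List[str]
) -> List[List[str]]:
    """On the new kdc-admin: load the copied dump, then push to each slave."""
    commands = [
        ["kdb5_util", "load", final_dump],
        ["kdb5_util", "dump", prop_dump],
    ]
    commands += [["kprop", "-f", prop_dump, slave] for slave in slaves]
    return commands


plan = old_master_commands(FINAL_DUMP) + new_master_commands(
    FINAL_DUMP, PROP_DUMP, SLAVES
)
for cmd in plan:
    # Print rather than execute: these would be run by hand during the window.
    print(shlex.join(cmd))
```

Copying the dump between the two servers is deliberately left out of the sketch, as is the rename and reboot.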

What if it all goes wrong?

We roll back.  If we haven’t got as far as the DNS update, it’s as simple as turning old kdc-admin back on; if we have, we’ll need to follow the above procedure somewhat in reverse (disable access via webauth, turn off daemons, manual dump and propagate database to old kdc-admin, get networks to update DNS, turn everything back on).
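
The rollback decision hinges on one question: has the DNS update happened yet? A minimal sketch of that logic is below, with step labels abbreviated from the checklist and chosen purely for this example.

```python
from typing import List

# Abbreviated, made-up labels for the go-live steps, in order.
GO_LIVE_STEPS = [
    "stop_daemons",
    "final_dump",
    "disconnect_old",
    "rename_new",
    "test_kadmin_local",
    "update_dns",
    "propagate_to_slaves",
    "leave_maintenance_mode",
]


def rollback_plan(completed: List[str]) -> List[str]:
    """Return the rollback actions for the steps completed so far."""
    if "update_dns" not in completed:
        # DNS still points at the old server: just bring it back.
        return ["turn old kdc-admin back on"]
    # DNS has moved: walk the procedure roughly in reverse.
    return [
        "disable access via webauth",
        "turn off daemons",
        "dump database and propagate it to old kdc-admin",
        "get networks to update DNS",
        "turn everything back on",
    ]


# Failure after the rename but before DNS changes: the simple case.
print(rollback_plan(GO_LIVE_STEPS[:4]))
# Failure after the DNS update: the longer reverse procedure.
print(rollback_plan(GO_LIVE_STEPS[:6]))
```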

What if a compromise is discovered and OxCERT need to randomize passwords really urgently?

We can perform this manually for them. But we’d really rather not have to do that.


The upgrade should be done at the end of July.  This should be a quieter time (being in the vacation), and it won’t affect people being able to log in – it will simply affect changes to accounts (so password resets, etc).


[1]  In an ideal world we’d be using a cross-realm trust, as there are various downsides to this sync method.

[2] We have a test account for this purpose, and it’s the only account TEST.OX.AC.UK can change the password of – so even if things go horribly wrong, we can’t inadvertently reset everyone’s live password from the test system!

[3] Normally we’d generate new keytabs as part of the system install (or hostname takeover).  Unfortunately, we’re working on the service that’s used to create keytabs, so we can’t do that here.

[4] This involves GPG, an encrypted USB key, and sneakernet.

[5] The stash file, per the previous blog post, contains the key used to encrypt the Kerberos database entries.  Without this, the server can’t read any of the data about principals (not even whether they exist).

Posted in Service Improvement