We are approaching deployment of a new fleet of DNS resolvers, and there are a few questions on which we would like feedback from the wider ITSS community. Specifically, this post broaches the subject of DNSSEC. Just to be clear, this is nothing to do with securing and signing our own zones (ox.ac.uk being but an example), but rather whether we will request and validate signed responses from zones that have already implemented DNSSEC. I have views and opinions on this matter, but I will put them to one side and offer an untainted exposition. If my bias creeps through, then I apologize.
On the subject of comments, whereas I welcomed comments in my previous blog posts, I actively encourage them here. A dialogue in an informal channel would be nice and will hopefully help us reach a consensus. I say informal because ultimately you are free to do whatever you like with the validation data; this decision only concerns the central resolvers’ default behaviour.
What is DNSSEC?
Hopefully you are already aware of what DNSSEC does, and possibly how it achieves it. There are some good guides already online explaining DNSSEC. In essence, before DNSSEC, you had to take it on trust that the reply you received for a DNS query was valid. In some sense, nothing has changed: you are still trusting that the DNS resolvers are correctly validating any responses received (by default, at least; you are free to replicate the validation yourself). However, if you want www.cam.ac.uk to resolve to an IP address, you can now be sure that a validating resolver will either return the correct answer, marked as authenticated via something called the AD bit, or fail the query outright (SERVFAIL).
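To make that concrete, here is a minimal sketch using the third-party dnspython library (the resolver address is a placeholder, not one of ours). It sets the DO bit to request DNSSEC records, then checks whether the resolver set the AD bit, i.e. whether it validated the answer:

```python
# Minimal sketch using dnspython (pip install dnspython).
import dns.flags
import dns.message
import dns.query

RESOLVER = "192.0.2.53"  # placeholder: substitute a validating resolver

# want_dnssec=True sets the EDNS DO bit, asking for DNSSEC records.
query = dns.message.make_query("www.cam.ac.uk", "A", want_dnssec=True)
response = dns.query.udp(query, RESOLVER, timeout=5)

# The AD (Authenticated Data) bit is only set by the resolver if it
# successfully validated the chain of signatures for this answer.
if response.flags & dns.flags.AD:
    print("Answer validated by the resolver (AD bit set)")
else:
    print("Answer not validated (AD bit clear)")
```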
Does this decision affect me?
- I am running my own resolver / am not using the central resolvers
- No
- I’m running a stub resolver and am using the central resolvers as a forwarder
- Potentially
- I’m running a stub resolver and validating my own queries
- No
- I’m running a laptop plugged into eduroam and am using the DNS resolvers provided by DHCP
- Potentially
- I’m running a laptop connected to OWL and have authenticated as a guest
- You shouldn’t be a member of the University and connected to OWL, but in any case no.
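If you are not sure which of these applies to you, one quick way to find out whether the resolver you are currently using validates is to look up a deliberately mis-signed test zone. The sketch below uses the third-party dnspython library and the dnssec-failed.org test domain (a publicly documented broken-on-purpose zone; both are my suggestions, not anything we provide). A validating resolver will refuse to answer; a non-validating one will resolve it happily:

```python
# Rough check of whether your currently configured resolver validates.
# dnssec-failed.org is deliberately signed incorrectly for testing.
import dns.resolver

try:
    dns.resolver.resolve("dnssec-failed.org", "A")
    print("Got an answer: your resolver is NOT validating")
except dns.resolver.NoNameservers:
    # A validating resolver returns SERVFAIL for the bogus zone;
    # dnspython surfaces that as NoNameservers once retries run out.
    print("Lookup failed: your resolver appears to be validating")
```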
Why is it good?
This subheading is almost redundant, as it should be fairly clear what the benefits of DNSSEC are. Any validated answer for a record in a signed zone can be relied upon to be correct (unless there has been a signing key breach or some other disaster). This means that problems of the past, like cache poisoning, are just that: problems of the past. If you want to ensure that a hostname resolves to an IP address with confidence that no man-in-the-middle has tampered with the response, then there really is no other tool available; it’s DNSSEC or nothing.
Why is it not so good?
- There is additional complexity. For us to deploy resolvers that validate records, it’s just a simple configuration option. However, for those zones that are signed, the ease with which every record they serve can be made to fail validation (validating resolvers will answer SERVFAIL for the lot) is alarming. Since I have worked here, one organization has gone completely dark to the outside world for validating resolvers due to key mismatches, and another due to TTLs on expired keys. In both cases, every record resolved fine on resolvers which didn’t do validation.
- Not every zone is signed. This really shouldn’t affect our decision, since unsigned zones work fine whatever we decide, but it leads to the next point.
- Validating zones and records adds complexity to the resolver software itself. We use BIND, and its list of recent vulnerabilities shows that a not insignificant number of them are related to DNSSEC. Some have not affected us precisely because we do not currently do any validation.
- Your opinion may vary on this, but most important information on the internet is already signed by other means. Windows and Linux updates are almost without exception signed by their publisher (perhaps some viruses aren’t), and websites employ SSL/TLS to secure web communication. If you are concerned about the efficacy of SSL in general, then conceptually DNSSEC is no different: if a parent zone is compromised, then every zone beneath it is compromised too.
I disagree with the decision to validate/not to validate on the new resolvers! What can I do?
DNSSEC is supposed to be completely backwards compatible with existing infrastructure. I know of one unit that is validating all records while using the existing central resolvers as forwarders (as a point of information, it was this unit that led to the discovery of the TTL-expiry problem mentioned above; requests for the affected organization were resolving fine for everyone else because we weren’t validating!)
So, whatever the final outcome, there is nothing stopping anyone from running a stub resolver that either tells the central resolvers not to validate on its behalf (via the CD flag) or requests the extra DNSSEC records so it can validate them itself (via the DO flag), as sketched below. However, whatever is decided will be used for eduroam, and unless you wish to configure individual clients, there will be no provision to change this.
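For the curious, here is roughly what those two flags look like on the wire, again as a dnspython sketch (the forwarder address is a placeholder). CD=1 asks the upstream resolver to hand over answers without validating them; DO=1 asks it to include the DNSSEC records so you can check the signatures yourself:

```python
# Sketch of the CD and DO flags mentioned above, using dnspython.
import dns.flags
import dns.message
import dns.query

FORWARDER = "192.0.2.53"  # placeholder: a central resolver's address

# DO=1: ask the forwarder to pass through RRSIG and related records.
query = dns.message.make_query("www.cam.ac.uk", "A", want_dnssec=True)

# CD=1 (Checking Disabled): tell the forwarder not to validate, so it
# returns answers even if their signatures are bogus, leaving
# validation entirely to this client.
query.flags |= dns.flags.CD

response = dns.query.udp(query, FORWARDER, timeout=5)
print(response)
```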
Conclusion
In some sense, there is not yet any conclusion. If you wish to ask me to expand on any point, or if I have neglected anything, then please write a comment below. The benefits are obvious, but hopefully this article lists some concerns that should at least be acknowledged if we are to validate zones by default.
Updates
Below are responses to emails received:
Could you elaborate on the potential issues for someone running a laptop plugged into eduroam and using the DNS resolvers provided by DHCP – that would probably account for two-thirds or more of the folk here these days.
The potential issues are exactly those outlined above, applied to users connected to eduroam: mismatched keys and BIND vulnerabilities resulting in outages.
While I’m cautiously in favour of DNSSEC, generally speaking, argument 1 against DNSSEC was demonstrated pretty well by yesterday’s problems with reverse-DNS:
http://mailman.apnic.net/mailing-lists/apnic-talk/archive/2016/03/msg00003.html
One of the affected zones was 163.in-addr.arpa., which covers a large part of Oxford’s address space. By default, Kerberos uses reverse DNS to determine the correct principal name for a given service, so any servers using Kerberos authentication to internal services started to fail once their local DNS caches expired. Unfortunately, this meant that several Sysdev-hosted services including the registration self-service pages were unusable until DNSSEC verification was disabled. (Technically, we used Unbound’s “val-permissive-mode: yes” option to ignore bad DNSSEC signatures and treat them as insecure results.)
Thanks Robert for that. Just to put it bluntly: had we been running validating resolvers, we would have stopped resolving our own records in the 1.163.in-addr.arpa zone through absolutely no error on our part. The only course of action, as you found in Sysdev, would have been to disable validation until it was fixed. For BIND, at least for now, this is a global option, and so we would have turned off validation entirely.
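Incidentally, this failure mode is easy to distinguish from a genuine outage. Here is a dnspython sketch (the resolver address and PTR name are placeholders of mine): if a lookup returns SERVFAIL with checking enabled but succeeds with CD=1 against the same resolver, the records exist and it is the signatures that are broken:

```python
# Sketch: telling a DNSSEC validation failure apart from a real outage.
# If CD=0 fails but CD=1 succeeds against the same validating resolver,
# the data is there and the zone's signatures are at fault.
import dns.flags
import dns.message
import dns.query
import dns.rcode

RESOLVER = "192.0.2.53"           # placeholder: a validating resolver
NAME = "3.2.1.163.in-addr.arpa"   # hypothetical PTR name for 163.1.2.3

for checking_disabled in (False, True):
    query = dns.message.make_query(NAME, "PTR")
    if checking_disabled:
        query.flags |= dns.flags.CD  # skip validation on this attempt
    response = dns.query.udp(query, RESOLVER, timeout=5)
    print(f"CD={int(checking_disabled)}: "
          f"rcode={dns.rcode.to_text(response.rcode())}")
```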
Hi,
Your knowledge of this greatly exceeds my own, so I’ll offer a pragmatic reply.
If the risks associated with breaking DNS can be reduced to below the risks associated with security breaches via invalid DNS lookups then this should be implemented, with appropriate guidance to other units on what they need to do.
It offers defence in depth and is great if it’s essentially transparent to the end user, but the repercussions of it going wrong would be most unpleasant.
So that’s a very cautious yes.
Thanks for taking the time to read, Duncan. That’s the crux of it: where the scales tip between increased security and increased risk of outages. In another comment Tony Finch has explained some benefits that I hadn’t considered (SSHFP records being but one). As someone who has been maintaining validating resolvers for (if I remember correctly) a few years now, Tony is a voice of experience, and if he says the benefits are real and some of the disadvantages I list are only minor issues, then I am inclined to go with his evaluation.
We have been validating DNSSEC on the central DNS servers at Cambridge for several years now, and it has been almost completely trouble-free. So I am very pleased to see Oxford joining in!
There have been just two incidents that caused noticeable trouble. Usually when a remote site screws up their DNSSEC they fix it promptly, but in two cases the breakage lasted more than a day and caused problems for email delivery for a few people – in both cases the remote sites were quite small. These kinds of problems are getting easier to work around: Unbound has support for negative trust anchors, and BIND will get support in version 9.11. [1]
On the positive side, DNSSEC provides new ways to authenticate remote hosts in situations where X.509 does not work.
You can use SSHFP DNS records and DNSSEC to authenticate ssh servers, without having to manually verify host key fingerprints. [2] [3]
And for several other protocols, especially related to mail and instant messaging [4], DANE TLSA records are a great improvement to server authentication. They are particularly important for securing inter-domain SMTP [5] [6].
So I think if we can get DNSSEC more widely deployed it will significantly help to improve the security of the Internet, in more ways than just making the DNS itself less vulnerable to attack.
[1] https://tools.ietf.org/html/draft-ietf-dnsop-negative-trust-anchors
[2] https://tools.ietf.org/html/rfc4255
[3] http://fanf.livejournal.com/130577.html
[4] https://tools.ietf.org/html/draft-ietf-dane-srv
[5] https://tools.ietf.org/html/draft-ietf-dane-smtp-with-dane
[6] http://www.postfix.org/TLS_README.html#client_tls_dane
I guess you’re in the pro camp Tony! Thanks for the insight.