The Clam Closes

We use a lot of open source software in our team and we try to contribute a little back to the community when we can. The central mail relay, Oxmail, had been using ClamAV since sometime between 2003 and 2005, and when we discovered that we could host a public mirror of the signature databases, we set one up in 2007.

This was an Apache vhost on our team webserver, running on a trusty IBM eServer xSeries 336. That server was only recently decommissioned after having given 12 years of faultless production service. In 2013, the mirror was moved to a dedicated webserver running in a VM.

The ClamAV project was acquired by Sourcefire in 2007, which itself was acquired by Cisco in 2013.  Over the summer, Cisco changed the DNS records that clients should use to find a mirror to point to Cloudflare’s content delivery network.  Our mirror still received thousands of hits per day from clients that had presumably hard-coded our mirror’s IP address in their config.  We recently learnt that Cisco had silently stopped updating the signature databases on volunteer mirrors and so our mirror was serving stale data.  We considered it better to stop serving altogether rather than to give clients out-of-date signatures and so switched off our mirror today.

In our busiest month over the past 11 years of service, our mirror served up 17 TB of data at a peak transfer rate of 8 Gb/s.

Posted in Services

September 2018 – Odin 5940 FroDo Comware Upgrade – Additional full reboots required

From both my last post and my colleague Rob Perkins’ previous post, you’ll see that we’ve had some fun and games recently with updating the software on the FroDos provisioned on the HPE 5940 platform.

Whilst these FroDos represent a relatively small proportion of the Odin FroDo estate (<10%), this has been enough to create a reasonable amount of work for us (and, I imagine, for you as ITSS also). Sadly this has also resulted in unplanned and unavoidable disruptions to Odin service for affected customers. For this we can only sincerely apologise. Rest assured, we are feeding all of this back to HPE in an effort to improve the situation moving forward.

It should perhaps be noted that the vast majority of customers on the HPE 5510 platform (which also happens to be currently undergoing a software update – see my colleague Mike’s post) would not have been subject to the unplanned disruptions mentioned in this post.

 

So what went wrong?

Essentially nothing from a hard-line technical perspective. The update involves both a main code update and a ‘hot’ patch (the latter of these we were provided with by HPE support to fix numerous issues which are documented in our previous posts). There’s nothing particularly extraordinary about any of that.

What is perhaps unusual, however, is that the (so-called) hot patch addresses some resource issues we’ve been seeing with this platform, and does so by re-juggling the TCAM memory allocation on the switch. This allocates more resources to features which were struggling in our implementation (control plane stuff like PIM multicast routing and OSPFv3, for instance) and takes them away from others which we aren’t using.

What we didn’t know until we were part-way through the update process, and only learnt through the support cases we subsequently opened with HPE, was that only a full reboot would complete the upgrade properly. Sadly it also seems that HPE hadn’t documented this clearly in their release notes, which we are working with them to resolve.

Because this additional reboot has, in general, not happened during the upgrades so far, the upgrade has not fully completed on those switches. The L2 annexe VSI connectivity problem some units have observed, and the other issues we’ve seen so far, are the result of a lack of resources. This can only be resolved permanently via the full reboot.

 

What do you mean ‘full’ reboot?

So a full reboot in this context is a reload of all switches involved in an Odin FroDo provision simultaneously.

This means in practice that regardless of whether your unit opted for Odin provisioning options 0-1 (you have only one switch operating as your FroDo) or if you opted for option 2 (you have two switches logically operating together in an IRF to act as one for resiliency purposes), your 5940 FroDo (or FroDo pair) will be down entirely during the reboot cycle. For option 2 customers this is a rare event, as most upgrades can be carried out using the In-Service Software Upgrade (ISSU) capability (as was our original intention with this one).

If you’re unsure of what your unit opted for, then you can check via the Huginn portal here.

If you’re still unclear about what the Odin provisioning options are, or what they mean, you should consult the Odin SLD and associated information here.

 

So what’s the plan moving forward?

A small number of 5940 FroDos have had their upgrade and full reboots already.

The remaining ones will need to have their full reboot and this is scheduled as follows:

Thursday 11th October
frodo-030809    dcdist-br           - 7.00am
frodo-100907    wellcome-trust      - 7.00am
frodo-120601    beach-2             - 7.30am
frodo-100909    orcrb-2             - 7.30am
 
Tuesday 16th October
frodo-120809    dcdist-usdc         - 7.00am
frodo-120810    molecular-medicine  - 7.00am
frodo-030811    dcdist-osney        - 7.30am

Impact

The expected outage whilst each reboot completes is approximately 10 minutes.

 

Is this really necessary?

Unfortunately yes. We’ve weighed up the potential consequences of doing nothing against undertaking the additional reboots and we just aren’t comfortable with the former. Doing nothing has the potential to introduce difficult-to-diagnose issues resulting from TCAM exhaustion later on.

Posted in Uncategorized

September 2018 – Odin 5940 Frodo Upgrade – Take 2

Odin 5940 FroDo Comware Upgrade (reattempt)

We would like to announce a staged upgrade of the version of Comware running on our HPE 5940 FroDos for those that were not completed last time around. This blog entry aims to answer the majority of questions that this work will raise. Please feel free to contact the Networks team with any further questions at networks@it.ox.ac.uk

What Happened Last time & Remediation Steps Moving Forward

Essentially we encountered an unexpected issue the last time around with unit L2 annexe connectivity not being re-established following the application of a hot patch which is part of the upgrade. This is strange as the MAC learning continues to work which initially gave us the impression last time around that all was well. This issue is logged as a support case with our vendor HPE and unfortunately to date, they’ve been unable to replicate the issue we had. We’ll therefore be seeking their availability on a remote session for at least one option 1 and option 2 upgrade to ensure that if the issue recurs we can get their eyeballs on to it.

In the meantime, we have a workaround which is to ‘turn it off and turn it on again’. Seriously: should the issue recur, the workaround is to shut down the L2VPN Virtual Switching Instance (VSI) serving the annexe connection on the affected FroDo and then re-enable it. We’ll do this wherever it proves necessary to re-establish connectivity post-upgrade.

Why?

I shan’t be repeating what Rob Perkins said in his original post. If you’d like to know why this upgrade is needed, please read his original post here.

Impact

The expected impact is ~5-10 minutes for Option 1 customers during which time the FroDo will reload and external services will not be available. For Option 2 customers the impact is expected to be minimal thanks to the In Service Software Upgrade (ISSU) capability.

Because we wish to get this completed before the start of Michaelmas term, we will be carrying out the upgrades as per the accelerated schedule below. Please accept our apologies for any inconvenience caused by these upgrades.

Timescale

Thursday 20th September
frodo-120601 beach-2 - 7.30am
frodo-120809 dcdist-usdc (option 2) - 8.00am

Tuesday 25th September
frodo-120810 molecular-medicine - 7.00am
frodo-050909 dcdist-beg (formerly begbroke-iat-1 - option 2) - 7.30am

Wednesday 26th September
frodo-120812 john-radcliffe-3 - 7.00am
frodo-100908 richard-doll (option 2) - 7.30am
frodo-120811 big-data-institute (option 2) - 8.00am

 

Posted in Uncategorized

September 2018 Odin FroDo Upgrade

FroDo Comware Upgrade

We would like to announce a staged upgrade of the version of Comware running on our HPE 5510 FroDos. This blog entry aims to answer the majority of questions that this work will raise. Please, however,  feel free to contact the Networks team with any further questions at networks@it.ox.ac.uk

Why?

As part of ongoing maintenance it is essential that we keep our FroDo software up to date. The new version of software being deployed addresses a number of vulnerabilities and bugs. For those interested this upgrade takes us from R1309 to R1309 P06 and involves over 300 devices.

Relevant Bug Fixes

201806290399
• Symptom: The value of the snmpEngineBoots node is incorrect.
• Condition: This symptom occurs if the whole IRF fabric is rebooted to cause a master/subordinate switchover.

Addressed Vulnerabilities

This release addresses the following CVEs

CVE-2016-9586
CVE-2017-15896
CVE-2017-3737
CVE-2017-3738
CVE-2017-3736
CVE-2017-12190
CVE-2017-12192
CVE-2017-15274
CVE-2017-15299
CVE-2017-1000253
CVE-2017-3735
CVE-2017-6458
CVE-2016-9042
CVE-2014-9297
CVE-2015-9298

Information about the detail of these vulnerabilities can be found at https://cve.mitre.org/cve/search_cve_list.html

Impact

The expected impact is ~5-10 minutes for Option 1 customers during which time the FroDo will reload and external services will not be available. For Option 2 customers the impact is expected to be minimal thanks to the In Service Software Upgrade (ISSU) capability introduced in the firmware update applied in August 2017.

We will be carrying out the upgrades between 06:00 and 07:30 to minimise impact.

Timescale

We plan to upgrade approximately 80 FroDos on each of the following days:

Group A: Tuesday 18th September
Group B: Thursday 20th September
Group C: Tuesday 25th September
Group D: Thursday 27th September

Schedule

We have attempted, where possible, to group devices around main sites and annexes so that those sites will only see one period of disruption. Detailed schedules listing devices and dates can be found at https://docs.ntg.ox.ac.uk/pub/reference/FroDoUpgrade-Sep2018

Once again, if you have any further queries then please contact us at networks@it.ox.ac.uk

Posted in General Maintenance, Odin

September 2018 – Odin 5940 Frodo Upgrade

Odin 5940 FroDo Comware Upgrade

We would like to announce a staged upgrade of the version of Comware running on our HPE 5940 FroDos. This blog entry aims to answer the majority of questions that this work will raise. Please feel free to contact the Networks team with any further questions at networks@it.ox.ac.uk

Why?

As part of ongoing maintenance it is essential that we keep our FroDo software up to date. The new version of software being deployed addresses a number of vulnerabilities and bugs. For those interested this upgrade takes us from F2604H04 to R2612H01 and involves more than a dozen devices.

Relevant Bug Fixes

Symptom: After the master of an IRF fabric is rebooted, SNMP obtains an incorrect value for the snmpEngineBoots node.

Condition: This symptom might occur if SNMP is used to obtain the value of the snmpEngineBoots node after the master of an IRF fabric is rebooted.

Effect: This stops management systems from connecting to the SNMP engine on the device. Noticeable and inconvenient because graphs of port throughput are no longer maintained.
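One plausible reason an incorrect snmpEngineBoots value locks out management systems is the SNMPv3 timeliness check from RFC 3414, which compares the boots and time values a device reports with those the manager cached earlier. The sketch below is a simplified illustration of that check, not a description of our monitoring setup: it assumes the managers use SNMPv3 with authentication, and the function name and sample numbers are hypothetical.

```python
# Simplified sketch of the RFC 3414 time-window test as applied by a
# management station (the non-authoritative engine) to a reply from a device.
# Assumption: SNMPv3 with authentication is in use; sample values are made up.

MAX_ENGINE_BOOTS = 2147483647  # value at which an engine must be reconfigured
TIME_WINDOW_SECONDS = 150      # tolerance defined by RFC 3414


def manager_accepts(local_boots, local_time, msg_boots, msg_time):
    """Return True if a reply from the device falls inside the time window."""
    if local_boots == MAX_ENGINE_BOOTS:
        return False
    if msg_boots < local_boots:
        # The device claims fewer reboots than the manager has already seen.
        return False
    if msg_boots == local_boots and msg_time < local_time - TIME_WINDOW_SECONDS:
        # Same boot count, but the device's clock appears to have gone backwards.
        return False
    return True


# The manager last saw boots=12 with an engine time of 5000 seconds.
# After a reboot the engine time restarts near zero.
print(manager_accepts(12, 5000, 13, 30))  # boots incremented correctly
print(manager_accepts(12, 5000, 12, 30))  # boots not incremented
```

In this model a device that reboots without incrementing its boot counter looks, to the manager, like a replay of old traffic, so its replies are discarded until the cached values are cleared.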

Addressed Vulnerabilities

This release addresses the following CVEs

CVE-2014-9297

CVE-2015-3405

CVE-2015-9298

CVE-2016-7427

CVE-2016-7428

CVE-2016-7431

CVE-2016-9042

CVE-2017-3731

CVE-2017-3732

CVE-2017-3735

CVE-2017-3736

CVE-2017-3737

CVE-2017-3738

CVE-2017-6458

CVE-2017-12190

CVE-2017-12192

CVE-2017-15274

CVE-2017-15299

CVE-2017-15896

CVE-2017-1000253

Information about the detail of these vulnerabilities can be found at https://cve.mitre.org/cve/search_cve_list.html

Impact

The expected impact is ~5-10 minutes for Option 1 customers during which time the FroDo will reload and external services will not be available. For Option 2 customers the impact is expected to be minimal thanks to the In Service Software Upgrade (ISSU) capability.

We will be carrying out the upgrades between 06:00 and 07:30 to minimise impact.

Timescale

We plan to upgrade up to 2 FroDos, one option 1 and one option 2, on each of the following days:

Tuesday 4th September
 frodo-030809 dcdist-br (option 2) - completed
 Notes: Resilience of link to BSP-STORAGE in BRDC not functioning correctly causing interruption and some Left Hand storage entered read-only mode.
        AD DC behind ADFS for Nexus 365 coincidentally failed the night before ~23:00 causing failure of *some* user logins to Outlook. 
        Not caused by Frodo upgrade but we were blamed for it by some before all the details were known.
Wednesday 5th September
 frodo-030811 dcdist-osney (option 2) - completed
 frodo-100907 wellcome-trust - completed
 frodo-100909 orcrb-2 - completed - upgraded 1 day early

Due to an issue encountered on the morning of 5th with two of the upgrades 
we will postpone the remaining ones until further notice pending the result 
of a support call with the vendor.
Thursday 6th September
 frodo-120809 dcdist-usdc (option 2) - cancelled
Tuesday 11th September
 frodo-100908 richard-doll (option 2) - cancelled
 frodo-120601 beach-2 - cancelled
Wednesday 12th September
 frodo-050909 begbroke-iat-1 (conversion to option 2 and dcdist-begbroke) - upgrade cancelled - conversion will still take place
 frodo-120810 molecular-medicine - cancelled
Thursday 13th September
 frodo-120811 big-data-institute (option 2) - cancelled
 frodo-120812 john-radcliffe-3 - cancelled
Posted in HP Networks, Odin

eduroam and realmless usernames: an update

You may be aware that the University of Oxford will shortly be mandating fully qualified usernames for eduroam, as explained, along with the reasons, in a previous blog post. This post is intended as a follow-up, highlighting how we intend to enforce it and helping to reassure ITSS who have fears about the impending change.

How is this change enforced just for eduroam?

Technically the remote authentication is not a service as defined in the IT Services service catalogue. However we cannot just unplug our RADIUS servers because that will take out the eduroam service which most definitely is in the catalogue. Said in a less tactful way, using the central RADIUS servers for anything other than eduroam is not covered under any SLA and this change could in principle be a blanket change for all services which depend on remote access password authentication. Of course this is the real world and we’re aware of a number of colleges and departments making use of remote access accounts for their own SSIDs.

The question that I’m surprised nobody (as of today [a Tuesday]) has asked us is “will this impending change affect these other SSIDs?” If you’re not one for rambling blog posts I can say now that no, this change will not affect other SSIDs.

How does the central RADIUS server know which SSID was connected to?

A RADIUS packet contains an attribute called “Called-Station-Id”, whose value usually looks something like “01-02-03-04-05-06:eduroam”. You can probably guess that what comes after the colon is the SSID.

Using this attribute together with the User-Name attribute, we reject users without a realm as follows in our FreeRADIUS2 configuration:

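# Only applies to requests whose Called-Station-Id ends in ":eduroam"
# and whose User-Name (the outer identity) contains no "@".
if( "%{Called-Station-Id}" =~ /:eduroam$/ && "%{User-Name}" !~ /@/ ) {
        update reply {
                Reply-Message = "missing @ before realm"
        }
        reject
}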
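For anyone who wants to check which of their own requests would be caught, the same two tests can be reproduced outside FreeRADIUS. The following is a minimal Python sketch of the rule above, not part of our configuration; the function name and the sample values are made up for illustration.

```python
import re

# The same two patterns as the unlang rule: an SSID suffix of ":eduroam"
# on Called-Station-Id, and an "@" anywhere in User-Name.
EDUROAM_SSID = re.compile(r":eduroam$")
HAS_REALM = re.compile(r"@")


def should_reject(called_station_id, user_name):
    """Return True when the request is for eduroam and the username has no realm."""
    for_eduroam = EDUROAM_SSID.search(called_station_id) is not None
    realmless = HAS_REALM.search(user_name) is None
    return for_eduroam and realmless


# Hypothetical sample requests: (Called-Station-Id, User-Name)
samples = [
    ("01-02-03-04-05-06:eduroam", "unit1234"),
    ("01-02-03-04-05-06:eduroam", "unit1234@ox.ac.uk"),
    ("01-02-03-04-05-06:college-ssid", "unit1234"),
    ("01-02-03-04-05-06", "unit1234"),
]
for station, user in samples:
    print(station, user, "reject" if should_reject(station, user) else "allow")
```

Only the first sample is rejected. The last one shows why a Called-Station-Id without an SSID suffix escapes the check entirely.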

But my proxying NAS is not appending :eduroam to the “Called-Station-Id”. What will happen?

In your case we will not be able to enforce the fully qualified username and you will still be able to authenticate without a realm. I am certain that there are eduroam setups out there which do not append the SSID. Instead these access points and controllers send attributes like “Aruba-Essid-Name”. We will not be acknowledging proprietary attributes and we would like to make an impassioned plea to custodians of affected controllers (Aruba controllers are one notable offender) to configure them to support RFC 3580. If you don’t, the experience may be confusing for end users. Aruba has something on their community forum with the steps required.

I’m manually appending :eduroam to the Called-Station-Id but it isn’t for eduroam. How do I exempt myself from the change?

Why on earth would you be doing that?

Which username will need changing, the inner or outer?

The inner username, if it works currently, will not require changing. We will only be enforcing the change on the outer identity.

Has the communication with end users been effective?

Yes. Before the initial email was sent to relevant users, 30% of devices were unqualified. It is closer to 20% after three weeks.

Conclusion

There isn’t any conclusion, but please please please ensure your Called-Station-Id attributes contain an SSID where appropriate.

Posted in Uncategorized

May 2018 Odin FroDo Upgrade

FroDo Comware Upgrade

We would like to announce a staged upgrade of the version of Comware running on our HPE 5510 FroDos. This blog entry aims to answer the majority of questions that this work will raise. Please, however,  feel free to contact the Networks team with any further questions at networks@it.ox.ac.uk

Why?

As part of ongoing maintenance it is essential that we keep our FroDo software up to date. The new version of software being deployed addresses a number of vulnerabilities and bugs. For those interested this upgrade takes us from R1122P01 to R1309 and involves over 300 devices.

Relevant Bug Fixes

Symptom: Forwarding errors or traffic interruptions might occur on the switch.

Condition: This symptom occurs with a low probability if the switch runs for a long time.

Addressed Vulnerabilities

This release addresses the following CVEs

CVE-2017-6458
CVE-2016-9042
CVE-2014-9297
CVE-2015-9298

Information about the detail of these vulnerabilities can be found at https://cve.mitre.org/cve/search_cve_list.html

Impact

The expected impact is ~5-10 minutes for Option 1 customers during which time the FroDo will reload and external services will not be available. For Option 2 customers the impact is expected to be minimal thanks to the In Service Software Upgrade (ISSU) capability introduced in the last firmware update applied in August 2017.

We will be carrying out the upgrades between 06:00 and 07:30 to minimise impact.

Timescale

We plan to upgrade approximately 80 FroDos on each of the following days:

Group A: Tuesday 1st May
Group B: Thursday 3rd May
Group C: Tuesday 8th May
Group D: Thursday 10th May

Schedule

We have attempted, where possible, to group devices around main sites and annexes so that those sites will only see one period of disruption. Detailed schedules listing devices and dates can be found at https://docs.ntg.ox.ac.uk/pub/reference/odin-frodo-software-upgrade-may-2018-1

Once again, if you have any further queries then please contact us at networks@it.ox.ac.uk

Posted in General Maintenance, Odin | 1 Comment

eduroam and realmless usernames

IT Services’ user-facing instructions for connecting to eduroam have always been unequivocal about the username to use: if you want to connect to eduroam, your username is your SSO with @ox.ac.uk appended, all lower case. So, an SSO of unit1234 would become unit1234@ox.ac.uk. However, as you may have discovered, when you connect to eduroam within the University and authenticate without the @ox.ac.uk appended to your SSO, you will still be granted access.

RADIUS is the service underpinning eduroam authentication. In RADIUS parlance, the @ox.ac.uk is the username’s realm and declares the institution performing the authentication, in this case “ox.ac.uk”. Other institutions that offer eduroam have their own realms and when someone within the University uses a realm other than ox.ac.uk, say larry@faber.edu, we will proxy the request to that foreign institution for them to authenticate.
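The routing decision described above can be sketched in a few lines of Python. This is purely illustrative: real RADIUS servers make this decision from their realm configuration, and the function names and return values here are hypothetical.

```python
LOCAL_REALM = "ox.ac.uk"


def split_identity(username):
    """Split 'user@realm' into (user, realm); realm is None when there is no '@'."""
    user, separator, realm = username.rpartition("@")
    if not separator:
        return username, None
    return user, realm.lower()


def route(username):
    """Decide where an authentication request should be handled."""
    _user, realm = split_identity(username)
    if realm is None:
        return "no-realm"        # nothing tells us which institution to ask
    if realm == LOCAL_REALM:
        return "local"           # authenticate against our own servers
    return "proxy:" + realm      # hand over to the foreign institution


for name in ("unit1234@ox.ac.uk", "larry@faber.edu", "unit1234"):
    print(name, "->", route(name))
```

The interesting case is the last one: a username with no realm carries no routing information at all, so any handling of it is a local policy choice rather than something another institution could reproduce.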

Back in the dim and distant past, our RADIUS servers were configured such that if no realm was supplied when authenticating to eduroam here at the University, they would infer the realm to be ox.ac.uk and act accordingly. When configuring your device for connecting to eduroam, the advantage of making your realm explicit is fairly obvious: when you travel to other institutions that offer eduroam, you will be able to connect without any changes to your connection details, because the other institution will know to use the RADIUS servers at the University of Oxford to authenticate. The advantage of making your realm implicit is equally obvious: you save typing out 9 characters including that pesky ‘@’. For many years we’ve turned a blind eye to the practice of realmless authentication, but it’s coming to a head:

  1. The University of Oxford is not the only institution in this fine city offering eduroam. Other institutions are available. We’ve had reports that these institutions’ IT staff are being contacted by University members using realmless authentication with connection problems. “It works everywhere else, so it must be something wrong with your system”. Saying that the University’s instructions never mention realmless usernames is of little consolation to these IT staff fielding repeated support queries.
  2. Perhaps more importantly, since our RADIUS configuration was written Jisc has released an update to the eduroam technical specification. Specification 1.2 explicitly states that “only RFC 4282 compliant usernames (of the form userID@realm) to be employed for user authentication both for roaming users and for users when at the Home site”.

It’s the “at the Home site” that’s important for the second point. It means even though our internal authentication never leaves the confines of the University, we are in breach of the eduroam specification and we should fix that.

Some numbers

So with that out the way, how many devices are configured without a realm? It’s fairly easy to find out from our logs. Results for the past few days (no prizes for guessing on which day the undergraduates started arriving):

Day   Realmless   Realmed   Percentage realmless
0          8834     25580   25.7%
1          9101     26994   25.2%
2          8921     28267   24.0%
3          6322     21409   22.8%
4          7867     22121   26.2%
5         13106     33646   28.0%
6         14443     36203   28.5%
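The percentage column is the realmless count divided by the total of both columns. As a quick check, the figures can be reproduced with a short Python snippet; the counts are copied from the table and the function name is just for illustration.

```python
# (day, realmless devices, realmed devices), copied from the table above
counts = [
    (0, 8834, 25580),
    (1, 9101, 26994),
    (2, 8921, 28267),
    (3, 6322, 21409),
    (4, 7867, 22121),
    (5, 13106, 33646),
    (6, 14443, 36203),
]


def percent_realmless(realmless, realmed):
    """Share of all devices seen that authenticated without a realm."""
    return 100.0 * realmless / (realmless + realmed)


for day, realmless, realmed in counts:
    print(f"day {day}: {percent_realmless(realmless, realmed):.1f}% realmless")
```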

The configuration change to reject realmless usernames is relatively simple and we actually have had it waiting to be deployed for a while now. However, with at least 14,443 devices configured without a realm, all requiring reconfiguration after we mandate an explicit realm, it’s not a simple case of making the change and hoping the users will reconfigure it when they realize something’s wrong.

What about eduroamCAT?

eduroamCAT is no panacea. In fact, our eduroamCAT profile is configured to include the realm, which makes these figures even more stark: the realmless authentications must have come from manually configured clients. If we were to enforce the realm in usernames, though, I’m sure eduroamCAT would play a large part in device conformance.

Conclusion

There’s no question of if we’re going to stop allowing realmless authentication. For us to comply with the requirements specified by Jisc it’s a case of when it’s going to happen. With so many devices configured without a realm it would be a very bold move for us to make the necessary changes next Friday before leaving for the weekend. Such a change would require involvement from all sectors of IT support, both within IT Services and within the general ITSS community. The benefits may not be felt directly by ITSS here, but certainly other institutions would appreciate the change.

In terms of helping ITSS know who has devices configured without a realm, it’s something that we are discussing here at the moment and once we have decided on the best course of action there most likely will be an announcement on a medium more formal than a blog post.

Posted in eduroam

ODIN FroDo Software Upgrade

FroDo Comware Upgrade

We would like to announce a staged upgrade of the version of Comware running on our HPE 5510 FroDos. This blog entry aims to answer the majority of questions that this work will raise. Please, however,  feel free to contact the Networks team with any further questions at networks@it.ox.ac.uk

Why?

As part of ongoing maintenance it is essential that we keep our FroDo software up to date. The new version of software being deployed addresses a number of vulnerabilities and bugs, as well as introducing some useful new features.

In Service Software Upgrade (ISSU)

This feature aims to reduce the downtime required for software upgrades. For Option 2 customers who have a pair of FroDos this means that, for future software upgrades,  service will usually remain up while each member of the pair is upgraded and reloaded.

The ISSU feature also supports so-called hot patches which can be implemented without rebooting a device. This is of benefit to both Option 1 and Option 2 customers. There may be a small service interruption for these patches but it will be significantly less than a full reboot.

Bug Fixes

Symptom: On an MPLS L2VPN or VPLS network, PIM packets and IGMP packets cannot be transparently forwarded between PEs.

Condition: This symptom might occur if IP multicast routing is configured on the MPLS L2VPN or VPLS network.

Symptom: When a large number of MAC address entries are deleted from member ports of an aggregation group, memory leak occurs at both the local end and the remote end of the aggregate link.

Condition: This symptom might occur if a large number of MAC address entries are deleted from member ports of an aggregation group.

Addressed Vulnerabilities

This release addresses the following CVEs

CVE-2016-5195
CVE-2016-7431
CVE-2016-7428
CVE-2016-7427
CVE-2017-3731
CVE-2017-3732

Information about the detail of these vulnerabilities can be found at https://cve.mitre.org/cve/cve.html

Impact

The expected impact is ~5-10 minutes during which time the FroDo will reload and external services will not be available.

We will be carrying out the upgrades between 07:30 and 09:00 to minimise impact.

I am an Option 2 customer – will I be affected?

For this upgrade, yes you will. This is the first software release we have been happy with that also offers In Service Software Upgrades (ISSU). The good news is that future upgrades will be able to leverage ISSU so that your service is unlikely to be affected by compatible firmware upgrades moving forward.

Timescale

We plan to upgrade approximately 30 FroDos every Tuesday, Wednesday and Thursday over the first three weeks of August until all of the HPE 5510 devices in service are up to date.

Schedule

We have attempted where possible to group devices around main sites and annexes so that those sites will only see one period of disruption. Detailed schedules listing devices and dates can be found at https://docs.ntg.ox.ac.uk/pub/reference/odin-frodo-software-upgrade-august-2017

Posted in General Maintenance, Odin

The University’s mail relays and encryption

By the time this post has been published, the Oxmail relays will most likely be using opportunistic encryption to encrypt outgoing emails, in response to actions by cloud mail providers. However, we would like to make it clear that we have always known that we had encryption disabled and that our reasons for enabling it have nothing to do with addressing privacy concerns. This post should hopefully explain all this along with some relevant history.

What is SMTP?

Simple Mail Transfer Protocol is the de facto standard for email transfer between servers. SMTP is an old standard and at its inception the internet was a happier place with less need for security, and thus no security was built into the protocol. Mail delivery is via a hop-by-hop mechanism, which is to say that if I fire off an email to fred.bloggs@example.org, my mail client does not necessarily contact Fred’s mailstore directly; rather, it contacts a server it thinks is better suited to deliver the mail to Fred. It is a very similar concept to 6 degrees of separation. The Oxmail relays are one hop in the chain between the sender, you at your laptop (other devices are available), and the destination server which houses the mailbox of Fred Bloggs.

This is just an example of the many servers that need to participate to get an email from your laptop to a recipient. The number of servers is variable and you do not necessarily know the number when sending an email.

 

What is TLS?

TLS, or Transport Layer Security to give its full name, is a mechanism by which each hop is encrypted so that eavesdroppers in the middle of the connection cannot listen in on the transfer. To be clear, routers and most firewalls are not considered endpoints in this context; the endpoints are just the mail servers that are set up to route mail to particular destinations. Routers and firewalls are exactly the devices this mechanism is designed to protect against.

Why did the Oxmails not encrypt mail?

I should start by saying that there was nothing inherently stopping the Oxmail relays from initiating an encrypted communication when sending mail. The software that we run is capable of encrypting communications, and in fact we require it for incoming external connections to smtp.ox.ac.uk, so as to protect password credentials from being harvested. However, we have reservations about the concept of TLS encryption for a few reasons:

  • Since SMTP is a hop-by-hop protocol with an email traversing multiple servers A through to G, just because the communication between F and G is secure you know absolutely nothing about how secure your email is. For G to know that the email received is actually from A and is unaltered, every point needs to be encrypted, and yet there is no way of telling G that this is the case. All G knows is that the last hop was secure.
  • As almost a repetition of the last sentence, TLS does not necessarily make communications any safer and pretending otherwise is bordering on deceitful. Similarly, if the mail received by G is set to be forwarded to another mailbox H, and this is done via an encrypted channel, is that now secure?
  • The battle may already be lost on this since its uptake is so small, but there is a technology that was designed to solve this: GPG. Using GPG, you encrypt the email using your laptop and only Fred can decrypt it, unlike TLS where each hop has access to every email’s contents. The truly security conscious should be using GPG to encrypt mail as only the recipient and sender can see the message. The necessary data to decrypt the message is stored locally on your computer.

To summarize these points, we did not encrypt outgoing mail as we considered it a pointless exercise that would only give people the illusion of security without actually doing anything.
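The first of those points can be made concrete with a toy model. The Python sketch below is illustrative only: the server names follow the A to G example above, and which hops use TLS is invented for the sake of the demonstration.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    sender: str
    receiver: str
    tls: bool


def every_hop_encrypted(path):
    """What would have to hold for the message never to cross the wire in clear."""
    return all(hop.tls for hop in path)


def what_final_server_sees(path):
    """The receiving server can only observe the hop that delivered to it."""
    return path[-1].tls


# Hypothetical journey from A to G where only the first hop is unencrypted.
path = [
    Hop("A", "B", False),
    Hop("B", "C", True),
    Hop("C", "D", True),
    Hop("D", "E", True),
    Hop("E", "F", True),
    Hop("F", "G", True),
]

print("G sees an encrypted delivery:", what_final_server_sees(path))
print("every hop was encrypted:", every_hop_encrypted(path))
```

G sees a secure last hop even though the message crossed the first hop in clear, and nothing in the model gives G a way to tell the difference. Even with every hop encrypted, each intermediate server still handles the message in plain text.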

Why are we now enabling opportunistic encryption on the Oxmail relays?

Following the actions of cloud service providers, where emails received unencrypted were flagged to the person reading the mail, we were presented with two options:

  • Do nothing.
  • Implement TLS.

The former may have been our stance, but recently we have been receiving complaints that the privacy of sent emails has been violated when they were sent to certain mail providers. Rather than argue the point that email as an entire concept is insecure (after all, there is nothing stopping cloud mail providers from reading your emails for account profiling and targeted advertising), we noted that the change is relatively minor at our end, and so we took the conscious decision to enable outgoing TLS when available, so as to remove the flag on mail sent to these cloud providers.
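“When available” is the essence of opportunistic encryption: the sender upgrades to TLS only if the receiving server advertises STARTTLS, and otherwise carries on in clear text as before. The Python sketch below shows the idea using the standard smtplib module. It is not how the Oxmail relays are configured (a mail relay does this in its own software); the function names and the host passed to send_message are placeholders.

```python
import smtplib


def upgrade_if_offered(conn):
    """Start TLS only when the peer advertises it; return True if encrypted."""
    conn.ehlo()
    if not conn.has_extn("starttls"):
        return False   # no TLS on offer: deliver in clear text, as before
    conn.starttls()
    conn.ehlo()        # the server's capabilities must be re-read after TLS starts
    return True


def send_message(host, sender, recipient, message):
    """Deliver one message to a placeholder host, encrypting the hop if possible."""
    with smtplib.SMTP(host, 25, timeout=30) as conn:
        encrypted = upgrade_if_offered(conn)
        conn.sendmail(sender, [recipient], message)
        return encrypted
```

Note that nothing here refuses delivery when TLS is missing. That is what distinguishes opportunistic encryption from mandatory encryption, and it is also why it only protects the one hop against passive eavesdroppers.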

Are there better solutions available?

Yes! Even better, some of these solutions can be used today without any change on any infrastructure (except perhaps your mail client). I mentioned GPG above which is completely compatible with the existing infrastructure used around the world. You could even post your emails onto a public share using a service such as Dropbox with a link to it on Twitter and still only the recipient can read it. I must admit that usage of GPG is minimal despite its relative maturity and perhaps going into the reasons is not beneficial to the current discussion. There is also an encryption mechanism called S/MIME which has the same overall effect as GPG, even though its method is quite different. S/MIME reportedly is better supported by more mail clients, but requires purchasing a digital certificate and is thus potentially more expensive than GPG [update: this is incorrect. They can be obtained free of charge. See comments].

Added to GPG and S/MIME there are SPF and DKIM which can help verify servers’ authenticities (they do not encrypt). These technologies themselves are not well suited to our (the University’s) devolved environment as outlined in an excellent blog post by my predecessor Guy Edwards.

Conclusion

I hope this helps explain our thoughts on TLS encryption, and that our recent change to use encrypted communications is not a reaction to a mistake we discovered we were making. If there is anything you wish to add, please do add a comment, or contact the IT Services helpdesk for further information.

Posted in Mail Relay, Message Submission | 4 Comments