HPC Showcase 2016

For the second year, Advanced Research Computing (ARC) held an HPC (High Performance Computing) Showcase day on Thursday 21 April 2016.  The event, entitled Bringing HPC Alive, was combined with a Lenovo & University of Oxford Bioscience Day held the day before.

The HPC Showcase event included talks on the use of HPC in Oxford and beyond, and poster presentations from ARC users.  As part of the day, an exhibition area in the atrium and conference room at the Oxford e-Research Centre hosted stalls from various HPC vendors.  Also exhibiting was Wee ARCHIE, a supercomputer made from Raspberry Pi hardware, which is used by the outreach team of the ARCHER national HPC service to introduce supercomputing to a general audience.


Over 90 people attended the HPC showcase.  Thanks to OCF, Lenovo and the Oxford e-Research Centre events team for assistance in making this year’s Showcase a successful event.

Posted in Uncategorized

Tracking usage of ARC and the new DOI

All users of the Advanced Research Computing service are required under our terms and conditions to acknowledge the use of the facility in their research. This is quite common, particularly among HPC sites: it lets us track not only what our users are actually doing, but, most importantly, how many publications have been produced that in some way depend on HPC for their results. It's a useful metric, particularly when seeking additional funding.

It sounds simple, and a recent ad hoc search for publications containing our old 'standard' text showed that, on average, one publication per week could be attributed to ARC and its predecessor, OSC (Oxford Supercomputing Centre). But this approach is flawed, as it relies solely on searching for a short paragraph of text that ARC users are asked to include somewhere in their publications. Any slight change to the text means a publication doesn't show up in the results, while less specific searches return false positives. For example, we are not the only Advanced Research Computing centre in the UK: Cardiff, Leeds and Durham can easily appear in simple searches for publications.

To tackle this we wanted a specific, unique 'string' that we could search for, and a DOI (Digital Object Identifier) seemed ideal. While typically used to link publications to the data sets associated with the research, DOIs are increasingly being used to link publications to software applications and other, more general research outputs. Zenodo (https://zenodo.org/) makes creating a DOI very easy for all kinds of research outputs, and so we have created a simple output, a document describing ARC, which Zenodo has accepted and created a DOI for. This enables researchers not only to include a paragraph of text acknowledging the use of ARC but also to include a DOI, making the acknowledgement uniquely and easily citable. Only time will tell how well this works, but if researchers start to include it in their publications (as required by our terms and conditions of use) then we should be able to track more publications more easily in future.

Posted in News, Publications, Terms and Conditions

Planned downtime on Tue-Wed 16-17 Jun 2015 completed

Here is a report on the scheduled ARC downtime on Tuesday-Wednesday 16-17 June 2015.  The purpose of the downtime was to perform an upgrade to the ARC Panasas storage OS. We also planned to run some extended testing on storage and networking following the storage update, amongst other things.

By ~11am on Tue 16 Jun 2015, the Panasas upgrade to PanFS 6.0.3 had completed (surprisingly) without incident.  I say "surprisingly" because the last time we upgraded the Panasas system we suffered "an unusual timing bug" which meant that the file system had to be rebuilt, leading to an unexpected extra day of downtime.  As part of the main system upgrade we also upgraded the Panasas clients to the latest version.

Performance testing of the Arcus-B IB fabric for Panasas storage was then performed.  Before any tuning we had been seeing aggregate read and write bandwidth of about 850MB/s across a set of clients, with a maximum throughput of 1200MB/s.  Various configuration changes were made: datagram versus connected mode for the IB cards, and pinning the IRQ interrupts for the InfiniBand card.  After tuning we observed about 1200-1300MB/s for both reading and writing.
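The two tuning knobs mentioned above can be applied from the shell.  A minimal sketch follows; the interface name `ib0`, the IRQ number and the CPU mask are illustrative assumptions, not the actual Arcus-B values, and the exact steps vary by driver:

```shell
# Check the current IPoIB transport mode (datagram or connected)
cat /sys/class/net/ib0/mode

# Switch to connected mode, which permits a much larger MTU and
# often improves large-message throughput (some drivers require
# the interface to be down before the mode can be changed)
echo connected > /sys/class/net/ib0/mode
echo 65520 > /sys/class/net/ib0/mtu

# Find the IRQs used by the InfiniBand HCA...
grep -i ib /proc/interrupts

# ...and pin one of them (here IRQ 73, illustrative) to CPU 2
# (mask 0x4), rather than letting irqbalance move it around
echo 4 > /proc/irq/73/smp_affinity
```

Pinning the HCA interrupts to a fixed core keeps the interrupt handler's cache warm, which is often where the extra few hundred MB/s comes from.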

Testing of the Mellanox IB router.  The service4 (Mellanox IB router) system was moved into the "management" rack in the Arcus-B cluster (rack A12) and connected to the Arcus-B IB fabric, to check whether better performance could be achieved with a Mellanox IB card carrying the Panasas storage traffic. Networking configuration was done to connect service4's 40GE network card to the IBM 40GE switch and its Mellanox IB card to rack A12's QLogic/TrueScale IB switch. Performance testing with service4 required completely changing the route configuration on both the arcus-b nodes and the Panasas static route configuration to send traffic via service4.  Once the route configuration was changed it was found that service4 was unstable when using the Mellanox IB card to communicate with the QLogic/TrueScale IPoIB network (about 5 packets would send, then service4 suffered a kernel panic).  It was unclear whether the instability was due to the Mellanox and QLogic devices not "playing nice" or whether the service4 hardware had other problems.  Further investigation of the hardware is required.
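On the Linux side, steering the nodes' storage traffic through an IB router comes down to a static route; a sketch for illustration only, with invented subnet and router addresses (the Panasas side has an equivalent static-route setting in its own management interface):

```shell
# On each arcus-b node: send storage-subnet traffic via service4's
# IPoIB address instead of directly (all addresses are illustrative)
ip route replace 10.200.0.0/16 via 10.100.0.254 dev ib0

# Confirm which route a storage-server address would actually take
ip route get 10.200.0.10
```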

IB switch out-of-band interfaces: While setting up service4, an attempt was made to configure networking on the out-of-band management interface of the QLogic/Intel TrueScale IB switch in rack A12. When the ethernet management port was connected to a switch or a laptop, no network activity was observed, so nothing could be configured.  A plan for future downtime work should include taking a rack of Arcus-B offline and power-cycling the IB switch in that rack to investigate the behaviour of the serial console and/or ethernet port after a hard power cycle.  Update: we have confirmed with the cluster vendor that the QLogic/TrueScale switches we have don't have management modules, though we can purchase and install them if we want.

Arcus-B SLURM partitions were reconfigured: The partition structure for Arcus-B SLURM was updated to the following:

  • devel partition of 4 nodes
  • compute partition with all the remaining nodes
  • gpu partition with all the GPU nodes
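In slurm.conf terms, the partition structure above looks something like the following fragment; the node names and time limit are invented for illustration and are not the actual Arcus-B values:

```shell
# slurm.conf (fragment) -- hypothetical node names and limits
PartitionName=devel   Nodes=comp[001-004] MaxTime=00:10:00 Shared=EXCLUSIVE
PartitionName=compute Nodes=comp[005-200] Default=YES      Shared=EXCLUSIVE
PartitionName=gpu     Nodes=gpu[001-008]
```

The gpu partition deliberately omits Shared=EXCLUSIVE so that the consumable-resources setup described below can pack multiple GPU jobs onto one node.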

Further, the general SLURM configuration was updated to allow selection of "consumable resources" on GPU nodes. The devel and compute partitions are configured with Shared=EXCLUSIVE, so that compute nodes continue to be exclusively assigned to jobs.  When the SelectType=select/cons_res setting was pushed out, the SelectTypeParameters setting was additionally required in slurm.conf; copying the Arcus-GPU setup, this was set to CR_Core. To get SLURM's GRES plugin for GPUs to pass the CUDA_VISIBLE_DEVICES variable correctly to jobs requesting GPUs (i.e. --gres=gpu:1), an /etc/slurm/gres.conf configuration was required to specify the mapping to the /dev/nvidia* device files.  I had a bit of fun tracking down why, with the gres.conf file present, the CUDA_VISIBLE_DEVICES environment variable still wasn't being set.  There was no information in the SLURM logs and no real clue, until I realised that the /etc/slurm/gres.conf file should be world-readable (or possibly at least readable by the slurm user).  Once this change was pushed out, the CUDA_VISIBLE_DEVICES variable was happily set by the SLURM GRES plugin.
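The GRES wiring described above amounts to a few lines in two files.  A sketch, assuming two GPUs per node and a hypothetical node name (and remembering the file-permission gotcha above):

```shell
# slurm.conf (fragment) -- node name and CPU count are illustrative
GresTypes=gpu
SelectType=select/cons_res
SelectTypeParameters=CR_Core
NodeName=gpu001 Gres=gpu:2 CPUs=16 State=UNKNOWN

# /etc/slurm/gres.conf on the GPU node: map the gres entries to
# device files, and make the file readable (chmod 644)
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
```

A job then requests a device with, e.g., sbatch --gres=gpu:1, and the GRES plugin sets CUDA_VISIBLE_DEVICES to the allocated device index.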

The half-height infrastructure rack was moved to make space for a new rack that is to be installed soon.  The rack was moved two spaces to the right, from its position on Tile 44 to Tile 42, next to Arcus-B Rack A1; this allowed the same power connections to be used. The infrastructure rack currently has only the following live systems: icarus (LDAP server), radley (GOLD database server and power meter monitoring scripts) and dcim (data centre temperature sensor system: a laptop on top of the rack, connected to the one-wire (pink) network cables in the floor).  Some older, previously live network cables were disconnected and rolled up to hang neatly in the rack.

Things not done during downtime

We didn’t get around to connecting PanActive manager to LDAP. This work was optional and can possibly be done during a future at-risk period.

We didn’t get on to configuring the additional 10G ethernet switch for Arcus-B. This work was deemed optional: Arcus-B 10G/1G networking is working, and while setting up the second 10GE switch and the associated trunking cables to the Extreme stack would improve networking, it can be done later.

Fix srun on Arcus-B (update/fix the IB/OFED software installation): This work was not seen to require downtime. It is being progressed separately, with node reimaging being done while the cluster is up and running.

UPS maintenance: This work was optional.

Cabling for new nodes in Arcus-B: While checking the details of the switches in Rack A12, it was noticed that the two chassis in this rack contain only 14 nodes, so it may be sensible to relocate the 8 new nodes into the Rack A12 chassis so that the new nodes are sensibly located.

Posted in Downtime, Hardware, Infiniband, Panasas, SLURM