UAS Photocopiers – Scan to network drives

Last updated: 10.08, Tuesday 29th March

Following on from the problems affecting UAS network drives that occurred on 9th March, work has been carried out to update the photocopiers that were configured to scan to the affected departmental drives. The outstanding devices below are waiting on a firmware upgrade, which Ricoh need to apply directly to each individual device. We’re working with UAS Facilities to co-ordinate this across all of the outstanding devices. The firmware upgrade is required for the devices to communicate with the newer Microsoft file server cluster that the data is now stored on. If you urgently require this functionality on a device that is not currently working, please contact help@it.ox.ac.uk. This list will continue to be updated as we receive confirmation that devices are working.

Posted in Announcements

CONNECT – UAS network drive outage

Last updated: 07.39, Friday 10th March

Work continued throughout Wednesday and Thursday to recover from Wednesday’s failure affecting UAS network drives. All departmental drives were available from 11am on Thursday and access to the P drive was restored last night. Anyone who is still missing drives or unable to access them will need to log off and back on to restore access.

Posted in Announcements

Continuing with the desktop RAM upgrades

We previously reported on the RAM upgrades that were completed during the first week. We knew we wouldn’t be able to continue at that pace because the remaining upgrades needed to take place over a much wider spread of buildings. Below is a summary of where we are and what we have left.

Posted in Slow PC Investigations

The first week of RAM upgrades

We’re continuing our RAM upgrade for desktop machines and are over halfway through. During the first week we managed to upgrade 494 of the 516 machines that we planned, so thank you to everyone who has assisted us and helped provide both information and access to buildings! With only a handful of exceptions, these upgrades have all happened between 5 and 7pm to help minimise interruptions.

Posted in Slow PC Investigations

Improving and benchmarking logon times

One of the areas that we’ve been focusing on is the time taken between turning machines on and them being ready to load applications. A previous blog post outlines the changes we’re making to printer deployment, including stopping the Printer Group Policy from running at logon. This should make a noticeable improvement to workstation logon times, as a lot of the processing, and the subsequent delays, during logon is linked to this policy.

To help show the differences we’ve been measuring logon times by pulling back and analysing logs from five machines where users have directly reported issues with logon speed. We’ve also taken a standard machine and benchmarked the time we expect logon to take using it. This is on one of our older models (a Dell Optiplex 9010) with a sample user account of similar size and with the same configuration as all of our standard user accounts.

The chart below outlines the times taken to log on before and after our planned Group Policy changes, for both our benchmarking machine and the five sample machines. Whilst we’ve tested our benchmarking machine ‘after’ removing the policy, we’re not quite yet in a position to be able to deliver the change to users’ workstations. Hopefully we’ll be in a position to start doing this next week and can then fill in the missing columns to see what the real-world results are and how much impact the change has.

logon_graph2

The timings we’ve got are combined from two different logs on the machines: the Main Path Boot Time and the Boot Post Boot Time. In brief, the Main Path Boot Time logs the time taken from the Windows Boot Loader kicking in through to the logon screen being displayed. The second, Boot Post Boot Time, then measures the time from you clicking ‘logon’ through to Windows having loaded your desktop and the CPU being 80% idle for 10 seconds. Adding these together gives a nice accurate figure to work with that we can consistently retrieve and compare on any of our machines. It also means that we are not measuring the time it takes you to enter your username and password!
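A minimal sketch of that arithmetic, in Python. The `BootSample` type, the machine names, the unit (milliseconds) and the figures are hypothetical and are not taken from the real logs; the only thing the sketch shows is that the two durations are added per machine before being compared.

```python
from dataclasses import dataclass


@dataclass
class BootSample:
    """One boot record; both durations are assumed to be in milliseconds."""
    machine: str
    main_path_boot_ms: int  # boot loader start -> logon screen displayed
    post_boot_ms: int       # logon click -> desktop loaded and CPU settled

    @property
    def total_seconds(self) -> float:
        # The combined figure is simply the sum of the two logged durations.
        return (self.main_path_boot_ms + self.post_boot_ms) / 1000


def mean_total_seconds(samples):
    """Average combined time across a list of samples."""
    return sum(s.total_seconds for s in samples) / len(samples)


# Made-up figures, purely for illustration.
samples = [
    BootSample("BENCHMARK", 40_000, 20_000),
    BootSample("SAMPLE-1", 50_000, 70_000),
]

for s in samples:
    print(f"{s.machine}: {s.total_seconds:.1f} s")
print(f"mean: {mean_total_seconds(samples):.1f} s")
```

Because time spent typing a username and password falls between the two measurements, it never appears in either duration, so it does not need to be subtracted.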

I’ll hopefully soon have time to write a separate blog post to explain a bit more about this, including a breakdown of the Microsoft OS loader and initialisation stages.

Posted in Slow PC Investigations

Windows Performance Logs

One of the bits of work that we’ve been looking at is identifying useful information held within the Windows logs which we can use to help us see what is going on.

An example of this is using the Windows Diagnostics Performance logs. These are buried pretty deep in the menus and are not the easiest to find. When you do find them you see what seems like a huge number of alarming-looking entries classed as either critical or errors. These are a bit misleading, as most of them do not refer to things that are not working. That is different to, say, the standard System log: if you had a screen full of them in there then you would know you’re in serious trouble.

If we filter this log on just Event ID 100 we get all errors associated with delays at boot, ID 101 gives us applications which are taking longer than usual to start up, and ID 102 gives us any drivers that took too long to initialise. These are all measured against thresholds built into the Windows source code, and understanding all of the details is rather complex. However, it does give us a baseline for pulling information back from machines and comparing them. An example of one of the ID 100 errors on my machine is below:

 

diag_log_event_id_100_sample

We will start gathering these entries from a sample of machines where we know we’ve had specific performance-related issues occurring. We’ll do this remotely, with a script which will pull the entries locally into an Excel spreadsheet before emailing them back to us so we can collate the data. The majority of this is being done through PowerShell; the line below is an example of filtering and capturing the ID 100 logs as above.

ps_script_sample

Once we’ve done this we’ll be going through everything to find any common entries and also comparing it with our benchmarking.
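The filtering and the search for common entries can be sketched roughly as follows. This is an illustration in Python rather than the PowerShell mentioned above, and the record layout (`machine`, `id`, `detail`), the machine names and the sample entries are all hypothetical; only the meaning of Event IDs 100, 101 and 102 comes from the description above.

```python
# Meaning of each Event ID, as described in the post.
EVENT_MEANINGS = {
    100: "boot delay",
    101: "application slow to start",
    102: "driver slow to initialise",
}


def filter_by_id(events, event_id):
    """Keep only the entries with the given Event ID."""
    return [e for e in events if e["id"] == event_id]


def common_entries(events, min_machines=2):
    """Return (id, detail) pairs reported by at least `min_machines` machines."""
    machines_per_entry = {}
    for e in events:
        key = (e["id"], e["detail"])
        machines_per_entry.setdefault(key, set()).add(e["machine"])
    return sorted(k for k, m in machines_per_entry.items() if len(m) >= min_machines)


# Hypothetical collated data from two machines.
events = [
    {"machine": "PC-A", "id": 100, "detail": "boot exceeded threshold"},
    {"machine": "PC-A", "id": 101, "detail": "example.exe"},
    {"machine": "PC-B", "id": 101, "detail": "example.exe"},
    {"machine": "PC-B", "id": 102, "detail": "example.sys"},
]

for event_id, detail in common_entries(events):
    print(f"{EVENT_MEANINGS[event_id]}: {detail}")
```

Grouping by distinct machine, rather than counting raw entries, stops one noisy machine from making an entry look common.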

 

 

Posted in Slow PC Investigations

Understanding boot times and logon times

A lot of support calls, especially those associated with the slowness problems, concern slow logon and boot times. These range from being unable to log on at all to delays and error messages at various points before people can start work. A lot happens in the background to get a machine to the point that you can use it, and I’ll post a more detailed explanation of what happens when I get a chance.

Essentially from a support and troubleshooting perspective we can divide incidents into three categories which cover all aspects of the reports we get:

  • Booting up
  • User logon
  • Loading applications

I’m sure if you google PC start or logon you’ll get as many different ways of defining these as you click results. However, for simplicity we’ll stick to the above three.

Booting Up

Boot_process_v3

This covers the initial startup of the machine through to the point where you can enter your username and password to log on. Typical issues that occur during this phase are:

  1. Machine fails to turn on or to reach the point that Windows is loading (normally hardware issues)
  2. Windows fails to boot (normally an OS software issue)

Machines will only fail at this point when something has changed. For example, a faulty RAM module could cause the machine not to boot past the BIOS POST checks (which happen while the Dell logo is being displayed), and an application being installed or updated could cause Windows to fail to load, resulting in an error message or ‘blue screen of death’ (a lot less common these days!) after the ‘Starting Windows’ screen.

User Logon

logon_process_v3

This covers the process from you logging on to the machine to the point that Windows is ready to use. This phase is one of the primary areas that we are focusing on improving, with the planned changes to the printing policies directly affecting it (see our previous blog post about it). Again, errors that we see here can generally be split into one of the following:

  1. Username or password is not accepted
  2. Existing roaming profile not loaded
  3. Errors applying group policies
  4. Services or applications not loading correctly on start-up

The first of these is pretty obvious. If you enter a wrong username or password you’re not going to get any further.

Did you know we have systems in place to allow you to reset and unlock your own accounts? Information is on the IT Services Help website.

Similar to booting up, any errors that occur here are normally the result of something that has changed. For example, although we see this extremely rarely, a user’s roaming profile could get itself into a state where the server-side copy is corrupt, meaning Windows is unable to copy it back down to the machine during logon. Similarly Sophos antivirus, which loads on start-up, could fail to run correctly either because of an update or because a dependency has failed.

It’s probably worth noting that whilst we are looking to improve the user logon, this is about decreasing the time it takes rather than fixing an actual problem. Everything happening during the logon phase is working; it’s just that some bits of it are taking blooming ages to run through.

Loading Applications

A common misconception is that loading applications is part of the logon process. OK, so a computer is pretty useless once it has loaded until you open an application; however, it is really critical for us that we separate issues with applications from issues during logon. Behind the scenes the two are completely different, with no links between them.

outlook_splash_2

Our primary focus whilst investigating issues has so far deliberately not been on individual applications. However, we have always planned to come back and review them, with Outlook being by far the number one application that people have issues with. We will shortly start focusing effort on the problems that people are experiencing with it, to help identify any issues and to produce some guides and help information on getting the best out of it.

Posted in Slow PC Investigations

Understanding Microsoft Updates

The two updates that we were originally focusing on were KB3050265 and KB3102810 (see our previous post). In simple terms KB3050265 has been superseded by KB3102810; however, it’s never quite that straightforward. Microsoft updates are a huge spider web of interlinked patches, all with multiple dependencies. The diagram below shows the two we were focusing on (in green) and how they fit in around other updates that they are directly related to.

WSUS Updates v2

So all of the updates listed are modifying system files that are directly related to the Windows Update process. The greyed out updates all have further updates which have superseded them however I ran out of whiteboard by that point so they weren’t documented.  At some point I’ll re-visit that avenue to check there isn’t anything else that we’ve missed.

Where it gets confusing is that it isn’t as straightforward as one patch being superseded by another. For example, KB3112343 is listed as superseding both of the original two updates we were looking at (KB3050265 and KB3102810); however, as I’ve mentioned at the top, the second of these also supersedes the first, so mapping out all of the paths through rapidly becomes rather complex and time-consuming. What looks like a relatively simple diagram above took quite a while to map out! Remember this is just looking at two particular updates out of a much larger number that get applied monthly to our machines.
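One way to picture those overlapping paths is as a small directed graph. The sketch below, in Python, encodes only the three supersedence relationships named in this post; the function names are made up for illustration, and the real data would be far larger.

```python
# update -> updates it is listed as directly superseding (from the text above)
SUPERSEDES = {
    "KB3102810": {"KB3050265"},
    "KB3112343": {"KB3050265", "KB3102810"},
}


def all_superseded_by(update, graph):
    """Every update reachable from `update` by following supersedence links."""
    seen = set()
    stack = [update]
    while stack:
        current = stack.pop()
        for older in graph.get(current, ()):
            if older not in seen:
                seen.add(older)
                stack.append(older)
    return seen


def latest_updates(graph):
    """Updates that nothing else in the graph supersedes."""
    superseded = {u for olds in graph.values() for u in olds}
    return (set(graph) | superseded) - superseded


print(sorted(all_superseded_by("KB3112343", SUPERSEDES)))
print(sorted(latest_updates(SUPERSEDES)))
```

The `seen` set is what copes with KB3050265 being reachable by two different routes: it is visited once however many paths lead to it.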

The interesting bit (for me at least) is the update in orange. This is listed as superseding KB3112343, KB3102810 and KB3050265; however, it has only recently been released. Microsoft are not classifying it as a critical or security update, and by their own admission the documentation they’ve written on it has not yet been fully reviewed. At the moment we are not pushing this new update out; however, we’ve set up the server so that we’re ready to should we need to. It will be one for us to keep an eye on over the coming months in case it gets re-categorised to a higher priority.

Posted in Slow PC Investigations

RAM Upgrades – now underway

This week sees us starting the process of upgrading all desktop machines to 8GB of RAM, as per our previous post. So far this week we’ve targeted University Offices, with 105 machines completed on Monday and 116 on Tuesday.

Tonight we start on Dartington House and Hythe Bridge Street, with other buildings to follow. We need to access the insides of the machines, which is why, where possible, we are carrying out this work after 5pm, and we’ll endeavour to send out individual emails a week ahead of visiting machines. Please do confirm back to us with any amendments, as it makes a huge difference to both the planning and the time taken to track down individual machines.

As always please report any problems to help@it.ox.ac.uk.

IMG_0086

Posted in Slow PC Investigations

That ‘Applying Group Policy Printers policy’ message (Reviewing printer setup – Part 3)

We’ve known for some time that printer deployment isn’t necessarily working as well as it was originally planned to, but as with everything else finding the time to pull everyone together to look at it is difficult.

Whilst we will completely review it at a later date, we’ve come up with a few changes to the existing process which should make a noticeable improvement in log in speeds for a significant number of people. We use roaming profiles to enable everyone to log on to any machine and retain a number of settings and configuration options. They can work really well; however, one of the downsides is that printers will follow you to every machine that you use. This means that if you log into a machine other than your normal one the same printers will still be installed, which isn’t great as they are unlikely to be the printers you want to use from this second machine. As a result there are some clever bits of script that run in the background. These can essentially be split into two categories:

  1. Printer group policy: Making a printer available on a machine you haven’t previously used
  2. Printer log on/off scripts: Restoring the printers you had last time you logged on to that machine.

applying printer gp

The first of these is the main one causing issues. Many of us sit watching the printers policy applying in the background for quite a while during log on. Unfortunately, to add to the woes, Windows 7 is appallingly slow at doing anything related to print drivers: installing, removing, updating, checking – they all take a lot longer than on every other Windows OS, including the newer versions (8.1, 10) and going back to the days of XP. This has accentuated the delay whilst printers are being dealt with, and it is what we are primarily targeting in the changes outlined below.

How it works

The diagram below outlines the current processes that run on your machine and how printers are automatically installed on machines:

v5 Printer Mapping Process - current

During log on the majority of the processing that is going on relates to point 1 above, ‘Making a printer available on a machine you haven’t previously used’. This is applied via the Printing Group Policy and runs whilst the screen is stuck on ‘applying group policy printers policy’. It works by reading an attribute on the workstation object within Active Directory and then comparing this against a list of printers available in CONNECT. Where it finds a match it installs that printer.
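As a very rough sketch of that matching step, in Python: the attribute name `default_printer`, the workstation record and the printer names below are all hypothetical, since the real attribute and list are not named here. It only illustrates ‘look up the attribute, install on a match, otherwise do nothing’.

```python
def printer_to_install(workstation, available_printers):
    """Return the printer to install for this workstation, or None.

    `workstation` stands in for the directory object; `default_printer`
    is a made-up attribute name used only for this example.
    """
    wanted = workstation.get("default_printer")
    if wanted and wanted in available_printers:
        return wanted
    # Attribute missing, empty, or naming a printer that is not in the list.
    return None


# Hypothetical data.
available = {"PRINTER-GROUND-FLOOR", "PRINTER-FIRST-FLOOR"}

print(printer_to_install({"name": "PC-A", "default_printer": "PRINTER-FIRST-FLOOR"}, available))
print(printer_to_install({"name": "PC-B"}, available))
```

A workstation whose attribute was never set simply falls through to `None`, which corresponds to no printer being installed automatically.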

Point 2, ‘Restoring the printers you had last time you logged on to that machine’, then runs via log on and log off scripts. If you have previously logged on to the machine then the printers that you last used will have been captured during log off, including a note of which printer was set as the default. These then get re-installed alongside any printers that are mapped at point 1. This works by writing the printers you have installed to a registry location within your profile, generating a different key for every machine you’ve logged on to. Because we use roaming profiles, this information gets stored centrally and is pulled down to every machine you log on to. The part that runs during log on is initiated whilst the machine is loading the profile but mainly runs in the background immediately after everything has loaded, with your desktop and start menu available.
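The capture and restore steps can be pictured with a small Python sketch. A plain dictionary stands in for the registry location inside the roaming profile, and the function names, machine names and printer names are all hypothetical; the point is only that the saved list is keyed by machine.

```python
def capture_printers(profile, machine, installed, default):
    """Log off: record this machine's printers and default under its own key."""
    profile[machine] = {"printers": sorted(installed), "default": default}


def restore_printers(profile, machine):
    """Log on: return (printers, default) last captured on this machine."""
    saved = profile.get(machine)
    if saved is None:
        # First time on this machine: nothing was captured, nothing to restore.
        return [], None
    return list(saved["printers"]), saved["default"]


# The profile roams with the user, so one dictionary serves every machine.
profile = {}
capture_printers(profile, "PC-A", {"PRINTER-2", "PRINTER-1"}, "PRINTER-1")

print(restore_printers(profile, "PC-A"))
print(restore_printers(profile, "PC-B"))
```

Keying by machine is what stops the printers from your usual desk being restored on a machine in another building.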

The planned changes

We’ve sat down and reviewed the current method used for printer deployment within CONNECT and what each aspect is doing. What we’ve come up with are changes which are relatively low-risk and take a minimal amount of time to implement, but which make a noticeable improvement in log in speeds. Briefly, the plan is to remove the Printer Group Policy that deals with ensuring you have a printer available the first time you log on to any machine.

The default printer that this policy relies on hasn’t been correctly set for a large number of machines, because we have to update it manually for each machine. Updating this is rather time-consuming and can be like chasing your tail as machines get switched about between offices. Where we haven’t done this no printers will be automatically installed. Equally, if it is set incorrectly then you will always continue to get that incorrect printer installed alongside any others that you have previously used. The speed at which we had to roll out Windows 7 also meant that this was one of the things that we didn’t manage to capture when visiting all of the machines.

However, before we make this change we need to update the script that runs on log off. Currently printers are left installed until you next log on. At this point the first thing the group policy does is remove all printers. We still need this to happen after we remove the group policy so it will be moved to run during log off. The plan is to update the script that captures the currently installed printers and then removes them. (If we did it via a separate group policy we’d be in danger of removing the printers before the script had captured them!)
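The ordering concern can be shown with a short Python sketch. The names are hypothetical and a set stands in for the printers installed on the machine; it is not the log off script itself, only an illustration of why capture has to come before removal.

```python
def log_off_capture_then_remove(profile, machine, installed, default):
    """Planned order: save the printer list first, then remove the printers."""
    profile[machine] = {"printers": sorted(installed), "default": default}
    installed.clear()


def log_off_remove_then_capture(profile, machine, installed, default):
    """Wrong order, shown for contrast: the capture sees an empty list."""
    installed.clear()
    profile[machine] = {"printers": sorted(installed), "default": default}


good_profile, bad_profile = {}, {}
log_off_capture_then_remove(good_profile, "PC-A", {"PRINTER-1"}, "PRINTER-1")
log_off_remove_then_capture(bad_profile, "PC-A", {"PRINTER-1"}, "PRINTER-1")

print(good_profile["PC-A"]["printers"])
print(bad_profile["PC-A"]["printers"])
```

Doing both steps inside one script fixes the order, whereas two independent mechanisms would give no guarantee about which runs first.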

So, the updated process will look like the diagram below, and in our testing this updated method has shown a significant decrease in log in times. The only trade-off is that the first time you log into a new machine it won’t automatically map a printer; however, for ~50% of our machines this won’t have been happening properly anyway, for the reasons mentioned above.

v2 Printer Mapping Process - planned

Posted in Slow PC Investigations