Affected services:

  • Crane
  • Crane Open OnDemand

Crane scheduler unavailable due to network switch failure

Opened on Sunday 2nd May 2021

Resolved

Crane is operating normally.

Posted by Garhan Attebury

Monitoring

The failed network switch that made Crane's scheduler inaccessible (and briefly made /work unavailable) has been replaced. Both SLURM and the /work filesystem should now be available.

Please note that because this is a weekend, we have not exhaustively tested Crane since the outage. A few nodes are also likely still down and need to be looked into. If you experience issues, please let us know at hcc-support@unl.edu and we will attempt to correct them first thing on Monday.

As always, you should log in, check the status of any jobs you had running, and restart or re-queue them as necessary. If an owned partition is down because its nodes are unavailable, we will attempt to correct that first thing on Monday as well.
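One possible way to script the job check described above is sketched here in Python. It is only an illustration, not an HCC-provided tool: the function names, the set of states treated as "interrupted", and the start date in the usage note are all assumptions. It relies on the standard Slurm `sacct` command with the `-X`, `-n`, `-P`, `-S` and `-o` options.

```python
import subprocess

# Slurm job states that usually mean a job did not finish cleanly.
# Which states to include is an assumption; adjust to taste.
INTERRUPTED_STATES = {"NODE_FAIL", "FAILED", "CANCELLED", "TIMEOUT", "BOOT_FAIL"}


def find_interrupted_jobs(sacct_output):
    """Return (job_id, job_name, state) tuples for jobs that ended abnormally.

    Expects pipe-delimited lines with three fields (JobID|JobName|State),
    as produced by `sacct -n -P -o JobID,JobName,State`.
    """
    interrupted = []
    for line in sacct_output.splitlines():
        fields = line.strip().split("|")
        if len(fields) != 3:
            continue  # skip blank lines and anything with an unexpected shape
        job_id, job_name, state = fields
        # The state may carry a suffix, e.g. "CANCELLED by 1234".
        state_words = state.split()
        if state_words and state_words[0] in INTERRUPTED_STATES:
            interrupted.append((job_id, job_name, state))
    return interrupted


def query_sacct(start_date):
    """Run sacct for the current user's jobs since start_date (YYYY-MM-DD).

    -X reports one line per job allocation, -n drops the header,
    -P produces pipe-delimited output, -S sets the start time.
    """
    result = subprocess.run(
        ["sacct", "-X", "-n", "-P", "-S", start_date,
         "-o", "JobID,JobName,State"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a login node, `find_interrupted_jobs(query_sacct("2021-05-02"))` would list jobs since the day the incident opened that ended in one of the states above (the date is just an example). Any job it reports can then be resubmitted with `sbatch` as usual.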

Posted by Garhan Attebury

Investigating

A failure at the PKI datacenter has made Crane's scheduler unavailable. No jobs will run or queue until this is corrected, but access to data via crane, crane-xfer, or Globus should still be available. We will work to resolve this as soon as possible.

Posted by Garhan Attebury