Opened on Monday 5th August 2019
The final filesystem checks on Crane /work have completed and all data appears to be intact. Crane and JupyterHub are now available for use. We apologize for this unexpected outage; it was necessary to ensure data integrity on the /work filesystem.
Jobs that were running or queued at the start of the outage should have been restarted or remain in the queue, but we encourage all users to verify the state of their jobs.
If you encounter any issues with these resources, please contact us at firstname.lastname@example.org or visit one of our office locations (https://hcc.unl.edu/location).
Initial checks on the servers backing Crane's /work filesystem have completed and look good. Given the nature of the issues, and to ensure data consistency, we are performing a final Lustre filesystem check before bringing Crane back online. This check is expected to run overnight.
We do not believe any data loss has occurred as a result of this outage, but please remember that /work is NOT BACKED UP and is essentially a scratch filesystem intended for running jobs.
Barring unforeseen circumstances, a final announcement will be made tomorrow (August 7th) when Crane is fully back online.
Due to the issue with the Crane /work filesystem, JupyterHub and Sandstone services are not accessible. Access to these services will be restored once the Crane issue has been resolved.
Crane's /work filesystem requires more exhaustive repairs than are possible while the filesystem is online. All Crane servers, including the login and worker nodes, will require a reboot as part of this process, and all currently running jobs will be canceled. No new logins or jobs will be allowed until the issue is resolved, which is expected to be tomorrow (August 6th) at the earliest.
The Crane cluster is now fully down while the filesystem issues are addressed. Additional updates will be posted once the cluster is back online.