Some information about the system outage last week:
Last Thursday, around 1:30 AM SLT, the asset server crashed. The asset
server is an essential component of the SL cluster: first the residents
noticed slowdowns and missing assets (textures, avatar appearance,
etc.), then the entire grid had to be taken offline. It was a long and
painful night for the on-call responders.
The asset server runs on a fault-tolerant distributed filesystem. On one
hand, this makes the Thursday crash pretty mysterious: we’re not sure
exactly why it went down. On the other hand, the asset server has never
crashed like this before, so it’s been doing a fairly good job of
surviving disk failures and the like.
We’re still investigating what caused the crash and how to prevent it
from happening again. Going forward, we’re also considering different
configurations for system-critical data storage.
~~ beez Linden