[RESOLVED 7 Feb 08 1:18 pm PST –teeple]
This protracted battle appears to be over.
The final fix involved testing and adjusting timeouts at several layers, from Application to Transport, inclusive.
The Web team took the occasion for some other housekeeping initiatives on caches and timeouts.
Those of us who post here occasionally find ourselves in the position of play-by-play announcers. We think it’s important to let you know that we’re aware of a problem as quickly as possible, instead of letting it sit until we can blog a tidy post mortem.
Because problems in a heavily distributed environment can be subtle and difficult to trace thoroughly, we sometimes have to relay unpleasant surprises (in the form of the infamous [RE-UN-RESOLVED] tags) as our tech team closes in iteratively on an operational glitch.
Having said all that: These intermittent failures with our Website seem to be setting some kind of record for aggravation.
Here’s what we know so far:
- It looks like a load balancer’s mostly to blame. A really stubborn load balancer.
- If you get a 500 Error, a 503 Error, or a redirect error, hitting ‘refresh’ in your browser will generally get you a good connection the second or third time around.
- Your irritation is thoroughly understandable, and we apologize for contributing to it with premature reports of resolution.
- Multiple Web-savvy Lindens are focused on this. Some of them, still on the road to work, don’t know that yet. They’ll find out very shortly.
- We’re not going to call this [RESOLVED] again until we’re sure the tag fits.
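For anyone hitting the site from a script rather than a browser, the "refresh and try again" workaround above can be sketched roughly as follows. This is only an illustration under stated assumptions, not anything our Web team runs: `fetch_with_retry`, its parameters, and the `fetch` callable (which stands in for whatever HTTP client you use and returns a status code and body) are hypothetical names. It covers the 500 and 503 cases only; redirect errors are not handled.

```python
import time
from typing import Callable, Tuple

# Status codes treated as "try again", taken from the workaround above.
RETRYABLE_STATUSES = {500, 503}


def fetch_with_retry(
    fetch: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> Tuple[int, str]:
    """Call `fetch` until it returns a non-retryable status or attempts run out.

    `fetch` is a hypothetical stand-in for a real HTTP request; it must
    return a (status_code, body) pair. The last response is returned even
    if it is still an error, so the caller can decide what to do with it.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        status, body = fetch()
        # Stop on success (or any non-retryable status), or on the final try.
        if status not in RETRYABLE_STATUSES or attempt == max_attempts:
            return status, body
        # Pause briefly before the next attempt, like waiting before a refresh.
        time.sleep(delay_seconds)

    # Not reached: the loop always returns on its final iteration.
    raise RuntimeError("unreachable")


if __name__ == "__main__":
    # Simulated server: two failures, then a good response.
    canned = iter([(503, ""), (500, ""), (200, "ok")])
    print(fetch_with_retry(lambda: next(canned)))
```

Three attempts matches the "second or third time around" experience described above; a larger limit or a longer delay may suit other situations better.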