Rolling Restart – Thursday, Nov. 1st [COMPLETED]

We’re pushing out a few server-side patches via a rolling restart.

The restart will progress North to South across the grid, giving each region a 5-minute warning before restarting, and will take approximately 4 hours. Region managers can postpone restarts for up to one hour via Region/Estate properties.

The changes include:

  • Tweaks to behavior of search flags controlling inclusion of information in the upcoming Search changes
  • Fix to internal land sale auction IDs (which was time consuming for Lindens to work around)
  • Ability for Linden Lab to shut down individual inventory databases for maintenance without kicking everyone off Second Life

The last change should allow the maintenance work previously mentioned to be performed without a complete outage.
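
As a rough illustration of that last change, here is a minimal sketch, not Linden Lab's actual implementation. It assumes residents are spread across a fixed number of inventory databases and that an operator manually flags one as down for maintenance. The class name, the modulo placement rule, and the default of 20 databases are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryCluster:
    """Hypothetical model of residents spread across inventory databases."""

    shard_count: int = 20  # illustrative default, not a confirmed configuration
    offline: set = field(default_factory=set)  # shards flagged down by an operator

    def shard_for(self, user_id: int) -> int:
        # Assumed placement rule: simple modulo on a numeric user id.
        return user_id % self.shard_count

    def set_maintenance(self, shard: int, down: bool) -> None:
        # Manual switch only; nothing here detects failures automatically.
        if not 0 <= shard < self.shard_count:
            raise ValueError(f"unknown shard {shard}")
        if down:
            self.offline.add(shard)
        else:
            self.offline.discard(shard)

    def can_log_in(self, user_id: int) -> bool:
        # Only residents whose own database is flagged are turned away.
        return self.shard_for(user_id) not in self.offline


cluster = InventoryCluster()
cluster.set_maintenance(3, down=True)
blocked = not cluster.can_log_in(23)   # user 23 maps to shard 3
unaffected = cluster.can_log_in(24)    # user 24 maps to shard 4
```

With database 3 flagged, only residents mapped to it are turned away at login; everyone else connects as usual.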

UPDATE @ 2:17 PM: Restart initiated

UPDATE @ 3:50 PM: The restart is about halfway complete at this point, not counting any cleanup we’ll have to do at the end.

UPDATE @ 5:30 PM: Less than an hour remaining.

UPDATE @ 6:52 PM: The rolling restart is complete. A small number of regions (less than 50) remain down, but will be up shortly.

20 Responses to Rolling Restart – Thursday, Nov. 1st [COMPLETED]

  1. Elendir Axon says:

    If I read this right, this means we can now shut down problematic databases and restore stability to the grid without taking the grid offline, albeit at the cost of some people not being able to grab their inventory, or maybe even sign on? Sweet. Separation of assets and redundancy is what the grid needs, and this is a step in the right direction for better stability and uptime.

  2. Novis Dyrssen says:

    A little warning would have been nice, though. I only noticed it because one third of the sims I tp’ed to were about to restart…

  3. Cheese cake says:

    Woooooooo! Go Lindens! Spank those naughty servers till they behave. >:O

  4. Elendir Axon: “…this means we can now shut down problematic databases and restore stability to the grid without taking the grid offline, albeit at the cost of some people not being able to grab their inventory, or maybe even sign on?”

    Yeppers! If an inventory server (and we have about 20 of them, with users spread across them) is offline, users won’t be able to connect. But that’s better than everybody being kept offline, which is how the system worked prior to today. Note that we need to manually set this state; the system is not yet smart enough to detect this failure and take action by itself. (You don’t want a temporary network glitch to kick half of the residents off if you don’t need to.)

  5. Elendir Axon says:

    Thanks for the response. Perhaps response scripts could be made to detect when databases are unresponsive or sluggish and try different routes to restore connectivity and, if that fails, send an alert to an employee asking for permission to shut down the database in question?

  6. Gomez Bracken says:

    All hail the “planned outages calendar” 🙂

    Otherwise known as the “Look what we did last week” calendar 😦

  7. Jeff McNeill says:

    It would be great if a better error message than “Second Life cannot be accessed from this computer” were given. I spent half an hour trying to email an account that has been discontinued, searching in the FAQs, filing a bug report, all before I found out that the grid is actually down, even though Grid Status says Open. Yummy.

  8. Gennifer Meredith says:

    I’m still wondering why I’ve been having horrid lag, inability to rez, inability to access inventory, ‘loading clothes’ issues, slow rezzing of objects (namely clothes) for three days now, even before this rolling restart started.

    It’s happened in several regions, but not all.

    I’ve seen nothing about this in any of the blogs, but my friends and I all are seeing it, we’ve been seeing it all week.

    I realize that S.L. is huge, okay? I’m not really complaining too badly, but this is hundreds of people across several regions. Hasn’t anyone up at L.L. taken notice of this?

    I mean, right now, basically, I’m standing in my S.L. bedroom, naked from the waist down, for the last ten minutes, because clothes won’t rez right. A friend of mine last night had a woman’s body, something he wasn’t really too happy about (although I thought it was funny). I’m not inclined to laugh any more.

    C’mon guys, wasn’t this rolling restart supposed to fix some of this?

  9. Storyof Oh says:

    Does this fix the trash can bug? A newbie mate has about 100 cans in inventory… In the UK they are talking about a ‘pay as you throw’ tax… is this an extra charge in SL by stealth? :))

  10. ONE PO'd RESIDENT says:

    @#8 I have had 4 Lindens trying to figure out what’s going on with my sim. The same things as yours are happening: the sim doesn’t communicate with the database, can’t get inventory… cannot rez… cannot take… cannot save, cannot dl textures… customer transactions failing… scripts broken… lost-from-database errors, lag lag lag, people flailing in the air, sim FPS and dilation dropping to 1 and zero.
    I think everyone having problems needs to run out and buy the Lindens a box of Q-tips so they can clean their EARS and hear us SCREAM: THERE IS A PROBLEM HERE!

  11. Farallon Greyskin says:

    I just got back on and everything is SO SLOW. I’ve not seen it this bad in a LONG time.

    Rezzing objects even chat.

    Then someone tells me there was a rolling restart while I was off line.. Coincidence?

    Something has caused some really REALLY slow asset response times… but not just asset, lots of other things too.

  12. Abigail Merlin says:

    I sure hope that with this rolling restart the planned update of the asset servers can go ahead soon; they sure seem to need it.
    Next step, mirrored databases? So a single database failing, whether temporary, network-related or otherwise, won’t affect operation as much.
    *puts pitchfork back in the rack for now*

  13. Farallon Greyskin says:

    Hmm, actually, things seem to be back together for my sims now. (Hosted in SF)

    Whatever the problem was it looks like it was part of what eventually killed the entire grid in SF but was fixed early this morning.

  14. @9 Storyof Oh:

    The Multiple Trash Can bug was caused by creating new accounts while using the 1.18.4 RC0 or RC1 viewer. This is fixed in RC2, and we’ll be doing behind-the-scenes cleanup on accounts that suffered from this problem, so it should magically fix itself.

  15. @13 Farallon Greyskin: There was a network outage after the rolling restart as was mentioned here:

    (There’s another one going on as I type this, which our systems engineering team is frantically trying to diagnose – appears specific to one of our colocation facilities.)

  16. U M says:

    oh was it?……thinks

  17. Montana Corleone says:

    Well, as always, it’s not inability to connect, but mega performance degradation that has occurred since this was employed, presumably to fix some of the problems with the previous one: mega lag; ages for everything to rez; inventory slow to load; money not reloading again; Group IM “No Perms to post” resurfaced again; loads of big name sims say you cannot access this tp location, and the map shows them empty, days after the “resolution” of the SF failure, e.g. the cluster of ETD Isle/Canimal/Dazzle/Celestial City; tp failures resurfacing; crashes galore; scripts totally borked or timing out on many (e.g. take a look at some Money Trees: when the posts eventually show, they don’t show the hovering text data, so a friend who’s just joined says); continued problems with comms such as chat and IM lag, messages not getting through, external problems e.g. to commercial web sites many merchants use, leading to delivery failures and customer service issues (which we have to cope with and can’t ignore like LL), to mention just a few. The latter has been a problem since the new external message system was installed in 1.15. There are still performance and drawing related problems, particularly alphas and lighting, since the introduction of the new “faster, improved” render pipeline in 1.14.

    Whatever it is you are doing and tweaking, stop, roll back, and let performance get better first, like we have all been pleading for months and months and months and months and months, before all of the older people finally leave. You are losing 10-15% of older premiums every month from your key metrics taking into account the referrals bonus.

    Normally you can’t implement one thing right. Here you are messing with Het Grid (which means you no longer have to tell us what you are doing exactly), comms, Windlight, a New Search, Havok 4, whatever is in the RC that doesn’t work, a different viewer from Electric Sheep, an influx of people from CSI:NY, and upping the concurrency by 6K – all at the same time – so it’s hardly surprising the whole thing is borked up. Will you ever learn? Will you ever listen?

    So many bugs are resurfacing again you have the sneaking suspicion that all these unasked-for shinies are implemented using old code bases, overriding all the bug fixes (few that there are) that have been issued.

    Take a look at the rest of the software world, guys, and how they implement things. There, they test properly, and their updates fix things, not make them worse. They coddle their databases and guess what? They work…

    The Tao and Linden way of doing things clearly doesn’t work, and should be scrapped, with those who still think it’s a good thing got rid of, by firing squad if necessary…

  18. Since again almost comments are closed…
    Damnit, when is Linden Lab going to set up some normal Customer Services!!!!!!!!!!!
    All of a sudden I have had no music from SL for days, and boy am I sure that all settings are OK. All out-of-world music and streams work fine.
    And can we find any Linden for support? None whatsoever 😦
    GEEEEZZZZ, GET CUSTOMER SERVICES ORGANIZED. IS THAT SO DIFFICULT? There are only around 50,000 people online at the same time; what is so difficult about setting up basic CS to normal standards? Hire pros to get it organized!!

