Archive for the 'Bugs & Fixes' Category
I blogged earlier this week in response to our grid outages over this past weekend. We have put enormous effort into fine-tuning the data layer, specifically in optimizing queries and cleaning up the data structure.
However, there is a major maintenance step that needs to be completed tomorrow. We are scheduling a 60-minute maintenance window beginning at 5:30am PST. During this window, we will be migrating our central database to an optimized slave database, and making that slave the master. We expect a significant bump in performance from this change and want to take this action as quickly as possible, especially before our highest load times over the weekend.
During the maintenance window, we will be blocking logins, but those residents who are already in world will not be bumped offline. However, there will be a degradation in performance, as access to the database will be blocked, so transactions, teleporting, asset management and actions requiring a database call will not be available. Movement within a region, chat, and voice will all remain available during the database migration.
While we are sorry about this late notification, and the need to complete this maintenance work, I believed it was important to move quickly and aggressively to address our current data stability challenges, so I advocated completing this work tomorrow. We have made some real progress this week, and this maintenance activity will begin to take full advantage of this work. Thanks for your patience.
I will be in the forums for a short time this evening if you would like to comment.
FJ Linden here, to report on the latest Ongoing Updates from the Grid.
As I promised in my first post, this will be a regular monthly communication to keep all of you up to date on our efforts to improve grid stability and reliability. I’m finishing up my 3rd month at the Lab and have some significant progress to report.
I’m happy to report that we have an approved plan to move away from VPN reliance. We’ve finalized a design and chosen facility and equipment partners to build and deploy a private fiber optic ring to interconnect our datacenters. “LLnet” will be the designation of our private network and we have established an aggressive timeframe to activate it. I’m pushing hard to bring LLnet online by the end of this year (‘08), and begin a phased migration off of the VPNs immediately after. Given the amount of traffic to move, I would estimate completion of this project by February or March of ‘09 at the latest. So we have a light at the end of the tunnel on one of our biggest stability issues.
(more…)
Hello, I’m Frank Ambrose, the Senior VP of Global Technology, and I’d like to take this opportunity to let you know about some of the work we’re doing on the Second Life Grid.
By way of introduction, I’m a recent hire here at the Lab, having joined to lead our global technology team. Specifically I’ll be focused on grid infrastructure and our stability initiatives. As noted in the press release, I come to the Lab from many years at AOL (and prior to that MCI), where I experienced the kind of explosive growth, global scale and inherent stability challenges we face here at Linden Lab.
More than anything else, my tenures at those companies taught me the direct relationship between platform stability and user experience. I’m looking forward to applying that lesson, and a host of others, as we work to maintain, build and improve this complex virtual world. I am keenly aware of the pain that any service outage can cause and am both excited and confident that Linden Lab has focused the right resources to achieve this critical objective.
Given the complexities in our architecture, our stability efforts span many individual areas, most of which were detailed by Ian Linden’s May posting. Some areas will be addressed through short-term initiatives, while others will require significant re-architecture, software changes and new physical hardware. Throughout it all, we’re committed to making the transition to a more stable world as seamless and transparent to you as possible. To that end, members of my team will be using the blog regularly to provide updates on plans and progress towards meeting our stability goals.
As part of our wider stability plan, we’re targeting 4 major infrastructure points with both long- and short-term goals: Intra-Grid Network, Asset Storage Cluster, Central Databases, and Host/Transit Data Services. The strategy is to develop and deploy near-term solutions to improve stability, while looking more broadly at our architecture (hardware, software, networks, etc.). In the near term we’ve got a number of projects in flight to address some of these problem points. A couple of examples are:
- Asset collection. We’re collecting many assets that are on our storage clusters, but are rarely (if ever) accessed. These assets take up critical space on the clusters and potentially degrade performance and stability as we hit volume thresholds. We’ll be moving these files to different storage mechanisms and, while they will still be easily accessible, it will help us to avoid pushing the limits of our existing storage clusters, while still preserving all existing assets in a reliable storage environment.
- Reducing the need for VPN connections. Since we don’t encrypt communication between simulators and our databases, there needs to be a safe means to communicate across data centers and so we use VPN connections. The connections don’t scale well and can be unreliable, so establishing a new communications mechanism that is safe, scalable and reliable is another short-term project.
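As a rough illustration of the asset collection idea above, here is a minimal Python sketch that splits assets into those that stay on the primary cluster and those eligible to move to secondary storage, based on last access time. The `Asset` record, the 180-day idle threshold and the function name are all hypothetical; the actual selection criteria are not described in this post.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class Asset:
    """Hypothetical asset record; not the real asset schema."""
    asset_id: str
    last_accessed: Optional[datetime]  # None means never accessed


def split_by_access(
    assets: List[Asset],
    now: datetime,
    max_idle: timedelta = timedelta(days=180),  # assumed threshold
) -> Tuple[List[Asset], List[Asset]]:
    """Return (hot, cold): cold assets are candidates for secondary storage."""
    hot: List[Asset] = []
    cold: List[Asset] = []
    for asset in assets:
        # An asset is idle if it was never accessed or not accessed recently.
        idle = asset.last_accessed is None or now - asset.last_accessed > max_idle
        (cold if idle else hot).append(asset)
    return hot, cold
```

Nothing is deleted in this sketch: both lists together still contain every asset, which matches the stated goal of preserving all existing assets.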
These projects are just a sampling of the work that is currently being done to improve stability, and I’ll be reporting on their progress, as well as other short-term projects, in the coming months.
We have a lot of work to do but be assured that we have the right resources and internal focus to achieve our stability goals. From personal experience, I’ve encountered many equally complex challenges, especially in my time at AOL, and these problems are all solvable with the right level of attention and technical talent. We certainly have both; now we will start delivering.
As your login splash screen may have already told you, an optional viewer upgrade is available today… Announcing the Second Life 1.20 Viewer! It features improved reliability and a more flexible UI architecture so you can select the color of the User Interface. It also brings several improved features and many important bug fixes. Download it here! For those of you who’d like more detail on what’s in the new viewer, please read on. (more…)
Today we are releasing a new Release Candidate, 1.20 RC14. If you have already been using the Release Candidate (RC13), you will be required to update to RC14 with the latest bug fixes. But, the Release Candidate is always an optional series of viewers that you may choose not to use — or use side by side on your computer with the main viewer offered on our Downloads page (and at get.secondlife.com).
This RC14 includes a few last polishes to the Classic and Silver skins, which were reported broken in the previous iteration. We have also resolved a few outstanding behavior issues with Snapshots and with Unmuting a muted Resident (VWR-1735). These were small features that were introduced along the way in RC7, but needed more complete thinking and revision based on your feedback in the Issue Tracker. In RC14, these features now behave as expected.
We expect this to be the final Release Candidate in the 1.20 viewer series, barring any Showstoppers that you may find. Please of course continue to report any new issues (large or small) in the Issue Tracker, and be sure to set “Affects Version/s” to “1.20 Release Candidate”.
- When this version 1.20 becomes an official version of the Second Life viewer, it will NOT be a mandatory upgrade; you will still be able to log in with the current viewer 1.19.1.4 or the earlier 1.19.0.5 version. (Those versions will simply let you know that a new 1.20 version is available, if you wish to download.)
To get started with this RC14 viewer, visit the Test Viewers download page (NOTE: use the links at bottom of page, under Test Viewers).
(more…)
Update 2008-07-16 05:40am : We have reverted the ~1000 hosts on 1.23.1 to 1.22.4.
Update 2008-07-15 05:28pm : An issue with names showing up as “(???) (???)” in estate ban lists is showing up on the regions which have been updated to 1.23.1. We tentatively plan to revert those regions back to 1.22 by tomorrow morning, and will probably slip the 1.23 roll-out by another day. We will also be analyzing server crash data from this pilot roll to look for other issues not previously identified, before making a firm decision. – Joshua Linden
Update 2008-07-15 02:22pm : The pilot roll to 1174 regions is complete. However, because of an error Prospero made when starting the roll, there are about 300 regions that will remain down for another 10-20 minutes. For this, he apologizes.
We have identified and fixed the memory leak that was in server version 1.23.0. As such, we will be rolling out server version 1.23.1 to Second Life this week. This includes all of the fixes from 1.23.0 (see the 1.23.0 blog post for a full list of changes), as well as fixes for the object text newline bug (SVC-2633) and the memory leak.
The server will be rolled out according to the following schedule:
- Tuesday, sometime during the day : a pilot roll to 1000 regions. We are going to do a larger-than-usual pilot roll to have a large enough sample to verify that there are no other memory leaks beyond the one we’ve discovered and fixed.
- Wednesday morning, 5AM-10AM : we will deploy server version 1.23.1 to half of Second Life.
- Thursday morning, 5AM-10AM : we will deploy server version 1.23.1 to the rest of Second Life.
As usual with rolling restarts, this is a change on the server side; there will be no required client updates associated with this rolling restart. Regions will receive warnings starting five minutes before they are restarted. There is no way to delay the restart of a given region. Regions should restart within 10 minutes of going down. If your region stays down for more than 20 or 30 minutes, please contact support.
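The three-stage schedule above (a pilot, then half the grid, then the rest) can be sketched roughly as follows. This is only an illustration: the function name is made up, and the assumption that the second stage takes half of the regions remaining after the pilot is mine, not something stated in the schedule.

```python
from typing import List, Sequence


def plan_rollout(regions: Sequence[str], pilot_size: int) -> List[List[str]]:
    """Split regions into [pilot, first half, rest] batches.

    Assumption: "half" means half of the regions left after the pilot.
    """
    if pilot_size < 0:
        raise ValueError("pilot_size must be non-negative")
    pilot = list(regions[:pilot_size])
    remaining = list(regions[pilot_size:])
    midpoint = len(remaining) // 2
    return [pilot, remaining[:midpoint], remaining[midpoint:]]
```

Each region lands in exactly one batch, so a region is restarted once over the course of the roll.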
Update 2008/07/11 11:15PM : The deploy is complete. (Apologies this wasn’t posted at 11:15PM last night.)
Update 2008/07/11 08:45PM : The backwards roll to revert servers from version 1.23.0 to 1.22.4 has begun.
Update 2008/07/11 04:44PM : We have discovered a somewhat increased crash rate and a likely memory leak affecting some simulators in version 1.23.0. This has not yet had a widespread effect on the grid. However, if we leave it running, problems may compound. We will be reverting all regions on 1.23.0 to 1.22.4 in a rolling restart starting at 8PM tonight.
Update 2008/07/11 1:04PM : We have verified the fix on the Preview Grid. Tonight (Friday night), starting at 8:00PM, we will re-roll the hosts that were rolled this morning. This half of the roll will take ~3 hours, and will affect all regions on 1.23.0. Tomorrow morning, according to the original schedule, we will deploy 1.23.1 to all hosts currently on 1.22.4.
Update 2008/07/11 12:00PM: A bug (SVC-2633) was identified shortly after the roll completed. A fix has been made and has been deployed to the “Second Life Beta Server” channel of the Preview Grid for testing. The plan is to re-roll the regions with a “1.23.1” update as soon as the fix is verified and when we can be sure that the roll will not affect grid stability. Watch the Grid Status Updates feed for additional info.
Update 2008/07/11 07:48AM : The first-half rolling restart is complete.
Update 2008/07/11 05:16AM : The first-half rolling restart has begun. We are disabling the land store for the duration of this morning’s roll.
Update 2008/07/10 08:50PM : The pilot roll to 310 regions is done.
Update 2008/07/10 08:00PM : The pilot roll to 310 regions is beginning now. (The error with the central servers mentioned below by Joshua was fixed, and the central servers are all now running version 1.23.0.)
Update 2008/07/09 07:47PM : we are postponing the rolling restart again by a day. We may need to postpone it until next week; we will make that call tomorrow (Thursday). At the moment, the plan is to have the pilot roll Thursday evening, followed by half-grid rolls on Friday and Saturday morning. The schedule below has been updated to reflect this.
[FYI - The initial slip (Tue -> Wed) was due to a subtle bug discovered during internal testing that would have affected voice chat for a small fraction of residents - this was not caught during earlier testing or reported by Beta Server testers on the Preview Grid. The next slip (Wed -> Thu) follows an initial roll-out of internal web service updates on Wednesday night - errors were detected during the roll-out, so the code was immediately rolled back; we are currently investigating the issues. -- Joshua Linden]
Update 2008/07/08 03:56PM : we are slipping the rolling restart times by one day, so that the pilot roll will be Wednesday evening, and the full roll will be Thursday and Friday mornings. The schedule below has been updated to reflect this.
We will be deploying server version 1.23 to Second Life in a rolling restart next week, following the schedule:
- Thu, 07/10, 7:30PM (originally Tue, 07/08) : a pilot roll to ~300 regions
- Fri, 07/11, 5:00-9:00AM (originally Wed, 07/09) : a rolling restart of half the grid
- Sat, 07/12, 5:00-9:00AM (originally Thu, 07/10) : a rolling restart of the rest of the grid
Each region will be down for about 10 minutes; regions will receive warnings starting 5 minutes before they are restarted. If your region stays down for more than 20 or 30 minutes, please contact support. There is no way to delay the restart of any given region. No client upgrades will be needed as a result of this rolling restart.
Please help us ensure this code is as bug-free as possible! Test this code on the Preview Grid.
(more…)
Tired of seeing new Residents waddle their way through their first hours in-world? Ever been able to spot a newbie simply by their unmistakably waterfowl-esque sauntering? If so, we need you! Linden Lab is seeking bids for contract work to create an improved default male and female walk animation for new Residents. We are well aware of the wealth of talent in Second Life so we turn to you, dear Residents, to put your best avatar foot forward to help us give new Residents the start they deserve.
So if you live, eat, and sleep Poser, Maya, or 3ds Max, we may have a job for you. If you would like to be considered, please drop a notecard with your name, preferred method of contact, and any links to your work on Bucket Linden. You must be experienced with animation in Second Life and able to submit files of your work with samples of realistic human walks or dances in Second Life. Linden Lab will work closely with any contractor to make sure all animations work with our system and won’t disrupt existing content. If you meet the submissions criteria, we’ll be in touch! Thanks for your continued support and together we can fight the unsightly gait afflicting so many of our underprivileged Residents every day.
Today we are releasing a new Release Candidate, 1.20 RC13. If you have already been using the Release Candidate (RC12), you will be required to update to RC13 with the latest bug fixes. But, the Release Candidate is always an optional series of viewers that you may choose not to use — or use side by side on your computer with the main viewer offered on our Downloads page (or get.secondlife.com).
This RC13 introduces… drum roll… the feature VWR-5059 which enables you to switch the look and feel of the User Interface between 2 options: the classic grey color or a lighter blue/silver theme. Go to Preferences>Skins. (Side note for SL historians keeping track: As you know, the multi-month project was code named “Project Dazzle”– but that’s really the name of the architecture juggernaut work that got us to refresh the UI. In the final presentation in 1.20 Viewer, we opted to call the resulting color option by a simpler name, ‘Silver’.)
EDIT 2008-07-10 11:30PST: To be clear, the feature VWR-5059 allows you to switch between the 2 pre-installed skins only, as requested by many Residents in this 1.20 viewer. This is not the complete Skinning architecture outlined in the Skinning project, which is a long-term project and takes more time. Creators of Resident-made skins will still need to update their installation instructions, if those are posted on the SL wiki. (more…)
Today we are releasing a new Release Candidate, 1.20 RC12. If you have already been using the Release Candidate (RC11), you will be required to update to RC12 with the latest bug fixes. But, the Release Candidate is always an optional series of viewers that you may choose not to use — or use side by side on your computer with the main viewer which is always offered on our Downloads page (or get.secondlife.com).
As I mentioned in the last RC announcement, we are currently developing code which will enable you to switch the User Interface between the classic grey color and a lighter Dazzle theme (as requested in VWR-5059). But this feature is not yet tested and ready to be released. In the meantime… we are releasing RC12 with the thread monitoring “watchdog” again disabled. We have also eliminated the proposed change to Snapshot behavior that appeared in RC6, in which Uploading a Snapshot would crop your screen to enforce power-of-two dimensions.
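For readers wondering what cropping to power-of-two dimensions means in practice, here is a minimal Python sketch. It assumes each dimension is reduced to the largest power of two that fits; the viewer's actual cropping logic from RC6 is not shown in this post, and the function names are made up for illustration.

```python
from typing import Tuple


def largest_power_of_two(n: int) -> int:
    """Return the largest power of two that is <= n (n must be >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # bit_length() gives the position of the highest set bit.
    return 1 << (n.bit_length() - 1)


def crop_to_powers_of_two(width: int, height: int) -> Tuple[int, int]:
    """Shrink each dimension to a power of two (assumed cropping rule)."""
    return largest_power_of_two(width), largest_power_of_two(height)
```

Under this assumed rule, a 1280x1024 screen would be cropped to 1024x1024, which shows how such a rule can cut a full-screen snapshot down to a smaller, square image.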
Please continue to report any new issues in the Issue Tracker and be sure to set “Affects Version/s” to “1.20 Release Candidate”.
To get started with the new Release Candidate, visit the Test Viewers download page (Note: use the links at bottom of page, under Test Viewers)!
Release Notes for Second Life 1.20(12) July 2nd, 2008
=====================================
Fixes:
* Fixed: VWR-7178: ‘Upload a snapshot’ cannot take full screen snapshot; limited to square images
* Fixed: Allow the --set option to be specified multiple times on the command line
* Fixed: Disable the thread monitoring (watchdog) in settings again for the Release Candidate
Localization Fixes:
* Fixed: VWR-7086: floater_buy_land.xml still contains messages regarding First Land
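To illustrate the general idea behind the fix that lets the set option be repeated on the command line, here is a small Python sketch using the standard `argparse` module. It is not the viewer's own parser; it assumes each occurrence takes a setting name followed by a value, and the setting names in the usage note are hypothetical.

```python
import argparse
from typing import Dict, List


def parse_settings(argv: List[str]) -> Dict[str, str]:
    """Collect every --set NAME VALUE pair given on the command line."""
    parser = argparse.ArgumentParser()
    # action="append" keeps each occurrence instead of overwriting the last one.
    parser.add_argument(
        "--set",
        nargs=2,
        action="append",
        default=[],
        metavar=("NAME", "VALUE"),
    )
    args = parser.parse_args(argv)
    # Later occurrences of the same name override earlier ones.
    return {name: value for name, value in args.set}
```

For example, `parse_settings(["--set", "SettingA", "1", "--set", "SettingB", "2"])` collects both pairs rather than keeping only the last one.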