Last month, Ian Linden described Linden Lab’s efforts to improve grid stability, and we have already seen an improvement in September’s unplanned outages, although it’s much too early to declare victory. One thing that blog post did not discuss in detail, however, is what we are doing about Resident inventory loss. Residents have shown their frustration with inventory loss via numerous emails, calls, support requests, Office Hours discussions, Town Hall Meetings, and Project Open Letter. In response, we have begun an Inventory Loss Reduction Initiative within Linden Lab. There are currently a number of projects under this initiative, which I’ll describe in this post.
Currently, the Second Life inventory consists of over 1 billion unique Resident assets occupying 98 terabytes on disk. Each month, millions of inventory transactions create over 15 terabytes of new data on disk. The Second Life Grid is large, with over 900,000 unique Resident logins each month across over 14,000 Regions. As a percentage of all inventory transactions, the rate of inventory loss is low; however, when it happens to a Resident, it can be devastating. The primary challenge with Resident-reported inventory loss is that we often cannot verify precisely where in the complex inventory system it occurred. As part of our ongoing efforts to focus on stability and performance, the Inventory Loss Reduction Initiative includes the following internal projects:
* Metrics Instrumentation
* Region Crash Reduction
* Asset Collection Improvement
* Bug Fixing
* Resident Reported Inventory Loss Analysis
* Architectural Enhancements
* Perceived Inventory Loss
Metrics Instrumentation
We created this project to identify where and why inventory loss is occurring in our complex system. For this project we are adding much more precise logging of error codes throughout the entire system for inventory rez, derez and transfers, as well as for other system processes, including teleport errors. For those not familiar with this terminology, in this context: rezzing is the act of taking an object out of your inventory and placing it in a Region, and derezzing is the reverse act of taking an item from a Region and placing it in your inventory. The first round of instrumentation in June 2007 allowed us to identify and fix a major derez problem that was averaging 38,000 derez failures per day between June 24th and August 15th, when it was completely eliminated. In addition, there were much larger spikes on June 24th and around July 27th, caused by bugs that were temporarily introduced and by load testing of the fix on June 24th. The good news is that this particular problem has been completely fixed. The second round of instrumentation went live on the Second Life Grid on 2007-10-01, and we’re just starting to analyze the data. We expect this will give us significantly more insight into where the problems are.
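To make the idea of error-code instrumentation concrete, here is a minimal Python sketch of how logged rez, derez and transfer events might be tallied. It is an illustration only, not Linden Lab’s actual logging code: the record layout, the function names, and the error codes (ASSET_STORE_TIMEOUT, RECIPIENT_OFFLINE) are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryEvent:
    """One logged inventory operation (hypothetical record layout)."""
    operation: str   # "rez", "derez", or "transfer"
    error_code: str  # "OK" on success; anything else is a made-up failure code
    day: str         # ISO date, e.g. "2007-10-01"


def failures_by_operation_and_code(events):
    """Count failed events per (operation, error_code) pair."""
    return Counter(
        (e.operation, e.error_code) for e in events if e.error_code != "OK"
    )


def average_daily_failures(events, operation):
    """Average failures per day for one operation, over the days present in the log."""
    days = {e.day for e in events}
    if not days:
        return 0.0
    failed = sum(
        1 for e in events if e.operation == operation and e.error_code != "OK"
    )
    return failed / len(days)


# Invented sample data, for illustration only.
events = [
    InventoryEvent("derez", "OK", "2007-10-01"),
    InventoryEvent("derez", "ASSET_STORE_TIMEOUT", "2007-10-01"),
    InventoryEvent("rez", "OK", "2007-10-01"),
    InventoryEvent("derez", "ASSET_STORE_TIMEOUT", "2007-10-02"),
    InventoryEvent("transfer", "RECIPIENT_OFFLINE", "2007-10-02"),
]

counts = failures_by_operation_and_code(events)
derez_daily_average = average_daily_failures(events, "derez")
```

Grouping failures by operation and error code is what lets a figure such as a per-day derez failure average be tracked over time and traced to a specific part of the system.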
Region Crash Reduction
Residents sometimes experience data loss during region crashes. This is especially likely if a Resident rezzes a no-copy item and the Region then has to roll back to a previous checkpoint. This happens occasionally, but most of the time Regions are rolled back without data loss. Currently, 8% of all Resident sessions are terminated because of region crashes and rolling restarts. While most Regions recover from these crashes without a rollback or data loss, we know this number is unacceptable. We hope Havok 4 will play a major role in reducing region crashes. We are also fixing other, non-Havok 4 region crash bugs as part of the Havok 4 program.
Asset Collection Improvement
Assets are the items in your inventory and those you’ve rezzed onto land. When an asset is no longer in any Resident’s inventory and not rezzed in any Region, we collect it and set it aside for deletion. However, because determining which assets are not being used is a very complex problem, there are cases where we accidentally collect an asset that is still in someone’s inventory. When you get an “Object cannot be found in inventory” error message, asset collection might be the cause. Fortunately, we have a program that watches for this and puts the item back in your inventory within an hour if we still have it. So the next time you get this error, be sure to check back in an hour. We have just kicked off a project to improve asset collection and fix the underlying problem of accidentally collecting assets that are still in use.
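The collect-then-restore behavior described above can be pictured with a small toy model in Python. This is only a sketch under stated assumptions, not how Linden Lab’s asset system is implemented: the class, its methods, and the idea that an accidental collection comes from an out-of-date view of inventories are all hypothetical.

```python
def referenced_assets(inventories, rezzed):
    """Return the set of asset ids held in any inventory or rezzed in any Region."""
    referenced = set(rezzed)
    for items in inventories.values():
        referenced.update(items)
    return referenced


class AssetStore:
    """Toy model of asset collection with a set-aside area (all names hypothetical)."""

    def __init__(self, assets):
        self.live = dict(assets)  # asset_id -> data, available for use
        self.set_aside = {}       # asset_id -> data, collected but not yet deleted

    def collect_unreferenced(self, inventories, rezzed):
        """Move every asset that nothing appears to reference into set_aside."""
        referenced = referenced_assets(inventories, rezzed)
        for asset_id in list(self.live):
            if asset_id not in referenced:
                self.set_aside[asset_id] = self.live.pop(asset_id)

    def restore_wrongly_collected(self, inventories, rezzed):
        """Watcher pass: put back set-aside assets that are in fact still referenced."""
        referenced = referenced_assets(inventories, rezzed)
        restored = []
        for asset_id in list(self.set_aside):
            if asset_id in referenced:
                self.live[asset_id] = self.set_aside.pop(asset_id)
                restored.append(asset_id)
        return sorted(restored)


store = AssetStore({"hat": b"...", "chair": b"...", "old_box": b"..."})
inventories = {"alice": {"hat"}}  # what Residents really hold
rezzed = {"chair"}                # what is really rezzed in Regions

# Assumption for the demo: the collector runs with a stale view that misses "hat".
stale_inventories = {"alice": set()}
store.collect_unreferenced(stale_inventories, rezzed)

# The watcher later checks against the true state and restores the mistake.
restored = store.restore_wrongly_collected(inventories, rezzed)
```

In this model, setting assets aside rather than deleting them immediately is what makes the mistake recoverable: "hat" returns to the live store, while the unused "old_box" stays set aside.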
Bug Fixing
Inventory loss bug fixing is not a new project, but it is important to point out that it is an ongoing effort and that we consider inventory loss a very high priority. Sometimes we find code glitches or reproducible bugs causing inventory loss that have straightforward fixes, and we prioritize these accordingly. We encourage you to report inventory loss bugs with all relevant details in our public Issue Tracker, especially if they have solid reproductions, because they help our engineers pinpoint problems and fix bugs faster. See SVC-242 for an actual example.
Resident Reported Inventory Loss Analysis
While we expect to make significant improvements with the projects above, we know that we will not catch all sources of inventory loss. Thus, we plan to start a project within the next month to collect and analyze patterns of inventory loss reported by our Residents. This will help us build up cases showing where inventory loss is still occurring and allow us to validate whether we have really fixed various problem areas.
Architectural Enhancements
We are planning some longer-term architectural enhancements that will significantly improve the robustness of our system. These are discussed in Ian Linden’s post from August and include moving Second Life to web-services-based technology.
Perceived Inventory Loss
While there are many causes of actual inventory loss, there are also cases where the asset is recoverable by the Resident without help from Linden Lab, primarily because the asset was never truly lost. For example, to see missing inventory items you sometimes need to clear your Second Life Viewer cache, log out, and log back in to Second Life. We have a very helpful wiki page that describes Inventory Loss Recovery Steps, which should be the first place you go when you believe you’ve lost an item. Looking through this rather long list, we realized that we could fix some perceived loss just by changing the look and feel of the UI (user interface) and inventory-related functions. Thus, we’ve started a project to reduce some of the causes of perceived inventory loss.
If you have further questions or would like to discuss this blog post inworld, I’ll be holding an office hour at Longfellow at 56, 146, 25 on 2007-10-17 @ 2 PM PDT. To keep you up to date, I’ll post again on this topic after we have gathered more data and made more progress. In the meantime, we’ll be heads-down on the projects I’ve described above. I believe we have made a good start on addressing inventory loss problems and are headed in the right direction. Thank you for your patience while we work hard to improve Second Life.