I want to explain our ongoing efforts to make Second Life as robust and stable as possible, while at the same time continuing to improve the quality and depth of the in-world experience. Innovative new features like 3D voice, WindLight, and Sculpties are essential for the continued evolution of Second Life; feature stasis just isn’t a good option. However, we are now devoting the majority of our development resources to considerably less visible behind-the-scenes improvements. These back-end changes have long-term benefits for the entire community, and will also help the grid withstand the sort of infrastructure failure we’ve encountered recently (described in my post here). I’ll use the rest of this post to highlight this work.
Fixing individual bugs and system failures remains a high priority for us, and our largest (and growing) internal development group is dedicated to these tasks. The open source community is also a great help in this effort, and our team integrates a large number of fixes submitted by external developers. Some of these improvements were recently highlighted in Sardonyx Linden’s post on this blog.
Much of the server-to-server communication within the Second Life grid is based on old technology that has served well beyond its initial design. When a failed database causes grid-wide problems, a de-rez doesn’t complete, or a teleport explodes on lift-off, this is usually why. Multiple teams are therefore working on the wholesale elimination of this technology, replacing it with a modern web-service-based approach. This is a big effort and will take a long time, but the benefits are myriad: when finished, it will improve reliability, eliminate data bottlenecks on crowded sims, allow multiple versions to co-exist, enable global scalability, and allow us to dispense with VPNs altogether. We’ve started to see the first fruits in the form of fewer planned outages; most new releases no longer require us to shut down the whole grid.
Fewer Failure Points
In addition, Linden Lab engineers are working to improve our database failover capabilities and to reduce single points of failure in the network infrastructure. We’re also refactoring all of our recurring data maintenance processes to reduce their impact on the main grid. Ideally, all of this work will be completely invisible to residents, but it will mean less grid downtime.
Generally, Second Life client crashes are handled by the ongoing bug-fixing project mentioned above, but a group of our graphics specialists has recently formed a team to improve the rendering code, which is at the root of most crashes. This should lead to a more stable client on a wider variety of computer hardware.
We’ve been talking about upgrading the Havok physics engine included in the sim software for years, and haven’t managed to deliver. However, this project is making real progress now and will be ready for a beta test soon. The Havok 4 version in development can withstand the majority of the physics glitches and attacks that are behind most sim crashes. A little farther out, we’ll be replacing the guts of LSL execution with Mono, a project that was on hold for a long time but is moving forward again. This will result in massive performance gains for scripts, and will also make sims (especially very busy sims) more stable.
A New Response Team
As Second Life has grown, so too has Linden Lab. We’ve recently created the Production Operations team (a virtual fire department, if you will), which will add a new level of grid monitoring and trouble response. This group will provide superior 24/7 monitoring, and will diagnose, resolve, or escalate technical issues that arise on the Second Life grid, network and website. These engineers will also develop tools and automation technologies for monitoring and repairing Second Life more efficiently. We’re currently hiring new members for Production Ops all over the world – the job listing is here.
We’ll periodically be posting updates on our continued efforts, and their relative success or failure. In the meantime, we depend on your feedback – please continue to report the bugs you experience.
As Second Life matures – more residents have more time, effort and money invested in-world than ever before – Linden Lab must be accountable for making the virtual world robust and stable, without losing sight of all the new capabilities we hope to add. There’s a great deal of work to be done, but we have far more resources than we did a year ago, and there are more engineers dedicated to stability and scalability than are working on 3D voice, WindLight, and Sculpties combined. I believe that we’re headed in the right direction.