Hello, I’m Frank Ambrose, the Senior VP of Global Technology, and I’d like to take this opportunity to let you know about some of the work we’re doing on the Second Life Grid.
By way of introduction, I’m a recent hire here at the Lab, having joined to lead our global technology team. Specifically, I’ll be focused on grid infrastructure and our stability initiatives. As noted in the press release, I come to the Lab after many years at AOL (and prior to that MCI), where I experienced the kind of explosive growth, global scale and inherent stability challenges we face here at Linden Lab.
More than anything else, my tenures at those companies taught me the direct relationship between platform stability and user experience. I’m looking forward to applying that lesson, and a host of others, as we work to maintain, build and improve this complex virtual world. I am keenly aware of the pain that any service outage can cause and am both excited and confident that Linden Lab has focused the right resources to achieve this critical objective.
Given the complexities in our architecture, our stability efforts span many individual areas, most of which were detailed in Ian Linden’s May posting. Some areas will be addressed through short-term initiatives, while others will require significant re-architecture, software changes and new physical hardware. Throughout it all, we’re committed to making the transition to a more stable world as seamless and transparent to you as possible. To that end, members of my team will be using the blog regularly to provide updates on plans and progress towards meeting our stability goals.
As part of our wider stability plan, we’re targeting four major infrastructure points, each with both long- and short-term goals: Intra-Grid Network, Asset Storage Cluster, Central Databases, and Host/Transit Data Services. The strategy is to develop and deploy near-term solutions to improve stability, while looking more broadly at our architecture (hardware, software, networks, etc.). In the near term we’ve got a number of projects in flight to address some of these problem points. A couple of examples are:
– Asset collection. We’re collecting the many assets on our storage clusters that are rarely (if ever) accessed. These assets take up critical space on the clusters and potentially degrade performance and stability as we hit volume thresholds. We’ll be moving these files to different storage mechanisms. They will still be easily accessible, and all existing assets will be preserved in a reliable storage environment, but the move will help us avoid pushing the limits of our existing storage clusters.
– Reducing the need for VPN connections. Because we don’t encrypt communication between simulators and our databases, we need a safe way to communicate across data centers, so we use VPN connections. These connections don’t scale well and can be unreliable, so another short-term project is to establish a new communications mechanism that is safe, scalable and reliable.
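To make the asset collection idea a little more concrete, here is a minimal Python sketch of how rarely accessed assets might be picked out for a move to a secondary store. Everything in it is an assumption for illustration only: the `Asset` record, its `last_accessed` field, the 180-day idle threshold and the `select_cold_assets` helper are hypothetical names, not a description of the production system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Asset:
    """Hypothetical asset record; the field names are illustrative only."""
    asset_id: str
    size_bytes: int
    last_accessed: Optional[datetime]  # None means the asset was never accessed


def select_cold_assets(
    assets: List[Asset],
    now: datetime,
    max_idle: timedelta = timedelta(days=180),  # assumed threshold, not a real policy
) -> List[Asset]:
    """Return assets that were never accessed or have been idle longer than max_idle."""
    cutoff = now - max_idle
    return [
        a for a in assets
        if a.last_accessed is None or a.last_accessed < cutoff
    ]


if __name__ == "__main__":
    now = datetime(2020, 1, 1)  # arbitrary reference time for the example
    assets = [
        Asset("never-read", 1024, None),
        Asset("recent", 2048, now - timedelta(days=10)),
        Asset("stale", 4096, now - timedelta(days=400)),
    ]
    cold = select_cold_assets(assets, now)
    # Candidates for the secondary store; "recent" stays on the primary cluster.
    print([a.asset_id for a in cold])  # ['never-read', 'stale']
```

A selection step like this only identifies candidates. The assets themselves would still be kept, just on a different storage tier.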
These projects are just a sampling of the work that is currently being done to improve stability, and I’ll be reporting on their progress, as well as other short-term projects, in the coming months.
We have a lot of work to do, but be assured that we have the right resources and internal focus to achieve our stability goals. I’ve encountered many equally complex challenges, especially in my time at AOL, and in my experience these problems are all solvable with the right level of attention and technical talent. We certainly have both; now we will start delivering.