FJ Linden here, with my monthly grid update.
It’s been a good stretch of grid stability over the last month, with one very poor day in the mix. Some central database issues and then a Level 3 outage in the middle of the month cascaded into a series of problems, although we were able to isolate and fix them in just over 3 hours. However, that event only served to reinforce just how important it is to bring LLnet online, and quickly. On that topic, I’m pleased to start this month’s updates with the status of LLnet.
LLnet 30 Days Ahead of Schedule
LLnet, our private fiber optic ring, is a good 30 days ahead of schedule. This network, which will privately interconnect our data centers, will allow us to move away from VPN reliance. “LLnet” fiber facilities have been delivered into our 3 data centers, and are currently in the configuration and testing phase with the routing infrastructure. This work should be concluded by the end of this week, and we will then start full testing in a production environment. We want to move as quickly as possible, but we do not want to destabilize the grid for the sake of speed, so we will take most of December to finish production testing and begin cutover of live traffic in late December or early January. We have thousands of machines across the data centers, so the cutover process is expected to take about 60 days, but we have been very good (so far) at beating our projected dates.
On the infrastructure project front, we’ve completed most of the HTTP Dataserver project, which migrates all C++ MySQL traffic from the MySQL wire protocol to HTTP(S). This project will allow us to move further away from VPN dependency and take the MySQL wire protocol off the WAN, which in turn makes queries easier to track and monitor. We expect to be through testing in the next week.
Agent Inventory Services
Agent Inventory Services is scheduled to be deployed with the server code update in January. This is one of the ongoing projects to address inventory issues for Residents.
Both projects, the HTTP Dataserver and Agent Inventory Services, are designed to provide more reliability, especially for inventory delivery and database queries, by better handling messaging across the databases and simulators, as well as back to the viewer. I intend to use my December/January post to talk about our strategy for inventory services, our storage strategy, and our thoughts on our data architecture.
My primary goal has always been to improve grid stability and reliability, and we are making great strides on that front. We’re not out of the woods yet, but I want to re-emphasize how important I believe it is to address “foundational” issues that have the potential to cause huge impairment (like network problems), and then decide how we scale other components of the infrastructure.
Finally, I have made some internal organizational changes over the past month that I hope will begin to drive more specialization in some key areas. These included adding a new network director and more focused team leads managing databases, asset management, and data services. My belief is that, in addition to sound technical strategy, we need the right organizational alignment and specialized technical skills to achieve long-term stability and scalability on the grid.