Sunday, June 1, 2008

The Planet - outage

The Planet is currently experiencing an outage which is affecting a number of customers' servers. This issue may also be affecting customers' ability to get through to our call center.

....

Today at approximately 5:45 p.m., a transformer in our H1 data center in Houston caught fire, thus requiring us to take down all generators as instructed by the fire department. All servers are down.

We are working with the fire department, with our facilities staff on site, to assess the situation.

....

We have determined that no servers in the data center have been damaged. Nonetheless, they are down because power is out. Teams across the board are working to take appropriate action.

....

We have just been allowed into the building to physically inspect the damage. Early indications are that the short was in a high-volume wire conduit. We were not allowed to activate our backup generator plan based on instructions from the fire department.

This is a significant outage, impacting approximately 9,000 servers and 7,500 customers. All members of our support team are in, and all vendors who supply us with data center equipment are on site. Our initial assessment, although early, points to being able to have some service restored by mid-afternoon on Sunday. Rest assured we are working around the clock.

We are in the process of communicating with all affected customers. We are planning to post updates every hour via our forum and in our customer portal. Our interactive voice response system is updating customers as well.

There is no impact in any of our other five data centers.

I am sorry that this accident has occurred and I apologize for the impact.

....

As you know, we have vendors onsite at the H1 data center. With their help, we’ve created a list of equipment that will be required, and we’re already dealing with those manufacturers to find the gear. Since it’s Saturday night, we do have a few challenges.

We are prioritizing issues as follows:
  1. Getting the network up at H1 is first and foremost. We’re pulling components from our five other data centers – including Dallas – which will be an all-night effort.
  2. Getting power back to the data center is key, though it is too early to establish success there.
  3. Because ServerCommand is in H1, our legacy EV1 customers have no visibility into this incident. We are in the process of moving the ServerCommand servers to other Houston data centers so that we’re able to loop them into communications.
  4. We absolutely intend to live up to our SLAs, and we will proactively credit accounts once we understand full outage times. Right now, getting customers back online is the most critical task.

....

Our UNIX and development teams continue to work to restore service to both ServerCommand and EV1 DNS. Based on current information, 4 of the 8 DNS servers are in service and we expect the remaining DNS servers to come online within the next 180 minutes. The same approximate timeline holds true for ServerCommand. The server farm has been relocated to another data center and development is currently working on bringing the services back online.

In terms of the facility, we do not have a firm ETR at the moment. Facilities continues to work with our on-site vendors to acquire replacement equipment and get it installed to bring service back online.

We do not have an Estimated Time to Repair at present. Our staff and management continue to work through the night and we will continue to provide hourly updates.

....

The networking teams are ensuring connectivity to bring ServerCommand back online. Please expect another update on ServerCommand shortly. We are seeing fewer DNS issues as the new addresses continue to propagate.

Our primary focus is to hit our 5:00pm CDT initial power test, and all necessary staff are onsite and working diligently to hit this deadline. Additional staff and spare server hardware are being delivered on site in preparation for bringing customer servers online pending a successful power test.
