Friday, September 11, 2009
NAP of Americas
xbanchon@telconet.net (Xavier Banchon) wrote:
> Does anyone have issues with Internet connection through NAP of Americas?
Yes - there's obviously been some failure on the DC power, which
took the peering grid down (and a few ISPs, too). Sessions came
back up around an hour ago.
Btw - anyone there and not peering with 31529 (.de ccTLD service),
please drop me an email. It's pretty hard to get a list of
participants...
Cheers,
Elmar.
--
We have been having problems since 13:27 CET
BR
Philipp
Friday, August 21, 2009
Equinix Paris Facility Hit by Cooling Outage
>>The problems began late Wednesday afternoon as temperatures in Paris rose into the upper 90s. “Multiple chillers that support the second floor failed, and the standby chiller system did not start in time to absorb the load,” Michel Brignano, General Manager of Equinix France, wrote in an incident report which was posted to the FrNOG list.
“This impacted temperatures on the second floor and had indirect effect on the ground floor as well,” Brignano continued. “The specific causes of the failures are still under investigation but there appear to have been component/subsystem failures in at least two of the three primary chiller systems supporting the second floor. At this time, we can not say definitively whether the failures were related or not.”
Thursday, August 20, 2009
New incident at Equinix in Paris
It's summer, and the data centers are as heavily loaded as ever.
After a first outage in early July caused by human error, this time it was an incident involving the cooling infrastructure that occurred yesterday at the Equinix facility in Saint-Denis.
The outage reportedly began around 16:00 and caused slowdowns for a few Equinix customers as equipment (servers, routers) shut down.
It took until early this morning, Thursday, August 20, for the situation to return to normal.
Sunday, September 14, 2008
Ike Hammers Texas Internet
>>The counties around Galveston and Houston, TX (most notably Harris County) have suffered a slowly climbing number of network outages over the last day. We expect to see this number continue to climb as the secondary effects (e.g. power loss, UPS battery failure, generator fuel unavailability) of the storm hit the region.
We examined the withdrawals reported in BGP for prefixes (networks) in Arkansas, Florida, Louisiana, Mississippi, Oklahoma and Texas and noticed that, aside from Texas, the coastal Gulf region has fared pretty well against Ike so far. The Texan cities and counties immediately in the path of Ike are, however, definitely and noticeably suffering network connectivity failures.
Affected cities, counties and organizations
To get a sense for the organizations affected in the initial aftermath of Ike, we can look at the top ten organizations, top five cities and top five counties, sorted by number of networks suffering an outage as of approximately 13:00 CDT. Notably, these do not account for the relative sizes of the organizations, cities or counties.
Organizations affected (prefix count)
- Schlumberger Limited (19)
- AT&T WorldNet Services (15)
- NASA/Johnson Space Center (12)
- Suddenlink Communications (10)
- Level 3 Communications, Inc. (10)
- Cebridge Connections (10)
- Internet America, Inc. (9)
- Comcast - Houston (9)
- NTT America, Inc. (8)
- Moore Concepts (6)
Cities affected (prefix count)
- Houston (110)
- Kingwood (14)
- Dallas (14)
- The Woodlands (10)
- Friendswood (5)
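For readers who want to build a similar tally from their own vantage point, here is a rough Python sketch; it is not the toolchain behind the numbers quoted above. It assumes two hypothetical input files you would have to prepare yourself, withdrawn.txt with one withdrawn prefix per line and prefix_meta.csv mapping each prefix to an organization, city and county, and it prints counts in the same shape as the lists above.

import csv
from collections import Counter

# Hypothetical inputs, named for illustration only:
#   withdrawn.txt    one withdrawn prefix per line, e.g. "192.0.2.0/24"
#   prefix_meta.csv  columns: prefix,organization,city,county
with open("withdrawn.txt") as f:
    withdrawn = {line.strip() for line in f if line.strip()}

org_hits, city_hits, county_hits = Counter(), Counter(), Counter()

with open("prefix_meta.csv", newline="") as f:
    for row in csv.DictReader(f):
        if row["prefix"] in withdrawn:
            org_hits[row["organization"]] += 1
            city_hits[row["city"]] += 1
            county_hits[row["county"]] += 1

print("Organizations affected (prefix count)")
for org, n in org_hits.most_common(10):
    print(f"- {org} ({n})")

print("Cities affected (prefix count)")
for city, n in city_hits.most_common(5):
    print(f"- {city} ({n})")

print("Counties affected (prefix count)")
for county, n in county_hits.most_common(5):
    print(f"- {county} ({n})")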
Friday, July 11, 2008
Is Facebook Down?
>>The best thing that’s happened to Facebook: Apple’s MobileMe outage, the iPhone launch and iPhone activation problems across the board. Why? Because no one seems to be reporting on them being out for most of the morning.
"iPocalypse."
>>It's already being called the "iPocalypse." Many early purchasers of the new iPhone 3G are unable to activate their phones through iTunes because the activation servers are overwhelmed. Similar problems are being reported with U.S. activations of AT&T service and U.K. activations through O2. The problems with the activation servers have become the story of the day, transforming iPhone mania into one of the most public crunch-time operational failures imaginable. Here are some of the headlines on major tech news sites:
Sunday, June 15, 2008
Outage in San Jose, California - Internap
We had a major outage on Friday evening in our San Jose, CA data center. Our data center provider tells us it is the biggest in 11 years, and it was our own biggest outage to date. It turns out a configuration issue at Internap caused the problem (we use Internap bandwidth). No hardware failed.
I am not sure why Internap did not realize that this change had caused a major network outage until our network provider escalated, and even that took a long time. Internap is a premium provider, so this is a question we are still waiting to have answered by our network provider and Internap. It is unacceptable.
To our customers: I am very disappointed and concerned about what happened. Entic.net is a startup marketing to the high-quality hosting market, so this kind of thing throws the work we do out the door, especially since we put a lot of money into maximizing reliability on the servers themselves. Sorry!
Tuesday, June 10, 2008
Amazon outage
>>Dr. Supranamaya Ranjan, a senior member of the technical staff at Narus, says Amazon is typically adept at "load balancing" -- responding to customer visits by spreading its computing resources efficiently between the computer servers that are in the best position to respond to that customer at that time. But during the outages, he said, that was not the case; his visits to Amazon.com were often handled by faraway or already overloaded servers.
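As a very rough way to observe this kind of behavior from the client side (and emphatically not the measurement Narus performed), one can resolve a site's front-end addresses and time a TCP handshake to each one; consistently long connect times to the addresses you are handed are at least consistent with being routed to distant or overloaded front ends. The hostname and port below are illustrative assumptions.

import socket
import time

HOST = "www.amazon.com"   # illustrative target; any large site works
PORT = 443                # HTTPS front ends

# Resolve the name; large sites usually hand back several front-end IPs,
# and which ones you get can vary from query to query.
addresses = sorted({info[4][0] for info in socket.getaddrinfo(HOST, PORT, socket.AF_INET)})

for addr in addresses:
    start = time.monotonic()
    try:
        # Time a bare TCP connect as a crude proxy for distance and load.
        with socket.create_connection((addr, PORT), timeout=5):
            rtt_ms = (time.monotonic() - start) * 1000
            print(f"{addr:15s}  connect {rtt_ms:6.1f} ms")
    except OSError as exc:
        print(f"{addr:15s}  connect failed: {exc}")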
"That does lead me to conjecture that they are in the process of re-architecting the whole way their content distribution system works and the causes for this could have been this new Unbox service," Dr. Ranjan said.
Monday, June 2, 2008
The Planet - outage, part 2
'Extensive' Damage at The Planet's Data Center
>>Damage from Saturday night's explosion at The Planet's Houston data center was more extensive than initially believed, the company said late Sunday. CEO Doug Erwin reported that the explosion and fire destroyed the cabling under the first floor of the data center, known as H1. The Planet expects to have servers on the second floor of the facility back online early Monday, but Erwin said the downtime will be longer for 3,000 servers housed on the first floor.
LINX in London had a few problems as well:
The London Internet Exchange (LINX), one of the world's busiest peering points, was off-line for about an hour last Tuesday. The Register has the details on the incident, which was the second outage at LINX in the past month.
Sunday, June 1, 2008
The Planet - outage
....
Today at approximately 5:45 p.m., a transformer in our H1 data center in Houston caught fire, thus requiring us to take down all generators as instructed by the fire department. All servers are down.
We are working with the fire department and our facilities staff on site to assess the situation.
....
We have determined that no servers in the data center have been damaged. Nonetheless, they are down because power is out. Teams across the board are working to take appropriate action.
....
We have just been allowed into the building to physically inspect the damage. Early indications are that the short was in a high-volume wire conduit. We were not allowed to activate our backup generator plan based on instructions from the fire department.
This is a significant outage, impacting approximately 9,000 servers and 7,500 customers. All members of our support team are in, and all vendors who supply us with data center equipment are on site. Our initial assessment, although early, points to being able to have some service restored by mid-afternoon on Sunday. Rest assured we are working around the clock.
We are in the process of communicating with all affected customers. We are planning to post updates every hour via our forum and in our customer portal. Our interactive voice response system is updating customers as well.
There is no impact in any of our other five data centers.
I am sorry that this accident has occurred and I apologize for the impact.
....
As you know, we have vendors onsite at the H1 data center. With their help, we’ve created a list of equipment that will be required, and we’re already dealing with those manufacturers to find the gear. Since it’s Saturday night, we do have a few challenges.
We are prioritizing issues as follows:
- Getting the network up at H1 is first and foremost. We’re pulling components from our five other data centers – including Dallas – which will be an all-night effort.
- Getting power back to the data center is key, though it is too early to establish success there.
- Because ServerCommand is in H1, our legacy EV1 customers are blinded about this incident. We are in the process of moving the ServerCommand servers to other Houston data centers so that we’re able to loop them into communications.
- We absolutely intend to live up to our SLA agreements, and we will proactively credit accounts once we understand full outage times. Right now, getting customers back online is the most critical.
Our UNIX and development teams continue to work to restore service to both ServerCommand and EV1 DNS. Based on current information, 4 of the 8 DNS servers are in service, and we expect the remaining DNS servers to come online within the next 180 minutes. The same approximate timeline holds true for ServerCommand. The server farm has been relocated to another data center, and development is currently working on bringing the services back online.
In terms of the facility, we do not have a firm estimated time to repair at the moment. Facilities continues to work with our on-site vendors to acquire replacement equipment and get it installed to bring service back online.
Our staff and management continue to work through the night, and we will continue to provide hourly updates.
....
The networking teams are ensuring connectivity to bring ServerCommand back online. Please expect another update on ServerCommand shortly. We are seeing fewer DNS issues as the new addresses continue to propagate.
Our primary focus is to hit our 5:00 p.m. CDT initial power test, and all necessary staff are onsite working diligently to hit this deadline. Additional staff and spare server hardware are being delivered on site in preparation for bringing customer servers online, pending a successful power test.