Lightning strike takes Amazon's cloud offline
By Stewart Mitchell
Posted on 8 Aug 2011 at 09:07
Some Amazon cloud customers could have to wait two days before services come back online after a weekend lightning strike in Ireland knocked servers out.
According to Amazon, the strike hit a transformer at a utility provider, sparking an explosion and fire that interrupted power. Back-up generators failed to pick up the load, leaving its Elastic Compute Cloud (EC2) down.
The latest update from the company explained that some customers will have to wait another 48 hours for services to be returned following the strike, which reportedly also caused problems for Microsoft's Business Productivity Online Suite - the predecessor to its new Office 365.
“Normally, upon dropping the utility power provided by the transformer, electrical load would be seamlessly picked up by backup generators,” Amazon said in its service status update.
“The transient electric deviation caused by the explosion was large enough that it propagated to a portion of the phase control system that synchronises the backup generator plant, disabling some of them,” Amazon said.
In a stark warning to companies relying on cloud services for critical business, the company said it had restored power to the main centre Availability Zone, but was still recovering service on EC2 servers as it dealt with capacity issues that meant many companies were unable to access databases.
Amazon said it had to manually update individual servers, which requires a back-up of all data, with a capacity shortage making matters worse.
“Due to the scale of the power disruption, a large number of EBS [elastic block storage] servers lost power and require manual operations before volumes can be restored,” the company said.
“Restoring these volumes requires that we make an extra copy of all data, which has consumed most spare capacity and slowed our recovery process.”
Amazon said it was adding capacity and switching capacity from other regions, but admitted some customers would be without services for up to two days, and may still have file issues to resolve before they are back up and running normally.
“While many volumes will be restored over the next several hours, we anticipate that it will take 24-48 hours until the process is completed,” the company said.
“In some cases, EC2 instances or EBS servers lost power before writes to their volumes were completely consistent. Because of this, in some cases we will provide customers with a recovery snapshot instead of restoring their volume so they can validate the health of their volumes before returning them to service.”
Cloud disabled by lightning
Oh the irony.
By Lacrobat on 8 Aug 2011
Does this only affect companies that implemented a half-arsed approach to cloud? Those that offsited their data into a single EC2 datacentre, without going the whole hog and using the redundancy of hosting in multiple EC2 data centres?
By big_D on 8 Aug 2011
That would be a Cumulo Nimbus cloud then
Obvious, but it is only Monday morning
By botwot on 8 Aug 2011
Maybe I'm misunderstanding what "cloud" is supposed to mean but - isn't none of this supposed to happen? Isn't capacity supposed to just be there? Isn't the cloud supposed to be impervious to a single point of failure?
Surely, this is no more "cloud" than any off-site data centre is? Or is this the reality of "cloud"? One bucket?
By AdrianB on 8 Aug 2011
Not sure it's a bucket
More like, say, the M25...
Seems to me that this is all about pricing. Those who have rushed to the bottom of the cost curve have minimised spend, but then maximised risk. After all, looking at Amazon EC2's status pages, this is what, one line out of about 80 lines?
It is rather freaky that one substation can take out both EC2 and BPOS in Ireland though - again to be fair, that's more a comment on the Irish government's business tax break schemes resulting in clusters of hi-tech companies, than it is a cloud architecture commentary.
By Steve_Cassidy on 8 Aug 2011
By Lomskij on 8 Aug 2011
What surprises me is
that the backup generators failed!! Surely they should have been tested during a simulation when they set up the datacentre, to see if they would work in a real situation. I guess Amazon have been caught with their pants well and truly down!!
By DeanC on 8 Aug 2011
Noticed the ad below.
"Cloud Power can change the way you do Business" :)
Certainly those at Amazon Ireland changed the way they did Business.
By nicomo on 8 Aug 2011
Amazon seems to be having a lot of cloud problems.
If I did not know better I would think they were in the monsoon season.
It appears this latest problem was not a lightning strike, but a (possible) defective transformer.
Such data outages hardly add confidence to the cloud system, when operational and business integrity is suffering loss.
The acceptable rate of such loss is ZERO when it may mean an entire business failure.
It never rains when it pours.
By lenmontieth on 11 Aug 2011