BBC knocked offline by network failure
By Nicole Kobie
Posted on 30 Mar 2011 at 08:21
The BBC is looking into the causes of a technical failure that knocked its entire web estate down for an hour last night.
BBC News editor Steve Herrmann said the sites were down for a full hour, but came back live at about midnight. "We haven't yet had a full technical debrief, but it's clear it was a major network problem," he said in a post on the BBC blog.
BBC staff suggested via Twitter it was down to a "faulty switch" or configuration problems. Others suggested that it looked like a problem with the BBC's DNS servers.
A BBC news report said Siemens, which provides the broadcaster's IT support, was investigating the problem, and that its engineers got the sites back online by powering down equipment and turning it back on again.
During the outage, many took to Twitter to suggest more dramatic causes, including a distributed-denial-of-service (DDoS) attack by Anonymous.
"It's not often we get a message from the BBC's technical support teams saying, 'Total outage of all BBC websites'," Herrmann said.
"We'd like to apologise to everyone who couldn't get onto the BBC News website during that time," he added.
Update: The BBC's controller for digital distribution, Richard Cooper, has issued an explanation for the outage, saying multiple failures knocked the sites offline.
"Our systems are designed to be sufficiently resilient (multiple systems, and multiple data centres) to make an outage like this extremely unlikely," he said in a blog post. However, I'm afraid that last night we suffered multiple failures, with the result that the whole site went down."
"For the more technically minded, this was a failure in the systems that perform two functions," he added. "The first is the aggregation of network traffic from the BBC's hosting centres to the internet. The second is the announcement of 'routes' onto the internet that allows BBC Online to be 'found.' With both of these having failed, we really were down."
He said the BBC would take a "hard look" at its systems to ensure such a fault didn't happen again.
The fallback position of IT managers everywhere.
"have you tried switching it off and back on again madam"
Funny how it works 9/10 times though.:-)
By Jaberwocky on 30 Mar 2011
Wondered why my streaming radio app could not get BBC last night ...
By mike0whit on 30 Mar 2011
I've no idea what he means by "announcement of 'routes'". I'd prefer an actual technical explanation.
By peterm2k on 30 Mar 2011
I agree with peterm2k, his description for the technically minded is very vague
By DaChimp on 30 Mar 2011
It is vague, but I imagine he means either a DNS problem or all of their routers started sending faulty updates to each other.
By PCmaster1000 on 31 Mar 2011
1 hour ?
The website may have been "down" for an hour, but it was completely unusable for a great deal longer than that, more like five or six hours.
By howardabates1 on 1 Apr 2011