BBC knocked offline by network failure
By Nicole Kobie
Posted on 30 Mar 2011 at 08:21
The BBC is looking into the causes of a technical failure that knocked its entire web estate down for an hour last night.
BBC News editor Steve Herrmann said the sites were down for a full hour, but came back live at about midnight. "We haven't yet had a full technical debrief, but it's clear it was a major network problem," he said in a post on the BBC blog.
BBC staff suggested via Twitter it was down to a "faulty switch" or configuration problems. Others suggested that it looked like a problem with the BBC's DNS servers.
A BBC news report said Siemens, which provides the broadcaster's IT support, was looking into the problem, but its engineers got the sites back online by powering down equipment and turning it back on again.
During the outage, many took to Twitter to suggest more dramatic causes, including a distributed denial-of-service (DDoS) attack by Anonymous.
"It's not often we get a message from the BBC's technical support teams saying, 'Total outage of all BBC websites'," Herrmann said.
"We'd like to apologise to everyone who couldn't get onto the BBC News website during that time," he added.
Update: The BBC's controller for digital distribution, Richard Cooper, has issued an explanation for the outage, saying multiple failures knocked the sites offline.
"Our systems are designed to be sufficiently resilient (multiple systems, and multiple data centres) to make an outage like this extremely unlikely," he said in a blog post. "However, I'm afraid that last night we suffered multiple failures, with the result that the whole site went down."
"For the more technically minded, this was a failure in the systems that perform two functions," he added. "The first is the aggregation of network traffic from the BBC's hosting centres to the internet. The second is the announcement of 'routes' onto the internet that allows BBC Online to be 'found.' With both of these having failed, we really were down."
He said the BBC would take a "hard look" at its systems to ensure such a fault didn't happen again.
The fall-back position of IT managers everywhere.
"Have you tried switching it off and back on again, madam?"
Funny how it works 9/10 times though. :-)
By Jaberwocky on 30 Mar 2011
Wondered why my streaming radio app could not get BBC last night ...
By mike0whit on 30 Mar 2011
I've no idea what he means by "announcement of 'routes'". I'd prefer an actual technical explanation.
By peterm2k on 30 Mar 2011
I agree with peterm2k; his description for the technically minded is very vague.
By DaChimp on 30 Mar 2011
It is vague, but I imagine he means either a DNS problem or all of their routers started sending faulty updates to each other.
By PCmaster1000 on 31 Mar 2011
1 hour?
The website may have been "down" for an hour, but it was completely unusable for a great deal longer than that, more like five or six hours.
By howardabates1 on 1 Apr 2011