PC Pro - Computing in the Real World (www.pcpro.co.uk)

Real World Computing

USB drives (me to distraction)

Posted on 21 Sep 2005 at 17:33

Steve Cassidy demonstrates the variety of threats posed by the increasingly popular USB flash drive

It's these sorts of pressures that have brought us the hot-swap concept. In theory, on a modern HP, Dell or IBM server, you can hot-swap almost all of the system components. Hard disks in caddies we're all familiar with; hot-swap power supplies are another easily comprehended part of the picture; hot-swap PCI card cages are a rather more exotic proposition. Hot-swap fans sound faintly absurd: surely the fans come with the power supply, so aren't they already swappable, and wouldn't you be crazy to go sticking your fingers near those whirling blades?

In reality, the effect of hot-swap design on servers is a little different. This week I encountered a long-serving Compaq server of the hot-swap persuasion, which seemed to suffer a bit from monitoring senility. By this I mean that while all of its parts were working absolutely fine, the monitoring system that spreads sensors throughout all the components had become unruly and would report failures (and thus the need to do some hot-swapping) where none actually existed. This irritating habit brought us the hot-swap power supply that reported itself dead, then sprang back into life as soon as it was moved from one PSU bay to the next. We'd already seen the 'fan failure' problem, where a fan jiggled just loose enough in its cage to make the hot-swap alerter think it had gone down. While the machine is running, this isn't that big a problem because most of the time the temperature sensors that can trigger a system shutdown are independent of the fan rotation sensors, so fan sensor failure doesn't make the system turn off. But this server (in common with almost all the servers running a mix of applications I've seen) reboots every night, and the startup BIOS won't let the machine complete a reboot if it believes a fan is offline.
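To make the fan problem concrete, here's a toy model of the two behaviours described above: while the server is running, only the independent temperature sensor can force a shutdown and a fan alert merely warns, whereas the startup check refuses to finish booting if a fan reads as offline. The `SensorReadings` class, the function names and the 70°C threshold are all hypothetical, chosen purely for illustration; real servers use vendor-specific firmware and limits.

```python
from dataclasses import dataclass


@dataclass
class SensorReadings:
    """Snapshot of a server's health sensors (hypothetical model)."""
    fan_ok: bool          # fan rotation sensor: can misreport if the fan is loose
    temperature_c: float  # independent temperature sensor


# Hypothetical shutdown threshold; real servers use vendor-specific limits.
SHUTDOWN_TEMP_C = 70.0


def runtime_action(r: SensorReadings) -> str:
    """While running: only temperature can force a shutdown; a fan fault just raises an alert."""
    if r.temperature_c >= SHUTDOWN_TEMP_C:
        return "shutdown"
    if not r.fan_ok:
        return "alert"
    return "ok"


def boot_check(r: SensorReadings) -> str:
    """At startup: refuse to complete the boot if any fan is reported offline."""
    return "boot" if r.fan_ok else "halt"


# A loose fan that spins fine but reads as failed, with normal temperatures:
loose_fan = SensorReadings(fan_ok=False, temperature_c=35.0)
print(runtime_action(loose_fan))  # alert
print(boot_check(loose_fan))      # halt
```

The same false reading is a mere nuisance at 3pm and a machine that won't come back up after its nightly reboot.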

This behaviour gets even wilder when it comes to drives. There doesn't seem to be any RAID controller and drive combination that's immune to having one of its member drives flagged as 'degraded' for completely invisible or unguessable reasons. If I see a winking red light on a hot-swap drive array, my first reaction is to whip the drive out, wait for it to stop spinning, then slot it straight back in. Just don't twist it to the horizontal while it's still spinning if it's mounted vertically, because that's a good way to force the heads onto the surface of the disk. If the controller's any good, it will notice that data on the rest of the RAID has changed while the rogue drive was out and initiate an automatic rebuild, and most often the drive in question simply restarts perfectly and carries on indefinitely. If it won't restart after such treatment, then it's genuinely dead and should be retired.

So the rule when faced with a reported failure in some hot-swap kit is to suspect the hot-swap reporting mechanism before you suspect the kit itself, and to do a quick juggle of components to see if the fault recurs. Only then should you believe what the monitoring system is saying.
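The rule above can be sketched as a small triage routine. This is a minimal illustration, assuming a hypothetical `recheck_after_reseat` callback that stands in for the physical step (pulling a drive, letting it spin down and slotting it back in, or moving a PSU to the next bay) and reports whether the monitor still flags the part afterwards. Nothing here talks to real server management tools; the names and the single-retry default are illustrative only.

```python
from typing import Callable


def triage(component: str,
           recheck_after_reseat: Callable[[str], bool],
           attempts: int = 1) -> str:
    """Decide whether a reported hot-swap failure is real.

    recheck_after_reseat is a hypothetical callback: it reseats or moves the
    component and returns True if the monitoring still reports it as failed.
    """
    if attempts < 1:
        raise ValueError("need at least one reseat before believing the monitor")
    for _ in range(attempts):
        if not recheck_after_reseat(component):
            # Fault cleared after a juggle: blame the monitoring, not the kit.
            return "monitoring glitch: keep component in service"
    # Fault recurred every time: now believe the monitoring system.
    return "genuine failure: replace component"


def flaky_psu(name: str) -> bool:
    return False  # reports healthy once moved to the next PSU bay


def dead_drive(name: str) -> bool:
    return True   # still reported failed after reseating


print(triage("PSU 1", flaky_psu))     # monitoring glitch: keep component in service
print(triage("Drive 3", dead_drive))  # genuine failure: replace component
```

Raising `attempts` trades a little more handling time for more confidence before retiring a part.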
