The perfect open-source task scheduler
Posted on 26 Feb 2010 at 14:47
An epic quest to find the perfect open-source task scheduler for his company tests Simon Brock's resilience
We have ever more regular tasks that need to be run on our servers in a reliable and resilient way, but resilience is harder to achieve than reliability.
I can schedule scripts on our Unix systems using cron, and on our Windows systems via Task Scheduler, but neither of these is resilient: if the machine running those tasks crashes, my tasks won’t get executed.
The failed machine may come back online reasonably fast – and were I running an enhanced cron such as “anacron” it might catch up on the jobs it missed – but a task scheduled to run every hour probably wouldn’t get done on time, and performing a catch-up might make matters worse. What I need is some kind of failover so that if one machine fails, another will run its tasks on time.
We’ve implemented such a system for a client on a pair of Linux machines, by using heartbeat and the Distributed Replicated Block Device (DRBD). This client runs two sets of services on these two machines, one set running on each machine under normal conditions.
Should one machine get taken out of service, its services will fail over onto the other. We use heartbeat to enable each machine to monitor the other, so that in case of failure it can take over its services. Part of this service switch involves a simple cluster file system implemented using the DRBD.
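The monitoring idea can be illustrated with a minimal sketch. This is not how the heartbeat package itself is implemented; the class name `PeerMonitor`, the timeout value and the injectable clock are all assumptions made for illustration. Each machine records when it last heard from its peer and treats the peer as failed once no message has arrived within the timeout.

```python
import time


class PeerMonitor:
    """Hypothetical helper: tracks heartbeats received from the other machine."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        # timeout_seconds is an assumed tuning value, not taken from the article.
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self.last_seen = clock()

    def record_heartbeat(self):
        """Call whenever a heartbeat message arrives from the peer."""
        self.last_seen = self.clock()

    def peer_is_down(self):
        """True once the peer has been silent for longer than the timeout."""
        return self.clock() - self.last_seen > self.timeout_seconds
```

A real deployment would call `record_heartbeat()` from whatever receives the peer's messages and start the takeover of services when `peer_is_down()` first returns true.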
Under normal operation each machine reads and writes its own copy of each service’s files, and DRBD copies those changes to the other machine, so that if a failover occurs the failed-over services will have access to up-to-date copies of their files (normally each machine sees only the files for its own services and can’t see files on the other machine).
This combination of heartbeat and DRBD works well for services such as web serving (the primary application), but there was also a need to have various regular tasks fail over from one machine to another. We implemented this by putting the same crontab onto each machine, so that each knows what tasks it might have to run in the event of failover and has their scripts in its file system.
Then we prefix each task with a script that checks whether the files for that task’s service are present: since each machine normally sees only the files for its own services, the other machine’s task files will be visible only if a failover has taken place. This rather complex scheme enables us to implement a form of resilient cron, but it isn’t without its problems.
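A guard of this kind might look like the sketch below. It is an illustration only, not the script we actually deployed: the function names and the idea of passing a marker path as the first argument are assumptions. The guard runs the task only when the given path exists, which on this setup would be true only on the machine currently hosting that service.

```python
import os
import subprocess


def run_if_service_present(marker_path, command):
    """Run command only if marker_path exists on this machine.

    marker_path is a hypothetical file or directory belonging to the service.
    Returns the command's exit code, or None if the task was skipped.
    """
    if not os.path.exists(marker_path):
        # The service's files aren't visible here, so it isn't ours to run.
        return None
    return subprocess.run(command).returncode


def main(argv):
    """Entry point for use as a crontab prefix: guard MARKER COMMAND [ARGS...]."""
    if len(argv) < 3:
        print("usage: guard MARKER_PATH COMMAND [ARGS...]")
        return 2
    result = run_if_service_present(argv[1], argv[2:])
    # A skipped task is not an error, so report success to cron.
    return 0 if result is None else result
```

To use this from a crontab, the script would end with a call such as `sys.exit(main(sys.argv))` after importing `sys`. The sketch does nothing about a task that is already running when the service moves.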
First, the system is complicated. Second, it isn’t human-proof: when a new task is required you must remember to add it to both machines. Third, there are all sorts of boundary conditions relating to what happens to tasks that are running when a failure occurs, or worse still when a failure occurs and the services “fail back” in the middle of a scheduled task.
Finally, it isn’t scalable: it can fail services over from one node to another, but it wouldn’t work for a large server farm of identical machines with potentially multiple failures.
This last point is where you see the real problem with a cron-based solution. Cron and Task Scheduler were only designed to run regular housekeeping tasks on a single machine, but when you’re providing a service from a collection of machines then every single machine is a single point of failure.
Simon Brock
Simon runs UK-based Wide Area Communications, the company behind websites such as The Spectator. He's a contributing editor to PC Pro and a fervent believer in open-source technologies.
