Cron is a great tool for running scheduled tasks on Unix servers. It does, however, have some shortcomings when running applications in the cloud.
Cron is essentially a single server solution. When running applications on multiple servers, scheduled tasks can be divided into two categories:
- Those which should run on all servers of a given role, e.g. log rotation on application servers.
- Those which should be executed on a single server only, e.g. MySQL data aggregation.
Handling the former is straightforward – you simply configure identical cron jobs on the servers. It is the latter which creates a challenge.
A simplistic way of handling category-two tasks would be to configure these cron jobs on one server only. The problems with this approach are:
- It requires manual work to configure a unique server.
- If that server goes down, another server has to be set up manually or a cron specific failover mechanism needs to be put in place.
- The solution lacks symmetry and elegance.
What is needed is a better, fully automated, and scalable solution.
Requirements
- Cron jobs should be automatically configured on all servers upon launch.
- There needs to be an easy mechanism to prevent jobs which should be executed only once from running on multiple servers.
- The cron job load should be distributed across multiple servers.
Solution
Our solution (we are a Ruby on Rails shop) includes three components:
- whenever gem to configure cron jobs
- a custom database semaphore mechanism to guarantee that only one instance of a job will run
- delayed_job gem for load distribution
Whenever
The whenever gem provides a mechanism for defining and deploying cron jobs.
Jobs are defined in a config/schedule.rb file such as the one shown below:
# Use this file to easily define all of your cron jobs.
#
# It's helpful, but not entirely necessary to understand cron before proceeding.
# http://en.wikipedia.org/wiki/Cron

# Example:
#
# every 2.hours do
#   command "/usr/bin/some_great_command"
#   runner "MyModel.some_method"
#   rake "some:great:rake:task"
# end
#
# every 4.days do
#   runner "AnotherModel.prune_old_records"
# end

# Learn more: http://github.com/javan/whenever

every 1.day, :at => '1:00 am' do
  rake "data:verify"
end
To deploy jobs, a crontab file is generated from schedule.rb by running the following command:
whenever --set environment=#{ENV['RAILS_ENV']} -w
This is incorporated into the server launch process.
The benefits of using Whenever include:
- Cron jobs are managed with the rest of the application, using the same source code management tools and processes.
- Servers are automatically configured with the cron jobs upon launch.
Database Semaphore
The custom database semaphore class shown below provides a locking mechanism. (The code is also available at https://gist.github.com/1295372.)
class DatabaseSemaphore < ActiveRecord::Base
  validates_presence_of :name, :message => "can't be blank"
  def self.open?(name, lock_duration = 600)
    # only one requestor can get an open semaphore at a time
    # semaphore can be locked in a closed position for lock_duration in seconds
    semaphore_open = false
    now = Time.now
    # insert record if it does not exist yet
    DatabaseSemaphore.create(:name => name, :locked_at => now - lock_duration) if !DatabaseSemaphore.find_by_name(name)
    DatabaseSemaphore.transaction do
      semaphore = DatabaseSemaphore.find_by_name(name, :lock => "LOCK IN SHARE MODE")
      if semaphore and semaphore.locked_at <= now - lock_duration
        semaphore.locked_at = now
        semaphore_open = true if semaphore.save
      end
    end
    return semaphore_open
  rescue ActiveRecord::StatementInvalid => e
    # deadlock
    return false
  end
end
class CreateDatabaseSemaphores < ActiveRecord::Migration
  def self.up
    create_table :database_semaphores do |t|
      t.string :name
      t.datetime :locked_at
      t.timestamps
    end
    add_index :database_semaphores, :name, :unique => true
  end
  def self.down
    drop_table :database_semaphores
  end
end
Jobs defined in schedule.rb are implemented as rake tasks in files such as lib/tasks/data.rake, shown below:
namespace :data do
  desc "Run data integrity check"
  task :verify => [:environment] do
    Delayed::Job.enqueue CronJob::VerifyData.new if DatabaseSemaphore.open?("VerifyData")
  end
end
All servers with a cron job configured will attempt to run the job. Only the first to call the Database Semaphore will succeed and all other servers will promptly exit the job.
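The "first caller wins" behaviour can be tried out without a database. The following is only a minimal sketch under stated assumptions: InMemorySemaphore is a hypothetical stand-in for the database_semaphores table, a Mutex plays the role of the row lock, and threads play the role of servers. It is not the DatabaseSemaphore class itself.

```ruby
# Hypothetical in-memory stand-in for the database_semaphores table.
# A Mutex plays the role of the row lock taken inside the transaction.
class InMemorySemaphore
  def initialize
    @locked_at = {}          # semaphore name => Time of the last successful claim
    @row_lock  = Mutex.new
  end

  # Mirrors the rule in DatabaseSemaphore.open?: true for the first caller,
  # false for everyone else until lock_duration seconds have passed.
  def open?(name, lock_duration = 600, now = Time.now)
    @row_lock.synchronize do
      last = @locked_at[name]
      if last.nil? || last <= now - lock_duration
        @locked_at[name] = now
        true
      else
        false
      end
    end
  end
end

semaphore = InMemorySemaphore.new

# Five "servers" fire the same cron job at the same moment.
threads = 5.times.map { Thread.new { semaphore.open?("VerifyData") } }
results = threads.map(&:value)
winners = results.count(true)
puts "servers that enqueued the job: #{winners}"
```

Only one of the five callers gets true, so only one of them would go on to enqueue the job.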
Delayed_job
The delayed_job gem provides a database-backed priority queue. A job skeleton is shown below:
module CronJob
  class GenericJob
    def initialize; end
    def perform; end
  end
  class VerifyData < GenericJob
    def perform
      # job code goes here
    rescue Exception => e
      notify_hoptoad(e)
    end
  end
end
We are running delayed_job workers on all application servers. A worker picks up a job to execute when one is available. If multiple jobs are scheduled to run at the same time, they will be picked up by different workers and thus distributed across multiple servers.
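To illustrate how a shared queue spreads jobs across workers, here is a small in-memory sketch. It is not delayed_job code: SharedQueue, the worker names, and the job names other than VerifyData are hypothetical, and the round-robin polling is a simplification of workers that in reality poll independently.

```ruby
# Hypothetical in-memory model of a priority queue shared by several workers.
Job = Struct.new(:name, :priority)

class SharedQueue
  def initialize(jobs)
    # Lower number = higher priority, following the delayed_job convention.
    @jobs = jobs.sort_by(&:priority)
    @lock = Mutex.new
  end

  # Each job is handed out exactly once, to whichever worker asks first.
  def reserve
    @lock.synchronize { @jobs.shift }
  end
end

queue = SharedQueue.new([
  Job.new("PruneOldRecords", 2),
  Job.new("VerifyData", 0),
  Job.new("AggregateStats", 1),
])

workers = %w[app1 app2 app3]
assignments = {}

# Workers take turns polling the queue until it is empty.
workers.cycle do |worker|
  job = queue.reserve
  break if job.nil?
  assignments[job.name] = worker
end

assignments.each { |job_name, worker| puts "#{job_name} ran on #{worker}" }
```

Because each reserve call removes the job from the queue, no job runs twice, and three jobs due at the same time end up on three different servers.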
Summary
With a small amount of effort, we were able to set up a distributed, scalable, fault-tolerant cron system.

Hi
Thanks for this. I am using Rails 2.3 and the delayed_job plugin. Could you please advise where the code starting with "module CronJob" should be placed?
Thanks
Please ignore my first post; I added it to lib/. I have some related questions, though. I am using PostgreSQL, and the line semaphore = DatabaseSemaphore.find_by_name(name, :lock => "LOCK IN SHARE MODE") is not working. How can I make it work? For now I have worked around it as shown below, but I don't know whether this is the right method:
semaphore = DatabaseSemaphore.find_by_name(name)
if semaphore and semaphore.locked_at <= now - lock_duration
  semaphore.lock!
  semaphore.locked_at = now
  semaphore_open = true if semaphore.save
end
One more question: if there is already data in DatabaseSemaphore, then semaphore_open always returns false. So once the task has executed, it will never execute a second time. Please help.
Thanks
LOCK IN SHARE MODE is a MySQL clause. For PostgreSQL, please try FOR UPDATE NOWAIT. You can find more information on row-level locking at http://www.postgresql.org/docs/current/interactive/sql-select.html#SQL-FOR-UPDATE-SHARE
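If the same code has to run against both databases, the clause could be chosen by adapter name. This is only a sketch: lock_clause_for and LOCK_CLAUSES are hypothetical names, the clauses are the two mentioned in this thread, and the commented-out ActiveRecord usage is an assumption that has not been tested here.

```ruby
# Hypothetical helper: pick the row-lock clause for the current database.
# Only the two clauses named in this thread are covered.
LOCK_CLAUSES = {
  "mysql"      => "LOCK IN SHARE MODE",
  "mysql2"     => "LOCK IN SHARE MODE",
  "postgresql" => "FOR UPDATE NOWAIT",
}.freeze

def lock_clause_for(adapter_name)
  key = adapter_name.to_s.downcase
  LOCK_CLAUSES.fetch(key) do
    raise ArgumentError, "no lock clause known for adapter #{adapter_name.inspect}"
  end
end

puts lock_clause_for("PostgreSQL")

# Possible use inside DatabaseSemaphore.open? (assumption, not tested here):
#   clause = lock_clause_for(ActiveRecord::Base.connection.adapter_name)
#   semaphore = DatabaseSemaphore.find_by_name(name, :lock => clause)
```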
Hi Bob,
Thanks for your reply. It worked with FOR UPDATE NOWAIT. Could you please answer my second question? That is, if there is already data in DatabaseSemaphore, then semaphore_open always returns false. Suppose I have a task hourly_alert_mail. I will call it like this:
Delayed::Job.enqueue CronJob::HourlyAlertMail.new if DatabaseSemaphore.open?("hourly_alert_mail")
The first time, it will work. But after one hour, when it runs again, the line below won't execute, since a row with the name 'hourly_alert_mail' already exists, and as a result semaphore_open will be false:
DatabaseSemaphore.create(:name => name, :locked_at => now - lock_duration) if !DatabaseSemaphore.find_by_name(name)
So once the task has executed, it will never execute a second time.
Please advise
sijo,
The DatabaseSemaphore row is indeed created only once. The open? call, however, will return false only if the row is locked or if the time elapsed since the last update is less than the lock_duration parameter value. lock_duration defaults to 600 (10 min). If you make repeated open? calls, they will fail for 10 min and then start returning true. If you have a job running every hour, a call to open? an hour later will return true on the first server making the call and false on all other servers.
To test, you can make the lock duration much shorter so that you do not have to wait 10 min. For example, use a 5-second duration:
DatabaseSemaphore.open?("hourly_alert_mail", 5)
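The timing rule can also be checked without a database. The snippet below is only an illustrative restatement of the comparison made inside open?; semaphore_open? and the sample timestamp are hypothetical and not part of the class.

```ruby
# Pure-Ruby restatement of the time check inside open? (no database involved).
def semaphore_open?(locked_at, now, lock_duration = 600)
  locked_at <= now - lock_duration
end

# Hypothetical time of the last successful open? call.
last_claim = Time.utc(2011, 1, 1, 12, 0, 0)

[5, 599, 600, 3600].each do |elapsed|
  state = semaphore_open?(last_claim, last_claim + elapsed) ? "open" : "closed"
  puts "#{elapsed}s after the last claim: #{state}"
end
```

With the default of 600 seconds, the semaphore stays closed at 5 s and 599 s and is open again from 600 s onwards, so an hourly job finds it open every time.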
Thank you
Bob
Dear Bob,
Great. Many thanks for this. Now I get the idea.
Thanks