Cron is a great tool for running scheduled tasks on Unix servers. It does, however, have some shortcomings when running applications in the cloud.

Cron is essentially a single server solution. When running applications on multiple servers, scheduled tasks can be divided into two categories:

  1. Those which should run on all servers of a given role, e.g. log rotation on application servers.
  2. Those which should be executed on a single server only, e.g. mysql data aggregation.

Handling the former is straightforward – you simply configure identical cron jobs on the servers. It is the latter which creates a challenge.

A simplistic way of handling category two tasks would be to configure this type of cron job on one server only. The problems with this approach are:

  1. It requires manual work to configure a unique server.
  2. If that server goes down, another server has to be set up manually or a cron specific failover mechanism needs to be put in place.
  3. The solution lacks symmetry and elegance.

What is needed is a better solution that is fully automated and scalable.

Requirements

  1. Cron jobs should be automatically configured on all servers upon launch.
  2. There needs to be an easy mechanism to prevent jobs which should be executed only once from running on multiple servers.
  3. Cron jobs’ load should be distributed across multiple servers.

Solution

Our solution (we are a Ruby on Rails shop) includes three components:

  1. whenever gem to configure cron jobs
  2. a custom database semaphore mechanism to guarantee that only one instance of a job will run
  3. delayed_job gem for load distribution

Whenever

The Whenever gem provides a mechanism for defining and deploying cron jobs.

Jobs are defined in a config/schedule.rb file such as the one shown below:

# Use this file to easily define all of your cron jobs.
#
# It's helpful, but not entirely necessary to understand cron before proceeding.
# http://en.wikipedia.org/wiki/Cron

# Example:
#
# every 2.hours do
#   command "/usr/bin/some_great_command"
#   runner "MyModel.some_method"
#   rake "some:great:rake:task"
# end
#
# every 4.days do
#   runner "AnotherModel.prune_old_records"
# end

# Learn more: http://github.com/javan/whenever

every 1.day, :at => '1:00 am' do
  rake "data:verify"
end

To deploy jobs, a crontab file is generated from schedule.rb by running the following command:

whenever --set environment=#{ENV['RAILS_ENV']} -w

This is incorporated into the server launch process.
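
As a rough illustration of what ends up in the crontab, the sketch below builds the kind of line a daily schedule entry boils down to. It is not Whenever's implementation, and the exact command Whenever writes depends on its version and job templates. The helper name daily_cron_line, the application path and the environment value are all assumptions made for the example.

```ruby
# Illustrative only: NOT Whenever's code, just a sketch of the kind of
# crontab line that a daily schedule entry turns into.
require 'time'

# Build a crontab line for a rake task that runs once a day at the given time.
# app_path and environment are hypothetical deployment values.
def daily_cron_line(at, task, app_path:, environment:)
  t = Time.parse(at)                     # e.g. '1:00 am' is parsed as 01:00
  schedule = "#{t.min} #{t.hour} * * *"  # minute hour day-of-month month day-of-week
  command  = "cd #{app_path} && RAILS_ENV=#{environment} rake #{task}"
  "#{schedule} #{command}"
end

puts daily_cron_line('1:00 am', 'data:verify',
                     app_path: '/var/www/app', environment: 'production')
```

The five schedule fields are standard cron syntax. Everything after them is simply the shell command cron runs, which is why the same crontab can be installed on every server.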

The benefits of using Whenever include:

  1. Cron jobs are managed with the rest of an application using the common source code management tool and processes.
  2. Servers are automatically configured with the cron jobs upon launch.

Database Semaphore

The custom database semaphore class shown below provides a locking mechanism. (The code is also available at https://gist.github.com/1295372.)

class DatabaseSemaphore < ActiveRecord::Base
  validates_presence_of :name, :message => "can't be blank"

  def self.open?(name, lock_duration = 600)
    # Only one requestor can get an open semaphore at a time.
    # The semaphore can be locked in a closed position for lock_duration seconds.
    semaphore_open = false
    now = Time.now
    # insert record if it does not exist yet
    DatabaseSemaphore.create(:name => name, :locked_at => now - lock_duration) if !DatabaseSemaphore.find_by_name(name)
    DatabaseSemaphore.transaction do
      semaphore = DatabaseSemaphore.find_by_name(name, :lock => "LOCK IN SHARE MODE")
      if semaphore and semaphore.locked_at <= now - lock_duration
        semaphore.locked_at = now
        semaphore_open = true if semaphore.save
      end
    end
    return semaphore_open
  rescue ActiveRecord::StatementInvalid => e
    # deadlock
    return false
  end
end

class CreateDatabaseSemaphores < ActiveRecord::Migration
  def self.up
    create_table :database_semaphores do |t|
      t.string :name
      t.datetime :locked_at

      t.timestamps
    end
    add_index :database_semaphores, :name, :unique => true
  end

  def self.down
    drop_table :database_semaphores
  end
end
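
To make the role of lock_duration concrete, here is a minimal sketch of the time comparison that open? performs, pulled out into a standalone function. The function name semaphore_open_at? and the timestamps are hypothetical and exist only for this example.

```ruby
# Mirrors the comparison used in DatabaseSemaphore.open?: the semaphore counts
# as open once lock_duration seconds have passed since it was last taken.
def semaphore_open_at?(locked_at, now, lock_duration = 600)
  locked_at <= now - lock_duration
end

first_run = Time.utc(2011, 10, 18, 1, 0, 0)   # hypothetical time of the winning call

# A server whose cron fires 5 seconds later is turned away.
puts semaphore_open_at?(first_run, first_run + 5)
# The next daily run, 24 hours later, gets through.
puts semaphore_open_at?(first_run, first_run + 24 * 3600)
# A freshly created record is backdated by lock_duration, so it starts out open.
puts semaphore_open_at?(first_run - 600, first_run)
```

Under these assumptions, lock_duration probably wants to be longer than the spread in cron start times across servers (clock skew included) and shorter than the interval between scheduled runs. Otherwise a late server could run the job twice, or the next legitimate run could be blocked.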

The rake tasks referenced in schedule.rb are defined in rake files such as lib/tasks/data.rake, shown below:

namespace :data do
  desc "Run data integrity check"
  task :verify => [:environment] do
    Delayed::Job.enqueue CronJob::VerifyData.new if DatabaseSemaphore.open?("VerifyData")
  end
end

All servers with the cron job configured will attempt to run it. Only the first server to query the database semaphore succeeds and enqueues the job; all other servers exit promptly.
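
The "first caller wins" behaviour can be tried out without a database. The following is a simulation only: a hypothetical InMemorySemaphore class uses a Mutex in place of the row lock, and threads stand in for servers whose cron jobs fire at the same moment.

```ruby
# In-memory stand-in for the database_semaphores table. A Mutex plays the role
# of the row lock; this is a simulation, not the ActiveRecord class.
class InMemorySemaphore
  def initialize
    @locked_at = {}        # semaphore name => time it was last taken
    @mutex = Mutex.new
  end

  def open?(name, lock_duration = 600, now = Time.now)
    @mutex.synchronize do
      # A missing entry behaves like a record backdated by lock_duration.
      last = @locked_at.fetch(name, now - lock_duration)
      return false unless last <= now - lock_duration
      @locked_at[name] = now
      true
    end
  end
end

semaphore = InMemorySemaphore.new
enqueued  = Queue.new      # thread-safe list of servers that enqueued the job

# Five "servers" attempt the same cron job at the same moment.
servers = (1..5).map do |id|
  Thread.new do
    enqueued << "server-#{id}" if semaphore.open?("VerifyData")
  end
end
servers.each(&:join)

puts "jobs enqueued: #{enqueued.size}"
```

Which thread wins is not predictable, but exactly one of them does, and that is all the scheduled task needs.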

Delayed_job

The delayed_job gem provides a database-backed priority queue. A job skeleton is shown below:

module CronJob
  class GenericJob
    def initialize; end
    def perform; end
  end

  class VerifyData < GenericJob
    def perform
      # job code goes here
    rescue StandardError => e
      notify_hoptoad(e)
    end
  end
end

We run delayed_job workers on all application servers. A worker picks up a job to execute as soon as one is available. If multiple jobs are scheduled to run at the same time, they will be picked up by different workers and thus distributed across multiple servers.
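
The distribution effect can be sketched with plain Ruby threads. This is a simulation under stated assumptions, not delayed_job itself: a Queue stands in for the jobs table, each thread stands in for a worker on a different server, and all job names other than VerifyData are made up.

```ruby
# Simulation only: a Queue stands in for the delayed_job table and threads
# stand in for workers running on different application servers.
jobs = Queue.new
%w[VerifyData AggregateStats PruneRecords SendDigest].each { |name| jobs << name }
jobs.close                 # no more jobs will arrive; pop returns nil once drained

handled = Queue.new        # thread-safe record of [worker, job] pairs

workers = (1..3).map do |id|
  Thread.new do
    # Each worker keeps taking the next available job until none are left.
    while (job = jobs.pop)
      handled << ["worker-#{id}", job]
    end
  end
end
workers.each(&:join)

results = []
results << handled.pop until handled.empty?
results.each { |worker, job| puts "#{worker} ran #{job}" }
```

Which worker ends up with which job varies from run to run. The useful property is that every job is taken exactly once, by whichever worker happens to be free.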

Summary

With a small amount of effort we were able to set up a distributed, scalable, fault-tolerant cron system.
