Announcing EncryptedCookieStore plugin for Rails 2.3
EncryptedCookieStore is similar to Ruby on Rails’s CookieStore (it saves session data in a cookie), but it uses encryption so that people can’t read what’s in the session data. This makes it possible to store sensitive data in the session.
EncryptedCookieStore is written for Rails 2.3. Other versions of Rails have not been tested.
Note: This is not ThinkRelevance’s EncryptedCookieStore. In the Rails 2.0 days they wrote an EncryptedCookieStore, but it seems their repository has gone defunct and their source code has been lost. This EncryptedCookieStore was written from scratch by Phusion.
Source code at http://github.com/FooBarWidget/encrypted_cookie_store
Installation and usage
First, install it:
./script/plugin install git://github.com/FooBarWidget/encrypted_cookie_store.git
Then edit config/initializers/session_store.rb and set your session store to EncryptedCookieStore:
ActionController::Base.session_store = EncryptedCookieStore
You need to set a few session options before EncryptedCookieStore is usable. You must set all options that CookieStore needs, plus an encryption key that EncryptedCookieStore needs. In session_store.rb:
ActionController::Base.session = {
# CookieStore options...
:key => '_session', # Name of the cookie which contains the session data.
:secret => 'b4589cc9...', # A secret string used to generate the checksum for
# the session data. Must be longer than 64 characters
# and be completely random.
# EncryptedCookieStore options...
:encryption_key => 'c306779f3...', # The encryption key. See below for notes.
}
The encryption key must be a hexadecimal string of exactly 32 bytes. It should be entirely random, because otherwise it can make the encryption weak.
You can generate a new encryption key by running rake secret:encryption_key. This command will output a random encryption key that you can then copy and paste into your environment.rb.
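If the rake task is not at hand, a key of the same shape could be generated directly in Ruby. This is only a minimal sketch: it assumes that "32 bytes" means 32 hexadecimal characters (16 random bytes), and the method name `generate_encryption_key` is hypothetical, not part of the plugin.

```ruby
require 'securerandom'

# Hypothetical helper, not part of EncryptedCookieStore.
# Assumption: the key is 32 hexadecimal characters, i.e. 16 random bytes.
def generate_encryption_key
  SecureRandom.hex(16)
end

puts generate_encryption_key
```

SecureRandom draws from the operating system's random number generator, which fits the requirement that the key be entirely random.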
Operational details
Upon generating cookie data, EncryptedCookieStore generates a new, random initialization vector for encrypting the session data. This initialization vector is then encrypted with 128-bit AES in ECB mode. The session data is first protected with an HMAC to prevent tampering. The session data, along with the HMAC, are then encrypted using 256-bit AES in CFB mode with the generated initialization vector. This encrypted session data + HMAC are then stored, along with the encrypted initialization vector, into the cookie.
Upon unmarshalling the cookie data, EncryptedCookieStore decrypts the encrypted initialization vector and uses that to decrypt the encrypted session data + HMAC. The decrypted session data is then verified against the HMAC.
The reason why HMAC verification occurs after decryption instead of before decryption is because we want to be able to detect changes to the encryption key and changes to the HMAC secret key, as well as migrations from CookieStore. Verifying after decryption allows us to automatically invalidate such old session cookies.
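The flow described above can be sketched with Ruby's OpenSSL bindings. This is not the plugin's actual code: the module and method names are hypothetical, and the use of HMAC-SHA256, the HMAC-then-data layout, and the reuse of the first 16 key bytes for the IV cipher are all assumptions made for the sake of illustration.

```ruby
require 'openssl'

# Illustrative sketch only; module and method names are hypothetical.
module SealedCookieSketch
  HMAC_SIZE = 32 # bytes in an HMAC-SHA256 digest (assumed digest)

  # Returns [encrypted_iv, encrypted_payload], both binary strings.
  def self.seal(data, key, hmac_secret)
    payload = hmac(hmac_secret, data.b) + data.b
    cipher = OpenSSL::Cipher.new('aes-256-cfb')
    cipher.encrypt
    cipher.key = key # must be exactly 32 bytes
    iv = cipher.random_iv
    [crypt_iv(:encrypt, iv, key), cipher.update(payload) + cipher.final]
  end

  # Returns the session data, or nil if the HMAC does not match.
  def self.unseal(encrypted_iv, encrypted_payload, key, hmac_secret)
    return nil if encrypted_payload.empty?
    cipher = OpenSSL::Cipher.new('aes-256-cfb')
    cipher.decrypt
    cipher.key = key
    cipher.iv = crypt_iv(:decrypt, encrypted_iv, key)
    payload = cipher.update(encrypted_payload) + cipher.final
    mac = payload[0, HMAC_SIZE].to_s
    data = payload[HMAC_SIZE..-1].to_s
    secure_equal?(mac, hmac(hmac_secret, data)) ? data : nil
  end

  # Encrypts or decrypts the 16-byte IV with 128-bit AES in ECB mode.
  def self.crypt_iv(mode, iv, key)
    cipher = OpenSSL::Cipher.new('aes-128-ecb')
    mode == :encrypt ? cipher.encrypt : cipher.decrypt
    cipher.key = key[0, 16] # assumption: reuse the first half of the key
    cipher.padding = 0      # the IV is exactly one block, no padding needed
    cipher.update(iv) + cipher.final
  end

  def self.hmac(secret, data)
    OpenSSL::HMAC.digest(OpenSSL::Digest.new('SHA256'), secret, data)
  end

  # Compares two strings without stopping at the first differing byte.
  def self.secure_equal?(a, b)
    return false unless a.bytesize == b.bytesize
    a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
  end
end

key = OpenSSL::Random.random_bytes(32)
secret = OpenSSL::Random.random_bytes(64)
iv, blob = SealedCookieSketch.seal('user_id=42', key, secret)
puts SealedCookieSketch.unseal(iv, blob, key, secret) # prints user_id=42
```

Because verification happens after decryption, a cookie produced with a different encryption key or HMAC secret simply fails the HMAC check and is rejected, which matches the invalidation behavior described above.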
EncryptedCookieStore is quite fast: it is able to marshal and unmarshal a simple session object 5000 times in 8.7 seconds on a MacBook Pro with a 2.4 GHz Intel Core 2 Duo (in battery mode). This is about 1.74 ms per marshal+unmarshal action (8.7 s / 5000). See rake benchmark in the EncryptedCookieStore sources for details.
EncryptedCookieStore vs other session stores
EncryptedCookieStore inherits all the benefits of CookieStore:
- It works out of the box without the need to set up a separate data store (e.g. database table, daemon, etc).
- It does not require any maintenance. Old, stale sessions do not need to be manually cleaned up, as is the case with PStore and ActiveRecordStore.
- Compared to MemCacheStore, EncryptedCookieStore can “hold” an infinite number of sessions at any time.
- It can be scaled across multiple servers without any additional setup.
- It is fast.
- It is more secure than CookieStore because it allows you to store sensitive data in the session.
There are of course drawbacks as well:
- It is prone to session replay attacks. These kinds of attacks are explained in the Ruby on Rails Security Guide. Therefore you should never store anything along the lines of is_admin in the session.
- You can store at most a little less than 4 KB of data in the session because that’s the size limit of a cookie. “A little less” because EncryptedCookieStore also stores a small amount of bookkeeping data in the cookie.
- Although encryption makes it more secure than CookieStore, there’s still a chance that a bug in EncryptedCookieStore renders it insecure. We welcome everyone to audit this code. There’s also a chance that weaknesses in AES are found in the near future which render it insecure. If you are storing *really* sensitive information in the session, e.g. social security numbers, or plans for world domination, then you should consider using ActiveRecordStore or some other server-side store.
JRuby: Illegal Key Size error
If you get this error (and your code works with MRI)…
Illegal key size
[...]/vendor/plugins/encrypted_cookie_store/lib/encrypted_cookie_store.rb:62:in `marshal'
…then it probably means you don’t have the “unlimited strength” policy files installed for your JVM. Download and install them. You probably have the “strong” version if they are already there.
As a workaround, you can change the cipher type from 256-bit AES to 128-bit by inserting the following in config/initializers/session_store.rb:
EncryptedCookieStore.data_cipher_type = 'aes-128-cfb'.freeze # was 256
Please note that after changing to 128-bit AES, EncryptedCookieStore still requires a 32-byte hexadecimal encryption key, although only half of the key is actually used.
Securely store passwords with bcrypt-ruby; now compatible with JRuby and Ruby 1.9
When writing web applications, or any application for that matter, any passwords should be stored securely. As a rule of thumb, one should never store passwords as clear text in the database for the following reasons:
- If the database ever gets leaked out, then all accounts are compromised until every single user resets his password. Imagine that you’re an MMORPG developer; leaking out the database with clear text passwords allows the attacker to delete every player’s characters.
- Many people use the same password for multiple sites. Imagine that the password stored in your database is also used for the user’s online banking account. Even if the database does not get leaked out, the password is still visible to the system administrator; this can be a privacy breach.
There are several “obvious” alternatives, which aren’t quite secure enough:
- Storing passwords as MD5/SHA1/$FAVORITE_ALGORITHM hashes
- These days MD5 can be brute-force cracked with relatively little effort. SHA1, SHA2 and other algorithms are harder to brute-force, but the attacker can still crack these hashes by using rainbow tables: precomputed tables of hashes with which the attacker can look up the input for a hash with relative ease. This rainbow table does not have to be very large: it just has to contain words from the dictionary, because many people use dictionary words as passwords.
Using plain hashes also makes it possible for an attacker to determine whether two users have the same password.
- Encrypting the password
- This is not a good idea because if the attacker was able to steal the database, then there’s a possibility that he’s able to steal the key file as well. Plus, the system administrator is able to read everybody’s passwords, unless he’s restricted access to either the key file or the database.
The solution is to store passwords as salted hashes. One calculates a salted hash as follows:
salted_hash = hashing_algorithm(salt + cleartext_password)
Here, salt is a random string. After calculating the salted hash, one should store the salted hash in the database, along with the (cleartext) salt. It is not necessary to keep the salt secret or to obfuscate it.
When a user logs in, one can verify his password by re-computing the salted hash and comparing it with the salted hash in the database:
salted_hash = hashing_algorithm(salt_from_database + user_provided_password)
if (salted_hash == salted_hash_from_database):
user is logged in
else:
password incorrect
The use of the salt forces the attacker to either brute-force the hash or to use a ridiculously large rainbow table. In case of the latter, the sheer size of the required rainbow table can make it impractical to generate. The larger the salt, the more difficult it becomes for the cracker to use rainbow tables.
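The salting scheme above could look like this in Ruby. It is a minimal sketch: the helper names are hypothetical, and SHA-256 merely stands in for hashing_algorithm to show how the salt is generated, stored and reused.

```ruby
require 'digest'
require 'securerandom'

# Hypothetical helpers illustrating the salted-hash scheme described above.
# SHA-256 is only a stand-in for hashing_algorithm.
def salted_hash(salt, password)
  Digest::SHA256.hexdigest(salt + password)
end

# Returns what would be stored in the database: the cleartext salt and the hash.
def create_credentials(password)
  salt = SecureRandom.hex(16)
  { :salt => salt, :salted_hash => salted_hash(salt, password) }
end

# Re-computes the salted hash with the stored salt and compares the result.
def password_correct?(record, candidate)
  salted_hash(record[:salt], candidate) == record[:salted_hash]
end

record = create_credentials('correct horse')
puts password_correct?(record, 'correct horse') # prints true
puts password_correct?(record, 'wrong guess')   # prints false
```

Since every record gets its own random salt, two users with the same password end up with different stored hashes.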
However, even with salting, one should still not use SHA1, SHA2, Whirlpool or most other hashing algorithms because these algorithms are designed to be fast. Although brute forcing SHA2 and Whirlpool is hard, it’s still possible given sufficient resources. Instead, one should pick a hashing algorithm that’s designed to be slow so that brute forcing becomes infeasible. Bcrypt is such a slow hashing algorithm. A speed comparison on a MacBook Pro with a 2 GHz Intel Core 2 Duo:
- SHA-1: 118600 hashes per second.
- Bcrypt (with cost = 10): 7.7 hashes per second.
Theoretically it would take 4*10^35 years for a single MacBook Pro core to crack an SHA-1 hash, assuming that the attacker does not exploit any weaknesses in SHA-1. To crack a bcrypt hash one would need 6*10^39 years, or roughly 15,000 times as long. Therefore, we recommend the use of bcrypt to store passwords securely.
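The figures above can be reproduced with a few lines of arithmetic. The sketch below assumes that cracking means trying all 2^160 possible values; that search space is inferred from the quoted numbers and is not stated in the text.

```ruby
# Assumption: cracking means trying all 2**160 possible values. This search
# space is inferred from the quoted figures, not stated in the text.
SEARCH_SPACE = 2**160
SECONDS_PER_YEAR = 365 * 24 * 3600

# Years needed to try every value at the given hashing rate.
def years_to_exhaust(hashes_per_second)
  SEARCH_SPACE / hashes_per_second.to_f / SECONDS_PER_YEAR
end

sha1_years   = years_to_exhaust(118_600)
bcrypt_years = years_to_exhaust(7.7)

printf("SHA-1:  %.1e years\n", sha1_years)
printf("bcrypt: %.1e years\n", bcrypt_years)
printf("ratio:  %.0f\n", bcrypt_years / sha1_years)
```

Under that assumption the results come out at about 3.9*10^35 and 6.0*10^39 years, and the ratio between them is simply the ratio of the two hashing rates, 118600 / 7.7, or roughly 15,400.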
There’s even a nice Ruby implementation of this algorithm: bcrypt-ruby! Up until recently, bcrypt-ruby was only available for MRI (“Matz Ruby Interpreter”, the C implementation that most people use). However, we’ve made it compatible with JRuby! The code can be found in our fork at GitHub. The current version also has issues with Ruby 1.9, which we’ve fixed as well. The author of bcrypt-ruby has already accepted our changes and will soon release a new version with JRuby and Ruby 1.9 support.
default_value_for Rails plugin: declaratively define default values for ActiveRecord models
Introduction
The default_value_for plugin allows one to define default values for ActiveRecord models in a declarative manner. For example:
class User < ActiveRecord::Base
default_value_for :name, "(no name)"
default_value_for :last_seen do
Time.now
end
end
u = User.new
u.name # => "(no name)"
u.last_seen # => Mon Sep 22 17:28:38 +0200 2008
We at Phusion use it for generating UUIDs for models.
Note: critics might be interested in the “When (not) to use default_value_for?” section. Please read on.
Installation
Install with:
./script/plugin install git://github.com/FooBarWidget/default_value_for.git
See also the AgileWebDevelopment Plugins entry.
If you like this plugin, then please consider donating and/or recommending us:
Hongli Lai | Ninh Bui
The default_value_for method
The default_value_for method is available in all ActiveRecord model classes.
The first argument is the name of the attribute for which a default value should be set. This may either be a Symbol or a String.
The default value itself may either be passed as the second argument:
default_value_for :age, 20
…or it may be passed as the return value of a block:
default_value_for :age do
if today_is_sunday?
20
else
30
end
end
If you pass a value argument, then the default value is static and never changes. However, if you pass a block, then the default value is retrieved by calling the block. This block is called not once, but every time a new record is instantiated and default values need to be filled in.
The latter form is especially useful if your model has a UUID column. One can generate a new, random UUID for every newly instantiated record:
class User < ActiveRecord::Base
default_value_for :uuid do
UuidGenerator.new.generate_uuid
end
end
User.new.uuid # => "51d6d6846f1d1b5c9a...."
User.new.uuid # => "ede292289e3484cb88...."
Note that the record is passed to the block as an argument, in case you need it for whatever reason:
class User < ActiveRecord::Base
default_value_for :uuid do |x|
x # <--- a User object
UuidGenerator.new.generate_uuid
end
end
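To make the difference between static and block defaults concrete, here is a toy re-implementation in plain Ruby, without ActiveRecord. It is not the plugin's actual code: the `DeclarativeDefaults` module and the `Ticket` class are hypothetical, and inheritance and protected attributes are not handled.

```ruby
require 'securerandom'

# Toy re-implementation for illustration; not the plugin's actual code.
module DeclarativeDefaults
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Stores either a static value or a block under the attribute name.
    def default_value_for(attribute, value = nil, &block)
      declared_defaults[attribute.to_sym] = block || proc { value }
    end

    def declared_defaults
      @declared_defaults ||= {}
    end
  end

  # Fills in defaults first, then applies explicitly passed attributes.
  # The block is called once per new object, with the object as argument.
  def initialize(attributes = {})
    self.class.declared_defaults.each do |name, producer|
      send("#{name}=", producer.call(self)) unless attributes.key?(name)
    end
    attributes.each { |name, value| send("#{name}=", value) }
  end
end

class Ticket
  include DeclarativeDefaults
  attr_accessor :status, :token

  default_value_for :status, 'open'
  default_value_for(:token) { SecureRandom.hex(8) }
end

puts Ticket.new.status                      # prints open
puts Ticket.new(:status => 'closed').status # prints closed
puts Ticket.new.token != Ticket.new.token   # prints true
```

Wrapping the static value in a proc lets both forms share one code path: the only difference is whether the stored proc returns the same value each time or computes a new one.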
Rules
Instantiation of new record
Upon instantiating a new record, the declared default values are filled into the record. You’ve already seen this in the above examples.
Retrieval of existing record
Upon retrieving an existing record, the declared default values are not filled into the record. Consider the example with the UUID:
user = User.create
user.uuid # => "529c91b8bbd3e..."
user = User.find(user.id)
# UUID remains unchanged because it's retrieved from the database!
user.uuid # => "529c91b8bbd3e..."
Mass-assignment
If a certain attribute is being assigned via the model constructor’s mass-assignment argument, then the default value for that attribute will not be filled in:
user = User.new(:uuid => "hello")
user.uuid # => "hello"
However, if that attribute is protected by attr_protected or attr_accessible, then it will be filled in:
class User < ActiveRecord::Base
  default_value_for :name, 'Joe'
  attr_protected :name
end
user = User.new(:name => "Jane")
user.name # => "Joe"
Inheritance
Inheritance works as expected. All default values are inherited by the child class:
class User < ActiveRecord::Base
  default_value_for :name, 'Joe'
end
class SuperUser < User
end
SuperUser.new.name # => "Joe"
Attributes that aren’t database columns
default_value_for also works with attributes that aren’t database columns. It works with anything for which there’s an assignment method:
# Suppose that your 'users' table only has a 'name' column.
class User < ActiveRecord::Base
default_value_for :name, 'Joe'
default_value_for :age, 20
default_value_for :registering, true
attr_accessor :age
def registering=(value)
@registering = value
end
end
user = User.new
user.age # => 20
user.instance_variable_get('@registering') # => true
Caveats
A conflict can occur if your model class overrides the ‘initialize’ method, because this plugin overrides ‘initialize’ as well to do its job.
class User < ActiveRecord::Base
def initialize # <-- this constructor causes problems
super(:name => 'Name cannot be changed in constructor')
end
end
We recommend that you alias-chain your initialize method in models where you use default_value_for:
class User < ActiveRecord::Base
default_value_for :age, 20
def initialize_with_my_app
initialize_without_my_app(:name => 'Name cannot be changed in constructor')
end
alias_method_chain :initialize, :my_app
end
Also, stick with the following rules:
- There is no need to alias_method_chain your initialize method in models that don’t use default_value_for.
- Make sure that alias_method_chain is called after the last default_value_for occurrence.
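For readers unfamiliar with alias chaining, the following plain-Ruby sketch shows roughly what `alias_method_chain :initialize, :my_app` boils down to. The `Account` class is hypothetical and does not use ActiveRecord or ActiveSupport; it only illustrates the two aliases involved.

```ruby
class Account
  attr_reader :name

  def initialize(attributes = {})
    @name = attributes[:name]
  end

  # The wrapper ignores its arguments and forces a fixed name.
  def initialize_with_my_app(*_args)
    initialize_without_my_app(:name => 'Name cannot be changed in constructor')
  end

  # Roughly what alias_method_chain :initialize, :my_app does:
  alias_method :initialize_without_my_app, :initialize
  alias_method :initialize, :initialize_with_my_app
end

puts Account.new(:name => 'Jane').name
# prints Name cannot be changed in constructor
```

The original constructor stays reachable under the `_without_my_app` name, so each wrapper can call the one that was in place before it. That is also why the order of the calls matters.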
When (not) to use default_value_for?
You can also specify default values in the database schema. For example, you can specify a default value in a migration as follows:
create_table :users do |t|
  t.string :username, :null => false, :default => 'default username'
  t.integer :age, :null => false, :default => 20
  t.timestamp :last_seen, :null => false, :default => Time.now
end
This has the same effect as passing the default value as the second argument to default_value_for:
user = User.new
user.username # => 'default username'
user.age # => 20
user.last_seen # => Mon Sep 22 18:31:47 +0200 2008
It’s recommended that you use this over default_value_for whenever possible.
However, it’s not possible to specify a schema default for serialized columns. With default_value_for, you can:
class User < ActiveRecord::Base
  serialize :color
  default_value_for :color, [255, 0, 0]
end
And if schema defaults don’t provide the flexibility that you need, then default_value_for is the perfect choice. For example, with default_value_for you could specify a per-environment default:
class User < ActiveRecord::Base
if RAILS_ENV == "development"
default_value_for :is_admin, true
end
end
Or, as you’ve seen in an earlier example, you can use default_value_for to generate a default random UUID:
class User < ActiveRecord::Base
default_value_for :uuid do
UuidGenerator.new.generate_uuid
end
end
Or you could use it to generate a timestamp that’s relative to the time at which the record is instantiated:
class User < ActiveRecord::Base
default_value_for :account_expires_at do
3.years.from_now
end
end
User.new.account_expires_at # => Thu Sep 22 18:43:42 +0200 2011
sleep(2)
User.new.account_expires_at # => Thu Sep 22 18:43:44 +0200 2011
Finally, it’s also possible to specify a default via an association:
# Has columns: 'name' and 'default_price'
class SuperMarket < ActiveRecord::Base
has_many :products
end
# Has columns: 'name' and 'price'
class Product < ActiveRecord::Base
belongs_to :super_market
default_value_for :price do |product|
product.super_market.default_price
end
end
super_market = SuperMarket.create(:name => 'Albert Zwijn', :default_price => 100)
soap = super_market.products.create(:name => 'Soap')
soap.price # => 100
What about before_validation/before_save?
True, before_validation and before_save do what we want if we’re only interested in filling in a default before saving. However, if one wants to be able to access the default value even before saving, then be prepared to write a lot of code. Suppose that we want to be able to access a new record’s UUID, even before it’s saved. We could end up with the following code:
# In the controller
def create
@user = User.new(params[:user])
@user.generate_uuid
email_report_to_admin("#{@user.username} with UUID #{@user.uuid} created.")
@user.save!
end
# Model
class User < ActiveRecord::Base
before_save :generate_uuid_if_necessary
def generate_uuid
self.uuid = ...
end
private
def generate_uuid_if_necessary
if uuid.blank?
generate_uuid
end
end
end
The need to manually call generate_uuid here is ugly, and one can easily forget to do that. Can we do better? Let’s see:
# Controller
def create
@user = User.new(params[:user])
email_report_to_admin("#{@user.username} with UUID #{@user.uuid} created.")
@user.save!
end
# Model
class User < ActiveRecord::Base
before_save :generate_uuid_if_necessary
def uuid
value = read_attribute('uuid')
if !value
value = generate_uuid
write_attribute('uuid', value)
end
value
end
# We need to override this too, otherwise User.new.attributes won't return
# a default UUID value. I've never tested with User.create() so maybe we
# need to override even more things.
def attributes
uuid
super
end
private
def generate_uuid_if_necessary
uuid # Reader method automatically generates UUID if it doesn't exist
end
end
That’s an awful lot of code. Using default_value_for is easier, don’t you think?
What about other plugins?
I’ve only been able to find 2 similar plugins:
- Default Value: http://agilewebdevelopment.com/plugins/default_value
- ActiveRecord Defaults: http://agilewebdevelopment.com/plugins/activerecord_defaults
Default Value appears to be unmaintained; its SVN link is broken. This leaves only ActiveRecord Defaults. However, it is semantically dubious, which leaves it wide open for corner cases. For example, it is not clearly specified what ActiveRecord Defaults will do when attributes are protected by attr_protected or attr_accessible. It is also not clearly specified what one is supposed to do if one needs a custom initialize method in the model.
I’ve taken my time to thoroughly document default_value_for’s behavior.
Credits
I’ve wanted such functionality for a while now and it baffled me that ActiveRecord doesn’t provide a clean way for me to specify default values. After reading http://groups.google.com/group/rubyonrails-core/browse_thread/thread/b509a2fe2b62ac5/3e8243fa1954a935, it became clear that someone needs to write a plugin. This is the result.
Thanks to Pratik Naik for providing the initial code snippet on which this plugin is based: http://m.onkey.org/2007/7/24/how-to-set-default-values-in-your-model
daemon_controller: a library for robust daemon management
Problem description and motivation
There is a lot of software (both Rails related and unrelated) which relies on servers or daemons. To name a few, in no particular order:
- Ultrasphinx, a Rails library for full-text searching. It makes use of the Sphinx search software for indexing and searching. Indexing is done by running a command, while searching is done by querying the Sphinx search server.
- acts_as_ferret, another Rails library for full-text searching. It uses the Ferret search software. On production environments, it relies on the Ferret DRB server for both searching and indexing.
- BackgrounDRb, a Ruby job server and scheduler. Scheduling is done by contacting the BackgrounDRb daemon.
- mongrel_cluster, which starts and stops multiple Mongrel daemons.
Relying on daemons is quite common, but not without problems. Let’s go over some of them.
Starting daemons is a hassle
If you’ve used similar software, then you might agree that managing these daemons is a hassle. If you’re using BackgrounDRb, then the daemon must be running. Starting the daemon is not hard, but it is annoying. It’s also possible that the system administrator forgets to start the daemon. While configuring the system to automatically start a daemon at startup is not hard, it is an extra thing to do, and thus a hassle. We thought, why can’t such daemons be automatically started? Indeed, this won’t be possible if the daemon is to be run on a remote machine. But in by far the majority of use cases, the daemon runs on the same host as the Rails application. If a Rails application – or indeed, any application – is configured to contact a daemon on the local host, then why not start the daemon automatically on demand?
Daemon starting code may not be robust or efficient
We’ve also observed that people write daemon controlling code over and over again. Consider for example UltraSphinx, which provides a rake sphinx:daemon:start Rake task to start the daemon. The time that a daemon needs to initialize is variable, and depends on things such as the current system load. The Sphinx daemon usually needs less than a second before we can connect to it. However, the way different software handles starting of a daemon varies. We’ve observed that waiting a fixed amount of time is by far the most common way. For example, UltraSphinx’s daemon starting code looks like this:
system "searchd --config '#{Ultrasphinx::CONF_PATH}'"
sleep(4) # give daemon a chance to write the pid file
if ultrasphinx_daemon_running?
say "started successfully"
else
say "failed to start"
end
This is in no way a slam against UltraSphinx. However, if the daemon starts in 200 milliseconds, then the user who issued the start command will be waiting for 3.8 seconds for no good reason. This is not good for usability or for the user’s patience.
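One alternative to a fixed sleep is to poll for readiness with a deadline, so that the caller waits only as long as the daemon actually needs. The helper below is a hypothetical sketch, not code from UltraSphinx or any other library mentioned here; the "daemon" is simulated by a timestamp.

```ruby
# Hypothetical helper: polls the block until it returns a truthy value or
# the timeout (in seconds) expires. Returns true on success, false on timeout.
def wait_until(timeout, interval = 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep(interval)
  end
end

# Stand-in for a daemon that becomes ready after 0.2 seconds.
ready_at = Process.clock_gettime(Process::CLOCK_MONOTONIC) + 0.2
started = wait_until(4) do
  Process.clock_gettime(Process::CLOCK_MONOTONIC) >= ready_at
end
puts started ? "started successfully" : "failed to start"
```

With a 4 second timeout the worst case is the same as the fixed sleep, but the common case returns shortly after the daemon is ready.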
Startup error handling
Different software handles daemon startup errors in different ways. Some might not even handle errors at all. For example, consider mongrel_cluster. If there’s a typo in one of your application source files, then mongrel_cluster will not report the error. Instead, you have to check its log files to see what happened. This is not good for usability: many people will be wondering why they can’t connect to their Mongrel ports after issuing a mongrel_rails cluster::start — until they realize that they should read the log file. But the thing is, not everybody realizes this. And typing in an extra command to read the log file to check whether Mongrel started correctly, is just a big hassle. Why can’t the daemon startup code report such errors immediately?
Stale or corrupt PID files
Suppose that you’re running a Mongrel cluster, and your server suddenly powers off because of a power outage. When the server is online again, it fails to start your Mongrel cluster because the PID file that it had written still exists, and wasn’t cleaned up properly (it’s supposed to be cleaned up when Mongrel exits). mongrel_cluster provides the --clean option to check whether the PID file is stale, and will automatically clean it up if it is. But not all daemon controlling software supports this. Why can’t all software check for stale PID files automatically?
Implementation problems
From the problem descriptions above, our wishlist becomes apparent. The items are listed below, along with the reasons why each of them is often not implemented.
- A daemon should be automatically started on demand, instead of requiring the user to manually start it.
- The most obvious problems are related to concurrency. Suppose that your web application has a search box, and you want to start the search daemon if it isn’t already started, then connect to it. Two problems will arise:
- Suppose that Rails process A is still starting the daemon. At the same time, another visitor tries to search something, and Rails process B notices that the daemon is not running. If B tries to start the daemon while it’s already being started by A, then things can go wrong. A robust daemon starter must ensure that only one process at the same time may start the daemon.
- It’s not a good idea to wait a fixed amount of time for the daemon to start, because you don’t know in advance how long it will take for it to start. For example, if you wait 2 seconds, then try to connect to the daemon, and the daemon isn’t done initializing yet, then it will seem as if the daemon failed to start.
These are the most probable reasons why people don’t try to write auto-starting code, and instead require the user to start the daemon manually.
These problems, as well as several less obvious problems, are closely related to the next few points.
- The daemon starter must wait until the daemon is done initializing, no longer and no shorter
- Only after the daemon is fully initialized is it safe to connect to it, and the user should not have to wait longer than he really has to.
During startup, the daemon has to be checked continuously to see whether it’s done initializing or whether an error occurred. Writing this code can be quite a hassle, which is why most people don’t do it.
- The daemon starter must report any startup errors
- If the daemon starting command (e.g. "sphinx -c config_file.conf", "apachectl start" or "mongrel_rails cluster::start") reports startup errors, then all is fine as long as the user is starting the command from a terminal. A problem occurs when the error occurs after the daemon has already gone into the background. Such errors are only reported to the log file. The daemon starter should also check the log file for any startup errors.
Furthermore, it should be able to raise startup errors as exceptions. This allows the application to decide what to do with the error. For less experienced system administrators, the error might be displayed in the browser, allowing the administrators to become aware of the problem without forcing them to manually check the log files. Or the error might be emailed to a system administrator’s email address.
- The daemon starter must be able to correct stale or corrupted PID files
- If the PID file is stale, or for some reason has been corrupted, then the daemon starter must be able to cope with that. It should check whether the PID file contains a valid PID, and whether the PID exists.
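Two items from this wishlist, mutual exclusion while starting and PID file validation, can be sketched with the Ruby standard library alone. The helper names and return values below are hypothetical and only illustrate the idea; this is not how any particular library implements it.

```ruby
require 'tmpdir'

# Hypothetical helpers; names and return values are illustrative.

# Runs the block while holding an exclusive lock on lock_path, so that only
# one process at a time executes the daemon-starting code.
def with_start_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0644) do |file|
    file.flock(File::LOCK_EX) # blocks until no other process holds the lock
    yield
  end
end

# Classifies a PID file as :missing, :corrupt, :stale or :running.
def pid_file_status(pid_path)
  return :missing unless File.exist?(pid_path)
  content = File.read(pid_path).strip
  return :corrupt unless content =~ /\A[1-9]\d{0,8}\z/
  Process.kill(0, content.to_i) # signal 0 only checks for existence
  :running
rescue Errno::ESRCH
  :stale   # no process with that PID exists
rescue Errno::EPERM
  :running # the process exists but belongs to another user
end

Dir.mktmpdir do |dir|
  pid_file = File.join(dir, 'daemon.pid')
  with_start_lock(File.join(dir, 'daemon.lock')) do
    # Pretend that the current process is the daemon.
    File.write(pid_file, Process.pid.to_s) if pid_file_status(pid_file) != :running
  end
  puts pid_file_status(pid_file) # prints running
end
```

The lock is released when the file is closed at the end of the block, so a second process that was waiting in `flock` re-checks the PID file and finds the daemon already running.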
Introducing daemon_controller
daemon_controller is a library for managing daemons in a robust manner. It is not a tool for managing daemons. Rather, it is a library which lets you write applications that manage daemons in a robust manner. For example, mongrel_cluster or UltraSphinx may be adapted to utilize this library, for more robust daemon management.
daemon_controller implements all items in the aforementioned wishlist. It provides the following functionalities:
- Starting a daemon
- This ensures that no two processes can start the same daemon at the same time. It will also report any startup errors, even errors that occur after the daemon has already gone into the background but before it has fully initialized. It also allows you to set a timeout, and will try to abort the daemon if it takes too long to initialize.
The start function won’t return until the daemon has been fully initialized, and is responding to connections. So if the start function has returned, then the daemon is guaranteed to be usable.
- Stopping a daemon
- It will stop the daemon, but only if it’s already running. Any errors are reported. If the daemon isn’t already running, then it will silently succeed. Just like starting a daemon, you can set a timeout for stopping the daemon.
Like the start function, the stop function won’t return until the daemon is no longer running. This makes it safe to immediately start the same daemon again after having stopped it, without worrying that the previous daemon instance hasn’t exited yet and might conflict with the newly started daemon instance.
- Connecting to a daemon, starting it if it isn’t running
- Every daemon is connected to in a different way. As a developer, you tell daemon_controller how to connect to the daemon. It will then attempt to do that, and if that fails, it will check whether the daemon is running. If it isn’t running, then it will automatically start the daemon, and attempt to connect to the daemon again. Failures are reported.
- Checking whether a daemon is running
- This information is retrieved from the PID file. It also checks whether the PID file is stale.
- All failures are reported via exceptions
- So that you can exactly determine how you want to handle errors.
- Lots and lots of error checking
- So that there are very few ways in which the system can screw up.
daemon_controller’s goal is to make daemon management less of a hassle, and as automatic and straightforward as possible.
What about Monit/God/rc.d/inittab/launchd/runit/daemon tools?
daemon_controller is not a replacement for Monit or God, or for tools like launchd, daemon tools, inittab, etc. Rather, it is a solution to the following problem:
Hongli: hey Ninh, do a ‘git pull’, I just implemented awesome searching features in our application!
Ninh: cool. *pulls from repository*
Ninh: hey Hongli, it doesn’t work.
Hongli: what do you mean, it doesn’t work?
Ninh: it says “connection refused”, or something
Hongli: oh I forgot to mention it, you have to run the Sphinx search daemon before it works. type “rake sphinx:daemon:start” to do that
Ninh: great. but now I get a different error. something about BackgrounDRb.
Hongli: oops, I forgot to mention this too. you need to start the BackgrounDRb server with “rake backgroundrb:start_server”
Ninh: okay, so every time I want to use this app, I have to type “rake sphinx:daemon:start”, “rake backgroundrb:start_server” and “./script/server”?
Hongli: yep
Imagine the above conversation becoming just:
Hongli: hey Ninh, do a ‘git pull’, I just implemented awesome searching features in our application!
Ninh: cool. *pulls from repository*
Ninh: awesome, it works!
This is not something that can be achieved with Monit/God. Monit/God are for monitoring daemons, auto-restarting them when they use too many resources. Nor can it be achieved with runit/daemon tools/inittab/rc.d/etc because they all require an extra step, and probably root privileges as well. daemon_controller’s goal is to allow developers to implement daemon starting/stopping and daemon auto-starting code that’s robust. daemon_controller is intended to be used to make daemon-dependent applications Just Work(tm) without having to start the daemons manually.
Tutorial #1: controlling Apache
Suppose that you’re a Phusion Passenger developer, and you need to write tests for the Apache module. In particular, you want to test whether the different Phusion Passenger configuration directives are working as expected. Obviously, to test the Apache module, the Apache web server must be running. For every test, you will want the unit test suite to:
- Write an Apache configuration file, with the relevant configuration directive set to a specific value.
- Start Apache.
- Send an HTTP request to Apache and check whether the HTTP response matches your expectations.
- Stop Apache.
That can be done with the following code:
require 'daemon_controller'
require 'socket'   # for TCPSocket, used by the ping command

# Write a minimal Apache configuration for this test.
File.open("apache.conf", "w") do |f|
  f.write("PidFile apache.pid\n")
  f.write("ErrorLog apache.log\n")   # Apache's directive for its error log
  f.write("Listen 1234\n")
  # ... write other relevant configuration options here ...
end

controller = DaemonController.new(
  :identifier    => 'Apache web server',
  :start_command => 'apachectl -f apache.conf -k start',
  :ping_command  => lambda { TCPSocket.new('localhost', 1234) },
  :pid_file      => 'apache.pid',
  :log_file      => 'apache.log',
  :timeout       => 25
)
controller.start

# ... Apache is now started ...
# ... some test code here ...

controller.stop
The File.open line is obvious: it writes the relevant Apache configuration file.
The next line is for creating a new DaemonController object. We pass a human-readable identifier for this daemon (“Apache web server”) to the constructor. This is used for generating friendlier error messages.
We also tell it how Apache is supposed to be started (:start_command), how to check whether it can be connected to (:ping_command), and where its PID file and log file are. If Apache fails with an error during startup, then that will be reported. If Apache fails with an error after it has gone into the background, then that will be reported too: the given log file is monitored for new error messages.
Finally, a timeout of 25 seconds is given. If Apache doesn’t start within 25 seconds, then an exception will be raised.
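To make the timeout behavior concrete, here is a minimal sketch of a "poll until ready or give up" loop. It is not daemon_controller's actual implementation; the helper name wait_until_responding, the StartTimeout exception class and the polling interval are all made up for illustration.

```ruby
# Hypothetical exception class, for illustration only.
class StartTimeout < StandardError; end

# Hypothetical helper: keeps calling the given block until it returns a
# truthy value, and raises StartTimeout once `timeout` seconds have passed.
def wait_until_responding(timeout, interval = 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  until yield
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise StartTimeout, "daemon did not respond within #{timeout} seconds"
    end
    sleep(interval)
  end
  true
end

# Usage: a fake daemon that becomes "ready" on the third check.
checks = 0
became_ready = wait_until_responding(2, 0.01) { (checks += 1) >= 3 }

# Usage: a fake daemon that never becomes ready.
timed_out = false
begin
  wait_until_responding(0.2, 0.05) { false }
rescue StartTimeout
  timed_out = true
end
```

In the Apache example, the block would be the ping check and the timeout would be 25 seconds.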
The ping command is just a Proc which returns true or false. If the Proc raises Errno::ECONNREFUSED, then that’s also interpreted by DaemonController as meaning that the daemon isn’t responding yet.
After controller.start has returned, we can continue with the test case. At this point, we know that Apache is done with initializing.
When we’re done with Apache, we stop it with controller.stop. This does not return until Apache has fully stopped.
The observant reader might notice that the socket returned by the ping command is never closed. That’s fine, because DaemonController will close it automatically for us if it notices that the ping command proc’s return value responds to #close.
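The ping rules described above can be sketched roughly as follows. This is only an illustration of the described behavior, not the library's source: daemon_responding? is a hypothetical helper name, and a throwaway TCPServer stands in for Apache so that the sketch runs on its own.

```ruby
require 'socket'

# Hypothetical helper, not part of daemon_controller's API.
# Returns true if the ping command indicates that the daemon is up.
def daemon_responding?(ping_command)
  result = ping_command.call
  # Close sockets (or anything else closeable) so pings don't leak descriptors.
  result.close if result.respond_to?(:close)
  !!result
rescue Errno::ECONNREFUSED
  # Nobody is listening yet, so the daemon isn't ready.
  false
end

# Usage: a local TCP server plays the role of the daemon.
server = TCPServer.new('127.0.0.1', 0)   # port 0 lets the OS pick a free port
port   = server.addr[1]
ping   = lambda { TCPSocket.new('127.0.0.1', port) }

up_while_listening = daemon_responding?(ping)
server.close
up_after_close = daemon_responding?(ping)
```

Returning the socket itself from the ping proc is convenient because a successful connect is truthy, while a refused connection surfaces as Errno::ECONNREFUSED.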
From this example, it becomes apparent that for daemon_controller to work, you must know how to start the daemon, how to contact the daemon, and you must know where it will put its PID file and log file.
Tutorial #2: Sphinx indexing and search server management
We at Phusion are currently developing a web application with full-text search capabilities, and we’re using Sphinx for this purpose. We want to make the lives of our developers and our system administrators as easy as possible, so that there’s little room for human screw-up, and so we’ve developed this library. Our Sphinx search daemon is completely managed through this library and is automatically started on demand.
Our Sphinx config file is generated from an ERB template. This ERB template writes different values in the config file, depending on whether we’re in development, test or production mode. We will want to regenerate this config file every time, just before we start the search daemon.
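As an illustration of the idea, here is a minimal sketch of rendering such a config file with ERB. The template is a made-up, heavily shortened stand-in for our real one, and render_sphinx_config is a hypothetical helper name; a real Sphinx configuration needs source and index sections as well.

```ruby
require 'erb'

# Hypothetical, shortened template. Only the searchd section is shown.
SPHINX_TEMPLATE = <<~TEMPLATE
  searchd
  {
    listen   = <%= port %>
    log      = log/sphinx.<%= environment %>.log
    pid_file = tmp/pids/sphinx.pid
  }
TEMPLATE

# Renders the template; `environment` and `port` are visible to the
# template through the method's binding.
def render_sphinx_config(environment, port)
  ERB.new(SPHINX_TEMPLATE).result(binding)
end

config = render_sphinx_config('development', 1234)
puts config
```

A generate_configuration_file method could then simply write the rendered string to config/sphinx.conf.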
But there’s more. The search daemon will fail if there is no search index. If a new developer has just checked out the application’s source code, then there is no search index yet. We don’t want him to go through the pain of having to generate the index manually. (That said, it isn’t that much of a pain, but it’s just yet-another-thing to do, which can and should be automated.) So before starting the daemon, we will also want to check whether the index exists. If not, then we’ll generate it, and then start the daemon. Of course, no two Rails processes may generate the config file or the index at the same time.
When querying the search server, we will want to automatically start it if it isn’t running.
This can be achieved with the following code:
require 'daemon_controller'
require 'socket'   # for TCPSocket

class SearchServer
  SEARCH_SERVER_PORT = 1234

  def initialize
    @controller = DaemonController.new(
      :identifier    => 'Sphinx search server',
      :start_command => "searchd -c config/sphinx.conf",
      :before_start  => method(:before_start),
      :ping_command  => lambda { TCPSocket.new('localhost', SEARCH_SERVER_PORT) },
      :pid_file      => 'tmp/pids/sphinx.pid',
      :log_file      => 'log/sphinx.log')
  end

  # Queries the search server, auto-starting it first if necessary.
  def query(search_terms)
    socket = @controller.connect do
      TCPSocket.new('localhost', SEARCH_SERVER_PORT)
    end
    send_query(socket, search_terms)
    return retrieve_results(socket)
  end

  private

  # Runs just before the daemon is started.
  def before_start
    generate_configuration_file
    if !index_exists?
      generate_index
    end
  end

  # ... (the helper methods used above are omitted here) ...
end
Notice the :before_start option. We pass it a piece of code (here, a Method object) which is to be run just before the daemon is started. This code, along with starting the daemon, is completely serialized. That is, while one process is running it, it’s guaranteed that no other process is running it at the same time.
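One common way to get this kind of cross-process serialization is an exclusive file lock. The sketch below uses File#flock to show the idea; it is an assumption about how such a guarantee can be built, not a description of daemon_controller's internals. The helper name with_exclusive_lock and the file paths are made up, and the demo relies on fork, so it needs a Unix-like system.

```ruby
require 'tmpdir'

# Hypothetical helper: runs the block while holding an exclusive lock on
# lock_path, so that only one process at a time can be inside the block.
def with_exclusive_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0644) do |f|
    f.flock(File::LOCK_EX)   # blocks until no other process holds the lock
    yield
  end                        # closing the file releases the lock
end

lock_path = File.join(Dir.tmpdir, "before_start_demo_#{Process.pid}.lock")
log_path  = File.join(Dir.tmpdir, "before_start_demo_#{Process.pid}.log")
File.write(log_path, "")

# Three processes all try to run the "before start" work at once.
pids = 3.times.map do
  fork do
    with_exclusive_lock(lock_path) do
      File.open(log_path, "a") { |f| f.puts("enter") }
      sleep 0.05             # pretend to generate the config file and index
      File.open(log_path, "a") { |f| f.puts("leave") }
    end
  end
end
pids.each { |pid| Process.wait(pid) }

events = File.readlines(log_path, chomp: true)
File.delete(lock_path, log_path)
```

Because of the lock, the log always shows strictly alternating "enter" and "leave" lines: no process enters the block before the previous one has left it.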
The query method is the method for querying the search server with search terms. It returns a list of results. It uses DaemonController#connect: one passes a block to that method, which contains code for connecting to the daemon. If the block returns nil, or if it raises Errno::ECONNREFUSED, then DaemonController#connect will automatically take care of auto-starting the Sphinx daemon for us.
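The connect-or-start logic can be sketched like this. It is a simplified illustration, not the library's code: connect_or_start is a hypothetical helper, it ignores locking and timeouts, and a fake daemon (two local variables) is used so that the sketch runs without Sphinx installed.

```ruby
# Hypothetical helper, not daemon_controller's actual implementation.
# Tries the connect block; if that fails, starts the daemon and tries again.
def connect_or_start(start_daemon, &connect)
  try_connect = lambda do
    begin
      connect.call
    rescue Errno::ECONNREFUSED
      nil   # treated the same as the block returning nil
    end
  end

  connection = try_connect.call
  return connection if connection

  start_daemon.call   # the daemon appears to be down, so start it
  try_connect.call or raise "daemon still not reachable after starting it"
end

# Usage with a fake daemon.
daemon_running = false
start_count    = 0
start_daemon   = lambda { start_count += 1; daemon_running = true }

connect = lambda do
  raise Errno::ECONNREFUSED unless daemon_running
  :connected
end

first  = connect_or_start(start_daemon, &connect)   # starts the daemon
second = connect_or_start(start_daemon, &connect)   # daemon already running
```

The second call connects right away, so the daemon is started only once.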
A little bit of history
The issue of managing daemons has been a thorn in our side for quite some time now. Until now, we’ve solved this problem by equipping any daemons that we write with the ability to gracefully handle being concurrently started, the ability to initialize as much as possible before forking into the background, etc. However, building all this robustness into our code over and over is a lot of work. We’ve considered documenting a standard behavior for daemons so that they can properly support auto-starting and such.
However, we’ve recently realized that that’s probably a futile effort. Convincing everybody to write a lot of code for a bit more robustness is probably not realistic. So we took the pragmatic approach and developed a library which adds more robustness on top of daemons’ existing behavior. And thus, daemon_controller was born. It is a little bit less efficient compared to when the daemon is designed from the beginning with such abilities in mind, but it’s compatible with virtually all daemons, and is easy to use.
Availability
The source code is available on Github: http://github.com/FooBarWidget/daemon_controller/tree/master
Detailed API documentation is available in the form of inline comments in lib/daemon_controller.rb.
Hongli Lai