daemon_controller: a library for robust daemon management

By Hongli Lai on August 25th, 2008

Problem description and motivation

There is a lot of software (both Rails related and unrelated) which relies on servers or daemons. To name a few, in no particular order:

  • Ultrasphinx, a Rails library for full-text searching. It makes use of the Sphinx search software for indexing and searching. Indexing is done by running a command, while searching is done by querying the Sphinx search server.
  • acts_as_ferret, another Rails library for full-text searching. It uses the Ferret search software. On production environments, it relies on the Ferret DRB server for both searching and indexing.
  • BackgrounDRb, a Ruby job server and scheduler. Scheduling is done by contacting the BackgrounDRb daemon.
  • mongrel_cluster, which starts and stops multiple Mongrel daemons.

Relying on daemons is quite common, but not without problems. Let’s go over some of them.

Starting daemons is a hassle

If you’ve used similar software, then you might agree that managing these daemons is a hassle. If you’re using BackgrounDRb, then the daemon must be running. Starting the daemon is not hard, but it is annoying. It’s also possible that the system administrator forgets to start the daemon. While configuring the system to automatically start a daemon at startup is not hard, it is an extra thing to do, and thus a hassle. We thought, why can’t such daemons be automatically started? Admittedly, this won’t be possible if the daemon is to be run on a remote machine. But in the vast majority of use cases, the daemon runs on the same host as the Rails application. If a Rails application (or indeed, any application) is configured to contact a daemon on the local host, then why not start the daemon automatically on demand?

Daemon starting code may not be robust or efficient

We’ve also observed that people write daemon controlling code over and over again. Consider for example UltraSphinx, which provides a rake sphinx:daemon:start Rake task to start the daemon. The time that a daemon needs to initialize is variable, and depends on things such as the current system load. The Sphinx daemon usually needs less than a second before we can connect to it. However, the way different software handles starting of a daemon varies. We’ve observed that waiting a fixed amount of time is by far the most common way. For example, UltraSphinx’s daemon starting code looks like this:

system "searchd --config '#{Ultrasphinx::CONF_PATH}'"
sleep(4) # give daemon a chance to write the pid file
if ultrasphinx_daemon_running?
   say "started successfully"
else
   say "failed to start"
end

This is in no way a slam against UltraSphinx. However, if the daemon starts in 200 milliseconds, then the user who issued the start command will be waiting for 3.8 seconds for no good reason. This is not good for usability or for the user’s patience.

Startup error handling

Different software handles daemon startup errors in different ways. Some might not even handle errors at all. For example, consider mongrel_cluster. If there’s a typo in one of your application source files, then mongrel_cluster will not report the error. Instead, you have to check its log files to see what happened. This is not good for usability: many people will be wondering why they can’t connect to their Mongrel ports after issuing a mongrel_rails cluster::start — until they realize that they should read the log file. But the thing is, not everybody realizes this. And typing in an extra command to read the log file to check whether Mongrel started correctly, is just a big hassle. Why can’t the daemon startup code report such errors immediately?

Stale or corrupt PID files

Suppose that you’re running a Mongrel cluster, and your server suddenly powers off because of a power outage. When the server is online again, it fails to start your Mongrel cluster because the PID file that it had written still exists, and wasn’t cleaned up properly (it’s supposed to be cleaned up when Mongrel exits). mongrel_cluster provides the --clean option to check whether the PID file is stale, and will automatically clean it up if it is. But not all daemon controlling software supports this. Why can’t all software check for stale PID files automatically?
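A stale or corrupt PID file can be detected with very little code. The following is a minimal sketch, not daemon_controller’s actual implementation: the helper name pid_file_status and its return values are made up for illustration. It relies on the fact that sending signal 0 to a process only checks whether the process exists.

```ruby
# Hypothetical helper (not part of daemon_controller's API).
# Classifies a PID file as :missing, :corrupt, :stale or :running.
def pid_file_status(path)
  content = File.read(path).strip
  # Anything other than a plain positive integer is treated as corruption.
  return :corrupt unless content =~ /\A\d+\z/
  pid = content.to_i
  return :corrupt if pid <= 0

  # Signal 0 delivers nothing; it only checks that the process exists.
  Process.kill(0, pid)
  :running
rescue Errno::ENOENT
  :missing   # there is no PID file at all
rescue Errno::ESRCH
  :stale     # the file names a process that no longer exists
rescue Errno::EPERM
  :running   # the process exists, but belongs to another user
end
```

Note that this check cannot tell whether the PID has since been reused by an unrelated process, so a :running result is only a strong hint.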

Implementation problems

From the problem descriptions, our wishlist becomes apparent; its items are listed below. Why are these items so often not implemented? Let’s go over them.

A daemon should be automatically started on demand, instead of requiring the user to manually start it.

The most obvious problems are related to concurrency. Suppose that your web application has a search box, and you want to start the search daemon if it isn’t already started, then connect to it. Two problems will arise:

  • Suppose that Rails process A is still starting the daemon. At the same time, another visitor tries to search something, and Rails process B notices that the daemon is not running. If B tries to start the daemon while it’s already being started by A, then things can go wrong. A robust daemon starter must ensure that only one process at the same time may start the daemon.
  • It’s not a good idea to wait a fixed amount of time for the daemon to start, because you don’t know in advance how long it will take for it to start. For example, if you wait 2 seconds, then try to connect to the daemon, and the daemon isn’t done initializing yet, then it will seem as if the daemon failed to start.
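The first problem, making sure that only one process at a time starts the daemon, is commonly solved with an exclusive lock on a lock file. The following is a minimal sketch under that assumption; the helper name with_exclusive_lock and the lock file path in the usage comment are hypothetical.

```ruby
# Hypothetical helper: runs the given block while holding an exclusive
# lock on lock_path, so that only one process at a time can be inside it.
def with_exclusive_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0644) do |f|
    # Blocks until no other process holds the lock.
    f.flock(File::LOCK_EX)
    yield
  end # closing the file releases the lock
end

# Usage idea (start_daemon and daemon_running? are placeholders):
#
#   with_exclusive_lock("tmp/search_daemon.lock") do
#     start_daemon unless daemon_running?
#   end
```

Process B then simply waits at the flock call until process A is done, and afterwards finds the daemon already running.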

These are the most probable reasons why people don’t try to write auto-starting code, and instead require the user to start the daemon manually.

These problems, as well as several less obvious problems, are closely related to the next few points.

The daemon starter must wait until the daemon is done initializing, no longer and no shorter

Only after the daemon is fully initialized is it safe to connect to it. And the user should not have to wait any longer than he really has to.

During startup, the daemon has to be checked continuously to see whether it’s done initializing or whether an error occurred. Writing this code can be quite a hassle, which is why most people don’t do it.
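Such continuous checking usually boils down to a polling loop with a deadline. The following is a minimal sketch; the helper name wait_until and the 0.1 second default interval are assumptions made for illustration, not daemon_controller’s API.

```ruby
# Hypothetical helper: repeatedly evaluates the block until it returns a
# true value (=> true) or until `timeout` seconds have passed (=> false).
def wait_until(timeout:, interval: 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep(interval)
  end
end

# Usage idea (daemon_responding? is a placeholder):
#
#   started = wait_until(timeout: 15) { daemon_responding? }
```

With this approach, a daemon that is ready after 200 milliseconds makes the caller wait roughly 200 milliseconds, not a fixed 4 seconds.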

The daemon starter must report any startup errors

If the daemon starting command — e.g. "sphinx -c config_file.conf", "apachectl start" or "mongrel_rails cluster::start" — reports startup errors, then all is fine as long as the user is starting the command from a terminal. A problem occurs when the error occurs after the daemon has already gone into the background. Such errors are only reported to the log file. The daemon starter should also check the log file for any startup errors.
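One way to pick up errors that only end up in the log file is to remember the size of the log file before starting the daemon, and afterwards read whatever was appended. The following is a minimal sketch of that idea; the helper name read_new_log_output is hypothetical.

```ruby
# Hypothetical helper: returns everything that was appended to the log
# file after byte position `offset`, or "" if the file does not exist.
def read_new_log_output(log_path, offset)
  return "" unless File.exist?(log_path)
  File.open(log_path, "rb") do |f|
    f.seek(offset)
    f.read.to_s
  end
end

# Usage idea:
#
#   offset = File.exist?(log_path) ? File.size(log_path) : 0
#   ... start the daemon and wait for it ...
#   new_output = read_new_log_output(log_path, offset)
```

If the daemon turns out not to be running, the new output is a good candidate for the error message to show to the user.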

Furthermore, it should be able to raise startup errors as exceptions. This allows the application to decide what to do with the error. For less experienced system administrators, the error might be displayed in the browser, allowing the administrators to become aware of the problem without forcing them to manually check the log files. Or the error might be emailed to a system administrator’s email address.

The daemon starter must be able to correct stale or corrupted PID files

If the PID file is stale, or for some reason has been corrupted, then the daemon starter must be able to cope with that. It should check whether the PID file contains a valid PID, and whether a process with that PID exists.

Introducing daemon_controller

daemon_controller is a library for managing daemons in a robust manner. It is not a tool for managing daemons. Rather, it is a library which lets you write applications that manage daemons in a robust manner. For example, mongrel_cluster or UltraSphinx may be adapted to utilize this library, for more robust daemon management.

daemon_controller implements all items in the aforementioned wishlist. It provides the following functionalities:

Starting a daemon

This ensures that no two processes can start the same daemon at the same time. It will also report any startup errors, even errors that occur after the daemon has already gone into the background but before it has fully initialized. It also allows you to set a timeout, and will try to abort the daemon if it takes too long to initialize.

The start function won’t return until the daemon has been fully initialized, and is responding to connections. So if the start function has returned, then the daemon is guaranteed to be usable.

Stopping a daemon

It will stop the daemon, but only if it’s already running. Any errors are reported. If the daemon isn’t already running, then it will silently succeed. Just like starting a daemon, you can set a timeout for stopping the daemon.

Like the start function, the stop function won’t return until the daemon is no longer running. This makes it safe to immediately start the same daemon again after having stopped it, without worrying that the previous daemon instance hasn’t exited yet and might conflict with the newly started daemon instance.

Connecting to a daemon, starting it if it isn’t running

Every daemon is connected to in a different way. As a developer, you tell daemon_controller how to connect to the daemon. It will then attempt to do that, and if that fails, it will check whether the daemon is running. If it isn’t running, then it will automatically start the daemon, and attempt to connect to the daemon again. Failures are reported.

Checking whether a daemon is running

This information is retrieved from the PID file. It also checks whether the PID file is stale.

All failures are reported via exceptions

So that you can determine exactly how you want to handle errors.

Lots and lots of error checking

So that there are very few ways in which the system can screw up.
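To make the “connect, and start the daemon if that fails” behavior more concrete, here is a heavily simplified sketch. It is not how daemon_controller is implemented: the helper name connect_or_start is hypothetical, and the sketch leaves out locking, the PID file check and timeouts.

```ruby
# Hypothetical helper: tries to connect by running the block. If the
# connection is refused, it starts the daemon and tries exactly once more.
def connect_or_start(start_daemon)
  yield
rescue Errno::ECONNREFUSED
  # Expected to return only once the daemon accepts connections.
  start_daemon.call
  yield # a second failure propagates to the caller as an exception
end

# Usage idea (start_search_daemon is a placeholder):
#
#   socket = connect_or_start(method(:start_search_daemon)) do
#     TCPSocket.new('localhost', 1234)
#   end
```

Because the second failure is raised as an exception, the caller decides how to report it.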

daemon_controller’s goal is to make daemon management less of a hassle, and as automatic and straightforward as possible.

What about Monit/God/rc.d/inittab/launchd/runit/daemon tools?

daemon_controller is not a replacement for Monit or God, or for tools like launchd, daemon tools, inittab, etc. Rather, it is a solution to the following problem:

Hongli: hey Ninh, do a ‘git pull’, I just implemented awesome searching features in our application!
Ninh: cool. *pulls from repository*
Ninh: hey Hongli, it doesn’t work.
Hongli: what do you mean, it doesn’t work?
Ninh: it says “connection refused”, or something
Hongli: oh I forgot to mention it, you have to run the Sphinx search daemon before it works. type “rake sphinx:daemon:start” to do that
Ninh: great. but now I get a different error. something about BackgrounDRb.
Hongli: oops, I forgot to mention this too. you need to start the BackgrounDRb server with “rake backgroundrb:start_server”
Ninh: okay, so every time I want to use this app, I have to type “rake sphinx:daemon:start”, “rake backgroundrb:start_server” and “./script/server”?
Hongli: yep

Imagine the above conversation becoming just:

Hongli: hey Ninh, do a ‘git pull’, I just implemented awesome searching features in our application!
Ninh: cool. *pulls from repository*
Ninh: awesome, it works!

This is not something that can be achieved with Monit/God. Monit/God are for monitoring daemons and auto-restarting them when they use too many resources. Nor can it be achieved with runit/daemon tools/inittab/rc.d/etc., because they all require an extra step, and probably root privileges as well. daemon_controller’s goal is to allow developers to implement daemon starting/stopping and daemon auto-starting code that’s robust. daemon_controller is intended to be used to make daemon-dependent applications Just Work(tm) without having to start the daemons manually.

Tutorial #1: controlling Apache

Suppose that you’re a Phusion Passenger developer, and you need to write tests for the Apache module. In particular, you want to test whether the different Phusion Passenger configuration directives are working as expected. Obviously, to test the Apache module, the Apache web server must be running. For every test, you will want the unit test suite to:

  1. Write an Apache configuration file, with the relevant configuration directive set to a specific value.
  2. Start Apache.
  3. Send an HTTP request to Apache and check whether the HTTP response matches your expectations.
  4. Stop Apache.

That can be done with the following code:

require 'socket'   # for TCPSocket, used by the ping command
require 'daemon_controller'

File.open("apache.conf", "w") do |f|
   f.write("PidFile apache.pid\n")
   f.write("ErrorLog apache.log\n")
   f.write("Listen 1234\n")
   # ... write other relevant configuration options here ...
end

controller = DaemonController.new(
   :identifier    => 'Apache web server',
   :start_command => 'apachectl -f apache.conf -k start',
   :ping_command  => lambda { TCPSocket.new('localhost', 1234) },
   :pid_file      => 'apache.pid',
   :log_file      => 'apache.log',
   :timeout       => 25
)
controller.start

# ... Apache is now started ...
# ... some test code here ...

controller.stop

The File.open line is obvious: it writes the relevant Apache configuration file.

The next line is for creating a new DaemonController object. We pass a human-readable identifier for this daemon (“Apache web server”) to the constructor. This is used for generating friendlier error messages.

We also tell it how Apache is supposed to be started (:start_command), how to check whether it can be connected to (:ping_command), and where its PID file and log file are. If Apache failed with an error during startup, then it will be reported. If Apache failed with an error after it has gone into the background, then that will be reported too: the given log file is monitored for new error messages.

Finally, a timeout of 25 seconds is given. If Apache doesn’t start within 25 seconds, then an exception will be raised.

The ping command is just a Proc which returns true or false. If the Proc raises Errno::ECONNREFUSED, then that’s also interpreted by DaemonController as meaning that the daemon isn’t responding yet.

After controller.start has returned, we can continue with the test case. At this point, we know that Apache is done with initializing.
When we’re done with Apache, we stop it with controller.stop. This does not return until Apache has fully stopped.

The cautious reader might notice that the socket returned by the ping command is never closed. That’s true, because DaemonController will close it automatically for us, if it notices that the ping command proc’s return value responds to #close.
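The rules for the ping command described above can be summarized in a few lines of code. The following is a minimal sketch of how a ping result might be interpreted; the helper name daemon_responding? is hypothetical and this is not DaemonController’s actual code.

```ruby
require 'socket'

# Hypothetical helper: runs the ping command and interprets the outcome.
def daemon_responding?(ping_command)
  result = ping_command.call
  # Clean up sockets (or anything else closeable) returned by the ping.
  result.close if result.respond_to?(:close)
  result ? true : false
rescue Errno::ECONNREFUSED
  # Nothing is listening yet: the daemon isn't responding.
  false
end

# Usage idea:
#
#   daemon_responding?(lambda { TCPSocket.new('localhost', 1234) })
```

A ping command that returns a socket thus counts as a success, and the socket does not leak.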

From this example, it becomes apparent that for daemon_controller to work, you must know how to start the daemon, how to contact the daemon, and you must know where it will put its PID file and log file.

Tutorial #2: Sphinx indexing and search server management

We at Phusion are currently developing a web application with full-text search capabilities, and we’re using Sphinx for this purpose. We want to make the lives of our developers and our system administrators as easy as possible, so that there’s little room for human screw-up, and so we’ve developed this library. Our Sphinx search daemon is completely managed through this library and is automatically started on demand.

Our Sphinx config file is generated from an ERB template. This ERB template writes different values in the config file, depending on whether we’re in development, test or production mode. We will want to regenerate this config file every time, just before we start the search daemon.
But there’s more. The search daemon will fail if there is no search index. If a new developer has just checked out the application’s source code, then there is no search index yet. We don’t want him to go through the pain of having to generate the index manually. (That said, it isn’t that much of a pain, but it’s just yet-another-thing to do, which can and should be automated.) So before starting the daemon, we will also want to check whether the index exists. If not, then we’ll generate it, and then start the daemon. Of course, no two Rails processes may generate the config file or the index at the same time.

When querying the search server, we will want to automatically start it if it isn’t running.

This can be achieved with the following code:

require 'socket'   # for TCPSocket
require 'daemon_controller'

class SearchServer
   SEARCH_SERVER_PORT = 1234

   def initialize
      @controller = DaemonController.new(
         :identifier => 'Sphinx search server',
         :start_command => "searchd -c config/sphinx.conf",
         :before_start => method(:before_start),
         :ping_command => lambda { TCPSocket.new('localhost', SEARCH_SERVER_PORT) },
         :pid_file => 'tmp/pids/sphinx.pid',
         :log_file => 'log/sphinx.log')
   end
   
   def query(search_terms)
      socket = @controller.connect do
         TCPSocket.new('localhost', SEARCH_SERVER_PORT)
      end
      send_query(socket, search_terms)
      return retrieve_results(socket)
   end
   
private
   def before_start
      generate_configuration_file
      if !index_exists?
         generate_index
      end
   end
   
   ...
end

Notice the :before_start option. We pass a piece of code (here, a Method object) which is to be run just before the daemon is started. This code, along with starting the daemon, is completely serialized. That is, while one process is running it, it’s guaranteed that no other process is running it at the same time.

The query method is the method for querying the search server with search terms. It returns a list of results. It uses DaemonController#connect: one passes a block to that method, which contains code for connecting to the daemon. If the block returns nil, or if it raises Errno::ECONNREFUSED, then DaemonController#connect will automatically take care of auto-starting the Sphinx daemon for us.

A little bit of history

The issue of managing daemons has been a thorn in our side for quite some time now. Until now, we’ve solved this problem by equipping any daemons that we write with the ability to gracefully handle being concurrently started, the ability to initialize as much as possible before forking into the background, etc. However, building all this robustness into our code over and over is a lot of work. We’ve considered documenting a standard behavior for daemons so that they can properly support auto-starting and such.

However, we’ve recently realized that that’s probably a futile effort. Convincing everybody to write a lot of code for a bit more robustness is probably not realistic. So we took the pragmatic approach and developed a library which adds more robustness on top of daemons’ existing behavior. And thus, daemon_controller was born. It is a little bit less efficient compared to when the daemon is designed from the beginning with such abilities in mind, but it’s compatible with virtually all daemons, and is easy to use.

Availability

The source code is available on Github: http://github.com/FooBarWidget/daemon_controller/tree/master
Detailed API documentation is available in the form of inline comments in lib/daemon_controller.rb.

  • Jonno

    With regard to keeping daemons alive and ‘healthy’, does this provide any great benefit over monit?

  • http://www.phusion.nl/ hongli

    Jonno, this is not a replacement for Monit. This is a library for (auto-)starting and stopping daemons, and for checking whether they’re still running. It is not a daemon monitoring service. We use it to implement on-demand daemon auto-starting features in our applications.

    Though you could use this library to write something like Monit.

    And even though it’s not meant to replace Monit, it will automatically restart the daemon if it crashed (provided that you use DaemonController#connect to connect to the daemon).

    See also the “What about Monit/God?” subsection.

  • Yves

    I tried to use it in combination with acts_as_ferret as an initializer:

    require 'daemon_controller'
    
    conf = YAML.load_file("#{RAILS_ROOT}/config/ferret_server.yml")[RAILS_ENV]
    
    controller = DaemonController.new(
       :identifier    => 'Ferret Server',
       :start_command => "#{RAILS_ROOT}/script/ferret_server start -e #{RAILS_ENV}",
       :stop_command  => "#{RAILS_ROOT}/script/ferret_server stop -e #{RAILS_ENV}",
       :ping_command  => lambda { TCPSocket.new(conf['host'], conf['port']) },
       :pid_file      => conf['pid_file'],
       :log_file      => conf['log_file'],
       :timeout       => 25
    )
    
    controller.start
    can't convert nil into String (DaemonController::StartError)
    #<DaemonController:0x2226a94 @ping_interval=0.1, @log_file="log/ferret_server.log", @log_file_activity_timeout=7, @ping_command=#, @start_timeout=15, @pid_file="tmp/pids/ferret.pid", @stop_command="/Users/Yves/Sites/union-street.de/script/ferret_server stop -e development", @identifier="Ferret Server", @lock_file=#, @stop_timeout=15, @start_command="/Users/Yves/Sites/union-street.de/script/ferret_server start -e development", @before_start=nil>
    
    0  	/Users/Yves/Sites/example.org/lib/daemon_controller.rb  	519  	in `run_command'
    1 	/Users/Yves/Sites/example.org/lib/daemon_controller.rb 	335 	in `spawn_daemon'
    2 	/Users/Yves/Sites/example.org/lib/daemon_controller.rb 	279 	in `start_without_locking'
    3 	/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/timeout.rb 	56 	in `timeout'
    4 	/Users/Yves/Sites/example.org/lib/daemon_controller.rb 	277 	in `start_without_locking'
    5 	/Users/Yves/Sites/example.org/lib/daemon_controller.rb 	163 	in `start'
    6 	/Users/Yves/Sites/example.org/lib/daemon_controller/lock_file.rb 	34 	in `exclusive_lock'
    7 	/Users/Yves/Sites/example.org/lib/daemon_controller/lock_file.rb 	29 	in `open'
    8 	/Users/Yves/Sites/example.org/lib/daemon_controller/lock_file.rb 	29 	in `exclusive_lock'
    9 	/Users/Yves/Sites/example.org/lib/daemon_controller.rb 	162 	in `start'
    10 	/Users/Yves/Sites/example.org/config/initializers/daemon_controller.rb 	15

    I’m sure it’s not meant as a replacement for Monit – but it’s a great way to get the stuff booted up automatically.

  • http://www.phusion.nl/ hongli

    Yves: could you edit line 519 of daemon_controller.rb and put some debugging code there to check whether the temp_file variable is nil?

  • http://robsanheim.com Rob J Sanheim

    Why not use God for this?

  • http://www.phusion.nl/ hongli

    Rob: because it’s not the same thing. daemon_controller is not a replacement for Monit/God. It has vastly different goals than Monit/God.

    Take for example the Sphinx search functionality, where a Sphinx daemon is necessary. I don’t want to bother writing a Monit/God configuration file. I want my application to automatically start the daemon, when it needs to, so I – the human – don’t have to bother configuring anything.

    See also the “What about Monit/God” subsection.

  • http://chopine.be/lrz Laurent Sansonetti

    Looks like launchd? :)

    Laurent

  • Jos Backus

    Why not just use daemontools or runit? Pidfiles are evil imnsho.

  • http://www.phusion.nl/ hongli

    Jos: because it’s not a replacement for daemon tools (I’m not familiar with runit). You should read the “What about Monit/God?” section. If we used daemon tools then I’d have to ask Ninh to install a daemon tools ‘run’ file, which is one step too many.

  • Jos Backus

    hongli: I see. Couldn’t daemon_controller wrap daemontools/runit and write the run file (and log/run file, for that matter)? But I guess that’s not its intended purpose.

    Anyway, using pid files seems like a hack to me; the kernel will notify the parent when the child changes state (e.g. exits) – no pid files necessary.

    Laurent: it does remind me of launchd, too. I wish FreeBSD had launchd…

  • http://www.phusion.nl/ hongli

    Jos: It is within the realm of possibility, but it would require root privileges, which would add another step (namely asking for the password). Plus it wouldn’t work on all systems: not everybody has daemon tools or runit.

    And yes I agree, PID files are hacks. But they’re the lowest common denominator and are the easiest to work with, without adapting existing software or setting up a specialized environment.

  • Jos Backus

    hongli: True. What is really needed is standardized process management in UNIX. Probably never going to happen. Thankfully (because of efforts by yours truly ;-)) Apache runs fine under daemontools/runit, and so do many other pieces of server software. For software that does not, the required changes are usually very small.

  • http://twosecondmemory.org/ Richard Heycock

    I’d just like to say thanks for this, it’s something that’s become higher and higher on my “things to research” list.

    To the other commenters who mentioned runit and launchd: launchd is not GPL compatible and consequently is unlikely to make it into many Linux distros (I’m using Debian, so definitely not), and runit seems to require that you replace your existing startup scripts with runit. Now, I’m quite happy with my startup scripts. They do their job very well. No, they aren’t the most elegant solution, and yes, they do have problems, but changing the entire way your machine boots is not a trivial matter, especially when most of the distros still ship with either sysv or BSD style init scripts. So yes, you could change over to runit, but quite frankly I’ve got better things to do with my time.

    And, as others have mentioned, it isn’t a replacement for monit/god/Nagios/etc. They are monitoring tools, not daemon management tools. Yes, they will manage your daemons’ lifecycle, but their primary purpose is to monitor.

  • http://diminishing.org Michael Guterl

    This is awesome! Great job as always, I cannot wait to try it out!

  • Tim Haines

    Hi Hongli,

    I just wanted to say you’ve done a great job writing this up. Excellent explanation of the whys.

    I may very well use this in the near future.

    Cheers,

    Tim.

  • nickgrim

    > “We thought, why can’t such daemons be automatically started?”

    I believe we call that “security”.

  • http://www.phusion.nl/ hongli

    nickgrim: What sort of security? Security in which use cases? Sure, for daemons such as initd, nfs, sshd etc I would agree. But look at the daemons that are usually used with Rails applications. The things that I mentioned, like the BackgrounDRb server, the Sphinx server, the Ferret DRb server, etc. In the vast majority of the use cases, people run those daemons as the same user as their Rails application. Wouldn’t it make sense to auto-start such daemons in the default case, while giving the possibility to run them on a different machine or a different user account in the unusual case? Software should be easy to use by default.

  • Nathan

    Love this idea. I think for the typical rails daemons such as backgroundrb, sphinx, juggernaut, ferret, etc. the default use case is nearly always same machine, same user, at least in the applications I have seen and been involved with. In this case, having the daemons be controlled independently would be perfect. One of my applications is using backgroundrb, monit and a custom daemon all at once. Managing them all is a pain. I hope we can standardize with the use of daemon_controller.

  • nickgrim

    hongli: But it’s much more important that software should be secure than easy-to-use (and if you think that having to start daemons make software not-easy-to-use, then perhaps you need a better sysadmin, but that’s a different comment)

    My point: I’ll confess that I haven’t investigated the functionality herein, but if you think it’s a good thing that users can “git pull” arbitrary code and have it start daemons *as that user* – with all of their access and privileges – then I am very scared by your security-practices.

  • http://www.phusion.nl/ hongli

    nickgrim: Yes, software should be secure. But as I’ve asked before: what kind of security? Security in which use cases? Security in which contexts? How, exactly, does auto-starting the aforementioned daemons make things less secure, given the fact that in 90% of the use cases people start those daemons manually as the same user?

    It seems that what you really want is that each daemon is sandboxed under its own user account. Auto-starting daemons does not prevent this. Notice that daemon_controller accepts arbitrary commands for starting a daemon. That could even be "ssh some_other_user@localhost some_daemon_command". If you configure and install the correct SSH keys then you can auto-start a daemon under an entirely different user account, without any user interaction, thus making it hassle-free.

    Now, whether all daemons should be sandboxed is an entirely different discussion. But in the vast, vast, vast majority of cases, people already run daemons as the same user. Making lives harder for 90% of the people isn’t going to magically make things more secure.

    > (and if you think that having to start daemons make software not-easy-to-use, then perhaps you need a better sysadmin, but that’s a different comment)

    First, it’s not just for production servers, it’s also for development machines. Look at the conversation in the “What about Monit/God?” section. Ninh and I are both developers. Our laptops aren’t servers, so why should we have to bother with starting daemons? What kind of security have we forsaken by letting them start automatically instead of manually?

    Furthermore, I would argue that auto-starting daemons makes software easier to use even when I have a good sysadmin. If he doesn’t have to start daemons, then that’s one less thing that he has to do, thus giving him more time to do things that really matter. Nobody should have to micro-manage computers. Computers are here to serve humans, not vice versa.

    > but if you think it’s a good thing that users can “git pull” arbitrary code and have it start daemons *as that user* – with all of their access and privileges – then I am very scared by your security-practices.

    They are already running arbitrary code by running ./script/server, are they not? They are also running arbitrary code when they type “rake ultrasphinx:daemon:start”, are they not? Then what difference is auto-starting these daemons going to make?

  • Yves

    >> Could you edit line 519 of daemon_controller.rb and put some debugging code there to check whether the temp_file variable is nil?

    tempfile_path => “/tmp/daemon-output3269-0″
    tempfile => “#”

  • Yves

    Oops… tempfile was stripped off:

    File:/tmp/daemon-output3273-0 (closed)

    I’m running Mac OS X 10.5

  • http://www.phusion.nl/ hongli

    Yves: I’m afraid I can’t figure out what’s going on with such limited information. I’m unable to reproduce the problem here either. According to your description, *something* at line 519 is nil. I’d really appreciate it if you can figure out what exactly is nil around there that causes the problem.

  • Anonymous

    Text width for this blog is too wide! (and fixed! wtf!?)

  • Yves

    Hi Hongli,

    I’d like to offer some more information without spamming your weblog – I’ll mail you

  • http://rubypond/ruby-on-rails Glenn

    Great work, and excellent tutorial on how to set it up

  • http://dewey.ws John

    You can pretty much do the same thing with inittab. Solaris also has this pretty much built in aka SMF.

  • http://www.phusion.nl/ hongli

    John: you should read the section called “What about Monit/God/rc.d/inittab/launchd/runit/daemon tools?”

  • http://saimonmoore.net Saimon Moore

    Hi Hongli,

    I think this will become an invaluable tool for my case. My partner is a designer so he really doesn’t like to have to remember to type in any additional commands to script/server.

    My question is: if daemons are started via the Rails initialization process (fine for development), how do you handle the case of a pack of mongrels/passenger processes each trying to start up and control the daemons? (Most of the time this shouldn’t be a problem, but occasionally it may be for daemons that are slow to start up.)

    Thinking about multiserver setups, I suppose I could use a start command like ‘ssh server2 searchd -c config/sphinx.conf’. Have you handled this use case?

    Regards,

    Saimon

  • http://saimonmoore.net Saimon Moore

    Scrap my main question :)

  • http://www.phusion.nl/ hongli

    Saimon: I suppose you’ve already found out that daemon_controller will ensure that there are no concurrent starting attempts. :)

    As for multiserver setups, that’s a lot trickier, so daemon_controller doesn’t support it. In multi-server setups you should just connect to the daemon without using daemon_controller, and you should use Monit/God/daemontools/runit to monitor the daemon.
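The guarantee mentioned above, that concurrent start attempts do not collide, is commonly achieved with an exclusive file lock. What follows is a minimal sketch of that general technique using Ruby's `File#flock`. It is not daemon_controller's actual code: `with_exclusive_lock`, `start_once`, and the lock path are hypothetical names chosen for illustration.

```ruby
require "tmpdir"

# Hypothetical helper: run the block while holding an exclusive lock
# on lock_path. Other processes calling this block until we are done.
def with_exclusive_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0644) do |f|
    f.flock(File::LOCK_EX) # waits until no other process holds the lock
    begin
      yield
    ensure
      f.flock(File::LOCK_UN)
    end
  end
end

# Hypothetical helper: start the daemon only if it is not running yet.
# `running` and `start` are callables supplied by the caller.
def start_once(lock_path, running, start)
  with_exclusive_lock(lock_path) do
    if running.call
      :already_running
    else
      start.call
      :started
    end
  end
end
```

Because the "is it running?" check and the start both happen while the lock is held, a second process that arrives during startup waits, then sees the daemon running and does nothing.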

  • http://blog.ritirisi.com J. Ryan Sobol

    I think we’re seeing the start of a revolution in ruby/rails process management. Phusion – thank you for leading the charge!

  • http://saimonmoore.net Saimon Moore

    Yep :)

    I’m currently already using god to start/stop/monitor all my daemons.
    I’d like to move to daemon_controller for some of these but maintain god’s monitoring of them (in production).

    I’m going to do some experimenting and report back…

    Saimon

  • http://saimonmoore.net Saimon Moore

    Question: If I’m using config.after_initialize to trigger the daemon_controller to start my daemon, how do I handle stopping the daemon when the rails processes are all stopped?

  • http://www.phusion.nl/ hongli

    Saimon Moore: DaemonController has no builtin way to do that. Part of the reason for that is that every Rails web container has different shutdown procedures. You’ll have to come up with your own solution.

  • http://jnewland.com/ Jesse Newland

    I like where this is going. ./script/server does not a web application make – in most large ‘Rails Applications’, Rails is only one component in a set of tools that combine to serve as a full web app.

    But, a couple things bug me about this – about both this post and DaemonController itself:

    This is not something that can be achieved with Monit/God.

    That’s just not true. I just used the following command to start several mongrels, nginx, haproxy, memcached, starling, several worker processes, and sphinx’s searchd on my development box: sudo god -c config/god/development.god. One command. My designer uses this too. OMGZ! That’s unpossible!

    And, since DaemonController doesn’t have any facility for shutting down processes, it seems that something like God is required to really stop/start things. Am I missing something there?

    Also, does this support running different sets of daemons on separate servers? That’s something that’s often needed when dealing with large web apps.

  • http://www.phusion.nl/ hongli

    Jesse: hm I didn’t know God can do that. But does it handle concurrent attempts? That is, if 2 Rails applications run “god -c config-file.god”, will God run the daemon twice or will it automatically detect that the two are the same?

    As for shutting down processes, I think you misunderstood it. DaemonController provides a way to shut down a daemon. It just doesn’t provide a way to *automatically* shut down a daemon, e.g. a way to shut down the daemon if all Rails applications have exited. As far as I know God can’t do that either, so whether you’re using DaemonController or God, if you want to shutdown a daemon then you still have to issue a shutdown command manually.

    DaemonController is not designed to support daemons on remote servers. If you have multiple servers then you’ll need to manually set up a lot of things anyway, so that area is best left to God/Monit/etc. DaemonController is designed for the common case where there’s only one server, and where convenience is both important and possible.

  • http://saimonmoore.net Saimon Moore

    Hongli,

    Well seems my experimenting paid off: http://gist.github.com/7954

    That works great for development mode. Will automatically start/stop my daemons as I start/stop mongrel.

    The only downside I see with this process is if you depend on restarting mongrel often. This will slow you down. (As I don’t, it works great for me.)

    As for production, I’m still unsure of what to do. I see two possibilities:

    1. Let daemon_controller start my daemons and have god monitor and restart them if required
    2. Just use god for starting/stopping/monitoring.

    Since I already have 2 set up, and 1 still means I’d need a god config file for each of these daemons, I think I’ll just stick with 2. But certainly my partner will be very happy when he next starts up mongrel. (Thing is, this’ll probably set a precedent and he’ll want me to automatically set up anything and everything for him in the future :P )

    I’ve only got one more server I’d like to use daemon_controller for (ejabberd), but this is going to be tricky as it doesn’t use pid files. I’d normally use a system init script, but this ejabberd instance’s authentication is dependent on my app, so ideally I’d like to stick it in the same boat as the rest of the daemons.

    Are you on irc somewhere I can contact you?

    Regards,

    Saimon

  • http://www.phusion.nl/ hongli

    Yeah that seems right, though it won’t work on Phusion Passenger because in Phusion Passenger, at_exit hooks are never called. If you’re on an environment where the pool of Rails processes is dynamic (e.g. Glassfish, or in the hypothetical case where Phusion Passenger does support at_exit) then the daemon may be shut down prematurely. It’ll be started by the next connect attempt, but it may cause a slight delay for your users. Because of this I’m tempted to keep my daemons running, they usually don’t use a lot of memory anyway.

    Yeah, I’m on IRC from time to time. My nick is FooBarWidget, irc.freenode.net. Usually in #passenger, #rails-contrib or #rubyonrails.

  • http://saimonmoore.net Saimon Moore

    In my case, practically all of my daemons are running a single instance of my rails app so whenever my app code changes they need restarting if I’m testing them. So for dev mode I can work with this solution using mongrel.

    For production, then yes, I’ll be using god for handling the daemons.

    Does passenger support trapping of any signals? (i.e. could I trap a USR1/2 signal to restart the daemons?)

  • http://www.phusion.nl/ hongli

    Yes, you can trap any signal. SIGABRT is reserved for aborting with a backtrace, and SIGQUIT in the development version is reserved for showing the backtrace without aborting. Though things will continue to work fine if you override those signals.
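The signal-based restart discussed above could be sketched as follows. This is only an illustration under stated assumptions: the global flag, `handle_pending_restart`, and the `restarter` callable are hypothetical, and the point at which the flag is checked (for example between requests) depends on your setup. The handler itself only sets a flag, since doing real work inside a signal handler is fragile.

```ruby
# Flag set by the signal handler; the real work happens elsewhere.
$restart_requested = false

Signal.trap("USR1") do
  # Keep the handler minimal: only record that a restart was requested.
  $restart_requested = true
end

# Hypothetical helper: call this periodically from normal application code.
# `restarter` is any callable that stops and then starts your daemons.
# Returns true if a restart was performed, false otherwise.
def handle_pending_restart(restarter)
  return false unless $restart_requested
  $restart_requested = false
  restarter.call
  true
end
```

Sending `kill -USR1 <pid>` to the process would then cause the next `handle_pending_restart` call to invoke the restarter once.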

  • http://www.opensourceconnections.com Eric Pugh

    Are you planning on pushing a gem out for this, or is this considered too raw yet for that? Seems like it could really simplify deployment issues that we face when you step beyond just a basic Mongrel + Database!

  • http://www.phusion.nl/ hongli

    We consider daemon_controller production-ready. We just haven’t had the chance to write a proper website for it, or to register a RubyForge project.

  • Gary Iams

    Hi Hongli,

    I’m experiencing some heartache with daemon_controller and BackgrounDrb, and was wondering if you (and the rest of the readers) could lend me your eyes. The situation is this: I’m attempting to set BackgrounDrb up as an initializer for my project, and have it configured as follows:

    config/backgroundrb.yml:

    :backgroundrb:
      :ip: 0.0.0.0
      :port: 11111
      :debug_log: true
      :log: foreground # log to console

    initializers/backgroundrb.rb:
    BACKGROUNDRB_PORT = 11111

    backgroundrb = DaemonController.new(
      :identifier => 'BackgrounDrb job server/scheduler',
      :start_command => 'script/backgroundrb start',
      :stop_command => 'script/backgroundrb stop',
      :ping_command => lambda { TCPSocket.new('localhost', BACKGROUNDRB_PORT) },
      :pid_file => "tmp/pids/backgroundrb_#{BACKGROUNDRB_PORT}.pid",
      :log_file => "log/backgroundrb_#{BACKGROUNDRB_PORT}.log",
      :log_file_activity_timeout => 20
    )

    backgroundrb.start unless backgroundrb.running?

    When I attempt to fire up the web server, it bombs with the following backtrace:
    ** Starting Rails with development environment…
    (wait 15 seconds)
    Exiting
    [...]/plugins/daemon_controller/lib/daemon_controller.rb:321:in `start_without_locking': Daemon 'BackgrounDrb job server/scheduler' failed to start in time. (DaemonController::StartTimeout)
    from [...]/plugins/daemon_controller/lib/daemon_controller.rb:163:in `start'
    from [...]/plugins/daemon_controller/lib/daemon_controller/lock_file.rb:34:in `exclusive_lock'
    from [...]/plugins/daemon_controller/lib/daemon_controller/lock_file.rb:29:in `open'
    from [...]/plugins/daemon_controller/lib/daemon_controller/lock_file.rb:29:in `exclusive_lock'
    from [...]/plugins/daemon_controller/lib/daemon_controller.rb:162:in `start'
    from [...]/config/initializers/backgroundrb.rb:14

    There are two oddities that I noticed while trying to get to the root cause of this problem:
    1) The ruby process for the BackgrounDrb daemon starts the same way I would normally run it by hand, save for the timeout. With the exception originating in file locking territory, this makes me think BackgrounDrb is causing the issue, yet I know that version 1.0.4 requires no args other than ‘start’ to start it.
    2) If I start BackgrounDrb manually from a console, and then load its DaemonController declaration into an irb session, I can stop/start it as one would expect.

    Thanks for a great daemon management solution,

    -gary

  • http://aviewfromafar.net/ Ashley Moran

    Awesome! But… it doesn’t work if the controlled daemon creates its own pid dir. Ramaze does this, and the following controller will not start:

    CONTROLLERS[:mock_news_sniffer] = DaemonController.new(
      :identifier => "RamazeApp",
      :start_command => "ramaze --daemonize start --port 2999 #{app_path}",
      :stop_command => "ramaze --daemonize stop --port 2999 #{app_path}",
      :ping_command => lambda { TCPSocket.new("localhost", 2999) },
      :pid_file => File.join(Dir.tmpdir, "ramaze.pids/ramaze_app.pid"),
      :log_file => File.join(Dir.tmpdir, "ramaze.pids/ramaze_app.log"),
      :timeout => 25
    )

    because it can’t access the lock dir.

    But works if you put this in front of it:
    FileUtils.mkdir_p(File.join(Dir.tmpdir, "ramaze.pids"))

    Otherwise working great for me!

  • Ryan Schlesinger

    I’m trying to automate starting thinking_sphinx. TS provides rake tasks for starting/stopping/rebuilding and I would like to use these within daemon_controller if possible. Obviously rake starts up the Rails environment before running the TS tasks, which doesn’t work at all when I’m trying to run them from within daemon_controller. Is there an easy way to make this work? Should I just write some scripts that know how to do the TS tasks I’m interested in from daemon_controller?

  • http://www.facebook.com/peter.swank.23 Peter Swank

    I am trying to run Faye automatically using the daemon_controller gem.

    My Class

    require "daemon_controller"

    class FayeDaemon
      def initialize
        @controller = DaemonController.new(
          :identifier => 'Faye server',
          :start_command => "rackup faye.ru -s thin -E production",
          :ping_command => [:tcp, 'localhost', 9292],
          :log_file => 'log/faye.log',
          :pid_file => 'tmp/pids/faye.pid',
          :start_timeout => 5
        )
      end

      def start
        @controller.start
      end
    end

    Function I use as before_filter in ApplicationController

    def start_faye
      fayes = FayeDaemon.new
      fayes.start
    end

    As a result, Faye doesn’t start, and I get this error

    DaemonController::StartTimeout (Daemon 'Faye server' didn't daemonize in time.)

    when fayes.start is called.

    What did I do wrong?