Phusion white papers Phusion overview

The new Rack socket hijacking API

By Hongli Lai on January 23rd, 2013

Yesterday saw the release of Rack 1.5.0, which adds a new feature to the Rack specification dubbed socket hijacking. This feature allows applications to take over the client socket and perform arbitrary operations on it, e.g. implementing WebSockets, streaming data to the client, etc.

Did Rack not support streaming? Actually yes it did, you can do it by returning a body object that outputs body chunks in the #each method, as explained in our past article Why Rails 4 Live Streaming is a Big Deal. But this API is a bit clunky. The socket hijacking API provides access to a Ruby IO object-like API.

Support for socket hijacking has been added to Phusion Passenger 4 yesterday. The upcoming Phusion Passenger 4 has been covered here, here and here. Phusion Passenger Enterprise customers can already test and enjoy a preview of this feature by downloading the “3.9.2 beta preview (4.0.0 beta 2)” file from the Customer Area.

The socket hijacking API was surprisingly easy to implement, but unfortunately poorly documented at this time. The application-level API is not immediately obvious, and the Rack specification documentation has not yet been updated to cover the hijacking API. In this article we’ll introduce the API and provide an example program.

What the socket hijacking API is not

Some of you may have heard of efforts to develop a “Rack 2.0″ specification which properly covers things such as streaming and evented servers. According to the hijacking API developer, this API is not an attempt towards Rack 2.0. It is a “good enough” solution that works within the confines of the Rack 1.x specification. Things may change in Rack 2.0, though at this time it’s unclear what the progress towards Rack 2.0 is.

It is also unclear whether the API is supposed to be final or not. While implementing this API and writing this article we’ve discovered some room for improvement. The suggestions (which you can find later in this article) have been submitted to the developers.

Overview of the API

The hijacking API provides two modes:

  1. A full hijacking API, which gives the application complete control over what goes over the socket. In this mode, the application server doesn’t send anything over the socket, and lets the application take care of it. This mode is useful if you want to implement arbitrary (even non-HTTP) protocols over the socket. This is subject to limitations: if your application is behind a web server or an HTTP load balancer then those components dictate which protocols you can implement.
  2. A partial hijacking API, which gives the application control over the socket after the application server has already sent out headers. This mode is mostly useful for streaming.

The hijacking API is accessible through the Rack env hash. You can check whether the application server supports the hijacking API by checking env['rack.hijack?'], which returns a boolean value.

Full hijacking

You can perform a full hijack by calling env['rack.hijack'].call. You can access the hijacked socket object through env['rack.hijack_io']. Phusion Passenger’s implementation of env['rack.hijack'] returns the socket object, but it is unclear whether this is supposed to be standard behavior.

You are responsible for:

  • Outputting any HTTP headers, if applicable.
  • Closing the IO object when you no longer need it.

You should output the “Connection: close” header unless you plan on implementing HTTP keep-alive yourself.

Here’s am example of the full hijacking API in action:

# encoding: utf-8
require 'thread'

# Streams the response "Line 1" .. "Line 10", with
# 1 second sleep time between each line.
# 
# Non-Phusion Passenger users may have to turn off their
# web servers' buffering options for streaming to work.
# Phusion Passenger 4 users don't have to do anything, it
# works out-of-the-box thanks to our real-time response
# buffering feature.
app = lambda do |env|
  # Fully hijack the client socket.
  env['rack.hijack'].call
  io = env['rack.hijack_io']
  begin
    io.write("Status: 200\r\n")
    io.write("Connection: close\r\n")
    io.write("Content-Type: text/plain\r\n")
    io.write("\r\n")
    10.times do |i|
      io.write("Line #{i + 1}!\n")
      io.flush
      sleep 1
    end
  ensure
    io.close
  end
end

run app

Partial hijacking

You can perform a partial hijack by assigning a lambda to the rack.hijack response header. This lambda will be called after the application server has sent out headers. The application server will ignore the body part of the Rack response, and will call the ‘rack.hijack’ lambda, passing it the client socket. You are responsible for closing the socket when it’s no longer needed.

It is unclear what the value of the Rack response body should be. Phusion Passenger’s implementation doesn’t care: you can return a two-array response, or a three-array response where where the body can be anything. If the ‘rack.hijack’ response header is set, the body will be completely ignored.

Example:

# encoding: utf-8
require 'thread'

# Streams the response "Line 1" .. "Line 10", with
# 1 second sleep time between each line.
# 
# Non-Phusion Passenger users may have to turn off their
# web servers' buffering options for streaming to work.
# Phusion Passenger 4 users don't have to do anything, it
# works out-of-the-box thanks to our real-time response
# buffering feature.
app = lambda do |env|
  response_headers = {}
  response_headers["Content-Type"] = "text/plain"
  response_headers["rack.hijack"] = lambda do |io|
    # This lambda will be called after the app server has outputted
    # headers. Here we can output body data at will.
    begin
      10.times do |i|
        io.write("Line #{i + 1}!\n")
        io.flush
        sleep 1
      end
    ensure
      io.close
    end
  end
  [200, response_headers, nil]
end

run app

Issues with the hijacking API

Here’s how we think the hijacking API can be improved.

  • env['rack.hijack?'] appears to be unnecessary. You can already check for hijacking support by checking env['rack.hijack'].
  • The partial hijacking API should not involve assigning a lambda to the response headers. As far as we can see, you can just return the lambda as the body. That would be a much more elegant solution.
  • The return value for env['rack.hijack'] should be well-defined.

Conclusion

The Rack hijacking API, while having some quirks in our opinion, gets the job done. We hope that the usage of the hijacking API has become more clear after reading this article. If you have any comments, questions, suggestions or corrections, please let us know.

We at Phusion are working feverishly at the upcoming Phusion Passenger 4 (covered here, here and here). Implementing the hijacking API so quickly is our way of showing you how dedicated we are. Together with Phusion Passenger Enterprise, we aim to deliver the most stable, performant and feature rich polyglot application server out there. If you’re interested in future updates, please subscribe to our newsletter. Until next time!



Phusion Passenger 4.0 supports JRuby, Rubinius

By Hongli Lai on October 30th, 2012

JRuby

Rubinius

Phusion Passenger is an Apache and Nginx module for deploying Ruby and Python web applications. It has a strong focus on ease of use, stability and performance. Phusion Passenger is built on top of tried-and-true, battle-hardened Unix technologies, yet at the same time introduces innovations not found in most traditional Unix servers. Since mid-2012, it aims to be the ultimate polyglot application server.

In the announcement for Phusion Passenger 4.0.0 beta 1, we introduced a myriad of changes such as support for multiple Ruby versions, Python WSGI support, multithreading (Enterprise only), improved zero-copy architecture, better error diagnostics and more. And as we promised, the story would not end there. A commit has just landed in our Github repository for JRuby (1.7.0 required) and Rubinius support!

JRuby: the past and the current state of affairs

JRuby is an excellent Ruby implementation for the JVM, and in the past few years they have been doing a great job with regard to compatibility and performance. But for a long time, application server support for JRuby had been limited:

  • While Mongrel and Thin had limited JRuby support, these setups have not been very popular. Since so few people use these setups, their caveats are not very well known.
  • Unicorn does not support JRuby at all because it was designed to take advantage of Unix features, which JRuby does not (and cannot) always support well.
  • Phusion Passenger was in the same position: we used too many Unix features and were not able to support JRuby well.
  • Goliath does not seem to have official support for JRuby thanks to the unknown status of EventMachine’s Java support.

So the only options left were J2EE app servers such as JBoss, Tomcat, GlassFish and TorqueBox; as well as the recently developed Puma, which is almost pure Ruby.

Thanks to the new ApplicationPool and Spawner architecture in Phusion Passenger 4, we’re now able to support JRuby with ease. Because a lot of code has been moved into C++, we no longer need the Ruby implementation to support Unix features. We only needed an hour to add support for JRuby.

Phusion Passenger vs J2EE

With Phusion Passenger’s support for JRuby, you don’t need to learn about J2EE deployment. Using JRuby on Phusion Passenger is very straightforward: set PassengerRuby to your JRuby command, point the virtual host’s document root to your application’s ‘public’ directory, and you’re done.

With Phusion Passenger Enterprise, JRuby users get to enjoy all the enterprise features such as multithreading, rolling restarts, deployment error resistance, time and memory usage limiting, and more.

Rubinius is impressive as well

We remember that back in the days, Rubinius was quite slow during startup and did not support MRI native extensions. Fast forward to 2012, and what we find is a very impressive Ruby implementation. They have 1.9 support and MRI extension support. The Ruby interpreter starts quickly. They support Unix features. Adding Rubinius support was pretty straightforward. The Rubinius team has done an excellent job!

Why you should use JRuby or Rubinius

JRuby and Rubinius support real multi-core concurrency. JRuby and Rubinius threads map to real OS threads, and neither Ruby implementations have a global interpreter lock. In contrast, MRI Ruby 1.8 uses userspace threading and so cannot take advantage of multi-core using a single process. MRI Ruby 1.9 has real OS threads, but also has a global interpreter lock and so still cannot take advantage of multi-core using a single process.

Granted, the multi-core issue isn’t that big. Phusion Passenger spawns multiple processes in order to take advantage of multi-core. But if you’re in a position in which you can only use 1 process, for whatever reason, then JRuby and Rubinius are what you need. With Phusion Passenger Enterprise’s multithreading support, you can have hybrid multi-processed and multi-threaded applications – the best of both worlds.

JRuby and Rubinius also often have superior performance. Both implementations support JIT compilation, which MRI Ruby does not.

That said, MRI Ruby still has the best compatibility in the Ruby ecosystem, so JRuby and Rubinius are not silver bullets. You should use the best tool for the best job. With Phusion Passenger 4′s support for multiple Ruby versions, this should be a breeze.

Where to get Phusion Passenger with JRuby/Rubinius support

JRuby/Rubinius support will become part of the upcoming 4.0.0 beta 2. Please stay tuned!

These Phusion Passenger 4 updates are just the beginning. We have more exciting changes planned for the near future! Curious? Enter your email address and name below and we’ll keep you up to date.


SHA-3 extensions for Ruby and Node.js

By Hongli Lai on October 6th, 2012

A few days ago, NIST announced the winner of the SHA-3 competition: Keccak (prounced [kɛtʃak], ketchak). The researchers who authored Keccak released a reference implementation in C.

We’ve created Ruby and Node.js extensions for Keccak. Our extensions utilize the code from the official reference implementation and come with a extensive suite of unit tests. But note however that I do not claim to be a security expert. Feel free to review the code for any flaws.

Install with:

gem install digest-sha3
npm install sha3

We’ve strived to emulate the languages’ standard hash libraries’ interfaces, so using these extensions is straightforward:

require 'digest/sha3'
Digest::SHA3.hexdigest("foo")

and

var SHA3 = require('sha3');
var hash = SHA3.SHA3Hash();
hash.update('foo');
hash.digest();

Both libraries are MIT licensed. Enjoy!

Why does the world need SHA-3?

If you’re not a security researcher then you’ve undoubtedly asked the same questions as I did. What’s wrong with SHA-1, SHA-256 and SHA-512? Why does the world need SHA-3? Why was Keccak the winner?

According to to well-known security researcher Bruce Schneier, there’s nothing wrong with the SHA-2 family of hashing functions. However he likes SHA-3 because it’s completely different. SHA-1 and SHA-2 are both based on the Merkle–Damgård construction. It may be feasible to find a flaw in SHA-1 that also affects SHA-2. In contrast, Keccak is based on the sponge construction so any attacks on SHA-1 and SHA-2 will probably not affect Keccak. Indeed, it appears that NIST chose Keccak because it was looking for some kind of insurance in case SHA-2 would be broken. Many people commented that they expected Skein to win.

Do not hash your passwords

In any case, the following cannot be repeated enough. Do not use SHA-3 (or SHA-256, SHA-512, RIPEMD-160 or whatever fast hash) to hash passwords! Do not even use SHA-3 + salt to hash passwords. Instead, use a slow hash like bcrypt. As Coda Hale explained in his article, you’ll want the hash to be slow so you can defend against attackers effectively.

The right way to deal with frozen processes on Unix

By Hongli Lai on September 21st, 2012

Those who administer production Unix systems have undoubtedly encountered the problem of frozen processes before. They just sit around, consuming CPU and/or memory indefinitely until you forcefully shut them down.

Phusion Passenger 3 – our high-performance and advanced web application server – was not completely immune to this problem either. That is until today, because we have just implemented a fix which will be part of Phusion Passenger 4.0 (4.0 hasn’t been released yet, but a first beta will appear soon). It’s going into the open source version, not just Phusion Passenger Enterprise, because we believe this is an important change that should benefit the entire community.

Behind our solution lies an array of knowledge about operating systems, process management and Unix. Today, I’d like to take this opportunity to share some of our knowledge with our readers. In this article we’re going to dive into the operating system-level details of frozen processes. Some of the questions answered by this article include:

  • What causes a process to become frozen and how can you debug them?
  • What facilities does Unix provide to control and to shut down processes?
  • How can a process manager (e.g. Phusion Passenger) automatically cleanup frozen processes?
  • Why was Phusion Passenger not immune to the problem of frozen processes?
  • How is Phusion Passenger’s frozen process killing fix implemented, and under what constraints does it work?

This article attempts to be generic, but will also provide Ruby-specific tips.

Not all frozen processes are made equal

Let me first clear some confusion. Frozen processes are sometimes also called zombie processes, but this is not formally correct. Formally, a zombie process as defined by Unix operating systems is a process that has already exited, but its parent process has not waited for its exit yet. Zombie processes show up in ps and on Linux they have “<defunct>” appended to their names. Zombie processes do not consume memory or CPU. Killing them – even with SIGKILL – has no effect. The only way to get rid of them is to either make the parent process call waitpid() on them, or by terminating the parent process. The latter causes the zombie process to be “adopted” by the init process (PID 1), which immediately calls waitpid() on any adopted processes. In any case, zombie processes are harmless.

In this article we’re only covering frozen processes, not zombie processes. A process is often considered frozen if it has stopped responding normally. They appear stuck inside something and are not throwing errors. Some of the general causes are:

  1. The process is stuck in an infinite loop. In practice we rarely see this kind of freeze.
  2. The process is very slow, even during shutdown, causing it to appear frozen. Some of the causes include:
    2.1. It is using too much memory, causing it to hit the swap. In this case you should notice the entire system becoming slow.
    2.2. It is performing an unoptimized operation that takes a long time. For example you may have code that iterates through all database table rows to perform a calculation. In development you only had a handful of rows so you never noticed this, but you forgot that in production you have millions of rows.
  3. The process is stuck in an I/O operation. It’s waiting for an external process or a server, be it the database, an HTTP API server, or whatever.

Debugging frozen processes

Obviously fixing a frozen process involves more than figuring out how to automatically kill it. Killing it just fixes the symptoms, and should be considered a final defense in the war against frozen processes. It’s better to figure out the root cause. We have a number of tools in our arsenal to find out what a frozen process is doing.

crash-watch

crash-watch is a tool that we’ve written to easily obtain process backtraces. Crash-watch can be instructed to dump backtraces when a given process crashes, or dump their current backtraces.

Crash-watch is actually a convenience wrapper around gdb, so you must install gdb first. It dumps C-level backtraces, not language-specific backtraces. If you run crash-watch on a Ruby program it will dump the Ruby interpreter’s C backtraces and not the Ruby code’s backtraces. It also dumps the backtraces of all threads, not just the active thread.

Invoke crash-watch as follows:

crash-watch --dump <PID>

Here is a sample output. This output is the result of invoking crash-watch on a simple “hello world” Rack program that simulates being frozen.

Crash-watch is especially useful for analyzing C-level problems. As you can see in the sample output, the program is frozen in a freeze_process call.

Crash-watch can also assist in analyzing problems caused by Ruby C extensions. Ruby’s mysql gem is quite notorious in this regard because it blocks all Ruby threads while it’s doing its work. If the MySQL server doesn’t respond (e.g. because of network problems, because the server is dead, or because the query is too heavy) then the mysql gem will freeze the entire process, making it even unable to respond to signals. With crash-watch you are able to clearly see that a process is frozen in a mysql gem call.

Phusion Passenger’s SIGQUIT handler

Ruby processes managed by Phusion Passenger respond to SIGQUIT by printing their backtraces to stderr. On Phusion Passenger, stderr is always redirected to the global web server error log (e.g. /var/log/apache2/error.log or /var/log/nginx/error.log), not the vhost-specific error log. If your Ruby interpreter supports it, Phusion Passenger will even print the backtraces of all Ruby threads, not just the active one. This latter feature requires either Ruby Enterprise Edition or Ruby 1.9.

Note that for this to work, the Ruby interpreter must be responding to signals. If the Ruby interpreter is frozen inside a C extension call (such as is the case in the sample program) then nothing will happen. In that case you should use crash-watch or the rb_backtrace() trick below.

rb_backtrace()

If you want to debug a Ruby program that’s not managed by Phusion Passenger then there’s another trick to obtain the Ruby backtrace. The Ruby interpreter has a nice function called rb_backtrace() which causes it to print its current Ruby-level backtrace to stdout. You can use gdb to force a process to call that function. This works even when the Ruby interpreter is stuck inside a C call. This method has two downsides:

  1. Its reliability depends on the state of the Ruby interpreter. You are forcing a call from arbitrary places in the code, so you risk corrupting the process state. Use with caution.
  2. It only prints the backtrace of the active Ruby thread. It’s not possible to print the backtraces of any other Ruby threads.

First, start gdb:

$ gdb

Then attach gdb to the process you want:

attach <PID>

This will probably print a whole bunch of messages. Ignore them; if gdb prints a lot of library names and then asks you whether to continue, answer Yes.

Now we get to the cream. Use the following command to force a call to rb_backtrace():

p (void) rb_backtrace()

You should now see a backtrace appearing in the process’s stdout:

from config.ru:5:in `freeze_process'
from config.ru:5
from /Users/hongli/Projects/passenger/lib/phusion_passenger/rack/thread_handler_extension.rb:67:in `call'
from /Users/hongli/Projects/passenger/lib/phusion_passenger/rack/thread_handler_extension.rb:67:in `process_request'
from /Users/hongli/Projects/passenger/lib/phusion_passenger/request_handler/thread_handler.rb:126:in `accept_and_process_next_request'
from /Users/hongli/Projects/passenger/lib/phusion_passenger/request_handler/thread_handler.rb:100:in `main_loop'
from /Users/hongli/Projects/passenger/lib/phusion_passenger/utils/robust_interruption.rb:82:in `disable_interruptions'
from /Users/hongli/Projects/passenger/lib/phusion_passenger/request_handler/thread_handler.rb:98:in `main_loop'
from /Users/hongli/Projects/passenger/lib/phusion_passenger/request_handler.rb:432:in `start_threads'
from /Users/hongli/Projects/passenger/lib/phusion_passenger/request_handler.rb:426:in `initialize'
from /Users/hongli/Projects/passenger/lib/phusion_passenger/request_handler.rb:426

strace and dtruss

The strace tool (Linux) and the dtruss tool (OS X) can be used to see which system calls a process is calling. This is specially useful for detecting problems belonging to categories (1) and (2.2).

Invoke strace and dtruss as follows:

sudo strace -p <PID>
sudo dtruss -p <PID>

Phusion Passenger’s role

Phusion Passenger

Phusion Passenger was traditionally architected to trust application processes. That is, it assumed that if we tell an application process to start, it will start, and if we tell it to stop it will stop. In practice this is not always true. Applications and libraries contain bugs that can cause freezes, or maybe interaction with an external buggy component causes freezes (network problems, database server problems, etc). Our point of view was that the developer and system administrator should be responsible for these kind of problems. If the developer/administrator does not manually intervene, the system may remain unusable.

Starting from Phusion Passenger 3, we began turning away from this philosophy. Our core philosophy has always been that software should Just Work™ with the least amount of hassle, and that software should strive to be zero-maintenance. As a result, Phusion Passenger 3 introduced the Watchdog, Phusion Passenger Enterprise introduced request time limiting, and Phusion Passenger 4 will introduce application spawning time limiting. These things attack different aspects of the freezing process problem:

  1. Application spawning time limiting solves the problem of application processes not starting up quickly enough, or application processes freezing during startup. This feature will be included in the open source version of Phusion Passenger.
  2. Request time limiting solves the problem of application processes freezing during a web request. This feature is not available in the open source version of Phusion Passenger, and only in Phusion Passenger Enterprise.
  3. The Watchdog traditionally only solves the problem of Phusion Passenger helper processes (namely the HelperAgent and LoggingAgent) freezing during web server shutdown. Now, it also solves the problem of application processes freezing during web server shutdown. These features will too be included in the open source version of Phusion Passenger.

The shutdown procedure and the fix

When you shut down your web server, the Watchdog will be the one to notice this event. It will send a message to the LoggingAgent and the HelperAgent to tell them to gracefully shut down. In turn, the HelperAgent will tell all application processes to gracefully shut down. The HelperAgent does not wait until they’ve actually shut down. Instead it will just assume that they will eventually shut down. It is for this reason that even if you shut down your web server, application processes may stay behind.

The Watchdog was already designed to assume that agent processes could freeze during shutdown, so it gives agent processes a maximum amount of time during which they shut down gracefully. If they don’t, then the Watchdog will forcefully kill them with SIGKILL. It wouldn’t just kill the agent processes, but also all application processes.

The fix was therefore pretty straightforward. Always have the watchdog kill applications processes, even if the HelperAgent terminates normally and in time. The final fix was effectively 3 lines of code.

Utilizing Unix process groups

The most straightforward method to shutdown application processes would be to maintain a list of their PIDs and then killing them one-by-one. The Watchdog however uses a more powerful and little-used Unix mechanism, namely process groups. Let me first explain them.

A system can have multiple process groups. A process belongs to exactly one process group. Each process group has exactly 1 “process group leader”. The process group ID is equal to the PID of the group leader.

The process group that a process belongs to is inherited from its parent process upon forking. However a process can be a member of any process group, no matter what group the parent belongs to.


Same-colored processes denote processes belonging to the same process group. As you can see in the process tree on the right, process group membership is not constrained by parent-child relationships.

You can simulate the process tree on the right using the following Ruby code.

top = $$
puts "Top process PID: #{$$}"
Process.setpgrp
pid = fork do
  pid = $$
  pid2 = fork do
    pid2 = $$
    Process.setpgrp
    pid3 = fork do
      pid3 = $$
      sleep 0.1
      puts "#{top} belongs to process group: #{Process.getpgid(top)}"
      puts "#{pid} belongs to process group: #{Process.getpgid(pid)}"
      puts "#{pid2} belongs to process group: #{Process.getpgid(pid2)}"
      puts "#{pid3} belongs to process group: #{Process.getpgid(pid3)}"
    end

    # We change process group ID of pid3 after the fact!
    Process.setpgid(pid3, top)

    Process.waitpid(pid3)
  end
  Process.waitpid(pid2)
end
Process.waitpid(pid)

As you can see, you can change the process group membership of any process at any time, provided you have the permissions to do so.

The kill() system call provides a neat little feature: it allows you to send a signal to an entire process group! You can already guess where this is going.

Whenever the Watchdog spawns an agent process, it creates a new process group for it. The HelperAgent and the LoggingAgent are both process group leaders of their own process group. Since the HelperAgent is responsible for spawning application processes, all application processes automatically belong to the HelperAgent’s process group, unless the application processes themselves change this.

Upon shutdown, the Watchdog sends SIGKILL to HelperAgent’s and the LoggingAgent’s process groups. This ensures that everything is killed, including all application processes and even whatever subprocesses they spawn.

Conclusion

In this article we’ve considered several reasons why processes may freeze and how to debug them. We’ve seen which mechanisms Phusion Passenger provides to combat frozen processes. We’ve introduced the reader to Unix process groups and how they interact with Unix signal handling.

A lot of engineering has gone into Phusion Passenger to ensure that everything works correctly even in the face of failing software components. As you can see, Phusion Passenger uses proven and rock-solid Unix technologies and concepts to implement its advanced features.

Support us by buying Phusion Passenger Enterprise

It has been a little over a month now since we’ve released Phusion Passenger Enterprise, which comes with a wide array of additional features not found in the open source Phusion Passenger. However as we’ve stated before, we remain committed to maintaining and developing the open source version. This is made possible by the support of our customers; your support. If you like Phusion Passenger or if you want to enjoy the premium Enterprise features, please consider buying one or more Phusion Passenger Enterprise licenses. Development of the open source Phusion Passenger is directly funded by Phusion Passenger Enterprise sales.

Let me know your thoughts about this article in the comments section and thank you for your support!

How to fix the Ruby 1.9 HTTPS/Bundler segmentation fault on OS X Lion

By Hongli Lai on May 9th, 2012

If you’ve installed a gem bundle on OS X Lion the past few weeks then you may have seen the dreaded “[BUG] Segmentation fault” error, where Ruby sees to crash in the connect C function in http.rb. Upgrading to the latest Ruby 1.9.3 version (p194) doesn’t seem to help. Luckily someone has found a solution for this problem.

It turns out the segmentation fault is caused by an incompatibility between MacPort’s OpenSSL and RVM. MacPorts installs everything to /opt/local but RVM does not look for OpenSSL in /opt/local. We solved the problem by reinstalling Ruby 1.9.3 with the MacPorts OpenSSL, as follows.

First edit $HOME/.rvmrc and add:

export CFLAGS="-O2 -arch x86_64"
export LDFLAGS="-L/opt/local/lib"
export CPPFLAGS="-I/opt/local/include"

Then run:

sudo port install libyaml
rvm reinstall ruby-1.9.3 --with-openssl-dir=/opt/local --with-opt-dir=/opt/local

Bundler and public applications

By Hongli Lai on January 19th, 2012

I think Bundler is a great tool. Its strength lies not in its ability to install all the gems that you’ve specified, but in automatically figuring out a correct dependency graph so that nothing conflicts with each other, and in the fact that it gives you rock-solid guarantees that whatever gems you’re using in development is exactly what you get in production. No more weird gem version conflict errors.

This is awesome for most Ruby web apps that are meant to be used internally, e.g. things like Twitter, Basecamp, Union Station. Unfortunately, this strength also turns in a kind of weakness when it comes to public apps like Redmine and Juvia. These apps typically allow the user to choose their database driver through config/database.yml. However the driver must also be specified inside Gemfile, otherwise the app cannot load it. The result is that the user has to edit both database.yml and Gemfile, which introduces the following problems:

  • The user may not necessarily be a Ruby programmer. The Gemfile will confuse him.
  • The user is not able to use the Gemfile.lock that the developer has provided. This makes installing in deployment mode with the developer-provided Gemfile.lock impossible.

This can be worked around in a very messy form with groups. For example:

group :driver_sqlite do
  gem 'sqlite3'
end

group :driver_mysql do
  gem 'msyql'
end

group :driver_postgresql do
  gem 'pg'
end

And then, if the user chose to use MySQL:

bundle install --without='driver_postgresql driver_sqlite'

This is messy because you have to exclude all the things you don’t want. If the app supports 10 database drivers then the user has to put 9 drivers on the exclusion list.

How can we make this better? I propose supporting conditionals in the Gemfile language. For example:

condition :driver => 'sqlite' do
  gem 'sqlite3'
end

condition :driver => 'mysql' do
  gem 'mysql'
end

condition :driver => 'postgresql' do
  gem 'pg'
end

condition :driver => ['mysql', 'sqlite'] do
  gem 'foobar'
end

The following command would install the mysql and the foobar gems:

bundle install --condition driver=mysql

Bundler should enforce that the driver condition is set: if it’s not set then it should raise an error. To allow for the driver condition to not be set, the developer must explicitly define that the condition may be nil:

condition :driver => nil do
  gem 'null-database-driver'
end

Here, bundle install will install null-database-driver.

With this proposal, user installation instructions can be reduced to these steps:

  1. Edit database.yml and specify a driver.
  2. Run bundle install --condition driver=(driver name)

I’ve opened a ticket for this proposal. What do you think?

Making Ruby threadable: properly handling context switching in native extensions

By Hongli Lai on June 10th, 2010

In the previous article Does Rails Performance Need an Overhaul? we had discussed the fact that proper Ruby threading is hindered by various broken native extensions. Writing a native extension for Ruby is pretty easy, however writing it right can not only be difficult, but can also be an obscure practice that requires l33t sk1llz because of the lack of documentation in this area. We’ve written several native extensions so far and in the process of figuring out how to make threading-friendly native extensions we had to wade through tons of Ruby source code. In this article I want to teach some best practices in the writing of threading-friendly native extensions.

Threading basics

As discussed in the previous article, Ruby 1.8 implements userspace threads, meaning that no matter how many Ruby threads you have, only one can run at a time, and only on a single CPU core. The threads are scheduled by Ruby itself, not by the operating system.

Ruby 1.9 implements native operating system threads. However it has a global interpreter lock which must be locked when the thread is running Ruby code. This effectively makes Ruby 1.9 single threaded most of the time.

With both Ruby 1.8 and 1.9 threads, system calls such as I/O operations can block the thread and preventing Ruby from context switching to another. Thus, system calls require special attention. Expensive calculations that do not involve system calls can also block the thread, but something can be done those as well, as you will read later on.

Handling I/O

Suppose that you have a file descriptor on which you want to perform some potentially blocking I/O. The naive approach is to perform the I/O command anyway and risk blocking the entire Ruby process. This is exactly what makes the mysql extension thread-unfriendly: while waiting on MySQL no other threads can run, grinding your multi-threaded Rails web app to a halt.

However there are a number of functions in your arsenal that you can use to combat this problem. And as a general rule, you should set to file descriptors to non-blocking mode.

rb_thread_wait_fd(fd)

Just before performing a blocking read, you should call rb_thread_wait_fd() on the file descriptor that you’re reading from. On 1.8, this function marks the current thread as waiting for readable data on this file descriptor and then invokes the scheduler. The scheduler uses the select() system call to check which file descriptors are readable and then selects a thread which may continue. If the file descriptor that you were waiting on is not readable, then your thread will be suspended until the next time the scheduler is invoked and selects your thread. But even if the file descriptor is immediately readable, the scheduler does not guarantee that your thread will be selected immediately.

On 1.9, rb_thread_wait_fd() simply unlocks the global interpreter lock, calls the select() system call on the given file descriptor, and re-acquires the global interpreter lock when select() returns. While select() is blocking, other threads can run.

As an optimization, if only the main thread exists then this function does nothing. This applies to both 1.8 and 1.9.

rb_thread_fd_writable(fd)

This works the same as rb_thread_fd(), but waits until the given file descriptor becomes writable. The single-thread optimization applies here too. You should call rb_thread_fd_writable() just before you perform a write I/O operation.

rb_thread_select()

To wait on multiple file descriptors, use this function instead of select() or poll(). Unlike the native system calls, this function will take care of invoking the scheduler or unlocking the global interpreter lock. Unlike rb_thread_wait_fd() and rb_thread_fd_writable(), there is no do-nothing-when-there’s-only-one-thread optimization here so it will always invoke the scheduler and call select().

rb_io_wait_readable()

I/O system calls can return a variety of error codes that indicate that you should restart the system call, such as EINTR (system call interrupted by signal) and EAGAIN (the file descriptor is set to non-blocking mode and the data is not yet available). You should therefore always call I/O system calls in a loop until it returns success or a different error code. You must however not forget to call rb_thread_wait_fd() or rb_thread_select() before you restart the system call, or you will risk blocking the thread again.

Ruby provides a function rb_io_wait_readable() to aid you in writing restart code. This function should be called right after your I/O reading system call has returned. It checks whether the system call should be restarted (returning Qtrue) or whether you should report an error (returning Qfalse). Here’s a code example:

int done = 0;
int ret;

/* Have the Ruby scheduler suspend this thread until the file descriptor becomes
 * readable; or if this is the only thread in the system, rb_thread_wait_fd() does
 * nothing and we immediately continue to the 'do' loop.
 */
rb_thread_wait_fd(fd);
do {
    /* Actually you should surround your system call with some more code, but
     * we'll get to this later. This example code is only partial. */
    ret = ...your read system call here...
    if (ret == -1) {
        if (rb_io_wait_readable(fd) == Qfalse) {
            ...throw an exception here...
        } /* else restart loop */
    } else {
        done = 1;
    }
} while (!done);

rb_io_wait_readable() checks whether errno equals EINTR or ERESTART, in which case it will call rb_thread_wait_for() on the file descriptor and return Qtrue. If errno is EAGAIN or EWOULDBLOCK then it calls rb_thread_select() on the file descriptor and returns true. Otherwise it returns false.

The difference between calling rb_thread_wait_for() and rb_thread_select() here is subtle, but important. The former only blocks (calls select() on the file descriptor) when there are multiple Ruby threads in the Ruby process, while the latter always blocks no matter what. This behavior is important because EAGAIN and EWOULDBLOCK occur when a non-blocking file descriptor is not yet readable; if we don’t block here on a select() then the code will enter a 100% CPU busy loop.

rb_io_wait_writable()

Works the same way as rb_io_wait_readable(). Use this for I/O write operations instead.

Sleeping

Use rb_thread_wait_for() instead of sleep() or usleep(). On 1.8 rb_thread_wait_for() marks the current thread as sleeping for a period of time and then invokes the scheduler, which does not select this thread until the period of time has expired. On 1.9 Ruby unlocks the global interpreter lock, calls some sleeping function, and then re-locks it after that function returns.

Other non-I/O blocking system calls

Sometimes you will want to wait on a blocking system call that isn’t related to I/O, such as waitpid(). There are several ways to deal with these kind of system calls.

Blocking outside the global interpreter lock

This method only works on Ruby 1.9. Unlock the global interpreter lock, do your thing, then re-locks it. Dealing with the global interpreter lock will be discussed later.

Non-blocking polling

Some system calls have non-blocking equivalents which return a certain error instead of blocking. For example waitpid() blocks by default, but it can be set to non-blocking by passing the WNOHANG flag, which causes it to return immediately with an error instead of blocking. You must call the non-blocking version in a loop. Upon detecting a blocking error, you must call rb_thread_polling(). On 1.8 this function lets the scheduler put the current thread to sleep for 60 msec, on 1.9 for 100 msec.

For example, Ruby’s Process#waitpid function does not block other threads. On 1.9 it simply unlocks the global interpreter lock while blocking on waitpid(). On 1.8 it is implemented as follows (simplified version):

retry:

int result = waitpid(..., WNOHANG);
if (result < 0) {
    if (errno == EINTR) {
        /* Process isn't ready yet. Tell the scheduler and then restart the call. */
        rb_thread_polling();
        goto retry;
    } else {
        ...throw exception...
    }
}

The actual code is actually more optimized than this. For example if there's only a single thread in the system then it calls waitpid() without WNOHANG and just have it block.

Calling the system call in a native OS thread and use I/O to report results

This is probably the most complex way but on 1.8 sometimes you don't have any choice. On 1.9 you should always prefer unlocking the global interpreter lock over this method.

Create a pipe, then spawn a native OS thread which calls the system call. When the system call is done, have your native thread report the result back via the pipe. On the Ruby side, use rb_thread_wait_fd() and friends to block on the pipe and then receive the results. Be sure to join the thread after you've read the result because rb_thread_wait_fd() does not necessarily block until there is data, so when rb_thread_wait_fd() returns it is not guaranteed that the thread has returned yet.

Another thing to watch out for is that your thread must not refer to data that's on the Ruby thread's stack. This is because Ruby overwrites the main OS thread's C stack upon context switching to another Ruby thread. For example code like this is not OK:

static void thread_main(int *value) {
    /* 'value' here refers to the 'value' variable on foobar's stack, but
     * that data is overwritten when Ruby context switches, so we
     * really can't use 'value' here!
     */
}

/* Native extension Ruby method. */
static void foobar() {
    int value = 1234;
    
    thread_t thread = create_a_thread(thread_main, &value);
    ...do something which can cause a Ruby thread context switch...
    join_thread(thread);
}

To pass data to the thread, you should put the data on the heap instead of the stack. This is OK:

typedef struct {
    ...
} Data;

static void thread_main(Data *data) {
    /* 'data' is safe to access. */
}

/* Native extension Ruby method. */
static void foobar() {
    Data *data = malloc(sizeof(Data));
    thread_t thread = create_a_thread(thread_main, data);
    ...do something which can cause a Ruby thread context switch...
    join_thread(thread);
    free(data);
}

Heavy CPU computations

Not only blocking system calls can block other threads, CPU-heavy computation code can also do that. While executing non-Ruby-API C code, context switching to other threads is not possible. Calls to Ruby APIs may sometimes cause context switching. However there are several ways to make context switching possible while running CPU-heavy computations.

Unlocking the global interpreter lock

This only works on 1.9. Unlock the global interpreter lock and then call the computation code, and relock when done. Consider BCrypt-Ruby as an example. BCrypt is a very heavy hashing algorithm used for securely hashing passwords; depending on the configured cost it could need several minutes to calculate a hash. We've recently patched BCrypt-Ruby to unlock the global interpreter lock while running the BCrypt algorithm, so that when you run BCrypt-Ruby in multiple threads the algorithms can be spread across multiple CPU cores.

However, be aware of the fact that unlocking and relocking the global interpreter lock comes with some overhead as well. Unlocking and relocking the global interpreter lock is only worth it if you know that the computation is going to take a while (say, longer than 50 msec). If the computation time is short then you will actually make your code slower because of all the locking overhead. Therefore BCrypt-Ruby only unlocks the global interpreter lock if the BCrypt cost is set to 9 or higher.

Explicit yielding

You can call rb_thread_schedule() once in a while to force context switching to another thread. However this approach does not allow your code to make use of multiple cores even if you're on 1.9.

Running the C code in a native OS thread

This is pretty much the same approach as described by "Calling the system call in a native OS thread and use I/O to report results". In my opinion, unless your computation takes a very long time, implementing this is almost never worth the trouble. For BCrypt-Ruby we didn't bother: if you want multi-core support in BCrypt-Ruby you need to be on 1.9.

TRAP_BEG/TRAP_END and the global interpreter lock

TRAP_BEG and TRAP_END

On 1.8, you should surround system calls with calls to TRAP_BEG and TRAP_END. TRAP_BEG performs some preparation work. TRAP_END performs a variety of things:

  1. It checks whether there are any pending signals, e.g. whether the user pressed Ctrl-C. If so it will raise an appropriate SignalException.
  2. It also calls the scheduler if a certain amount of time has been spent on the current thread.

On 1.9 TRAP_BEG and TRAP_END are macros that unlock and lock the global interpreter lock. However these macros are deprecated and are likely to disappear in the future so you should not use them on 1.9. Instead, you should use rb_thread_blocking_region().

On 1.9 TRAP_BEG and TRAP_END are defined in ruby/backward/rubysig.h.

rb_thread_blocking_region()

This is a 1.9-specific function which allows you to call a function outside the global interpreter lock. Its declaration is as follows:

rb_thread_blocking_region(rb_blocking_function_t *func, void *data1,
                          rb_unblock_function_t *ubf, void *data2);

func is a pointer to a function that is to be called outside the global interpreter lock. This function must look similar to:

VALUE foobar(void *data)

The data passed via the data1 parameter is passed to the function.

ubf is either RUBY_UBF_IO (indicating that you're performing some kind of I/O operation) or RUBY_UBF_PROCESS (indicating that you're calling some kind of process management system call). However I'm not sure what this parameter exactly does. data2 is supposedly passed to ubf when it's called.

The return value of this function is the return value of func.

Global interpreter lock caveats

Do not call any Ruby API functions while the global interpreter lock is unlocked! No rb_yield(), rb_str_new(), or anything. The entirety of the Ruby API is only safe to call when the global interpreter lock is obtained.

Does Rails Performance Need an Overhaul?

By Hongli Lai on June 9th, 2010

Igvita.com has recently published the article Rails Performance Needs an Overhaul. Rails performance… no, Ruby performance… no Rails scalability… well something is being criticized here. From my experience, talking about scalability and performance can be a bit confusing because the terms can mean different things to different people and/or in different situations, yet the meanings are used interchangeably all the time. In this post I will take a closer look at Igvita’s article.

Performance vs scalability

Let us first define performance and scalability. I define performance as throughput; number of requests per second. I define scalability as the amount of users a system can concurrently handle. There is a correlation between performance and scalability. Higher performance means each request takes less time, and so is more scalable, right? Sometimes yes, but not necessarily. It is entirely possible for a system to be scalable, yet manages to have a lower throughput than a system that’s not as scalable, or for a system to be uber-fast yet not very scalable. Throughout this blog post I will show several examples that highlight the difference.

“Scalability” is an extremely loaded word and people often confuse it with “being able to handle tons and tons of traffic”. Let’s use a different term that better reflects what Igvita’s actually criticizing: concurrency. Igvita claims that concurrency in Ruby is pathetic while referring to database drivers, Ruby application servers, etc. Some practical examples that demonstrate what he means are as follows.

Limited concurrency at the app server level

Mongrel, Phusion Passenger and Unicorn all use a “traditional” multi-process model in which multiple Ruby processes are spawned, each process handling a single request per second. Thus, concurrency is (assuming that the load balancer has infinite concurrency) limited by the number of Ruby processes: having 5 processes allow you to handle 5 users concurrently.

Threaded servers, where the server spawns multiple threads, each handling 1 connection concurrently, allow more concurrency because because it’s possible to spawn a whole lot more threads than processes. In the context of Ruby, each Ruby process needs to load its own copy of the application code and other resources, so memory increases very quickly as you spawn additional processes. Phusion Passenger with Ruby Enterprise Edition solves this problem somewhat by using copy-on-write optimizations which save memory, so you can spawn a bit more processes, but not significantly (as in 10x) more. In contrast, a multi-threaded app server does not need as much memory because all threads share application code with each other so you can comfortably spawn tens or hundreds of threads. At least, this is the theory. I will later explain why this does not necessarily hold for Ruby.

When it comes to performance however, there’s no difference between processes and threads. If you compare a well-written multi-threaded app server with 5 threads to a well-written multi-process app server with 5 processes, you won’t find either being more performant than the other. Context switch overhead between processes and threads are roughly the same. Each process can use a different CPU core, as can each thread, so there’s no difference in multi-core utilization either. This reflects back on the difference between scalability/concurrency and performance.

Multi-process Rails app servers have a concurrency level that can be counted with a single hand, or if you have very beefy hardware, a concurrency level in the range of a couple of tens, thanks to the fact that Rails needs about 25 MB per process. Multi-threaded Rails app servers can in theory spawn a couple of hundred of threads. After that it’s also game over: an operating system thread needs a couple MB of stack space, so after a couple hundreds of threads you’ll run out of virtual memory address on 32-bit systems even if you don’t actually use that much memory.

There is another class of servers, the evented ones. These servers are actually single-threaded, but they use a reactor style I/O dispatch architecture for handling I/O concurrency. Examples include Node.js, Thin (built on EventMachine) and Tornado. These servers can easily have a concurrency level of a couple of thousand. But due to their single-threaded nature they cannot effectively utilize multiple CPU cores, so you need to run a couple of processes, one per CPU core, to fully utilize your CPU.

The limits of Ruby threads

Ruby 1.8 uses userspace threads, not operating system threads. This means that Ruby 1.8 can only utilize a single CPU core no matter how many Ruby threads you create. This is why one typically needs multiple Ruby processes to fully utilize one’s CPU cores. Ruby 1.9 finally uses operating system threads, but it has a global interpreter lock, which means that each time a Ruby 1.9 thread is running it will prevent other Ruby threads from running, effectively making it the same multicore-wise as 1.8. This is also explained in an earlier Igvita article, Concurrency is a Myth in Ruby.

On the bright side, not all is bad. Ruby 1.8 internally uses non-blocking I/O while Ruby 1.9 unlocks the global interpreter lock while doing I/O. So if one Ruby thread is blocked on I/O, another Ruby thread can continue execution. Likewise, Ruby is smart enough to cause things like sleep() and even waitpid() to preempt to other threads.

On the dark side however, Ruby internally uses the select() system call for multiplexing I/O. select() can only handle 1024 file descriptors on most systems so Ruby cannot handle more than this number of sockets per Ruby process, even if you are somehow able to spawn thousands of Ruby threads. EventMachine works around this problem by bypassing Ruby’s I/O code completely.

Naive native extensions and third party libraries

So just run a couple of multi-threaded Ruby processes, one process per core and multiple threads per process, and all is fine and we should be able to have a concurrency level of up to a couple hundred, right? Well not quite, there are a number of issues hindering this approach:

  • Some third party libraries and Rails plugins are not thread-safe. Some aren’t even reentrant. For example Rails < 2.2 suffered from this problem. The app itself might not be thread-safe.
  • Although Ruby is smart enough not to let I/O block all threads, the same cannot be said of all native extensions. The MySQL extension is the most infamous example: when executing queries, other threads cannot run.

Mongrel is actually multi-threaded but in practice everybody uses in multi-process mode (mongrel_cluster) exactly because of these problems. It is also the reason why Phusion Passenger has also gone the multi-process route.

And even though Thin is evented, a typical Ruby web application running on Thin cannot handle thousands of concurrent users. This is because evented servers typically require a special evented programming style, such as the one seen in Node.js and EventMachine. A Ruby web app that is written in an evented style running on Thin can definitely handle a large number of concurrent users.

When is limited application server concurrency actually a problem?

Igvita is clearly disappointed at all all the issues that hinder Ruby web apps from achieving high concurrency. For many web applications I would however argue that limited concurrency is not a problem.

  • Web applications that are slow, as in CPU-heavy, max out CPU resources pretty quickly so increasing concurrency won’t help you.
  • Web applications that are fast are typically quick enough at handling the load so that even large number of users won’t notice the limited concurrency of the server.

Having a concurrency of 5 does not mean not mean that the app server can only handle 5 requests per second; it’s not hard to serve hundreds of requests per second with only a couple of single-threaded processes.

The problem becomes most evident for web applications that have to wait a lot for I/O (besides its own HTTP request/response cycle). Examples include:

  1. Apps that have to spend a lot of time waiting on the database.
  2. Apps that perform a lot of external HTTP calls that respond slowly.
  3. Chat apps. These apps typically have thousands of users, most of them doing nothing most of the time, but they all require a connection (unless your app uses polling, but that’s a whole different discussion).

We at Phusion have developed a number of web applications for clients that fall in the second category, the most recent one being a Hyves gadget. Hyves is the most popular social network in the Netherlands and they get thousands of concurrent visitors during the day. The gadget that we’ve developed has to query external HTTP servers very often, and these servers can take 10 seconds to respond in extreme cases. The servers are running Phusion Passenger with maybe a couple tens of processes. If every request to our gadget also causes us to wait 10 seconds for the external HTTP call then we’d soon run out of concurrency.

But even suppose that our app and Phusion Passenger can have a concurrency level of a couple of thousand, all of those visitors will still have to wait 10 seconds for the external HTTP calls, which is obviously unacceptable. This is another example that illustrates the difference between scalability and performance. We had solved this problem by aggressively caching the results of the HTTP calls, minimizing the number of external HTTP calls that are necessary. The result is that even though the application’s concurrency is fairly limited, it can still comfortably serve many concurrent users with a reasonable response time.

This anecdote should explain why I believe that web apps can get very far despite having a limited concurrency level. That said, as Internet usage continues to increase and websites get more and more users, we may at some time come to a point where much a larger concurrency level is required than most of our current Ruby tools allow us to (assuming server capacity doesn’t scale quickly enough).

What was Igvita.com criticizing?

Igvita.com does not appear to be criticizing Ruby or Rails for being slow. It doesn’t even appear to be criticizing the lack of Ruby tools for achieving high concurrency. It appears to be criticizing these things:

  • Rails and most Ruby web application servers don’t allow high concurrency by default.
  • Many database drivers and libraries hinder concurrency.
  • Although alternatives exist that allow concurrency, you have to go out of your way to find them.
  • There appears to be little motivation in the Ruby community for making the entire stack of web frame work + web app server + database drivers etc scalable by default.

This is in contrast to Node.js where everything is scalable by default.

Do I understand Igvita’s frustration? Absolutely. Do I agree with it? Not entirely. The same thing that makes Node.js so scalable is also what makes it relatively hard to program for. Node.js enforces a callback style of programming and this can eventually make your code look a lot more complicated and harder to read than regular code that uses blocking calls. Furthermore, Node.js is relatively young – of course you won’t find any Node.js libraries that don’t scale! But if people ever use Node.js for things other than high-concurrency servers apps, then non-scalable libraries will at some time pop up. And then you will have to look harder to avoid these libraries. There is no silver bullet.

That said, all would be well if at least the preferred default stack can handle high concurrency by default. This means e.g. fixing the MySQL extension and have the fix published by upstream. The mysqlplus extension fixes this but for some reason their changes aren’t accepted and published by the original author, and so people end up with a multi-thread-killing database driver by default.

Is Node.js innovative? Is Ruby lacking innovation?

A minor gripe that I have with the article is that Igvita calls Node.js innovative while seemingly implying that the Ruby stack isn’t innovating. Evented servers like Node.js actually have been around for years and the evented pattern is well-known long before Ruby or Javascript have become popular. Thin is also evented and predates Node.js by several years. Thin and EventMachine also allow Node.js-style evented programming. The only innovation that Node.js brings, in my opinion, is the fact that it’s Javascript. The other “innovation” is the lack of non-scalable libraries.

Conclusion

Igvita appears to be criticizing something other than Rails performance, as his article’s title would imply.

I don’t think the concurrency levels that the Rails stack provides by default is that bad in practice. But as a fellow programmer, it does intuitively bother me that our laptops, which are a million times more powerful than supercomputers from two decades ago, cannot comfortably handle a couple of thousand concurrent users. We can definitely work towards something better, but in the mean time let’s not forget that the current stack is more than capable of Getting Work Done(tm).

Securely store passwords with bcrypt-ruby; now compatible with JRuby and Ruby 1.9

By Hongli Lai on August 13th, 2009

When writing web applications, or any application for that manner, any passwords should be stored securely. As a rule of thumb, one should never store passwords as clear text in the database for the following reasons:

  • If the database ever gets leaked out, then all accounts are compromised until every single user resets his password. Imagine that you’re an MMORPG developer; leaking out the database with clear text passwords allows the attacker to delete every player’s characters.
  • Many people use the same password for multiple sites. Imagine that the password stored in your database is also used for the user’s online banking account. Even if the database does not get leaked out, the password is still visible to the system administrator; this can be a privacy breach.

There are several “obvious” alternatives, which aren’t quite secure enough:

Storing passwords as MD5/SHA1/$FAVORITE_ALGORITHM hashes
These days MD5 can be brute-force cracked with relatively little effort. SHA1, SHA2 and other algorithms are harder to brute-force, but the attacker can still crack these hashes by using rainbow tables: precomputed tables of hashes with which the attacker can look up the input for a hash with relative ease. This rainbow table does not have to be very large: it just has to contain words from the dictionary, because many people use dictionary words as passwords.

Using plain hashes also makes it possible for an attacker to determine whether two users have the same password.

Encrypting the password
This is not a good idea because if the attacker was able to steal the database, then there’s a possibility that he’s able to steal the key file as well. Plus, the system administrator is able to read everybody’s passwords, unless he’s restricted access to either the key file or the database.

The solution is to store passwords as salted hashes. One calculates a salted hash as follows:

salted_hash = hashing_algorithm(salt + cleartext_password)

Here, salt is a random string. After calculating the salted hash, one should store the salted hash in the database, along with the (cleartext) salt. It is not necessary to keep the salt secret or to obfuscate it.

When a user logs in, one can verify his password by re-computing the salted hash and comparing it with the salted hash in the database:

salted_hash = hashing_algorithm(salt_from_database + user_provided_password)
if (salted_hash == salted_hash_from_database):
    user is logged in
else:
    password incorrect

The usage of the salt forces the attacker to either brute-force the hash or to use a ridiculously large rainbow table. In case of the latter, the sheer size of the required rainbow table can make it unpractical to generate. The larger the salt, the more difficult it becomes for the cracker to use rainbow tables.

However, even with salting, one should still not use SHA1, SHA2, Whirlpool or most other hashing algorithms because these algorithms are designed to be fast. Although brute forcing SHA2 and Whirlpool is hard, it’s still possible given sufficient resources. Instead, one should pick a hashing algorithm that’s designed to be slow so that brute forcing becomes unfeasible. Bcrypt is such a slow hashing algorithm. A speed comparison on a MacBook Pro with 2 Ghz Intel Core 2 Duo:

  • SHA-1: 118600 hashes per second.
  • Bcrypt (with cost = 10): 7.7 hashes per second.

Theoretically it would take 4*10^35 years for a single MacBook Pro core to crack an SHA-1 hash, assuming that the attacker does not harness any weaknesses in SHA-1. To crack a bcrypt hash one would need 6*10^39 years, or 10000 more times. Therefore, we recommend the use of bcrypt to store passwords securely.

There’s even a nice Ruby implementation of this algorithm: bcrypt-ruby! Up until recently, bcrypt-ruby was only available for MRI (“Matz Ruby Interpreter”, the C implementation that most people use). However, we’ve made it compatible with JRuby! The code can be found in our fork at Github. The current version also has issues with Ruby 1.9, which we’ve fixed as well. The author of bcrypt-ruby has already accepted our changes and will soon release a new version with JRuby and Ruby 1.9 support.

Further recommended reading

How to Safely Store a Password by Coda Hale.

Getting ready for Ruby 1.9.1

By Hongli Lai on February 2nd, 2009

We are excited about Ruby 1.9.1. Of course, with all the performance improvements, who wouldn’t be? Unfortunately a large number of Ruby libraries and extensions still don’t work on 1.9.1, so Ruby 1.9 cannot be considered production-ready yet. Ryan Bigg has done an excellent job on documenting most of the problems that one would encounter when trying to get a basic Rails app up-and-running on Ruby 1.9.1. Basically, the problems he countered were:

  • 2.2.2 isn’t compatible with 1.9.1. Use Rails 2.3.0 RC1 or Rails edge.
  • The mysql gem needs patching.
  • The hpricot gem needs patching.
  • The postgres gem needs patching.
  • Thin needs patching.
  • The fastthread gem needs patching.
  • Mongrel needs patching.

But what about Phusion Passenger? Good news:
Phusion Passenger is Ruby 1.9.1-compatible since this commit (today).

Here’s a screenshot of a Rails 2.3.0 app running in Phusion Passenger on Ruby 1.9.1:

passenger-ruby19

Do you see the changes? Me neither. That’s the point. :)

We’ve encountered the following issues upon trying to get a simple Rails 2.3 app up running with Phusion Passenger and Ruby 1.9.1:

Fastthread isn’t compatible with 1.9
Both Mongrel and Phusion Passenger depend on Fastthread, which is a threading library that fixes some threading implementation bugs in older versions of Ruby 1.8. Fastthread is only a required dependency when running on older versions of Ruby 1.8. Unfortunately there’s no way to tell RubyGems “we depend on fastthread, but only when running on older versions of Ruby 1.8, and not on JRuby”.

Fastthread doesn’t compile on Ruby 1.9 (or on JRuby or other Ruby implementations for that matter), so when you type “gem install passenger” or “gem install mongrel” on Ruby 1.9, the installation fails with a ton of compile errors.

We’ve patched fastthread so that it becomes a no-op on Ruby 1.9.1 and on JRuby (that is, fastthread will install correctly but it won’t do anything). These patches have been submitted to Mentalguy, the maintainer of fastthread.

The sqlite3-ruby gem doesn’t work on 1.9
Jeremy Kemper submitted a 1.9 compatibility patch in the past, which had been committed. Unfortunately even with this patch, sqlite3-ruby isn’t compatible with 1.9.1.

We’ve gone ahead and fixed 1.9.1 support. The patch can be found here: http://rubyforge.org/tracker/index.php?func=detail&aid=23792&group_id=254&atid=1045

Hongli Lai Ninh Bui