
Phusion Passenger 4.0 beta 2: Syscall failure simulation framework, focus on stability

By Hongli Lai on January 24th, 2013


Phusion Passenger is an Apache and Nginx module for deploying Ruby and Python web applications. It has a strong focus on ease of use, stability and performance. Phusion Passenger is built on top of tried-and-true, battle-hardened Unix technologies, yet at the same time introduces innovations not found in most traditional Unix servers. Since mid-2012, it has aimed to be the ultimate polyglot application server.

Development of the Phusion Passenger 4.x series is progressing steadily. The 4.x series is a huge improvement over the 3.x series: in the announcement for Phusion Passenger 4.0.0 beta 1, we introduced a myriad of changes such as support for multiple Ruby versions, Python WSGI support, multithreading (Enterprise only), improved zero-copy architecture, better error diagnostics and more. That was just the beginning, because soon after we announced JRuby and Rubinius support, Out-of-Band Work and the Rack socket hijacking API.

Today we are proud to announce Phusion Passenger 4.0 beta 2, which brings us closer to a final release.

Better stability, documentation, test coverage

Beta 1 was usable, but not yet production-ready. While it worked well most of the time, there were some bugs that could cause crashes. So for beta 2 we have introduced few new features. Instead, we have focused on fixing bugs and on improving stability, documentation and test coverage. It is easy to fall into the trap of constantly adding features, but we want a rock-solid product that our users and customers can rely on.

How do we ensure quality? There are a few tools and techniques that we use:

System call failure simulation framework

[Image: SQLite. Caption: Our inspiration]

The system call failure simulation framework is a new developer feature in 4.0 beta 2 and allows us to simulate random system call failures so that we can test whether error handling in Phusion Passenger is done correctly. Although we already test for many error handling scenarios in our unit tests, test coverage is not perfect. This framework gives us another tool to ensure quality.

A few months ago we sat down with a customer who was experiencing seemingly random crashes with Phusion Passenger. These crashes could not be reproduced on any of our systems, but could be reliably reproduced on theirs. They only manifested in high-concurrency scenarios. After a day of intensive investigation, we found that the crashes occurred because the file descriptor limit on their systems was much lower than on any of ours. Phusion Passenger did not always catch out-of-file-descriptors errors, so those errors caused it to crash. Due to other, unrelated issues, the relevant error messages could not be written to the log file and were lost.
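
The failure mode described above can be reproduced on a development machine by deliberately lowering the file descriptor limit. The following is a minimal sketch in Python, not Phusion Passenger code: the helper name `exhaust_file_descriptors` and the limit of 64 are arbitrary choices made for illustration. It relies on the standard `resource` module, which is only available on Unix-like systems.

```python
import errno
import os
import resource


def exhaust_file_descriptors(soft_limit=64):
    """Lower RLIMIT_NOFILE, open files until the OS refuses, and
    return the errno of the resulting error (None if nothing failed)."""
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    fds = []
    try:
        # One more attempt than the limit allows guarantees a failure.
        for _ in range(soft_limit + 1):
            fds.append(os.open(os.devnull, os.O_RDONLY))
        return None
    except OSError as e:
        # A robust server has to handle this error instead of crashing.
        return e.errno
    finally:
        # Clean up and restore the original limit.
        for fd in fds:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))


if __name__ == "__main__":
    code = exhaust_file_descriptors()
    print(errno.errorcode.get(code, "no failure"))
```

On typical Unix systems the reported error is EMFILE ("too many open files"). Code that opens files, sockets or pipes needs an explicit handling path for it.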

All of those issues have since been fixed, but they made us realize that our testing tools were not adequate. That situation could and should have been prevented. Thus, the system call failure simulation framework was born. This framework allows us to specify which system calls should fail, and with what probability. For example, the following configuration simulates the “out of file descriptors” error in the helper agent with a probability of 1%:

export PASSENGER_SIMULATE_SYSCALL_FAILURES=PassengerHelperAgent=ENFILES=0.01
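
To illustrate the general idea of probabilistic failure injection, here is a minimal sketch in Python. It is not how Phusion Passenger implements the feature, and it does not parse the configuration string shown above. The class `FailureInjector` and its `call` method are hypothetical names, and the sketch uses `EMFILE`, the standard per-process errno for running out of file descriptors.

```python
import errno
import os
import random


class FailureInjector:
    """Fails wrapped calls with a configured probability per errno name.

    Hypothetical helper for illustration only.
    """

    def __init__(self, failures, seed=None):
        # failures maps errno names to probabilities,
        # e.g. {"EMFILE": 0.01} for a 1% chance per call.
        self.failures = {getattr(errno, name): p for name, p in failures.items()}
        self.rng = random.Random(seed)

    def call(self, func, *args, **kwargs):
        # Roll the dice for each configured error before doing real work.
        for code, probability in self.failures.items():
            if self.rng.random() < probability:
                raise OSError(code, os.strerror(code))
        return func(*args, **kwargs)


if __name__ == "__main__":
    injector = FailureInjector({"EMFILE": 0.01})
    try:
        fd = injector.call(os.open, os.devnull, os.O_RDONLY)
        os.close(fd)
        print("open succeeded")
    except OSError as e:
        print("open failed:", errno.errorcode[e.errno])
```

Routing every system call through one choke point like `call` means error-handling paths get exercised without having to exhaust real resources.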

Different runs will produce different errors, but you can make a run deterministic by specifying the random seed that was used in a previous run. The random seed is printed as a debugging message during startup and whenever a crash occurs.

export PASSENGER_RANDOM_SEED=...
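
Why a fixed seed makes failures reproducible can be shown with a small Python sketch. The function `failure_pattern` is a hypothetical helper, not part of Phusion Passenger: it records which of a series of simulated calls would fail for a given seed and probability.

```python
import random


def failure_pattern(seed, probability, calls):
    """Return a list of booleans: True where a simulated call would fail."""
    rng = random.Random(seed)  # private generator, seeded explicitly
    return [rng.random() < probability for _ in range(calls)]


if __name__ == "__main__":
    first = failure_pattern(seed=1234, probability=0.01, calls=1000)
    replay = failure_pattern(seed=1234, probability=0.01, calls=1000)
    # The same seed yields the same sequence of injected failures.
    print(first == replay)  # prints True
```

Because the generator is seeded explicitly, replaying a run with the logged seed injects the same failures at the same points, provided the program makes its calls in the same order.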

The system call failure simulation framework was inspired by SQLite’s testing process. Real hardware, network or OS-level errors are difficult to create, so simulating them is the next best thing. SQLite has an internal virtual filesystem layer, and it is in that layer that failures are simulated. We have a similar layer: the system call interruption framework, which was originally written to make it possible to interrupt threads that are blocked on system calls.

Continuously expanding and improving our test suite

We already had an extensive test suite which consists of a hybrid of C++ and Ruby RSpec code. In 4.0 beta 2 we’ve improved the test suite by modernizing some dependencies, testing more edge cases, testing more failure conditions, etc.

Setting up continuous integration

Until now, our test suite was run on our development machines as well as on an army of virtual machines with different OSes. We have now set up Travis CI as an additional quality assurance tool. The test suite has also been extended to cover more cases.

Ruby 1.8 is now considered legacy

[Image caption: Ruby 1.9 is the future]

Ruby 1.8 is no longer supported by its authors, and Ruby Enterprise Edition was end-of-lifed a while ago. Many gems these days are Ruby 1.9-only. It is more than apparent that the community considers Ruby 1.8 legacy, and for good reasons. We are joining the community in considering Ruby 1.8 legacy. This has the following implications:

  • Phusion Passenger 4.x will continue to support Ruby 1.8. Our support goes as far back as Ruby 1.8.5.
  • We will optimize performance for Ruby 1.9. Phusion Passenger will still work on Ruby 1.8, but we will no longer put in any effort to make it work fast on Ruby 1.8.

Installing and testing 4.0.0 beta 2

Quick install

Phusion Passenger Enterprise users can download the Enterprise version of 4.0 beta 2 from the Customer Area.

Open source users can install the open source version of 4.0 beta 2 with the following commands:

gem install passenger --pre
passenger-install-apache2-module
passenger-install-nginx-module

You can also download the tarball at Google Code.

In-depth

In-depth installation and upgrade instructions can be found in the Installation section of the documentation. The documentation has been updated to cover 4.0 changes, including Enterprise features. You can view them online here:

Final

We are excited about the final release. You can help us by testing beta 2 and reporting any bugs. Please submit bug reports to our bug tracker.




  • kristianps

    The “out of band work” link is wrong.

  • http://www.phusion.nl/ Hongli Lai

    It was fixed soon after. Thanks for letting us know.

  • Anonymous Coward

    We have it installed on both a staging and internal production server instance with most apps still running rails 2.x on REE. After some setup and config snafus, it is running like a champ; no problems that we can tell and subjective performance is better than passenger 3.x.

  • http://k776.tumblr.com/ Kieran P

    Opps. Looks like something went wrong. Response times skyrocketed after upgrading from beta 1 to beta 2. See this ticket: http://code.google.com/p/phusion-passenger/issues/detail?id=834&thanks=834&ts=1360540794

  • Tommy McNeely

    I finally found this… I was having trouble figuring out why I couldn’t put “passenger_ruby” directives in the “server” block (like the documentation says I can). Once I found this, I noticed it says the documentation was already updated. It would be “nice” if you could somehow delineate what features are part of the beta release within the documentation. Maybe a little (since 3.9.x) in places where stuff has changed. Just a thought :) Another idea would be to version the documentation similar to the way that MySQL or PHP work where you can look at the 3.0.x documentation as is.

    Thanks,
    Tommy