Stopping slow client DoS attacks with Puma on Passenger 6

Imagine you're the proud owner of a guitar shop. One day, you wake up expecting to enjoy happy customers browsing the shop.

But something is amiss today... there is a crowd of robots, idling around in your shop and not buying anything. Your customers can't even enter your shop and leave in frustration. No business today.

Welcome to the slow client attack. Let's find out how you can supercharge Puma with the power of Passenger 6.

What is the slow client attack?

A slow client attack is a form of Denial of Service. It's real: see the Slowloris attack. An attacker can open an HTTP connection and then send the request very slowly, read the response very slowly, or both. This is problematic because most apps and app servers have a limited amount of I/O concurrency.
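
To make the mechanics concrete, here is a minimal slow-client sketch in Ruby. The host and port are assumptions (a local Puma instance you control); only ever point something like this at your own test environment.

require "socket"

host = "localhost"   # assumption: a test Puma instance you control
port = 9292
request = "GET / HTTP/1.1\r\nHost: #{host}\r\n\r\n"

socket = TCPSocket.new(host, port)
request.each_char do |char|
  socket.write(char)   # dribble the request out one byte at a time...
  sleep 1              # ...taking over half a minute to finish a single request
end
puts socket.readpartial(4096)   # the response can be read just as slowly
socket.close

Each connection like this pins one server thread for its entire lifetime, so a few dozen of them running in parallel are enough to exhaust a typical thread pool.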

Imagine your application as the city hall, and your app server as the city hall desks. People walk up to a desk (send a request), do some paperwork (processing inside the app) and leave with stamps on their papers (receive a response). Slow clients are like people walking up to a desk but never leaving, so that the clerk cannot help anybody else.

One desk (thread) can help one person (request) at a time. This is called the multithreaded I/O model, and it's what Puma uses. Threads are relatively expensive compared to an evented I/O model (think Node.js-style I/O), which is why slow client attacks cannot be efficiently mitigated by spawning more threads (adding clerks and desks). Threads use memory, and every additional thread means more work for the OS kernel's scheduler. There are only so many desks you can fit in the building before you run out of space.
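
To see how limited that concurrency is in practice, here is a sketch of a typical config/puma.rb; the numbers are illustrative, not recommendations.

# config/puma.rb
workers 4       # one Puma process per core, each with its own thread pool
threads 5, 5    # min/max threads per worker: 4 x 5 = 20 concurrent requests in total
preload_app!

With 20 threads in total, 20 slow clients are enough to occupy every desk; legitimate visitor number 21 has to wait, no matter how fast that client is.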

Puma + Nginx evented buffering

The standard mitigation for slow client attacks is to put a buffering reverse proxy in front of the app server, one that can handle much higher I/O concurrency.

Imagine that we put a million clerks outside the building. These clerks are not trained to process your paperwork. Instead, they only accept your paperwork (buffer your requests), bring it to the clerks inside the city hall, and bring the stamped paperwork back to you (buffer application responses). These clerks never stand still in front of the desks, so they'll never cause the slow client problem. Because these outside clerks are lightly trained, they're cheap (they use less RAM) and we can have a lot of them.

This is roughly what you can achieve by configuring Nginx as a reverse proxy in front of your application. However, it is extra work beyond deploying your application and setting up Puma.

Here is an example systemd service file for Puma, followed by an Nginx config snippet, to achieve this setup.

[Unit]
Description=Puma HTTP Server
After=network.target

[Service]
Type=simple
Restart=always
User=www-data
WorkingDirectory=/path/to/my/app
ExecStart=/path/to/my/app/bin/puma -b tcp://127.0.0.1:9292 -w 4 --preload

[Install]
WantedBy=multi-user.target

And the corresponding Nginx configuration:

server {
    listen 80;
    server_name example.com;

    proxy_buffering on; # on by default, listed here for demonstration purposes
    proxy_buffers 16 4k;
    proxy_buffer_size 2k;
    proxy_busy_buffers_size 8k;
    proxy_max_temp_file_size 2048m;
    proxy_temp_file_write_size 32k;

    location / {
        proxy_pass http://localhost:9292;
    }
}

This results in a setup that looks like this:

Nginx -> Puma -> Your webapp

Just add Passenger

Instead, it's easier to deploy Passenger 6 with Puma. That way you don't have to install and configure Nginx yourself: all you have to do is set up your Gemfile and configure Passenger with one line to start Puma for you, and you get all of the above.
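
On the Gemfile side it's just a matter of having both gems available. A minimal sketch (the framework gem and version constraint are placeholders):

source "https://rubygems.org"

gem "rails"                # or whatever framework your app uses
gem "puma"
gem "passenger", ">= 6.0"

The one line that tells Passenger to start Puma for you then looks like this: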

passenger start --max-pool-size 1 --app-start-command 'bundle exec puma -w $(grep -c ^processor /proc/cpuinfo) --preload -b tcp://127.0.0.1:$PORT'
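
Roughly speaking, $PORT is the placeholder Passenger fills in with the port it wants Puma to listen on, and --max-pool-size 1 tells Passenger to manage a single Puma instance rather than spawning extra processes itself, since Puma already runs its own worker processes via -w.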

That way your actual configuration will look like this:

Nginx + Passenger -> Puma -> Your webapp

When it comes to preventing slow client attacks, the easiest way to go is not an either-or solution, but rather to combine the strengths of Puma with the virtues of Passenger. Truly the best of both worlds.