We need to talk about the problem with using the built-in or de facto webserver that ships with your language of choice. I'll focus on Ruby, also covering the Rails case due to its popularity. Given time, I'll extend this piece to other languages and where their respective http servers fall short.

Built-in ≠ Nginx

These days - with the rise of heterogeneous programming environments where many different languages are mixed together - it is increasingly common to see people running the built-in http server that comes with their language in production. The inherent issue with doing so is that these webservers weren't meant for production use; they were included as a convenience for development work.

These makeshift http servers are often unoptimized and don't handle malformed input well, because they haven't been hardened against the myriad kinds of attacks that, for example, Apache and Nginx have had to deal with and have been optimized for. Furthermore, they don't offer any sort of process supervision or other production safeguards, and often don't even support the full http 1.1 spec (never mind http2).

WEBrick and the slow client attack

WEBrick is the default http server for Ruby. It is single threaded, not particularly fast, and by its own documentation can only handle small loads. It also regularly has issues dealing with difficult input, supports nothing newer than http 1.1, and provides no process management or other production admin tools. Because it serves every response directly from the app process, WEBrick is by definition susceptible to a client making a request and then reading the response as slowly as possible to keep the tcp connection open.
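For context, this is roughly what standing up WEBrick looks like (a minimal sketch; the port and handler are arbitrary). Every byte goes straight from the app process to the client with nothing buffering in between, which is exactly what a slow reader exploits:

    require 'webrick'

    # Responses are written directly from the app process to the client,
    # so a client that reads slowly holds server resources for the whole transfer.
    server = WEBrick::HTTPServer.new(Port: 8000)

    server.mount_proc '/' do |req, res|
      res.body = 'hello from WEBrick'
    end

    trap('INT') { server.shutdown }
    server.start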

This is known as a slow client attack. It uses so few resources on the attacker's side that a single box can open as many connections as its kernel will allow, and with only one or two boxes an attacker can completely DoS a server.
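To make the mechanics concrete, here's a rough sketch of what a slow client boils down to (the hostname is a placeholder and the timings are arbitrary); note how little work the attacking side has to do:

    require 'socket'

    # Open a connection and send a perfectly valid request...
    socket = TCPSocket.new('victim.example.com', 80)
    socket.write("GET / HTTP/1.1\r\nHost: victim.example.com\r\n\r\n")

    # ...then read the response one byte at a time, stalling between reads.
    # The connection stays open for minutes, tying up resources on the server
    # while costing the client almost nothing.
    loop do
      byte = socket.read(1)
      break if byte.nil?   # server finished or finally gave up
      sleep 10
    end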

The right way

You should use a hardened webserver such as Apache or Nginx as a reverse proxy and ssl/tls terminator, and to buffer requests before passing them along. That way you can reject malformed requests and neutralize slow client attacks, because these webservers can be configured to hold massive numbers of connections cheaply, unlike WEBrick, which uses too much memory and becomes far too slow when a large number of clients connect. The proxy should be configured to pass valid requests on to a pool of app processes that can fully utilize the available cpu cores, if necessary keeping sessions pinned to the same process using sticky session cookies.
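As a rough illustration (the socket paths, certificate paths, and ip_hash pinning are placeholders, not recommendations), the Nginx side of such a setup might look something like this:

    upstream app_pool {
        # Pool of app processes, e.g. one per cpu core.
        server unix:/run/app/worker-0.sock;
        server unix:/run/app/worker-1.sock;
        # ip_hash;  # one way to keep a client pinned to the same process
    }

    server {
        listen 443 ssl;
        server_name example.com;

        ssl_certificate     /etc/ssl/example.com.crt;
        ssl_certificate_key /etc/ssl/example.com.key;

        location / {
            # Buffer the request and the response, so a slow client
            # occupies Nginx rather than one of the app processes.
            proxy_request_buffering on;
            proxy_buffering on;

            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_pass http://app_pool;
        }
    }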

Then you'll want to set up some kind of process monitoring that restarts your setup if it crashes, and makes sure that individual processes don't use too much memory or get stuck in infinite loops. You'll also need a way to log the behaviour of all these pieces, and to administer them for updates at both the system and app level.

That's a lot of extra work to set up, and it's work that has to be customized for different OSs and/or hardware if you aren't deploying on a purely homogeneous cluster.

Rails & Puma

Rails has long recognized that WEBrick isn't suitable as a production webserver, and while for many years they abstained from directly recommending a specific alternative, they have more recently thrown their support behind Puma.

Compared to WEBrick, Puma brings threads and a more resilient parser to the table (though we'd still recommend a more mature webserver that has received more scrutiny and security work). Yet it doesn't handle http2, more subtle issues like slow client attacks (Puma increases concurrency by adding threads, and threads cost too much memory at the margin to soak up slow clients), or enforcement of system resource limits. All of that is still left to you, meaning you have to configure a reverse proxy like Nginx correctly, set up your monitoring and other tooling, and generally spend a lot of time doing non-development work.

Also, when using MRI (the usual case for Ruby) you need to spin up multiple Puma processes to fully utilize all of the cpu cores on your box, because the global VM lock keeps a single process from running Ruby code on more than one core at a time. That's possible with clustered mode, but the administration tools aren't particularly polished.
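For reference, enabling clustered mode is just a matter of configuration; a minimal sketch of a puma.rb, with worker and thread counts as placeholders rather than recommendations:

    # config/puma.rb
    # Clustered mode: one worker process per cpu core, each with its own thread pool.
    workers Integer(ENV.fetch('WEB_CONCURRENCY', 4))
    threads 1, Integer(ENV.fetch('RAILS_MAX_THREADS', 5))

    preload_app!   # load the app once, then fork the workers from it

    port        ENV.fetch('PORT', 3000)
    environment ENV.fetch('RACK_ENV', 'production')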

Passenger Enterprise

Passenger is, like Puma, an app server. But unlike Puma, Passenger handles the setup of a battle-tested reverse proxy (Nginx or Apache, your choice) in a way that addresses the slow client problem, process supervision and restarting, and system resource limit enforcement for you. It also provides more polished administration tools that make dealing with your deployments much easier.
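As an example of how much of the earlier checklist this absorbs, Passenger's Nginx integration mode reduces the app side of the config to a handful of directives (the paths and pool sizes below are placeholders; this is a sketch, not a recommended configuration):

    server {
        listen 80;
        server_name example.com;

        # Point Passenger at the app; it spawns, supervises and restarts
        # the app processes itself, behind Nginx's buffering.
        root /var/www/myapp/public;
        passenger_enabled on;
        passenger_app_env production;
        passenger_min_instances 4;   # keep a warm pool of processes
    }

    # In the http block: cap the total size of the process pool.
    # passenger_max_pool_size 6;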

You can also combine Puma and Passenger; the sum is greater than its parts, and I've written about this before.

Takeaway

There's nothing stopping you from taking the time to build all of this production plumbing yourself, and in fact it's a fun thing to do as a learning exercise. But once you've done it once or twice, it becomes tedious and slows your development velocity. It's easier to use software that already does all of that setup for you (as well as maintaining it and continuously optimizing it), so you can focus on what makes your project awesome.