Migrating Passenger from C++ to Go?
Passenger is mostly written in C++. The main reasons were: Apache and Nginx integration (via modules), ease of installation for users (users commonly have a C/C++ compiler installed), and performance. Back when Passenger was first created, C and C++ were the only viable options that met these requirements.
But the programming language ecosystem has changed a lot since then. Compared to the alternatives, C++ is now limiting our development velocity. That's why we've been pondering a slow migration of Passenger to another language.
This article describes:
- What the problems with C++ are that make us consider a migration.
- What requirements alternatives must satisfy.
- What a migration path would look like.
- What implications a migration would have on users.
Let us hear from you!
What do you think about the user implications? Read the last section and post a comment in the GitHub issue!
The problems with C++
Compared to alternatives, it feels like C++ is getting more and more in our way. Here are the issues we are running into:
- People with C++ skills were already relatively rare, and are becoming increasingly rare still.
- The language itself is complex, hard to learn, and scares away contributors.
- It is easier to introduce bugs and harder to debug compared to many other languages.
- Writing evented networking code in C++ is especially painful: the resulting code is hard to understand, hard to extend, hard to maintain.
- Fragmented ecosystem of libraries. For example, nobody can agree on an event loop library to standardize on, so different libraries belong to different camps (some using libev, some libuv, some libevent, some Asio, some inventing their own) or try to be agnostic by providing non-blocking APIs. In any case, integrating an I/O library that doesn't belong to the camp you picked means painfully integrating two event loops with each other (if that is possible at all), and may lead to bad performance.
Towards an alternative
An alternative must solve all of the above issues. Plus it must satisfy two of the three original reasons why Passenger was written in C++: it must allow users to install Passenger easily, and it must have good performance.
We don't have to migrate everything, just the most important parts that need frequent updates. Thus, the ability to write Apache and Nginx modules is not considered so important: those parts rarely need updates.
Indeed, we shouldn't migrate everything in one big rewrite: that takes too long and is a big risk. We should instead migrate part by part, so another requirement is the ability to integrate with the existing C++ codebase.
Alternatives considered
What are the viable alternatives? We've considered two contenders:
- Rust. High-performance, designed for system software, lots of libraries, claims to provide better ergonomics without sacrificing performance, used in practice by Firefox.
  However: has the reputation of being hard, for entirely different reasons than C++; language and ecosystem still too much in flux or immature for our taste.
- Go. Sort-of-high-performance, popular choice for server software, huge ecosystem with even more libraries than Rust, easy to learn, lots of momentum and community support.
  However: GC could be problematic; C/C++ integration has performance issues.
Alternatives dropped
We've already written off Rust. It's not there yet for our purposes.
We never considered Java/JVM because although it's high-performance, it doesn't feel fast: slow boot times, high memory usage. These factors are all malleable to a degree (e.g. JVM client mode allows faster boot times, memory usage depends on GC tweaks) but I've never seen a JVM application tweaked in such a way that it feels as snappy as a C++ application. To users, feeling fast is just as important as actually being fast. And then there's also the issue that the JVM has very limited integration possibilities with an existing C++ codebase.
Investigating Go
We are left with Go. My biggest concern is performance. Passenger only has one performance-sensitive path: inside the Passenger Core Controller, which is an HTTP server. In case Passenger does not need to spawn an application process, the request handling code path has to be extremely fast.
In this hot path, the Core Controller only has to call one other subsystem: the ApplicationPool. In optimistic cases, this is only one interaction, i.e. 1 function call that returns a result.
It turns out that this performance concern aligns well with the idea to migrate the codebase in a piecemeal manner. The HTTP-related code in Passenger is some of the most complicated code in the codebase. For Passenger 5 (which focused a lot on performance), we had to invent an entire non-blocking I/O framework from scratch.
In other words: the first part we would want to replace with Go is the Core Controller HTTP server anyway. A rewritten Controller will, in its hot path, need to make exactly 1 C++ call (into the ApplicationPool) per request. If we can show that a rewritten Controller can be made fast enough in its hot path, then we won't have any further performance concerns for the rest of the migration.
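The shape of such a hot path can be sketched in Go as follows. This is only an illustration under stated assumptions: the ApplicationPool interface, its Checkout method, staticPool, controller and serveOnce are all hypothetical names, not Passenger's actual API. In a real migration the Checkout implementation would wrap the single CGo call; here it is a plain Go stand-in so the sketch runs without a C toolchain.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// ApplicationPool is a hypothetical Go-side view of Passenger's C++
// ApplicationPool. The method name and signature are assumptions.
type ApplicationPool interface {
	// Checkout returns the address of an application process that can
	// handle a request for the given application root.
	Checkout(appRoot string) (addr string, err error)
}

// staticPool is a stand-in implementation that always returns the same
// address. A real implementation would make one CGo call here.
type staticPool struct{ addr string }

func (p staticPool) Checkout(appRoot string) (string, error) {
	return p.addr, nil
}

// controller returns an HTTP handler whose hot path makes exactly one
// call into the pool per request.
func controller(pool ApplicationPool, appRoot string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		addr, err := pool.Checkout(appRoot) // the single cross-boundary call
		if err != nil {
			http.Error(w, "no application process available", http.StatusServiceUnavailable)
			return
		}
		// A real controller would forward the request to addr here.
		fmt.Fprintf(w, "would forward %s to %s\n", r.URL.Path, addr)
	})
}

// serveOnce runs one in-memory request through the handler and returns
// the status code and body, without opening a network socket.
func serveOnce(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	h := controller(staticPool{addr: "127.0.0.1:4000"}, "/var/www/app")
	code, body := serveOnce(h, "/hello")
	fmt.Printf("%d %s", code, body)
}
```

Keeping the pool behind an interface means the rest of the handler does not care whether the call ends up in Go or in C++, which fits a part-by-part migration.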
Our performance questions are:
- How fast is Go's builtin HTTP library? Are there faster alternatives?
- How much of a problem is the GC? If it is problematic, what are our optimization possibilities?
- Since we have to migrate piecemeal, we need to call into C++. What are the performance implications of using CGo, Go's FFI system?
FastHTTP
Go apparently has a FastHTTP server library which claims to be faster than Go's net/http. Its FAQ describes how this is possible: net/http's API design fundamentally requires dynamic memory allocations and GC. FastHTTP reuses objects and reduces the amount of memory allocated, which reduces GC pressure.
On the flip side, FastHTTP is not fully RFC-compliant. The author has made it clear that he favors performance over strict compliance, especially if the compliance issues cover edge cases.
I believe the latter is not a big enough reason to reject FastHTTP. It is a solvable problem: there is no fundamental reason why an HTTP server cannot be both fast and correct. In the worst case, fixing FastHTTP ourselves makes more sense than continuing to invest in the C++ codebase.
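The object-reuse idea described above can be illustrated with the standard library's sync.Pool. This is a minimal sketch of the general technique, not FastHTTP's actual code; the names bufPool and render are made up for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers so that handling a request does not
// have to allocate a fresh buffer every time.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// render builds a response body in a pooled buffer and returns the result
// as a string. The function name is hypothetical.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // clear leftovers from the previous user
	defer bufPool.Put(buf) // hand the buffer back for reuse

	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // String copies, so the result outlives the buffer
}

func main() {
	fmt.Println(render("passenger"))
}
```

The fewer short-lived objects a request creates, the less work the garbage collector has to do per request. The trade-off is that the caller must never hold on to a pooled object after returning it.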
I/O model memory usage
Go's I/O model fundamentally requires more memory for idle connections than our current evented C++ server. This is because the evented C++ server does not allocate buffers until there is data to be read over the socket. If we have many idle HTTP connections (e.g. waiting for keep-alive, or waiting for new WebSocket data) then Go will use a lot more memory than our C++ server.
This problem cannot be solved without intervention from the Go authors. But whether this is a real problem in practice for Passenger users remains to be seen. We have not heard of anybody using Go in production complain about this issue. If you have experience on this subject, please share your thoughts with us.
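To make the memory argument concrete, here is a minimal sketch of the blocking, goroutine-per-connection style that Go encourages. It is not how net/http or FastHTTP are implemented internally; the echo and roundTrip functions and the 4096-byte buffer size are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// echo serves one connection in the blocking style. The buffer is allocated
// before Read is called, so it stays alive for as long as the connection is
// open, even when the peer never sends anything.
func echo(conn net.Conn) {
	defer conn.Close()
	buf := make([]byte, 4096) // size is an arbitrary choice for this sketch
	for {
		n, err := conn.Read(buf) // parks the goroutine until data arrives
		if err != nil {
			return
		}
		if _, err := conn.Write(buf[:n]); err != nil {
			return
		}
	}
}

// roundTrip sends msg over an in-memory connection served by echo and
// returns the reply. It returns "" on any I/O error.
func roundTrip(msg string) string {
	client, server := net.Pipe()
	go echo(server) // one goroutine, with its own stack and buffer, per connection
	defer client.Close()

	if _, err := client.Write([]byte(msg)); err != nil {
		return ""
	}
	reply := make([]byte, len(msg))
	if _, err := io.ReadFull(client, reply); err != nil {
		return ""
	}
	return string(reply)
}

func main() {
	fmt.Println(roundTrip("ping"))
}
```

An idle connection in this model still holds a goroutine stack plus its read buffer, whereas an evented server can wait for a readiness notification before allocating anything.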
GC optimization opportunities
Segment.com wrote an excellent blog post on how the Go garbage collector works and how to reduce GC stress. It is encouraging to see that there are so many optimization opportunities and tools.
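As a starting point for this kind of investigation, the Go runtime exposes GC counters through runtime.ReadMemStats, and the collection frequency can be tuned with debug.SetGCPercent. The sketch below is our own illustration and is not taken from the Segment.com post; gcCost, churn and the chosen numbers are hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"time"
)

// gcCost runs work and reports how many GC cycles completed while it ran
// and how much stop-the-world pause time those cycles added.
func gcCost(work func()) (cycles uint32, pause time.Duration) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	work()
	runtime.ReadMemStats(&after)
	return after.NumGC - before.NumGC,
		time.Duration(after.PauseTotalNs - before.PauseTotalNs)
}

var sink []byte // package-level sink so the allocations escape to the heap

// churn allocates n short-lived 1 KiB slices.
func churn(n int) {
	for i := 0; i < n; i++ {
		sink = make([]byte, 1024)
	}
}

func main() {
	cycles, pause := gcCost(func() { churn(1000000) })
	fmt.Printf("default GC percent: %d cycles, %v paused\n", cycles, pause)

	// Collect less often, at the cost of a larger heap.
	old := debug.SetGCPercent(400)
	cycles, pause = gcCost(func() { churn(1000000) })
	fmt.Printf("GC percent 400:     %d cycles, %v paused\n", cycles, pause)
	debug.SetGCPercent(old)
}
```

The absolute numbers depend on the machine and Go version, so they are mainly useful for comparing two variants of the same code.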
CGo performance implications
It is said that calling C/C++ from Go and vice versa is expensive because the Go runtime differs so much from C's. So whenever one crosses the C <-> Go boundary, Go needs to perform a lot of pessimistic setup and teardown work.
How expensive is this exactly? According to issue #16051 they've optimized this aspect in Go 1.8, so it's already better than what it used to be.
We've run some benchmarks (code here) on Go 1.11. Here are our results on a 2016 MacBook Pro with macOS Sierra. The results in a Linux VM on the same machine are similar, so I've omitted those results.
CGo plusOne: 58.6255 ns/call
Go plusOne: 2.04889 ns/call
Go mutex : 14.22403 ns/call
C mutex : 20.29652 ns/call
C time : 37.00688 ns/call
Go plusOne is a function that calculates x+1. Calling the CGo version (CGo plusOne) is approximately 30x slower. For comparison:
- Locking+unlocking a Go mutex (Go mutex) takes ~14ns (~7x slower than Go plusOne).
- Locking+unlocking a pthread mutex (C mutex) takes ~20ns (~10x slower).
- Calling gettimeofday() (C time) takes ~37ns (~18x slower).
So we conclude that the CGo overhead isn't all that bad. The overhead is similar to 3x locking+unlocking a pthread mutex. It seems that having a Go-written Controller make 1 CGo call per request won't result in many performance problems.
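For readers who want to take similar measurements, a per-call timing harness can be sketched as follows. This is not the benchmark code referenced above; it is a simplified pure-Go approximation that covers only the Go plusOne and Go mutex rows, and nsPerCall is a hypothetical helper. A CGo row could be measured the same way by passing a closure that calls a C function, which needs a C toolchain and is left out here.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// plusOne mirrors the "Go plusOne" row: it calculates x+1.
// The noinline directive keeps the compiler from optimizing the call away.
//
//go:noinline
func plusOne(x int) int { return x + 1 }

// nsPerCall runs f n times and returns the average wall-clock cost per call
// in nanoseconds. The cost of calling the closure itself is included.
func nsPerCall(n int, f func()) float64 {
	start := time.Now()
	for i := 0; i < n; i++ {
		f()
	}
	return float64(time.Since(start).Nanoseconds()) / float64(n)
}

var result int // sink so the compiler cannot discard plusOne's result

func main() {
	const n = 10000000
	var mu sync.Mutex

	fmt.Printf("Go plusOne: %.2f ns/call\n",
		nsPerCall(n, func() { result = plusOne(result) }))
	fmt.Printf("Go mutex  : %.2f ns/call\n",
		nsPerCall(n, func() { mu.Lock(); mu.Unlock() }))
}
```

Because every measurement includes the closure call, very cheap operations will look slower here than in a hand-unrolled benchmark; the ratios between rows are more meaningful than the absolute values.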
Conclusion: Go is a viable alternative
We conclude that a slow migration to Go causes no real performance concerns.
Implications on users: let's hear from you!
We supply binaries for Passenger. We believe most users use these binaries, so for most users there will be no change in the way Passenger is installed or used.
But some users compile from source either because we don't have binaries for their OS/platform, or because they prefer installation from source. If we adopt Go, then these users must have a sufficiently recent Go compiler installed (we're targeting version 1.11).
What do you think of such a migration effort? And if you're a user who installs Passenger from source, is depending on Go acceptable to you? Please let us know by posting a comment in the GitHub issue.