Or: doing Kubernetes, without doing Kubernetes
For years, Phusion and Hongli have wanted to standardise Passenger: to enable more people to contribute, to make the project less dependent on Hongli as its benevolent leader (Guido van Rossum didn’t fancy that role for Python either), and to establish design rules.
This discussion came back to me when I attended last month’s CodeDaze conference. There, Kerri Miller shared how her fascination with maps carried over into her development career. Kerri is a former Application Engineer at GitHub and a former Senior Software Engineer at HashiCorp, and has worked for Travis CI since the beginning of this year. She’s a familiar face at Ruby conferences. In her talk, Kerri introduced me to the DOT language and its applications. In one example, mapping a Rails monolith (or rather, drawing a network diagram of it) with the DOT language allowed Kerri and her team to untangle parts of it into separate API calls.
Inspired by her talk and, oddly enough, by the simple beauty of graphs and maps drawn with the DOT language, I jumped at the opportunity to map Passenger Open Source to see whether we could separate out elements of the code base.
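For readers who haven't met DOT: a graph is just a text file listing nodes and the edges between them. The Python sketch below turns a small dependency map into a DOT digraph. The component names are borrowed from later in this post, but the edges are illustrative only, not Passenger's actual dependency graph, and `to_dot` is a hypothetical helper. You can render its output with Graphviz, e.g. `dot -Tpng deps.dot -o deps.png`.

```python
from __future__ import annotations


def to_dot(deps: dict[str, list[str]], name: str = "deps") -> str:
    """Render a {component: [dependencies]} mapping as a DOT digraph."""
    lines = [f"digraph {name} {{"]
    for node in sorted(deps):
        lines.append(f'    "{node}";')                 # declare every node
        for dep in deps[node]:
            lines.append(f'    "{node}" -> "{dep}";')  # one edge per dependency
    lines.append("}")
    return "\n".join(lines)


# Illustrative edges only, loosely following a request-handling flow.
example = {
    "Controller": ["ApplicationPool"],
    "ApplicationPool": ["SpawningKit"],
    "SpawningKit": [],
}

if __name__ == "__main__":
    print(to_dot(example))
```

Generating DOT from data rather than writing it by hand is what makes tools like Doxygen useful: the graph is extracted from the code, so it stays in sync with it.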
Context: Kubernetes meetup
This post started out as the script for a talk at a Kubernetes meetup we're hosting in our Amsterdam office. Here we include a primer on Kubernetes (borrowing heavily from the excellent zdnet.com article on the topic), which we'll skip at the meetup.
Kubernetes, and alternative technologies for orchestrating workloads for maximum availability, aim to make the impact of a failure on the availability and functionality of your workloads virtually non-existent. Kubernetes keeps track of all the services and components that make up the active job, or jobs, across the network.
In a containerized network, programs run in isolation from one another. Even though they may share the same processor and memory, the host operating system outside the containers keeps them separated. Orchestration lays out the patterns that let individual applications work together, or in concert with one another. Extending the metaphor, the composer produces the software's original pattern (and the term for assembling software from containers is, in fact, composition).
Instead of a data center having to replicate an entire application, and trigger a load balancer to switch over to the secondary application should the primary one fail, Kubernetes maintains active replicas of container groups (pods) through replica sets, for the express purpose of preserving uptime and responsiveness should any container or pod fail.
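Maintaining replicas boils down to a reconciliation loop: compare the desired number of replicas with what is actually running and healthy, then close the gap. The Python sketch below is a toy model of that idea, not Kubernetes' actual ReplicaSet controller; `reconcile`, the pod-name format and the `alive` health-check set are all hypothetical.

```python
from __future__ import annotations
import itertools

# Hypothetical name generator for newly created pods.
_pod_ids = itertools.count(1)


def reconcile(desired: int, running: set[str], alive: set[str]) -> set[str]:
    """One pass of a toy reconciliation loop (not the real ReplicaSet controller).

    running: pods we believe exist; alive: pods that passed a health check.
    Returns the new set of pods: failed ones replaced, total equal to `desired`.
    """
    healthy = running & alive               # forget pods that failed
    while len(healthy) < desired:           # replace missing replicas
        healthy.add(f"pod-{next(_pod_ids)}")
    while len(healthy) > desired:           # remove surplus replicas
        healthy.pop()
    return healthy


if __name__ == "__main__":
    pods = reconcile(3, set(), set())
    print(sorted(pods))                     # three fresh pods
    crashed = sorted(pods)[0]
    pods = reconcile(3, pods, pods - {crashed})
    print(crashed in pods, len(pods))       # False 3
```

The point of the design is that the loop never needs to know *why* a pod disappeared; it only compares desired and observed state, which is what makes it robust.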
Kubernetes is the central element that powers a distributed software system, although it does not provide all of that system's components itself. It lets companies decide for themselves how to build the best platform for what they want to do, plugging in the best components available today while keeping them all interoperable and supportable.
Where Kubernetes is considered a container orchestrator, Passenger can be seen as a “process orchestrator”: Kubernetes manages containers and pods, while Passenger manages application processes. Three of the features that make Kubernetes appealing to enterprises are continuity, resilience and scalability, and both Passenger and Kubernetes were designed and developed with these properties in mind.
Continuity: applications composed of granular components are generally easier to maintain and evolve. The Passenger team has worked toward this in Passenger, which has allowed it to improve and rewrite several subsystems without affecting the rest of the codebase. In Kubernetes, the orchestrator can apply changes to individual components without impacting the system as a whole.
Resilience: Passenger comes with a Watchdog that closely monitors the vital parts of the Passenger core. Should the core crash, the Watchdog automatically restarts it. The core in turn supervises the application processes, so the whole arrangement forms a supervision tree. In a similar manner, Kubernetes monitors pods and containers, taking appropriate action when necessary.
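The supervision pattern is easy to model: a supervisor starts a child, notices when it dies and starts it again, usually with a limit so a crash loop doesn't spin forever. The sketch below uses Python's `subprocess` module to restart a child command. It is a minimal illustration of the pattern, not how Passenger's Watchdog is implemented; `supervise` and `max_restarts` are hypothetical names.

```python
import subprocess
import sys


def supervise(cmd: list, max_restarts: int = 3) -> int:
    """Run `cmd`, restarting it whenever it exits with a non-zero status.

    Returns the number of restarts performed. A clean exit (status 0)
    ends supervision; so does reaching max_restarts.
    """
    restarts = 0
    while True:
        status = subprocess.run(cmd).returncode
        if status == 0:
            return restarts          # child finished normally
        if restarts >= max_restarts:
            return restarts          # give up: most likely a crash loop
        restarts += 1
        print(f"child exited with {status}, restarting", file=sys.stderr)


if __name__ == "__main__":
    # A child that always "crashes", to show the restart limit kicking in.
    crashing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(crashing, max_restarts=2))   # prints 2
```

Nesting this idea gives you a tree: one supervisor watches the core, and the core in turn watches its own children.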
Scalability: depending on the workload and configuration, Passenger spawns extra processes or shuts down idle ones. Kubernetes’ autoscaler offers similar functionality by starting or shutting down pods when it determines that the resources allocated to those pods are over- or underutilized.
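The scaling behaviour described above can be reduced to a small decision function: spawn when every process is busy and there's room in the pool, shut down processes that have been idle too long, but never drop below a minimum. The sketch below is hypothetical throughout; the names (`Proc`, `scaling_decision`, `min_procs`, `max_procs`, `idle_timeout`) and defaults are ours, and the logic is a simplification rather than Passenger's or the Kubernetes autoscaler's actual algorithm.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    busy: bool
    idle_seconds: float = 0.0   # time since the process last finished a request


def scaling_decision(procs: list[Proc], min_procs: int = 1, max_procs: int = 6,
                     idle_timeout: float = 300.0):
    """Return ("spawn", count), ("shutdown", [pids]) or ("noop", None)."""
    # Never run fewer than the configured minimum.
    if len(procs) < min_procs:
        return ("spawn", min_procs - len(procs))
    # Scale up by one when every process is busy and there is room left.
    if procs and all(p.busy for p in procs) and len(procs) < max_procs:
        return ("spawn", 1)
    # Scale down: shut down long-idle processes, keeping at least min_procs.
    idle = [p.pid for p in procs if not p.busy and p.idle_seconds > idle_timeout]
    victims = idle[: max(0, len(procs) - min_procs)]
    if victims:
        return ("shutdown", victims)
    return ("noop", None)
```

Keeping the decision a pure function of observed state makes it easy to test in isolation, separately from the code that actually forks or kills processes.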
Mapping Passenger core
Part of the Passenger team had wanted to reorganise the code files for a while. We had already rewritten entire subsystems to achieve a leaner architecture. In my cheerful naivety, I thought that with DOT we might reach the same conclusion Kerri and her team did with the monolith: that we could separate out some of Passenger’s complexity and make it better able to adapt to the fast-changing tech landscape.
We found an open source tool that can map C++ (and other) source code and output DOT graphs: Doxygen. Doxygen is in active development, which is always a good sign. Its maintainer is a Dutch guy, Dimitri van Heesch. There’s a Doxygen GUI frontend for Mac OS X, and you'll also want to install Graphviz, for example using the Homebrew package manager. Graphviz does the actual work of rendering structural information as diagrams of abstract graphs and networks.
Doxygen promises to help in three ways:
1. It can generate an online documentation browser (in HTML) and/or an offline reference manual (in LaTeX) from a set of documented source files.
2. You can configure Doxygen to extract the code structure from undocumented source files.
3. You can also use Doxygen for creating normal documentation.
We are using it for the second purpose: to find our way around the VERY large source distribution that is Passenger, and to visualize the relations between its various elements by means of include dependency graphs, inheritance diagrams and collaboration diagrams.
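For the "undocumented sources" use case, the relevant switches live in Doxygen's configuration file, the Doxyfile. The Python helper below prints a minimal one. The tag names (`EXTRACT_ALL`, `HAVE_DOT`, `INCLUDE_GRAPH` and so on) are standard Doxygen options, but this particular selection, the `src` input path and the `doxyfile_text` helper are our own assumptions, not the exact configuration we used. Unlisted tags fall back to Doxygen's defaults.

```python
from __future__ import annotations

# Standard Doxygen tags; the values are one plausible setup for mapping
# undocumented C++ sources, not a prescribed configuration.
SETTINGS = {
    "PROJECT_NAME": '"Passenger"',
    "INPUT": "src",                 # assumption: sources live under ./src
    "RECURSIVE": "YES",
    "FILE_PATTERNS": "*.h *.hpp *.cpp",
    "EXTRACT_ALL": "YES",           # include entities without doc comments
    "EXTRACT_PRIVATE": "YES",
    "EXTRACT_STATIC": "YES",
    "HAVE_DOT": "YES",              # let Graphviz draw the diagrams
    "CLASS_GRAPH": "YES",           # inheritance diagrams
    "COLLAB_GRAPH": "YES",          # collaboration diagrams
    "INCLUDE_GRAPH": "YES",         # include dependency graphs
    "INCLUDED_BY_GRAPH": "YES",
    "GENERATE_HTML": "YES",
    "GENERATE_LATEX": "NO",
    "OUTPUT_DIRECTORY": "doxygen-out",
}


def doxyfile_text(settings: dict[str, str]) -> str:
    """Format settings as 'TAG = value' lines, one per tag."""
    return "\n".join(f"{key:<18} = {value}" for key, value in settings.items()) + "\n"


if __name__ == "__main__":
    print(doxyfile_text(SETTINGS), end="")
```

Redirect the output to a file (for example `python make_doxyfile.py > Doxyfile`, with a script name of your choosing) and run `doxygen Doxyfile`, with Graphviz's `dot` on your `PATH`.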
After several tries, we finally got some (useful?) results:
Passenger Core class diagram
The main insight was, honestly, that I knew less about Passenger's internals than I thought. I asked Hongli to help me make sense of the massive diagrams. Hongli turned to Lucidchart, but if you're interested in repeating our experiment, I can also recommend Edotor for fiddling with your diagrams.
Architecture overview
Passenger in a nutshell: public-facing components (such as our CLI tools, packaging and integration modes) and the Passenger agent, building on top of shared C++ utilities.
Request handling flow interaction (simplified)
When a request comes in via Apache or Nginx and their respective Passenger modules, or via Passenger’s standalone integration mode, the Passenger agent’s Controller asks the ApplicationPool for the address and other details of an application process to forward the request to. If there aren't enough application processes, the ApplicationPool instructs the SpawningKit to spawn a new one. The Controller then forwards the request to the application process.
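The request flow above can be sketched in a few lines of Python. The class names mirror the components in the diagram, but everything else (the `get_process`, `spawn`, `handle` and `finish` methods, the assumption of one request per process at a time, the fake addresses) is hypothetical and far simpler than Passenger's real C++ code.

```python
from __future__ import annotations
import itertools
from dataclasses import dataclass


@dataclass
class AppProcess:
    address: str
    sessions: int = 0                  # requests currently in flight


class SpawningKit:
    """Stand-in that 'spawns' a process by handing out a fake address."""

    def __init__(self) -> None:
        self._ports = itertools.count(3000)

    def spawn(self) -> AppProcess:
        return AppProcess(address=f"127.0.0.1:{next(self._ports)}")


class ApplicationPool:
    def __init__(self, spawner: SpawningKit, max_processes: int = 4) -> None:
        self.spawner = spawner
        self.max_processes = max_processes
        self.processes: list[AppProcess] = []

    def get_process(self) -> AppProcess:
        # Prefer an idle process (assumes one request per process at a time).
        for proc in self.processes:
            if proc.sessions == 0:
                return proc
        # Not enough processes: ask SpawningKit for a new one.
        if len(self.processes) < self.max_processes:
            proc = self.spawner.spawn()
            self.processes.append(proc)
            return proc
        raise RuntimeError("pool is full; a real server would queue the request")


class Controller:
    def __init__(self, pool: ApplicationPool) -> None:
        self.pool = pool

    def handle(self, request: str) -> AppProcess:
        proc = self.pool.get_process()
        proc.sessions += 1
        # ... forward `request` to proc.address and relay the response ...
        return proc

    def finish(self, proc: AppProcess) -> None:
        proc.sessions -= 1             # response sent; process is free again
```

Note how the Controller never talks to SpawningKit directly: deciding whether a new process is needed is entirely the ApplicationPool's job.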
Process architecture overview
It's about the principle, stupid
Hongli cautioned that microservices versus spaghetti code is an oversimplification of the options available. Thoughtfully separating things out in a codebase is not the same as doing microservices. He referenced Catherine West's closing keynote at this year's RustConf, 'Using Rust For Game Development', in which she compares object-oriented (OO) design, based on inheritance, with Entity Component Systems (ECS), based on composition, in game development. While most non-game software follows OO design rather than ECS, the ECS pattern does help prevent a God object from emerging, by abstracting behavior into separate pieces that are combined through composition rather than inheritance.
The ApplicationPool functionality (in the above diagram detailing the Passenger request handling interaction flow) is central to Passenger's architecture, and a likely candidate to become such a God object. That is why, for ApplicationPool only, Hongli opted to follow the ECS pattern instead of OO design rules.
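To make the contrast concrete, here is a toy ECS in Python: entities are plain integer IDs, components are data stored per entity in separate tables, and systems are free functions that operate on whichever entities have the components they need. It is a hypothetical illustration of the pattern applied to an application pool, not Passenger's actual ApplicationPool design; `World`, the component tables and both systems are made-up names.

```python
from __future__ import annotations
import itertools
from dataclasses import dataclass, field


@dataclass
class World:
    """Shared data: one dict per component type, keyed by entity ID."""
    next_id: itertools.count = field(default_factory=itertools.count)
    address: dict[int, str] = field(default_factory=dict)
    busy: dict[int, bool] = field(default_factory=dict)
    idle_time: dict[int, float] = field(default_factory=dict)

    def create(self, address: str) -> int:
        eid = next(self.next_id)
        self.address[eid] = address
        self.busy[eid] = False
        self.idle_time[eid] = 0.0
        return eid

    def destroy(self, eid: int) -> None:
        for table in (self.address, self.busy, self.idle_time):
            table.pop(eid, None)


# Systems: independent functions over the shared component tables.
def idle_clock_system(world: World, dt: float) -> None:
    """Advance idle time for idle processes; reset it for busy ones."""
    for eid, is_busy in world.busy.items():
        world.idle_time[eid] = 0.0 if is_busy else world.idle_time[eid] + dt


def reaper_system(world: World, timeout: float) -> list[int]:
    """Destroy entities that have been idle longer than `timeout`."""
    victims = [eid for eid, t in world.idle_time.items() if t > timeout]
    for eid in victims:
        world.destroy(eid)
    return victims
```

Adding, say, a spawning system means writing another function over the same tables, rather than growing one pool class until it knows about everything.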
We're looking at Kubernetes with great interest for the lessons it teaches about software development. Software maintainability is a long-term problem, one that many try to solve with a container/microservices architecture. At Phusion, we think about maintainability more abstractly. We believe the fundamental issue is modularity, and while microservices help enforce modularity, they're hardly a silver bullet. If you fail to understand modularity as an abstract notion, you'll end up with a "distributed big ball of mud".
ECS gets you modularity because it consists of multiple loosely coupled systems that run separately while operating on a shared data structure. That sounds a ton like Kubernetes, except that Kubernetes isn't one app but a whole distributed system.
The number of deployment units is a red herring: that's not what modularity is about. You can achieve modularity through classes as well. Grasp the abstract principles behind ECS and Kubernetes instead of imitating their structure, and you'll make better decisions about your architecture.
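Modularity through classes can be as simple as depending on a narrow interface instead of a concrete implementation. In the Python sketch below, the hypothetical `ProcessSpawner` protocol is the module boundary: the pool only knows that something can `spawn()`, so a forking spawner, a test double or, in principle, a client for a remote service can be swapped in without touching the pool. Whether that something runs in-process or as a separate deployment unit is a detail on the other side of the boundary. All names here are illustrative.

```python
from __future__ import annotations
from typing import Protocol


class ProcessSpawner(Protocol):
    def spawn(self) -> str:
        """Start an application process and return its address."""
        ...


class Pool:
    """Depends only on the ProcessSpawner interface, not on an implementation."""

    def __init__(self, spawner: ProcessSpawner) -> None:
        self._spawner = spawner
        self.addresses: list[str] = []

    def grow(self) -> str:
        address = self._spawner.spawn()
        self.addresses.append(address)
        return address


class FakeSpawner:
    """A test double; satisfies ProcessSpawner structurally, without inheriting."""

    def __init__(self) -> None:
        self.count = 0

    def spawn(self) -> str:
        self.count += 1
        return f"fake-{self.count}"
```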
Conclusion
Did mapping the dependency graph of Passenger Open Source give us insights we didn't have before? Nah. Will it have implications for how we design our infrastructure going forward? Maybe. Did looking at ECS and Kubernetes bring us closer to the goal of standardizing Passenger, so that more people can contribute, we depend less on Hongli, and we establish design rules? Hell yes!
We won't be moving to Kubernetes and microservices any time soon, but thanks to this experiment we have a clearer picture of our technical 'north star'.