Recently a bug was fixed in Passenger that caused the OOM score adjustment to be inherited by processes spawned by Passenger (your web app). In effect, the OOM score adjustment applied to Passenger's Watchdog[1] process was not reset in the Passenger Core, and was then inherited by the individual app processes that Passenger started.
This was especially problematic because the languages that Passenger currently supports are all high-level, interpreted, garbage-collected scripting languages, which can easily use a lot of memory if you aren't careful, and the operating system wouldn't be able to kill the process to reclaim that memory. I had looked into this issue and hadn't been able to reproduce it; Hongli, however, had reproduced it earlier (and was the one to fix the issue, yay!).
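To make the inheritance concrete, here is a minimal sketch, assuming a Linux system that exposes the per-process adjustment at `/proc/self/oom_score_adj`. The helper names are made up for illustration and are not part of Passenger. The sketch only shows that a spawned child reports the same adjustment as its parent unless something resets it in between.

```python
import subprocess
import sys

# Assumption: Linux exposes the per-process adjustment at this path.
OOM_SCORE_ADJ_PATH = "/proc/self/oom_score_adj"


def read_oom_score_adj(path=OOM_SCORE_ADJ_PATH):
    """Return the OOM score adjustment stored at `path` as an int."""
    with open(path) as f:
        return int(f.read().strip())


def child_oom_score_adj(path=OOM_SCORE_ADJ_PATH):
    """Spawn a child interpreter and return the adjustment it reads.

    With the default path, /proc/self resolves to the child itself,
    so the value returned is the one the child inherited.
    """
    code = "import sys; print(open(sys.argv[1]).read().strip())"
    result = subprocess.run(
        [sys.executable, "-c", code, path],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)
```

Comparing `read_oom_score_adj()` with `child_oom_score_adj()` on a Linux box is a quick way to see whether a parent's adjustment leaks into the processes it starts.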
When debugging Passenger becomes debugging Docker
In testing the fix, I was still unable to reproduce the issue, and thus couldn't confirm that the fix worked. So Hongli and I got together and held a pair debugging session to figure out what was going on. Here's the rough sequence of steps we took to debug the problem:
- Install a version of Passenger known to be affected by the bug
- Add logging to the OOM score adjustment function in Passenger Watchdog
- Check the logs to see if the score is adjusted
- Check that the correct value is being written
- Check that the write succeeds
- Realize that the write fails because my testing environment is based on Docker, which makes spinning up a clean Linux install super fast and easy but doesn't allow writing to /proc
- Retest in a privileged Docker container
- See everything work as expected and verify the bugfix 🎉
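The "check that the write succeeds" step can be sketched as below. This is a hypothetical helper rather than Passenger's actual logging, and it assumes the adjustment lives at `/proc/self/oom_score_adj`. It attempts the write and reports the error code instead of failing silently, which is the kind of output that would have pointed at the container much sooner.

```python
import errno

# Assumption: Linux exposes the per-process adjustment at this path.
OOM_SCORE_ADJ_PATH = "/proc/self/oom_score_adj"


def try_set_oom_score_adj(value, path=OOM_SCORE_ADJ_PATH):
    """Try to write `value` to `path`.

    Returns (True, "ok") on success, or (False, <errno name>) when the
    open or the write is rejected by the operating system.
    """
    try:
        # The buffered write is flushed when the file is closed at the
        # end of the `with` block, so a rejected write is caught here too.
        with open(path, "w") as f:
            f.write(str(value))
    except OSError as e:
        return False, errno.errorcode.get(e.errno, str(e.errno))
    return True, "ok"
```

Calling `try_set_oom_score_adj(-1000)` in both a regular and a privileged container and comparing the results would reproduce the difference we ran into; the exact error name depends on the environment.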
What the what happened?
Interestingly, an app running in a Docker container was immune to the original problem: the OOM score adjustment couldn't be set, so it couldn't be inherited by child apps, and the child apps therefore couldn't become immune to OOM killing. Although this is a Docker security feature, it does leave Passenger's Watchdog vulnerable to being OOM killed, which is a problem for Docker users.
The incident was a good reminder that two pairs of eyes see more than one. We've decided to invest some time in finding the right pair programming setup, since I work remotely. More on that in a separate blog post.
1. The Watchdog is the part of Passenger that quietly restarts applications that crash, hang, or leak memory.