Researching a potential new form of HTTP caching optimization

It's been a while since we released the first beta of Phusion Passenger 5 (codename "Raptor"), the application server for Ruby, Python and Node.js web apps. We have received a lot of great feedback from the community regarding its performance, stability and features.

Passenger 5 isn't production-ready yet, but we are getting close because 5.0 beta 3 will soon be released. But in the mean time, we would like to share a major new idea with you.

While Passenger 5 introduced many performance optimizations and is much faster than Passenger 4, the impact on real-world application performance varies greatly per application. This is because in many cases the overall performance is more dependent on the application than on the app server.

It's obvious that just making the app server itself fast is not enough to improve overall performance. So what else can the app server do? After contemplating this question for some time, we believe we have found an answer in the form of a modified HTTP caching mechanism. Its potential is huge.

Update: Over the course of the day, readers have made us aware that some of the functionality can also be achieved through Varnish and through the use of Edge Side Includes, but there are also some ideas which cannot be achieved using only Varnish. These ideas require support from the app. Please read this article until the end before drawing conclusions.

Please also note that the point of this article is not to show we can "beat" Varnish. The point is to share our ideas with the community, to have a discussion about these ideas and to explore the possibilities and feasibility.

Turbocaching

One of the main new features in Passenger 5 is turbocaching. This is an HTTP cache built directly in Passenger so that it can achieve much higher performance than external HTTP caches like Varnish (update: no, we're not claiming to be better than Varnish). It is fast and small, specifically designed to handle large amounts of traffic to a limited number of end points. For that reason, we described it as a "CPU L1 cache for the web".


Turbocaching is a major contributor of Passenger 5's performance

The turbocache has the potential to improve app performance dramatically, no matter how much work the app does. This is seen in the chart above. A peculiar property is that the relative speedup is inversely proportional to the app's native performance. That is, the slower your app is, the bigger the speedup multiplier you can get from caching. At worst, caching does not hurt. In extreme cases -- if the app is really slow -- you can see a hundred fold performance improvement.

The limits of caching

So far for the potential of caching, but reality is more nuanced. We have received a lot of feedback from the community about the Passenger 5 beta, including feedback about its turbocache.

As expected, the turbocache performs extremely well in applications that serve data that is publicly cacheable by everyone, i.e. they do not serve data that is login-specific. This includes blogs and other sites that consist mostly of static content. The Phusion Passenger website itself is also an example of a mostly static site. But needless to say, this still makes the turbocache's usefulness rather limited. Most sites serve some login-specific data, even if it's just a navigation bar displaying the username.

However, these limitations apply to all other HTTP caches too. For example, the Varnish HTTP cache has been used very successfully to speed up Wordpress. But Varnish doesn't help a lot with logged-in traffic.

Even the CloudFlare CDN -- which is essentially a geographically distributed HTTP cache -- does not help a lot with logged-in traffic. Although CloudFlare can reduce the bandwidth between the origin server and the cache server through its Railgun technology, it doesn't reduce the load on the origin server, which is what we are after.

Update: some readers have pointed out that Varnish supports Edge Side Include (ESI), which is like a text postprocessor at the web server/cache level. But using ESI only solves half of the problem. Read on for more information.

A glimmer of hope

Hope is not all lost though. We have identified two classes of apps for which there is hope:

  1. Apps which have more anonymous traffic than logged in traffic. Examples of such apps include Ted.com, Wikipedia, Imgur, blogs, news sites, video sites, etc. Let's call these mostly-anonymous apps. What if we can cache responses by user, so that anonymous users share a single cache entry?
  2. Apps which serve public data for the most part. Examples of such apps include: Twitter, Reddit, Discourse, discussion forums. Let's call these mostly-public apps. Most of the data that they serve is the same for everyone. There are only minor variations, e.g. the a navigation bar that displays the username, and secured pages. What if we can cache the cacheable content, and skip the rest?

Class 1: caching mostly-anonymous apps

There is almost a perfect solution for making apps in the first class cacheable: the HTTP Vary header. This header allows you to send a different cached response, based on the value of some header that is sent by the client.

For example, suppose that your app...

  • ...serves gzip-compressed responses to browsers that support gzip compression.
  • ...serves regular responses to browsers that don't support gzip compression.

You don't want a cache to serve gzipped responses to browsers that don't support gzip. Browsers tell the server whether they support gzip by sending the Accept-Encoding: gzip header. If the application sets the Vary: Accept-Encoding header in its responses, then the cache will know that that particular response should only be served to clients with the particular Accept-Encoding value that it has received now.

HTTP caching Vary header effect
The Vary response header makes HTTP caches serve different cached responses based on the headers the browsers send.

In theory, we would be able to cache responses differently based on cookies (Vary: Cookie). Each logged in user would get its own cached version. And because most traffic is anonymous, all anonymous users can share cache entries.

Unfortunately, on the modern web, cookies are not only set by the main site, but also by third-party services which the site uses. This includes Google Analytics, Youtube and Twitter share buttons. The values of their cookies can change very often and often differ on a per-user basis, probably for the purpose of user tracking. Because these widely different values are also included in the cache variation key, they make it impossible for anonymous users to share cache entries if we were to try to vary the cache by the Cookie header. The situation is so bad that Varnish has decided not to cache any requests containing cookies by default.

Even using Edge Side Include doesn't seem to help here. The value of the cookie header can change quickly even for the same user, so when using Edge Side Include the cache may not even be able to cache the previous user-specific response.

The eureka moment: modifying Vary

While the Vary header is almost useless in practice, the idea of varying isn't so bad. What we actually want is to vary the cache by user, not by the raw cookie value. What if the cache can parse cookies and vary the cached response by the value of a specific cookie, not the entire header?

And this is exactly what we are researching for Passenger 5 beta 3. Initial tests with a real-world application -- the Discourse forum software -- show promising results. Discourse is written in Ruby. We have modified Discourse to set a user_id cookie on login.

# lib/auth/default_current_user_provider.rb

TURBOCACHE_COOKIE = "user_id"

def log_on_user(user, session, cookies)
  ...
  cookies.permanent[TURBOCACHE_COOKIE] = { value: user.id, httponly: true }
  ...
end

We modified Passenger to parse cookies, and to vary turbocache responses based on the value of this user_id cookie. We invoke Passenger like this:

passenger start --vary-turbocache-by-cookie user_id

This changeset can be seen in Github commit a760649cd7.

View commit on Github

WARNING: this feature is totally experimental. You should not use it in production yet. There are some security concerns which we haven't addressed yet.

The result is a Discourse where all anonymous users share the same cache entry. Uncached, Discourse performance is pretty constant at 97 req/sec no matter which app server you use. But with turbocaching, performance is 19 000 req/sec.

This is caching that Varnish and other "normal" HTTP caches (including CloudFlare) could not have done. The benefit that turbocaching adds in this scenario is exactly in line with our vision of a "CPU L1 cache" for the web. You can still throw in Varnish for *extra caching on top of Passenger's turbocaching, but Passenger's turbocaching's provides an irreplaceable service.

* Maybe Varnish's VCL allows it, but we have not been able to find a way so far. If we're wrong, please let us know in the comments section.

Class 2: caching mostly-public apps

Apps that serve pages where most data is publicly cacheable, except for small fragments, appear not to be cacheable at the HTTP level at all. Currently these apps utilize caching at the application level, e.g. using Rails fragment caching or Redis. View rendering typically follows this sort of pseudo-algorithm:

send_response("<nav>Welcome " + current_user_name "!</nav>")
cached_fragment = fetch_from_cache("rest_of_page")
if cached_fragment != null
    send_response(cached_fragment)
else
    fragment = render_rest_of_page()
    store_in_cache("rest_of_page", fragment)
    send_response(fragment)

However, this still means the request has to go through the application. If there is a way to cache this at the Passenger level then we can omit the entire application, boosting the performance even further.

We've come to the realization that this is possible, if we change the app into a "semi single page app":

  1. Instead of rendering pages on the server side, render them on the client side, e.g. using Ember. This way, the view templates can be simple static HTML files, which are easily HTTP cacheable.
  2. PushState is then used to manipulate the location bar, making it feel like a regular server-side web app.
  3. The templates are populated using JSON data from the server. We can categorize this JSON data in two categories:
    1. User-independent JSON data, which is HTTP-level cacheable. For example, the list of subforums.
    2. User-specific JSON data, which is not HTTP-level cacheable. For example, information about the logged in user, such as the username and profile information.

      And here lies the trick: we only load this data once, when the user loads the page. When the user clicks on any links, instead of letting the browser navigate there, the Javascript loads the user-independent JSON data (which is easily cacheable), updates the views and updates the location bar using PushState.

By using this approach, we reduce the performance impact of non-cacheable fragments tremendously. Normally, non-cacheable page fragments would make every page uncacheable. But by using the approach we described, you would only pay the uncacheability penalty once, during the initial page load. Any further requests are fully cacheable.

And because of the use of HTML PushState, each page has a well-defined URL. This means that, despite the app being a semi-single-page app, it's indexable by crawlers as long as they support Javascript. GoogleBot supports Javascript.

Discourse is a perfect example of an app that's already architected this way. Discourse displays the typical "navigation bar with username", but this is only populated on the first page load. When the user clicks on any of the links, Discourse queries JSON from the server and updates the views, but does not update the navbar username.

An alternative to this semi-single page app approach is by using Edge Side Include technology, but adoption of the technology is fairly low at this point. Most developers don't run Varnish in their development environment. In any case, ESI doesn't solve the whole problem: just half of it. Passenger's cookie varying turbocaching feature is still necessary.

Even when there are some protected/secured subforums, the turbocache cookie varying feature is powerful enough make even this scenario cacheable. Suppose that the Discourse content depends on the user's access level, and that there are 3 access levels: anonymous users, regular registered members, staff. You can put the access level in a cookie, and vary the cache by that:

cookies.permanent[TURBOCACHE_COOKIE] = { value: user.access_level, httponly: true }

That way, all users with the same access level share the same cache entry.

Due to time constraints we have not yet fully researched modifying Discourse this way, but that leads us to the following point.

Call for help: please participate in our research

The concepts we proposed in this blog post are ideas. Until tested in practice, they remain theory. This is why we are looking for people willing to participate in this research. We want to test these ideas in real-world applications, and we want to look for further ways to improve the turbocache's usefulness.

Participation means:

  • Implementing the changes necessary to make your app turbo-cache friendly.
  • Benchmarking or testing whether performance has improved, and by how much.
  • Actively working with Phusion to test ideas and to look for further room for improvements. We will happily assist active participants should they need any help.

If you are interested, please send us an email at info@phusion.nl and let's talk.

comments powered by Disqus