Ultra-Performant Dynamic Websites with Varnish

This article describes how we configured and used the Varnish web cache for the popular German online shop www.lidl.de. Varnish gave us a tremendous performance boost. With this new caching setup, we easily achieve request rates of several thousand pages per second, which are quite common during marketing campaigns like special offers.

In a typical non-caching setup of a web application as illustrated in the figure below, Apache handles static requests for images, scripts, etc. and forwards requests for the HTML pages to an application server like Tomcat or Glassfish. There the dynamic content is generated and then sent back to Apache and finally to the user. In this scenario, the database access is the most critical bottleneck. Even worse, each page request can cause multiple database requests, i.e. SQL statements.

Initial setup without our caching solution: the slow components are shown in red. (Load balancing, which could be done by Apache, is not considered here.)

Let’s assume that without caching, an application server can serve up to 100 dynamic pages per second. Through a bit of horizontal scaling, i.e. using two server instances (nodes) and load balancing, this can be increased to about 200 dynamic requests per second. However, this scalability is not perfect: with three or more nodes it starts to degrade, as the sessions have to be distributed among the nodes of the cluster.

The system can of course handle many more simultaneous users than the number 200 suggests, as users do not click links continuously. So the number of users did not really pose a problem during normal operation. However, the situation immediately became critical when newsletters with special offers were sent, as the application server instances came under “siege”. The overloaded instances responded more and more slowly, and customer satisfaction decreased. Another reason a shop wants to be responsive is that search engines take the response times measured during crawling into account when ranking search results.

So the question we had to solve was: How can we keep the system responsive (ideally with a response time of 1-2 seconds) during high load and peak situations? Please be aware that in the case of online shops, the highest turnover occurs in these situations.

When we analyzed the server log of the www.lidl.de online shop, we noticed an interesting fact, which we used to our advantage later on: user behavior is different in these situations. Most users are just browsing and reading. Consider e.g. a newsletter sent to a few million users: most of the readers will just click a few links (which can still easily amount to several million page impressions). Taking a closer look, we found that most users were viewing exactly identical content which had nevertheless been generated individually for each of them. Only a small percentage used the interactive services of the website like shopping carts, ordering etc.

Introducing Varnish

The peak situation described above implies that most content (even though dynamically generated by the web application) is identical for all users. So the obvious idea for a cache is to store the most frequently requested pages. The Varnish manual describes Varnish as a lightweight, efficient reverse proxy server, meaning that it works in front of the web server (Apache). It acts as a so-called HTTP accelerator which stores (caches) copies of the pages served by the web server (hence the synonym “web cache”). The next time the same page is requested by a user, Varnish serves the copy instead of requesting the page from the Apache server again. Varnish is blazingly fast, since it stores its cached data in memory.

The new architecture with Varnish as a web cache now looks like this:

Varnish in front of Apache acting as a Web cache. It is configured to cache only stateless page requests. Stateful page requests (session) and static resources are forwarded to Apache.

Performance Improvements

Caching with Varnish removes the need for the web application to regenerate the same page over and over again, resulting in a tremendous performance boost. Varnish can easily handle 10,000 requests/s on a single node. Especially in high load situations, the hit rate is easily above 90% (and almost 100% for the most frequently clicked homepage), so the setup described above can now handle 50 times the original volume. However, this high performance only holds for stateless users. Any user with a session falls back to the 100 requests/s class.

As most of the load is now taken by the Varnish cache servers, the load on the application servers has dropped considerably. Even in high load situations where the Varnish servers handle several thousand requests per second, most of the content comes from the cache and the application servers can concentrate on re-creating expired content (which is then kept in the cache for s-maxage seconds) and handling users with a session (who are hopefully going to order).

Our setup leads to a significantly improved end-to-end performance of the system – even during normal operation. This is interesting as it creates an advantage for users during normal operation and saves money for the website owner at the same time.

Using less hardware means investing less money initially. An even more important fact is that the operating costs will also be much lower. These costs stem from the permanent maintenance of the system: powering servers around the clock, updating, applying patches etc. Since they are the main drivers of the total cost of ownership (TCO), the potential savings are also largest here.

Using fewer servers also means consuming less power. By reducing the energy bill this “green IT” approach therefore leads to lower operating costs. Compared to extending the existing system without a cache, an enormous amount of money was saved both in hardware and operating costs, while introducing a “performance buffer” for situations with even higher loads at the same time.

Another effect is that the shop’s marketing division can now act freely without having to keep technical constraints in mind: new campaigns can be planned to increase the turnover significantly, like sending more frequent newsletters, using special offers etc.

Challenges

Before we dive into the details of our Varnish configuration, let’s first discuss three problems we had to solve, specifically handling stateful users, keeping users stateless w.r.t. caching as long as possible, and caching pages with changing content.

Problem: Websites are Stateful

Most websites nowadays are stateful, e.g. a server-side session is created when a user logs in. In case of an online shop, the session might contain the shopping cart, login information etc.

The problem is that as soon as the session contains personalized information, caching must immediately stop. But, as long as state information does not have an effect on the content of generated pages, it can be ignored. This is what we call a stateless or browsing user, and our first objective should be to cache pages suitable for this user class.

Thus, our solution is to classify users, i.e. to carefully distinguish between stateless and stateful users. As the web application did not originally take care of that, it had to be changed in two fundamental ways:

  1. The application must only generate and send cookies if it has created some internal state for a user.
  2. This state transition can happen at any time. So a user who has not even touched the application server and is completely unknown to the application must be able to become a stateful user at any time.

Two classes of users are distinguished by certain attributes. A user should stay stateless as long as possible. Stateful (red) users will need contact to the application server and experience slower performance.

Fortunately, the web application was already obeying the REST paradigm. HTTP GET requests were used for all content that was just shown to the users. In contrast to this, all user actions which were actually creating some state on the server side were already modeled in HTTP POST requests. This proved to be extremely helpful when we started to configure the cache software.

Keeping Users Stateless

The general goal must be to keep users stateless for as long as possible. In a first, naive approach, this is the only thing that makes caching possible at all.

Keeping users stateless means that the server should never send a session cookie unless really necessary. On the other hand, a lot of web applications require some basic personalization. This dilemma can be solved by using cookies which are evaluated on the client side only. For example, let’s assume that users can change the background color of the website as a very simple form of personalization. This can be done in Javascript and, for the sake of caching (and achieving a high hit rate), this should be the preferred way of implementing simple personalization. Of course, a cookie evaluated on the server side could be used to get the same result. But the cache hit rate would then suffer considerably (to be exact, by a factor equal to the number of background colors, since a separate cached copy has to be stored for each of them).

So one recipe for staying stateless is to keep simple state on the client side and never send it to the server. This state does not necessarily have to reside in a cookie – you can also use the browser’s local storage for that, as described in Smashing Magazine’s “Using Local Storage In HTML5-Capable Browsers” article.

Dealing with Content that is Changing

Even if now all stateless users can see the same cached content, this content is changing over time. In an online shop, for example, some products might run out of stock and become unavailable or need to be replaced by other products. Unfortunately, this does not only affect the product pages themselves but sometimes also pages that reference them; e.g. links and thumbnail images will have to be changed or removed. Similar situations often occur in online publishing and in nearly all websites which change over time.

Thus, another requirement for the cache is its ability to partially expire content. And of course, the bookkeeping must be performed externally so that the affected pages can be removed individually.

For the cache to work properly and perform automatic expiration of content, it needs to know how long the currently cached content should be kept (i.e. its maximal age). The web application therefore has to generate this so-called time-to-live (TTL) information.

The HTTP specification defined response header fields such as Cache-Control for exactly this purpose a long time ago. These are set by the web application itself, since it knows best how long the content will be considered “current”/“valid”. This setting could even be dynamic, e.g. giving a shorter time-to-live to a product page if stock is low. The Cache-Control directive most suitable for this purpose is s-maxage, as it specifies, in seconds, how long a shared cache such as Varnish is allowed to keep the response.
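
As an illustration (the concrete values are made up), a product page could be delivered with a response header such as

Cache-Control: s-maxage=600, max-age=60

which tells a shared cache like Varnish to keep the copy for ten minutes, while browsers revalidate after one minute.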

Determining Cacheable Candidates

Not all content can or even should be cached. Completely static websites are by far the easiest to cache, but they tend to be very unattractive and could simply be pre-generated and moved to the web server. As the cache sits in front of the web server, all requests go to the cache first. It does not make much sense to store pages in the cache which are kept statically in the web server’s file system anyway.

On the other hand, only GET URLs can be candidates for caching. As a POST request transmits information from the browser to the server, it cannot be cached and must always be handled by the application server. This might sound like a big constraint at first but is actually a feature that can be nicely utilized: all URLs which are candidates for performing the state transition of a user from stateless to stateful will be POST requests. And consequently, the application itself can decide whether the POST requests actually qualify for making a user stateful or whether s/he can remain stateless, for example when a wrong login/password combination is entered.
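
In VCL, the rule that only GET (and HEAD) requests are cache candidates can be expressed in a few lines of vcl_recv (a minimal sketch; the complete configuration is shown further below):

sub vcl_recv {
  # Only GET and HEAD requests are candidates for caching;
  # everything else (POST, PUT, ...) is passed straight to the backend.
  if (req.request != "GET" && req.request != "HEAD") {
    return (pass);
  }
}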

Anatomy of Varnish’s Request Processing

Varnish distinguishes three stages when processing a request:

  • The request is received from the browser (vcl_recv).
    At this stage, Varnish calls the subroutine vcl_recv in the configuration file (VCL). Here, the request header can be manipulated e.g. by removing cookies. It can be decided whether the content should be looked up in the cache or be propagated to the backend server.
  • The response is received from the backend (vcl_fetch).
    This function is only executed when the content is not delivered from the cache. In this phase, response headers from the backend can be modified (either for delivery or for saving in the cache). The request attributes are also still available and can be used for manipulating several settings.
  • The response is sent to the browser (vcl_deliver).
    Every request passes through this stage; it can be used to add headers (like TTL information), change cookies etc. The request parameters are available for reading.

Different stages of Varnish's request processing. Everything related to the cache is in red, i.e. all cacheable content is looked up in the cache and possibly delivered; if it's not in the cache, the web server will be asked via vcl_fetch.

Varnish defines additional subroutines which also hook into the Varnish workflow, but they are not as important. See also the VCL tutorial and the VCL reference.

A Sample Varnish Configuration (VCL)

This section contains a simple Varnish configuration that provides caching as required. The challenge is to keep the user stateless as long as possible. In order to achieve this, a simple trick is used: if a request does not contain a JSESSIONID cookie, it is a stateless request, and even if the (uneducated) backend wants to set a cookie, it will be removed. Only POST requests are allowed to set the necessary cookies. Manipulating the TTL complements the configuration. A lot of logging is used in the example; this is not just for illustrative purposes but also practical for debugging and optimizing the configuration.

import std;

backend default {
    .host = "localhost";  # Varnish is running on same server as Apache
    .port = "80";
}

sub vcl_recv {
  # remove unnecessary cookies
  if (req.http.cookie ~ "JSESSIONID") {
    std.log("found jsessionid in request, passing to backend server");
    return (pass);
  } else {
    unset req.http.cookie;
  }
}

sub vcl_fetch {
  if (req.http.cookie ~ "JSESSIONID" || req.request == "POST") {
    std.log("not removing cookie/passing POST, url " + req.url);
    return (pass);
  } else {
    # remove all other cookies and prevent backend from setting any
    std.log("removing cookie in url " + req.url);
    unset beresp.http.set-cookie;
    set beresp.ttl = 600s;
  }
}

sub vcl_deliver {
  # send some handy statistics back, useful for checking cache
  if (obj.hits > 0) {
    set resp.http.X-Cache-Action = "HIT";
    set resp.http.X-Cache-Hits = obj.hits;
  } else {
    set resp.http.X-Cache-Action = "MISS";
  }
}

Notice the C-like syntax in the Varnish configuration. This is no accident; in fact, the whole configuration is translated to C and compiled into a binary shared object at startup (and whenever the configuration is reloaded) to optimize for performance. As the subroutines in this configuration are called for each request, this helps immensely in creating a fast cache server. Moreover, it is possible to add C code directly to the configuration.
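
As a small illustration of the latter (a sketch along the lines of the classic syslog example, not something the lidl.de setup requires), inline C is wrapped in C{ ... }C blocks and ends up in the same shared object as the rest of the VCL:

C{
  #include <syslog.h>
}C

sub vcl_deliver {
  C{
    /* plain C, compiled together with the generated VCL code */
    syslog(LOG_INFO, "VCL is delivering a response");
  }C
}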

It might seem strange at first to define the configuration in a procedural language, but it proved to be extremely valuable as it enables us to be flexible and to formulate how exactly to handle the requests. Overall, this leads to a much more readable configuration than a declarative approach.

Notice the different “top level” objects in the configuration file:

  • req: the request (i.e. the URL including all headers) coming from the browser.
  • resp: the response before it is sent to the client, i.e. while it can still be manipulated.
  • beresp: the response which Varnish gets from the backend (if the object is not cacheable or not yet cached); it can be evaluated as well.

On a side note, Varnish can use ACLs to restrict access to certain resources. The same ACLs can also be used to (declaratively) control who may invalidate content; one such technique is “banning”, which expires all cached objects matching a given expression. Varnish can also (atomically) delete individual elements from the cache. This is accomplished via a “purge” command through the HTTP interface, and access to it should be restricted to trusted IP addresses (which is the standard configuration, together with a secret).
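
A sketch of a purge handler protected by an ACL could look as follows (the addresses are placeholders; the pattern follows the examples in the Varnish 3 documentation):

acl purgers {
  "127.0.0.1";        # addresses allowed to purge (placeholders)
  "192.168.1.0"/24;
}

sub vcl_recv {
  if (req.request == "PURGE") {
    if (!client.ip ~ purgers) {
      error 405 "Purging not allowed.";
    }
    return (lookup);
  }
}

sub vcl_hit {
  if (req.request == "PURGE") {
    purge;
    error 200 "Purged.";
  }
}

sub vcl_miss {
  if (req.request == "PURGE") {
    purge;
    error 200 "Purged (not in cache).";
  }
}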

Configuration Details and Tips

Now that we have seen the basic VCL file and understood how a request is usually processed, let’s dive in even further and discuss the details and lessons learned.

Improving the Hit rate with Header Normalization

Varnish has to be told which HTTP request header fields it should use as a cache index. The index is organized as a hash, thus these selected header fields are often referred to as the hash key.

On a side note, you can select the header fields to be used for the hash key by implementing the subroutine vcl_hash. If you don’t implement it, Varnish uses the full URL plus the Host request header field by default. In addition to the hash key computed in vcl_hash, the Vary header field is always taken into account automatically. For further information on the hash key, see “What Varnish Does” and “Varnish and http header” on Stackoverflow.
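
As a sketch (the cookie name “region” is hypothetical), a custom vcl_hash that reproduces the default behavior and additionally caches separate regional variants of a page could look like this in Varnish 3 syntax:

sub vcl_hash {
  hash_data(req.url);
  if (req.http.host) {
    hash_data(req.http.host);
  } else {
    hash_data(server.ip);
  }
  # additionally hash a region cookie so that each region gets its own cached copy
  if (req.http.Cookie ~ "region=") {
    hash_data(regsub(req.http.Cookie, ".*region=([^;]+).*", "\1"));
  }
  return (hash);
}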

To improve the cache hit rate, it is crucial to clean up the request header fields used for the hash key. Cleaning up means changing them to a common denominator (so-called header normalization). A very good candidate is of course the Host header field, where a normalized version (like “www.sitename.com”) should be used even if “sitename.com” is sent in the request header. In addition to that, removing unnecessary headers is always a good idea.

Be careful that the application server does not send a Vary header field for the user agent, as this effectively means that a distinct copy has to be kept for each user agent string. There are so many different browsers and versions (http://panopticlick.eff.org/) that this basically makes caching useless. See also “Understanding the HTTP Vary Header and Caching Proxies (Squid, etc.)” and the Varnish Documentation on Vary.
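
A sketch of both normalizations (the host names are placeholders; stripping Vary: User-Agent is a deliberate trade-off, not something every site can afford):

sub vcl_recv {
  # Normalize the Host header so that "sitename.com" and "www.sitename.com"
  # map to a single cached copy.
  if (req.http.host ~ "(?i)^sitename\.com$") {
    set req.http.host = "www.sitename.com";
  }
}

sub vcl_fetch {
  # If the backend insists on varying by user agent, drop the Vary header;
  # the price is that all browsers receive the same markup.
  if (beresp.http.Vary ~ "User-Agent") {
    unset beresp.http.Vary;
  }
}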

Compression

The Accept-Encoding request header field plays an important role: different browsers send different values such as “gzip”, “deflate” or combinations of the two. Unfortunately, Internet Explorer prefers the deflate encoding while all other browsers favor gzip. Without intervention, this leads to different copies of the same content in the cache, one in deflate format, the other in gzip format.

Since the request header can be modified on the fly in the vcl_recv subroutine, we can ensure that only one variant of the content is cached. In your VCL you can modify the request header field and use gzip exclusively whenever it is available (which is true both for Internet Explorer and for the other browsers). This technique is presented in detail in the article “Normalize Accept-Encoding header”. Since both browser families have a market share of roughly 50%, this simple change effectively doubles the hit rate.

Please note that beginning with Varnish 3.0, Varnish supports gzip natively and can modify the Accept-Encoding field by itself, so the measures discussed in the previous paragraph can be skipped.
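
For Varnish versions before 3.0, the normalization can be done with a few lines in vcl_recv (a sketch following the referenced article):

sub vcl_recv {
  if (req.http.Accept-Encoding) {
    if (req.http.Accept-Encoding ~ "gzip") {
      # gzip is understood by practically every browser, so prefer it
      set req.http.Accept-Encoding = "gzip";
    } elsif (req.http.Accept-Encoding ~ "deflate") {
      set req.http.Accept-Encoding = "deflate";
    } else {
      # unknown encoding: request an uncompressed page from the backend
      unset req.http.Accept-Encoding;
    }
  }
}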

Handling Cookies

Cookies basically fall into four categories:

  • Cookies relevant for caching: These should be kept, and their values can be used as part of the hash key for the cache index.
  • Cookies irrelevant for caching: These should be discarded and not considered by the cache.
  • Cookies partially relevant for caching: These should be modified and the irrelevant parts removed. The remaining cookie should then be used as part of the hash key for the cache index.
  • Session cookies: These must be treated differently as they basically make caching impossible. If such a cookie is detected, Varnish should not cache anything but work as a plain proxy, sending data from the backend server directly to the client.
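
A sketch of how the first three categories can be handled in vcl_recv (the cookie names are only examples):

sub vcl_recv {
  if (req.http.Cookie) {
    # session cookie present: do not cache, let the backend handle the request
    if (req.http.Cookie ~ "JSESSIONID") {
      return (pass);
    }
    # drop cookies that are irrelevant for caching, e.g. Google Analytics cookies
    set req.http.Cookie = regsuball(req.http.Cookie, "(^|; )__utm[a-z]+=[^;]*", "");
    # if nothing relevant is left, remove the header entirely
    if (req.http.Cookie ~ "^\s*$") {
      unset req.http.Cookie;
    }
  }
}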

Consistent Values for TTL and the Expires Field

Varnish has to decide whether and for how long to keep elements in the cache. As we have already learned, the Cache-Control header field is used for this. More specifically, the s-maxage directive (or max-age as a fallback if s-maxage is not present) is examined to determine the maximum lifetime of a cacheable object. Of course, this only works as long as the cache is not full; otherwise objects are evicted according to an LRU strategy.

If the web application was not designed with a web cache in mind, it might send conflicting values in s-maxage and the Expires response header field. (See the HTTP specification for a discussion of Expires versus max-age.) This can lead to the bizarre situation that Varnish serves cached content with an Expires value that lies in the past, namely when s-maxage allows a longer lifetime than Expires.

This weird behavior can be fixed in several ways, e.g. by setting the Expires header in Varnish for each request to a point s-maxage seconds in the future during vcl_fetch. This increases the cache efficiency on the browser side and leads to a more responsive website.
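
Generating a correct Expires date in pure VCL is awkward, so a simpler alternative (a sketch, not necessarily the exact fix used for lidl.de) is to drop the conflicting header and let browsers rely on Cache-Control instead:

sub vcl_fetch {
  # if the backend sends both s-maxage and a (possibly conflicting) Expires date,
  # remove Expires so that clients fall back to Cache-Control
  if (beresp.http.Expires && beresp.http.Cache-Control ~ "s-maxage") {
    unset beresp.http.Expires;
  }
}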

File Descriptors

In our first tests the solution performed well, but not excellently. But even more critical were the many dropped connections, i.e. requests from browsers that did not even reach Varnish.

The reason and the fix were simple: the number of file descriptors had to be increased. This is even more important in real-life situations where connections tend to be slow, as each TCP connection consumes one file descriptor. It does not hurt to allow 32768 descriptors for Varnish.

Monitoring

If you have set up a web cache solution with Varnish, it is important to measure its performance and especially to monitor the hit rate of the cache. This turns out to be a bit complicated since, for performance reasons, the Varnish log is not written to disk; instead, Varnish logs to a circular buffer residing in a shared memory segment. The circular buffer can be read at any time, but past values vanish forever. Since we wanted a monitoring solution that would also allow us to perform a post-mortem analysis in case of a problem, we configured the logging to write the circular buffer to a persistent file.

The most relevant tools for monitoring Varnish are:

  • varnishlog: This shows current requests from the logging ring buffer. By default, entries appear in the order in which they are logged, so the phases of different requests are interleaved; this can be changed with appropriate options.
  • varnishtop: This shows a continuously updated ranking of the most frequent log entries, e.g. the most requested URLs or the most common user agents, and can be used to spot hot spots and optimize the configuration.
  • varnishhist: This is easily the most intuitive and graphical tool for analyzing Varnish. It shows a (text) histogram of the response time distribution and thus gives a good overview of how the whole system is performing.
  • varnishstat: This shows important statistical information about hit rates, total cache hits, accepted client connections etc.
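
To get a quick feeling for the hit rate, the relevant counters can be dumped once with varnishstat (counter names as in Varnish 3; the hit rate is roughly cache_hit divided by cache_hit plus cache_miss):

varnishstat -1 | egrep "cache_hit|cache_miss|client_req"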

Why Varnish is the best Caching Solution (for us)

When we began to investigate ways to speed up the www.lidl.de site, our first choice was to add the mod_cache caching module to the Apache web server already in use. The first hurdle was the declarative configuration; it is well suited for a web server but not ideal for modeling caching behavior. After some fiddling around, it was working smoothly. More serious problems arose from the fact that certain cookies had to be considered and others had to be ignored. It was impossible to find a viable solution, so the offending cookie was filtered out by the load balancer. Cache invalidation is performed lazily in Apache, i.e. an outdated resource is removed from the cache only when it is requested again. Consequently, outdated resources which are not requested stay in the cache forever and can only be expired externally. As each cached object is kept in its own file, this external expiry is slow and the whole process complicated. For our situation, Apache was not a good solution (although it was in use for quite some time) and the hit rates were also rather disappointing.

So our search continued. Via dedicated proxy servers such as Squid, which are better suited to large client-side installations, we finally arrived at Varnish, an HTTP accelerator built specifically for server-side caching. Varnish is already used by many big websites like Facebook, Twitter (Search) and Hulu.

Varnish is very flexible as it offers procedural configuration of all request stages in a C-like language (which is actually translated to C and compiled at start time to be as efficient as possible). This enables creative cookie handling and all the other tricks which are usually needed in such a scenario. Varnish was designed to make use of the operating system’s virtual memory subsystem, so all cached objects live in a single memory-mapped file and can be accessed extremely fast. Varnish handles expiry automatically and correctly and is also much faster than Apache. So the decision was made to go with Varnish.

Other HTTP accelerators were also considered but proved not to be feasible. One example is Oracle Web Cache, a commercial software package from Oracle; the problem here is that the cache cannot grow easily and that the manipulation of requests and responses is limited. A hardware-based alternative is e.g. F5’s BIG-IP WebAccelerator.

Further Optimizations

Below is a discussion of measures that build on a Varnish setup and would speed up page delivery even further.

Using a CDN to increase Scale, Reach & Performance

CDNs take care of delivering the static content while the dynamic content is served via the usual stack. They work in an inherently distributed way and have clever algorithms to select the topologically nearest server for each user. Static and dynamic content can be separated by using virtual web servers with different hostnames. The PDF article “Globally distributed content delivery” from Akamai provides an excellent introduction.

Most CDNs offer an API for invalidating all or part of the content and respect the Expires header field sent from the originating servers. The Varnish server can thus act as a central content repository and will be the upstream server for refreshing the CDN.

Almost all traffic would then be served by the CDN. This saves a lot of bandwidth on the Varnish server and the Gigabit interface will not so easily be overloaded. Moreover, as traffic costs in the CDN are negligible, money can be saved as the hosting company does not have to increase its own upstream link. For more information on how to build a CDN see “How to build your own CDN using BIND, GeoIP, Nginx, and Varnish”.

ESI: Caching Page Fragments with diverse TTL

From a technical point of view, only pages requested with the GET method can be cached at all. This is due to the fact that, by definition, POST requests change state on the server and therefore necessarily have to reach the application server.

However, the solution described above performs rather conservative caching, since it simply stops caching as soon as a session cookie is present. The effect is that stateful users never get cached pages and therefore might have to wait longer for a page to render completely. On the other hand, it does not make sense to cache pages for individual users, since it is quite unlikely that the same user will come back to exactly the same page. Even if the user did come back, it would not be safe to assume that the page is still up to date (e.g. the shopping cart might have changed in the meantime).

To speed things up again, a compromise needs to be found between caching invariant fragments of a page and producing personalized content on the fly for stateful users. Fortunately, Varnish offers the correct arsenal to perform exactly this decomposition by leveraging Edge Side Include (ESI).

When Varnish processes ESI tags, the page assembly (out of fragments) is done by Varnish. As these fragments are separate web resources (requested through GET or POST) they can be assigned their own cache settings and handling information. For example, a cache time-to-live (TTL) of several days could be appropriate for the template, but a fragment containing a frequently-changing story or ad may require a much lower TTL. Some fragments may have to be marked uncacheable.
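
In Varnish 3, ESI processing is enabled per response in vcl_fetch; the fragment path below is purely illustrative:

sub vcl_fetch {
  # let Varnish parse ESI tags in HTML pages coming from the backend;
  # the backend page would contain a tag such as
  #   <esi:include src="/fragments/cart" />
  # and the fragment /fragments/cart then gets its own (short or zero) TTL
  if (beresp.http.Content-Type ~ "text/html") {
    set beresp.do_esi = true;
  }
}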

It must be carefully analyzed what the decomposition of a page should look like, as getting it right is essential to achieve a high hit rate and a low overhead. In the case of an online shop, the page could e.g. consist of different (graphical) fragments:

Decomposition of a typical page into user-specific, dynamic (red) and static (blue) fragments.

The shopping cart and the login details would then be transferred directly from the application server via an appropriate ESI fragment, whereas the rest of the page is identical for all users and can be stored in the cache. To minimize the number of requests from Varnish to the application server, both fragments can be transferred in a single response and placed at their respective positions on the page on the client side or via CSS.

Compared to the performance numbers above, the stateful performance is much higher when using ESI. Rates of about 500 stateful requests per second are now easily possible.

Memcached: Caching Session-specific Page-Fragments

If you examine the page diagram above, you might notice that even though the shopping cart and login details are user-specific elements on the page, they are not very dynamic, i.e. they change infrequently.

This leads to an opportunity for further optimization: the user-specific fragments can also be stored, but must of course be associated with the session of the corresponding user. As the information is not persistent (it becomes invalid when the session is invalidated), it can be stored in memory. Memcached is made for exactly this scenario and therefore a perfect fit; see e.g. the article “Storing Sessions in Memcache”.

Any change in the shopping cart or login details will trigger a regeneration of the HTML fragments which will then be stored in memcached. (This can be done in the same POST request by the application server.) Varnish will include the fragment from Memcached (either via direct integration, via Apache or via Nginx). A SessionListener within Tomcat can take care of removing stale sessions from Memcached.

Memcached is extremely fast. Even for stateful users this leads to a performance of well above 5,000 GET requests/s. POST requests are a different story as they still have to be handled by the application server. As they perform only internal tasks and write both to the database and Memcached, a rate of 500 requests/s is nonetheless realistic.

Responses to “Ultra-Performant Dynamic Websites with Varnish”

  1. Roelof says:

    Thanks, nice article. I’m wondering how personalization with respect to the “Postleitzahl” (postal code) is handled. This would still leave cacheable pages with regional differentiation. Does that mean not removing the “Postleitzahl” cookie in vcl_recv, and doing this only for a selection of pages?

  2. Dr. Christian Winkler says:

    Very good question, Roelof.

    In fact regionalization via Postleitzahl is handled purely on the client side via a cookie which is evaluated in Javascript. This cookie contains only information about the address of the store and no other personalized information. It is removed in vcl_recv.

    Regionalization of the product line and product attributes is handled differently by a separate cookie which contains the region of the user (there are only a few). This cookie is used by Varnish to cache separate versions of the page.

  3. John K says:

    FYI, the default number of file descriptors available to Varnish is 131072 now (more than the 32768 you suggested)
