Customizing Varnish VCL on Mercury Server

When getting started with Varnish on Mercury (1.1), I decided to use two resources for VCL logic:
  1. Nate Haug's Lullabot article "Configuring Varnish for High-Availability with Multiple Web Servers" (my implementation only uses parts of it)
  2. Stewart Robinson's HIT/MISS header set in sub vcl_deliver
After I got the first site set up, I added a backend and routing logic to Varnish to support another site on the same server (not a multi-site). But first, I had to adjust the template BCFG2 builds the VCL from, and add an extra BCFG2 Probe.

BCFG2 Probes

BCFG2 uses plugins called "Probes" to "gather information from a client machine" before generating configuration. These are scripts that live in /var/lib/bcfg2/Probes. Mercury places several probes there to look for customizations declared in /etc/mercury/server_tuneables. We need to add one more for our Varnish VCL.

1. Add a probe for ACL_INTERNAL

The VCL from Lullabot's article implements an access control list (ACL) so that cron and install scripts can only be reached from internal addresses - very smart :)

1.a. Create a new file
/var/lib/bcfg2/Probes/varnish_acl_internal
with this content:

#!/bin/bash
#If you wish to replace our default value for acl_internal in /etc/varnish/default.vcl,
#edit the VARNISH_ACL_INTERNAL variable in /etc/mercury/server_tuneables

if [[ -a /etc/mercury/server_tuneables ]]; then
. /etc/mercury/server_tuneables
fi

if [[ -n "$VARNISH_ACL_INTERNAL" ]]; then
   echo "$VARNISH_ACL_INTERNAL"
fi
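
Before handing the probe to BCFG2, it can help to see what it would print. The following is only a sketch of the same logic run against a throwaway file: TUNEABLES and probe_acl_internal are hypothetical names used for illustration, and the temporary file stands in for /etc/mercury/server_tuneables.

```shell
#!/bin/bash
# Sketch: exercise the probe logic against a temporary tuneables file.
# TUNEABLES is a hypothetical stand-in for /etc/mercury/server_tuneables.
TUNEABLES=$(mktemp)
printf '%s\n' 'export VARNISH_ACL_INTERNAL="\"127.0.0.1\"/24;"' > "$TUNEABLES"

probe_acl_internal() {
  # Same steps as the probe: source the file if it exists, then print the
  # variable if it is non-empty. The subshell keeps sourced variables from
  # leaking into the calling shell.
  (
    if [ -f "$1" ]; then
      . "$1"
    fi
    if [ -n "$VARNISH_ACL_INTERNAL" ]; then
      echo "$VARNISH_ACL_INTERNAL"
    fi
  )
}

PROBE_OUTPUT=$(probe_acl_internal "$TUNEABLES")
echo "$PROBE_OUTPUT"
rm -f "$TUNEABLES"
```

Whatever the probe prints is what ends up inside the acl internal { } block of the generated VCL, so the value needs to be valid ACL syntax, trailing semicolon included.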

2. Edit the default.vcl TEMPLATE (after backing up the default, of course)

2.a. Back up default source template

cp /var/lib/bcfg2/TGenshi/varnish/vcl/template.newtxt.G50_lucid /var/lib/bcfg2/TGenshi/varnish/vcl/template.newtxt.G50_lucid-default

2.b. Create custom source template

cp /var/lib/bcfg2/TGenshi/varnish/vcl/template.newtxt.G50_lucid /var/lib/bcfg2/TGenshi/varnish/vcl/template.newtxt.G50_lucid-custom

2.c. Edit custom source template

vi /var/lib/bcfg2/TGenshi/varnish/vcl/template.newtxt.G50_lucid-custom
NOTE: The "PLEASE DON'T EDIT..." comment applies to the *output* (`default.vcl`) produced from this source/template, not this source/template itself (`/var/lib/bcfg2/TGenshi/varnish/vcl/template.newtxt.G50_lucid`):
# This is a basic VCL configuration file for varnish.  See the vcl(7)
# man page for details on VCL syntax and semantics.

# PLEASE DON'T EDIT THIS FILE DIRECTLY - SEE /etc/mercury/server_tuneables

# Default backend definition.  Set this to point to your content server.
backend default {
  .host = "127.0.0.1";
  .port = "8080";
  .connect_timeout = 600s;
  .first_byte_timeout = 600s;
  .between_bytes_timeout = 600s;
}

# Alternative backend(s)
backend dev {
  .host = "dev.example.com";
  .port = "8080";
  .connect_timeout = 600s;
  .first_byte_timeout = 600s;
  .between_bytes_timeout = 600s;
}

# Define the internal network subnet.
# These are used below to allow internal access to certain files while not
# allowing access from the public internet.
acl internal {
  ${metadata.Probes['varnish_acl_internal']}
}

# Respond to incoming requests.
sub vcl_recv {
  ${metadata.Probes['varnish_vcl_recv']}
}

# Routine used to determine the cache key if storing/retrieving a cached page.
sub vcl_hash {
  ${metadata.Probes['varnish_vcl_hash']}
}

# Code determining what to do when serving items from the Apache servers.
sub vcl_fetch {
  ${metadata.Probes['varnish_vcl_fetch']}
}

# In the event of an error, show friendlier messages.
sub vcl_error {
  ${metadata.Probes['varnish_vcl_error']}
  return (deliver);
}

# Delivery routine
sub vcl_deliver {
  ${metadata.Probes['varnish_vcl_deliver']}
  return (deliver);
}

2.d. Copy customizations over source template

cp /var/lib/bcfg2/TGenshi/varnish/vcl/template.newtxt.G50_lucid-custom /var/lib/bcfg2/TGenshi/varnish/vcl/template.newtxt.G50_lucid

3. Edit /etc/mercury/server_tuneables:

I only changed the variables used for default.vcl.

UNIX NOTE: each cat command and its argument (the path to the file) is wrapped in backticks, so each exported variable is assigned the *output* of the cat command, i.e. the contents of the file. Wrapping the command in quotes instead would assign the literal string "cat /path/to/file", which of course we don't want.
#/etc/varnish/default.vcl
# export VARNISH_ACL_INTERNAL="\"127.0.0.1\"/24"
export VARNISH_ACL_INTERNAL=`cat /etc/varnish/custom/varnish_internal.acl`
# export VARNISH_VCL_ERROR=""
export VARNISH_VCL_ERROR=`cat /etc/varnish/custom/varnish_error.vcl`
# export VARNISH_VCL_FETCH=""
export VARNISH_VCL_FETCH=`cat /etc/varnish/custom/varnish_fetch.vcl`
# export VARNISH_VCL_HASH=""
export VARNISH_VCL_HASH=`cat /etc/varnish/custom/varnish_hash.vcl`
# export VARNISH_VCL_RECV=""
export VARNISH_VCL_RECV=`cat /etc/varnish/custom/varnish_recv.vcl`
# export VARNISH_VCL_DELIVER=""
export VARNISH_VCL_DELIVER=`cat /etc/varnish/custom/varnish_deliver.vcl`
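
The difference between backticks and quotes is easy to see in isolation. This is a minimal sketch, assuming a temporary file (SNIPPET, a hypothetical name) in place of one of the files under /etc/varnish/custom:

```shell
#!/bin/bash
# Sketch: command substitution vs. a quoted literal.
# SNIPPET is a hypothetical temporary file standing in for a custom VCL file.
SNIPPET=$(mktemp)
printf '%s\n' '  set beresp.grace = 6h;' > "$SNIPPET"

SUBSTITUTED=`cat "$SNIPPET"`    # runs cat and stores the file's contents
LITERAL="cat $SNIPPET"          # stores the command text itself

echo "substituted: $SUBSTITUTED"
echo "literal:     $LITERAL"
rm -f "$SNIPPET"
```

Only the first form hands the VCL text to the probe. Note that command substitution drops trailing newlines but keeps everything else, including the leading indentation.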

4. Edit the ACL and VCL templates

These are the building blocks used to piece together our default.vcl.

4.a. /etc/varnish/custom/varnish_internal.acl

  "127.0.0.1"/24;

4.b. /etc/varnish/custom/varnish_error.vcl

  # Redirect to some other URL in the case of a homepage failure.
  #if (req.url ~ "^/?$") {
  #  set obj.status = 302;
  #  set obj.http.Location = "http://backup.example.com/";
  #}

  # Otherwise redirect to the homepage, which will likely be in the cache.
  set obj.http.Content-Type = "text/html; charset=utf-8";
  synthetic {"
<html>
<head>
  <title>Page Unavailable</title>
  <style>
    body { background: #303030; text-align: center; color: white; }
    #page { border: 1px solid #CCC; width: 500px; margin: 100px auto 0; padding: 30px; background: #323232; }
    a, a:link, a:visited { color: #CCC; }
    .error { color: #222; }
  </style>
</head>
<body onload="setTimeout(function() { window.location = '/' }, 5000)">
  <div id="page">
    <h1 class="title">Page Unavailable</h1>
    <p>The page you requested is temporarily unavailable.</p>
    <p>We're redirecting you to the <a href="/">homepage</a> in 5 seconds.</p>
    <div class="error">(Error "} obj.status " " obj.response {")</div>
  </div>
</body>
</html>
"};

4.c. /etc/varnish/custom/varnish_fetch.vcl

  # Don't allow static files to set cookies.
  if (req.url ~ "(?i)\.(png|gif|jpeg|jpg|ico|swf|css|js|html|htm)(\?[a-z0-9]+)?$") {
    # beresp == Back-end response from the web server.
    unset beresp.http.set-cookie;
  }

  # Allow items to be stale if needed.
  set beresp.grace = 6h;

4.d. /etc/varnish/custom/varnish_hash.vcl

  # From http://www.lullabot.com/articles/varnish-multiple-web-servers-drupal (http://www.lullabot.com/sites/lullabot.com/files/default.vcl_.txt)

  # Include cookie in cache hash.
  # This check is unnecessary because we already pass on all cookies.
  # if (req.http.Cookie) {
  #   set req.hash += req.http.Cookie;
  # }

4.e. /etc/varnish/custom/varnish_recv.vcl

  # From http://www.lullabot.com/articles/varnish-multiple-web-servers-drupal (http://www.lullabot.com/sites/lullabot.com/files/default.vcl_.txt)

  # Dev Backend Routing
  if (req.http.host ~ "^(www\.)?dev\.example\.com$") {
    set req.backend = dev;
  }

  # Use anonymous, cached pages if all backends are down.
  if (!req.backend.healthy) {
    unset req.http.Cookie;
  }

  # Allow the backend to serve up stale content if it is responding slowly.
  set req.grace = 6h;

  # Do not cache these paths.
  if (req.url ~ "^/status\.php$" ||
      req.url ~ "^/update\.php$" ||
      req.url ~ "^/ooyala/ping$" ||
      req.url ~ "^/admin/build/features" ||
      req.url ~ "^/info/.*$" ||
      req.url ~ "^/flag/.*$" ||
      req.url ~ "^.*/ajax/.*$" ||
      req.url ~ "^.*/ahah/.*$" ||
      req.url ~ "^.*/filefield/ahah/.*$" ||
      req.url ~ "^/ahah_helper/.*$") {
    return (pass);
  }

  # Pipe these paths directly to Apache for streaming.
  if (req.url ~ "^/admin/content/backup_migrate/export") {
    return (pipe);
  }

  # Do not allow outside access to cron.php or install.php.
  if (req.url ~ "^/(cron|install)\.php$" && !client.ip ~ internal) {
    # Have Varnish throw the error directly.
    error 404 "Page not found.";
    # Use a custom error page that you've defined in Drupal at the path "404".
    # set req.url = "/404";
  }

  # Handle compression correctly. Different browsers send different
  # "Accept-Encoding" headers, even though they mostly all support the same
  # compression mechanisms. By consolidating these compression headers into
  # a consistent format, we can reduce the size of the cache and get more hits.
  # @see: http://varnish.projects.linpro.no/wiki/FAQ/Compression
  if (req.http.Accept-Encoding) {
    if (req.http.Accept-Encoding ~ "gzip") {
      # If the browser supports it, we'll use gzip.
      set req.http.Accept-Encoding = "gzip";
    }
    else if (req.http.Accept-Encoding ~ "deflate") {
      # Next, try deflate if it is supported.
      set req.http.Accept-Encoding = "deflate";
    }
    else {
      # Unknown algorithm. Remove it and send unencoded.
      unset req.http.Accept-Encoding;
    }
  }

  # Always cache the following file types for all users.
  if (req.url ~ "(?i)\.(png|gif|jpg|jpeg|ico|swf|css|js|html|htm)(\?[a-z0-9]+)?$") {
    unset req.http.Cookie;
  }

  # Remove all cookies that Drupal doesn't need to know about. ANY remaining
  # cookie will cause the request to pass-through to Apache. For the most part
  # we always set the NO_CACHE cookie after any POST request, disabling the
  # Varnish cache temporarily. The session cookie allows all authenticated users
  # to pass through as long as they're logged in.
  if (req.http.Cookie) {
    set req.http.Cookie = ";" req.http.Cookie;
    set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
    set req.http.Cookie = regsuball(req.http.Cookie, ";(SESS[a-z0-9]+|NO_CACHE)=", "; \1=");
    set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

    if (req.http.Cookie == "") {
      # If there are no remaining cookies, remove the cookie header. If there
      # aren't any cookie headers, Varnish's default behavior will be to cache
      # the page.
      unset req.http.Cookie;
    }
    else {
      # If there are any cookies left (a session or NO_CACHE cookie), do not
      # cache the page. Pass it on to Apache directly.
      return (pass);
    }
  }
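
The cookie-stripping steps above are easier to follow with a concrete header. The sketch below replays the same substitutions with sed in place of regsuball. It assumes sed's extended regular expressions behave like Varnish's for these simple patterns, and strip_cookies is a hypothetical helper, not part of Varnish or Mercury.

```shell
#!/bin/bash
# Sketch: replay the cookie-stripping steps with sed instead of regsuball.
# strip_cookies is a hypothetical helper used only for illustration.
strip_cookies() {
  # Prefix the header with ";" so every cookie starts with a separator,
  # then apply the remaining substitutions in the same order as the VCL.
  printf ';%s\n' "$1" |
    sed -E \
      -e 's/; +/;/g' \
      -e 's/;(SESS[a-z0-9]+|NO_CACHE)=/; \1=/g' \
      -e 's/;[^ ][^;]*//g' \
      -e 's/^[; ]+|[; ]+$//g'
}

KEPT=$(strip_cookies 'has_js=1; SESSabc123=xyz; __utma=1.2')
DROPPED=$(strip_cookies 'has_js=1; __utma=1.2')
echo "with session cookie:    [$KEPT]"
echo "without session cookie: [$DROPPED]"
```

The trick is the space: cookies to keep are rewritten as "; NAME=", so the fourth substitution, which only removes entries with no space after the semicolon, leaves them alone. In the VCL, an empty result means the Cookie header is unset and the page can be cached; anything left means the request is passed to Apache.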

4.f. /etc/varnish/custom/varnish_deliver.vcl

  # Set HIT/MISS X-Varnish-Cache header
  # from http://www.stewsnooze.com/content/what-stopping-varnish-and-drupal-pressflow-caching-anonymous-users-page-views

  if (obj.hits > 0) {
    set resp.http.X-Varnish-Cache = "HIT";
  }
  else {
    set resp.http.X-Varnish-Cache = "MISS";
  }

5. Rebuild Mercury

bcfg2 -veqd
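
Once the rebuild finishes, the X-Varnish-Cache header from step 4.f gives a quick way to confirm the new VCL is live. A small sketch for pulling that header out of a response follows; cache_status is a hypothetical helper, and the live example in the comment assumes Varnish answers on localhost port 80, which may not match your setup.

```shell
#!/bin/bash
# Sketch: read the X-Varnish-Cache header set in varnish_deliver.vcl.
# cache_status is a hypothetical helper; it reads response headers on stdin.
cache_status() {
  # Strip carriage returns, then match the header name case-insensitively.
  tr -d '\r' | awk -F': *' 'tolower($1) == "x-varnish-cache" { print $2 }'
}

# Against a live server (assumes Varnish answers on localhost port 80):
#   curl -sI http://localhost/ | cache_status

# Offline check with made-up headers:
SAMPLE_STATUS=$(printf 'HTTP/1.1 200 OK\r\nX-Varnish-Cache: HIT\r\n\r\n' | cache_status)
echo "$SAMPLE_STATUS"
```

Requesting the same anonymous page twice should normally show MISS on the first request and HIT on the second, provided no cookies force a pass.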