Ignore utm_* values with Varnish?

Can I 'ignore' query string variables before pulling matching objects from the cache, but not actually remove them from the URL to the end-user?

For example, the marketing parameters utm_source, utm_campaign, and the other utm_* values don't change the content of the page; they just vary a lot from campaign to campaign and are used by all of our client-side tracking.

So this also means that the URL can't change on the client side, but it should somehow be 'normalized' in the cache.

Essentially I want all of these...

http://example.com/page/?utm_source=google

http://example.com/page/?utm_source=facebook&utm_content=123

http://example.com/page/?utm_campaign=usa

... to all HIT the cache for http://example.com/page/

However, this URL would be a MISS for http://example.com/page/ (because variation is not a utm_* param):

http://example.com/page/?utm_source=google&variation=5

It would instead use the cache entry for:

http://example.com/page/?variation=5

Also, keeping in mind that the URL the user sees must remain the same, I can't redirect to something without params or any kind of solution like that.


So I'll add a disclaimer that this regex probably isn't perfect, but it should work pretty well:

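sub vcl_recv {
  # Strip utm_* parameters at the start of the query string.
  set req.url = regsuball(req.url, "\?(utm_[^=&]*=[^&=]*&?)+", "?");
  # Strip utm_* parameters elsewhere, keeping one "&" if more parameters follow.
  set req.url = regsuball(req.url, "&(utm_[^=&]*=[^&=]*(&|$))+", "\2");
  # Drop a trailing "?" left behind when every parameter was removed.
  set req.url = regsub(req.url, "\?$", "");

  # No return (pass) here: pass would bypass the cache. Falling through
  # lets the built-in vcl_recv decide whether to look the request up.
}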

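This should remove any query parameters starting with utm_. Rewriting req.url only changes what Varnish hashes and forwards to the backend; the URL in the visitor's browser stays the same, since no redirect is issued. I used three regexes to make it clearer and easier to read.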

The first regsuball removes any utm_ parameters at the beginning of the query string. It looks for one or more utm_ parameters immediately after the ?. The second regsuball removes any utm_ parameters that aren't at the beginning of the query string.

The third regex cleans up the URL by removing the ? if there are no query parameters left once the utm_ parameters are gone.

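The first two regexes need to be wrapped in ()+ so that a single match consumes a whole run of consecutive utm_ parameters. Without it, a utm_ parameter directly following a removed one would be skipped, because the ? or & that introduces it has already been consumed by the previous match.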
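If the backend should still receive the original query string, a possible variation is to leave req.url alone and apply the same three substitutions only to the cache key. This is a rough sketch, not tested against your setup; it assumes Varnish 4 or later (where vcl_hash ends with return (lookup)) and reproduces the host/IP part of the built-in vcl_hash by hand:

```vcl
sub vcl_hash {
    # Hash a copy of the URL with the utm_* parameters stripped.
    # req.url itself is not modified, so the backend sees the full URL.
    hash_data(regsub(regsuball(regsuball(req.url,
        "\?(utm_[^=&]*=[^&=]*&?)+", "?"),
        "&(utm_[^=&]*=[^&=]*(&|$))+", "\2"),
        "\?$", ""));

    # Same host handling as the built-in vcl_hash.
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }

    # Return explicitly so the built-in vcl_hash does not also hash req.url.
    return (lookup);
}
```

With this variant the first request for a page is fetched using whatever utm_* values it carried, and later requests that differ only in utm_* values reuse that object.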

Example results:

Source URL: /?utm_track=1&utm_test2=hey&test=utm_blah&utm_source=google&variation=5&utm_query=abc&utm_test7=yes
Maps to:    /?test=utm_blah&variation=5

Source URL: /?variation=5&utm_test1=abc&utm_test2=def&blah=1
Maps to:    /?variation=5&blah=1
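
To check that URLs differing only in utm_ parameters really share one cached object, a response header can help. A minimal sketch; the X-Cache header name is just a common convention picked for illustration, not something Varnish sets by itself:

```vcl
sub vcl_deliver {
    # obj.hits is 0 when the response did not come from the cache.
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
}
```

Assuming the page is cacheable in the first place, requesting /page/?utm_source=google and then /page/?utm_source=facebook should show MISS on the first response and HIT on the second.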