23 Aug 2011

Wars


...And it took a war to make it that way :-)

The idea and picture for this post came while reading this, on a maybe unrelated topic

22 Aug 2011

haproxy: redirect prefix vs redirect location

From the haproxy documentation:
redirect location <to> [code <code>] <option> [{if | unless} <condition>]
redirect prefix   <to> [code <code>] <option> [{if | unless} <condition>]

The documentation also describes how the 2 redirect rules work:

With "redirect location", the exact value in <to> is placed into
              the HTTP "Location" header. In case of "redirect prefix", the
              "Location" header is built from the concatenation of <to> and the
              complete URI, including the query string, unless the "drop-query"
              option is specified (see below). As a special case, if <to>
              equals exactly "/" in prefix mode, then nothing is inserted
              before the original URI. It allows one to redirect to the same
              URL.


For example, suppose you define an acl (access control list) in haproxy like this:
acl right_request hdr_sub(cookie) -i Human=1
redirect prefix http://www.oursite.com/?lt=verify code 302 if !right_request

With the configuration above, what we want is that any request without the cookie Human=1 gets redirected to the page http://www.oursite.com/?lt=verify, which checks for a human and sets the right cookie. But it will not work as expected.
A request looks like this:
GET / HTTP/1.0
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Host: www.google.com.vn
Connection: Keep-Alive

As the haproxy documentation describes, a redirect prefix rule concatenates <to> with the complete URI, which is "/" in the request line of our example. So the redirect prefix rule sends the client to:
http://www.oursite.com/?lt=verify/
and that is not the page we wanted: the query string is now lt=verify/ instead of lt=verify.

How do we make that rule work? Just replace prefix with location.
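
To make the difference concrete, here is a small Python sketch of how the Location header is built in both modes. It only follows the documentation text quoted above; build_location and its parameters are names I made up for illustration, not anything from haproxy itself.

```python
def build_location(mode, to, request_uri, drop_query=False):
    """Return the Location header value for a redirect rule (illustration only)."""
    if mode == "location":
        # "redirect location": <to> is used exactly as written.
        return to
    if mode != "prefix":
        raise ValueError("mode must be 'location' or 'prefix'")
    # "redirect prefix": <to> is followed by the complete request URI,
    # minus the query string when drop-query is set.
    uri = request_uri.split("?", 1)[0] if drop_query else request_uri
    # Special case from the documentation: a <to> of exactly "/" inserts nothing.
    prefix = "" if to == "/" else to
    return prefix + uri


to = "http://www.oursite.com/?lt=verify"
print(build_location("prefix", to, "/"))    # http://www.oursite.com/?lt=verify/
print(build_location("location", to, "/"))  # http://www.oursite.com/?lt=verify
```

The first line printed is the broken URL from the example, the second one is what the location rule sends.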

*My note when using haproxy*


17 Aug 2011

Why Etag is (generally) a good idea, and why it should not be used

ETags (entity tags) are part of the HTTP headers and are used to compare a cached object on the client side (the browser) with the original object on the server side. What is the comparison for? Normally, every object that is considered cacheable is stored in the client's cache (as long as there is still room in it). When the server sends a response that includes an ETag header to the client,
HTTP/1.0 200 OK
Content-Length: 121217
Content-Type: text/html
Content-Location: http://www.website.vn/home/index.htm
Last-Modified: Thu, 18 Aug 2011 13:34:08 GMT
ETag: "ab26ff81ab5dcc1:2878"
Date: Thu, 18 Aug 2011 13:34:52 GMT
X-Cache: HIT from Node-Cache-22
Connection: keep-alive
the browser cache stores that ETag value. The next time we browse the same object, the client sends that value back to the server to validate the state of the cached object,

Host    http://www.website.vn/home/index.htm
User-Agent    Mozilla/5.0 (X11; Linux x86_64; rv:5.0) Gecko/20100101 Firefox/5.0
Accept    */*
Accept-Language    en-us,en;q=0.5
Accept-Encoding    gzip, deflate
Accept-Charset    UTF-8,*
Connection    keep-alive
Referer    http://www.somewhereonthe.net
Cookie    __name=value
If-None-Match    "ea3d79c3fc8cb1:2878"
Cache-Control    max-age=0
If the value of If-None-Match differs from the object's current ETag on the server, the server sends a full response to the client, including the new object and the new ETag. Otherwise, it sends a 304 Not Modified response and the client uses the object that is already in the browser's cache.

Date    Thu, 18 Aug 2011 07:56:57 GMT
Content-Type: text/html
Last-Modified    Wed, 20 Jul 2011 01:38:44 GMT
Etag    "40b519c37d46cc1:2878"
Connection    keep-alive 
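
The server-side decision described above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular web server implements it: conditional_get and _opaque are hypothetical names, and the parsing of If-None-Match is naive (it just splits on commas).

```python
def _opaque(tag):
    """Strip whitespace and an optional weak-validator prefix (W/)."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_get(current_etag, body, if_none_match=None):
    """Return (status, headers, body) for a GET on a cacheable object."""
    headers = {"ETag": current_etag}
    if if_none_match is not None:
        # The client may send several candidate ETags, or "*" for "anything".
        candidates = [_opaque(t) for t in if_none_match.split(",")]
        if "*" in candidates or _opaque(current_etag) in candidates:
            # The cached copy is still valid: no body is sent.
            return 304, headers, b""
    # No validator, or the validator did not match: send the full object.
    return 200, headers, body


status, _, _ = conditional_get('"ab26ff81ab5dcc1:2878"', b"<html>", '"ab26ff81ab5dcc1:2878"')
print(status)  # 304
```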
Commonly, the ETag value is generated by the web server (or by the programmer, for example by computing md5sum(the-object)). Apache uses 3 file attributes to build the ETag: INode, MTime, Size. Nginx users can use these modules: https://github.com/mikewest/nginx-static-etags and https://github.com/kali/nginx-dynamic-etags to add an ETag value.
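
Both ways of generating an ETag can be sketched in Python. The function names are mine, and the attribute-based format below is only loosely modeled on the inode/mtime/size idea; it is an assumption for illustration, not Apache's exact output.

```python
import hashlib
import os


def content_etag(data):
    """ETag computed from the object's bytes, like md5sum(the-object)."""
    return '"' + hashlib.md5(data).hexdigest() + '"'


def file_attr_etag(path, use_inode=True):
    """ETag built from file attributes instead of file content."""
    st = os.stat(path)
    parts = [st.st_ino] if use_inode else []
    parts += [st.st_mtime_ns, st.st_size]
    # Hex-encode each attribute and join them into one quoted string.
    return '"' + "-".join(format(p, "x") for p in parts) + '"'


print(content_etag(b"hello"))  # "5d41402abc4b2a76b9719d911017c592"
```

The content-based version has to read the whole object, while the attribute-based one only needs a stat() call, which is why web servers tend to prefer it for static files.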
 From Wikipedia:
An ETag, or entity tag, is part of HTTP, the protocol for the World Wide Web. It is one of several mechanisms that HTTP provides for cache validation, and which allows a client to make conditional requests. This allows caches to be more efficient, and saves bandwidth, as a web server does not need to send a full response if the content has not changed. ETags can also be used for optimistic concurrency control,[1] as a way to help prevent simultaneous updates of a resource from overwriting each other.

To test how ETag works yourself, a great mind has already written a Python module for your needs: http://www.feedparser.org/docs/http-etag.html
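
If you would rather see the raw mechanics without an extra module, a rough sketch with only the Python standard library could look like this. The two function names are hypothetical, and the URL in the usage line is just the one from the example headers above.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag):
    """Build a GET request that carries the cached ETag as a validator."""
    return urllib.request.Request(url, headers={"If-None-Match": etag})


def fetch_if_changed(url, etag):
    """Return (status, etag, body); body is None when the cache is still valid."""
    try:
        with urllib.request.urlopen(build_conditional_request(url, etag)) as resp:
            return resp.status, resp.headers.get("ETag"), resp.read()
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError.
        if err.code == 304:
            return 304, etag, None
        raise


# Usage (needs network access):
# fetch_if_changed("http://www.website.vn/home/index.htm", '"ab26ff81ab5dcc1:2878"')
```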
So, after all these lines of text describing how ETag works, how it fits into the HTTP protocol and how it helps reuse unchanged resources on the client, avoiding full server responses when the content has not changed and saving bandwidth... it's generally a very good idea.
Why should it not be used? There are 2 reasons:
 1. CPU consumption: when using ETag, the server has to calculate the ETag value for every object it is configured to apply to. For each client request that carries an ETag validator (If-None-Match), the server has to calculate the value again, compare the 2 values, then decide how to reply to the client with the right response code. It takes too much resource (CPU usage) on the server side.

 2. Websites that apply ETag (with serious thought) are mostly served from multiple servers. Take Apache as an example (the same thing happens with nginx, because the 2 nginx ETag modules are ported from Apache, AFAIK).
With the default FileETag settings, N Apache boxes generate N ETag values for the same object. If client A makes its first request to box 1, it receives Etag1. When the user later revisits the URL and reaches box 2, the If-None-Match or If-Match value will not match even though the content has not changed, so box 2 has to compute the ETag and then send the full response to the client with its own ETag value. It's a REAL waste of resources.

One solution when using ETag is to remove INode from the FileETag setting. But this only solves the second disadvantage of ETag, not the first.
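
A toy Python simulation of the multi-server problem, with made-up numbers: the same file sits on two boxes with the same size and modification time but a different inode. FileAttrs, make_etag and all the values are invented for illustration, and I assume the file was deployed to both boxes with its timestamp preserved.

```python
from collections import namedtuple

# Hypothetical per-box file attributes for the same object.
FileAttrs = namedtuple("FileAttrs", "inode mtime size")


def make_etag(attrs, fields):
    """Build an ETag from the selected attributes (illustrative format)."""
    return '"' + "-".join(format(getattr(attrs, f), "x") for f in fields) + '"'


box1 = FileAttrs(inode=1001, mtime=1313674448, size=121217)
box2 = FileAttrs(inode=2002, mtime=1313674448, size=121217)

with_inode = ("inode", "mtime", "size")
without_inode = ("mtime", "size")

print(make_etag(box1, with_inode) == make_etag(box2, with_inode))        # False
print(make_etag(box1, without_inode) == make_etag(box2, without_inode))  # True
```

In Apache terms the second case corresponds to FileETag MTime Size. Note that it only helps if the modification time really is identical on every box.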

Another solution is to rely on the Last-Modified header instead of ETag. If you don't know how to use Last-Modified yet, it will be explained in the next post (hopefully soon ;) ).
 To remove ETag on Apache:

Header unset Etag
FileETag none
 Nginx: don't use the ETag modules ;)

 If you do not control the back-end web servers (like me) but do control the caching boxes, you can also remove it at the reverse proxies.
With squid:

header_access Etag deny all
header_access If-Match deny all
header_access If-None-Match deny all


TrafficServer
CONFIG proxy.config.http.cache.required_headers INT 0
This post is also part 2 of the Web Caching series.
