ETag based caching for GET APIs

Many client applications require caching of data to work with low bandwidth connections. Many of them do it to provide faster loading time to the client user. The Webapp and Android app had similar requirements. Previously they provided caching using a versions API that would keep track of any modifications made to Events or Services. The response of the API would be something like this:

  "event_id": 6,
  "event_ver": 1,
  "id": 27,
  "microlocations_ver": 0,
  "session_ver": 4,
  "speakers_ver": 3,
  "sponsors_ver": 2,
  "tracks_ver": 3

The number corresponding to "*_ver" tells the number of modifications done for that resource list. For instance, "tracks_ver": 3 means there were three revisions for tracks inside the event (/events/:event_id/tracks). So when the client user starts his app, the app would make a request to the versions API, check if it corresponds to the local cache and update accordingly. It had some shortcomings, like checking modifications for a individual resources. And if a particular service (microlocation, track, etc.) resource list inside an event needs to be checked for updates, a call to the versions API would be needed.

ETag based caching for GET APIs

The concept of ETag (Entity Tag) based caching is simple. When a client requests (GET) a resource or a resource list, a hash of the resource/resource list is calculated at the server. This hash, called the ETag is sent with the response to the client, preferably as a header. The client then caches the response data and the ETag alongside the resource. Next time when the client makes a request at the same endpoint to fetch the resource, he sets an If-None-Match header in the request. This header contains the value of ETag the client saved before. The server grabs the resource requested by the client, calculates its hash and checks if it is equal to the value set for If-None-Match. If the value of the hash is same, then it means the resource has not changed, so a response with resource data is not needed. If it is different, then the server returns the response with resource data and a new ETag associated with that resource.

Little modifications were needed to deal with ETags for GET requests. Flask-Restplus includes a Resource class that defines a resource. It is a pluggable view. Pluggable views need to define a dispatch_request method that returns the response.

import json
from hashlib import md5

from flask.ext.restplus import Resource as RestplusResource

# Custom Resource Class
class Resource(RestplusResource):
    def dispatch_request(self, *args, **kwargs):
        resp = super(Resource, self).dispatch_request(*args, **kwargs)

        # ETag checking.
        # Check only for GET requests, for now.
        if request.method == 'GET':
            old_etag = request.headers.get('If-None-Match', '')
            # Generate hash
            data = json.dumps(resp)
            new_etag = md5(data).hexdigest()

            if new_etag == old_etag:
                # Resource has not changed
                return '', 304
                # Resource has changed, send new ETag value
                return resp, 200, {'ETag': new_etag}

        return resp

To add support for ETags, I sub-classed the Resource class to extend the dispatch_request method. First, I grabbed the response for the arguments provided to RestplusResource‘s dispatch_request method. old_etag contains the value of ETag set in the If-None-Match header. Then hash for the resp response is calculated. If both ETags are equal then an empty response is returned with 304 HTTP status (Not Modified). If they are not equal, then a normal response is sent with the new value of ETag.

[smg:~] $ curl -i 
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 1061
ETag: ada4d057f76c54ce027aaf95a3dd436b
Server: Werkzeug/0.11.9 Python/2.7.6
Date: Thu, 21 Jul 2016 09:01:01 GMT

{"description": "string", "sessions": [{"id": 1, "title": "Fantastische Hardware Bauen & L\u00f6ten Lernen mit Mitch (TV-B-Gone) // Learn to Solder with Cool Kits!"}, {"id": 2, "title": "Postapokalyptischer Schmuck / Postapocalyptic Jewellery"}, {"id": 3, "title": "Query Service Wikidata "}, {"id": 4, "title": "Unabh\u00e4ngige eigene Internet-Suche in wenigen Schritten auf dem PC installieren"}, {"id": 5, "title": "Nitrokey Sicherheits-USB-Stick"}, {"id": 6, "title": "Heart Of Code - a hackspace for women* in Berlin"}, {"id": 7, "title": "Free Software Foundation Europe e.V."}, {"id": 8, "title": "TinyBoy Project - a 3D printer for education"}, {"id": 9, "title": "LED Matrix Display"}, {"id": 11, "title": "Schnittmuster am PC erstellen mit Valentina / Valentina Digital Pattern Design"}, {"id": 12, "title": "PC mit Gedanken steuern - Brain-Computer Interfaces"}, {"id": 14, "title": "Functional package management with GNU Guix"}], "color": "GREEN", "track_image_url": "", "location": "string", "id": 1, "name": "string"}

[smg:~] $ curl -i --header 'If-None-Match: ada4d057f76c54ce027aaf95a3dd436b' 
Connection: close
Server: Werkzeug/0.11.9 Python/2.7.6
Date: Thu, 21 Jul 2016 09:01:27 GMT

ETag based caching has a drawback. Since the hash is calculated for every GET request it increases the load on servers. So if four clients request the same resource, the server calcuates hashes four times. This can be solved by calculating and saving the ETag during creation and modification of resources, and then getting and sending this ETag directly.