A simple cache

One of my customers recently needed a simple cache in their Python code. The items to be cached are generated from information held on disk. There are many instances of these objects, retrieval from disk is relatively slow, and most of the objects are used to create other objects that differ only by name and position in a 2-D layout. To speed up the import of these objects I implemented a simple cache, so that the underlying base object could be retrieved quickly from memory rather than from disk if it had already been seen.

The first, simple implementation that I came up with is shown below and can be used as a generic Cache class. The constructor takes a reference to a lookup function that performs the (possibly) expensive retrieval of an object from persistent storage; the cache stores the result locally, and once an object is in the cache the local copy is returned on subsequent requests. One of the customer's requirements was that the cache accept a tuple as the key identifying an object, and this implementation supports that.

The cache will raise a CacheLookupError exception if the required object cannot be recovered from the backing store. The handling for this is a bit aggressive, as it catches any exception from the lookup function. A better implementation would catch only specific errors. This is left as an exercise for the reader.

#
# A simple cache class for Python 3
#

class CacheError(Exception):
    """ Cache problem """

class CacheLookupError(CacheError):
    """ Key lookup failed """

class Cache:
    """ A simple cache class """

    def __init__(self, lookup_function):
        """ Create a new cache.

        @param [in] lookup_function The function called if a
                                    requested object is not
                                    already in the cache.
        """
        self._cache = {}
        self._lookup_function = lookup_function

    def __getitem__(self, *keys):
        """ Get a cached entry using [] syntax.

        @param [in] keys The attributes to be merged
                         to form a single key into the cache.
        @return The cached item, or a new item if it is not
                in the cache already.
        """
        # Note: a subscript such as cache[a, b] reaches
        # __getitem__ as a single tuple argument, so keys
        # normally holds exactly one item; the merge below
        # also covers direct calls like cache.__getitem__(a, b).
        if len(keys) > 1:
            key = tuple(keys)
        else:
            key = keys[0]

        try:
            return self._cache[key]
        except KeyError:
            try:
                self._cache[key] = self._lookup_function(key)
                return self._cache[key]
            except:
                raise CacheLookupError(key)
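As a quick sketch of how this is used (the class is repeated here so the snippet runs on its own; load_description is a hypothetical stand-in for a real, slow disk lookup):

```python
# Minimal usage sketch of the Cache class from this post.
# The class is repeated so the example runs standalone;
# load_description is a hypothetical stand-in for a slow disk read.

class CacheError(Exception):
    """ Cache problem """

class CacheLookupError(CacheError):
    """ Key lookup failed """

class Cache:
    def __init__(self, lookup_function):
        self._cache = {}
        self._lookup_function = lookup_function

    def __getitem__(self, *keys):
        key = tuple(keys) if len(keys) > 1 else keys[0]
        try:
            return self._cache[key]
        except KeyError:
            try:
                self._cache[key] = self._lookup_function(key)
                return self._cache[key]
            except:
                raise CacheLookupError(key)

calls = []

def load_description(key):
    """ Hypothetical expensive lookup: pretend this reads from disk. """
    calls.append(key)
    return "description for {}".format(key)

cache = Cache(load_description)

print(cache["widget"])         # first access goes via load_description
print(cache["widget"])         # second access is served from memory
print(cache[("panel", 3, 4)])  # tuple keys work, as required
print(len(calls))              # the lookup ran once per distinct key: 2
```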

            

And here are the tests for this code (pytest is required to run them):

#===========================
# Tests
# -----
# These tests assume that pytest is installed. If you
# don't have pytest then install it with pip:
#
#   pip install pytest
#
# Required imports
import pytest
import time

# Unit under test
import cache

# Test fixtures
# -------------
#
def dummy_lookup_function(key):
    """ Function to perform a lookup
    Dummy lookup function that uses a timer
    to make the lookup 'expensive' in delay
    time.
    """
    time.sleep(1)
    return "VALUE: {}".format(key)

def dummy_lookup_fail(key):
    """ Function to perform a lookup
    Dummy lookup function that always raises
    an exception.
    """
    raise Exception()

# Test cases
# ----------
def test_create_one_entry():
    """ Create a single entry in a new cache """
    # Create the cache instance passing the lookup function
    test_cache = cache.Cache(dummy_lookup_function)

    # Create and populate the cache with one entry
    assert test_cache['x'] == 'VALUE: x'

def test_lookup_fail():
    """ Check that a failed lookup raises CacheLookupError """
    # Create the cache instance passing the failing lookup function
    test_cache = cache.Cache(dummy_lookup_fail)

    # Any attempted access should raise CacheLookupError
    with pytest.raises(cache.CacheLookupError):
        x = test_cache['x']

def test_create_and_lookup_two_entries():
    """ Create two entries in a new cache and check they are returned """
    # Create the cache instance passing the lookup function
    test_cache = cache.Cache(dummy_lookup_function)

    # Create and populate the cache with two entries
    assert test_cache['x'] == 'VALUE: x'
    assert test_cache['Y'] == 'VALUE: Y'

    # Check that the cached entries are returned
    assert test_cache['x'] == 'VALUE: x'
    assert test_cache['Y'] == 'VALUE: Y'

def test_create_and_lookup_tuple():
    """ Create entries using tuples as keys """
    # Create the cache instance passing the lookup function
    test_cache = cache.Cache(dummy_lookup_function)
    t1 = (1, 2)
    t2 = (1, 2, 3)
    t3 = ("one", "two", "three")
    t4 = (1, 2, "three", "4")
    for key in [t1, t2, t3, t4]:
        assert test_cache[key] == 'VALUE: {}'.format(key)
        assert test_cache[key] == 'VALUE: {}'.format(key)

def test_create_and_lookup_10_entries_with_timing():
    """ Create 10 entries and then look them up.
    Creating each entry should take approximately 1 second;
    the cached lookup should be near instantaneous.
    """
    # Create the cache instance passing the lookup function
    test_cache = cache.Cache(dummy_lookup_function)
    for key in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]:
        expected = "VALUE: {}".format(key)

        # Create an entry
        t_start = time.monotonic()
        assert test_cache[key] == expected
        t_end = time.monotonic()
        # Check that the time to create is approximately 1 second
        assert (t_end - t_start) > 0.9

        # Look up the entry
        t_start = time.monotonic()
        assert test_cache[key] == expected
        t_end = time.monotonic()

        # Check that the lookup takes much less than 1 second
        assert (t_end - t_start) < 0.1

This is a very simple cache that solved a particular need: handling tuples as keys.

Having had a while to think about the code, I decided that a neater implementation would use the standard dict as a base class and use the __missing__ method to perform the lookup for new values. The updated version uses the code below. Personally I think this is a more elegant solution than the initial code shown above.
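The __missing__ hook is what makes this work: dict calls it whenever [] fails to find a key. A minimal sketch of the mechanism, independent of the cache class (DefaultingDict is just an illustrative name):

```python
# Minimal demonstration of dict's __missing__ hook: it is called by
# dict.__getitem__ whenever the requested key is not present.

class DefaultingDict(dict):
    """ Toy dict subclass: missing keys are created on demand. """

    def __missing__(self, key):
        # Store and return a generated value for the unknown key
        value = "generated for {}".format(key)
        self[key] = value
        return value

d = DefaultingDict()
print(d["a"])   # key absent: __missing__ runs and stores a value
print(d["a"])   # key present now: normal dict lookup, no __missing__
print(len(d))   # one entry was created
```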

#
# A simple cache class for Python 3
#

class CacheError(Exception):
    """ Cache problem """

class CacheLookupError(CacheError):
    """ Key lookup failed """

class Cache(dict):
    """ A simple cache class """

    def __init__(self, lookup_function):
        """ Create a new cache.

        @param [in] lookup_function The function called if a
                                    requested object is not
                                    already in the cache.
        """
        super().__init__()
        self._lookup_function = lookup_function

    def __missing__(self, *keys):
        """ Called when a key is not in the dict: look up the
        entry, add it to the cache and return it.

        @param [in] keys The attributes to be merged
                         to form a single key into the cache.
        @return The new item

        @raises CacheLookupError if the given key is not known.
        """
        # Note: dict calls __missing__ with a single argument
        # (the key), so keys normally holds exactly one item;
        # the merge below also covers direct calls such as
        # cache.__missing__(a, b).
        if len(keys) > 1:
            key = tuple(keys)
        else:
            key = keys[0]

        try:
            self[key] = self._lookup_function(key)
            return self[key]
        except:
            raise CacheLookupError(key)
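A pleasant side effect of subclassing dict is that the cache inherits the whole mapping interface for free. A quick sketch of this (the class is repeated, slightly condensed, so the snippet runs on its own; the lambda stands in for a real lookup function):

```python
# Sketch of the dict-based cache: because Cache subclasses dict,
# the usual mapping operations (len, in, items) come for free.
# The class is repeated, slightly condensed, so this runs standalone.

class CacheError(Exception):
    """ Cache problem """

class CacheLookupError(CacheError):
    """ Key lookup failed """

class Cache(dict):
    def __init__(self, lookup_function):
        super().__init__()
        self._lookup_function = lookup_function

    def __missing__(self, key):
        try:
            self[key] = self._lookup_function(key)
            return self[key]
        except Exception:
            raise CacheLookupError(key)

cache = Cache(lambda key: "VALUE: {}".format(key))

print(cache['x'])             # populated via __missing__
print('x' in cache)           # True: standard dict membership test
print(len(cache))             # 1: standard dict length
print(sorted(cache.items()))  # [('x', 'VALUE: x')]
```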

Notes:

  1. There are more complete caching solutions available on PyPI.
  2. In the block:
    if len(keys) > 1:

    the code could be written in a single line: key = keys[0] if len(keys) < 2 else keys
    but I personally find this difficult to read (even if it is very pythonic). I prefer explicit code that doesn’t need a pen and paper to decode what it does; I’m old-fashioned I guess.
  3. Had our customer been running Python >= 3.8 then I might have recommended using functools.cached_property instead, but this is not available in Python 3.6 (the version of Python that my customer is using). One day I will get them to migrate but that is a story for another day…
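As an aside on note 1: even without third-party packages, the standard library's functools.lru_cache (available since Python 3.2, so usable on 3.6) covers many of the same memoisation needs, at the cost of wrapping the lookup function rather than exposing a mapping. A sketch, where expensive_lookup is a hypothetical stand-in for a slow disk read:

```python
# Sketch using functools.lru_cache from the standard library as an
# alternative memoisation approach; expensive_lookup is a hypothetical
# stand-in for a slow disk read.
import functools

calls = []

@functools.lru_cache(maxsize=None)
def expensive_lookup(key):
    """ Hypothetical slow lookup: pretend this reads from disk. """
    calls.append(key)
    return "VALUE: {}".format(key)

print(expensive_lookup('x'))     # computed on first call
print(expensive_lookup('x'))     # served from the cache
print(expensive_lookup((1, 2)))  # tuples are hashable, so they work as keys
print(len(calls))                # the body ran once per distinct key: 2
```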