Heitor's log


Caching computation in Python


Caching is an optimization strategy that stores data in some memory instead of discarding it after use. The idea is to keep data around so it can be served faster in the future. This can save CPU time at the cost of increased memory usage.

There are many types of cache (hardware caches, network caches, software caches…), with different applications and performance characteristics. In this post I will discuss one type of software cache: memoization.

Memoization

Memoization is a software cache technique in which the results of functions are saved in a cache. The entries of this cache are served when the function is called with the same inputs, instead of executing the function again.

It has this name because it is a “memorization” process: it saves the output of a function for a given input. Let’s see an example: the factorial. Here is a recursive implementation in Python:

def factorial(n):
    if n > 1:
        return n * factorial(n - 1)
    else:
        return 1

Every time this function is called, it recalculates \( (n-1)! \), \( (n-2)! \), \( (n-3)! \), \( (n-4)! \), and so on. This is very fast for small values of \(n\), but it can take some time for larger inputs:

In [3]: %timeit factorial(30)
3.6 µs ± 81.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [4]: %timeit factorial(50)
7.4 µs ± 1.16 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [5]: %timeit factorial(100)
13.5 µs ± 92.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [16]: %timeit factorial(1000)
294 µs ± 9.45 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [18]: %timeit factorial(2000)
1.04 ms ± 128 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So, larger inputs take longer because more intermediate results must be calculated. Let’s try to memoize it manually:

class MemoizedFactorial:
    def __init__(self):
        self.cache = {}

    def calculate(self, n):
        # Compute and store the result only on a cache miss
        if n not in self.cache:
            if n > 1:
                self.cache[n] = n * self.calculate(n - 1)
            else:
                self.cache[n] = 1
        return self.cache[n]

And to use it:

x = MemoizedFactorial()
print(x.calculate(5)) # prints 120
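The same idea can be generalized beyond the factorial. Here is a minimal sketch of a generic memoization decorator (the name memoize is my own; it caches any function of hashable positional arguments):

```python
def memoize(func):
    cache = {}

    def wrapper(*args):
        # Compute only on a cache miss; otherwise serve the stored result
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper

@memoize
def factorial(n):
    # Recursive calls go through the wrapper, so they are cached too
    return n * factorial(n - 1) if n > 1 else 1

print(factorial(5))  # prints 120
```

Note that the recursive calls also hit the cache, because the module-level name factorial now refers to the wrapped function.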

And how fast is it?

In [20]: %timeit x.calculate(2000)
147 ns ± 0.877 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

So, about 150 ns versus 1 ms to calculate the same number: this version computes \(2000!\) roughly 7000 times faster. And there’s something interesting for \(n < 2000\):

In [21]: %timeit x.calculate(100)
147 ns ± 2 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [22]: %timeit x.calculate(137)
150 ns ± 0.896 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [23]: %timeit x.calculate(5)
147 ns ± 2.47 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [24]: %timeit x.calculate(1234)
175 ns ± 0.81 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

It takes roughly the same amount of time to “calculate” any value, and that is expected: we are only retrieving values from a dictionary.

Now notice the memory consumption:

In [25]: import sys

In [26]: sys.getsizeof(x.cache)
Out[26]: 73816

This trades a plain function for an object holding about 73 KB of cache, in exchange for a dramatic performance increase.

It stores 2000 results in 73 KB. What would happen if we had 2 million entries? And then kept calculating millions more? This implementation would consume more and more memory until there’s no space left.

To overcome this situation, there are many caching algorithms that put in place policies to discard cached data when it becomes full. Here I will discuss the Least Recently Used approach.

Least Recently Used (LRU)

It is important to define a maximum size for the cache. This way it does not grow indefinitely and you keep control over its memory consumption.

The downside is that information must be discarded when the cache reaches its limit. One way of doing so is keeping only the recent entries, discarding the least recently used ones.
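To make the bookkeeping concrete, here is a minimal sketch of an LRU cache built on collections.OrderedDict (the class name LRUCache and its interface are my own, just for illustration):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.data = OrderedDict()  # keys ordered from least to most recently used

    def get(self, key):
        if key not in self.data:
            return None  # cache miss
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.maxsize:
            self.data.popitem(last=False)  # evict the least recently used entry
```

Every access moves the key to the end of the ordered dict, so the least recently used entry is always at the front, ready to be evicted.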

This is not as simple as the example above, because it also needs to keep track of when each entry was last used. On the other hand, it is already implemented in Python’s standard library :) The functools module provides the lru_cache decorator:

from functools import lru_cache

@lru_cache(maxsize=2000)
def factorial(n):
    if n > 1:
        return n * factorial(n - 1)
    else:
        return 1

It is the same function from the first snippet, but with a decorator added. The default maximum size of the cache is 128; I increased it so we can compare the timings with our handmade version:

In [27]: %timeit factorial(1000)
68 ns ± 0.697 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [28]: %timeit factorial(2000)
64.1 ns ± 1.05 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

This decorated function is more than twice as fast as the handmade cache.

Another interesting advantage of this decorator is that it provides some methods to see what happened with the cache:

  • The cache_info() method displays some statistics about the cache: the number of hits (times the requested data was in the cache), misses (times it was not), the maximum size of the cache, and the current number of cached entries.

  • The cache_clear() method empties the cache.

Let’s see them in action:

In [13]: factorial.cache_clear()

In [14]: factorial.cache_info()
Out[14]: CacheInfo(hits=0, misses=0, maxsize=2000, currsize=0)

In [15]: factorial(1000)
Out[15]: 40238726007709377354370243392300398571937486421071463254379991042993851239862902059204420848696940480047998861019719605863166687299480855890132382966994459099742450408707375991882362772718873251977950595099527612087497546249704360141827809464649629105639388743788648733711918104582578364784997701247663288983595573543251318532395846307555740911426241747434934755342864657661166779739666882029120737914385371958824980812686783837455973174613608537953452422158659320192809087829730843139284440328123155861103697680135730421616874760967587134831202547858932076716913244842623613141250878020800026168315102734182797770478463586817016436502415369139828126481021309276124489635992870511496497541990934222156683257208082133318611681155361583654698404670897560290095053761647584772842188967964624494516076535340819890138544248798495995331910172335555660213945039973628075013783761530712776192684903435262520001588853514733161170210396817592151090778801939317811419454525722386554146106289218796022383897147608850627686296714667469756291123408243920816015378088989396451826324367161676217916890977991190375403127462228998800519544441428201218736174599264295658174662830295557029902432415318161721046583203678690611726015878352075151628422554026517048330422614397428693306169089796848259012545832716822645806652676995865268227280707578139185817888965220816434834482599326604336766017699961283186078838615027946595513115655203609398818061213855860030143569452722420634463179746059468257310379008402443243846565724501440282188525247093519062092902313649327349756551395872055965422874977401141334696271542284586237738753823048386568897646192738381490014076731044664025989949022222176590433990188601856652648506179970235619389701786004081188972991831102117122984590164192106888438712185564612496079872290851929681937238864261483965738229112312502418664935314397013742853192664987533721894069428143411852015801412334482801505139969429015348307764456909907315243327828826986460278986432113908350621709500259
7389863554277196742822248757586765752344220207573630569498825087968928162753848863396909959826280956121450994871701244516461260379029309120889086942028510640182154399457156805941872748998094254742173582401063677404595741785160829230135358081840096996372524230560855903700624271243416909004153690105933983835777939410970027753472000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

In [16]: factorial.cache_info()
Out[16]: CacheInfo(hits=0, misses=1000, maxsize=2000, currsize=1000)

In [17]: factorial(1000)
Out[17]: 402387260077093773543702433923003985719374864210714632543799910429938512398629… (same value as above, now served from the cache)

In [18]: factorial.cache_info()
Out[18]: CacheInfo(hits=1, misses=1000, maxsize=2000, currsize=1000)

After emptying the cache, there’s no information left apart from the maximum size. The first time the function is called, there are only misses, as the cache is empty. The second time the function is called, the result is already available and we get a hit.
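As a side note, if you never want entries to be evicted, Python 3.9+ also provides functools.cache, which is equivalent to lru_cache(maxsize=None):

```python
from functools import cache

@cache  # unbounded: nothing is ever evicted
def factorial(n):
    return n * factorial(n - 1) if n > 1 else 1

print(factorial(5))  # prints 120
```

It exposes the same cache_info() and cache_clear() methods, with maxsize reported as None.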

Conclusions

Caching is an optimization technique in which we trade expensive computation for memory. Accessing this memory, of course, needs to be faster than computing the data again, and memory is limited, so the cache needs to stay small.

To keep the cache under control, a number of caching strategies exist. The Least Recently Used approach is an efficient way to keep only the most recently used entries; it is available in Python as functools.lru_cache.