Python Caching and functools.lru_cache

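Caching stores data that has already been produced so that later requests for it can be served faster. Producing the data may involve computation, data cleaning, remote fetches, and similar operations; if the same data is needed several times, regenerating it on every request wastes a great deal of time. Caching the results of computations or remote requests therefore speeds up subsequent access.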

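@functools.lru_cache(maxsize=128, typed=False): a decorator that adds caching to a function. It saves the results of up to maxsize most recent calls, keyed by their arguments, and returns the saved result directly when the function is called again with the same arguments. It can save time when an expensive or I/O-bound function is called repeatedly.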

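Because a dictionary is used to store the results, the positional and keyword arguments to the function must be hashable. Distinct argument patterns may be treated as distinct calls with separate cache entries. For example, f(a=1, b=2) and f(b=2, a=1) differ in their keyword argument order and may be cached twice. If user_function is specified, it must be a callable. This allows the lru_cache decorator to be applied directly to a user function (without parentheses, available since Python 3.8), leaving maxsize at its default value of 128: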

from functools import lru_cache

@lru_cache
def count_vowels(sentence):
    sentence = sentence.casefold()
    return sum(sentence.count(vowel) for vowel in 'aeiou')
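
The hashability requirement described above is easy to trip over in practice. The following is a minimal sketch (the function name total is made up for illustration) showing that a tuple argument works while a list argument raises TypeError before the function body ever runs:

```python
from functools import lru_cache


@lru_cache
def total(values):
    # Hypothetical helper: sums an iterable of numbers.
    return sum(values)


print(total((1, 2, 3)))  # a tuple is hashable, so the call can be cached

try:
    total([1, 2, 3])  # a list is not hashable
except TypeError as exc:
    print("TypeError:", exc)
```

Converting mutable arguments to tuples or frozensets before the call is one common way around this.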

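If maxsize is set to None, the LRU feature is disabled and the cache can grow without bound.
If typed is set to True, function arguments of different types are cached separately. For example, f(3) and f(3.0) are treated as distinct calls and cached separately.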
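
A small sketch of the typed=True behaviour, using a made-up function describe whose result depends on the argument type:

```python
from functools import lru_cache


@lru_cache(maxsize=None, typed=True)
def describe(x):
    # Hypothetical function whose result depends on the argument's type.
    return "{} ({})".format(x, type(x).__name__)


print(describe(3))    # 3 (int)
print(describe(3.0))  # 3.0 (float)

# Two separate entries were created, one per argument type.
print(describe.cache_info().currsize)  # 2
```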

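To help measure the effectiveness of the cache and tune the maxsize parameter, the wrapped function provides a cache_info() method. Calling it returns a named tuple with the number of hits, the number of misses, the maximum cache size maxsize, and the current cache size currsize. In a multi-threaded environment, the hit and miss counts are approximate.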

The decorator also provides a cache_clear() function for clearing or invalidating the cache.
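
As a rough illustration of cache_info() and cache_clear(), here is a sketch with a made-up function square; the counters are read before and after the cache is cleared:

```python
from functools import lru_cache


@lru_cache(maxsize=32)
def square(n):
    # Hypothetical stand-in for an expensive computation.
    return n * n


for n in (1, 2, 1, 3, 1):
    square(n)

before = square.cache_info()
print(before)  # CacheInfo(hits=2, misses=3, maxsize=32, currsize=3)

square.cache_clear()  # drops all entries and resets the counters
after = square.cache_info()
print(after)   # CacheInfo(hits=0, misses=0, maxsize=32, currsize=0)
```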

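The original, undecorated function is accessible through the __wrapped__ attribute. This is useful for introspection, for bypassing the cache, or for rewrapping the original function with a different cache.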
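
A short sketch of the __wrapped__ attribute; the function double and the list calls are invented here only to make the bypass visible:

```python
from functools import lru_cache

calls = []  # records every real execution of the function body


@lru_cache(maxsize=None)
def double(n):
    calls.append(n)
    return n * 2


double(5)
double(5)              # cache hit, the body does not run again
double.__wrapped__(5)  # bypasses the cache, the body runs
print(calls)           # [5, 5]

# Rewrap the original function with a differently sized cache.
small_double = lru_cache(maxsize=4)(double.__wrapped__)
print(small_double.cache_info().maxsize)  # 4
```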

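A least recently used (LRU) cache works best when the most recent calls are the best predictors of upcoming calls (for example, the most popular articles on a news server tend to change each day). The size limit of the cache ensures that it does not grow without bound in long-running processes such as web servers. In short, this decorator implements memoization, an optimization technique that saves the results of expensive function calls so that they are not recomputed when the same arguments are passed again.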
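
To make the eviction order concrete, here is a minimal sketch with a deliberately tiny cache (maxsize=2). The function load and the list computed are made up for the illustration; computed records which calls actually executed the function body:

```python
from functools import lru_cache

computed = []  # names for which the body really ran


@lru_cache(maxsize=2)
def load(name):
    computed.append(name)
    return name.upper()


# "a" is reused before "c" arrives, so "b" is the least recently used
# entry and gets evicted; the final call to "b" must be recomputed.
for name in ["a", "b", "a", "c", "b"]:
    load(name)

print(computed)  # ['a', 'b', 'c', 'b']
```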

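In general, an LRU cache should only be used when you want to reuse previously computed values. Accordingly, it makes no sense to cache functions with side effects, functions that need to create distinct mutable objects on each call, or impure functions such as time() or random().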
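
The following sketch shows what goes wrong when an impure function is cached; roll is a hypothetical function that is supposed to return a fresh random number on every call:

```python
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def roll():
    # Intended to produce a new value each time, but the cache freezes it.
    return random.random()


first = roll()
second = roll()
print(first == second)  # True: the second call returned the cached value
```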

Example of an LRU cache for static web content:

import urllib.error
import urllib.request
from functools import lru_cache

@lru_cache(maxsize=32)
def get_pep(num):
    'Retrieve text of a Python Enhancement Proposal'
    resource = 'http://www.python.org/dev/peps/pep-%04d/' % num
    try:
        with urllib.request.urlopen(resource) as s:
            return s.read()
    except urllib.error.HTTPError:
        return 'Not Found'

>>> for n in 8, 290, 308, 320, 8, 218, 320, 279, 289, 320, 9991:
...     pep = get_pep(n)
...     print(n, len(pep))

>>> get_pep.cache_info()
CacheInfo(hits=3, misses=8, maxsize=32, currsize=8)

The following example uses a cache to compute Fibonacci numbers efficiently, in the style of dynamic programming.

@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

>>> [fib(n) for n in range(16)]
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610]

>>> fib.cache_info()
CacheInfo(hits=28, misses=16, maxsize=None, currsize=16)

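There is also fastcache, a faster third-party module implemented in C that works with both Python 2 and Python 3 and provides the same functionality; it reportedly also supports TTL (time-to-live) expiry.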

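Here is a very simple example that illustrates caching:

Figure 1: A simple caching example

The output shows that the second call to add(1, 2) does not actually execute the function body; the cached result is returned directly.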
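
A minimal sketch of such an example, assuming add is decorated with lru_cache and prints a message whenever its body runs (the original figure may differ in detail):

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def add(x, y):
    # The message appears only when the body is really executed.
    print("computing {} + {}".format(x, y))
    return x + y


print(add(1, 2))  # prints the "computing" message, then 3
print(add(1, 2))  # prints only 3, the result comes from the cache
```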

lru_cache keeps its data in memory, but data can also be cached on disk. The following example is an attempt at a disk-based cache decorator:

import os
import uuid
import pickle
import shutil
import tempfile
from functools import wraps as func_wraps


class DiskCache(object):
    """Cache data on disk.

    Constructor arguments:
    -----
        cache_path: directory in which the cache files are stored
    """

    _NAMESPACE = uuid.UUID("c875fb30-a8a8-402d-a796-225a6b065cad")

    def __init__(self, cache_path=None):
        if cache_path:
            self.cache_path = os.path.abspath(cache_path)
        else:
            self.cache_path = os.path.join(tempfile.gettempdir(), ".diskcache")

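    def __call__(self, func):
        """Return the wrapped function.

        If no cached result exists on disk, call the function, cache its result, and return it.
        If a cached result exists on disk, return it directly.
        """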
        @func_wraps(func)
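        def wrapper(*args, **kw):
            # Derive a deterministic file name from the string form of the arguments.
            params_uuid = uuid.uuid5(self._NAMESPACE, "-".join(map(str, (args, kw))))
            key = '{}-{}.cache'.format(func.__name__, str(params_uuid))
            cache_file = os.path.join(self.cache_path, key)

            # Make sure the cache directory exists.
            os.makedirs(self.cache_path, exist_ok=True)

            try:
                # Cache hit: load the pickled result.
                with open(cache_file, 'rb') as f:
                    val = pickle.load(f)
            except Exception:
                # Cache miss (or unreadable file): compute the result and try to store it.
                val = func(*args, **kw)
                try:
                    with open(cache_file, 'wb') as f:
                        pickle.dump(val, f)
                except Exception:
                    # Caching is best-effort; a failed write must not break the call.
                    pass
            return val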
        return wrapper

    def clear(self, func_name):
        """Remove the cached results of the given function."""
        if not os.path.isdir(self.cache_path):
            return  # nothing has been cached yet
        for cache_file in os.listdir(self.cache_path):
            if cache_file.startswith(func_name + "-"):
                os.remove(os.path.join(self.cache_path, cache_file))

    def clear_all(self):
        """Remove all cached results."""
        if os.path.exists(self.cache_path):
            shutil.rmtree(self.cache_path)


cache_in_disk = DiskCache()


@cache_in_disk
def add(x, y):
    return x + y
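
One detail of the wrapper above deserves a closer look: the cache file name is derived from the string form of the arguments. The sketch below isolates that step in a standalone helper (cache_key is a name invented for this illustration, not part of the class) and shows that equal argument patterns map to the same file name, while a positional call and the equivalent keyword call map to different ones:

```python
import uuid

# Same fixed namespace as in the class above; any constant UUID would do.
NAMESPACE = uuid.UUID("c875fb30-a8a8-402d-a796-225a6b065cad")


def cache_key(func_name, args, kw):
    # Hypothetical helper that mirrors the key derivation in the wrapper.
    params_uuid = uuid.uuid5(NAMESPACE, "-".join(map(str, (args, kw))))
    return '{}-{}.cache'.format(func_name, params_uuid)


same = cache_key("add", (1, 2), {}) == cache_key("add", (1, 2), {})
mixed = cache_key("add", (1, 2), {}) == cache_key("add", (), {"x": 1, "y": 2})

print(same)   # True: identical calls share one cache file
print(mixed)  # False: add(1, 2) and add(x=1, y=2) are cached separately
```

Because the key relies on str(), this approach only works well for arguments whose string form is stable and unambiguous.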

An OrderedDict is also useful for implementing variants of functools.lru_cache():

from collections import OrderedDict

class LRU(OrderedDict):
    'Limit size, evicting the least recently looked-up key when full'

    def __init__(self, maxsize=128, /, *args, **kwds):
        self.maxsize = maxsize
        super().__init__(*args, **kwds)

    def __getitem__(self, key):
        value = super().__getitem__(key)
        self.move_to_end(key)
        return value

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        if len(self) > self.maxsize:
            oldest = next(iter(self))
            del self[oldest]
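
The class above relies on two OrderedDict behaviours: move_to_end() marks a key as most recently used, and iteration starts at the oldest key. A minimal sketch of just those two operations on a plain OrderedDict (the variable names are illustrative):

```python
from collections import OrderedDict

cache = OrderedDict(a=1, b=2, c=3)

cache.move_to_end("a")       # "a" becomes the most recently used key
oldest = next(iter(cache))   # iteration starts with the oldest key
print(oldest)                # b

del cache[oldest]            # evict it, as __setitem__ does when full
print(list(cache))           # ['c', 'a']
```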

There are also other caching modules, such as cachelib and cacheout; in practice, choose whichever implementation fits your needs.