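iOS Method Cache: cache

1. The structure of cache

We previously explored the structure of Class and its members, and saw what isa, superClass, and bits are for. That leaves cache. So far we only know that it stores key/imp pairs; what it actually does is still unclear.
First, cache is a cache_t struct, defined in objc-runtime-new.h in the objc source. Its full definition is as follows: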

struct cache_t {
    struct bucket_t *_buckets;
    mask_t _mask;
    mask_t _occupied;

public:
    struct bucket_t *buckets();
    mask_t mask();
    mask_t occupied();
    void incrementOccupied();
    void setBucketsAndMask(struct bucket_t *newBuckets, mask_t newMask);
    void initializeToEmpty();

    mask_t capacity();
    bool isConstantEmptyCache();
    bool canBeFreed();

    static size_t bytesForCapacity(uint32_t cap);
    static struct bucket_t * endMarker(struct bucket_t *b, uint32_t cap);

    void expand();
    void reallocate(mask_t oldCapacity, mask_t newCapacity);
    struct bucket_t * find(cache_key_t key, id receiver);

    static void bad_cache(id receiver, SEL sel, Class isa) __attribute__((noreturn));
};

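cache_t declares three members: _mask and _occupied, both of type mask_t, and _buckets, a pointer to bucket_t.
mask_t is an unsigned integer type; on 64-bit (__LP64__) it is uint32_t, otherwise uint16_t.
bucket_t holds an imp and a key. The order of the two fields depends on the architecture.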

#if __LP64__
typedef uint32_t mask_t;  // x86_64 & arm64 asm are less efficient with 16-bits
#else
typedef uint16_t mask_t;
#endif

struct bucket_t {
private:
    // IMP-first is better for arm64e ptrauth and no worse for arm64.
    // SEL-first is better for armv7* and i386 and x86_64.
#if __arm64__
    MethodCacheIMP _imp;
    cache_key_t _key;
#else
    cache_key_t _key;
    MethodCacheIMP _imp;
#endif

public:
    inline cache_key_t key() const { return _key; }
    inline IMP imp() const { return (IMP)_imp; }
    inline void setKey(cache_key_t newKey) { _key = newKey; }
    inline void setImp(IMP newImp) { _imp = newImp; }

    void set(cache_key_t newKey, IMP newImp);
};

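2. What cache does

The name suggests that cache is some kind of cache, and since an imp is the address of a method's implementation, a reasonable guess is that cache stores methods that have already been called so that later calls are faster.

3. Verifying cache

In the same objc source project, create a new class and call its sayHello method. Then, following the same approach as before, print the bucket contents in the lldb console. The bucket does hold the imp of sayHello: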

2019-12-25 00:39:22.566292+0800 LGTest[3586:42169] LGPerson say : -[LGPerson sayHello]
(lldb) x/4gx pClass
0x1000012e0: 0x001d8001000012b9 0x0000000100b36140
0x1000012f0: 0x0000000101e23c20 0x0000000100000003
(lldb) p (cache_t *)0x1000012f0
(cache_t *) $1 = 0x00000001000012f0
(lldb) p *$1
(cache_t) $2 = {
  _buckets = 0x0000000101e23c20
  _mask = 3
  _occupied = 1
}
(lldb) p $2._buckets
(bucket_t *) $3 = 0x0000000101e23c20
(lldb) p *$3
(bucket_t) $4 = {
  _key = 4294971020
  _imp = 0x0000000100000c60 (LGTest`-[LGPerson sayHello] at LGPerson.m:13)
}
(lldb)
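The x/4gx dump above can be read field by field. The sketch below is a simplified mirror of the layout, assuming a 64-bit little-endian build like the one in the lldb session; ClassMirror, CacheMirror, and decodeWord are hypothetical names for illustration, not runtime declarations. It shows why the cache starts 16 bytes after the class address (0x1000012e0 + 0x10 = 0x1000012f0) and how the word 0x0000000100000003 splits into _mask = 3 and _occupied = 1.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-ins for the runtime structs (64-bit layout assumed).
// These are illustrative mirrors, not the real objc4 declarations.
struct CacheMirror {
    void*    buckets;   // bucket_t *_buckets
    uint32_t mask;      // mask_t _mask
    uint32_t occupied;  // mask_t _occupied
};

struct ClassMirror {
    void*       isa;
    void*       superclass;
    CacheMirror cache;  // follows two pointers, so it sits at offset 16
    uintptr_t   bits;
};

struct MaskAndOccupied {
    uint32_t mask;
    uint32_t occupied;
};

// Split the 8-byte word that lldb prints right after _buckets.
// On a little-endian machine _mask is the low half, _occupied the high half.
constexpr MaskAndOccupied decodeWord(uint64_t word) {
    return { static_cast<uint32_t>(word & 0xffffffffu),
             static_cast<uint32_t>(word >> 32) };
}
```

For example, decodeWord(0x0000000100000003ULL) yields mask 3 and occupied 1, which matches what p *$1 prints.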

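One thing to note: you may ask why alloc and class were also called, yet neither shows up in this cache. As covered in the earlier article on where methods are stored, an object's instance methods live in its class, while class methods live in the metaclass as instance methods. Here we are looking at the cache of the class, so we only find the instance method sayHello. The cache and bucket of the metaclass are shown below, and the cached alloc method is there, which confirms the reasoning.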

(lldb) p/x 0x001d8001000012b9 & 0x00007ffffffffff8ULL
(unsigned long long) $5 = 0x00000001000012b8
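// 0x00000001000012b8 is the address of the metaclass; if this is unclear, see my earlier article on the isa chain, which explains how to get from a class to its metaclass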
(lldb) x/4gx 0x00000001000012b8
0x1000012b8: 0x001d800100b360f1 0x0000000100b360f0
0x1000012c8: 0x0000000101e236c0 0x0000000200000003
(lldb) p (cache_t *)0x1000012c8
(cache_t *) $6 = 0x00000001000012c8
(lldb) p *$6
(cache_t) $7 = {
  _buckets = 0x0000000101e236c0
  _mask = 3
  _occupied = 2
}
(lldb) p $7._buckets
(bucket_t *) $8 = 0x0000000101e236c0
(lldb) p *$8
(bucket_t) $9 = {
  _key = 4298994200
  _imp = 0x00000001003cc3b0 (libobjc.A.dylib`::+[NSObject alloc]() at NSObject.mm:2294)
}
(lldb)
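The metaclass address in the session above comes from masking the class's isa word. A minimal sketch of that step follows; the mask value is copied from the lldb command above (it applies to the x86_64 build used there), and classPointerFromIsa is a hypothetical helper name.

```cpp
#include <cstdint>

// Mask value taken from the lldb session above (x86_64 build assumed).
constexpr uint64_t kIsaMask = 0x00007ffffffffff8ULL;

// Strip the non-pointer bits from an isa word, leaving the class address.
constexpr uint64_t classPointerFromIsa(uint64_t isaBits) {
    return isaBits & kIsaMask;
}
```

Applied to the first word of the class dump, 0x001d8001000012b9, this gives 0x00000001000012b8, the address that is then inspected with x/4gx.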

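4. The cache strategy

4.1 Verifying that the cache follows a strategy

Now let's call several more instance methods and look at the values of cache and buckets again.

In the test code (a screenshot in the original post), we call four instance methods in order: init, sayHello, sayCode, and sayNB. Our guess was that the cache would now hold all four. Printing the cache shows that mask did grow, from 3 to 7, but _occupied is 1, and among the buckets only _buckets[2] holds a method, the most recently called sayNB. Every other slot we printed is empty.
So we can infer that the cache does not simply accumulate entries; some optimization kicks in once a certain condition is met.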

2019-12-25 00:57:52.143504+0800 LGTest[3662:48762] LGPerson say : -[LGPerson sayHello]
2019-12-25 00:57:52.144031+0800 LGTest[3662:48762] LGPerson say : -[LGPerson sayCode]
2019-12-25 00:57:52.144133+0800 LGTest[3662:48762] LGPerson say : -[LGPerson sayNB]
(lldb) x/4gx pClass
0x1000012e8: 0x001d8001000012c1 0x0000000100b36140
0x1000012f8: 0x0000000101029950 0x0000000100000007
(lldb) p (cache_t *)0x1000012f8
(cache_t *) $1 = 0x00000001000012f8
(lldb) p *$1
(cache_t) $2 = {
  _buckets = 0x0000000101029950
  _mask = 7
  _occupied = 1
}
(lldb) p $2._buckets
(bucket_t *) $3 = 0x0000000101029950
(lldb) p *$3
(bucket_t) $4 = {
  _key = 0
  _imp = 0x0000000000000000
}
(lldb) p $2._buckets[0]
(bucket_t) $5 = {
  _key = 0
  _imp = 0x0000000000000000
}
(lldb) p $2._buckets[1]
(bucket_t) $6 = {
  _key = 0
  _imp = 0x0000000000000000
}
(lldb) p $2._buckets[2]
(bucket_t) $7 = {
  _key = 4294971026
  _imp = 0x0000000100000ce0 (LGTest`-[LGPerson sayNB] at LGPerson.m:25)
}
(lldb) p $2._buckets[3]
(bucket_t) $8 = {
  _key = 0
  _imp = 0x0000000000000000
}
(lldb) p $2._buckets[5]
(bucket_t) $9 = {
  _key = 0
  _imp = 0x0000000000000000
}
(lldb) p $2._buckets[6]
(bucket_t) $10 = {
  _key = 0
  _imp = 0x0000000000000000
}
(lldb) p $2._buckets[7]
(bucket_t) $11 = {
  _key = 0
  _imp = 0x0000000000000000
}

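4.2 Finding the cache strategy

The only way forward is to go back to the source. Since the value of mask increased, we start with the mask_t mask() method of cache_t, which turns out to simply return _mask: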

mask_t cache_t::mask() 
{
    return _mask; 
}

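Searching further for mask(), we find it used in the capacity method: the capacity is mask + 1, or 0 when mask is 0. Its purpose is not obvious yet.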

mask_t cache_t::capacity() 
{
    return mask() ? mask()+1 : 0; 
}

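Next we search for callers of capacity and find one in the expand method. If oldCapacity is 0, the new capacity is initialized to INIT_CACHE_SIZE (1 << 2, i.e. 4); otherwise newCapacity is twice oldCapacity. If the doubled value no longer fits in mask_t, the capacity stays unchanged. That is the growth logic.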

enum {
    INIT_CACHE_SIZE_LOG2 = 2,
    INIT_CACHE_SIZE      = (1 << INIT_CACHE_SIZE_LOG2) // i.e. 4
};

void cache_t::expand()
{
    cacheUpdateLock.assertLocked();
    
    uint32_t oldCapacity = capacity();
    uint32_t newCapacity = oldCapacity ? oldCapacity*2 : INIT_CACHE_SIZE;

    if ((uint32_t)(mask_t)newCapacity != newCapacity) {
        // mask overflow - can't grow further
        // fixme this wastes one bit of mask
        newCapacity = oldCapacity;
    }

    reallocate(oldCapacity, newCapacity);
}
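The growth rule in expand can be isolated into a small function to see the sequence of capacities it produces. This is only a sketch of the arithmetic; nextCapacity and kInitCacheSize are hypothetical names, and the template parameter stands in for mask_t (uint32_t on LP64, uint16_t otherwise).

```cpp
#include <cstdint>

constexpr uint32_t kInitCacheSize = 1u << 2;  // mirrors INIT_CACHE_SIZE

// MaskT stands in for mask_t. Returns the capacity expand() would request.
template <typename MaskT>
uint32_t nextCapacity(uint32_t oldCapacity) {
    uint32_t newCapacity = oldCapacity ? oldCapacity * 2 : kInitCacheSize;
    // If the doubled value does not survive a round trip through MaskT,
    // the mask would overflow, so the capacity stays where it is.
    if (static_cast<uint32_t>(static_cast<MaskT>(newCapacity)) != newCapacity) {
        newCapacity = oldCapacity;
    }
    return newCapacity;
}
```

Starting from an empty cache the capacities go 4, 8, 16, and so on. With a 16-bit mask_t, a capacity of 32768 can no longer double, because 65536 does not fit in uint16_t.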

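Next, we need to find where and under what condition the cache calls expand. That happens in cache_fill_nolock. If the cache is still the shared read-only empty cache, a fresh one is allocated. Otherwise, if newOccupied (the current occupied count plus one) is at most 3/4 of the capacity, the cache is used as-is; if it is larger, the cache is expanded. cache->capacity() returns the current capacity (0 or mask + 1).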

static void cache_fill_nolock(Class cls, SEL sel, IMP imp, id receiver)
{
    // ... (earlier code omitted)

    // Make sure the entry wasn't added to the cache by some other thread 
    // before we grabbed the cacheUpdateLock.
    if (cache_getImp(cls, sel)) return; // already cached; nothing more to do

    cache_t *cache = getCache(cls);
    cache_key_t key = getKey(sel);

    // Use the cache as-is if it is less than 3/4 full
    mask_t newOccupied = cache->occupied() + 1;
    mask_t capacity = cache->capacity();
    if (cache->isConstantEmptyCache()) {
        // Cache is read-only. Replace it.
        cache->reallocate(capacity, capacity ?: INIT_CACHE_SIZE);
    }
    else if (newOccupied <= capacity / 4 * 3) {
        // Cache is less than 3/4 full. Use it as-is.
    }
    else {
        // Cache is too full. Expand it.
        cache->expand();
    }

    // Scan for the first unused slot and insert there.
    // There is guaranteed to be an empty slot because the 
    // minimum size is 4 and we resized at 3/4 full.
    bucket_t *bucket = cache->find(key, receiver);
    if (bucket->key() == 0) cache->incrementOccupied();
    bucket->set(key, imp);
}
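The 3/4 check uses integer arithmetic (capacity / 4 * 3), so it helps to see the exact threshold for small capacities. The sketch below isolates only that comparison; mustGrow is a hypothetical name, and the isConstantEmptyCache branch is deliberately left out.

```cpp
#include <cstdint>

// True when adding one more entry would exceed 3/4 of the capacity,
// using the same integer expression as the source: capacity / 4 * 3.
constexpr bool mustGrow(uint32_t occupied, uint32_t capacity) {
    return (occupied + 1) > capacity / 4 * 3;
}
```

With a capacity of 4 the limit is 3 entries, so the fourth distinct method triggers expand. With a capacity of 8 the limit is 6, so the seventh does.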

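This still does not explain why the buckets hold only sayNB. For that, look at the last line of expand: reallocate(oldCapacity, newCapacity). reallocate first allocates newBuckets sized for newCapacity, then installs the new buckets and mask, and finally hands the old buckets to cache_collect_free if they can be freed. The old contents are not copied over. According to the comment in the source, this is thought to save cache memory at the cost of extra cache fills; it also means the old buckets never have to be modified or rehashed in place.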

void cache_t::reallocate(mask_t oldCapacity, mask_t newCapacity)
{
    bool freeOld = canBeFreed();

    bucket_t *oldBuckets = buckets();
    bucket_t *newBuckets = allocateBuckets(newCapacity);

    // Cache's old contents are not propagated. 
    // This is thought to save cache memory at the cost of extra cache fills.
    // fixme re-measure this

    assert(newCapacity > 0);
    assert((uintptr_t)(mask_t)(newCapacity-1) == newCapacity-1);

    // mask = capacity - 1: the capacity is a power of two, so capacity - 1 has all low bits set
    setBucketsAndMask(newBuckets, newCapacity - 1);
    
    if (freeOld) {
        cache_collect_free(oldBuckets, oldCapacity);
        cache_collect(false);
    }
}
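The reason the mask is stored as capacity - 1 becomes visible when it is used to pick a bucket. The sketch below assumes the index is computed as key & mask; that hash is not shown in this article, so treat it as an assumption, although it is consistent with the lldb output earlier. bucketIndex is a hypothetical name.

```cpp
#include <cstdint>

// Assumed hash: the selector key ANDed with the mask (capacity - 1).
// Because the capacity is a power of two, the result is always in
// the range [0, capacity - 1].
constexpr uint32_t bucketIndex(uint64_t key, uint32_t mask) {
    return static_cast<uint32_t>(key & mask);
}
```

Checking against the dumps above: the sayNB key 4294971026 with mask 7 gives index 2, which is where _buckets[2] holds sayNB, and the sayHello key 4294971020 with mask 3 gives index 0, the first bucket.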

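Back in cache_fill_nolock, note that the new imp and key are stored only after expand has run. That explains why the cache contains nothing but sayNB: the call to sayNB pushed the cache past 3/4 full, expand threw away the old buckets, and sayNB became the first entry in the new ones. The effect resembles keeping the most recently used method, but strictly speaking it is not an LRU algorithm: no per-entry recency is tracked, and the whole cache is simply emptied each time it grows.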

    bucket_t *bucket = cache->find(key, receiver);
    if (bucket->key() == 0) cache->incrementOccupied();
    bucket->set(key, imp);
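To tie the pieces together, here is a toy model of the fill policy that replays the four calls from section 4.1. It is a sketch under simplifying assumptions, not the runtime's code: ToyCache is a hypothetical type, selectors are plain non-empty strings, and the hash and linear probing are stand-ins chosen for brevity.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Toy model of the policy: grow past 3/4 full, drop old entries on growth.
struct ToyCache {
    struct Bucket { std::string sel; };  // empty string means unused slot
    std::vector<Bucket> buckets;         // size is 0 or a power of two
    uint32_t occupied = 0;

    uint32_t capacity() const { return static_cast<uint32_t>(buckets.size()); }
    uint32_t mask() const { return capacity() ? capacity() - 1 : 0; }

    bool contains(const std::string& sel) const {
        for (const Bucket& b : buckets) {
            if (b.sel == sel) return true;
        }
        return false;
    }

    void fill(const std::string& sel) {
        if (contains(sel)) return;               // already cached
        uint32_t cap = capacity();
        if (cap == 0) {
            reallocate(4);                       // initial size, as INIT_CACHE_SIZE
        } else if (occupied + 1 > cap / 4 * 3) {
            reallocate(cap * 2);                 // expand: old entries are dropped
        }
        // Stand-in hash plus linear probing; an empty slot always exists
        // because the cache never gets more than 3/4 full.
        uint32_t i = static_cast<uint32_t>(std::hash<std::string>{}(sel)) & mask();
        while (!buckets[i].sel.empty()) i = (i + 1) & mask();
        buckets[i].sel = sel;
        ++occupied;
    }

private:
    void reallocate(uint32_t newCapacity) {
        buckets.assign(newCapacity, Bucket{});   // nothing is copied over
        occupied = 0;
    }
};
```

Filling it with "init", "sayHello", "sayCode", and "sayNB" in that order leaves mask() at 7 and occupied at 1, with only "sayNB" present, the same picture as the lldb dump.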

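Further notes

One last extension: when is cache_fill_nolock called? In the source it is called from cache_fill, and tracing further shows that cache_fill is called during method lookup. Method lookup in turn is part of objc_msgSend, the message-sending mechanism underlying Objective-C. So for now we can say that during a message send the runtime first looks for the imp in the cache; on a hit it calls the imp directly, and on a miss it goes through the slower method lookup and then stores the result in the cache.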

void cache_fill(Class cls, SEL sel, IMP imp, id receiver)
{
#if !DEBUG_TASK_THREADS
    mutex_locker_t lock(cacheUpdateLock);
    cache_fill_nolock(cls, sel, imp, receiver);
#else
    _collecting_in_critical();
    return;
#endif
}

/* method lookup */
extern IMP lookUpImpOrNil(Class, SEL, id obj, bool initialize, bool cache, bool resolver);
extern IMP lookUpImpOrForward(Class, SEL, id obj, bool initialize, bool cache, bool resolver);
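The lookup order described above (cache first, then slow lookup, then fill) can be sketched with ordinary containers. Everything here is a hypothetical stand-in: ToyClass and lookUp are illustrative names, an "IMP" is reduced to a plain function pointer, and the superclass walk, method resolution, and forwarding that the real lookup performs are left out.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in: an "IMP" is just a function pointer here.
using Imp = int (*)();

struct ToyClass {
    std::unordered_map<std::string, Imp> methods;  // the class's method list
    std::unordered_map<std::string, Imp> cache;    // plays the role of cache_t
    int slowLookups = 0;                           // counts cache misses

    // Fast path: cache hit. Slow path: search the method list, then fill the cache.
    Imp lookUp(const std::string& sel) {
        auto hit = cache.find(sel);
        if (hit != cache.end()) return hit->second;
        ++slowLookups;
        auto found = methods.find(sel);
        if (found == methods.end()) return nullptr;  // forwarding is out of scope
        cache[sel] = found->second;                  // corresponds to the cache_fill step
        return found->second;
    }
};
```

Calling lookUp twice with the same selector increments slowLookups only once; the second call is served from the cache.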

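Summary

The cache in Class stores methods during message sending so that later calls are faster. It grows dynamically: when adding an entry would make it more than 3/4 full, the capacity is doubled. Growing discards the old buckets entirely and replaces them with new, empty ones, after which the imp and key of the call that triggered the growth are stored. This is often described as LRU, but it is really a flush-on-grow policy.
That is all for this analysis of cache. If anything is missing or wrong, please leave a comment and I will correct it promptly~
Learn with a sense of humor, and it never gets dry~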
