Exploring OC Classes (Part 2): Method Caching in Class Objects

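I. Preface

In the previous article we explored the structure of the class object and learned that instance variables and instance methods are stored in the class's class_ro_t. We also know that a method's selector (sel) and its implementation (imp) correspond one to one and are stored as a hash table in the class object's cache_t. How does a method get into that cache, and how does the system allocate memory for it? This article explores both questions.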

II. Structure of `cache_t` and its caching functions

The full structure of cache_t

struct cache_t {
    struct bucket_t *_buckets; // pointer to the bucket array, 8 bytes
    mask_t _mask; // 4 bytes
    mask_t _occupied; // 4 bytes

public: // cache-related functions
    struct bucket_t *buckets(); // the bucket array
    mask_t mask(); // capacity - 1, used as the hash mask
    mask_t occupied(); // number of slots in use
    void incrementOccupied(); // increment the in-use count
    void setBucketsAndMask(struct bucket_t *newBuckets, mask_t newMask); // install new buckets and mask
    void initializeToEmpty(); // initialize as empty

    mask_t capacity(); // capacity
    bool isConstantEmptyCache(); // whether this is the empty cache
    bool canBeFreed(); // whether the buckets can be freed

    static size_t bytesForCapacity(uint32_t cap); // size in bytes for a given capacity
    static struct bucket_t * endMarker(struct bucket_t *b, uint32_t cap); // end marker

    void expand(); // grow the cache
    void reallocate(mask_t oldCapacity, mask_t newCapacity); // allocate a new bucket array
    struct bucket_t * find(cache_key_t key, id receiver); // find the bucket for a key (or an empty one)

    static void bad_cache(id receiver, SEL sel, Class isa) __attribute__((noreturn)); // cache error handling
};

The structure of bucket_t

struct bucket_t {
private:
    // IMP-first is better for arm64e ptrauth and no worse for arm64.
    // SEL-first is better for armv7* and i386 and x86_64.
#if __arm64__
    MethodCacheIMP _imp;
    cache_key_t _key;
#else
    cache_key_t _key;
    MethodCacheIMP _imp;
#endif

public:
    inline cache_key_t key() const { return _key; }
    inline IMP imp() const { return (IMP)_imp; }
    inline void setKey(cache_key_t newKey) { _key = newKey; }
    inline void setImp(IMP newImp) { _imp = newImp; }

    void set(cache_key_t newKey, IMP newImp);
};

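cache_t is a struct containing _buckets, _mask and _occupied. Everything below public: is a function related to method caching; most of them are used in the exploration below.

_buckets is an array of bucket_t structs; each bucket_t stores a method's SEL (as its memory address) and its IMP.
_mask is the array size minus 1 and serves as a bit mask. Because the array size is always a power of two, _mask has the binary form 000011, 000111, 001111, and so on, so ANDing a key with it acts as the hash modulo and guarantees that the result never exceeds the cache bounds. It can also be seen as the largest valid index (counting from 0), so the cache size (total) is mask + 1.
_occupied is the number of methods currently cached, that is, how many slots of the array are in use.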
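
To make the masking step concrete, here is a minimal C++ sketch of why a power-of-two capacity lets a bitwise AND stand in for a modulo. The helper name slot_for is hypothetical and not part of the runtime; it only assumes that capacity is a power of two and that mask equals capacity - 1.

```cpp
#include <cstdint>

// Hypothetical helper (not a runtime function): maps a key to a slot index.
// Assumes mask == capacity - 1 with capacity a power of two, so every low
// bit of mask is set and the AND keeps exactly those bits of the key.
static inline uint32_t slot_for(uintptr_t key, uint32_t mask) {
    return static_cast<uint32_t>(key & mask);   // equals key % (mask + 1)
}
```

With mask 3 the key 13 lands in slot 1, and with mask 7 it lands in slot 5; in both cases the result can never exceed the mask.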

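III. Exploring the method caching flow

Searching globally for the cache functions in cache_t eventually leads to cache_fill_nolock, the function that drives the caching flow. cache_fill_nolock is called from cache_fill(), which in turn is called from lookUpImpOrForward and lookupMethodInClassAndLoadCache, so we can infer that methods are cached during message sending via objc_msgSend.
1. cache_fill_nolock is implemented as follows (see the comments):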

static void cache_fill_nolock(Class cls, SEL sel, IMP imp, id receiver)
{
    cacheUpdateLock.assertLocked();

    // Never cache before +initialize is done
    // Return immediately if the class has not been initialized yet
    if (!cls->isInitialized()) return;

    // Make sure the entry wasn't added to the cache by some other thread 
    // before we grabbed the cacheUpdateLock.
    // If the imp can already be fetched from the cache, return immediately
    if (cache_getImp(cls, sel)) return;

    // Get the class's cache
    cache_t *cache = getCache(cls);
    // Get the cache key for this selector
    cache_key_t key = getKey(sel);

    // Use the cache as-is if it is less than 3/4 full
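    // Occupied count after adding this method
    mask_t newOccupied = cache->occupied() + 1;
    // Current total capacity
    mask_t capacity = cache->capacity();
    // Check whether anything has been cached yet
    if (cache->isConstantEmptyCache()) {
        // Cache is read-only. Replace it.
        // Nothing cached yet: allocate buckets, at least INIT_CACHE_SIZE (4) slots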
        cache->reallocate(capacity, capacity ?: INIT_CACHE_SIZE);
    }
    else if (newOccupied <= capacity / 4 * 3) {
        // Cache is less than 3/4 full. Use it as-is.
    }
    // Adding the new method would push the occupied count past 3/4 of the capacity
    else {
        // Cache is too full. Expand it.
        // More than 3/4 of the capacity would be used, so expand
        cache->expand();
    }

    // Scan for the first unused slot and insert there.
    // There is guaranteed to be an empty slot because the 
    // minimum size is 4 and we resized at 3/4 full.
    // Insert the new method into the cache (this happens whether or not the cache was expanded)
    bucket_t *bucket = cache->find(key, receiver);
    if (bucket->key() == 0) cache->incrementOccupied();
    // Store the key together with its imp
    bucket->set(key, imp);
}
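
The three branches above can be isolated in a small C++ sketch. The enum FillAction and the function decide are hypothetical names introduced for illustration; the isEmpty parameter stands in for cache->isConstantEmptyCache(), and only the arithmetic is taken from the source.

```cpp
#include <cstdint>

enum class FillAction { Allocate, UseAsIs, Expand };

// Hypothetical helper mirroring the branch structure of cache_fill_nolock.
// `isEmpty` stands in for cache->isConstantEmptyCache().
static FillAction decide(bool isEmpty, uint32_t occupied, uint32_t capacity) {
    uint32_t newOccupied = occupied + 1;           // count after adding this method
    if (isEmpty) return FillAction::Allocate;      // first fill: allocate buckets
    if (newOccupied <= capacity / 4 * 3) return FillAction::UseAsIs;
    return FillAction::Expand;                     // would exceed 3/4 of capacity
}
```

With capacity 4 the threshold is 3, so the fourth distinct method triggers an expansion; with capacity 8 the threshold is 6.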

2. reallocate(): allocating new bucket memory

void cache_t::reallocate(mask_t oldCapacity, mask_t newCapacity)
{
    // Whether the old memory should be freed
    bool freeOld = canBeFreed();

    bucket_t *oldBuckets = buckets(); // the old buckets
    bucket_t *newBuckets = allocateBuckets(newCapacity); // allocate the new buckets

    // Cache's old contents are not propagated. 
    // This is thought to save cache memory at the cost of extra cache fills.
    // fixme re-measure this

    assert(newCapacity > 0);
    assert((uintptr_t)(mask_t)(newCapacity-1) == newCapacity-1);

    // Install the new buckets and mask
    setBucketsAndMask(newBuckets, newCapacity - 1);
    
    // Free the old buckets
    if (freeOld) {
        cache_collect_free(oldBuckets, oldCapacity);
        cache_collect(false);
    }
}

3. expand(): growing the cache

void cache_t::expand()
{
    cacheUpdateLock.assertLocked(); // asserts that the cache update lock is already held
    
    uint32_t oldCapacity = capacity(); // the old capacity
    uint32_t newCapacity = oldCapacity ? oldCapacity*2 : INIT_CACHE_SIZE; // the new capacity is twice the old one

    if ((uint32_t)(mask_t)newCapacity != newCapacity) {
        // mask overflow - can't grow further
        // fixme this wastes one bit of mask
        newCapacity = oldCapacity;
    }
    // Allocate the new memory
    reallocate(oldCapacity, newCapacity);
}
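
The doubling rule and its overflow guard can be tried in isolation with the C++ sketch below. It is only an illustration: small_mask_t is assumed to be a 16-bit type so that the overflow is easy to reach, kInitCacheSize stands in for INIT_CACHE_SIZE, and next_capacity is a hypothetical name.

```cpp
#include <cstdint>

using small_mask_t = uint16_t;          // assumption: a 16-bit mask type for illustration
const uint32_t kInitCacheSize = 4;      // stands in for INIT_CACHE_SIZE

// Hypothetical helper: computes the capacity that expand() would request.
static uint32_t next_capacity(uint32_t oldCapacity) {
    uint32_t newCapacity = oldCapacity ? oldCapacity * 2 : kInitCacheSize;
    if (static_cast<uint32_t>(static_cast<small_mask_t>(newCapacity)) != newCapacity) {
        // The doubled value no longer fits in the mask type: keep the old capacity.
        newCapacity = oldCapacity;
    }
    return newCapacity;
}
```

Starting from 0 the sequence is 4, 8, 16 and so on; with the 16-bit type assumed here it stops growing at 32768, because 65536 does not survive the round trip through small_mask_t.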

4. find(): locating the bucket for a key

bucket_t * cache_t::find(cache_key_t k, id receiver)
{
    assert(k != 0);

    bucket_t *b = buckets();
    mask_t m = mask();
    // cache_hash (begin = k & m) computes the index begin for key k; it records where the search starts
    mask_t begin = cache_hash(k, m);
    // Copy begin into i, the index that moves during the scan
    mask_t i = begin;
    do {
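        if (b[i].key() == 0  ||  b[i].key() == k) {
            // Read the bucket at index i. If its key equals k, the lookup succeeded and that bucket_t is returned.
            // If its key is 0, nothing has been cached at index i yet; that bucket_t is returned as well, which ends the search.
            return &b[i];
        }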
    } while ((i = cache_next(i, m)) != begin);
    
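    // In the decrementing variant of cache_next, the loop step above is effectively i = i - 1: control returns to the
    // do loop and the key of the previous slot is compared with k. (Some architectures use (i+1) & mask and scan forwards instead.)
    // When i reaches 0, the first index of the table, mask is assigned to i so that it points at the last slot and the backward scan continues.
    // In other words the table is treated as a ring: starting from begin, the index keeps moving and, after one full lap, returns to begin.
    // If neither a bucket_t with key k nor an empty bucket_t has been found by then, the loop ends, the lookup has failed, and bad_cache is called.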
 
    // hack
    Class cls = (Class)((uintptr_t)this - offsetof(objc_class, cache));
    cache_t::bad_cache(receiver, (SEL)k, cls);
}
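
The probing loop can be exercised on its own with the simplified C++ sketch below. Slot is a stand-in for bucket_t, next_index reproduces the decrementing wrap-around described in the comments, and find_slot is a hypothetical name; where the runtime calls bad_cache, this sketch simply returns -1.

```cpp
#include <cstdint>
#include <vector>

struct Slot { uintptr_t key = 0; uintptr_t imp = 0; };   // simplified stand-in for bucket_t

// Decrementing probe step that wraps from index 0 back to mask.
static inline uint32_t next_index(uint32_t i, uint32_t mask) { return i ? i - 1 : mask; }

// Returns the index of the slot that holds `key`, or of the first empty slot
// on the probe path. Returns -1 after a full lap without finding either.
// The table size is assumed to be a power of two.
static int find_slot(const std::vector<Slot>& table, uintptr_t key) {
    uint32_t mask = static_cast<uint32_t>(table.size()) - 1;
    uint32_t begin = static_cast<uint32_t>(key & mask);
    uint32_t i = begin;
    do {
        if (table[i].key == 0 || table[i].key == key) return static_cast<int>(i);
    } while ((i = next_index(i, mask)) != begin);
    return -1;
}
```

In a 4-slot table, keys 6, 10 and 14 all hash to index 2, so they end up in slots 2, 1 and 0 in that order, and a fourth colliding key wraps around to slot 3.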

5. setBucketsAndMask(): installing buckets and mask

void cache_t::setBucketsAndMask(struct bucket_t *newBuckets, mask_t newMask)
{
    // objc_msgSend uses mask and buckets with no locks.
    // It is safe for objc_msgSend to see new buckets but old mask.
    // (It will get a cache miss but not overrun the buckets' bounds).
    // It is unsafe for objc_msgSend to see old buckets and new mask.
    // Therefore we write new buckets, wait a lot, then write new mask.
    // objc_msgSend reads mask first, then buckets.

    // ensure other threads see buckets contents before buckets pointer
    mega_barrier(); // memory barrier: other threads must see the buckets' contents before the new buckets pointer

    _buckets = newBuckets;
    
    // ensure other threads see new buckets before new mask
    mega_barrier(); // memory barrier: other threads must see the new buckets before the new mask
    
    _mask = newMask;
    _occupied = 0; // the old cache contents are gone, so the occupied count is reset to 0
}

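6. From this walk through the source we can derive the method caching flow, shown in the figure below:

(Figure: method caching flow)

In words: when a method is called on an object, the cache is checked first; on a miss, the call ends up in cache_fill_nolock. If this is the first method to be cached, space for 4 buckets is allocated. Otherwise the function checks whether adding this method would push the occupied count past 3/4 of the total capacity; if so, the capacity is doubled and the previously cached methods are discarded. Finally, find() locates the bucket for the method's key, the key and imp are stored in it, and caching is complete.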
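
The whole flow can be condensed into a toy model. The C++ sketch below is not the runtime implementation: ToyCache is a hypothetical type that keeps only keys (IMPs are omitted), assumes every key passed to fill is new, and uses a forward linear probe for brevity. It keeps the three properties described above: an initial allocation of 4 slots, the 3/4 rule, and doubling that discards the old entries.

```cpp
#include <cstdint>
#include <vector>

struct ToyCache {
    std::vector<uintptr_t> keys;   // 0 marks an empty slot
    uint32_t occupied = 0;

    uint32_t capacity() const { return static_cast<uint32_t>(keys.size()); }

    // New storage starts empty: old entries are dropped, not rehashed.
    void reallocate(uint32_t newCapacity) {
        keys.assign(newCapacity, 0);
        occupied = 0;
    }

    // Assumes `key` is nonzero and not already cached.
    void fill(uintptr_t key) {
        uint32_t newOccupied = occupied + 1;
        if (capacity() == 0) {
            reallocate(4);                       // first fill
        } else if (newOccupied <= capacity() / 4 * 3) {
            // at most 3/4 full after the insert: use as-is
        } else {
            reallocate(capacity() * 2);          // too full: double and start over
        }
        uint32_t mask = capacity() - 1;
        uint32_t i = static_cast<uint32_t>(key & mask);
        // An empty slot always exists because the table is never more than 3/4 full.
        while (keys[i] != 0 && keys[i] != key) i = (i + 1) & mask;
        if (keys[i] == 0) ++occupied;
        keys[i] = key;
    }
};
```

Filling three distinct keys leaves the capacity at 4 with 3 slots occupied; the fourth fill doubles the capacity to 8 and leaves only the newest key in the table.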

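IV. Verifying the method caching flow

1. First we call only the init and sayHello methods, then use lldb to inspect the cache_t data.

(Figure: calling init and sayHello)

Stopped at the first breakpoint.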

(lldb) x/4gx LGPerson.class
0x1000012f8: 0x001d8001000012d1 0x0000000100b36140
0x100001308: 0x0000000100ff06c0 0x0000000100000003
(lldb) p/x 0x1000012f8 + 0x10  // isa and superclass take up 16 bytes
(long) $1 = 0x0000000100001308
(lldb) p (cache_t *)$1  // cast
(cache_t *) $2 = 0x0000000100001308
(lldb) p *$2 // read the cache_t contents
(cache_t) $3 = {
  _buckets = 0x0000000100ff06c0
  _mask = 3   // mask is 3, so the capacity is 4
  _occupied = 1  // 1 slot in use
}
(lldb) p $3._buckets   // get _buckets
(bucket_t *) $5 = 0x0000000100ff06c0
(lldb) p *$5
(bucket_t) $6 = {
  _key = 0
  _imp = 0x0000000000000000
}
(lldb) p $5[0]  // first slot
(bucket_t) $7 = {
  _key = 0
  _imp = 0x0000000000000000
}
(lldb) p $5[1]  // second slot
(bucket_t) $8 = {
  _key = 0
  _imp = 0x0000000000000000
}
(lldb) p $5[2]  // third slot: holds the init method
(bucket_t) $9 = {
  _key = 4309539970
  _imp = 0x00000001003cc660 (libobjc.A.dylib`::-[NSObject init]() at NSObject.mm:2308)
}

Stopped at the second breakpoint (continuing the session above).

2020-01-03 01:14:46.341406+0800 LGTest[12220:276074] LGPerson say : -[LGPerson sayHello]
(lldb) p *$2  // the cache_t data has changed
(cache_t) $10 = {
  _buckets = 0x0000000100ff06c0
  _mask = 3  // mask is still 3 (capacity 4)
  _occupied = 2  // occupied went from 1 to 2
}
(lldb) p $10._buckets
(bucket_t *) $12 = 0x0000000100ff06c0
(lldb) p $12[0]
(bucket_t) $13 = {
  _key = 0
  _imp = 0x0000000000000000
}
(lldb) p $12[1]  // sayHello is now cached here
(bucket_t) $14 = {
  _key = 4294971009
  _imp = 0x0000000100000c50 (LGTest`-[LGPerson sayHello] at LGPerson.m:13)
}
(lldb) p $12[2]
(bucket_t) $15 = {
  _key = 4309539970
  _imp = 0x00000001003cc660 (libobjc.A.dylib`::-[NSObject init]() at NSObject.mm:2308)
}
(lldb)

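The results show that methods that have been executed are stored in cache_t as _key/_imp pairs. A method is cached only after it has been executed.

Note:
alloc and class are class methods, so they should be cached in the cache_t of LGPerson's metaclass.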
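
The numbers in the lldb output can be cross-checked with a short C++ sketch. It assumes a little-endian layout in which the 64-bit word after _buckets holds _mask in its low 4 bytes and _occupied in its high 4 bytes; CacheWords, split_word and slot_of are hypothetical names used only for this check.

```cpp
#include <cstdint>

struct CacheWords { uint32_t mask; uint32_t occupied; };

// Splits the 64-bit word printed by x/4gx right after _buckets.
// Assumption: little-endian, _mask in the low half, _occupied in the high half.
static CacheWords split_word(uint64_t word) {
    return { static_cast<uint32_t>(word & 0xFFFFFFFFu),
             static_cast<uint32_t>(word >> 32) };
}

// Slot index for a key under a given mask (the k & m step).
static uint32_t slot_of(uint64_t key, uint32_t mask) {
    return static_cast<uint32_t>(key & mask);
}
```

Splitting 0x0000000100000003 gives mask 3 and occupied 1, matching p *$2. With mask 3, the init key 4309539970 maps to slot 2 and the sayHello key 4294971009 maps to slot 1, which is where lldb shows them.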

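2. This time we call several methods to see how the cache grows.

(Figure: calling several methods)

First breakpoint. _occupied has reached 3 while _mask is 3 (capacity 4), so the cache is at its 3/4 limit.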

2020-01-03 01:28:58.666448+0800 LGTest[12446:282055] LGPerson say : -[LGPerson sayHello]
2020-01-03 01:28:58.667668+0800 LGTest[12446:282055] LGPerson say : -[LGPerson sayCode]
(lldb) x/4gx LGPerson.class
0x100001308: 0x001d8001000012e1 0x0000000100b36140
0x100001318: 0x0000000102246670 0x0000000300000003
(lldb) p/x 0x100001308 + 0x10
(long) $1 = 0x0000000100001318
(lldb) p (cache_t *)$1
(cache_t *) $2 = 0x0000000100001318
(lldb) p *$2
(cache_t) $3 = {
  _buckets = 0x0000000102246670
  _mask = 3
  _occupied = 3
}
(lldb)

Second breakpoint.

2020-01-03 01:32:34.359532+0800 LGTest[12446:282055] LGPerson say : -[LGPerson sayNB]
(lldb) p *$2
(cache_t) $4 = {
  _buckets = 0x000000010210e110
  _mask = 7
  _occupied = 1
}
(lldb) x/4gx LGPerson.class
0x100001308: 0x001d8001000012e1 0x0000000100b36140
0x100001318: 0x000000010210e110 0x0000000100000007
(lldb) p/x 0x100001308 + 0x10
(long) $6 = 0x0000000100001318
(lldb) p $4._buckets
(bucket_t *) $7 = 0x000000010210e110
(lldb) p $7[0]
(bucket_t) $8 = {
  _key = 0
  _imp = 0x0000000000000000
}
(lldb) p $7[1]
(bucket_t) $9 = {
  _key = 0
  _imp = 0x0000000000000000
}
(lldb) p $7[2]
(bucket_t) $10 = {
  _key = 4294971026
  _imp = 0x0000000100000ce0 (LGTest`-[LGPerson sayNB] at LGPerson.m:25)
}
(lldb) p $7[3]
(bucket_t) $11 = {
  _key = 0
  _imp = 0x0000000000000000
}
(lldb) p $7[4]
(bucket_t) $12 = {
  _key = 0
  _imp = 0x0000000000000000
}
(lldb) p $7[4]
(bucket_t) $13 = {
  _key = 0
  _imp = 0x0000000000000000
}
(lldb) p $7[5]
(bucket_t) $14 = {
  _key = 0
  _imp = 0x0000000000000000
}
(lldb) p $7[6]
(bucket_t) $15 = {
  _key = 0
  _imp = 0x0000000000000000
}

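We can see that both _mask and _occupied of cache_t have changed: _mask went from 3 to 7, meaning the capacity doubled from 4 to 8, and _occupied dropped to 1, with sayNB as the only entry. This shows that the previous cache contents were discarded during expansion. The address of cache_t itself is unchanged, but _buckets now points to a newly allocated array (0x000000010210e110 instead of 0x0000000102246670).

V. Summary

A class's method cache is stored as a hash table. Its main purpose is to speed up method calls and improve efficiency. The cache capacity changes dynamically so that memory is allocated and used more efficiently.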