1. The structure of cache
- We explored the structure of Class and its members before, and learned what isa, superclass and bits do. That leaves cache: so far we only know, roughly, that it stores key/IMP pairs, but what it is actually for is still unclear.
- First, the definition: cache is a cache_t struct, defined in objc-runtime-new.h in the objc source. The complete structure of cache_t is shown below.
struct cache_t {
    struct bucket_t *_buckets;
    mask_t _mask;
    mask_t _occupied;

public:
    struct bucket_t *buckets();
    mask_t mask();
    mask_t occupied();
    void incrementOccupied();
    void setBucketsAndMask(struct bucket_t *newBuckets, mask_t newMask);
    void initializeToEmpty();

    mask_t capacity();
    bool isConstantEmptyCache();
    bool canBeFreed();

    static size_t bytesForCapacity(uint32_t cap);
    static struct bucket_t * endMarker(struct bucket_t *b, uint32_t cap);

    void expand();
    void reallocate(mask_t oldCapacity, mask_t newCapacity);
    struct bucket_t * find(cache_key_t key, id receiver);

    static void bad_cache(id receiver, SEL sel, Class isa) __attribute__((noreturn));
};
- cache_t defines three members: _mask and _occupied, both of type mask_t, plus a pointer to a bucket_t struct.
- mask_t is an unsigned integer type; under __LP64__ (64-bit) it is uint32_t, otherwise uint16_t.
- bucket_t is what actually stores the key and the IMP.
#if __LP64__
typedef uint32_t mask_t; // x86_64 & arm64 asm are less efficient with 16-bits
#else
typedef uint16_t mask_t;
#endif
struct bucket_t {
private:
    // IMP-first is better for arm64e ptrauth and no worse for arm64.
    // SEL-first is better for armv7* and i386 and x86_64.
#if __arm64__
    MethodCacheIMP _imp;
    cache_key_t _key;
#else
    cache_key_t _key;
    MethodCacheIMP _imp;
#endif

public:
    inline cache_key_t key() const { return _key; }
    inline IMP imp() const { return (IMP)_imp; }
    inline void setKey(cache_key_t newKey) { _key = newKey; }
    inline void setImp(IMP newImp) { _imp = newImp; }

    void set(cache_key_t newKey, IMP newImp);
};
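- A quick note on the key before moving on: in the objc4 version explored here, cache_key_t is just uintptr_t, and the key stored in a bucket is the selector cast to that integer type (see getKey, which cache_fill_nolock uses later in this post). In other words, a bucket is effectively a (SEL, IMP) pair. Here is a paraphrased sketch; treat the exact form as an approximation of the source, not a verbatim quote:
#include <objc/objc.h>
#include <stdint.h>

typedef uintptr_t cache_key_t;

static inline cache_key_t getKey(SEL sel)
{
    return (cache_key_t)sel;   // the "key" is simply the selector's address
}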
2. What cache is for
- As the name suggests, cache is, unsurprisingly, a cache, and IMP is the address used to invoke a function. So we can guess that cache's job is to cache methods, thereby speeding up subsequent calls to those methods.
3. Verifying the cache
- Back in our objc source project, create a class and call one of its methods, sayHello (a sketch of the test setup is shown right below). Then, following the same approach as before, print the bucket contents in the lldb console; you can see that the bucket really does hold the IMP of sayHello.
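- The test code assumed by the lldb sessions in this post looks roughly like the following; the class and method names come straight from the log output, while the call site and file layout are my reconstruction:
#import <Foundation/Foundation.h>

@interface LGPerson : NSObject
- (void)sayHello;
@end

@implementation LGPerson
- (void)sayHello {
    NSLog(@"LGPerson say : %s", __func__);
}
@end

int main(int argc, const char *argv[]) {
    @autoreleasepool {
        LGPerson *person = [LGPerson alloc];
        [person sayHello];
        Class pClass = [LGPerson class];   // set a breakpoint here, then inspect pClass in lldb
        NSLog(@"%@", pClass);
    }
    return 0;
}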
2019-12-25 00:39:22.566292+0800 LGTest[3586:42169] LGPerson say : -[LGPerson sayHello]
(lldb) x/4gx pClass
0x1000012e0: 0x001d8001000012b9 0x0000000100b36140
0x1000012f0: 0x0000000101e23c20 0x0000000100000003
(lldb) p (cache_t *)0x1000012f0
(cache_t *) $1 = 0x00000001000012f0
(lldb) p *$1
(cache_t) $2 = {
_buckets = 0x0000000101e23c20
_mask = 3
_occupied = 1
}
(lldb) p $2._buckets
(bucket_t *) $3 = 0x0000000101e23c20
(lldb) p *$3
(bucket_t) $4 = {
_key = 4294971020
_imp = 0x0000000100000c60 (LGTest`-[LGPerson sayHello] at LGPerson.m:13)
}
(lldb)
- One thing to note: you may wonder why alloc and class, which we also called, are not cached here. Recall what we covered when exploring how methods are stored: an object's instance methods live in its class, while a class's class methods live in the metaclass, stored there as instance methods. What we are inspecting here is the class's cache, so it can only contain the instance method sayHello. Below is the cache and its bucket for the metaclass, where we do find a cached alloc, which confirms that our reasoning is correct.
(lldb) p/x 0x001d8001000012b9 & 0x00007ffffffffff8ULL
(unsigned long long) $5 = 0x00000001000012b8
// 0x00000001000012b8 is the address of the metaclass. If this step is unclear, see my earlier post on the isa chain, which explains how to get from a class to its metaclass
(lldb) x/4gx 0x00000001000012b8
0x1000012b8: 0x001d800100b360f1 0x0000000100b360f0
0x1000012c8: 0x0000000101e236c0 0x0000000200000003
(lldb) p (cache_t *)0x1000012c8
(cache_t *) $6 = 0x00000001000012c8
(lldb) p *$6
(cache_t) $7 = {
_buckets = 0x0000000101e236c0
_mask = 3
_occupied = 2
}
(lldb) p $7._buckets
(bucket_t *) $8 = 0x0000000101e236c0
(lldb) p *$8
(bucket_t) $9 = {
_key = 4298994200
_imp = 0x00000001003cc3b0 (libobjc.A.dylib`::+[NSObject alloc]() at NSObject.mm:2294)
}
(lldb)
4. The cache's strategy
4.1 Verifying that the cache really does have a strategy
- Now, let's call a few more instance methods and then look at cache and the buckets again.
- As the output below shows, we called four instance methods in turn: init, sayHello, sayCode and sayNB. Our guess was that all four would end up in the cache. The mask did grow, from 3 to 7, but among the buckets only _buckets[2] holds the most recently called instance method, sayNB; every other slot is empty (and _occupied is back to 1).
- So we can infer that the cache is not filled blindly; once some condition is met, it performs some kind of optimization.
2019-12-25 00:57:52.143504+0800 LGTest[3662:48762] LGPerson say : -[LGPerson sayHello]
2019-12-25 00:57:52.144031+0800 LGTest[3662:48762] LGPerson say : -[LGPerson sayCode]
2019-12-25 00:57:52.144133+0800 LGTest[3662:48762] LGPerson say : -[LGPerson sayNB]
(lldb) x/4gx pClass
0x1000012e8: 0x001d8001000012c1 0x0000000100b36140
0x1000012f8: 0x0000000101029950 0x0000000100000007
(lldb) p (cache_t *)0x1000012f8
(cache_t *) $1 = 0x00000001000012f8
(lldb) p *$1
(cache_t) $2 = {
_buckets = 0x0000000101029950
_mask = 7
_occupied = 1
}
(lldb) p $2._buckets
(bucket_t *) $3 = 0x0000000101029950
(lldb) p *$3
(bucket_t) $4 = {
_key = 0
_imp = 0x0000000000000000
}
(lldb) p $2._buckets[0]
(bucket_t) $5 = {
_key = 0
_imp = 0x0000000000000000
}
(lldb) p $2._buckets[1]
(bucket_t) $6 = {
_key = 0
_imp = 0x0000000000000000
}
(lldb) p $2._buckets[2]
(bucket_t) $7 = {
_key = 4294971026
_imp = 0x0000000100000ce0 (LGTest`-[LGPerson sayNB] at LGPerson.m:25)
}
(lldb) p $2._buckets[3]
(bucket_t) $8 = {
_key = 0
_imp = 0x0000000000000000
}
(lldb) p $2._buckets[5]
(bucket_t) $9 = {
_key = 0
_imp = 0x0000000000000000
}
(lldb) p $2._buckets[6]
(bucket_t) $10 = {
_key = 0
_imp = 0x0000000000000000
}
(lldb) p $2._buckets[7]
(bucket_t) $11 = {
_key = 0
_imp = 0x0000000000000000
}
4.2 Finding the caching strategy
- At this point the only option is to go back to the source code. Since the value of mask grew, let's first look at cache_t's mask_t mask() method; it turns out it simply returns _mask itself.
mask_t cache_t::mask()
{
    return _mask;
}
- Searching further for mask(), we find it used in the capacity() method. The relationship is simple even if its purpose isn't obvious yet: capacity is mask() + 1 when the mask is non-zero, and 0 otherwise; for example, a mask of 3 means a capacity of 4, and a mask of 7 means a capacity of 8.
mask_t cache_t::capacity()
{
    return mask() ? mask()+1 : 0;
}
- Next, the focus shifts to searching for capacity(). Inside the expansion method expand() we find the call: if oldCapacity is 0, the cache is initialized with INIT_CACHE_SIZE (1 << 2, i.e. 4); otherwise newCapacity is set to twice oldCapacity. So we have found the expansion logic.
enum {
    INIT_CACHE_SIZE_LOG2 = 2,
    INIT_CACHE_SIZE      = (1 << INIT_CACHE_SIZE_LOG2)   // i.e. 4
};

void cache_t::expand()
{
    cacheUpdateLock.assertLocked();

    uint32_t oldCapacity = capacity();
    uint32_t newCapacity = oldCapacity ? oldCapacity*2 : INIT_CACHE_SIZE;

    if ((uint32_t)(mask_t)newCapacity != newCapacity) {
        // mask overflow - can't grow further
        // fixme this wastes one bit of mask
        newCapacity = oldCapacity;
    }

    reallocate(oldCapacity, newCapacity);
}
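- To make the doubling concrete, here is a minimal standalone sketch (my own arithmetic, not runtime code) of how capacity and mask evolve across successive expansions:
#include <stdio.h>
#include <stdint.h>

enum { INIT_CACHE_SIZE = 1 << 2 };   // 4, the same value as the runtime's constant

int main(void) {
    uint32_t capacity = 0;
    for (int i = 0; i < 4; i++) {
        capacity = capacity ? capacity * 2 : INIT_CACHE_SIZE;   // expand(): start at 4, then double
        printf("capacity = %2u, mask = %2u\n", capacity, capacity - 1);
    }
    return 0;   // prints 4/3, 8/7, 16/15, 32/31
}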
- Next question: where, and under what condition, does the cache call expand()? The answer is in cache_fill_nolock: if newOccupied would exceed 3/4 of the capacity, the cache is expanded (cache->capacity() returns the cached capacity, i.e. 0 or mask + 1). With a capacity of 4, for example, the first three entries fit, and the fourth one triggers an expansion.
static void cache_fill_nolock(Class cls, SEL sel, IMP imp, id receiver)
{
    // ... (earlier code in this function omitted)

    // Make sure the entry wasn't added to the cache by some other thread
    // before we grabbed the cacheUpdateLock.
    if (cache_getImp(cls, sel)) return;   // already cached (possibly by another thread): nothing more to do

    cache_t *cache = getCache(cls);
    cache_key_t key = getKey(sel);

    // Use the cache as-is if it is less than 3/4 full
    mask_t newOccupied = cache->occupied() + 1;
    mask_t capacity = cache->capacity();
    if (cache->isConstantEmptyCache()) {
        // Cache is read-only. Replace it.
        cache->reallocate(capacity, capacity ?: INIT_CACHE_SIZE);
    }
    else if (newOccupied <= capacity / 4 * 3) {
        // Cache is less than 3/4 full. Use it as-is.
    }
    else {
        // Cache is too full. Expand it.
        cache->expand();
    }

    // Scan for the first unused slot and insert there.
    // There is guaranteed to be an empty slot because the
    // minimum size is 4 and we resized at 3/4 full.
    bucket_t *bucket = cache->find(key, receiver);
    if (bucket->key() == 0) cache->incrementOccupied();
    bucket->set(key, imp);
}
- This still doesn't explain why only sayNB was left in the buckets. For that, look at the end of expand(): reallocate(oldCapacity, newCapacity). In reallocate, a newBuckets array is first allocated with newCapacity, then the new buckets and mask are installed, and finally the old oldBuckets are freed. The reason the old buckets are replaced outright rather than appended to or copied over is mainly safety and efficiency; as the source comment notes, the old contents are not propagated, trading extra cache fills for less cache memory and less work at resize time.
void cache_t::reallocate(mask_t oldCapacity, mask_t newCapacity)
{
    bool freeOld = canBeFreed();

    bucket_t *oldBuckets = buckets();
    bucket_t *newBuckets = allocateBuckets(newCapacity);

    // Cache's old contents are not propagated.
    // This is thought to save cache memory at the cost of extra cache fills.
    // fixme re-measure this

    assert(newCapacity > 0);
    assert((uintptr_t)(mask_t)(newCapacity-1) == newCapacity-1);

    // the new mask is always capacity - 1 (capacity is a power of two)
    setBucketsAndMask(newBuckets, newCapacity - 1);

    if (freeOld) {
        cache_collect_free(oldBuckets, oldCapacity);
        cache_collect(false);
    }
}
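- Why is the mask always capacity - 1? Because it doubles as the hash mask: in this objc4 version the slot index is computed in objc-cache.mm as roughly key & mask (a paraphrase of cache_hash, so treat the exact form as an assumption), which is why the capacity must be a power of two. A quick standalone check with the sayNB key from the session above shows why it landed in _buckets[2]:
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint64_t key  = 4294971026ULL;   // _key of -[LGPerson sayNB] from the lldb output
    uint32_t mask = 7;               // _mask after expanding to capacity 8
    printf("slot index = %llu\n", (unsigned long long)(key & mask));   // prints 2 -> _buckets[2]
    return 0;
}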
- Back in cache_fill_nolock above, notice that the newest key and IMP are written into the cache only after expand() has run, i.e. after the old buckets have already been thrown away. That explains why the cache ended up holding only the most recent method, sayNB; the behavior is in the spirit of an LRU policy, keeping the most recently called method cached.
bucket_t *bucket = cache->find(key, receiver);
if (bucket->key() == 0) cache->incrementOccupied();
bucket->set(key, imp);
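- To tie everything together, here is a simplified standalone simulation of the run above (an assumption on my part: it mirrors only the occupied/capacity bookkeeping of cache_fill_nolock, expand and reallocate, ignoring hashing and the actual bucket storage) for the call sequence init, sayHello, sayCode, sayNB:
#include <stdio.h>
#include <stdint.h>

enum { INIT_CACHE_SIZE = 1 << 2 };   // 4

int main(void) {
    uint32_t capacity = 0, occupied = 0;
    const char *sels[] = { "init", "sayHello", "sayCode", "sayNB" };

    for (int i = 0; i < 4; i++) {
        uint32_t newOccupied = occupied + 1;
        if (capacity == 0) {
            capacity = INIT_CACHE_SIZE;        // empty cache: allocate 4 slots
        } else if (newOccupied <= capacity / 4 * 3) {
            // less than 3/4 full: keep using the current buckets
        } else {
            capacity *= 2;                     // expand() -> reallocate():
            occupied = 0;                      // the old buckets are discarded
        }
        occupied += 1;                         // insert the new entry
        printf("%-9s capacity=%u  mask=%u  occupied=%u\n",
               sels[i], capacity, capacity - 1, occupied);
    }
    return 0;
}
- The last line prints capacity=8, mask=7, occupied=1, which is exactly what lldb showed earlier: the fourth call expands the cache, the old buckets are thrown away, and only the newest method, sayNB, is re-cached.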
Further exploration
- Finally, a word on when cache_fill_nolock is called. In the source we can see that it is called from cache_fill, and tracing cache_fill further shows that it is invoked during method lookup. Method lookup, in turn, is part of objc_msgSend, i.e. the Objective-C message-sending machinery underneath every method call. So, roughly speaking: when a message is sent, the runtime first looks for the IMP in the cache; if it finds one it calls it directly, and if not, it falls back to the slower method lookup and then caches the result for next time.
void cache_fill(Class cls, SEL sel, IMP imp, id receiver)
{
#if !DEBUG_TASK_THREADS
    mutex_locker_t lock(cacheUpdateLock);
    cache_fill_nolock(cls, sel, imp, receiver);
#else
    _collecting_in_critical();
    return;
#endif
}
/* method lookup */
extern IMP lookUpImpOrNil(Class, SEL, id obj, bool initialize, bool cache, bool resolver);
extern IMP lookUpImpOrForward(Class, SEL, id obj, bool initialize, bool cache, bool resolver);
Summary
- The cache inside a Class exists to cache methods during message sending and speed up subsequent calls. It grows dynamically: once the cache would be more than 3/4 full, its capacity is doubled; the expansion completely wipes the old buckets and creates new ones in their place, after which the key and IMP that triggered the resize are cached into the new buckets. A classic LRU-flavored strategy in action~
- That wraps up this analysis of cache. If anything is missing or wrong, please leave a comment and I'll fix it promptly~
- Learn with a smile, never dry, never dull~