Go语言——Map分析
go\src\runtime\hashmap.go
简介
// A map is just a hash table. The data is arranged
// into an array of buckets. Each bucket contains up to
// 8 key/value pairs. The low-order bits of the hash are
// used to select a bucket. Each bucket contains a few
// high-order bits of each hash to distinguish the entries
// within a single bucket.
//
// If more than 8 keys hash to a bucket, we chain on
// extra buckets.
//
// When the hashtable grows, we allocate a new array
// of buckets twice as big. Buckets are incrementally
// copied from the old bucket array to the new bucket array.
//
// Map iterators walk through the array of buckets and
// return the keys in walk order (bucket #, then overflow
// chain order, then bucket index). To maintain iteration
// semantics, we never move keys within their bucket (if
// we did, keys might be returned 0 or 2 times). When
// growing the table, iterators remain iterating through the
// old table and must check the new table if the bucket
// they are iterating through has been moved ("evacuated")
// to the new table.
map就是一个hash表。数据被分配到bucket桶数组中。每个桶有8个kv键值对。hash值的低八位用来选择桶。高八位用来在桶内部区分kv。这个很有意思,因为一个指针就是8位。
如果桶中kv超过8个,就新建一个桶,与之相连。
当hash表需要扩容时,我们每次增长两倍的桶数组。桶会从老的桶数组赋值到新的桶数组。
为了维护迭代语义,我们不会移动桶中的key(如果这么做了,key有可能返回0次或者2次)。扩容时,迭代器迭代老表,并且必须检查新表,是否正在迭代的桶已经撤离到了新表。
struct
type hmap struct {
count int // # live cells == size of map. Must be first (used by len() builtin)
flags uint8
B uint8 // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
hash0 uint32 // hash seed
buckets unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
nevacuate uintptr // progress counter for evacuation (buckets less than this have been evacuated)
overflow *[2]*[]*bmap
}
重要属性:
- buckets,桶数组,长度2^B
- oldbuckets,老桶数组,扩容时不为nil
- nevacuate,num evacuate,撤离进度,小于它的都已经撤离到新桶数组
- overflow *[2]*[]*bmap,这个挺有意思的,长度为2的数组,元素是桶数组。overflow 保存溢出的桶,即桶超过8个kv。overflow[0]对应buckets溢出,overflow[1]对应oldbuckets溢出。
// A bucket for a Go map.
type bmap struct {
// tophash generally contains the top byte of the hash value
// for each key in this bucket. If tophash[0] < minTopHash,
// tophash[0] is a bucket evacuation state instead.
tophash [bucketCnt]uint8
}
桶中只有一个8字节数组,长度为8。tophash保存hash的高8位,根据高8位找到条目。value保存这里就跟大多数map实现不一样,揣测8位就是value的指针?
code
带着上面的理解,进行源代码学习
创建
func makemap(t *maptype, hint int64, h *hmap, bucket unsafe.Pointer) *hmap {
// find size parameter which will hold the requested # of elements
B := uint8(0)
for ; overLoadFactor(hint, B); B++ {
}
buckets := bucket
if B != 0 {
buckets = newarray(t.bucket, 1<<B)
}
// initialize Hmap
if h == nil {
h = (*hmap)(newobject(t.hmap))
}
h.count = 0
h.B = B
h.flags = 0
h.hash0 = fastrand()
h.buckets = buckets
h.oldbuckets = nil
h.nevacuate = 0
h.noverflow = 0
return h
}
根据B,创建长度2^B的桶数组。这里B会受到初始值8和load factor平衡因子的影响。
get
func mapaccess2(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, bool) {
alg := t.key.alg
hash := alg.hash(key, uintptr(h.hash0))
m := bucketMask(h.B)
b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + (hash&m)*uintptr(t.bucketsize)))
if c := h.oldbuckets; c != nil {
oldb := (*bmap)(unsafe.Pointer(uintptr(c) + (hash&m)*uintptr(t.bucketsize)))
if !evacuated(oldb) {
b = oldb
}
}
top := tophash(hash)
for ; b != nil; b = b.overflow(t) {
for i := uintptr(0); i < bucketCnt; i++ {
if b.tophash[i] != top {
continue
}
k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
if t.indirectkey {
k = *((*unsafe.Pointer)(k))
}
if alg.equal(key, k) {
v := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
if t.indirectvalue {
v = *((*unsafe.Pointer)(v))
}
return v, true
}
}
}
return unsafe.Pointer(&zeroVal[0]), false
}
- 利用hash算法得到key的hash值
- 这个位运算没看懂,根据注释应该是利用hash的低八位找到bucket
- 如果
oldbuckets
不为空,即正在执行扩容,所以优先从oldbuckets
读取 - 根据hash的高8位遍历bucket,得到v。(这一大坨位运算没看懂。)
put
//go:linkname reflect_mapassign reflect.mapassign
func reflect_mapassign(t *maptype, h *hmap, key unsafe.Pointer, val unsafe.Pointer) {
p := mapassign(t, h, key)
typedmemmove(t.elem, p, val)
}
// Like mapaccess, but allocates a slot for the key if it is not present in the map.
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {}
put过程比较特别,之前说过bucket里面保存的是uint8数组,也就是指针。所以这里需要先给key找到对应的slot,然后将value拷贝到对应的地址。