go基础之map-迭代（四）-技术圈

写在之前


for k, v := range m3 {        fmt.Println(k, v)}

就是go的map的迭代方法，查看该代码的字节码，发现它调用了底层runtime.mapiterinit方法。本篇会详细分析map的迭代方法。如果想详细查看源码的注释，可以查看我的GitHub,欢迎批评指正。我的打算是把一些常用的数据结构都分析一遍，如果有志同道合的人，可以联系我。

hiter结构体

这个结构体非常重要，该结构体记录了迭代过程中当前的信息，在迭代下个数据的时候会使用当前的hiter。让我们看看他的结构体:

// A hash iteration structure.// If you modify hiter, also change cmd/compile/internal/gc/reflect.go to indicate// the layout of this structure.// 8 * 9 + 4 * 1 + (偏移4位) + 8 * 2 = 8 * 12type hiter struct {    // 8个字节    key unsafe.Pointer // Must be in first position.  Write nil to indicate iteration end (see cmd/internal/gc/range.go).    // 8个字节    elem unsafe.Pointer // Must be in second position (see cmd/internal/gc/range.go).    // 8个字节    t *maptype    // 8个字节    h *hmap    // 8个字节    buckets unsafe.Pointer // bucket ptr at hash_iter initialization time    // 8个字节    bptr *bmap // current bucket    // 8个字节    overflow *[]*bmap // keeps overflow buckets of hmap.buckets alive    // 8个字节    oldoverflow *[]*bmap // keeps overflow buckets of hmap.oldbuckets alive    // 8个字节    startBucket uintptr // bucket iteration started at    // 1个字节    offset uint8 // intra-bucket offset to start from during iteration (should be big enough to hold bucketCnt-1)    // 1个字节    wrapped bool // already wrapped around from end of bucket array to beginning    // 1个字节    B uint8    // 1个字节    i uint8    // 8个字节    bucket uintptr    // 8个字节    checkBucket uintptr}

我用人话简答说明下字段的意思:

key和elem表示当前跌倒获取到的key和value，直白的说fmt.Println(k, v)打印出来的值。
buckets就是hmap的buckets。
bptr比较清晰就是表示当前我正在迭代哪个bucket。
overflow和oldoverflow很容易联想到和hamp的extral里面的overflow和oldoverflow，我这里忽略他就不分析了。
startBucket表示从哪个bucket开始迭代，这个startBucket在每次迭代的开始设置的，而且这个值是不确定的。也就是说每次for k, v := range m3的迭代打印出来的k的顺序很有可能都不是一致的。
offset表示从startBucket这个bucket的第几个key开始迭代。
wrapped表示已经迭代到最后一个桶了。
B无需多说，i表示当前迭代的桶的进度了。
bucket会记录当前迭代到哪个桶了，如果该值等于startBucket值了，表示就迭代完毕了。
checkBucket这个字段的用处在下面的分析过程中说明，单独拎出来讲不清楚。

前景提要说明白了，就开始分析mapiterinit这个入口方法了:


// mapiterinit initializes the hiter struct used for ranging over maps.// The hiter struct pointed to by 'it' is allocated on the stack// by the compilers order pass or on the heap by reflect_mapiterinit.// Both need to have zeroed hiter since the struct contains pointers.func mapiterinit(t *maptype, h *hmap, it *hiter) {    ...    size := unsafe.Sizeof(hiter{})    // 意义在哪里    if size/sys.PtrSize != 12 {        throw("hash_iter size incorrect") // see cmd/compile/internal/gc/reflect.go    }    it.t = t    it.h = h
    // grab snapshot of bucket state    it.B = h.B    it.buckets = h.buckets    if t.bucket.ptrdata == 0 {        // Allocate the current slice and remember pointers to both current and old.        // This preserves all relevant overflow buckets alive even if        // the table grows and/or overflow buckets are added to the table        // while we are iterating.        h.createOverflow()        it.overflow = h.extra.overflow        it.oldoverflow = h.extra.oldoverflow    }
    // decide where to start    r := uintptr(fastrand())    if h.B > 31-bucketCntBits {        r += uintptr(fastrand()) << 31    }    // 随机找到一个开始的桶    it.startBucket = r & bucketMask(h.B)    // 随机找到一个开始的桶的数据    it.offset = uint8(r >> h.B & (bucketCnt - 1))
    // iterator state    it.bucket = it.startBucket
    // Remember we have an iterator.    // Can run concurrently with another mapiterinit().    if old := h.flags; old&(iterator|oldIterator) != iterator|oldIterator {        atomic.Or8(&h.flags, iterator|oldIterator)    }
    mapiternext(it)}

简单说明几点:

第9行我想破脑袋也不知道它这么做的意义在哪里，看我对hiter的字节的计算，很明显size/sys.PtrSize ==12。难道是为了内存对齐不想有人改动该结构体的字段顺序？有懂的大佬在下面留言一起讨论。
18~26行无需多言，看过我前面的map系列文章，自然就明白。
33~36行随机找到一个开始的桶和桶里面的一个随机开始的key。
44行标记该map正在被迭代中。

下面就是迭代的重点了：


func mapiternext(it *hiter) {    h := it.h    if raceenabled {        callerpc := getcallerpc()        racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapiternext))    }    if h.flags&hashWriting != 0 {        throw("concurrent map iteration and map write")    }    t := it.t    bucket := it.bucket    b := it.bptr    i := it.i    checkBucket := it.checkBucket    ...

上面说了在迭代下个key和value的时候会传递hiter进来，这里就说明了原因：会把h、t、bucket、b、i、checkBucket赋值给局部变量。
接下来到了next代码块，该代码块正如它的命名一样，就是不断的获取下一个迭代的值。我主要把他分成2部分分别解析:

it.bptr为空的情况，也就是b == nil的情况。
获取下个迭代的值。

当前迭代的桶为空(刚开始迭代)

b为空的时候有2中情况，这里先说第一种情况：刚开始迭代这个值就为空。


 if b == nil {        // 循环了一圈，到开始迭代的桶，那么就结束了        if bucket == it.startBucket && it.wrapped {            // end of iteration            it.key = nil            it.elem = nil            return        }        if h.growing() && it.B == h.B {            // Iterator was started in the middle of a grow, and the grow isn't done yet.            // If the bucket we're looking at hasn't been filled in yet (i.e. the old            // bucket hasn't been evacuated) then we need to iterate through the old            // bucket and only return the ones that will be migrated to this bucket.            // 在迁移过程中迭代可能有些数据并不会迭代出来，如果老桶的第一个已经被迁移了，但是后面的还没迭代，此时去迭代新桶就会忘掉一些数据            oldbucket := bucket & it.h.oldbucketmask()            b = (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))            // 如果还没开始搬迁的话那么就去迭代老桶            if !evacuated(b) {                checkBucket = bucket            } else {                // 否则还是跌打新扩容之后的桶                b = (*bmap)(add(it.buckets, bucket*uintptr(t.bucketsize)))                checkBucket = noCheck            }        } else {            b = (*bmap)(add(it.buckets, bucket*uintptr(t.bucketsize)))            checkBucket = noCheck        }        bucket++        // 迭代到最后一个桶了，标记wrapped为true，置bucket为0，从第一个开始迭代        if bucket == bucketShift(it.B) {            bucket = 0            it.wrapped = true        }        i = 0    }

第一次开始迭代，在不存在扩容的情况下，26-27行就会找到桶的内存地址。checkBucket标记为noCheck。但是当hamp正在扩容的时候，10~24行得到要迭代的桶:如果新桶对应的老桶已经被迁移，那么就直接迭代新桶，如果没有被迁移就迭代老桶。如果迭代老桶，那么checkBucket设置为新桶的桶号。否者就是正常迭代，设置为固定值noCheck。
29行bucket+1,表示下个即将迭代的桶的序号。
30~34行判断了如果迭代最后一个桶了，那么就标记wrapped为true，要去从头开始迭代。
由于是新桶，所以把i设置为0。
2~8行我的注释直接了当的说明了意图。wraped为true表示我从startBucket开始迭代到最后一个桶之后，然后bucket从0开始不断迭代，没迭代一个桶就+1，最后等于it.startBucket之后就说明已经迭代了一圈了，所以本次所有key和value都迭代完成了。

b为空的第二种情况在接下来的分析说明，这里暂不赘述。

迭代下个k/v


for ; i < bucketCnt; i++ {        // 弄个偏移有意思吗？还不是要循环8次        offi := (i + it.offset) & (bucketCnt - 1)        println("i:", i)        println("offi：", offi)        if isEmpty(b.tophash[offi]) || b.tophash[offi] == evacuatedEmpty {            // TODO: emptyRest is hard to use here, as we start iterating            // in the middle of a bucket. It's feasible, just tricky.            continue        }        k := add(unsafe.Pointer(b), dataOffset+uintptr(offi)*uintptr(t.keysize))        if t.indirectkey() {            k = *((*unsafe.Pointer)(k))        }        e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+uintptr(offi)*uintptr(t.elemsize))        // 不等的话说明是迭代老桶了        // 而且还不是同数量桶扩容，那么就要判断老桶的数据是否是迁移到当前的新桶，如果不是则忽略该老桶的当前位置的key        if checkBucket != noCheck && !h.sameSizeGrow() {            // Special case: iterator was started during a grow to a larger size            // and the grow is not done yet. We're working on a bucket whose            // oldbucket has not been evacuated yet. Or at least, it wasn't            // evacuated when we started the bucket. So we're iterating            // through the oldbucket, skipping any keys that will go            // to the other new bucket (each oldbucket expands to two            // buckets during a grow).            if t.reflexivekey() || t.key.equal(k, k) {                // If the item in the oldbucket is not destined for                // the current new bucket in the iteration, skip it.                hash := t.hasher(k, uintptr(h.hash0))                if hash&bucketMask(it.B) != checkBucket {                    continue                }            } else {                // Hash isn't repeatable if k != k (NaNs).  We need a                // repeatable and randomish choice of which direction                // to send NaNs during evacuation. We'll use the low                // bit of tophash to decide which way NaNs go.                // NOTE: this case is why we need two evacuate tophash                // values, evacuatedX and evacuatedY, that differ in                // their low bit.                // 不建议在设置key的时候用这种类型，在java中一般都是用不可变对象做key                if checkBucket>>(it.B-1) != uintptr(b.tophash[offi]&1) {                    continue                }            }        }        if (b.tophash[offi] != evacuatedX && b.tophash[offi] != evacuatedY) ||            !(t.reflexivekey() || t.key.equal(k, k)) {            // This is the golden data, we can return it.            // OR            // key!=key, so the entry can't be deleted or updated, so we can just return it.            // That's lucky for us because when key!=key we can't look it up successfully.            // 找到了key和value            it.key = k            if t.indirectelem() {                // 如果value是个指针的，则剖析出值                e = *((*unsafe.Pointer)(e))            }            it.elem = e        } else {            // The hash table has grown since the iterator was started.            // The golden data for this key is now somewhere else.            // Check the current hash table for the data.            // This code handles the case where the key            // has been deleted, updated, or deleted and reinserted.            // NOTE: we need to regrab the key as it has potentially been            // updated to an equal() but not identical key (e.g. +0.0 vs -0.0).            // 同上            rk, re := mapaccessK(t, h, k)            if rk == nil {                continue // key has been deleted            }            it.key = rk            it.elem = re        }        it.bucket = bucket        if it.bptr != b { // avoid unnecessary write barrier; see issue 14921            // bucket变了            it.bptr = b        }        // 迭代下一个        it.i = i + 1        it.checkBucket = checkBucket        return    }    // 迭代逸出桶    b = b.overflow(t)    i = 0    goto next

这里面主要分了3种情况：

扩容情况的迭代。
正常情况。
有逸出桶的情况。

正常情况

6~10行是一种容错处理，我们不应该去迭代一个被迁移过的桶，有可能你在迭代老桶的过程当中，该老桶被迁移了。
11~15行找到key和value的内存地址，常规操作。
47~60行把找到的key和value放到hiter里面返回回去，这里加我一个个人的理解，就是不建议用指针做key，因为这可能会导致2个明明指针指向的值是相等的，却找不到key对应的value，这是因为key!=key,在java中一般都是用不可变对象做key，如String、int等。t.reflexivekey() || t.key.equal(k, k)大概就说的这个意思。
77~80行it.bptr != b代表了已经跌倒下一个桶了，所以要把当前迭代的桶放到bptr里面去，方便下次迭代。
### 溢出情况
该情况很容易理解，you逸出桶肯定会迭代该桶的逸出桶，指导逸出桶被迭代完。


    // 迭代逸出桶    b = b.overflow(t)

当b为空的时候就是被迭代完毕，那么就要回到if b == nil这里了，这里就是上边卖的一个关子，就是第二种情况：逸出桶被迭代完了，那么就要继续迭代下个桶。

扩容情况的迭代

16~64行就是扩容情况的桶的选择。上面介绍过checkBucket明显不等于noCheck，而等于新桶的序号。
下面说明下29~32行的代码，这里有点难懂:


                hash := t.hasher(k, uintptr(h.hash0))                if hash&bucketMask(it.B) != checkBucket {                    continue                }

当这个老桶里面的key没迁移到当前我要迭代的新桶的时候，我就忽略这个key，这个想想也是合理的。这个key本该迁移到其他新桶的话，那我就在迭代那个新桶我自然也会迭代到这个key。

总结

终于把map的所有基本操作都分析完成了，感觉最核心还是map的增加key的情况，搞定这部分，其他操作都很容易理解了。其中map用到了很多标志位和二进制操作，理解起来是真的烧脑，但是配以我在《go基础之map-写在前面（一）》介绍的一些调试技巧，还是能理解到go的map的设计思想，当然我的理解也许比较浅，也许有纰漏之处，希望不吝指教。

References

[1] 我的GitHub: https://github.com/funnycode-org/go-sourcecode/tree/main/runtime