【十二】Golang 映射

💢欢迎来到张胤尘的开源技术站
💥开源如江河，汇聚众志成。代码似星辰，照亮行征程。开源精神长，传承永不忘。携手共前行，未来更辉煌💥

文章目录

映射
- 映射的定义
- 映射初始化
- - `make` 函数
  - 使用字面量
- 源码分析
- - 数据结构
  - - `hmap`
    - `bmap`
  - 数据存储
  - 键值访问
  - - 竞态检测
    - `Sanitizer` 检测
    - 空检查
    - 并发写检查
    - 哈希值计算
    - 桶定位
    - 扩容情况处理
    - 桶内查找
  - 键值插入、扩容机制
  - - 参数检查
    - 竞态检测
    - `Sanitizer` 检测
    - 并发检测
    - 哈希值计算
    - 初始化桶数组
    - 定位桶和处理扩容
    - 遍历桶并检查键
    - 检查是否需要扩容
    - - `overLoadFactor`
      - `tooManyOverflowBuckets`
    - 插入或更新键值对
    - - 插入位置为空
      - 存储新的键值对
    - 返回值的地址
  - 键值删除
  - - 竞态检测
    - `Sanitizer` 检测
    - 参数检查
    - 并发检测
    - 设置写入标记
    - 计算哈希值和目标桶
    - 处理扩容
    - 定位目标桶
    - 查找目标键
    - 删除目标键值对
    - 清理空闲槽
    - 重置哈希种子
    - 清除写入标记
- 常用操作
- - 访问映射值
  - 检查键是否存在
  - 修改映射的值
  - 添加键值对
  - 删除键值对
  - 遍历映射
  - 获取映射长度
  - 映射嵌套
  - 映射的拷贝
  - 清空映射
- 常见问题
- - 检查键是否存在
  - 初始化映射时指定容量
  - 避免并发修改
  - 不使用映射存储大量的数据

映射

在 golagn 中，映射是一种键值对的集合，其中每个键（key）都唯一地映射到一个值（value）。它类似于 c/c++ 中的 std::unordered_map，但是具体实现上还是有很大的区别，下面会进行详细的剖析。

映射的定义

映射的类型定义为：

map[KeyType]ValueType

KeyType：键的类型，必须是可比较的（可以进行 == 或 != 比较）。其中常见的可比较类型包括：
- 基本类型：int, float64, string, bool 等。
- 复合类型：指针、通道、接口、包含可比较字段的结构体。
ValueType：值的类型可以是任何类型，包括基本类型、自定义类型、切片、结构体、映射等。

映射初始化

在 golang 中提供了两种初始化映射的方式：make 函数、使用字面量。

`make` 函数

make 是 golang 中用于初始化切片、映射和通道的标准函数。对于映射来说，make 的语法如下：

m := make(map[KeyType]ValueType)

例如，创建一个空的映射，其中键为 string 类型，值为 int 类型，如下所示：

m := make(map[string]int)

另外在 golang 中可以为映射指定初始容量（如果不指定，默认容量为零）。

例如，创建一个初始容量为 10 的映射，如下所示：

m := make(map[string]int, 10)

这种方式创建映射中的键值对数量为 0，后续可以通过赋值操作添加键值对。

使用字面量

映射的字面量语法允许在声明时直接初始化键值对。这种方式适用于在编译时已知数据的情况。

语法如下所示：

m := map[KeyType]ValueType{Key1: Value1,Key2: Value2,// ...
}

例如：

m := map[string]int{"a": 1,"b": 2,"c": 3,
}

源码分析

下面主要针对 map 中的数据结构、数据存储、键值访问、键值插入和扩容机制、键值删除五大部分源码进行剖析。

数据结构

在 map 的内部由两个核心数据结构组成：hmap 和 bmap 。

`hmap`

hmap 是 golang 中 map 的底层核心数据结构之一，它封装了哈希表的所有关键信息。源码如下所示：

源码位置：src/runtime/map.go

// A header for a Go map.
type hmap struct {// Note: the format of the hmap is also encoded in cmd/compile/internal/reflectdata/reflect.go.// Make sure this stays in sync with the compiler's definition.count     int // # live cells == size of map.  Must be first (used by len() builtin)flags     uint8B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for detailshash0     uint32 // hash seedbuckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growingnevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)extra *mapextra // optional fields
}

count：表示当前 map 中存储的键值对数量，len() 直接访问它。
flags：用于存储一些标志位，例如是否正在扩容、是否需要清理等。
B：表示桶的数量为 2^B。例如，如果 B = 3，则桶的数量为 2^3 = 8。B 的值决定了 buckets 数组的大小。
noverflow：表示溢出桶的数量。当一个桶满了（最多存储 8 对键值对），会创建一个溢出桶，并通过链表连接。
hash0：哈希种子，用于计算键的哈希值。通过哈希种子可以避免哈希冲突，增加哈希值的随机性。
buckets：指向底层桶数组的指针，桶数组的大小为 2^B，每个桶是一个 bmap 结构体。
oldbuckets：在扩容时，旧的桶数组会存储在这里。新的桶数组大小是旧数组的两倍。扩容完成后，oldbuckets 会被释放。
nevacuate：用于记录扩容进度的计数器。表示已经迁移完成的桶数量。
extra：存储一些可选字段。

`bmap`

bmap 是存储键值对的“桶”，每个桶可以存储最多 8 对键值对。源码如下所示：

源码位置：src/runtime/map.go

// A bucket for a Go map.
type bmap struct {// tophash generally contains the top byte of the hash value// for each key in this bucket. If tophash[0] < minTopHash,// tophash[0] is a bucket evacuation state instead.tophash [abi.MapBucketCount]uint8// Followed by bucketCnt keys and then bucketCnt elems.// NOTE: packing all the keys together and then all the elems together makes the// code a bit more complicated than alternating key/elem/key/elem/... but it allows// us to eliminate padding which would be needed for, e.g., map[int64]int8.// Followed by an overflow pointer.
}

tophash：abi.MapBucketCount 是一个常量，值为 8。存储每个键的哈希值的最高字节（top byte）。特殊情况：如果 tophash[0] < minTopHash，则表示该桶处于迁移状态，而不是存储键的哈希值。

// Maximum number of key/elem pairs a bucket can hold.
MapBucketCountBits = 3 // log2 of number of elements in a bucket.
MapBucketCount     = 1 << MapBucketCountBits

minTopHash     = 5 // minimum tophash for a normal filled cell.

在 tophash 之后，bmap 会依次存储键和值。键和值的存储方式是连续的：先存储所有键（bucketCnt 个键），然后存储所有值（bucketCnt 个值）。这种设计可以减少内存对齐时的填充，从而节省内存。例如，在 map[int64]int8 的情况下，如果键和值交替存储，可能会因为对齐问题浪费内存。
最后在每个 bmap 的末尾包含一个指向下一个溢出桶的指针（overflow）。当一个桶满了（最多存储 8 对键值对）时，会通过链表的方式创建一个溢出桶，用于存储额外的键值对。

数据存储

例如：在64位平台上有一个 map[int]string，键的大小为 8 字节（int64），值的大小为 16 字节（指向字符串数据的指针（8 字节）和字符串的长度（8 字节））。一个 bmap 的内存布局如下所示：

在这里插入图片描述

键值访问

由 map 数据存储模型可知，因为键和值的存储是动态的，访问键和值时就需要通过指针偏移来实现。源码内容如下所示：

源码位置：src/runtime/map.go

// mapaccess1 returns a pointer to h[key].  Never returns nil, instead
// it will return a reference to the zero object for the elem type if
// the key is not in the map.
// NOTE: The returned pointer may keep the whole map live, so don't
// hold onto it for very long.
func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {// ...
}

t *maptype：map 的类型信息，例如键的类型、值的类型、哈希函数等。
h *hmap： map 的底层结构，包含哈希表的元数据（如桶数组、键值对数量等）。
key unsafe.Pointer：查找目标键的指针。
返回值：返回指向值的指针。如果键不存在，则返回值类型的零值的指针。

对于 mapaccess1 源码的流程进行了如下步骤的拆解：

竞态检测
Sanitizer 检测
空检查
并发写检查
哈希值计算
桶定位
扩容情况处理
桶查找

下面针对每一步骤进行详细说明。

竞态检测

如果启用了竞态检测，记录当前操作的上下文，以便在发生竞态时能够追踪。

if raceenabled && h != nil {callerpc := getcallerpc()pc := abi.FuncPCABIInternal(mapaccess1)racereadpc(unsafe.Pointer(h), callerpc, pc)raceReadObjectPC(t.Key, key, callerpc, pc)
}

`Sanitizer` 检测

msanread 会标记对 key 的读取操作，确保该内存区域已经被正确初始化。防止程序读取未初始化的内存，从而避免潜在的未定义行为。

asanread 会检查对 key 的读取操作是否安全，例如是否越界或是否访问了已释放的内存。

if msanenabled && h != nil {msanread(key, t.Key.Size_)
}
if asanenabled && h != nil {asanread(key, t.Key.Size_)
}

空检查

如果 map 为空，直接返回值类型的零值。

if h == nil || h.count == 0 {if err := mapKeyError(t, key); err != nil {panic(err) // see issue 23734}return unsafe.Pointer(&zeroVal[0])
}

并发写检查

如果 map 正在被写入（hashWriting 标志被设置），抛出并发读写错误，这一点也表明 map 并不支持并发安全。

if h.flags&hashWriting != 0 {fatal("concurrent map read and map write")
}

哈希值计算

使用键的哈希函数计算哈希值，其中 h.hash0 是哈希种子，用于避免哈希冲突。

hash := t.Hasher(key, uintptr(h.hash0))

桶定位

代码中使用哈希值的低 B 位（bucketMask(h.B)）定位到初始桶。其中 BucketSize 是每个桶的大小（包括键和值的存储）。

m := bucketMask(h.B)
b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.BucketSize)))

扩容情况处理

如果当前真正在扩容，那么检查旧的桶数组 oldbuckets。如果旧桶尚未迁移完成（!evacuated(oldb)），使用旧桶进行查找。

if c := h.oldbuckets; c != nil {if !h.sameSizeGrow() {// There used to be half as many buckets; mask down one more power of two.// 如果扩容前的桶数量是当前的一半，进一步掩码哈希值m >>= 1}oldb := (*bmap)(add(c, (hash&m)*uintptr(t.BucketSize)))if !evacuated(oldb) {b = oldb}
}

桶内查找

// 获取键的哈希值的最高字节
top := tophash(hash)
bucketloop:// 遍历当前桶及其溢出桶for ; b != nil; b = b.overflow(t) {// 遍历桶内的所有键值对for i := uintptr(0); i < abi.MapBucketCount; i++ {// 检查当前个键的哈希值最高字节是否匹配，如果不相等，说明当前槽位的键与目标键不匹配if b.tophash[i] != top {// 如果遇到空槽（emptyRest）跳出桶循环，因为后续槽位都是空的if b.tophash[i] == emptyRest {break bucketloop}// 不匹配则跳过当前槽位continue}// 计算第 i 个键的地址k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.KeySize))// 如果键是指针类型，解引用键的指针if t.IndirectKey() {k = *((*unsafe.Pointer)(k))}// 比较传入的键和当前键是否相等if t.Key.Equal(key, k) {// 计算第 i 个值的地址e := add(unsafe.Pointer(b), dataOffset+abi.MapBucketCount*uintptr(t.KeySize)+i*uintptr(t.ValueSize))// 如果值是指针类型，解引用值的指针if t.IndirectElem() {e = *((*unsafe.Pointer)(e))}// 返回值的指针return e}}}// 如果未找到，则返回值类型的零值指针return unsafe.Pointer(&zeroVal[0])

根据上一小结中的 bmap 数据存储模型可知，桶内查找中最为重要的就是键的偏移量计算和值的偏移量计算，如下所示：

k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.KeySize))

其中，dataOffset 是 tophash 数组之后的偏移量，i * uintptr(t.KeySize) 是计算得第 i 个键的偏移量。

e := add(unsafe.Pointer(b), dataOffset+abi.MapBucketCount*uintptr(t.KeySize)+i*uintptr(t.ValueSize))

其中，dataOffset + abi.MapBucketCount*uintptr(t.KeySize) 是值的起始偏移量（所有键之后）；i * uintptr(t.ValueSize) 是第 i 个值的偏移量。

键值插入、扩容机制

在 map 实现中，当插入元素到达一定的时机后会触发扩容机制，下面从元素的插入开始深入研究映射的扩容机制。

在 golang 中，mapassign 是一个底层的运行时函数，用于实现 map 的赋值操作。当使用 m[key] = value 语法向 map 中插入或更新键值对时，最终会调用 mapassign 函数。

源码位置：src/runtime/map.go

// Like mapaccess, but allocates a slot for the key if it is not present in the map.
//
// mapassign should be an internal detail,
// but widely used packages access it using linkname.
// Notable members of the hall of shame include:
//   - github.com/bytedance/sonic
//   - github.com/cloudwego/frugal
//   - github.com/RomiChan/protobuf
//   - github.com/segmentio/encoding
//   - github.com/ugorji/go/codec
//
// Do not remove or change the type signature.
// See go.dev/issue/67401.
//
//go:linkname mapassign
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {// ...
}

根据上面的函数原型参数解释：

t *maptype 是 map 的类型信息。
h *hmap 是 map 的底层数据结构。
key unsafe.Pointer 是要插入或更新的键的指针。

将 mapassign 函数内部将代码执行流程划分了如下几个阶段：

参数检查
竞态检测
Sanitizer 检测
并发检测
哈希值计算
初始化桶数组
定位桶和处理扩容
遍历桶并检查键
检查是否需要扩容
插入或更新键值对
返回值的地址

下面针对每一步骤进行详细说明。

参数检查

如果 h 是 nil，则直接抛出错误，因为不能对 nil 的 map 进行赋值。

if h == nil {panic(plainError("assignment to entry in nil map"))
}

竞态检测

如果启用了竞态检测，记录当前操作的上下文，以便在发生竞态时能够追踪。

if raceenabled && h != nil {callerpc := getcallerpc()pc := abi.FuncPCABIInternal(mapaccess1)racereadpc(unsafe.Pointer(h), callerpc, pc)raceReadObjectPC(t.Key, key, callerpc, pc)
}

`Sanitizer` 检测

asanread 会检查对 key 的读操作是否安全，例如是否越界或是否访问了已释放的内存。

if msanenabled && h != nil {msanread(key, t.Key.Size_)
}
if asanenabled && h != nil {asanread(key, t.Key.Size_)
}

并发检测

如果 h.flags 中的 hashWriting 标志被设置，说明当前有其他协程正在写入 map，直接触发 fatal 错误。这也说明 map 不支持并发写入。

if h.flags&hashWriting != 0 {fatal("concurrent map writes")
}

哈希值计算

使用键的哈希函数计算哈希值，并结合哈希种子以避免哈希冲突。

hash := t.Hasher(key, uintptr(h.hash0))

初始化桶数组

如果 map 的桶数组尚未初始化，则分配一个初始大小的桶数组（默认）。

if h.buckets == nil {h.buckets = newobject(t.Bucket) // newarray(t.Bucket, 1)
}

定位桶和处理扩容

again:bucket := hash & bucketMask(h.B)	// 计算当前键的哈希值对应的桶索引if h.growing() {	// 如果 hashWriting 被设置，表示 map 正在扩容，则调用 growWork 函数处理扩容逻辑growWork(t, h, bucket)}

growWork 函数是扩容的核心逻辑，负责迁移旧桶中的数据到新桶中，源码如下所示：

func growWork(t *maptype, h *hmap, bucket uintptr) {// make sure we evacuate the oldbucket corresponding// to the bucket we're about to use// bucket&h.oldbucketmask() 表示当前桶索引对应的旧桶索引evacuate(t, h, bucket&h.oldbucketmask())	// 将旧桶中的数据迁移到新桶中// evacuate one more oldbucket to make progress on growingif h.growing() {	// 检查是否设置了 hashWriting 标志，表示 map 是否正在扩容// h.nevacuate 记录当前需要迁移的旧桶索引evacuate(t, h, h.nevacuate)}
}

需要注意的是每次迁移完成一个旧桶后，h.nevacuate 才会更新为下一个需要迁移的旧桶索引，这样设计的好处是避免一次性迁移所有数据导致的性能抖动。

下面介绍 evacuate 函数，它的核心逻辑包括：

遍历旧桶及其溢出桶。
将旧桶中的键值对重新哈希并插入到新桶中。
更新 h.nevacuate 需要迁移的旧桶索引，并记录已迁移的旧桶数量。

函数定义如下所示：

func evacuate(t *maptype, h *hmap, oldbucket uintptr) {// ...
}

t：map 的类型信息。
h：map 的底层数据结构。
oldbucket：当前需要迁移的旧桶索引。

函数内容如下所示：

func evacuate(t *maptype, h *hmap, oldbucket uintptr) {// 通过旧桶数组的指针和旧桶索引，计算出当前旧桶的地址// h.oldbuckets: 旧桶数组的指针// oldbucket*uintptr(t.BucketSize)): 计算当前旧桶的偏移量// 然后通过 add 函数计算处旧桶的地址b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.BucketSize)))// 算新桶的掩码值newbit := h.noldbuckets()if !evacuated(b) {	// 检查当前旧桶是否已经被迁移，如果未迁移，则执行以下逻辑var xy [2]evacDstx := &xy[0]// 新桶的地址x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.BucketSize)))// 新桶中键的起始地址x.k = add(unsafe.Pointer(x.b), dataOffset)// 新桶中值的起始地址x.e = add(x.k, abi.MapBucketCount*uintptr(t.KeySize))// 如果扩容，则初始化第二个迁移目标if !h.sameSizeGrow() {y := &xy[1]// 新桶的地址y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.BucketSize)))// 新桶中键的起始地址y.k = add(unsafe.Pointer(y.b), dataOffset)// 新桶中值的起始地址y.e = add(y.k, abi.MapBucketCount*uintptr(t.KeySize))}// 遍历当前旧桶及其所有溢出桶for ; b != nil; b = b.overflow(t) {k := add(unsafe.Pointer(b), dataOffset)	// 当前桶中键的首地址e := add(k, abi.MapBucketCount*uintptr(t.KeySize))	// 当前桶中值的首地址// 遍历当前桶中的所有键值对for i := 0; i < abi.MapBucketCount; i, k, e = i+1, add(k, uintptr(t.KeySize)), add(e, uintptr(t.ValueSize)) {top := b.tophash[i]	// 当前键值对的哈希高位值// 如果当前键值对为空，标记该槽已迁移且为空if isEmpty(top) {b.tophash[i] = evacuatedEmptycontinue}// 如果哈希高位值小于 minTopHash，表示哈希表状态异常if top < minTopHash {throw("bad map state")}// 判断键是否为指针类型，如果是指针类型，则解引用指针获取实际地址k2 := kif t.IndirectKey() {k2 = *((*unsafe.Pointer)(k2))}// 判断是迁移到 x 还是迁移到 yvar useY uint8// 是否是等量扩容（桶数量不变）if !h.sameSizeGrow() {// Compute hash to make our evacuation decision (whether we need// to send this key/elem to bucket x or bucket y).// 重新计算键的哈希值hash := t.Hasher(k2, uintptr(h.hash0))// 判断是否是 NaN 键if h.flags&iterator != 0 && !t.ReflexiveKey() && !t.Key.Equal(k2, k2) {// If key != key (NaNs), then the hash could be (and probably// will be) entirely different from the old hash. Moreover,// it isn't reproducible. Reproducibility is required in the// presence of iterators, as our evacuation decision must// match whatever decision the iterator made.// Fortunately, we have the freedom to send these keys either// way. Also, tophash is meaningless for these kinds of keys.// We let the low bit of tophash drive the evacuation decision.// We recompute a new random tophash for the next level so// these keys will get evenly distributed across all buckets// after multiple grows.// 使用旧桶的 tophash 的最低位决定是否迁移到桶 yuseY = top & 1top = tophash(hash)	// 重新计算 tophash} else {// 对于普通键，根据新哈希值的 newbit 是否是1来决定是否迁移到桶y中if hash&newbit != 0 {useY = 1}}}if evacuatedX+1 != evacuatedY || evacuatedX^1 != evacuatedY {throw("bad evacuatedN")}// 标记当前键值对已迁移b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY// 确定迁移目的桶dst := &xy[useY]                 // evacuation destination// 判断目标桶是否已经满了，如果已经满了，则创建一个新的溢出桶，然后继续迁移if dst.i == abi.MapBucketCount {dst.b = h.newoverflow(t, dst.b)dst.i = 0dst.k = add(unsafe.Pointer(dst.b), dataOffset)dst.e = add(dst.k, abi.MapBucketCount*uintptr(t.KeySize))}// 下面为真正迁移的代码，每行都是简单的指针操作，不再赘述dst.b.tophash[dst.i&(abi.MapBucketCount-1)] = top // mask dst.i as an optimization, to avoid a bounds checkif t.IndirectKey() {*(*unsafe.Pointer)(dst.k) = k2 // copy pointer} else {typedmemmove(t.Key, dst.k, k) // copy elem}if t.IndirectElem() {*(*unsafe.Pointer)(dst.e) = *(*unsafe.Pointer)(e)} else {typedmemmove(t.Elem, dst.e, e)}dst.i++// These updates might push these pointers past the end of the// key or elem arrays.  That's ok, as we have the overflow pointer// at the end of the bucket to protect against pointing past the// end of the bucket.dst.k = add(dst.k, uintptr(t.KeySize))dst.e = add(dst.e, uintptr(t.ValueSize))}}// Unlink the overflow buckets & clear key/elem to help GC.// 如果旧桶不再被迭代器使用，清理旧桶中的指针以帮助 GCif h.flags&oldIterator == 0 && t.Bucket.Pointers() {b := add(h.oldbuckets, oldbucket*uintptr(t.BucketSize))// Preserve b.tophash because the evacuation// state is maintained there.ptr := add(b, dataOffset)n := uintptr(t.BucketSize) - dataOffsetmemclrHasPointers(ptr, n)}}// 如果当前桶是正在迁移的桶，更新迁移进度if oldbucket == h.nevacuate {advanceEvacuationMark(h, t, newbit)}
}

遍历桶并检查键

下面这段代码遍历目标桶及其溢出桶，寻找合适的插入位置或更新现有键值对。如下所示：

// 通过目标桶的索引 bucket 定位到对应的桶
b := (*bmap)(add(h.buckets, bucket*uintptr(t.BucketSize)))
// 根据键的哈希值计算其哈希高位值
top := tophash(hash)var inserti *uint8
var insertk unsafe.Pointer
var elem unsafe.Pointer
bucketloop:for {// 遍历当前桶的所有槽位for i := uintptr(0); i < abi.MapBucketCount; i++ {// 当前槽位的哈希高位值是否与目标值匹配，如果不匹配，说明当前槽位不包含目标键if b.tophash[i] != top {// 如果当前槽位为空且未找到空闲槽位，则记录该槽位，用于后续插入if isEmpty(b.tophash[i]) && inserti == nil {// inserti:空闲槽地址inserti = &b.tophash[i]// insertk:空闲槽键的地址insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.KeySize))// elem:空闲槽值的地址elem = add(unsafe.Pointer(b), dataOffset+abi.MapBucketCount*uintptr(t.KeySize)+i*uintptr(t.ValueSize))}// 如果当前槽位的哈希值为 emptyRest，说明后续槽位均为空，结束遍历if b.tophash[i] == emptyRest {break bucketloop}continue}// 获取当前槽的键地址k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.KeySize))if t.IndirectKey() {	// 是否是指针类型，如果是指针类型，需要解引用k = *((*unsafe.Pointer)(k))}if !t.Key.Equal(key, k) { // 键不相等，直接跳过continue}// 找到匹配的键，更新其值// already have a mapping for key. Update it.if t.NeedKeyUpdate() {typedmemmove(t.Key, k, key)	// 复制目标键到当前键位置}// 计算值的地址elem = add(unsafe.Pointer(b), dataOffset+abi.MapBucketCount*uintptr(t.KeySize)+i*uintptr(t.ValueSize))goto done	// 结束操作}// 如果当前桶未找到匹配的键或空闲槽位，则检查溢出桶ovf := b.overflow(t)	// 获取当前桶的溢出桶if ovf == nil {break	// 如果没有溢出桶，则结束遍历}b = ovf	// 如果有溢出桶，更新桶的地址}

检查是否需要扩容

if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {hashGrow(t, h)goto again // Growing the table invalidates everything, so try again
}

根据上述代码中的逻辑，核心在于 overLoadFactor 和 tooManyOverflowBuckets 的判断，下面对这两个函数进行刨析。

`overLoadFactor`

判断当前 map 的负载因子是否超过了设定的阈值。其中 count 表示当前 map 中的键值对数量；B 表示当前 map 的桶数量的对数（2^B 是当前桶的数量）。

func overLoadFactor(count int, B uint8) bool {return count > abi.MapBucketCount && uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)
}

具体计算逻辑如下所示：

count > abi.MapBucketCount：确保键值对数量大于单个桶的容量（8 个键值对）。
uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)：计算当前负载因子是否超过阈值。

$l o a d F a c t orDe n = 2$

$\\= loadFactorDen * abi.MapBucketCount * 13 / 16 \\= 2 * 8 * 13 / 16 \\= 13$

func bucketShift(b uint8) uintptr {// Masking the shift amount allows overflow checks to be elided.return uintptr(1) << (b & (goarch.PtrSize*8 - 1))
}

goarch.PtrSize：表示指针的大小（以字节为单位）。在 64 位系统中，goarch.PtrSize = 8；在 32 位系统中，goarch.PtrSize = 4。

假设 B = 3 则有：

$\\= uintptr(1) << (3 \& (8*8 - 1)) \\= uintptr(1) << 3 \& 63 \\= uintptr(1) << 3 \\= 8$

综上所述，结算结果为：

$\\= uintptr(count) > 13 * (8 / 2) \\= 13 * 4 \\= 52$

因此，当键值对数量 count 超过 52 时，overLoadFactor 返回 true，触发扩容。

通过负载因子（loadFactorThreshold）可以快速的计算出扩容的时机，公式如下所示：

$\\= loadFactorNum / loadFactorDen \\= 13 / 2 \\= 6.5$

这意味着当键值对数量超过 当前桶的容量 的 6.5 倍 时，map 会触发扩容。

`tooManyOverflowBuckets`

判断当前 map 的溢出桶数量是否过多。如果溢出桶数量过多，说明 map 的哈希表分布不够均匀，需要扩容以优化性能。

// tooManyOverflowBuckets reports whether noverflow buckets is too many for a map with 1<<B buckets.
// Note that most of these overflow buckets must be in sparse use;
// if use was dense, then we'd have already triggered regular map growth.
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {// If the threshold is too low, we do extraneous work.// If the threshold is too high, maps that grow and shrink can hold on to lots of unused memory.// "too many" means (approximately) as many overflow buckets as regular buckets.// 当溢出桶的数量接近或超过当前桶的数量时，认为溢出桶过多。// See incrnoverflow for more details.if B > 15 {// 如果 B 大于 15，将其限制为 15。B = 15}// The compiler doesn't see here that B < 16; mask B to generate shorter shift code.return noverflow >= uint16(1)<<(B&15)
}

noverflow：当前 map 的溢出桶数量。
B：当前 map 的桶数量的对数（2^B 是当前桶的数量）。

为什么限制 B 的最大值为 15？

首先对于 map 来说 2^15 = 32768，这已经是一个非常大的哈希表。另外对于内存的使用效率来说，如果 B 超过 15 可能会导致内存使用过多，或者哈希表过于稀疏，从而降低性能。

插入或更新键值对

插入位置为空

if inserti == nil {// The current bucket and all the overflow buckets connected to it are full, allocate a new one.// 检查当前的插入位置是否为空。如果为空，说明当前桶及其所有溢出桶都已满，需要分配一个新的溢出桶newb := h.newoverflow(t, b)	// 创建一个新的溢出桶inserti = &newb.tophash[0]	// 指向新溢出桶的第一个 tophash 位置。tophash 是一个数组，用于存储键的哈希值的高位信息insertk = add(unsafe.Pointer(newb), dataOffset)	// 计算新溢出桶中键的存储位置。add 是一个低级操作，用于通过偏移量计算指针位置。dataOffset 是键值对数据在桶中的偏移量elem = add(insertk, abi.MapBucketCount*uintptr(t.KeySize))	// 计算新溢出桶中值的存储位置。abi.MapBucketCount 是每个桶中键值对的数量，t.KeySize 是键的大小。通过偏移量计算出值的存储位置
}

存储新的键值对

// store new key/elem at insert position
if t.IndirectKey() {	// 判断键是否是指针类型。如果是指针类型，则需要间接存储kmem := newobject(t.Key)	// 分配一个新的对象，用于存储键*(*unsafe.Pointer)(insertk) = kmem	// 将分配的对象地址存储到 insertk 指向的位置insertk = kmem	// 更新 insertk 使其指向实际的键存储位置
}if t.IndirectElem() {	// 判断值是否是指针类型。如果是指针类型，则需要间接存储vmem := newobject(t.Elem)	// 分配一个新的对象，用于存储值*(*unsafe.Pointer)(elem) = vmem	// 将分配的对象地址存储到 elem 指向的位置
}typedmemmove(t.Key, insertk, key) // 将传入的键 key 复制到 insertk 指向的位置
*inserti = top	// 将键的哈希值的高位信息存储到 inserti 指向的位置。top 是哈希值的高位部分
h.count++	// 增加哈希表的计数器 count，表示哈希表中键值对的数量增加了一个

返回值的地址

done:if h.flags&hashWriting == 0 {	// 检查 h.flags 是否设置了 hashWriting 标志// hashWriting 在 map 写入操作开始时设置，写入操作结束时清除。它用于检测并发写入冲突// 如果检测到并发写入，调用 fatal 函数抛出致命错误fatal("concurrent map writes")}h.flags &^= hashWriting	// 清除 hashWriting 标志if t.IndirectElem() {	// 判断 map 的值是否是指针类型elem = *((*unsafe.Pointer)(elem)) // 如果 map 的值是指针类型，则需要通过指针解引用获取实际的值地址}return elem	// 返回值的地址

键值删除

在 golang 中，mapdelete 是一个底层的运行时函数，用于实现 map 的键值删除逻辑。

源码位置：src/runtime/map.go

下面给出函数原型，如下所示：

func mapdelete(t *maptype, h *hmap, key unsafe.Pointer) {// ...
}

t：map 的类型信息。
h：map 的底层数据结构。
key：要删除的键的指针。

将 mapdelete 函数中的代码逻辑分为如下几个阶段：

竞态检测
Sanitizer 检测
参数检查
并发检测
设置写入标记
计算哈希值和目标桶
处理扩容
定位目标桶
查找键值对
删除目标键值对
清理空闲槽
重置哈希种子
清除写入标记

下面针对每一步骤进行详细说明。

竞态检测

如果启用了竞态检测，记录对 map 的写操作和键的读操作，以便在发生竞态时能够追踪。

if raceenabled && h != nil {callerpc := getcallerpc()pc := abi.FuncPCABIInternal(mapdelete)// 记录写操作的程序计数器信息racewritepc(unsafe.Pointer(h), callerpc, pc)// 记录读操作的程序计数器信息raceReadObjectPC(t.Key, key, callerpc, pc)
}

`Sanitizer` 检测

记录键的读操作，检测未初始化或越界的内存访问。

if msanenabled && h != nil {msanread(key, t.Key.Size_)
}
if asanenabled && h != nil {asanread(key, t.Key.Size_)
}

参数检查

如果 map 为 nil 或者键值对数量为 0，直接返回。如果键无效 mapKeyError 则报出异常。

if h == nil || h.count == 0 {if err := mapKeyError(t, key); err != nil {panic(err) // see issue 23734}return
}

并发检测

检查是否已经有其他协程正在写入 map，不支持并发操作。

if h.flags&hashWriting != 0 {fatal("concurrent map writes")
}

设置写入标记

设置 hashWriting 标志，表示当前正在写入 map。

h.flags ^= hashWriting

计算哈希值和目标桶

计算键的哈希值和目标桶的索引，用于定位目标桶。

hash := t.Hasher(key, uintptr(h.hash0))
bucket := hash & bucketMask(h.B)

处理扩容

如果 map 正在扩容，调用 growWork 处理扩容逻辑。

有关扩容的源码分析，请查看 “键值插入、扩容机制” 小结。

if h.growing() {growWork(t, h, bucket)
}

定位目标桶

通过目标桶的索引 bucket ，进行指针偏移定位到对应的桶

b := (*bmap)(add(h.buckets, bucket*uintptr(t.BucketSize)))
bOrig := b

查找目标键

// 根据键的哈希值计算其哈希高位值
top := tophash(hash)
search:// 遍历目标桶及其所有溢出桶for ; b != nil; b = b.overflow(t) {// 遍历当前桶中的所有键值对for i := uintptr(0); i < abi.MapBucketCount; i++ {// 如果当前槽位的哈希值不匹配，跳过if b.tophash[i] != top {// 如果遇到 emptyRest，表示后续槽位均为空，结束搜索if b.tophash[i] == emptyRest {break search}continue}// 获取当前槽的键地址k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.KeySize))k2 := kif t.IndirectKey() {	// 判断键是否为指针类型k2 = *((*unsafe.Pointer)(k2))}if !t.Key.Equal(key, k2) {	// 比较键是否相等，如果不相等则跳过本次循环continue}// 删除目标键值对}}

删除目标键值对

删除键值对的核心逻辑，负责清空键和值的内容，并更新桶的状态。

// Only clear key if there are pointers in it.
if t.IndirectKey() {	// 判断键是否为指针类型*(*unsafe.Pointer)(k) = nil	// 如果是指针类型，直接将指针置为 nil
} else if t.Key.Pointers() {memclrHasPointers(k, t.Key.Size_)	// 如果键包含指针，则清空键的内容，帮助 GC
}// 计算当前槽位的值的地址
e := add(unsafe.Pointer(b), dataOffset+abi.MapBucketCount*uintptr(t.KeySize)+i*uintptr(t.ValueSize))
// 根据值的类型，清空值的内容
if t.IndirectElem() {	// 判断值是否为指针类型*(*unsafe.Pointer)(e) = nil	// 如果是指针类型，直接将指针置为 nil
} else if t.Elem.Pointers() {	// 判断值是否包含指针memclrHasPointers(e, t.Elem.Size_)	// 如果值包含指针，则清空值的内容，帮助 GC
} else {memclrNoHeapPointers(e, t.Elem.Size_)	// 如果值不包含指针，则清空值的内容，不需要指针操作
}// 更新槽位状态
b.tophash[i] = emptyOne

清理空闲槽

// If the bucket now ends in a bunch of emptyOne states,
// change those to emptyRest states.
// It would be nice to make this a separate function, but
// for loops are not currently inlineable.
// 检查当前槽是否是最后一个槽，以及后续槽是否需要继续清理
if i == abi.MapBucketCount-1 {// 如果当前桶有溢出桶并且溢出桶的第一个槽不是 emptyRest，表示后续槽仍包含数据if b.overflow(t) != nil && b.overflow(t).tophash[0] != emptyRest {goto notLast}
} else {// 下一个槽不是 emptyRest，表示后续槽仍包含数据if b.tophash[i+1] != emptyRest {goto notLast}
}// 清理空闲槽
for {// 将当前槽的状态更新为 emptyRestb.tophash[i] = emptyRestif i == 0 {	// 当前槽位是第一个槽if b == bOrig {// 当前桶是初始桶，清理完成break // beginning of initial bucket, we're done.}// Find previous bucket, continue at its last entry.c := b	// 保存当前桶的指针for b = bOrig; b.overflow(t) != c; b = b.overflow(t) {// 找到当前桶的前一个桶}i = abi.MapBucketCount - 1	// 继续清理前一个桶的最后一个槽} else {// 继续清理当前桶的前一个槽i--}// 如果当前槽不是 emptyOne，清理完成 if b.tophash[i] != emptyOne {break}
}
notLast:// 减少键值对的数量h.count--

重置哈希种子

如果 map 中没有键值对，重置哈希种子以防止哈希碰撞攻击

if h.count == 0 {h.hash0 = uint32(rand())
}

清除写入标记

清除 hashWriting 标志，表示写入操作完成

// 如果标志未设置，说明可能存在并发写入问题
if h.flags&hashWriting == 0 {fatal("concurrent map writes")
}// 清除
h.flags &^= hashWriting

常用操作

访问映射值

通过键访问对应的值，格式如下所示：

value := m["key"]

其中，如果键不存在，则返回值类型的零值。例如：

package mainimport "fmt"func main() {m := map[string]int{"apple":  1,"banana": 2,"cherry": 3,}fmt.Println(m["apple"])  // 1fmt.Println(m["orange"]) // 0
}

检查键是否存在

通过多值赋值检查键是否存在，格式如下所示：

value, ok := m["key"]
if ok {fmt.Println("Key exists, value:", value)
} else {fmt.Println("Key does not exist")
}

ok 是一个布尔值，表示键是否存在。
如果键存在，ok 为 true，value 是对应的值。
如果键不存在，ok 为 false，value 是值类型的零值。

例如：

package mainimport "fmt"func main() {m := map[string]int{"apple":  1,"banana": 2,"cherry": 3,}value, ok := m["apple"]if ok {fmt.Println("Key exists, value:", value) // Key exists, value: 1} else {fmt.Println("Key does not exist")}
}

修改映射的值

通过键直接赋值，格式如下所示：

m["key"] = newValue

例如：

package mainimport "fmt"func main() {m := map[string]int{"apple":  1,"banana": 2,"cherry": 3,}m["apple"] = 10         // 修改键为 "apple" 的值为 10fmt.Println(m["apple"]) // 10
}

添加键值对

直接赋值即可，格式如下所示：

m["newKey"] = newValue

例如：

package mainimport "fmt"func main() {m := map[string]int{"apple":  1,"banana": 2,"cherry": 3,}m["orange"] = 4          // 添加键为 "orange"，值为 4 的键值对fmt.Println(m["orange"]) // 4
}

删除键值对

使用 delete 函数进行删除，格式如下所示：

delete(m, "key")

例如：

package mainimport "fmt"func main() {m := map[string]int{"apple":  1,"banana": 2,"cherry": 3,}fmt.Println(m["banana"]) // 2delete(m, "banana")      // 删除键为 "banana" 的键值对fmt.Println(m["banana"]) // 0
}

遍历映射

使用 for range 循环遍历映射中的键值对，例如：

package mainimport "fmt"func main() {m := map[string]int{"apple":  1,"banana": 2,"cherry": 3,}// Key: apple, Value: 1// Key: banana, Value: 2// Key: cherry, Value: 3for key, value := range m {fmt.Printf("Key: %v, Value: %v\n", key, value)}
}

需要注意的是，映射的遍历顺序是随机的，因为映射内部是无序的。

获取映射长度

使用 len 函数获取映射中键值对的数量，例如：

package mainimport "fmt"func main() {m := map[string]int{"apple":  1,"banana": 2,"cherry": 3,}length := len(m)fmt.Println(length) // 3
}

映射嵌套

映射的值可以是任何类型，包括另一个映射，从而实现嵌套映射，例如：

package mainimport "fmt"func main() {nestedMap := map[string]map[string]int{"fruits": {"apple":  1,"banana": 2,},"vegetables": {"carrot":   3,"broccoli": 4,},}fmt.Println(nestedMap["fruits"]["apple"]) // 1
}

映射的拷贝

映射是引用类型，直接赋值会共享同一个底层数据结构。如果需要拷贝映射，需要手动创建一个新的映射并复制键值对。例如：

package mainimport "fmt"func main() {m1 := map[string]int{"apple":  1,"banana": 2,}m2 := make(map[string]int)for key, value := range m1 {m2[key] = value	// 深拷贝}// Key: apple, Value: 1// Key: banana, Value: 2for key, value := range m2 {fmt.Printf("Key: %s, Value: %d\n", key, value)}m2["apple"] = 3fmt.Println(m1["apple"]) // 此时访问 m1 中的 apple，并没有修改
}

清空映射

虽然 golang 没有直接清空映射的函数，但可以通过遍历映射并删除所有键值对来实现。例如：

package mainimport "fmt"func main() {m := map[string]int{"apple":  1,"banana": 2,}fmt.Println(len(m)) // 2for key := range m {delete(m, key)}fmt.Println(len(m)) // 0
}

常见问题

检查键是否存在

在访问映射中的值之前，应始终检查键是否存在，以避免误用零值。例如：

value, exists := myMap["key"]
if exists {fmt.Println("Value:", value)
} else {fmt.Println("Key does not exist")
}

初始化映射时指定容量

如果已知映射的大小，建议在初始化时指定容量，以减少后续的内存重新分配。例如：

myMap := make(map[string]int, 100) // 预计映射大小为 100

避免并发修改

golang 的映射不是线程安全的，因此在并发场景下直接修改映射会导致竞态条件。

使用 sync.Mutex 加锁保护映射

var mu sync.Mutex
mu.Lock()
myMap["key"] = value
mu.Unlock()

使用并发安全的映射 sync.Map，这个是 golang 标准库中提供的数据类型。

var myMap sync.Map
myMap.Store("key", 1) // 存储键值对
val, ok := myMap.Load("key") // 根据键读取值

关于更多并发相关的知识点，请关注后续文章《Golang 并发编程》。

不使用映射存储大量的数据

映射是基于哈希表实现的，其内部结构需要额外的内存来存储哈希值、指针和元数据。也就是说映射的内存占用通常比数组或切片更高。对于大量数据，这种额外的内存开销可能会导致显著的性能问题。

下面给出一些可以替代的方案：

使用外部数据库存储大规模数据。
使用切片存储，切片的内存占用相对较低，另外对于有序的数据切片的性能通常会优于映射的性能。
采用分块存储，将多个数据分散到不同的映射中，每个映射负责一部分数据，从而减少扩容的开销。
定期清理不需要的键值对，优化内存使用。
另外如果对于数据量较小且不需要频繁查询的数据，可以考虑其他的数据结构。

🌺🌺🌺撒花！

如果本文对你有帮助，就点关注或者留个👍
如果您有任何技术问题或者需要更多其他的内容，请随时向我提问。
在这里插入图片描述