Postgres Source Code Analysis 46: The Visibility Map (VM)

news/2024/10/25 1:27:49/

Introduction

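  To implement multi-version concurrency control (MVCC), Postgres does not physically remove a tuple when a transaction deletes or updates it. Instead it deletes the tuple logically, by setting fields in the tuple header such as xmax and the infomask flag bits. As the workload accumulates, the table becomes more and more bloated, which also interferes with plan generation and the choice of the cheapest path. VACUUM solves this by cleaning up these dead tuples. However, a table can consist of many pages, and quickly locating the pages that contain dead tuples matters a great deal under high concurrency. Fortunately, PG gives each table an auxiliary file (fork), the visibility map (VM), which speeds up deciding whether a heap block can contain dead tuples.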

VM File Structure


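  The VM keeps two bits for each heap page (all-visible and all-frozen), recording respectively whether the page is free of dead tuples and whether every tuple on the page is frozen.
all-visible bit set: every tuple on the page is visible to all current and future transactions, so the page does not need to be vacuumed;
all-frozen bit set: every tuple on the page is frozen, so even a vacuum that must scan the whole table (an aggressive vacuum) can skip the page.
NOTE: the all-frozen bit may only be set on a page whose all-visible bit is also set.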
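The two-bit encoding and the rule that all-frozen implies all-visible fit in a few lines of C. This is a minimal, self-contained sketch: the flag values match visibilitymap.h, but vm_status_is_consistent() and vm_vacuum_can_skip() are hypothetical helpers, and the skip decision is deliberately simplified (real VACUUM also applies a skip-pages threshold and honors the DISABLE_PAGE_SKIPPING option).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined in PostgreSQL's visibilitymap.h */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02
#define VISIBILITYMAP_VALID_BITS  0x03

/*
 * Hypothetical helper: true if the 2-bit status is a legal combination,
 * i.e. all-frozen is only ever set together with all-visible.
 */
static bool
vm_status_is_consistent(uint8_t status)
{
    status &= VISIBILITYMAP_VALID_BITS;
    if (status & VISIBILITYMAP_ALL_FROZEN)
        return (status & VISIBILITYMAP_ALL_VISIBLE) != 0;
    return true;
}

/*
 * Hypothetical, simplified skip rule: a normal vacuum may skip all-visible
 * pages, an aggressive (whole-table) vacuum only all-frozen pages.
 */
static bool
vm_vacuum_can_skip(uint8_t status, bool aggressive)
{
    assert(vm_status_is_consistent(status));
    if (aggressive)
        return (status & VISIBILITYMAP_ALL_FROZEN) != 0;
    return (status & VISIBILITYMAP_ALL_VISIBLE) != 0;
}
```

A page that is all-visible but not all-frozen (status 0x01) is skipped by a normal vacuum but still scanned by an aggressive one.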

Here is a brief look at how the flag bits are written and updated:

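The bits mean the following:
all-visible bit: 0 ==> the page may contain dead tuples    1 ==> all tuples are visible, no dead tuples
all-frozen bit: 0 ==> the page contains unfrozen tuples   1 ==> all tuples are frozen (and visible)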
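For convenience, take the first byte of the map page as an example:
Binary content of the byte: 00 00 00 10
Reading the pairs from left to right as Block-1 to Block-4, each written as (all-visible, all-frozen), heap blocks 1 to 3 may contain dead tuples, while block 4 has none.
(This is a simplified notation. In the actual implementation, heap block 0 occupies the two lowest-order bits of the byte, and within each pair the lower bit is all-visible (0x01) and the upper bit is all-frozen (0x02); see HEAPBLK_TO_OFFSET in the macro definitions below.)
Scenario: VACUUM runs on the heap table. The dead tuples in Block-1 are removed, so its all-visible bit must be set, and all tuples in Block-4 have been frozen.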

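Data is read one byte at a time, so the char *map array gives the start of the page contents, and the offset locates the all-visible and all-frozen bits:
1 Block-1's bits are 00; after setting all-visible they become 10;
2 Block-4's bits are 10; after setting all-frozen they become 11.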
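To check the example against the real bit order, here is a sketch that applies the same shift-and-OR as visibilitymap_set() and the same shift-and-mask as visibilitymap_get_status() to a single map byte. vm_byte_set() and vm_byte_get() are hypothetical helpers, and heap blocks are numbered from 0, so the article's Block-1 and Block-4 correspond to heap blocks 0 and 3.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_HEAPBLOCK 2
#define HEAPBLOCKS_PER_BYTE (8 / BITS_PER_HEAPBLOCK)    /* 4 heap blocks per byte */
#define HEAPBLK_TO_OFFSET(x) (((x) % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK)

#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02
#define VISIBILITYMAP_VALID_BITS  0x03

/* Hypothetical helper: OR the flags into heapBlk's pair, as visibilitymap_set() does. */
static void
vm_byte_set(uint8_t *byte, uint32_t heapBlk, uint8_t flags)
{
    assert((flags & ~VISIBILITYMAP_VALID_BITS) == 0);
    *byte |= (uint8_t) (flags << HEAPBLK_TO_OFFSET(heapBlk));
}

/* Hypothetical helper: extract heapBlk's pair, as visibilitymap_get_status() does. */
static uint8_t
vm_byte_get(uint8_t byte, uint32_t heapBlk)
{
    return (uint8_t) ((byte >> HEAPBLK_TO_OFFSET(heapBlk)) & VISIBILITYMAP_VALID_BITS);
}
```

Starting from 0x40 (only heap block 3 is all-visible), setting all-visible on block 0 gives 0x41, and then setting all-frozen on block 3 gives 0xC1.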

Macro Definitions and Data Structures

/* Number of bits for one heap page */
#define BITS_PER_HEAPBLOCK 2                // 2 bits per heap block

/* Flags for bit map */
#define VISIBILITYMAP_ALL_VISIBLE   0x01    // all_visible
#define VISIBILITYMAP_ALL_FROZEN    0x02    // all_frozen
#define VISIBILITYMAP_VALID_BITS    0x03    /* OR of all valid visibilitymap flags bits */

/*
 * Size of the bitmap on each visibility map page, in bytes. There's no
 * extra headers, so the whole page minus the standard page header is
 * used for the bitmap.
 */
#define MAPSIZE (BLCKSZ - MAXALIGN(SizeOfPageHeaderData))    // bitmap size of one map page

/* Number of heap blocks we can represent in one byte */
#define HEAPBLOCKS_PER_BYTE (BITS_PER_BYTE / BITS_PER_HEAPBLOCK)    // 1 byte covers 4 heap blocks

/* Number of heap blocks we can represent in one visibility map page. */
#define HEAPBLOCKS_PER_PAGE (MAPSIZE * HEAPBLOCKS_PER_BYTE)    // heap blocks covered by one map page

/* Mapping from heap block number to the right bit in the visibility map */
#define HEAPBLK_TO_MAPBLOCK(x) ((x) / HEAPBLOCKS_PER_PAGE)
#define HEAPBLK_TO_MAPBYTE(x) (((x) % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE)
#define HEAPBLK_TO_OFFSET(x) (((x) % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK)

/* Masks for counting subsets of bits in the visibility map. */
#define VISIBLE_MASK64  UINT64CONST(0x5555555555555555) /* The lower bit of each bit pair */
#define FROZEN_MASK64   UINT64CONST(0xaaaaaaaaaaaaaaaa) /* The upper bit of each bit pair */

// Accessor for pages without line pointers, which is exactly the case for VM pages
/*
 * PageGetContents
 *      To be used in cases where the page does not contain line pointers.
 *
 * Note: prior to 8.3 this was not guaranteed to yield a MAXALIGN'd result.
 * Now it is.  Beware of old code that might think the offset to the contents
 * is just SizeOfPageHeaderData rather than MAXALIGN(SizeOfPageHeaderData).
 */
#define PageGetContents(page) \
    ((char *) (page) + MAXALIGN(SizeOfPageHeaderData))
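Plugging numbers into the mapping macros shows how a heap block number is located. The sketch assumes a typical 64-bit build with the default BLCKSZ of 8192, a 24-byte page header and 8-byte MAXALIGN, so MAPSIZE is 8168 bytes and one VM page covers 32672 heap blocks. popcount64() and vm_count_word() are hypothetical helpers that mimic how the 64-bit masks are used to count all-visible and all-frozen blocks (recent PostgreSQL versions do this in visibilitymap_count() with pg_popcount64()).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed build constants: default block size on a typical 64-bit build. */
#define BLCKSZ               8192
#define SizeOfPageHeaderData 24
#define MAXALIGN(len)        (((uintptr_t) (len) + 7) & ~((uintptr_t) 7))
#define BITS_PER_BYTE        8

/* Macros as in the article (UINT64CONST replaced by the standard UINT64_C). */
#define BITS_PER_HEAPBLOCK 2
#define MAPSIZE (BLCKSZ - MAXALIGN(SizeOfPageHeaderData))
#define HEAPBLOCKS_PER_BYTE (BITS_PER_BYTE / BITS_PER_HEAPBLOCK)
#define HEAPBLOCKS_PER_PAGE (MAPSIZE * HEAPBLOCKS_PER_BYTE)
#define HEAPBLK_TO_MAPBLOCK(x) ((x) / HEAPBLOCKS_PER_PAGE)
#define HEAPBLK_TO_MAPBYTE(x) (((x) % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE)
#define HEAPBLK_TO_OFFSET(x) (((x) % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK)
#define VISIBLE_MASK64 UINT64_C(0x5555555555555555)
#define FROZEN_MASK64  UINT64_C(0xaaaaaaaaaaaaaaaa)

/* The bitmap is scanned as 64-bit words, so its size must divide evenly. */
_Static_assert(MAPSIZE % sizeof(uint64_t) == 0, "MAPSIZE must be a multiple of 8");

/* Hypothetical portable popcount: clears the lowest set bit each round. */
static int
popcount64(uint64_t w)
{
    int n = 0;

    while (w != 0)
    {
        w &= w - 1;
        n++;
    }
    return n;
}

/* Hypothetical helper: tally one 64-bit word of the map (32 heap blocks). */
static void
vm_count_word(uint64_t word, int *nvisible, int *nfrozen)
{
    *nvisible += popcount64(word & VISIBLE_MASK64);
    *nfrozen += popcount64(word & FROZEN_MASK64);
}
```

Under these assumptions, heap block 100001 falls in VM page 3, byte 496, bit offset 2.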

Interface Functions

1 visibilitymap_set
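This function sets the visibility flag bits. Its execution flow is:
1) Sanity checks: vmBuf must be valid and hold the VM page that covers heapBlk, and heapBuf, if valid (it may be invalid during recovery), must hold heapBlk itself;
2) Get the start of the VM page contents (skipping PageHeaderData) and take BUFFER_LOCK_EXCLUSIVE on vmBuf;
3) If the bits currently stored for the block differ from the requested flags:
   (1) enter a critical section, OR the flags into the target bits, and mark vmBuf dirty;
   (2) if the relation needs WAL: in normal running, write a WAL record with log_heap_visible(), and if data checksums are enabled or wal_log_hints=on, also set the heap page's LSN to that record's LSN to protect the heap page from torn writes; in both normal running and recovery, set the VM page's LSN to the record's LSN. Then leave the critical section.
4) Release the exclusive lock on vmBuf.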

/*
 *  visibilitymap_set - set bit(s) on a previously pinned page
 *
 * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
 * or InvalidXLogRecPtr in normal running.  The page LSN is advanced to the
 * one provided; in normal running, we generate a new XLOG record and set the
 * page LSN to that value.  cutoff_xid is the largest xmin on the page being
 * marked all-visible; it is needed for Hot Standby, and can be
 * InvalidTransactionId if the page contains no tuples.  It can also be set
 * to InvalidTransactionId when a page that is already all-visible is being
 * marked all-frozen.
 *
 * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
 * this function. Except in recovery, caller should also pass the heap
 * buffer. When checksums are enabled and we're not in recovery, we must add
 * the heap buffer to the WAL chain to protect it from being torn.
 *
 * You must pass a buffer containing the correct map page to this function.
 * Call visibilitymap_pin first to pin the right one. This function doesn't do
 * any I/O.
 */
void
visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
                  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
                  uint8 flags)
{
    BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
    uint32      mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
    uint8       mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
    Page        page;
    uint8      *map;

#ifdef TRACE_VISIBILITYMAP
    elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
#endif

    Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
    Assert(InRecovery || BufferIsValid(heapBuf));
    Assert(flags & VISIBILITYMAP_VALID_BITS);

    /* Check that we have the right heap page pinned, if present */
    if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
        elog(ERROR, "wrong heap buffer passed to visibilitymap_set");

    /* Check that we have the right VM page pinned */
    if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
        elog(ERROR, "wrong VM buffer passed to visibilitymap_set");

    page = BufferGetPage(vmBuf);
    map = (uint8 *) PageGetContents(page);
    LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);

    if (flags != (map[mapByte] >> mapOffset & VISIBILITYMAP_VALID_BITS))
    {
        START_CRIT_SECTION();

        map[mapByte] |= (flags << mapOffset);
        MarkBufferDirty(vmBuf);

        if (RelationNeedsWAL(rel))
        {
            if (XLogRecPtrIsInvalid(recptr))
            {
                Assert(!InRecovery);
                recptr = log_heap_visible(rel->rd_node, heapBuf, vmBuf,
                                          cutoff_xid, flags);

                /*
                 * If data checksums are enabled (or wal_log_hints=on), we
                 * need to protect the heap page from being torn.
                 */
                if (XLogHintBitIsNeeded())
                {
                    Page        heapPage = BufferGetPage(heapBuf);

                    /* caller is expected to set PD_ALL_VISIBLE first */
                    Assert(PageIsAllVisible(heapPage));
                    PageSetLSN(heapPage, recptr);
                }
            }
            PageSetLSN(page, recptr);
        }

        END_CRIT_SECTION();
    }

    LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
}
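The control flow of visibilitymap_set() can be mimicked outside the server. This sketch keeps only the compare, OR-in, mark-dirty and set-LSN steps; MockVmBuffer, mock_log_heap_visible() and mock_visibilitymap_set() are hypothetical stand-ins, and locking, the critical section, recovery and the heap-page LSN update are reduced to comments. It uses the same condition as the real function, so a WAL record is only "emitted" when the stored bits differ from the requested flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02
#define VISIBILITYMAP_VALID_BITS  0x03

/* Hypothetical stand-in for a pinned VM buffer: a toy 16-byte map (64 heap blocks). */
typedef struct MockVmBuffer
{
    uint8_t  map[16];
    bool     dirty;       /* set where the real code calls MarkBufferDirty() */
    uint64_t page_lsn;    /* set where the real code calls PageSetLSN() */
} MockVmBuffer;

static uint64_t mock_next_lsn = 0;

/* Hypothetical stand-in for log_heap_visible(): hands out increasing LSNs. */
static uint64_t
mock_log_heap_visible(void)
{
    return ++mock_next_lsn;
}

/* Returns true when the bits were written (and a WAL record "emitted"). */
static bool
mock_visibilitymap_set(MockVmBuffer *vm, uint32_t heapBlk, uint8_t flags)
{
    uint32_t mapByte = heapBlk / 4;                      /* 4 heap blocks per byte */
    uint8_t  mapOffset = (uint8_t) ((heapBlk % 4) * 2);

    assert(flags & VISIBILITYMAP_VALID_BITS);
    assert(mapByte < sizeof(vm->map));

    /* LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE) would be taken here */
    if (flags == ((vm->map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS))
        return false;                                    /* no write, no WAL */

    /* START_CRIT_SECTION() */
    vm->map[mapByte] |= (uint8_t) (flags << mapOffset);
    vm->dirty = true;
    vm->page_lsn = mock_log_heap_visible();              /* WAL record, then PageSetLSN() */
    /* END_CRIT_SECTION(); LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK) */
    return true;
}
```

Repeating a request for flags that are already stored leaves the page LSN untouched, while upgrading all-visible to all-visible|all-frozen produces a new record.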

2 visibilitymap_get_status

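  1) First check whether *buf is valid. If it is, check whether the page it holds is the VM page that covers heapBlk; if not, release the pin and reset *buf to InvalidBuffer;
  2) If *buf is invalid, call vm_readbuf (with extend = false) to load the VM page into a buffer; if that returns InvalidBuffer because the page does not exist, return false (no bits set);
  3) Otherwise get the start of the VM page contents and extract the flag bits at the computed byte and offset.
  Only a pin is needed here; no BUFFER_LOCK_SHARE is taken, because a single-byte read is atomic and handling concurrent changes is left to the caller.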
/*
 *  visibilitymap_get_status - get status of bits
 *
 * Are all tuples on heapBlk visible to all or are marked frozen, according
 * to the visibility map?
 *
 * On entry, *buf should be InvalidBuffer or a valid buffer returned by an
 * earlier call to visibilitymap_pin or visibilitymap_get_status on the same
 * relation. On return, *buf is a valid buffer with the map page containing
 * the bit for heapBlk, or InvalidBuffer. The caller is responsible for
 * releasing *buf after it's done testing and setting bits.
 *
 * NOTE: This function is typically called without a lock on the heap page,
 * so somebody else could change the bit just after we look at it.  In fact,
 * since we don't lock the visibility map page either, it's even possible that
 * someone else could have changed the bit just before we look at it, but yet
 * we might see the old value.  It is the caller's responsibility to deal with
 * all concurrency issues!
 */
uint8
visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *buf)
{
    BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
    uint32      mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
    uint8       mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
    char       *map;
    uint8       result;

#ifdef TRACE_VISIBILITYMAP
    elog(DEBUG1, "vm_get_status %s %d", RelationGetRelationName(rel), heapBlk);
#endif

    /* Reuse the old pinned buffer if possible */
    if (BufferIsValid(*buf))
    {
        if (BufferGetBlockNumber(*buf) != mapBlock)
        {
            ReleaseBuffer(*buf);
            *buf = InvalidBuffer;
        }
    }

    if (!BufferIsValid(*buf))
    {
        *buf = vm_readbuf(rel, mapBlock, false);
        if (!BufferIsValid(*buf))
            return false;
    }

    map = PageGetContents(BufferGetPage(*buf));

    /*
     * A single byte read is atomic.  There could be memory-ordering effects
     * here, but for performance reasons we make it the caller's job to worry
     * about that.
     */
    result = ((map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS);
    return result;
}
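The pin-reuse logic above is what makes repeated lookups cheap: consecutive heap blocks usually share a VM page, so only one read is needed. The sketch below is a hypothetical in-memory model (MockVmFork, mock_vm_readbuf() and mock_get_status() are invented for illustration, and MAPSIZE assumes BLCKSZ = 8192 on a 64-bit build); a "buffer" is simply the index of the map page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VISIBILITYMAP_VALID_BITS 0x03
#define MAPSIZE 8168                          /* assumes BLCKSZ = 8192, 64-bit build */
#define HEAPBLOCKS_PER_PAGE (MAPSIZE * 4)     /* 4 heap blocks per map byte */
#define INVALID_BUFFER (-1)

/* Hypothetical stand-in for one relation's VM fork (at most 2 pages here). */
typedef struct MockVmFork
{
    int     nblocks;                  /* number of VM pages that exist */
    uint8_t pages[2][MAPSIZE];        /* map bytes, page header already skipped */
    int     reads;                    /* number of mock_vm_readbuf() calls */
} MockVmFork;

/* Like vm_readbuf(rel, blkno, false): no extension, InvalidBuffer past EOF. */
static int
mock_vm_readbuf(MockVmFork *vm, uint32_t mapBlock)
{
    vm->reads++;
    if (mapBlock >= (uint32_t) vm->nblocks)
        return INVALID_BUFFER;
    return (int) mapBlock;            /* the "buffer" is just the page index */
}

static uint8_t
mock_get_status(MockVmFork *vm, uint32_t heapBlk, int *buf)
{
    uint32_t mapBlock = heapBlk / HEAPBLOCKS_PER_PAGE;
    uint32_t mapByte = (heapBlk % HEAPBLOCKS_PER_PAGE) / 4;
    uint8_t  mapOffset = (uint8_t) ((heapBlk % 4) * 2);

    assert(vm != NULL && buf != NULL);

    /* Reuse the old pinned buffer if it still covers heapBlk. */
    if (*buf != INVALID_BUFFER && (uint32_t) *buf != mapBlock)
        *buf = INVALID_BUFFER;        /* ReleaseBuffer() in the real code */

    if (*buf == INVALID_BUFFER)
    {
        *buf = mock_vm_readbuf(vm, mapBlock);
        if (*buf == INVALID_BUFFER)
            return 0;                 /* no VM page yet: no bits set */
    }

    /* Single-byte read without a content lock. */
    return (uint8_t) ((vm->pages[*buf][mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS);
}
```

Looking up heap blocks 0 and 3 reads the map page once; block 40000 belongs to map page 1, which does not exist in this model, so the status is 0 and the buffer comes back invalid.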

3 vm_readbuf

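vm_readbuf loads the specified VM page into a shared buffer. If the page lies beyond the end of the VM fork, it either returns InvalidBuffer or, when extend is true, first calls vm_extend to add and initialize the missing pages.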

/*
 * Read a visibility map page.
 *
 * If the page doesn't exist, InvalidBuffer is returned, or if 'extend' is
 * true, the visibility map file is extended.
 */
static Buffer
vm_readbuf(Relation rel, BlockNumber blkno, bool extend)
{
    Buffer      buf;
    SMgrRelation reln;

    /*
     * Caution: re-using this smgr pointer could fail if the relcache entry
     * gets closed.  It's safe as long as we only do smgr-level operations
     * between here and the last use of the pointer.
     */
    reln = RelationGetSmgr(rel);

    /*
     * If we haven't cached the size of the visibility map fork yet, check it
     * first.
     */
    // First check whether the size of the VM fork is already cached
    if (reln->smgr_cached_nblocks[VISIBILITYMAP_FORKNUM] == InvalidBlockNumber)
    {
        if (smgrexists(reln, VISIBILITYMAP_FORKNUM))    // if the fork exists, smgrnblocks() caches its size
            smgrnblocks(reln, VISIBILITYMAP_FORKNUM);
        else
            reln->smgr_cached_nblocks[VISIBILITYMAP_FORKNUM] = 0;
    }

    /* Handle requests beyond EOF */
    // The requested block is past the end of the VM fork: extend the fork
    // with vm_extend() if asked to, otherwise return InvalidBuffer
    if (blkno >= reln->smgr_cached_nblocks[VISIBILITYMAP_FORKNUM])
    {
        if (extend)
            vm_extend(rel, blkno + 1);
        else
            return InvalidBuffer;
    }

    /*
     * Use ZERO_ON_ERROR mode, and initialize the page if necessary. It's
     * always safe to clear bits, so it's better to clear corrupt pages than
     * error out.
     *
     * The initialize-the-page part is trickier than it looks, because of the
     * possibility of multiple backends doing this concurrently, and our
     * desire to not uselessly take the buffer lock in the normal path where
     * the page is OK.  We must take the lock to initialize the page, so
     * recheck page newness after we have the lock, in case someone else
     * already did it.  Also, because we initially check PageIsNew with no
     * lock, it's possible to fall through and return the buffer while someone
     * else is still initializing the page (i.e., we might see pd_upper as set
     * but other page header fields are still zeroes).  This is harmless for
     * callers that will take a buffer lock themselves, but some callers
     * inspect the page without any lock at all.  The latter is OK only so
     * long as it doesn't depend on the page header having correct contents.
     * Current usage is safe because PageGetContents() does not require that.
     */
    // Normal path: read the VM page into a shared buffer. If the page is new
    // (all zeroes), take BUFFER_LOCK_EXCLUSIVE and check PageIsNew() again,
    // because another backend may have initialized the page while we were
    // waiting for the lock.
    buf = ReadBufferExtended(rel, VISIBILITYMAP_FORKNUM, blkno,
                             RBM_ZERO_ON_ERROR, NULL);
    if (PageIsNew(BufferGetPage(buf)))
    {
        LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
        if (PageIsNew(BufferGetPage(buf)))
            PageInit(BufferGetPage(buf), BLCKSZ, 0);
        LockBuffer(buf, BUFFER_LOCK_UNLOCK);
    }
    return buf;
}
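The two PageIsNew() checks at the end of vm_readbuf() form a double-checked initialization: the unlocked check keeps the common path lock-free, and the recheck under the lock prevents initializing a page twice. The single-threaded sketch below simulates this; MockPage and the mock_* functions are hypothetical, and a flag stands in for another backend that initializes the page while we wait for the exclusive lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a shared buffer holding one VM page. */
typedef struct MockPage
{
    uint16_t pd_upper;            /* 0 means "new" (all-zero), as PageIsNew() tests */
    int      lock_acquisitions;
    int      init_count;
    bool     concurrent_init_while_waiting;  /* simulates a racing backend */
} MockPage;

static bool
mock_page_is_new(const MockPage *p)
{
    return p->pd_upper == 0;
}

static void
mock_page_init(MockPage *p)
{
    p->pd_upper = 8192;           /* BLCKSZ, no special space */
    p->init_count++;
}

/* Taking the lock may let the "other backend" finish initializing first. */
static void
mock_lock_exclusive(MockPage *p)
{
    p->lock_acquisitions++;
    if (p->concurrent_init_while_waiting && mock_page_is_new(p))
        mock_page_init(p);
}

/* Mirrors the tail of vm_readbuf(): unlocked check, then lock and recheck. */
static void
mock_ensure_initialized(MockPage *p)
{
    if (mock_page_is_new(p))
    {
        mock_lock_exclusive(p);
        if (mock_page_is_new(p))
            mock_page_init(p);
        /* LockBuffer(buf, BUFFER_LOCK_UNLOCK) */
    }
    assert(!mock_page_is_new(p));
}
```

When the racing backend wins, the recheck sees an initialized page and skips PageInit(), so the page is initialized exactly once; an already initialized page never takes the lock at all.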

4 vm_extend

When the requested VM page does not yet exist in the file, vm_extend is called to add new pages and initialize them. Its steps are:

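  1) Initialize a page image in local memory with PageInit(), which fills in PageHeader fields such as pd_lower, pd_upper and pd_flags;
  2) Take the relation extension lock so that no other backend extends the relation at the same time (another backend may still have created or extended the VM fork by the time we get the lock);
  3) If the VM fork file does not exist yet, create it with smgrcreate; otherwise go on to step 4);
  4) Invalidate the cached fork size and ask smgrnblocks() for the current number of VM blocks; while it is smaller than the requested count, set the page checksum in place and append the initialized page with smgrextend(), one block per loop iteration (a loop, not a recursive call);
  5) Send a shared invalidation message that forces other backends to close their smgr references to this relation. This spares them from having to keep checking whether the file has been created or extended, which happens infrequently;
  6) Finally release the extension lock.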
/*
 * Ensure that the visibility map fork is at least vm_nblocks long, extending
 * it if necessary with zeroed pages.
 */
static void
vm_extend(Relation rel, BlockNumber vm_nblocks)
{
    BlockNumber vm_nblocks_now;
    PGAlignedBlock pg;
    SMgrRelation reln;

    PageInit((Page) pg.data, BLCKSZ, 0);

    /*
     * We use the relation extension lock to lock out other backends trying to
     * extend the visibility map at the same time. It also locks out extension
     * of the main fork, unnecessarily, but extending the visibility map
     * happens seldom enough that it doesn't seem worthwhile to have a
     * separate lock tag type for it.
     *
     * Note that another backend might have extended or created the relation
     * by the time we get the lock.
     */
    LockRelationForExtension(rel, ExclusiveLock);

    /*
     * Caution: re-using this smgr pointer could fail if the relcache entry
     * gets closed.  It's safe as long as we only do smgr-level operations
     * between here and the last use of the pointer.
     */
    reln = RelationGetSmgr(rel);

    /*
     * Create the file first if it doesn't exist.  If smgr_vm_nblocks is
     * positive then it must exist, no need for an smgrexists call.
     */
    if ((reln->smgr_cached_nblocks[VISIBILITYMAP_FORKNUM] == 0 ||
         reln->smgr_cached_nblocks[VISIBILITYMAP_FORKNUM] == InvalidBlockNumber) &&
        !smgrexists(reln, VISIBILITYMAP_FORKNUM))
        smgrcreate(reln, VISIBILITYMAP_FORKNUM, false);

    /* Invalidate cache so that smgrnblocks() asks the kernel. */
    reln->smgr_cached_nblocks[VISIBILITYMAP_FORKNUM] = InvalidBlockNumber;
    vm_nblocks_now = smgrnblocks(reln, VISIBILITYMAP_FORKNUM);

    /* Now extend the file */
    while (vm_nblocks_now < vm_nblocks)
    {
        PageSetChecksumInplace((Page) pg.data, vm_nblocks_now);

        smgrextend(reln, VISIBILITYMAP_FORKNUM, vm_nblocks_now, pg.data, false);
        vm_nblocks_now++;
    }

    /*
     * Send a shared-inval message to force other backends to close any smgr
     * references they may have for this rel, which we are about to change.
     * This is a useful optimization because it means that backends don't have
     * to keep checking for creation or extension of the file, which happens
     * infrequently.
     */
    CacheInvalidateSmgr(reln->smgr_rnode);

    UnlockRelationForExtension(rel, ExclusiveLock);
}
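The core of vm_extend() is an idempotent "extend to at least N blocks" loop. The sketch below models the VM fork as a growable in-memory array; MockVmFile and mock_vm_extend() are hypothetical, the page template is simply zero-filled rather than built with PageInit(), and locking, checksums and cache invalidation are reduced to comments.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MOCK_BLCKSZ 8192

/* Hypothetical in-memory stand-in for the VM fork file. */
typedef struct MockVmFile
{
    unsigned char *data;
    size_t         nblocks;
} MockVmFile;

/* Extend file to at least vm_nblocks blocks; never shrinks. 0 on success. */
static int
mock_vm_extend(MockVmFile *file, size_t vm_nblocks)
{
    unsigned char page[MOCK_BLCKSZ];
    size_t nblocks_now;

    assert(file != NULL);
    memset(page, 0, sizeof(page));          /* stand-in for PageInit() */

    /* LockRelationForExtension(rel, ExclusiveLock) would be taken here */
    nblocks_now = file->nblocks;            /* re-read: someone may have extended it */

    if (nblocks_now < vm_nblocks)
    {
        unsigned char *grown = realloc(file->data, vm_nblocks * MOCK_BLCKSZ);

        if (grown == NULL)
            return -1;
        file->data = grown;
    }

    while (nblocks_now < vm_nblocks)        /* one block per iteration */
    {
        /* PageSetChecksumInplace() + smgrextend() in the real code */
        memcpy(file->data + nblocks_now * MOCK_BLCKSZ, page, MOCK_BLCKSZ);
        nblocks_now++;
        file->nblocks = nblocks_now;
    }

    /* CacheInvalidateSmgr() and UnlockRelationForExtension() would follow */
    return 0;
}
```

Asking for fewer blocks than already exist is a no-op, which matters because another backend may have extended the fork before we acquired the extension lock.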

http://www.ppmy.cn/news/5825.html
