PG12.9-Repmgr5.1.0-vip-3节点部署
Centos 7.6
PG 12.9
repmgr 5.1.0
1.系统环境准备
主机名 ip地址 角色 端口 enmo-01 172.20.10.1 主库 6000enmo-02 172.20.10.2 备库 6000enmo-03 172.20.10.3 备库 6000准备将172.20.10.6设置为vip
修改网卡(刷新一下 mac 地址)
需要所有的机器在同一个网段,防止 vip 无法绑定生效
vi /etc/sysconfig/network-scripts/ifcfg-ens33ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=static
IPV6INIT=no
IPADDR=172.20.10.1
NETMASK=255.255.255.0
GATEWAY=172.20.10.10
DNS=8.8.8.8
配置 hosts
vi /etc/ssh/sshd_configUseDNS 设置为 no 修改主机名 hostnamectl set-hostname enmo-01 hostnamectl set-hostname enmo-02hostnamectl set-hostname enmo-03 修改 hosts 文件 echo "172.20.10.1 enmo-01 172.20.10.2 enmo-02172.20.10.3 enmo-03" >>/etc/hosts
同步时间
所有服务器的时区及时间需要保持同步 timedatectl set-timezone Asia/Shanghai
关闭防火墙和 selinux
systemctl stop firewalld
systemctl disable firewalld vi /etc/selinux/config
SELINUX=disabled
用户配置
创建安装用户
groupadd dba -g 1000
useradd postgres -g 1000 -u 1000 添加 sudo 权限
启动服务及管理 VIP 使用,权限可细化,忽略此步骤 vip 漂移会失败
visudo -f /etc/sudoers
postgres ALL=(ALL) NOPASSWD:ALL
reboot now
重启使sudo生效
2.安装主数据库(源码安装)
此处仅主库装数据库,备库需要有数据库软件。
安装依赖
yum install -y openssl openssl-devel readline readline-devel zlib-devel gcc flex
切换到postgres用户
su - postgres
获取并解压包
wget https://ftp.postgresql.org/pub/source/v12.9/postgresql-12.9.tar.gz
tar -zxf postgresql-12.9.tar.gz
–进入目录
cd postgresql-12.9
–编译环境检查
./configure --prefix=/home/postgres/soft --with-openssl --with-pgport=6000(最后出现makefile)
–编译安装
make world -j24
make install-world -j24
–初始化数据库数据目录到/home/postgres/data并启用checksum检验
/home/postgres/soft/bin/initdb -D /home/postgres/data -k
–启动数据库
/home/postgres/soft/bin/pg_ctl start -D /home/postgres/data -l /tmp/logfile
–配置环境变量.bashrc
vi .bashrc
export PGPORT=6000
export PGUSER=postgres
export PGHOME=/home/postgres/soft
export PGDATA=/home/postgres/data
export PATH=${PGHOME}/bin:${PATH}
LD_LIBRARY_PATH=$PGHOME/lib:/usr/local/lib:/usr/local/lib64:/usr/lib64:$LD_LIBRARY_PATH
–读取环境变量
source .bashrc
–登录到数据库
psql -d postgres
–创建超级用户 postgres
CREATE USER postgres WITH SUPERUSER PASSWORD '1QAZ@WSX';
–修改端口(如果./configure没指定–with-pgport)
alter system set port=6000;select pg_reload_conf();
3.数据库配置
su - postgres
mkdir -p /home/postgres/archive/wal
cluster_name='enmo_6001'
listen_addresses='0.0.0.0'
port=6000
wal_log_hints=on
logging_collector=on
logging_collector=on
log_filename='pg_log_%u.log'
log_file_mode=0600
log_truncate_on_rotation=on
log_rotation_age=1d
wal_keep_segments=100
archive_mode = on
archive_command = 'cp %p /home/postgres/archive/wal/%f'
wal_level = 'logical'
max_wal_senders = 10
max_replication_slots = 10
hot_standby = on
修改参数,其余参数根据环境进行调整。
重启数据库生效
pg_ctl restart -D $PGDATA -l /tmp/logfile
4.节点间配置互信
–配置集群内postgres用户之间互信
–enmo-01
su - postgres
ssh-keygen -t rsa
cd ~/.ssh
ssh-copy-id -i postgres@enmo-02
–enmo-02
su - postgres
ssh-keygen -t rsa
cd ~/.ssh
cat *.pub >> authorized_keys
scp authorized_keys postgres@enmo-01:/home/postgres/.ssh
上述过程为配置两节点间的postgres的用户的互信。两两节点之间需要做一次。用date 检验,确保集群内节点能通过postgres用户互相访问。
ssh enmo-01 date;
ssh enmo-02 date;
ssh enmo-03 date;
5.安装repmgr(源码安装)并配置白名单
所有节点都需要安装repmgr工具
–解压压缩包
su - postgres
tar -zxf repmgr-5.1.0.tar.gz
cd repmgr-5.1.0
#环境变量
pg_config验证
–编译安装repmgr
su - postgres
cd /home/postgres/repmgr-5.1.0
./configure
make -j24
make install -j24
–查看版本
/home/postgres/repmgr-5.1.0/repmgr --version
–安装repmgr扩展
$ psql -d postgres
#主库创建 repmgr 库存储元数据
create user repmgr superuser password '123456';
create database repmgr owner repmgr;#在postgresql.conf的shared_preload_libraries加上repmgr
vim $PGDATA/postgresql.conf
shared_preload_libraries = 'repmgr'\c repmgr
create extension repmgr;
–pg_hba配置
node11 节点配置 pg_hba.conf 文件
vi $PGDATA/pg_hba.conflocal repmgr repmgr trust
host repmgr repmgr 172.20.10.1/32 trust
host repmgr repmgr 172.20.10.2/32 trust
host repmgr repmgr 172.20.10.3/32 trust
host repmgr repmgr 172.20.10.6/32 trust
host all all 172.20.10.1/24 scram-sha-256host replication repmgr 172.20.10.1/32 trust
host replication repmgr 172.20.10.2/32 trust
host replication repmgr 172.20.10.3/32 trust
host replication repmgr 172.20.10.6/32 trust
–这边将repmgr之间用trust,其他ip其他用户登录需要用密码。如果想让repmgr在此处也为密码校验,则需要将trust改为md5或者scram-sha-256并在本地创建.pgpass文件,操作如下:
su - postgres
touch ~/.pgpass
chmod 0600 ~/.pgpass echo "172.20.10.1:6000:repmgr:repmgr:123456
172.20.10.2:6000:repmgr:repmgr:123456
172.20.10.3:6000:repmgr:repmgr:123456
172.20.10.4:6000:repmgr:repmgr:123456 ">> ~/.pgpass
使用密码校验需要注意密码的加密格式,留意pg_authid系统表和password_encryption参数。
–实例检查
只保留主节点运行 pg 实例,如果其他节点有实例运行,停止后删除 pgdata 目录
其他所有节点执行如下操作
su - postgres
pg_ctl stop -D $PGDATA
rm -rf $PGDARA/*
6.主库repmgr配置及注册
(1)主节点创建repmgr.conf
touch /home/postgres/repmgr.conf
vi /home/postgres/repmgr.conf node_id=1
node_name='enmo_6001'
conninfo='host=172.20.10.1 port=6000 user=repmgr dbname=repmgr
connect_timeout=2'
data_directory='/data/pgdata'
replication_user='repmgr'
replication_type='physical'
pg_bindir='/opt/postgresql-12.9/bin'
monitoring_history=yes
monitor_interval_secs=5
log_level='debug'
log_file='/home/postgres/repmgr.log'
failover='automatic'
connection_check_type=ping
reconnect_attempts=3
reconnect_interval=10
promote_command='/home/postgres/repmgr_promote.sh'
follow_command='/home/postgres/repmgr_follow.sh %n'
上面主要修改的部分为:
node-id 三个节点为1,2,3,用于切换的id
node_name 主要为标识,设置与数据库cluster_name相同
conninfo 设置本地用于连接的repmgr数据库等信息
(2)配置提升为主库需要的 repmgr_promote.sh 脚本
#! /bin/bashecho "["`date "+%Y-%m-%d %H:%M:%S"`"]: del VIP in 2 start" >> /home/postgres/repmgr.log
/bin/ssh -t postgres@172.20.10.2 "/bin/sudo /usr/sbin/ip addr del 172.10.20.6/24 dev ens33:0"
echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: del VIP in 2 finish" >> /home/postgres/repmgr.logecho "["`date "+%Y-%m-%d %H:%M:%S"`"]: del VIP in 3 start" >> /home/postgres/repmgr.log
/bin/ssh -t postgres@172.20.10.3 "/bin/sudo /usr/sbin/ip addr del 172.20.10.6/24 dev ens33:0"
echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: del VIP in 3 finish" >> /home/postgres/repmgr.logecho "["`date "+%Y-%m-%d %H:%M:%S"`"]: promote start" >> /home/postgres/repmgr.log
/home/postgres/repmgr-5.1.0/repmgr standby promote -f /home/postgres/repmgr.conf --log-to-file
echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: promote finish" >> /home/postgres/repmgr.logecho "["`date "+%Y-%m-%d %H:%M:%S"`"]: add VIP start" >> /home/postgres/repmgr.log
/bin/sudo /usr/sbin/ifconfig ens33:0 172.20.10.6 netmask 255.255.255.0
上边几部分分别为
下掉2,3节点的vip
提升本地数据库角色为主库
在本地挂上vip
(3)配置赋予新主跟随的 repmgr_follow.sh 脚本
#! /bin/bashecho "["`date "+%Y-%m-%d %H:%M:%S"`"]: del VIP in 1 start" >> /home/postgres/repmgr.log
/bin/ssh -t postgres@172.20.10.1 "/bin/sudo /usr/sbin/ip addr del 172.20.10.6/24 dev ens33:0"
echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: del VIP in 1 finish" >> /home/postgres/repmgr.logecho "["`date "+%Y-%m-%d %H:%M:%S"`"]: follow %n" >> /home/postgres/repmgr.log
/home/postgres/repmgr-5.1.0/repmgr standby follow -f /home/postgres/repmgr.conf --upstream-node-id=%n --log-to-file
上边两部分为
下掉本地的vip
让本地备机跟随新主
给两个脚本权限
chmod 755 repmgr_promote.sh
chmod 755 repmgr_follow.sh
(4)主库注册
/home/postgres/repmgr-5.1.0/repmgr -f /home/postgres/repmgr.conf primary register
INFO: connecting to primary database…
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr
host=172.20.10.1 port=6000 fallback_application_name=repmgr"
INFO: “repmgr” extension is already installed
NOTICE: primary node record (ID: 11) registered
查看集群状态,此时集群里只有一个主库
/home/postgres/repmgr-5.1.0/repmgr -f /home/postgres/repmgr.conf cluster show
7.备库repmgr配置及从库注册
上面已经说到所有的节点都需要安装repmgr
如下为2,3节点的配置文件及脚本。
节点2:
(1)主节点创建repmgr.conf
touch /home/postgres/repmgr.conf
vi /home/postgres/repmgr.conf node_id=2
node_name='enmo_6002'
conninfo='host=172.20.10.2 port=6000 user=repmgr dbname=repmgr
connect_timeout=2'
data_directory='/data/pgdata'
replication_user='repmgr'
replication_type='physical'
repmgr_bindir='/home/postgres/repmgr-5.1.0'
pg_bindir='/opt/postgresql-12.9/bin'
monitoring_history=yes
monitor_interval_secs=5
log_level='debug'
log_file='/home/postgres/repmgr.log'
failover='automatic'
connection_check_type=ping
reconnect_attempts=3
reconnect_interval=10
promote_command='/home/postgres/repmgr_promote.sh'
follow_command='/home/postgres/repmgr_follow.sh %n'
(2)配置提升为主库需要的 repmgr_promote.sh 脚本
#! /bin/bashecho "["`date "+%Y-%m-%d %H:%M:%S"`"]: del VIP in 1 start" >> /home/postgres/repmgr.log
/bin/ssh -t postgres@172.20.10.1 "/bin/sudo /usr/sbin/ip addr del 172.10.20.6/24 dev ens33:0"
echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: del VIP in 1 finish" >> /home/postgres/repmgr.logecho "["`date "+%Y-%m-%d %H:%M:%S"`"]: del VIP in 3 start" >> /home/postgres/repmgr.log
/bin/ssh -t postgres@172.20.10.3 "/bin/sudo /usr/sbin/ip addr del 172.20.10.6/24 dev ens33:0"
echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: del VIP in 3 finish" >> /home/postgres/repmgr.logecho "["`date "+%Y-%m-%d %H:%M:%S"`"]: promote start" >> /home/postgres/repmgr.log
/home/postgres/repmgr-5.1.0/repmgr standby promote -f /home/postgres/repmgr.conf --log-to-file
echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: promote finish" >> /home/postgres/repmgr.logecho "["`date "+%Y-%m-%d %H:%M:%S"`"]: add VIP start" >> /home/postgres/repmgr.log
/bin/sudo /usr/sbin/ifconfig ens33:0 172.20.10.6 netmask 255.255.255.0
(3)配置赋予新主跟随的 repmgr_follow.sh 脚本
#! /bin/bashecho "["`date "+%Y-%m-%d %H:%M:%S"`"]: del VIP in 2 start" >> /home/postgres/repmgr.log
/bin/ssh -t postgres@172.20.10.2 "/bin/sudo /usr/sbin/ip addr del 172.20.10.6/24 dev ens33:0"
echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: del VIP in 2 finish" >> /home/postgres/repmgr.logecho "["`date "+%Y-%m-%d %H:%M:%S"`"]: follow %n" >> /home/postgres/repmgr.log
/home/postgres/repmgr-5.1.0/repmgr standby follow -f /home/postgres/repmgr.conf --upstream-node-id=%n --log-to-file
节点3:
(1)主节点创建repmgr.conf
touch /home/postgres/repmgr.conf
vi /home/postgres/repmgr.conf node_id=3
node_name='enmo_6003'
conninfo='host=172.20.10.3 port=6000 user=repmgr dbname=repmgr
connect_timeout=2'
data_directory='/data/pgdata'
replication_user='repmgr'
replication_type='physical'
repmgr_bindir='/home/postgres/repmgr-5.1.0'
pg_bindir='/opt/postgresql-12.9/bin'
monitoring_history=yes
monitor_interval_secs=5
log_level='debug'
log_file='/home/postgres/repmgr.log'
failover='automatic'
connection_check_type=ping
reconnect_attempts=3
reconnect_interval=10
promote_command='/home/postgres/repmgr_promote.sh'
follow_command='/home/postgres/repmgr_follow.sh %n'
(2)配置提升为主库需要的 repmgr_promote.sh 脚本
#! /bin/bashecho "["`date "+%Y-%m-%d %H:%M:%S"`"]: del VIP in 1 start" >> /home/postgres/repmgr.log
/bin/ssh -t postgres@172.20.10.1 "/bin/sudo /usr/sbin/ip addr del 172.10.20.6/24 dev ens33:0"
echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: del VIP in 1 finish" >> /home/postgres/repmgr.logecho "["`date "+%Y-%m-%d %H:%M:%S"`"]: del VIP in 2 start" >> /home/postgres/repmgr.log
/bin/ssh -t postgres@172.20.10.2 "/bin/sudo /usr/sbin/ip addr del 172.20.10.6/24 dev ens33:0"
echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: del VIP in 2 finish" >> /home/postgres/repmgr.logecho "["`date "+%Y-%m-%d %H:%M:%S"`"]: promote start" >> /home/postgres/repmgr.log
/home/postgres/repmgr-5.1.0/repmgr standby promote -f /home/postgres/repmgr.conf --log-to-file
echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: promote finish" >> /home/postgres/repmgr.logecho "["`date "+%Y-%m-%d %H:%M:%S"`"]: add VIP start" >> /home/postgres/repmgr.log
/bin/sudo /usr/sbin/ifconfig ens33:0 172.20.10.6 netmask 255.255.255.0
(3)配置赋予新主跟随的 repmgr_follow.sh 脚本
#! /bin/bashecho "["`date "+%Y-%m-%d %H:%M:%S"`"]: del VIP in 3 start" >> /home/postgres/repmgr.log
/bin/ssh -t postgres@172.20.10.3 "/bin/sudo /usr/sbin/ip addr del 172.20.10.6/24 dev ens33:0"
echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: del VIP in 3 finish" >> /home/postgres/repmgr.logecho "["`date "+%Y-%m-%d %H:%M:%S"`"]: follow %n" >> /home/postgres/repmgr.log
/home/postgres/repmgr-5.1.0/repmgr standby follow -f /home/postgres/repmgr.conf --upstream-node-id=%n --log-to-file
给两个脚本权限
chmod 755 repmgr_promote.sh
chmod 755 repmgr_follow.sh
(4)两个备库分别从主库clone数据,相当于搭建流复制(要加入集群的备机节点执行,确保本地有数据目录,且数据目录下为空)
/home/postgres/repmgr-5.1.0/repmgr -h 172.10.10.1 -p6000 -U repmgr -d repmgr -f /home/postgres/repmgr.conf standby clone
(5)修改两个备库的 配置参数
修改 postgresql.conf
vi $PGDATA/postgresql.conf
cluster_name='enmo_6002' vi $PGDATA/postgresql.conf
cluster_name='enmo_6003'
(6)启动及注册
两个备库分别拉起
pg_ctl start -D $PGDATA
两个备库分别执行,进行注册
/home/postgres/repmgr-5.1.0/repmgr -f /home/postgres/repmgr.conf standby register
8.所有节点启动守护进程repmgrd
/home/postgres/repmgr-5.1.0/repmgrd -f /home/postgres/repmgr.conf -d
repmgrd守护进程主要做对集群节点的监控
9.流复制状态和集群状态检查
主备库分别查看 wal 进程 主库有 walsender 备库有 walreceiver
ps -ef | grep wal
查看集群状态,所有节点都可以通过repmgr查看集群状态
/home/postgres/repmgr-5.1.0/repmgr -f /home/postgres/repmgr.conf cluster showDEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=172.20.10.1 port=6000 fallback_application_name=repmgr"
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=172.20.10.1 port=6000 fallback_application_name=repmgr"
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=172.20.10.2 port=6000 fallback_application_name=repmgr"
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=172.20.10.1 port=6000 fallback_application_name=repmgr"
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=172.20.10.3 port=6000 fallback_application_name=repmgr"
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=172.20.10.1 port=6000 fallback_application_name=repmgr"ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------------+---------+-----------+-------------+----------+----------+----------+------------------------------------------------------------------------1 | enmo_6001 | primary | * running | | default | 100 | 1 | host=172.20.10.1 port=6000 user=repmgr dbname=repmgr connect_timeout=22 | enmo_6002 | standby | running | enmo_6001 | default | 100 | 1 | host=172.20.10.2 port=6000 user=repmgr dbname=repmgr connect_timeout=23 | enmo_6003 | standby | running | enmo_6001 | default | 100 | 1 | host=172.20.10.3 port=6000 user=repmgr dbname=repmgr connect_timeout=2/home/postgres/repmgr-5.1.0/repmgr -f /home/postgres/repmgr.conf service statusDEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=172.20.10.1 port=6000 fallback_application_name=repmgr"
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=172.20.10.1 port=6000 fallback_application_name=repmgr"
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=172.20.10.2 port=6000 fallback_application_name=repmgr"
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=172.20.10.1 port=6000 fallback_application_name=repmgr"
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=172.20.10.3 port=6000 fallback_application_name=repmgr"
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=172.20.10.1 port=6000 fallback_application_name=repmgr"ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+-------------+---------+-----------+-------------+---------+-------+---------+--------------------1 | enmo_6001 | primary | * running | | running | 15001 | no | n/a2 | enmo_6002 | standby | running | enmo_6001 | running | 15311 | no | 2 second(s) ago3 | enmo_6003 | standby | running | enmo_6001 | running | 16068 | no | 4 second(s) ago
主库查看流复制状态
postgres=# select * from pg_stat_replication;pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn |flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state | reply_time-------+----------+---------+------------------+-------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------+-----------+------------+---------------+------------+-------------------------------21781 | 16385 | repmgr | enmo_6002 | 172.20.10.2 | | 37700 | 2021-11-30 11:26:55.584587+08 | | streaming | 0/25003338 | 0/25003338 |0/25003338 | 0/25003338 | | | | 1 | sync | 2021-11-30 14:40:23.214424+0821880 | 16385 | repmgr | enmo_6003 | 172.20.10.3 | | 63746 | 2021-11-30 11:27:53.53565+08 | | streaming | 0/25003338 | 0/25003338 |0/25003338 | 0/25003338 | | | | 0 | async | 2021-11-30 14:40:20.465075+08(2 rows)
10.手动给主库挂载vip
/bin/sudo /usr/sbin/ifconfig ens33:0 172.20.10.6 netmask 255.255.255.0
搭建完毕!
附录:相关命令
上vip
/bin/sudo /usr/sbin/ifconfig ens33:0 172.20.10.6 netmask 255.255.255.0
下vip
/bin/sudo /usr/sbin/ip addr del 172.20.10.6/24 dev ens33:0
主库注册
repmgr -f /home/postgres/repmgr.conf primary register
备库注册
repmgr -f /home/postgres/repmgr.conf standby register
备库从主库clone(相当于全量build)
repmgr -h 172.20.10.1 -p6000 -U repmgr -d repmgr -f /home/postgres/repmgr.conf standby clone
相当于增量build,如果先加上–dry-run,表示先调试,但不执行
repmgr node rejoin -d 'host=172.20.10.1 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --verbose
查看集群状态
repmgr -f /home/postgres/repmgr.conf cluster show
repmgr -f /home/postgres/repmgr.conf service status
启动守护进程
repmgrd -f /home/postgres/repmgr.conf -d
switchover(备库执行)
repmgr -f /home/postgres/repmgr.conf standby switchover -U repmgr --verbose