Author: Hangzhou Meichuang Technology Co., Ltd. (杭州美创科技有限公司)
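Thanks to PostgreSQL's open-source nature, a growing number of third-party cluster management tools now fill the gaps in the usability and reliability of PostgreSQL clustering. Patroni + etcd is one such cluster management solution. etcd stores the cluster state and is the point through which the nodes coordinate, while Patroni provides the high-availability service. Together they give a PostgreSQL cluster high availability with failover. The combination is simple to configure and also rich in features: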
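- Manual and automatic failover
- One primary with multiple replicas, and cascading replication
- Synchronous and asynchronous replication modes
- A watchdog to prevent split-brain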
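Preparation
Node planning: in this walkthrough we build a high-availability environment with one primary and two replicas.
Disable the host firewall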
# systemctl stop firewalld.service
# systemctl disable firewalld.service
Install PostgreSQL and set up streaming replication (details omitted here)
Deploy etcd on every node
Install the required dependency packages and etcd
# yum install -y gcc python-devel epel-release
# yum install -y etcd
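Edit the configuration file (only the parameters that need to be changed are listed, using the primary node, node1, as the example; on the other nodes change the member name, data directory, and local addresses accordingly)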
# vim /etc/etcd/etcd.conf
#[Member]
ETCD_NAME="node1"
ETCD_DATA_DIR="/var/lib/etcd/node1.etcd"
ETCD_LISTEN_PEER_URLS="http://:"
ETCD_LISTEN_CLIENT_URLS="http://:,http://:"
#[Clustering]
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://:"
ETCD_ADVERTISE_CLIENT_URLS="http://:"
ETCD_INITIAL_CLUSTER="node1=http://:,node2=http://:,node3=http://:"
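Every node needs its own copy of these settings, and only the name and addresses change between nodes. A small generator can keep the copies consistent. The sketch below uses a hypothetical helper, render_etcd_conf, which is not part of etcd. It makes several assumptions because the article's real IPs and ports are omitted: etcd's default ports 2380 (peer) and 2379 (client), a loopback client URL alongside the node address, and example addresses from the 192.0.2.0/24 documentation range.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of etcd): print the per-node etcd.conf lines.
# Usage: render_etcd_conf <this_name> <name=ip> [<name=ip> ...]
# Assumed ports: 2380 for peers, 2379 for clients (etcd defaults).
render_etcd_conf() {
  local me="${1:-}"
  shift || true
  local my_ip="" cluster="" pair name ip
  for pair in "$@"; do
    name="${pair%%=*}"
    ip="${pair#*=}"
    # Join members with commas and no spaces.
    cluster="${cluster:+$cluster,}${name}=http://${ip}:2380"
    if [ "$name" = "$me" ]; then
      my_ip="$ip"
    fi
  done
  if [ -z "$my_ip" ]; then
    echo "node $me not found in member list" >&2
    return 1
  fi
  # The loopback client URL is an assumption for local etcdctl access.
  cat <<EOF
#[Member]
ETCD_NAME="${me}"
ETCD_DATA_DIR="/var/lib/etcd/${me}.etcd"
ETCD_LISTEN_PEER_URLS="http://${my_ip}:2380"
ETCD_LISTEN_CLIENT_URLS="http://${my_ip}:2379,http://127.0.0.1:2379"
#[Clustering]
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://${my_ip}:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://${my_ip}:2379"
ETCD_INITIAL_CLUSTER="${cluster}"
EOF
}
```

For example, render_etcd_conf node2 node1=192.0.2.11 node2=192.0.2.12 node3=192.0.2.13 prints node2's lines. Compare the output with /etc/etcd/etcd.conf on each node instead of overwriting that file, because the packaged file contains other settings as well.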
Start the etcd cluster and enable it at boot
# systemctl start etcd
# systemctl enable etcd
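Deploy Python 3 on every node
Patroni requires a newer Python than the one that typical Linux environments ship by default, so we upgrade Python by building it from source. First install the build dependencies: zlib-devel (needed by pip), openssl-devel (the ssl module pip uses over HTTPS), and libffi-devel (ctypes). Then download, extract, and build:
# yum install -y zlib-devel openssl-devel libffi-devel
# wget -c https://www.python.org/ftp/python//Python-.tar.xz
# tar -xf Python-.tar.xz
# cd Python-
# ./configure
# make
# make install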
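Remove the old symlink and create a new one so that python runs Python 3. Be aware that on CentOS/RHEL 7, yum runs under /usr/bin/python and only works with Python 2. After this change, the first line of /usr/bin/yum must therefore read #!/usr/bin/python2. Patroni itself does not depend on this link, because pip3 installs it under /usr/local/bin with its own interpreter path.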
# rm -f /usr/bin/python
# ln -s /usr/local/bin/python3 /usr/bin/python
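Before you install Patroni, it is worth confirming that the python3 now on the PATH meets Patroni's minimum version. The sketch below compares version strings with GNU sort -V. The minimum of 3.6 is an assumption for illustration only; check the Patroni release notes for the version you install.

```bash
#!/usr/bin/env bash
# Succeed if version $1 >= version $2, using GNU sort's version ordering.
version_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Assumed minimum for illustration; check Patroni's release notes.
MIN_PYTHON="3.6"

# Print the interpreter's version and fail if it is below MIN_PYTHON.
check_python() {
  local py="${1:-python3}" ver
  if ! ver="$("$py" -c 'import platform; print(platform.python_version())' 2>/dev/null)"; then
    echo "cannot run $py" >&2
    return 1
  fi
  if version_ge "$ver" "$MIN_PYTHON"; then
    echo "$py $ver OK"
  else
    echo "$py $ver is older than $MIN_PYTHON" >&2
    return 1
  fi
}
```

Run check_python python3 on each node after make install. sort -V orders 3.10 after 3.6, which a plain string comparison would get wrong.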
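Deploy Patroni on every node
Install the required dependency packages and Patroni. The [etcd] extra pulls in the python-etcd client that Patroni needs to talk to etcd.
# pip3 install psycopg2-binary -i https://mirrors.aliyun.com/pypi/simple/
# pip3 install "patroni[etcd]" -i https://mirrors.aliyun.com/pypi/simple/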
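Edit the Patroni configuration file. The example below is pgsql_node2's file (name: pgsql_node2, data_dir /pgdata/patr2); on the other nodes change name, the listen/connect addresses, and data_dir accordingly.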
# vim /etc/patroni.yml
scope: pgsql
namespace: /pgsql/
name: pgsql_node2
restapi:
  listen: :
  connect_address: :
etcd:
  host: :
bootstrap:
  # this section will be written into Etcd:///config after initializing new cluster
  # and all other cluster members will use it as a `global configuration`
  dcs:
    ttl:
    loop_wait:
    retry_timeout:
    maximum_lag_on_failover:
    master_start_timeout:
    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        listen_addresses: ""
        port:
        wal_level: logical
        hot_standby: "on"
        wal_keep_segments:
        max_wal_senders:
        max_replication_slots:
        wal_log_hints: "on"
        # archive_mode: "on"
        # archive_timeout: 1800s
        # archive_command: gzip < %p > /data/backup/pgwalarchive/%f.gz
      # recovery_conf:
      #   restore_command: gunzip < /data/backup/pgwalarchive/%f.gz > %p
postgresql:
  listen: :
  connect_address: :
  data_dir: /pgdata/patr2
  bin_dir: /usr/pgsql-/bin
  # config_dir: /etc/postgresql//main
  authentication:
    replication:
      username: repl
      password: repl
    superuser:
      username: postgres
      password: postgres
#watchdog:
#  mode: automatic # Allowed values: off, automatic, required
#  device: /dev/watchdog
#  safety_margin: 5
tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
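Only a few keys in this file differ between nodes. The sketch below uses a hypothetical helper, render_patroni_node, that prints just those keys for a given node number and address. It follows the article's naming pattern (pgsql_node2, /pgdata/patr2). Several values are assumptions: the ports 8008 (Patroni REST API default), 5432 (PostgreSQL default), and 2379 (etcd client default), and pointing each node at its local etcd member. The output is a fragment to compare against, not a complete patroni.yml.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the node-specific keys of /etc/patroni.yml.
# Assumed ports: 8008 (Patroni REST API), 5432 (PostgreSQL), 2379 (etcd).
render_patroni_node() {
  local n="${1:-}" ip="${2:-}"
  if [ -z "$n" ] || [ -z "$ip" ]; then
    echo "usage: render_patroni_node <node-number> <ip>" >&2
    return 2
  fi
  # Naming follows the article: pgsql_node<N> and /pgdata/patr<N>.
  cat <<EOF
name: pgsql_node${n}
restapi:
  listen: ${ip}:8008
  connect_address: ${ip}:8008
etcd:
  host: ${ip}:2379
postgresql:
  listen: ${ip}:5432
  connect_address: ${ip}:5432
  data_dir: /pgdata/patr${n}
EOF
}
```

For example, render_patroni_node 3 192.0.2.13 shows what pgsql_node3's copy should contain. Everything else in the file, including the bootstrap section, can stay identical across nodes.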
Create a systemd service unit for Patroni
# vim /etc/systemd/system/patroni.service
[Unit]
Description=Runners to orchestrate a high-availability PostgreSQL
After=syslog.target network.target
[Service]
Type=simple
User=postgres
Group=postgres
#StandardOutput=syslog
ExecStart=/usr/local/bin/patroni /etc/patroni.yml
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=process
TimeoutSec=
Restart=no
[Install]
WantedBy=multi-user.target
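Start the Patroni service (run systemctl daemon-reload first so that systemd picks up the new unit file)
# systemctl daemon-reload
# systemctl start patroni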
Alternatively, Patroni can be started directly with the patroni command; the service unit simply makes it more convenient to manage.
# /usr/local/bin/patroni /etc/patroni.yml > patroni.log 2>&1 &
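When Patroni is started in the background like this, a script may need to wait until the node is actually up. The sketch below is a generic retry loop. The usage line makes two assumptions: Patroni's REST API is listening on its default port 8008, and its /health endpoint returns HTTP 200 once PostgreSQL is running, as we understand the Patroni REST API documentation. The article's real restapi address is omitted, so replace the hypothetical address.

```bash
#!/usr/bin/env bash
# Run "$@" once per second until it succeeds or <timeout> seconds pass.
# Usage: wait_until <timeout-seconds> <command> [args...]
wait_until() {
  local timeout="$1"
  shift
  local deadline=$(( SECONDS + timeout ))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep 1
  done
}
```

For example: wait_until 60 curl -fsS -o /dev/null http://192.0.2.12:8008/health. The -f option makes curl fail on HTTP error codes, so the loop keeps retrying until the endpoint answers 200 or 60 seconds pass.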
Using the cluster
View node information
# patronictl -c /etc/patroni.yml list
+ Cluster: pgsql ()+------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+---------------------+--------+---------+----+-----------+
| pgsql_node1 | : | Leader | running | 3 | |
| pgsql_node2 | : | | running | 3 | 0 |
| pgsql_node3 | : | | running | 3 | 0 |
+-------------+---------------------+--------+---------+----+-----------+
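Scripts often need only the current leader's name from this table. The sketch below uses a hypothetical helper, leader_of. It splits each row on "|" and prints the member whose Role column reads Leader. It assumes the column layout shown above, which may differ in other Patroni versions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "patronictl list" table output on stdin
# and print the member name whose Role column is "Leader".
leader_of() {
  awk -F'|' '
    # Strip leading and trailing blanks from a table cell.
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # Data rows split into 8 fields; border rows contain no "|".
    NF >= 8 && trim($4) == "Leader" { print trim($2) }
  '
}
```

For example, patronictl -c /etc/patroni.yml list | leader_of prints pgsql_node1 for the output above.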
Manual switchover: choose an available replica and promote it to the primary role
# patronictl -c /etc/patroni.yml switchover
Master [pgsql_node1]: pgsql_node1
Candidate ['pgsql_node2', 'pgsql_node3'] []: pgsql_node2
When should the switchover take place (e.g. -20T11: ) [now]: now
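The same switchover can also run without prompts, for example from a maintenance script. The wrapper below is a sketch. It assumes the --candidate and --force options of patronictl switchover, with --force using the current leader and "now" instead of prompting for them. Confirm both options with patronictl switchover --help on your Patroni version. The DRY_RUN and PATRONI_CONF variables are hypothetical: setting DRY_RUN prints the command instead of executing it.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for a non-interactive switchover.
# Assumes patronictl's --candidate and --force options; verify with --help.
PATRONI_CONF="${PATRONI_CONF:-/etc/patroni.yml}"

switchover_to() {
  local candidate="${1:-}"
  if [ -z "$candidate" ]; then
    echo "usage: switchover_to <candidate-member>" >&2
    return 2
  fi
  local cmd=(patronictl -c "$PATRONI_CONF" switchover --candidate "$candidate" --force)
  if [ -n "${DRY_RUN:-}" ]; then
    # Print the command instead of running it.
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Running DRY_RUN=1 switchover_to pgsql_node2 first makes it easy to check the command before you actually demote the primary.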
Check the cluster status
# patronictl -c /etc/patroni.yml list
+ Cluster: pgsql ()+------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+---------------------+--------+---------+----+-----------+
| pgsql_node1 | : | | running | 3 | 0 |
| pgsql_node2 | : | Leader | running | 3 | |
| pgsql_node3 | : | | running | 3 | 0 |
+-------------+---------------------+--------+---------+----+-----------+
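Automatic failover: reboot the host running node1, then check the cluster status. node2 has been promoted to primary automatically. If only the database instance on a node is shut down (rather than the whole host), Patroni starts the database service again automatically.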
# patronictl -c /etc/patroni.yml list
+ Cluster: pgsql ()+------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+---------------------+--------+---------+----+-----------+
| pgsql_node2 | : | Leader | running | 3 | |
| pgsql_node3 | : | | running | 3 | 0 |
+-------------+---------------------+--------+---------+----+-----------+
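How quickly node2 takes over depends on the ttl, loop_wait, and retry_timeout values in the bootstrap.dcs section, which the article leaves blank. ttl is the lifetime of the leader key in etcd, and the replicas can only race for leadership after that key expires. As we understand the Patroni documentation, these values should satisfy loop_wait + 2*retry_timeout <= ttl. Patroni's defaults of 30 (ttl), 10 (loop_wait), and 10 (retry_timeout) meet this rule exactly. The helper below is a hypothetical check of that rule to run before you write the values into the config.

```bash
#!/usr/bin/env bash
# Hypothetical check of the timing rule loop_wait + 2*retry_timeout <= ttl.
# Usage: check_dcs_timing <ttl> <loop_wait> <retry_timeout>   (seconds)
check_dcs_timing() {
  local ttl="$1" loop_wait="$2" retry_timeout="$3"
  local need=$(( loop_wait + 2 * retry_timeout ))
  if (( need <= ttl )); then
    echo "ok: loop_wait + 2*retry_timeout = $need <= ttl = $ttl"
  else
    echo "violation: loop_wait + 2*retry_timeout = $need > ttl = $ttl" >&2
    return 1
  fi
}
```

Lowering ttl makes failover faster, but it also makes a brief etcd or network hiccup more likely to demote a healthy primary. That is why the rule ties ttl to the retry window.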
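Reinitialize a node: when a node is out of sync with the primary, or is running abnormally, this command rebuilds its data from the leader so that it can rejoin the cluster.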
# patronictl -c /etc/patroni.yml reinit pgsql
+ Cluster: pgsql ()+------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+---------------------+--------+---------+----+-----------+
| pgsql_node1 | : | | running | 3 | 0 |
| pgsql_node2 | : | Leader | running | 3 | |
| pgsql_node3 | : | | running | 3 | 0 |
+-------------+---------------------+--------+---------+----+-----------+
Which member do you want to reinitialize: pgsql_node3
Are you sure you want to reinitialize members pgsql_node3? [y/N]: y
Success: reinitialize for member pgsql_node3
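To decide which node needs reinit in the first place, you can scan the list output for members that are not running or whose lag exceeds a threshold. The hypothetical lagging_members helper below does that. It assumes the table layout shown in this article. It treats two states as healthy: running, as shown here, and streaming, which newer Patroni versions use for replicas. The lag threshold, in MB, is chosen by the operator.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "patronictl list" table output on stdin and
# print members that are unhealthy or lag more than <threshold-MB>.
lagging_members() {
  local threshold="${1:-0}"
  awk -F'|' -v max="$threshold" '
    # Strip leading and trailing blanks from a table cell.
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # Header and data rows split into 8 fields; border rows contain no "|".
    NF >= 8 {
      name = trim($2); state = trim($5); lag = trim($7)
      if (name == "Member") next
      healthy = (state == "running" || state == "streaming")
      # The leader has an empty lag cell, so only compare non-empty values.
      if (!healthy || (lag != "" && lag + 0 > max + 0)) print name
    }
  '
}
```

For example, patronictl -c /etc/patroni.yml list | lagging_members 16 lists candidates, and reinit can then be run for each member it prints.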
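Patroni is a Python-based template for PostgreSQL high availability, and etcd is a strongly consistent, distributed key-value store built on the Raft consensus algorithm. The two complement each other and make a PostgreSQL cluster easier and more transparent to operate and maintain.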