Initial design document for Patroni/Debian integration

Michael Banck 2018-10-22 16:19:21 +02:00
parent e786df09ce
commit 5c84823b7d

debian/design.md vendored Normal file

@ -0,0 +1,219 @@
Integrating patroni with Debian
===============================
Introduction
------------
Patroni manages PostgreSQL instances and mostly expects a blank slate, i.e. it
prefers to initialize the database itself and insists on starting and stopping
it. Debian, on the other hand, includes the `postgresql-common` framework
(called pg-common in the following) that manages concurrent major versions of
PostgreSQL and possibly several database instances for each of them. This
document details a design for integrating Patroni (in the form of the Debian
`patroni` package, preferably provided by PGDG's apt repository) with Debian's
pg-common framework and policies.

pg-common nomenclature and directory layout
-------------------------------------------
pg-common extends the usual `libpq` environment variables like `PGHOST` and
`PGDATABASE` with `PGCLUSTER`, which is the Debian standard way of addressing a
specific version/instance combination. The `PGCLUSTER` environment variable
consists of two parts, the PostgreSQL major version and the instance name
(`main` being the default instance name), separated by a '/', e.g. `'10/main'`.
The two components are usually abbreviated or referred to as '%v' and '%c',
respectively.

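
The split into '%v' and '%c' can be sketched in plain POSIX shell as follows. The function names `cluster_version` and `cluster_name` are hypothetical helpers for illustration and are not part of pg-common.

```sh
#!/bin/sh
# Split a PGCLUSTER-style value ("<version>/<cluster>") into its two parts.
# cluster_version and cluster_name are hypothetical helper names.

cluster_version() {
    # Everything before the first '/'
    printf '%s\n' "${1%%/*}"
}

cluster_name() {
    # Everything after the first '/'
    printf '%s\n' "${1#*/}"
}

# Fall back to the Debian default instance if PGCLUSTER is unset
PGCLUSTER="${PGCLUSTER:-10/main}"
echo "version: $(cluster_version "$PGCLUSTER")"
echo "cluster: $(cluster_name "$PGCLUSTER")"
```

Parameter expansion is used here instead of `sed` so that no external process is needed for the split.
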
The versioned binaries and libraries are located in
`/usr/lib/postgresql/%v/{bin,lib}`. The configuration files usually reside in
`/etc/postgresql/%v/%c/` and the default log file (set via the `pg_ctlcluster`
wrapper) is `/var/log/postgresql/postgresql-%v-%c.log`. The default data
directory is `/var/lib/postgresql/%v/%c` and the listening sockets are located
in `/var/run/postgresql/`.
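
As a minimal sketch, the locations listed above can be derived from a version/instance pair like this. The helper name `pg_paths` and the `key=value` output format are assumptions made for illustration only.

```sh
#!/bin/sh
# Print the default pg-common locations for one version/instance pair.
# pg_paths is a hypothetical helper; the paths are the ones named in the text.

pg_paths() {
    v="$1"
    c="$2"
    echo "bindir=/usr/lib/postgresql/$v/bin"
    echo "libdir=/usr/lib/postgresql/$v/lib"
    echo "confdir=/etc/postgresql/$v/$c"
    echo "logfile=/var/log/postgresql/postgresql-$v-$c.log"
    echo "datadir=/var/lib/postgresql/$v/$c"
    echo "socketdir=/var/run/postgresql"
}

pg_paths 10 main
```
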
Patroni configuration
---------------------
The Patroni configuration is in YAML format. Different Patroni instances or
installations are identified by the `scope` configuration option, which mostly
maps to an instance name.

As Patroni is usually started via Docker or another container runtime, there is
no opinionated default file system layout or even configuration file location.

It would be desirable to create and maintain a Patroni configuration for each
pg-common instance as `/etc/patroni/%v-%c.yml`, e.g.
`/etc/patroni/10-main.yml`. Another possibility would be to prefix the
configuration file name with `patroni-`.

Automatic generation of patroni configuration
---------------------------------------------
Currently, the pg-common instance-specific configuration file
`/etc/patroni/%v-%c.yml` needs to be deployed or adapted manually, as the
pg-common framework and `pg_createcluster` have no possibility to run external
programs as hooks after instance creation.

Another possibility is to create the configuration file via an external
configuration management program like Ansible or Puppet.

Regardless of this, a simple program that creates the instance-specific
configuration from a template (e.g. `/etc/patroni/patroni.yml.in`) would be
desirable and could be shipped in the `patroni` Debian package.

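
Such a generator could be as small as the following sketch. The `@VERSION@` and `@CLUSTER@` placeholders and the function name `render_patroni_config` are assumptions for illustration; neither Patroni nor pg-common defines a template syntax.

```sh
#!/bin/sh
# Render an instance-specific Patroni configuration from a template.
# Assumptions: the @VERSION@ and @CLUSTER@ placeholders and the function
# name are hypothetical; they are not defined by Patroni or pg-common.

render_patroni_config() {
    template="$1"
    version="$2"
    cluster="$3"
    # '|' is the sed delimiter so that '/' in the template needs no escaping
    sed -e "s|@VERSION@|$version|g" -e "s|@CLUSTER@|$cluster|g" "$template"
}

# When called with three arguments, write the result to standard output
if [ "$#" -eq 3 ]
then
    render_patroni_config "$1" "$2" "$3"
fi
```

Saved under a hypothetical name such as `render.sh`, it would be used as `sh render.sh /etc/patroni/patroni.yml.in 10 main > /etc/patroni/10-main.yml`. Writing to standard output keeps the decision about the target file name with the caller.
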
Debian-adapted Patroni configuration
------------------------------------
It is possible to mimic the default Debian layout via the following
configuration parameters in the `postgresql` section of the Patroni YAML
configuration:
```
postgresql:
  data_dir: "/var/lib/postgresql/%v/%c"
  bin_dir: "/usr/lib/postgresql/%v/bin"
  config_dir: "/etc/postgresql/%v/%c"
```

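The log file location and name as well as the socket directory have to be set
explicitly as PostgreSQL parameters via the configuration file. Here, `%v` and
`%c` are the pg-common placeholders and need to be replaced by the actual
version and instance name, as PostgreSQL itself interprets `%` sequences in
`log_filename` as `strftime` escapes: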
```
unix_socket_directories = '/var/run/postgresql/'
log_directory = '/var/log/postgresql'
log_filename = 'postgresql-%v-%c.log'
```
The `postgresql.conf` configuration file
----------------------------------------
Patroni expects to deploy the configuration file itself, to be able to change
it, and to keep it in sync across nodes. If a `postgresql.conf` is already
present in the `config_dir`, Patroni renames it to `postgresql.base.conf` and
includes it at the top of the `postgresql.conf` it writes instead.

Initialization of the first and standby instances
-------------------------------------------------
Patroni by default runs `initdb` on the data directory during bootstrap, while
pg-common provides the `pg_createcluster` command for this purpose.

It is possible to tell Patroni to run an external bootstrap command, which is
passed the `--scope` and `--datadir` command-line options. This makes it
possible to have a small wrapper script like the following that runs
`pg_createcluster` instead:
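```
#!/bin/sh
# Custom Patroni bootstrap wrapper: create the instance via pg_createcluster.

for i in "$@"
do
    case $i in
        --scope=*)
            SCOPE="${i#*=}"
            ;;
        --datadir=*)
            # Parsed but not used: pg_createcluster picks its default location
            DATADIR="${i#*=}"
            ;;
        *)
            # unknown option
            ;;
    esac
done

# The scope is expected in the pg-common '%v/%c' notation, e.g. '10/main'
VERSION=$(echo "$SCOPE" | sed -e 's/\/.*//')
CLUSTER=$(echo "$SCOPE" | sed -e 's/.*\///')

pg_createcluster "$VERSION" "$CLUSTER"
exit $?
```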
This requires the following in the Patroni YAML configuration:
```
bootstrap:
  # Custom bootstrap method
  method: pg_createcluster
  pg_createcluster:
    command: <custom script above>
```

Subsequent nodes are bootstrapped from the primary via base backups. Just
running `pg_createcluster` on them is not possible, as the cluster IDs (the
'Database system identifier' in the `pg_controldata` output) would then differ.

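
To compare the identifiers of two nodes, the relevant line can be filtered out of the `pg_controldata` output. The following is a minimal sketch; the helper name `system_identifier` is hypothetical, and it assumes untranslated (English) `pg_controldata` output, e.g. by running it with `LC_ALL=C`.

```sh
#!/bin/sh
# Extract the 'Database system identifier' from pg_controldata output
# read on standard input. system_identifier is a hypothetical helper name.

system_identifier() {
    # Print only the value of the matching line, without the label
    sed -n 's/^Database system identifier:[[:space:]]*//p'
}

# Usage, following the pg-common layout for the '10/main' instance:
#   LC_ALL=C /usr/lib/postgresql/10/bin/pg_controldata \
#       /var/lib/postgresql/10/main | system_identifier
```

If the values printed on the primary and on a standby differ, the standby was initialized independently and cannot replicate from that primary.
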
Again, it is possible to provide an external clone program, which can run
`pg_basebackup` itself. The pg-common framework currently does not provide for
this, but it is possible to run `pg_createcluster` (which creates the
configuration directory), purge the data directory and then run
`pg_basebackup`:
```
#!/bin/sh
# Custom Patroni clone wrapper: create the instance via pg_createcluster,
# then replace its data directory with a base backup of the primary.

for i in "$@"
do
    case $i in
        --scope=*)
            SCOPE="${i#*=}"
            ;;
        --role=*)
            # Parsed but not used
            ROLE="${i#*=}"
            ;;
        --datadir=*)
            DATADIR="${i#*=}"
            ;;
        --connstring=*)
            CONNSTR="${i#*=}"
            ;;
        *)
            # unknown option
            ;;
    esac
done

# The scope is expected in the pg-common '%v/%c' notation, e.g. '10/main'
VERSION=$(echo "$SCOPE" | sed -e 's/\/.*//')
CLUSTER=$(echo "$SCOPE" | sed -e 's/.*\///')

# Drop a previously created instance so that pg_createcluster can succeed
if [ -f "/etc/postgresql/$VERSION/$CLUSTER/postgresql.conf" ]
then
    pg_dropcluster "$VERSION" "$CLUSTER"
fi

pg_createcluster "$VERSION" "$CLUSTER" \
    && rm -rf "$DATADIR" \
    && pg_basebackup --pgdata "$DATADIR" -X stream --dbname="$CONNSTR"
exit $?
```
Those two scripts could be shipped in the `patroni` Debian package as e.g.
`/usr/bin/pg_createcluster_patroni` and `/usr/bin/pg_clonecluster_patroni`.
Systemd services
----------------
The Patroni daemon/agent needs to be started for each configuration file in
`/etc/patroni`, i.e. a systemd service unit is required for each. This can be
facilitated via a `patroni@.service` template unit, which could have the
following pg-common specific content:
```
[Unit]
ConditionPathExists=/etc/patroni/%i.yml

[Service]
ExecStart=/usr/bin/patroni /etc/patroni/%i.yml
```
Here, `%i` is the systemd specifier for the instance name, i.e. the part
between `@` and `.service`. As systemd unit names cannot contain a '/', the
instance is addressed as '%v-%c', which matches the configuration file name.
This makes it possible to e.g. start Patroni for the '10/main' instance with
`systemctl start patroni@10-main`.

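
Since systemd unit names cannot contain a '/', a pg-common '%v/%c' value has to be mapped to the '%v-%c' form used for the configuration file name before it can serve as the unit instance. A minimal sketch of that mapping, with the hypothetical helper name `unit_for_cluster`:

```sh
#!/bin/sh
# Map a pg-common '<version>/<cluster>' value to the name of the matching
# patroni@.service instance. unit_for_cluster is a hypothetical helper name.

unit_for_cluster() {
    # Replace the '/' separator by '-', as in /etc/patroni/<version>-<cluster>.yml
    printf 'patroni@%s.service\n' "$(printf '%s' "$1" | tr '/' '-')"
}

# Usage:
#   systemctl start "$(unit_for_cluster 10/main)"
```
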
Making `patronictl` pg-common aware
-----------------------------------
The `patronictl` CLI currently looks for a configuration file
`~/.config/patronictl.yml`. This could potentially be a symlink to the
configuration under `/etc/patroni/`. In the pg-common scope, it could
additionally display all pg-common instances and/or accept the '%v/%c'
instance notation in order to select a specific instance to display or act on.
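
Listing the instances could, for example, be based on the configuration files present. The sketch below assumes the `/etc/patroni/%v-%c.yml` naming proposed earlier and that the version part contains no '-'; the helper name `list_patroni_instances` is hypothetical and not part of `patronictl`.

```sh
#!/bin/sh
# List Patroni-managed instances in '<version>/<cluster>' notation, based on
# the <version>-<cluster>.yml files in a configuration directory.
# list_patroni_instances is a hypothetical helper name.

list_patroni_instances() {
    confdir="${1:-/etc/patroni}"
    for f in "$confdir"/*.yml
    do
        # Without any match the glob stays literal, so skip missing files
        [ -e "$f" ] || continue
        name=$(basename "$f" .yml)
        case $name in
            *-*)
                # The version ends at the first '-', the rest is the cluster
                printf '%s/%s\n' "${name%%-*}" "${name#*-}"
                ;;
            *)
                # Not in <version>-<cluster> form, e.g. a shared config file
                ;;
        esac
    done
}

# Usage:
#   list_patroni_instances              # scans /etc/patroni
#   list_patroni_instances /some/dir    # scans another directory
```
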