Initial design document for Patroni/Debian integration

2018-10-22 16:19:21 +02:00 · 2018-10-22 16:19:21 +02:00 · 5c84823b7d
parent e786df09ce
commit 5c84823b7d
1 changed files with 219 additions and 0 deletions
--- a/debian/design.md
+++ b/debian/design.md
@ -0,0 +1,219 @@
+Integrating patroni with Debian
+===============================
+
+Introduction
+------------
+
+Patroni manages PostgreSQL instances and mostly expects a blank sheet, i.e
+prefers to initialize and insists to start and stop the database itself. Debian
+on the other hand includes the `postgresql-common` (called pg-common in the
+following) framework that manages concurrent major versions of PostgreSQL and
+possibly database instances for each of them. This document details a design on
+how to integrate Patroni (in form of the Debian `patroni` package, preferable
+provided by PGDG's apt repository) with Debian's pg-common framework and
+policies.
+
+pg-common nomenclature and directory layout
+-------------------------------------------
+
+pg-common enhances the usual `libpq` environment variables like `PGHOST` and
+`PGDATABASE` with `PGCLUSTER`, that is the Debian standard way of addressing a
+specific version/instance combination. The `PGCLUSTER` environment variable
+consists of two parts, the PostgreSQL major version and the instance name
+(`main` being the default instance name), serarated by a '/', e.g. `'10/main'`.
+The two components are usually abbreviated or referred to as '%v' and '%c',
+respectively.
+
+The versioned binaries and libraries are located in
+`/usr/lib/postgresql/%v/{bin,lib}`. The configuration files usually reside in
+`/etc/postgresql/%v/%c/` and the default log file (set via the `pg_ctlcluster`
+wrapper) is `/var/log/postgresql/postgresql-%v-%c.log`. The default data
+directory is `/var/lib/postgresql/%v/%c` and the listening sockets are located
+in `/var/run/postgresql/`.
+
+Patroni configuration
+---------------------
+
+The patroni configuration is in YAML format. Different patroni instances or
+installations are identified by the `scope` configuration option, which mostly
+maps to an instance name.
+
+As patroni is usually started via Docker or another container runtime, there is
+no opinioated default file system layout or even configuratin file location.
+
+It would be desirable to create/maintain a Patroni configuration for each
+pg-common instance as `/etc/patroni/%v-%c.yml`, i.e. e.g.
+`/etc/patroni/10-main.yml`. Another possibility would be to prefix the
+configuration file name with `patroni-`.
+
+Automatic generation of patroni configuration
+---------------------------------------------
+
+Currently, the pg-common instance specific configuration file
+`/etc/patroni/%v-%c.yml` needs to be deployed/adopted manually as the pg-common
+framework and `pg_createcluster` have no possibilty to run external programs as
+hooks after instance creation.
+
+Another possibilty is to create the configuration file via an external config
+management program like Ansible or Puppet.
+
+Regardless of this, a simple program that creates the instance-specific
+configuration from a template (e.g. `/etc/patroni/patroni.yml.in`) would be
+desirable and could be shipped in the `patroni` Debian package.
+
+Debian-adopted Patroni configuration
+------------------------------------
+
+it is possble to mimick the default Debian layout via the following
+configuration parameters:
+
+```
+data_dir = "/var/lib/postgresql/%v/%c"
+bin_dir = "/usr/lib/postgresql/%v/bin"
+config_dir: "/etc/postgresql/%v/%c"
+```
+
+The logfile location and filename as well as the socket directory have to be
+explicitly set via the configuration file:
+
+```
+unix_socket_directories = '/var/run/postgresql/'
+log_directory = '/var/log/postgresql'
+log_filename = 'postgresql-%v-%c.log'
+```
+
+The `postgresql.conf` configuration file
+----------------------------------------
+
+Patroni expects to deploy a configuration file and to be able to change it and
+keep it in sync across nodes. If a configuration file `postgresql.conf` is
+already present in the `config_dir`, it renames it to `postgresql.base.conf` and
+includes it at the top of the `postgresql.conf` it writes instead.
+
+Initialization of the first and standby instances
+-------------------------------------------------
+
+Patroni by default runs `initdb` on the data directory during bootrap while
+pg-common provides the `pg_createcluster` command for this purpose. 
+
+It is possible to tell patroni to run an external bootstrap command which is
+passed the `--scope` and `--datadir` command-line options. This makes it
+possible to have a small wrapper script like the following that runs
+`pg_createcluster` instead:
+
+```
+#!/bin/sh
+
+for i in "$@"
+do
+case $i in
+    --scope=*)
+    SCOPE="${i#*=}"
+    shift # past argument=value
+    ;;
+    --datadir=*)
+    DATADIR="${i#*=}"
+    shift # past argument=value
+    ;;
+    *)
+       # unknown option
+    ;;
+esac
+done
+
+VERSION=$(echo $SCOPE | sed -e 's/\/.*//')
+CLUSTER=$(echo $SCOPE | sed -e 's/.*\///')
+
+pg_createcluster $VERSION $CLUSTER
+exit $?
+```
+
+This requires the following in the patroni YAML configuratio:
+
+```
+bootstrap:
+  # Custom bootstrap method
+  method: pg_createcluster
+  pg_createcluster:
+    command: <custom script above>
+```
+
+Subsequent nodes are boostrapped from the primary via base backups. Just running
+`pg_createcluster` on them is not possible as then the cluster IDs (the
+'Database system identifier' in the `pg_controldata` output) would differ.
+
+Again, it is possible to provide an external clone program, which can run
+`pg_basebackup` itself. The pg-common framework currently does not provide for
+this, but it is possible to run `pg_basebackup` after `pg_createcluster` (which
+creates the configuration directory) and purging the data directory:
+
+```
+#!/bin/sh
+
+for i in "$@"
+do
+case $i in
+    --scope=*)
+    SCOPE="${i#*=}"
+    shift # past argument=value
+    ;;
+    --role=*)
+    ROLE="${i#*=}"
+    shift # past argument=value
+    ;;
+    --datadir=*)
+    DATADIR="${i#*=}"
+    shift # past argument=value
+    ;;
+    --connstring=*)
+    CONNSTR="${i#*=}"
+    shift # past argument=value
+    ;;
+    *)
+       # unknown option
+    ;;
+esac
+done
+
+VERSION=$(echo $SCOPE | sed -e 's/\/.*//')
+CLUSTER=$(echo $SCOPE | sed -e 's/.*\///')
+
+if [ -f /etc/postgresql/$VERSION/$CLUSTER/postgresql.conf ]
+then
+    pg_dropcluster $VERSION $CLUSTER
+fi
+
+pg_createcluster $VERSION $CLUSTER && rm -rf $DATADIR && pg_basebackup --pgdata $DATADIR -X stream --dbname=$CONNSTR
+exit $?
+```
+
+Those two scripts could be shipped in the `patroni` Debian package as e.g.
+`/usr/bin/pg_createcluster_patroni` and `/usr/bin/pg_clonecluster_patroni`.
+
+Systemd services
+----------------
+
+The patroni daemon/agent needs to be started for each configuration file in
+`/etc/patroni`, i.e. a systemd service unit is required for each. This can be
+facilitated via the `patroni@.service` that acts as a wild card and could have
+the following pg-common specific content:
+
+```
+[Unit]
+ConditionPathExists=/etc/patroni/$i.yml
+
+[Service]
+ExecStart=/usr/bin/patroni /etc/patroni/$i.yml
+```
+
+This makes it possible to e.g. start patroni for the '10/main' instance with
+`systemctl start patroni@10/main`.
+
+Making `patronictl` pg-common aware
+-----------------------------------
+
+The `patronictl` CLI currently looks for a configuration file 
+`~/.config/patronictl.yml`. This could potentially be a symlink to the
+configuration under `/etc/patroni/`. In the pg-common scope, it could
+addtionally either display all pg-common instances and/or accept the '%v/%c'
+instance notation in order to select a specific instance to display or act on.