Integrating Patroni with Debian =============================== Introduction ------------ Patroni manages PostgreSQL instances and mostly expects a blank sheet, i.e prefers to initialize and insists to start and stop the database itself. Debian on the other hand includes the `postgresql-common` (called pg-common in the following) framework that manages concurrent major versions of PostgreSQL and possibly database instances for each of them. This document details a design on how to integrate Patroni (in form of the Debian `patroni` package, preferable provided by PGDG's apt repository) with Debian's pg-common framework and policies. pg-common nomenclature and directory layout ------------------------------------------- pg-common enhances the usual `libpq` environment variables like `PGHOST` and `PGDATABASE` with `PGCLUSTER`, that is the Debian standard way of addressing a specific version/instance combination. The `PGCLUSTER` environment variable consists of two parts, the PostgreSQL major version and the instance name (`main` being the default instance name), serarated by a '/', e.g. `'10/main'`. The two components are usually abbreviated or referred to as '%v' and '%c', respectively. The versioned binaries and libraries are located in `/usr/lib/postgresql/%v/{bin,lib}`. The configuration files usually reside in `/etc/postgresql/%v/%c/` and the default log file (set via the `pg_ctlcluster` wrapper) is `/var/log/postgresql/postgresql-%v-%c.log`. The default data directory is `/var/lib/postgresql/%v/%c` and the listening sockets are located in `/var/run/postgresql/`. Patroni configuration --------------------- The Patroni configuration is in YAML format. Different Patroni instances or installations are identified by the `scope` configuration option, which mostly maps to an instance name. As Patroni is usually started via Docker or another container runtime, there is no opinioated default file system layout or even configuratin file location. It would be desirable to create/maintain a Patroni configuration for each pg-common instance as `/etc/patroni/%v-%c.yml`, i.e. e.g. `/etc/patroni/10-main.yml`. Another possibility would be to prefix the configuration file name with `patroni-`. Automatic generation of Patroni configuration --------------------------------------------- Currently, the pg-common instance specific configuration file `/etc/patroni/%v-%c.yml` needs to be deployed/adopted manually as the pg-common framework and `pg_createcluster` have no possibilty to run external programs as hooks after instance creation. Another possibilty is to create the configuration file via an external config management program like Ansible or Puppet. Regardless of this, a simple program that creates the instance-specific configuration from a template (e.g. `/etc/patroni/patroni.yml.in`) would be desirable and could be shipped in the `patroni` Debian package. Debian-adopted Patroni configuration ------------------------------------ it is possble to mimick the default Debian layout via the following configuration parameters: ``` data_dir = "/var/lib/postgresql/%v/%c" bin_dir = "/usr/lib/postgresql/%v/bin" config_dir: "/etc/postgresql/%v/%c" ``` The logfile location and filename as well as the socket directory have to be explicitly set via the configuration file: ``` unix_socket_directories = '/var/run/postgresql/' log_directory = '/var/log/postgresql' log_filename = 'postgresql-%v-%c.log' ``` The `postgresql.conf` configuration file ---------------------------------------- Patroni expects to deploy a configuration file and to be able to change it and keep it in sync across nodes. If a configuration file `postgresql.conf` is already present in the `config_dir`, it renames it to `postgresql.base.conf` and includes it at the top of the `postgresql.conf` it writes instead. Initialization of the first and standby instances ------------------------------------------------- Patroni by default runs `initdb` on the data directory during bootrap while pg-common provides the `pg_createcluster` command for this purpose. It is possible to tell Patroni to run an external bootstrap command which is passed the `--scope` and `--datadir` command-line options. This makes it possible to have a small wrapper script like the following that runs `pg_createcluster` instead: ``` #!/bin/sh for i in "$@" do case $i in --scope=*) SCOPE="${i#*=}" shift # past argument=value ;; --datadir=*) DATADIR="${i#*=}" shift # past argument=value ;; *) # unknown option ;; esac done VERSION=$(echo $SCOPE | sed -e 's/\/.*//') CLUSTER=$(echo $SCOPE | sed -e 's/.*\///') pg_createcluster $VERSION $CLUSTER exit $? ``` This requires the following in the Patroni YAML configuratio: ``` bootstrap: # Custom bootstrap method method: pg_createcluster pg_createcluster: command: ``` Subsequent nodes are boostrapped from the primary via base backups. Just running `pg_createcluster` on them is not possible as then the cluster IDs (the 'Database system identifier' in the `pg_controldata` output) would differ. Again, it is possible to provide an external clone program, which can run `pg_basebackup` itself. The pg-common framework currently does not provide for this, but it is possible to run `pg_basebackup` after `pg_createcluster` (which creates the configuration directory) and purging the data directory: ``` #!/bin/sh for i in "$@" do case $i in --scope=*) SCOPE="${i#*=}" shift # past argument=value ;; --role=*) ROLE="${i#*=}" shift # past argument=value ;; --datadir=*) DATADIR="${i#*=}" shift # past argument=value ;; --connstring=*) CONNSTR="${i#*=}" shift # past argument=value ;; *) # unknown option ;; esac done VERSION=$(echo $SCOPE | sed -e 's/\/.*//') CLUSTER=$(echo $SCOPE | sed -e 's/.*\///') if [ -f /etc/postgresql/$VERSION/$CLUSTER/postgresql.conf ] then pg_dropcluster $VERSION $CLUSTER fi pg_createcluster $VERSION $CLUSTER && rm -rf $DATADIR && pg_basebackup --pgdata $DATADIR -X stream --dbname=$CONNSTR exit $? ``` Those two scripts could be shipped in the `patroni` Debian package e.g. as `/usr/share/patroni/pg_createcluster_patroni` and `/usr/share/patroni/pg_clonecluster_patroni`. Systemd services ---------------- The Patroni daemon/agent needs to be started for each configuration file in `/etc/patroni`, i.e. a systemd service unit is required for each. This can be facilitated via the `patroni@.service` that acts as a wild card and could have the following pg-common specific content: ``` [Unit] ConditionPathExists=/etc/patroni/%i.yml [Service] ExecStart=/usr/bin/patroni /etc/patroni/%i.yml ``` This makes it possible to e.g. start Patroni for the '10/main' instance with `systemctl start patroni@10/main`. DCS Debian availability ----------------------- The only DCS server available in Debian 9 (stretch) is zookeeper. Both etcd and consul were blacklisted before release by the Debian security team due to being written in Go. Current testing (buster) also includes etcd and consul. The latest Patroni Debian package is available for both Debian stable and testing/unstable via the PGDG APT repository. DCS setup --------- One outstanding issue with providing a general Patroni configuration template and possibly a program that creates a Patroni configuration file per pg-common instance is the distributed consensus store, which also needs to be configured in the Patroni YAML file. One possibility would be to introduce a file `/etc/default/patroni` where the DCS and it's ip-address(es) could be configured which could be sourced by the configuration file generator. Security -------- The proposed/current setup should be considered a Proof of Concept. In particular, care must be taken that the postgres db/system user password is not leaked and that the DCS access is secured/hardened. Making `patronictl` pg-common aware ----------------------------------- The `patronictl` CLI currently looks for a configuration file `~/.config/patronictl.yml`. This could potentially be a symlink to the configuration under `/etc/patroni/`. In the pg-common scope, it could addtionally either display all pg-common instances and/or accept the '%v/%c' instance notation in order to select a specific instance to display or act on. Current status -------------- As of the postgresql-common_194 and patroni_1.5.0-9 packages, the following steps are still needed for each pg-common instance %v/%c: 1. Setup DCS and possibly install its client python packages 2. Install postgresql-common package 3. Disable automatic instance creation in /etc/postgresql-common/createcluster.conf 4. Install postgresql-%v package 5. Setup authentication via pgpass or otherwise 6. Create a /etc/patroni/%v-%c.yml configuration file 7. Start the patroni systemd service for the instance In particular, no changes are required to the Patroni or postgresql-common upstream codebases.