Compare commits

...

28 commits

Author SHA1 Message Date
David Prévot c52e34116d New upstream version 2.0.0 2024-04-14 09:26:34 +02:00
benoit 807f9b2071 Release V2.0.0 2024-04-09 16:45:11 +02:00
benoit e0589b97a8 Black run 2024-02-27 11:29:52 +01:00
benoit a4ed20210c Improve doc for node_is_replica
node_is_replica uses the following Patroni endpoints: replica, asynchronous
and synchronous. The first two implement the lag tag. For these endpoints,
the state of a replica node doesn't reflect the replication state
(streaming or in archive recovery); we only know whether it's running. The
timeline is also not checked.

Therefore, if a cluster is using asynchronous replication, it is recommended
to check for the lag to detect a divergence as soon as possible.
2024-02-26 16:02:53 +01:00
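For illustration, a minimal sketch of what such a lag check amounts to at the HTTP level, assuming a plain `requests` call against an illustrative endpoint URL; Patroni answers 200 on `replica?lag=...` only when the node is a replica within the allowed lag (this is not check_patroni's actual code):

```python
# Hedged sketch: probe Patroni's "replica" endpoint with a lag threshold.
# Patroni returns 200 only when the node is a replica whose lag is below
# the threshold; the replication state (streaming vs. archive recovery)
# and the timeline are not part of this answer, which is the limitation
# described above.
import requests

def replica_within_lag(endpoint: str, max_lag: str = "100MB") -> bool:
    # The endpoint URL and the 100MB threshold are illustrative values.
    r = requests.get(f"{endpoint}/replica", params={"lag": max_lag}, timeout=5)
    return r.status_code == 200

print(replica_within_lag("http://10.20.199.4:8008"))
```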
benoit 364a385a2f Fix cluster_has_leader in archive recovery tests
Replication states are now also overridden for standby_leaders since
the commit fixing cluster_node_count, so the tests had to be adapted.
2024-01-09 06:50:00 +01:00
benoit 78ef0f6ada Fix cluster_node_count's management of replication states
The service now supports the `streaming` state.

Since we don't check for lag or timeline in this service, a healthy node
is:

* leader: in a running state
* standby_leader: running (pre Patroni 3.0.4), streaming otherwise
* standby & sync_standby: running (pre Patroni 3.0.4), streaming otherwise

Updated the tests for this service.
2024-01-09 06:50:00 +01:00
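As a rough sketch of the rule above (assumptions: `member` is one entry from the `/cluster` payload and `detailed_states` is True for Patroni >= 3.0.4; the shipped code in `check_patroni/cluster.py` differs in detail):

```python
# Sketch of the health rule from the bullet list above.
def is_healthy(member: dict, detailed_states: bool) -> bool:
    role, state = member["role"], member["state"]
    if role == "leader":
        return state == "running"
    if role in ("standby_leader", "replica", "sync_standby"):
        # pre-3.0.4 Patroni reports "running", newer versions "streaming"
        return state == ("streaming" if detailed_states else "running")
    return False
```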
benoit 46db3e2d15 Fix the cluster_has_leader service for standby clusters
Before this patch, we expected the standby leader state to be
`running` for all versions of Patroni.

With this patch, for:
* Patroni < 3.0.4, standby leaders are in `running` state.
* Patroni >= 3.0.4, standby leaders can be in `streaming` or `in
archive recovery` state. We will raise a warning for the latter.

The tests were modified to account for this.

Co-authored-by: Denis Laxalde <denis@laxalde.org>
2023-12-18 13:17:37 +01:00
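A condensed sketch of that version-dependent rule (assuming `state` is the standby leader's state from the `/cluster` payload; the real check also emits perfdata, see the `cluster.py` diff below):

```python
# Sketch only: map a standby leader's state to a nagios-style outcome.
def standby_leader_level(state: str, patroni_ge_3_0_4: bool) -> str:
    if patroni_ge_3_0_4:
        if state == "streaming":
            return "OK"
        if state == "in archive recovery":
            return "WARNING"  # log shipping could be stuck
        return "CRITICAL"
    # older Patroni only ever reports "running" for a healthy standby leader
    return "OK" if state == "running" else "CRITICAL"
```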
benoit ffc330f96e Mention that shell completion support is dependent on the shell version 2023-11-16 13:59:06 +01:00
benoit 8d6b8502b6 cluster_has_replica: fix the way a healthy replica is detected
For patroni >= version 3.0.4:
* the role is `replica` or `sync_standby`
* the state is `streaming` or `in archive recovery`
* the timeline is the same as the leader
* the lag is lower than or equal to `max_lag`

For prior versions of patroni:
* the role is `replica` or `sync_standby`
* the state is `running`
* the timeline is the same as the leader
* the lag is lower than or equal to `max_lag`

Additionally, we now display the timeline in the perfstats. We also try
to display the perf stats of unhealthy replicas as much as possible.

Update tests for cluster_has_replica:
* Fix the tests to make them work with the new algorithm
* Add a specific test for tl divergences
2023-11-11 10:50:35 +01:00
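The rule can be summarized as the following predicate (a sketch under the assumption that `member` is one `/cluster` entry and `leader_tl` is the leader's timeline; the shipped implementation in `cluster.py` is more involved):

```python
from typing import Optional

# Sketch of the replica health rule described in this commit message.
def replica_is_healthy(member: dict, leader_tl: int, detailed_states: bool,
                       max_lag: Optional[int]) -> bool:
    if member["role"] not in ("replica", "sync_standby"):
        return False
    ok_states = (
        ("streaming", "in archive recovery") if detailed_states else ("running",)
    )
    if member["state"] not in ok_states:
        return False
    if int(member["timeline"]) != leader_tl:
        return False  # timeline divergence
    return max_lag is None or int(member["lag"]) <= max_lag
```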
Denis Laxalde 6ee8db1df2 Avoid using requests's JSONDecodeError
This exception is only present in "recent" versions of requests,
typically not in the version distributed by Debian bullseye. Since
requests' JSONDecodeError is in general a subclass of
json.JSONDecodeError, we use the latter, but also handle the plain
ValueError (which json.JSONDecodeError is a subclass of) because
requests might use simplejson (which uses its own JSONDecodeError, also
a subclass of ValueError).
2023-10-13 11:45:39 +02:00
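In other words (a sketch of the compatibility reasoning, mirroring the change in `types.py` below):

```python
import json
import requests

# json.JSONDecodeError subclasses ValueError, and the JSONDecodeError of
# requests/simplejson (when present) also derives from ValueError, so this
# pair of exceptions covers every combination described above.
def safe_json(r: requests.Response):
    try:
        return r.json()
    except (json.JSONDecodeError, ValueError):
        return None
```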
Denis Laxalde a8c4a3125d Work around nagiosplugin issue about stdout in tests
We basically apply the change from
https://github.com/mpounsett/nagiosplugin/issues/24 as a fixture, but
only when nagiosplugin's version is old.
2023-10-13 11:45:39 +02:00
Denis Laxalde 4035f1a3da Add compat for old pytest in type hints 2023-10-13 11:45:39 +02:00
Denis Laxalde fabf3c142b Declare compatibility with click 7.1 or higher
Judging from the test suite, we apparently don't need version 8.x.
2023-10-13 11:45:39 +02:00
Denis Laxalde 593278206a Let Mypy check all files
Since the previous commit, the test suite also type-checks.
2023-10-06 10:40:29 +02:00
Denis Laxalde 903b83e211 Use fake HTTP server for the Patroni API in tests
We introduce a patroni_api fixture, defined in tests/conftest.py, which
sets up an HTTP server serving files in a temporary directory. The
server is itself defined by the PatroniAPI class; it has a 'routes()'
context manager method to be used in actual tests to set up expected
responses based on specified JSON files.

We set up some logging in order to improve debugging.

The direct advantage of this is that the PatroniResource.rest_api() method
is now covered by the test suite.

Coverage before this commit:

  Name                        Stmts   Miss  Cover
  -----------------------------------------------
  check_patroni/__init__.py       3      0   100%
  check_patroni/cli.py          193     18    91%
  check_patroni/cluster.py      113      0   100%
  check_patroni/convert.py       23      5    78%
  check_patroni/node.py         146      1    99%
  check_patroni/types.py         50     23    54%
  -----------------------------------------------
  TOTAL                         528     47    91%

and after this commit:

  Name                        Stmts   Miss  Cover
  -----------------------------------------------
  check_patroni/__init__.py       3      0   100%
  check_patroni/cli.py          193     18    91%
  check_patroni/cluster.py      113      0   100%
  check_patroni/convert.py       23      5    78%
  check_patroni/node.py         146      1    99%
  check_patroni/types.py         50      9    82%
  -----------------------------------------------
  TOTAL                         528     33    94%

In actual test functions, we either invoke patroni_api.routes() to
configure which JSON file(s) should be served for each endpoint, or we
define dedicated fixtures (e.g. cluster_config_has_changed()) to
configure this for several test functions or the whole module.

The 'old_replica_state' parametrized fixture is used when needed to
adjust such fixtures, e.g. in cluster_has_replica_ok(), to modify the
JSON content using cluster_api_set_replica_running() (previously in
tests/tools.py, now in tests/__init__.py).

The dependency on pytest-mock is no longer needed.
2023-10-06 10:40:29 +02:00
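For flavor, a hypothetical test built on these fixtures could look like this (the JSON file name and the assertion are illustrative, not copied from the test suite):

```python
from click.testing import CliRunner

from check_patroni.cli import main
from . import PatroniAPI

def test_cluster_has_leader_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
    # Serve a canned /cluster payload, then run the CLI against the fake API.
    with patroni_api.routes({"cluster": "cluster_has_leader_ok.json"}):
        result = runner.invoke(
            main, ["-e", patroni_api.endpoint, "cluster_has_leader"]
        )
    assert result.exit_code == 0
```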
Denis Laxalde 32e06f7051 Use the 'test' extra in Tox's test environment
Instead of repeating the dependencies.
2023-10-06 10:33:04 +02:00
Denis Laxalde 2d2c389bdb Configure coverage
To be run with 'pytest --cov --cov-report=html'.
2023-10-06 10:33:04 +02:00
Denis Laxalde 34f576ea0f Turn --use-old-replica-state into a parametrized fixture
Instead of requiring the user to run the test suite with and without the
--use-old-replica-state flag, we introduce an 'old_replica_state()'
parametrized fixture that is used only when needed (i.e. in
test_cluster_{has_replica,node_count}.py).
2023-10-06 10:33:04 +02:00
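A sketch of how such an adjusting fixture could combine `old_replica_state` with the JSON rewriting helper (names follow the commit messages above; the actual fixtures may differ):

```python
from pathlib import Path
from typing import Iterator

import pytest

from . import PatroniAPI, cluster_api_set_replica_running

@pytest.fixture
def cluster_has_replica_ok(
    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
    # Serve the "new style" JSON as-is, or rewrite replica states back to
    # "running" when exercising the pre-3.0.4 behaviour.
    path: Path = datadir / "cluster_has_replica_ok.json"
    if old_replica_state:
        path = cluster_api_set_replica_running(path, tmp_path)
    with patroni_api.routes({"cluster": path}):
        yield
```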
Denis Laxalde fea89041b8 Run pytest with --log-level=debug in tox and CI
This way, our log messages (and those from our stack) will show up in
case of errors or test failures, which makes debugging easier.
2023-10-03 09:54:13 +02:00
Denis Laxalde ea92809cb3 Introduce a 'runner' test fixture
Instead of defining the CliRunner value in each test, we use a fixture.
The CliRunner is also configured with stdout and stderr separated
because mixing them would pose problems if we use stderr for other
purposes in tests, e.g. to emit log messages from a forthcoming HTTP
server.
2023-10-03 09:54:13 +02:00
Denis Laxalde d34e597e61 Use the tmp_path fixture instead of writing files to tests/ 2023-10-03 09:54:13 +02:00
Denis Laxalde bc2d2917c3 Introduce a fake_restapi test fixture
This fixture itself uses the 'use_old_replica_state' fixture, so that
it's no longer needed to use it explicitly in test functions.
2023-10-03 09:54:13 +02:00
Denis Laxalde c3cdb8cdd4 Set a default value to status parameter of my_mock in tests
Most of the time, it's 200, so the default value simplifies usage in
actual tests.
2023-10-03 09:54:13 +02:00
Denis Laxalde 123c300911 Add type hints in tests/conftest.py 2023-10-03 09:54:13 +02:00
Denis Laxalde a0189ebba7 Fix some typos spotted by codespell 2.2.6 2023-10-03 09:53:53 +02:00
Denis Laxalde 95f21a133d Drop superfluous type annotation of 'self'
See https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html#classes
> For instance methods, omit type for "self"
2023-10-03 09:39:40 +02:00
benoit de8b3daa7a Update tox.ini to run codespell on the documentation 2023-08-30 10:19:18 +02:00
benoit 82e0af8a9e Update README CONTRIBUTING RELEASE
* README: add information pertaining to shell completion;
* CONTRIBUTING: remove release information;
* RELEASE: create a dedicated file with all the relevant release
  information.
2023-08-30 10:19:18 +02:00
45 changed files with 1514 additions and 657 deletions

3
.coveragerc Normal file
View file

@ -0,0 +1,3 @@
[run]
include =
check_patroni/*

2
.gitignore vendored
View file

@ -1,10 +1,10 @@
__pycache__/
check_patroni.egg-info
tests/*.state_file
tests/config.ini
vagrant/.vagrant
vagrant/*.state_file
.*.swp
.coverage
.venv/
.tox/
dist/

CHANGELOG.md
View file

@ -1,13 +1,37 @@
# Change log
## Unreleased
## check_patroni 2.0.0 - 2024-04-09
### Changed
* In `cluster_node_count`, a healthy standby, sync replica or standby leader cannot be "in
archive recovery" because this service doesn't check for lag and timelines.
### Added
* Add the timeline in the `cluster_has_replica` perfstats. (#50)
* Add a mention about shell completion support and shell versions in the doc. (#53)
* Add the leader type and whether it's archiving to the `cluster_has_leader` perfstats. (#58)
### Fixed
* Add compatibility with [requests](https://requests.readthedocs.io)
version 2.25 and higher.
* Fix what `cluster_has_replica` deems a healthy replica. (#50, reported by @mbanck)
* Fix `cluster_has_replica` to display perfstats for replicas whenever it's possible (healthy or not). (#50)
* Fix `cluster_has_leader` to correctly check for standby leaders. (#58, reported by @mbanck)
* Fix `cluster_node_count` to correctly manage replication states. (#50, reported by @mbanck)
### Misc
* Improve the documentation for `node_is_replica`.
* Improve test coverage by running an HTTP server to fake the Patroni API (#55
by @dlax).
* Work around old pytest versions in type annotations in the test suite.
* Declare compatibility with click version 7.1 (or higher).
* In tests, work around nagiosplugin 1.3.2 not properly handling stdout
redirection.
## check_patroni 1.0.0 - 2023-08-28
Check patroni is now tagged as Production/Stable.

CONTRIBUTING.md
View file

@ -43,15 +43,14 @@ A vagrant file can be found in [this
repository](https://github.com/ioguix/vagrant-patroni) to generate a patroni/etcd
setup.
The `README.md` can be geneated with `./docs/make_readme.sh`.
The `README.md` can be generated with `./docs/make_readme.sh`.
## Executing Tests
Crafting repeatable tests using a live Patroni cluster can be intricate. To
simplify the development process, interactions with Patroni's API are
substituted with a mock function that yields an HTTP return code and a JSON
object outlining the cluster's status. The JSON files containing this
information are housed in the `./tests/json` directory.
simplify the development process, a fake HTTP server is set up as a test
fixture and serves static files (either from `tests/json` directory or from
in-memory data).
An important consideration is that there is a potential drawback: if the JSON
data is incorrect or if modifications have been made to Patroni without
@ -61,21 +60,15 @@ erroneously.
The tests are executed automatically for each PR using the ci (see
`.github/workflow/lint.yml` and `.github/workflow/tests.yml`).
Running the tests manually:
Running the tests,
* Using patroni's nominal replica state of `streaming` (since v3.0.4):
* manually:
```bash
pytest ./tests
pytest --cov tests
```
* Using patroni's nominal replica state of `running` (before v3.0.4):
```bash
pytest --use-old-replica-state ./tests
```
* Using tox:
* or using tox:
```bash
tox -e lint # mypy + flake8 + black + isort + codespell
@ -83,9 +76,9 @@ Running the tests manually:
tox -e py # pytests and "lint" tests for the default version of python
```
Please note that when dealing with any service that checks the state of a node
in patroni's `cluster` endpoint, the corresponding JSON test file must be added
in `./tests/tools.py`.
Please note that when dealing with any service that checks the state of a node,
the related tests must use the `old_replica_state` fixture to test with both
old (pre 3.0.4) and new replica states.
A bash script, `check_patroni.sh`, is provided to facilitate testing all
services on a Patroni endpoint (`./vagrant/check_patroni.sh`). It requires one
@ -99,17 +92,3 @@ Here's an example usage:
```bash
./vagrant/check_patroni.sh http://10.20.30.51:8008
```
## Release
Update the Changelog.
The package is generated and uploaded to pypi when a `v*` tag is created (see
`.github/workflow/publish.yml`).
Alternatively, the release can be done manually with:
```
tox -e build
tox -e upload
```

MANIFEST.in
View file

@ -2,6 +2,7 @@ include *.md
include mypy.ini
include pytest.ini
include tox.ini
include .coveragerc
include .flake8
include pyproject.toml
recursive-include docs *.sh

124
README.md
View file

@ -45,7 +45,7 @@ Commands:
node_is_leader Check if the node is a leader node.
node_is_pending_restart Check if the node is in pending restart...
node_is_primary Check if the node is the primary with the...
node_is_replica Check if the node is a running replica...
node_is_replica Check if the node is a replica with no...
node_patroni_version Check if the version is equal to the input
node_tl_has_changed Check if the timeline has changed.
```
@ -60,7 +60,7 @@ $ pip install git+https://github.com/dalibo/check_patroni.git
check_patroni works on python 3.6, we keep it that way because patroni also
supports it and there are still lots of RH 7 variants around. That being said
python 3.6 has been EOL for age and there is no support for it in the github
python 3.6 has been EOL for ages and there is no support for it in the github
CI.
## Support
@ -98,8 +98,8 @@ A match is found when: `start <= VALUE <= end`.
For example, the following command will raise:
* a warning if there is less than 1 nodes, wich can be translated to outside of range [2;+INF[
* a critical if there are no nodes, wich can be translated to outside of range [1;+INF[
* a warning if there are fewer than 2 nodes, which can be translated to outside of range [2;+INF[
* a critical if there are no nodes, which can be translated to outside of range [1;+INF[
```
check_patroni -e https://10.20.199.3:8008 cluster_has_replica --warning 2: --critical 1:
@ -115,6 +115,30 @@ Several options are available:
* `--cert_file`: your certificate or the concatenation of your certificate and private key
* `--key_file`: your private key (optional)
## Shell completion
We use the [click] library, which supports shell completion natively.
Shell completion can be added by typing the following command or adding it to
a file specific to your shell of choice.
* for Bash (add to `~/.bashrc`):
```
eval "$(_CHECK_PATRONI_COMPLETE=bash_source check_patroni)"
```
* for Zsh (add to `~/.zshrc`):
```
eval "$(_CHECK_PATRONI_COMPLETE=zsh_source check_patroni)"
```
* for Fish (add to `~/.config/fish/completions/check_patroni.fish`):
```
eval "$(_CHECK_PATRONI_COMPLETE=fish_source check_patroni)"
```
Please note that shell completion is not supported for all shell versions; for
example, only Bash versions 4.4 and newer are supported.
[click]: https://click.palletsprojects.com/en/8.1.x/shell-completion/
## Cluster services
@ -152,11 +176,27 @@ Usage: check_patroni cluster_has_leader [OPTIONS]
This check applies to any kind of leaders including standby leaders.
A leader is a node with the "leader" role and a "running" state.
A standby leader is a node with a "standby_leader" role and a "streaming" or
"in archive recovery" state. Please note that log shipping could be stuck
because the WAL are not available or applicable. Patroni doesn't provide
information about the origin cluster (timeline or lag), so we cannot check
if there is a problem in that particular case. That's why we issue a warning
when the node is "in archive recovery". We suggest using other supervision
tools to do this (eg. check_pgactivity).
Check:
* `OK`: if there is a leader node.
* `CRITICAL`: otherwise
* `WARNING`: if there is a standby leader in archive recovery.
* `CRITICAL`: otherwise.
Perfdata: `has_leader` is 1 if there is a leader node, 0 otherwise
Perfdata:
* `has_leader` is 1 if there is any kind of leader node, 0 otherwise
* `is_standby_leader_in_arc_rec` is 1 if the standby leader node is "in
archive recovery", 0 otherwise
* `is_standby_leader` is 1 if there is a standby leader node, 0 otherwise
* `is_leader` is 1 if there is a "classical" leader node, 0 otherwise
Options:
--help Show this message and exit.
@ -169,10 +209,27 @@ Usage: check_patroni cluster_has_replica [OPTIONS]
Check if the cluster has healthy replicas and/or if some are sync standbies
For patroni (and this check):
* a replica is `streaming` if the `pg_stat_wal_receiver` says so.
* a replica is `in archive recovery`, if it's not `streaming` and has a `restore_command`.
A healthy replica:
* is in running or streaming state (V3.0.4)
* has a replica or sync_standby role
* has a lag lower or equal to max_lag
* has a `replica` or `sync_standby` role
* has the same timeline as the leader and
* is in `running` state (patroni < V3.0.4)
* is in `streaming` or `in archive recovery` state (patroni >= V3.0.4)
* has a lag lower than or equal to `max_lag`
Please note that a replica `in archive recovery` could be stuck because the
WAL files are not available or applicable (the server's timeline has diverged from
the leader's). We already detect the latter but we will miss the former.
Therefore, it's preferable to check for the lag in addition to the healthy
state if you rely on log shipping to help lagging standbies to catch up.
Since we require a healthy replica to have the same timeline as the leader,
it's possible that we raise alerts when the cluster is performing a
switchover or failover and the standbies are in the process of catching up
with the new leader. The alert shouldn't last long.
Check:
* `OK`: if the healthy_replica count and their lag are compatible with the replica count threshold.
@ -182,8 +239,9 @@ Usage: check_patroni cluster_has_replica [OPTIONS]
Perfdata:
* healthy_replica & unhealthy_replica count
* the number of sync_replica, they are included in the previous count
* the lag of each replica labelled with "member name"_lag
* a boolean to tell if the node is a sync stanbdy labelled with "member name"_sync
* the lag of each replica labelled with "member name"_lag
* the timeline of each replica labelled with "member name"_timeline
* a boolean to tell if the node is a sync standby labelled with "member name"_sync
Options:
-w, --warning TEXT Warning threshold for the number of healthy replica
@ -241,26 +299,37 @@ Usage: check_patroni cluster_node_count [OPTIONS]
Count the number of nodes in the cluster.
The role refers to the role of the server in the cluster. Possible values
are:
* master or leader
* replica
* standby_leader
* sync_standby
* demoted
* promoted
* uninitialized
The state refers to the state of PostgreSQL. Possible values are:
* initializing new cluster, initdb failed
* running custom bootstrap script, custom bootstrap failed
* starting, start failed
* restarting, restart failed
* running, streaming (for a replica V3.0.4)
* running, streaming, in archive recovery
* stopping, stopped, stop failed
* creating replica
* crashed
The role refers to the role of the server in the cluster. Possible values
are:
* master or leader (V3.0.0+)
* replica
* demoted
* promoted
* uninitialized
The "healthy" checks only ensures that:
* a leader has the running state
* a standby_leader has the running or streaming (V3.0.4) state
* a replica or sync-standby has the running or streaming (V3.0.4) state
Since we don't check the lag or timeline, "in archive recovery" is not
considered a valid state for this service. See cluster_has_leader and
cluster_has_replica for specialized checks.
Check:
* Compares the number of nodes against the normal and healthy (running + streaming) nodes warning and critical thresholds.
* Compares the number of nodes against the normal and healthy nodes warning and critical thresholds.
* `OK`: If they are not provided.
Perfdata:
@ -307,7 +376,7 @@ Usage: check_patroni node_is_pending_restart [OPTIONS]
Check if the node is in pending restart state.
This situation can arise if the configuration has been modified but requiers
This situation can arise if the configuration has been modified but requires
a restart of PostgreSQL to take effect.
Check:
@ -368,12 +437,21 @@ Options:
```
Usage: check_patroni node_is_replica [OPTIONS]
Check if the node is a running replica with no noloadbalance tag.
Check if the node is a replica with no noloadbalance tag.
It is possible to check if the node is synchronous or asynchronous. If
nothing is specified any kind of replica is accepted. When checking for a
nothing is specified any kind of replica is accepted. When checking for a
synchronous replica, it's not possible to specify a lag.
This service uses the following Patroni endpoints: replica, asynchronous
and synchronous. The first two implement the `lag` tag. For these endpoints,
the state of a replica node doesn't reflect the replication state
(`streaming` or `in archive recovery`); we only know if it's `running`. The
timeline is also not checked.
Therefore, if a cluster is using asynchronous replication, it is recommended
to check for the lag to detect a divergence as soon as possible.
Check:
* `OK`: if the node is a running replica with no noloadbalance tag and the lag is under the maximum threshold.
* `CRITICAL`: otherwise

38
RELEASE.md Normal file
View file

@ -0,0 +1,38 @@
# Release HOW TO
## Preparatory changes
* Review the **Unreleased** section, if any, in `CHANGELOG.md`, possibly adding
any missing items from closed issues, merged pull requests, or the git
history directly[^git-changes],
* Rename the **Unreleased** section according to the version to be released,
with a date,
* Bump the version in `check_patroni/__init__.py`,
* Rebuild the `README.md` (`cd docs; ./make_readme.sh`),
* Commit these changes (either on a dedicated branch, before submitting a pull
request or directly on the `master` branch) with the commit message `release
X.Y.Z`.
* Then, when changes landed in the `master` branch, create an annotated (and
possibly signed) tag, as `git tag -a [-s] -m 'release X.Y.Z' vX.Y.Z`,
and,
* Push with `--follow-tags`.
[^git-changes]: Use `git log $(git describe --tags --abbrev=0).. --format=%s
--reverse` to get commits from the previous tag.
## PyPI package
The package is generated and uploaded to pypi when a `v*` tag is created (see
`.github/workflow/publish.yml`).
Alternatively, the release can be done manually with:
```
tox -e build
tox -e upload
```
## GitHub release
Draft a new release from the release page, choosing the tag just pushed and
copy the relevant change log section as a description.

check_patroni/__init__.py
View file

@ -1,5 +1,5 @@
import logging
__version__ = "1.0.0"
__version__ = "2.0.0"
_log: logging.Logger = logging.getLogger(__name__)

check_patroni/cli.py
View file

@ -226,29 +226,40 @@ def cluster_node_count(
) -> None:
"""Count the number of nodes in the cluster.
\b
The state refers to the state of PostgreSQL. Possible values are:
* initializing new cluster, initdb failed
* running custom bootstrap script, custom bootstrap failed
* starting, start failed
* restarting, restart failed
* running, streaming (for a replica V3.0.4)
* stopping, stopped, stop failed
* creating replica
* crashed
\b
The role refers to the role of the server in the cluster. Possible values
are:
* master or leader (V3.0.0+)
* master or leader
* replica
* standby_leader
* sync_standby
* demoted
* promoted
* uninitialized
\b
The state refers to the state of PostgreSQL. Possible values are:
* initializing new cluster, initdb failed
* running custom bootstrap script, custom bootstrap failed
* starting, start failed
* restarting, restart failed
* running, streaming, in archive recovery
* stopping, stopped, stop failed
* creating replica
* crashed
\b
The "healthy" checks only ensures that:
* a leader has the running state
* a standby_leader has the running or streaming (V3.0.4) state
* a replica or sync-standby has the running or streaming (V3.0.4) state
Since we don't check the lag or timeline, "in archive recovery" is not considered a valid state
for this service. See cluster_has_leader and cluster_has_replica for specialized checks.
\b
Check:
* Compares the number of nodes against the normal and healthy (running + streaming) nodes warning and critical thresholds.
* Compares the number of nodes against the normal and healthy nodes warning and critical thresholds.
* `OK`: If they are not provided.
\b
@ -285,17 +296,38 @@ def cluster_has_leader(ctx: click.Context) -> None:
This check applies to any kind of leaders including standby leaders.
A leader is a node with the "leader" role and a "running" state.
A standby leader is a node with a "standby_leader" role and a "streaming"
or "in archive recovery" state. Please note that log shipping could be
stuck because the WAL are not available or applicable. Patroni doesn't
provide information about the origin cluster (timeline or lag), so we
cannot check if there is a problem in that particular case. That's why we
issue a warning when the node is "in archive recovery". We suggest using
other supervision tools to do this (eg. check_pgactivity).
\b
Check:
* `OK`: if there is a leader node.
* `CRITICAL`: otherwise
* `WARNING`: if there is a standby leader in archive recovery.
* `CRITICAL`: otherwise.
\b
Perfdata:
* `has_leader` is 1 if there is any kind of leader node, 0 otherwise
* `is_standby_leader_in_arc_rec` is 1 if the standby leader node is "in
archive recovery", 0 otherwise
* `is_standby_leader` is 1 if there is a standby leader node, 0 otherwise
* `is_leader` is 1 if there is a "classical" leader node, 0 otherwise
Perfdata: `has_leader` is 1 if there is a leader node, 0 otherwise
"""
check = nagiosplugin.Check()
check.add(
ClusterHasLeader(ctx.obj.connection_info),
nagiosplugin.ScalarContext("has_leader", None, "@0:0"),
nagiosplugin.ScalarContext("is_standby_leader_in_arc_rec", "@1:1", None),
nagiosplugin.ScalarContext("is_leader", None, None),
nagiosplugin.ScalarContext("is_standby_leader", None, None),
ClusterHasLeaderSummary(),
)
check.main(verbose=ctx.obj.verbose, timeout=ctx.obj.timeout)
@ -341,11 +373,29 @@ def cluster_has_replica(
) -> None:
"""Check if the cluster has healthy replicas and/or if some are sync standbies
\b
For patroni (and this check):
* a replica is `streaming` if the `pg_stat_wal_receiver` says so.
* a replica is `in archive recovery`, if it's not `streaming` and has a `restore_command`.
\b
A healthy replica:
* is in running or streaming state (V3.0.4)
* has a replica or sync_standby role
* has a lag lower or equal to max_lag
* has a `replica` or `sync_standby` role
* has the same timeline as the leader and
* is in `running` state (patroni < V3.0.4)
* is in `streaming` or `in archive recovery` state (patroni >= V3.0.4)
* has a lag lower than or equal to `max_lag`
Please note that a replica `in archive recovery` could be stuck because the WAL
files are not available or applicable (the server's timeline has diverged from the
leader's). We already detect the latter but we will miss the former.
Therefore, it's preferable to check for the lag in addition to the healthy
state if you rely on log shipping to help lagging standbies to catch up.
Since we require a healthy replica to have the same timeline as the
leader, it's possible that we raise alerts when the cluster is performing a
switchover or failover and the standbies are in the process of catching up with
the new leader. The alert shouldn't last long.
\b
Check:
@ -357,8 +407,9 @@ def cluster_has_replica(
Perfdata:
* healthy_replica & unhealthy_replica count
* the number of sync_replica, they are included in the previous count
* the lag of each replica labelled with "member name"_lag
* a boolean to tell if the node is a sync stanbdy labelled with "member name"_sync
* the lag of each replica labelled with "member name"_lag
* the timeline of each replica labelled with "member name"_timeline
* a boolean to tell if the node is a sync standby labelled with "member name"_sync
"""
tmax_lag = size_to_byte(max_lag) if max_lag is not None else None
@ -377,6 +428,7 @@ def cluster_has_replica(
),
nagiosplugin.ScalarContext("unhealthy_replica"),
nagiosplugin.ScalarContext("replica_lag"),
nagiosplugin.ScalarContext("replica_timeline"),
nagiosplugin.ScalarContext("replica_sync"),
)
check.main(verbose=ctx.obj.verbose, timeout=ctx.obj.timeout)
@ -569,10 +621,20 @@ def node_is_leader(ctx: click.Context, check_standby_leader: bool) -> None:
def node_is_replica(
ctx: click.Context, max_lag: str, check_is_sync: bool, check_is_async: bool
) -> None:
"""Check if the node is a running replica with no noloadbalance tag.
"""Check if the node is a replica with no noloadbalance tag.
It is possible to check if the node is synchronous or asynchronous. If nothing is specified any kind of replica is accepted.
When checking for a synchronous replica, it's not possible to specify a lag.
It is possible to check if the node is synchronous or asynchronous. If
nothing is specified any kind of replica is accepted. When checking for a
synchronous replica, it's not possible to specify a lag.
This service uses the following Patroni endpoints: replica, asynchronous
and synchronous. The first two implement the `lag` tag. For these endpoints,
the state of a replica node doesn't reflect the replication state
(`streaming` or `in archive recovery`); we only know if it's `running`. The
timeline is also not checked.
Therefore, if a cluster is using asynchronous replication, it is
recommended to check for the lag to detect a divergence as soon as possible.
\b
Check:
@ -610,7 +672,7 @@ def node_is_pending_restart(ctx: click.Context) -> None:
"""Check if the node is in pending restart state.
This situation can arise if the configuration has been modified but
requiers a restart of PostgreSQL to take effect.
requires a restart of PostgreSQL to take effect.
\b
Check:

check_patroni/cluster.py
View file

@ -1,7 +1,7 @@
import hashlib
import json
from collections import Counter
from typing import Iterable, Union
from typing import Any, Iterable, Union
import nagiosplugin
@ -14,25 +14,52 @@ def replace_chars(text: str) -> str:
class ClusterNodeCount(PatroniResource):
def probe(self: "ClusterNodeCount") -> Iterable[nagiosplugin.Metric]:
def probe(self) -> Iterable[nagiosplugin.Metric]:
def debug_member(member: Any, health: str) -> None:
_log.debug(
"Node %(node_name)s is %(health)s: role %(role)s state %(state)s.",
{
"node_name": member["name"],
"health": health,
"role": member["role"],
"state": member["state"],
},
)
# get the cluster info
item_dict = self.rest_api("cluster")
role_counters: Counter[str] = Counter()
roles = []
status_counters: Counter[str] = Counter()
statuses = []
healthy_member = 0
for member in item_dict["members"]:
roles.append(replace_chars(member["role"]))
statuses.append(replace_chars(member["state"]))
state, role = member["state"], member["role"]
roles.append(replace_chars(role))
statuses.append(replace_chars(state))
if role == "leader" and state == "running":
healthy_member += 1
debug_member(member, "healthy")
continue
if role in ["standby_leader", "replica", "sync_standby"] and (
(self.has_detailed_states() and state == "streaming")
or (not self.has_detailed_states() and state == "running")
):
healthy_member += 1
debug_member(member, "healthy")
continue
debug_member(member, "unhealthy")
role_counters.update(roles)
status_counters.update(statuses)
# The actual check: members, healthy_members
yield nagiosplugin.Metric("members", len(item_dict["members"]))
yield nagiosplugin.Metric(
"healthy_members",
status_counters["running"] + status_counters.get("streaming", 0),
)
yield nagiosplugin.Metric("healthy_members", healthy_member)
# The performance data : role
for role in role_counters:
@ -48,74 +75,149 @@ class ClusterNodeCount(PatroniResource):
class ClusterHasLeader(PatroniResource):
def probe(self: "ClusterHasLeader") -> Iterable[nagiosplugin.Metric]:
def probe(self) -> Iterable[nagiosplugin.Metric]:
item_dict = self.rest_api("cluster")
is_leader_found = False
is_standby_leader_found = False
is_standby_leader_in_arc_rec = False
for member in item_dict["members"]:
if (
member["role"] in ("leader", "standby_leader")
and member["state"] == "running"
):
if member["role"] == "leader" and member["state"] == "running":
is_leader_found = True
break
if member["role"] == "standby_leader":
if member["state"] not in ["streaming", "in archive recovery"]:
# for patroni >= 3.0.4 any state would be wrong
# for patroni < 3.0.4 a state different from running would be wrong
if self.has_detailed_states() or member["state"] != "running":
continue
if member["state"] in ["in archive recovery"]:
is_standby_leader_in_arc_rec = True
is_standby_leader_found = True
break
return [
nagiosplugin.Metric(
"has_leader",
1 if is_leader_found or is_standby_leader_found else 0,
),
nagiosplugin.Metric(
"is_standby_leader_in_arc_rec",
1 if is_standby_leader_in_arc_rec else 0,
),
nagiosplugin.Metric(
"is_standby_leader",
1 if is_standby_leader_found else 0,
),
nagiosplugin.Metric(
"is_leader",
1 if is_leader_found else 0,
)
),
]
class ClusterHasLeaderSummary(nagiosplugin.Summary):
def ok(self: "ClusterHasLeaderSummary", results: nagiosplugin.Result) -> str:
def ok(self, results: nagiosplugin.Result) -> str:
return "The cluster has a running leader."
@handle_unknown
def problem(self: "ClusterHasLeaderSummary", results: nagiosplugin.Result) -> str:
return "The cluster has no running leader."
def problem(self, results: nagiosplugin.Result) -> str:
return "The cluster has no running leader or the standby leader is in archive recovery."
class ClusterHasReplica(PatroniResource):
def __init__(
self: "ClusterHasReplica",
connection_info: ConnectionInfo,
max_lag: Union[int, None],
):
def __init__(self, connection_info: ConnectionInfo, max_lag: Union[int, None]):
super().__init__(connection_info)
self.max_lag = max_lag
def probe(self: "ClusterHasReplica") -> Iterable[nagiosplugin.Metric]:
item_dict = self.rest_api("cluster")
def probe(self) -> Iterable[nagiosplugin.Metric]:
def debug_member(member: Any, health: str) -> None:
_log.debug(
"Node %(node_name)s is %(health)s: lag %(lag)s, state %(state)s, tl %(tl)s.",
{
"node_name": member["name"],
"health": health,
"lag": member["lag"],
"state": member["state"],
"tl": member["timeline"],
},
)
# get the cluster info
cluster_item_dict = self.rest_api("cluster")
replicas = []
healthy_replica = 0
unhealthy_replica = 0
sync_replica = 0
for member in item_dict["members"]:
# FIXME are there other acceptable states
leader_tl = None
# Look for replicas
for member in cluster_item_dict["members"]:
if member["role"] in ["replica", "sync_standby"]:
# patroni 3.0.4 changed the standby state from running to streaming
if (
member["state"] in ["running", "streaming"]
and member["lag"] != "unknown"
):
if member["lag"] == "unknown":
# This could happen if the node is stopped
# nagiosplugin doesn't handle strings in perfstats
# so we have to ditch all the stats in that case
debug_member(member, "unhealthy")
unhealthy_replica += 1
continue
else:
replicas.append(
{
"name": member["name"],
"lag": member["lag"],
"timeline": member["timeline"],
"sync": 1 if member["role"] == "sync_standby" else 0,
}
)
if member["role"] == "sync_standby":
sync_replica += 1
# Get the leader tl if we haven't already
if leader_tl is None:
# If there are no leaders, we will loop here for all
# members because leader_tl will remain None. It's not
# a big deal since having no leader is rare.
for tmember in cluster_item_dict["members"]:
if tmember["role"] == "leader":
leader_tl = int(tmember["timeline"])
break
if self.max_lag is None or self.max_lag >= int(member["lag"]):
healthy_replica += 1
continue
unhealthy_replica += 1
_log.debug(
"Patroni's leader_timeline is %(leader_tl)s",
{
"leader_tl": leader_tl,
},
)
# Test for an unhealthy replica
if (
self.has_detailed_states()
and not (
member["state"] in ["streaming", "in archive recovery"]
and int(member["timeline"]) == leader_tl
)
) or (
not self.has_detailed_states()
and not (
member["state"] == "running"
and int(member["timeline"]) == leader_tl
)
):
debug_member(member, "unhealthy")
unhealthy_replica += 1
continue
if member["role"] == "sync_standby":
sync_replica += 1
if self.max_lag is None or self.max_lag >= int(member["lag"]):
debug_member(member, "healthy")
healthy_replica += 1
else:
debug_member(member, "unhealthy")
unhealthy_replica += 1
# The actual check
yield nagiosplugin.Metric("healthy_replica", healthy_replica)
@ -127,6 +229,11 @@ class ClusterHasReplica(PatroniResource):
yield nagiosplugin.Metric(
f"{replica['name']}_lag", replica["lag"], context="replica_lag"
)
yield nagiosplugin.Metric(
f"{replica['name']}_timeline",
replica["timeline"],
context="replica_timeline",
)
yield nagiosplugin.Metric(
f"{replica['name']}_sync", replica["sync"], context="replica_sync"
)
@ -140,7 +247,7 @@ class ClusterHasReplica(PatroniResource):
class ClusterConfigHasChanged(PatroniResource):
def __init__(
self: "ClusterConfigHasChanged",
self,
connection_info: ConnectionInfo,
config_hash: str, # Always contains the old hash
state_file: str, # Only used to update the hash in the state_file (when needed)
@ -151,7 +258,7 @@ class ClusterConfigHasChanged(PatroniResource):
self.config_hash = config_hash
self.save = save
def probe(self: "ClusterConfigHasChanged") -> Iterable[nagiosplugin.Metric]:
def probe(self) -> Iterable[nagiosplugin.Metric]:
item_dict = self.rest_api("config")
new_hash = hashlib.md5(json.dumps(item_dict).encode()).hexdigest()
@ -183,23 +290,21 @@ class ClusterConfigHasChanged(PatroniResource):
class ClusterConfigHasChangedSummary(nagiosplugin.Summary):
def __init__(self: "ClusterConfigHasChangedSummary", config_hash: str) -> None:
def __init__(self, config_hash: str) -> None:
self.old_config_hash = config_hash
# Note: It would be helpful to display the old / new hash here. Unfortunately, it's not a metric.
# So we only have the old / expected one.
def ok(self: "ClusterConfigHasChangedSummary", results: nagiosplugin.Result) -> str:
def ok(self, results: nagiosplugin.Result) -> str:
return f"The hash of patroni's dynamic configuration has not changed ({self.old_config_hash})."
@handle_unknown
def problem(
self: "ClusterConfigHasChangedSummary", results: nagiosplugin.Result
) -> str:
def problem(self, results: nagiosplugin.Result) -> str:
return f"The hash of patroni's dynamic configuration has changed. The old hash was {self.old_config_hash}."
class ClusterIsInMaintenance(PatroniResource):
def probe(self: "ClusterIsInMaintenance") -> Iterable[nagiosplugin.Metric]:
def probe(self) -> Iterable[nagiosplugin.Metric]:
item_dict = self.rest_api("cluster")
# The actual check
@ -212,7 +317,7 @@ class ClusterIsInMaintenance(PatroniResource):
class ClusterHasScheduledAction(PatroniResource):
def probe(self: "ClusterIsInMaintenance") -> Iterable[nagiosplugin.Metric]:
def probe(self) -> Iterable[nagiosplugin.Metric]:
item_dict = self.rest_api("cluster")
scheduled_switchover = 0

check_patroni/node.py
View file

@ -7,7 +7,7 @@ from .types import APIError, ConnectionInfo, PatroniResource, handle_unknown
class NodeIsPrimary(PatroniResource):
def probe(self: "NodeIsPrimary") -> Iterable[nagiosplugin.Metric]:
def probe(self) -> Iterable[nagiosplugin.Metric]:
try:
self.rest_api("primary")
except APIError:
@ -16,24 +16,22 @@ class NodeIsPrimary(PatroniResource):
class NodeIsPrimarySummary(nagiosplugin.Summary):
def ok(self: "NodeIsPrimarySummary", results: nagiosplugin.Result) -> str:
def ok(self, results: nagiosplugin.Result) -> str:
return "This node is the primary with the leader lock."
@handle_unknown
def problem(self: "NodeIsPrimarySummary", results: nagiosplugin.Result) -> str:
def problem(self, results: nagiosplugin.Result) -> str:
return "This node is not the primary with the leader lock."
class NodeIsLeader(PatroniResource):
def __init__(
self: "NodeIsLeader",
connection_info: ConnectionInfo,
check_is_standby_leader: bool,
self, connection_info: ConnectionInfo, check_is_standby_leader: bool
) -> None:
super().__init__(connection_info)
self.check_is_standby_leader = check_is_standby_leader
def probe(self: "NodeIsLeader") -> Iterable[nagiosplugin.Metric]:
def probe(self) -> Iterable[nagiosplugin.Metric]:
apiname = "leader"
if self.check_is_standby_leader:
apiname = "standby-leader"
@ -46,26 +44,23 @@ class NodeIsLeader(PatroniResource):
class NodeIsLeaderSummary(nagiosplugin.Summary):
def __init__(
self: "NodeIsLeaderSummary",
check_is_standby_leader: bool,
) -> None:
def __init__(self, check_is_standby_leader: bool) -> None:
if check_is_standby_leader:
self.leader_kind = "standby leader"
else:
self.leader_kind = "leader"
def ok(self: "NodeIsLeaderSummary", results: nagiosplugin.Result) -> str:
def ok(self, results: nagiosplugin.Result) -> str:
return f"This node is a {self.leader_kind} node."
@handle_unknown
def problem(self: "NodeIsLeaderSummary", results: nagiosplugin.Result) -> str:
def problem(self, results: nagiosplugin.Result) -> str:
return f"This node is not a {self.leader_kind} node."
class NodeIsReplica(PatroniResource):
def __init__(
self: "NodeIsReplica",
self,
connection_info: ConnectionInfo,
max_lag: str,
check_is_sync: bool,
@ -76,7 +71,7 @@ class NodeIsReplica(PatroniResource):
self.check_is_sync = check_is_sync
self.check_is_async = check_is_async
def probe(self: "NodeIsReplica") -> Iterable[nagiosplugin.Metric]:
def probe(self) -> Iterable[nagiosplugin.Metric]:
try:
if self.check_is_sync:
api_name = "synchronous"
@ -95,12 +90,7 @@ class NodeIsReplica(PatroniResource):
class NodeIsReplicaSummary(nagiosplugin.Summary):
def __init__(
self: "NodeIsReplicaSummary",
lag: str,
check_is_sync: bool,
check_is_async: bool,
) -> None:
def __init__(self, lag: str, check_is_sync: bool, check_is_async: bool) -> None:
self.lag = lag
if check_is_sync:
self.replica_kind = "synchronous replica"
@ -109,7 +99,7 @@ class NodeIsReplicaSummary(nagiosplugin.Summary):
else:
self.replica_kind = "replica"
def ok(self: "NodeIsReplicaSummary", results: nagiosplugin.Result) -> str:
def ok(self, results: nagiosplugin.Result) -> str:
if self.lag is None:
return (
f"This node is a running {self.replica_kind} with no noloadbalance tag."
@ -117,14 +107,14 @@ class NodeIsReplicaSummary(nagiosplugin.Summary):
return f"This node is a running {self.replica_kind} with no noloadbalance tag and the lag is under {self.lag}."
@handle_unknown
def problem(self: "NodeIsReplicaSummary", results: nagiosplugin.Result) -> str:
def problem(self, results: nagiosplugin.Result) -> str:
if self.lag is None:
return f"This node is not a running {self.replica_kind} with no noloadbalance tag."
return f"This node is not a running {self.replica_kind} with no noloadbalance tag and a lag under {self.lag}."
class NodeIsPendingRestart(PatroniResource):
def probe(self: "NodeIsPendingRestart") -> Iterable[nagiosplugin.Metric]:
def probe(self) -> Iterable[nagiosplugin.Metric]:
item_dict = self.rest_api("patroni")
is_pending_restart = item_dict.get("pending_restart", False)
@ -137,19 +127,17 @@ class NodeIsPendingRestart(PatroniResource):
class NodeIsPendingRestartSummary(nagiosplugin.Summary):
def ok(self: "NodeIsPendingRestartSummary", results: nagiosplugin.Result) -> str:
def ok(self, results: nagiosplugin.Result) -> str:
return "This node doesn't have the pending restart flag."
@handle_unknown
def problem(
self: "NodeIsPendingRestartSummary", results: nagiosplugin.Result
) -> str:
def problem(self, results: nagiosplugin.Result) -> str:
return "This node has the pending restart flag."
class NodeTLHasChanged(PatroniResource):
def __init__(
self: "NodeTLHasChanged",
self,
connection_info: ConnectionInfo,
timeline: str, # Always contains the old timeline
state_file: str, # Only used to update the timeline in the state_file (when needed)
@ -160,7 +148,7 @@ class NodeTLHasChanged(PatroniResource):
self.timeline = timeline
self.save = save
def probe(self: "NodeTLHasChanged") -> Iterable[nagiosplugin.Metric]:
def probe(self) -> Iterable[nagiosplugin.Metric]:
item_dict = self.rest_api("patroni")
new_tl = item_dict["timeline"]
@ -193,27 +181,23 @@ class NodeTLHasChanged(PatroniResource):
class NodeTLHasChangedSummary(nagiosplugin.Summary):
def __init__(self: "NodeTLHasChangedSummary", timeline: str) -> None:
def __init__(self, timeline: str) -> None:
self.timeline = timeline
def ok(self: "NodeTLHasChangedSummary", results: nagiosplugin.Result) -> str:
def ok(self, results: nagiosplugin.Result) -> str:
return f"The timeline is still {self.timeline}."
@handle_unknown
def problem(self: "NodeTLHasChangedSummary", results: nagiosplugin.Result) -> str:
def problem(self, results: nagiosplugin.Result) -> str:
return f"The expected timeline was {self.timeline} got {results['timeline'].metric}."
class NodePatroniVersion(PatroniResource):
def __init__(
self: "NodePatroniVersion",
connection_info: ConnectionInfo,
patroni_version: str,
) -> None:
def __init__(self, connection_info: ConnectionInfo, patroni_version: str) -> None:
super().__init__(connection_info)
self.patroni_version = patroni_version
def probe(self: "NodePatroniVersion") -> Iterable[nagiosplugin.Metric]:
def probe(self) -> Iterable[nagiosplugin.Metric]:
item_dict = self.rest_api("patroni")
version = item_dict["patroni"]["version"]
@ -232,21 +216,21 @@ class NodePatroniVersion(PatroniResource):
class NodePatroniVersionSummary(nagiosplugin.Summary):
def __init__(self: "NodePatroniVersionSummary", patroni_version: str) -> None:
def __init__(self, patroni_version: str) -> None:
self.patroni_version = patroni_version
def ok(self: "NodePatroniVersionSummary", results: nagiosplugin.Result) -> str:
def ok(self, results: nagiosplugin.Result) -> str:
return f"Patroni's version is {self.patroni_version}."
@handle_unknown
def problem(self: "NodePatroniVersionSummary", results: nagiosplugin.Result) -> str:
def problem(self, results: nagiosplugin.Result) -> str:
# FIXME find a way to make the following work, check if perf data can be strings
# return f"The expected patroni version was {self.patroni_version} got {results['patroni_version'].metric}."
return f"Patroni's version is not {self.patroni_version}."
class NodeIsAlive(PatroniResource):
def probe(self: "NodeIsAlive") -> Iterable[nagiosplugin.Metric]:
def probe(self) -> Iterable[nagiosplugin.Metric]:
try:
self.rest_api("liveness")
except APIError:
@ -255,9 +239,9 @@ class NodeIsAlive(PatroniResource):
class NodeIsAliveSummary(nagiosplugin.Summary):
def ok(self: "NodeIsAliveSummary", results: nagiosplugin.Result) -> str:
def ok(self, results: nagiosplugin.Result) -> str:
return "This node is alive (patroni is running)."
@handle_unknown
def problem(self: "NodeIsAliveSummary", results: nagiosplugin.Result) -> str:
def problem(self, results: nagiosplugin.Result) -> str:
return "This node is not alive (patroni is not running)."

check_patroni/types.py
View file

@ -1,3 +1,5 @@
import json
from functools import lru_cache
from typing import Any, Callable, List, Optional, Tuple, Union
from urllib.parse import urlparse
@ -28,11 +30,11 @@ class Parameters:
verbose: int
@attr.s(auto_attribs=True, slots=True)
@attr.s(auto_attribs=True, eq=False, slots=True)
class PatroniResource(nagiosplugin.Resource):
conn_info: ConnectionInfo
def rest_api(self: "PatroniResource", service: str) -> Any:
def rest_api(self, service: str) -> Any:
"""Try to connect to all the provided endpoints for the requested service"""
for endpoint in self.conn_info.endpoints:
cert: Optional[Union[Tuple[str, str], str]] = None
@ -71,10 +73,31 @@ class PatroniResource(nagiosplugin.Resource):
try:
return r.json()
except requests.exceptions.JSONDecodeError:
except (json.JSONDecodeError, ValueError):
return None
raise nagiosplugin.CheckError("Connection failed for all provided endpoints")
@lru_cache(maxsize=None)
def has_detailed_states(self) -> bool:
# get patroni's version to find out if the "streaming" and "in archive recovery" states are available
patroni_item_dict = self.rest_api("patroni")
if tuple(
int(v) for v in patroni_item_dict["patroni"]["version"].split(".", 2)
) >= (3, 0, 4):
_log.debug(
"Patroni's version is %(version)s, more detailed states can be used to check for the health of replicas.",
{"version": patroni_item_dict["patroni"]["version"]},
)
return True
_log.debug(
"Patroni's version is %(version)s, the running state and the timelines must be used to check for the health of replicas.",
{"version": patroni_item_dict["patroni"]["version"]},
)
return False
HandleUnknown = Callable[[nagiosplugin.Summary, nagiosplugin.Results], Any]

docs/make_readme.sh
View file

@ -42,7 +42,7 @@ $ pip install git+https://github.com/dalibo/check_patroni.git
check_patroni works on python 3.6, we keep it that way because patroni also
supports it and there are still lots of RH 7 variants around. That being said
python 3.6 has been EOL for age and there is no support for it in the github
python 3.6 has been EOL for ages and there is no support for it in the github
CI.
## Support
@ -80,8 +80,8 @@ A match is found when: `start <= VALUE <= end`.
For example, the following command will raise:
* a warning if there is less than 1 nodes, wich can be translated to outside of range [2;+INF[
* a critical if there are no nodes, wich can be translated to outside of range [1;+INF[
* a warning if there are fewer than 2 nodes, which can be translated to outside of range [2;+INF[
* a critical if there are no nodes, which can be translated to outside of range [1;+INF[
```
check_patroni -e https://10.20.199.3:8008 cluster_has_replica --warning 2: --critical 1:
@ -97,6 +97,30 @@ Several options are available:
* `--cert_file`: your certificate or the concatenation of your certificate and private key
* `--key_file`: your private key (optional)
## Shell completion
We use the [click] library, which supports shell completion natively.
Shell completion can be added by typing the following command or adding it to
a file specific to your shell of choice.
* for Bash (add to `~/.bashrc`):
```
eval "$(_CHECK_PATRONI_COMPLETE=bash_source check_patroni)"
```
* for Zsh (add to `~/.zshrc`):
```
eval "$(_CHECK_PATRONI_COMPLETE=zsh_source check_patroni)"
```
* for Fish (add to `~/.config/fish/completions/check_patroni.fish`):
```
eval "$(_CHECK_PATRONI_COMPLETE=fish_source check_patroni)"
```
Please note that shell completion is not supported for all shell versions; for
example, only Bash versions 4.4 and newer are supported.
[click]: https://click.palletsprojects.com/en/8.1.x/shell-completion/
_EOF_
readme
readme "## Cluster services"

mypy.ini
View file

@ -1,4 +1,5 @@
[mypy]
files = .
show_error_codes = true
strict = true
exclude = build/

View file

@ -4,7 +4,7 @@ isort
flake8
mypy==0.961
pytest
pytest-mock
pytest-cov
types-requests
setuptools
tox

setup.py
View file

@ -41,12 +41,12 @@ setup(
"attrs >= 17, !=21.1",
"requests",
"nagiosplugin >= 1.3.2",
"click >= 8.0.1",
"click >= 7.1",
],
extras_require={
"test": [
"pytest",
"pytest-mock",
"importlib_metadata; python_version < '3.8'",
"pytest >= 6.0.2",
],
},
entry_points={
@ -56,4 +56,3 @@ setup(
},
zip_safe=False,
)

tests/__init__.py
View file

@ -0,0 +1,65 @@
import json
import logging
import shutil
from contextlib import contextmanager
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler
from pathlib import Path
from typing import Any, Iterator, Mapping, Union
logger = logging.getLogger(__name__)
class PatroniAPI(HTTPServer):
def __init__(self, directory: Path, *, datadir: Path) -> None:
self.directory = directory
self.datadir = datadir
handler_cls = partial(SimpleHTTPRequestHandler, directory=str(directory))
super().__init__(("", 0), handler_cls)
def serve_forever(self, *args: Any) -> None:
logger.info(
"starting fake Patroni API at %s (directory=%s)",
self.endpoint,
self.directory,
)
return super().serve_forever(*args)
@property
def endpoint(self) -> str:
return f"http://{self.server_name}:{self.server_port}"
@contextmanager
def routes(self, mapping: Mapping[str, Union[Path, str]]) -> Iterator[None]:
"""Temporarily install specified files in served directory, thus
building "routes" from given mapping.
The 'mapping' defines target route paths as keys and files to be
installed in served directory as values. Mapping values of type 'str'
are assumed be relative file path to the 'datadir'.
"""
for route_path, fpath in mapping.items():
if isinstance(fpath, str):
fpath = self.datadir / fpath
shutil.copy(fpath, self.directory / route_path)
try:
yield None
finally:
for fname in mapping:
(self.directory / fname).unlink()
def cluster_api_set_replica_running(in_json: Path, target_dir: Path) -> Path:
# starting from 3.0.4 the state of replicas is streaming or in archive recovery
# instead of running
with in_json.open() as f:
js = json.load(f)
for node in js["members"]:
if node["role"] in ["replica", "sync_standby", "standby_leader"]:
if node["state"] in ["streaming", "in archive recovery"]:
node["state"] = "running"
assert target_dir.is_dir()
out_json = target_dir / in_json.name
with out_json.open("w") as f:
json.dump(js, f)
return out_json

tests/conftest.py
View file

@ -1,12 +1,76 @@
def pytest_addoption(parser):
"""
Add CLI options to `pytest` to pass those options to the test cases.
These options are used in `pytest_generate_tests`.
"""
parser.addoption("--use-old-replica-state", action="store_true", default=False)
import logging
import sys
from pathlib import Path
from threading import Thread
from typing import Any, Iterator, Tuple
from unittest.mock import patch
if sys.version_info >= (3, 8):
from importlib.metadata import version as metadata_version
else:
from importlib_metadata import version as metadata_version
import pytest
from click.testing import CliRunner
from . import PatroniAPI
logger = logging.getLogger(__name__)
def pytest_generate_tests(metafunc):
metafunc.parametrize(
"use_old_replica_state", [metafunc.config.getoption("use_old_replica_state")]
)
def numversion(pkgname: str) -> Tuple[int, ...]:
version = metadata_version(pkgname)
return tuple(int(v) for v in version.split(".", 3))
if numversion("pytest") >= (6, 2):
TempPathFactory = pytest.TempPathFactory
else:
from _pytest.tmpdir import TempPathFactory
@pytest.fixture(scope="session", autouse=True)
def nagioplugin_runtime_stdout() -> Iterator[None]:
# work around https://github.com/mpounsett/nagiosplugin/issues/24 when
# nagiosplugin is older than 1.3.3
if numversion("nagiosplugin") < (1, 3, 3):
target = "nagiosplugin.runtime.Runtime.stdout"
with patch(target, None):
logger.warning("patching %r", target)
yield None
else:
yield None
@pytest.fixture(
params=[False, True],
ids=lambda v: "new-replica-state" if v else "old-replica-state",
)
def old_replica_state(request: Any) -> Any:
return request.param
@pytest.fixture(scope="session")
def datadir() -> Path:
return Path(__file__).parent / "json"
@pytest.fixture(scope="session")
def patroni_api(
tmp_path_factory: TempPathFactory, datadir: Path
) -> Iterator[PatroniAPI]:
"""A fake HTTP server for the Patroni API serving files from a temporary
directory.
"""
httpd = PatroniAPI(tmp_path_factory.mktemp("api"), datadir=datadir)
t = Thread(target=httpd.serve_forever)
t.start()
yield httpd
httpd.shutdown()
t.join()
@pytest.fixture
def runner() -> CliRunner:
"""A CliRunner with stdout and stderr not mixed."""
return CliRunner(mix_stderr=False)

View file

@ -0,0 +1,33 @@
{
"members": [
{
"name": "srv1",
"role": "standby_leader",
"state": "stopped",
"api_url": "https://10.20.199.3:8008/patroni",
"host": "10.20.199.3",
"port": 5432,
"timeline": 51
},
{
"name": "srv2",
"role": "replica",
"state": "streaming",
"api_url": "https://10.20.199.4:8008/patroni",
"host": "10.20.199.4",
"port": 5432,
"timeline": 51,
"lag": 0
},
{
"name": "srv3",
"role": "replica",
"state": "streaming",
"api_url": "https://10.20.199.5:8008/patroni",
"host": "10.20.199.5",
"port": 5432,
"timeline": 51,
"lag": 0
}
]
}

View file

@ -0,0 +1,33 @@
{
"members": [
{
"name": "srv1",
"role": "standby_leader",
"state": "in archive recovery",
"api_url": "https://10.20.199.3:8008/patroni",
"host": "10.20.199.3",
"port": 5432,
"timeline": 51
},
{
"name": "srv2",
"role": "replica",
"state": "streaming",
"api_url": "https://10.20.199.4:8008/patroni",
"host": "10.20.199.4",
"port": 5432,
"timeline": 51,
"lag": 0
},
{
"name": "srv3",
"role": "replica",
"state": "streaming",
"api_url": "https://10.20.199.5:8008/patroni",
"host": "10.20.199.5",
"port": 5432,
"timeline": 51,
"lag": 0
}
]
}

View file

@ -3,7 +3,7 @@
{
"name": "srv1",
"role": "standby_leader",
"state": "running",
"state": "streaming",
"api_url": "https://10.20.199.3:8008/patroni",
"host": "10.20.199.3",
"port": 5432,

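Presumably these three standby-leader payloads (`stopped`, `in archive recovery`, and the `running` → `streaming` update) map one-to-one onto the CRITICAL, WARNING and OK branches exercised by the `cluster_has_leader` tests further down. Distilled into a sketch — the function name and string results are illustrative, not the module's API:

```python
def standby_leader_status(state: str, old_replica_state: bool) -> str:
    # With the old replica state, the fixtures are rewritten so that only
    # "running" (healthy) or anything else (unhealthy) remains.
    if old_replica_state:
        return "OK" if state == "running" else "CRITICAL"
    if state == "streaming":
        return "OK"
    if state == "in archive recovery":
        return "WARNING"  # standby leader still fetching WALs
    return "CRITICAL"  # e.g. "stopped"
```
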
View file

@ -0,0 +1,35 @@
{
"members": [
{
"name": "srv1",
"role": "replica",
"state": "running",
"api_url": "https://10.20.199.3:8008/patroni",
"host": "10.20.199.3",
"port": 5432,
"timeline": 51,
"lag": 0
},
{
"name": "srv2",
"role": "replica",
"state": "running",
"api_url": "https://10.20.199.4:8008/patroni",
"host": "10.20.199.4",
"port": 5432,
"timeline": 51,
"lag": 0
},
{
"name": "srv3",
"role": "replica",
"state": "running",
"api_url": "https://10.20.199.5:8008/patroni",
"host": "10.20.199.5",
"port": 5432,
"timeline": 51,
"lag": 0
}
]
}

View file

@ -0,0 +1,33 @@
{
"members": [
{
"name": "srv1",
"role": "leader",
"state": "running",
"api_url": "https://10.20.199.3:8008/patroni",
"host": "10.20.199.3",
"port": 5432,
"timeline": 51
},
{
"name": "srv2",
"role": "replica",
"state": "running",
"api_url": "https://10.20.199.4:8008/patroni",
"host": "10.20.199.4",
"port": 5432,
"timeline": 50,
"lag": 1000000
},
{
"name": "srv3",
"role": "replica",
"state": "streaming",
"api_url": "https://10.20.199.5:8008/patroni",
"host": "10.20.199.5",
"port": 5432,
"timeline": 51,
"lag": 0
}
]
}

View file

@ -12,7 +12,7 @@
{
"name": "srv2",
"role": "replica",
"state": "streaming",
"state": "in archive recovery",
"api_url": "https://10.20.199.4:8008/patroni",
"host": "10.20.199.4",
"port": 5432,

View file

@ -0,0 +1,26 @@
{
"state": "running",
"postmaster_start_time": "2021-08-11 07:02:20.732 UTC",
"role": "master",
"server_version": 110012,
"cluster_unlocked": false,
"xlog": {
"location": 1174407088
},
"timeline": 51,
"replication": [
{
"usename": "replicator",
"application_name": "srv1",
"client_addr": "10.20.199.3",
"state": "streaming",
"sync_state": "async",
"sync_priority": 0
}
],
"database_system_identifier": "6965971025273547206",
"patroni": {
"version": "3.0.0",
"scope": "patroni-demo"
}
}

View file

@ -0,0 +1,26 @@
{
"state": "running",
"postmaster_start_time": "2021-08-11 07:02:20.732 UTC",
"role": "master",
"server_version": 110012,
"cluster_unlocked": false,
"xlog": {
"location": 1174407088
},
"timeline": 51,
"replication": [
{
"usename": "replicator",
"application_name": "srv1",
"client_addr": "10.20.199.3",
"state": "streaming",
"sync_state": "async",
"sync_priority": 0
}
],
"database_system_identifier": "6965971025273547206",
"patroni": {
"version": "3.1.0",
"scope": "patroni-demo"
}
}
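
These two `/patroni` payloads are identical except for the reported Patroni version (3.0.0 vs 3.1.0); the fixtures pair the older one with cluster files rewritten to the plain `running` state. A sketch of the kind of version gate this enables — the tuple parsing mirrors `numversion()` from the conftest diff, while the cutoff value shown is illustrative only:

```python
from typing import Any, Tuple


def patroni_numversion(patroni_payload: Any) -> Tuple[int, ...]:
    # {"patroni": {"version": "3.1.0", ...}, ...} -> (3, 1, 0)
    version = patroni_payload["patroni"]["version"]
    return tuple(int(v) for v in version.split(".", 3))


def has_detailed_replica_states(patroni_payload: Any) -> bool:
    # Recent Patroni reports "streaming" / "in archive recovery" on the
    # /cluster endpoint; older versions only report "running".
    return patroni_numversion(patroni_payload) >= (3, 1, 0)  # illustrative cutoff
```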

View file

@ -0,0 +1,33 @@
{
"members": [
{
"name": "srv1",
"role": "standby_leader",
"state": "in archive recovery",
"api_url": "https://10.20.199.3:8008/patroni",
"host": "10.20.199.3",
"port": 5432,
"timeline": 51
},
{
"name": "srv2",
"role": "replica",
"state": "in archive recovery",
"api_url": "https://10.20.199.4:8008/patroni",
"host": "10.20.199.4",
"port": 5432,
"timeline": 51,
"lag": 0
},
{
"name": "srv3",
"role": "replica",
"state": "streaming",
"api_url": "https://10.20.199.5:8008/patroni",
"host": "10.20.199.5",
"port": 5432,
"timeline": 51,
"lag": 0
}
]
}

View file

@ -1,30 +1,20 @@
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import my_mock
from . import PatroniAPI
def test_api_status_code_200(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
my_mock(mocker, "node_is_pending_restart_ok", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "node_is_pending_restart"]
)
def test_api_status_code_200(runner: CliRunner, patroni_api: PatroniAPI) -> None:
with patroni_api.routes({"patroni": "node_is_pending_restart_ok.json"}):
result = runner.invoke(
main, ["-e", patroni_api.endpoint, "node_is_pending_restart"]
)
assert result.exit_code == 0
def test_api_status_code_404(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
my_mock(mocker, "Fake test", 404)
def test_api_status_code_404(runner: CliRunner, patroni_api: PatroniAPI) -> None:
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "node_is_pending_restart"]
main, ["-e", patroni_api.endpoint, "node_is_pending_restart"]
)
assert result.exit_code == 3

View file

@ -1,23 +1,29 @@
from pathlib import Path
from typing import Iterator
import nagiosplugin
import pytest
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import here, my_mock
from . import PatroniAPI
@pytest.fixture(scope="module", autouse=True)
def cluster_config_has_changed(patroni_api: PatroniAPI) -> Iterator[None]:
with patroni_api.routes({"config": "cluster_config_has_changed.json"}):
yield None
def test_cluster_config_has_changed_ok_with_hash(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_config_has_changed", 200)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_config_has_changed",
"--hash",
"96b12d82571473d13e890b893734e731",
@ -31,22 +37,20 @@ def test_cluster_config_has_changed_ok_with_hash(
def test_cluster_config_has_changed_ok_with_state_file(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
) -> None:
runner = CliRunner()
with open(here / "cluster_config_has_changed.state_file", "w") as f:
state_file = tmp_path / "cluster_config_has_changed.state_file"
with state_file.open("w") as f:
f.write('{"hash": "96b12d82571473d13e890b893734e731"}')
my_mock(mocker, "cluster_config_has_changed", 200)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_config_has_changed",
"--state-file",
str(here / "cluster_config_has_changed.state_file"),
str(state_file),
],
)
assert result.exit_code == 0
@ -57,16 +61,13 @@ def test_cluster_config_has_changed_ok_with_state_file(
def test_cluster_config_has_changed_ko_with_hash(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_config_has_changed", 200)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_config_has_changed",
"--hash",
"96b12d82571473d13e890b8937ffffff",
@ -80,24 +81,21 @@ def test_cluster_config_has_changed_ko_with_hash(
def test_cluster_config_has_changed_ko_with_state_file_and_save(
mocker: MockerFixture,
use_old_replica_state: bool,
runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
) -> None:
runner = CliRunner()
with open(here / "cluster_config_has_changed.state_file", "w") as f:
state_file = tmp_path / "cluster_config_has_changed.state_file"
with state_file.open("w") as f:
f.write('{"hash": "96b12d82571473d13e890b8937ffffff"}')
my_mock(mocker, "cluster_config_has_changed", 200)
# test without saving the new hash
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_config_has_changed",
"--state-file",
str(here / "cluster_config_has_changed.state_file"),
str(state_file),
],
)
assert result.exit_code == 2
@ -106,7 +104,8 @@ def test_cluster_config_has_changed_ko_with_state_file_and_save(
== "CLUSTERCONFIGHASCHANGED CRITICAL - The hash of patroni's dynamic configuration has changed. The old hash was 96b12d82571473d13e890b8937ffffff. | is_configuration_changed=1;;@1:1\n"
)
cookie = nagiosplugin.Cookie(here / "cluster_config_has_changed.state_file")
state_file = tmp_path / "cluster_config_has_changed.state_file"
cookie = nagiosplugin.Cookie(state_file)
cookie.open()
new_config_hash = cookie.get("hash")
cookie.close()
@ -118,10 +117,10 @@ def test_cluster_config_has_changed_ko_with_state_file_and_save(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_config_has_changed",
"--state-file",
str(here / "cluster_config_has_changed.state_file"),
str(state_file),
"--save",
],
)
@ -131,7 +130,7 @@ def test_cluster_config_has_changed_ko_with_state_file_and_save(
== "CLUSTERCONFIGHASCHANGED CRITICAL - The hash of patroni's dynamic configuration has changed. The old hash was 96b12d82571473d13e890b8937ffffff. | is_configuration_changed=1;;@1:1\n"
)
cookie = nagiosplugin.Cookie(here / "cluster_config_has_changed.state_file")
cookie = nagiosplugin.Cookie(state_file)
cookie.open()
new_config_hash = cookie.get("hash")
cookie.close()
@ -140,22 +139,20 @@ def test_cluster_config_has_changed_ko_with_state_file_and_save(
def test_cluster_config_has_changed_params(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
) -> None:
# This test is placed last because the exceptions do not seem to be flushed from stderr before the next tests run.
runner = CliRunner()
my_mock(mocker, "cluster_config_has_changed", 200)
fake_state_file = tmp_path / "fake_file_name.state_file"
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_config_has_changed",
"--hash",
"640df9f0211c791723f18fc3ed9dbb95",
"--state-file",
str(here / "fake_file_name.state_file"),
str(fake_state_file),
],
)
assert result.exit_code == 3
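
All four tests above drive the same mechanism: the service hashes Patroni's dynamic configuration and compares it against `--hash` or against the value persisted in the `--state-file` cookie. A sketch of that comparison, under the assumption that the 32-hex-digit values are MD5 digests of the serialized config — the digest algorithm is inferred from the hash length, not confirmed by this diff:

```python
import hashlib
import json


def config_hash(dynamic_config: dict) -> str:
    # Assumed: a stable serialization of the /config payload, MD5-digested
    # (the tests only show 32-hex-digit values such as
    # "96b12d82571473d13e890b893734e731").
    return hashlib.md5(
        json.dumps(dynamic_config, sort_keys=True).encode()
    ).hexdigest()


def has_changed(dynamic_config: dict, old_hash: str) -> bool:
    return config_hash(dynamic_config) != old_hash
```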

View file

@ -1,54 +1,139 @@
from pathlib import Path
from typing import Iterator, Union
import pytest
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import my_mock
from . import PatroniAPI, cluster_api_set_replica_running
def test_cluster_has_leader_ok(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
@pytest.fixture
def cluster_has_leader_ok(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_has_leader_ok.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
my_mock(mocker, "cluster_has_leader_ok", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "cluster_has_leader"]
)
assert result.exit_code == 0
@pytest.mark.usefixtures("cluster_has_leader_ok")
def test_cluster_has_leader_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_has_leader"])
assert (
result.stdout
== "CLUSTERHASLEADER OK - The cluster has a running leader. | has_leader=1;;@0\n"
== "CLUSTERHASLEADER OK - The cluster has a running leader. | has_leader=1;;@0 is_leader=1 is_standby_leader=0 is_standby_leader_in_arc_rec=0;@1:1\n"
)
assert result.exit_code == 0
@pytest.fixture
def cluster_has_leader_ok_standby_leader(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_has_leader_ok_standby_leader.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_has_leader_ok_standby_leader")
def test_cluster_has_leader_ok_standby_leader(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_has_leader_ok_standby_leader", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "cluster_has_leader"]
)
assert result.exit_code == 0
result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_has_leader"])
assert (
result.stdout
== "CLUSTERHASLEADER OK - The cluster has a running leader. | has_leader=1;;@0\n"
== "CLUSTERHASLEADER OK - The cluster has a running leader. | has_leader=1;;@0 is_leader=0 is_standby_leader=1 is_standby_leader_in_arc_rec=0;@1:1\n"
)
assert result.exit_code == 0
def test_cluster_has_leader_ko(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
@pytest.fixture
def cluster_has_leader_ko(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_has_leader_ko.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
my_mock(mocker, "cluster_has_leader_ko", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "cluster_has_leader"]
@pytest.mark.usefixtures("cluster_has_leader_ko")
def test_cluster_has_leader_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_has_leader"])
assert (
result.stdout
== "CLUSTERHASLEADER CRITICAL - The cluster has no running leader or the standby leader is in archive recovery. | has_leader=0;;@0 is_leader=0 is_standby_leader=0 is_standby_leader_in_arc_rec=0;@1:1\n"
)
assert result.exit_code == 2
@pytest.fixture
def cluster_has_leader_ko_standby_leader(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_has_leader_ko_standby_leader.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_has_leader_ko_standby_leader")
def test_cluster_has_leader_ko_standby_leader(
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_has_leader"])
assert (
result.stdout
== "CLUSTERHASLEADER CRITICAL - The cluster has no running leader. | has_leader=0;;@0\n"
== "CLUSTERHASLEADER CRITICAL - The cluster has no running leader or the standby leader is in archive recovery. | has_leader=0;;@0 is_leader=0 is_standby_leader=0 is_standby_leader_in_arc_rec=0;@1:1\n"
)
assert result.exit_code == 2
@pytest.fixture
def cluster_has_leader_ko_standby_leader_archiving(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = (
"cluster_has_leader_ko_standby_leader_archiving.json"
)
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_has_leader_ko_standby_leader_archiving")
def test_cluster_has_leader_ko_standby_leader_archiving(
runner: CliRunner, patroni_api: PatroniAPI, old_replica_state: bool
) -> None:
result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_has_leader"])
if old_replica_state:
assert (
result.stdout
== "CLUSTERHASLEADER OK - The cluster has a running leader. | has_leader=1;;@0 is_leader=0 is_standby_leader=1 is_standby_leader_in_arc_rec=0;@1:1\n"
)
assert result.exit_code == 0
else:
assert (
result.stdout
== "CLUSTERHASLEADER WARNING - The cluster has no running leader or the standby leader is in archive recovery. | has_leader=1;;@0 is_leader=0 is_standby_leader=1 is_standby_leader_in_arc_rec=1;@1:1\n"
)
assert result.exit_code == 1
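
The `@`-prefixed thresholds in these expected outputs (`is_standby_leader_in_arc_rec=1;@1:1`, `has_leader=1;;@0`, where the two fields after a value are the warning and critical ranges) follow the usual Nagios syntax: a plain range alerts when the value falls outside it, while a leading `@` inverts that and alerts when the value falls inside. A hand-rolled sketch of the rule for the subset used here (deliberately not quoting `nagiosplugin`'s own parser):

```python
def violates(value: float, spec: str) -> bool:
    """True if `value` should alert for Nagios threshold `spec`.

    Covers the subset used above, e.g. "@1:1" (alert inside 1..1) and
    "@0" (alert when the value is exactly 0).
    """
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    start, sep, end = spec.partition(":")
    lo = float(start) if start else 0.0
    hi = float(end) if end else (float("inf") if sep else float(start))
    in_range = lo <= value <= hi
    return in_range if inside else not in_range


assert violates(1, "@1:1")  # standby leader in archive recovery -> WARNING
assert not violates(1, "@0")  # has_leader=1 is fine
assert violates(0, "@0")  # has_leader=0 -> CRITICAL
```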

View file

@ -1,39 +1,46 @@
from pathlib import Path
from typing import Iterator, Union
import pytest
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import my_mock
from . import PatroniAPI, cluster_api_set_replica_running
# TODO Lag threshold tests
def test_cluster_has_relica_ok(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
@pytest.fixture
def cluster_has_replica_ok(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_has_replica_ok.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
my_mock(mocker, "cluster_has_replica_ok", 200, use_old_replica_state)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "cluster_has_replica"]
)
assert result.exit_code == 0
@pytest.mark.usefixtures("cluster_has_replica_ok")
def test_cluster_has_relica_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_has_replica"])
assert (
result.stdout
== "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2 srv2_lag=0 srv2_sync=0 srv3_lag=0 srv3_sync=1 sync_replica=1 unhealthy_replica=0\n"
== "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2 srv2_lag=0 srv2_sync=0 srv2_timeline=51 srv3_lag=0 srv3_sync=1 srv3_timeline=51 sync_replica=1 unhealthy_replica=0\n"
)
assert result.exit_code == 0
@pytest.mark.usefixtures("cluster_has_replica_ok")
def test_cluster_has_replica_ok_with_count_thresholds(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_has_replica_ok", 200, use_old_replica_state)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_has_replica",
"--warning",
"@1",
@ -41,48 +48,56 @@ def test_cluster_has_replica_ok_with_count_thresholds(
"@0",
],
)
assert result.exit_code == 0
assert (
result.stdout
== "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2;@1;@0 srv2_lag=0 srv2_sync=0 srv3_lag=0 srv3_sync=1 sync_replica=1 unhealthy_replica=0\n"
== "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2;@1;@0 srv2_lag=0 srv2_sync=0 srv2_timeline=51 srv3_lag=0 srv3_sync=1 srv3_timeline=51 sync_replica=1 unhealthy_replica=0\n"
)
assert result.exit_code == 0
@pytest.mark.usefixtures("cluster_has_replica_ok")
def test_cluster_has_replica_ok_with_sync_count_thresholds(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_has_replica_ok", 200, use_old_replica_state)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_has_replica",
"--sync-warning",
"1:",
],
)
assert result.exit_code == 0
assert (
result.stdout
== "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2 srv2_lag=0 srv2_sync=0 srv3_lag=0 srv3_sync=1 sync_replica=1;1: unhealthy_replica=0\n"
== "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2 srv2_lag=0 srv2_sync=0 srv2_timeline=51 srv3_lag=0 srv3_sync=1 srv3_timeline=51 sync_replica=1;1: unhealthy_replica=0\n"
)
assert result.exit_code == 0
@pytest.fixture
def cluster_has_replica_ok_lag(
patroni_api: PatroniAPI, datadir: Path, tmp_path: Path, old_replica_state: bool
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_has_replica_ok_lag.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_has_replica_ok_lag")
def test_cluster_has_replica_ok_with_count_thresholds_lag(
mocker: MockerFixture,
use_old_replica_state: bool,
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_has_replica_ok_lag", 200, use_old_replica_state)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_has_replica",
"--warning",
"@1",
@ -92,24 +107,35 @@ def test_cluster_has_replica_ok_with_count_thresholds_lag(
"1MB",
],
)
assert result.exit_code == 0
assert (
result.stdout
== "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2;@1;@0 srv2_lag=1024 srv2_sync=0 srv3_lag=0 srv3_sync=0 sync_replica=0 unhealthy_replica=0\n"
== "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2;@1;@0 srv2_lag=1024 srv2_sync=0 srv2_timeline=51 srv3_lag=0 srv3_sync=0 srv3_timeline=51 sync_replica=0 unhealthy_replica=0\n"
)
assert result.exit_code == 0
@pytest.fixture
def cluster_has_replica_ko(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_has_replica_ko.json"
patroni_path: Union[str, Path] = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_has_replica_ko")
def test_cluster_has_replica_ko_with_count_thresholds(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_has_replica_ko", 200, use_old_replica_state)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_has_replica",
"--warning",
"@1",
@ -117,24 +143,22 @@ def test_cluster_has_replica_ko_with_count_thresholds(
"@0",
],
)
assert result.exit_code == 1
assert (
result.stdout
== "CLUSTERHASREPLICA WARNING - healthy_replica is 1 (outside range @0:1) | healthy_replica=1;@1;@0 srv3_lag=0 srv3_sync=0 sync_replica=0 unhealthy_replica=1\n"
== "CLUSTERHASREPLICA WARNING - healthy_replica is 1 (outside range @0:1) | healthy_replica=1;@1;@0 srv3_lag=0 srv3_sync=0 srv3_timeline=51 sync_replica=0 unhealthy_replica=1\n"
)
assert result.exit_code == 1
@pytest.mark.usefixtures("cluster_has_replica_ko")
def test_cluster_has_replica_ko_with_sync_count_thresholds(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_has_replica_ko", 200, use_old_replica_state)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_has_replica",
"--sync-warning",
"2:",
@ -142,25 +166,36 @@ def test_cluster_has_replica_ko_with_sync_count_thresholds(
"1:",
],
)
assert result.exit_code == 2
# The lag on srv2 is "unknown". We don't handle strings in perfstats, so we have to drop all the perf stats for the second node
assert (
result.stdout
== "CLUSTERHASREPLICA CRITICAL - sync_replica is 0 (outside range 1:) | healthy_replica=1 srv3_lag=0 srv3_sync=0 sync_replica=0;2:;1: unhealthy_replica=1\n"
== "CLUSTERHASREPLICA CRITICAL - sync_replica is 0 (outside range 1:) | healthy_replica=1 srv3_lag=0 srv3_sync=0 srv3_timeline=51 sync_replica=0;2:;1: unhealthy_replica=1\n"
)
assert result.exit_code == 2
@pytest.fixture
def cluster_has_replica_ko_lag(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_has_replica_ko_lag.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_has_replica_ko_lag")
def test_cluster_has_replica_ko_with_count_thresholds_and_lag(
mocker: MockerFixture,
use_old_replica_state: bool,
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_has_replica_ko_lag", 200, use_old_replica_state)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_has_replica",
"--warning",
"@1",
@ -170,8 +205,84 @@ def test_cluster_has_replica_ko_with_count_thresholds_and_lag(
"1MB",
],
)
assert result.exit_code == 2
assert (
result.stdout
== "CLUSTERHASREPLICA CRITICAL - healthy_replica is 0 (outside range @0:0) | healthy_replica=0;@1;@0 srv2_lag=10241024 srv2_sync=0 srv3_lag=20000000 srv3_sync=0 sync_replica=0 unhealthy_replica=2\n"
== "CLUSTERHASREPLICA CRITICAL - healthy_replica is 0 (outside range @0:0) | healthy_replica=0;@1;@0 srv2_lag=10241024 srv2_sync=0 srv2_timeline=51 srv3_lag=20000000 srv3_sync=0 srv3_timeline=51 sync_replica=0 unhealthy_replica=2\n"
)
assert result.exit_code == 2
@pytest.fixture
def cluster_has_replica_ko_wrong_tl(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_has_replica_ko_wrong_tl.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_has_replica_ko_wrong_tl")
def test_cluster_has_replica_ko_wrong_tl(
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
result = runner.invoke(
main,
[
"-e",
patroni_api.endpoint,
"cluster_has_replica",
"--warning",
"@1",
"--critical",
"@0",
"--max-lag",
"1MB",
],
)
assert (
result.stdout
== "CLUSTERHASREPLICA WARNING - healthy_replica is 1 (outside range @0:1) | healthy_replica=1;@1;@0 srv2_lag=1000000 srv2_sync=0 srv2_timeline=50 srv3_lag=0 srv3_sync=0 srv3_timeline=51 sync_replica=0 unhealthy_replica=1\n"
)
assert result.exit_code == 1
@pytest.fixture
def cluster_has_replica_ko_all_replica(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_has_replica_ko_all_replica.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_has_replica_ko_all_replica")
def test_cluster_has_replica_ko_all_replica(
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
result = runner.invoke(
main,
[
"-e",
patroni_api.endpoint,
"cluster_has_replica",
"--warning",
"@1",
"--critical",
"@0",
"--max-lag",
"1MB",
],
)
assert (
result.stdout
== "CLUSTERHASREPLICA CRITICAL - healthy_replica is 0 (outside range @0:0) | healthy_replica=0;@1;@0 srv1_lag=0 srv1_sync=0 srv1_timeline=51 srv2_lag=0 srv2_sync=0 srv2_timeline=51 srv3_lag=0 srv3_sync=0 srv3_timeline=51 sync_replica=0 unhealthy_replica=3\n"
)
assert result.exit_code == 2
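
Taken together, these cases pin down the predicate for a healthy replica: role `replica` (or `sync_standby`), state `streaming` (plain `running` when emulating the old replica state), a timeline matching the leader's, and a lag no greater than `--max-lag` — with a non-numeric ("unknown") lag counting as unhealthy. A condensed sketch; the function name and parameters are illustrative, not the module's API:

```python
from typing import Any, Dict


def is_healthy_replica(
    member: Dict[str, Any],
    leader_timeline: int,
    max_lag: int,
    old_replica_state: bool,
) -> bool:
    good_state = "running" if old_replica_state else "streaming"
    return (
        member.get("role") in ("replica", "sync_standby")
        and member.get("state") == good_state
        and member.get("timeline") == leader_timeline
        and isinstance(member.get("lag"), int)  # "unknown" lag -> unhealthy
        and member["lag"] <= max_lag
    )
```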

View file

@ -1,20 +1,17 @@
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import my_mock
from . import PatroniAPI
def test_cluster_has_scheduled_action_ok(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_has_scheduled_action_ok", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "cluster_has_scheduled_action"]
)
with patroni_api.routes({"cluster": "cluster_has_scheduled_action_ok.json"}):
result = runner.invoke(
main, ["-e", patroni_api.endpoint, "cluster_has_scheduled_action"]
)
assert result.exit_code == 0
assert (
result.stdout
@ -23,14 +20,14 @@ def test_cluster_has_scheduled_action_ok(
def test_cluster_has_scheduled_action_ko_switchover(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_has_scheduled_action_ko_switchover", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "cluster_has_scheduled_action"]
)
with patroni_api.routes(
{"cluster": "cluster_has_scheduled_action_ko_switchover.json"}
):
result = runner.invoke(
main, ["-e", patroni_api.endpoint, "cluster_has_scheduled_action"]
)
assert result.exit_code == 2
assert (
result.stdout
@ -39,14 +36,14 @@ def test_cluster_has_scheduled_action_ko_switchover(
def test_cluster_has_scheduled_action_ko_restart(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_has_scheduled_action_ko_restart", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "cluster_has_scheduled_action"]
)
with patroni_api.routes(
{"cluster": "cluster_has_scheduled_action_ko_restart.json"}
):
result = runner.invoke(
main, ["-e", patroni_api.endpoint, "cluster_has_scheduled_action"]
)
assert result.exit_code == 2
assert (
result.stdout

View file

@ -1,20 +1,17 @@
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import my_mock
from . import PatroniAPI
def test_cluster_is_in_maintenance_ok(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_is_in_maintenance_ok", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "cluster_is_in_maintenance"]
)
with patroni_api.routes({"cluster": "cluster_is_in_maintenance_ok.json"}):
result = runner.invoke(
main, ["-e", patroni_api.endpoint, "cluster_is_in_maintenance"]
)
assert result.exit_code == 0
assert (
result.stdout
@ -23,14 +20,12 @@ def test_cluster_is_in_maintenance_ok(
def test_cluster_is_in_maintenance_ko(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_is_in_maintenance_ko", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "cluster_is_in_maintenance"]
)
with patroni_api.routes({"cluster": "cluster_is_in_maintenance_ko.json"}):
result = runner.invoke(
main, ["-e", patroni_api.endpoint, "cluster_is_in_maintenance"]
)
assert result.exit_code == 2
assert (
result.stdout
@ -39,14 +34,14 @@ def test_cluster_is_in_maintenance_ko(
def test_cluster_is_in_maintenance_ok_pause_false(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_is_in_maintenance_ok_pause_false", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "cluster_is_in_maintenance"]
)
with patroni_api.routes(
{"cluster": "cluster_is_in_maintenance_ok_pause_false.json"}
):
result = runner.invoke(
main, ["-e", patroni_api.endpoint, "cluster_is_in_maintenance"]
)
assert result.exit_code == 0
assert (
result.stdout

View file

@ -1,22 +1,33 @@
from pathlib import Path
from typing import Iterator, Union
import pytest
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import my_mock
from . import PatroniAPI, cluster_api_set_replica_running
@pytest.fixture
def cluster_node_count_ok(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_node_count_ok.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_node_count_ok")
def test_cluster_node_count_ok(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI, old_replica_state: bool
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_node_count_ok", 200, use_old_replica_state)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "cluster_node_count"]
)
assert result.exit_code == 0
if use_old_replica_state:
result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_node_count"])
if old_replica_state:
assert (
result.output
== "CLUSTERNODECOUNT OK - members is 3 | healthy_members=3 members=3 role_leader=1 role_replica=2 state_running=3\n"
@ -26,19 +37,18 @@ def test_cluster_node_count_ok(
result.output
== "CLUSTERNODECOUNT OK - members is 3 | healthy_members=3 members=3 role_leader=1 role_replica=2 state_running=1 state_streaming=2\n"
)
assert result.exit_code == 0
@pytest.mark.usefixtures("cluster_node_count_ok")
def test_cluster_node_count_ok_with_thresholds(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI, old_replica_state: bool
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_node_count_ok", 200, use_old_replica_state)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_node_count",
"--warning",
"@0:1",
@ -50,8 +60,7 @@ def test_cluster_node_count_ok_with_thresholds(
"@0:1",
],
)
assert result.exit_code == 0
if use_old_replica_state:
if old_replica_state:
assert (
result.output
== "CLUSTERNODECOUNT OK - members is 3 | healthy_members=3;@2;@1 members=3;@1;@2 role_leader=1 role_replica=2 state_running=3\n"
@ -61,19 +70,31 @@ def test_cluster_node_count_ok_with_thresholds(
result.output
== "CLUSTERNODECOUNT OK - members is 3 | healthy_members=3;@2;@1 members=3;@1;@2 role_leader=1 role_replica=2 state_running=1 state_streaming=2\n"
)
assert result.exit_code == 0
@pytest.fixture
def cluster_node_count_healthy_warning(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_node_count_healthy_warning.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_node_count_healthy_warning")
def test_cluster_node_count_healthy_warning(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI, old_replica_state: bool
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_node_count_healthy_warning", 200, use_old_replica_state)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_node_count",
"--healthy-warning",
"@2",
@ -81,8 +102,7 @@ def test_cluster_node_count_healthy_warning(
"@0:1",
],
)
assert result.exit_code == 1
if use_old_replica_state:
if old_replica_state:
assert (
result.output
== "CLUSTERNODECOUNT WARNING - healthy_members is 2 (outside range @0:2) | healthy_members=2;@2;@1 members=2 role_leader=1 role_replica=1 state_running=2\n"
@ -92,19 +112,31 @@ def test_cluster_node_count_healthy_warning(
result.output
== "CLUSTERNODECOUNT WARNING - healthy_members is 2 (outside range @0:2) | healthy_members=2;@2;@1 members=2 role_leader=1 role_replica=1 state_running=1 state_streaming=1\n"
)
assert result.exit_code == 1
@pytest.fixture
def cluster_node_count_healthy_critical(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_node_count_healthy_critical.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_node_count_healthy_critical")
def test_cluster_node_count_healthy_critical(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_node_count_healthy_critical", 200, use_old_replica_state)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_node_count",
"--healthy-warning",
"@2",
@ -112,24 +144,35 @@ def test_cluster_node_count_healthy_critical(
"@0:1",
],
)
assert result.exit_code == 2
assert (
result.output
== "CLUSTERNODECOUNT CRITICAL - healthy_members is 1 (outside range @0:1) | healthy_members=1;@2;@1 members=3 role_leader=1 role_replica=2 state_running=1 state_start_failed=2\n"
)
assert result.exit_code == 2
@pytest.fixture
def cluster_node_count_warning(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_node_count_warning.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_node_count_warning")
def test_cluster_node_count_warning(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI, old_replica_state: bool
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_node_count_warning", 200, use_old_replica_state)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_node_count",
"--warning",
"@2",
@ -137,8 +180,7 @@ def test_cluster_node_count_warning(
"@0:1",
],
)
assert result.exit_code == 1
if use_old_replica_state:
if old_replica_state:
assert (
result.stdout
== "CLUSTERNODECOUNT WARNING - members is 2 (outside range @0:2) | healthy_members=2 members=2;@2;@1 role_leader=1 role_replica=1 state_running=2\n"
@ -148,19 +190,31 @@ def test_cluster_node_count_warning(
result.stdout
== "CLUSTERNODECOUNT WARNING - members is 2 (outside range @0:2) | healthy_members=2 members=2;@2;@1 role_leader=1 role_replica=1 state_running=1 state_streaming=1\n"
)
assert result.exit_code == 1
@pytest.fixture
def cluster_node_count_critical(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_node_count_critical.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_node_count_critical")
def test_cluster_node_count_critical(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_node_count_critical", 200, use_old_replica_state)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_node_count",
"--warning",
"@2",
@ -168,8 +222,51 @@ def test_cluster_node_count_critical(
"@0:1",
],
)
assert result.exit_code == 2
assert (
result.stdout
== "CLUSTERNODECOUNT CRITICAL - members is 1 (outside range @0:1) | healthy_members=1 members=1;@2;@1 role_leader=1 state_running=1\n"
)
assert result.exit_code == 2
@pytest.fixture
def cluster_node_count_ko_in_archive_recovery(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_node_count_ko_in_archive_recovery.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_node_count_ko_in_archive_recovery")
def test_cluster_node_count_ko_in_archive_recovery(
runner: CliRunner, patroni_api: PatroniAPI, old_replica_state: bool
) -> None:
result = runner.invoke(
main,
[
"-e",
patroni_api.endpoint,
"cluster_node_count",
"--healthy-warning",
"@2",
"--healthy-critical",
"@0:1",
],
)
if old_replica_state:
assert (
result.stdout
== "CLUSTERNODECOUNT OK - members is 3 | healthy_members=3;@2;@1 members=3 role_replica=2 role_standby_leader=1 state_running=3\n"
)
assert result.exit_code == 0
else:
assert (
result.stdout
== "CLUSTERNODECOUNT CRITICAL - healthy_members is 1 (outside range @0:1) | healthy_members=1;@2;@1 members=3 role_replica=2 role_standby_leader=1 state_in_archive_recovery=2 state_streaming=1\n"
)
assert result.exit_code == 2
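
The expected outputs above likewise fix the counting rules for `cluster_node_count`: every member increments `members`, a `role_<role>` counter and a `state_<state>` counter, while `healthy_members` only counts nodes in an acceptable state — the standby-cluster case shows `in archive recovery` members counting as unhealthy. A tallying sketch, with the acceptable-state table distilled from these assertions as an assumption:

```python
from collections import Counter
from typing import Iterable

HEALTHY_STATES = {
    "leader": {"running"},
    "standby_leader": {"running", "streaming"},
    "replica": {"running", "streaming"},
    "sync_standby": {"running", "streaming"},
}


def node_counters(members: Iterable[dict]) -> Counter:
    c: Counter = Counter()
    for m in members:
        c["members"] += 1
        c["role_" + m["role"]] += 1
        c["state_" + m["state"].replace(" ", "_")] += 1
        if m["state"] in HEALTHY_STATES.get(m["role"], set()):
            c["healthy_members"] += 1
    return c
```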

View file

@ -1,16 +1,19 @@
from pathlib import Path
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import my_mock
from . import PatroniAPI
def test_node_is_alive_ok(mocker: MockerFixture, use_old_replica_state: bool) -> None:
runner = CliRunner()
my_mock(mocker, None, 200)
result = runner.invoke(main, ["-e", "https://10.20.199.3:8008", "node_is_alive"])
def test_node_is_alive_ok(
runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
) -> None:
liveness = tmp_path / "liveness"
liveness.touch()
with patroni_api.routes({"liveness": liveness}):
result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_alive"])
assert result.exit_code == 0
assert (
result.stdout
@ -18,11 +21,8 @@ def test_node_is_alive_ok(mocker: MockerFixture, use_old_replica_state: bool) ->
)
def test_node_is_alive_ko(mocker: MockerFixture, use_old_replica_state: bool) -> None:
runner = CliRunner()
my_mock(mocker, None, 404)
result = runner.invoke(main, ["-e", "https://10.20.199.3:8008", "node_is_alive"])
def test_node_is_alive_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_alive"])
assert result.exit_code == 2
assert (
result.stdout

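Note the two spellings `routes()` accepts across these diffs: a bare JSON filename (resolved against the session data directory) or a full `Path`, as with the `liveness` file above. A plausible sketch of such a context manager — only the call signature is taken from the tests; the attribute names and the copy-into-served-directory mechanics are assumptions:

```python
import shutil
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Mapping, Union


class PatroniAPISketch:
    """Stand-in for the real tests.PatroniAPI helper."""

    def __init__(self, served_dir: Path, datadir: Path) -> None:
        self.served_dir = served_dir  # directory the HTTP server serves
        self.datadir = datadir  # tests/json, for bare filenames

    @contextmanager
    def routes(self, mapping: Mapping[str, Union[str, Path]]) -> Iterator[None]:
        installed = []
        for endpoint, src in mapping.items():
            path = src if isinstance(src, Path) else self.datadir / src
            dest = self.served_dir / endpoint
            shutil.copy(path, dest)
            installed.append(dest)
        try:
            yield
        finally:
            for dest in installed:
                dest.unlink()  # endpoints answer 404 again after the block
```
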
View file

@ -1,28 +1,37 @@
from typing import Iterator
import pytest
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import my_mock
from . import PatroniAPI
def test_node_is_leader_ok(mocker: MockerFixture, use_old_replica_state: bool) -> None:
runner = CliRunner()
@pytest.fixture
def node_is_leader_ok(patroni_api: PatroniAPI) -> Iterator[None]:
with patroni_api.routes(
{
"leader": "node_is_leader_ok.json",
"standby-leader": "node_is_leader_ok_standby_leader.json",
}
):
yield None
my_mock(mocker, "node_is_leader_ok", 200)
result = runner.invoke(main, ["-e", "https://10.20.199.3:8008", "node_is_leader"])
@pytest.mark.usefixtures("node_is_leader_ok")
def test_node_is_leader_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_leader"])
assert result.exit_code == 0
assert (
result.stdout
== "NODEISLEADER OK - This node is a leader node. | is_leader=1;;@0\n"
)
my_mock(mocker, "node_is_leader_ok_standby_leader", 200)
result = runner.invoke(
main,
["-e", "https://10.20.199.3:8008", "node_is_leader", "--is-standby-leader"],
["-e", patroni_api.endpoint, "node_is_leader", "--is-standby-leader"],
)
print(result.stdout)
assert result.exit_code == 0
assert (
result.stdout
@ -30,21 +39,17 @@ def test_node_is_leader_ok(mocker: MockerFixture, use_old_replica_state: bool) -
)
def test_node_is_leader_ko(mocker: MockerFixture, use_old_replica_state: bool) -> None:
runner = CliRunner()
my_mock(mocker, "node_is_leader_ko", 503)
result = runner.invoke(main, ["-e", "https://10.20.199.3:8008", "node_is_leader"])
def test_node_is_leader_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_leader"])
assert result.exit_code == 2
assert (
result.stdout
== "NODEISLEADER CRITICAL - This node is not a leader node. | is_leader=0;;@0\n"
)
my_mock(mocker, "node_is_leader_ko_standby_leader", 503)
result = runner.invoke(
main,
["-e", "https://10.20.199.3:8008", "node_is_leader", "--is-standby-leader"],
["-e", patroni_api.endpoint, "node_is_leader", "--is-standby-leader"],
)
assert result.exit_code == 2
assert (

View file

@ -1,20 +1,15 @@
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import my_mock
from . import PatroniAPI
def test_node_is_pending_restart_ok(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
my_mock(mocker, "node_is_pending_restart_ok", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "node_is_pending_restart"]
)
def test_node_is_pending_restart_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
with patroni_api.routes({"patroni": "node_is_pending_restart_ok.json"}):
result = runner.invoke(
main, ["-e", patroni_api.endpoint, "node_is_pending_restart"]
)
assert result.exit_code == 0
assert (
result.stdout
@ -22,15 +17,11 @@ def test_node_is_pending_restart_ok(
)
def test_node_is_pending_restart_ko(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
my_mock(mocker, "node_is_pending_restart_ko", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "node_is_pending_restart"]
)
def test_node_is_pending_restart_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
with patroni_api.routes({"patroni": "node_is_pending_restart_ko.json"}):
result = runner.invoke(
main, ["-e", patroni_api.endpoint, "node_is_pending_restart"]
)
assert result.exit_code == 2
assert (
result.stdout

View file

@ -1,16 +1,13 @@
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import my_mock
from . import PatroniAPI
def test_node_is_primary_ok(mocker: MockerFixture, use_old_replica_state: bool) -> None:
runner = CliRunner()
my_mock(mocker, "node_is_primary_ok", 200)
result = runner.invoke(main, ["-e", "https://10.20.199.3:8008", "node_is_primary"])
def test_node_is_primary_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
with patroni_api.routes({"primary": "node_is_primary_ok.json"}):
result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_primary"])
assert result.exit_code == 0
assert (
result.stdout
@ -18,11 +15,8 @@ def test_node_is_primary_ok(mocker: MockerFixture, use_old_replica_state: bool)
)
def test_node_is_primary_ko(mocker: MockerFixture, use_old_replica_state: bool) -> None:
runner = CliRunner()
my_mock(mocker, "node_is_primary_ko", 503)
result = runner.invoke(main, ["-e", "https://10.20.199.3:8008", "node_is_primary"])
def test_node_is_primary_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_primary"])
assert result.exit_code == 2
assert (
result.stdout

View file

@ -1,16 +1,27 @@
from typing import Iterator
import pytest
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import my_mock
from . import PatroniAPI
def test_node_is_replica_ok(mocker: MockerFixture, use_old_replica_state: bool) -> None:
runner = CliRunner()
@pytest.fixture
def node_is_replica_ok(patroni_api: PatroniAPI) -> Iterator[None]:
with patroni_api.routes(
{
k: "node_is_replica_ok.json"
for k in ("replica", "synchronous", "asynchronous")
}
):
yield None
my_mock(mocker, "node_is_replica_ok", 200)
result = runner.invoke(main, ["-e", "https://10.20.199.3:8008", "node_is_replica"])
@pytest.mark.usefixtures("node_is_replica_ok")
def test_node_is_replica_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_replica"])
assert result.exit_code == 0
assert (
result.stdout
@ -18,11 +29,8 @@ def test_node_is_replica_ok(mocker: MockerFixture, use_old_replica_state: bool)
)
def test_node_is_replica_ko(mocker: MockerFixture, use_old_replica_state: bool) -> None:
runner = CliRunner()
my_mock(mocker, "node_is_replica_ko", 503)
result = runner.invoke(main, ["-e", "https://10.20.199.3:8008", "node_is_replica"])
def test_node_is_replica_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_replica"])
assert result.exit_code == 2
assert (
result.stdout
@ -30,15 +38,10 @@ def test_node_is_replica_ko(mocker: MockerFixture, use_old_replica_state: bool)
)
def test_node_is_replica_ko_lag(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
def test_node_is_replica_ko_lag(runner: CliRunner, patroni_api: PatroniAPI) -> None:
# We don't do the check ourselves; Patroni does it and changes the return code
my_mock(mocker, "node_is_replica_ok", 503)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "node_is_replica", "--max-lag", "100"]
main, ["-e", patroni_api.endpoint, "node_is_replica", "--max-lag", "100"]
)
assert result.exit_code == 2
assert (
@ -46,12 +49,11 @@ def test_node_is_replica_ko_lag(
== "NODEISREPLICA CRITICAL - This node is not a running replica with no noloadbalance tag and a lag under 100. | is_replica=0;;@0\n"
)
my_mock(mocker, "node_is_replica_ok", 503)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"node_is_replica",
"--is-async",
"--max-lag",
@ -65,15 +67,11 @@ def test_node_is_replica_ko_lag(
)
def test_node_is_replica_sync_ok(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
@pytest.mark.usefixtures("node_is_replica_ok")
def test_node_is_replica_sync_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
# We don't do the check ourselves; Patroni does it and changes the return code
my_mock(mocker, "node_is_replica_ok", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "node_is_replica", "--is-sync"]
main, ["-e", patroni_api.endpoint, "node_is_replica", "--is-sync"]
)
assert result.exit_code == 0
assert (
@ -82,15 +80,10 @@ def test_node_is_replica_sync_ok(
)
def test_node_is_replica_sync_ko(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
def test_node_is_replica_sync_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
# We don't do the check ourselves; Patroni does it and changes the return code
my_mock(mocker, "node_is_replica_ok", 503)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "node_is_replica", "--is-sync"]
main, ["-e", patroni_api.endpoint, "node_is_replica", "--is-sync"]
)
assert result.exit_code == 2
assert (
@ -99,15 +92,11 @@ def test_node_is_replica_sync_ko(
)
def test_node_is_replica_async_ok(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
@pytest.mark.usefixtures("node_is_replica_ok")
def test_node_is_replica_async_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
# We don't do the check ourselves; Patroni does it and changes the return code
my_mock(mocker, "node_is_replica_ok", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "node_is_replica", "--is-async"]
main, ["-e", patroni_api.endpoint, "node_is_replica", "--is-async"]
)
assert result.exit_code == 0
assert (
@ -116,15 +105,10 @@ def test_node_is_replica_async_ok(
)
def test_node_is_replica_async_ko(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
def test_node_is_replica_async_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
# We don't do the check ourselves; Patroni does it and changes the return code
my_mock(mocker, "node_is_replica_ok", 503)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "node_is_replica", "--is-async"]
main, ["-e", patroni_api.endpoint, "node_is_replica", "--is-async"]
)
assert result.exit_code == 2
assert (
@ -133,18 +117,14 @@ def test_node_is_replica_async_ko(
)
def test_node_is_replica_params(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
@pytest.mark.usefixtures("node_is_replica_ok")
def test_node_is_replica_params(runner: CliRunner, patroni_api: PatroniAPI) -> None:
# We don't do the check ourselves; Patroni does it and changes the return code
my_mock(mocker, "node_is_replica_ok", 200)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"node_is_replica",
"--is-async",
"--is-sync",
@ -157,12 +137,11 @@ def test_node_is_replica_params(
)
# We don't do the check ourselves; Patroni does it and changes the return code
my_mock(mocker, "node_is_replica_ok", 200)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"node_is_replica",
"--is-sync",
"--max-lag",

View file

@ -1,22 +1,25 @@
from typing import Iterator
import pytest
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import my_mock
from . import PatroniAPI
def test_node_patroni_version_ok(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
@pytest.fixture(scope="module", autouse=True)
def node_patroni_version(patroni_api: PatroniAPI) -> Iterator[None]:
with patroni_api.routes({"patroni": "node_patroni_version.json"}):
yield None
my_mock(mocker, "node_patroni_version", 200)
def test_node_patroni_version_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"node_patroni_version",
"--patroni-version",
"2.0.2",
@ -29,17 +32,12 @@ def test_node_patroni_version_ok(
)
def test_node_patroni_version_ko(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
my_mock(mocker, "node_patroni_version", 200)
def test_node_patroni_version_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"node_patroni_version",
"--patroni-version",
"1.0.0",

View file

@ -1,23 +1,30 @@
from pathlib import Path
from typing import Iterator
import nagiosplugin
import pytest
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import here, my_mock
from . import PatroniAPI
@pytest.fixture
def node_tl_has_changed(patroni_api: PatroniAPI) -> Iterator[None]:
with patroni_api.routes({"patroni": "node_tl_has_changed.json"}):
yield None
@pytest.mark.usefixtures("node_tl_has_changed")
def test_node_tl_has_changed_ok_with_timeline(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "node_tl_has_changed", 200)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"node_tl_has_changed",
"--timeline",
"58",
@ -30,23 +37,22 @@ def test_node_tl_has_changed_ok_with_timeline(
)
@pytest.mark.usefixtures("node_tl_has_changed")
def test_node_tl_has_changed_ok_with_state_file(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
) -> None:
runner = CliRunner()
with open(here / "node_tl_has_changed.state_file", "w") as f:
state_file = tmp_path / "node_tl_has_changed.state_file"
with state_file.open("w") as f:
f.write('{"timeline": 58}')
my_mock(mocker, "node_tl_has_changed", 200)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"node_tl_has_changed",
"--state-file",
str(here / "node_tl_has_changed.state_file"),
str(state_file),
],
)
assert result.exit_code == 0
@ -56,17 +62,15 @@ def test_node_tl_has_changed_ok_with_state_file(
)
@pytest.mark.usefixtures("node_tl_has_changed")
def test_node_tl_has_changed_ko_with_timeline(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "node_tl_has_changed", 200)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"node_tl_has_changed",
"--timeline",
"700",
@ -79,24 +83,23 @@ def test_node_tl_has_changed_ko_with_timeline(
)
@pytest.mark.usefixtures("node_tl_has_changed")
def test_node_tl_has_changed_ko_with_state_file_and_save(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
) -> None:
runner = CliRunner()
with open(here / "node_tl_has_changed.state_file", "w") as f:
state_file = tmp_path / "node_tl_has_changed.state_file"
with state_file.open("w") as f:
f.write('{"timeline": 700}')
my_mock(mocker, "node_tl_has_changed", 200)
# test without saving the new tl
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"node_tl_has_changed",
"--state-file",
str(here / "node_tl_has_changed.state_file"),
str(state_file),
],
)
assert result.exit_code == 2
@@ -105,7 +108,7 @@ def test_node_tl_has_changed_ko_with_state_file_and_save(
== "NODETLHASCHANGED CRITICAL - The expected timeline was 700 got 58. | is_timeline_changed=1;;@1:1 timeline=58\n"
)
cookie = nagiosplugin.Cookie(here / "node_tl_has_changed.state_file")
cookie = nagiosplugin.Cookie(state_file)
cookie.open()
new_tl = cookie.get("timeline")
cookie.close()
@@ -117,10 +120,10 @@ def test_node_tl_has_changed_ko_with_state_file_and_save(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"node_tl_has_changed",
"--state-file",
str(here / "node_tl_has_changed.state_file"),
str(state_file),
"--save",
],
)
@@ -130,7 +133,7 @@ def test_node_tl_has_changed_ko_with_state_file_and_save(
== "NODETLHASCHANGED CRITICAL - The expected timeline was 700 got 58. | is_timeline_changed=1;;@1:1 timeline=58\n"
)
cookie = nagiosplugin.Cookie(here / "node_tl_has_changed.state_file")
cookie = nagiosplugin.Cookie(state_file)
cookie.open()
new_tl = cookie.get("timeline")
cookie.close()
@@ -138,23 +141,22 @@ def test_node_tl_has_changed_ko_with_state_file_and_save(
assert new_tl == 58
@pytest.mark.usefixtures("node_tl_has_changed")
def test_node_tl_has_changed_params(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
) -> None:
# This test is placed last because the exceptions do not seem to be flushed from stderr before the next tests run.
runner = CliRunner()
my_mock(mocker, "node_tl_has_changed", 200)
fake_state_file = tmp_path / "fake_file_name.state_file"
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"node_tl_has_changed",
"--timeline",
"58",
"--state-file",
str(here / "fake_file_name.state_file"),
str(fake_state_file),
],
)
assert result.exit_code == 3
@@ -163,9 +165,7 @@ def test_node_tl_has_changed_params(
== "NODETLHASCHANGED UNKNOWN: click.exceptions.UsageError: Either --timeline or --state-file should be provided for this service\n"
)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "node_tl_has_changed"]
)
result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_tl_has_changed"])
assert result.exit_code == 3
assert (
result.stdout
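A note on the state-file plumbing exercised above: nagiosplugin's Cookie is a small JSON-backed key/value store, which is why the tests can seed it by writing '{"timeline": 58}' by hand and then read the stored timeline back after invoking the check. A minimal read-back sketch, with an invented file name:

import nagiosplugin

cookie = nagiosplugin.Cookie("example.state_file")  # invented path for the sketch
cookie.open()
timeline = cookie.get("timeline")  # dict-style lookup of the persisted value
cookie.close()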

View file

@@ -1,49 +0,0 @@
import json
import pathlib
from typing import Any
from pytest_mock import MockerFixture
from check_patroni.types import APIError, PatroniResource
here = pathlib.Path(__file__).parent
def getjson(name: str) -> Any:
path = here / "json" / f"{name}.json"
if not path.exists():
raise Exception(f"path does not exist: {path}")
with path.open() as f:
return json.load(f)
def my_mock(
mocker: MockerFixture,
json_file: str,
status: int,
use_old_replica_state: bool = False,
) -> None:
def mock_rest_api(self: PatroniResource, service: str) -> Any:
if status != 200:
raise APIError("Test error for status code 200")
if json_file:
if use_old_replica_state and (
json_file.startswith("cluster_has_replica")
or json_file.startswith("cluster_node_count")
):
return cluster_api_set_replica_running(getjson(json_file))
return getjson(json_file)
return None
mocker.resetall()
mocker.patch("check_patroni.types.PatroniResource.rest_api", mock_rest_api)
def cluster_api_set_replica_running(js: Any) -> Any:
# starting from 3.0.4 the state of replicas is streaming instead of running
for node in js["members"]:
if node["role"] in ["replica", "sync_standby"]:
if node["state"] == "streaming":
node["state"] = "running"
return js
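The deleted cluster_api_set_replica_running() helper shown above was the machinery behind the old use_old_replica_state test parameter: starting with Patroni 3.0.4 replicas report a "streaming" state, and the helper rewrote it back to the pre-3.0.4 "running" value so both behaviours could be exercised from a single JSON fixture. A quick illustration of the transformation, using made-up cluster data:

cluster = {
    "members": [
        {"role": "leader", "state": "running"},
        {"role": "replica", "state": "streaming"},
    ]
}
# The replica's state is downgraded; the leader is left untouched.
assert cluster_api_set_replica_running(cluster)["members"][1]["state"] == "running"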

10
tox.ini
View file

@@ -4,11 +4,9 @@ envlist = lint, mypy, py{37,38,39,310,311}
skip_missing_interpreters = True
[testenv]
deps =
pytest
pytest-mock
extras = test
commands =
pytest {toxinidir}/check_patroni {toxinidir}/tests {posargs:-vv}
pytest {toxinidir}/check_patroni {toxinidir}/tests {posargs:-vv --log-level=debug}
[testenv:lint]
skip_install = True
@@ -18,7 +16,7 @@ deps =
flake8
isort
commands =
codespell {toxinidir}/check_patroni {toxinidir}/tests
codespell {toxinidir}/check_patroni {toxinidir}/tests {toxinidir}/docs/ {toxinidir}/RELEASE.md {toxinidir}/CONTRIBUTING.md
black --check --diff {toxinidir}/check_patroni {toxinidir}/tests
flake8 {toxinidir}/check_patroni {toxinidir}/tests
isort --check --diff {toxinidir}/check_patroni {toxinidir}/tests
@@ -28,7 +26,7 @@ deps =
mypy == 0.961
commands =
# we need to install types-requests
mypy --install-types --non-interactive {toxinidir}/check_patroni
mypy --install-types --non-interactive
[testenv:build]
deps =

View file

@@ -100,7 +100,7 @@ http://$IP/icingaweb2/setup
Finish
* Screen 15: Hopefuly success
* Screen 15: Hopefully success
Login

View file

@@ -66,7 +66,7 @@ icinga_setup(){
info "# Icinga setup"
info "#============================================================================="
## this part is already done by the standart icinga install with the user icinga2
## this part is already done by the standard icinga install with the user icinga2
## and a random password, here we don't really care
cat << __EOF__ | sudo -u postgres psql
@@ -83,7 +83,7 @@ __EOF__
icingacli setup config directory --group icingaweb2
icingacli setup token create
## this part is already done by the standart icinga install with the user icinga2
## this part is already done by the standard icinga install with the user icinga2
cat << __EOF__ > /etc/icinga2/features-available/ido-pgsql.conf
/**
* The db_ido_pgsql library implements IDO functionality
@@ -198,7 +198,7 @@ grafana(){
cat << __EOF__ > /etc/grafana/grafana.ini
[database]
# You can configure the database connection by specifying type, host, name, user and password
# as seperate properties or as on string using the url propertie.
# as separate properties or as one string using the url property.
# Either "mysql", "postgres" or "sqlite3", it's your choice
type = postgres