Improve doc for node_is_replica

node_is_replica uses the following Patroni endpoints: replica, asynchronous
and synchronous. The first two support the lag query parameter. For these
endpoints, the state of a replica node doesn't reflect the replication state
(streaming or in archive recovery); we only know whether it's running. The
timeline is also not checked.

Therefore, if a cluster uses asynchronous replication, it is recommended
to check the lag to detect a divergence as soon as possible.
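The endpoint selection described above can be sketched in Python. This is a minimal sketch, not check_patroni's actual code. It assumes Patroni's REST API answers HTTP 200 when the node satisfies the endpoint's conditions and returns a failure status (such as 503) otherwise. It also assumes that `replica` and `asynchronous` accept a `lag` query parameter while `synchronous` does not. `replica_check_url` and `replica_check_state` are hypothetical helpers, and `http://10.0.0.1:8008` is an illustrative address.

```python
from typing import Optional
from urllib.parse import urlencode


def replica_check_url(
    base_url: str,
    max_lag: Optional[str] = None,
    check_is_sync: bool = False,
    check_is_async: bool = False,
) -> str:
    """Pick the Patroni endpoint for a replica check (hypothetical helper)."""
    if check_is_sync and check_is_async:
        # Assumption for this sketch: both kinds cannot be requested at once.
        raise ValueError("check_is_sync and check_is_async are mutually exclusive")
    if check_is_sync:
        if max_lag is not None:
            # Mirrors the documented rule: no lag for a synchronous replica.
            raise ValueError("a lag cannot be specified for a synchronous replica")
        endpoint = "synchronous"
    elif check_is_async:
        endpoint = "asynchronous"
    else:
        # Nothing specified: any kind of replica is accepted.
        endpoint = "replica"

    url = f"{base_url.rstrip('/')}/{endpoint}"
    if max_lag is not None:
        # replica and asynchronous take the maximum lag as a query parameter.
        url += "?" + urlencode({"lag": max_lag})
    return url


def replica_check_state(http_status: int) -> str:
    """Map Patroni's HTTP status to a Nagios-style state (sketch)."""
    # 200 means the node matches the endpoint's conditions; anything else fails.
    return "OK" if http_status == 200 else "CRITICAL"


if __name__ == "__main__":
    print(replica_check_url("http://10.0.0.1:8008", max_lag="1MB", check_is_async=True))
    # prints http://10.0.0.1:8008/asynchronous?lag=1MB
```

Without `max_lag`, a 200 from `replica` or `asynchronous` only says the replica is running. Supplying a lag threshold is therefore what lets the check flag a replica that has fallen behind.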
benoit 2024-02-26 14:16:47 +01:00 committed by Benoit
parent 364a385a2f
commit a4ed20210c
3 changed files with 26 additions and 6 deletions

@@ -24,6 +24,7 @@
 ### Misc
+* Improve the documentation for node_is_replica.
 * Improve test coverage by running an HTTP server to fake the Patroni API (#55
 by @dlax).
 * Work around old pytest versions in type annotations in the test suite.

@@ -45,7 +45,7 @@ Commands:
   node_is_leader           Check if the node is a leader node.
   node_is_pending_restart  Check if the node is in pending restart...
   node_is_primary          Check if the node is the primary with the...
-  node_is_replica          Check if the node is a running replica...
+  node_is_replica          Check if the node is a replica with no...
   node_patroni_version     Check if the version is equal to the input
   node_tl_has_changed      Check if the timeline has changed.
 ```
@@ -437,12 +437,21 @@ Options:
 ```
 Usage: check_patroni node_is_replica [OPTIONS]
-  Check if the node is a running replica with no noloadbalance tag.
+  Check if the node is a replica with no noloadbalance tag.
   It is possible to check if the node is synchronous or asynchronous. If
   nothing is specified any kind of replica is accepted. When checking for a
   synchronous replica, it's not possible to specify a lag.
+  This service uses the following Patroni endpoints: replica, asynchronous
+  and synchronous. The first two support the `lag` parameter. For these
+  endpoints, the state of a replica node doesn't reflect the replication
+  state (`streaming` or `in archive recovery`); we only know whether it's
+  `running`. The timeline is also not checked.
+  Therefore, if a cluster uses asynchronous replication, it is recommended
+  to check the lag to detect a divergence as soon as possible.
   Check:
   * `OK`: if the node is a running replica with noloadbalance tag and the lag is under the maximum threshold.
   * `CRITICAL`: otherwise

@@ -621,10 +621,20 @@ def node_is_leader(ctx: click.Context, check_standby_leader: bool) -> None:
 def node_is_replica(
     ctx: click.Context, max_lag: str, check_is_sync: bool, check_is_async: bool
 ) -> None:
-    """Check if the node is a running replica with no noloadbalance tag.
-    It is possible to check if the node is synchronous or asynchronous. If nothing is specified any kind of replica is accepted.
-    When checking for a synchronous replica, it's not possible to specify a lag.
+    """Check if the node is a replica with no noloadbalance tag.
+    It is possible to check if the node is synchronous or asynchronous. If
+    nothing is specified any kind of replica is accepted. When checking for a
+    synchronous replica, it's not possible to specify a lag.
+    This service uses the following Patroni endpoints: replica, asynchronous
+    and synchronous. The first two support the `lag` parameter. For these
+    endpoints, the state of a replica node doesn't reflect the replication
+    state (`streaming` or `in archive recovery`); we only know whether it's
+    `running`. The timeline is also not checked.
+    Therefore, if a cluster uses asynchronous replication, it is
+    recommended to check the lag to detect a divergence as soon as possible.
     \b
     Check: