The service now supports the `streaming` state.
Since we don't check for lag or timeline in this service, a healthy node
is:
* leader : in a running state
* standby_leader : running (pre Patroni 3.0.4), streaming otherwise
* standby & sync_standby : running (pre Patroni 3.0.4), streaming otherwise
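
The rule above can be sketched in Python as follows. This is only an
illustration: the function names (`parse_version`, `expected_state`,
`is_healthy`) are hypothetical and are not taken from the actual code,
and the Patroni version is assumed to be available as a dotted string.

```python
from typing import Tuple

# First Patroni version where standby nodes report `streaming`.
FIRST_STREAMING_VERSION = (3, 0, 4)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as "3.0.4" into (3, 0, 4)."""
    return tuple(int(part) for part in version.split("."))


def expected_state(role: str, patroni_version: str) -> str:
    """Return the state a healthy node with this role should report."""
    if role == "leader":
        return "running"
    # standby_leader, replica and sync_standby share the same rule.
    if parse_version(patroni_version) < FIRST_STREAMING_VERSION:
        return "running"
    return "streaming"


def is_healthy(role: str, state: str, patroni_version: str) -> bool:
    """Lag and timeline are deliberately ignored, as in this service."""
    return state == expected_state(role, patroni_version)
```

Comparing versions as integer tuples avoids the string-comparison trap
where "3.0.10" would sort before "3.0.4".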
Updated the tests for this service.
Before this patch, we checked that the expected standby leader state
was `running` for all versions of Patroni.
With this patch, for:
* Patroni < 3.0.4, standby leaders are in `running` state.
* Patroni >= 3.0.4, standby leaders can be in `streaming` or `in
archive recovery` state. We will raise a warning for the latter.
The tests were modified to account for this.
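
A minimal sketch of this decision, for illustration only. The function
name `standby_leader_status` and the `ok`/`warning`/`critical` return
values are assumptions, as is the choice to treat any other state as
critical.

```python
from typing import Tuple


def standby_leader_status(state: str, patroni_version: Tuple[int, ...]) -> str:
    """Map a standby leader state to a check status (hypothetical helper)."""
    if patroni_version < (3, 0, 4):
        # Older Patroni only ever reports `running` for a healthy node.
        return "ok" if state == "running" else "critical"
    if state == "streaming":
        return "ok"
    if state == "in archive recovery":
        # Accepted, but flagged with a warning.
        return "warning"
    # Assumption: every other state is treated as a failure.
    return "critical"
```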
Co-authored-by: Denis Laxalde <denis@laxalde.org>
For Patroni >= 3.0.4, a replica is considered healthy if:
* the role is `replica` or `sync_standby`
* the state is `streaming` or `in archive recovery`
* the timeline is the same as the leader
* the lag is lower than or equal to `max_lag`
For prior versions of Patroni:
* the role is `replica` or `sync_standby`
* the state is `running`
* the timeline is the same as the leader
* the lag is lower than or equal to `max_lag`
Additionally, we now display the timeline in the perf data. We also try
to display the perf data of unhealthy replicas as much as possible.
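
The two sets of conditions above could be expressed as a single
predicate. The sketch below is illustrative: the `Member` dataclass and
the `is_healthy_replica` function are hypothetical names, and `max_lag`
is assumed to always be provided.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Member:
    """Hypothetical view of one cluster member."""
    role: str
    state: str
    timeline: int
    lag: int


def is_healthy_replica(
    member: Member,
    leader_timeline: int,
    max_lag: int,
    patroni_version: Tuple[int, ...],
) -> bool:
    """Apply the role, state, timeline and lag conditions listed above."""
    if member.role not in ("replica", "sync_standby"):
        return False
    if patroni_version >= (3, 0, 4):
        allowed_states = ("streaming", "in archive recovery")
    else:
        allowed_states = ("running",)
    return (
        member.state in allowed_states
        and member.timeline == leader_timeline
        and member.lag <= max_lag
    )
```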
Update tests for cluster_has_replica:
* Fix the tests to make them work with the new algorithm
* Add a specific test for timeline divergences
* Add `--sync-warning` and `--sync-critical`
* Add `sync_replica` to track the number of sync replicas in the perf data
* Add `MEMBER-sync` to track if a member is a sync replica in the perf data
* Change all replica status from `running` to `streaming`
* Add an option to pytest to change the state back to `running`
* Also test the output of the script
* Add a quick test script for live clusters
Previously, replica nodes were labeled with a `running` state. As a
result, our checks were based on nodes marked as `running` through
the `--running-[warning|critical]` options.
However, with the recent changes in Patroni 3.0.4, replica nodes now
carry a `streaming` state. This shift in terminology calls for an
adjustment in our approach. A new state, `healthy_member`, has been
introduced to encompass both `running` and `streaming` nodes.
Key Modifications:
* The existing `--running-[warning|critical]` options are renamed to
`--healthy-[warning|critical]`.
* Introduction of the `healthy_member` perfdata, which serves as the
reference point for the aforementioned options.
* Updates to documentation, help messages, and tests.
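
As an illustration of how `healthy_member` relates to node states, here
is a small sketch. The `count_healthy_members` helper is hypothetical
and only shows the counting rule described above, not the actual
implementation.

```python
from typing import Iterable

# States that count as healthy, per the description above.
HEALTHY_STATES = frozenset({"running", "streaming"})


def count_healthy_members(states: Iterable[str]) -> int:
    """Return the value the `healthy_member` perfdata would carry."""
    return sum(1 for state in states if state in HEALTHY_STATES)


# A leader in `running`, one replica in `streaming`, one stopped node.
perfdata = {"healthy_member": count_healthy_members(["running", "streaming", "stopped"])}
```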
Since Patroni 3.0.4, the nominal state of a standby node is "streaming"
instead of "running". Some services need to be changed to account for that.
Reported in issue #28
The checks `cluster_config_has_changed` and `node_tl_has_changed` use a
state file to store the previous value of the config hash and the
timeline.
Previously the check would fail if something changed, but the new value
would be saved directly. This behaviour has changed. The new value
is saved only if `--save` is passed to the check.
This mimics the way [check_pgactivity] manages this kind of check.
[check_pgactivity]: https://github.com/OPMDG/check_pgactivity
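
The `--save` behaviour described above can be sketched as follows. This
is not the actual implementation: the JSON file layout, the helper
names and the choice to report no change when nothing was stored yet
are all assumptions made for the example.

```python
import json
from pathlib import Path
from typing import Optional


def read_saved(state_file: Path, key: str) -> Optional[str]:
    """Return the previously saved value, or None if there is none."""
    if not state_file.exists():
        return None
    return json.loads(state_file.read_text()).get(key)


def write_saved(state_file: Path, key: str, value: str) -> None:
    """Store a value under `key`, keeping other keys untouched."""
    data = json.loads(state_file.read_text()) if state_file.exists() else {}
    data[key] = value
    state_file.write_text(json.dumps(data))


def has_changed(state_file: Path, key: str, current: str, save: bool = False) -> bool:
    """Compare `current` with the saved value; persist only when `save` is set."""
    previous = read_saved(state_file, key)
    # Assumption: with no saved value, there is nothing to compare against.
    changed = previous is not None and previous != current
    if save:
        write_saved(state_file, key, current)
    return changed
```

With this approach the check keeps failing on every run until an
operator acknowledges the change by running it once with `save=True`.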