The service now supports the `streaming` state.
Since we don't check for lag or timeline in this service, a healthy node
is:
* leader: in a `running` state
* standby_leader: `running` (pre Patroni 3.0.4), `streaming` otherwise
* standby & sync_standby: `running` (pre Patroni 3.0.4), `streaming` otherwise
Updated the tests for this service.
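A minimal sketch of that rule, assuming member dicts with `role` and `state`
keys such as those returned by Patroni's `/cluster` endpoint; the helper name
`is_healthy_member` is illustrative, not the service's actual code:
```python
def is_healthy_member(member: dict, streaming_supported: bool) -> bool:
    """Return True if the member's state is nominal for its role.

    `streaming_supported` is True for Patroni >= 3.0.4, where standbys and
    standby leaders report `streaming` instead of `running`.
    """
    role, state = member["role"], member["state"]
    if role == "leader":
        return state == "running"
    if role in ("standby_leader", "replica", "sync_standby"):
        return state == ("streaming" if streaming_supported else "running")
    return False  # unknown role: treat as unhealthy


# A pre-3.0.4 replica reporting `running` is considered healthy:
print(is_healthy_member({"role": "replica", "state": "running"},
                        streaming_supported=False))
```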
Before this patch, we checked that the expected standby leader state
was `running` for all versions of Patroni.
With this patch, for:
* Patroni < 3.0.4, standby leaders are in the `running` state.
* Patroni >= 3.0.4, standby leaders can be in the `streaming` or `in
archive recovery` state. We raise a warning for the latter.
The tests were modified to account for this.
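A hedged sketch of that decision; the helper name and the returned strings are
illustrative, the actual check is wired through the plugin's own result
handling:
```python
def standby_leader_status(state: str, patroni_ge_3_0_4: bool) -> str:
    """Map the standby leader's reported state to a Nagios-style status."""
    if not patroni_ge_3_0_4:
        return "OK" if state == "running" else "CRITICAL"
    if state == "streaming":
        return "OK"
    if state == "in archive recovery":
        return "WARNING"  # reachable, but not streaming from its source
    return "CRITICAL"
```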
Co-authored-by: Denis Laxalde <denis@laxalde.org>
For Patroni >= 3.0.4:
* the role is `replica` or `sync_standby`
* the state is `streaming` or `in archive recovery`
* the timeline is the same as the leader
* the lag is lower than or equal to `max_lag`
For prior versions of Patroni:
* the role is `replica` or `sync_standby`
* the state is `running`
* the timeline is the same as the leader
* the lag is lower than or equal to `max_lag`
Additionally, we now display the timeline in the perfstats. We also try
to display the perf stats of unhealthy replicas as much as possible.
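A sketch of the rule above; the member keys (`role`, `state`, `timeline`,
`lag`) mirror Patroni's `/cluster` output, but the helper itself is an
illustration rather than the plugin's code:
```python
def replica_is_healthy(
    member: dict, leader_timeline: int, max_lag: int, streaming_supported: bool
) -> bool:
    """Apply the version-dependent healthy-replica rule described above."""
    if member["role"] not in ("replica", "sync_standby"):
        return False
    expected = (
        {"streaming", "in archive recovery"} if streaming_supported else {"running"}
    )
    return (
        member["state"] in expected
        and member.get("timeline") == leader_timeline
        and member.get("lag", 0) <= max_lag
    )
```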
Update tests for cluster_has_replica:
* Fix the tests to make them work with the new algorithm
* Add a specific test for timeline divergences
This exception is only present in "recent" versions of requests,
typically not in the version distributed by Debian bullseye. Since
requests' JSONDecodeError is in general a subclass of
json.JSONDecodeError, we use the latter, but also handle the plain
ValueError (which json.JSONDecodeError is a subclass of) because
requests might use simplejson (which uses its own JSONDecodeError, also
a subclass of ValueError).
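A minimal sketch of that handling, assuming an illustrative `fetch_json`
helper and `APIError` exception (the plugin's own names may differ):
```python
import json

import requests


class APIError(Exception):
    """Hypothetical error type standing in for the plugin's own exception."""


def fetch_json(url: str) -> dict:
    response = requests.get(url)
    try:
        return response.json()
    except (json.JSONDecodeError, ValueError) as exc:
        # json.JSONDecodeError is a ValueError subclass, and simplejson's
        # JSONDecodeError is a ValueError too, so this covers whichever
        # decoder requests happens to use.
        raise APIError(f"invalid JSON received from {url}") from exc
```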
We introduce a patroni_api fixture, defined in tests/conftest.py, which
sets up an HTTP server serving files in a temporary directory. The
server is itself defined by the PatroniAPI class; it has a 'routes()'
context manager method to be used in actual tests to set up the expected
responses based on specified JSON files.
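A condensed sketch of what such a fixture can look like; the real
tests/conftest.py adds logging and more, and details such as the data
directory name are assumptions:
```python
import contextlib
import shutil
import threading
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path
from typing import Iterator, Mapping

import pytest


class PatroniAPI:
    def __init__(self, directory: Path, datadir: Path) -> None:
        self.directory = directory  # files served over HTTP
        self.datadir = datadir  # where the canned JSON files live
        handler = partial(SimpleHTTPRequestHandler, directory=str(directory))
        self.server = ThreadingHTTPServer(("127.0.0.1", 0), handler)

    @property
    def endpoint(self) -> str:
        host, port = self.server.server_address[:2]
        return f"http://{host}:{port}"

    @contextlib.contextmanager
    def routes(self, mapping: Mapping[str, str]) -> Iterator[None]:
        """Serve each route from the given JSON file for the duration of a test."""
        for route, fname in mapping.items():
            shutil.copy(self.datadir / fname, self.directory / route)
        try:
            yield
        finally:
            for route in mapping:
                (self.directory / route).unlink()


@pytest.fixture(scope="session")
def patroni_api(tmp_path_factory) -> Iterator[PatroniAPI]:
    directory = tmp_path_factory.mktemp("patroni-api")
    api = PatroniAPI(directory, Path(__file__).parent / "json")
    thread = threading.Thread(target=api.server.serve_forever)
    thread.start()
    try:
        yield api
    finally:
        api.server.shutdown()
        thread.join()
```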
We set up some logging in order to improve debugging.
The direct advantage of this is that PatroniResource.rest_api() method
is now covered by the test suite.
Coverage before this commit:
Name Stmts Miss Cover
-----------------------------------------------
check_patroni/__init__.py 3 0 100%
check_patroni/cli.py 193 18 91%
check_patroni/cluster.py 113 0 100%
check_patroni/convert.py 23 5 78%
check_patroni/node.py 146 1 99%
check_patroni/types.py 50 23 54%
-----------------------------------------------
TOTAL 528 47 91%
and after this commit:
Name Stmts Miss Cover
-----------------------------------------------
check_patroni/__init__.py 3 0 100%
check_patroni/cli.py 193 18 91%
check_patroni/cluster.py 113 0 100%
check_patroni/convert.py 23 5 78%
check_patroni/node.py 146 1 99%
check_patroni/types.py 50 9 82%
-----------------------------------------------
TOTAL 528 33 94%
In actual test functions, we either invoke patroni_api.routes() to
configure which JSON file(s) should be served for each endpoint, or we
define dedicated fixtures (e.g. cluster_config_has_changed()) to
configure this for several test functions or the whole module.
The 'old_replica_state' parametrized fixture is used when needed to
adjust such fixtures, e.g. in cluster_has_replica_ok(), to modify the
JSON content using cluster_api_set_replica_running() (previously in
tests/tools.py, now in tests/__init__.py).
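For instance (the JSON file name, CLI entry point, and options below are
illustrative assumptions, not the test suite's exact code):
```python
from check_patroni.cli import main  # assumed entry point of the CLI


def test_cluster_has_replica_ok(runner, patroni_api) -> None:
    # Serve a canned /cluster response for the duration of this test.
    with patroni_api.routes({"cluster": "cluster_has_replica_ok.json"}):
        result = runner.invoke(
            main, ["-e", patroni_api.endpoint, "cluster_has_replica"]
        )
    assert result.exit_code == 0
```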
The dependency on pytest-mock is no longer needed.
Instead of requiring the user to run the test suite with and without the
--use-old-replica-state flag, we introduce an 'old_replica_state()'
parametrized fixture that is used only when needed (i.e. in
test_cluster_{has_replica,node_count}.py).
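A sketch of such a parametrized fixture, assuming a boolean parameter; the
actual parameter values and ids may differ:
```python
import pytest


@pytest.fixture(params=[False, True], ids=["new_replica_state", "old_replica_state"])
def old_replica_state(request) -> bool:
    """Run dependent tests twice: once with the post-3.0.4 'streaming'
    state and once with the pre-3.0.4 'running' state."""
    return request.param
```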
Instead of defining the CliRunner value in each test, we use a fixture.
The CliRunner is also configured with stdout and stderr separated
because mixing them will pose problems if we use stderr for other
purposes in tests, e.g. to emit log messages from a forthcoming HTTP
server.
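A sketch of the fixture; note that `mix_stderr` is a click parameter that
exists in older click versions (newer releases separate the streams by
default), so treat this as an assumption tied to the click version in use:
```python
import pytest
from click.testing import CliRunner


@pytest.fixture
def runner() -> CliRunner:
    """A CliRunner keeping stdout and stderr separate."""
    return CliRunner(mix_stderr=False)
```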
* README: add information pertaining to shell completion;
* CONTRIBUTING: remove release information;
* RELEASE: create a dedicated file with all the relevant release
information.
* Add `--sync-warning` and `--sync-critical`
* Add `sync_replica` to track the number of sync replicas in the perf data
* Add `MEMBER-sync` to track if a member is a sync replica in the perf data
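A rough illustration of the new perf data; the member structure and label
formatting are assumptions, and the actual plugin builds these metrics
differently:
```python
def sync_perfdata(members: list) -> dict:
    """Count sync replicas and flag each member's sync status."""
    perfdata = {
        "sync_replica": sum(1 for m in members if m["role"] == "sync_standby")
    }
    for m in members:
        perfdata[f"{m['name']}-sync"] = int(m["role"] == "sync_standby")
    return perfdata
```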
This change required Python 3.7, which I overlooked at first. I would
like to be able to install the check on any Patroni server, and Patroni
supports Python 3.6.
For `node_is_alive`, it seemed to be a good idea to exit with a
`CRITICAL` when the target doesn't exist. But for all the rest, UNKNOWN
(which corresponds to a configuration error) seems better.
Previously, if a node wasn't reachable, we would get an UNKNOWN error
instead of a CRITICAL error.
```
NODEISALIVE UNKNOWN - Connection failed for all provided endpoints
```
We now get the correct error.
```
NODEISALIVE CRITICAL - This node is not alive (patroni is not running). | is_alive=0;;@0
```
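A minimal sketch of the intended mapping; the endpoint path and helper name
are illustrative, not the plugin's actual code:
```python
import requests


def node_is_alive(endpoint: str) -> int:
    """Nagios-style exit code: 0 OK, 2 CRITICAL."""
    try:
        requests.get(f"{endpoint}/liveness", timeout=2)
    except requests.ConnectionError:
        return 2  # CRITICAL: patroni is not running on the node
    return 0  # OK
```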
* Change all replica status from `running` to `streaming`
* Add an option to pytest to change the state back to `running`
* Also test the output of the script
* Add a quick test script for live clusters
Previously, replica nodes were labeled with a `running` state. As a
result, our checks were based on nodes marked as `running` through
the `--running-[warning|critical]` options.
However, with the recent changes in Patroni 3.0.4, replica nodes now
carry a `streaming` state. This shift in terminology calls for an
adjustment in our approach. A new state, `healthy_member`, has been
introduced to encompass both `running` and `streaming` nodes.
Key Modifications:
* The existing `--running-[warning|critical]` option is now designated
as `--healthy-[warning|critical]`.
* Introduction of the `healthy_member` perfdata, which serves as the
reference point for the aforementioned options.
* Updates to documentation, help messages, and tests.
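An illustrative count backing the new `healthy_member` perfdata and the
renamed thresholds; the member dicts are assumptions:
```python
def healthy_member_count(members: list) -> int:
    """A member is healthy whether it reports the pre-3.0.4 'running'
    state or the post-3.0.4 'streaming' state."""
    return sum(1 for m in members if m["state"] in ("running", "streaming"))
```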
Since Patroni 3.0.4, the nominal state of a standby node is "streaming"
instead of "running". Some services need to be changed to account for that.
Reported in issue #28