Compare commits

...

28 commits

Author SHA1 Message Date
David Prévot c52e34116d New upstream version 2.0.0 2024-04-14 09:26:34 +02:00
benoit 807f9b2071 Release V2.0.0 2024-04-09 16:45:11 +02:00
benoit e0589b97a8 Black run 2024-02-27 11:29:52 +01:00
benoit a4ed20210c Improve doc for node_is_replica
node_is_replica uses the following Patroni endpoints: replica, asynchronous
and synchronous. The first two implement the lag tag. For these endpoints,
the state of a replica node doesn't reflect the replication state
(streaming or in archive recovery); we only know whether it's running. The
timeline is also not checked.

Therefore, if a cluster is using asynchronous replication, it is recommended
to check for the lag to detect a divergence as soon as possible.
2024-02-26 16:02:53 +01:00
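For illustration, a minimal sketch of what such a lag check amounts to at the HTTP level, assuming a plain `requests` call against an illustrative endpoint URL; Patroni answers 200 on `replica?lag=...` only when the node is a replica within the allowed lag (this is not check_patroni's actual code):

```python
# Hedged sketch: probe Patroni's "replica" endpoint with a lag threshold.
# Patroni returns 200 only when the node is a replica whose lag is below
# the threshold; the replication state (streaming vs. archive recovery)
# and the timeline are not part of this answer, which is the limitation
# described above.
import requests

def replica_within_lag(endpoint: str, max_lag: str = "100MB") -> bool:
    # The endpoint URL and the 100MB threshold are illustrative values.
    r = requests.get(f"{endpoint}/replica", params={"lag": max_lag}, timeout=5)
    return r.status_code == 200

print(replica_within_lag("http://10.20.199.4:8008"))
```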
benoit 364a385a2f Fix cluster_has_leader in archive recovery tests
Replication states are now also overridden for standby_leaders since
the commit fixing cluster_node_count, so the tests had to be adapted.
2024-01-09 06:50:00 +01:00
benoit 78ef0f6ada Fix cluster_node_count's management of replication states
The service now supports the `streaming` state.

Since we don't check for lag or timeline in this service, a healthy node
is:

* leader: in a running state
* standby_leader: running (pre Patroni 3.0.4), streaming otherwise
* standby & sync_standby: running (pre Patroni 3.0.4), streaming otherwise

Updated the tests for this service.
2024-01-09 06:50:00 +01:00
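As a rough sketch of the rule above (assumptions: `member` is one entry from the `/cluster` payload and `detailed_states` is True for Patroni >= 3.0.4; the shipped code in `check_patroni/cluster.py` differs in detail):

```python
# Sketch of the health rule from the bullet list above.
def is_healthy(member: dict, detailed_states: bool) -> bool:
    role, state = member["role"], member["state"]
    if role == "leader":
        return state == "running"
    if role in ("standby_leader", "replica", "sync_standby"):
        # pre-3.0.4 Patroni reports "running", newer versions "streaming"
        return state == ("streaming" if detailed_states else "running")
    return False
```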
benoit 46db3e2d15 Fix the cluster_has_leader service for standby clusters
Before this patch, we expected the standby leader state to be
`running` for all versions of Patroni.

With this patch, for:
* Patroni < 3.0.4, standby leaders are in `running` state.
* Patroni >= 3.0.4, standby leaders can be in `streaming` or `in
archive recovery` state. We will raise a warning for the latter.

The tests were modified to account for this.

Co-authored-by: Denis Laxalde <denis@laxalde.org>
2023-12-18 13:17:37 +01:00
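A condensed sketch of that version-dependent rule (assuming `state` is the standby leader's state from the `/cluster` payload; the real check also emits perfdata, see the `cluster.py` diff below):

```python
# Sketch only: map a standby leader's state to a nagios-style outcome.
def standby_leader_level(state: str, patroni_ge_3_0_4: bool) -> str:
    if patroni_ge_3_0_4:
        if state == "streaming":
            return "OK"
        if state == "in archive recovery":
            return "WARNING"  # log shipping could be stuck
        return "CRITICAL"
    # older Patroni only ever reports "running" for a healthy standby leader
    return "OK" if state == "running" else "CRITICAL"
```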
benoit ffc330f96e Mention that shell completion support is dependent on the shell version 2023-11-16 13:59:06 +01:00
benoit 8d6b8502b6 cluster_has_replica: fix the way a healthy replica is detected
For patroni >= version 3.0.4:
* the role is `replica` or `sync_standby`
* the state is `streaming` or `in archive recovery`
* the timeline is the same as the leader
* the lag is lower than or equal to `max_lag`

For prior versions of patroni:
* the role is `replica` or `sync_standby`
* the state is `running`
* the timeline is the same as the leader
* the lag is lower than or equal to `max_lag`

Additionally, we now display the timeline in the perfstats. We also try
to display the perf stats of unhealthy replicas as much as possible.

Update tests for cluster_has_replica:
* Fix the tests to make them work with the new algorithm
* Add a specific test for tl divergences
2023-11-11 10:50:35 +01:00
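The rule can be summarized as the following predicate (a sketch under the assumption that `member` is one `/cluster` entry and `leader_tl` is the leader's timeline; the shipped implementation in `cluster.py` is more involved):

```python
from typing import Optional

# Sketch of the replica health rule described in this commit message.
def replica_is_healthy(member: dict, leader_tl: int, detailed_states: bool,
                       max_lag: Optional[int]) -> bool:
    if member["role"] not in ("replica", "sync_standby"):
        return False
    ok_states = (
        ("streaming", "in archive recovery") if detailed_states else ("running",)
    )
    if member["state"] not in ok_states:
        return False
    if int(member["timeline"]) != leader_tl:
        return False  # timeline divergence
    return max_lag is None or int(member["lag"]) <= max_lag
```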
Denis Laxalde 6ee8db1df2 Avoid using requests's JSONDecodeError
This exception is only present in "recent" versions of requests,
typically not in the version distributed by Debian bullseye. Since
requests' JSONDecodeError is in general a subclass of
json.JSONDecodeError, we use the latter, but also handle the plain
ValueError (which json.JSONDecodeError is a subclass of) because
requests might use simplejson (which uses its own JSONDecodeError, also
a subclass of ValueError).
2023-10-13 11:45:39 +02:00
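In other words (a sketch of the compatibility reasoning, mirroring the change in `types.py` below):

```python
import json
import requests

# json.JSONDecodeError subclasses ValueError, and the JSONDecodeError of
# requests/simplejson (when present) also derives from ValueError, so this
# pair of exceptions covers every combination described above.
def safe_json(r: requests.Response):
    try:
        return r.json()
    except (json.JSONDecodeError, ValueError):
        return None
```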
Denis Laxalde a8c4a3125d Work around nagiosplugin issue about stdout in tests
We basically apply the change from
https://github.com/mpounsett/nagiosplugin/issues/24 as a fixture, but
only when nagiosplugin's version is old.
2023-10-13 11:45:39 +02:00
Denis Laxalde 4035f1a3da Add compat for old pytest in type hints 2023-10-13 11:45:39 +02:00
Denis Laxalde fabf3c142b Declare compatibility with click 7.1 or higher
Judging from the test suite, we apparently don't need version 8.x.
2023-10-13 11:45:39 +02:00
Denis Laxalde 593278206a Let Mypy check all files
Since the previous commit, the test suite also type-checks.
2023-10-06 10:40:29 +02:00
Denis Laxalde 903b83e211 Use fake HTTP server for the Patroni API in tests
We introduce a patroni_api fixture, defined in tests/conftest.py, which
sets up an HTTP server serving files in a temporary directory. The
server is itself defined by the PatroniAPI class; it has a 'routes()'
context manager method to be used in actual tests to set up expected
responses based on specified JSON files.

We set up some logging in order to improve debugging.

The direct advantage of this is that the PatroniResource.rest_api() method
is now covered by the test suite.

Coverage before this commit:

  Name                        Stmts   Miss  Cover
  -----------------------------------------------
  check_patroni/__init__.py       3      0   100%
  check_patroni/cli.py          193     18    91%
  check_patroni/cluster.py      113      0   100%
  check_patroni/convert.py       23      5    78%
  check_patroni/node.py         146      1    99%
  check_patroni/types.py         50     23    54%
  -----------------------------------------------
  TOTAL                         528     47    91%

and after this commit:

  Name                        Stmts   Miss  Cover
  -----------------------------------------------
  check_patroni/__init__.py       3      0   100%
  check_patroni/cli.py          193     18    91%
  check_patroni/cluster.py      113      0   100%
  check_patroni/convert.py       23      5    78%
  check_patroni/node.py         146      1    99%
  check_patroni/types.py         50      9    82%
  -----------------------------------------------
  TOTAL                         528     33    94%

In actual test functions, we either invoke patroni_api.routes() to
configure which JSON file(s) should be served for each endpoint, or we
define dedicated fixtures (e.g. cluster_config_has_changed()) to
configure this for several test functions or the whole module.

The 'old_replica_state' parametrized fixture is used when needed to
adjust such fixtures, e.g. in cluster_has_replica_ok(), to modify the
JSON content using cluster_api_set_replica_running() (previously in
tests/tools.py, now in tests/__init__.py).

The dependency on pytest-mock is no longer needed.
2023-10-06 10:40:29 +02:00
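For flavor, a hypothetical test built on these fixtures could look like this (the JSON file name and the assertion are illustrative, not copied from the test suite):

```python
from click.testing import CliRunner

from check_patroni.cli import main
from . import PatroniAPI

def test_cluster_has_leader_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
    # Serve a canned /cluster payload, then run the CLI against the fake API.
    with patroni_api.routes({"cluster": "cluster_has_leader_ok.json"}):
        result = runner.invoke(
            main, ["-e", patroni_api.endpoint, "cluster_has_leader"]
        )
    assert result.exit_code == 0
```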
Denis Laxalde 32e06f7051 Use the 'test' extra in Tox's test environment
Instead of repeating the dependencies.
2023-10-06 10:33:04 +02:00
Denis Laxalde 2d2c389bdb Configure coverage
To be run with 'pytest --cov --cov-report=html'.
2023-10-06 10:33:04 +02:00
Denis Laxalde 34f576ea0f Turn --use-old-replica-state into a parametrized fixture
Instead of requiring the user to run the test suite with and without the
--use-old-replica-state flag, we introduce an 'old_replica_state()'
parametrized fixture that is used only when needed (i.e. in
test_cluster_{has_replica,node_count}.py).
2023-10-06 10:33:04 +02:00
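A sketch of how such an adjusting fixture could combine `old_replica_state` with the JSON rewriting helper (names follow the commit messages above; the actual fixtures may differ):

```python
from pathlib import Path
from typing import Iterator

import pytest

from . import PatroniAPI, cluster_api_set_replica_running

@pytest.fixture
def cluster_has_replica_ok(
    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
    # Serve the "new style" JSON as-is, or rewrite replica states back to
    # "running" when exercising the pre-3.0.4 behaviour.
    path: Path = datadir / "cluster_has_replica_ok.json"
    if old_replica_state:
        path = cluster_api_set_replica_running(path, tmp_path)
    with patroni_api.routes({"cluster": path}):
        yield
```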
Denis Laxalde fea89041b8 Run pytest with --log-level=debug in tox and CI
This way, our log messages (and those from our stack) will show up in
case of errors or test failures, which makes debugging easier.
2023-10-03 09:54:13 +02:00
Denis Laxalde ea92809cb3 Introduce a 'runner' test fixture
Instead of defining the CliRunner value in each test, we use a fixture.
The CliRunner is also configured with stdout and stderr separated
because mixing them would pose problems if we use stderr for other
purposes in tests, e.g. to emit log messages from a forthcoming HTTP
server.
2023-10-03 09:54:13 +02:00
Denis Laxalde d34e597e61 Use the tmp_path fixture instead of writing files to tests/ 2023-10-03 09:54:13 +02:00
Denis Laxalde bc2d2917c3 Introduce a fake_restapi test fixture
This fixture itself uses the 'use_old_replica_state' fixture, so that
it's no longer needed to use it explicitly in test functions.
2023-10-03 09:54:13 +02:00
Denis Laxalde c3cdb8cdd4 Set a default value to status parameter of my_mock in tests
Most of the time, it's 200, so the default value simplifies usage in
actual tests.
2023-10-03 09:54:13 +02:00
Denis Laxalde 123c300911 Add type hints in tests/conftest.py 2023-10-03 09:54:13 +02:00
Denis Laxalde a0189ebba7 Fix some typos spotted by codespell 2.2.6 2023-10-03 09:53:53 +02:00
Denis Laxalde 95f21a133d Drop superfluous type annotation of 'self'
See https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html#classes
> For instance methods, omit type for "self"
2023-10-03 09:39:40 +02:00
benoit de8b3daa7a Update tox.ini to run codespell on the documentation 2023-08-30 10:19:18 +02:00
benoit 82e0af8a9e Update README CONTRIBUTING RELEASE
* README: add information pertaining to shell completion;
* CONTRIBUTING: remove release information;
* RELEASE: create a dedicated file with all the relevant release
  information.
2023-08-30 10:19:18 +02:00
45 changed files with 1514 additions and 657 deletions

3
.coveragerc Normal file
View file

@ -0,0 +1,3 @@
[run]
include =
check_patroni/*

2
.gitignore vendored
View file

@ -1,10 +1,10 @@
__pycache__/
check_patroni.egg-info
tests/*.state_file
tests/config.ini
vagrant/.vagrant
vagrant/*.state_file
.*.swp
.coverage
.venv/
.tox/
dist/

CHANGELOG.md
View file

@ -1,13 +1,37 @@
# Change log
## Unreleased
## check_patroni 2.0.0 - 2024-04-09
### Changed
* In `cluster_node_count`, a healthy standby, sync replica or standby leader cannot be "in
archive recovery" because this service doesn't check for lag and timelines.
### Added
* Add the timeline in the `cluster_has_replica` perfstats. (#50)
* Add a mention about shell completion support and shell versions in the doc. (#53)
* Add the leader type and whether it's archiving to the `cluster_has_leader` perfstats. (#58)
### Fixed
* Add compatibility with [requests](https://requests.readthedocs.io)
version 2.25 and higher.
* Fix what `cluster_has_replica` deems a healthy replica. (#50, reported by @mbanck)
* Fix `cluster_has_replica` to display perfstats for replicas whenever it's possible (healthy or not). (#50)
* Fix `cluster_has_leader` to correctly check for standby leaders. (#58, reported by @mbanck)
* Fix `cluster_node_count` to correctly manage replication states. (#50, reported by @mbanck)
### Misc
* Improve the documentation for `node_is_replica`.
* Improve test coverage by running an HTTP server to fake the Patroni API (#55
by @dlax).
* Work around old pytest versions in type annotations in the test suite.
* Declare compatibility with click version 7.1 (or higher).
* In tests, work around nagiosplugin 1.3.2 not properly handling stdout
redirection.
## check_patroni 1.0.0 - 2023-08-28
Check patroni is now tagged as Production/Stable.

CONTRIBUTING.md
View file

@ -43,15 +43,14 @@ A vagrant file can be found in [this
repository](https://github.com/ioguix/vagrant-patroni) to generate a patroni/etcd
setup.
The `README.md` can be geneated with `./docs/make_readme.sh`.
The `README.md` can be generated with `./docs/make_readme.sh`.
## Executing Tests
Crafting repeatable tests using a live Patroni cluster can be intricate. To
simplify the development process, interactions with Patroni's API are
substituted with a mock function that yields an HTTP return code and a JSON
object outlining the cluster's status. The JSON files containing this
information are housed in the `./tests/json` directory.
simplify the development process, a fake HTTP server is set up as a test
fixture and serves static files (either from `tests/json` directory or from
in-memory data).
An important consideration is that there is a potential drawback: if the JSON
data is incorrect or if modifications have been made to Patroni without
@ -61,21 +60,15 @@ erroneously.
The tests are executed automatically for each PR using the ci (see
`.github/workflow/lint.yml` and `.github/workflow/tests.yml`).
Running the tests manually:
Running the tests,
* Using patroni's nominal replica state of `streaming` (since v3.0.4):
* manually:
```bash
pytest ./tests
pytest --cov tests
```
* Using patroni's nominal replica state of `running` (before v3.0.4):
```bash
pytest --use-old-replica-state ./tests
```
* Using tox:
* or using tox:
```bash
tox -e lint # mypy + flake8 + black + isort + codespell
@ -83,9 +76,9 @@ Running the tests manually:
tox -e py # pytests and "lint" tests for the default version of python
```
Please note that when dealing with any service that checks the state of a node
in patroni's `cluster` endpoint, the corresponding JSON test file must be added
in `./tests/tools.py`.
Please note that when dealing with any service that checks the state of a node,
the related tests must use the `old_replica_state` fixture to test with both
old (pre 3.0.4) and new replica states.
A bash script, `check_patroni.sh`, is provided to facilitate testing all
services on a Patroni endpoint (`./vagrant/check_patroni.sh`). It requires one
@ -99,17 +92,3 @@ Here's an example usage:
```bash
./vagrant/check_patroni.sh http://10.20.30.51:8008
```
## Release
Update the Changelog.
The package is generated and uploaded to pypi when a `v*` tag is created (see
`.github/workflow/publish.yml`).
Alternatively, the release can be done manually with:
```
tox -e build
tox -e upload
```

MANIFEST.in
View file

@ -2,6 +2,7 @@ include *.md
include mypy.ini
include pytest.ini
include tox.ini
include .coveragerc
include .flake8
include pyproject.toml
recursive-include docs *.sh

124
README.md
View file

@ -45,7 +45,7 @@ Commands:
node_is_leader Check if the node is a leader node.
node_is_pending_restart Check if the node is in pending restart...
node_is_primary Check if the node is the primary with the...
node_is_replica Check if the node is a running replica...
node_is_replica Check if the node is a replica with no...
node_patroni_version Check if the version is equal to the input
node_tl_has_changed Check if the timeline has changed.
```
@ -60,7 +60,7 @@ $ pip install git+https://github.com/dalibo/check_patroni.git
check_patroni works on python 3.6, we keep it that way because patroni also
supports it and there are still lots of RH 7 variants around. That being said
python 3.6 has been EOL for age and there is no support for it in the github
python 3.6 has been EOL for ages and there is no support for it in the github
CI.
## Support
@ -98,8 +98,8 @@ A match is found when: `start <= VALUE <= end`.
For example, the following command will raise:
* a warning if there is less than 1 nodes, wich can be translated to outside of range [2;+INF[
* a critical if there are no nodes, wich can be translated to outside of range [1;+INF[
* a warning if there are fewer than 2 nodes, which can be translated to outside of range [2;+INF[
* a critical if there are no nodes, which can be translated to outside of range [1;+INF[
```
check_patroni -e https://10.20.199.3:8008 cluster_has_replica --warning 2: --critical 1:
@ -115,6 +115,30 @@ Several options are available:
* `--cert_file`: your certificate or the concatenation of your certificate and private key
* `--key_file`: your private key (optional)
## Shell completion
We use the [click] library, which supports shell completion natively.
Shell completion can be added by typing the following command or adding it to
a file specific to your shell of choice.
* for Bash (add to `~/.bashrc`):
```
eval "$(_CHECK_PATRONI_COMPLETE=bash_source check_patroni)"
```
* for Zsh (add to `~/.zshrc`):
```
eval "$(_CHECK_PATRONI_COMPLETE=zsh_source check_patroni)"
```
* for Fish (add to `~/.config/fish/completions/check_patroni.fish`):
```
eval "$(_CHECK_PATRONI_COMPLETE=fish_source check_patroni)"
```
Please note that shell completion is not supported for all shell versions; for
example, only Bash versions 4.4 and newer are supported.
[click]: https://click.palletsprojects.com/en/8.1.x/shell-completion/
## Cluster services
@ -152,11 +176,27 @@ Usage: check_patroni cluster_has_leader [OPTIONS]
This check applies to any kind of leaders including standby leaders.
A leader is a node with the "leader" role and a "running" state.
A standby leader is a node with a "standby_leader" role and a "streaming" or
"in archive recovery" state. Please note that log shipping could be stuck
because the WAL are not available or applicable. Patroni doesn't provide
information about the origin cluster (timeline or lag), so we cannot check
if there is a problem in that particular case. That's why we issue a warning
when the node is "in archive recovery". We suggest using other supervision
tools to do this (eg. check_pgactivity).
Check:
* `OK`: if there is a leader node.
* `CRITICAL`: otherwise
* `WARNING`: if there is a standby leader in archive recovery.
* `CRITICAL`: otherwise.
Perfdata: `has_leader` is 1 if there is a leader node, 0 otherwise
Perfdata:
* `has_leader` is 1 if there is any kind of leader node, 0 otherwise
* `is_standby_leader_in_arc_rec` is 1 if the standby leader node is "in
archive recovery", 0 otherwise
* `is_standby_leader` is 1 if there is a standby leader node, 0 otherwise
* `is_leader` is 1 if there is a "classical" leader node, 0 otherwise
Options:
--help Show this message and exit.
@ -169,10 +209,27 @@ Usage: check_patroni cluster_has_replica [OPTIONS]
Check if the cluster has healthy replicas and/or if some are sync standbies
For patroni (and this check):
* a replica is `streaming` if the `pg_stat_wal_receiver` says so.
* a replica is `in archive recovery`, if it's not `streaming` and has a `restore_command`.
A healthy replica:
* is in running or streaming state (V3.0.4)
* has a replica or sync_standby role
* has a lag lower or equal to max_lag
* has a `replica` or `sync_standby` role
* has the same timeline as the leader and
* is in `running` state (patroni < V3.0.4)
* is in `streaming` or `in archive recovery` state (patroni >= V3.0.4)
* has a lag lower than or equal to `max_lag`
Please note that a replica `in archive recovery` could be stuck because the
WAL files are not available or applicable (the server's timeline has diverged from
the leader's). We already detect the latter but we will miss the former.
Therefore, it's preferable to check for the lag in addition to the healthy
state if you rely on log shipping to help lagging standbies to catch up.
Since we require a healthy replica to have the same timeline as the leader,
it's possible that we raise alerts when the cluster is performing a
switchover or failover and the standbies are in the process of catching up
with the new leader. The alert shouldn't last long.
Check:
* `OK`: if the healthy_replica count and their lag are compatible with the replica count threshold.
@ -182,8 +239,9 @@ Usage: check_patroni cluster_has_replica [OPTIONS]
Perfdata:
* healthy_replica & unhealthy_replica count
* the number of sync_replica, they are included in the previous count
* the lag of each replica labelled with "member name"_lag
* a boolean to tell if the node is a sync stanbdy labelled with "member name"_sync
* the lag of each replica labelled with "member name"_lag
* the timeline of each replica labelled with "member name"_timeline
* a boolean to tell if the node is a sync standby labelled with "member name"_sync
Options:
-w, --warning TEXT Warning threshold for the number of healthy replica
@ -241,26 +299,37 @@ Usage: check_patroni cluster_node_count [OPTIONS]
Count the number of nodes in the cluster.
The role refers to the role of the server in the cluster. Possible values
are:
* master or leader
* replica
* standby_leader
* sync_standby
* demoted
* promoted
* uninitialized
The state refers to the state of PostgreSQL. Possible values are:
* initializing new cluster, initdb failed
* running custom bootstrap script, custom bootstrap failed
* starting, start failed
* restarting, restart failed
* running, streaming (for a replica V3.0.4)
* running, streaming, in archive recovery
* stopping, stopped, stop failed
* creating replica
* crashed
The role refers to the role of the server in the cluster. Possible values
are:
* master or leader (V3.0.0+)
* replica
* demoted
* promoted
* uninitialized
The "healthy" checks only ensures that:
* a leader has the running state
* a standby_leader has the running or streaming (V3.0.4) state
* a replica or sync-standby has the running or streaming (V3.0.4) state
Since we don't check the lag or timeline, "in archive recovery" is not
considered a valid state for this service. See cluster_has_leader and
cluster_has_replica for specialized checks.
Check:
* Compares the number of nodes against the normal and healthy (running + streaming) nodes warning and critical thresholds.
* Compares the number of nodes against the normal and healthy nodes warning and critical thresholds.
* `OK`: If they are not provided.
Perfdata:
@ -307,7 +376,7 @@ Usage: check_patroni node_is_pending_restart [OPTIONS]
Check if the node is in pending restart state.
This situation can arise if the configuration has been modified but requiers
This situation can arise if the configuration has been modified but requires
a restart of PostgreSQL to take effect.
Check:
@ -368,12 +437,21 @@ Options:
```
Usage: check_patroni node_is_replica [OPTIONS]
Check if the node is a running replica with no noloadbalance tag.
Check if the node is a replica with no noloadbalance tag.
It is possible to check if the node is synchronous or asynchronous. If
nothing is specified any kind of replica is accepted. When checking for a
nothing is specified any kind of replica is accepted. When checking for a
synchronous replica, it's not possible to specify a lag.
This service uses the following Patroni endpoints: replica, asynchronous
and synchronous. The first two implement the `lag` tag. For these endpoints,
the state of a replica node doesn't reflect the replication state
(`streaming` or `in archive recovery`); we only know if it's `running`. The
timeline is also not checked.
Therefore, if a cluster is using asynchronous replication, it is recommended
to check for the lag to detect a divergence as soon as possible.
Check:
* `OK`: if the node is a running replica with no noloadbalance tag and the lag is under the maximum threshold.
* `CRITICAL`: otherwise

38
RELEASE.md Normal file
View file

@ -0,0 +1,38 @@
# Release HOW TO
## Preparatory changes
* Review the **Unreleased** section, if any, in `CHANGELOG.md`, possibly adding
any missing items from closed issues, merged pull requests, or the git
history directly[^git-changes],
* Rename the **Unreleased** section according to the version to be released,
with a date,
* Bump the version in `check_patroni/__init__.py`,
* Rebuild the `README.md` (`cd docs; ./make_readme.sh`),
* Commit these changes (either on a dedicated branch, before submitting a pull
request or directly on the `master` branch) with the commit message `release
X.Y.Z`.
* Then, when changes landed in the `master` branch, create an annotated (and
possibly signed) tag, as `git tag -a [-s] -m 'release X.Y.Z' vX.Y.Z`,
and,
* Push with `--follow-tags`.
[^git-changes]: Use `git log $(git describe --tags --abbrev=0).. --format=%s
--reverse` to get commits from the previous tag.
## PyPI package
The package is generated and uploaded to pypi when a `v*` tag is created (see
`.github/workflow/publish.yml`).
Alternatively, the release can be done manually with:
```
tox -e build
tox -e upload
```
## GitHub release
Draft a new release from the release page, choosing the tag just pushed and
copy the relevant change log section as a description.

check_patroni/__init__.py
View file

@ -1,5 +1,5 @@
import logging
__version__ = "1.0.0"
__version__ = "2.0.0"
_log: logging.Logger = logging.getLogger(__name__)

check_patroni/cli.py
View file

@ -226,29 +226,40 @@ def cluster_node_count(
) -> None:
"""Count the number of nodes in the cluster.
\b
The state refers to the state of PostgreSQL. Possible values are:
* initializing new cluster, initdb failed
* running custom bootstrap script, custom bootstrap failed
* starting, start failed
* restarting, restart failed
* running, streaming (for a replica V3.0.4)
* stopping, stopped, stop failed
* creating replica
* crashed
\b
The role refers to the role of the server in the cluster. Possible values
are:
* master or leader (V3.0.0+)
* master or leader
* replica
* standby_leader
* sync_standby
* demoted
* promoted
* uninitialized
\b
The state refers to the state of PostgreSQL. Possible values are:
* initializing new cluster, initdb failed
* running custom bootstrap script, custom bootstrap failed
* starting, start failed
* restarting, restart failed
* running, streaming, in archive recovery
* stopping, stopped, stop failed
* creating replica
* crashed
\b
The "healthy" checks only ensures that:
* a leader has the running state
* a standby_leader has the running or streaming (V3.0.4) state
* a replica or sync-standby has the running or streaming (V3.0.4) state
Since we don't check the lag or timeline, "in archive recovery" is not considered a valid state
for this service. See cluster_has_leader and cluster_has_replica for specialized checks.
\b
Check:
* Compares the number of nodes against the normal and healthy (running + streaming) nodes warning and critical thresholds.
* Compares the number of nodes against the normal and healthy nodes warning and critical thresholds.
* `OK`: If they are not provided.
\b
@ -285,17 +296,38 @@ def cluster_has_leader(ctx: click.Context) -> None:
This check applies to any kind of leaders including standby leaders.
A leader is a node with the "leader" role and a "running" state.
A standby leader is a node with a "standby_leader" role and a "streaming"
or "in archive recovery" state. Please note that log shipping could be
stuck because the WAL are not available or applicable. Patroni doesn't
provide information about the origin cluster (timeline or lag), so we
cannot check if there is a problem in that particular case. That's why we
issue a warning when the node is "in archive recovery". We suggest using
other supervision tools to do this (eg. check_pgactivity).
\b
Check:
* `OK`: if there is a leader node.
* `CRITICAL`: otherwise
* `WARNING`: if there is a standby leader in archive recovery.
* `CRITICAL`: otherwise.
\b
Perfdata:
* `has_leader` is 1 if there is any kind of leader node, 0 otherwise
* `is_standby_leader_in_arc_rec` is 1 if the standby leader node is "in
archive recovery", 0 otherwise
* `is_standby_leader` is 1 if there is a standby leader node, 0 otherwise
* `is_leader` is 1 if there is a "classical" leader node, 0 otherwise
Perfdata: `has_leader` is 1 if there is a leader node, 0 otherwise
"""
check = nagiosplugin.Check()
check.add(
ClusterHasLeader(ctx.obj.connection_info),
nagiosplugin.ScalarContext("has_leader", None, "@0:0"),
nagiosplugin.ScalarContext("is_standby_leader_in_arc_rec", "@1:1", None),
nagiosplugin.ScalarContext("is_leader", None, None),
nagiosplugin.ScalarContext("is_standby_leader", None, None),
ClusterHasLeaderSummary(),
)
check.main(verbose=ctx.obj.verbose, timeout=ctx.obj.timeout)
@ -341,11 +373,29 @@ def cluster_has_replica(
) -> None:
"""Check if the cluster has healthy replicas and/or if some are sync standbies
\b
For patroni (and this check):
* a replica is `streaming` if the `pg_stat_wal_receiver` says so.
* a replica is `in archive recovery`, if it's not `streaming` and has a `restore_command`.
\b
A healthy replica:
* is in running or streaming state (V3.0.4)
* has a replica or sync_standby role
* has a lag lower or equal to max_lag
* has a `replica` or `sync_standby` role
* has the same timeline as the leader and
* is in `running` state (patroni < V3.0.4)
* is in `streaming` or `in archive recovery` state (patroni >= V3.0.4)
* has a lag lower than or equal to `max_lag`
Please note that a replica `in archive recovery` could be stuck because the WAL
files are not available or applicable (the server's timeline has diverged from the
leader's). We already detect the latter but we will miss the former.
Therefore, it's preferable to check for the lag in addition to the healthy
state if you rely on log shipping to help lagging standbies to catch up.
Since we require a healthy replica to have the same timeline as the
leader, it's possible that we raise alerts when the cluster is performing a
switchover or failover and the standbies are in the process of catching up with
the new leader. The alert shouldn't last long.
\b
Check:
@ -357,8 +407,9 @@ def cluster_has_replica(
Perfdata:
* healthy_replica & unhealthy_replica count
* the number of sync_replica, they are included in the previous count
* the lag of each replica labelled with "member name"_lag
* a boolean to tell if the node is a sync stanbdy labelled with "member name"_sync
* the lag of each replica labelled with "member name"_lag
* the timeline of each replica labelled with "member name"_timeline
* a boolean to tell if the node is a sync standby labelled with "member name"_sync
"""
tmax_lag = size_to_byte(max_lag) if max_lag is not None else None
@ -377,6 +428,7 @@ def cluster_has_replica(
),
nagiosplugin.ScalarContext("unhealthy_replica"),
nagiosplugin.ScalarContext("replica_lag"),
nagiosplugin.ScalarContext("replica_timeline"),
nagiosplugin.ScalarContext("replica_sync"),
)
check.main(verbose=ctx.obj.verbose, timeout=ctx.obj.timeout)
@ -569,10 +621,20 @@ def node_is_leader(ctx: click.Context, check_standby_leader: bool) -> None:
def node_is_replica(
ctx: click.Context, max_lag: str, check_is_sync: bool, check_is_async: bool
) -> None:
"""Check if the node is a running replica with no noloadbalance tag.
"""Check if the node is a replica with no noloadbalance tag.
It is possible to check if the node is synchronous or asynchronous. If nothing is specified any kind of replica is accepted.
When checking for a synchronous replica, it's not possible to specify a lag.
It is possible to check if the node is synchronous or asynchronous. If
nothing is specified any kind of replica is accepted. When checking for a
synchronous replica, it's not possible to specify a lag.
This service uses the following Patroni endpoints: replica, asynchronous
and synchronous. The first two implement the `lag` tag. For these endpoints,
the state of a replica node doesn't reflect the replication state
(`streaming` or `in archive recovery`); we only know if it's `running`. The
timeline is also not checked.
Therefore, if a cluster is using asynchronous replication, it is
recommended to check for the lag to detect a divergence as soon as possible.
\b
Check:
@ -610,7 +672,7 @@ def node_is_pending_restart(ctx: click.Context) -> None:
"""Check if the node is in pending restart state.
This situation can arise if the configuration has been modified but
requiers a restart of PostgreSQL to take effect.
requires a restart of PostgreSQL to take effect.
\b
Check:

check_patroni/cluster.py
View file

@ -1,7 +1,7 @@
import hashlib
import json
from collections import Counter
from typing import Iterable, Union
from typing import Any, Iterable, Union
import nagiosplugin
@ -14,25 +14,52 @@ def replace_chars(text: str) -> str:
class ClusterNodeCount(PatroniResource):
def probe(self: "ClusterNodeCount") -> Iterable[nagiosplugin.Metric]:
def probe(self) -> Iterable[nagiosplugin.Metric]:
def debug_member(member: Any, health: str) -> None:
_log.debug(
"Node %(node_name)s is %(health)s: role %(role)s state %(state)s.",
{
"node_name": member["name"],
"health": health,
"role": member["role"],
"state": member["state"],
},
)
# get the cluster info
item_dict = self.rest_api("cluster")
role_counters: Counter[str] = Counter()
roles = []
status_counters: Counter[str] = Counter()
statuses = []
healthy_member = 0
for member in item_dict["members"]:
roles.append(replace_chars(member["role"]))
statuses.append(replace_chars(member["state"]))
state, role = member["state"], member["role"]
roles.append(replace_chars(role))
statuses.append(replace_chars(state))
if role == "leader" and state == "running":
healthy_member += 1
debug_member(member, "healthy")
continue
if role in ["standby_leader", "replica", "sync_standby"] and (
(self.has_detailed_states() and state == "streaming")
or (not self.has_detailed_states() and state == "running")
):
healthy_member += 1
debug_member(member, "healthy")
continue
debug_member(member, "unhealthy")
role_counters.update(roles)
status_counters.update(statuses)
# The actual check: members, healthy_members
yield nagiosplugin.Metric("members", len(item_dict["members"]))
yield nagiosplugin.Metric(
"healthy_members",
status_counters["running"] + status_counters.get("streaming", 0),
)
yield nagiosplugin.Metric("healthy_members", healthy_member)
# The performance data : role
for role in role_counters:
@ -48,74 +75,149 @@ class ClusterNodeCount(PatroniResource):
class ClusterHasLeader(PatroniResource):
def probe(self: "ClusterHasLeader") -> Iterable[nagiosplugin.Metric]:
def probe(self) -> Iterable[nagiosplugin.Metric]:
item_dict = self.rest_api("cluster")
is_leader_found = False
is_standby_leader_found = False
is_standby_leader_in_arc_rec = False
for member in item_dict["members"]:
if (
member["role"] in ("leader", "standby_leader")
and member["state"] == "running"
):
if member["role"] == "leader" and member["state"] == "running":
is_leader_found = True
break
if member["role"] == "standby_leader":
if member["state"] not in ["streaming", "in archive recovery"]:
# for patroni >= 3.0.4 any state would be wrong
# for patroni < 3.0.4 a state different from running would be wrong
if self.has_detailed_states() or member["state"] != "running":
continue
if member["state"] in ["in archive recovery"]:
is_standby_leader_in_arc_rec = True
is_standby_leader_found = True
break
return [
nagiosplugin.Metric(
"has_leader",
1 if is_leader_found or is_standby_leader_found else 0,
),
nagiosplugin.Metric(
"is_standby_leader_in_arc_rec",
1 if is_standby_leader_in_arc_rec else 0,
),
nagiosplugin.Metric(
"is_standby_leader",
1 if is_standby_leader_found else 0,
),
nagiosplugin.Metric(
"is_leader",
1 if is_leader_found else 0,
)
),
]
class ClusterHasLeaderSummary(nagiosplugin.Summary):
def ok(self: "ClusterHasLeaderSummary", results: nagiosplugin.Result) -> str:
def ok(self, results: nagiosplugin.Result) -> str:
return "The cluster has a running leader."
@handle_unknown
def problem(self: "ClusterHasLeaderSummary", results: nagiosplugin.Result) -> str:
return "The cluster has no running leader."
def problem(self, results: nagiosplugin.Result) -> str:
return "The cluster has no running leader or the standby leader is in archive recovery."
class ClusterHasReplica(PatroniResource):
def __init__(
self: "ClusterHasReplica",
connection_info: ConnectionInfo,
max_lag: Union[int, None],
):
def __init__(self, connection_info: ConnectionInfo, max_lag: Union[int, None]):
super().__init__(connection_info)
self.max_lag = max_lag
def probe(self: "ClusterHasReplica") -> Iterable[nagiosplugin.Metric]:
item_dict = self.rest_api("cluster")
def probe(self) -> Iterable[nagiosplugin.Metric]:
def debug_member(member: Any, health: str) -> None:
_log.debug(
"Node %(node_name)s is %(health)s: lag %(lag)s, state %(state)s, tl %(tl)s.",
{
"node_name": member["name"],
"health": health,
"lag": member["lag"],
"state": member["state"],
"tl": member["timeline"],
},
)
# get the cluster info
cluster_item_dict = self.rest_api("cluster")
replicas = []
healthy_replica = 0
unhealthy_replica = 0
sync_replica = 0
for member in item_dict["members"]:
# FIXME are there other acceptable states
leader_tl = None
# Look for replicas
for member in cluster_item_dict["members"]:
if member["role"] in ["replica", "sync_standby"]:
# patroni 3.0.4 changed the standby state from running to streaming
if (
member["state"] in ["running", "streaming"]
and member["lag"] != "unknown"
):
if member["lag"] == "unknown":
# This could happen if the node is stopped
# nagiosplugin doesn't handle strings in perfstats
# so we have to ditch all the stats in that case
debug_member(member, "unhealthy")
unhealthy_replica += 1
continue
else:
replicas.append(
{
"name": member["name"],
"lag": member["lag"],
"timeline": member["timeline"],
"sync": 1 if member["role"] == "sync_standby" else 0,
}
)
if member["role"] == "sync_standby":
sync_replica += 1
# Get the leader tl if we haven't already
if leader_tl is None:
# If there are no leaders, we will loop here for all
# members because leader_tl will remain None. It's not
# a big deal since having no leader is rare.
for tmember in cluster_item_dict["members"]:
if tmember["role"] == "leader":
leader_tl = int(tmember["timeline"])
break
if self.max_lag is None or self.max_lag >= int(member["lag"]):
healthy_replica += 1
continue
unhealthy_replica += 1
_log.debug(
"Patroni's leader_timeline is %(leader_tl)s",
{
"leader_tl": leader_tl,
},
)
# Test for an unhealthy replica
if (
self.has_detailed_states()
and not (
member["state"] in ["streaming", "in archive recovery"]
and int(member["timeline"]) == leader_tl
)
) or (
not self.has_detailed_states()
and not (
member["state"] == "running"
and int(member["timeline"]) == leader_tl
)
):
debug_member(member, "unhealthy")
unhealthy_replica += 1
continue
if member["role"] == "sync_standby":
sync_replica += 1
if self.max_lag is None or self.max_lag >= int(member["lag"]):
debug_member(member, "healthy")
healthy_replica += 1
else:
debug_member(member, "unhealthy")
unhealthy_replica += 1
# The actual check
yield nagiosplugin.Metric("healthy_replica", healthy_replica)
@ -127,6 +229,11 @@ class ClusterHasReplica(PatroniResource):
yield nagiosplugin.Metric(
f"{replica['name']}_lag", replica["lag"], context="replica_lag"
)
yield nagiosplugin.Metric(
f"{replica['name']}_timeline",
replica["timeline"],
context="replica_timeline",
)
yield nagiosplugin.Metric(
f"{replica['name']}_sync", replica["sync"], context="replica_sync"
)
@ -140,7 +247,7 @@ class ClusterHasReplica(PatroniResource):
class ClusterConfigHasChanged(PatroniResource):
def __init__(
self: "ClusterConfigHasChanged",
self,
connection_info: ConnectionInfo,
config_hash: str, # Always contains the old hash
state_file: str, # Only used to update the hash in the state_file (when needed)
@ -151,7 +258,7 @@ class ClusterConfigHasChanged(PatroniResource):
self.config_hash = config_hash
self.save = save
def probe(self: "ClusterConfigHasChanged") -> Iterable[nagiosplugin.Metric]:
def probe(self) -> Iterable[nagiosplugin.Metric]:
item_dict = self.rest_api("config")
new_hash = hashlib.md5(json.dumps(item_dict).encode()).hexdigest()
@ -183,23 +290,21 @@ class ClusterConfigHasChanged(PatroniResource):
class ClusterConfigHasChangedSummary(nagiosplugin.Summary):
def __init__(self: "ClusterConfigHasChangedSummary", config_hash: str) -> None:
def __init__(self, config_hash: str) -> None:
self.old_config_hash = config_hash
# Note: It would be helpful to display the old / new hash here. Unfortunately, it's not a metric.
# So we only have the old / expected one.
def ok(self: "ClusterConfigHasChangedSummary", results: nagiosplugin.Result) -> str:
def ok(self, results: nagiosplugin.Result) -> str:
return f"The hash of patroni's dynamic configuration has not changed ({self.old_config_hash})."
@handle_unknown
def problem(
self: "ClusterConfigHasChangedSummary", results: nagiosplugin.Result
) -> str:
def problem(self, results: nagiosplugin.Result) -> str:
return f"The hash of patroni's dynamic configuration has changed. The old hash was {self.old_config_hash}."
class ClusterIsInMaintenance(PatroniResource):
def probe(self: "ClusterIsInMaintenance") -> Iterable[nagiosplugin.Metric]:
def probe(self) -> Iterable[nagiosplugin.Metric]:
item_dict = self.rest_api("cluster")
# The actual check
@ -212,7 +317,7 @@ class ClusterIsInMaintenance(PatroniResource):
class ClusterHasScheduledAction(PatroniResource):
def probe(self: "ClusterIsInMaintenance") -> Iterable[nagiosplugin.Metric]:
def probe(self) -> Iterable[nagiosplugin.Metric]:
item_dict = self.rest_api("cluster")
scheduled_switchover = 0

check_patroni/node.py
View file

@ -7,7 +7,7 @@ from .types import APIError, ConnectionInfo, PatroniResource, handle_unknown
class NodeIsPrimary(PatroniResource):
def probe(self: "NodeIsPrimary") -> Iterable[nagiosplugin.Metric]:
def probe(self) -> Iterable[nagiosplugin.Metric]:
try:
self.rest_api("primary")
except APIError:
@ -16,24 +16,22 @@ class NodeIsPrimary(PatroniResource):
class NodeIsPrimarySummary(nagiosplugin.Summary):
def ok(self: "NodeIsPrimarySummary", results: nagiosplugin.Result) -> str:
def ok(self, results: nagiosplugin.Result) -> str:
return "This node is the primary with the leader lock."
@handle_unknown
def problem(self: "NodeIsPrimarySummary", results: nagiosplugin.Result) -> str:
def problem(self, results: nagiosplugin.Result) -> str:
return "This node is not the primary with the leader lock."
class NodeIsLeader(PatroniResource):
def __init__(
self: "NodeIsLeader",
connection_info: ConnectionInfo,
check_is_standby_leader: bool,
self, connection_info: ConnectionInfo, check_is_standby_leader: bool
) -> None:
super().__init__(connection_info)
self.check_is_standby_leader = check_is_standby_leader
def probe(self: "NodeIsLeader") -> Iterable[nagiosplugin.Metric]:
def probe(self) -> Iterable[nagiosplugin.Metric]:
apiname = "leader"
if self.check_is_standby_leader:
apiname = "standby-leader"
@ -46,26 +44,23 @@ class NodeIsLeader(PatroniResource):
class NodeIsLeaderSummary(nagiosplugin.Summary):
def __init__(
self: "NodeIsLeaderSummary",
check_is_standby_leader: bool,
) -> None:
def __init__(self, check_is_standby_leader: bool) -> None:
if check_is_standby_leader:
self.leader_kind = "standby leader"
else:
self.leader_kind = "leader"
def ok(self: "NodeIsLeaderSummary", results: nagiosplugin.Result) -> str:
def ok(self, results: nagiosplugin.Result) -> str:
return f"This node is a {self.leader_kind} node."
@handle_unknown
def problem(self: "NodeIsLeaderSummary", results: nagiosplugin.Result) -> str:
def problem(self, results: nagiosplugin.Result) -> str:
return f"This node is not a {self.leader_kind} node."
class NodeIsReplica(PatroniResource):
def __init__(
self: "NodeIsReplica",
self,
connection_info: ConnectionInfo,
max_lag: str,
check_is_sync: bool,
@ -76,7 +71,7 @@ class NodeIsReplica(PatroniResource):
self.check_is_sync = check_is_sync
self.check_is_async = check_is_async
def probe(self: "NodeIsReplica") -> Iterable[nagiosplugin.Metric]:
def probe(self) -> Iterable[nagiosplugin.Metric]:
try:
if self.check_is_sync:
api_name = "synchronous"
@ -95,12 +90,7 @@ class NodeIsReplica(PatroniResource):
class NodeIsReplicaSummary(nagiosplugin.Summary):
def __init__(
self: "NodeIsReplicaSummary",
lag: str,
check_is_sync: bool,
check_is_async: bool,
) -> None:
def __init__(self, lag: str, check_is_sync: bool, check_is_async: bool) -> None:
self.lag = lag
if check_is_sync:
self.replica_kind = "synchronous replica"
@ -109,7 +99,7 @@ class NodeIsReplicaSummary(nagiosplugin.Summary):
else:
self.replica_kind = "replica"
def ok(self: "NodeIsReplicaSummary", results: nagiosplugin.Result) -> str:
def ok(self, results: nagiosplugin.Result) -> str:
if self.lag is None:
return (
f"This node is a running {self.replica_kind} with no noloadbalance tag."
@ -117,14 +107,14 @@ class NodeIsReplicaSummary(nagiosplugin.Summary):
return f"This node is a running {self.replica_kind} with no noloadbalance tag and the lag is under {self.lag}."
@handle_unknown
def problem(self: "NodeIsReplicaSummary", results: nagiosplugin.Result) -> str:
def problem(self, results: nagiosplugin.Result) -> str:
if self.lag is None:
return f"This node is not a running {self.replica_kind} with no noloadbalance tag."
return f"This node is not a running {self.replica_kind} with no noloadbalance tag and a lag under {self.lag}."
class NodeIsPendingRestart(PatroniResource):
def probe(self: "NodeIsPendingRestart") -> Iterable[nagiosplugin.Metric]:
def probe(self) -> Iterable[nagiosplugin.Metric]:
item_dict = self.rest_api("patroni")
is_pending_restart = item_dict.get("pending_restart", False)
@ -137,19 +127,17 @@ class NodeIsPendingRestart(PatroniResource):
class NodeIsPendingRestartSummary(nagiosplugin.Summary):
def ok(self: "NodeIsPendingRestartSummary", results: nagiosplugin.Result) -> str:
def ok(self, results: nagiosplugin.Result) -> str:
return "This node doesn't have the pending restart flag."
@handle_unknown
def problem(
self: "NodeIsPendingRestartSummary", results: nagiosplugin.Result
) -> str:
def problem(self, results: nagiosplugin.Result) -> str:
return "This node has the pending restart flag."
class NodeTLHasChanged(PatroniResource):
def __init__(
self: "NodeTLHasChanged",
self,
connection_info: ConnectionInfo,
timeline: str, # Always contains the old timeline
state_file: str, # Only used to update the timeline in the state_file (when needed)
@ -160,7 +148,7 @@ class NodeTLHasChanged(PatroniResource):
self.timeline = timeline
self.save = save
def probe(self: "NodeTLHasChanged") -> Iterable[nagiosplugin.Metric]:
def probe(self) -> Iterable[nagiosplugin.Metric]:
item_dict = self.rest_api("patroni")
new_tl = item_dict["timeline"]
@ -193,27 +181,23 @@ class NodeTLHasChanged(PatroniResource):
class NodeTLHasChangedSummary(nagiosplugin.Summary):
def __init__(self: "NodeTLHasChangedSummary", timeline: str) -> None:
def __init__(self, timeline: str) -> None:
self.timeline = timeline
def ok(self: "NodeTLHasChangedSummary", results: nagiosplugin.Result) -> str:
def ok(self, results: nagiosplugin.Result) -> str:
return f"The timeline is still {self.timeline}."
@handle_unknown
def problem(self: "NodeTLHasChangedSummary", results: nagiosplugin.Result) -> str:
def problem(self, results: nagiosplugin.Result) -> str:
return f"The expected timeline was {self.timeline} got {results['timeline'].metric}."
class NodePatroniVersion(PatroniResource):
def __init__(
self: "NodePatroniVersion",
connection_info: ConnectionInfo,
patroni_version: str,
) -> None:
def __init__(self, connection_info: ConnectionInfo, patroni_version: str) -> None:
super().__init__(connection_info)
self.patroni_version = patroni_version
def probe(self: "NodePatroniVersion") -> Iterable[nagiosplugin.Metric]:
def probe(self) -> Iterable[nagiosplugin.Metric]:
item_dict = self.rest_api("patroni")
version = item_dict["patroni"]["version"]
@ -232,21 +216,21 @@ class NodePatroniVersion(PatroniResource):
class NodePatroniVersionSummary(nagiosplugin.Summary):
def __init__(self: "NodePatroniVersionSummary", patroni_version: str) -> None:
def __init__(self, patroni_version: str) -> None:
self.patroni_version = patroni_version
def ok(self: "NodePatroniVersionSummary", results: nagiosplugin.Result) -> str:
def ok(self, results: nagiosplugin.Result) -> str:
return f"Patroni's version is {self.patroni_version}."
@handle_unknown
def problem(self: "NodePatroniVersionSummary", results: nagiosplugin.Result) -> str:
def problem(self, results: nagiosplugin.Result) -> str:
# FIXME find a way to make the following work, check if perf data can be strings
# return f"The expected patroni version was {self.patroni_version} got {results['patroni_version'].metric}."
return f"Patroni's version is not {self.patroni_version}."
class NodeIsAlive(PatroniResource):
def probe(self: "NodeIsAlive") -> Iterable[nagiosplugin.Metric]:
def probe(self) -> Iterable[nagiosplugin.Metric]:
try:
self.rest_api("liveness")
except APIError:
@ -255,9 +239,9 @@ class NodeIsAlive(PatroniResource):
class NodeIsAliveSummary(nagiosplugin.Summary):
def ok(self: "NodeIsAliveSummary", results: nagiosplugin.Result) -> str:
def ok(self, results: nagiosplugin.Result) -> str:
return "This node is alive (patroni is running)."
@handle_unknown
def problem(self: "NodeIsAliveSummary", results: nagiosplugin.Result) -> str:
def problem(self, results: nagiosplugin.Result) -> str:
return "This node is not alive (patroni is not running)."

check_patroni/types.py
View file

@ -1,3 +1,5 @@
import json
from functools import lru_cache
from typing import Any, Callable, List, Optional, Tuple, Union
from urllib.parse import urlparse
@ -28,11 +30,11 @@ class Parameters:
verbose: int
@attr.s(auto_attribs=True, slots=True)
@attr.s(auto_attribs=True, eq=False, slots=True)
class PatroniResource(nagiosplugin.Resource):
conn_info: ConnectionInfo
def rest_api(self: "PatroniResource", service: str) -> Any:
def rest_api(self, service: str) -> Any:
"""Try to connect to all the provided endpoints for the requested service"""
for endpoint in self.conn_info.endpoints:
cert: Optional[Union[Tuple[str, str], str]] = None
@ -71,10 +73,31 @@ class PatroniResource(nagiosplugin.Resource):
try:
return r.json()
except requests.exceptions.JSONDecodeError:
except (json.JSONDecodeError, ValueError):
return None
raise nagiosplugin.CheckError("Connection failed for all provided endpoints")
@lru_cache(maxsize=None)
def has_detailed_states(self) -> bool:
# get patroni's version to find out if the "streaming" and "in archive recovery" states are available
patroni_item_dict = self.rest_api("patroni")
if tuple(
int(v) for v in patroni_item_dict["patroni"]["version"].split(".", 2)
) >= (3, 0, 4):
_log.debug(
"Patroni's version is %(version)s, more detailed states can be used to check for the health of replicas.",
{"version": patroni_item_dict["patroni"]["version"]},
)
return True
_log.debug(
"Patroni's version is %(version)s, the running state and the timelines must be used to check for the health of replicas.",
{"version": patroni_item_dict["patroni"]["version"]},
)
return False
HandleUnknown = Callable[[nagiosplugin.Summary, nagiosplugin.Results], Any]

docs/make_readme.sh
View file

@ -42,7 +42,7 @@ $ pip install git+https://github.com/dalibo/check_patroni.git
check_patroni works on python 3.6, we keep it that way because patroni also
supports it and there are still lots of RH 7 variants around. That being said
python 3.6 has been EOL for age and there is no support for it in the github
python 3.6 has been EOL for ages and there is no support for it in the github
CI.
## Support
@ -80,8 +80,8 @@ A match is found when: `start <= VALUE <= end`.
For example, the following command will raise:
* a warning if there is less than 1 nodes, wich can be translated to outside of range [2;+INF[
* a critical if there are no nodes, wich can be translated to outside of range [1;+INF[
* a warning if there are fewer than 2 nodes, which can be translated to outside of range [2;+INF[
* a critical if there are no nodes, which can be translated to outside of range [1;+INF[
```
check_patroni -e https://10.20.199.3:8008 cluster_has_replica --warning 2: --critical 1:
@ -97,6 +97,30 @@ Several options are available:
* `--cert_file`: your certificate or the concatenation of your certificate and private key
* `--key_file`: your private key (optional)
## Shell completion
We use the [click] library, which supports shell completion natively.
Shell completion can be added by typing the following command or adding it to
a file specific to your shell of choice.
* for Bash (add to `~/.bashrc`):
```
eval "$(_CHECK_PATRONI_COMPLETE=bash_source check_patroni)"
```
* for Zsh (add to `~/.zshrc`):
```
eval "$(_CHECK_PATRONI_COMPLETE=zsh_source check_patroni)"
```
* for Fish (add to `~/.config/fish/completions/check_patroni.fish`):
```
eval "$(_CHECK_PATRONI_COMPLETE=fish_source check_patroni)"
```
Please note that shell completion is not supported for all shell versions; for
example, only Bash versions 4.4 and newer are supported.
[click]: https://click.palletsprojects.com/en/8.1.x/shell-completion/
_EOF_
readme
readme "## Cluster services"

mypy.ini
View file

@ -1,4 +1,5 @@
[mypy]
files = .
show_error_codes = true
strict = true
exclude = build/

View file

@ -4,7 +4,7 @@ isort
flake8
mypy==0.961
pytest
pytest-mock
pytest-cov
types-requests
setuptools
tox

setup.py
View file

@ -41,12 +41,12 @@ setup(
"attrs >= 17, !=21.1",
"requests",
"nagiosplugin >= 1.3.2",
"click >= 8.0.1",
"click >= 7.1",
],
extras_require={
"test": [
"pytest",
"pytest-mock",
"importlib_metadata; python_version < '3.8'",
"pytest >= 6.0.2",
],
},
entry_points={
@ -56,4 +56,3 @@ setup(
},
zip_safe=False,
)

tests/__init__.py
View file

@ -0,0 +1,65 @@
import json
import logging
import shutil
from contextlib import contextmanager
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler
from pathlib import Path
from typing import Any, Iterator, Mapping, Union
logger = logging.getLogger(__name__)
class PatroniAPI(HTTPServer):
def __init__(self, directory: Path, *, datadir: Path) -> None:
self.directory = directory
self.datadir = datadir
handler_cls = partial(SimpleHTTPRequestHandler, directory=str(directory))
super().__init__(("", 0), handler_cls)
def serve_forever(self, *args: Any) -> None:
logger.info(
"starting fake Patroni API at %s (directory=%s)",
self.endpoint,
self.directory,
)
return super().serve_forever(*args)
@property
def endpoint(self) -> str:
return f"http://{self.server_name}:{self.server_port}"
@contextmanager
def routes(self, mapping: Mapping[str, Union[Path, str]]) -> Iterator[None]:
"""Temporarily install specified files in served directory, thus
building "routes" from given mapping.
The 'mapping' defines target route paths as keys and files to be
installed in served directory as values. Mapping values of type 'str'
are assumed be relative file path to the 'datadir'.
"""
for route_path, fpath in mapping.items():
if isinstance(fpath, str):
fpath = self.datadir / fpath
shutil.copy(fpath, self.directory / route_path)
try:
yield None
finally:
for fname in mapping:
(self.directory / fname).unlink()
def cluster_api_set_replica_running(in_json: Path, target_dir: Path) -> Path:
# starting from 3.0.4 the state of replicas is streaming or in archive recovery
# instead of running
with in_json.open() as f:
js = json.load(f)
for node in js["members"]:
if node["role"] in ["replica", "sync_standby", "standby_leader"]:
if node["state"] in ["streaming", "in archive recovery"]:
node["state"] = "running"
assert target_dir.is_dir()
out_json = target_dir / in_json.name
with out_json.open("w") as f:
json.dump(js, f)
return out_json

tests/conftest.py
View file

@ -1,12 +1,76 @@
def pytest_addoption(parser):
"""
Add CLI options to `pytest` to pass those options to the test cases.
These options are used in `pytest_generate_tests`.
"""
parser.addoption("--use-old-replica-state", action="store_true", default=False)
import logging
import sys
from pathlib import Path
from threading import Thread
from typing import Any, Iterator, Tuple
from unittest.mock import patch
if sys.version_info >= (3, 8):
from importlib.metadata import version as metadata_version
else:
from importlib_metadata import version as metadata_version
import pytest
from click.testing import CliRunner
from . import PatroniAPI
logger = logging.getLogger(__name__)
def pytest_generate_tests(metafunc):
metafunc.parametrize(
"use_old_replica_state", [metafunc.config.getoption("use_old_replica_state")]
)
def numversion(pkgname: str) -> Tuple[int, ...]:
version = metadata_version(pkgname)
return tuple(int(v) for v in version.split(".", 3))
if numversion("pytest") >= (6, 2):
TempPathFactory = pytest.TempPathFactory
else:
from _pytest.tmpdir import TempPathFactory
@pytest.fixture(scope="session", autouse=True)
def nagioplugin_runtime_stdout() -> Iterator[None]:
# work around https://github.com/mpounsett/nagiosplugin/issues/24 when
# nagiosplugin is older than 1.3.3
if numversion("nagiosplugin") < (1, 3, 3):
target = "nagiosplugin.runtime.Runtime.stdout"
with patch(target, None):
logger.warning("patching %r", target)
yield None
else:
yield None
@pytest.fixture(
params=[False, True],
ids=lambda v: "new-replica-state" if v else "old-replica-state",
)
def old_replica_state(request: Any) -> Any:
return request.param
@pytest.fixture(scope="session")
def datadir() -> Path:
return Path(__file__).parent / "json"
@pytest.fixture(scope="session")
def patroni_api(
tmp_path_factory: TempPathFactory, datadir: Path
) -> Iterator[PatroniAPI]:
"""A fake HTTP server for the Patroni API serving files from a temporary
directory.
"""
httpd = PatroniAPI(tmp_path_factory.mktemp("api"), datadir=datadir)
t = Thread(target=httpd.serve_forever)
t.start()
yield httpd
httpd.shutdown()
t.join()
@pytest.fixture
def runner() -> CliRunner:
"""A CliRunner with stdout and stderr not mixed."""
return CliRunner(mix_stderr=False)

View file

@ -0,0 +1,33 @@
{
"members": [
{
"name": "srv1",
"role": "standby_leader",
"state": "stopped",
"api_url": "https://10.20.199.3:8008/patroni",
"host": "10.20.199.3",
"port": 5432,
"timeline": 51
},
{
"name": "srv2",
"role": "replica",
"state": "streaming",
"api_url": "https://10.20.199.4:8008/patroni",
"host": "10.20.199.4",
"port": 5432,
"timeline": 51,
"lag": 0
},
{
"name": "srv3",
"role": "replica",
"state": "streaming",
"api_url": "https://10.20.199.5:8008/patroni",
"host": "10.20.199.5",
"port": 5432,
"timeline": 51,
"lag": 0
}
]
}

View file

@ -0,0 +1,33 @@
{
"members": [
{
"name": "srv1",
"role": "standby_leader",
"state": "in archive recovery",
"api_url": "https://10.20.199.3:8008/patroni",
"host": "10.20.199.3",
"port": 5432,
"timeline": 51
},
{
"name": "srv2",
"role": "replica",
"state": "streaming",
"api_url": "https://10.20.199.4:8008/patroni",
"host": "10.20.199.4",
"port": 5432,
"timeline": 51,
"lag": 0
},
{
"name": "srv3",
"role": "replica",
"state": "streaming",
"api_url": "https://10.20.199.5:8008/patroni",
"host": "10.20.199.5",
"port": 5432,
"timeline": 51,
"lag": 0
}
]
}

View file

@ -3,7 +3,7 @@
{
"name": "srv1",
"role": "standby_leader",
"state": "running",
"state": "streaming",
"api_url": "https://10.20.199.3:8008/patroni",
"host": "10.20.199.3",
"port": 5432,

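Presumably these three standby-leader payloads (`stopped`, `in archive recovery`, and the `running` → `streaming` update) map one-to-one onto the CRITICAL, WARNING and OK branches exercised by the `cluster_has_leader` tests further down. Distilled into a sketch — the function name and string results are illustrative, not the module's API:

```python
def standby_leader_status(state: str, old_replica_state: bool) -> str:
    # With the old replica state, the fixtures are rewritten so that only
    # "running" (healthy) or anything else (unhealthy) remains.
    if old_replica_state:
        return "OK" if state == "running" else "CRITICAL"
    if state == "streaming":
        return "OK"
    if state == "in archive recovery":
        return "WARNING"  # standby leader still fetching WALs
    return "CRITICAL"  # e.g. "stopped"
```
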
View file

@ -0,0 +1,35 @@
{
"members": [
{
"name": "srv1",
"role": "replica",
"state": "running",
"api_url": "https://10.20.199.3:8008/patroni",
"host": "10.20.199.3",
"port": 5432,
"timeline": 51,
"lag": 0
},
{
"name": "srv2",
"role": "replica",
"state": "running",
"api_url": "https://10.20.199.4:8008/patroni",
"host": "10.20.199.4",
"port": 5432,
"timeline": 51,
"lag": 0
},
{
"name": "srv3",
"role": "replica",
"state": "running",
"api_url": "https://10.20.199.5:8008/patroni",
"host": "10.20.199.5",
"port": 5432,
"timeline": 51,
"lag": 0
}
]
}

View file

@ -0,0 +1,33 @@
{
"members": [
{
"name": "srv1",
"role": "leader",
"state": "running",
"api_url": "https://10.20.199.3:8008/patroni",
"host": "10.20.199.3",
"port": 5432,
"timeline": 51
},
{
"name": "srv2",
"role": "replica",
"state": "running",
"api_url": "https://10.20.199.4:8008/patroni",
"host": "10.20.199.4",
"port": 5432,
"timeline": 50,
"lag": 1000000
},
{
"name": "srv3",
"role": "replica",
"state": "streaming",
"api_url": "https://10.20.199.5:8008/patroni",
"host": "10.20.199.5",
"port": 5432,
"timeline": 51,
"lag": 0
}
]
}

View file

@ -12,7 +12,7 @@
{
"name": "srv2",
"role": "replica",
"state": "streaming",
"state": "in archive recovery",
"api_url": "https://10.20.199.4:8008/patroni",
"host": "10.20.199.4",
"port": 5432,

View file

@ -0,0 +1,26 @@
{
"state": "running",
"postmaster_start_time": "2021-08-11 07:02:20.732 UTC",
"role": "master",
"server_version": 110012,
"cluster_unlocked": false,
"xlog": {
"location": 1174407088
},
"timeline": 51,
"replication": [
{
"usename": "replicator",
"application_name": "srv1",
"client_addr": "10.20.199.3",
"state": "streaming",
"sync_state": "async",
"sync_priority": 0
}
],
"database_system_identifier": "6965971025273547206",
"patroni": {
"version": "3.0.0",
"scope": "patroni-demo"
}
}

View file

@ -0,0 +1,26 @@
{
"state": "running",
"postmaster_start_time": "2021-08-11 07:02:20.732 UTC",
"role": "master",
"server_version": 110012,
"cluster_unlocked": false,
"xlog": {
"location": 1174407088
},
"timeline": 51,
"replication": [
{
"usename": "replicator",
"application_name": "srv1",
"client_addr": "10.20.199.3",
"state": "streaming",
"sync_state": "async",
"sync_priority": 0
}
],
"database_system_identifier": "6965971025273547206",
"patroni": {
"version": "3.1.0",
"scope": "patroni-demo"
}
}
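
These two `/patroni` payloads are identical except for the reported Patroni version (3.0.0 vs 3.1.0); the fixtures pair the older one with cluster files rewritten to the plain `running` state. A sketch of the kind of version gate this enables — the tuple parsing mirrors `numversion()` from the conftest diff, while the cutoff value shown is illustrative only:

```python
from typing import Any, Tuple


def patroni_numversion(patroni_payload: Any) -> Tuple[int, ...]:
    # {"patroni": {"version": "3.1.0", ...}, ...} -> (3, 1, 0)
    version = patroni_payload["patroni"]["version"]
    return tuple(int(v) for v in version.split(".", 3))


def has_detailed_replica_states(patroni_payload: Any) -> bool:
    # Recent Patroni reports "streaming" / "in archive recovery" on the
    # /cluster endpoint; older versions only report "running".
    return patroni_numversion(patroni_payload) >= (3, 1, 0)  # illustrative cutoff
```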

View file

@ -0,0 +1,33 @@
{
"members": [
{
"name": "srv1",
"role": "standby_leader",
"state": "in archive recovery",
"api_url": "https://10.20.199.3:8008/patroni",
"host": "10.20.199.3",
"port": 5432,
"timeline": 51
},
{
"name": "srv2",
"role": "replica",
"state": "in archive recovery",
"api_url": "https://10.20.199.4:8008/patroni",
"host": "10.20.199.4",
"port": 5432,
"timeline": 51,
"lag": 0
},
{
"name": "srv3",
"role": "replica",
"state": "streaming",
"api_url": "https://10.20.199.5:8008/patroni",
"host": "10.20.199.5",
"port": 5432,
"timeline": 51,
"lag": 0
}
]
}

View file

@ -1,30 +1,20 @@
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import my_mock
from . import PatroniAPI
def test_api_status_code_200(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
my_mock(mocker, "node_is_pending_restart_ok", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "node_is_pending_restart"]
)
def test_api_status_code_200(runner: CliRunner, patroni_api: PatroniAPI) -> None:
with patroni_api.routes({"patroni": "node_is_pending_restart_ok.json"}):
result = runner.invoke(
main, ["-e", patroni_api.endpoint, "node_is_pending_restart"]
)
assert result.exit_code == 0
def test_api_status_code_404(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
my_mock(mocker, "Fake test", 404)
def test_api_status_code_404(runner: CliRunner, patroni_api: PatroniAPI) -> None:
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "node_is_pending_restart"]
main, ["-e", patroni_api.endpoint, "node_is_pending_restart"]
)
assert result.exit_code == 3

View file

@ -1,23 +1,29 @@
from pathlib import Path
from typing import Iterator
import nagiosplugin
import pytest
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import here, my_mock
from . import PatroniAPI
@pytest.fixture(scope="module", autouse=True)
def cluster_config_has_changed(patroni_api: PatroniAPI) -> Iterator[None]:
with patroni_api.routes({"config": "cluster_config_has_changed.json"}):
yield None
def test_cluster_config_has_changed_ok_with_hash(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_config_has_changed", 200)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_config_has_changed",
"--hash",
"96b12d82571473d13e890b893734e731",
@ -31,22 +37,20 @@ def test_cluster_config_has_changed_ok_with_hash(
def test_cluster_config_has_changed_ok_with_state_file(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
) -> None:
runner = CliRunner()
with open(here / "cluster_config_has_changed.state_file", "w") as f:
state_file = tmp_path / "cluster_config_has_changed.state_file"
with state_file.open("w") as f:
f.write('{"hash": "96b12d82571473d13e890b893734e731"}')
my_mock(mocker, "cluster_config_has_changed", 200)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_config_has_changed",
"--state-file",
str(here / "cluster_config_has_changed.state_file"),
str(state_file),
],
)
assert result.exit_code == 0
@ -57,16 +61,13 @@ def test_cluster_config_has_changed_ok_with_state_file(
def test_cluster_config_has_changed_ko_with_hash(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_config_has_changed", 200)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_config_has_changed",
"--hash",
"96b12d82571473d13e890b8937ffffff",
@ -80,24 +81,21 @@ def test_cluster_config_has_changed_ko_with_hash(
def test_cluster_config_has_changed_ko_with_state_file_and_save(
mocker: MockerFixture,
use_old_replica_state: bool,
runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
) -> None:
runner = CliRunner()
with open(here / "cluster_config_has_changed.state_file", "w") as f:
state_file = tmp_path / "cluster_config_has_changed.state_file"
with state_file.open("w") as f:
f.write('{"hash": "96b12d82571473d13e890b8937ffffff"}')
my_mock(mocker, "cluster_config_has_changed", 200)
# test without saving the new hash
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_config_has_changed",
"--state-file",
str(here / "cluster_config_has_changed.state_file"),
str(state_file),
],
)
assert result.exit_code == 2
@ -106,7 +104,8 @@ def test_cluster_config_has_changed_ko_with_state_file_and_save(
== "CLUSTERCONFIGHASCHANGED CRITICAL - The hash of patroni's dynamic configuration has changed. The old hash was 96b12d82571473d13e890b8937ffffff. | is_configuration_changed=1;;@1:1\n"
)
cookie = nagiosplugin.Cookie(here / "cluster_config_has_changed.state_file")
state_file = tmp_path / "cluster_config_has_changed.state_file"
cookie = nagiosplugin.Cookie(state_file)
cookie.open()
new_config_hash = cookie.get("hash")
cookie.close()
@ -118,10 +117,10 @@ def test_cluster_config_has_changed_ko_with_state_file_and_save(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_config_has_changed",
"--state-file",
str(here / "cluster_config_has_changed.state_file"),
str(state_file),
"--save",
],
)
@ -131,7 +130,7 @@ def test_cluster_config_has_changed_ko_with_state_file_and_save(
== "CLUSTERCONFIGHASCHANGED CRITICAL - The hash of patroni's dynamic configuration has changed. The old hash was 96b12d82571473d13e890b8937ffffff. | is_configuration_changed=1;;@1:1\n"
)
cookie = nagiosplugin.Cookie(here / "cluster_config_has_changed.state_file")
cookie = nagiosplugin.Cookie(state_file)
cookie.open()
new_config_hash = cookie.get("hash")
cookie.close()
@ -140,22 +139,20 @@ def test_cluster_config_has_changed_ko_with_state_file_and_save(
def test_cluster_config_has_changed_params(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
) -> None:
# This test is placed last because the exceptions do not seem to be flushed from stderr before the next tests run.
runner = CliRunner()
my_mock(mocker, "cluster_config_has_changed", 200)
fake_state_file = tmp_path / "fake_file_name.state_file"
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_config_has_changed",
"--hash",
"640df9f0211c791723f18fc3ed9dbb95",
"--state-file",
str(here / "fake_file_name.state_file"),
str(fake_state_file),
],
)
assert result.exit_code == 3
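
All four tests above drive the same mechanism: the service hashes Patroni's dynamic configuration and compares it against `--hash` or against the value persisted in the `--state-file` cookie. A sketch of that comparison, under the assumption that the 32-hex-digit values are MD5 digests of the serialized config — the digest algorithm is inferred from the hash length, not confirmed by this diff:

```python
import hashlib
import json


def config_hash(dynamic_config: dict) -> str:
    # Assumed: a stable serialization of the /config payload, MD5-digested
    # (the tests only show 32-hex-digit values such as
    # "96b12d82571473d13e890b893734e731").
    return hashlib.md5(
        json.dumps(dynamic_config, sort_keys=True).encode()
    ).hexdigest()


def has_changed(dynamic_config: dict, old_hash: str) -> bool:
    return config_hash(dynamic_config) != old_hash
```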

View file

@ -1,54 +1,139 @@
from pathlib import Path
from typing import Iterator, Union
import pytest
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import my_mock
from . import PatroniAPI, cluster_api_set_replica_running
def test_cluster_has_leader_ok(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
@pytest.fixture
def cluster_has_leader_ok(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_has_leader_ok.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
my_mock(mocker, "cluster_has_leader_ok", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "cluster_has_leader"]
)
assert result.exit_code == 0
@pytest.mark.usefixtures("cluster_has_leader_ok")
def test_cluster_has_leader_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_has_leader"])
assert (
result.stdout
== "CLUSTERHASLEADER OK - The cluster has a running leader. | has_leader=1;;@0\n"
== "CLUSTERHASLEADER OK - The cluster has a running leader. | has_leader=1;;@0 is_leader=1 is_standby_leader=0 is_standby_leader_in_arc_rec=0;@1:1\n"
)
assert result.exit_code == 0
@pytest.fixture
def cluster_has_leader_ok_standby_leader(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_has_leader_ok_standby_leader.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_has_leader_ok_standby_leader")
def test_cluster_has_leader_ok_standby_leader(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_has_leader_ok_standby_leader", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "cluster_has_leader"]
)
assert result.exit_code == 0
result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_has_leader"])
assert (
result.stdout
== "CLUSTERHASLEADER OK - The cluster has a running leader. | has_leader=1;;@0\n"
== "CLUSTERHASLEADER OK - The cluster has a running leader. | has_leader=1;;@0 is_leader=0 is_standby_leader=1 is_standby_leader_in_arc_rec=0;@1:1\n"
)
assert result.exit_code == 0
def test_cluster_has_leader_ko(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
@pytest.fixture
def cluster_has_leader_ko(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_has_leader_ko.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
my_mock(mocker, "cluster_has_leader_ko", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "cluster_has_leader"]
@pytest.mark.usefixtures("cluster_has_leader_ko")
def test_cluster_has_leader_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_has_leader"])
assert (
result.stdout
== "CLUSTERHASLEADER CRITICAL - The cluster has no running leader or the standby leader is in archive recovery. | has_leader=0;;@0 is_leader=0 is_standby_leader=0 is_standby_leader_in_arc_rec=0;@1:1\n"
)
assert result.exit_code == 2
@pytest.fixture
def cluster_has_leader_ko_standby_leader(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_has_leader_ko_standby_leader.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_has_leader_ko_standby_leader")
def test_cluster_has_leader_ko_standby_leader(
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_has_leader"])
assert (
result.stdout
== "CLUSTERHASLEADER CRITICAL - The cluster has no running leader. | has_leader=0;;@0\n"
== "CLUSTERHASLEADER CRITICAL - The cluster has no running leader or the standby leader is in archive recovery. | has_leader=0;;@0 is_leader=0 is_standby_leader=0 is_standby_leader_in_arc_rec=0;@1:1\n"
)
assert result.exit_code == 2
@pytest.fixture
def cluster_has_leader_ko_standby_leader_archiving(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = (
"cluster_has_leader_ko_standby_leader_archiving.json"
)
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_has_leader_ko_standby_leader_archiving")
def test_cluster_has_leader_ko_standby_leader_archiving(
runner: CliRunner, patroni_api: PatroniAPI, old_replica_state: bool
) -> None:
result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_has_leader"])
if old_replica_state:
assert (
result.stdout
== "CLUSTERHASLEADER OK - The cluster has a running leader. | has_leader=1;;@0 is_leader=0 is_standby_leader=1 is_standby_leader_in_arc_rec=0;@1:1\n"
)
assert result.exit_code == 0
else:
assert (
result.stdout
== "CLUSTERHASLEADER WARNING - The cluster has no running leader or the standby leader is in archive recovery. | has_leader=1;;@0 is_leader=0 is_standby_leader=1 is_standby_leader_in_arc_rec=1;@1:1\n"
)
assert result.exit_code == 1
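
The `@`-prefixed thresholds in these expected outputs (`is_standby_leader_in_arc_rec=1;@1:1`, `has_leader=1;;@0`, where the two fields after a value are the warning and critical ranges) follow the usual Nagios syntax: a plain range alerts when the value falls outside it, while a leading `@` inverts that and alerts when the value falls inside. A hand-rolled sketch of the rule for the subset used here (deliberately not quoting `nagiosplugin`'s own parser):

```python
def violates(value: float, spec: str) -> bool:
    """True if `value` should alert for Nagios threshold `spec`.

    Covers the subset used above, e.g. "@1:1" (alert inside 1..1) and
    "@0" (alert when the value is exactly 0).
    """
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    start, sep, end = spec.partition(":")
    lo = float(start) if start else 0.0
    hi = float(end) if end else (float("inf") if sep else float(start))
    in_range = lo <= value <= hi
    return in_range if inside else not in_range


assert violates(1, "@1:1")  # standby leader in archive recovery -> WARNING
assert not violates(1, "@0")  # has_leader=1 is fine
assert violates(0, "@0")  # has_leader=0 -> CRITICAL
```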

View file

@ -1,39 +1,46 @@
from pathlib import Path
from typing import Iterator, Union
import pytest
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import my_mock
from . import PatroniAPI, cluster_api_set_replica_running
# TODO Lag threshold tests
def test_cluster_has_relica_ok(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
@pytest.fixture
def cluster_has_replica_ok(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_has_replica_ok.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
my_mock(mocker, "cluster_has_replica_ok", 200, use_old_replica_state)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "cluster_has_replica"]
)
assert result.exit_code == 0
@pytest.mark.usefixtures("cluster_has_replica_ok")
def test_cluster_has_relica_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_has_replica"])
assert (
result.stdout
== "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2 srv2_lag=0 srv2_sync=0 srv3_lag=0 srv3_sync=1 sync_replica=1 unhealthy_replica=0\n"
== "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2 srv2_lag=0 srv2_sync=0 srv2_timeline=51 srv3_lag=0 srv3_sync=1 srv3_timeline=51 sync_replica=1 unhealthy_replica=0\n"
)
assert result.exit_code == 0
@pytest.mark.usefixtures("cluster_has_replica_ok")
def test_cluster_has_replica_ok_with_count_thresholds(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_has_replica_ok", 200, use_old_replica_state)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_has_replica",
"--warning",
"@1",
@ -41,48 +48,56 @@ def test_cluster_has_replica_ok_with_count_thresholds(
"@0",
],
)
assert result.exit_code == 0
assert (
result.stdout
== "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2;@1;@0 srv2_lag=0 srv2_sync=0 srv3_lag=0 srv3_sync=1 sync_replica=1 unhealthy_replica=0\n"
== "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2;@1;@0 srv2_lag=0 srv2_sync=0 srv2_timeline=51 srv3_lag=0 srv3_sync=1 srv3_timeline=51 sync_replica=1 unhealthy_replica=0\n"
)
assert result.exit_code == 0
@pytest.mark.usefixtures("cluster_has_replica_ok")
def test_cluster_has_replica_ok_with_sync_count_thresholds(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_has_replica_ok", 200, use_old_replica_state)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_has_replica",
"--sync-warning",
"1:",
],
)
assert result.exit_code == 0
assert (
result.stdout
== "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2 srv2_lag=0 srv2_sync=0 srv3_lag=0 srv3_sync=1 sync_replica=1;1: unhealthy_replica=0\n"
== "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2 srv2_lag=0 srv2_sync=0 srv2_timeline=51 srv3_lag=0 srv3_sync=1 srv3_timeline=51 sync_replica=1;1: unhealthy_replica=0\n"
)
assert result.exit_code == 0
@pytest.fixture
def cluster_has_replica_ok_lag(
patroni_api: PatroniAPI, datadir: Path, tmp_path: Path, old_replica_state: bool
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_has_replica_ok_lag.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_has_replica_ok_lag")
def test_cluster_has_replica_ok_with_count_thresholds_lag(
mocker: MockerFixture,
use_old_replica_state: bool,
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_has_replica_ok_lag", 200, use_old_replica_state)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_has_replica",
"--warning",
"@1",
@ -92,24 +107,35 @@ def test_cluster_has_replica_ok_with_count_thresholds_lag(
"1MB",
],
)
assert result.exit_code == 0
assert (
result.stdout
== "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2;@1;@0 srv2_lag=1024 srv2_sync=0 srv3_lag=0 srv3_sync=0 sync_replica=0 unhealthy_replica=0\n"
== "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2;@1;@0 srv2_lag=1024 srv2_sync=0 srv2_timeline=51 srv3_lag=0 srv3_sync=0 srv3_timeline=51 sync_replica=0 unhealthy_replica=0\n"
)
assert result.exit_code == 0
@pytest.fixture
def cluster_has_replica_ko(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_has_replica_ko.json"
patroni_path: Union[str, Path] = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_has_replica_ko")
def test_cluster_has_replica_ko_with_count_thresholds(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_has_replica_ko", 200, use_old_replica_state)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_has_replica",
"--warning",
"@1",
@ -117,24 +143,22 @@ def test_cluster_has_replica_ko_with_count_thresholds(
"@0",
],
)
assert result.exit_code == 1
assert (
result.stdout
== "CLUSTERHASREPLICA WARNING - healthy_replica is 1 (outside range @0:1) | healthy_replica=1;@1;@0 srv3_lag=0 srv3_sync=0 sync_replica=0 unhealthy_replica=1\n"
== "CLUSTERHASREPLICA WARNING - healthy_replica is 1 (outside range @0:1) | healthy_replica=1;@1;@0 srv3_lag=0 srv3_sync=0 srv3_timeline=51 sync_replica=0 unhealthy_replica=1\n"
)
assert result.exit_code == 1
@pytest.mark.usefixtures("cluster_has_replica_ko")
def test_cluster_has_replica_ko_with_sync_count_thresholds(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_has_replica_ko", 200, use_old_replica_state)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_has_replica",
"--sync-warning",
"2:",
@ -142,25 +166,36 @@ def test_cluster_has_replica_ko_with_sync_count_thresholds(
"1:",
],
)
assert result.exit_code == 2
# The lag on srv2 is "unknown". We don't handle strings in perfstats, so we have to drop all the perf stats for the second node
assert (
result.stdout
== "CLUSTERHASREPLICA CRITICAL - sync_replica is 0 (outside range 1:) | healthy_replica=1 srv3_lag=0 srv3_sync=0 sync_replica=0;2:;1: unhealthy_replica=1\n"
== "CLUSTERHASREPLICA CRITICAL - sync_replica is 0 (outside range 1:) | healthy_replica=1 srv3_lag=0 srv3_sync=0 srv3_timeline=51 sync_replica=0;2:;1: unhealthy_replica=1\n"
)
assert result.exit_code == 2
@pytest.fixture
def cluster_has_replica_ko_lag(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_has_replica_ko_lag.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_has_replica_ko_lag")
def test_cluster_has_replica_ko_with_count_thresholds_and_lag(
mocker: MockerFixture,
use_old_replica_state: bool,
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_has_replica_ko_lag", 200, use_old_replica_state)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_has_replica",
"--warning",
"@1",
@ -170,8 +205,84 @@ def test_cluster_has_replica_ko_with_count_thresholds_and_lag(
"1MB",
],
)
assert result.exit_code == 2
assert (
result.stdout
== "CLUSTERHASREPLICA CRITICAL - healthy_replica is 0 (outside range @0:0) | healthy_replica=0;@1;@0 srv2_lag=10241024 srv2_sync=0 srv3_lag=20000000 srv3_sync=0 sync_replica=0 unhealthy_replica=2\n"
== "CLUSTERHASREPLICA CRITICAL - healthy_replica is 0 (outside range @0:0) | healthy_replica=0;@1;@0 srv2_lag=10241024 srv2_sync=0 srv2_timeline=51 srv3_lag=20000000 srv3_sync=0 srv3_timeline=51 sync_replica=0 unhealthy_replica=2\n"
)
assert result.exit_code == 2
@pytest.fixture
def cluster_has_replica_ko_wrong_tl(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_has_replica_ko_wrong_tl.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_has_replica_ko_wrong_tl")
def test_cluster_has_replica_ko_wrong_tl(
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
result = runner.invoke(
main,
[
"-e",
patroni_api.endpoint,
"cluster_has_replica",
"--warning",
"@1",
"--critical",
"@0",
"--max-lag",
"1MB",
],
)
assert (
result.stdout
== "CLUSTERHASREPLICA WARNING - healthy_replica is 1 (outside range @0:1) | healthy_replica=1;@1;@0 srv2_lag=1000000 srv2_sync=0 srv2_timeline=50 srv3_lag=0 srv3_sync=0 srv3_timeline=51 sync_replica=0 unhealthy_replica=1\n"
)
assert result.exit_code == 1
@pytest.fixture
def cluster_has_replica_ko_all_replica(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_has_replica_ko_all_replica.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_has_replica_ko_all_replica")
def test_cluster_has_replica_ko_all_replica(
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
result = runner.invoke(
main,
[
"-e",
patroni_api.endpoint,
"cluster_has_replica",
"--warning",
"@1",
"--critical",
"@0",
"--max-lag",
"1MB",
],
)
assert (
result.stdout
== "CLUSTERHASREPLICA CRITICAL - healthy_replica is 0 (outside range @0:0) | healthy_replica=0;@1;@0 srv1_lag=0 srv1_sync=0 srv1_timeline=51 srv2_lag=0 srv2_sync=0 srv2_timeline=51 srv3_lag=0 srv3_sync=0 srv3_timeline=51 sync_replica=0 unhealthy_replica=3\n"
)
assert result.exit_code == 2
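
Taken together, these cases pin down the predicate for a healthy replica: role `replica` (or `sync_standby`), state `streaming` (plain `running` when emulating the old replica state), a timeline matching the leader's, and a lag no greater than `--max-lag` — with a non-numeric ("unknown") lag counting as unhealthy. A condensed sketch; the function name and parameters are illustrative, not the module's API:

```python
from typing import Any, Dict


def is_healthy_replica(
    member: Dict[str, Any],
    leader_timeline: int,
    max_lag: int,
    old_replica_state: bool,
) -> bool:
    good_state = "running" if old_replica_state else "streaming"
    return (
        member.get("role") in ("replica", "sync_standby")
        and member.get("state") == good_state
        and member.get("timeline") == leader_timeline
        and isinstance(member.get("lag"), int)  # "unknown" lag -> unhealthy
        and member["lag"] <= max_lag
    )
```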

View file

@ -1,20 +1,17 @@
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import my_mock
from . import PatroniAPI
def test_cluster_has_scheduled_action_ok(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_has_scheduled_action_ok", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "cluster_has_scheduled_action"]
)
with patroni_api.routes({"cluster": "cluster_has_scheduled_action_ok.json"}):
result = runner.invoke(
main, ["-e", patroni_api.endpoint, "cluster_has_scheduled_action"]
)
assert result.exit_code == 0
assert (
result.stdout
@ -23,14 +20,14 @@ def test_cluster_has_scheduled_action_ok(
def test_cluster_has_scheduled_action_ko_switchover(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_has_scheduled_action_ko_switchover", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "cluster_has_scheduled_action"]
)
with patroni_api.routes(
{"cluster": "cluster_has_scheduled_action_ko_switchover.json"}
):
result = runner.invoke(
main, ["-e", patroni_api.endpoint, "cluster_has_scheduled_action"]
)
assert result.exit_code == 2
assert (
result.stdout
@ -39,14 +36,14 @@ def test_cluster_has_scheduled_action_ko_switchover(
def test_cluster_has_scheduled_action_ko_restart(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_has_scheduled_action_ko_restart", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "cluster_has_scheduled_action"]
)
with patroni_api.routes(
{"cluster": "cluster_has_scheduled_action_ko_restart.json"}
):
result = runner.invoke(
main, ["-e", patroni_api.endpoint, "cluster_has_scheduled_action"]
)
assert result.exit_code == 2
assert (
result.stdout

View file

@ -1,20 +1,17 @@
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import my_mock
from . import PatroniAPI
def test_cluster_is_in_maintenance_ok(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_is_in_maintenance_ok", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "cluster_is_in_maintenance"]
)
with patroni_api.routes({"cluster": "cluster_is_in_maintenance_ok.json"}):
result = runner.invoke(
main, ["-e", patroni_api.endpoint, "cluster_is_in_maintenance"]
)
assert result.exit_code == 0
assert (
result.stdout
@ -23,14 +20,12 @@ def test_cluster_is_in_maintenance_ok(
def test_cluster_is_in_maintenance_ko(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_is_in_maintenance_ko", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "cluster_is_in_maintenance"]
)
with patroni_api.routes({"cluster": "cluster_is_in_maintenance_ko.json"}):
result = runner.invoke(
main, ["-e", patroni_api.endpoint, "cluster_is_in_maintenance"]
)
assert result.exit_code == 2
assert (
result.stdout
@ -39,14 +34,14 @@ def test_cluster_is_in_maintenance_ko(
def test_cluster_is_in_maintenance_ok_pause_false(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_is_in_maintenance_ok_pause_false", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "cluster_is_in_maintenance"]
)
with patroni_api.routes(
{"cluster": "cluster_is_in_maintenance_ok_pause_false.json"}
):
result = runner.invoke(
main, ["-e", patroni_api.endpoint, "cluster_is_in_maintenance"]
)
assert result.exit_code == 0
assert (
result.stdout

View file

@ -1,22 +1,33 @@
from pathlib import Path
from typing import Iterator, Union
import pytest
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import my_mock
from . import PatroniAPI, cluster_api_set_replica_running
@pytest.fixture
def cluster_node_count_ok(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_node_count_ok.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_node_count_ok")
def test_cluster_node_count_ok(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI, old_replica_state: bool
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_node_count_ok", 200, use_old_replica_state)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "cluster_node_count"]
)
assert result.exit_code == 0
if use_old_replica_state:
result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_node_count"])
if old_replica_state:
assert (
result.output
== "CLUSTERNODECOUNT OK - members is 3 | healthy_members=3 members=3 role_leader=1 role_replica=2 state_running=3\n"
@ -26,19 +37,18 @@ def test_cluster_node_count_ok(
result.output
== "CLUSTERNODECOUNT OK - members is 3 | healthy_members=3 members=3 role_leader=1 role_replica=2 state_running=1 state_streaming=2\n"
)
assert result.exit_code == 0
@pytest.mark.usefixtures("cluster_node_count_ok")
def test_cluster_node_count_ok_with_thresholds(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI, old_replica_state: bool
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_node_count_ok", 200, use_old_replica_state)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_node_count",
"--warning",
"@0:1",
@ -50,8 +60,7 @@ def test_cluster_node_count_ok_with_thresholds(
"@0:1",
],
)
assert result.exit_code == 0
if use_old_replica_state:
if old_replica_state:
assert (
result.output
== "CLUSTERNODECOUNT OK - members is 3 | healthy_members=3;@2;@1 members=3;@1;@2 role_leader=1 role_replica=2 state_running=3\n"
@ -61,19 +70,31 @@ def test_cluster_node_count_ok_with_thresholds(
result.output
== "CLUSTERNODECOUNT OK - members is 3 | healthy_members=3;@2;@1 members=3;@1;@2 role_leader=1 role_replica=2 state_running=1 state_streaming=2\n"
)
assert result.exit_code == 0
@pytest.fixture
def cluster_node_count_healthy_warning(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_node_count_healthy_warning.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_node_count_healthy_warning")
def test_cluster_node_count_healthy_warning(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI, old_replica_state: bool
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_node_count_healthy_warning", 200, use_old_replica_state)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_node_count",
"--healthy-warning",
"@2",
@ -81,8 +102,7 @@ def test_cluster_node_count_healthy_warning(
"@0:1",
],
)
assert result.exit_code == 1
if use_old_replica_state:
if old_replica_state:
assert (
result.output
== "CLUSTERNODECOUNT WARNING - healthy_members is 2 (outside range @0:2) | healthy_members=2;@2;@1 members=2 role_leader=1 role_replica=1 state_running=2\n"
@ -92,19 +112,31 @@ def test_cluster_node_count_healthy_warning(
result.output
== "CLUSTERNODECOUNT WARNING - healthy_members is 2 (outside range @0:2) | healthy_members=2;@2;@1 members=2 role_leader=1 role_replica=1 state_running=1 state_streaming=1\n"
)
assert result.exit_code == 1
@pytest.fixture
def cluster_node_count_healthy_critical(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_node_count_healthy_critical.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_node_count_healthy_critical")
def test_cluster_node_count_healthy_critical(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_node_count_healthy_critical", 200, use_old_replica_state)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_node_count",
"--healthy-warning",
"@2",
@ -112,24 +144,35 @@ def test_cluster_node_count_healthy_critical(
"@0:1",
],
)
assert result.exit_code == 2
assert (
result.output
== "CLUSTERNODECOUNT CRITICAL - healthy_members is 1 (outside range @0:1) | healthy_members=1;@2;@1 members=3 role_leader=1 role_replica=2 state_running=1 state_start_failed=2\n"
)
assert result.exit_code == 2
@pytest.fixture
def cluster_node_count_warning(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_node_count_warning.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_node_count_warning")
def test_cluster_node_count_warning(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI, old_replica_state: bool
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_node_count_warning", 200, use_old_replica_state)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_node_count",
"--warning",
"@2",
@ -137,8 +180,7 @@ def test_cluster_node_count_warning(
"@0:1",
],
)
assert result.exit_code == 1
if use_old_replica_state:
if old_replica_state:
assert (
result.stdout
== "CLUSTERNODECOUNT WARNING - members is 2 (outside range @0:2) | healthy_members=2 members=2;@2;@1 role_leader=1 role_replica=1 state_running=2\n"
@ -148,19 +190,31 @@ def test_cluster_node_count_warning(
result.stdout
== "CLUSTERNODECOUNT WARNING - members is 2 (outside range @0:2) | healthy_members=2 members=2;@2;@1 role_leader=1 role_replica=1 state_running=1 state_streaming=1\n"
)
assert result.exit_code == 1
@pytest.fixture
def cluster_node_count_critical(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_node_count_critical.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_node_count_critical")
def test_cluster_node_count_critical(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "cluster_node_count_critical", 200, use_old_replica_state)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"cluster_node_count",
"--warning",
"@2",
@ -168,8 +222,51 @@ def test_cluster_node_count_critical(
"@0:1",
],
)
assert result.exit_code == 2
assert (
result.stdout
== "CLUSTERNODECOUNT CRITICAL - members is 1 (outside range @0:1) | healthy_members=1 members=1;@2;@1 role_leader=1 state_running=1\n"
)
assert result.exit_code == 2
@pytest.fixture
def cluster_node_count_ko_in_archive_recovery(
patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
cluster_path: Union[str, Path] = "cluster_node_count_ko_in_archive_recovery.json"
patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
if old_replica_state:
cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
yield None
@pytest.mark.usefixtures("cluster_node_count_ko_in_archive_recovery")
def test_cluster_node_count_ko_in_archive_recovery(
runner: CliRunner, patroni_api: PatroniAPI, old_replica_state: bool
) -> None:
result = runner.invoke(
main,
[
"-e",
patroni_api.endpoint,
"cluster_node_count",
"--healthy-warning",
"@2",
"--healthy-critical",
"@0:1",
],
)
if old_replica_state:
assert (
result.stdout
== "CLUSTERNODECOUNT OK - members is 3 | healthy_members=3;@2;@1 members=3 role_replica=2 role_standby_leader=1 state_running=3\n"
)
assert result.exit_code == 0
else:
assert (
result.stdout
== "CLUSTERNODECOUNT CRITICAL - healthy_members is 1 (outside range @0:1) | healthy_members=1;@2;@1 members=3 role_replica=2 role_standby_leader=1 state_in_archive_recovery=2 state_streaming=1\n"
)
assert result.exit_code == 2
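
The expected outputs above likewise fix the counting rules for `cluster_node_count`: every member increments `members`, a `role_<role>` counter and a `state_<state>` counter, while `healthy_members` only counts nodes in an acceptable state — the standby-cluster case shows `in archive recovery` members counting as unhealthy. A tallying sketch, with the acceptable-state table distilled from these assertions as an assumption:

```python
from collections import Counter
from typing import Iterable

HEALTHY_STATES = {
    "leader": {"running"},
    "standby_leader": {"running", "streaming"},
    "replica": {"running", "streaming"},
    "sync_standby": {"running", "streaming"},
}


def node_counters(members: Iterable[dict]) -> Counter:
    c: Counter = Counter()
    for m in members:
        c["members"] += 1
        c["role_" + m["role"]] += 1
        c["state_" + m["state"].replace(" ", "_")] += 1
        if m["state"] in HEALTHY_STATES.get(m["role"], set()):
            c["healthy_members"] += 1
    return c
```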

View file

@ -1,16 +1,19 @@
from pathlib import Path
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import my_mock
from . import PatroniAPI
def test_node_is_alive_ok(mocker: MockerFixture, use_old_replica_state: bool) -> None:
runner = CliRunner()
my_mock(mocker, None, 200)
result = runner.invoke(main, ["-e", "https://10.20.199.3:8008", "node_is_alive"])
def test_node_is_alive_ok(
runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
) -> None:
liveness = tmp_path / "liveness"
liveness.touch()
with patroni_api.routes({"liveness": liveness}):
result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_alive"])
assert result.exit_code == 0
assert (
result.stdout
@ -18,11 +21,8 @@ def test_node_is_alive_ok(mocker: MockerFixture, use_old_replica_state: bool) ->
)
def test_node_is_alive_ko(mocker: MockerFixture, use_old_replica_state: bool) -> None:
runner = CliRunner()
my_mock(mocker, None, 404)
result = runner.invoke(main, ["-e", "https://10.20.199.3:8008", "node_is_alive"])
def test_node_is_alive_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_alive"])
assert result.exit_code == 2
assert (
result.stdout

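Note the two spellings `routes()` accepts across these diffs: a bare JSON filename (resolved against the session data directory) or a full `Path`, as with the `liveness` file above. A plausible sketch of such a context manager — only the call signature is taken from the tests; the attribute names and the copy-into-served-directory mechanics are assumptions:

```python
import shutil
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Mapping, Union


class PatroniAPISketch:
    """Stand-in for the real tests.PatroniAPI helper."""

    def __init__(self, served_dir: Path, datadir: Path) -> None:
        self.served_dir = served_dir  # directory the HTTP server serves
        self.datadir = datadir  # tests/json, for bare filenames

    @contextmanager
    def routes(self, mapping: Mapping[str, Union[str, Path]]) -> Iterator[None]:
        installed = []
        for endpoint, src in mapping.items():
            path = src if isinstance(src, Path) else self.datadir / src
            dest = self.served_dir / endpoint
            shutil.copy(path, dest)
            installed.append(dest)
        try:
            yield
        finally:
            for dest in installed:
                dest.unlink()  # endpoints answer 404 again after the block
```
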
View file

@ -1,28 +1,37 @@
from typing import Iterator
import pytest
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import my_mock
from . import PatroniAPI
def test_node_is_leader_ok(mocker: MockerFixture, use_old_replica_state: bool) -> None:
runner = CliRunner()
@pytest.fixture
def node_is_leader_ok(patroni_api: PatroniAPI) -> Iterator[None]:
with patroni_api.routes(
{
"leader": "node_is_leader_ok.json",
"standby-leader": "node_is_leader_ok_standby_leader.json",
}
):
yield None
my_mock(mocker, "node_is_leader_ok", 200)
result = runner.invoke(main, ["-e", "https://10.20.199.3:8008", "node_is_leader"])
@pytest.mark.usefixtures("node_is_leader_ok")
def test_node_is_leader_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_leader"])
assert result.exit_code == 0
assert (
result.stdout
== "NODEISLEADER OK - This node is a leader node. | is_leader=1;;@0\n"
)
my_mock(mocker, "node_is_leader_ok_standby_leader", 200)
result = runner.invoke(
main,
["-e", "https://10.20.199.3:8008", "node_is_leader", "--is-standby-leader"],
["-e", patroni_api.endpoint, "node_is_leader", "--is-standby-leader"],
)
print(result.stdout)
assert result.exit_code == 0
assert (
result.stdout
@ -30,21 +39,17 @@ def test_node_is_leader_ok(mocker: MockerFixture, use_old_replica_state: bool) -
)
def test_node_is_leader_ko(mocker: MockerFixture, use_old_replica_state: bool) -> None:
runner = CliRunner()
my_mock(mocker, "node_is_leader_ko", 503)
result = runner.invoke(main, ["-e", "https://10.20.199.3:8008", "node_is_leader"])
def test_node_is_leader_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_leader"])
assert result.exit_code == 2
assert (
result.stdout
== "NODEISLEADER CRITICAL - This node is not a leader node. | is_leader=0;;@0\n"
)
my_mock(mocker, "node_is_leader_ko_standby_leader", 503)
result = runner.invoke(
main,
["-e", "https://10.20.199.3:8008", "node_is_leader", "--is-standby-leader"],
["-e", patroni_api.endpoint, "node_is_leader", "--is-standby-leader"],
)
assert result.exit_code == 2
assert (

View file

@ -1,20 +1,15 @@
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import my_mock
from . import PatroniAPI
def test_node_is_pending_restart_ok(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
my_mock(mocker, "node_is_pending_restart_ok", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "node_is_pending_restart"]
)
def test_node_is_pending_restart_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
with patroni_api.routes({"patroni": "node_is_pending_restart_ok.json"}):
result = runner.invoke(
main, ["-e", patroni_api.endpoint, "node_is_pending_restart"]
)
assert result.exit_code == 0
assert (
result.stdout
@ -22,15 +17,11 @@ def test_node_is_pending_restart_ok(
)
def test_node_is_pending_restart_ko(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
my_mock(mocker, "node_is_pending_restart_ko", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "node_is_pending_restart"]
)
def test_node_is_pending_restart_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
with patroni_api.routes({"patroni": "node_is_pending_restart_ko.json"}):
result = runner.invoke(
main, ["-e", patroni_api.endpoint, "node_is_pending_restart"]
)
assert result.exit_code == 2
assert (
result.stdout

View file

@ -1,16 +1,13 @@
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import my_mock
from . import PatroniAPI
def test_node_is_primary_ok(mocker: MockerFixture, use_old_replica_state: bool) -> None:
runner = CliRunner()
my_mock(mocker, "node_is_primary_ok", 200)
result = runner.invoke(main, ["-e", "https://10.20.199.3:8008", "node_is_primary"])
def test_node_is_primary_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
with patroni_api.routes({"primary": "node_is_primary_ok.json"}):
result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_primary"])
assert result.exit_code == 0
assert (
result.stdout
@ -18,11 +15,8 @@ def test_node_is_primary_ok(mocker: MockerFixture, use_old_replica_state: bool)
)
def test_node_is_primary_ko(mocker: MockerFixture, use_old_replica_state: bool) -> None:
runner = CliRunner()
my_mock(mocker, "node_is_primary_ko", 503)
result = runner.invoke(main, ["-e", "https://10.20.199.3:8008", "node_is_primary"])
def test_node_is_primary_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_primary"])
assert result.exit_code == 2
assert (
result.stdout

View file

@ -1,16 +1,27 @@
from typing import Iterator
import pytest
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import my_mock
from . import PatroniAPI
def test_node_is_replica_ok(mocker: MockerFixture, use_old_replica_state: bool) -> None:
runner = CliRunner()
@pytest.fixture
def node_is_replica_ok(patroni_api: PatroniAPI) -> Iterator[None]:
with patroni_api.routes(
{
k: "node_is_replica_ok.json"
for k in ("replica", "synchronous", "asynchronous")
}
):
yield None
my_mock(mocker, "node_is_replica_ok", 200)
result = runner.invoke(main, ["-e", "https://10.20.199.3:8008", "node_is_replica"])
@pytest.mark.usefixtures("node_is_replica_ok")
def test_node_is_replica_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_replica"])
assert result.exit_code == 0
assert (
result.stdout
@ -18,11 +29,8 @@ def test_node_is_replica_ok(mocker: MockerFixture, use_old_replica_state: bool)
)
def test_node_is_replica_ko(mocker: MockerFixture, use_old_replica_state: bool) -> None:
runner = CliRunner()
my_mock(mocker, "node_is_replica_ko", 503)
result = runner.invoke(main, ["-e", "https://10.20.199.3:8008", "node_is_replica"])
def test_node_is_replica_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_replica"])
assert result.exit_code == 2
assert (
result.stdout
@ -30,15 +38,10 @@ def test_node_is_replica_ko(mocker: MockerFixture, use_old_replica_state: bool)
)
def test_node_is_replica_ko_lag(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
def test_node_is_replica_ko_lag(runner: CliRunner, patroni_api: PatroniAPI) -> None:
# We don't do the check ourselves; Patroni does it and changes the return code
my_mock(mocker, "node_is_replica_ok", 503)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "node_is_replica", "--max-lag", "100"]
main, ["-e", patroni_api.endpoint, "node_is_replica", "--max-lag", "100"]
)
assert result.exit_code == 2
assert (
@ -46,12 +49,11 @@ def test_node_is_replica_ko_lag(
== "NODEISREPLICA CRITICAL - This node is not a running replica with no noloadbalance tag and a lag under 100. | is_replica=0;;@0\n"
)
my_mock(mocker, "node_is_replica_ok", 503)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"node_is_replica",
"--is-async",
"--max-lag",
@ -65,15 +67,11 @@ def test_node_is_replica_ko_lag(
)
def test_node_is_replica_sync_ok(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
@pytest.mark.usefixtures("node_is_replica_ok")
def test_node_is_replica_sync_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
# We don't do the check ourselves; Patroni does it and changes the return code
my_mock(mocker, "node_is_replica_ok", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "node_is_replica", "--is-sync"]
main, ["-e", patroni_api.endpoint, "node_is_replica", "--is-sync"]
)
assert result.exit_code == 0
assert (
@ -82,15 +80,10 @@ def test_node_is_replica_sync_ok(
)
def test_node_is_replica_sync_ko(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
def test_node_is_replica_sync_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
# We don't do the check ourselves; Patroni does it and changes the return code
my_mock(mocker, "node_is_replica_ok", 503)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "node_is_replica", "--is-sync"]
main, ["-e", patroni_api.endpoint, "node_is_replica", "--is-sync"]
)
assert result.exit_code == 2
assert (
@ -99,15 +92,11 @@ def test_node_is_replica_sync_ko(
)
def test_node_is_replica_async_ok(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
@pytest.mark.usefixtures("node_is_replica_ok")
def test_node_is_replica_async_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
# We don't do the check ourselves; Patroni does it and changes the return code
my_mock(mocker, "node_is_replica_ok", 200)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "node_is_replica", "--is-async"]
main, ["-e", patroni_api.endpoint, "node_is_replica", "--is-async"]
)
assert result.exit_code == 0
assert (
@ -116,15 +105,10 @@ def test_node_is_replica_async_ok(
)
def test_node_is_replica_async_ko(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
def test_node_is_replica_async_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
# We don't do the check ourselves; Patroni does it and changes the return code
my_mock(mocker, "node_is_replica_ok", 503)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "node_is_replica", "--is-async"]
main, ["-e", patroni_api.endpoint, "node_is_replica", "--is-async"]
)
assert result.exit_code == 2
assert (
@ -133,18 +117,14 @@ def test_node_is_replica_async_ko(
)
def test_node_is_replica_params(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
@pytest.mark.usefixtures("node_is_replica_ok")
def test_node_is_replica_params(runner: CliRunner, patroni_api: PatroniAPI) -> None:
# We don't do the check ourselves; Patroni does it and changes the return code
my_mock(mocker, "node_is_replica_ok", 200)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"node_is_replica",
"--is-async",
"--is-sync",
@ -157,12 +137,11 @@ def test_node_is_replica_params(
)
# We don't do the check ourselves; Patroni does it and changes the return code
my_mock(mocker, "node_is_replica_ok", 200)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"node_is_replica",
"--is-sync",
"--max-lag",

View file

@ -1,22 +1,25 @@
from typing import Iterator
import pytest
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import my_mock
from . import PatroniAPI
def test_node_patroni_version_ok(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
@pytest.fixture(scope="module", autouse=True)
def node_patroni_version(patroni_api: PatroniAPI) -> Iterator[None]:
with patroni_api.routes({"patroni": "node_patroni_version.json"}):
yield None
my_mock(mocker, "node_patroni_version", 200)
def test_node_patroni_version_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"node_patroni_version",
"--patroni-version",
"2.0.2",
@ -29,17 +32,12 @@ def test_node_patroni_version_ok(
)
def test_node_patroni_version_ko(
mocker: MockerFixture, use_old_replica_state: bool
) -> None:
runner = CliRunner()
my_mock(mocker, "node_patroni_version", 200)
def test_node_patroni_version_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"node_patroni_version",
"--patroni-version",
"1.0.0",

View file

@ -1,23 +1,30 @@
from pathlib import Path
from typing import Iterator
import nagiosplugin
import pytest
from click.testing import CliRunner
from pytest_mock import MockerFixture
from check_patroni.cli import main
from .tools import here, my_mock
from . import PatroniAPI
@pytest.fixture
def node_tl_has_changed(patroni_api: PatroniAPI) -> Iterator[None]:
with patroni_api.routes({"patroni": "node_tl_has_changed.json"}):
yield None
@pytest.mark.usefixtures("node_tl_has_changed")
def test_node_tl_has_changed_ok_with_timeline(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "node_tl_has_changed", 200)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"node_tl_has_changed",
"--timeline",
"58",
@ -30,23 +37,22 @@ def test_node_tl_has_changed_ok_with_timeline(
)
@pytest.mark.usefixtures("node_tl_has_changed")
def test_node_tl_has_changed_ok_with_state_file(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
) -> None:
runner = CliRunner()
with open(here / "node_tl_has_changed.state_file", "w") as f:
state_file = tmp_path / "node_tl_has_changed.state_file"
with state_file.open("w") as f:
f.write('{"timeline": 58}')
my_mock(mocker, "node_tl_has_changed", 200)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"node_tl_has_changed",
"--state-file",
str(here / "node_tl_has_changed.state_file"),
str(state_file),
],
)
assert result.exit_code == 0
@ -56,17 +62,15 @@ def test_node_tl_has_changed_ok_with_state_file(
)
@pytest.mark.usefixtures("node_tl_has_changed")
def test_node_tl_has_changed_ko_with_timeline(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI
) -> None:
runner = CliRunner()
my_mock(mocker, "node_tl_has_changed", 200)
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"node_tl_has_changed",
"--timeline",
"700",
@ -79,24 +83,23 @@ def test_node_tl_has_changed_ko_with_timeline(
)
@pytest.mark.usefixtures("node_tl_has_changed")
def test_node_tl_has_changed_ko_with_state_file_and_save(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
) -> None:
runner = CliRunner()
with open(here / "node_tl_has_changed.state_file", "w") as f:
state_file = tmp_path / "node_tl_has_changed.state_file"
with state_file.open("w") as f:
f.write('{"timeline": 700}')
my_mock(mocker, "node_tl_has_changed", 200)
# test without saving the new tl
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"node_tl_has_changed",
"--state-file",
str(here / "node_tl_has_changed.state_file"),
str(state_file),
],
)
assert result.exit_code == 2
@@ -105,7 +108,7 @@ def test_node_tl_has_changed_ko_with_state_file_and_save(
== "NODETLHASCHANGED CRITICAL - The expected timeline was 700 got 58. | is_timeline_changed=1;;@1:1 timeline=58\n"
)
cookie = nagiosplugin.Cookie(here / "node_tl_has_changed.state_file")
cookie = nagiosplugin.Cookie(state_file)
cookie.open()
new_tl = cookie.get("timeline")
cookie.close()
@@ -117,10 +120,10 @@ def test_node_tl_has_changed_ko_with_state_file_and_save(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"node_tl_has_changed",
"--state-file",
str(here / "node_tl_has_changed.state_file"),
str(state_file),
"--save",
],
)
@@ -130,7 +133,7 @@ def test_node_tl_has_changed_ko_with_state_file_and_save(
== "NODETLHASCHANGED CRITICAL - The expected timeline was 700 got 58. | is_timeline_changed=1;;@1:1 timeline=58\n"
)
cookie = nagiosplugin.Cookie(here / "node_tl_has_changed.state_file")
cookie = nagiosplugin.Cookie(state_file)
cookie.open()
new_tl = cookie.get("timeline")
cookie.close()
@@ -138,23 +141,22 @@ def test_node_tl_has_changed_ko_with_state_file_and_save(
assert new_tl == 58
@pytest.mark.usefixtures("node_tl_has_changed")
def test_node_tl_has_changed_params(
mocker: MockerFixture, use_old_replica_state: bool
runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
) -> None:
# This test is placed last because the exceptions do not seem to be flushed from stderr before the next tests run.
runner = CliRunner()
my_mock(mocker, "node_tl_has_changed", 200)
fake_state_file = tmp_path / "fake_file_name.state_file"
result = runner.invoke(
main,
[
"-e",
"https://10.20.199.3:8008",
patroni_api.endpoint,
"node_tl_has_changed",
"--timeline",
"58",
"--state-file",
str(here / "fake_file_name.state_file"),
str(fake_state_file),
],
)
assert result.exit_code == 3
@@ -163,9 +165,7 @@ def test_node_tl_has_changed_params(
== "NODETLHASCHANGED UNKNOWN: click.exceptions.UsageError: Either --timeline or --state-file should be provided for this service\n"
)
result = runner.invoke(
main, ["-e", "https://10.20.199.3:8008", "node_tl_has_changed"]
)
result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_tl_has_changed"])
assert result.exit_code == 3
assert (
result.stdout
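A note on the state-file plumbing exercised above: nagiosplugin's Cookie is a small JSON-backed key/value store, which is why the tests can seed it by writing '{"timeline": 58}' by hand and then read the stored timeline back after invoking the check. A minimal read-back sketch, with an invented file name:

import nagiosplugin

cookie = nagiosplugin.Cookie("example.state_file")  # invented path for the sketch
cookie.open()
timeline = cookie.get("timeline")  # dict-style lookup of the persisted value
cookie.close()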

View file

@@ -1,49 +0,0 @@
import json
import pathlib
from typing import Any
from pytest_mock import MockerFixture
from check_patroni.types import APIError, PatroniResource
here = pathlib.Path(__file__).parent
def getjson(name: str) -> Any:
path = here / "json" / f"{name}.json"
if not path.exists():
raise Exception(f"path does not exist: {path}")
with path.open() as f:
return json.load(f)
def my_mock(
mocker: MockerFixture,
json_file: str,
status: int,
use_old_replica_state: bool = False,
) -> None:
def mock_rest_api(self: PatroniResource, service: str) -> Any:
if status != 200:
raise APIError("Test error for status code 200")
if json_file:
if use_old_replica_state and (
json_file.startswith("cluster_has_replica")
or json_file.startswith("cluster_node_count")
):
return cluster_api_set_replica_running(getjson(json_file))
return getjson(json_file)
return None
mocker.resetall()
mocker.patch("check_patroni.types.PatroniResource.rest_api", mock_rest_api)
def cluster_api_set_replica_running(js: Any) -> Any:
# starting from 3.0.4 the state of replicas is streaming instead of running
for node in js["members"]:
if node["role"] in ["replica", "sync_standby"]:
if node["state"] == "streaming":
node["state"] = "running"
return js
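The deleted cluster_api_set_replica_running() helper shown above was the machinery behind the old use_old_replica_state test parameter: starting with Patroni 3.0.4 replicas report a "streaming" state, and the helper rewrote it back to the pre-3.0.4 "running" value so both behaviours could be exercised from a single JSON fixture. A quick illustration of the transformation, using made-up cluster data:

cluster = {
    "members": [
        {"role": "leader", "state": "running"},
        {"role": "replica", "state": "streaming"},
    ]
}
# The replica's state is downgraded; the leader is left untouched.
assert cluster_api_set_replica_running(cluster)["members"][1]["state"] == "running"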

10
tox.ini
View file

@@ -4,11 +4,9 @@ envlist = lint, mypy, py{37,38,39,310,311}
skip_missing_interpreters = True
[testenv]
deps =
pytest
pytest-mock
extras = test
commands =
pytest {toxinidir}/check_patroni {toxinidir}/tests {posargs:-vv}
pytest {toxinidir}/check_patroni {toxinidir}/tests {posargs:-vv --log-level=debug}
[testenv:lint]
skip_install = True
@@ -18,7 +16,7 @@ deps =
flake8
isort
commands =
codespell {toxinidir}/check_patroni {toxinidir}/tests
codespell {toxinidir}/check_patroni {toxinidir}/tests {toxinidir}/docs/ {toxinidir}/RELEASE.md {toxinidir}/CONTRIBUTING.md
black --check --diff {toxinidir}/check_patroni {toxinidir}/tests
flake8 {toxinidir}/check_patroni {toxinidir}/tests
isort --check --diff {toxinidir}/check_patroni {toxinidir}/tests
@@ -28,7 +26,7 @@ deps =
mypy == 0.961
commands =
# we need to install types-requests
mypy --install-types --non-interactive {toxinidir}/check_patroni
mypy --install-types --non-interactive
[testenv:build]
deps =

View file

@@ -100,7 +100,7 @@ http://$IP/icingaweb2/setup
Finish
* Screen 15: Hopefuly success
* Screen 15: Hopefully success
Login

View file

@@ -66,7 +66,7 @@ icinga_setup(){
info "# Icinga setup"
info "#============================================================================="
## this part is already done by the standart icinga install with the user icinga2
## this part is already done by the standard icinga install with the user icinga2
## and a random password, here we don't really care
cat << __EOF__ | sudo -u postgres psql
@@ -83,7 +83,7 @@ __EOF__
icingacli setup config directory --group icingaweb2
icingacli setup token create
## this part is already done by the standart icinga install with the user icinga2
## this part is already done by the standard icinga install with the user icinga2
cat << __EOF__ > /etc/icinga2/features-available/ido-pgsql.conf
/**
* The db_ido_pgsql library implements IDO functionality
@@ -198,7 +198,7 @@ grafana(){
cat << __EOF__ > /etc/grafana/grafana.ini
[database]
# You can configure the database connection by specifying type, host, name, user and password
# as seperate properties or as on string using the url propertie.
# as separate properties or as one string using the url property.
# Either "mysql", "postgres" or "sqlite3", it's your choice
type = postgres