commit c52e34116d
New upstream version 2.0.0
.coveragerc (new file, 3 additions)

@@ -0,0 +1,3 @@
+[run]
+include =
+    check_patroni/*
.gitignore (vendored, 2 changes)

@@ -1,10 +1,10 @@
 __pycache__/
 check_patroni.egg-info
 tests/*.state_file
-tests/config.ini
 vagrant/.vagrant
 vagrant/*.state_file
 .*.swp
+.coverage
 .venv/
 .tox/
 dist/
CHANGELOG.md (26 changes)

@@ -1,13 +1,37 @@
 # Change log

-## Unreleased
+## check_patroni 2.0.0 - 2024-04-09
+
+### Changed
+
+* In `cluster_node_count`, a healthy standby, sync replica or standby leader cannot be "in
+  archive recovery" because this service doesn't check for lag and timelines.
+
+### Added
+
+* Add the timeline in the `cluster_has_replica` perfstats. (#50)
+* Add a mention about shell completion support and shell versions in the doc. (#53)
+* Add the leader type and whether it's archiving to the `cluster_has_leader` perfstats. (#58)
+
+### Fixed
+
+* Add compatibility with [requests](https://requests.readthedocs.io)
+  version 2.25 and higher.
+* Fix what `cluster_has_replica` deems a healthy replica. (#50, reported by @mbanck)
+* Fix `cluster_has_replica` to display perfstats for replicas whenever it's possible (healthy or not). (#50)
+* Fix `cluster_has_leader` to correctly check for standby leaders. (#58, reported by @mbanck)
+* Fix `cluster_node_count` to correctly manage replication states. (#50, reported by @mbanck)
+
+### Misc
+
+* Improve the documentation for `node_is_replica`.
+* Improve test coverage by running an HTTP server to fake the Patroni API (#55
+  by @dlax).
+* Work around old pytest versions in type annotations in the test suite.
+* Declare compatibility with click version 7.1 (or higher).
+* In tests, work around nagiosplugin 1.3.2 not properly handling stdout
+  redirection.

 ## check_patroni 1.0.0 - 2023-08-28

 Check patroni is now tagged as Production/Stable.
@@ -43,15 +43,14 @@ A vagrant file can be found in [this
 repository](https://github.com/ioguix/vagrant-patroni) to generate a patroni/etcd
 setup.

-The `README.md` can be geneated with `./docs/make_readme.sh`.
+The `README.md` can be generated with `./docs/make_readme.sh`.

 ## Executing Tests

 Crafting repeatable tests using a live Patroni cluster can be intricate. To
-simplify the development process, interactions with Patroni's API are
-substituted with a mock function that yields an HTTP return code and a JSON
-object outlining the cluster's status. The JSON files containing this
-information are housed in the `./tests/json` directory.
+simplify the development process, a fake HTTP server is set up as a test
+fixture and serves static files (either from the `tests/json` directory or from
+in-memory data).

 An important consideration is that there is a potential drawback: if the JSON
 data is incorrect or if modifications have been made to Patroni without
@@ -61,21 +60,15 @@ erroneously.
 The tests are executed automatically for each PR using the CI (see
 `.github/workflow/lint.yml` and `.github/workflow/tests.yml`).

-Running the tests manually:
+Running the tests,

-* Using patroni's nominal replica state of `streaming` (since v3.0.4):
+* manually:

   ```bash
-  pytest ./tests
+  pytest --cov tests
   ```

-* Using patroni's nominal replica state of `running` (before v3.0.4):
-
-  ```bash
-  pytest --use-old-replica-state ./tests
-  ```
-
-* Using tox:
+* or using tox:

   ```bash
   tox -e lint  # mypy + flake8 + black + isort + codespell
@@ -83,9 +76,9 @@ Running the tests manually:
   tox -e py    # pytests and "lint" tests for the default version of python
   ```

-Please note that when dealing with any service that checks the state of a node
-in patroni's `cluster` endpoint, the corresponding JSON test file must be added
-in `./tests/tools.py`.
+Please note that when dealing with any service that checks the state of a node,
+the related tests must use the `old_replica_state` fixture to test with both
+old (pre 3.0.4) and new replica states.

 A bash script, `check_patroni.sh`, is provided to facilitate testing all
 services on a Patroni endpoint (`./vagrant/check_patroni.sh`). It requires one
@@ -99,17 +92,3 @@ Here's an example usage:
 ```bash
 ./vagrant/check_patroni.sh http://10.20.30.51:8008
 ```
-
-## Release
-
-Update the Changelog.
-
-The package is generated and uploaded to pypi when a `v*` tag is created (see
-`.github/workflow/publish.yml`).
-
-Alternatively, the release can be done manually with:
-
-```
-tox -e build
-tox -e upload
-```
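The fake-HTTP-server test approach described above can be sketched with the standard library alone. This is a hedged illustration of the idea, not the project's actual fixture: the cluster payload, handler and port choice here are made up for the example.

```python
# Minimal sketch: serve a static JSON cluster description over HTTP so a
# check can be tested without a live Patroni (payload is illustrative).
import http.server
import json
import threading
import urllib.request

CLUSTER = {"members": [{"name": "srv1", "role": "leader", "state": "running"}]}

class FakePatroniHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps(CLUSTER).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # keep test output quiet
        pass

# bind to an ephemeral port and serve in a background thread
server = http.server.HTTPServer(("127.0.0.1", 0), FakePatroniHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/cluster"
data = json.load(urllib.request.urlopen(url))
server.shutdown()
```

A real fixture would additionally serve files from a directory (the text mentions `tests/json`) and tear the server down per test.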
@@ -2,6 +2,7 @@ include *.md
 include mypy.ini
 include pytest.ini
 include tox.ini
+include .coveragerc
 include .flake8
 include pyproject.toml
 recursive-include docs *.sh
README.md (118 changes)

@@ -45,7 +45,7 @@ Commands:
   node_is_leader           Check if the node is a leader node.
   node_is_pending_restart  Check if the node is in pending restart...
   node_is_primary          Check if the node is the primary with the...
-  node_is_replica          Check if the node is a running replica...
+  node_is_replica          Check if the node is a replica with no...
   node_patroni_version     Check if the version is equal to the input
   node_tl_has_changed      Check if the timeline has changed.
 ```
@@ -60,7 +60,7 @@ $ pip install git+https://github.com/dalibo/check_patroni.git

 check_patroni works on python 3.6, we keep it that way because patroni also
 supports it and there are still lots of RH 7 variants around. That being said
-python 3.6 has been EOL for age and there is no support for it in the github
+python 3.6 has been EOL for ages and there is no support for it in the github
 CI.

 ## Support
@@ -98,8 +98,8 @@ A match is found when: `start <= VALUE <= end`.

 For example, the following command will raise:

-* a warning if there is less than 1 nodes, wich can be translated to outside of range [2;+INF[
-* a critical if there are no nodes, wich can be translated to outside of range [1;+INF[
+* a warning if there are less than 2 nodes, which can be translated to outside of range [2;+INF[
+* a critical if there are no nodes, which can be translated to outside of range [1;+INF[

 ```
 check_patroni -e https://10.20.199.3:8008 cluster_has_replica --warning 2: --critical 1:
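The range notation used above follows the Nagios plugin convention. A minimal sketch of its semantics, covering only the `start:` and `@start:end` forms that appear in this document (hypothetical helper, not the nagiosplugin API):

```python
def violates(spec: str, value: float) -> bool:
    """Return True when the value should trigger an alert for this range spec."""
    inside_alerts = spec.startswith("@")  # "@start:end" alerts INSIDE the range
    if inside_alerts:
        spec = spec[1:]
    start, _, end = spec.partition(":")
    lo = float("-inf") if start in ("", "~") else float(start)
    hi = float("inf") if end == "" else float(end)
    inside = lo <= value <= hi
    # plain "start:end" alerts when the value falls OUTSIDE the range
    return inside if inside_alerts else not inside

# --warning 2: --critical 1: from the command above
assert violates("2:", 1) is True    # fewer than 2 replicas -> warning
assert violates("1:", 0) is True    # no replicas -> critical
assert violates("2:", 3) is False   # enough replicas -> ok
```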
@@ -115,6 +115,30 @@ Several options are available:
 * `--cert_file`: your certificate or the concatenation of your certificate and private key
 * `--key_file`: your private key (optional)

+## Shell completion
+
+We use the [click] library which supports shell completion natively.
+
+Shell completion can be added by typing the following command or adding it to
+a file specific to your shell of choice.
+
+* for Bash (add to `~/.bashrc`):
+
+  ```
+  eval "$(_CHECK_PATRONI_COMPLETE=bash_source check_patroni)"
+  ```
+
+* for Zsh (add to `~/.zshrc`):
+
+  ```
+  eval "$(_CHECK_PATRONI_COMPLETE=zsh_source check_patroni)"
+  ```
+
+* for Fish (add to `~/.config/fish/completions/check_patroni.fish`):
+
+  ```
+  eval "$(_CHECK_PATRONI_COMPLETE=fish_source check_patroni)"
+  ```
+
+Please note that shell completion is not supported for all shell versions: for
+example, only Bash 4.4 and newer is supported.
+
+[click]: https://click.palletsprojects.com/en/8.1.x/shell-completion/
+
 ## Cluster services
@@ -152,11 +176,27 @@ Usage: check_patroni cluster_has_leader [OPTIONS]

   This check applies to any kind of leaders including standby leaders.

+  A leader is a node with the "leader" role and a "running" state.
+
+  A standby leader is a node with a "standby_leader" role and a "streaming" or
+  "in archive recovery" state. Please note that log shipping could be stuck
+  because the WAL are not available or applicable. Patroni doesn't provide
+  information about the origin cluster (timeline or lag), so we cannot check
+  if there is a problem in that particular case. That's why we issue a warning
+  when the node is "in archive recovery". We suggest using other supervision
+  tools to do this (eg. check_pgactivity).
+
   Check:
   * `OK`: if there is a leader node.
-  * `CRITICAL`: otherwise
+  * `WARNING`: if there is a standby leader in archive mode.
+  * `CRITICAL`: otherwise.

-  Perfdata: `has_leader` is 1 if there is a leader node, 0 otherwise
+  Perfdata:
+  * `has_leader` is 1 if there is any kind of leader node, 0 otherwise
+  * `is_standby_leader_in_arc_rec` is 1 if the standby leader node is "in
+    archive recovery", 0 otherwise
+  * `is_standby_leader` is 1 if there is a standby leader node, 0 otherwise
+  * `is_leader` is 1 if there is a "classical" leader node, 0 otherwise

 Options:
   --help  Show this message and exit.
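The perfdata list above amounts to a classification over a node's role and state. An illustrative sketch of that mapping, assuming patroni >= 3.0.4 detailed states (this is not the project's code, just a restatement of the rules in the text):

```python
def classify_leader(role: str, state: str) -> dict:
    """Map one member's role/state to the four perfdata flags described above."""
    # a "classical" leader must be running
    is_leader = role == "leader" and state == "running"
    # a standby leader may be streaming or replaying shipped WAL
    is_standby = role == "standby_leader" and state in ("streaming", "in archive recovery")
    in_arc_rec = role == "standby_leader" and state == "in archive recovery"
    return {
        "has_leader": int(is_leader or is_standby),
        "is_leader": int(is_leader),
        "is_standby_leader": int(is_standby),
        "is_standby_leader_in_arc_rec": int(in_arc_rec),  # triggers the WARNING
    }
```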
@@ -169,10 +209,27 @@ Usage: check_patroni cluster_has_replica [OPTIONS]

   Check if the cluster has healthy replicas and/or if some are sync standbies

+  For patroni (and this check):
+  * a replica is `streaming` if the `pg_stat_wal_receiver` says so.
+  * a replica is `in archive recovery`, if it's not `streaming` and has a `restore_command`.
+
   A healthy replica:
-  * is in running or streaming state (V3.0.4)
-  * has a replica or sync_standby role
-  * has a lag lower or equal to max_lag
+  * has a `replica` or `sync_standby` role
+  * has the same timeline as the leader and
+  * is in `running` state (patroni < V3.0.4)
+  * is in `streaming` or `in archive recovery` state (patroni >= V3.0.4)
+  * has a lag lower or equal to `max_lag`
+
+  Please note that a replica `in archive recovery` could be stuck because the
+  WAL are not available or applicable (the server's timeline has diverged from
+  the leader's). We already detect the latter but we will miss the former.
+  Therefore, it's preferable to check for the lag in addition to the healthy
+  state if you rely on log shipping to help lagging standbies to catch up.
+
+  Since we require a healthy replica to have the same timeline as the leader,
+  it's possible that we raise alerts when the cluster is performing a
+  switchover or failover and the standbies are in the process of catching up
+  with the new leader. The alert shouldn't last long.

   Check:
   * `OK`: if the healthy_replica count and their lag are compatible with the replica count threshold.
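The healthy-replica rules listed above can be condensed into one predicate. A hedged sketch for illustration only (`detailed` stands for "patroni >= 3.0.4 reports detailed states"; signature and names are made up, not check_patroni's API):

```python
from typing import Optional

def replica_is_healthy(
    role: str,
    state: str,
    timeline: int,
    leader_timeline: int,
    lag: int,
    max_lag: Optional[int],
    detailed: bool,
) -> bool:
    """Apply the healthy-replica rules described above to one member."""
    ok_states = ("streaming", "in archive recovery") if detailed else ("running",)
    return (
        role in ("replica", "sync_standby")
        and state in ok_states
        and timeline == leader_timeline       # diverged timeline -> unhealthy
        and (max_lag is None or lag <= max_lag)
    )
```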
@@ -183,6 +240,7 @@ Usage: check_patroni cluster_has_replica [OPTIONS]
   * healthy_replica & unhealthy_replica count
   * the number of sync_replica, they are included in the previous count
   * the lag of each replica labelled with "member name"_lag
+  * the timeline of each replica labelled with "member name"_timeline
   * a boolean to tell if the node is a sync standby labelled with "member name"_sync

 Options:
@@ -241,26 +299,37 @@ Usage: check_patroni cluster_node_count [OPTIONS]

   Count the number of nodes in the cluster.

+  The role refers to the role of the server in the cluster. Possible values
+  are:
+  * master or leader
+  * replica
+  * standby_leader
+  * sync_standby
+  * demoted
+  * promoted
+  * uninitialized
+
   The state refers to the state of PostgreSQL. Possible values are:
   * initializing new cluster, initdb failed
   * running custom bootstrap script, custom bootstrap failed
   * starting, start failed
   * restarting, restart failed
-  * running, streaming (for a replica V3.0.4)
+  * running, streaming, in archive recovery
   * stopping, stopped, stop failed
   * creating replica
   * crashed

-  The role refers to the role of the server in the cluster. Possible values
-  are:
-  * master or leader (V3.0.0+)
-  * replica
-  * demoted
-  * promoted
-  * uninitialized
+  The "healthy" checks only ensure that:
+  * a leader has the running state
+  * a standby_leader has the running or streaming (V3.0.4) state
+  * a replica or sync_standby has the running or streaming (V3.0.4) state
+
+  Since we don't check the lag or timeline, "in archive recovery" is not
+  considered a valid state for this service. See cluster_has_leader and
+  cluster_has_replica for specialized checks.

   Check:
-  * Compares the number of nodes against the normal and healthy (running + streaming) nodes warning and critical thresholds.
+  * Compares the number of nodes against the normal and healthy nodes warning and critical thresholds.
   * `OK`: If they are not provided.

   Perfdata:
@@ -307,7 +376,7 @@ Usage: check_patroni node_is_pending_restart [OPTIONS]

   Check if the node is in pending restart state.

-  This situation can arise if the configuration has been modified but requiers
+  This situation can arise if the configuration has been modified but requires
   a restart of PostgreSQL to take effect.

   Check:
@ -368,12 +437,21 @@ Options:
|
|||
```
|
||||
Usage: check_patroni node_is_replica [OPTIONS]
|
||||
|
||||
Check if the node is a running replica with no noloadbalance tag.
|
||||
Check if the node is a replica with no noloadbalance tag.
|
||||
|
||||
It is possible to check if the node is synchronous or asynchronous. If
|
||||
nothing is specified any kind of replica is accepted. When checking for a
|
||||
synchronous replica, it's not possible to specify a lag.
|
||||
|
||||
This service is using the following Patroni endpoints: replica, asynchronous
|
||||
and synchronous. The first two implement the `lag` tag. For these endpoints
|
||||
the state of a replica node doesn't reflect the replication state
|
||||
(`streaming` or `in archive recovery`), we only know if it's `running`. The
|
||||
timeline is also not checked.
|
||||
|
||||
Therefore, if a cluster is using asynchronous replication, it is recommended
|
||||
to check for the lag to detect a divegence as soon as possible.
|
||||
|
||||
Check:
|
||||
* `OK`: if the node is a running replica with noloadbalance tag and the lag is under the maximum threshold.
|
||||
* `CRITICAL`: otherwise
|
||||
|
|
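The endpoint selection described above can be sketched as a small mapping. This is an assumption-laden illustration (the endpoint names come from the text; the flag names and query-string form are invented for the example, not check_patroni's actual request code):

```python
from typing import Optional

def pick_endpoint(check_is_sync: bool, check_is_async: bool,
                  max_lag: Optional[str] = None) -> str:
    """Illustrative mapping from the check's flags to a Patroni health endpoint."""
    if check_is_sync:
        # the synchronous endpoint does not accept a lag parameter
        return "synchronous"
    base = "asynchronous" if check_is_async else "replica"
    # replica and asynchronous implement the lag tag as a query parameter
    return f"{base}?lag={max_lag}" if max_lag is not None else base
```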
RELEASE.md (new file, 38 additions)

@@ -0,0 +1,38 @@
+# Release HOW TO
+
+## Preparatory changes
+
+* Review the **Unreleased** section, if any, in `CHANGELOG.md`, possibly adding
+  any missing items from closed issues, merged pull requests, or directly the
+  git history[^git-changes],
+* Rename the **Unreleased** section according to the version to be released,
+  with a date,
+* Bump the version in `check_patroni/__init__.py`,
+* Rebuild the `README.md` (`cd docs; ./make_readme.sh`),
+* Commit these changes (either on a dedicated branch, before submitting a pull
+  request, or directly on the `master` branch) with the commit message `release
+  X.Y.Z`,
+* Then, when the changes have landed in the `master` branch, create an annotated
+  (and possibly signed) tag, as `git tag -a [-s] -m 'release X.Y.Z' vX.Y.Z`, and
+* Push with `--follow-tags`.
+
+[^git-changes]: Use `git log $(git describe --tags --abbrev=0).. --format=%s
+    --reverse` to get commits from the previous tag.
+
+## PyPI package
+
+The package is generated and uploaded to PyPI when a `v*` tag is created (see
+`.github/workflow/publish.yml`).
+
+Alternatively, the release can be done manually with:
+
+```
+tox -e build
+tox -e upload
+```
+
+## GitHub release
+
+Draft a new release from the release page, choosing the tag just pushed, and
+copy the relevant change log section as a description.
@@ -1,5 +1,5 @@
 import logging

-__version__ = "1.0.0"
+__version__ = "2.0.0"

 _log: logging.Logger = logging.getLogger(__name__)
@@ -226,29 +226,40 @@ def cluster_node_count(
 ) -> None:
     """Count the number of nodes in the cluster.

     \b
-    The state refers to the state of PostgreSQL. Possible values are:
-    * initializing new cluster, initdb failed
-    * running custom bootstrap script, custom bootstrap failed
-    * starting, start failed
-    * restarting, restart failed
-    * running, streaming (for a replica V3.0.4)
-    * stopping, stopped, stop failed
-    * creating replica
-    * crashed
-
-    \b
     The role refers to the role of the server in the cluster. Possible values
     are:
-    * master or leader (V3.0.0+)
+    * master or leader
     * replica
+    * standby_leader
+    * sync_standby
     * demoted
     * promoted
     * uninitialized

     \b
+    The state refers to the state of PostgreSQL. Possible values are:
+    * initializing new cluster, initdb failed
+    * running custom bootstrap script, custom bootstrap failed
+    * starting, start failed
+    * restarting, restart failed
+    * running, streaming, in archive recovery
+    * stopping, stopped, stop failed
+    * creating replica
+    * crashed
+
+    \b
+    The "healthy" checks only ensure that:
+    * a leader has the running state
+    * a standby_leader has the running or streaming (V3.0.4) state
+    * a replica or sync_standby has the running or streaming (V3.0.4) state
+
+    Since we don't check the lag or timeline, "in archive recovery" is not considered a valid state
+    for this service. See cluster_has_leader and cluster_has_replica for specialized checks.

     \b
     Check:
-    * Compares the number of nodes against the normal and healthy (running + streaming) nodes warning and critical thresholds.
+    * Compares the number of nodes against the normal and healthy nodes warning and critical thresholds.
     * `OK`: If they are not provided.

     \b
@@ -285,17 +296,38 @@ def cluster_has_leader(ctx: click.Context) -> None:

     This check applies to any kind of leaders including standby leaders.

+    A leader is a node with the "leader" role and a "running" state.
+
+    A standby leader is a node with a "standby_leader" role and a "streaming"
+    or "in archive recovery" state. Please note that log shipping could be
+    stuck because the WAL are not available or applicable. Patroni doesn't
+    provide information about the origin cluster (timeline or lag), so we
+    cannot check if there is a problem in that particular case. That's why we
+    issue a warning when the node is "in archive recovery". We suggest using
+    other supervision tools to do this (eg. check_pgactivity).
+
     \b
     Check:
     * `OK`: if there is a leader node.
-    * `CRITICAL`: otherwise
+    * `WARNING`: if there is a standby leader in archive mode.
+    * `CRITICAL`: otherwise.

     \b
-    Perfdata: `has_leader` is 1 if there is a leader node, 0 otherwise
+    Perfdata:
+    * `has_leader` is 1 if there is any kind of leader node, 0 otherwise
+    * `is_standby_leader_in_arc_rec` is 1 if the standby leader node is "in
+      archive recovery", 0 otherwise
+    * `is_standby_leader` is 1 if there is a standby leader node, 0 otherwise
+    * `is_leader` is 1 if there is a "classical" leader node, 0 otherwise
     """
     check = nagiosplugin.Check()
     check.add(
         ClusterHasLeader(ctx.obj.connection_info),
         nagiosplugin.ScalarContext("has_leader", None, "@0:0"),
+        nagiosplugin.ScalarContext("is_standby_leader_in_arc_rec", "@1:1", None),
+        nagiosplugin.ScalarContext("is_leader", None, None),
+        nagiosplugin.ScalarContext("is_standby_leader", None, None),
         ClusterHasLeaderSummary(),
     )
     check.main(verbose=ctx.obj.verbose, timeout=ctx.obj.timeout)
@@ -341,11 +373,29 @@ def cluster_has_replica(
 ) -> None:
     """Check if the cluster has healthy replicas and/or if some are sync standbies

+    \b
+    For patroni (and this check):
+    * a replica is `streaming` if the `pg_stat_wal_receiver` says so.
+    * a replica is `in archive recovery`, if it's not `streaming` and has a `restore_command`.
+
     \b
     A healthy replica:
-    * is in running or streaming state (V3.0.4)
-    * has a replica or sync_standby role
-    * has a lag lower or equal to max_lag
+    * has a `replica` or `sync_standby` role
+    * has the same timeline as the leader and
+    * is in `running` state (patroni < V3.0.4)
+    * is in `streaming` or `in archive recovery` state (patroni >= V3.0.4)
+    * has a lag lower or equal to `max_lag`
+
+    Please note that a replica `in archive recovery` could be stuck because the WAL
+    are not available or applicable (the server's timeline has diverged from the
+    leader's). We already detect the latter but we will miss the former.
+    Therefore, it's preferable to check for the lag in addition to the healthy
+    state if you rely on log shipping to help lagging standbies to catch up.
+
+    Since we require a healthy replica to have the same timeline as the
+    leader, it's possible that we raise alerts when the cluster is performing a
+    switchover or failover and the standbies are in the process of catching up with
+    the new leader. The alert shouldn't last long.

     \b
     Check:
@@ -358,6 +408,7 @@ def cluster_has_replica(
     * healthy_replica & unhealthy_replica count
     * the number of sync_replica, they are included in the previous count
     * the lag of each replica labelled with "member name"_lag
+    * the timeline of each replica labelled with "member name"_timeline
     * a boolean to tell if the node is a sync standby labelled with "member name"_sync
     """
@@ -377,6 +428,7 @@ def cluster_has_replica(
         ),
         nagiosplugin.ScalarContext("unhealthy_replica"),
         nagiosplugin.ScalarContext("replica_lag"),
+        nagiosplugin.ScalarContext("replica_timeline"),
         nagiosplugin.ScalarContext("replica_sync"),
     )
     check.main(verbose=ctx.obj.verbose, timeout=ctx.obj.timeout)
@@ -569,10 +621,20 @@ def node_is_leader(ctx: click.Context, check_standby_leader: bool) -> None:
 def node_is_replica(
     ctx: click.Context, max_lag: str, check_is_sync: bool, check_is_async: bool
 ) -> None:
-    """Check if the node is a running replica with no noloadbalance tag.
+    """Check if the node is a replica with no noloadbalance tag.

-    It is possible to check if the node is synchronous or asynchronous. If nothing is specified any kind of replica is accepted.
-    When checking for a synchronous replica, it's not possible to specify a lag.
+    It is possible to check if the node is synchronous or asynchronous. If
+    nothing is specified any kind of replica is accepted. When checking for a
+    synchronous replica, it's not possible to specify a lag.
+
+    This service uses the following Patroni endpoints: replica, asynchronous
+    and synchronous. The first two implement the `lag` tag. For these endpoints
+    the state of a replica node doesn't reflect the replication state
+    (`streaming` or `in archive recovery`), we only know if it's `running`. The
+    timeline is also not checked.
+
+    Therefore, if a cluster is using asynchronous replication, it is
+    recommended to check for the lag to detect a divergence as soon as possible.

     \b
     Check:
@@ -610,7 +672,7 @@ def node_is_pending_restart(ctx: click.Context) -> None:
     """Check if the node is in pending restart state.

     This situation can arise if the configuration has been modified but
-    requiers a restart of PostgreSQL to take effect.
+    requires a restart of PostgreSQL to take effect.

     \b
     Check:
@@ -1,7 +1,7 @@
 import hashlib
 import json
 from collections import Counter
-from typing import Iterable, Union
+from typing import Any, Iterable, Union

 import nagiosplugin
@@ -14,25 +14,52 @@ def replace_chars(text: str) -> str:


 class ClusterNodeCount(PatroniResource):
-    def probe(self: "ClusterNodeCount") -> Iterable[nagiosplugin.Metric]:
+    def probe(self) -> Iterable[nagiosplugin.Metric]:
+        def debug_member(member: Any, health: str) -> None:
+            _log.debug(
+                "Node %(node_name)s is %(health)s: role %(role)s state %(state)s.",
+                {
+                    "node_name": member["name"],
+                    "health": health,
+                    "role": member["role"],
+                    "state": member["state"],
+                },
+            )
+
+        # get the cluster info
         item_dict = self.rest_api("cluster")

         role_counters: Counter[str] = Counter()
         roles = []
         status_counters: Counter[str] = Counter()
         statuses = []
+        healthy_member = 0

         for member in item_dict["members"]:
-            roles.append(replace_chars(member["role"]))
-            statuses.append(replace_chars(member["state"]))
+            state, role = member["state"], member["role"]
+            roles.append(replace_chars(role))
+            statuses.append(replace_chars(state))
+
+            if role == "leader" and state == "running":
+                healthy_member += 1
+                debug_member(member, "healthy")
+                continue
+
+            if role in ["standby_leader", "replica", "sync_standby"] and (
+                (self.has_detailed_states() and state == "streaming")
+                or (not self.has_detailed_states() and state == "running")
+            ):
+                healthy_member += 1
+                debug_member(member, "healthy")
+                continue
+
+            debug_member(member, "unhealthy")
+
         role_counters.update(roles)
         status_counters.update(statuses)

         # The actual check: members, healthy_members
         yield nagiosplugin.Metric("members", len(item_dict["members"]))
-        yield nagiosplugin.Metric(
-            "healthy_members",
-            status_counters["running"] + status_counters.get("streaming", 0),
-        )
+        yield nagiosplugin.Metric("healthy_members", healthy_member)

         # The performance data : role
         for role in role_counters:
@@ -48,73 +75,148 @@ class ClusterNodeCount(PatroniResource):


 class ClusterHasLeader(PatroniResource):
-    def probe(self: "ClusterHasLeader") -> Iterable[nagiosplugin.Metric]:
+    def probe(self) -> Iterable[nagiosplugin.Metric]:
         item_dict = self.rest_api("cluster")

         is_leader_found = False
+        is_standby_leader_found = False
+        is_standby_leader_in_arc_rec = False
         for member in item_dict["members"]:
-            if (
-                member["role"] in ("leader", "standby_leader")
-                and member["state"] == "running"
-            ):
+            if member["role"] == "leader" and member["state"] == "running":
                 is_leader_found = True
                 break

+            if member["role"] == "standby_leader":
+                if member["state"] not in ["streaming", "in archive recovery"]:
+                    # for patroni >= 3.0.4 any other state would be wrong
+                    # for patroni < 3.0.4 a state different from running would be wrong
+                    if self.has_detailed_states() or member["state"] != "running":
+                        continue
+
+                if member["state"] in ["in archive recovery"]:
+                    is_standby_leader_in_arc_rec = True
+
+                is_standby_leader_found = True
+                break
+
         return [
             nagiosplugin.Metric(
                 "has_leader",
-                1 if is_leader_found else 0,
-            )
+                1 if is_leader_found or is_standby_leader_found else 0,
+            ),
+            nagiosplugin.Metric(
+                "is_standby_leader_in_arc_rec",
+                1 if is_standby_leader_in_arc_rec else 0,
+            ),
+            nagiosplugin.Metric(
+                "is_standby_leader",
+                1 if is_standby_leader_found else 0,
+            ),
+            nagiosplugin.Metric(
+                "is_leader",
+                1 if is_leader_found else 0,
+            ),
         ]


 class ClusterHasLeaderSummary(nagiosplugin.Summary):
-    def ok(self: "ClusterHasLeaderSummary", results: nagiosplugin.Result) -> str:
+    def ok(self, results: nagiosplugin.Result) -> str:
         return "The cluster has a running leader."

     @handle_unknown
-    def problem(self: "ClusterHasLeaderSummary", results: nagiosplugin.Result) -> str:
-        return "The cluster has no running leader."
+    def problem(self, results: nagiosplugin.Result) -> str:
+        return "The cluster has no running leader or the standby leader is in archive recovery."


 class ClusterHasReplica(PatroniResource):
-    def __init__(
-        self: "ClusterHasReplica",
-        connection_info: ConnectionInfo,
-        max_lag: Union[int, None],
-    ):
+    def __init__(self, connection_info: ConnectionInfo, max_lag: Union[int, None]):
         super().__init__(connection_info)
         self.max_lag = max_lag

-    def probe(self: "ClusterHasReplica") -> Iterable[nagiosplugin.Metric]:
-        item_dict = self.rest_api("cluster")
+    def probe(self) -> Iterable[nagiosplugin.Metric]:
+        def debug_member(member: Any, health: str) -> None:
+            _log.debug(
+                "Node %(node_name)s is %(health)s: lag %(lag)s, state %(state)s, tl %(tl)s.",
+                {
+                    "node_name": member["name"],
+                    "health": health,
+                    "lag": member["lag"],
+                    "state": member["state"],
+                    "tl": member["timeline"],
+                },
+            )
+
+        # get the cluster info
+        cluster_item_dict = self.rest_api("cluster")

         replicas = []
         healthy_replica = 0
         unhealthy_replica = 0
         sync_replica = 0
-        for member in item_dict["members"]:
-            # FIXME are there other acceptable states
+        leader_tl = None
+
+        # Look for replicas
+        for member in cluster_item_dict["members"]:
             if member["role"] in ["replica", "sync_standby"]:
-                # patroni 3.0.4 changed the standby state from running to streaming
-                if (
-                    member["state"] in ["running", "streaming"]
-                    and member["lag"] != "unknown"
-                ):
+                if member["lag"] == "unknown":
+                    # This could happen if the node is stopped
+                    # nagiosplugin doesn't handle strings in perfstats
+                    # so we have to ditch all the stats in that case
+                    debug_member(member, "unhealthy")
+                    unhealthy_replica += 1
+                    continue
+                else:
+                    replicas.append(
+                        {
+                            "name": member["name"],
+                            "lag": member["lag"],
+                            "timeline": member["timeline"],
+                            "sync": 1 if member["role"] == "sync_standby" else 0,
+                        }
+                    )
+
+                # Get the leader tl if we haven't already
+                if leader_tl is None:
+                    # If there are no leaders, we will loop here for all
+                    # members because leader_tl will remain None. It's not
+                    # a big deal since having no leader is rare.
+                    for tmember in cluster_item_dict["members"]:
+                        if tmember["role"] == "leader":
+                            leader_tl = int(tmember["timeline"])
+                            break
+
+                    _log.debug(
+                        "Patroni's leader_timeline is %(leader_tl)s",
+                        {
+                            "leader_tl": leader_tl,
+                        },
+                    )
+
+                # Test for an unhealthy replica
+                if (
+                    self.has_detailed_states()
+                    and not (
+                        member["state"] in ["streaming", "in archive recovery"]
+                        and int(member["timeline"]) == leader_tl
+                    )
+                ) or (
+                    not self.has_detailed_states()
+                    and not (
+                        member["state"] == "running"
+                        and int(member["timeline"]) == leader_tl
+                    )
+                ):
+                    debug_member(member, "unhealthy")
+                    unhealthy_replica += 1
+                    continue
+
                 if member["role"] == "sync_standby":
                     sync_replica += 1

                 if self.max_lag is None or self.max_lag >= int(member["lag"]):
+                    debug_member(member, "healthy")
                     healthy_replica += 1
+                    continue
                 else:
+                    debug_member(member, "unhealthy")
                     unhealthy_replica += 1

         # The actual check
@ -127,6 +229,11 @@ class ClusterHasReplica(PatroniResource):
|
|||
yield nagiosplugin.Metric(
|
||||
f"{replica['name']}_lag", replica["lag"], context="replica_lag"
|
||||
)
|
||||
yield nagiosplugin.Metric(
|
||||
f"{replica['name']}_timeline",
|
||||
replica["timeline"],
|
||||
context="replica_timeline",
|
||||
)
|
||||
yield nagiosplugin.Metric(
|
||||
f"{replica['name']}_sync", replica["sync"], context="replica_sync"
|
||||
)
|
||||
|
@ -140,7 +247,7 @@ class ClusterHasReplica(PatroniResource):
|
|||
|
||||
class ClusterConfigHasChanged(PatroniResource):
|
||||
def __init__(
|
||||
self: "ClusterConfigHasChanged",
|
||||
self,
|
||||
connection_info: ConnectionInfo,
|
||||
config_hash: str, # Always contains the old hash
|
||||
state_file: str, # Only used to update the hash in the state_file (when needed)
|
||||
|
@ -151,7 +258,7 @@ class ClusterConfigHasChanged(PatroniResource):
|
|||
self.config_hash = config_hash
|
||||
self.save = save
|
||||
|
||||
def probe(self: "ClusterConfigHasChanged") -> Iterable[nagiosplugin.Metric]:
|
||||
def probe(self) -> Iterable[nagiosplugin.Metric]:
|
||||
item_dict = self.rest_api("config")
|
||||
|
||||
new_hash = hashlib.md5(json.dumps(item_dict).encode()).hexdigest()
|
||||
|
@ -183,23 +290,21 @@ class ClusterConfigHasChanged(PatroniResource):
|
|||
|
||||
|
||||
class ClusterConfigHasChangedSummary(nagiosplugin.Summary):
|
||||
def __init__(self: "ClusterConfigHasChangedSummary", config_hash: str) -> None:
|
||||
def __init__(self, config_hash: str) -> None:
|
||||
self.old_config_hash = config_hash
|
||||
|
||||
# Note: It would be helpful to display the old / new hash here. Unfortunately, it's not a metric.
|
||||
# So we only have the old / expected one.
|
||||
def ok(self: "ClusterConfigHasChangedSummary", results: nagiosplugin.Result) -> str:
|
||||
def ok(self, results: nagiosplugin.Result) -> str:
|
||||
return f"The hash of patroni's dynamic configuration has not changed ({self.old_config_hash})."
|
||||
|
||||
@handle_unknown
|
||||
def problem(
|
||||
self: "ClusterConfigHasChangedSummary", results: nagiosplugin.Result
|
||||
) -> str:
|
||||
def problem(self, results: nagiosplugin.Result) -> str:
|
||||
return f"The hash of patroni's dynamic configuration has changed. The old hash was {self.old_config_hash}."
|
||||
|
||||
|
||||
class ClusterIsInMaintenance(PatroniResource):
|
||||
def probe(self: "ClusterIsInMaintenance") -> Iterable[nagiosplugin.Metric]:
|
||||
def probe(self) -> Iterable[nagiosplugin.Metric]:
|
||||
item_dict = self.rest_api("cluster")
|
||||
|
||||
# The actual check
|
||||
|
@ -212,7 +317,7 @@ class ClusterIsInMaintenance(PatroniResource):
|
|||
|
||||
|
||||
class ClusterHasScheduledAction(PatroniResource):
|
||||
def probe(self: "ClusterIsInMaintenance") -> Iterable[nagiosplugin.Metric]:
|
||||
def probe(self) -> Iterable[nagiosplugin.Metric]:
|
||||
item_dict = self.rest_api("cluster")
|
||||
|
||||
scheduled_switchover = 0
|
||||
|

@@ -7,7 +7,7 @@ from .types import APIError, ConnectionInfo, PatroniResource, handle_unknown


 class NodeIsPrimary(PatroniResource):
-    def probe(self: "NodeIsPrimary") -> Iterable[nagiosplugin.Metric]:
+    def probe(self) -> Iterable[nagiosplugin.Metric]:
         try:
             self.rest_api("primary")
         except APIError:
@@ -16,24 +16,22 @@ class NodeIsPrimary(PatroniResource):


 class NodeIsPrimarySummary(nagiosplugin.Summary):
-    def ok(self: "NodeIsPrimarySummary", results: nagiosplugin.Result) -> str:
+    def ok(self, results: nagiosplugin.Result) -> str:
         return "This node is the primary with the leader lock."

     @handle_unknown
-    def problem(self: "NodeIsPrimarySummary", results: nagiosplugin.Result) -> str:
+    def problem(self, results: nagiosplugin.Result) -> str:
         return "This node is not the primary with the leader lock."


 class NodeIsLeader(PatroniResource):
     def __init__(
-        self: "NodeIsLeader",
-        connection_info: ConnectionInfo,
-        check_is_standby_leader: bool,
+        self, connection_info: ConnectionInfo, check_is_standby_leader: bool
     ) -> None:
         super().__init__(connection_info)
         self.check_is_standby_leader = check_is_standby_leader

-    def probe(self: "NodeIsLeader") -> Iterable[nagiosplugin.Metric]:
+    def probe(self) -> Iterable[nagiosplugin.Metric]:
         apiname = "leader"
         if self.check_is_standby_leader:
             apiname = "standby-leader"
@@ -46,26 +44,23 @@ class NodeIsLeader(PatroniResource):


 class NodeIsLeaderSummary(nagiosplugin.Summary):
-    def __init__(
-        self: "NodeIsLeaderSummary",
-        check_is_standby_leader: bool,
-    ) -> None:
+    def __init__(self, check_is_standby_leader: bool) -> None:
         if check_is_standby_leader:
             self.leader_kind = "standby leader"
         else:
             self.leader_kind = "leader"

-    def ok(self: "NodeIsLeaderSummary", results: nagiosplugin.Result) -> str:
+    def ok(self, results: nagiosplugin.Result) -> str:
         return f"This node is a {self.leader_kind} node."

     @handle_unknown
-    def problem(self: "NodeIsLeaderSummary", results: nagiosplugin.Result) -> str:
+    def problem(self, results: nagiosplugin.Result) -> str:
         return f"This node is not a {self.leader_kind} node."


 class NodeIsReplica(PatroniResource):
     def __init__(
-        self: "NodeIsReplica",
+        self,
         connection_info: ConnectionInfo,
         max_lag: str,
         check_is_sync: bool,
@@ -76,7 +71,7 @@ class NodeIsReplica(PatroniResource):
         self.check_is_sync = check_is_sync
         self.check_is_async = check_is_async

-    def probe(self: "NodeIsReplica") -> Iterable[nagiosplugin.Metric]:
+    def probe(self) -> Iterable[nagiosplugin.Metric]:
         try:
             if self.check_is_sync:
                 api_name = "synchronous"
@@ -95,12 +90,7 @@ class NodeIsReplica(PatroniResource):


 class NodeIsReplicaSummary(nagiosplugin.Summary):
-    def __init__(
-        self: "NodeIsReplicaSummary",
-        lag: str,
-        check_is_sync: bool,
-        check_is_async: bool,
-    ) -> None:
+    def __init__(self, lag: str, check_is_sync: bool, check_is_async: bool) -> None:
         self.lag = lag
         if check_is_sync:
             self.replica_kind = "synchronous replica"
@@ -109,7 +99,7 @@ class NodeIsReplicaSummary(nagiosplugin.Summary):
         else:
             self.replica_kind = "replica"

-    def ok(self: "NodeIsReplicaSummary", results: nagiosplugin.Result) -> str:
+    def ok(self, results: nagiosplugin.Result) -> str:
         if self.lag is None:
             return (
                 f"This node is a running {self.replica_kind} with no noloadbalance tag."
@@ -117,14 +107,14 @@ class NodeIsReplicaSummary(nagiosplugin.Summary):
         return f"This node is a running {self.replica_kind} with no noloadbalance tag and the lag is under {self.lag}."

     @handle_unknown
-    def problem(self: "NodeIsReplicaSummary", results: nagiosplugin.Result) -> str:
+    def problem(self, results: nagiosplugin.Result) -> str:
         if self.lag is None:
             return f"This node is not a running {self.replica_kind} with no noloadbalance tag."
         return f"This node is not a running {self.replica_kind} with no noloadbalance tag and a lag under {self.lag}."


 class NodeIsPendingRestart(PatroniResource):
-    def probe(self: "NodeIsPendingRestart") -> Iterable[nagiosplugin.Metric]:
+    def probe(self) -> Iterable[nagiosplugin.Metric]:
         item_dict = self.rest_api("patroni")

         is_pending_restart = item_dict.get("pending_restart", False)
@@ -137,19 +127,17 @@ class NodeIsPendingRestart(PatroniResource):


 class NodeIsPendingRestartSummary(nagiosplugin.Summary):
-    def ok(self: "NodeIsPendingRestartSummary", results: nagiosplugin.Result) -> str:
+    def ok(self, results: nagiosplugin.Result) -> str:
         return "This node doesn't have the pending restart flag."

     @handle_unknown
-    def problem(
-        self: "NodeIsPendingRestartSummary", results: nagiosplugin.Result
-    ) -> str:
+    def problem(self, results: nagiosplugin.Result) -> str:
         return "This node has the pending restart flag."


 class NodeTLHasChanged(PatroniResource):
     def __init__(
-        self: "NodeTLHasChanged",
+        self,
         connection_info: ConnectionInfo,
         timeline: str,  # Always contains the old timeline
         state_file: str,  # Only used to update the timeline in the state_file (when needed)
@@ -160,7 +148,7 @@ class NodeTLHasChanged(PatroniResource):
         self.timeline = timeline
         self.save = save

-    def probe(self: "NodeTLHasChanged") -> Iterable[nagiosplugin.Metric]:
+    def probe(self) -> Iterable[nagiosplugin.Metric]:
         item_dict = self.rest_api("patroni")
         new_tl = item_dict["timeline"]

@@ -193,27 +181,23 @@ class NodeTLHasChanged(PatroniResource):


 class NodeTLHasChangedSummary(nagiosplugin.Summary):
-    def __init__(self: "NodeTLHasChangedSummary", timeline: str) -> None:
+    def __init__(self, timeline: str) -> None:
         self.timeline = timeline

-    def ok(self: "NodeTLHasChangedSummary", results: nagiosplugin.Result) -> str:
+    def ok(self, results: nagiosplugin.Result) -> str:
         return f"The timeline is still {self.timeline}."

     @handle_unknown
-    def problem(self: "NodeTLHasChangedSummary", results: nagiosplugin.Result) -> str:
+    def problem(self, results: nagiosplugin.Result) -> str:
         return f"The expected timeline was {self.timeline} got {results['timeline'].metric}."


 class NodePatroniVersion(PatroniResource):
-    def __init__(
-        self: "NodePatroniVersion",
-        connection_info: ConnectionInfo,
-        patroni_version: str,
-    ) -> None:
+    def __init__(self, connection_info: ConnectionInfo, patroni_version: str) -> None:
         super().__init__(connection_info)
         self.patroni_version = patroni_version

-    def probe(self: "NodePatroniVersion") -> Iterable[nagiosplugin.Metric]:
+    def probe(self) -> Iterable[nagiosplugin.Metric]:
         item_dict = self.rest_api("patroni")

         version = item_dict["patroni"]["version"]
@@ -232,21 +216,21 @@ class NodePatroniVersion(PatroniResource):


 class NodePatroniVersionSummary(nagiosplugin.Summary):
-    def __init__(self: "NodePatroniVersionSummary", patroni_version: str) -> None:
+    def __init__(self, patroni_version: str) -> None:
         self.patroni_version = patroni_version

-    def ok(self: "NodePatroniVersionSummary", results: nagiosplugin.Result) -> str:
+    def ok(self, results: nagiosplugin.Result) -> str:
         return f"Patroni's version is {self.patroni_version}."

     @handle_unknown
-    def problem(self: "NodePatroniVersionSummary", results: nagiosplugin.Result) -> str:
+    def problem(self, results: nagiosplugin.Result) -> str:
         # FIXME find a way to make the following work, check is perf data can be strings
         # return f"The expected patroni version was {self.patroni_version} got {results['patroni_version'].metric}."
         return f"Patroni's version is not {self.patroni_version}."


 class NodeIsAlive(PatroniResource):
-    def probe(self: "NodeIsAlive") -> Iterable[nagiosplugin.Metric]:
+    def probe(self) -> Iterable[nagiosplugin.Metric]:
         try:
             self.rest_api("liveness")
         except APIError:
@@ -255,9 +239,9 @@ class NodeIsAlive(PatroniResource):


 class NodeIsAliveSummary(nagiosplugin.Summary):
-    def ok(self: "NodeIsAliveSummary", results: nagiosplugin.Result) -> str:
+    def ok(self, results: nagiosplugin.Result) -> str:
         return "This node is alive (patroni is running)."

     @handle_unknown
-    def problem(self: "NodeIsAliveSummary", results: nagiosplugin.Result) -> str:
+    def problem(self, results: nagiosplugin.Result) -> str:
         return "This node is not alive (patroni is not running)."

@@ -1,3 +1,5 @@
+import json
+from functools import lru_cache
 from typing import Any, Callable, List, Optional, Tuple, Union
 from urllib.parse import urlparse

@@ -28,11 +30,11 @@ class Parameters:
     verbose: int


-@attr.s(auto_attribs=True, slots=True)
+@attr.s(auto_attribs=True, eq=False, slots=True)
 class PatroniResource(nagiosplugin.Resource):
     conn_info: ConnectionInfo

-    def rest_api(self: "PatroniResource", service: str) -> Any:
+    def rest_api(self, service: str) -> Any:
         """Try to connect to all the provided endpoints for the requested service"""
         for endpoint in self.conn_info.endpoints:
             cert: Optional[Union[Tuple[str, str], str]] = None
@@ -71,10 +73,31 @@ class PatroniResource(nagiosplugin.Resource):
             try:
                 return r.json()
-            except requests.exceptions.JSONDecodeError:
+            except (json.JSONDecodeError, ValueError):
                 return None
         raise nagiosplugin.CheckError("Connection failed for all provided endpoints")

+    @lru_cache(maxsize=None)
+    def has_detailed_states(self) -> bool:
+        # get patroni's version to find out if the "streaming" and "in archive recovery" states are available
+        patroni_item_dict = self.rest_api("patroni")
+
+        if tuple(
+            int(v) for v in patroni_item_dict["patroni"]["version"].split(".", 2)
+        ) >= (3, 0, 4):
+            _log.debug(
+                "Patroni's version is %(version)s, more detailed states can be used to check for the health of replicas.",
+                {"version": patroni_item_dict["patroni"]["version"]},
+            )
+
+            return True
+
+        _log.debug(
+            "Patroni's version is %(version)s, the running state and the timelines must be used to check for the health of replicas.",
+            {"version": patroni_item_dict["patroni"]["version"]},
+        )
+        return False
+

 HandleUnknown = Callable[[nagiosplugin.Summary, nagiosplugin.Results], Any]
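For background, the version gate in `has_detailed_states()` works because Python compares tuples lexicographically once every component has been converted to `int`. This standalone sketch (illustrative, not check_patroni code) shows the behaviour, including why a plain string comparison would be wrong:

```python
def version_tuple(version: str) -> tuple:
    # split(".", 2) keeps at most three components, e.g. "3.0.4" -> (3, 0, 4)
    return tuple(int(v) for v in version.split(".", 2))


# Tuple comparison is lexicographic on the int components.
print(version_tuple("3.0.4") >= (3, 0, 4))   # True
print(version_tuple("2.10.1") >= (3, 0, 4))  # False

# A string comparison would misorder "3.0.10" and "3.0.4" ('1' < '4'),
# while the int tuple compares correctly.
print("3.0.10" < "3.0.4", version_tuple("3.0.10") >= (3, 0, 4))  # True True
```

Note that this simple conversion assumes purely numeric components; a suffix such as `rc1` would make `int()` raise.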

@@ -42,7 +42,7 @@ $ pip install git+https://github.com/dalibo/check_patroni.git

 check_patroni works on python 3.6, we keep it that way because patroni also
 supports it and there are still lots of RH 7 variants around. That being said
-python 3.6 has been EOL for age and there is no support for it in the github
+python 3.6 has been EOL for ages and there is no support for it in the github
 CI.

 ## Support

@@ -80,8 +80,8 @@ A match is found when: `start <= VALUE <= end`.

 For example, the following command will raise:

-* a warning if there is less than 1 nodes, wich can be translated to outside of range [2;+INF[
-* a critical if there are no nodes, wich can be translated to outside of range [1;+INF[
+* a warning if there are less than 2 nodes, which can be translated to outside of range [2;+INF[
+* a critical if there are no nodes, which can be translated to outside of range [1;+INF[

 ```
 check_patroni -e https://10.20.199.3:8008 cluster_has_replica --warning 2: --critical 1:
 ```
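As an aside, the threshold rule quoted above (`start <= VALUE <= end`, with an alert when the value falls outside the range) can be sketched in a few lines; `outside_range` below is an illustrative helper written for this explanation, not part of check_patroni:

```python
def outside_range(value: float, spec: str) -> bool:
    """Return True when `value` falls outside the Nagios-style range `spec`.

    Handles the simple forms used above: "N" ([0;N]), "N:" ([N;+INF[)
    and "N:M" ([N;M]).
    """
    if spec.endswith(":"):
        start, end = float(spec[:-1]), float("inf")
    elif ":" in spec:
        lo, hi = spec.split(":", 1)
        start, end = float(lo), float(hi)
    else:
        start, end = 0.0, float(spec)
    return not (start <= value <= end)


# With --warning 2: --critical 1:, a cluster with a single replica is
# outside [2;+INF[ (warning) but still inside [1;+INF[ (no critical).
print(outside_range(1, "2:"), outside_range(1, "1:"))  # True False
```

The real parsing done by nagiosplugin also understands the `~` and `@` range modifiers, which are omitted from this sketch.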
@@ -97,6 +97,30 @@ Several options are available:
 * `--cert_file`: your certificate or the concatenation of your certificate and private key
 * `--key_file`: your private key (optional)

+## Shell completion
+
+We use the [click] library which supports shell completion natively.
+
+Shell completion can be added by typing the following command or adding it to
+a file specific to your shell of choice.
+
+* for Bash (add to `~/.bashrc`):
+  ```
+  eval "$(_CHECK_PATRONI_COMPLETE=bash_source check_patroni)"
+  ```
+* for Zsh (add to `~/.zshrc`):
+  ```
+  eval "$(_CHECK_PATRONI_COMPLETE=zsh_source check_patroni)"
+  ```
+* for Fish (add to `~/.config/fish/completions/check_patroni.fish`):
+  ```
+  eval "$(_CHECK_PATRONI_COMPLETE=fish_source check_patroni)"
+  ```
+
+Please note that shell completion is not supported for all shell versions, for
+example only Bash versions 4.4 and newer are supported.
+
+[click]: https://click.palletsprojects.com/en/8.1.x/shell-completion/
 _EOF_
 readme
 readme "## Cluster services"
1
mypy.ini
@@ -1,4 +1,5 @@
 [mypy]
 files = .
 show_error_codes = true
 strict = true
+exclude = build/

@@ -4,7 +4,7 @@ isort
 flake8
 mypy==0.961
 pytest
-pytest-mock
+pytest-cov
 types-requests
 setuptools
 tox

7
setup.py
@@ -41,12 +41,12 @@ setup(
         "attrs >= 17, !=21.1",
         "requests",
         "nagiosplugin >= 1.3.2",
-        "click >= 8.0.1",
+        "click >= 7.1",
     ],
     extras_require={
         "test": [
-            "pytest",
-            "pytest-mock",
+            "importlib_metadata; python_version < '3.8'",
+            "pytest >= 6.0.2",
         ],
     },
     entry_points={
@@ -56,4 +56,3 @@ setup(
     },
     zip_safe=False,
 )
-

@@ -0,0 +1,65 @@
+import json
+import logging
+import shutil
+from contextlib import contextmanager
+from functools import partial
+from http.server import HTTPServer, SimpleHTTPRequestHandler
+from pathlib import Path
+from typing import Any, Iterator, Mapping, Union
+
+logger = logging.getLogger(__name__)
+
+
+class PatroniAPI(HTTPServer):
+    def __init__(self, directory: Path, *, datadir: Path) -> None:
+        self.directory = directory
+        self.datadir = datadir
+        handler_cls = partial(SimpleHTTPRequestHandler, directory=str(directory))
+        super().__init__(("", 0), handler_cls)
+
+    def serve_forever(self, *args: Any) -> None:
+        logger.info(
+            "starting fake Patroni API at %s (directory=%s)",
+            self.endpoint,
+            self.directory,
+        )
+        return super().serve_forever(*args)
+
+    @property
+    def endpoint(self) -> str:
+        return f"http://{self.server_name}:{self.server_port}"
+
+    @contextmanager
+    def routes(self, mapping: Mapping[str, Union[Path, str]]) -> Iterator[None]:
+        """Temporarily install specified files in served directory, thus
+        building "routes" from given mapping.
+
+        The 'mapping' defines target route paths as keys and files to be
+        installed in served directory as values. Mapping values of type 'str'
+        are assumed to be file paths relative to the 'datadir'.
+        """
+        for route_path, fpath in mapping.items():
+            if isinstance(fpath, str):
+                fpath = self.datadir / fpath
+            shutil.copy(fpath, self.directory / route_path)
+        try:
+            yield None
+        finally:
+            for fname in mapping:
+                (self.directory / fname).unlink()
+
+
+def cluster_api_set_replica_running(in_json: Path, target_dir: Path) -> Path:
+    # starting from 3.0.4 the state of replicas is streaming or in archive recovery
+    # instead of running
+    with in_json.open() as f:
+        js = json.load(f)
+    for node in js["members"]:
+        if node["role"] in ["replica", "sync_standby", "standby_leader"]:
+            if node["state"] in ["streaming", "in archive recovery"]:
+                node["state"] = "running"
+    assert target_dir.is_dir()
+    out_json = target_dir / in_json.name
+    with out_json.open("w") as f:
+        json.dump(js, f)
+    return out_json
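The `routes()` context manager above boils down to "drop a file into the directory served by `SimpleHTTPRequestHandler`, then remove it". A self-contained sketch of that pattern (the `patroni` route name and the payload here are made up for the demo, not taken from the test suite):

```python
import json
import tempfile
import urllib.request
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler
from pathlib import Path
from threading import Thread

with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    # serve the temporary directory on an ephemeral port
    handler = partial(SimpleHTTPRequestHandler, directory=str(directory))
    httpd = HTTPServer(("127.0.0.1", 0), handler)
    Thread(target=httpd.serve_forever, daemon=True).start()

    # "install" a route by dropping a file into the served directory
    (directory / "patroni").write_text(json.dumps({"state": "running"}))

    url = f"http://127.0.0.1:{httpd.server_port}/patroni"
    body = json.loads(urllib.request.urlopen(url).read())
    print(body["state"])

    httpd.shutdown()
    httpd.server_close()
```

Because the handler serves plain files, each fake API endpoint is just a filename, which is what lets the tests swap JSON payloads per test.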

@@ -1,12 +1,76 @@
-def pytest_addoption(parser):
-    """
-    Add CLI options to `pytest` to pass those options to the test cases.
-    These options are used in `pytest_generate_tests`.
-    """
-    parser.addoption("--use-old-replica-state", action="store_true", default=False)
+import logging
+import sys
+from pathlib import Path
+from threading import Thread
+from typing import Any, Iterator, Tuple
+from unittest.mock import patch
+
+if sys.version_info >= (3, 8):
+    from importlib.metadata import version as metadata_version
+else:
+    from importlib_metadata import version as metadata_version
+
+import pytest
+from click.testing import CliRunner
+
+from . import PatroniAPI
+
+logger = logging.getLogger(__name__)


-def pytest_generate_tests(metafunc):
-    metafunc.parametrize(
-        "use_old_replica_state", [metafunc.config.getoption("use_old_replica_state")]
-    )
+def numversion(pkgname: str) -> Tuple[int, ...]:
+    version = metadata_version(pkgname)
+    return tuple(int(v) for v in version.split(".", 3))
+
+
+if numversion("pytest") >= (6, 2):
+    TempPathFactory = pytest.TempPathFactory
+else:
+    from _pytest.tmpdir import TempPathFactory
+
+
+@pytest.fixture(scope="session", autouse=True)
+def nagioplugin_runtime_stdout() -> Iterator[None]:
+    # work around https://github.com/mpounsett/nagiosplugin/issues/24 when
+    # nagiosplugin is older than 1.3.3
+    if numversion("nagiosplugin") < (1, 3, 3):
+        target = "nagiosplugin.runtime.Runtime.stdout"
+        with patch(target, None):
+            logger.warning("patching %r", target)
+            yield None
+    else:
+        yield None
+
+
+@pytest.fixture(
+    params=[False, True],
+    ids=lambda v: "new-replica-state" if v else "old-replica-state",
+)
+def old_replica_state(request: Any) -> Any:
+    return request.param
+
+
+@pytest.fixture(scope="session")
+def datadir() -> Path:
+    return Path(__file__).parent / "json"
+
+
+@pytest.fixture(scope="session")
+def patroni_api(
+    tmp_path_factory: TempPathFactory, datadir: Path
+) -> Iterator[PatroniAPI]:
+    """A fake HTTP server for the Patroni API serving files from a temporary
+    directory.
+    """
+    httpd = PatroniAPI(tmp_path_factory.mktemp("api"), datadir=datadir)
+    t = Thread(target=httpd.serve_forever)
+    t.start()
+    yield httpd
+    httpd.shutdown()
+    t.join()
+
+
+@pytest.fixture
+def runner() -> CliRunner:
+    """A CliRunner with stdout and stderr not mixed."""
+    return CliRunner(mix_stderr=False)

33
tests/json/cluster_has_leader_ko_standby_leader.json
Normal file
@@ -0,0 +1,33 @@
+{
+  "members": [
+    {
+      "name": "srv1",
+      "role": "standby_leader",
+      "state": "stopped",
+      "api_url": "https://10.20.199.3:8008/patroni",
+      "host": "10.20.199.3",
+      "port": 5432,
+      "timeline": 51
+    },
+    {
+      "name": "srv2",
+      "role": "replica",
+      "state": "streaming",
+      "api_url": "https://10.20.199.4:8008/patroni",
+      "host": "10.20.199.4",
+      "port": 5432,
+      "timeline": 51,
+      "lag": 0
+    },
+    {
+      "name": "srv3",
+      "role": "replica",
+      "state": "streaming",
+      "api_url": "https://10.20.199.5:8008/patroni",
+      "host": "10.20.199.5",
+      "port": 5432,
+      "timeline": 51,
+      "lag": 0
+    }
+  ]
+}

@@ -0,0 +1,33 @@
+{
+  "members": [
+    {
+      "name": "srv1",
+      "role": "standby_leader",
+      "state": "in archive recovery",
+      "api_url": "https://10.20.199.3:8008/patroni",
+      "host": "10.20.199.3",
+      "port": 5432,
+      "timeline": 51
+    },
+    {
+      "name": "srv2",
+      "role": "replica",
+      "state": "streaming",
+      "api_url": "https://10.20.199.4:8008/patroni",
+      "host": "10.20.199.4",
+      "port": 5432,
+      "timeline": 51,
+      "lag": 0
+    },
+    {
+      "name": "srv3",
+      "role": "replica",
+      "state": "streaming",
+      "api_url": "https://10.20.199.5:8008/patroni",
+      "host": "10.20.199.5",
+      "port": 5432,
+      "timeline": 51,
+      "lag": 0
+    }
+  ]
+}

@@ -3,7 +3,7 @@
     {
       "name": "srv1",
       "role": "standby_leader",
-      "state": "running",
+      "state": "streaming",
       "api_url": "https://10.20.199.3:8008/patroni",
       "host": "10.20.199.3",
       "port": 5432,

35
tests/json/cluster_has_replica_ko_all_replica.json
Normal file
@@ -0,0 +1,35 @@
+{
+  "members": [
+    {
+      "name": "srv1",
+      "role": "replica",
+      "state": "running",
+      "api_url": "https://10.20.199.3:8008/patroni",
+      "host": "10.20.199.3",
+      "port": 5432,
+      "timeline": 51,
+      "lag": 0
+    },
+    {
+      "name": "srv2",
+      "role": "replica",
+      "state": "running",
+      "api_url": "https://10.20.199.4:8008/patroni",
+      "host": "10.20.199.4",
+      "port": 5432,
+      "timeline": 51,
+      "lag": 0
+    },
+    {
+      "name": "srv3",
+      "role": "replica",
+      "state": "running",
+      "api_url": "https://10.20.199.5:8008/patroni",
+      "host": "10.20.199.5",
+      "port": 5432,
+      "timeline": 51,
+      "lag": 0
+
+    }
+  ]
+}
33
tests/json/cluster_has_replica_ko_wrong_tl.json
Normal file
@@ -0,0 +1,33 @@
+{
+  "members": [
+    {
+      "name": "srv1",
+      "role": "leader",
+      "state": "running",
+      "api_url": "https://10.20.199.3:8008/patroni",
+      "host": "10.20.199.3",
+      "port": 5432,
+      "timeline": 51
+    },
+    {
+      "name": "srv2",
+      "role": "replica",
+      "state": "running",
+      "api_url": "https://10.20.199.4:8008/patroni",
+      "host": "10.20.199.4",
+      "port": 5432,
+      "timeline": 50,
+      "lag": 1000000
+    },
+    {
+      "name": "srv3",
+      "role": "replica",
+      "state": "streaming",
+      "api_url": "https://10.20.199.5:8008/patroni",
+      "host": "10.20.199.5",
+      "port": 5432,
+      "timeline": 51,
+      "lag": 0
+    }
+  ]
+}

@@ -12,7 +12,7 @@
     {
       "name": "srv2",
       "role": "replica",
-      "state": "streaming",
+      "state": "in archive recovery",
       "api_url": "https://10.20.199.4:8008/patroni",
       "host": "10.20.199.4",
       "port": 5432,
26
tests/json/cluster_has_replica_patroni_verion_3.0.0.json
Normal file
@@ -0,0 +1,26 @@
+{
+  "state": "running",
+  "postmaster_start_time": "2021-08-11 07:02:20.732 UTC",
+  "role": "master",
+  "server_version": 110012,
+  "cluster_unlocked": false,
+  "xlog": {
+    "location": 1174407088
+  },
+  "timeline": 51,
+  "replication": [
+    {
+      "usename": "replicator",
+      "application_name": "srv1",
+      "client_addr": "10.20.199.3",
+      "state": "streaming",
+      "sync_state": "async",
+      "sync_priority": 0
+    }
+  ],
+  "database_system_identifier": "6965971025273547206",
+  "patroni": {
+    "version": "3.0.0",
+    "scope": "patroni-demo"
+  }
+}
26
tests/json/cluster_has_replica_patroni_verion_3.1.0.json
Normal file
@@ -0,0 +1,26 @@
+{
+  "state": "running",
+  "postmaster_start_time": "2021-08-11 07:02:20.732 UTC",
+  "role": "master",
+  "server_version": 110012,
+  "cluster_unlocked": false,
+  "xlog": {
+    "location": 1174407088
+  },
+  "timeline": 51,
+  "replication": [
+    {
+      "usename": "replicator",
+      "application_name": "srv1",
+      "client_addr": "10.20.199.3",
+      "state": "streaming",
+      "sync_state": "async",
+      "sync_priority": 0
+    }
+  ],
+  "database_system_identifier": "6965971025273547206",
+  "patroni": {
+    "version": "3.1.0",
+    "scope": "patroni-demo"
+  }
+}
33
tests/json/cluster_node_count_ko_in_archive_recovery.json
Normal file
@@ -0,0 +1,33 @@
+{
+  "members": [
+    {
+      "name": "srv1",
+      "role": "standby_leader",
+      "state": "in archive recovery",
+      "api_url": "https://10.20.199.3:8008/patroni",
+      "host": "10.20.199.3",
+      "port": 5432,
+      "timeline": 51
+    },
+    {
+      "name": "srv2",
+      "role": "replica",
+      "state": "in archive recovery",
+      "api_url": "https://10.20.199.4:8008/patroni",
+      "host": "10.20.199.4",
+      "port": 5432,
+      "timeline": 51,
+      "lag": 0
+    },
+    {
+      "name": "srv3",
+      "role": "replica",
+      "state": "streaming",
+      "api_url": "https://10.20.199.5:8008/patroni",
+      "host": "10.20.199.5",
+      "port": 5432,
+      "timeline": 51,
+      "lag": 0
+    }
+  ]
+}

@@ -1,30 +1,20 @@
 from click.testing import CliRunner
-from pytest_mock import MockerFixture

 from check_patroni.cli import main

-from .tools import my_mock
+from . import PatroniAPI


-def test_api_status_code_200(
-    mocker: MockerFixture, use_old_replica_state: bool
-) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "node_is_pending_restart_ok", 200)
+def test_api_status_code_200(runner: CliRunner, patroni_api: PatroniAPI) -> None:
+    with patroni_api.routes({"patroni": "node_is_pending_restart_ok.json"}):
         result = runner.invoke(
-            main, ["-e", "https://10.20.199.3:8008", "node_is_pending_restart"]
+            main, ["-e", patroni_api.endpoint, "node_is_pending_restart"]
         )
     assert result.exit_code == 0


-def test_api_status_code_404(
-    mocker: MockerFixture, use_old_replica_state: bool
-) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "Fake test", 404)
+def test_api_status_code_404(runner: CliRunner, patroni_api: PatroniAPI) -> None:
     result = runner.invoke(
-        main, ["-e", "https://10.20.199.3:8008", "node_is_pending_restart"]
+        main, ["-e", patroni_api.endpoint, "node_is_pending_restart"]
     )
     assert result.exit_code == 3
|
@@ -1,23 +1,29 @@
+from pathlib import Path
+from typing import Iterator
+
 import nagiosplugin
+import pytest
 from click.testing import CliRunner
-from pytest_mock import MockerFixture
 
 from check_patroni.cli import main
 
-from .tools import here, my_mock
+from . import PatroniAPI
+
+
+@pytest.fixture(scope="module", autouse=True)
+def cluster_config_has_changed(patroni_api: PatroniAPI) -> Iterator[None]:
+    with patroni_api.routes({"config": "cluster_config_has_changed.json"}):
+        yield None
 
 
 def test_cluster_config_has_changed_ok_with_hash(
-    mocker: MockerFixture, use_old_replica_state: bool
+    runner: CliRunner, patroni_api: PatroniAPI
 ) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "cluster_config_has_changed", 200)
     result = runner.invoke(
         main,
         [
             "-e",
-            "https://10.20.199.3:8008",
+            patroni_api.endpoint,
             "cluster_config_has_changed",
             "--hash",
             "96b12d82571473d13e890b893734e731",
@@ -31,22 +37,20 @@ def test_cluster_config_has_changed_ok_with_hash(
 
 
 def test_cluster_config_has_changed_ok_with_state_file(
-    mocker: MockerFixture, use_old_replica_state: bool
+    runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
 ) -> None:
-    runner = CliRunner()
-
-    with open(here / "cluster_config_has_changed.state_file", "w") as f:
+    state_file = tmp_path / "cluster_config_has_changed.state_file"
+    with state_file.open("w") as f:
         f.write('{"hash": "96b12d82571473d13e890b893734e731"}')
 
-    my_mock(mocker, "cluster_config_has_changed", 200)
     result = runner.invoke(
         main,
         [
             "-e",
-            "https://10.20.199.3:8008",
+            patroni_api.endpoint,
             "cluster_config_has_changed",
             "--state-file",
-            str(here / "cluster_config_has_changed.state_file"),
+            str(state_file),
         ],
     )
     assert result.exit_code == 0
@@ -57,16 +61,13 @@ def test_cluster_config_has_changed_ok_with_state_file(
 
 
 def test_cluster_config_has_changed_ko_with_hash(
-    mocker: MockerFixture, use_old_replica_state: bool
+    runner: CliRunner, patroni_api: PatroniAPI
 ) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "cluster_config_has_changed", 200)
     result = runner.invoke(
         main,
         [
             "-e",
-            "https://10.20.199.3:8008",
+            patroni_api.endpoint,
             "cluster_config_has_changed",
             "--hash",
             "96b12d82571473d13e890b8937ffffff",
@@ -80,24 +81,21 @@ def test_cluster_config_has_changed_ko_with_hash(
 
 
 def test_cluster_config_has_changed_ko_with_state_file_and_save(
-    mocker: MockerFixture,
-    use_old_replica_state: bool,
+    runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
 ) -> None:
-    runner = CliRunner()
-
-    with open(here / "cluster_config_has_changed.state_file", "w") as f:
+    state_file = tmp_path / "cluster_config_has_changed.state_file"
+    with state_file.open("w") as f:
         f.write('{"hash": "96b12d82571473d13e890b8937ffffff"}')
 
-    my_mock(mocker, "cluster_config_has_changed", 200)
     # test without saving the new hash
     result = runner.invoke(
         main,
         [
             "-e",
-            "https://10.20.199.3:8008",
+            patroni_api.endpoint,
             "cluster_config_has_changed",
             "--state-file",
-            str(here / "cluster_config_has_changed.state_file"),
+            str(state_file),
         ],
     )
     assert result.exit_code == 2
@@ -106,7 +104,8 @@ def test_cluster_config_has_changed_ko_with_state_file_and_save(
         == "CLUSTERCONFIGHASCHANGED CRITICAL - The hash of patroni's dynamic configuration has changed. The old hash was 96b12d82571473d13e890b8937ffffff. | is_configuration_changed=1;;@1:1\n"
     )
 
-    cookie = nagiosplugin.Cookie(here / "cluster_config_has_changed.state_file")
+    state_file = tmp_path / "cluster_config_has_changed.state_file"
+    cookie = nagiosplugin.Cookie(state_file)
     cookie.open()
     new_config_hash = cookie.get("hash")
     cookie.close()
@@ -118,10 +117,10 @@ def test_cluster_config_has_changed_ko_with_state_file_and_save(
         main,
         [
             "-e",
-            "https://10.20.199.3:8008",
+            patroni_api.endpoint,
             "cluster_config_has_changed",
             "--state-file",
-            str(here / "cluster_config_has_changed.state_file"),
+            str(state_file),
             "--save",
         ],
     )
@@ -131,7 +130,7 @@ def test_cluster_config_has_changed_ko_with_state_file_and_save(
         == "CLUSTERCONFIGHASCHANGED CRITICAL - The hash of patroni's dynamic configuration has changed. The old hash was 96b12d82571473d13e890b8937ffffff. | is_configuration_changed=1;;@1:1\n"
     )
 
-    cookie = nagiosplugin.Cookie(here / "cluster_config_has_changed.state_file")
+    cookie = nagiosplugin.Cookie(state_file)
     cookie.open()
     new_config_hash = cookie.get("hash")
    cookie.close()
@@ -140,22 +139,20 @@ def test_cluster_config_has_changed_ko_with_state_file_and_save(
 
 
 def test_cluster_config_has_changed_params(
-    mocker: MockerFixture, use_old_replica_state: bool
+    runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
 ) -> None:
     # This one is placed last because it seems like the exceptions are not flushed from stderr for the next tests.
-    runner = CliRunner()
-
-    my_mock(mocker, "cluster_config_has_changed", 200)
+    fake_state_file = tmp_path / "fake_file_name.state_file"
     result = runner.invoke(
         main,
         [
             "-e",
-            "https://10.20.199.3:8008",
+            patroni_api.endpoint,
             "cluster_config_has_changed",
             "--hash",
             "640df9f0211c791723f18fc3ed9dbb95",
             "--state-file",
-            str(here / "fake_file_name.state_file"),
+            str(fake_state_file),
         ],
     )
     assert result.exit_code == 3
@@ -1,54 +1,139 @@
+from pathlib import Path
+from typing import Iterator, Union
+
+import pytest
 from click.testing import CliRunner
-from pytest_mock import MockerFixture
 
 from check_patroni.cli import main
 
-from .tools import my_mock
+from . import PatroniAPI, cluster_api_set_replica_running
 
 
-def test_cluster_has_leader_ok(
-    mocker: MockerFixture, use_old_replica_state: bool
-) -> None:
-    runner = CliRunner()
+@pytest.fixture
+def cluster_has_leader_ok(
+    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
+) -> Iterator[None]:
+    cluster_path: Union[str, Path] = "cluster_has_leader_ok.json"
+    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
+    if old_replica_state:
+        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
+        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
+    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
+        yield None
 
-    my_mock(mocker, "cluster_has_leader_ok", 200)
-    result = runner.invoke(
-        main, ["-e", "https://10.20.199.3:8008", "cluster_has_leader"]
-    )
-    assert result.exit_code == 0
+
+@pytest.mark.usefixtures("cluster_has_leader_ok")
+def test_cluster_has_leader_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
+    result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_has_leader"])
     assert (
         result.stdout
-        == "CLUSTERHASLEADER OK - The cluster has a running leader. | has_leader=1;;@0\n"
+        == "CLUSTERHASLEADER OK - The cluster has a running leader. | has_leader=1;;@0 is_leader=1 is_standby_leader=0 is_standby_leader_in_arc_rec=0;@1:1\n"
     )
+    assert result.exit_code == 0
 
 
+@pytest.fixture
+def cluster_has_leader_ok_standby_leader(
+    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
+) -> Iterator[None]:
+    cluster_path: Union[str, Path] = "cluster_has_leader_ok_standby_leader.json"
+    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
+    if old_replica_state:
+        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
+        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
+    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
+        yield None
+
+
+@pytest.mark.usefixtures("cluster_has_leader_ok_standby_leader")
 def test_cluster_has_leader_ok_standby_leader(
-    mocker: MockerFixture, use_old_replica_state: bool
+    runner: CliRunner, patroni_api: PatroniAPI
 ) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "cluster_has_leader_ok_standby_leader", 200)
-    result = runner.invoke(
-        main, ["-e", "https://10.20.199.3:8008", "cluster_has_leader"]
-    )
-    assert result.exit_code == 0
+    result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_has_leader"])
     assert (
         result.stdout
-        == "CLUSTERHASLEADER OK - The cluster has a running leader. | has_leader=1;;@0\n"
+        == "CLUSTERHASLEADER OK - The cluster has a running leader. | has_leader=1;;@0 is_leader=0 is_standby_leader=1 is_standby_leader_in_arc_rec=0;@1:1\n"
     )
+    assert result.exit_code == 0
 
 
-def test_cluster_has_leader_ko(
-    mocker: MockerFixture, use_old_replica_state: bool
-) -> None:
-    runner = CliRunner()
+@pytest.fixture
+def cluster_has_leader_ko(
+    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
+) -> Iterator[None]:
+    cluster_path: Union[str, Path] = "cluster_has_leader_ko.json"
+    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
+    if old_replica_state:
+        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
+        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
+    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
+        yield None
 
-    my_mock(mocker, "cluster_has_leader_ko", 200)
-    result = runner.invoke(
-        main, ["-e", "https://10.20.199.3:8008", "cluster_has_leader"]
-    )
+
+@pytest.mark.usefixtures("cluster_has_leader_ko")
+def test_cluster_has_leader_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
+    result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_has_leader"])
+    assert (
+        result.stdout
+        == "CLUSTERHASLEADER CRITICAL - The cluster has no running leader or the standby leader is in archive recovery. | has_leader=0;;@0 is_leader=0 is_standby_leader=0 is_standby_leader_in_arc_rec=0;@1:1\n"
+    )
+    assert result.exit_code == 2
+
+
+@pytest.fixture
+def cluster_has_leader_ko_standby_leader(
+    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
+) -> Iterator[None]:
+    cluster_path: Union[str, Path] = "cluster_has_leader_ko_standby_leader.json"
+    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
+    if old_replica_state:
+        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
+        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
+    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
+        yield None
+
+
+@pytest.mark.usefixtures("cluster_has_leader_ko_standby_leader")
+def test_cluster_has_leader_ko_standby_leader(
+    runner: CliRunner, patroni_api: PatroniAPI
+) -> None:
+    result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_has_leader"])
     assert (
         result.stdout
-        == "CLUSTERHASLEADER CRITICAL - The cluster has no running leader. | has_leader=0;;@0\n"
+        == "CLUSTERHASLEADER CRITICAL - The cluster has no running leader or the standby leader is in archive recovery. | has_leader=0;;@0 is_leader=0 is_standby_leader=0 is_standby_leader_in_arc_rec=0;@1:1\n"
     )
     assert result.exit_code == 2
+
+
+@pytest.fixture
+def cluster_has_leader_ko_standby_leader_archiving(
+    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
+) -> Iterator[None]:
+    cluster_path: Union[str, Path] = (
+        "cluster_has_leader_ko_standby_leader_archiving.json"
+    )
+    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
+    if old_replica_state:
+        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
+        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
+    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
+        yield None
+
+
+@pytest.mark.usefixtures("cluster_has_leader_ko_standby_leader_archiving")
+def test_cluster_has_leader_ko_standby_leader_archiving(
+    runner: CliRunner, patroni_api: PatroniAPI, old_replica_state: bool
+) -> None:
+    result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_has_leader"])
+    if old_replica_state:
+        assert (
+            result.stdout
+            == "CLUSTERHASLEADER OK - The cluster has a running leader. | has_leader=1;;@0 is_leader=0 is_standby_leader=1 is_standby_leader_in_arc_rec=0;@1:1\n"
+        )
+        assert result.exit_code == 0
+    else:
+        assert (
+            result.stdout
+            == "CLUSTERHASLEADER WARNING - The cluster has no running leader or the standby leader is in archive recovery. | has_leader=1;;@0 is_leader=0 is_standby_leader=1 is_standby_leader_in_arc_rec=1;@1:1\n"
+        )
+        assert result.exit_code == 1
@@ -1,39 +1,46 @@
+from pathlib import Path
+from typing import Iterator, Union
+
+import pytest
 from click.testing import CliRunner
-from pytest_mock import MockerFixture
 
 from check_patroni.cli import main
 
-from .tools import my_mock
+from . import PatroniAPI, cluster_api_set_replica_running
 
 
 # TODO Lag threshold tests
-def test_cluster_has_relica_ok(
-    mocker: MockerFixture, use_old_replica_state: bool
-) -> None:
-    runner = CliRunner()
+@pytest.fixture
+def cluster_has_replica_ok(
+    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
+) -> Iterator[None]:
+    cluster_path: Union[str, Path] = "cluster_has_replica_ok.json"
+    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
+    if old_replica_state:
+        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
+        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
+    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
+        yield None
 
-    my_mock(mocker, "cluster_has_replica_ok", 200, use_old_replica_state)
-    result = runner.invoke(
-        main, ["-e", "https://10.20.199.3:8008", "cluster_has_replica"]
-    )
-    assert result.exit_code == 0
+
+@pytest.mark.usefixtures("cluster_has_replica_ok")
+def test_cluster_has_relica_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
+    result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_has_replica"])
     assert (
         result.stdout
-        == "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2 srv2_lag=0 srv2_sync=0 srv3_lag=0 srv3_sync=1 sync_replica=1 unhealthy_replica=0\n"
+        == "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2 srv2_lag=0 srv2_sync=0 srv2_timeline=51 srv3_lag=0 srv3_sync=1 srv3_timeline=51 sync_replica=1 unhealthy_replica=0\n"
     )
+    assert result.exit_code == 0
 
 
+@pytest.mark.usefixtures("cluster_has_replica_ok")
 def test_cluster_has_replica_ok_with_count_thresholds(
-    mocker: MockerFixture, use_old_replica_state: bool
+    runner: CliRunner, patroni_api: PatroniAPI
 ) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "cluster_has_replica_ok", 200, use_old_replica_state)
     result = runner.invoke(
         main,
         [
             "-e",
-            "https://10.20.199.3:8008",
+            patroni_api.endpoint,
             "cluster_has_replica",
             "--warning",
             "@1",
@@ -41,48 +48,56 @@ def test_cluster_has_replica_ok_with_count_thresholds(
             "@0",
         ],
     )
-    assert result.exit_code == 0
     assert (
         result.stdout
-        == "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2;@1;@0 srv2_lag=0 srv2_sync=0 srv3_lag=0 srv3_sync=1 sync_replica=1 unhealthy_replica=0\n"
+        == "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2;@1;@0 srv2_lag=0 srv2_sync=0 srv2_timeline=51 srv3_lag=0 srv3_sync=1 srv3_timeline=51 sync_replica=1 unhealthy_replica=0\n"
     )
+    assert result.exit_code == 0
 
 
+@pytest.mark.usefixtures("cluster_has_replica_ok")
 def test_cluster_has_replica_ok_with_sync_count_thresholds(
-    mocker: MockerFixture, use_old_replica_state: bool
+    runner: CliRunner, patroni_api: PatroniAPI
 ) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "cluster_has_replica_ok", 200, use_old_replica_state)
     result = runner.invoke(
         main,
         [
             "-e",
-            "https://10.20.199.3:8008",
+            patroni_api.endpoint,
             "cluster_has_replica",
             "--sync-warning",
             "1:",
         ],
     )
-    assert result.exit_code == 0
     assert (
         result.stdout
-        == "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2 srv2_lag=0 srv2_sync=0 srv3_lag=0 srv3_sync=1 sync_replica=1;1: unhealthy_replica=0\n"
+        == "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2 srv2_lag=0 srv2_sync=0 srv2_timeline=51 srv3_lag=0 srv3_sync=1 srv3_timeline=51 sync_replica=1;1: unhealthy_replica=0\n"
     )
+    assert result.exit_code == 0
 
 
+@pytest.fixture
+def cluster_has_replica_ok_lag(
+    patroni_api: PatroniAPI, datadir: Path, tmp_path: Path, old_replica_state: bool
+) -> Iterator[None]:
+    cluster_path: Union[str, Path] = "cluster_has_replica_ok_lag.json"
+    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
+    if old_replica_state:
+        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
+        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
+    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
+        yield None
+
+
+@pytest.mark.usefixtures("cluster_has_replica_ok_lag")
 def test_cluster_has_replica_ok_with_count_thresholds_lag(
-    mocker: MockerFixture,
-    use_old_replica_state: bool,
+    runner: CliRunner, patroni_api: PatroniAPI
 ) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "cluster_has_replica_ok_lag", 200, use_old_replica_state)
     result = runner.invoke(
         main,
         [
             "-e",
-            "https://10.20.199.3:8008",
+            patroni_api.endpoint,
             "cluster_has_replica",
             "--warning",
             "@1",
@@ -92,24 +107,35 @@ def test_cluster_has_replica_ok_with_count_thresholds_lag(
             "1MB",
         ],
     )
-    assert result.exit_code == 0
     assert (
         result.stdout
-        == "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2;@1;@0 srv2_lag=1024 srv2_sync=0 srv3_lag=0 srv3_sync=0 sync_replica=0 unhealthy_replica=0\n"
+        == "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2;@1;@0 srv2_lag=1024 srv2_sync=0 srv2_timeline=51 srv3_lag=0 srv3_sync=0 srv3_timeline=51 sync_replica=0 unhealthy_replica=0\n"
     )
+    assert result.exit_code == 0
 
 
+@pytest.fixture
+def cluster_has_replica_ko(
+    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
+) -> Iterator[None]:
+    cluster_path: Union[str, Path] = "cluster_has_replica_ko.json"
+    patroni_path: Union[str, Path] = "cluster_has_replica_patroni_verion_3.1.0.json"
+    if old_replica_state:
+        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
+        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
+    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
+        yield None
+
+
+@pytest.mark.usefixtures("cluster_has_replica_ko")
 def test_cluster_has_replica_ko_with_count_thresholds(
-    mocker: MockerFixture, use_old_replica_state: bool
+    runner: CliRunner, patroni_api: PatroniAPI
 ) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "cluster_has_replica_ko", 200, use_old_replica_state)
     result = runner.invoke(
         main,
         [
             "-e",
-            "https://10.20.199.3:8008",
+            patroni_api.endpoint,
             "cluster_has_replica",
             "--warning",
             "@1",
@@ -117,24 +143,22 @@ def test_cluster_has_replica_ko_with_count_thresholds(
             "@0",
         ],
     )
-    assert result.exit_code == 1
     assert (
         result.stdout
-        == "CLUSTERHASREPLICA WARNING - healthy_replica is 1 (outside range @0:1) | healthy_replica=1;@1;@0 srv3_lag=0 srv3_sync=0 sync_replica=0 unhealthy_replica=1\n"
+        == "CLUSTERHASREPLICA WARNING - healthy_replica is 1 (outside range @0:1) | healthy_replica=1;@1;@0 srv3_lag=0 srv3_sync=0 srv3_timeline=51 sync_replica=0 unhealthy_replica=1\n"
     )
+    assert result.exit_code == 1
 
 
+@pytest.mark.usefixtures("cluster_has_replica_ko")
 def test_cluster_has_replica_ko_with_sync_count_thresholds(
-    mocker: MockerFixture, use_old_replica_state: bool
+    runner: CliRunner, patroni_api: PatroniAPI
 ) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "cluster_has_replica_ko", 200, use_old_replica_state)
     result = runner.invoke(
         main,
         [
             "-e",
-            "https://10.20.199.3:8008",
+            patroni_api.endpoint,
             "cluster_has_replica",
             "--sync-warning",
             "2:",
@@ -142,25 +166,36 @@ def test_cluster_has_replica_ko_with_sync_count_thresholds(
             "1:",
         ],
     )
-    assert result.exit_code == 2
     # The lag on srv2 is "unknown". We don't handle string in perfstats so we have to scratch all the second node stats
     assert (
         result.stdout
-        == "CLUSTERHASREPLICA CRITICAL - sync_replica is 0 (outside range 1:) | healthy_replica=1 srv3_lag=0 srv3_sync=0 sync_replica=0;2:;1: unhealthy_replica=1\n"
+        == "CLUSTERHASREPLICA CRITICAL - sync_replica is 0 (outside range 1:) | healthy_replica=1 srv3_lag=0 srv3_sync=0 srv3_timeline=51 sync_replica=0;2:;1: unhealthy_replica=1\n"
     )
+    assert result.exit_code == 2
 
 
+@pytest.fixture
+def cluster_has_replica_ko_lag(
+    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
+) -> Iterator[None]:
+    cluster_path: Union[str, Path] = "cluster_has_replica_ko_lag.json"
+    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
+    if old_replica_state:
+        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
+        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
+    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
+        yield None
+
+
+@pytest.mark.usefixtures("cluster_has_replica_ko_lag")
 def test_cluster_has_replica_ko_with_count_thresholds_and_lag(
-    mocker: MockerFixture,
-    use_old_replica_state: bool,
+    runner: CliRunner, patroni_api: PatroniAPI
 ) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "cluster_has_replica_ko_lag", 200, use_old_replica_state)
     result = runner.invoke(
         main,
         [
             "-e",
-            "https://10.20.199.3:8008",
+            patroni_api.endpoint,
             "cluster_has_replica",
             "--warning",
             "@1",
@@ -170,8 +205,84 @@ def test_cluster_has_replica_ko_with_count_thresholds_and_lag(
             "1MB",
         ],
     )
-    assert result.exit_code == 2
     assert (
         result.stdout
-        == "CLUSTERHASREPLICA CRITICAL - healthy_replica is 0 (outside range @0:0) | healthy_replica=0;@1;@0 srv2_lag=10241024 srv2_sync=0 srv3_lag=20000000 srv3_sync=0 sync_replica=0 unhealthy_replica=2\n"
+        == "CLUSTERHASREPLICA CRITICAL - healthy_replica is 0 (outside range @0:0) | healthy_replica=0;@1;@0 srv2_lag=10241024 srv2_sync=0 srv2_timeline=51 srv3_lag=20000000 srv3_sync=0 srv3_timeline=51 sync_replica=0 unhealthy_replica=2\n"
     )
+    assert result.exit_code == 2
+
+
+@pytest.fixture
+def cluster_has_replica_ko_wrong_tl(
+    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
+) -> Iterator[None]:
+    cluster_path: Union[str, Path] = "cluster_has_replica_ko_wrong_tl.json"
+    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
+    if old_replica_state:
+        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
+        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
+    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
+        yield None
+
+
+@pytest.mark.usefixtures("cluster_has_replica_ko_wrong_tl")
+def test_cluster_has_replica_ko_wrong_tl(
+    runner: CliRunner, patroni_api: PatroniAPI
+) -> None:
+    result = runner.invoke(
+        main,
+        [
+            "-e",
+            patroni_api.endpoint,
+            "cluster_has_replica",
+            "--warning",
+            "@1",
+            "--critical",
+            "@0",
+            "--max-lag",
+            "1MB",
+        ],
+    )
+    assert (
+        result.stdout
+        == "CLUSTERHASREPLICA WARNING - healthy_replica is 1 (outside range @0:1) | healthy_replica=1;@1;@0 srv2_lag=1000000 srv2_sync=0 srv2_timeline=50 srv3_lag=0 srv3_sync=0 srv3_timeline=51 sync_replica=0 unhealthy_replica=1\n"
+    )
+    assert result.exit_code == 1
+
+
+@pytest.fixture
+def cluster_has_replica_ko_all_replica(
+    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
+) -> Iterator[None]:
+    cluster_path: Union[str, Path] = "cluster_has_replica_ko_all_replica.json"
+    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
+    if old_replica_state:
+        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
+        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
+    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
+        yield None
+
+
+@pytest.mark.usefixtures("cluster_has_replica_ko_all_replica")
+def test_cluster_has_replica_ko_all_replica(
+    runner: CliRunner, patroni_api: PatroniAPI
+) -> None:
+    result = runner.invoke(
+        main,
+        [
+            "-e",
+            patroni_api.endpoint,
+            "cluster_has_replica",
+            "--warning",
+            "@1",
+            "--critical",
+            "@0",
+            "--max-lag",
+            "1MB",
+        ],
+    )
+    assert (
+        result.stdout
+        == "CLUSTERHASREPLICA CRITICAL - healthy_replica is 0 (outside range @0:0) | healthy_replica=0;@1;@0 srv1_lag=0 srv1_sync=0 srv1_timeline=51 srv2_lag=0 srv2_sync=0 srv2_timeline=51 srv3_lag=0 srv3_sync=0 srv3_timeline=51 sync_replica=0 unhealthy_replica=3\n"
+    )
+    assert result.exit_code == 2
@@ -1,19 +1,16 @@
 from click.testing import CliRunner
-from pytest_mock import MockerFixture
 
 from check_patroni.cli import main
 
-from .tools import my_mock
+from . import PatroniAPI
 
 
 def test_cluster_has_scheduled_action_ok(
-    mocker: MockerFixture, use_old_replica_state: bool
+    runner: CliRunner, patroni_api: PatroniAPI
 ) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "cluster_has_scheduled_action_ok", 200)
-    result = runner.invoke(
-        main, ["-e", "https://10.20.199.3:8008", "cluster_has_scheduled_action"]
-    )
+    with patroni_api.routes({"cluster": "cluster_has_scheduled_action_ok.json"}):
+        result = runner.invoke(
+            main, ["-e", patroni_api.endpoint, "cluster_has_scheduled_action"]
+        )
     assert result.exit_code == 0
     assert (

@@ -23,13 +20,13 @@ def test_cluster_has_scheduled_action_ok(
 
 
 def test_cluster_has_scheduled_action_ko_switchover(
-    mocker: MockerFixture, use_old_replica_state: bool
+    runner: CliRunner, patroni_api: PatroniAPI
 ) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "cluster_has_scheduled_action_ko_switchover", 200)
-    result = runner.invoke(
-        main, ["-e", "https://10.20.199.3:8008", "cluster_has_scheduled_action"]
-    )
+    with patroni_api.routes(
+        {"cluster": "cluster_has_scheduled_action_ko_switchover.json"}
+    ):
+        result = runner.invoke(
+            main, ["-e", patroni_api.endpoint, "cluster_has_scheduled_action"]
+        )
     assert result.exit_code == 2
     assert (

@@ -39,13 +36,13 @@ def test_cluster_has_scheduled_action_ko_switchover(
 
 
 def test_cluster_has_scheduled_action_ko_restart(
-    mocker: MockerFixture, use_old_replica_state: bool
+    runner: CliRunner, patroni_api: PatroniAPI
 ) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "cluster_has_scheduled_action_ko_restart", 200)
-    result = runner.invoke(
-        main, ["-e", "https://10.20.199.3:8008", "cluster_has_scheduled_action"]
-    )
+    with patroni_api.routes(
+        {"cluster": "cluster_has_scheduled_action_ko_restart.json"}
+    ):
+        result = runner.invoke(
+            main, ["-e", patroni_api.endpoint, "cluster_has_scheduled_action"]
+        )
     assert result.exit_code == 2
     assert (
@@ -1,19 +1,16 @@
 from click.testing import CliRunner
-from pytest_mock import MockerFixture
 
 from check_patroni.cli import main
 
-from .tools import my_mock
+from . import PatroniAPI
 
 
 def test_cluster_is_in_maintenance_ok(
-    mocker: MockerFixture, use_old_replica_state: bool
+    runner: CliRunner, patroni_api: PatroniAPI
 ) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "cluster_is_in_maintenance_ok", 200)
-    result = runner.invoke(
-        main, ["-e", "https://10.20.199.3:8008", "cluster_is_in_maintenance"]
-    )
+    with patroni_api.routes({"cluster": "cluster_is_in_maintenance_ok.json"}):
+        result = runner.invoke(
+            main, ["-e", patroni_api.endpoint, "cluster_is_in_maintenance"]
+        )
     assert result.exit_code == 0
     assert (

@@ -23,13 +20,11 @@ def test_cluster_is_in_maintenance_ok(
 
 
 def test_cluster_is_in_maintenance_ko(
-    mocker: MockerFixture, use_old_replica_state: bool
+    runner: CliRunner, patroni_api: PatroniAPI
 ) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "cluster_is_in_maintenance_ko", 200)
-    result = runner.invoke(
-        main, ["-e", "https://10.20.199.3:8008", "cluster_is_in_maintenance"]
-    )
+    with patroni_api.routes({"cluster": "cluster_is_in_maintenance_ko.json"}):
+        result = runner.invoke(
+            main, ["-e", patroni_api.endpoint, "cluster_is_in_maintenance"]
+        )
     assert result.exit_code == 2
     assert (

@@ -39,13 +34,13 @@ def test_cluster_is_in_maintenance_ko(
 
 
 def test_cluster_is_in_maintenance_ok_pause_false(
-    mocker: MockerFixture, use_old_replica_state: bool
+    runner: CliRunner, patroni_api: PatroniAPI
 ) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "cluster_is_in_maintenance_ok_pause_false", 200)
-    result = runner.invoke(
-        main, ["-e", "https://10.20.199.3:8008", "cluster_is_in_maintenance"]
-    )
+    with patroni_api.routes(
+        {"cluster": "cluster_is_in_maintenance_ok_pause_false.json"}
+    ):
+        result = runner.invoke(
+            main, ["-e", patroni_api.endpoint, "cluster_is_in_maintenance"]
+        )
     assert result.exit_code == 0
     assert (
@@ -1,22 +1,33 @@
+from pathlib import Path
+from typing import Iterator, Union
+
+import pytest
 from click.testing import CliRunner
-from pytest_mock import MockerFixture
 
 from check_patroni.cli import main
 
-from .tools import my_mock
+from . import PatroniAPI, cluster_api_set_replica_running
 
 
+@pytest.fixture
+def cluster_node_count_ok(
+    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
+) -> Iterator[None]:
+    cluster_path: Union[str, Path] = "cluster_node_count_ok.json"
+    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
+    if old_replica_state:
+        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
+        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
+    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
+        yield None
+
+
+@pytest.mark.usefixtures("cluster_node_count_ok")
 def test_cluster_node_count_ok(
-    mocker: MockerFixture, use_old_replica_state: bool
+    runner: CliRunner, patroni_api: PatroniAPI, old_replica_state: bool
 ) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "cluster_node_count_ok", 200, use_old_replica_state)
-    result = runner.invoke(
-        main, ["-e", "https://10.20.199.3:8008", "cluster_node_count"]
-    )
-    assert result.exit_code == 0
-    if use_old_replica_state:
+    result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_node_count"])
+    if old_replica_state:
         assert (
             result.output
             == "CLUSTERNODECOUNT OK - members is 3 | healthy_members=3 members=3 role_leader=1 role_replica=2 state_running=3\n"
@@ -26,19 +37,18 @@ def test_cluster_node_count_ok(
             result.output
             == "CLUSTERNODECOUNT OK - members is 3 | healthy_members=3 members=3 role_leader=1 role_replica=2 state_running=1 state_streaming=2\n"
         )
+    assert result.exit_code == 0
 
 
+@pytest.mark.usefixtures("cluster_node_count_ok")
 def test_cluster_node_count_ok_with_thresholds(
-    mocker: MockerFixture, use_old_replica_state: bool
+    runner: CliRunner, patroni_api: PatroniAPI, old_replica_state: bool
 ) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "cluster_node_count_ok", 200, use_old_replica_state)
     result = runner.invoke(
         main,
         [
             "-e",
-            "https://10.20.199.3:8008",
+            patroni_api.endpoint,
             "cluster_node_count",
             "--warning",
             "@0:1",
@@ -50,8 +60,7 @@ def test_cluster_node_count_ok_with_thresholds(
             "@0:1",
         ],
     )
-    assert result.exit_code == 0
-    if use_old_replica_state:
+    if old_replica_state:
         assert (
             result.output
             == "CLUSTERNODECOUNT OK - members is 3 | healthy_members=3;@2;@1 members=3;@1;@2 role_leader=1 role_replica=2 state_running=3\n"
@@ -61,19 +70,31 @@ def test_cluster_node_count_ok_with_thresholds(
             result.output
             == "CLUSTERNODECOUNT OK - members is 3 | healthy_members=3;@2;@1 members=3;@1;@2 role_leader=1 role_replica=2 state_running=1 state_streaming=2\n"
         )
+    assert result.exit_code == 0
 
 
+@pytest.fixture
+def cluster_node_count_healthy_warning(
+    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
+) -> Iterator[None]:
+    cluster_path: Union[str, Path] = "cluster_node_count_healthy_warning.json"
+    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
+    if old_replica_state:
+        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
+        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
+    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
+        yield None
+
+
+@pytest.mark.usefixtures("cluster_node_count_healthy_warning")
 def test_cluster_node_count_healthy_warning(
-    mocker: MockerFixture, use_old_replica_state: bool
+    runner: CliRunner, patroni_api: PatroniAPI, old_replica_state: bool
 ) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "cluster_node_count_healthy_warning", 200, use_old_replica_state)
     result = runner.invoke(
         main,
         [
             "-e",
-            "https://10.20.199.3:8008",
+            patroni_api.endpoint,
             "cluster_node_count",
             "--healthy-warning",
             "@2",
@@ -81,8 +102,7 @@ def test_cluster_node_count_healthy_warning(
             "@0:1",
         ],
     )
-    assert result.exit_code == 1
-    if use_old_replica_state:
+    if old_replica_state:
         assert (
             result.output
             == "CLUSTERNODECOUNT WARNING - healthy_members is 2 (outside range @0:2) | healthy_members=2;@2;@1 members=2 role_leader=1 role_replica=1 state_running=2\n"
@@ -92,19 +112,31 @@ def test_cluster_node_count_healthy_warning(
             result.output
             == "CLUSTERNODECOUNT WARNING - healthy_members is 2 (outside range @0:2) | healthy_members=2;@2;@1 members=2 role_leader=1 role_replica=1 state_running=1 state_streaming=1\n"
         )
+    assert result.exit_code == 1
 
 
+@pytest.fixture
+def cluster_node_count_healthy_critical(
+    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
+) -> Iterator[None]:
+    cluster_path: Union[str, Path] = "cluster_node_count_healthy_critical.json"
+    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
+    if old_replica_state:
+        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
+        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
+    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
+        yield None
+
+
+@pytest.mark.usefixtures("cluster_node_count_healthy_critical")
 def test_cluster_node_count_healthy_critical(
-    mocker: MockerFixture, use_old_replica_state: bool
+    runner: CliRunner, patroni_api: PatroniAPI
 ) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "cluster_node_count_healthy_critical", 200, use_old_replica_state)
     result = runner.invoke(
         main,
         [
             "-e",
-            "https://10.20.199.3:8008",
+            patroni_api.endpoint,
             "cluster_node_count",
             "--healthy-warning",
             "@2",
@@ -112,24 +144,35 @@ def test_cluster_node_count_healthy_critical(
             "@0:1",
         ],
     )
-    assert result.exit_code == 2
     assert (
         result.output
         == "CLUSTERNODECOUNT CRITICAL - healthy_members is 1 (outside range @0:1) | healthy_members=1;@2;@1 members=3 role_leader=1 role_replica=2 state_running=1 state_start_failed=2\n"
     )
+    assert result.exit_code == 2
 
 
+@pytest.fixture
+def cluster_node_count_warning(
+    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
+) -> Iterator[None]:
+    cluster_path: Union[str, Path] = "cluster_node_count_warning.json"
+    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
+    if old_replica_state:
+        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
+        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
+    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
+        yield None
+
+
+@pytest.mark.usefixtures("cluster_node_count_warning")
 def test_cluster_node_count_warning(
-    mocker: MockerFixture, use_old_replica_state: bool
+    runner: CliRunner, patroni_api: PatroniAPI, old_replica_state: bool
 ) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "cluster_node_count_warning", 200, use_old_replica_state)
     result = runner.invoke(
         main,
         [
             "-e",
-            "https://10.20.199.3:8008",
+            patroni_api.endpoint,
             "cluster_node_count",
             "--warning",
             "@2",
@@ -137,8 +180,7 @@ def test_cluster_node_count_warning(
             "@0:1",
         ],
     )
-    assert result.exit_code == 1
-    if use_old_replica_state:
+    if old_replica_state:
         assert (
             result.stdout
             == "CLUSTERNODECOUNT WARNING - members is 2 (outside range @0:2) | healthy_members=2 members=2;@2;@1 role_leader=1 role_replica=1 state_running=2\n"
@@ -148,19 +190,31 @@ def test_cluster_node_count_warning(
             result.stdout
             == "CLUSTERNODECOUNT WARNING - members is 2 (outside range @0:2) | healthy_members=2 members=2;@2;@1 role_leader=1 role_replica=1 state_running=1 state_streaming=1\n"
         )
+    assert result.exit_code == 1
 
 
+@pytest.fixture
+def cluster_node_count_critical(
+    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
+) -> Iterator[None]:
+    cluster_path: Union[str, Path] = "cluster_node_count_critical.json"
+    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
+    if old_replica_state:
+        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
+        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
+    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
+        yield None
+
+
+@pytest.mark.usefixtures("cluster_node_count_critical")
 def test_cluster_node_count_critical(
-    mocker: MockerFixture, use_old_replica_state: bool
+    runner: CliRunner, patroni_api: PatroniAPI
 ) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "cluster_node_count_critical", 200, use_old_replica_state)
     result = runner.invoke(
         main,
         [
             "-e",
-            "https://10.20.199.3:8008",
+            patroni_api.endpoint,
             "cluster_node_count",
             "--warning",
             "@2",
@@ -168,8 +222,51 @@ def test_cluster_node_count_critical(
             "@0:1",
         ],
    )
-    assert result.exit_code == 2
     assert (
         result.stdout
         == "CLUSTERNODECOUNT CRITICAL - members is 1 (outside range @0:1) | healthy_members=1 members=1;@2;@1 role_leader=1 state_running=1\n"
     )
+    assert result.exit_code == 2
+
+
+@pytest.fixture
+def cluster_node_count_ko_in_archive_recovery(
+    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
+) -> Iterator[None]:
+    cluster_path: Union[str, Path] = "cluster_node_count_ko_in_archive_recovery.json"
+    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
+    if old_replica_state:
+        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
+        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
+    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
+        yield None
+
+
+@pytest.mark.usefixtures("cluster_node_count_ko_in_archive_recovery")
+def test_cluster_node_count_ko_in_archive_recovery(
+    runner: CliRunner, patroni_api: PatroniAPI, old_replica_state: bool
+) -> None:
+    result = runner.invoke(
+        main,
+        [
+            "-e",
+            patroni_api.endpoint,
+            "cluster_node_count",
+            "--healthy-warning",
+            "@2",
+            "--healthy-critical",
+            "@0:1",
+        ],
+    )
+    if old_replica_state:
+        assert (
+            result.stdout
+            == "CLUSTERNODECOUNT OK - members is 3 | healthy_members=3;@2;@1 members=3 role_replica=2 role_standby_leader=1 state_running=3\n"
+        )
+        assert result.exit_code == 0
+    else:
+        assert (
+            result.stdout
+            == "CLUSTERNODECOUNT CRITICAL - healthy_members is 1 (outside range @0:1) | healthy_members=1;@2;@1 members=3 role_replica=2 role_standby_leader=1 state_in_archive_recovery=2 state_streaming=1\n"
+        )
+        assert result.exit_code == 2

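The tests above no longer mock `requests`: the `patroni_api` fixture runs a real local HTTP server and its `routes()` context manager maps URL paths to canned JSON payloads. A minimal stdlib-only sketch of that pattern follows; the class and method names here are my illustration of the idea, not check_patroni's actual `PatroniAPI` helper, and it serves in-memory dicts rather than the JSON files the real fixture uses.

```python
import http.server
import json
import threading
from contextlib import contextmanager


class FakeAPI:
    """Serve canned JSON payloads for selected URL paths over real HTTP."""

    def __init__(self) -> None:
        self._routes: dict = {}
        outer = self

        class Handler(http.server.BaseHTTPRequestHandler):
            def do_GET(self) -> None:
                payload = outer._routes.get(self.path.lstrip("/"))
                if payload is None:
                    # Unregistered route: behave like a missing endpoint.
                    self.send_response(404)
                    self.end_headers()
                    return
                body = json.dumps(payload).encode()
                self.send_response(200)
                self.send_header("Content-Type", "application/json")
                self.end_headers()
                self.wfile.write(body)

            def log_message(self, *args) -> None:
                # Keep test output quiet.
                pass

        # Port 0 lets the OS pick a free port, so tests never collide.
        self._server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
        threading.Thread(target=self._server.serve_forever, daemon=True).start()

    @property
    def endpoint(self) -> str:
        host, port = self._server.server_address
        return f"http://{host}:{port}"

    @contextmanager
    def routes(self, mapping: dict):
        """Register routes for the duration of a with-block, then clear them."""
        self._routes = dict(mapping)
        try:
            yield
        finally:
            self._routes = {}
```

Exercising the checked program against `endpoint` instead of patching its HTTP library is what lets this release claim compatibility with any `requests` version.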
@@ -1,16 +1,19 @@
+from pathlib import Path
+
 from click.testing import CliRunner
-from pytest_mock import MockerFixture
 
 from check_patroni.cli import main
 
-from .tools import my_mock
+from . import PatroniAPI
 
 
-def test_node_is_alive_ok(mocker: MockerFixture, use_old_replica_state: bool) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, None, 200)
-    result = runner.invoke(main, ["-e", "https://10.20.199.3:8008", "node_is_alive"])
+def test_node_is_alive_ok(
+    runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
+) -> None:
+    liveness = tmp_path / "liveness"
+    liveness.touch()
+    with patroni_api.routes({"liveness": liveness}):
+        result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_alive"])
     assert result.exit_code == 0
     assert (
         result.stdout
@@ -18,11 +21,8 @@ def test_node_is_alive_ok(
     )
 
 
-def test_node_is_alive_ko(mocker: MockerFixture, use_old_replica_state: bool) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, None, 404)
-    result = runner.invoke(main, ["-e", "https://10.20.199.3:8008", "node_is_alive"])
+def test_node_is_alive_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
    result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_alive"])
     assert result.exit_code == 2
     assert (
         result.stdout

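The `node_is_alive` tests above hinge on the HTTP status of Patroni's `/liveness` endpoint: a 200 answer yields exit code 0, anything else (including no answer) yields 2. A reduced sketch of that status-to-Nagios mapping; the function name and structure are illustrative, not the plugin's actual code.

```python
from typing import Optional

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def liveness_to_nagios(http_status: Optional[int]) -> int:
    """Map the /liveness HTTP response to a Nagios exit code.

    Patroni answers 200 when the node is alive; any other answer,
    or no answer at all (None), is critical for this check.
    """
    return OK if http_status == 200 else CRITICAL
```

This is why the `ko` test needs no route at all: an unregistered path returns 404, which the check reports as CRITICAL.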
@@ -1,28 +1,37 @@
+from typing import Iterator
+
+import pytest
 from click.testing import CliRunner
-from pytest_mock import MockerFixture
 
 from check_patroni.cli import main
 
-from .tools import my_mock
+from . import PatroniAPI
 
 
-def test_node_is_leader_ok(mocker: MockerFixture, use_old_replica_state: bool) -> None:
-    runner = CliRunner()
+@pytest.fixture
+def node_is_leader_ok(patroni_api: PatroniAPI) -> Iterator[None]:
+    with patroni_api.routes(
+        {
+            "leader": "node_is_leader_ok.json",
+            "standby-leader": "node_is_leader_ok_standby_leader.json",
+        }
+    ):
+        yield None
 
-    my_mock(mocker, "node_is_leader_ok", 200)
-    result = runner.invoke(main, ["-e", "https://10.20.199.3:8008", "node_is_leader"])
+
+@pytest.mark.usefixtures("node_is_leader_ok")
+def test_node_is_leader_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
+    result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_leader"])
     assert result.exit_code == 0
     assert (
         result.stdout
         == "NODEISLEADER OK - This node is a leader node. | is_leader=1;;@0\n"
     )
 
-    my_mock(mocker, "node_is_leader_ok_standby_leader", 200)
     result = runner.invoke(
         main,
-        ["-e", "https://10.20.199.3:8008", "node_is_leader", "--is-standby-leader"],
+        ["-e", patroni_api.endpoint, "node_is_leader", "--is-standby-leader"],
     )
     print(result.stdout)
     assert result.exit_code == 0
     assert (
         result.stdout
@@ -30,21 +39,17 @@ def test_node_is_leader_ok(
     )
 
 
-def test_node_is_leader_ko(mocker: MockerFixture, use_old_replica_state: bool) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "node_is_leader_ko", 503)
-    result = runner.invoke(main, ["-e", "https://10.20.199.3:8008", "node_is_leader"])
+def test_node_is_leader_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
+    result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_leader"])
     assert result.exit_code == 2
     assert (
         result.stdout
         == "NODEISLEADER CRITICAL - This node is not a leader node. | is_leader=0;;@0\n"
     )
 
-    my_mock(mocker, "node_is_leader_ko_standby_leader", 503)
     result = runner.invoke(
         main,
-        ["-e", "https://10.20.199.3:8008", "node_is_leader", "--is-standby-leader"],
+        ["-e", patroni_api.endpoint, "node_is_leader", "--is-standby-leader"],
     )
     assert result.exit_code == 2
     assert (

@@ -1,19 +1,14 @@
 from click.testing import CliRunner
-from pytest_mock import MockerFixture
 
 from check_patroni.cli import main
 
-from .tools import my_mock
+from . import PatroniAPI
 
 
-def test_node_is_pending_restart_ok(
-    mocker: MockerFixture, use_old_replica_state: bool
-) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "node_is_pending_restart_ok", 200)
+def test_node_is_pending_restart_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
+    with patroni_api.routes({"patroni": "node_is_pending_restart_ok.json"}):
         result = runner.invoke(
-            main, ["-e", "https://10.20.199.3:8008", "node_is_pending_restart"]
+            main, ["-e", patroni_api.endpoint, "node_is_pending_restart"]
         )
     assert result.exit_code == 0
     assert (
@@ -22,14 +17,10 @@ def test_node_is_pending_restart_ok(
     )
 
 
-def test_node_is_pending_restart_ko(
-    mocker: MockerFixture, use_old_replica_state: bool
-) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "node_is_pending_restart_ko", 200)
+def test_node_is_pending_restart_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
+    with patroni_api.routes({"patroni": "node_is_pending_restart_ko.json"}):
         result = runner.invoke(
-            main, ["-e", "https://10.20.199.3:8008", "node_is_pending_restart"]
+            main, ["-e", patroni_api.endpoint, "node_is_pending_restart"]
         )
     assert result.exit_code == 2
     assert (

@@ -1,16 +1,13 @@
 from click.testing import CliRunner
-from pytest_mock import MockerFixture
 
 from check_patroni.cli import main
 
-from .tools import my_mock
+from . import PatroniAPI
 
 
-def test_node_is_primary_ok(mocker: MockerFixture, use_old_replica_state: bool) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "node_is_primary_ok", 200)
-    result = runner.invoke(main, ["-e", "https://10.20.199.3:8008", "node_is_primary"])
+def test_node_is_primary_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
+    with patroni_api.routes({"primary": "node_is_primary_ok.json"}):
+        result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_primary"])
     assert result.exit_code == 0
     assert (
         result.stdout
@@ -18,11 +15,8 @@ def test_node_is_primary_ok(
     )
 
 
-def test_node_is_primary_ko(mocker: MockerFixture, use_old_replica_state: bool) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "node_is_primary_ko", 503)
-    result = runner.invoke(main, ["-e", "https://10.20.199.3:8008", "node_is_primary"])
+def test_node_is_primary_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
+    result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_primary"])
     assert result.exit_code == 2
     assert (
         result.stdout

@@ -1,16 +1,27 @@
+from typing import Iterator
+
+import pytest
 from click.testing import CliRunner
-from pytest_mock import MockerFixture
 
 from check_patroni.cli import main
 
-from .tools import my_mock
+from . import PatroniAPI
 
 
-def test_node_is_replica_ok(mocker: MockerFixture, use_old_replica_state: bool) -> None:
-    runner = CliRunner()
+@pytest.fixture
+def node_is_replica_ok(patroni_api: PatroniAPI) -> Iterator[None]:
+    with patroni_api.routes(
+        {
+            k: "node_is_replica_ok.json"
+            for k in ("replica", "synchronous", "asynchronous")
+        }
+    ):
+        yield None
 
-    my_mock(mocker, "node_is_replica_ok", 200)
-    result = runner.invoke(main, ["-e", "https://10.20.199.3:8008", "node_is_replica"])
+
+@pytest.mark.usefixtures("node_is_replica_ok")
+def test_node_is_replica_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
+    result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_replica"])
     assert result.exit_code == 0
     assert (
         result.stdout
@@ -18,11 +29,8 @@ def test_node_is_replica_ok(
     )
 
 
-def test_node_is_replica_ko(mocker: MockerFixture, use_old_replica_state: bool) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "node_is_replica_ko", 503)
-    result = runner.invoke(main, ["-e", "https://10.20.199.3:8008", "node_is_replica"])
+def test_node_is_replica_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
+    result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_replica"])
     assert result.exit_code == 2
     assert (
         result.stdout
@@ -30,15 +38,10 @@ def test_node_is_replica_ko(
     )
 
 
-def test_node_is_replica_ko_lag(
-    mocker: MockerFixture, use_old_replica_state: bool
-) -> None:
-    runner = CliRunner()
-
+def test_node_is_replica_ko_lag(runner: CliRunner, patroni_api: PatroniAPI) -> None:
     # We don't do the check ourselves, patroni does it and changes the return code
-    my_mock(mocker, "node_is_replica_ok", 503)
     result = runner.invoke(
-        main, ["-e", "https://10.20.199.3:8008", "node_is_replica", "--max-lag", "100"]
+        main, ["-e", patroni_api.endpoint, "node_is_replica", "--max-lag", "100"]
     )
     assert result.exit_code == 2
     assert (
@@ -46,12 +49,11 @@ def test_node_is_replica_ko_lag(
        == "NODEISREPLICA CRITICAL - This node is not a running replica with no noloadbalance tag and a lag under 100. | is_replica=0;;@0\n"
     )
 
-    my_mock(mocker, "node_is_replica_ok", 503)
     result = runner.invoke(
         main,
         [
             "-e",
-            "https://10.20.199.3:8008",
+            patroni_api.endpoint,
             "node_is_replica",
             "--is-async",
             "--max-lag",
@@ -65,15 +67,11 @@ def test_node_is_replica_ko_lag(
     )
 
 
-def test_node_is_replica_sync_ok(
-    mocker: MockerFixture, use_old_replica_state: bool
-) -> None:
-    runner = CliRunner()
-
+@pytest.mark.usefixtures("node_is_replica_ok")
+def test_node_is_replica_sync_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
     # We don't do the check ourselves, patroni does it and changes the return code
-    my_mock(mocker, "node_is_replica_ok", 200)
     result = runner.invoke(
-        main, ["-e", "https://10.20.199.3:8008", "node_is_replica", "--is-sync"]
+        main, ["-e", patroni_api.endpoint, "node_is_replica", "--is-sync"]
     )
     assert result.exit_code == 0
     assert (
@@ -82,15 +80,10 @@ def test_node_is_replica_sync_ok(
     )
 
 
-def test_node_is_replica_sync_ko(
-    mocker: MockerFixture, use_old_replica_state: bool
-) -> None:
-    runner = CliRunner()
-
+def test_node_is_replica_sync_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
     # We don't do the check ourselves, patroni does it and changes the return code
-    my_mock(mocker, "node_is_replica_ok", 503)
     result = runner.invoke(
-        main, ["-e", "https://10.20.199.3:8008", "node_is_replica", "--is-sync"]
+        main, ["-e", patroni_api.endpoint, "node_is_replica", "--is-sync"]
     )
     assert result.exit_code == 2
     assert (
@@ -99,15 +92,11 @@ def test_node_is_replica_sync_ko(
     )
 
 
-def test_node_is_replica_async_ok(
-    mocker: MockerFixture, use_old_replica_state: bool
-) -> None:
-    runner = CliRunner()
-
+@pytest.mark.usefixtures("node_is_replica_ok")
+def test_node_is_replica_async_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
     # We don't do the check ourselves, patroni does it and changes the return code
-    my_mock(mocker, "node_is_replica_ok", 200)
     result = runner.invoke(
-        main, ["-e", "https://10.20.199.3:8008", "node_is_replica", "--is-async"]
+        main, ["-e", patroni_api.endpoint, "node_is_replica", "--is-async"]
     )
     assert result.exit_code == 0
     assert (
@@ -116,15 +105,10 @@ def test_node_is_replica_async_ok(
     )
 
 
-def test_node_is_replica_async_ko(
-    mocker: MockerFixture, use_old_replica_state: bool
-) -> None:
-    runner = CliRunner()
-
+def test_node_is_replica_async_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
     # We don't do the check ourselves, patroni does it and changes the return code
-    my_mock(mocker, "node_is_replica_ok", 503)
     result = runner.invoke(
-        main, ["-e", "https://10.20.199.3:8008", "node_is_replica", "--is-async"]
+        main, ["-e", patroni_api.endpoint, "node_is_replica", "--is-async"]
     )
     assert result.exit_code == 2
     assert (
@@ -133,18 +117,14 @@ def test_node_is_replica_async_ko(
     )
 
 
-def test_node_is_replica_params(
-    mocker: MockerFixture, use_old_replica_state: bool
-) -> None:
-    runner = CliRunner()
-
+@pytest.mark.usefixtures("node_is_replica_ok")
+def test_node_is_replica_params(runner: CliRunner, patroni_api: PatroniAPI) -> None:
     # We don't do the check ourselves, patroni does it and changes the return code
-    my_mock(mocker, "node_is_replica_ok", 200)
     result = runner.invoke(
         main,
         [
             "-e",
-            "https://10.20.199.3:8008",
+            patroni_api.endpoint,
             "node_is_replica",
             "--is-async",
             "--is-sync",
@@ -157,12 +137,11 @@ def test_node_is_replica_params(
     )
 
     # We don't do the check ourselves, patroni does it and changes the return code
-    my_mock(mocker, "node_is_replica_ok", 200)
     result = runner.invoke(
         main,
         [
             "-e",
-            "https://10.20.199.3:8008",
+            patroni_api.endpoint,
             "node_is_replica",
             "--is-sync",
             "--max-lag",

@@ -1,22 +1,25 @@
+from typing import Iterator
+
+import pytest
 from click.testing import CliRunner
-from pytest_mock import MockerFixture
 
 from check_patroni.cli import main
 
-from .tools import my_mock
+from . import PatroniAPI
 
 
-def test_node_patroni_version_ok(
-    mocker: MockerFixture, use_old_replica_state: bool
-) -> None:
-    runner = CliRunner()
+@pytest.fixture(scope="module", autouse=True)
+def node_patroni_version(patroni_api: PatroniAPI) -> Iterator[None]:
+    with patroni_api.routes({"patroni": "node_patroni_version.json"}):
+        yield None
 
-    my_mock(mocker, "node_patroni_version", 200)
+
+def test_node_patroni_version_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
     result = runner.invoke(
         main,
         [
             "-e",
-            "https://10.20.199.3:8008",
+            patroni_api.endpoint,
             "node_patroni_version",
             "--patroni-version",
             "2.0.2",
@@ -29,17 +32,12 @@ def test_node_patroni_version_ok(
     )
 
 
-def test_node_patroni_version_ko(
-    mocker: MockerFixture, use_old_replica_state: bool
-) -> None:
-    runner = CliRunner()
-
-    my_mock(mocker, "node_patroni_version", 200)
+def test_node_patroni_version_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
     result = runner.invoke(
         main,
         [
             "-e",
-            "https://10.20.199.3:8008",
+            patroni_api.endpoint,
             "node_patroni_version",
             "--patroni-version",
             "1.0.0",

@ -1,23 +1,30 @@
|
|||
from pathlib import Path
|
||||
from typing import Iterator
|
||||
|
||||
import nagiosplugin
|
||||
import pytest
|
||||
from click.testing import CliRunner
|
||||
from pytest_mock import MockerFixture
|
||||
|
||||
from check_patroni.cli import main
|
||||
|
||||
from .tools import here, my_mock
|
||||
from . import PatroniAPI
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def node_tl_has_changed(patroni_api: PatroniAPI) -> Iterator[None]:
|
||||
with patroni_api.routes({"patroni": "node_tl_has_changed.json"}):
|
||||
yield None
|
||||
|
||||
|
||||
@pytest.mark.usefixtures("node_tl_has_changed")
|
||||
def test_node_tl_has_changed_ok_with_timeline(
|
||||
mocker: MockerFixture, use_old_replica_state: bool
|
||||
runner: CliRunner, patroni_api: PatroniAPI
|
||||
) -> None:
|
||||
runner = CliRunner()
|
||||
|
||||
my_mock(mocker, "node_tl_has_changed", 200)
|
||||
result = runner.invoke(
|
||||
main,
|
||||
[
|
||||
"-e",
|
||||
"https://10.20.199.3:8008",
|
||||
patroni_api.endpoint,
|
||||
"node_tl_has_changed",
|
||||
"--timeline",
|
||||
"58",
|
||||
|
@ -30,23 +37,22 @@ def test_node_tl_has_changed_ok_with_timeline(
|
|||
)
|
||||
|
||||
|
||||
@pytest.mark.usefixtures("node_tl_has_changed")
|
||||
def test_node_tl_has_changed_ok_with_state_file(
|
||||
mocker: MockerFixture, use_old_replica_state: bool
|
||||
runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
|
||||
) -> None:
|
||||
runner = CliRunner()
|
||||
|
||||
with open(here / "node_tl_has_changed.state_file", "w") as f:
|
||||
state_file = tmp_path / "node_tl_has_changed.state_file"
|
||||
with state_file.open("w") as f:
|
||||
f.write('{"timeline": 58}')
|
||||
|
||||
my_mock(mocker, "node_tl_has_changed", 200)
|
||||
result = runner.invoke(
|
||||
main,
|
||||
[
|
||||
"-e",
|
||||
"https://10.20.199.3:8008",
|
||||
patroni_api.endpoint,
|
||||
"node_tl_has_changed",
|
||||
"--state-file",
|
||||
str(here / "node_tl_has_changed.state_file"),
|
||||
str(state_file),
|
||||
],
|
||||
)
|
||||
assert result.exit_code == 0
|
||||
|
@ -56,17 +62,15 @@ def test_node_tl_has_changed_ok_with_state_file(
|
|||
)
|
||||
|
||||
|
||||
@pytest.mark.usefixtures("node_tl_has_changed")
|
||||
def test_node_tl_has_changed_ko_with_timeline(
|
||||
mocker: MockerFixture, use_old_replica_state: bool
|
||||
runner: CliRunner, patroni_api: PatroniAPI
|
||||
) -> None:
|
||||
runner = CliRunner()
|
||||
|
||||
my_mock(mocker, "node_tl_has_changed", 200)
|
||||
result = runner.invoke(
|
||||
main,
|
||||
[
|
||||
"-e",
|
||||
"https://10.20.199.3:8008",
|
||||
patroni_api.endpoint,
|
||||
"node_tl_has_changed",
|
||||
"--timeline",
|
||||
"700",
|
||||
|
@@ -79,24 +83,23 @@ def test_node_tl_has_changed_ko_with_timeline(
     )
 
 
+@pytest.mark.usefixtures("node_tl_has_changed")
 def test_node_tl_has_changed_ko_with_state_file_and_save(
-    mocker: MockerFixture, use_old_replica_state: bool
+    runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
 ) -> None:
-    runner = CliRunner()
-
-    with open(here / "node_tl_has_changed.state_file", "w") as f:
+    state_file = tmp_path / "node_tl_has_changed.state_file"
+    with state_file.open("w") as f:
         f.write('{"timeline": 700}')
 
-    my_mock(mocker, "node_tl_has_changed", 200)
     # test without saving the new tl
     result = runner.invoke(
         main,
         [
             "-e",
-            "https://10.20.199.3:8008",
+            patroni_api.endpoint,
             "node_tl_has_changed",
             "--state-file",
-            str(here / "node_tl_has_changed.state_file"),
+            str(state_file),
         ],
     )
     assert result.exit_code == 2
@@ -105,7 +108,7 @@ def test_node_tl_has_changed_ko_with_state_file_and_save(
         == "NODETLHASCHANGED CRITICAL - The expected timeline was 700 got 58. | is_timeline_changed=1;;@1:1 timeline=58\n"
     )
 
-    cookie = nagiosplugin.Cookie(here / "node_tl_has_changed.state_file")
+    cookie = nagiosplugin.Cookie(state_file)
     cookie.open()
     new_tl = cookie.get("timeline")
     cookie.close()
@@ -117,10 +120,10 @@ def test_node_tl_has_changed_ko_with_state_file_and_save(
         main,
         [
             "-e",
-            "https://10.20.199.3:8008",
+            patroni_api.endpoint,
             "node_tl_has_changed",
             "--state-file",
-            str(here / "node_tl_has_changed.state_file"),
+            str(state_file),
             "--save",
         ],
     )
@@ -130,7 +133,7 @@ def test_node_tl_has_changed_ko_with_state_file_and_save(
         == "NODETLHASCHANGED CRITICAL - The expected timeline was 700 got 58. | is_timeline_changed=1;;@1:1 timeline=58\n"
     )
 
-    cookie = nagiosplugin.Cookie(here / "node_tl_has_changed.state_file")
+    cookie = nagiosplugin.Cookie(state_file)
     cookie.open()
     new_tl = cookie.get("timeline")
     cookie.close()
@@ -138,23 +141,22 @@ def test_node_tl_has_changed_ko_with_state_file_and_save(
     assert new_tl == 58
 
 
+@pytest.mark.usefixtures("node_tl_has_changed")
 def test_node_tl_has_changed_params(
-    mocker: MockerFixture, use_old_replica_state: bool
+    runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
 ) -> None:
     # This one is placed last because it seems like the exceptions are not flushed from stderr for the next tests.
-    runner = CliRunner()
-
-    my_mock(mocker, "node_tl_has_changed", 200)
+    fake_state_file = tmp_path / "fake_file_name.state_file"
     result = runner.invoke(
         main,
         [
             "-e",
-            "https://10.20.199.3:8008",
+            patroni_api.endpoint,
             "node_tl_has_changed",
             "--timeline",
             "58",
             "--state-file",
-            str(here / "fake_file_name.state_file"),
+            str(fake_state_file),
         ],
     )
     assert result.exit_code == 3
@@ -163,9 +165,7 @@ def test_node_tl_has_changed_params(
         == "NODETLHASCHANGED UNKNOWN: click.exceptions.UsageError: Either --timeline or --state-file should be provided for this service\n"
     )
 
-    result = runner.invoke(
-        main, ["-e", "https://10.20.199.3:8008", "node_tl_has_changed"]
-    )
+    result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_tl_has_changed"])
     assert result.exit_code == 3
     assert (
         result.stdout
@@ -1,49 +0,0 @@
-import json
-import pathlib
-from typing import Any
-
-from pytest_mock import MockerFixture
-
-from check_patroni.types import APIError, PatroniResource
-
-here = pathlib.Path(__file__).parent
-
-
-def getjson(name: str) -> Any:
-    path = here / "json" / f"{name}.json"
-    if not path.exists():
-        raise Exception(f"path does not exist : {path}")
-
-    with path.open() as f:
-        return json.load(f)
-
-
-def my_mock(
-    mocker: MockerFixture,
-    json_file: str,
-    status: int,
-    use_old_replica_state: bool = False,
-) -> None:
-    def mock_rest_api(self: PatroniResource, service: str) -> Any:
-        if status != 200:
-            raise APIError("Test en erreur pour status code 200")
-        if json_file:
-            if use_old_replica_state and (
-                json_file.startswith("cluster_has_replica")
-                or json_file.startswith("cluster_node_count")
-            ):
-                return cluster_api_set_replica_running(getjson(json_file))
-            return getjson(json_file)
-        return None
-
-    mocker.resetall()
-    mocker.patch("check_patroni.types.PatroniResource.rest_api", mock_rest_api)
-
-
-def cluster_api_set_replica_running(js: Any) -> Any:
-    # starting from 3.0.4 the state of replicas is streaming instead of running
-    for node in js["members"]:
-        if node["role"] in ["replica", "sync_standby"]:
-            if node["state"] == "streaming":
-                node["state"] = "running"
-    return js
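The deleted `my_mock` helper above patched `PatroniResource.rest_api` directly, so the HTTP client code was never exercised. Per the changelog, it was replaced by a `patroni_api` fixture that serves the same canned JSON over a real HTTP server. A minimal sketch of that idea using only the standard library (the payload below is illustrative, and the project's actual `PatroniAPI` fixture may be implemented differently):

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Canned payload standing in for one of the tests/json/*.json files
# (hypothetical content, for illustration only).
RESPONSES = {
    "/cluster": {"members": [{"name": "srv1", "role": "leader", "state": "running"}]},
}

class FakePatroniHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        payload = RESPONSES.get(self.path)
        self.send_response(200 if payload is not None else 404)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        if payload is not None:
            self.wfile.write(json.dumps(payload).encode())

    def log_message(self, *args: object) -> None:
        pass  # keep test output quiet

# Bind to an ephemeral port and serve in the background, as a fixture would.
server = HTTPServer(("127.0.0.1", 0), FakePatroniHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
endpoint = f"http://127.0.0.1:{server.server_port}"

# A check invoked with `-e <endpoint>` now goes through real request/response
# handling instead of a patched method.
with urlopen(f"{endpoint}/cluster") as resp:
    members = json.load(resp)["members"]
server.shutdown()
```

This is why the tests above pass `patroni_api.endpoint` instead of a hard-coded URL: each test points the CLI at the fake server rather than stubbing the API layer.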
10
tox.ini
@@ -4,11 +4,9 @@ envlist = lint, mypy, py{37,38,39,310,311}
 skip_missing_interpreters = True
 
 [testenv]
-deps =
-    pytest
-    pytest-mock
+extras = test
 commands =
-    pytest {toxinidir}/check_patroni {toxinidir}/tests {posargs:-vv}
+    pytest {toxinidir}/check_patroni {toxinidir}/tests {posargs:-vv --log-level=debug}
 
 [testenv:lint]
 skip_install = True
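Switching from an explicit `deps` list to `extras = test` tells tox to install the package's own `test` extra, so test dependencies are declared once in the packaging metadata instead of being duplicated in `tox.ini`. As an illustration only (the names in the project's actual packaging files may differ), such an extra is declared like this:

```ini
; setup.cfg — hypothetical fragment
[options.extras_require]
test =
    pytest
```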
@@ -18,7 +16,7 @@ deps =
     flake8
     isort
 commands =
-    codespell {toxinidir}/check_patroni {toxinidir}/tests
+    codespell {toxinidir}/check_patroni {toxinidir}/tests {toxinidir}/docs/ {toxinidir}/RELEASE.md {toxinidir}/CONTRIBUTING.md
     black --check --diff {toxinidir}/check_patroni {toxinidir}/tests
     flake8 {toxinidir}/check_patroni {toxinidir}/tests
     isort --check --diff {toxinidir}/check_patroni {toxinidir}/tests
@@ -28,7 +26,7 @@ deps =
     mypy == 0.961
 commands =
     # we need to install types-requests
-    mypy --install-types --non-interactive {toxinidir}/check_patroni
+    mypy --install-types --non-interactive
 
 [testenv:build]
 deps =
@@ -100,7 +100,7 @@ http://$IP/icingaweb2/setup
 
 Finish
 
-* Screen 15: Hopefuly success
+* Screen 15: Hopefully success
 
 Login
 
@@ -66,7 +66,7 @@ icinga_setup(){
 info "# Icinga setup"
 info "#============================================================================="
 
-## this part is already done by the standart icinga install with the user icinga2
+## this part is already done by the standard icinga install with the user icinga2
 ## and a random password, here we dont really care
 
 cat << __EOF__ | sudo -u postgres psql
@@ -83,7 +83,7 @@ __EOF__
 icingacli setup config directory --group icingaweb2
 icingacli setup token create
 
-## this part is already done by the standart icinga install with the user icinga2
+## this part is already done by the standard icinga install with the user icinga2
 cat << __EOF__ > /etc/icinga2/features-available/ido-pgsql.conf
 /**
  * The db_ido_pgsql library implements IDO functionality
@@ -198,7 +198,7 @@ grafana(){
 cat << __EOF__ > /etc/grafana/grafana.ini
 [database]
 # You can configure the database connection by specifying type, host, name, user and password
-# as seperate properties or as on string using the url propertie.
+# as separate properties or as on string using the url property.
 
 # Either "mysql", "postgres" or "sqlite3", it's your choice
 type = postgres