Compare commits
No commits in common. "debian/latest" and "pristine-tar" have entirely different histories.
.coveragerc (3 lines removed)
@@ -1,3 +0,0 @@
[run]
include =
    check_patroni/*
.flake8 (13 lines removed)
@@ -1,13 +0,0 @@
[flake8]
doctests = True
ignore =
    # line too long
    E501,
    # line break before binary operator (added by black)
    W503,
exclude =
    .git,
    .mypy_cache,
    .tox,
    .venv,
mypy_config = mypy.ini
.github/workflows/lint.yml (vendored, 16 lines removed)
@@ -1,16 +0,0 @@
name: Lint

on: [push, pull_request]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - uses: actions/setup-python@v2
    - name: Install tox
      run: pip install tox
    - name: Lint (black & flake8)
      run: tox -e lint
    - name: Mypy
      run: tox -e mypy
.github/workflows/publish.yml (vendored, 28 lines removed)
@@ -1,28 +0,0 @@
name: Publish

on:
  push:
    tags:
      - 'v*'

jobs:
  publish:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v2
    - uses: actions/setup-python@v2
      with:
        python-version: '3.10'
    - name: Install
      run: python -m pip install setuptools wheel twine
    - name: Build
      run: |
        python setup.py check
        python setup.py sdist bdist_wheel
        python -m twine check dist/*
    - name: Publish
      run: python -m twine upload dist/*
      env:
        TWINE_USERNAME: __token__
        TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }}
.github/workflows/tests.yml (vendored, 22 lines removed)
@@ -1,22 +0,0 @@
name: Tests

on: [push, pull_request]

jobs:
  tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        include:
          - python: "3.7"
          - python: "3.11"
    steps:
    - uses: actions/checkout@v2
    - name: Setup Python
      uses: actions/setup-python@v2
      with:
        python-version: ${{ matrix.python }}
    - name: Install tox
      run: pip install tox
    - name: Test
      run: tox -e py
.gitignore (vendored, 11 lines removed)
@@ -1,11 +0,0 @@
__pycache__/
check_patroni.egg-info
tests/config.ini
vagrant/.vagrant
vagrant/*.state_file
.*.swp
.coverage
.venv/
.tox/
dist/
build/
CHANGELOG.md (97 lines removed)
@@ -1,97 +0,0 @@
# Change log

## check_patroni 2.0.0 - 2024-04-09

### Changed

* In `cluster_node_count`, a healthy standby, sync replica or standby leader
  cannot be "in archive recovery" because this service doesn't check for lag
  and timelines.

### Added

* Add the timeline in the `cluster_has_replica` perfstats. (#50)
* Add a mention about shell completion support and shell versions in the doc. (#53)
* Add the leader type and whether it's archiving to the `cluster_has_leader` perfstats. (#58)

### Fixed

* Add compatibility with [requests](https://requests.readthedocs.io)
  version 2.25 and higher.
* Fix what `cluster_has_replica` deems a healthy replica. (#50, reported by @mbanck)
* Fix `cluster_has_replica` to display perfstats for replicas whenever it's possible (healthy or not). (#50)
* Fix `cluster_has_leader` to correctly check for standby leaders. (#58, reported by @mbanck)
* Fix `cluster_node_count` to correctly manage replication states. (#50, reported by @mbanck)

### Misc

* Improve the documentation for `node_is_replica`.
* Improve test coverage by running an HTTP server to fake the Patroni API. (#55 by @dlax)
* Work around old pytest versions in type annotations in the test suite.
* Declare compatibility with click version 7.1 (or higher).
* In tests, work around nagiosplugin 1.3.2 not properly handling stdout
  redirection.

## check_patroni 1.0.0 - 2023-08-28

check_patroni is now tagged as Production/Stable.

### Added

* Add `sync_standby` as a valid replica type for `cluster_has_replica`. (contributed by @mattpoel)
* Add info and options (`--sync-warning` and `--sync-critical`) about sync replicas to `cluster_has_replica`.
* Add a new service `cluster_has_scheduled_action` to warn of any scheduled switchover or restart.
* Add options to `node_is_replica` to check specifically for a synchronous (`--is-sync`) or asynchronous (`--is-async`) node.
* Add `standby-leader` as a valid leader type for `cluster_has_leader`.
* Add a new service `node_is_leader` to check if a node is a leader (which includes standby leader nodes).

### Fixed

* Fix the `node_is_alive` check. (#31)
* Fix the `cluster_has_replica` and `cluster_node_count` checks to account for
  the new replica state `streaming` introduced in v3.0.4. (#28, reported by @log1-c)

### Misc

* Create CHANGELOG.md
* Add tests for the output of the scripts in addition to the return code
* Documentation in CONTRIBUTING.md

## check_patroni 0.2.0 - 2023-03-20

### Added

* Add a `--save` option when state files are used
* Modify `-e/--endpoints` to allow a comma-separated list of endpoints (#21, reported by @lihnjo)
* Use requests instead of urllib3 (with extensive help from @dlax)
* Change the way logging is handled (with extensive help from @dlax)

### Fixed

* Reverse the test for `node_is_pending`
* Fix SSL handling

### Misc

* Several doc fixes and updates
* Use spellcheck and isort
* Remove tests for python 3.6
* Add tests for python 3.11

## check_patroni 0.1.1 - 2022-07-15

The initial release covers the following checks:

* check a cluster for:
  + configuration change
  + presence of a leader
  + presence of a replica
  + maintenance status
* check a node for:
  + liveness
  + pending restart status
  + primary status
  + replica status
  + tl change
  + patroni version
CONTRIBUTING.md (94 lines removed)
@@ -1,94 +0,0 @@
# Contributing to check_patroni

Thanks for your interest in contributing to check_patroni.

## Clone Git Repository

Installation from the git repository:

```
$ git clone https://github.com/dalibo/check_patroni.git
$ cd check_patroni
```

Change the branch if necessary.

## Create Python Virtual Environment

You need a dedicated environment: install the dependencies and then
check_patroni from the repo:

```
$ python3 -m venv .venv
$ . .venv/bin/activate
(.venv) $ pip3 install .[test]
(.venv) $ pip3 install -r requirements-dev.txt
(.venv) $ check_patroni
```

To quit this env and destroy it:

```
$ deactivate
$ rm -r .venv
```

## Development Environment

A vagrant file is available in the `vagrant` directory to create an
icinga / opm / grafana stack and install check_patroni. You can then add a
server to the monitoring and watch the graphs in grafana.

A vagrant file can also be found in [this
repository](https://github.com/ioguix/vagrant-patroni) to generate a
patroni/etcd setup.

The `README.md` can be generated with `./docs/make_readme.sh`.

## Executing Tests

Crafting repeatable tests using a live Patroni cluster can be intricate. To
simplify the development process, a fake HTTP server is set up as a test
fixture and serves static files (either from the `tests/json` directory or
from in-memory data).

One potential drawback: if the JSON data is incorrect, or if modifications
have been made to Patroni without corresponding updates to the tests
documented here, the tests might still pass erroneously.
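The fixture idea described above can be sketched with the standard library alone. This is an illustrative sketch, not the actual fixture from the test suite (the function name and return shape are assumptions):

```python
import http.server
import threading
from functools import partial

def serve_static_dir(directory: str):
    """Serve *directory* over HTTP on a free local port, in a daemon thread.

    Checks under test can be pointed at this fake "Patroni API" instead of
    a live cluster. Returns (server, base_url); call server.shutdown()
    when done.
    """
    # SimpleHTTPRequestHandler serves files relative to `directory`.
    handler = partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    # Port 0 lets the OS pick a free port, so parallel test runs don't clash.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}"
```

A pytest fixture would wrap this in setup/teardown and hand `base_url` to `check_patroni -e`.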
The tests are executed automatically for each PR by the CI (see
`.github/workflows/lint.yml` and `.github/workflows/tests.yml`).

Running the tests,

* manually:

  ```bash
  pytest --cov tests
  ```

* or using tox:

  ```bash
  tox -e lint  # mypy + flake8 + black + isort + codespell
  tox          # pytest and "lint" tests for all supported versions of python
  tox -e py    # pytest and "lint" tests for the default version of python
  ```

Please note that when dealing with any service that checks the state of a
node, the related tests must use the `old_replica_state` fixture to test with
both old (pre 3.0.4) and new replica states.

A bash script, `check_patroni.sh`, is provided to facilitate testing all
services on a Patroni endpoint (`./vagrant/check_patroni.sh`). It requires one
parameter: the endpoint URL that will be used as the argument for the
`-e/--endpoints` option of `check_patroni`. This script essentially compiles a
list of service calls and executes them sequentially. It creates a state file
in the directory from which you run the script.

Here's an example usage:

```bash
./vagrant/check_patroni.sh http://10.20.30.51:8008
```
LICENSE (19 lines removed)
@@ -1,19 +0,0 @@
PostgreSQL Licence

Copyright (c) 2022, DALIBO

Permission to use, copy, modify, and distribute this software and its
documentation for any purpose, without fee, and without a written agreement is
hereby granted, provided that the above copyright notice and this paragraph and
the following two paragraphs appear in all copies.

IN NO EVENT SHALL DALIBO BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL,
INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE
USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF DALIBO HAS BEEN ADVISED OF
THE POSSIBILITY OF SUCH DAMAGE.

DALIBO SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE
SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND DALIBO HAS NO
OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR
MODIFICATIONS.
MANIFEST.in (10 lines removed)
@@ -1,10 +0,0 @@
include *.md
include mypy.ini
include pytest.ini
include tox.ini
include .coveragerc
include .flake8
include pyproject.toml
recursive-include docs *.sh
recursive-include tests *.json
recursive-include tests *.py
README.md (514 lines removed)
@@ -1,514 +0,0 @@
# check_patroni

A Nagios plugin for Patroni.

## Features

- Check presence of leader, replicas, node counts.
- Check each node for replication status.

```
Usage: check_patroni [OPTIONS] COMMAND [ARGS]...

  Nagios plugin that uses Patroni's REST API to monitor a Patroni cluster.

Options:
  --config FILE         Read option defaults from the specified INI file
                        [default: config.ini]
  -e, --endpoints TEXT  Patroni API endpoint. Can be specified multiple times
                        or as a list of comma separated addresses. The node
                        services check the status of one node, therefore if
                        several addresses are specified they should point to
                        different interfaces on the same node. The cluster
                        services check the status of the cluster, therefore
                        it's better to give a list of all Patroni node
                        addresses.  [default: http://127.0.0.1:8008]
  --cert_file PATH      File with the client certificate.
  --key_file PATH       File with the client key.
  --ca_file PATH        The CA certificate.
  -v, --verbose         Increase verbosity -v (info)/-vv (warning)/-vvv
                        (debug)
  --version
  --timeout INTEGER     Timeout in seconds for the API queries (0 to disable)
                        [default: 2]
  --help                Show this message and exit.

Commands:
  cluster_config_has_changed    Check if the hash of the configuration...
  cluster_has_leader            Check if the cluster has a leader.
  cluster_has_replica           Check if the cluster has healthy replicas...
  cluster_has_scheduled_action  Check if the cluster has a scheduled...
  cluster_is_in_maintenance     Check if the cluster is in maintenance...
  cluster_node_count            Count the number of nodes in the cluster.
  node_is_alive                 Check if the node is alive ie patroni is...
  node_is_leader                Check if the node is a leader node.
  node_is_pending_restart       Check if the node is in pending restart...
  node_is_primary               Check if the node is the primary with the...
  node_is_replica               Check if the node is a replica with no...
  node_patroni_version          Check if the version is equal to the input
  node_tl_has_changed           Check if the timeline has changed.
```
## Install

check_patroni is licensed under the PostgreSQL license.

```
$ pip install git+https://github.com/dalibo/check_patroni.git
```

check_patroni works on python 3.6; we keep it that way because Patroni also
supports it and there are still lots of RH 7 variants around. That being said,
python 3.6 has been EOL for ages and there is no support for it in the GitHub
CI.

## Support

If you hit a bug or need help, open a [GitHub
issue](https://github.com/dalibo/check_patroni/issues/new). Dalibo has no
commitment on response time for public free support. Thanks for your
contribution!

## Config file

All global and service-specific parameters can be specified via a config file
as follows:

```
[options]
endpoints = https://10.20.199.3:8008, https://10.20.199.4:8008, https://10.20.199.5:8008
cert_file = ./ssl/my-cert.pem
key_file = ./ssl/my-key.pem
ca_file = ./ssl/CA-cert.pem
timeout = 0

[options.node_is_replica]
lag = 100
```

## Thresholds

The format for the threshold parameters is `[@][start:][end]`.

* `start:` may be omitted if `start == 0`
* `~:` means that start is negative infinity
* If `end` is omitted, infinity is assumed
* To invert the match condition, prefix the range expression with `@`.

A match is found when: `start <= VALUE <= end`.

For example, the following command will raise:

* a warning if there are fewer than 2 nodes, which can be translated to outside of range [2;+INF[
* a critical if there are no nodes, which can be translated to outside of range [1;+INF[

```
check_patroni -e https://10.20.199.3:8008 cluster_has_replica --warning 2: --critical 1:
```
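As an illustration of these rules, here is a simplified sketch of the range matching (not check_patroni's actual parser, which relies on the nagiosplugin library):

```python
def in_range(value: float, spec: str) -> bool:
    """Return True when *value* matches a Nagios-style range `[@][start:][end]`.

    An alert is raised when the value falls outside the range, or inside
    it when the spec is prefixed with `@`.
    """
    invert = spec.startswith("@")
    if invert:
        spec = spec[1:]
    start, sep, end = spec.partition(":")
    if not sep:                 # plain "end" form: start defaults to 0
        start, end = "0", start
    lo = float("-inf") if start == "~" else float(start or 0)
    hi = float("inf") if end == "" else float(end)
    inside = lo <= value <= hi
    return not inside if invert else inside
```

With the example command above, `in_range(1, "2:")` is False (warning: fewer than 2 healthy replicas) and `in_range(0, "1:")` is False (critical: none at all).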
## SSL

Several options are available:

* the server's CA certificate is not available or trusted by the client system:
  * `--ca_file`: your certification chain `cat CA-certificate server-certificate > cabundle`
* you have a client certificate for authenticating with Patroni's REST API:
  * `--cert_file`: your certificate or the concatenation of your certificate and private key
  * `--key_file`: your private key (optional)

## Shell completion

We use the [click] library, which supports shell completion natively.

Shell completion can be added by typing the following command or adding it to
a file specific to your shell of choice.

* for Bash (add to `~/.bashrc`):
  ```
  eval "$(_CHECK_PATRONI_COMPLETE=bash_source check_patroni)"
  ```
* for Zsh (add to `~/.zshrc`):
  ```
  eval "$(_CHECK_PATRONI_COMPLETE=zsh_source check_patroni)"
  ```
* for Fish (add to `~/.config/fish/completions/check_patroni.fish`):
  ```
  eval "$(_CHECK_PATRONI_COMPLETE=fish_source check_patroni)"
  ```

Please note that shell completion is not supported for all shell versions: for
example, only Bash versions 4.4 and newer are supported.

[click]: https://click.palletsprojects.com/en/8.1.x/shell-completion/
## Cluster services

### cluster_config_has_changed

```
Usage: check_patroni cluster_config_has_changed [OPTIONS]

  Check if the hash of the configuration has changed.

  Note: either a hash or a state file must be provided for this service to
  work.

  Check:
  * `OK`: the hash didn't change
  * `CRITICAL`: the hash of the configuration has changed compared to the
  input (`--hash`) or last time (`--state_file`)

  Perfdata:
  * `is_configuration_changed` is 1 if the configuration has changed

Options:
  --hash TEXT            A hash to compare with.
  -s, --state-file TEXT  A state file to store the hash of the configuration.
  --save                 Set the current configuration hash as the reference
                         for future calls.
  --help                 Show this message and exit.
```
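The hash/state-file mechanics can be pictured with this rough sketch (a hypothetical helper for illustration, not the code used by check_patroni):

```python
import hashlib
import json
from pathlib import Path

def config_has_changed(config: dict, state_file: str, save: bool = False) -> bool:
    """Hash the (JSON-serializable) Patroni config and compare it with the
    hash stored by the previous run in *state_file*.

    Returns True when the configuration changed since the stored reference.
    """
    # sort_keys makes the digest stable across key ordering.
    digest = hashlib.md5(json.dumps(config, sort_keys=True).encode()).hexdigest()
    path = Path(state_file)
    previous = path.read_text().strip() if path.exists() else None
    if save or previous is None:
        path.write_text(digest)  # store the new reference
    return previous is not None and previous != digest
```

The first run seeds the state file; later runs only rewrite it when `save` is set, mirroring the `--save` option described above.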
### cluster_has_leader

```
Usage: check_patroni cluster_has_leader [OPTIONS]

  Check if the cluster has a leader.

  This check applies to any kind of leader, including standby leaders.

  A leader is a node with the "leader" role and a "running" state.

  A standby leader is a node with a "standby_leader" role and a "streaming"
  or "in archive recovery" state. Please note that log shipping could be
  stuck because the WAL files are not available or applicable. Patroni
  doesn't provide information about the origin cluster (timeline or lag), so
  we cannot check if there is a problem in that particular case. That's why
  we issue a warning when the node is "in archive recovery". We suggest
  using other monitoring tools to do this (eg. check_pgactivity).

  Check:
  * `OK`: if there is a leader node.
  * `WARNING`: if there is a standby leader in archive recovery.
  * `CRITICAL`: otherwise.

  Perfdata:
  * `has_leader` is 1 if there is any kind of leader node, 0 otherwise
  * `is_standby_leader_in_arc_rec` is 1 if the standby leader node is "in
  archive recovery", 0 otherwise
  * `is_standby_leader` is 1 if there is a standby leader node, 0 otherwise
  * `is_leader` is 1 if there is a "classical" leader node, 0 otherwise

Options:
  --help  Show this message and exit.
```
### cluster_has_replica

```
Usage: check_patroni cluster_has_replica [OPTIONS]

  Check if the cluster has healthy replicas and/or if some are sync
  standbies.

  For patroni (and this check):
  * a replica is `streaming` if `pg_stat_wal_receiver` says so.
  * a replica is `in archive recovery` if it's not `streaming` and has a
  `restore_command`.

  A healthy replica:
  * has a `replica` or `sync_standby` role
  * has the same timeline as the leader and
  * is in `running` state (patroni < V3.0.4)
  * is in `streaming` or `in archive recovery` state (patroni >= V3.0.4)
  * has a lag lower than or equal to `max_lag`

  Please note that a replica `in archive recovery` could be stuck because
  the WAL files are not available or applicable (the server's timeline has
  diverged from the leader's). We already detect the latter but we will miss
  the former. Therefore, it's preferable to check for the lag in addition to
  the healthy state if you rely on log shipping to help lagging standbies
  catch up.

  Since we require a healthy replica to have the same timeline as the
  leader, it's possible that we raise alerts when the cluster is performing
  a switchover or failover and the standbies are in the process of catching
  up with the new leader. The alert shouldn't last long.

  Check:
  * `OK`: if the healthy_replica count and their lag are compatible with the
  replica count threshold, and if the sync_replica count is compatible with
  the sync replica count threshold.
  * `WARNING` / `CRITICAL`: otherwise

  Perfdata:
  * healthy_replica & unhealthy_replica count
  * the number of sync_replica; they are included in the previous count
  * the lag of each replica, labelled "member name"_lag
  * the timeline of each replica, labelled "member name"_timeline
  * a boolean telling if the node is a sync standby, labelled "member
  name"_sync

Options:
  -w, --warning TEXT    Warning threshold for the number of healthy replica
                        nodes.
  -c, --critical TEXT   Critical threshold for the number of healthy replica
                        nodes.
  --sync-warning TEXT   Warning threshold for the number of sync replicas.
  --sync-critical TEXT  Critical threshold for the number of sync replicas.
  --max-lag TEXT        maximum allowed lag
  --help                Show this message and exit.
```
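The healthy-replica rules quoted above can be condensed into a predicate like this (a sketch over a member dict shaped loosely like Patroni's `/cluster` output; the field names are assumptions for illustration):

```python
def is_healthy_replica(member: dict, leader_timeline: int, max_lag=None) -> bool:
    """Apply the healthy-replica rules to one cluster member."""
    # Rule 1: only replica-ish roles qualify.
    if member.get("role") not in ("replica", "sync_standby"):
        return False
    # Rule 2: must share the leader's timeline.
    if member.get("timeline") != leader_timeline:
        return False
    # Rule 3: patroni >= 3.0.4 reports "streaming" / "in archive recovery";
    # older versions only report "running".
    if member.get("state") not in ("running", "streaming", "in archive recovery"):
        return False
    # Rule 4: lag must be at or under the threshold, when one is given.
    lag = member.get("lag")
    return max_lag is None or (isinstance(lag, int) and lag <= max_lag)
```

This also shows why a timeline mismatch during switchover briefly flags otherwise healthy standbies, as the help text warns.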
### cluster_has_scheduled_action

```
Usage: check_patroni cluster_has_scheduled_action [OPTIONS]

  Check if the cluster has a scheduled action (switchover or restart).

  Check:
  * `OK`: if the cluster has no scheduled action
  * `CRITICAL`: otherwise.

  Perfdata:
  * `scheduled_actions` is 1 if the cluster has scheduled actions.
  * `scheduled_switchover` is 1 if the cluster has a scheduled switchover.
  * `scheduled_restart` counts the number of scheduled restarts in the
  cluster.

Options:
  --help  Show this message and exit.
```

### cluster_is_in_maintenance

```
Usage: check_patroni cluster_is_in_maintenance [OPTIONS]

  Check if the cluster is in maintenance mode or paused.

  Check:
  * `OK`: if the cluster is in maintenance mode.
  * `CRITICAL`: otherwise.

  Perfdata:
  * `is_in_maintenance` is 1 if the cluster is in maintenance mode, 0
  otherwise

Options:
  --help  Show this message and exit.
```
### cluster_node_count

```
Usage: check_patroni cluster_node_count [OPTIONS]

  Count the number of nodes in the cluster.

  The role refers to the role of the server in the cluster. Possible values
  are:
  * master or leader
  * replica
  * standby_leader
  * sync_standby
  * demoted
  * promoted
  * uninitialized

  The state refers to the state of PostgreSQL. Possible values are:
  * initializing new cluster, initdb failed
  * running custom bootstrap script, custom bootstrap failed
  * starting, start failed
  * restarting, restart failed
  * running, streaming, in archive recovery
  * stopping, stopped, stop failed
  * creating replica
  * crashed

  The "healthy" check only ensures that:
  * a leader has the running state
  * a standby_leader has the running or streaming (V3.0.4) state
  * a replica or sync_standby has the running or streaming (V3.0.4) state

  Since we don't check the lag or timeline, "in archive recovery" is not
  considered a valid state for this service. See cluster_has_leader and
  cluster_has_replica for specialized checks.

  Check:
  * Compares the number of nodes against the normal and healthy node warning
  and critical thresholds.
  * `OK`: if they are not provided.

  Perfdata:
  * `members`: the member count.
  * `healthy_members`: the running and streaming member count.
  * all the roles of the nodes in the cluster with their count (start with
  "role_").
  * all the states of the nodes in the cluster with their count (start with
  "state_").

Options:
  -w, --warning TEXT       Warning threshold for the number of nodes.
  -c, --critical TEXT      Critical threshold for the number of nodes.
  --healthy-warning TEXT   Warning threshold for the number of healthy nodes
                           (running + streaming).
  --healthy-critical TEXT  Critical threshold for the number of healthy nodes
                           (running + streaming).
  --help                   Show this message and exit.
```
## Node services

### node_is_alive

```
Usage: check_patroni node_is_alive [OPTIONS]

  Check if the node is alive, i.e. patroni is running. This is a liveness
  check as defined in Patroni's documentation.

  Check:
  * `OK`: if patroni is running.
  * `CRITICAL`: otherwise.

  Perfdata:
  * `is_running` is 1 if patroni is running, 0 otherwise

Options:
  --help  Show this message and exit.
```
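A liveness check of this kind boils down to a single HTTP call: Patroni exposes a `/liveness` endpoint that answers 200 while the daemon runs. A minimal sketch (not check_patroni's implementation):

```python
import urllib.error
import urllib.request

def node_is_alive(endpoint: str, timeout: float = 2.0) -> bool:
    """Return True when Patroni answers HTTP 200 on its /liveness endpoint."""
    try:
        with urllib.request.urlopen(f"{endpoint}/liveness", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or non-2xx status: patroni is not alive.
        return False
```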
### node_is_pending_restart

```
Usage: check_patroni node_is_pending_restart [OPTIONS]

  Check if the node is in pending restart state.

  This situation can arise if the configuration has been modified but
  requires a restart of PostgreSQL to take effect.

  Check:
  * `OK`: if the node has no pending restart tag.
  * `CRITICAL`: otherwise

  Perfdata: `is_pending_restart` is 1 if the node has a pending restart tag,
  0 otherwise.

Options:
  --help  Show this message and exit.
```

### node_is_leader

```
Usage: check_patroni node_is_leader [OPTIONS]

  Check if the node is a leader node.

  This check applies to any kind of leader, including standby leaders. To
  check explicitly for a standby leader, use the `--is-standby-leader`
  option.

  Check:
  * `OK`: if the node is a leader.
  * `CRITICAL`: otherwise

  Perfdata: `is_leader` is 1 if the node is a leader node, 0 otherwise.

Options:
  --is-standby-leader  Check for a standby leader
  --help               Show this message and exit.
```

### node_is_primary

```
Usage: check_patroni node_is_primary [OPTIONS]

  Check if the node is the primary with the leader lock.

  This service is not valid for a standby leader, because this kind of node
  is not a primary.

  Check:
  * `OK`: if the node is a primary with the leader lock.
  * `CRITICAL`: otherwise

  Perfdata: `is_primary` is 1 if the node is a primary with the leader lock,
  0 otherwise.

Options:
  --help  Show this message and exit.
```

### node_is_replica

```
Usage: check_patroni node_is_replica [OPTIONS]

  Check if the node is a replica with no noloadbalance tag.

  It is possible to check if the node is synchronous or asynchronous. If
  nothing is specified, any kind of replica is accepted. When checking for a
  synchronous replica, it's not possible to specify a lag.

  This service uses the following Patroni endpoints: replica, asynchronous
  and synchronous. The first two implement the `lag` tag. For these
  endpoints the state of a replica node doesn't reflect the replication
  state (`streaming` or `in archive recovery`), we only know if it's
  `running`. The timeline is also not checked.

  Therefore, if a cluster is using asynchronous replication, it is
  recommended to check for the lag to detect a divergence as soon as
  possible.

  Check:
  * `OK`: if the node is a running replica with no noloadbalance tag and the
  lag is under the maximum threshold.
  * `CRITICAL`: otherwise

  Perfdata: `is_replica` is 1 if the node is a running replica with no
  noloadbalance tag and the lag is under the maximum threshold, 0 otherwise.

Options:
  --max-lag TEXT  maximum allowed lag
  --is-sync       check if the replica is synchronous
  --is-async      check if the replica is asynchronous
  --help          Show this message and exit.
```

### node_patroni_version

```
Usage: check_patroni node_patroni_version [OPTIONS]

  Check if the version is equal to the input.

  Check:
  * `OK`: the version is the same as the input `--patroni-version`
  * `CRITICAL`: otherwise.

  Perfdata:
  * `is_version_ok` is 1 if the version is ok, 0 otherwise

Options:
  --patroni-version TEXT  Patroni version to compare to  [required]
  --help                  Show this message and exit.
```

### node_tl_has_changed

```
Usage: check_patroni node_tl_has_changed [OPTIONS]

  Check if the timeline has changed.

  Note: either a timeline or a state file must be provided for this service
  to work.

  Check:
  * `OK`: the timeline is the same as last time (`--state_file`) or the
  input timeline (`--timeline`)
  * `CRITICAL`: the timeline is not the same.

  Perfdata:
  * `is_timeline_changed` is 1 if the timeline has changed, 0 otherwise
  * the timeline

Options:
  --timeline TEXT        A timeline number to compare with.
  -s, --state-file TEXT  A state file to store the last timeline number into.
  --save                 Set the current timeline number as the reference for
                         future calls.
  --help                 Show this message and exit.
```
RELEASE.md (38 lines removed)
@@ -1,38 +0,0 @@
# Release HOW TO

## Preparatory changes

* Review the **Unreleased** section, if any, in `CHANGELOG.md`, possibly
  adding any missing item from closed issues, merged pull requests, or
  directly the git history[^git-changes],
* Rename the **Unreleased** section according to the version to be released,
  with a date,
* Bump the version in `check_patroni/__init__.py`,
* Rebuild the `README.md` (`cd docs; ./make_readme.sh`),
* Commit these changes (either on a dedicated branch, before submitting a
  pull request, or directly on the `master` branch) with the commit message
  `release X.Y.Z`.
* Then, when the changes have landed in the `master` branch, create an
  annotated (and possibly signed) tag, as
  `git tag -a [-s] -m 'release X.Y.Z' vX.Y.Z`, and,
* Push with `--follow-tags`.

[^git-changes]: Use `git log $(git describe --tags --abbrev=0).. --format=%s
    --reverse` to get commits from the previous tag.

## PyPI package

The package is generated and uploaded to PyPI when a `v*` tag is created (see
`.github/workflows/publish.yml`).

Alternatively, the release can be done manually with:

```
tox -e build
tox -e upload
```

## GitHub release

Draft a new release from the release page, choosing the tag just pushed, and
copy the relevant change log section as a description.
check-patroni_1.0.0.orig.tar.gz.delta  (binary file not shown)
check-patroni_1.0.0.orig.tar.gz.id:
b2b3623e4494aa159395ea754eaeb599bd3e73cf

check-patroni_2.0.0.orig.tar.gz.delta  (binary file not shown)
check-patroni_2.0.0.orig.tar.gz.id:
a53d3e6184ae38be80088911068e7b30d2ac2b22
import logging

__version__ = "2.0.0"

_log: logging.Logger = logging.getLogger(__name__)
from .cli import main

if __name__ == "__main__":
    main()
|
@ -1,809 +0,0 @@
|
|||
import logging
|
||||
import re
|
||||
from configparser import ConfigParser
|
||||
from typing import List
|
||||
|
||||
import click
|
||||
import nagiosplugin
|
||||
|
||||
from . import __version__, _log
|
||||
from .cluster import (
|
||||
ClusterConfigHasChanged,
|
||||
ClusterConfigHasChangedSummary,
|
||||
ClusterHasLeader,
|
||||
ClusterHasLeaderSummary,
|
||||
ClusterHasReplica,
|
||||
ClusterHasScheduledAction,
|
||||
ClusterIsInMaintenance,
|
||||
ClusterNodeCount,
|
||||
)
|
||||
from .convert import size_to_byte
|
||||
from .node import (
|
||||
NodeIsAlive,
|
||||
NodeIsAliveSummary,
|
||||
NodeIsLeader,
|
||||
NodeIsLeaderSummary,
|
||||
NodeIsPendingRestart,
|
||||
NodeIsPendingRestartSummary,
|
||||
NodeIsPrimary,
|
||||
NodeIsPrimarySummary,
|
||||
NodeIsReplica,
|
||||
NodeIsReplicaSummary,
|
||||
NodePatroniVersion,
|
||||
NodePatroniVersionSummary,
|
||||
NodeTLHasChanged,
|
||||
NodeTLHasChangedSummary,
|
||||
)
|
||||
from .types import ConnectionInfo, Parameters
|
||||
|
||||
DEFAULT_CFG = "config.ini"
|
||||
handler = logging.StreamHandler()
|
||||
handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))
|
||||
_log.addHandler(handler)
|
||||
|
||||
|
||||
def print_version(ctx: click.Context, param: str, value: str) -> None:
|
||||
if not value or ctx.resilient_parsing:
|
||||
return
|
||||
click.echo(f"Version {__version__}")
|
||||
ctx.exit()
|
||||
|
||||
|
||||
def configure(ctx: click.Context, param: str, filename: str) -> None:
|
||||
"""Use a config file for the parameters
|
||||
stolen from https://jwodder.github.io/kbits/posts/click-config/
|
||||
"""
|
||||
# FIXME should use click-configfile / click-config-file ?
|
||||
cfg = ConfigParser()
|
||||
cfg.read(filename)
|
||||
ctx.default_map = {}
|
||||
for sect in cfg.sections():
|
||||
command_path = sect.split(".")
|
||||
if command_path[0] != "options":
|
||||
continue
|
||||
defaults = ctx.default_map
|
||||
for cmdname in command_path[1:]:
|
||||
defaults = defaults.setdefault(cmdname, {})
|
||||
defaults.update(cfg[sect])
|
||||
try:
|
||||
# endpoints is an array of addresses separated by ,
|
||||
if isinstance(defaults["endpoints"], str):
|
||||
defaults["endpoints"] = re.split(r"\s*,\s*", defaults["endpoints"])
|
||||
except KeyError:
|
||||
pass
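The callback above can be exercised standalone; here is a sketch with a hypothetical `config.ini` (the section naming and the comma-separated `endpoints` rule follow the code above, the option values themselves are invented):

```python
import re
from configparser import ConfigParser

# Hypothetical config.ini content: a top-level [options] section plus
# one per-command section, as the configure() callback expects.
SAMPLE = """
[options]
endpoints = https://node1:8008, https://node2:8008
timeout = 5

[options.node_is_replica]
max_lag = 1MB
"""

cfg = ConfigParser()
cfg.read_string(SAMPLE)

default_map: dict = {}
for sect in cfg.sections():
    path = sect.split(".")
    if path[0] != "options":
        continue
    defaults = default_map
    for cmd in path[1:]:
        defaults = defaults.setdefault(cmd, {})
    defaults.update(cfg[sect])

# endpoints is split on commas, exactly as in the callback above
if isinstance(default_map.get("endpoints"), str):
    default_map["endpoints"] = re.split(r"\s*,\s*", default_map["endpoints"])
```

After this, `default_map` holds global defaults at the top level and per-command defaults nested under the command name, which is the shape click's `ctx.default_map` consumes.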
|
||||
|
||||
|
||||
@click.group()
|
||||
@click.option(
|
||||
"--config",
|
||||
type=click.Path(dir_okay=False),
|
||||
default=DEFAULT_CFG,
|
||||
callback=configure,
|
||||
is_eager=True,
|
||||
expose_value=False,
|
||||
help="Read option defaults from the specified INI file",
|
||||
show_default=True,
|
||||
)
|
||||
@click.option(
|
||||
"-e",
|
||||
"--endpoints",
|
||||
"endpoints",
|
||||
type=str,
|
||||
multiple=True,
|
||||
default=["http://127.0.0.1:8008"],
|
||||
help=(
|
||||
"Patroni API endpoint. Can be specified multiple times or as a list "
|
||||
"of comma separated addresses. "
|
||||
"The node services check the status of one node, therefore if "
|
||||
"several addresses are specified they should point to different "
|
||||
"interfaces on the same node. The cluster services check the "
|
||||
"status of the cluster, therefore it's better to give a list of "
|
||||
"all Patroni node addresses."
|
||||
),
|
||||
show_default=True,
|
||||
)
|
||||
@click.option(
|
||||
"--cert_file",
|
||||
"cert_file",
|
||||
type=click.Path(exists=True),
|
||||
default=None,
|
||||
help="File with the client certificate.",
|
||||
)
|
||||
@click.option(
|
||||
"--key_file",
|
||||
"key_file",
|
||||
type=click.Path(exists=True),
|
||||
default=None,
|
||||
help="File with the client key.",
|
||||
)
|
||||
@click.option(
|
||||
"--ca_file",
|
||||
"ca_file",
|
||||
type=click.Path(exists=True),
|
||||
default=None,
|
||||
help="The CA certificate.",
|
||||
)
|
||||
@click.option(
|
||||
"-v",
|
||||
"--verbose",
|
||||
"verbose",
|
||||
count=True,
|
||||
default=0,
|
||||
help="Increase verbosity -v (info)/-vv (warning)/-vvv (debug)",
|
||||
show_default=False,
|
||||
)
|
||||
@click.option(
|
||||
"--version", is_flag=True, callback=print_version, expose_value=False, is_eager=True
|
||||
)
|
||||
@click.option(
|
||||
"--timeout",
|
||||
"timeout",
|
||||
default=2,
|
||||
type=int,
|
||||
help="Timeout in seconds for the API queries (0 to disable)",
|
||||
show_default=True,
|
||||
)
|
||||
@click.pass_context
|
||||
@nagiosplugin.guarded
|
||||
def main(
|
||||
ctx: click.Context,
|
||||
endpoints: List[str],
|
||||
cert_file: str,
|
||||
key_file: str,
|
||||
ca_file: str,
|
||||
verbose: int,
|
||||
timeout: int,
|
||||
) -> None:
|
||||
"""Nagios plugin that uses Patroni's REST API to monitor a Patroni cluster."""
|
||||
# FIXME Not all "is/has" services have the same return code for ok. Check if it's ok
|
||||
|
||||
# We use this to pass parameters instead of ctx.parent.params because the
|
||||
# latter is typed as Optional[Context] and mypy complains with the following
|
||||
# error unless we test if ctx.parent is none which looked ugly.
|
||||
#
|
||||
# error: Item "None" of "Optional[Context]" has an attribute "params" [union-attr]
|
||||
|
||||
# The config file allows endpoints to be specified as a comma separated list of endpoints
|
||||
# To avoid confusion, we allow the same in the command line parameters.
|
||||
tendpoints: List[str] = []
|
||||
for e in endpoints:
|
||||
tendpoints += re.split(r"\s*,\s*", e)
|
||||
endpoints = tendpoints
|
||||
|
||||
if verbose == 3:
|
||||
logging.getLogger("urllib3").addHandler(handler)
|
||||
logging.getLogger("urllib3").setLevel(logging.DEBUG)
|
||||
_log.setLevel(logging.DEBUG)
|
||||
|
||||
connection_info: ConnectionInfo
|
||||
if cert_file is None and key_file is None:
|
||||
connection_info = ConnectionInfo(endpoints, None, ca_file)
|
||||
else:
|
||||
connection_info = ConnectionInfo(endpoints, (cert_file, key_file), ca_file)
|
||||
|
||||
ctx.obj = Parameters(
|
||||
connection_info,
|
||||
timeout,
|
||||
verbose,
|
||||
)
|
||||
|
||||
|
||||
@main.command(name="cluster_node_count") # required otherwise _ are converted to -
|
||||
@click.option(
|
||||
"-w",
|
||||
"--warning",
|
||||
"warning",
|
||||
type=str,
|
||||
help="Warning threshold for the number of nodes.",
|
||||
)
|
||||
@click.option(
|
||||
"-c",
|
||||
"--critical",
|
||||
"critical",
|
||||
type=str,
|
||||
help="Critical threshold for the number of nodes.",
|
||||
)
|
||||
@click.option(
|
||||
"--healthy-warning",
|
||||
"healthy_warning",
|
||||
type=str,
|
||||
help="Warning threshold for the number of healthy nodes (running + streaming).",
|
||||
)
|
||||
@click.option(
|
||||
"--healthy-critical",
|
||||
"healthy_critical",
|
||||
type=str,
|
||||
help="Critical threshold for the number of healthy nodes (running + streaming).",
|
||||
)
|
||||
@click.pass_context
|
||||
@nagiosplugin.guarded
|
||||
def cluster_node_count(
|
||||
ctx: click.Context,
|
||||
warning: str,
|
||||
critical: str,
|
||||
healthy_warning: str,
|
||||
healthy_critical: str,
|
||||
) -> None:
|
||||
"""Count the number of nodes in the cluster.
|
||||
|
||||
\b
|
||||
The role refers to the role of the server in the cluster. Possible values
|
||||
are:
|
||||
* master or leader
|
||||
* replica
|
||||
* standby_leader
|
||||
* sync_standby
|
||||
* demoted
|
||||
* promoted
|
||||
* uninitialized
|
||||
|
||||
\b
|
||||
The state refers to the state of PostgreSQL. Possible values are:
|
||||
* initializing new cluster, initdb failed
|
||||
* running custom bootstrap script, custom bootstrap failed
|
||||
* starting, start failed
|
||||
* restarting, restart failed
|
||||
* running, streaming, in archive recovery
|
||||
* stopping, stopped, stop failed
|
||||
* creating replica
|
||||
* crashed
|
||||
|
||||
\b
|
||||
The "healthy" checks only ensure that:
|
||||
* a leader has the running state
|
||||
* a standby_leader has the running or streaming (V3.0.4) state
|
||||
* a replica or sync-standby has the running or streaming (V3.0.4) state
|
||||
|
||||
Since we don't check the lag or timeline, "in archive recovery" is not considered a valid state
|
||||
for this service. See cluster_has_leader and cluster_has_replica for specialized checks.
|
||||
|
||||
\b
|
||||
Check:
|
||||
* Compares the number of nodes against the normal and healthy nodes warning and critical thresholds.
|
||||
* `OK`: If they are not provided.
|
||||
|
||||
\b
|
||||
Perfdata:
|
||||
* `members`: the member count.
|
||||
* `healthy_members`: the running and streaming member count.
|
||||
* all the roles of the nodes in the cluster with their count (start with "role_").
|
||||
* all the statuses of the nodes in the cluster with their count (start with "state_").
|
||||
"""
|
||||
check = nagiosplugin.Check()
|
||||
check.add(
|
||||
ClusterNodeCount(ctx.obj.connection_info),
|
||||
nagiosplugin.ScalarContext(
|
||||
"members",
|
||||
warning,
|
||||
critical,
|
||||
),
|
||||
nagiosplugin.ScalarContext(
|
||||
"healthy_members",
|
||||
healthy_warning,
|
||||
healthy_critical,
|
||||
),
|
||||
nagiosplugin.ScalarContext("member_roles"),
|
||||
nagiosplugin.ScalarContext("member_statuses"),
|
||||
)
|
||||
check.main(verbose=ctx.obj.verbose, timeout=ctx.obj.timeout)
|
||||
|
||||
|
||||
@main.command(name="cluster_has_leader")
|
||||
@click.pass_context
|
||||
@nagiosplugin.guarded
|
||||
def cluster_has_leader(ctx: click.Context) -> None:
|
||||
"""Check if the cluster has a leader.
|
||||
|
||||
This check applies to any kind of leaders including standby leaders.
|
||||
|
||||
A leader is a node with the "leader" role and a "running" state.
|
||||
|
||||
A standby leader is a node with a "standby_leader" role and a "streaming"
|
||||
or "in archive recovery" state. Please note that log shipping could be
|
||||
stuck because the WAL are not available or applicable. Patroni doesn't
|
||||
provide information about the origin cluster (timeline or lag), so we
|
||||
cannot check if there is a problem in that particular case. That's why we
|
||||
issue a warning when the node is "in archive recovery". We suggest using
|
||||
other supervision tools to do this (eg. check_pgactivity).
|
||||
|
||||
\b
|
||||
Check:
|
||||
* `OK`: if there is a leader node.
|
||||
* `WARNING`: if there is a standby leader in archive recovery.
|
||||
* `CRITICAL`: otherwise.
|
||||
|
||||
\b
|
||||
Perfdata:
|
||||
* `has_leader` is 1 if there is any kind of leader node, 0 otherwise
|
||||
* `is_standby_leader_in_arc_rec` is 1 if the standby leader node is "in
|
||||
archive recovery", 0 otherwise
|
||||
* `is_standby_leader` is 1 if there is a standby leader node, 0 otherwise
|
||||
* `is_leader` is 1 if there is a "classical" leader node, 0 otherwise
|
||||
|
||||
"""
|
||||
check = nagiosplugin.Check()
|
||||
check.add(
|
||||
ClusterHasLeader(ctx.obj.connection_info),
|
||||
nagiosplugin.ScalarContext("has_leader", None, "@0:0"),
|
||||
nagiosplugin.ScalarContext("is_standby_leader_in_arc_rec", "@1:1", None),
|
||||
nagiosplugin.ScalarContext("is_leader", None, None),
|
||||
nagiosplugin.ScalarContext("is_standby_leader", None, None),
|
||||
ClusterHasLeaderSummary(),
|
||||
)
|
||||
check.main(verbose=ctx.obj.verbose, timeout=ctx.obj.timeout)
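The `ScalarContext` thresholds used throughout this module (`@0:0`, `@1:1`, `0:0`) follow the Nagios range syntax; the following is a tiny illustrative checker for just that simple `lo:hi` form (not a full range parser):

```python
def range_alerts(spec: str, value: float) -> bool:
    """Return True when `value` triggers an alert for the Nagios range `spec`.

    A plain "lo:hi" range alerts when the value falls OUTSIDE [lo, hi];
    a leading "@" inverts it, alerting when the value is INSIDE.
    Only the simple "lo:hi" form used in this module is handled.
    """
    inside_alerts = spec.startswith("@")
    lo, hi = (float(part) for part in spec.lstrip("@").split(":"))
    inside = lo <= value <= hi
    return inside if inside_alerts else not inside
```

So `"@0:0"` on `has_leader` means "critical when the metric is 0", while `"0:0"` on `is_in_maintenance` means "critical when the metric is anything but 0".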
|
||||
|
||||
|
||||
@main.command(name="cluster_has_replica")
|
||||
@click.option(
|
||||
"-w",
|
||||
"--warning",
|
||||
"warning",
|
||||
type=str,
|
||||
help="Warning threshold for the number of healthy replica nodes.",
|
||||
)
|
||||
@click.option(
|
||||
"-c",
|
||||
"--critical",
|
||||
"critical",
|
||||
type=str,
|
||||
help="Critical threshold for the number of healthy replica nodes.",
|
||||
)
|
||||
@click.option(
|
||||
"--sync-warning",
|
||||
"sync_warning",
|
||||
type=str,
|
||||
help="Warning threshold for the number of sync replica.",
|
||||
)
|
||||
@click.option(
|
||||
"--sync-critical",
|
||||
"sync_critical",
|
||||
type=str,
|
||||
help="Critical threshold for the number of sync replica.",
|
||||
)
|
||||
@click.option("--max-lag", "max_lag", type=str, help="maximum allowed lag")
|
||||
@click.pass_context
|
||||
@nagiosplugin.guarded
|
||||
def cluster_has_replica(
|
||||
ctx: click.Context,
|
||||
warning: str,
|
||||
critical: str,
|
||||
sync_warning: str,
|
||||
sync_critical: str,
|
||||
max_lag: str,
|
||||
) -> None:
|
||||
"""Check if the cluster has healthy replicas and/or if some are sync standbies
|
||||
|
||||
\b
|
||||
For patroni (and this check):
|
||||
* a replica is `streaming` if the `pg_stat_wal_receiver` says so.
|
||||
* a replica is `in archive recovery`, if it's not `streaming` and has a `restore_command`.
|
||||
|
||||
\b
|
||||
A healthy replica:
|
||||
* has a `replica` or `sync_standby` role
|
||||
* has the same timeline as the leader and
|
||||
* is in `running` state (patroni < V3.0.4)
|
||||
* is in `streaming` or `in archive recovery` state (patroni >= V3.0.4)
|
||||
* has a lag lower or equal to `max_lag`
|
||||
|
||||
Please note that replica `in archive recovery` could be stuck because the WAL
|
||||
are not available or applicable (the server's timeline has diverged from the
leader's). We already detect the latter but we will miss the former.
|
||||
Therefore, it's preferable to check for the lag in addition to the healthy
|
||||
state if you rely on log shipping to help lagging standbies to catch up.
|
||||
|
||||
Since we require a healthy replica to have the same timeline as the
|
||||
leader, it's possible that we raise alerts when the cluster is performing a
|
||||
switchover or failover and the standbies are in the process of catching up with
|
||||
the new leader. The alert shouldn't last long.
|
||||
|
||||
\b
|
||||
Check:
|
||||
* `OK`: if the healthy_replica count and their lag are compatible with the replica count threshold.
|
||||
and if the sync_replica count is compatible with the sync replica count threshold.
|
||||
* `WARNING` / `CRITICAL`: otherwise
|
||||
|
||||
\b
|
||||
Perfdata:
|
||||
* healthy_replica & unhealthy_replica count
|
||||
* the number of sync_replica, they are included in the previous count
|
||||
* the lag of each replica labelled with "member name"_lag
|
||||
* the timeline of each replica labelled with "member name"_timeline
|
||||
* a boolean to tell if the node is a sync standby, labelled with "member name"_sync
|
||||
"""
|
||||
|
||||
tmax_lag = size_to_byte(max_lag) if max_lag is not None else None
|
||||
check = nagiosplugin.Check()
|
||||
check.add(
|
||||
ClusterHasReplica(ctx.obj.connection_info, tmax_lag),
|
||||
nagiosplugin.ScalarContext(
|
||||
"healthy_replica",
|
||||
warning,
|
||||
critical,
|
||||
),
|
||||
nagiosplugin.ScalarContext(
|
||||
"sync_replica",
|
||||
sync_warning,
|
||||
sync_critical,
|
||||
),
|
||||
nagiosplugin.ScalarContext("unhealthy_replica"),
|
||||
nagiosplugin.ScalarContext("replica_lag"),
|
||||
nagiosplugin.ScalarContext("replica_timeline"),
|
||||
nagiosplugin.ScalarContext("replica_sync"),
|
||||
)
|
||||
check.main(verbose=ctx.obj.verbose, timeout=ctx.obj.timeout)
|
||||
|
||||
|
||||
@main.command(name="cluster_config_has_changed")
|
||||
@click.option("--hash", "config_hash", type=str, help="A hash to compare with.")
|
||||
@click.option(
|
||||
"-s",
|
||||
"--state-file",
|
||||
"state_file",
|
||||
type=str,
|
||||
help="A state file to store the hash of the configuration.",
|
||||
)
|
||||
@click.option(
|
||||
"--save",
|
||||
"save_config",
|
||||
is_flag=True,
|
||||
default=False,
|
||||
help="Set the current configuration hash as the reference for future calls.",
|
||||
)
|
||||
@click.pass_context
|
||||
@nagiosplugin.guarded
|
||||
def cluster_config_has_changed(
|
||||
ctx: click.Context, config_hash: str, state_file: str, save_config: bool
|
||||
) -> None:
|
||||
"""Check if the hash of the configuration has changed.
|
||||
|
||||
Note: either a hash or a state file must be provided for this service to work.
|
||||
|
||||
\b
|
||||
Check:
|
||||
* `OK`: The hash didn't change
|
||||
* `CRITICAL`: The hash of the configuration has changed compared to the input (`--hash`) or last time (`--state_file`)
|
||||
|
||||
\b
|
||||
Perfdata:
|
||||
* `is_configuration_changed` is 1 if the configuration has changed
|
||||
"""
|
||||
# Note: hash cannot be in the perf data = not a number
|
||||
if (config_hash is None and state_file is None) or (
|
||||
config_hash is not None and state_file is not None
|
||||
):
|
||||
raise click.UsageError(
|
||||
"Either --hash or --state-file should be provided for this service", ctx
|
||||
)
|
||||
|
||||
old_config_hash = config_hash
|
||||
if state_file is not None:
|
||||
cookie = nagiosplugin.Cookie(state_file)
|
||||
cookie.open()
|
||||
old_config_hash = cookie.get("hash")
|
||||
cookie.close()
|
||||
|
||||
check = nagiosplugin.Check()
|
||||
check.add(
|
||||
ClusterConfigHasChanged(
|
||||
ctx.obj.connection_info, old_config_hash, state_file, save_config
|
||||
),
|
||||
nagiosplugin.ScalarContext("is_configuration_changed", None, "@1:1"),
|
||||
ClusterConfigHasChangedSummary(old_config_hash),
|
||||
)
|
||||
check.main(verbose=ctx.obj.verbose, timeout=ctx.obj.timeout)
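What gets compared here is a stable hash of the cluster configuration; a minimal sketch follows (the digest algorithm and serialization are illustrative, not necessarily what the plugin computes):

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """Hash a configuration dict deterministically: serializing with
    sorted keys makes the digest independent of key order."""
    payload = json.dumps(config, sort_keys=True).encode()
    return hashlib.sha1(payload).hexdigest()
```

Because the keys are sorted before hashing, only a real configuration change (not a reordering) flips `is_configuration_changed`.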
|
||||
|
||||
|
||||
@main.command(name="cluster_is_in_maintenance")
|
||||
@click.pass_context
|
||||
@nagiosplugin.guarded
|
||||
def cluster_is_in_maintenance(ctx: click.Context) -> None:
|
||||
"""Check if the cluster is in maintenance mode or paused.
|
||||
|
||||
\b
|
||||
Check:
|
||||
* `OK`: If the cluster is in maintenance mode.
|
||||
* `CRITICAL`: otherwise.
|
||||
|
||||
\b
|
||||
Perfdata:
|
||||
* `is_in_maintenance` is 1 if the cluster is in maintenance mode, 0 otherwise
|
||||
"""
|
||||
check = nagiosplugin.Check()
|
||||
check.add(
|
||||
ClusterIsInMaintenance(ctx.obj.connection_info),
|
||||
nagiosplugin.ScalarContext("is_in_maintenance", None, "0:0"),
|
||||
)
|
||||
check.main(verbose=ctx.obj.verbose, timeout=ctx.obj.timeout)
|
||||
|
||||
|
||||
@main.command(name="cluster_has_scheduled_action")
|
||||
@click.pass_context
|
||||
@nagiosplugin.guarded
|
||||
def cluster_has_scheduled_action(ctx: click.Context) -> None:
|
||||
"""Check if the cluster has a scheduled action (switchover or restart)
|
||||
|
||||
\b
|
||||
Check:
|
||||
* `OK`: If the cluster has no scheduled action
|
||||
* `CRITICAL`: otherwise.
|
||||
|
||||
\b
|
||||
Perfdata:
|
||||
* `scheduled_actions` is 1 if the cluster has scheduled actions.
|
||||
* `scheduled_switchover` is 1 if the cluster has a scheduled switchover.
|
||||
* `scheduled_restart` counts the number of scheduled restarts in the cluster.
|
||||
"""
|
||||
check = nagiosplugin.Check()
|
||||
check.add(
|
||||
ClusterHasScheduledAction(ctx.obj.connection_info),
|
||||
nagiosplugin.ScalarContext("has_scheduled_actions", None, "0:0"),
|
||||
nagiosplugin.ScalarContext("scheduled_switchover"),
|
||||
nagiosplugin.ScalarContext("scheduled_restart"),
|
||||
)
|
||||
check.main(verbose=ctx.obj.verbose, timeout=ctx.obj.timeout)
|
||||
|
||||
|
||||
@main.command(name="node_is_primary")
|
||||
@click.pass_context
|
||||
@nagiosplugin.guarded
|
||||
def node_is_primary(ctx: click.Context) -> None:
|
||||
"""Check if the node is the primary with the leader lock.
|
||||
|
||||
This service is not valid for a standby leader, because this kind of node is not a primary.
|
||||
|
||||
\b
|
||||
Check:
|
||||
* `OK`: if the node is a primary with the leader lock.
|
||||
* `CRITICAL`: otherwise
|
||||
|
||||
Perfdata: `is_primary` is 1 if the node is a primary with the leader lock, 0 otherwise.
|
||||
"""
|
||||
check = nagiosplugin.Check()
|
||||
check.add(
|
||||
NodeIsPrimary(ctx.obj.connection_info),
|
||||
nagiosplugin.ScalarContext("is_primary", None, "@0:0"),
|
||||
NodeIsPrimarySummary(),
|
||||
)
|
||||
check.main(verbose=ctx.obj.verbose, timeout=ctx.obj.timeout)
|
||||
|
||||
|
||||
@main.command(name="node_is_leader")
|
||||
@click.option(
|
||||
"--is-standby-leader",
|
||||
"check_standby_leader",
|
||||
is_flag=True,
|
||||
default=False,
|
||||
help="Check for a standby leader",
|
||||
)
|
||||
@click.pass_context
|
||||
@nagiosplugin.guarded
|
||||
def node_is_leader(ctx: click.Context, check_standby_leader: bool) -> None:
|
||||
"""Check if the node is a leader node.
|
||||
|
||||
This check applies to any kind of leaders including standby leaders.
|
||||
To check explicitly for a standby leader use the `--is-standby-leader` option.
|
||||
|
||||
\b
|
||||
Check:
|
||||
* `OK`: if the node is a leader.
|
||||
* `CRITICAL`: otherwise
|
||||
|
||||
Perfdata: `is_leader` is 1 if the node is a leader node, 0 otherwise.
|
||||
"""
|
||||
check = nagiosplugin.Check()
|
||||
check.add(
|
||||
NodeIsLeader(ctx.obj.connection_info, check_standby_leader),
|
||||
nagiosplugin.ScalarContext("is_leader", None, "@0:0"),
|
||||
NodeIsLeaderSummary(check_standby_leader),
|
||||
)
|
||||
check.main(verbose=ctx.obj.verbose, timeout=ctx.obj.timeout)
|
||||
|
||||
|
||||
@main.command(name="node_is_replica")
|
||||
@click.option("--max-lag", "max_lag", type=str, help="maximum allowed lag")
|
||||
@click.option(
|
||||
"--is-sync",
|
||||
"check_is_sync",
|
||||
is_flag=True,
|
||||
default=False,
|
||||
help="check if the replica is synchronous",
|
||||
)
|
||||
@click.option(
|
||||
"--is-async",
|
||||
"check_is_async",
|
||||
is_flag=True,
|
||||
default=False,
|
||||
help="check if the replica is asynchronous",
|
||||
)
|
||||
@click.pass_context
|
||||
@nagiosplugin.guarded
|
||||
def node_is_replica(
|
||||
ctx: click.Context, max_lag: str, check_is_sync: bool, check_is_async: bool
|
||||
) -> None:
|
||||
"""Check if the node is a replica with no noloadbalance tag.
|
||||
|
||||
It is possible to check if the node is synchronous or asynchronous. If
|
||||
nothing is specified any kind of replica is accepted. When checking for a
|
||||
synchronous replica, it's not possible to specify a lag.
|
||||
|
||||
This service is using the following Patroni endpoints: replica, asynchronous
|
||||
and synchronous. The first two implement the `lag` tag. For these endpoints
|
||||
the state of a replica node doesn't reflect the replication state
|
||||
(`streaming` or `in archive recovery`), we only know if it's `running`. The
|
||||
timeline is also not checked.
|
||||
|
||||
Therefore, if a cluster is using asynchronous replication, it is
|
||||
recommended to check for the lag to detect a divergence as soon as possible.
|
||||
|
||||
\b
|
||||
Check:
|
||||
* `OK`: if the node is a running replica with no noloadbalance tag and the lag is under the maximum threshold.
|
||||
* `CRITICAL`: otherwise
|
||||
|
||||
Perfdata: `is_replica` is 1 if the node is a running replica with no noloadbalance tag and the lag is under the maximum threshold, 0 otherwise.
|
||||
"""
|
||||
|
||||
if check_is_sync and max_lag is not None:
|
||||
raise click.UsageError(
|
||||
"--is-sync and --max-lag cannot be provided at the same time for this service",
|
||||
ctx,
|
||||
)
|
||||
|
||||
if check_is_sync and check_is_async:
|
||||
raise click.UsageError(
|
||||
"--is-sync and --is-async cannot be provided at the same time for this service",
|
||||
ctx,
|
||||
)
|
||||
|
||||
check = nagiosplugin.Check()
|
||||
check.add(
|
||||
NodeIsReplica(ctx.obj.connection_info, max_lag, check_is_sync, check_is_async),
|
||||
nagiosplugin.ScalarContext("is_replica", None, "@0:0"),
|
||||
NodeIsReplicaSummary(max_lag, check_is_sync, check_is_async),
|
||||
)
|
||||
check.main(verbose=ctx.obj.verbose, timeout=ctx.obj.timeout)
|
||||
|
||||
|
||||
@main.command(name="node_is_pending_restart")
|
||||
@click.pass_context
|
||||
@nagiosplugin.guarded
|
||||
def node_is_pending_restart(ctx: click.Context) -> None:
|
||||
"""Check if the node is in pending restart state.
|
||||
|
||||
This situation can arise if the configuration has been modified but
|
||||
requires a restart of PostgreSQL to take effect.
|
||||
|
||||
\b
|
||||
Check:
|
||||
* `OK`: if the node has no pending restart tag.
|
||||
* `CRITICAL`: otherwise
|
||||
|
||||
Perfdata: `is_pending_restart` is 1 if the node has pending restart tag, 0 otherwise.
|
||||
"""
|
||||
check = nagiosplugin.Check()
|
||||
check.add(
|
||||
NodeIsPendingRestart(ctx.obj.connection_info),
|
||||
nagiosplugin.ScalarContext("is_pending_restart", None, "0:0"),
|
||||
NodeIsPendingRestartSummary(),
|
||||
)
|
||||
check.main(verbose=ctx.obj.verbose, timeout=ctx.obj.timeout)
|
||||
|
||||
|
||||
@main.command(name="node_tl_has_changed")
|
||||
@click.option(
|
||||
"--timeline", "timeline", type=str, help="A timeline number to compare with."
|
||||
)
|
||||
@click.option(
|
||||
"-s",
|
||||
"--state-file",
|
||||
"state_file",
|
||||
type=str,
|
||||
help="A state file to store the last timeline number into.",
|
||||
)
|
||||
@click.option(
|
||||
"--save",
|
||||
"save_tl",
|
||||
is_flag=True,
|
||||
default=False,
|
||||
help="Set the current timeline number as the reference for future calls.",
|
||||
)
|
||||
@click.pass_context
|
||||
@nagiosplugin.guarded
|
||||
def node_tl_has_changed(
|
||||
ctx: click.Context, timeline: str, state_file: str, save_tl: bool
|
||||
) -> None:
|
||||
"""Check if the timeline has changed.
|
||||
|
||||
Note: either a timeline or a state file must be provided for this service to work.
|
||||
|
||||
\b
|
||||
Check:
|
||||
* `OK`: The timeline is the same as last time (`--state_file`) or the inputted timeline (`--timeline`)
|
||||
* `CRITICAL`: The timeline is not the same.
|
||||
|
||||
\b
|
||||
Perfdata:
|
||||
* `is_timeline_changed` is 1 if the timeline has changed, 0 otherwise
|
||||
* the timeline
|
||||
"""
|
||||
if (timeline is None and state_file is None) or (
|
||||
timeline is not None and state_file is not None
|
||||
):
|
||||
raise click.UsageError(
|
||||
"Either --timeline or --state-file should be provided for this service", ctx
|
||||
)
|
||||
|
||||
old_timeline = timeline
|
||||
if state_file is not None:
|
||||
cookie = nagiosplugin.Cookie(state_file)
|
||||
cookie.open()
|
||||
old_timeline = cookie.get("timeline")
|
||||
cookie.close()
|
||||
|
||||
check = nagiosplugin.Check()
|
||||
check.add(
|
||||
NodeTLHasChanged(ctx.obj.connection_info, old_timeline, state_file, save_tl),
|
||||
nagiosplugin.ScalarContext("is_timeline_changed", None, "@1:1"),
|
||||
nagiosplugin.ScalarContext("timeline"),
|
||||
NodeTLHasChangedSummary(old_timeline),
|
||||
)
|
||||
check.main(verbose=ctx.obj.verbose, timeout=ctx.obj.timeout)
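Both `cluster_config_has_changed` and `node_tl_has_changed` require exactly one reference source (`--hash`/`--timeline` or `--state-file`); the guard condition above reduces to an exclusive-or, sketched here:

```python
def exactly_one(a, b) -> bool:
    """True when exactly one of the two option values is provided
    (i.e. is not None) -- the usage rule enforced by the guard above."""
    return (a is None) != (b is None)
```

Passing neither option leaves nothing to compare against, and passing both makes the reference ambiguous; either way the command refuses to run with a `UsageError`.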
|
||||
|
||||
|
||||
@main.command(name="node_patroni_version")
|
||||
@click.option(
|
||||
"--patroni-version",
|
||||
"patroni_version",
|
||||
type=str,
|
||||
help="Patroni version to compare to",
|
||||
required=True,
|
||||
)
|
||||
@click.pass_context
|
||||
@nagiosplugin.guarded
|
||||
def node_patroni_version(ctx: click.Context, patroni_version: str) -> None:
|
||||
"""Check if the version is equal to the input
|
||||
|
||||
\b
|
||||
Check:
|
||||
* `OK`: The version is the same as the input `--patroni-version`
|
||||
* `CRITICAL`: otherwise.
|
||||
|
||||
\b
|
||||
Perfdata:
|
||||
* `is_version_ok` is 1 if version is ok, 0 otherwise
|
||||
"""
|
||||
# TODO the version cannot be written in perfdata find something else ?
|
||||
check = nagiosplugin.Check()
|
||||
check.add(
|
||||
NodePatroniVersion(ctx.obj.connection_info, patroni_version),
|
||||
nagiosplugin.ScalarContext("is_version_ok", None, "@0:0"),
|
||||
nagiosplugin.ScalarContext("patroni_version"),
|
||||
NodePatroniVersionSummary(patroni_version),
|
||||
)
|
||||
check.main(verbose=ctx.obj.verbose, timeout=ctx.obj.timeout)
|
||||
|
||||
|
||||
@main.command(name="node_is_alive")
|
||||
@click.pass_context
|
||||
@nagiosplugin.guarded
|
||||
def node_is_alive(ctx: click.Context) -> None:
|
||||
"""Check if the node is alive, i.e. Patroni is running. This is
|
||||
a liveness check as defined in Patroni's documentation.
|
||||
|
||||
\b
|
||||
Check:
|
||||
* `OK`: If patroni is running.
|
||||
* `CRITICAL`: otherwise.
|
||||
|
||||
\b
|
||||
Perfdata:
|
||||
* `is_running` is 1 if patroni is running, 0 otherwise
|
||||
"""
|
||||
check = nagiosplugin.Check()
|
||||
check.add(
|
||||
NodeIsAlive(ctx.obj.connection_info),
|
||||
nagiosplugin.ScalarContext("is_alive", None, "@0:0"),
|
||||
NodeIsAliveSummary(),
|
||||
)
|
||||
check.main(verbose=ctx.obj.verbose, timeout=ctx.obj.timeout)
|
|
@ -1,340 +0,0 @@
|
|||
import hashlib
|
||||
import json
|
||||
from collections import Counter
|
||||
from typing import Any, Iterable, Union
|
||||
|
||||
import nagiosplugin
|
||||
|
||||
from . import _log
|
||||
from .types import ConnectionInfo, PatroniResource, handle_unknown
|
||||
|
||||
|
||||
def replace_chars(text: str) -> str:
|
||||
return text.replace("'", "").replace(" ", "_")
|
||||
|
||||
|
||||
class ClusterNodeCount(PatroniResource):
|
||||
def probe(self) -> Iterable[nagiosplugin.Metric]:
|
||||
def debug_member(member: Any, health: str) -> None:
|
||||
_log.debug(
|
||||
"Node %(node_name)s is %(health)s: role %(role)s state %(state)s.",
|
||||
{
|
||||
"node_name": member["name"],
|
||||
"health": health,
|
||||
"role": member["role"],
|
||||
"state": member["state"],
|
||||
},
|
||||
)
|
||||
|
||||
# get the cluster info
|
||||
item_dict = self.rest_api("cluster")
|
||||
|
||||
role_counters: Counter[str] = Counter()
|
||||
roles = []
|
||||
status_counters: Counter[str] = Counter()
|
||||
statuses = []
|
||||
healthy_member = 0
|
||||
|
||||
for member in item_dict["members"]:
|
||||
state, role = member["state"], member["role"]
|
||||
roles.append(replace_chars(role))
|
||||
statuses.append(replace_chars(state))
|
||||
|
||||
if role == "leader" and state == "running":
|
||||
healthy_member += 1
|
||||
debug_member(member, "healthy")
|
||||
continue
|
||||
|
||||
if role in ["standby_leader", "replica", "sync_standby"] and (
|
||||
(self.has_detailed_states() and state == "streaming")
|
||||
or (not self.has_detailed_states() and state == "running")
|
||||
):
|
||||
healthy_member += 1
|
||||
debug_member(member, "healthy")
|
||||
continue
|
||||
|
||||
debug_member(member, "unhealthy")
|
||||
role_counters.update(roles)
|
||||
status_counters.update(statuses)
|
||||
|
||||
# The actual check: members, healthy_members
|
||||
yield nagiosplugin.Metric("members", len(item_dict["members"]))
|
||||
yield nagiosplugin.Metric("healthy_members", healthy_member)
|
||||
|
||||
# The performance data : role
|
||||
for role in role_counters:
|
||||
yield nagiosplugin.Metric(
|
||||
f"role_{role}", role_counters[role], context="member_roles"
|
||||
)
|
||||
|
||||
# The performance data : statuses
|
||||
for state in status_counters:
|
||||
yield nagiosplugin.Metric(
|
||||
f"state_{state}", status_counters[state], context="member_statuses"
|
||||
)
|
||||
|
||||
|
||||
class ClusterHasLeader(PatroniResource):
|
||||
def probe(self) -> Iterable[nagiosplugin.Metric]:
|
||||
item_dict = self.rest_api("cluster")
|
||||
|
||||
is_leader_found = False
|
||||
is_standby_leader_found = False
|
||||
is_standby_leader_in_arc_rec = False
|
||||
for member in item_dict["members"]:
|
||||
if member["role"] == "leader" and member["state"] == "running":
|
||||
is_leader_found = True
|
||||
break
|
||||
|
||||
if member["role"] == "standby_leader":
|
||||
if member["state"] not in ["streaming", "in archive recovery"]:
|
||||
# for patroni >= 3.0.4 any state would be wrong
|
||||
# for patroni < 3.0.4 a state different from running would be wrong
|
||||
if self.has_detailed_states() or member["state"] != "running":
|
||||
continue
|
||||
|
||||
if member["state"] in ["in archive recovery"]:
|
||||
is_standby_leader_in_arc_rec = True
|
||||
|
||||
is_standby_leader_found = True
|
||||
break
|
||||
return [
|
||||
nagiosplugin.Metric(
|
||||
"has_leader",
|
||||
1 if is_leader_found or is_standby_leader_found else 0,
|
||||
),
|
||||
nagiosplugin.Metric(
|
||||
"is_standby_leader_in_arc_rec",
|
||||
1 if is_standby_leader_in_arc_rec else 0,
|
||||
),
|
||||
nagiosplugin.Metric(
|
||||
"is_standby_leader",
|
||||
1 if is_standby_leader_found else 0,
|
||||
),
|
||||
nagiosplugin.Metric(
|
||||
"is_leader",
|
||||
1 if is_leader_found else 0,
|
||||
),
|
||||
]
|
||||
|
||||
|
||||
class ClusterHasLeaderSummary(nagiosplugin.Summary):
|
||||
def ok(self, results: nagiosplugin.Result) -> str:
|
||||
return "The cluster has a running leader."
|
||||
|
||||
@handle_unknown
|
||||
def problem(self, results: nagiosplugin.Result) -> str:
|
||||
return "The cluster has no running leader or the standby leader is in archive recovery."
|
||||
|
||||
|
||||
class ClusterHasReplica(PatroniResource):
|
||||
def __init__(self, connection_info: ConnectionInfo, max_lag: Union[int, None]):
|
||||
super().__init__(connection_info)
|
||||
self.max_lag = max_lag
|
||||
|
||||
def probe(self) -> Iterable[nagiosplugin.Metric]:
|
||||
def debug_member(member: Any, health: str) -> None:
|
||||
_log.debug(
|
||||
"Node %(node_name)s is %(health)s: lag %(lag)s, state %(state)s, tl %(tl)s.",
|
||||
{
|
||||
"node_name": member["name"],
|
||||
"health": health,
|
||||
"lag": member["lag"],
|
||||
"state": member["state"],
|
||||
"tl": member["timeline"],
|
||||
},
|
||||
)
|
||||
|
||||
# get the cluster info
|
||||
cluster_item_dict = self.rest_api("cluster")
|
||||
|
||||
replicas = []
|
||||
healthy_replica = 0
|
||||
unhealthy_replica = 0
|
||||
sync_replica = 0
|
||||
leader_tl = None
|
||||
|
||||
# Look for replicas
|
||||
for member in cluster_item_dict["members"]:
|
||||
if member["role"] in ["replica", "sync_standby"]:
|
||||
if member["lag"] == "unknown":
|
||||
# This could happen if the node is stopped
|
||||
# nagiosplugin doesn't handle strings in perfstats
|
||||
# so we have to ditch all the stats in that case
|
||||
debug_member(member, "unhealthy")
|
||||
unhealthy_replica += 1
|
||||
continue
|
||||
else:
|
||||
replicas.append(
|
||||
{
|
||||
"name": member["name"],
|
||||
"lag": member["lag"],
|
||||
"timeline": member["timeline"],
|
||||
"sync": 1 if member["role"] == "sync_standby" else 0,
|
||||
}
|
||||
)
|
||||
|
||||
# Get the leader tl if we haven't already
|
||||
if leader_tl is None:
|
||||
# If there are no leaders, we will loop here for all
|
||||
# members because leader_tl will remain None. it's not
|
||||
# a big deal since having no leader is rare.
|
||||
for tmember in cluster_item_dict["members"]:
|
||||
if tmember["role"] == "leader":
|
||||
leader_tl = int(tmember["timeline"])
|
||||
break
|
||||
|
||||
_log.debug(
|
||||
"Patroni's leader_timeline is %(leader_tl)s",
|
||||
{
|
||||
"leader_tl": leader_tl,
|
||||
},
|
||||
)
|
||||
|
||||
# Test for an unhealthy replica
|
||||
if (
|
||||
self.has_detailed_states()
|
||||
and not (
|
||||
member["state"] in ["streaming", "in archive recovery"]
|
||||
and int(member["timeline"]) == leader_tl
|
||||
)
|
||||
) or (
|
||||
not self.has_detailed_states()
|
||||
and not (
|
||||
member["state"] == "running"
|
||||
and int(member["timeline"]) == leader_tl
|
||||
)
|
||||
):
|
||||
debug_member(member, "unhealthy")
|
||||
unhealthy_replica += 1
|
||||
continue
|
||||
|
||||
if member["role"] == "sync_standby":
|
||||
sync_replica += 1
|
||||
|
||||
if self.max_lag is None or self.max_lag >= int(member["lag"]):
|
||||
debug_member(member, "healthy")
|
||||
healthy_replica += 1
|
||||
else:
|
||||
debug_member(member, "unhealthy")
|
||||
unhealthy_replica += 1
|
||||
|
||||
# The actual check
|
||||
yield nagiosplugin.Metric("healthy_replica", healthy_replica)
|
||||
yield nagiosplugin.Metric("sync_replica", sync_replica)
|
||||
|
||||
# The performance data : unhealthy replica count, replicas lag
|
||||
yield nagiosplugin.Metric("unhealthy_replica", unhealthy_replica)
|
||||
for replica in replicas:
|
||||
yield nagiosplugin.Metric(
|
||||
f"{replica['name']}_lag", replica["lag"], context="replica_lag"
|
||||
)
|
||||
yield nagiosplugin.Metric(
|
||||
f"{replica['name']}_timeline",
|
||||
replica["timeline"],
|
||||
context="replica_timeline",
|
||||
)
|
||||
yield nagiosplugin.Metric(
|
||||
f"{replica['name']}_sync", replica["sync"], context="replica_sync"
|
||||
)
|
||||
|
||||
|
||||
# FIXME is this needed ??
|
||||
# class ClusterHasReplicaSummary(nagiosplugin.Summary):
|
||||
# def ok(self, results):
|
||||
# def problem(self, results):
|
||||
|
||||
|
||||
class ClusterConfigHasChanged(PatroniResource):
|
||||
def __init__(
|
||||
self,
|
||||
connection_info: ConnectionInfo,
|
||||
config_hash: str, # Always contains the old hash
|
||||
state_file: str, # Only used to update the hash in the state_file (when needed)
|
||||
save: bool = False, # Save the configuration
|
||||
):
|
||||
super().__init__(connection_info)
|
||||
self.state_file = state_file
|
||||
self.config_hash = config_hash
|
||||
self.save = save
|
||||
|
||||
def probe(self) -> Iterable[nagiosplugin.Metric]:
|
||||
item_dict = self.rest_api("config")
|
||||
|
||||
new_hash = hashlib.md5(json.dumps(item_dict).encode()).hexdigest()
|
||||
|
||||
_log.debug("save result: %(issave)s", {"issave": self.save})
|
||||
old_hash = self.config_hash
|
||||
if self.state_file is not None and self.save:
|
||||
_log.debug(
|
||||
"saving new hash to state file / cookie %(state_file)s",
|
||||
{"state_file": self.state_file},
|
||||
)
|
||||
cookie = nagiosplugin.Cookie(self.state_file)
|
||||
cookie.open()
|
||||
cookie["hash"] = new_hash
|
||||
cookie.commit()
|
||||
cookie.close()
|
||||
|
||||
_log.debug(
|
||||
"hash info: old hash %(old_hash)s, new hash %(new_hash)s",
|
||||
{"old_hash": old_hash, "new_hash": new_hash},
|
||||
)
|
||||
|
||||
return [
|
||||
nagiosplugin.Metric(
|
||||
"is_configuration_changed",
|
||||
1 if new_hash != old_hash else 0,
|
||||
)
|
||||
]
|
||||
|
||||
|
||||
class ClusterConfigHasChangedSummary(nagiosplugin.Summary):
|
||||
def __init__(self, config_hash: str) -> None:
|
||||
self.old_config_hash = config_hash
|
||||
|
||||
# Note: It would be helpful to display the old / new hash here. Unfortunately, it's not a metric.
|
||||
# So we only have the old / expected one.
|
||||
def ok(self, results: nagiosplugin.Result) -> str:
|
||||
return f"The hash of patroni's dynamic configuration has not changed ({self.old_config_hash})."
|
||||
|
||||
@handle_unknown
|
||||
def problem(self, results: nagiosplugin.Result) -> str:
|
||||
return f"The hash of patroni's dynamic configuration has changed. The old hash was {self.old_config_hash}."
|
||||
|
||||
|
||||
class ClusterIsInMaintenance(PatroniResource):
|
||||
def probe(self) -> Iterable[nagiosplugin.Metric]:
|
||||
item_dict = self.rest_api("cluster")
|
||||
|
||||
# The actual check
|
||||
return [
|
||||
nagiosplugin.Metric(
|
||||
"is_in_maintenance",
|
||||
1 if "pause" in item_dict and item_dict["pause"] else 0,
|
||||
)
|
||||
]
|
||||
|
||||
|
||||
class ClusterHasScheduledAction(PatroniResource):
|
||||
def probe(self) -> Iterable[nagiosplugin.Metric]:
|
||||
item_dict = self.rest_api("cluster")
|
||||
|
||||
scheduled_switchover = 0
|
||||
scheduled_restart = 0
|
||||
if "scheduled_switchover" in item_dict:
|
||||
scheduled_switchover = 1
|
||||
|
||||
for member in item_dict["members"]:
|
||||
if "scheduled_restart" in member:
|
||||
scheduled_restart += 1
|
||||
|
||||
# The actual check
|
||||
yield nagiosplugin.Metric(
|
||||
"has_scheduled_actions",
|
||||
1 if (scheduled_switchover + scheduled_restart) > 0 else 0,
|
||||
)
|
||||
|
||||
# The performance data : scheduled_switchover, scheduled action count
|
||||
yield nagiosplugin.Metric("scheduled_switchover", scheduled_switchover)
|
||||
yield nagiosplugin.Metric("scheduled_restart", scheduled_restart)
|
|
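An aside for reviewers: the `ClusterConfigHasChanged` probe above reduces to hashing the JSON dump of Patroni's dynamic configuration and comparing it with a previously saved hash. A minimal standalone sketch of that comparison (`config_hash` is an illustrative name, not part of check_patroni):

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    # Hash the JSON serialization of the /config payload, as the probe
    # above does. Key order matters to json.dumps, so the hash is only
    # stable for payloads serialized the same way.
    return hashlib.md5(json.dumps(config).encode()).hexdigest()


old_hash = config_hash({"ttl": 30, "loop_wait": 10})
assert config_hash({"ttl": 30, "loop_wait": 10}) == old_hash  # unchanged config
assert config_hash({"ttl": 60, "loop_wait": 10}) != old_hash  # changed config
```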
@@ -1,59 +0,0 @@
import re
from typing import Tuple, Union

import click


def size_to_byte(value: str) -> int:
    """Convert any size to bytes.

    >>> size_to_byte('1TB')
    1099511627776
    >>> size_to_byte('5kB')
    5120
    >>> size_to_byte('.5kB')
    512
    >>> size_to_byte('.5 yoyo')
    Traceback (most recent call last):
    ...
    click.exceptions.BadParameter: Invalid unit for size .5 yoyo
    """
    convert = {
        "B": 1,
        "kB": 1024,
        "MB": 1024 * 1024,
        "GB": 1024 * 1024 * 1024,
        "TB": 1024 * 1024 * 1024 * 1024,
    }
    val, unit = strtod(value)

    if val is None:
        val = 1

    if unit is None:
        # No unit, all good.
        # We can round: half bytes don't really make sense.
        return round(val)
    else:
        try:
            multiplicateur = convert[unit]
        except KeyError:
            raise click.BadParameter(f"Invalid unit for size {value}")

        # We can round: half bytes don't really make sense.
        return round(val * multiplicateur)


DBL_RE = re.compile(r"^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?")


def strtod(value: str) -> Tuple[Union[float, None], Union[str, None]]:
    """As close an equivalent as possible of the strtod(3) function used by postgres to parse parameter values.

    >>> strtod(' A ') == (None, 'A')
    True
    """
    value = str(value).strip()
    match = DBL_RE.match(value)
    if match:
        end = match.end()
        return float(value[:end]), value[end:]
    return None, value
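For context, the conversion performed by `size_to_byte` above is just a power-of-1024 unit lookup followed by rounding. A self-contained restatement of that arithmetic (`UNITS` and `to_bytes` are illustrative names, not part of the module):

```python
# Illustrative restatement of the size conversion above, kept self-contained.
UNITS = {"B": 1, "kB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def to_bytes(val: float, unit: str) -> int:
    # round, since half bytes don't really make sense
    return round(val * UNITS[unit])


assert to_bytes(0.5, "kB") == 512
assert to_bytes(1, "TB") == 1099511627776
```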
@@ -1,247 +0,0 @@
from typing import Iterable

import nagiosplugin

from . import _log
from .types import APIError, ConnectionInfo, PatroniResource, handle_unknown


class NodeIsPrimary(PatroniResource):
    def probe(self) -> Iterable[nagiosplugin.Metric]:
        try:
            self.rest_api("primary")
        except APIError:
            return [nagiosplugin.Metric("is_primary", 0)]
        return [nagiosplugin.Metric("is_primary", 1)]


class NodeIsPrimarySummary(nagiosplugin.Summary):
    def ok(self, results: nagiosplugin.Result) -> str:
        return "This node is the primary with the leader lock."

    @handle_unknown
    def problem(self, results: nagiosplugin.Result) -> str:
        return "This node is not the primary with the leader lock."


class NodeIsLeader(PatroniResource):
    def __init__(
        self, connection_info: ConnectionInfo, check_is_standby_leader: bool
    ) -> None:
        super().__init__(connection_info)
        self.check_is_standby_leader = check_is_standby_leader

    def probe(self) -> Iterable[nagiosplugin.Metric]:
        apiname = "leader"
        if self.check_is_standby_leader:
            apiname = "standby-leader"

        try:
            self.rest_api(apiname)
        except APIError:
            return [nagiosplugin.Metric("is_leader", 0)]
        return [nagiosplugin.Metric("is_leader", 1)]


class NodeIsLeaderSummary(nagiosplugin.Summary):
    def __init__(self, check_is_standby_leader: bool) -> None:
        if check_is_standby_leader:
            self.leader_kind = "standby leader"
        else:
            self.leader_kind = "leader"

    def ok(self, results: nagiosplugin.Result) -> str:
        return f"This node is a {self.leader_kind} node."

    @handle_unknown
    def problem(self, results: nagiosplugin.Result) -> str:
        return f"This node is not a {self.leader_kind} node."


class NodeIsReplica(PatroniResource):
    def __init__(
        self,
        connection_info: ConnectionInfo,
        max_lag: str,
        check_is_sync: bool,
        check_is_async: bool,
    ) -> None:
        super().__init__(connection_info)
        self.max_lag = max_lag
        self.check_is_sync = check_is_sync
        self.check_is_async = check_is_async

    def probe(self) -> Iterable[nagiosplugin.Metric]:
        try:
            if self.check_is_sync:
                api_name = "synchronous"
            elif self.check_is_async:
                api_name = "asynchronous"
            else:
                api_name = "replica"

            if self.max_lag is None:
                self.rest_api(api_name)
            else:
                self.rest_api(f"{api_name}?lag={self.max_lag}")
        except APIError:
            return [nagiosplugin.Metric("is_replica", 0)]
        return [nagiosplugin.Metric("is_replica", 1)]


class NodeIsReplicaSummary(nagiosplugin.Summary):
    def __init__(self, lag: str, check_is_sync: bool, check_is_async: bool) -> None:
        self.lag = lag
        if check_is_sync:
            self.replica_kind = "synchronous replica"
        elif check_is_async:
            self.replica_kind = "asynchronous replica"
        else:
            self.replica_kind = "replica"

    def ok(self, results: nagiosplugin.Result) -> str:
        if self.lag is None:
            return (
                f"This node is a running {self.replica_kind} with no noloadbalance tag."
            )
        return f"This node is a running {self.replica_kind} with no noloadbalance tag and the lag is under {self.lag}."

    @handle_unknown
    def problem(self, results: nagiosplugin.Result) -> str:
        if self.lag is None:
            return f"This node is not a running {self.replica_kind} with no noloadbalance tag."
        return f"This node is not a running {self.replica_kind} with no noloadbalance tag and a lag under {self.lag}."


class NodeIsPendingRestart(PatroniResource):
    def probe(self) -> Iterable[nagiosplugin.Metric]:
        item_dict = self.rest_api("patroni")

        is_pending_restart = item_dict.get("pending_restart", False)
        return [
            nagiosplugin.Metric(
                "is_pending_restart",
                1 if is_pending_restart else 0,
            )
        ]


class NodeIsPendingRestartSummary(nagiosplugin.Summary):
    def ok(self, results: nagiosplugin.Result) -> str:
        return "This node doesn't have the pending restart flag."

    @handle_unknown
    def problem(self, results: nagiosplugin.Result) -> str:
        return "This node has the pending restart flag."


class NodeTLHasChanged(PatroniResource):
    def __init__(
        self,
        connection_info: ConnectionInfo,
        timeline: str,  # Always contains the old timeline
        state_file: str,  # Only used to update the timeline in the state_file (when needed)
        save: bool,  # Save the timeline in the state file
    ) -> None:
        super().__init__(connection_info)
        self.state_file = state_file
        self.timeline = timeline
        self.save = save

    def probe(self) -> Iterable[nagiosplugin.Metric]:
        item_dict = self.rest_api("patroni")
        new_tl = item_dict["timeline"]

        _log.debug("save result: %(issave)s", {"issave": self.save})
        old_tl = self.timeline
        if self.state_file is not None and self.save:
            _log.debug(
                "saving new timeline to state file / cookie %(state_file)s",
                {"state_file": self.state_file},
            )
            cookie = nagiosplugin.Cookie(self.state_file)
            cookie.open()
            cookie["timeline"] = new_tl
            cookie.commit()
            cookie.close()

        _log.debug(
            "Tl data: old tl %(old_tl)s, new tl %(new_tl)s",
            {"old_tl": old_tl, "new_tl": new_tl},
        )

        # The actual check
        yield nagiosplugin.Metric(
            "is_timeline_changed",
            1 if str(new_tl) != str(old_tl) else 0,
        )

        # The performance data: the timeline number
        yield nagiosplugin.Metric("timeline", new_tl)


class NodeTLHasChangedSummary(nagiosplugin.Summary):
    def __init__(self, timeline: str) -> None:
        self.timeline = timeline

    def ok(self, results: nagiosplugin.Result) -> str:
        return f"The timeline is still {self.timeline}."

    @handle_unknown
    def problem(self, results: nagiosplugin.Result) -> str:
        return f"The expected timeline was {self.timeline}, got {results['timeline'].metric}."


class NodePatroniVersion(PatroniResource):
    def __init__(self, connection_info: ConnectionInfo, patroni_version: str) -> None:
        super().__init__(connection_info)
        self.patroni_version = patroni_version

    def probe(self) -> Iterable[nagiosplugin.Metric]:
        item_dict = self.rest_api("patroni")

        version = item_dict["patroni"]["version"]
        _log.debug(
            "Version data: patroni version %(version)s input version %(patroni_version)s",
            {"version": version, "patroni_version": self.patroni_version},
        )

        # The actual check
        return [
            nagiosplugin.Metric(
                "is_version_ok",
                1 if version == self.patroni_version else 0,
            )
        ]


class NodePatroniVersionSummary(nagiosplugin.Summary):
    def __init__(self, patroni_version: str) -> None:
        self.patroni_version = patroni_version

    def ok(self, results: nagiosplugin.Result) -> str:
        return f"Patroni's version is {self.patroni_version}."

    @handle_unknown
    def problem(self, results: nagiosplugin.Result) -> str:
        # FIXME find a way to make the following work; check if perf data can be strings
        # return f"The expected patroni version was {self.patroni_version} got {results['patroni_version'].metric}."
        return f"Patroni's version is not {self.patroni_version}."


class NodeIsAlive(PatroniResource):
    def probe(self) -> Iterable[nagiosplugin.Metric]:
        try:
            self.rest_api("liveness")
        except APIError:
            return [nagiosplugin.Metric("is_alive", 0)]
        return [nagiosplugin.Metric("is_alive", 1)]


class NodeIsAliveSummary(nagiosplugin.Summary):
    def ok(self, results: nagiosplugin.Result) -> str:
        return "This node is alive (patroni is running)."

    @handle_unknown
    def problem(self, results: nagiosplugin.Result) -> str:
        return "This node is not alive (patroni is not running)."
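A note on the file above: every `Node*` probe follows the same pattern, mapping a successful REST call to metric value 1 and an `APIError` to 0. A stripped-down sketch of that pattern (all names here are illustrative stand-ins, not check_patroni or nagiosplugin API):

```python
from typing import Callable


class APIError(Exception):
    """Stand-in for check_patroni's requests-based APIError."""


def probe_flag(call: Callable[[], None]) -> int:
    # The Node* probes above all reduce to this: success -> 1, APIError -> 0.
    try:
        call()
    except APIError:
        return 0
    return 1


def healthy() -> None:      # pretend the endpoint answered 200
    pass


def unhealthy() -> None:    # pretend the endpoint answered 503
    raise APIError("status 503")


assert probe_flag(healthy) == 1
assert probe_flag(unhealthy) == 0
```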
@@ -1,114 +0,0 @@
import json
from functools import lru_cache
from typing import Any, Callable, List, Optional, Tuple, Union
from urllib.parse import urlparse

import attr
import nagiosplugin
import requests

from . import _log


class APIError(requests.exceptions.RequestException):
    """This exception is raised when the REST API is reached
    but returns an HTTP status code different from 200.
    """


@attr.s(auto_attribs=True, frozen=True, slots=True)
class ConnectionInfo:
    endpoints: List[str] = ["http://127.0.0.1:8008"]
    cert: Optional[Union[str, Tuple[str, str]]] = None
    ca_cert: Optional[str] = None


@attr.s(auto_attribs=True, frozen=True, slots=True)
class Parameters:
    connection_info: ConnectionInfo
    timeout: int
    verbose: int


@attr.s(auto_attribs=True, eq=False, slots=True)
class PatroniResource(nagiosplugin.Resource):
    conn_info: ConnectionInfo

    def rest_api(self, service: str) -> Any:
        """Try to connect to all the provided endpoints for the requested service."""
        for endpoint in self.conn_info.endpoints:
            cert: Optional[Union[Tuple[str, str], str]] = None
            verify: Optional[Union[str, bool]] = None
            if urlparse(endpoint).scheme == "https":
                if self.conn_info.cert is not None:
                    # we can have: a key + a cert, or a single file with both key and cert
                    cert = self.conn_info.cert
                if self.conn_info.ca_cert is not None:
                    verify = self.conn_info.ca_cert

            _log.debug(
                "Trying to connect to %(endpoint)s/%(service)s with cert: %(cert)s verify: %(verify)s",
                {
                    "endpoint": endpoint,
                    "service": service,
                    "cert": cert,
                    "verify": verify,
                },
            )

            try:
                r = requests.get(f"{endpoint}/{service}", verify=verify, cert=cert)
            except Exception as e:
                _log.debug(e)
                continue
            # The status code is already displayed by urllib3
            _log.debug(
                "api call data: %(data)s", {"data": r.text if r.text else "<Empty>"}
            )

            if r.status_code != 200:
                raise APIError(
                    f"Failed to connect to {endpoint}/{service}, status code {r.status_code}"
                )

            try:
                return r.json()
            except (json.JSONDecodeError, ValueError):
                return None
        raise nagiosplugin.CheckError("Connection failed for all provided endpoints")

    @lru_cache(maxsize=None)
    def has_detailed_states(self) -> bool:
        # Get patroni's version to find out if the "streaming" and
        # "in archive recovery" states are available.
        patroni_item_dict = self.rest_api("patroni")

        if tuple(
            int(v) for v in patroni_item_dict["patroni"]["version"].split(".", 2)
        ) >= (3, 0, 4):
            _log.debug(
                "Patroni's version is %(version)s, more detailed states can be used to check for the health of replicas.",
                {"version": patroni_item_dict["patroni"]["version"]},
            )

            return True

        _log.debug(
            "Patroni's version is %(version)s, the running state and the timelines must be used to check for the health of replicas.",
            {"version": patroni_item_dict["patroni"]["version"]},
        )
        return False


HandleUnknown = Callable[[nagiosplugin.Summary, nagiosplugin.Results], Any]


def handle_unknown(func: HandleUnknown) -> HandleUnknown:
    """Decorator to handle the unknown state in Summary.problem."""

    def wrapper(summary: nagiosplugin.Summary, results: nagiosplugin.Results) -> Any:
        if results.most_significant[0].state.code == 3:
            # Return the appropriate message for any unknown error.
            return results.most_significant[0].hint
        return func(summary, results)

    return wrapper
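For context, the version gate in `has_detailed_states()` above is a tuple comparison over the dotted version string. A self-contained sketch, assuming simple `x.y.z` version strings (`version_at_least` is an illustrative name; a four-component version would break the `int()` conversion here just as it would in the original):

```python
def version_at_least(version: str, minimum: tuple) -> bool:
    # Same comparison as has_detailed_states(): turn the dotted version
    # string into a tuple of ints and compare lexicographically against
    # the minimum, e.g. (3, 0, 4) for Patroni 3.0.4.
    return tuple(int(v) for v in version.split(".", 2)) >= minimum


assert version_at_least("3.0.4", (3, 0, 4))      # detailed states available
assert not version_at_least("2.1.4", (3, 0, 4))  # fall back to "running" + timelines
```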
debian/changelog
@@ -1,25 +0,0 @@
check-patroni (2.0.0-1~bpo12+1) bookworm-backports; urgency=medium

  * Rebuild for bookworm-backports

 -- David Prévot <dprevot@evolix.fr>  Thu, 18 Apr 2024 16:10:08 +0200

check-patroni (2.0.0-1) unstable; urgency=medium

  [ benoit ]
  * cluster_has_replica: fix the way a healthy replica is detected
  * Fix the cluster_has_leader service for standby clusters
  * Fix cluster_node_count's management of replication states
  * Fix cluster_has_leader in archive recovery tests
  * Release V2.0.0 (Closes: #1053548)

  [ David Prévot ]
  * Update Standards-Version to 4.7.0

 -- David Prévot <taffit@debian.org>  Sun, 14 Apr 2024 09:34:48 +0200

check-patroni (1.0.0-1) unstable; urgency=medium

  * Initial release, initiated by py2dsp/3.20230219

 -- David Prévot <taffit@debian.org>  Wed, 06 Sep 2023 14:26:10 +0530
debian/check_patroni.1.in
@@ -1,2 +0,0 @@
[name]
check-patroni \- Nagios plugin to check on patroni
debian/control
@@ -1,27 +0,0 @@
Source: check-patroni
Section: utils
Priority: optional
Maintainer: David Prévot <taffit@debian.org>
Build-Depends: debhelper-compat (= 13),
               help2man,
               pybuild-plugin-pyproject,
               python3-all,
               python3-attr,
               python3-click,
               python3-nagiosplugin,
               python3-pytest-mock,
               python3-requests,
               python3-setuptools
Standards-Version: 4.7.0
Testsuite: autopkgtest-pkg-pybuild
Homepage: https://github.com/dalibo/check_patroni
Vcs-Git: https://salsa.debian.org/debian/check-patroni.git
Vcs-Browser: https://salsa.debian.org/debian/check-patroni
Rules-Requires-Root: no

Package: check-patroni
Architecture: all
Depends: ${misc:Depends}, ${python3:Depends}
Description: Nagios plugin to check on patroni
 A nagios plugin for patroni that checks presence of leader, replicas,
 and node counts, and also checks each node for replication status.
debian/copyright
@@ -1,55 +0,0 @@
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: check-patroni
Upstream-Contact: Dalibo <contact@dalibo.com>
Source: https://github.com/dalibo/check_patroni

Files: *
Copyright: 2022, DALIBO <contact@dalibo.com>
License: PostgreSQL

Files: vagrant/*
Copyright: 2019, Jehan-Guillaume (ioguix) de Rorthais
License: BSD-3-clause

License: BSD-3-clause
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are met:
 .
 * Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.
 .
 * Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.
 .
 * Neither the name of the copyright holder nor the names of its
   contributors may be used to endorse or promote products derived from
   this software without specific prior written permission.
 .
 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
 FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

License: PostgreSQL
 Permission to use, copy, modify, and distribute this software and its
 documentation for any purpose, without fee, and without a written agreement is
 hereby granted, provided that the above copyright notice and this paragraph and
 the following two paragraphs appear in all copies.
 .
 IN NO EVENT SHALL DALIBO BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL,
 INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE
 USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF DALIBO HAS BEEN ADVISED OF
 THE POSSIBILITY OF SUCH DAMAGE.
 .
 DALIBO SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE
 SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND DALIBO HAS NO
 OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR
 MODIFICATIONS.
debian/gbp.conf
@@ -1,5 +0,0 @@
[DEFAULT]
debian-branch = debian/latest
pristine-tar = True
upstream-branch = upstream/latest
upstream-vcs-tag = v%(version%~%-)s
debian/manpages
@@ -1 +0,0 @@
debian/tmp/check_patroni.1
debian/python3-check-patroni.docs
@@ -1 +0,0 @@
README.md
debian/rules
@@ -1,14 +0,0 @@
#! /usr/bin/make -f

export PYBUILD_NAME=check-patroni
%:
	dh $@ --with python3 --buildsystem=pybuild

execute_before_dh_installman:
	mkdir --parent $(CURDIR)/debian/tmp
	PYTHONPATH=debian/check-patroni/usr/lib/python3.11/dist-packages \
	help2man \
		--no-info \
		--include=$(CURDIR)/debian/check_patroni.1.in \
		debian/check-patroni/usr/bin/check_patroni \
		> $(CURDIR)/debian/tmp/check_patroni.1
debian/source/format
@@ -1 +0,0 @@
3.0 (quilt)
debian/source/options
@@ -1 +0,0 @@
extend-diff-ignore="^[^/]+.(egg-info|dist-info)/"
debian/upstream/metadata
@@ -1,5 +0,0 @@
---
Bug-Database: https://github.com/dalibo/check_patroni/issues
Bug-Submit: https://github.com/dalibo/check_patroni/issues/new
Repository: https://github.com/dalibo/check_patroni.git
Repository-Browse: https://github.com/dalibo/check_patroni
debian/watch
@@ -1,2 +0,0 @@
version=4
https://github.com/dalibo/check_patroni/tags (?:.*?/)?v?(\d[\d.]*)\.tar\.gz
@@ -1,158 +0,0 @@
#!/bin/bash

if ! command -v check_patroni &>/dev/null; then
    echo "check_patroni must be installed to generate the documentation"
    exit 1
fi

top_srcdir="$(readlink -m "$0/../..")"
README="${top_srcdir}/README.md"
function readme(){
    echo "$1" >> $README
}

function helpme(){
    readme
    readme '```'
    check_patroni $1 --help >> $README
    readme '```'
    readme
}

cat << '_EOF_' > $README
# check_patroni

A nagios plugin for patroni.

## Features

- Check presence of leader, replicas, node counts.
- Check each node for replication status.

_EOF_
helpme
cat << '_EOF_' >> $README
## Install

check_patroni is licensed under the PostgreSQL license.

```
$ pip install git+https://github.com/dalibo/check_patroni.git
```

check_patroni works on python 3.6; we keep it that way because patroni also
supports it and there are still lots of RH 7 variants around. That being said,
python 3.6 has been EOL for ages and there is no support for it in the github
CI.

## Support

If you hit a bug or need help, open a [GitHub
issue](https://github.com/dalibo/check_patroni/issues/new). Dalibo has no
commitment on response time for public free support. Thanks for your
contribution!

## Config file

All global and service specific parameters can be specified via a config file as follows:

```
[options]
endpoints = https://10.20.199.3:8008, https://10.20.199.4:8008, https://10.20.199.5:8008
cert_file = ./ssl/my-cert.pem
key_file = ./ssl/my-key.pem
ca_file = ./ssl/CA-cert.pem
timeout = 0

[options.node_is_replica]
lag=100
```
## Thresholds

The format for the threshold parameters is `[@][start:][end]`.

* `start:` may be omitted if `start == 0`
* `~:` means that start is negative infinity
* If `end` is omitted, infinity is assumed
* To invert the match condition, prefix the range expression with `@`.

A match is found when: `start <= VALUE <= end`.

For example, the following command will raise:

* a warning if there are fewer than 2 nodes, which can be translated to outside of range [2;+INF[
* a critical if there are no nodes, which can be translated to outside of range [1;+INF[

```
check_patroni -e https://10.20.199.3:8008 cluster_has_replica --warning 2: --critical 1:
```

## SSL

Several options are available:

* the server's CA certificate is not available or trusted by the client system:
  * `--ca_cert`: your certification chain `cat CA-certificate server-certificate > cabundle`
* you have a client certificate for authenticating with Patroni's REST API:
  * `--cert_file`: your certificate or the concatenation of your certificate and private key
  * `--key_file`: your private key (optional)

## Shell completion

We use the [click] library, which supports shell completion natively.

Shell completion can be added by typing the following command or adding it to
a file specific to your shell of choice.

* for Bash (add to `~/.bashrc`):
  ```
  eval "$(_CHECK_PATRONI_COMPLETE=bash_source check_patroni)"
  ```
* for Zsh (add to `~/.zshrc`):
  ```
  eval "$(_CHECK_PATRONI_COMPLETE=zsh_source check_patroni)"
  ```
* for Fish (add to `~/.config/fish/completions/check_patroni.fish`):
  ```
  eval "$(_CHECK_PATRONI_COMPLETE=fish_source check_patroni)"
  ```
|
||||
```
|
||||
|
||||
Please note that shell completion is not supported far all shell versions, for
|
||||
example only Bash versions older than 4.4 are supported.
|
||||
|
||||
[click]: https://click.palletsprojects.com/en/8.1.x/shell-completion/
|
||||
_EOF_
|
||||
readme
|
||||
readme "## Cluster services"
|
||||
readme
|
||||
readme "### cluster_config_has_changed"
|
||||
helpme cluster_config_has_changed
|
||||
readme "### cluster_has_leader"
|
||||
helpme cluster_has_leader
|
||||
readme "### cluster_has_replica"
|
||||
helpme cluster_has_replica
|
||||
readme "### cluster_has_scheduled_action"
|
||||
helpme cluster_has_scheduled_action
|
||||
readme "### cluster_is_in_maintenance"
|
||||
helpme cluster_is_in_maintenance
|
||||
readme "### cluster_node_count"
|
||||
helpme cluster_node_count
|
||||
readme "## Node services"
|
||||
readme
|
||||
readme "### node_is_alive"
|
||||
helpme node_is_alive
|
||||
readme "### node_is_pending_restart"
|
||||
helpme node_is_pending_restart
|
||||
readme "### node_is_leader"
|
||||
helpme node_is_leader
|
||||
readme "### node_is_primary"
|
||||
helpme node_is_primary
|
||||
readme "### node_is_replica"
|
||||
helpme node_is_replica
|
||||
readme "### node_patroni_version"
|
||||
helpme node_patroni_version
|
||||
readme "### node_tl_has_changed"
|
||||
helpme node_tl_has_changed
|
||||
cat << _EOF_ >> $README
|
||||
|
||||
_EOF_
|
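The threshold grammar documented above can be sketched in Python. This is a simplified illustration, not check_patroni's actual implementation (which delegates range handling to the nagiosplugin library):

```python
def alerts(value, spec):
    """Return True when `value` should raise an alert for the
    nagios-style range `spec` of the form "[@][start:][end]".

    A value matches (no alert) when start <= value <= end; "~" as
    start means negative infinity, and a leading "@" inverts the check.
    """
    invert = spec.startswith("@")
    if invert:
        spec = spec[1:]
    if ":" in spec:
        start_s, end_s = spec.split(":", 1)
    else:
        # no colon: the whole spec is the end, start defaults to 0
        start_s, end_s = "", spec
    start = float("-inf") if start_s == "~" else float(start_s or 0)
    end = float("inf") if not end_s else float(end_s)
    outside = not (start <= value <= end)
    return not outside if invert else outside

# with --warning 2: --critical 1:, a single replica warns but is not critical
print(alerts(1, "2:"), alerts(1, "1:"))
```

With this reading, `--warning 2:` raises as soon as the value drops below 2, and `@10:20` raises only when the value falls inside [10;20].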
27	mypy.ini
@ -1,27 +0,0 @@
[mypy]
files = .
show_error_codes = true
strict = true
exclude = build/

[mypy-setup]
ignore_errors = True

[mypy-nagiosplugin.*]
ignore_missing_imports = true

[mypy-check_patroni.types]
# no stubs for nagiosplugin => ignore: Class cannot subclass "Resource" (has type "Any") [misc]
disallow_subclassing_any = false

[mypy-check_patroni.node]
# no stubs for nagiosplugin => ignore: Class cannot subclass "Summary" (has type "Any") [misc]
disallow_subclassing_any = false

[mypy-check_patroni.cluster]
# no stubs for nagiosplugin => ignore: Class cannot subclass "Summary" (has type "Any") [misc]
disallow_subclassing_any = false

[mypy-check_patroni.cli]
# no stubs for nagiosplugin => ignore: Untyped decorator makes function "main" untyped [misc]
disallow_untyped_decorators = false

@ -1,7 +0,0 @@
[build-system]
requires = ["setuptools", "setuptools-scm"]
build-backend = "setuptools.build_meta"

[tool.isort]
profile = "black"

@ -1,2 +0,0 @@
[pytest]
addopts = --doctest-modules

@ -1,12 +0,0 @@
black
codespell
isort
flake8
mypy==0.961
pytest
pytest-cov
types-requests
setuptools
tox
twine
wheel

58	setup.py
@ -1,58 +0,0 @@
import pathlib

from setuptools import find_packages, setup

HERE = pathlib.Path(__file__).parent

long_description = (HERE / "README.md").read_text()


def get_version() -> str:
    fpath = HERE / "check_patroni" / "__init__.py"
    with fpath.open() as f:
        for line in f:
            if line.startswith("__version__"):
                return line.split('"')[1]
    raise Exception(f"version information not found in {fpath}")


setup(
    name="check_patroni",
    version=get_version(),
    author="Dalibo",
    author_email="contact@dalibo.com",
    packages=find_packages(include=["check_patroni*"]),
    include_package_data=True,
    url="https://github.com/dalibo/check_patroni",
    license="PostgreSQL",
    description="Nagios plugin to check on patroni",
    long_description=long_description,
    long_description_content_type="text/markdown",
    classifiers=[
        "Development Status :: 5 - Production/Stable",
        "Environment :: Console",
        "License :: OSI Approved :: PostgreSQL License",
        "Programming Language :: Python :: 3",
        "Topic :: System :: Monitoring",
    ],
    keywords="patroni nagios check",
    python_requires=">=3.6",
    install_requires=[
        "attrs >= 17, !=21.1",
        "requests",
        "nagiosplugin >= 1.3.2",
        "click >= 7.1",
    ],
    extras_require={
        "test": [
            "importlib_metadata; python_version < '3.8'",
            "pytest >= 6.0.2",
        ],
    },
    entry_points={
        "console_scripts": [
            "check_patroni=check_patroni.cli:main",
        ],
    },
    zip_safe=False,
)

@ -1,65 +0,0 @@
import json
import logging
import shutil
from contextlib import contextmanager
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler
from pathlib import Path
from typing import Any, Iterator, Mapping, Union

logger = logging.getLogger(__name__)


class PatroniAPI(HTTPServer):
    def __init__(self, directory: Path, *, datadir: Path) -> None:
        self.directory = directory
        self.datadir = datadir
        handler_cls = partial(SimpleHTTPRequestHandler, directory=str(directory))
        super().__init__(("", 0), handler_cls)

    def serve_forever(self, *args: Any) -> None:
        logger.info(
            "starting fake Patroni API at %s (directory=%s)",
            self.endpoint,
            self.directory,
        )
        return super().serve_forever(*args)

    @property
    def endpoint(self) -> str:
        return f"http://{self.server_name}:{self.server_port}"

    @contextmanager
    def routes(self, mapping: Mapping[str, Union[Path, str]]) -> Iterator[None]:
        """Temporarily install specified files in the served directory, thus
        building "routes" from the given mapping.

        The 'mapping' defines target route paths as keys and files to be
        installed in the served directory as values. Mapping values of type
        'str' are assumed to be file paths relative to 'datadir'.
        """
        for route_path, fpath in mapping.items():
            if isinstance(fpath, str):
                fpath = self.datadir / fpath
            shutil.copy(fpath, self.directory / route_path)
        try:
            yield None
        finally:
            for fname in mapping:
                (self.directory / fname).unlink()


def cluster_api_set_replica_running(in_json: Path, target_dir: Path) -> Path:
    # starting from 3.0.4 the state of replicas is "streaming" or
    # "in archive recovery" instead of "running"
    with in_json.open() as f:
        js = json.load(f)
    for node in js["members"]:
        if node["role"] in ["replica", "sync_standby", "standby_leader"]:
            if node["state"] in ["streaming", "in archive recovery"]:
                node["state"] = "running"
    assert target_dir.is_dir()
    out_json = target_dir / in_json.name
    with out_json.open("w") as f:
        json.dump(js, f)
    return out_json
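The normalization performed by `cluster_api_set_replica_running` can be illustrated on an in-memory dict, without the JSON file round-trip (a minimal sketch of the same mapping; the member names are illustrative):

```python
def set_replica_running(js):
    # replicas reported as "streaming" / "in archive recovery" by
    # Patroni >= 3.0.4 are rewritten to the pre-3.0.4 "running" state
    for node in js["members"]:
        if node["role"] in ("replica", "sync_standby", "standby_leader"):
            if node["state"] in ("streaming", "in archive recovery"):
                node["state"] = "running"
    return js

cluster = {
    "members": [
        {"name": "srv1", "role": "leader", "state": "running"},
        {"name": "srv2", "role": "replica", "state": "streaming"},
        {"name": "srv3", "role": "sync_standby", "state": "in archive recovery"},
    ]
}
states = [m["state"] for m in set_replica_running(cluster)["members"]]
```

This lets the same fixture files drive tests against both old and new Patroni replica-state reporting.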
@ -1,76 +0,0 @@
import logging
import sys
from pathlib import Path
from threading import Thread
from typing import Any, Iterator, Tuple
from unittest.mock import patch

if sys.version_info >= (3, 8):
    from importlib.metadata import version as metadata_version
else:
    from importlib_metadata import version as metadata_version

import pytest
from click.testing import CliRunner

from . import PatroniAPI

logger = logging.getLogger(__name__)


def numversion(pkgname: str) -> Tuple[int, ...]:
    version = metadata_version(pkgname)
    return tuple(int(v) for v in version.split(".", 3))


if numversion("pytest") >= (6, 2):
    TempPathFactory = pytest.TempPathFactory
else:
    from _pytest.tmpdir import TempPathFactory


@pytest.fixture(scope="session", autouse=True)
def nagiosplugin_runtime_stdout() -> Iterator[None]:
    # work around https://github.com/mpounsett/nagiosplugin/issues/24 when
    # nagiosplugin is older than 1.3.3
    if numversion("nagiosplugin") < (1, 3, 3):
        target = "nagiosplugin.runtime.Runtime.stdout"
        with patch(target, None):
            logger.warning("patching %r", target)
            yield None
    else:
        yield None


@pytest.fixture(
    params=[False, True],
    ids=lambda v: "new-replica-state" if v else "old-replica-state",
)
def old_replica_state(request: Any) -> Any:
    return request.param


@pytest.fixture(scope="session")
def datadir() -> Path:
    return Path(__file__).parent / "json"


@pytest.fixture(scope="session")
def patroni_api(
    tmp_path_factory: TempPathFactory, datadir: Path
) -> Iterator[PatroniAPI]:
    """A fake HTTP server for the Patroni API serving files from a temporary
    directory.
    """
    httpd = PatroniAPI(tmp_path_factory.mktemp("api"), datadir=datadir)
    t = Thread(target=httpd.serve_forever)
    t.start()
    yield httpd
    httpd.shutdown()
    t.join()


@pytest.fixture
def runner() -> CliRunner:
    """A CliRunner with stdout and stderr not mixed."""
    return CliRunner(mix_stderr=False)
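The `patroni_api` fixture ultimately just serves static JSON files over HTTP from a temporary directory. The same idea can be sketched standalone with only the standard library (the `cluster` route and its payload are illustrative, not actual fixture data):

```python
import json
import threading
import urllib.request
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler
from pathlib import Path
from tempfile import TemporaryDirectory

with TemporaryDirectory() as tmp:
    # "install" a route by dropping a file into the served directory
    (Path(tmp) / "cluster").write_text(json.dumps({"members": []}))
    httpd = HTTPServer(("", 0), partial(SimpleHTTPRequestHandler, directory=tmp))
    thread = threading.Thread(target=httpd.serve_forever)
    thread.start()
    url = f"http://127.0.0.1:{httpd.server_port}/cluster"
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    httpd.shutdown()
    thread.join()
```

Binding to port 0 lets the OS pick a free port, so tests never collide on a fixed address.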
@ -1,16 +0,0 @@
{
    "loop_wait": 10,
    "master_start_timeout": 300,
    "postgresql": {
        "parameters": {
            "archive_command": "pgbackrest --stanza=main archive-push %p",
            "archive_mode": "on",
            "max_connections": 300,
            "restore_command": "pgbackrest --stanza=main archive-get %f \"%p\""
        },
        "use_pg_rewind": false,
        "use_slot": true
    },
    "retry_timeout": 10,
    "ttl": 30
}

@ -1,33 +0,0 @@
{
    "members": [
        {
            "name": "srv1",
            "role": "replica",
            "state": "running",
            "api_url": "https://10.20.199.3:8008/patroni",
            "host": "10.20.199.3",
            "port": 5432,
            "timeline": 51
        },
        {
            "name": "srv2",
            "role": "replica",
            "state": "running",
            "api_url": "https://10.20.199.4:8008/patroni",
            "host": "10.20.199.4",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        },
        {
            "name": "srv3",
            "role": "replica",
            "state": "running",
            "api_url": "https://10.20.199.5:8008/patroni",
            "host": "10.20.199.5",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        }
    ]
}

@ -1,33 +0,0 @@
{
    "members": [
        {
            "name": "srv1",
            "role": "standby_leader",
            "state": "stopped",
            "api_url": "https://10.20.199.3:8008/patroni",
            "host": "10.20.199.3",
            "port": 5432,
            "timeline": 51
        },
        {
            "name": "srv2",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.4:8008/patroni",
            "host": "10.20.199.4",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        },
        {
            "name": "srv3",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.5:8008/patroni",
            "host": "10.20.199.5",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        }
    ]
}

@ -1,33 +0,0 @@
{
    "members": [
        {
            "name": "srv1",
            "role": "standby_leader",
            "state": "in archive recovery",
            "api_url": "https://10.20.199.3:8008/patroni",
            "host": "10.20.199.3",
            "port": 5432,
            "timeline": 51
        },
        {
            "name": "srv2",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.4:8008/patroni",
            "host": "10.20.199.4",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        },
        {
            "name": "srv3",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.5:8008/patroni",
            "host": "10.20.199.5",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        }
    ]
}

@ -1,33 +0,0 @@
{
    "members": [
        {
            "name": "srv1",
            "role": "leader",
            "state": "running",
            "api_url": "https://10.20.199.3:8008/patroni",
            "host": "10.20.199.3",
            "port": 5432,
            "timeline": 51
        },
        {
            "name": "srv2",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.4:8008/patroni",
            "host": "10.20.199.4",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        },
        {
            "name": "srv3",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.5:8008/patroni",
            "host": "10.20.199.5",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        }
    ]
}

@ -1,33 +0,0 @@
{
    "members": [
        {
            "name": "srv1",
            "role": "standby_leader",
            "state": "streaming",
            "api_url": "https://10.20.199.3:8008/patroni",
            "host": "10.20.199.3",
            "port": 5432,
            "timeline": 51
        },
        {
            "name": "srv2",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.4:8008/patroni",
            "host": "10.20.199.4",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        },
        {
            "name": "srv3",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.5:8008/patroni",
            "host": "10.20.199.5",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        }
    ]
}

@ -1,33 +0,0 @@
{
    "members": [
        {
            "name": "srv1",
            "role": "leader",
            "state": "running",
            "api_url": "https://10.20.199.3:8008/patroni",
            "host": "10.20.199.3",
            "port": 5432,
            "timeline": 51
        },
        {
            "name": "srv2",
            "role": "replica",
            "state": "stopped",
            "api_url": "https://10.20.199.4:8008/patroni",
            "host": "10.20.199.4",
            "port": 5432,
            "timeline": 51,
            "lag": "unknown"
        },
        {
            "name": "srv3",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.5:8008/patroni",
            "host": "10.20.199.5",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        }
    ]
}

@ -1,35 +0,0 @@
{
    "members": [
        {
            "name": "srv1",
            "role": "replica",
            "state": "running",
            "api_url": "https://10.20.199.3:8008/patroni",
            "host": "10.20.199.3",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        },
        {
            "name": "srv2",
            "role": "replica",
            "state": "running",
            "api_url": "https://10.20.199.4:8008/patroni",
            "host": "10.20.199.4",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        },
        {
            "name": "srv3",
            "role": "replica",
            "state": "running",
            "api_url": "https://10.20.199.5:8008/patroni",
            "host": "10.20.199.5",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        }
    ]
}

@ -1,33 +0,0 @@
{
    "members": [
        {
            "name": "srv1",
            "role": "leader",
            "state": "running",
            "api_url": "https://10.20.199.3:8008/patroni",
            "host": "10.20.199.3",
            "port": 5432,
            "timeline": 51
        },
        {
            "name": "srv2",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.4:8008/patroni",
            "host": "10.20.199.4",
            "port": 5432,
            "timeline": 51,
            "lag": 10241024
        },
        {
            "name": "srv3",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.5:8008/patroni",
            "host": "10.20.199.5",
            "port": 5432,
            "timeline": 51,
            "lag": 20000000
        }
    ]
}

@ -1,33 +0,0 @@
{
    "members": [
        {
            "name": "srv1",
            "role": "leader",
            "state": "running",
            "api_url": "https://10.20.199.3:8008/patroni",
            "host": "10.20.199.3",
            "port": 5432,
            "timeline": 51
        },
        {
            "name": "srv2",
            "role": "replica",
            "state": "running",
            "api_url": "https://10.20.199.4:8008/patroni",
            "host": "10.20.199.4",
            "port": 5432,
            "timeline": 50,
            "lag": 1000000
        },
        {
            "name": "srv3",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.5:8008/patroni",
            "host": "10.20.199.5",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        }
    ]
}

@ -1,33 +0,0 @@
{
    "members": [
        {
            "name": "srv1",
            "role": "leader",
            "state": "running",
            "api_url": "https://10.20.199.3:8008/patroni",
            "host": "10.20.199.3",
            "port": 5432,
            "timeline": 51
        },
        {
            "name": "srv2",
            "role": "replica",
            "state": "in archive recovery",
            "api_url": "https://10.20.199.4:8008/patroni",
            "host": "10.20.199.4",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        },
        {
            "name": "srv3",
            "role": "sync_standby",
            "state": "streaming",
            "api_url": "https://10.20.199.5:8008/patroni",
            "host": "10.20.199.5",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        }
    ]
}

@ -1,33 +0,0 @@
{
    "members": [
        {
            "name": "srv1",
            "role": "leader",
            "state": "running",
            "api_url": "https://10.20.199.3:8008/patroni",
            "host": "10.20.199.3",
            "port": 5432,
            "timeline": 51
        },
        {
            "name": "srv2",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.4:8008/patroni",
            "host": "10.20.199.4",
            "port": 5432,
            "timeline": 51,
            "lag": 1024
        },
        {
            "name": "srv3",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.5:8008/patroni",
            "host": "10.20.199.5",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        }
    ]
}

@ -1,26 +0,0 @@
{
    "state": "running",
    "postmaster_start_time": "2021-08-11 07:02:20.732 UTC",
    "role": "master",
    "server_version": 110012,
    "cluster_unlocked": false,
    "xlog": {
        "location": 1174407088
    },
    "timeline": 51,
    "replication": [
        {
            "usename": "replicator",
            "application_name": "srv1",
            "client_addr": "10.20.199.3",
            "state": "streaming",
            "sync_state": "async",
            "sync_priority": 0
        }
    ],
    "database_system_identifier": "6965971025273547206",
    "patroni": {
        "version": "3.0.0",
        "scope": "patroni-demo"
    }
}

@ -1,26 +0,0 @@
{
    "state": "running",
    "postmaster_start_time": "2021-08-11 07:02:20.732 UTC",
    "role": "master",
    "server_version": 110012,
    "cluster_unlocked": false,
    "xlog": {
        "location": 1174407088
    },
    "timeline": 51,
    "replication": [
        {
            "usename": "replicator",
            "application_name": "srv1",
            "client_addr": "10.20.199.3",
            "state": "streaming",
            "sync_state": "async",
            "sync_priority": 0
        }
    ],
    "database_system_identifier": "6965971025273547206",
    "patroni": {
        "version": "3.1.0",
        "scope": "patroni-demo"
    }
}

@ -1,27 +0,0 @@
{
    "members": [
        {
            "name": "p1",
            "role": "sync_standby",
            "state": "streaming",
            "api_url": "http://10.20.30.51:8008/patroni",
            "host": "10.20.30.51",
            "port": 5432,
            "timeline": 3,
            "scheduled_restart": {
                "schedule": "2023-10-08T11:30:00+00:00",
                "postmaster_start_time": "2023-08-21 08:08:33.415237+00:00"
            },
            "lag": 0
        },
        {
            "name": "p2",
            "role": "leader",
            "state": "running",
            "api_url": "http://10.20.30.52:8008/patroni",
            "host": "10.20.30.52",
            "port": 5432,
            "timeline": 3
        }
    ]
}

@ -1,28 +0,0 @@
{
    "members": [
        {
            "name": "p1",
            "role": "sync_standby",
            "state": "streaming",
            "api_url": "http://10.20.30.51:8008/patroni",
            "host": "10.20.30.51",
            "port": 5432,
            "timeline": 3,
            "lag": 0
        },
        {
            "name": "p2",
            "role": "leader",
            "state": "running",
            "api_url": "http://10.20.30.52:8008/patroni",
            "host": "10.20.30.52",
            "port": 5432,
            "timeline": 3
        }
    ],
    "scheduled_switchover": {
        "at": "2023-10-08T11:30:00+00:00",
        "from": "p1",
        "to": "p2"
    }
}

@ -1,33 +0,0 @@
{
    "members": [
        {
            "name": "srv1",
            "role": "leader",
            "state": "running",
            "api_url": "https://10.20.199.3:8008/patroni",
            "host": "10.20.199.3",
            "port": 5432,
            "timeline": 51
        },
        {
            "name": "srv2",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.4:8008/patroni",
            "host": "10.20.199.4",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        },
        {
            "name": "srv3",
            "role": "sync_standby",
            "state": "streaming",
            "api_url": "https://10.20.199.5:8008/patroni",
            "host": "10.20.199.5",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        }
    ]
}

@ -1,34 +0,0 @@
{
    "members": [
        {
            "name": "srv1",
            "role": "leader",
            "state": "running",
            "api_url": "https://10.20.199.3:8008/patroni",
            "host": "10.20.199.3",
            "port": 5432,
            "timeline": 51
        },
        {
            "name": "srv2",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.4:8008/patroni",
            "host": "10.20.199.4",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        },
        {
            "name": "srv3",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.5:8008/patroni",
            "host": "10.20.199.5",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        }
    ],
    "pause": true
}

@ -1,34 +0,0 @@
{
    "members": [
        {
            "name": "srv1",
            "role": "leader",
            "state": "running",
            "api_url": "https://10.20.199.3:8008/patroni",
            "host": "10.20.199.3",
            "port": 5432,
            "timeline": 51
        },
        {
            "name": "srv2",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.4:8008/patroni",
            "host": "10.20.199.4",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        },
        {
            "name": "srv3",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.5:8008/patroni",
            "host": "10.20.199.5",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        }
    ],
    "pause": false
}

@ -1,33 +0,0 @@
{
    "members": [
        {
            "name": "srv1",
            "role": "leader",
            "state": "running",
            "api_url": "https://10.20.199.3:8008/patroni",
            "host": "10.20.199.3",
            "port": 5432,
            "timeline": 51
        },
        {
            "name": "srv2",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.4:8008/patroni",
            "host": "10.20.199.4",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        },
        {
            "name": "srv3",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.5:8008/patroni",
            "host": "10.20.199.5",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        }
    ]
}

@ -1,34 +0,0 @@
{
    "members": [
        {
            "name": "srv1",
            "role": "leader",
            "state": "running",
            "api_url": "https://10.20.199.3:8008/patroni",
            "host": "10.20.199.3",
            "port": 5432,
            "timeline": 51
        },
        {
            "name": "srv2",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.4:8008/patroni",
            "host": "10.20.199.4",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        },
        {
            "name": "srv3",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.5:8008/patroni",
            "host": "10.20.199.5",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        }
    ],
    "pause": false
}

@ -1,13 +0,0 @@
{
    "members": [
        {
            "name": "srv1",
            "role": "leader",
            "state": "running",
            "api_url": "https://10.20.199.3:8008/patroni",
            "host": "10.20.199.3",
            "port": 5432,
            "timeline": 51
        }
    ]
}

@ -1,31 +0,0 @@
{
    "members": [
        {
            "name": "srv1",
            "role": "leader",
            "state": "running",
            "api_url": "https://10.20.199.3:8008/patroni",
            "host": "10.20.199.3",
            "port": 5432,
            "timeline": 51
        },
        {
            "name": "srv2",
            "role": "replica",
            "state": "start failed",
            "api_url": "https://10.20.199.4:8008/patroni",
            "host": "10.20.199.4",
            "port": 5432,
            "lag": "unknown"
        },
        {
            "name": "srv3",
            "role": "replica",
            "state": "start failed",
            "api_url": "https://10.20.199.5:8008/patroni",
            "host": "10.20.199.5",
            "port": 5432,
            "lag": "unknown"
        }
    ]
}

@ -1,23 +0,0 @@
{
    "members": [
        {
            "name": "srv1",
            "role": "leader",
            "state": "running",
            "api_url": "https://10.20.199.3:8008/patroni",
            "host": "10.20.199.3",
            "port": 5432,
            "timeline": 51
        },
        {
            "name": "srv3",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.5:8008/patroni",
            "host": "10.20.199.5",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        }
    ]
}

@ -1,33 +0,0 @@
{
    "members": [
        {
            "name": "srv1",
            "role": "standby_leader",
            "state": "in archive recovery",
            "api_url": "https://10.20.199.3:8008/patroni",
            "host": "10.20.199.3",
            "port": 5432,
            "timeline": 51
        },
        {
            "name": "srv2",
            "role": "replica",
            "state": "in archive recovery",
            "api_url": "https://10.20.199.4:8008/patroni",
            "host": "10.20.199.4",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        },
        {
            "name": "srv3",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.5:8008/patroni",
            "host": "10.20.199.5",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        }
    ]
}

@ -1,33 +0,0 @@
{
    "members": [
        {
            "name": "srv1",
            "role": "leader",
            "state": "running",
            "api_url": "https://10.20.199.3:8008/patroni",
            "host": "10.20.199.3",
            "port": 5432,
            "timeline": 51
        },
        {
            "name": "srv2",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.4:8008/patroni",
            "host": "10.20.199.4",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        },
        {
            "name": "srv3",
            "role": "replica",
            "state": "streaming",
            "api_url": "https://10.20.199.5:8008/patroni",
            "host": "10.20.199.5",
            "port": 5432,
            "timeline": 51,
            "lag": 0
        }
    ]
}

@ -1,23 +0,0 @@
{
|
||||
"members": [
|
||||
{
|
||||
"name": "srv1",
|
||||
"role": "leader",
|
||||
"state": "running",
|
||||
"api_url": "https://10.20.199.3:8008/patroni",
|
||||
"host": "10.20.199.3",
|
||||
"port": 5432,
|
||||
"timeline": 51
|
||||
},
|
||||
{
|
||||
"name": "srv2",
|
||||
"role": "replica",
|
||||
"state": "streaming",
|
||||
"api_url": "https://10.20.199.4:8008/patroni",
|
||||
"host": "10.20.199.4",
|
||||
"port": 5432,
|
||||
"timeline": 51,
|
||||
"lag": 0
|
||||
}
|
||||
]
|
||||
}
|
|
@ -1,26 +0,0 @@
|
|||
{
|
||||
"state": "running",
|
||||
"postmaster_start_time": "2021-08-11 07:02:20.732 UTC",
|
||||
"role": "master",
|
||||
"server_version": 110012,
|
||||
"cluster_unlocked": false,
|
||||
"xlog": {
|
||||
"location": 1174407088
|
||||
},
|
||||
"timeline": 58,
|
||||
"replication": [
|
||||
{
|
||||
"usename": "replicator",
|
||||
"application_name": "srv1",
|
||||
"client_addr": "10.20.199.3",
|
||||
"state": "streaming",
|
||||
"sync_state": "async",
|
||||
"sync_priority": 0
|
||||
}
|
||||
],
|
||||
"database_system_identifier": "6965971025273547206",
|
||||
"patroni": {
|
||||
"version": "2.0.2",
|
||||
"scope": "patroni-demo"
|
||||
}
|
||||
}
|
|
@ -1,19 +0,0 @@
{
  "state": "running",
  "postmaster_start_time": "2023-08-23 14:30:50.201691+00:00",
  "role": "standby_leader",
  "server_version": 140009,
  "xlog": {
    "received_location": 889192448,
    "replayed_location": 889192448,
    "replayed_timestamp": null,
    "paused": false
  },
  "timeline": 1,
  "dcs_last_seen": 1692805971,
  "database_system_identifier": "7270495803765492571",
  "patroni": {
    "version": "3.1.0",
    "scope": "patroni-demo-sb"
  }
}

@ -1,26 +0,0 @@
{
  "state": "running",
  "postmaster_start_time": "2021-08-11 07:02:20.732 UTC",
  "role": "master",
  "server_version": 110012,
  "cluster_unlocked": false,
  "xlog": {
    "location": 1174407088
  },
  "timeline": 58,
  "replication": [
    {
      "usename": "replicator",
      "application_name": "srv1",
      "client_addr": "10.20.199.3",
      "state": "streaming",
      "sync_state": "async",
      "sync_priority": 0
    }
  ],
  "database_system_identifier": "6965971025273547206",
  "patroni": {
    "version": "2.0.2",
    "scope": "patroni-demo"
  }
}

@ -1,19 +0,0 @@
{
  "state": "running",
  "postmaster_start_time": "2023-08-23 14:30:50.201691+00:00",
  "role": "standby_leader",
  "server_version": 140009,
  "xlog": {
    "received_location": 889192448,
    "replayed_location": 889192448,
    "replayed_timestamp": null,
    "paused": false
  },
  "timeline": 1,
  "dcs_last_seen": 1692805971,
  "database_system_identifier": "7270495803765492571",
  "patroni": {
    "version": "3.1.0",
    "scope": "patroni-demo-sb"
  }
}

@ -1,27 +0,0 @@
{
  "state": "running",
  "postmaster_start_time": "2021-08-11 07:02:20.732 UTC",
  "role": "master",
  "server_version": 110012,
  "cluster_unlocked": false,
  "xlog": {
    "location": 1174407088
  },
  "timeline": 58,
  "replication": [
    {
      "usename": "replicator",
      "application_name": "srv1",
      "client_addr": "10.20.199.3",
      "state": "streaming",
      "sync_state": "async",
      "sync_priority": 0
    }
  ],
  "pending_restart": true,
  "database_system_identifier": "6965971025273547206",
  "patroni": {
    "version": "2.0.2",
    "scope": "patroni-demo"
  }
}

@ -1,26 +0,0 @@
{
  "state": "running",
  "postmaster_start_time": "2021-08-11 07:02:20.732 UTC",
  "role": "master",
  "server_version": 110012,
  "cluster_unlocked": false,
  "xlog": {
    "location": 1174407088
  },
  "timeline": 58,
  "replication": [
    {
      "usename": "replicator",
      "application_name": "srv1",
      "client_addr": "10.20.199.3",
      "state": "streaming",
      "sync_state": "async",
      "sync_priority": 0
    }
  ],
  "database_system_identifier": "6965971025273547206",
  "patroni": {
    "version": "2.0.2",
    "scope": "patroni-demo"
  }
}

@ -1,19 +0,0 @@
{
  "state": "running",
  "postmaster_start_time": "2021-08-11 07:57:51.693 UTC",
  "role": "replica",
  "server_version": 110012,
  "cluster_unlocked": false,
  "xlog": {
    "received_location": 1174407088,
    "replayed_location": 1174407088,
    "replayed_timestamp": null,
    "paused": false
  },
  "timeline": 58,
  "database_system_identifier": "6965971025273547206",
  "patroni": {
    "version": "2.0.2",
    "scope": "patroni-demo"
  }
}

@ -1,26 +0,0 @@
{
  "state": "running",
  "postmaster_start_time": "2021-08-11 07:02:20.732 UTC",
  "role": "master",
  "server_version": 110012,
  "cluster_unlocked": false,
  "xlog": {
    "location": 1174407088
  },
  "timeline": 58,
  "replication": [
    {
      "usename": "replicator",
      "application_name": "srv1",
      "client_addr": "10.20.199.3",
      "state": "streaming",
      "sync_state": "async",
      "sync_priority": 0
    }
  ],
  "database_system_identifier": "6965971025273547206",
  "patroni": {
    "version": "2.0.2",
    "scope": "patroni-demo"
  }
}

@ -1,26 +0,0 @@
{
  "state": "running",
  "postmaster_start_time": "2021-08-11 07:02:20.732 UTC",
  "role": "master",
  "server_version": 110012,
  "cluster_unlocked": false,
  "xlog": {
    "location": 1174407088
  },
  "timeline": 58,
  "replication": [
    {
      "usename": "replicator",
      "application_name": "srv1",
      "client_addr": "10.20.199.3",
      "state": "streaming",
      "sync_state": "async",
      "sync_priority": 0
    }
  ],
  "database_system_identifier": "6965971025273547206",
  "patroni": {
    "version": "2.0.2",
    "scope": "patroni-demo"
  }
}

@ -1,19 +0,0 @@
{
  "state": "running",
  "postmaster_start_time": "2021-08-11 07:57:51.693 UTC",
  "role": "replica",
  "server_version": 110012,
  "cluster_unlocked": false,
  "xlog": {
    "received_location": 1174407088,
    "replayed_location": 1174407088,
    "replayed_timestamp": null,
    "paused": false
  },
  "timeline": 58,
  "database_system_identifier": "6965971025273547206",
  "patroni": {
    "version": "2.0.2",
    "scope": "patroni-demo"
  }
}

@ -1,26 +0,0 @@
{
  "state": "running",
  "postmaster_start_time": "2021-08-11 07:02:20.732 UTC",
  "role": "master",
  "server_version": 110012,
  "cluster_unlocked": false,
  "xlog": {
    "location": 1174407088
  },
  "timeline": 58,
  "replication": [
    {
      "usename": "replicator",
      "application_name": "srv1",
      "client_addr": "10.20.199.3",
      "state": "streaming",
      "sync_state": "async",
      "sync_priority": 0
    }
  ],
  "database_system_identifier": "6965971025273547206",
  "patroni": {
    "version": "2.0.2",
    "scope": "patroni-demo"
  }
}

@ -1,26 +0,0 @@
{
  "state": "running",
  "postmaster_start_time": "2021-08-11 07:02:20.732 UTC",
  "role": "master",
  "server_version": 110012,
  "cluster_unlocked": false,
  "xlog": {
    "location": 1174407088
  },
  "timeline": 58,
  "replication": [
    {
      "usename": "replicator",
      "application_name": "srv1",
      "client_addr": "10.20.199.3",
      "state": "streaming",
      "sync_state": "async",
      "sync_priority": 0
    }
  ],
  "database_system_identifier": "6965971025273547206",
  "patroni": {
    "version": "2.0.2",
    "scope": "patroni-demo"
  }
}

@ -1,20 +0,0 @@
from click.testing import CliRunner

from check_patroni.cli import main

from . import PatroniAPI


def test_api_status_code_200(runner: CliRunner, patroni_api: PatroniAPI) -> None:
    with patroni_api.routes({"patroni": "node_is_pending_restart_ok.json"}):
        result = runner.invoke(
            main, ["-e", patroni_api.endpoint, "node_is_pending_restart"]
        )
    assert result.exit_code == 0


def test_api_status_code_404(runner: CliRunner, patroni_api: PatroniAPI) -> None:
    result = runner.invoke(
        main, ["-e", patroni_api.endpoint, "node_is_pending_restart"]
    )
    assert result.exit_code == 3

@ -1,171 +0,0 @@
from pathlib import Path
from typing import Iterator

import nagiosplugin
import pytest
from click.testing import CliRunner

from check_patroni.cli import main

from . import PatroniAPI


@pytest.fixture(scope="module", autouse=True)
def cluster_config_has_changed(patroni_api: PatroniAPI) -> Iterator[None]:
    with patroni_api.routes({"config": "cluster_config_has_changed.json"}):
        yield None


def test_cluster_config_has_changed_ok_with_hash(
    runner: CliRunner, patroni_api: PatroniAPI
) -> None:
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "cluster_config_has_changed",
            "--hash",
            "96b12d82571473d13e890b893734e731",
        ],
    )
    assert result.exit_code == 0
    assert (
        result.stdout
        == "CLUSTERCONFIGHASCHANGED OK - The hash of patroni's dynamic configuration has not changed (96b12d82571473d13e890b893734e731). | is_configuration_changed=0;;@1:1\n"
    )


def test_cluster_config_has_changed_ok_with_state_file(
    runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
) -> None:
    state_file = tmp_path / "cluster_config_has_changed.state_file"
    with state_file.open("w") as f:
        f.write('{"hash": "96b12d82571473d13e890b893734e731"}')

    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "cluster_config_has_changed",
            "--state-file",
            str(state_file),
        ],
    )
    assert result.exit_code == 0
    assert (
        result.stdout
        == "CLUSTERCONFIGHASCHANGED OK - The hash of patroni's dynamic configuration has not changed (96b12d82571473d13e890b893734e731). | is_configuration_changed=0;;@1:1\n"
    )


def test_cluster_config_has_changed_ko_with_hash(
    runner: CliRunner, patroni_api: PatroniAPI
) -> None:
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "cluster_config_has_changed",
            "--hash",
            "96b12d82571473d13e890b8937ffffff",
        ],
    )
    assert result.exit_code == 2
    assert (
        result.stdout
        == "CLUSTERCONFIGHASCHANGED CRITICAL - The hash of patroni's dynamic configuration has changed. The old hash was 96b12d82571473d13e890b8937ffffff. | is_configuration_changed=1;;@1:1\n"
    )


def test_cluster_config_has_changed_ko_with_state_file_and_save(
    runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
) -> None:
    state_file = tmp_path / "cluster_config_has_changed.state_file"
    with state_file.open("w") as f:
        f.write('{"hash": "96b12d82571473d13e890b8937ffffff"}')

    # test without saving the new hash
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "cluster_config_has_changed",
            "--state-file",
            str(state_file),
        ],
    )
    assert result.exit_code == 2
    assert (
        result.stdout
        == "CLUSTERCONFIGHASCHANGED CRITICAL - The hash of patroni's dynamic configuration has changed. The old hash was 96b12d82571473d13e890b8937ffffff. | is_configuration_changed=1;;@1:1\n"
    )

    state_file = tmp_path / "cluster_config_has_changed.state_file"
    cookie = nagiosplugin.Cookie(state_file)
    cookie.open()
    new_config_hash = cookie.get("hash")
    cookie.close()

    assert new_config_hash == "96b12d82571473d13e890b8937ffffff"

    # test when we save the hash
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "cluster_config_has_changed",
            "--state-file",
            str(state_file),
            "--save",
        ],
    )
    assert result.exit_code == 2
    assert (
        result.stdout
        == "CLUSTERCONFIGHASCHANGED CRITICAL - The hash of patroni's dynamic configuration has changed. The old hash was 96b12d82571473d13e890b8937ffffff. | is_configuration_changed=1;;@1:1\n"
    )

    cookie = nagiosplugin.Cookie(state_file)
    cookie.open()
    new_config_hash = cookie.get("hash")
    cookie.close()

    assert new_config_hash == "96b12d82571473d13e890b893734e731"


def test_cluster_config_has_changed_params(
    runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
) -> None:
    # This one is placed last because it seems like the exceptions are not flushed from stderr for the next tests.
    fake_state_file = tmp_path / "fake_file_name.state_file"
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "cluster_config_has_changed",
            "--hash",
            "640df9f0211c791723f18fc3ed9dbb95",
            "--state-file",
            str(fake_state_file),
        ],
    )
    assert result.exit_code == 3
    assert (
        result.stdout
        == "CLUSTERCONFIGHASCHANGED UNKNOWN: click.exceptions.UsageError: Either --hash or --state-file should be provided for this service\n"
    )

    result = runner.invoke(
        main, ["-e", "https://10.20.199.3:8008", "cluster_config_has_changed"]
    )
    assert result.exit_code == 3
    assert (
        result.stdout
        == "CLUSTERCONFIGHASCHANGED UNKNOWN: click.exceptions.UsageError: Either --hash or --state-file should be provided for this service\n"
    )

@ -1,139 +0,0 @@
from pathlib import Path
from typing import Iterator, Union

import pytest
from click.testing import CliRunner

from check_patroni.cli import main

from . import PatroniAPI, cluster_api_set_replica_running


@pytest.fixture
def cluster_has_leader_ok(
    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
    cluster_path: Union[str, Path] = "cluster_has_leader_ok.json"
    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
    if old_replica_state:
        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
        yield None


@pytest.mark.usefixtures("cluster_has_leader_ok")
def test_cluster_has_leader_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
    result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_has_leader"])
    assert (
        result.stdout
        == "CLUSTERHASLEADER OK - The cluster has a running leader. | has_leader=1;;@0 is_leader=1 is_standby_leader=0 is_standby_leader_in_arc_rec=0;@1:1\n"
    )
    assert result.exit_code == 0


@pytest.fixture
def cluster_has_leader_ok_standby_leader(
    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
    cluster_path: Union[str, Path] = "cluster_has_leader_ok_standby_leader.json"
    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
    if old_replica_state:
        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
        yield None


@pytest.mark.usefixtures("cluster_has_leader_ok_standby_leader")
def test_cluster_has_leader_ok_standby_leader(
    runner: CliRunner, patroni_api: PatroniAPI
) -> None:
    result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_has_leader"])
    assert (
        result.stdout
        == "CLUSTERHASLEADER OK - The cluster has a running leader. | has_leader=1;;@0 is_leader=0 is_standby_leader=1 is_standby_leader_in_arc_rec=0;@1:1\n"
    )
    assert result.exit_code == 0


@pytest.fixture
def cluster_has_leader_ko(
    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
    cluster_path: Union[str, Path] = "cluster_has_leader_ko.json"
    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
    if old_replica_state:
        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
        yield None


@pytest.mark.usefixtures("cluster_has_leader_ko")
def test_cluster_has_leader_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
    result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_has_leader"])
    assert (
        result.stdout
        == "CLUSTERHASLEADER CRITICAL - The cluster has no running leader or the standby leader is in archive recovery. | has_leader=0;;@0 is_leader=0 is_standby_leader=0 is_standby_leader_in_arc_rec=0;@1:1\n"
    )
    assert result.exit_code == 2


@pytest.fixture
def cluster_has_leader_ko_standby_leader(
    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
    cluster_path: Union[str, Path] = "cluster_has_leader_ko_standby_leader.json"
    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
    if old_replica_state:
        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
        yield None


@pytest.mark.usefixtures("cluster_has_leader_ko_standby_leader")
def test_cluster_has_leader_ko_standby_leader(
    runner: CliRunner, patroni_api: PatroniAPI
) -> None:
    result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_has_leader"])
    assert (
        result.stdout
        == "CLUSTERHASLEADER CRITICAL - The cluster has no running leader or the standby leader is in archive recovery. | has_leader=0;;@0 is_leader=0 is_standby_leader=0 is_standby_leader_in_arc_rec=0;@1:1\n"
    )
    assert result.exit_code == 2


@pytest.fixture
def cluster_has_leader_ko_standby_leader_archiving(
    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
    cluster_path: Union[str, Path] = (
        "cluster_has_leader_ko_standby_leader_archiving.json"
    )
    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
    if old_replica_state:
        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
        yield None


@pytest.mark.usefixtures("cluster_has_leader_ko_standby_leader_archiving")
def test_cluster_has_leader_ko_standby_leader_archiving(
    runner: CliRunner, patroni_api: PatroniAPI, old_replica_state: bool
) -> None:
    result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_has_leader"])
    if old_replica_state:
        assert (
            result.stdout
            == "CLUSTERHASLEADER OK - The cluster has a running leader. | has_leader=1;;@0 is_leader=0 is_standby_leader=1 is_standby_leader_in_arc_rec=0;@1:1\n"
        )
        assert result.exit_code == 0
    else:
        assert (
            result.stdout
            == "CLUSTERHASLEADER WARNING - The cluster has no running leader or the standby leader is in archive recovery. | has_leader=1;;@0 is_leader=0 is_standby_leader=1 is_standby_leader_in_arc_rec=1;@1:1\n"
        )
        assert result.exit_code == 1

@ -1,288 +0,0 @@
from pathlib import Path
from typing import Iterator, Union

import pytest
from click.testing import CliRunner

from check_patroni.cli import main

from . import PatroniAPI, cluster_api_set_replica_running


@pytest.fixture
def cluster_has_replica_ok(
    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
    cluster_path: Union[str, Path] = "cluster_has_replica_ok.json"
    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
    if old_replica_state:
        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
        yield None


@pytest.mark.usefixtures("cluster_has_replica_ok")
def test_cluster_has_relica_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
    result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_has_replica"])
    assert (
        result.stdout
        == "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2 srv2_lag=0 srv2_sync=0 srv2_timeline=51 srv3_lag=0 srv3_sync=1 srv3_timeline=51 sync_replica=1 unhealthy_replica=0\n"
    )
    assert result.exit_code == 0


@pytest.mark.usefixtures("cluster_has_replica_ok")
def test_cluster_has_replica_ok_with_count_thresholds(
    runner: CliRunner, patroni_api: PatroniAPI
) -> None:
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "cluster_has_replica",
            "--warning",
            "@1",
            "--critical",
            "@0",
        ],
    )
    assert (
        result.stdout
        == "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2;@1;@0 srv2_lag=0 srv2_sync=0 srv2_timeline=51 srv3_lag=0 srv3_sync=1 srv3_timeline=51 sync_replica=1 unhealthy_replica=0\n"
    )
    assert result.exit_code == 0


@pytest.mark.usefixtures("cluster_has_replica_ok")
def test_cluster_has_replica_ok_with_sync_count_thresholds(
    runner: CliRunner, patroni_api: PatroniAPI
) -> None:
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "cluster_has_replica",
            "--sync-warning",
            "1:",
        ],
    )
    assert (
        result.stdout
        == "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2 srv2_lag=0 srv2_sync=0 srv2_timeline=51 srv3_lag=0 srv3_sync=1 srv3_timeline=51 sync_replica=1;1: unhealthy_replica=0\n"
    )
    assert result.exit_code == 0


@pytest.fixture
def cluster_has_replica_ok_lag(
    patroni_api: PatroniAPI, datadir: Path, tmp_path: Path, old_replica_state: bool
) -> Iterator[None]:
    cluster_path: Union[str, Path] = "cluster_has_replica_ok_lag.json"
    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
    if old_replica_state:
        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
        yield None


@pytest.mark.usefixtures("cluster_has_replica_ok_lag")
def test_cluster_has_replica_ok_with_count_thresholds_lag(
    runner: CliRunner, patroni_api: PatroniAPI
) -> None:
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "cluster_has_replica",
            "--warning",
            "@1",
            "--critical",
            "@0",
            "--max-lag",
            "1MB",
        ],
    )
    assert (
        result.stdout
        == "CLUSTERHASREPLICA OK - healthy_replica is 2 | healthy_replica=2;@1;@0 srv2_lag=1024 srv2_sync=0 srv2_timeline=51 srv3_lag=0 srv3_sync=0 srv3_timeline=51 sync_replica=0 unhealthy_replica=0\n"
    )
    assert result.exit_code == 0


@pytest.fixture
def cluster_has_replica_ko(
    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
    cluster_path: Union[str, Path] = "cluster_has_replica_ko.json"
    patroni_path: Union[str, Path] = "cluster_has_replica_patroni_verion_3.1.0.json"
    if old_replica_state:
        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
        yield None


@pytest.mark.usefixtures("cluster_has_replica_ko")
def test_cluster_has_replica_ko_with_count_thresholds(
    runner: CliRunner, patroni_api: PatroniAPI
) -> None:
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "cluster_has_replica",
            "--warning",
            "@1",
            "--critical",
            "@0",
        ],
    )
    assert (
        result.stdout
        == "CLUSTERHASREPLICA WARNING - healthy_replica is 1 (outside range @0:1) | healthy_replica=1;@1;@0 srv3_lag=0 srv3_sync=0 srv3_timeline=51 sync_replica=0 unhealthy_replica=1\n"
    )
    assert result.exit_code == 1


@pytest.mark.usefixtures("cluster_has_replica_ko")
def test_cluster_has_replica_ko_with_sync_count_thresholds(
    runner: CliRunner, patroni_api: PatroniAPI
) -> None:
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "cluster_has_replica",
            "--sync-warning",
            "2:",
            "--sync-critical",
            "1:",
        ],
    )
    # The lag on srv2 is "unknown". We don't handle string in perfstats so we have to scratch all the second node stats
    assert (
        result.stdout
        == "CLUSTERHASREPLICA CRITICAL - sync_replica is 0 (outside range 1:) | healthy_replica=1 srv3_lag=0 srv3_sync=0 srv3_timeline=51 sync_replica=0;2:;1: unhealthy_replica=1\n"
    )
    assert result.exit_code == 2


@pytest.fixture
def cluster_has_replica_ko_lag(
    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
    cluster_path: Union[str, Path] = "cluster_has_replica_ko_lag.json"
    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
    if old_replica_state:
        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
        yield None


@pytest.mark.usefixtures("cluster_has_replica_ko_lag")
def test_cluster_has_replica_ko_with_count_thresholds_and_lag(
    runner: CliRunner, patroni_api: PatroniAPI
) -> None:
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "cluster_has_replica",
            "--warning",
            "@1",
            "--critical",
            "@0",
            "--max-lag",
            "1MB",
        ],
    )
    assert (
        result.stdout
        == "CLUSTERHASREPLICA CRITICAL - healthy_replica is 0 (outside range @0:0) | healthy_replica=0;@1;@0 srv2_lag=10241024 srv2_sync=0 srv2_timeline=51 srv3_lag=20000000 srv3_sync=0 srv3_timeline=51 sync_replica=0 unhealthy_replica=2\n"
    )
    assert result.exit_code == 2


@pytest.fixture
def cluster_has_replica_ko_wrong_tl(
    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
    cluster_path: Union[str, Path] = "cluster_has_replica_ko_wrong_tl.json"
    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
    if old_replica_state:
        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
        yield None


@pytest.mark.usefixtures("cluster_has_replica_ko_wrong_tl")
def test_cluster_has_replica_ko_wrong_tl(
    runner: CliRunner, patroni_api: PatroniAPI
) -> None:
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "cluster_has_replica",
            "--warning",
            "@1",
            "--critical",
            "@0",
            "--max-lag",
            "1MB",
        ],
    )
    assert (
        result.stdout
        == "CLUSTERHASREPLICA WARNING - healthy_replica is 1 (outside range @0:1) | healthy_replica=1;@1;@0 srv2_lag=1000000 srv2_sync=0 srv2_timeline=50 srv3_lag=0 srv3_sync=0 srv3_timeline=51 sync_replica=0 unhealthy_replica=1\n"
    )
    assert result.exit_code == 1


@pytest.fixture
def cluster_has_replica_ko_all_replica(
    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
    cluster_path: Union[str, Path] = "cluster_has_replica_ko_all_replica.json"
    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
    if old_replica_state:
        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
        yield None


@pytest.mark.usefixtures("cluster_has_replica_ko_all_replica")
def test_cluster_has_replica_ko_all_replica(
    runner: CliRunner, patroni_api: PatroniAPI
) -> None:
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "cluster_has_replica",
            "--warning",
            "@1",
            "--critical",
            "@0",
            "--max-lag",
            "1MB",
        ],
    )
    assert (
        result.stdout
        == "CLUSTERHASREPLICA CRITICAL - healthy_replica is 0 (outside range @0:0) | healthy_replica=0;@1;@0 srv1_lag=0 srv1_sync=0 srv1_timeline=51 srv2_lag=0 srv2_sync=0 srv2_timeline=51 srv3_lag=0 srv3_sync=0 srv3_timeline=51 sync_replica=0 unhealthy_replica=3\n"
    )
    assert result.exit_code == 2

@ -1,51 +0,0 @@
from click.testing import CliRunner

from check_patroni.cli import main

from . import PatroniAPI


def test_cluster_has_scheduled_action_ok(
    runner: CliRunner, patroni_api: PatroniAPI
) -> None:
    with patroni_api.routes({"cluster": "cluster_has_scheduled_action_ok.json"}):
        result = runner.invoke(
            main, ["-e", patroni_api.endpoint, "cluster_has_scheduled_action"]
        )
    assert result.exit_code == 0
    assert (
        result.stdout
        == "CLUSTERHASSCHEDULEDACTION OK - has_scheduled_actions is 0 | has_scheduled_actions=0;;0 scheduled_restart=0 scheduled_switchover=0\n"
    )


def test_cluster_has_scheduled_action_ko_switchover(
    runner: CliRunner, patroni_api: PatroniAPI
) -> None:
    with patroni_api.routes(
        {"cluster": "cluster_has_scheduled_action_ko_switchover.json"}
    ):
        result = runner.invoke(
            main, ["-e", patroni_api.endpoint, "cluster_has_scheduled_action"]
        )
    assert result.exit_code == 2
    assert (
        result.stdout
        == "CLUSTERHASSCHEDULEDACTION CRITICAL - has_scheduled_actions is 1 (outside range 0:0) | has_scheduled_actions=1;;0 scheduled_restart=0 scheduled_switchover=1\n"
    )


def test_cluster_has_scheduled_action_ko_restart(
    runner: CliRunner, patroni_api: PatroniAPI
) -> None:
    with patroni_api.routes(
        {"cluster": "cluster_has_scheduled_action_ko_restart.json"}
    ):
        result = runner.invoke(
            main, ["-e", patroni_api.endpoint, "cluster_has_scheduled_action"]
        )
    assert result.exit_code == 2
    assert (
        result.stdout
        == "CLUSTERHASSCHEDULEDACTION CRITICAL - has_scheduled_actions is 1 (outside range 0:0) | has_scheduled_actions=1;;0 scheduled_restart=1 scheduled_switchover=0\n"
    )

@ -1,49 +0,0 @@
from click.testing import CliRunner

from check_patroni.cli import main

from . import PatroniAPI


def test_cluster_is_in_maintenance_ok(
    runner: CliRunner, patroni_api: PatroniAPI
) -> None:
    with patroni_api.routes({"cluster": "cluster_is_in_maintenance_ok.json"}):
        result = runner.invoke(
            main, ["-e", patroni_api.endpoint, "cluster_is_in_maintenance"]
        )
    assert result.exit_code == 0
    assert (
        result.stdout
        == "CLUSTERISINMAINTENANCE OK - is_in_maintenance is 0 | is_in_maintenance=0;;0\n"
    )


def test_cluster_is_in_maintenance_ko(
    runner: CliRunner, patroni_api: PatroniAPI
) -> None:
    with patroni_api.routes({"cluster": "cluster_is_in_maintenance_ko.json"}):
        result = runner.invoke(
            main, ["-e", patroni_api.endpoint, "cluster_is_in_maintenance"]
        )
    assert result.exit_code == 2
    assert (
        result.stdout
        == "CLUSTERISINMAINTENANCE CRITICAL - is_in_maintenance is 1 (outside range 0:0) | is_in_maintenance=1;;0\n"
    )


def test_cluster_is_in_maintenance_ok_pause_false(
    runner: CliRunner, patroni_api: PatroniAPI
) -> None:
    with patroni_api.routes(
        {"cluster": "cluster_is_in_maintenance_ok_pause_false.json"}
    ):
        result = runner.invoke(
            main, ["-e", patroni_api.endpoint, "cluster_is_in_maintenance"]
        )
    assert result.exit_code == 0
    assert (
        result.stdout
        == "CLUSTERISINMAINTENANCE OK - is_in_maintenance is 0 | is_in_maintenance=0;;0\n"
    )

@@ -1,272 +0,0 @@
from pathlib import Path
from typing import Iterator, Union

import pytest
from click.testing import CliRunner

from check_patroni.cli import main

from . import PatroniAPI, cluster_api_set_replica_running


@pytest.fixture
def cluster_node_count_ok(
    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
    cluster_path: Union[str, Path] = "cluster_node_count_ok.json"
    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
    if old_replica_state:
        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
        yield None


@pytest.mark.usefixtures("cluster_node_count_ok")
def test_cluster_node_count_ok(
    runner: CliRunner, patroni_api: PatroniAPI, old_replica_state: bool
) -> None:
    result = runner.invoke(main, ["-e", patroni_api.endpoint, "cluster_node_count"])
    if old_replica_state:
        assert (
            result.output
            == "CLUSTERNODECOUNT OK - members is 3 | healthy_members=3 members=3 role_leader=1 role_replica=2 state_running=3\n"
        )
    else:
        assert (
            result.output
            == "CLUSTERNODECOUNT OK - members is 3 | healthy_members=3 members=3 role_leader=1 role_replica=2 state_running=1 state_streaming=2\n"
        )
    assert result.exit_code == 0


@pytest.mark.usefixtures("cluster_node_count_ok")
def test_cluster_node_count_ok_with_thresholds(
    runner: CliRunner, patroni_api: PatroniAPI, old_replica_state: bool
) -> None:
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "cluster_node_count",
            "--warning",
            "@0:1",
            "--critical",
            "@2",
            "--healthy-warning",
            "@2",
            "--healthy-critical",
            "@0:1",
        ],
    )
    if old_replica_state:
        assert (
            result.output
            == "CLUSTERNODECOUNT OK - members is 3 | healthy_members=3;@2;@1 members=3;@1;@2 role_leader=1 role_replica=2 state_running=3\n"
        )
    else:
        assert (
            result.output
            == "CLUSTERNODECOUNT OK - members is 3 | healthy_members=3;@2;@1 members=3;@1;@2 role_leader=1 role_replica=2 state_running=1 state_streaming=2\n"
        )
    assert result.exit_code == 0


@pytest.fixture
def cluster_node_count_healthy_warning(
    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
    cluster_path: Union[str, Path] = "cluster_node_count_healthy_warning.json"
    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
    if old_replica_state:
        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
        yield None


@pytest.mark.usefixtures("cluster_node_count_healthy_warning")
def test_cluster_node_count_healthy_warning(
    runner: CliRunner, patroni_api: PatroniAPI, old_replica_state: bool
) -> None:
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "cluster_node_count",
            "--healthy-warning",
            "@2",
            "--healthy-critical",
            "@0:1",
        ],
    )
    if old_replica_state:
        assert (
            result.output
            == "CLUSTERNODECOUNT WARNING - healthy_members is 2 (outside range @0:2) | healthy_members=2;@2;@1 members=2 role_leader=1 role_replica=1 state_running=2\n"
        )
    else:
        assert (
            result.output
            == "CLUSTERNODECOUNT WARNING - healthy_members is 2 (outside range @0:2) | healthy_members=2;@2;@1 members=2 role_leader=1 role_replica=1 state_running=1 state_streaming=1\n"
        )
    assert result.exit_code == 1


@pytest.fixture
def cluster_node_count_healthy_critical(
    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
    cluster_path: Union[str, Path] = "cluster_node_count_healthy_critical.json"
    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
    if old_replica_state:
        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
        yield None


@pytest.mark.usefixtures("cluster_node_count_healthy_critical")
def test_cluster_node_count_healthy_critical(
    runner: CliRunner, patroni_api: PatroniAPI
) -> None:
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "cluster_node_count",
            "--healthy-warning",
            "@2",
            "--healthy-critical",
            "@0:1",
        ],
    )
    assert (
        result.output
        == "CLUSTERNODECOUNT CRITICAL - healthy_members is 1 (outside range @0:1) | healthy_members=1;@2;@1 members=3 role_leader=1 role_replica=2 state_running=1 state_start_failed=2\n"
    )
    assert result.exit_code == 2


@pytest.fixture
def cluster_node_count_warning(
    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
    cluster_path: Union[str, Path] = "cluster_node_count_warning.json"
    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
    if old_replica_state:
        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
        yield None


@pytest.mark.usefixtures("cluster_node_count_warning")
def test_cluster_node_count_warning(
    runner: CliRunner, patroni_api: PatroniAPI, old_replica_state: bool
) -> None:
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "cluster_node_count",
            "--warning",
            "@2",
            "--critical",
            "@0:1",
        ],
    )
    if old_replica_state:
        assert (
            result.stdout
            == "CLUSTERNODECOUNT WARNING - members is 2 (outside range @0:2) | healthy_members=2 members=2;@2;@1 role_leader=1 role_replica=1 state_running=2\n"
        )
    else:
        assert (
            result.stdout
            == "CLUSTERNODECOUNT WARNING - members is 2 (outside range @0:2) | healthy_members=2 members=2;@2;@1 role_leader=1 role_replica=1 state_running=1 state_streaming=1\n"
        )
    assert result.exit_code == 1


@pytest.fixture
def cluster_node_count_critical(
    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
    cluster_path: Union[str, Path] = "cluster_node_count_critical.json"
    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
    if old_replica_state:
        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
        yield None


@pytest.mark.usefixtures("cluster_node_count_critical")
def test_cluster_node_count_critical(
    runner: CliRunner, patroni_api: PatroniAPI
) -> None:
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "cluster_node_count",
            "--warning",
            "@2",
            "--critical",
            "@0:1",
        ],
    )
    assert (
        result.stdout
        == "CLUSTERNODECOUNT CRITICAL - members is 1 (outside range @0:1) | healthy_members=1 members=1;@2;@1 role_leader=1 state_running=1\n"
    )
    assert result.exit_code == 2


@pytest.fixture
def cluster_node_count_ko_in_archive_recovery(
    patroni_api: PatroniAPI, old_replica_state: bool, datadir: Path, tmp_path: Path
) -> Iterator[None]:
    cluster_path: Union[str, Path] = "cluster_node_count_ko_in_archive_recovery.json"
    patroni_path = "cluster_has_replica_patroni_verion_3.1.0.json"
    if old_replica_state:
        cluster_path = cluster_api_set_replica_running(datadir / cluster_path, tmp_path)
        patroni_path = "cluster_has_replica_patroni_verion_3.0.0.json"
    with patroni_api.routes({"cluster": cluster_path, "patroni": patroni_path}):
        yield None


@pytest.mark.usefixtures("cluster_node_count_ko_in_archive_recovery")
def test_cluster_node_count_ko_in_archive_recovery(
    runner: CliRunner, patroni_api: PatroniAPI, old_replica_state: bool
) -> None:
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "cluster_node_count",
            "--healthy-warning",
            "@2",
            "--healthy-critical",
            "@0:1",
        ],
    )
    if old_replica_state:
        assert (
            result.stdout
            == "CLUSTERNODECOUNT OK - members is 3 | healthy_members=3;@2;@1 members=3 role_replica=2 role_standby_leader=1 state_running=3\n"
        )
        assert result.exit_code == 0
    else:
        assert (
            result.stdout
            == "CLUSTERNODECOUNT CRITICAL - healthy_members is 1 (outside range @0:1) | healthy_members=1;@2;@1 members=3 role_replica=2 role_standby_leader=1 state_in_archive_recovery=2 state_streaming=1\n"
        )
        assert result.exit_code == 2
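Thresholds such as `--warning @2` and `--critical @0:1` in the tests above use Nagios range syntax: `start:end` defines a range, a leading `@` inverts the check so the alert fires when the value is *inside* the range, and a bare number means `0:number`. The real parsing lives in the `nagiosplugin` library; this simplified helper is illustrative only:

```python
def in_alert(value: float, range_spec: str) -> bool:
    """Evaluate a Nagios threshold range such as "@0:1", "@2" or "5:10".

    Without "@", the check alerts when the value falls OUTSIDE [start, end];
    with "@", it alerts when the value falls INSIDE the range.
    """
    inside = range_spec.startswith("@")
    spec = range_spec.lstrip("@")
    if ":" in spec:
        lo_s, hi_s = spec.split(":", 1)
        # "~" stands for negative infinity in Nagios range notation
        lo = float(lo_s) if lo_s not in ("", "~") else float("-inf")
        hi = float(hi_s) if hi_s else float("inf")
    else:
        lo, hi = 0.0, float(spec)
    in_range = lo <= value <= hi
    return in_range if inside else not in_range


# "--healthy-critical @0:1": critical when healthy_members is 0 or 1
print(in_alert(1, "@0:1"), in_alert(3, "@0:1"))
```

This is why `healthy_members=1` with `--healthy-critical @0:1` yields CRITICAL while `healthy_members=3` stays OK.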
@@ -1,30 +0,0 @@
from pathlib import Path

from click.testing import CliRunner

from check_patroni.cli import main

from . import PatroniAPI


def test_node_is_alive_ok(
    runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
) -> None:
    liveness = tmp_path / "liveness"
    liveness.touch()
    with patroni_api.routes({"liveness": liveness}):
        result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_alive"])
    assert result.exit_code == 0
    assert (
        result.stdout
        == "NODEISALIVE OK - This node is alive (patroni is running). | is_alive=1;;@0\n"
    )


def test_node_is_alive_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
    result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_alive"])
    assert result.exit_code == 2
    assert (
        result.stdout
        == "NODEISALIVE CRITICAL - This node is not alive (patroni is not running). | is_alive=0;;@0\n"
    )
@@ -1,58 +0,0 @@
from typing import Iterator

import pytest
from click.testing import CliRunner

from check_patroni.cli import main

from . import PatroniAPI


@pytest.fixture
def node_is_leader_ok(patroni_api: PatroniAPI) -> Iterator[None]:
    with patroni_api.routes(
        {
            "leader": "node_is_leader_ok.json",
            "standby-leader": "node_is_leader_ok_standby_leader.json",
        }
    ):
        yield None


@pytest.mark.usefixtures("node_is_leader_ok")
def test_node_is_leader_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
    result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_leader"])
    assert result.exit_code == 0
    assert (
        result.stdout
        == "NODEISLEADER OK - This node is a leader node. | is_leader=1;;@0\n"
    )

    result = runner.invoke(
        main,
        ["-e", patroni_api.endpoint, "node_is_leader", "--is-standby-leader"],
    )
    assert result.exit_code == 0
    assert (
        result.stdout
        == "NODEISLEADER OK - This node is a standby leader node. | is_leader=1;;@0\n"
    )


def test_node_is_leader_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
    result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_leader"])
    assert result.exit_code == 2
    assert (
        result.stdout
        == "NODEISLEADER CRITICAL - This node is not a leader node. | is_leader=0;;@0\n"
    )

    result = runner.invoke(
        main,
        ["-e", patroni_api.endpoint, "node_is_leader", "--is-standby-leader"],
    )
    assert result.exit_code == 2
    assert (
        result.stdout
        == "NODEISLEADER CRITICAL - This node is not a standby leader node. | is_leader=0;;@0\n"
    )
@@ -1,29 +0,0 @@
from click.testing import CliRunner

from check_patroni.cli import main

from . import PatroniAPI


def test_node_is_pending_restart_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
    with patroni_api.routes({"patroni": "node_is_pending_restart_ok.json"}):
        result = runner.invoke(
            main, ["-e", patroni_api.endpoint, "node_is_pending_restart"]
        )
    assert result.exit_code == 0
    assert (
        result.stdout
        == "NODEISPENDINGRESTART OK - This node doesn't have the pending restart flag. | is_pending_restart=0;;0\n"
    )


def test_node_is_pending_restart_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
    with patroni_api.routes({"patroni": "node_is_pending_restart_ko.json"}):
        result = runner.invoke(
            main, ["-e", patroni_api.endpoint, "node_is_pending_restart"]
        )
    assert result.exit_code == 2
    assert (
        result.stdout
        == "NODEISPENDINGRESTART CRITICAL - This node has the pending restart flag. | is_pending_restart=1;;0\n"
    )
@@ -1,24 +0,0 @@
from click.testing import CliRunner

from check_patroni.cli import main

from . import PatroniAPI


def test_node_is_primary_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
    with patroni_api.routes({"primary": "node_is_primary_ok.json"}):
        result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_primary"])
    assert result.exit_code == 0
    assert (
        result.stdout
        == "NODEISPRIMARY OK - This node is the primary with the leader lock. | is_primary=1;;@0\n"
    )


def test_node_is_primary_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
    result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_primary"])
    assert result.exit_code == 2
    assert (
        result.stdout
        == "NODEISPRIMARY CRITICAL - This node is not the primary with the leader lock. | is_primary=0;;@0\n"
    )
@@ -1,155 +0,0 @@
from typing import Iterator

import pytest
from click.testing import CliRunner

from check_patroni.cli import main

from . import PatroniAPI


@pytest.fixture
def node_is_replica_ok(patroni_api: PatroniAPI) -> Iterator[None]:
    with patroni_api.routes(
        {
            k: "node_is_replica_ok.json"
            for k in ("replica", "synchronous", "asynchronous")
        }
    ):
        yield None


@pytest.mark.usefixtures("node_is_replica_ok")
def test_node_is_replica_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
    result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_replica"])
    assert result.exit_code == 0
    assert (
        result.stdout
        == "NODEISREPLICA OK - This node is a running replica with no noloadbalance tag. | is_replica=1;;@0\n"
    )


def test_node_is_replica_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
    result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_is_replica"])
    assert result.exit_code == 2
    assert (
        result.stdout
        == "NODEISREPLICA CRITICAL - This node is not a running replica with no noloadbalance tag. | is_replica=0;;@0\n"
    )


def test_node_is_replica_ko_lag(runner: CliRunner, patroni_api: PatroniAPI) -> None:
    # We don't do the check ourselves, patroni does it and changes the return code
    result = runner.invoke(
        main, ["-e", patroni_api.endpoint, "node_is_replica", "--max-lag", "100"]
    )
    assert result.exit_code == 2
    assert (
        result.stdout
        == "NODEISREPLICA CRITICAL - This node is not a running replica with no noloadbalance tag and a lag under 100. | is_replica=0;;@0\n"
    )

    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "node_is_replica",
            "--is-async",
            "--max-lag",
            "100",
        ],
    )
    assert result.exit_code == 2
    assert (
        result.stdout
        == "NODEISREPLICA CRITICAL - This node is not a running asynchronous replica with no noloadbalance tag and a lag under 100. | is_replica=0;;@0\n"
    )


@pytest.mark.usefixtures("node_is_replica_ok")
def test_node_is_replica_sync_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
    # We don't do the check ourselves, patroni does it and changes the return code
    result = runner.invoke(
        main, ["-e", patroni_api.endpoint, "node_is_replica", "--is-sync"]
    )
    assert result.exit_code == 0
    assert (
        result.stdout
        == "NODEISREPLICA OK - This node is a running synchronous replica with no noloadbalance tag. | is_replica=1;;@0\n"
    )


def test_node_is_replica_sync_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
    # We don't do the check ourselves, patroni does it and changes the return code
    result = runner.invoke(
        main, ["-e", patroni_api.endpoint, "node_is_replica", "--is-sync"]
    )
    assert result.exit_code == 2
    assert (
        result.stdout
        == "NODEISREPLICA CRITICAL - This node is not a running synchronous replica with no noloadbalance tag. | is_replica=0;;@0\n"
    )


@pytest.mark.usefixtures("node_is_replica_ok")
def test_node_is_replica_async_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
    # We don't do the check ourselves, patroni does it and changes the return code
    result = runner.invoke(
        main, ["-e", patroni_api.endpoint, "node_is_replica", "--is-async"]
    )
    assert result.exit_code == 0
    assert (
        result.stdout
        == "NODEISREPLICA OK - This node is a running asynchronous replica with no noloadbalance tag. | is_replica=1;;@0\n"
    )


def test_node_is_replica_async_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
    # We don't do the check ourselves, patroni does it and changes the return code
    result = runner.invoke(
        main, ["-e", patroni_api.endpoint, "node_is_replica", "--is-async"]
    )
    assert result.exit_code == 2
    assert (
        result.stdout
        == "NODEISREPLICA CRITICAL - This node is not a running asynchronous replica with no noloadbalance tag. | is_replica=0;;@0\n"
    )


@pytest.mark.usefixtures("node_is_replica_ok")
def test_node_is_replica_params(runner: CliRunner, patroni_api: PatroniAPI) -> None:
    # We don't do the check ourselves, patroni does it and changes the return code
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "node_is_replica",
            "--is-async",
            "--is-sync",
        ],
    )
    assert result.exit_code == 3
    assert (
        result.stdout
        == "NODEISREPLICA UNKNOWN: click.exceptions.UsageError: --is-sync and --is-async cannot be provided at the same time for this service\n"
    )

    # We don't do the check ourselves, patroni does it and changes the return code
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "node_is_replica",
            "--is-sync",
            "--max-lag",
            "1MB",
        ],
    )
    assert result.exit_code == 3
    assert (
        result.stdout
        == "NODEISREPLICA UNKNOWN: click.exceptions.UsageError: --is-sync and --max-lag cannot be provided at the same time for this service\n"
    )
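The UNKNOWN results in `test_node_is_replica_params` come from a `click.UsageError` raised when incompatible options are combined, which check_patroni maps to the Nagios UNKNOWN exit code (3). The same kind of guard can be sketched with the stdlib `argparse` instead — illustrative only, since the project actually uses click, and a bare argparse error exits with status 2 rather than 3:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="node_is_replica")
    # A mutually exclusive group expresses "--is-sync and --is-async cannot
    # be provided at the same time" declaratively.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--is-sync", action="store_true")
    group.add_argument("--is-async", action="store_true")
    parser.add_argument("--max-lag")
    return parser


try:
    build_parser().parse_args(["--is-sync", "--is-async"])
except SystemExit as exc:
    # argparse reports the conflict on stderr and exits with status 2
    print("conflict detected, exit status:", exc.code)
```

The `--is-sync`/`--max-lag` conflict tested above is a runtime check rather than a declarative one, so it would still need an explicit `if` in the command body.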
@@ -1,50 +0,0 @@
from typing import Iterator

import pytest
from click.testing import CliRunner

from check_patroni.cli import main

from . import PatroniAPI


@pytest.fixture(scope="module", autouse=True)
def node_patroni_version(patroni_api: PatroniAPI) -> Iterator[None]:
    with patroni_api.routes({"patroni": "node_patroni_version.json"}):
        yield None


def test_node_patroni_version_ok(runner: CliRunner, patroni_api: PatroniAPI) -> None:
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "node_patroni_version",
            "--patroni-version",
            "2.0.2",
        ],
    )
    assert result.exit_code == 0
    assert (
        result.stdout
        == "NODEPATRONIVERSION OK - Patroni's version is 2.0.2. | is_version_ok=1;;@0\n"
    )


def test_node_patroni_version_ko(runner: CliRunner, patroni_api: PatroniAPI) -> None:
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "node_patroni_version",
            "--patroni-version",
            "1.0.0",
        ],
    )
    assert result.exit_code == 2
    assert (
        result.stdout
        == "NODEPATRONIVERSION CRITICAL - Patroni's version is not 1.0.0. | is_version_ok=0;;@0\n"
    )
@@ -1,173 +0,0 @@
from pathlib import Path
from typing import Iterator

import nagiosplugin
import pytest
from click.testing import CliRunner

from check_patroni.cli import main

from . import PatroniAPI


@pytest.fixture
def node_tl_has_changed(patroni_api: PatroniAPI) -> Iterator[None]:
    with patroni_api.routes({"patroni": "node_tl_has_changed.json"}):
        yield None


@pytest.mark.usefixtures("node_tl_has_changed")
def test_node_tl_has_changed_ok_with_timeline(
    runner: CliRunner, patroni_api: PatroniAPI
) -> None:
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "node_tl_has_changed",
            "--timeline",
            "58",
        ],
    )
    assert result.exit_code == 0
    assert (
        result.stdout
        == "NODETLHASCHANGED OK - The timeline is still 58. | is_timeline_changed=0;;@1:1 timeline=58\n"
    )


@pytest.mark.usefixtures("node_tl_has_changed")
def test_node_tl_has_changed_ok_with_state_file(
    runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
) -> None:
    state_file = tmp_path / "node_tl_has_changed.state_file"
    with state_file.open("w") as f:
        f.write('{"timeline": 58}')

    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "node_tl_has_changed",
            "--state-file",
            str(state_file),
        ],
    )
    assert result.exit_code == 0
    assert (
        result.stdout
        == "NODETLHASCHANGED OK - The timeline is still 58. | is_timeline_changed=0;;@1:1 timeline=58\n"
    )


@pytest.mark.usefixtures("node_tl_has_changed")
def test_node_tl_has_changed_ko_with_timeline(
    runner: CliRunner, patroni_api: PatroniAPI
) -> None:
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "node_tl_has_changed",
            "--timeline",
            "700",
        ],
    )
    assert result.exit_code == 2
    assert (
        result.stdout
        == "NODETLHASCHANGED CRITICAL - The expected timeline was 700 got 58. | is_timeline_changed=1;;@1:1 timeline=58\n"
    )


@pytest.mark.usefixtures("node_tl_has_changed")
def test_node_tl_has_changed_ko_with_state_file_and_save(
    runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
) -> None:
    state_file = tmp_path / "node_tl_has_changed.state_file"
    with state_file.open("w") as f:
        f.write('{"timeline": 700}')

    # test without saving the new tl
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "node_tl_has_changed",
            "--state-file",
            str(state_file),
        ],
    )
    assert result.exit_code == 2
    assert (
        result.stdout
        == "NODETLHASCHANGED CRITICAL - The expected timeline was 700 got 58. | is_timeline_changed=1;;@1:1 timeline=58\n"
    )

    cookie = nagiosplugin.Cookie(state_file)
    cookie.open()
    new_tl = cookie.get("timeline")
    cookie.close()

    assert new_tl == 700

    # test when we save the hash
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "node_tl_has_changed",
            "--state-file",
            str(state_file),
            "--save",
        ],
    )
    assert result.exit_code == 2
    assert (
        result.stdout
        == "NODETLHASCHANGED CRITICAL - The expected timeline was 700 got 58. | is_timeline_changed=1;;@1:1 timeline=58\n"
    )

    cookie = nagiosplugin.Cookie(state_file)
    cookie.open()
    new_tl = cookie.get("timeline")
    cookie.close()

    assert new_tl == 58


@pytest.mark.usefixtures("node_tl_has_changed")
def test_node_tl_has_changed_params(
    runner: CliRunner, patroni_api: PatroniAPI, tmp_path: Path
) -> None:
    # This one is placed last because it seems like the exceptions are not flushed from stderr for the next tests.
    fake_state_file = tmp_path / "fake_file_name.state_file"
    result = runner.invoke(
        main,
        [
            "-e",
            patroni_api.endpoint,
            "node_tl_has_changed",
            "--timeline",
            "58",
            "--state-file",
            str(fake_state_file),
        ],
    )
    assert result.exit_code == 3
    assert (
        result.stdout
        == "NODETLHASCHANGED UNKNOWN: click.exceptions.UsageError: Either --timeline or --state-file should be provided for this service\n"
    )

    result = runner.invoke(main, ["-e", patroni_api.endpoint, "node_tl_has_changed"])
    assert result.exit_code == 3
    assert (
        result.stdout
        == "NODETLHASCHANGED UNKNOWN: click.exceptions.UsageError: Either --timeline or --state-file should be provided for this service\n"
    )
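The state file manipulated above through `nagiosplugin.Cookie` is a small JSON document, which is why the tests can seed it with a literal such as `{"timeline": 700}`. A stdlib-only sketch of the compare-and-optionally-save cycle (`timeline_unchanged` is a hypothetical helper, not check_patroni's implementation):

```python
import json
import tempfile
from pathlib import Path


def timeline_unchanged(state_file: Path, current_tl: int, save: bool = False) -> bool:
    """Compare current_tl against the timeline recorded in state_file."""
    try:
        expected = json.loads(state_file.read_text()).get("timeline")
    except FileNotFoundError:
        expected = None
    if save:
        # persist the timeline we just observed, as "--save" does
        state_file.write_text(json.dumps({"timeline": current_tl}))
    return expected == current_tl


state = Path(tempfile.mkdtemp()) / "tl.state"
state.write_text('{"timeline": 700}')
print(timeline_unchanged(state, 58, save=True))  # False: the timeline moved
print(json.loads(state.read_text()))             # 58 is now recorded for next run
```

This mirrors the test's observation that without `--save` the file keeps timeline 700, while with `--save` the next run would compare against 58.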
tox.ini
@@ -1,49 +0,0 @@
[tox]
# the versions specified here are overridden by github workflow
envlist = lint, mypy, py{37,38,39,310,311}
skip_missing_interpreters = True

[testenv]
extras = test
commands =
    pytest {toxinidir}/check_patroni {toxinidir}/tests {posargs:-vv --log-level=debug}

[testenv:lint]
skip_install = True
deps =
    codespell
    black
    flake8
    isort
commands =
    codespell {toxinidir}/check_patroni {toxinidir}/tests {toxinidir}/docs/ {toxinidir}/RELEASE.md {toxinidir}/CONTRIBUTING.md
    black --check --diff {toxinidir}/check_patroni {toxinidir}/tests
    flake8 {toxinidir}/check_patroni {toxinidir}/tests
    isort --check --diff {toxinidir}/check_patroni {toxinidir}/tests

[testenv:mypy]
deps =
    mypy == 0.961
commands =
    # we need to install types-requests
    mypy --install-types --non-interactive

[testenv:build]
deps =
    # build provides the "python -m build" entry point used below
    build
    wheel
    setuptools
    twine
allowlist_externals =
    rm
commands =
    rm --verbose --recursive --force {toxinidir}/dist/
    python -m build
    python -m twine check dist/*

[testenv:upload]
# requires a check_patroni section in ~/.pypirc
skip_install = True
deps =
    twine
commands =
    python -m twine upload --repository check_patroni dist/*
@@ -1,29 +0,0 @@
BSD 3-Clause License

Copyright (c) 2019, Jehan-Guillaume (ioguix) de Rorthais
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

* Neither the name of the copyright holder nor the names of its
  contributors may be used to endorse or promote products derived from
  this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@ -1,22 +0,0 @@
export VAGRANT_BOX_UPDATE_CHECK_DISABLE=1
export VAGRANT_CHECKPOINT_DISABLE=1

.PHONY: all prov validate

all: prov

prov:
	vagrant up --provision

clean:
	vagrant destroy -f

validate:
	@vagrant validate
	@if which shellcheck >/dev/null ;\
	then shellcheck provision/* ;\
	else echo "WARNING: shellcheck is not in PATH, not checking bash syntax" ;\
	fi
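The `validate` target probes for shellcheck and degrades to a warning instead of failing. The same probe can be reproduced standalone; `command -v` is the more portable POSIX spelling of `which` (this script is a sketch, not part of the provision scripts):

```shell
#!/bin/sh
# Probe for shellcheck; warn instead of failing when it is absent,
# mirroring the behaviour of the Makefile's validate target.
if command -v shellcheck >/dev/null 2>&1; then
    echo "shellcheck found: linting provision scripts"
else
    echo "WARNING: shellcheck is not in PATH, not checking bash syntax"
fi
```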
@ -1,127 +0,0 @@
# Icinga

## Install

Create the VM:

```
make
```

## IcingaWeb

Configure IcingaWeb:

```
http://$IP/icingaweb2/setup
```

* Screen 1: Welcome

Use the icinga token given at the end of the `icinga2-setup` provision, or:

```
sudo icingacli setup token show
```

Next

* Screen 2: Modules

Activate Monitoring (already set)

Next

* Screen 3: Icinga Web 2

Next

* Screen 4: Authentication

Next

* Screen 5: Database Resource

Database Name: icingaweb_db
Username: supervisor
Password: th3Pass
Charset: UTF8

Validate
Next

* Screen 6: Authentication Backend

Next

* Screen 7: Administration

Fill in the blanks
Next

* Screen 8: Application Configuration

Next

* Screen 9: Summary

Next

* Screen 10: Welcome ... again

Next

* Screen 11: Monitoring IDO Resource

Database Name: icinga2
Username: supervisor
Password: th3Pass
Charset: UTF8

Validate
Next

* Screen 12: Command Transport

Transport name: icinga2
Transport Type: API
Host: 127.0.0.1
Port: 5665
User: icinga_api
Password: th3Pass

Next

* Screen 13: Monitoring Security

Next

* Screen 14: Summary

Finish

* Screen 15: Hopefully success

Login

## Add servers to icinga

```
# Connect to the vm
vagrant ssh s1

# Create /etc/icinga2/conf.d/check_patroni.conf
sudo /vagrant/provision/director.bash init cluster1 p1=10.20.89.54 p2=10.20.89.55

# Check and load conf
sudo icinga2 daemon -C
sudo systemctl restart icinga2.service
```
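The exact contents of the `check_patroni.conf` written by `director.bash` are not shown here, but a hypothetical Icinga 2 definition for one host and one check might look roughly like this (the object names, `check_command`, and `vars` key are assumptions for illustration, not taken from the provision scripts):

```
object Host "p1" {
  address = "10.20.89.54"
  check_command = "hostalive"
}

object Service "cluster_has_leader" {
  host_name = "p1"
  check_command = "check_patroni"
  vars.patroni_endpoint = "http://10.20.89.54:8008"
}
```

`icinga2 daemon -C` in the block above validates definitions like these before the service restart picks them up.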
# Grafana

Connect to: http://10.20.89.52:3000/login
User / pass: admin/admin

Import the dashboards from the grafana directory. They are created for cluster1
and servers p1, p2.