Update the README and help
parent e663695b26
commit 7898011c40
65 README.md
|
@ -9,6 +9,7 @@ Options:
|
||||||
--config FILE Read option defaults from the specified INI file
|
--config FILE Read option defaults from the specified INI file
|
||||||
[default: config.ini]
|
[default: config.ini]
|
||||||
-e, --endpoints TEXT API endpoint. Can be specified multiple times.
|
-e, --endpoints TEXT API endpoint. Can be specified multiple times.
|
||||||
|
[default: http://127.0.0.1:8008]
|
||||||
--cert_file TEXT File with the client certificate.
|
--cert_file TEXT File with the client certificate.
|
||||||
--key_file TEXT File with the client key.
|
--key_file TEXT File with the client key.
|
||||||
--ca_file TEXT The CA certificate.
|
--ca_file TEXT The CA certificate.
|
||||||
|
@ -16,6 +17,7 @@ Options:
|
||||||
(debug) [x>=0]
|
(debug) [x>=0]
|
||||||
--version
|
--version
|
||||||
--timeout INTEGER Timeout in seconds for the API queries (0 to disable)
|
--timeout INTEGER Timeout in seconds for the API queries (0 to disable)
|
||||||
|
[default: 2]
|
||||||
--help Show this message and exit.
|
--help Show this message and exit.
|
||||||
|
|
||||||
Commands:
|
Commands:
|
||||||
|
@ -32,28 +34,39 @@ Commands:
|
||||||
node_tl_has_changed Check if the timeline has changed.
|
node_tl_has_changed Check if the timeline has changed.
|
||||||
```
|
```
|
||||||
|
|
||||||
## install
|
## Install
|
||||||
|
|
||||||
The check requers python3. Using a virtual env is advised for testing :
|
Installation from the git repository:
|
||||||
|
|
||||||
```
|
```
|
||||||
pip -m venv ~/venv
|
$ git clone <FIXME>
|
||||||
source ~venv/bin/activate
|
|
||||||
```
|
```
|
||||||
|
|
||||||
Clone the repo, then install with pip3 from it :
|
Change the branch if necessary. Then create a dedicated environment,
|
||||||
|
install dependencies and then check_patroni from the repo:
|
||||||
|
|
||||||
```
|
```
|
||||||
pip3 install .
|
$ cd check_patroni
|
||||||
pip3 install .[dev]
|
$ python3 -m venv .venv
|
||||||
pip3 install .[test]
|
$ . .venv/bin/activate
|
||||||
|
(.venv) $ pip3 install .
|
||||||
|
(.venv) $ pip3 install .[dev] # for dev purposes
|
||||||
|
(.venv) $ pip3 install .[test] # for testing purposes
|
||||||
|
(.venv) $ check_patroni
|
||||||
```
|
```
|
||||||
|
|
||||||
Links :
|
To quit this env and destroy it:
|
||||||
|
|
||||||
|
```
|
||||||
|
$ deactivate
|
||||||
|
$ rm -r .venv
|
||||||
|
```
|
||||||
|
|
||||||
|
Links:
|
||||||
* [pip & centos 7](https://linuxize.com/post/how-to-install-pip-on-centos-7/)
|
* [pip & centos 7](https://linuxize.com/post/how-to-install-pip-on-centos-7/)
|
||||||
* [pip & debian10](https://linuxize.com/post/how-to-install-pip-on-debian-10/)
|
* [pip & debian10](https://linuxize.com/post/how-to-install-pip-on-debian-10/)
|
||||||
|
|
||||||
## config file
|
## Config file
|
||||||
|
|
||||||
All global and service-specific parameters can be specified via a config file as follows:
|
All global and service-specific parameters can be specified via a config file as follows:
|
||||||
|
|
||||||
|
@ -68,7 +81,7 @@ timeout = 0
|
||||||
[options.node_is_replica]
|
[options.node_is_replica]
|
||||||
lag=100
|
lag=100
|
||||||
```
|
```
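
As an illustration of the layout above, here is a minimal sketch of reading such a file with Python's standard `configparser`. The helper name `load_options`, the merging of `[options]` with `[options.<service>]`, and the comma-splitting of `endpoints` are assumptions made for this example; check_patroni's own loader may behave differently.

```python
import configparser

# Sample content modelled on the config file shown in this commit.
SAMPLE = """
[options]
endpoints = https://10.20.199.3:8008, https://10.20.199.4:8008,https://10.20.199.5:8008
timeout = 0

[options.node_is_replica]
lag=100
"""


def load_options(text: str, service: str) -> dict:
    """Merge the global [options] section with [options.<service>].

    Hypothetical helper: service-specific keys override global ones.
    All values stay strings, as configparser returns them.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)

    merged = dict(parser["options"]) if parser.has_section("options") else {}
    section = f"options.{service}"
    if parser.has_section(section):
        merged.update(parser[section])

    # Assumption: endpoints are comma-separated, with optional spaces.
    if "endpoints" in merged:
        merged["endpoints"] = [
            e.strip() for e in merged["endpoints"].split(",") if e.strip()
        ]
    return merged


if __name__ == "__main__":
    print(load_options(SAMPLE, "node_is_replica"))
```

In this sketch, a service without its own section simply gets the global options.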
|
||||||
## thresholds
|
## Thresholds
|
||||||
|
|
||||||
The format for the threshold parameters is "[@][start:][end]".
|
The format for the threshold parameters is "[@][start:][end]".
|
||||||
|
|
||||||
|
@ -77,9 +90,9 @@ The format for the threshold parameters is "[@][start:][end]".
|
||||||
* If `end` is omitted, infinity is assumed
|
* If `end` is omitted, infinity is assumed
|
||||||
* To invert the match condition, prefix the range expression with "@".
|
* To invert the match condition, prefix the range expression with "@".
|
||||||
|
|
||||||
A match is found when : start <= VALUE <= end
|
A match is found when: start <= VALUE <= end
|
||||||
|
|
||||||
For example, the following command will raise :
|
For example, the following command will raise:
|
||||||
|
|
||||||
* a warning if there are fewer than 2 nodes, which can be translated to outside of range [2;+INF[
|
* a warning if there are fewer than 2 nodes, which can be translated to outside of range [2;+INF[
|
||||||
* a critical if there are no nodes, which can be translated to outside of range [1;+INF[
|
* a critical if there are no nodes, which can be translated to outside of range [1;+INF[
|
||||||
|
@ -88,7 +101,8 @@ For example, the followinf command will raise :
|
||||||
check_patroni -e https://10.20.199.3:8008 cluster_has_replica --warning 2: --critical 1:
|
check_patroni -e https://10.20.199.3:8008 cluster_has_replica --warning 2: --critical 1:
|
||||||
```
|
```
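
The threshold rules above can be sketched in a few lines of Python. This is only an illustration, not check_patroni's implementation: the names `Threshold`, `parse_threshold` and `status` are hypothetical, and two defaults are assumptions taken from the usual Nagios range convention (an omitted `start` means 0, and `~` means negative infinity).

```python
import math
from typing import NamedTuple


class Threshold(NamedTuple):
    start: float
    end: float
    invert: bool

    def match(self, value: float) -> bool:
        # A match is found when start <= value <= end; "@" inverts it.
        inside = self.start <= value <= self.end
        return not inside if self.invert else inside


def parse_threshold(spec: str) -> Threshold:
    """Parse "[@][start:][end]" into a Threshold (hypothetical helper)."""
    invert = spec.startswith("@")
    if invert:
        spec = spec[1:]
    if ":" in spec:
        start_s, end_s = spec.split(":", 1)
    else:
        start_s, end_s = "", spec
    if start_s == "~":        # assumption: "~" stands for -infinity
        start = -math.inf
    elif start_s == "":       # assumption: omitted start means 0
        start = 0.0
    else:
        start = float(start_s)
    end = float(end_s) if end_s else math.inf  # omitted end means infinity
    if start > end:
        raise ValueError(f"start must not exceed end: {spec!r}")
    return Threshold(start, end, invert)


def status(value: float, warning: str, critical: str) -> str:
    # An alert is raised when the value does not match the range.
    if not parse_threshold(critical).match(value):
        return "CRITICAL"
    if not parse_threshold(warning).match(value):
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    for replicas in (2, 1, 0):
        print(replicas, status(replicas, "2:", "1:"))
```

With `--warning 2: --critical 1:`, two replicas match both ranges, one replica falls outside the warning range only, and zero replicas falls outside the critical range.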
|
||||||
|
|
||||||
## cluster services
|
## Cluster services
|
||||||
|
|
||||||
### cluster_config_has_changed
|
### cluster_config_has_changed
|
||||||
|
|
||||||
```
|
```
|
||||||
|
@ -103,7 +117,7 @@ Usage: check_patroni cluster_config_has_changed [OPTIONS]
|
||||||
* `OK`: The hash didn't change
|
* `OK`: The hash didn't change
|
||||||
* `CRITICAL`: The hash of the configuration has changed compared to the input (`--hash`) or last time (`--state_file`)
|
* `CRITICAL`: The hash of the configuration has changed compared to the input (`--hash`) or last time (`--state_file`)
|
||||||
|
|
||||||
Perfdata :
|
Perfdata:
|
||||||
* `is_configuration_changed` is 1 if the configuration has changed
|
* `is_configuration_changed` is 1 if the configuration has changed
|
||||||
|
|
||||||
Options:
|
Options:
|
||||||
|
@ -123,7 +137,7 @@ Usage: check_patroni cluster_has_leader [OPTIONS]
|
||||||
* `OK`: if there is a leader node.
|
* `OK`: if there is a leader node.
|
||||||
* `CRITICAL`: otherwise
|
* `CRITICAL`: otherwise
|
||||||
|
|
||||||
Perfdata : `has_leader` is 1 if there is a leader node, 0 otherwise
|
Perfdata: `has_leader` is 1 if there is a leader node, 0 otherwise
|
||||||
|
|
||||||
Options:
|
Options:
|
||||||
--help Show this message and exit.
|
--help Show this message and exit.
|
||||||
|
@ -136,7 +150,7 @@ Usage: check_patroni cluster_has_replica [OPTIONS]
|
||||||
|
|
||||||
Check if the cluster has healthy replicates.
|
Check if the cluster has healthy replicates.
|
||||||
|
|
||||||
A healthy replicate :
|
A healthy replicate:
|
||||||
* is in running state
|
* is in running state
|
||||||
* has a replica role
|
* has a replica role
|
||||||
* has a lag lower or equal to max_lag
|
* has a lag lower or equal to max_lag
|
||||||
|
@ -145,7 +159,7 @@ Usage: check_patroni cluster_has_replica [OPTIONS]
|
||||||
* `OK`: if the healthy_replica count and their lag are compatible with the replica count threshold.
|
* `OK`: if the healthy_replica count and their lag are compatible with the replica count threshold.
|
||||||
* `WARNING` / `CRITICAL`: otherwise
|
* `WARNING` / `CRITICAL`: otherwise
|
||||||
|
|
||||||
Perfdata :
|
Perfdata:
|
||||||
* healthy_replica & unhealthy_replica count
|
* healthy_replica & unhealthy_replica count
|
||||||
* the lag of each replica labelled with "member name"_lag
|
* the lag of each replica labelled with "member name"_lag
|
||||||
|
|
||||||
|
@ -161,13 +175,13 @@ Options:
|
||||||
```
|
```
|
||||||
Usage: check_patroni cluster_is_in_maintenance [OPTIONS]
|
Usage: check_patroni cluster_is_in_maintenance [OPTIONS]
|
||||||
|
|
||||||
Check if the cluster is in maintenance mode ie paused.
|
Check if the cluster is in maintenance mode or paused.
|
||||||
|
|
||||||
Check:
|
Check:
|
||||||
* `OK`: If the cluster is in maintenance mode.
|
* `OK`: If the cluster is in maintenance mode.
|
||||||
* `CRITICAL`: otherwise.
|
* `CRITICAL`: otherwise.
|
||||||
|
|
||||||
Perfdata :
|
Perfdata:
|
||||||
* `is_in_maintenance` is 1 if the cluster is in maintenance mode, 0 otherwise
|
* `is_in_maintenance` is 1 if the cluster is in maintenance mode, 0 otherwise
|
||||||
|
|
||||||
Options:
|
Options:
|
||||||
|
@ -198,7 +212,8 @@ Options:
|
||||||
--help Show this message and exit.
|
--help Show this message and exit.
|
||||||
```
|
```
|
||||||
|
|
||||||
## node services
|
## Node services
|
||||||
|
|
||||||
### node_is_alive
|
### node_is_alive
|
||||||
|
|
||||||
```
|
```
|
||||||
|
@ -210,7 +225,7 @@ Usage: check_patroni node_is_alive [OPTIONS]
|
||||||
* `OK`: If patroni is running.
|
* `OK`: If patroni is running.
|
||||||
* `CRITICAL`: otherwise.
|
* `CRITICAL`: otherwise.
|
||||||
|
|
||||||
Perfdata :
|
Perfdata:
|
||||||
* `is_running` is 1 if patroni is running, 0 otherwise
|
* `is_running` is 1 if patroni is running, 0 otherwise
|
||||||
|
|
||||||
Options:
|
Options:
|
||||||
|
@ -267,7 +282,7 @@ Usage: check_patroni node_is_replica [OPTIONS]
|
||||||
* `OK`: if the node is a running replica with noloadbalance tag and the lag is under the maximum threshold.
|
* `OK`: if the node is a running replica with noloadbalance tag and the lag is under the maximum threshold.
|
||||||
* `CRITICAL`: otherwise
|
* `CRITICAL`: otherwise
|
||||||
|
|
||||||
Perfdata : `is_replica` is 1 if the node is a running replica with
|
Perfdata: `is_replica` is 1 if the node is a running replica with
|
||||||
noloadbalance tag and the lag is under the maximum threshold, 0 otherwise.
|
noloadbalance tag and the lag is under the maximum threshold, 0 otherwise.
|
||||||
|
|
||||||
Options:
|
Options:
|
||||||
|
@ -286,7 +301,7 @@ Usage: check_patroni node_patroni_version [OPTIONS]
|
||||||
* `OK`: The version is the same as the input `--patroni-version`
|
* `OK`: The version is the same as the input `--patroni-version`
|
||||||
* `CRITICAL`: otherwise.
|
* `CRITICAL`: otherwise.
|
||||||
|
|
||||||
Perfdata :
|
Perfdata:
|
||||||
* `is_version_ok` is 1 if version is ok, 0 otherwise
|
* `is_version_ok` is 1 if version is ok, 0 otherwise
|
||||||
|
|
||||||
Options:
|
Options:
|
||||||
|
@ -308,7 +323,7 @@ Usage: check_patroni node_tl_has_changed [OPTIONS]
|
||||||
* `OK`: The timeline is the same as last time (`--state_file`) or the inputed timeline (`--timeline`)
|
* `OK`: The timeline is the same as last time (`--state_file`) or the inputed timeline (`--timeline`)
|
||||||
* `CRITICAL`: The tl is not the same.
|
* `CRITICAL`: The tl is not the same.
|
||||||
|
|
||||||
Perfdata :
|
Perfdata:
|
||||||
* `is_timeline_changed` is 1 if the tl has changed, 0 otherwise
|
* `is_timeline_changed` is 1 if the tl has changed, 0 otherwise
|
||||||
* the timeline
|
* the timeline
|
||||||
|
|
||||||
|
|
|
@ -230,7 +230,7 @@ def cluster_has_leader(ctx: click.Context) -> None:
|
||||||
* `OK`: if there is a leader node.
|
* `OK`: if there is a leader node.
|
||||||
* `CRITICAL`: otherwise
|
* `CRITICAL`: otherwise
|
||||||
|
|
||||||
Perfdata : `has_leader` is 1 if there is a leader node, 0 otherwise
|
Perfdata: `has_leader` is 1 if there is a leader node, 0 otherwise
|
||||||
"""
|
"""
|
||||||
# FIXME: Manage primary or standby leader in the same place ?
|
# FIXME: Manage primary or standby leader in the same place ?
|
||||||
check = nagiosplugin.Check()
|
check = nagiosplugin.Check()
|
||||||
|
@ -266,7 +266,7 @@ def cluster_has_replica(
|
||||||
"""Check if the cluster has healthy replicates.
|
"""Check if the cluster has healthy replicates.
|
||||||
|
|
||||||
\b
|
\b
|
||||||
A healthy replicate :
|
A healthy replicate:
|
||||||
* is in running state
|
* is in running state
|
||||||
* has a replica role
|
* has a replica role
|
||||||
* has a lag lower or equal to max_lag
|
* has a lag lower or equal to max_lag
|
||||||
|
@ -277,7 +277,7 @@ def cluster_has_replica(
|
||||||
* `WARNING` / `CRITICAL`: otherwise
|
* `WARNING` / `CRITICAL`: otherwise
|
||||||
|
|
||||||
\b
|
\b
|
||||||
Perfdata :
|
Perfdata:
|
||||||
* healthy_replica & unhealthy_replica count
|
* healthy_replica & unhealthy_replica count
|
||||||
* the lag of each replica labelled with "member name"_lag
|
* the lag of each replica labelled with "member name"_lag
|
||||||
"""
|
"""
|
||||||
|
@ -321,7 +321,7 @@ def cluster_config_has_changed(
|
||||||
* `CRITICAL`: The hash of the configuration has changed compared to the input (`--hash`) or last time (`--state_file`)
|
* `CRITICAL`: The hash of the configuration has changed compared to the input (`--hash`) or last time (`--state_file`)
|
||||||
|
|
||||||
\b
|
\b
|
||||||
Perfdata :
|
Perfdata:
|
||||||
* `is_configuration_changed` is 1 if the configuration has changed
|
* `is_configuration_changed` is 1 if the configuration has changed
|
||||||
"""
|
"""
|
||||||
# FIXME hash in perfdata ?
|
# FIXME hash in perfdata ?
|
||||||
|
@ -345,7 +345,7 @@ def cluster_config_has_changed(
|
||||||
@click.pass_context
|
@click.pass_context
|
||||||
@nagiosplugin.guarded
|
@nagiosplugin.guarded
|
||||||
def cluster_is_in_maintenance(ctx: click.Context) -> None:
|
def cluster_is_in_maintenance(ctx: click.Context) -> None:
|
||||||
"""Check if the cluster is in maintenance mode ie paused.
|
"""Check if the cluster is in maintenance mode or paused.
|
||||||
|
|
||||||
\b
|
\b
|
||||||
Check:
|
Check:
|
||||||
|
@ -353,7 +353,7 @@ def cluster_is_in_maintenance(ctx: click.Context) -> None:
|
||||||
* `CRITICAL`: otherwise.
|
* `CRITICAL`: otherwise.
|
||||||
|
|
||||||
\b
|
\b
|
||||||
Perfdata :
|
Perfdata:
|
||||||
* `is_in_maintenance` is 1 if the cluster is in maintenance mode, 0 otherwise
|
* `is_in_maintenance` is 1 if the cluster is in maintenance mode, 0 otherwise
|
||||||
"""
|
"""
|
||||||
check = nagiosplugin.Check()
|
check = nagiosplugin.Check()
|
||||||
|
@ -398,7 +398,7 @@ def node_is_replica(ctx: click.Context, max_lag: str) -> None:
|
||||||
* `OK`: if the node is a running replica with noloadbalance tag and the lag is under the maximum threshold.
|
* `OK`: if the node is a running replica with noloadbalance tag and the lag is under the maximum threshold.
|
||||||
* `CRITICAL`: otherwise
|
* `CRITICAL`: otherwise
|
||||||
|
|
||||||
Perfdata : `is_replica` is 1 if the node is a running replica with noloadbalance tag and the lag is under the maximum threshold, 0 otherwise.
|
Perfdata: `is_replica` is 1 if the node is a running replica with noloadbalance tag and the lag is under the maximum threshold, 0 otherwise.
|
||||||
"""
|
"""
|
||||||
# FIXME add a lag check ??
|
# FIXME add a lag check ??
|
||||||
check = nagiosplugin.Check()
|
check = nagiosplugin.Check()
|
||||||
|
@ -459,7 +459,7 @@ def node_tl_has_changed(ctx: click.Context, timeline: str, state_file: str) -> N
|
||||||
* `CRITICAL`: The tl is not the same.
|
* `CRITICAL`: The tl is not the same.
|
||||||
|
|
||||||
\b
|
\b
|
||||||
Perfdata :
|
Perfdata:
|
||||||
* `is_timeline_changed` is 1 if the tl has changed, 0 otherwise
|
* `is_timeline_changed` is 1 if the tl has changed, 0 otherwise
|
||||||
* the timeline
|
* the timeline
|
||||||
"""
|
"""
|
||||||
|
@ -499,7 +499,7 @@ def node_patroni_version(ctx: click.Context, patroni_version: str) -> None:
|
||||||
* `CRITICAL`: otherwise.
|
* `CRITICAL`: otherwise.
|
||||||
|
|
||||||
\b
|
\b
|
||||||
Perfdata :
|
Perfdata:
|
||||||
* `is_version_ok` is 1 if version is ok, 0 otherwise
|
* `is_version_ok` is 1 if version is ok, 0 otherwise
|
||||||
"""
|
"""
|
||||||
# TODO the version cannot be written in perfdata find something else ?
|
# TODO the version cannot be written in perfdata find something else ?
|
||||||
|
@ -525,7 +525,7 @@ def node_is_alive(ctx: click.Context) -> None:
|
||||||
* `CRITICAL`: otherwise.
|
* `CRITICAL`: otherwise.
|
||||||
|
|
||||||
\b
|
\b
|
||||||
Perfdata :
|
Perfdata:
|
||||||
* `is_running` is 1 if patroni is running, 0 otherwise
|
* `is_running` is 1 if patroni is running, 0 otherwise
|
||||||
"""
|
"""
|
||||||
check = nagiosplugin.Check()
|
check = nagiosplugin.Check()
|
||||||
|
|
|
@ -1,9 +0,0 @@
|
||||||
[options]
|
|
||||||
endpoints = https://10.20.199.3:8008, https://10.20.199.4:8008,https://10.20.199.5:8008
|
|
||||||
cert_file = ./ssl/benoit-dalibo-cert.pem
|
|
||||||
key_file = ./ssl/benoit-dalibo-key.pem
|
|
||||||
ca_file = ./ssl/CA-cert.pem
|
|
||||||
timeout = 0
|
|
||||||
|
|
||||||
[options.node_is_replica]
|
|
||||||
lag=100
|
|
|
@ -23,28 +23,39 @@ cat << '_EOF_' > $README
|
||||||
_EOF_
|
_EOF_
|
||||||
helpme
|
helpme
|
||||||
cat << '_EOF_' >> $README
|
cat << '_EOF_' >> $README
|
||||||
## install
|
## Install
|
||||||
|
|
||||||
The check requers python3. Using a virtual env is advised for testing :
|
Installation from the git repository:
|
||||||
|
|
||||||
```
|
```
|
||||||
pip -m venv ~/venv
|
$ git clone <FIXME>
|
||||||
source ~venv/bin/activate
|
|
||||||
```
|
```
|
||||||
|
|
||||||
Clone the repo, then install with pip3 from it :
|
Change the branch if necessary. Then create a dedicated environment,
|
||||||
|
install dependencies and then check_patroni from the repo:
|
||||||
|
|
||||||
```
|
```
|
||||||
pip3 install .
|
$ cd check_patroni
|
||||||
pip3 install .[dev]
|
$ python3 -m venv .venv
|
||||||
pip3 install .[test]
|
$ . .venv/bin/activate
|
||||||
|
(.venv) $ pip3 install .
|
||||||
|
(.venv) $ pip3 install .[dev] # for dev purposes
|
||||||
|
(.venv) $ pip3 install .[test] # for testing purposes
|
||||||
|
(.venv) $ check_patroni
|
||||||
```
|
```
|
||||||
|
|
||||||
Links :
|
To quit this env and destroy it:
|
||||||
|
|
||||||
|
```
|
||||||
|
$ deactivate
|
||||||
|
$ rm -r .venv
|
||||||
|
```
|
||||||
|
|
||||||
|
Links:
|
||||||
* [pip & centos 7](https://linuxize.com/post/how-to-install-pip-on-centos-7/)
|
* [pip & centos 7](https://linuxize.com/post/how-to-install-pip-on-centos-7/)
|
||||||
* [pip & debian10](https://linuxize.com/post/how-to-install-pip-on-debian-10/)
|
* [pip & debian10](https://linuxize.com/post/how-to-install-pip-on-debian-10/)
|
||||||
|
|
||||||
## config file
|
## Config file
|
||||||
|
|
||||||
All global and service-specific parameters can be specified via a config file as follows:
|
All global and service-specific parameters can be specified via a config file as follows:
|
||||||
|
|
||||||
|
@ -59,7 +70,7 @@ timeout = 0
|
||||||
[options.node_is_replica]
|
[options.node_is_replica]
|
||||||
lag=100
|
lag=100
|
||||||
```
|
```
|
||||||
## thresholds
|
## Thresholds
|
||||||
|
|
||||||
The format for the threshold parameters is "[@][start:][end]".
|
The format for the threshold parameters is "[@][start:][end]".
|
||||||
|
|
||||||
|
@ -68,9 +79,9 @@ The format for the threshold parameters is "[@][start:][end]".
|
||||||
* If `end` is omitted, infinity is assumed
|
* If `end` is omitted, infinity is assumed
|
||||||
* To invert the match condition, prefix the range expression with "@".
|
* To invert the match condition, prefix the range expression with "@".
|
||||||
|
|
||||||
A match is found when : start <= VALUE <= end
|
A match is found when: start <= VALUE <= end
|
||||||
|
|
||||||
For example, the following command will raise :
|
For example, the following command will raise:
|
||||||
|
|
||||||
* a warning if there are fewer than 2 nodes, which can be translated to outside of range [2;+INF[
|
* a warning if there are fewer than 2 nodes, which can be translated to outside of range [2;+INF[
|
||||||
* a critical if there are no nodes, which can be translated to outside of range [1;+INF[
|
* a critical if there are no nodes, which can be translated to outside of range [1;+INF[
|
||||||
|
@ -80,7 +91,8 @@ check_patroni -e https://10.20.199.3:8008 cluster_has_replica --warning 2: --cri
|
||||||
```
|
```
|
||||||
_EOF_
|
_EOF_
|
||||||
readme
|
readme
|
||||||
readme "## cluster services"
|
readme "## Cluster services"
|
||||||
|
readme
|
||||||
readme "### cluster_config_has_changed"
|
readme "### cluster_config_has_changed"
|
||||||
helpme cluster_config_has_changed
|
helpme cluster_config_has_changed
|
||||||
readme "### cluster_has_leader"
|
readme "### cluster_has_leader"
|
||||||
|
@ -91,7 +103,8 @@ readme "### cluster_is_in_maintenance"
|
||||||
helpme cluster_is_in_maintenance
|
helpme cluster_is_in_maintenance
|
||||||
readme "### cluster_node_count"
|
readme "### cluster_node_count"
|
||||||
helpme cluster_node_count
|
helpme cluster_node_count
|
||||||
readme "## node services"
|
readme "## Node services"
|
||||||
|
readme
|
||||||
readme "### node_is_alive"
|
readme "### node_is_alive"
|
||||||
helpme node_is_alive
|
helpme node_is_alive
|
||||||
readme "### node_is_pending_restart"
|
readme "### node_is_pending_restart"
|
||||||
|
|