Update the README and help
parent e663695b26
commit 7898011c40
65
README.md
|
@ -9,6 +9,7 @@ Options:
|
|||
--config FILE Read option defaults from the specified INI file
|
||||
[default: config.ini]
|
||||
-e, --endpoints TEXT API endpoint. Can be specified multiple times.
|
||||
[default: http://127.0.0.1:8008]
|
||||
--cert_file TEXT File with the client certificate.
|
||||
--key_file TEXT File with the client key.
|
||||
--ca_file TEXT The CA certificate.
|
||||
|
@ -16,6 +17,7 @@ Options:
|
|||
(debug) [x>=0]
|
||||
--version
|
||||
--timeout INTEGER Timeout in seconds for the API queries (0 to disable)
|
||||
[default: 2]
|
||||
--help Show this message and exit.
|
||||
|
||||
Commands:
|
||||
|
@ -32,28 +34,39 @@ Commands:
|
|||
node_tl_has_changed Check if the timeline has changed.
|
||||
```
|
||||
|
||||
## install
|
||||
## Install
|
||||
|
||||
The check requers python3. Using a virtual env is advised for testing :
|
||||
Installation from the git repository:
|
||||
|
||||
```
|
||||
pip -m venv ~/venv
|
||||
source ~venv/bin/activate
|
||||
$ git clone <FIXME>
|
||||
```
|
||||
|
||||
Clone the repo, then install with pip3 from it :
|
||||
Change the branch if necessary. Then create a dedicated environment,
|
||||
install dependencies and then check_patroni from the repo:
|
||||
|
||||
```
|
||||
pip3 install .
|
||||
pip3 install .[dev]
|
||||
pip3 install .[test]
|
||||
$ cd check_patroni
|
||||
$ python3 -m venv .venv
|
||||
$ . .venv/bin/activate
|
||||
(.venv) $ pip3 install .
|
||||
(.venv) $ pip3 install .[dev] # for dev purposes
|
||||
(.venv) $ pip3 install .[test] # for testing purposes
|
||||
(.venv) $ check_patroni
|
||||
```
|
||||
|
||||
Links :
|
||||
To quit this env and destroy it:
|
||||
|
||||
```
|
||||
$ deactivate
|
||||
$ rm -r .venv
|
||||
```
|
||||
|
||||
Links:
|
||||
* [pip & centos 7](https://linuxize.com/post/how-to-install-pip-on-centos-7/)
|
||||
* [pip & debian10](https://linuxize.com/post/how-to-install-pip-on-debian-10/)
|
||||
|
||||
## config file
|
||||
## Config file
|
||||
|
||||
All global and service-specific parameters can be specified via a config file as follows:
|
||||
|
||||
|
@ -68,7 +81,7 @@ timeout = 0
|
|||
[options.node_is_replica]
|
||||
lag=100
|
||||
```
|
||||
## thresholds
|
||||
## Thresholds
|
||||
|
||||
The format for the threshold parameters is "[@][start:][end]".
|
||||
|
||||
|
@ -77,9 +90,9 @@ The format for the threshold parameters is "[@][start:][end]".
|
|||
* If `end` is omitted, infinity is assumed
|
||||
* To invert the match condition, prefix the range expression with "@".
|
||||
|
||||
A match is found when : start <= VALUE <= end
|
||||
A match is found when: start <= VALUE <= end
|
||||
|
||||
For example, the following command will raise :
|
||||
For example, the following command will raise:
|
||||
|
||||
* a warning if there are fewer than 2 nodes, which can be translated to outside of range [2;+INF[
|
||||
* a critical if there are no nodes, which can be translated to outside of range [1;+INF[
|
||||
|
@ -88,7 +101,8 @@ For example, the followinf command will raise :
|
|||
check_patroni -e https://10.20.199.3:8008 cluster_has_replica --warning 2: --critical 1:
|
||||
```
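
To make the range semantics above concrete, here is a minimal Python sketch of how a "[@][start:][end]" string could be parsed and evaluated. It is an illustration only, not check_patroni's own code: the names `Range` and `parse_range` are hypothetical, and the defaults (start is 0 when omitted, `~` means negative infinity) are assumed from the usual Nagios plugin range convention.

```python
import math
from typing import NamedTuple


class Range(NamedTuple):
    """Hypothetical helper for illustration; not part of check_patroni."""

    start: float
    end: float
    invert: bool

    def match(self, value: float) -> bool:
        # A match means "no alert": start <= value <= end, flipped by "@".
        inside = self.start <= value <= self.end
        return inside != self.invert


def parse_range(spec: str) -> Range:
    """Parse a "[@][start:][end]" threshold string into a Range."""
    invert = spec.startswith("@")
    body = spec[1:] if invert else spec
    # Without a colon, the whole body is `end` and `start` stays empty.
    start_text, _, end_text = body.rpartition(":")
    if start_text == "~":
        start = -math.inf  # assumed convention: "~" is negative infinity
    elif start_text == "":
        start = 0.0  # assumed convention: omitted start is 0
    else:
        start = float(start_text)
    # If `end` is omitted, infinity is assumed.
    end = float(end_text) if end_text else math.inf
    if start > end:
        raise ValueError(f"start must not exceed end in {spec!r}")
    return Range(start, end, invert)


# Thresholds from the example command: --warning 2: --critical 1:
warning = parse_range("2:")
critical = parse_range("1:")
for replicas in (0, 1, 2):
    # An alert is raised when the value does NOT match the range.
    print(replicas, not warning.match(replicas), not critical.match(replicas))
```

With these thresholds, a single healthy replica falls outside `2:` (warning) but inside `1:` (no critical), while zero replicas falls outside both ranges.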
|
||||
|
||||
## cluster services
|
||||
## Cluster services
|
||||
|
||||
### cluster_config_has_changed
|
||||
|
||||
```
|
||||
|
@ -103,7 +117,7 @@ Usage: check_patroni cluster_config_has_changed [OPTIONS]
|
|||
* `OK`: The hash didn't change
|
||||
* `CRITICAL`: The hash of the configuration has changed compared to the input (`--hash`) or last time (`--state_file`)
|
||||
|
||||
Perfdata :
|
||||
Perfdata:
|
||||
* `is_configuration_changed` is 1 if the configuration has changed
|
||||
|
||||
Options:
|
||||
|
@ -123,7 +137,7 @@ Usage: check_patroni cluster_has_leader [OPTIONS]
|
|||
* `OK`: if there is a leader node.
|
||||
* `CRITICAL`: otherwise
|
||||
|
||||
Perfdata : `has_leader` is 1 if there is a leader node, 0 otherwise
|
||||
Perfdata: `has_leader` is 1 if there is a leader node, 0 otherwise
|
||||
|
||||
Options:
|
||||
--help Show this message and exit.
|
||||
|
@ -136,7 +150,7 @@ Usage: check_patroni cluster_has_replica [OPTIONS]
|
|||
|
||||
Check if the cluster has healthy replicates.
|
||||
|
||||
A healthy replicate :
|
||||
A healthy replicate:
|
||||
* is in running state
|
||||
* has a replica role
|
||||
* has a lag lower than or equal to max_lag
|
||||
|
@ -145,7 +159,7 @@ Usage: check_patroni cluster_has_replica [OPTIONS]
|
|||
* `OK`: if the healthy_replica count and their lag are compatible with the replica count threshold.
|
||||
* `WARNING` / `CRITICAL`: otherwise
|
||||
|
||||
Perfdata :
|
||||
Perfdata:
|
||||
* healthy_replica & unhealthy_replica count
|
||||
* the lag of each replica labelled with "member name"_lag
|
||||
|
||||
|
@ -161,13 +175,13 @@ Options:
|
|||
```
|
||||
Usage: check_patroni cluster_is_in_maintenance [OPTIONS]
|
||||
|
||||
Check if the cluster is in maintenance mode ie paused.
|
||||
Check if the cluster is in maintenance mode or paused.
|
||||
|
||||
Check:
|
||||
* `OK`: If the cluster is in maintenance mode.
|
||||
* `CRITICAL`: otherwise.
|
||||
|
||||
Perfdata :
|
||||
Perfdata:
|
||||
* `is_in_maintenance` is 1 if the cluster is in maintenance mode, 0 otherwise
|
||||
|
||||
Options:
|
||||
|
@ -198,7 +212,8 @@ Options:
|
|||
--help Show this message and exit.
|
||||
```
|
||||
|
||||
## node services
|
||||
## Node services
|
||||
|
||||
### node_is_alive
|
||||
|
||||
```
|
||||
|
@ -210,7 +225,7 @@ Usage: check_patroni node_is_alive [OPTIONS]
|
|||
* `OK`: If patroni is running.
|
||||
* `CRITICAL`: otherwise.
|
||||
|
||||
Perfdata :
|
||||
Perfdata:
|
||||
* `is_running` is 1 if patroni is running, 0 otherwise
|
||||
|
||||
Options:
|
||||
|
@ -267,7 +282,7 @@ Usage: check_patroni node_is_replica [OPTIONS]
|
|||
* `OK`: if the node is a running replica with noloadbalance tag and the lag is under the maximum threshold.
|
||||
* `CRITICAL`: otherwise
|
||||
|
||||
Perfdata : `is_replica` is 1 if the node is a running replica with
|
||||
Perfdata: `is_replica` is 1 if the node is a running replica with
|
||||
noloadbalance tag and the lag is under the maximum threshold, 0 otherwise.
|
||||
|
||||
Options:
|
||||
|
@ -286,7 +301,7 @@ Usage: check_patroni node_patroni_version [OPTIONS]
|
|||
* `OK`: The version is the same as the input `--patroni-version`
|
||||
* `CRITICAL`: otherwise.
|
||||
|
||||
Perfdata :
|
||||
Perfdata:
|
||||
* `is_version_ok` is 1 if version is ok, 0 otherwise
|
||||
|
||||
Options:
|
||||
|
@ -308,7 +323,7 @@ Usage: check_patroni node_tl_has_changed [OPTIONS]
|
|||
* `OK`: The timeline is the same as last time (`--state_file`) or the timeline given as input (`--timeline`)
|
||||
* `CRITICAL`: The tl is not the same.
|
||||
|
||||
Perfdata :
|
||||
Perfdata:
|
||||
* `is_timeline_changed` is 1 if the tl has changed, 0 otherwise
|
||||
* the timeline
|
||||
|
||||
|
|
|
@ -230,7 +230,7 @@ def cluster_has_leader(ctx: click.Context) -> None:
|
|||
* `OK`: if there is a leader node.
|
||||
* `CRITICAL`: otherwise
|
||||
|
||||
Perfdata : `has_leader` is 1 if there is a leader node, 0 otherwise
|
||||
Perfdata: `has_leader` is 1 if there is a leader node, 0 otherwise
|
||||
"""
|
||||
# FIXME: Manage primary or standby leader in the same place ?
|
||||
check = nagiosplugin.Check()
|
||||
|
@ -266,7 +266,7 @@ def cluster_has_replica(
|
|||
"""Check if the cluster has healthy replicates.
|
||||
|
||||
\b
|
||||
A healthy replicate :
|
||||
A healthy replicate:
|
||||
* is in running state
|
||||
* has a replica role
|
||||
* has a lag lower than or equal to max_lag
|
||||
|
@ -277,7 +277,7 @@ def cluster_has_replica(
|
|||
* `WARNING` / `CRITICAL`: otherwise
|
||||
|
||||
\b
|
||||
Perfdata :
|
||||
Perfdata:
|
||||
* healthy_replica & unhealthy_replica count
|
||||
* the lag of each replica labelled with "member name"_lag
|
||||
"""
|
||||
|
@ -321,7 +321,7 @@ def cluster_config_has_changed(
|
|||
* `CRITICAL`: The hash of the configuration has changed compared to the input (`--hash`) or last time (`--state_file`)
|
||||
|
||||
\b
|
||||
Perfdata :
|
||||
Perfdata:
|
||||
* `is_configuration_changed` is 1 if the configuration has changed
|
||||
"""
|
||||
# FIXME hash in perfdata ?
|
||||
|
@ -345,7 +345,7 @@ def cluster_config_has_changed(
|
|||
@click.pass_context
|
||||
@nagiosplugin.guarded
|
||||
def cluster_is_in_maintenance(ctx: click.Context) -> None:
|
||||
"""Check if the cluster is in maintenance mode ie paused.
|
||||
"""Check if the cluster is in maintenance mode or paused.
|
||||
|
||||
\b
|
||||
Check:
|
||||
|
@ -353,7 +353,7 @@ def cluster_is_in_maintenance(ctx: click.Context) -> None:
|
|||
* `CRITICAL`: otherwise.
|
||||
|
||||
\b
|
||||
Perfdata :
|
||||
Perfdata:
|
||||
* `is_in_maintenance` is 1 if the cluster is in maintenance mode, 0 otherwise
|
||||
"""
|
||||
check = nagiosplugin.Check()
|
||||
|
@ -398,7 +398,7 @@ def node_is_replica(ctx: click.Context, max_lag: str) -> None:
|
|||
* `OK`: if the node is a running replica with noloadbalance tag and the lag is under the maximum threshold.
|
||||
* `CRITICAL`: otherwise
|
||||
|
||||
Perfdata : `is_replica` is 1 if the node is a running replica with noloadbalance tag and the lag is under the maximum threshold, 0 otherwise.
|
||||
Perfdata: `is_replica` is 1 if the node is a running replica with noloadbalance tag and the lag is under the maximum threshold, 0 otherwise.
|
||||
"""
|
||||
# FIXME add a lag check ??
|
||||
check = nagiosplugin.Check()
|
||||
|
@ -459,7 +459,7 @@ def node_tl_has_changed(ctx: click.Context, timeline: str, state_file: str) -> N
|
|||
* `CRITICAL`: The tl is not the same.
|
||||
|
||||
\b
|
||||
Perfdata :
|
||||
Perfdata:
|
||||
* `is_timeline_changed` is 1 if the tl has changed, 0 otherwise
|
||||
* the timeline
|
||||
"""
|
||||
|
@ -499,7 +499,7 @@ def node_patroni_version(ctx: click.Context, patroni_version: str) -> None:
|
|||
* `CRITICAL`: otherwise.
|
||||
|
||||
\b
|
||||
Perfdata :
|
||||
Perfdata:
|
||||
* `is_version_ok` is 1 if version is ok, 0 otherwise
|
||||
"""
|
||||
# TODO the version cannot be written in perfdata find something else ?
|
||||
|
@ -525,7 +525,7 @@ def node_is_alive(ctx: click.Context) -> None:
|
|||
* `CRITICAL`: otherwise.
|
||||
|
||||
\b
|
||||
Perfdata :
|
||||
Perfdata:
|
||||
* `is_running` is 1 if patroni is running, 0 otherwise
|
||||
"""
|
||||
check = nagiosplugin.Check()
|
||||
|
|
|
@ -1,9 +0,0 @@
|
|||
[options]
|
||||
endpoints = https://10.20.199.3:8008, https://10.20.199.4:8008,https://10.20.199.5:8008
|
||||
cert_file = ./ssl/benoit-dalibo-cert.pem
|
||||
key_file = ./ssl/benoit-dalibo-key.pem
|
||||
ca_file = ./ssl/CA-cert.pem
|
||||
timeout = 0
|
||||
|
||||
[options.node_is_replica]
|
||||
lag=100
|
|
@ -23,28 +23,39 @@ cat << '_EOF_' > $README
|
|||
_EOF_
|
||||
helpme
|
||||
cat << '_EOF_' >> $README
|
||||
## install
|
||||
## Install
|
||||
|
||||
The check requers python3. Using a virtual env is advised for testing :
|
||||
Installation from the git repository:
|
||||
|
||||
```
|
||||
pip -m venv ~/venv
|
||||
source ~venv/bin/activate
|
||||
$ git clone <FIXME>
|
||||
```
|
||||
|
||||
Clone the repo, then install with pip3 from it :
|
||||
Change the branch if necessary. Then create a dedicated environment,
|
||||
install dependencies and then check_patroni from the repo:
|
||||
|
||||
```
|
||||
pip3 install .
|
||||
pip3 install .[dev]
|
||||
pip3 install .[test]
|
||||
$ cd check_patroni
|
||||
$ python3 -m venv .venv
|
||||
$ . .venv/bin/activate
|
||||
(.venv) $ pip3 install .
|
||||
(.venv) $ pip3 install .[dev] # for dev purposes
|
||||
(.venv) $ pip3 install .[test] # for testing purposes
|
||||
(.venv) $ check_patroni
|
||||
```
|
||||
|
||||
Links :
|
||||
To quit this env and destroy it:
|
||||
|
||||
```
|
||||
$ deactivate
|
||||
$ rm -r .venv
|
||||
```
|
||||
|
||||
Links:
|
||||
* [pip & centos 7](https://linuxize.com/post/how-to-install-pip-on-centos-7/)
|
||||
* [pip & debian10](https://linuxize.com/post/how-to-install-pip-on-debian-10/)
|
||||
|
||||
## config file
|
||||
## Config file
|
||||
|
||||
All global and service-specific parameters can be specified via a config file as follows:
|
||||
|
||||
|
@ -59,7 +70,7 @@ timeout = 0
|
|||
[options.node_is_replica]
|
||||
lag=100
|
||||
```
|
||||
## thresholds
|
||||
## Thresholds
|
||||
|
||||
The format for the threshold parameters is "[@][start:][end]".
|
||||
|
||||
|
@ -68,9 +79,9 @@ The format for the threshold parameters is "[@][start:][end]".
|
|||
* If `end` is omitted, infinity is assumed
|
||||
* To invert the match condition, prefix the range expression with "@".
|
||||
|
||||
A match is found when : start <= VALUE <= end
|
||||
A match is found when: start <= VALUE <= end
|
||||
|
||||
For example, the following command will raise :
|
||||
For example, the following command will raise:
|
||||
|
||||
* a warning if there are fewer than 2 nodes, which can be translated to outside of range [2;+INF[
|
||||
* a critical if there are no nodes, which can be translated to outside of range [1;+INF[
|
||||
|
@ -80,7 +91,8 @@ check_patroni -e https://10.20.199.3:8008 cluster_has_replica --warning 2: --cri
|
|||
```
|
||||
_EOF_
|
||||
readme
|
||||
readme "## cluster services"
|
||||
readme "## Cluster services"
|
||||
readme
|
||||
readme "### cluster_config_has_changed"
|
||||
helpme cluster_config_has_changed
|
||||
readme "### cluster_has_leader"
|
||||
|
@ -91,7 +103,8 @@ readme "### cluster_is_in_maintenance"
|
|||
helpme cluster_is_in_maintenance
|
||||
readme "### cluster_node_count"
|
||||
helpme cluster_node_count
|
||||
readme "## node services"
|
||||
readme "## Node services"
|
||||
readme
|
||||
readme "### node_is_alive"
|
||||
helpme node_is_alive
|
||||
readme "### node_is_pending_restart"
|
||||
|
|