diff --git a/README.md b/README.md
index e3a8a5e..abcd847 100644
--- a/README.md
+++ b/README.md
@@ -9,6 +9,7 @@ Options:
   --config FILE         Read option defaults from the specified INI file
                         [default: config.ini]
   -e, --endpoints TEXT  API endpoint. Can be specified multiple times.
+                        [default: http://127.0.0.1:8008]
   --cert_file TEXT      File with the client certificate.
   --key_file TEXT       File with the client key.
   --ca_file TEXT        The CA certificate.
@@ -16,6 +17,7 @@ Options:
                         (debug) [x>=0]
   --version
   --timeout INTEGER     Timeout in seconds for the API queries (0 to disable)
+                        [default: 2]
   --help                Show this message and exit.
 
 Commands:
@@ -32,28 +34,39 @@ Commands:
   node_tl_has_changed         Check if the timeline has changed.
 ```
 
-## install
+## Install
 
-The check requers python3. Using a virtual env is advised for testing :
+Installation from the git repository:
 
 ```
-pip -m venv ~/venv
-source ~venv/bin/activate
+$ git clone
 ```
 
-Clone the repo, then install with pip3 from it :
+Change the branch if necessary. Then create a dedicated environment,
+install dependencies and then check_patroni from the repo:
 
 ```
-pip3 install .
-pip3 install .[dev]
-pip3 install .[test]
+$ cd check_patroni
+$ python3 -m venv .venv
+$ . .venv/bin/activate
+(.venv) $ pip3 install .
+(.venv) $ pip3 install .[dev] # for dev purposes
+(.venv) $ pip3 install .[test] # for testing purposes
+(.venv) $ check_patroni
 ```
 
-Links :
+To quit this env and destroy it:
+
+```
+$ deactivate
+$ rm -r .venv
+```
+
+Links:
 * [pip & centos 7](https://linuxize.com/post/how-to-install-pip-on-centos-7/)
 * [pip & debian10](https://linuxize.com/post/how-to-install-pip-on-debian-10/)
 
-## config file
+## Config file
 
 All global and service specific parameters can be specified via a config file has follows:
 
@@ -68,7 +81,7 @@ timeout = 0
 [options.node_is_replica]
 lag=100
 ```
-## thresholds
+## Thresholds
 
 The format for the threshold parameters is "[@][start:][end]".
 
@@ -77,9 +90,9 @@ The format for the threshold parameters is "[@][start:][end]".
 * If `end` is omitted, infinity is assumed
 * To invert the match condition, prefix the range expression with "@".
 
-A match is found when : start <= VALUE <= end
+A match is found when: start <= VALUE <= end
 
-For example, the followinf command will raise :
+For example, the followinf command will raise:
 
 * a warning if there is less than 1 nodes, wich can be translated to outside of range [2;+INF[
 * a critical if there are no nodes, wich can be translated to outside of range [1;+INF[
@@ -88,7 +101,8 @@ For example, the followinf command will raise :
 check_patroni -e https://10.20.199.3:8008 cluster_has_replica --warning 2: --critical 1:
 ```
 
-## cluster services
+## Cluster services
+
 ### cluster_config_has_changed
 
 ```
@@ -103,7 +117,7 @@ Usage: check_patroni cluster_config_has_changed [OPTIONS]
   * `OK`: The hash didn't change
   * `CRITICAL`: The hash of the configuration has changed compared to the input (`--hash`) or last time (`--state_file`)
 
-  Perfdata :
+  Perfdata:
   * `is_configuration_changed` is 1 if the configuration has changed
 
 Options:
@@ -123,7 +137,7 @@ Usage: check_patroni cluster_has_leader [OPTIONS]
   * `OK`: if there is a leader node.
   * `CRITICAL`: otherwise
 
-  Perfdata : `has_leader` is 1 if there is a leader node, 0 otherwise
+  Perfdata: `has_leader` is 1 if there is a leader node, 0 otherwise
 
 Options:
   --help  Show this message and exit.
@@ -136,7 +150,7 @@ Usage: check_patroni cluster_has_replica [OPTIONS]
 
   Check if the cluster has healthy replicates.
 
-  A healthy replicate :
+  A healthy replicate:
   * is in running state
   * has a replica role
   * has a lag lower or equal to max_lag
@@ -145,7 +159,7 @@ Usage: check_patroni cluster_has_replica [OPTIONS]
   * `OK`: if the healthy_replica count and their lag are compatible with the replica count threshold.
   * `WARNING` / `CRITICAL`: otherwise
 
-  Perfdata :
+  Perfdata:
   * healthy_replica & unhealthy_replica count
   * the lag of each replica labelled with "member name"_lag
 
@@ -161,13 +175,13 @@ Options:
 ```
 Usage: check_patroni cluster_is_in_maintenance [OPTIONS]
 
-  Check if the cluster is in maintenance mode ie paused.
+  Check if the cluster is in maintenance mode or paused.
 
   Check:
   * `OK`: If the cluster is in maintenance mode.
   * `CRITICAL`: otherwise.
 
-  Perfdata :
+  Perfdata:
   * `is_in_maintenance` is 1 the cluster is in maintenance mode, 0 otherwise
 
 Options:
@@ -198,7 +212,8 @@ Options:
   --help  Show this message and exit.
 ```
 
-## node services
+## Node services
+
 ### node_is_alive
 
 ```
@@ -210,7 +225,7 @@ Usage: check_patroni node_is_alive [OPTIONS]
   * `OK`: If patroni is running.
   * `CRITICAL`: otherwise.
 
-  Perfdata :
+  Perfdata:
   * `is_running` is 1 if patroni is running, 0 otherwise
 
 Options:
@@ -267,7 +282,7 @@ Usage: check_patroni node_is_replica [OPTIONS]
   * `OK`: if the node is a running replica with noloadbalance tag and the lag is under the maximum threshold.
   * `CRITICAL`: otherwise
 
-  Perfdata : `is_replica` is 1 if the node is a running replica with
+  Perfdata: `is_replica` is 1 if the node is a running replica with
   noloadbalance tag and the lag is under the maximum threshold, 0 otherwise.
 
 Options:
@@ -286,7 +301,7 @@ Usage: check_patroni node_patroni_version [OPTIONS]
   * `OK`: The version is the same as the input `--patroni-version`
   * `CRITICAL`: otherwise.
 
-  Perfdata :
+  Perfdata:
   * `is_version_ok` is 1 if version is ok, 0 otherwise
 
 Options:
@@ -308,7 +323,7 @@ Usage: check_patroni node_tl_has_changed [OPTIONS]
   * `OK`: The timeline is the same as last time (`--state_file`) or the inputed timeline (`--timeline`)
   * `CRITICAL`: The tl is not the same.
 
-  Perfdata :
+  Perfdata:
   * `is_timeline_changed` is 1 if the tl has changed, 0 otherwise
   * the timeline
 
diff --git a/check_patroni/cli.py b/check_patroni/cli.py
index 78bb9c9..ca5b821 100644
--- a/check_patroni/cli.py
+++ b/check_patroni/cli.py
@@ -230,7 +230,7 @@ def cluster_has_leader(ctx: click.Context) -> None:
     * `OK`: if there is a leader node.
     * `CRITICAL`: otherwise
 
-    Perfdata : `has_leader` is 1 if there is a leader node, 0 otherwise
+    Perfdata: `has_leader` is 1 if there is a leader node, 0 otherwise
     """
     # FIXME: Manage primary or standby leader in the same place ?
     check = nagiosplugin.Check()
@@ -266,7 +266,7 @@ def cluster_has_replica(
     """Check if the cluster has healthy replicates.
 
     \b
-    A healthy replicate :
+    A healthy replicate:
     * is in running state
     * has a replica role
     * has a lag lower or equal to max_lag
@@ -277,7 +277,7 @@ def cluster_has_replica(
     * `WARNING` / `CRITICAL`: otherwise
 
     \b
-    Perfdata :
+    Perfdata:
     * healthy_replica & unhealthy_replica count
     * the lag of each replica labelled with "member name"_lag
     """
@@ -321,7 +321,7 @@ def cluster_config_has_changed(
     * `CRITICAL`: The hash of the configuration has changed compared to the input (`--hash`) or last time (`--state_file`)
 
     \b
-    Perfdata :
+    Perfdata:
     * `is_configuration_changed` is 1 if the configuration has changed
     """
     # FIXME hash in perfdata ?
@@ -345,7 +345,7 @@ def cluster_config_has_changed(
 @click.pass_context
 @nagiosplugin.guarded
 def cluster_is_in_maintenance(ctx: click.Context) -> None:
-    """Check if the cluster is in maintenance mode ie paused.
+    """Check if the cluster is in maintenance mode or paused.
 
     \b
     Check:
@@ -353,7 +353,7 @@ def cluster_is_in_maintenance(ctx: click.Context) -> None:
     * `CRITICAL`: otherwise.
 
     \b
-    Perfdata :
+    Perfdata:
     * `is_in_maintenance` is 1 the cluster is in maintenance mode, 0 otherwise
     """
     check = nagiosplugin.Check()
@@ -398,7 +398,7 @@ def node_is_replica(ctx: click.Context, max_lag: str) -> None:
     * `OK`: if the node is a running replica with noloadbalance tag and the lag is under the maximum threshold.
     * `CRITICAL`: otherwise
 
-    Perfdata : `is_replica` is 1 if the node is a running replica with noloadbalance tag and the lag is under the maximum threshold, 0 otherwise.
+    Perfdata: `is_replica` is 1 if the node is a running replica with noloadbalance tag and the lag is under the maximum threshold, 0 otherwise.
     """
     # FIXME add a lag check ??
     check = nagiosplugin.Check()
@@ -459,7 +459,7 @@ def node_tl_has_changed(ctx: click.Context, timeline: str, state_file: str) -> N
     * `CRITICAL`: The tl is not the same.
 
     \b
-    Perfdata :
+    Perfdata:
     * `is_timeline_changed` is 1 if the tl has changed, 0 otherwise
     * the timeline
     """
@@ -499,7 +499,7 @@ def node_patroni_version(ctx: click.Context, patroni_version: str) -> None:
     * `CRITICAL`: otherwise.
 
     \b
-    Perfdata :
+    Perfdata:
     * `is_version_ok` is 1 if version is ok, 0 otherwise
     """
     # TODO the version cannot be written in perfdata find something else ?
@@ -525,7 +525,7 @@ def node_is_alive(ctx: click.Context) -> None:
     * `CRITICAL`: otherwise.
 
     \b
-    Perfdata :
+    Perfdata:
     * `is_running` is 1 if patroni is running, 0 otherwise
     """
     check = nagiosplugin.Check()
diff --git a/config.ini b/config.ini
deleted file mode 100644
index a3066ad..0000000
--- a/config.ini
+++ /dev/null
@@ -1,9 +0,0 @@
-[options]
-endpoints = https://10.20.199.3:8008, https://10.20.199.4:8008,https://10.20.199.5:8008
-cert_file = ./ssl/benoit-dalibo-cert.pem
-key_file = ./ssl/benoit-dalibo-key.pem
-ca_file = ./ssl/CA-cert.pem
-timeout = 0
-
-[options.node_is_replica]
-lag=100
diff --git a/doc/make_readme.sh b/doc/make_readme.sh
index 5c9d501..9de71db 100755
--- a/doc/make_readme.sh
+++ b/doc/make_readme.sh
@@ -23,28 +23,39 @@ cat << '_EOF_' > $README
 _EOF_
 helpme
 cat << '_EOF_' >> $README
-## install
+## Install
 
-The check requers python3. Using a virtual env is advised for testing :
+Installation from the git repository:
 
 ```
-pip -m venv ~/venv
-source ~venv/bin/activate
+$ git clone
 ```
 
-Clone the repo, then install with pip3 from it :
+Change the branch if necessary. Then create a dedicated environment,
+install dependencies and then check_patroni from the repo:
 
 ```
-pip3 install .
-pip3 install .[dev]
-pip3 install .[test]
+$ cd check_patroni
+$ python3 -m venv .venv
+$ . .venv/bin/activate
+(.venv) $ pip3 install .
+(.venv) $ pip3 install .[dev] # for dev purposes
+(.venv) $ pip3 install .[test] # for testing purposes
+(.venv) $ check_patroni
 ```
 
-Links :
+To quit this env and destroy it:
+
+```
+$ deactivate
+$ rm -r .venv
+```
+
+Links:
 * [pip & centos 7](https://linuxize.com/post/how-to-install-pip-on-centos-7/)
 * [pip & debian10](https://linuxize.com/post/how-to-install-pip-on-debian-10/)
 
-## config file
+## Config file
 
 All global and service specific parameters can be specified via a config file has follows:
 
@@ -59,7 +70,7 @@ timeout = 0
 [options.node_is_replica]
 lag=100
 ```
-## thresholds
+## Thresholds
 
 The format for the threshold parameters is "[@][start:][end]".
 
@@ -68,9 +79,9 @@ The format for the threshold parameters is "[@][start:][end]".
 * If `end` is omitted, infinity is assumed
 * To invert the match condition, prefix the range expression with "@".
 
-A match is found when : start <= VALUE <= end
+A match is found when: start <= VALUE <= end
 
-For example, the followinf command will raise :
+For example, the followinf command will raise:
 
 * a warning if there is less than 1 nodes, wich can be translated to outside of range [2;+INF[
 * a critical if there are no nodes, wich can be translated to outside of range [1;+INF[
@@ -80,7 +91,8 @@ check_patroni -e https://10.20.199.3:8008 cluster_has_replica --warning 2: --cri
 ```
 _EOF_
 readme
-readme "## cluster services"
+readme "## Cluster services"
+readme
 readme "### cluster_config_has_changed"
 helpme cluster_config_has_changed
 readme "### cluster_has_leader"
@@ -91,7 +103,8 @@ readme "### cluster_is_in_maintenance"
 helpme cluster_is_in_maintenance
 readme "### cluster_node_count"
 helpme cluster_node_count
-readme "## node services"
+readme "## Node services"
+readme
 readme "### node_is_alive"
 helpme node_is_alive
 readme "### node_is_pending_restart"
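
To make the threshold wording in the README hunks concrete, here is a minimal Python sketch of how a range such as `2:` could be evaluated. It is not the code check_patroni or nagiosplugin runs. `parse_range`, `matches` and `state` are hypothetical helpers, and the handling of an omitted `start` (treated as 0) and of `~` (negative infinity) is assumed from the usual Nagios range convention rather than taken from this diff.

```python
import math


def parse_range(spec: str) -> tuple[bool, float, float]:
    """Split a "[@][start:][end]" expression into (invert, start, end)."""
    invert = spec.startswith("@")
    if invert:
        spec = spec[1:]
    # Without a colon, the whole expression is the end bound.
    start_text, _, end_text = spec.rpartition(":")
    if start_text == "~":
        start = -math.inf  # assumption: "~" means negative infinity
    elif start_text == "":
        start = 0.0  # assumption: an omitted start defaults to 0
    else:
        start = float(start_text)
    # If end is omitted, infinity is assumed (as stated in the README).
    end = float(end_text) if end_text else math.inf
    if start > end:
        raise ValueError(f"start must not exceed end in {spec!r}")
    return invert, start, end


def matches(spec: str, value: float) -> bool:
    """A match is found when start <= value <= end; "@" inverts it."""
    invert, start, end = parse_range(spec)
    inside = start <= value <= end
    return inside != invert


def state(value: float, warning: str, critical: str) -> str:
    """An alert is raised when the value does NOT match the range."""
    if not matches(critical, value):
        return "critical"
    if not matches(warning, value):
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Mirrors: cluster_has_replica --warning 2: --critical 1:
    for replicas in (0, 1, 2):
        print(replicas, state(replicas, warning="2:", critical="1:"))
```

Under these assumptions, `--warning 2: --critical 1:` alerts whenever the healthy replica count falls outside [2;+INF[ and [1;+INF[ respectively, so one replica yields a warning and none yields a critical.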