Commit Graph

15 Commits

Author SHA1 Message Date
Goran Mekic
bc5e1ede04
metric to detect filesystems rules that don't match any local dataset (#653)
This PR adds a Prometheus counter called
`zrepl_zfs_list_unmatched_user_specified_dataset_count`.
Monitor for increases of the counter to detect filesystem filter rules that
have no effect because they don't match any local filesystem.

An example use case for this is the following story:
1. Someone sets up zrepl with `filesystems` filter for `zroot/pg14<`.
2. During the upgrade to Postgres 15, they rename the dataset to `zroot/pg15`,
   but forget to update the zrepl `filesystems` filter.
3. zrepl will not snapshot / replicate the `zroot/pg15<` datasets.

Since `filesystems` rules are always evaluated on the side that has the datasets,
we can smuggle this functionality into the `zfs` module's `ZFSList` function that
is used by all jobs with a `filesystems` filter.

Dashboard changes:
- histogram with increase in $__interval, one row per job
- table with increase in $__range
- explainer text box, so, people know what the previous two are about
We had to re-arrange some panels, hence the Git diff isn't great.

closes https://github.com/zrepl/zrepl/pull/653

Co-authored-by: Christian Schwarz <me@cschwarz.com>
Co-authored-by: Goran Mekić <meka@tilda.center>
2023-05-02 22:13:52 +02:00
Christian Schwarz
4fae0bb68e grafana: update dashboard to Grafana 9.3.6
... by importing the old version of the dashboard JSON into Grafana 9.3.6, then
re-exporting it.
2023-02-26 11:28:57 +01:00
Guillermo Ramos
9777a441e9
dist: add openrc service file
closes https://github.com/zrepl/zrepl/pull/664
2023-01-27 23:59:45 +01:00
Christian Schwarz
6ed4626df9 grafana dashboard: remove zrepl version number from title
fixes https://github.com/zrepl/zrepl/issues/624
2022-10-27 00:19:06 +02:00
Christian Schwarz
c0b52b92d5 systemd: set GOTRACEBACK=crash so that we have core dumps
They are useful, not least to debug issues with debugging
SIGSYS caused by overly restrictive settings in the unit file.
(See previous commit for an example.)
2022-10-26 22:39:18 +02:00
Christian Schwarz
12018b3685 go1.19: adjust systemd unit to allow setrlimit
Go 1.19 uses it during startup.

From the Go changelog:

> On Unix operating systems, Go programs that import package os now
> automatically increase the open file limit (RLIMIT_NOFILE) to the
> maximum allowed value; that is, they change the soft limit to match the
> hard limit. This corrects artificially low limits set on some systems
> for compatibility with very old C programs using the select system call.
> Go programs are not helped by that limit, and instead even simple
> programs like gofmt often ran out of file descriptors on such systems
> when processing many files in parallel. One impact of this change is
> that Go programs that in turn execute very old C programs in child
> processes may run those programs with too high a limit. This can be
> corrected by setting the hard limit before invoking the Go program.
2022-10-26 22:39:18 +02:00
Lapo Luchini
4a27cc63a8 prometheus: convert zrepl_version_daemon to zrepl_start_time metric
closes https://github.com/zrepl/zrepl/pull/556
fixes #553
2022-01-20 19:33:18 +01:00
Christian Schwarz
720a284db5 dist/grafana: fix endpoint abstractions cache metric panel
fixup of 30cdc14
2020-08-04 01:36:31 +02:00
Hans Schulz
83fdffbcef replication: prometheus metric for number of failed replications in last attempt
- package replication: metric
- Grafana panel
- wiring
- changelog

Signed-off-by: Christian Schwarz <me@cschwarz.com>

closes #341
2020-08-04 01:19:44 +02:00
Christian Schwarz
e391fa94f9 dist/grafana: update grafana dashboard
- uses version metric for 'instances up'
- displays active task count
- displays send abstractions cache entry count
- in general, graphs have a shorter y axis for better overview

fixes #332
2020-06-14 15:26:05 +02:00
Christian Schwarz
4301f741db dist/systemd: remove @privileged from SystemCallFilter + cleanup comments
fixes #237
2019-11-20 18:44:14 +01:00
Christian Schwarz
9e54f11960 dist/systemd: fix ssh-transport: create stdinserver runtime directory
tested to work on Debian Stretch

refs #237
2019-11-16 22:07:38 +01:00
Juergen Hoetzel
c524acb2df Fix invalid comment syntax 2019-10-06 16:23:20 +02:00
Christian Schwarz
056be1185d dist: add grafana dashboard
fixes #116
2019-03-16 16:12:34 +01:00
Christian Schwarz
b0898ec8bc dist: systemd service definition template
fixes #117
refs #145
2019-03-16 16:12:34 +01:00