zrepl

mirror of https://github.com/zrepl/zrepl.git synced 2024-11-22 08:23:50 +01:00

Author	SHA1	Message	Date
Goran Mekic	bc5e1ede04	metric to detect filesystems rules that don't match any local dataset (#653) This PR adds a Prometheus counter called `zrepl_zfs_list_unmatched_user_specified_dataset_count`. Monitor for increases of the counter to detect filesystem filter rules that have no effect because they don't match any local filesystem. An example use case for this is the following story: 1. Someone sets up zrepl with `filesystems` filter for `zroot/pg14<`. 2. During the upgrade to Postgres 15, they rename the dataset to `zroot/pg15`, but forget to update the zrepl `filesystems` filter. 3. zrepl will not snapshot / replicate the `zroot/pg15<` datasets. Since `filesystems` rules are always evaluated on the side that has the datasets, we can smuggle this functionality into the `zfs` module's `ZFSList` function that is used by all jobs with a `filesystems` filter. Dashboard changes: - histogram with increase in $__interval, one row per job - table with increase in $__range - explainer text box, so, people know what the previous two are about We had to re-arrange some panels, hence the Git diff isn't great. closes https://github.com/zrepl/zrepl/pull/653 Co-authored-by: Christian Schwarz <me@cschwarz.com> Co-authored-by: Goran Mekić <meka@tilda.center>	2023-05-02 22:13:52 +02:00
Christian Schwarz	a8e92971d0	zfs: rewrite SendStream, fix bug in Close() on FreeBSD, add platformtests This commit was motivated by https://github.com/zrepl/zrepl/issues/495 where, on FreeBSD with OpenZFS 2.0, a SendStream.Close() call might wait indefinitely for `zfs send` to exit. The reason is that, due to the refactoring done for redacted send & recv (`30af21b025`), the `dump_bytes` function, which writes to the pipe, executes in a separate thread (synctask taskq) iff not `HAVE_LARGE_STACKS`. The `zfs send` process/thread waits for that taskq thread using an uninterruptible primitive. So when we SIGKILL `zfs send`, that signal doesn't reach the right thread to interrupt the pipe write. Theoretically this affects both Linux and FreeBSD, but most Linux users `HAVE_LARGE_STACKS` and since https://github.com/openzfs/zfs/pull/12350/files OpenZFS on FreeBSD `HAVE_LARGE_STACKS` as well. However, at least until FreeBSD 13.1, possibly for the entire 13 lifecycle, we're going to have to live with that oddity. Measures taken in this commit: - Report the behavior as an upstream bug https://github.com/openzfs/zfs/issues/12500 - Change SendStream code so that it closes zrepl's read-end of the pipe (see comment in code) - Clean up and make explicit SendStream's state handling - Write extensive platformtests for SendStream - They pass on my Linux install and on FreeBSD 12 - FreeBSD 13 still needs testing. fixes https://github.com/zrepl/zrepl/issues/495	2021-09-19 20:11:31 +02:00
Christian Schwarz	af2d6579c5	[#347] zfscmd: fix dangling trace Task on .Start() failure fixes #347	2020-09-02 22:45:44 +02:00
Christian Schwarz	0f3da73ef1	[#347] zfscmd + zfs: define .Start() semantics, apply to call sites in pkg zfs fixes #347	2020-09-02 22:45:44 +02:00
Christian Schwarz	10a14a8c50	[#307] add package trace, integrate it with logging, and adopt it throughout zrepl package trace: - introduce the concept of tasks and spans, tracked as linked list within ctx - see package-level docs for an overview of the concepts - main feature 1: unique stack of task and span IDs - makes it easy to follow a series of log entries in concurrent code - main feature 2: ability to produce a chrome://tracing-compatible trace file - either via an env variable or a `zrepl pprof` subcommand - this is not a CPU profile, we already have go pprof for that - but it is very useful to visually inspect where the replication / snapshotter / pruner spends its time (fixes #307) usage in package daemon/logging: - goal: every log entry should have a trace field with the ID stack from package trace - make `logging.GetLogger(ctx, Subsys)` the authoritative `logger.Logger` factory function - the context carries a linked list of injected fields which `logging.GetLogger` adds to the logger it returns - `logging.GetLogger` also uses package `trace` to get the task-and-span-stack and injects it into the returned logger's fields	2020-05-19 11:30:02 +02:00
Christian Schwarz	0e5c77d2be	[#277] rpc + zfs: drop zfs.StreamCopier, use io.ReadCloser instead	2020-05-18 19:46:24 +02:00
Christian Schwarz	aed6149c8c	zfscmd: fix crash in zfscmd_prometheus.go due to incorrectly extracted ProcessState fixup of `96e188d7c4` refs #196 refs #301 panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x9a472a] goroutine 15826 [running]: os.(*ProcessState).systemTime(...) /home/cs/go1.13/src/os/exec_unix.go:98 os.(*ProcessState).SystemTime(...) /home/cs/go1.13/src/os/exec.go:141 github.com/zrepl/zrepl/zfs/zfscmd.waitPostPrometheus(0xc000c04800, 0xe21ce0, 0xc000068270, 0xbf9f80d88107e861, 0x19bae710e6, 0x13a8b60) /home/cs/zrepl/zrepl/zfs/zfscmd/zfscmd_prometheus.go:69 +0x22a github.com/zrepl/zrepl/zfs/zfscmd.(*Cmd).waitPost(0xc000c04800, 0xe21ce0, 0xc000068270) /home/cs/zrepl/zrepl/zfs/zfscmd/zfscmd.go:155 +0x18a github.com/zrepl/zrepl/zfs/zfscmd.(*Cmd).CombinedOutput(0xc000c04800, 0xc0004b8270, 0xd02eea, 0x3, 0xc0001f6c40, 0x3) /home/cs/zrepl/zrepl/zfs/zfscmd/zfscmd.go:40 +0xb3 github.com/zrepl/zrepl/zfs.ZFSRelease(0xe36aa0, 0xc0004b8270, 0xc0009a3a40, 0x13, 0xc0004a5d00, 0x1, 0x1, 0xed62eb221, 0x13a8b60) /home/cs/zrepl/zrepl/zfs/holds.go:102 +0x2a7 github.com/zrepl/zrepl/endpoint.ReleaseStep(0xe36aa0, 0xc0004b8270, 0xc0004befc0, 0xe, 0xd08482, 0x8, 0xc0001cb02f, 0x2, 0x1eeea3bff89dc90b, 0x134d6, ...) /home/cs/zrepl/zrepl/endpoint/endpoint_zfs_abstraction_step.go:130 +0x367 github.com/zrepl/zrepl/endpoint.(*Sender).SendCompleted.func2(0xc000459190, 0xc000390e30, 0xc00041fd80, 0xc0004befc0, 0xe, 0xd08482, 0x8, 0xc0001cb02f, 0x2, 0x1eeea3bff89dc90b, ...) /home/cs/zrepl/zrepl/endpoint/endpoint.go:419 +0x1c3 created by github.com/zrepl/zrepl/endpoint.(*Sender).SendCompleted /home/cs/zrepl/zrepl/endpoint/endpoint.go:413 +0x776	2020-04-21 14:10:25 +02:00
Christian Schwarz	0834a184b8	zfscmd: do not do duplicate waitPre callbacks it just makes sense that if we only dispatch one waitPost, we should also only dispatch one waitPre	2020-04-21 14:10:18 +02:00
Christian Schwarz	96e188d7c4	zfscmd: fix nil deref in waitPostLogging when command was killed fixes #301	2020-04-08 00:26:56 +02:00
Christian Schwarz	1336c91865	zfs: introduce pkg zfs/zfscmd for command logging, status, prometheus metrics refs #196	2020-04-05 20:47:25 +02:00

10 Commits