Commit Graph

972 Commits

Author SHA1 Message Date
Christian Schwarz
226962b4fb zfs: rewrite SendStream, fix bug in Close() on FreeBSD, add platformtests
This commit was motivated by https://github.com/zrepl/zrepl/issues/495
where, on FreeBSD with OpenZFS 2.0, a SendStream.Close() call might wait indefinitely for `zfs send` to exit.
The reason is that, due to the refactoring done for redacted send & recv
(30af21b025),
the `dump_bytes` function, which writes to the pipe, executes in a separate thread (synctask taskq) iff not `HAVE_LARGE_STACKS`.
The `zfs send` process/thread waits for that taskq thread using an uninterruptible primitive.
So when we SIGKILL `zfs send`, that signal doesn't reach the right thread to interrupt the pipe write.

Theoretically this affects both Linux and FreeBSD, but most Linux users `HAVE_LARGE_STACKS` and since https://github.com/penzfs/zfs/pull/12350/files OpenZFS on FreeBSD `HAVE_LARGE_STACKS` as well.
However, at least until FreeBSD 13.1, possibly for the entire 13 lifecycle, we're going to have to live with that oddity.

Measures taken in this commit:
- Report the behavior as an upstream bug https://github.com/openzfs/zfs/issues/12500
- Change SendStream code so that it closes zrepl's read-end of the pipe (see comment in code)
- Clean up and make explicit SendStream's state handling
- Write extensive platformtests for SendStream
    - They pass on my Linux install and on FreeBSD 12
    - FreeBSD 13 still needs testing.

fixes https://github.com/zrepl/zrepl/issues/495
2021-08-23 00:21:34 +02:00
Christian Schwarz
ff6ebe87f3 platformtest: fix 'active child tasks' panic for ReceiveForceRollbackWorksUnencrypted
Revealed by rework of SendStream in a prior commit.
2021-08-23 00:21:34 +02:00
Christian Schwarz
126874b423 platformtest: fix replication tests (SizeEstimationConcurrency field in PlannerPolicy was not set)
fixup of 0ceea1b792 (parallel replication knobs)
2021-08-23 00:21:34 +02:00
Christian Schwarz
157753d781 platformtest: work around missing feature detection for test 'ReplicationPropertyReplicationWorks' 2021-08-23 00:21:34 +02:00
Christian Schwarz
009bd410af docs: prune: improve grid example 2021-07-08 19:46:24 +02:00
Christian Schwarz
bcfcd7a134 docs / CI: stop creating churn with doc commits & commit as zreplbot@ 2021-07-08 17:07:24 +02:00
Matthias Freund
bf1276f767 status: port status-v1 ETA calculation patch
Must have forgotten to integrate it into the status-v2 branch at the
time.

refs https://github.com/zrepl/zrepl/issues/98#issuecomment-872154091

cc @dcdamien
2021-07-08 15:00:26 +02:00
James W. Brinkerhoff IV
9fa7a18351 docs: quickstart: external_disk: fix typo in example 'derive -> drive'
closes #472
2021-04-19 23:36:03 +02:00
sre
50e8ee4549 docs: apt repo: use sudo in the snippet that sets up the repo
I generally like when snippets are provided in a way which could be used without running as root, and uses sudo when applicable. This change allows for this.

It will, however print out one extra line, which is possible to remove by adding '>/dev/null' after '/etc/apt/sources.list.d/zrepl.list'.

closes #461
2021-04-17 21:50:36 +02:00
Lapo Luchini
3b5a1a8b9a docs/monitoring: change suggested prometheus port to 9811
Change to 9811 as registered with the prometheus project now.

Closes #444.
2021-03-28 18:18:02 +02:00
Christian Schwarz
f661d9429f pruning/keep_last_n: correctly handle the case where count > matching snaps
fixes #446
2021-03-25 22:42:01 +01:00
InsanePrawn
ac4b109872 status/interactive: Revert to simple wakeup/reset signalling
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>

closes #452
2021-03-25 22:26:17 +01:00
InsanePrawn
b2c6e51a43 client/signal: Revert "add signal 'snapshot', rename existing signal 'wakeup' to 'replication'"
This was merged to master prematurely as the job components are not decoupled well enough
for these signals to be useful yet.

This reverts commit 2c8c2cfa14.

closes #452
2021-03-25 22:26:17 +01:00
Lukas Schauer
ee2336a24b zfs: pipe size: default to value of /proc/sys/fs/pipe-max-siz
Addition by @problame: move getPipeCapacityHint() into platform-specific
code. This has the added benefit of not recognizing the envvar as an
envconst on platform that do not support resizing pipes.  => won't show
up in (zrepl status --raw).Global.Envconst

fixes #424
cloes #449
2021-03-25 22:24:50 +01:00
Cole Helbling
1e85b1cb5f client/status: allow raw mode without a tty
A fairly common pattern when presented with JSON is to pipe it to `jq`.

This also allows one to `zrepl status --mode raw > log` and operate on
the JSON there.

closes #442
2021-03-22 23:58:32 +01:00
Christian Schwarz
5c6d69a69c zfs: PropertySource: set type to uint32 so that enumer-generated code is platform-independent
make zrepl-bin test-platform-bin vet lint GOOS=freebsd   GOARCH=386
make[2]: Entering directory '/src'
GO111MODULE=on go build -mod=readonly  -ldflags "-X github.com/zrepl/zrepl/version.zreplVersion=v0.3.1-20-g07f2bff" -o "artifacts/zrepl-freebsd-386"
zfs/propertysource_enumer.go:41:9: constant 18446744073709551615 overflows PropertySource
zfs/propertysource_enumer.go:48:66: constant 18446744073709551615 overflows PropertySource
zfs/propertysource_enumer.go:57:23: constant 18446744073709551615 overflows PropertySource

fixes #429
2021-03-14 22:32:45 +01:00
Christian Schwarz
299808aaaf docs: 0.4 changelog: mention the 0.3.1 fix to prune:grid
fixes #400
refs 3a4e841c73
2021-03-14 21:13:26 +01:00
Christian Schwarz
7071adc774 docs: 0.4 changelog 2021-03-14 20:56:02 +01:00
InsanePrawn
8d678eed19 docs: add note about zfs recv -x mountpoint with ZVOLs
refs #430
2021-03-14 20:26:39 +01:00
Calistoc
ab7abd9686 status: show the current replication attempt's runtime 2021-03-14 18:24:28 +01:00
Christian Schwarz
a58ce74ed0 implement new 'zrepl status'
Primary goals:

- Scrollable output ( fixes #245 )
- Sending job signals from status view
- Filtering of output by filesystem

Implementation:

- original TUI framework: github.com/rivo/tview
- but: tview is quasi-unmaintained, didn't support some features
- => use fork https://gitlab.com/tslocum/cview
- however, don't buy into either too much to avoid lock-in

- instead: **port over the existing status UI drawing code
  and adjust it to produce strings instead of directly
  drawing into the termbox buffer**

Co-authored-by: Calistoc <calistoc@protonmail.com>
Co-authored-by: InsanePrawn <insane.prawny@gmail.com>

fixes #245
fixes #220
2021-03-14 18:24:25 +01:00
Calistoc
2c8c2cfa14 add signal 'snapshot', rename existing signal 'wakeup' to 'replication' 2021-03-14 18:16:23 +01:00
Christian Schwarz
0ceea1b792 replication: simplify parallel replication variables & expose them in config
closes #140
2021-03-14 17:30:10 +01:00
Christian Schwarz
07f2bfff6a build: bump release build Go version + quickcheck default to Go 1.16 2021-02-20 17:29:17 +01:00
Christian Schwarz
b9ee14f6d7 endpoint.ReceiverConfig: remove obsolete TODO comment 2021-02-20 17:20:58 +01:00
InsanePrawn
393fc10a69 [#285] support setting zfs send / recv flags in the config (send: -wLcepbS, recv: -ox)
Co-authored-by: Christian Schwarz <me@cschwarz.com>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>

closes #285
closes #276
closes #24
2021-02-20 17:20:45 +01:00
Christian Schwarz
1c937e58f7 zfs.NilBool: document its purpose and move it to its own package 'nodefault' 2021-02-20 17:04:57 +01:00
Christian Schwarz
70bbdfe760 zfs: ResumeToken: parse embedok, largeblockok, savedok if available
Developed for #285 but ultimately not used for it.
2021-02-20 17:04:57 +01:00
Christian Schwarz
efe7b17d21 Update to protobuf v1.25 and grpc 1.35; bump CI to go1.12
From:
github.com/golang/protobuf v1.3.2
google.golang.org/grpc v1.17.0

To:
github.com/golang/protobuf v1.4.3
google.golang.org/grpc v1.35.0
google.golang.org/protobuf v1.25.0

About the two protobuf packages:
https://developers.google.com/protocol-buffers/docs/reference/go/faq
> Version v1.4.0 and higher of github.com/golang/protobuf wrap the new
implementation and permit programs to adopt the new API incrementally. For
example, the well-known types defined in github.com/golang/protobuf/ptypes are
simply aliases of those defined in the newer module. Thus,
google.golang.org/protobuf/types/known/emptypb and
github.com/golang/protobuf/ptypes/empty may be used interchangeably.

Notable Code Changes in zrepl:
- generate protobufs now contain a mutex so we can't copy them by value
  anymore
- grpc.WithDialer is deprecated => use grpc.WithContextDialer instead

Go1.12 is now actually required by some of the dependencies.
2021-01-25 00:39:01 +01:00
Christian Schwarz
166e80bb96 lazy.sh: use 'go install' + import paths from build/tools.go 2021-01-25 00:16:01 +01:00
Mathias Fredriksson
6ac537210b daemon: avoid math/rand race by using global source
Unless we're using the global source for math/rand, (*rand.Rand).Read
should not be called concurrently. We seed the rng in daemon.Run to
avoid ambiguity or hiding global side effects inside packages.

closes #414
2021-01-25 00:16:01 +01:00
Christian Schwarz
596a39c0f5 bump golangci-lint to 1.35.2 and fix resulting lint errors
GO111MODULE=on golangci-lint run ./...
endpoint/endpoint.go:487:9: S1039: unnecessary use of fmt.Sprintf (gosimple)
                panic(fmt.Sprintf("ClientIdentityKey context value must be set"))
                      ^
platformtest/platformtest_ops.go:259:41: S1039: unnecessary use of fmt.Sprintf (gosimple)
                                return nil, &LineError{scan.Text(), fmt.Sprintf("unexpected tokens at EOL")}
                                                                    ^
platformtest/platformtest_ops.go:266:41: S1039: unnecessary use of fmt.Sprintf (gosimple)
                                return nil, &LineError{scan.Text(), fmt.Sprintf("unexpected tokens at EOL")}
                                                                    ^
util/optionaldeadline/optionaldeadline_test.go:97:50: SA1029: should not use built-in type string as key for value; define your own type to avoid collisions (staticcheck)
        pctx := context.WithValue(context.Background(), "key", "value")
                                                        ^
rpc/rpc_debug.go:8:5: var `debugEnabled` is unused (unused)
rpc/dataconn/dataconn_debug.go:8:5: var `debugEnabled` is unused (unused)
rpc/dataconn/frameconn/frameconn.go:42:9: S1039: unnecessary use of fmt.Sprintf (gosimple)
                panic(fmt.Sprintf("frame header is 8 bytes long"))
                      ^
platformtest/platformtest_ops.go:322:40: S1039: unnecessary use of fmt.Sprintf (gosimple)
                        return nil, &LineError{scan.Text(), fmt.Sprintf("unexpected tokens at EOL")}
2021-01-25 00:16:01 +01:00
Mathias Fredriksson
d118bcc717 rpc: fix data race in timeoutconn
- `timeoutconn` handles state, yet calls to Read/Write make a copy of
  that state (non-pointer receiver) so any outbound calls will not have
  the state updated

- Even without the copy issue, the renew methods can in edge cases set a
  new deadline _after_ DisableTimeouts have been called, consider the
  following racy behavior:
    1. `renewReadDeadline` is called, checks `renewDeadlinesDisabled`
       (not disabled)
    2. `DisableTimeouts` is called, sets `renewDeadlinesDisabled`
    3. `DisableTimeouts` invokes `c.SetDeadline`
    4. `renewReadDeadline` invokes `c.SetReadDeadline`

To fix the above, the `Conn` receiver was made to be a pointer
everywhere and access to renewDeadlinesDisabled is now guarded
by an RWMutex instead of using atomics.

closes #415
2021-01-25 00:16:01 +01:00
Mathias Fredriksson
48be4032a2 daemon: fix data race in snapjob pruner report
closes #416
2021-01-25 00:16:01 +01:00
Mathias Fredriksson
2d545c87cd rpc: fix read mutex unlock in dataconn stream
closes #413
2021-01-25 00:16:01 +01:00
Christian Schwarz
0b48ba54f8 replication/driver: simplify second-attempt step correlation code & fix statekeeping
Before this change, the step correlation code returned early in several cases:
- did not set f.planning.done in the cases where it was a no-op
- did not set f.planning.err in the cases where correlation did not
  succeed

Reported-by: InsanePrawn <insane.prawny@gmail.com>
2021-01-25 00:16:01 +01:00
Christian Schwarz
c3d87289bb [#388] util/semaphore: fix TestSemaphore test
fixes #388
2021-01-24 22:28:47 +01:00
Christian Schwarz
0d96627ffb [#388] trace: make WithTaskGroup actually concurrent 2021-01-24 22:28:47 +01:00
Christian Schwarz
61acc7494a [#388] endpoint: fix incorrect use of trace.WithTaskGroup in ListAbstractionsStreamed
Capturing behavior was broken.
2021-01-24 13:58:23 +01:00
Rafał Bugajewski
96d5288667
docs: fix typos 2020-12-17 12:00:29 +01:00
Christian Schwarz
b8cd3c59f1 docs: update supporters 2020-11-11 16:06:54 +01:00
Christian Schwarz
91632b52bb build: circleci: periodic full build: only build each of the matching branches once 2020-11-04 17:53:54 +01:00
Christian Schwarz
d39c0e3745 docs + readme: actually fix Patreon badge
see https://github.com/endel/shieldsio-patreon/issues/8#issuecomment-700144629
2020-11-01 16:10:48 +01:00
Christian Schwarz
180eaea195 docs: 0.3.1 changelog 2020-11-01 14:19:03 +01:00
Christian Schwarz
69ed2d7117 docs + readme: fix Patreon badge 2020-11-01 14:18:36 +01:00
Christian Schwarz
c420f3c909 [#381] zfs: ListFilesystemVersions: make list filesystems version invocation deterministic
fixes #381
ref #379
2020-11-01 13:59:21 +01:00
Christian Schwarz
0a2dea05a9 [#385] status + replication: warning if replication succeeeded without any filesystem being replicated
refs #385
refs #384
2020-11-01 13:51:28 +01:00
Christian Schwarz
293c89d392 [#385] replication: report AttemptDone if no filesystems are replicated
fixes #385
2020-11-01 13:51:28 +01:00
Jeremy Bryan Smith
bb5ef0c8b2 docs: fix link to template.sh sample hook file 2020-11-01 10:45:17 +01:00
Christian Schwarz
8f44cdc284 build: rpm: don't strip binaries
fixes build failure for aarch64

```
make rpm-docker GOARCH=arm64 GOOS=linux

+ /usr/lib/rpm/brp-compress
+ /usr/lib/rpm/brp-strip /usr/bin/strip
/usr/bin/strip: Unable to recognise the format of the input file `/build/src/artifacts/rpmbuild/BUILDROOT/zrepl-v0.3.0.38.g53028ed5-1.aarch64/usr/bin/zrepl'
error: Bad exit status from /var/tmp/rpm-tmp.g8Fdbs (%install)
    Bad exit status from /var/tmp/rpm-tmp.g8Fdbs (%install)
```
2020-11-01 10:23:50 +01:00