1109 Commits

Author SHA1 Message Date
Christian Schwarz
12503dc55a rpc/dataconn/timeoutconn: disable TestPartialWriteMockConn in CircleCI 2021-12-18 18:02:34 +01:00
Christian Schwarz
7d10a71cc0 0.5 changelog + front page update 2021-12-18 17:36:54 +01:00
Christian Schwarz
3d3d1b5679 quickstart: sample config uses placeholders, so provide sample value for recv.placeholder.encryption 2021-12-18 17:18:58 +01:00
Christian Schwarz
5240ab4949 docs: quickstart: make users aware that the example rules apply to all snaps, not just zrepl's
fixes https://github.com/zrepl/zrepl/issues/540
2021-12-18 16:30:15 +01:00
Christian Schwarz
19aebd399f docs: add a note that FreeBSD jail zfs userland needs to be kept in sync with kernel module
fixes https://github.com/zrepl/zrepl/issues/500
2021-12-18 16:06:26 +01:00
Christian Schwarz
04e03f4d06 platformtest: retry zpool export if 'pool is busy'
On Ubuntu, something seems to be holding on to the pool for too long.
2021-12-18 15:58:24 +01:00
Christian Schwarz
2e2a8a1d5d docs: add docs on how to run platform tests
fixes https://github.com/zrepl/zrepl/issues/478
2021-12-18 15:55:22 +01:00
Christian Schwarz
a2b2e0fe34 daemon/control: make http server {Read,Write}Timeout envconst-configurable
refs https://github.com/zrepl/zrepl/issues/379
2021-12-18 15:14:33 +01:00
Christian Schwarz
af2905d245 docs: apt repo: deploy gpg to /usr/share/keyrings and use 'signed-by' in repo definition
gpg --dearmor because of note in https://wiki.debian.org/DebianRepository/UseThirdParty

fixes https://github.com/zrepl/zrepl/issues/529
2021-12-18 15:14:33 +01:00
Christian Schwarz
c3f0041efd zrepl test placeholder: fix panic if dataset does not exist
fixes https://github.com/zrepl/zrepl/issues/406
2021-12-18 15:14:33 +01:00
Christian Schwarz
083f6001eb build: freebsd armv7 and arm64 binaries
fixes https://github.com/zrepl/zrepl/issues/539
2021-12-18 15:14:33 +01:00
Christian Schwarz
2d57ec6ee0 docs: changelog: mention upstream ashift 9 => 12 send/recv bug 2021-12-18 15:14:33 +01:00
Christian Schwarz
fb6a9be954 fix encrypt-on-receive with placeholders
fixes https://github.com/zrepl/zrepl/issues/504

Problem:
  plain send + recv with root_fs encrypted + placeholders causes plain recvs
  whereas user would expect encrypt-on-recv
Reason:
  We create placeholder filesystems with -o encryption=off.
  Thus, children received below those placeholders won't inherit
  encryption of root_fs.
Fix:
  We'll have three values for `recv.placeholder.encryption: unspecified (default) | off | inherit`.
  When we create a placeholder, we will fail the operation if `recv.placeholder.encryption = unspecified`.
  The exception is if the placeholder filesystem is to encode the client identity ($root_fs/$client_identity) in a pull job.
  Those are created in `inherit` mode if the config field is `unspecified` so that users who don't need
  placeholders are not bothered by these details.

Future Work:
  Automatically warn existing users of encrypt-on-recv about the problem
  if they are affected.
  The problem that I hit during implementation of this is that the
  `encryption` prop's `source` doesn't quite behave like other props:
  `source` is `default` for `encryption=off` and `-` when `encryption=on`.
  Hence, we can't use `source` to distinguish the following 2x2 cases:
  (1) placeholder created with explicit -o encryption=off
  (2) placeholder created without specifying -o encryption
  with
  (A) an encrypted parent at creation time
  (B) an unencrypted parent at creation time
2021-12-18 15:12:47 +01:00
Christian Schwarz
c1e2c9826f trace: hint debug env var in error when crashing due to active child tasks
refs https://github.com/zrepl/zrepl/issues/542
2021-12-05 18:57:43 +01:00
Christian Schwarz
b00b61e967 status: user-visible replication step number should start at 1
fixes https://github.com/zrepl/zrepl/issues/538
2021-11-21 15:32:18 +01:00
Christian Schwarz
ac147b5a6f replication: report whether a filesystem is active vs. blocked on something
- `BlockedOn` prop in JSON report
- Bring back the `*` in front of the filesystem report as an activity indicator.

fixes https://github.com/zrepl/zrepl/issues/505
2021-11-14 17:34:32 +01:00
Samy Mahmoudi
1850a332ed docs: prune: improve docs for 'grid' rule
- Substitute full words for both string name 'gridspec' and short form 'grid spec'
- Fix alignment and make spacing more consistent
- Fix how snapshots fall into buckets so that the example really reflects right-exclusiveness

closes https://github.com/zrepl/zrepl/pull/535
2021-11-14 17:34:32 +01:00
Christian Schwarz
20ff9717bc fix mis-spelled send option for embedded data
fixes https://github.com/zrepl/zrepl/issues/522
2021-11-14 17:34:32 +01:00
Christian Schwarz
c2fbf93365 daemon: provide os.Environ() in zrepl status
Useful for debugging.

fixes https://github.com/zrepl/zrepl/issues/534
2021-11-14 17:34:32 +01:00
Christian Schwarz
cf5e8e8f26 docs: add runbook on how to migrate sending side to new zpool
fixes https://github.com/zrepl/zrepl/issues/525
2021-11-14 17:34:32 +01:00
Christian Schwarz
c600cc1f60 skip timing-sensitive tests on CircleCI
We had too many spurious test failures in the past.
But on a developer machine, the tests don't usually fail because the
system isn't loaded as much.
So, only disable test on CircleCI.
2021-11-14 17:34:32 +01:00
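One way to disable tests only on CircleCI, as c600cc1f60 describes, is to check the `CIRCLECI` environment variable, which CircleCI sets to `true` in its build containers. The sketch below is an assumption about the approach, not a copy of zrepl's code; the helper name `runningOnCircleCI` is hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// runningOnCircleCI reports whether the CIRCLECI environment variable is "true".
// The lookup function is passed in so that the check can be exercised
// without modifying the process environment.
func runningOnCircleCI(getenv func(string) string) bool {
	return getenv("CIRCLECI") == "true"
}

func main() {
	fmt.Println("skip timing-sensitive tests:", runningOnCircleCI(os.Getenv))
}
```

Inside a test function, the check would typically guard a call such as `t.Skip("timing-sensitive, skipped on CircleCI")`.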
Lapo Luchini
c6a9ebc71c job/active: add "last completed" metric for error reporting
use case:

    So that I can use more resilient alerting such as "last completed run was more than 24h ago".

fixes https://github.com/zrepl/zrepl/issues/516
closes https://github.com/zrepl/zrepl/pull/530
2021-11-10 17:35:12 +01:00
Christian Schwarz
1f0f2f8569 pruner + docs: less confusing type names, some comments, better docs for keep: not_replicated
fixes https://github.com/zrepl/zrepl/issues/524
2021-10-10 21:11:38 +02:00
Christian Schwarz
5104ad3d0b build: use go 1.17 for testing & release builds 2021-10-09 16:51:08 +02:00
Christian Schwarz
a6dbda1ea8 go1.17: run goimports to support the new //go:build lines 2021-10-09 16:51:08 +02:00
Christian Schwarz
1edb8014bc build: circleci: stop storing artifacts
When we need artifacts, we use MinIO anyways.
And we have accumulated about 1TiB of (free) CircleCI artifact storage.
Don't need to waste space unnecessarily.
2021-10-09 16:13:58 +02:00
Christian Schwarz
845195b7ed bandwidth limiting: fix crash with SnapJob
zrepl daemon panics when the snap job triggers

fixup for f5f269bfd5 (bandwidth limiting)
fixes #521

Oct 01 16:14:56 cstp zrepl[56563]: panic: invalid config`BandwidthLimit` field invalid: BucketCapacity must not be zero
Oct 01 16:14:56 cstp zrepl[56563]:         panic: end span: span still has active child spans
Oct 01 16:14:56 cstp zrepl[56563]: goroutine 38 [running]:
Oct 01 16:14:56 cstp zrepl[56563]: github.com/zrepl/zrepl/daemon/logging/trace.WithSpan.func2()
Oct 01 16:14:56 cstp zrepl[56563]:         /home/cs/zrepl/zrepl/daemon/logging/trace/trace.go:341 +0x2ea
Oct 01 16:14:56 cstp zrepl[56563]: github.com/zrepl/zrepl/daemon/logging/trace.WithTaskAndSpan.func1()
Oct 01 16:14:56 cstp zrepl[56563]:         /home/cs/zrepl/zrepl/daemon/logging/trace/trace_convenience.go:40 +0x2e
Oct 01 16:14:56 cstp zrepl[56563]: panic(0xcee9c0, 0xc000676730)
Oct 01 16:14:56 cstp zrepl[56563]:         /home/cs/go1.16.6/src/runtime/panic.go:965 +0x1b9
Oct 01 16:14:56 cstp zrepl[56563]: github.com/zrepl/zrepl/endpoint.NewSender(0xf5bbc0, 0xc0003840c0, 0xc0000b2c90, 0x4, 0xc0002c5958, 0x0, 0x0, 0x0, 0xc000068cf8)
Oct 01 16:14:56 cstp zrepl[56563]:         /home/cs/zrepl/zrepl/endpoint/endpoint.go:68 +0x1ec
Oct 01 16:14:56 cstp zrepl[56563]: github.com/zrepl/zrepl/daemon/job.(*SnapJob).doPrune(0xc00039e000, 0xf6e3b8, 0xc0006541b0)
Oct 01 16:14:56 cstp zrepl[56563]:         /home/cs/zrepl/zrepl/daemon/job/snapjob.go:179 +0x198
Oct 01 16:14:56 cstp zrepl[56563]: github.com/zrepl/zrepl/daemon/job.(*SnapJob).Run(0xc00039e000, 0xf6e3b8, 0xc0001d83c0)
Oct 01 16:14:56 cstp zrepl[56563]:         /home/cs/zrepl/zrepl/daemon/job/snapjob.go:127 +0x329
Oct 01 16:14:56 cstp zrepl[56563]: github.com/zrepl/zrepl/daemon.(*jobs).start.func1(0xc0006a4100, 0xf6e3b8, 0xc00022a0f0, 0xf72d18, 0xc00039e000)
Oct 01 16:14:56 cstp zrepl[56563]:         /home/cs/zrepl/zrepl/daemon/daemon.go:255 +0x15b
Oct 01 16:14:56 cstp zrepl[56563]: created by github.com/zrepl/zrepl/daemon.(*jobs).start
Oct 01 16:14:56 cstp zrepl[56563]:         /home/cs/zrepl/zrepl/daemon/daemon.go:251 +0x425
Oct 01 16:14:56 cstp systemd[1]: zrepl.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Oct 01 16:14:56 cstp systemd[1]: zrepl.service: Failed with result 'exit-code'.
2021-10-09 15:52:38 +02:00
Christian Schwarz
2c9fcd7c14 rpc/dataconn: always close send stream returned from Sender.Send()
discovered while debugging #457
2021-10-09 15:43:31 +02:00
Christian Schwarz
4f9b63aa09 rework size estimation & dry sends
- use control connection (gRPC)
- use uint64 everywhere => fixes https://github.com/zrepl/zrepl/issues/463
- [BREAK] bump protocol version

closes https://github.com/zrepl/zrepl/pull/518
fixes https://github.com/zrepl/zrepl/issues/463
2021-10-09 15:43:27 +02:00
Christian Schwarz
a8e92971d0 zfs: rewrite SendStream, fix bug in Close() on FreeBSD, add platformtests
This commit was motivated by https://github.com/zrepl/zrepl/issues/495
where, on FreeBSD with OpenZFS 2.0, a SendStream.Close() call might wait indefinitely for `zfs send` to exit.
The reason is that, due to the refactoring done for redacted send & recv
(30af21b025),
the `dump_bytes` function, which writes to the pipe, executes in a separate thread (synctask taskq) iff not `HAVE_LARGE_STACKS`.
The `zfs send` process/thread waits for that taskq thread using an uninterruptible primitive.
So when we SIGKILL `zfs send`, that signal doesn't reach the right thread to interrupt the pipe write.

Theoretically this affects both Linux and FreeBSD, but most Linux users `HAVE_LARGE_STACKS` and since https://github.com/openzfs/zfs/pull/12350/files OpenZFS on FreeBSD `HAVE_LARGE_STACKS` as well.
However, at least until FreeBSD 13.1, possibly for the entire 13 lifecycle, we're going to have to live with that oddity.

Measures taken in this commit:
- Report the behavior as an upstream bug https://github.com/openzfs/zfs/issues/12500
- Change SendStream code so that it closes zrepl's read-end of the pipe (see comment in code)
- Clean up and make explicit SendStream's state handling
- Write extensive platformtests for SendStream
    - They pass on my Linux install and on FreeBSD 12
    - FreeBSD 13 still needs testing.

fixes https://github.com/zrepl/zrepl/issues/495
2021-09-19 20:11:31 +02:00
Christian Schwarz
b54e477602 platformtest: fix 'active child tasks' panic for ReceiveForceRollbackWorksUnencrypted
Revealed by rework of SendStream in a prior commit.
2021-09-19 20:03:01 +02:00
Christian Schwarz
959fb08a89 platformtest: fix replication tests (SizeEstimationConcurrency field in PlannerPolicy was not set)
fixup of 0ceea1b792 (parallel replication knobs)
2021-09-19 20:03:01 +02:00
Christian Schwarz
6ac012aa3c platformtest: work around missing feature detection for test 'ReplicationPropertyReplicationWorks' 2021-09-19 20:03:01 +02:00
Christian Schwarz
3e93b31f75 platformtest: fix bandwidth-limiting-related panics (missing BucketCapacity in sender/receiver config)
panic while running test: invalid config`Ratelimit` field invalid: BucketCapacity must not be zero
main.runTestCase.func1.1
	/home/cs/zrepl/zrepl/platformtest/harness/harness.go:190
runtime.gopanic
	/home/cs/go1.13/src/runtime/panic.go:679
github.com/zrepl/zrepl/endpoint.NewSender
	/home/cs/zrepl/zrepl/endpoint/endpoint.go:68
github.com/zrepl/zrepl/platformtest/tests.replicationInvocation.Do
	/home/cs/zrepl/zrepl/platformtest/tests/replication.go:87
github.com/zrepl/zrepl/platformtest/tests.ReplicationFailingInitialParentProhibitsChildReplication
	/home/cs/zrepl/zrepl/platformtest/tests/replication.go:925
main.runTestCase.func1
	/home/cs/zrepl/zrepl/platformtest/harness/harness.go:193
main.runTestCase
	/home/cs/zrepl/zrepl/platformtest/harness/harness.go:194
main.HarnessRun
	/home/cs/zrepl/zrepl/platformtest/harness/harness.go:107
main.main
	/home/cs/zrepl/zrepl/platformtest/harness/harness.go:42
runtime.main
	/home/cs/go1.13/src/runtime/proc.go:203
runtime.goexit
	/home/cs/go1.13/src/runtime/asm_amd64.s:1357

fixup for f5f269bfd5 (bandwidth limiting)
2021-09-19 20:03:01 +02:00
Christian Schwarz
08df208149 bandwidth limiting: use correct field name on error
fixup for f5f269bfd5 (bandwidth limiting)
2021-09-19 20:03:01 +02:00
Christian Schwarz
936ed73a45 rpc/dataconn: fix log message in closeErr case
refs #457
2021-09-19 18:40:39 +02:00
Christian Schwarz
8fee536260 transport/tcp: ipmap tests: remove tests that cover CIDR normalization
They broke on Go 1.17.
See
https://www.bleepingcomputer.com/news/security/go-rust-net-library-affected-by-critical-ip-address-validation-vulnerability/
for context.

fixes #514
2021-09-15 08:43:08 +02:00
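For context on 8fee536260: starting with Go 1.17, `net.ParseIP` and `net.ParseCIDR` reject IPv4 addresses whose decimal components have leading zeros. Whether the removed ipmap tests relied on exactly such inputs is an assumption here; the sketch only demonstrates the standard-library behavior change, and the helper name `parses` is hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// parses reports whether net.ParseCIDR accepts the given string.
func parses(cidr string) bool {
	_, _, err := net.ParseCIDR(cidr)
	return err == nil
}

func main() {
	// The second input has a leading zero in one octet; Go 1.17 and later
	// reject it, while earlier releases accepted it.
	for _, s := range []string{"192.168.1.0/24", "192.168.001.0/24"} {
		fmt.Printf("%-18s accepted=%v\n", s, parses(s))
	}
}
```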
Christian Schwarz
01b4792974 fix make deb-docker for all platforms but amd64 2021-09-13 22:54:21 +02:00
Christian Schwarz
ad80bb3735 status: byteprogresshistory: disable averaging as workaround for #497
refs #497
2021-09-12 20:08:44 +02:00
Christian Schwarz
f5f269bfd5 send/recv: job-level bandwidth limiting
Sponsored-by: Prominic.NET, Inc.

fixes #339
2021-09-12 20:08:43 +02:00
Christian Schwarz
5b16769057 docs: update supporters 2021-08-30 11:01:25 +02:00
Christian Schwarz
009bd410af docs: prune: improve grid example 2021-07-08 19:46:24 +02:00
Christian Schwarz
bcfcd7a134 docs / CI: stop creating churn with doc commits & commit as zreplbot@ 2021-07-08 17:07:24 +02:00
Matthias Freund
bf1276f767 status: port status-v1 ETA calculation patch
Must have forgotten to integrate it into the status-v2 branch at the
time.

refs https://github.com/zrepl/zrepl/issues/98#issuecomment-872154091

cc @dcdamien
2021-07-08 15:00:26 +02:00
James W. Brinkerhoff IV
9fa7a18351 docs: quickstart: external_disk: fix typo in example 'derive -> drive'
closes #472
2021-04-19 23:36:03 +02:00
sre
50e8ee4549 docs: apt repo: use sudo in the snippet that sets up the repo
I generally like when snippets are provided in a way which could be used without running as root, and uses sudo when applicable. This change allows for this.

It will, however, print out one extra line, which can be suppressed by adding '>/dev/null' after '/etc/apt/sources.list.d/zrepl.list'.

closes #461
2021-04-17 21:50:36 +02:00
Lapo Luchini
3b5a1a8b9a docs/monitoring: change suggested prometheus port to 9811
Change to 9811 as registered with the prometheus project now.

Closes #444.
2021-03-28 18:18:02 +02:00
Christian Schwarz
f661d9429f pruning/keep_last_n: correctly handle the case where count > matching snaps
fixes #446
2021-03-25 22:42:01 +01:00
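The edge case named in f661d9429f (a count larger than the number of matching snapshots) can be illustrated with a simplified keep-last-n rule. This is a sketch under stated assumptions, not zrepl's pruning code: snapshots are plain strings ordered oldest to newest, and `keepLastN` is a hypothetical helper.

```go
package main

import "fmt"

// keepLastN splits snaps (ordered oldest to newest) into the n newest
// snapshots to keep and the remaining ones to destroy. If n exceeds the
// number of snapshots, everything is kept and nothing is destroyed.
func keepLastN(snaps []string, n int) (keep, destroy []string) {
	if n < 0 {
		n = 0
	}
	if n > len(snaps) {
		n = len(snaps) // clamp so that the slice bounds below stay valid
	}
	cut := len(snaps) - n
	return snaps[cut:], snaps[:cut]
}

func main() {
	snaps := []string{"snap1", "snap2", "snap3"}
	keep, destroy := keepLastN(snaps, 10)
	fmt.Println("keep:", keep, "destroy:", destroy)
}
```

Without the clamp, `len(snaps) - n` would be negative and the slice expression would panic at run time.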
InsanePrawn
ac4b109872 status/interactive: Revert to simple wakeup/reset signalling
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>

closes #452
2021-03-25 22:26:17 +01:00
InsanePrawn
b2c6e51a43 client/signal: Revert "add signal 'snapshot', rename existing signal 'wakeup' to 'replication'"
This was merged to master prematurely as the job components are not decoupled well enough
for these signals to be useful yet.

This reverts commit 2c8c2cfa14.

closes #452
2021-03-25 22:26:17 +01:00