Commit Graph

1020 Commits

Author SHA1 Message Date
Christian Schwarz
ce6701fb33 status: fix over-counted step when status != stepping
This is a fixup of

  commit b00b61e967
  Author: Christian Schwarz <me@cschwarz.com>
  Date:   Sun Nov 21 15:15:23 2021 +0100

      status: user-visible replication step number should start at 1

fixes https://github.com/zrepl/zrepl/issues/589
refs https://github.com/zrepl/zrepl/issues/538
2022-04-24 15:24:39 +02:00
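A minimal sketch of the kind of off-by-one this fixup addresses, assuming the status code keeps a zero-based step index while a filesystem is stepping and a count of completed steps otherwise. The function `displayStep` and its parameters are hypothetical and are not taken from zrepl's source.

```go
package main

import "fmt"

// displayStep maps an internal step counter to the number shown to the user.
// Assumption: while stepping, currentStep is a zero-based index, so 1 is added.
// In any other state it is treated as a count and is shown unchanged.
func displayStep(currentStep int, stepping bool) int {
	if stepping {
		return currentStep + 1
	}
	return currentStep
}

func main() {
	fmt.Println(displayStep(0, true))  // first step while stepping
	fmt.Println(displayStep(3, false)) // all three steps done, not stepping
}
```

Adding 1 unconditionally would report step 4 of 3 in the second call, which is the over-count described in the subject line.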
Christian Schwarz
0121929164 build: use git+https to fix lazy.sh docdep failures
CircleCI fails like so:

    #!/bin/bash -eo pipefail
    ./lazy.sh docdep

    pip3 is /home/circleci/.pyenv/shims/pip3
    Installing doc build dependencies
    Obtaining sphinxcontrib-versioning from git+git://github.com/rwblair/sphinxcontrib-versioning.git@7e3885a389a809e17ea55261316b7b0e98dbf98f#egg=sphinxcontrib-versioning (from -r ./docs/requirements.txt (line 28))
      Cloning git://github.com/rwblair/sphinxcontrib-versioning.git (to revision 7e3885a389a809e17ea55261316b7b0e98dbf98f) to ./src/sphinxcontrib-versioning
      Running command git clone --filter=blob:none --quiet git://github.com/rwblair/sphinxcontrib-versioning.git /home/circleci/project/src/sphinxcontrib-versioning
      fatal: remote error:
        The unauthenticated git protocol on port 9418 is no longer supported.
      Please see https://github.blog/2021-09-01-improving-git-protocol-security-github/ for more information.
      error: subprocess-exited-with-error

      × git clone --filter=blob:none --quiet git://github.com/rwblair/sphinxcontrib-versioning.git /home/circleci/project/src/sphinxcontrib-versioning did not run successfully.
      │ exit code: 128
      ╰─> See above for output.

      note: This error originates from a subprocess, and is likely not a problem with pip.
    error: subprocess-exited-with-error

    × git clone --filter=blob:none --quiet git://github.com/rwblair/sphinxcontrib-versioning.git /home/circleci/project/src/sphinxcontrib-versioning did not run successfully.
    │ exit code: 128
    ╰─> See above for output.

    note: This error originates from a subprocess, and is likely not a problem with pip.

    Exited with code exit status 1

    CircleCI received exit code 1
2022-03-20 20:23:01 +01:00
Christian Schwarz
bc96f8f212 build/circleci: update to Ubuntu 20.04 image for release-* jobs
Background: `machine: true` is deprecated:

    https://circleci.com/docs/2.0/images/linux-vm/14.04-to-20.04-migration/
2022-02-15 22:55:25 +01:00
Christian Schwarz
459508c9d9 docs: sendrecvoptions: placeholders: fix wrong link name and add summarizing config snippet for recv.placeholders
fixes https://github.com/zrepl/zrepl/issues/573
2022-02-05 10:59:33 +01:00
Lapo Luchini
4a27cc63a8 prometheus: convert zrepl_version_daemon to zrepl_start_time metric
closes https://github.com/zrepl/zrepl/pull/556
fixes #553
2022-01-20 19:33:18 +01:00
Christian Schwarz
0a6840273a build: add tag-release Make target 2022-01-20 19:25:22 +01:00
madbrain76
76ef84f83b docs: fix typo in backup_to_external_disk.rst
closes https://github.com/zrepl/zrepl/pull/568
2022-01-20 19:25:03 +01:00
Christian Schwarz
66946df756 docs: continous_server_backup: simplify by removing need for recv.placeholder 2022-01-09 12:51:00 +01:00
Andrew Gunnerson
556fac3002 docs: document fan-out replication & add quick-start guide
closes https://github.com/zrepl/zrepl/pull/552
fixes https://github.com/zrepl/zrepl/issues/551

Signed-off-by: Andrew Gunnerson <chillermillerlong@hotmail.com>
Co-authored-by: Christian Schwarz <me@cschwarz.com>
2022-01-09 12:45:09 +01:00
Christian Schwarz
1ad7df2df3 docs: badges & links to Matrix chat room
fixes https://github.com/zrepl/zrepl/issues/488
2022-01-09 12:05:19 +01:00
Christian Schwarz
a3d010c5f0 util/optionaldeadline: disable scheduler latency-sensitive tests in CircleCI 2021-12-30 14:41:06 +01:00
Christian Schwarz
12503dc55a rpc/dataconn/timeoutconn: disable TestPartialWriteMockConn in CircleCI 2021-12-18 18:02:34 +01:00
Christian Schwarz
7d10a71cc0 0.5 changelog + front page update 2021-12-18 17:36:54 +01:00
Christian Schwarz
3d3d1b5679 quickstart: sample config uses placeholders, so provide sample value for recv.placeholder.encryption 2021-12-18 17:18:58 +01:00
Christian Schwarz
5240ab4949 docs: quickstart: make users aware that the example rules apply to all snaps, not just zrepl's
fixes https://github.com/zrepl/zrepl/issues/540
2021-12-18 16:30:15 +01:00
Christian Schwarz
19aebd399f docs: add a note that FreeBSD jail zfs userland needs to be kept in sync with kernel module
fixes https://github.com/zrepl/zrepl/issues/500
2021-12-18 16:06:26 +01:00
Christian Schwarz
04e03f4d06 platformtest: retry zpool export if 'pool is busy'
On Ubuntu, something seems to be holding on to the pool for too long.
2021-12-18 15:58:24 +01:00
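The retry described in 04e03f4d06 can be sketched as follows. This is an illustration only: `retryWhileBusy`, `errBusy`, the attempt count, and the delay are assumptions, and the stand-in function replaces the real `zpool export` invocation.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// errBusy imitates the error text of a failed export (hypothetical wording
// apart from the 'pool is busy' part quoted in the commit subject).
var errBusy = errors.New("cannot export 'testpool': pool is busy")

// retryWhileBusy calls op up to attempts times. It retries only if the error
// text contains "pool is busy"; any other result is returned immediately.
func retryWhileBusy(attempts int, delay time.Duration, op func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		err = op()
		if err == nil || !strings.Contains(err.Error(), "pool is busy") {
			return err
		}
		time.Sleep(delay)
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// Stand-in for `zpool export`: busy twice, then succeeds.
	fakeExport := func() error {
		calls++
		if calls < 3 {
			return errBusy
		}
		return nil
	}
	err := retryWhileBusy(5, 10*time.Millisecond, fakeExport)
	fmt.Println(calls, err)
}
```

Matching on the error text keeps unrelated failures from being retried and hidden.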
Christian Schwarz
2e2a8a1d5d docs: add docs on how to run platform tests
fixes https://github.com/zrepl/zrepl/issues/478
2021-12-18 15:55:22 +01:00
Christian Schwarz
a2b2e0fe34 daemon/control: make http server {Read,Write}Timeout envconst-configurable
refs https://github.com/zrepl/zrepl/issues/379
2021-12-18 15:14:33 +01:00
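A rough sketch of what env-configurable HTTP server timeouts can look like in Go. It does not use zrepl's envconst package; the helper `durationFromEnv`, the variable names `EXAMPLE_CONTROL_READ_TIMEOUT` and `EXAMPLE_CONTROL_WRITE_TIMEOUT`, and the one-second defaults are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

// durationFromEnv returns the duration stored in the environment variable
// name, or def if the variable is unset or cannot be parsed.
func durationFromEnv(name string, def time.Duration) time.Duration {
	v, ok := os.LookupEnv(name)
	if !ok {
		return def
	}
	d, err := time.ParseDuration(v)
	if err != nil {
		return def
	}
	return d
}

func main() {
	// Hypothetical variable names; the real names are not given in the commit.
	srv := &http.Server{
		ReadTimeout:  durationFromEnv("EXAMPLE_CONTROL_READ_TIMEOUT", 1*time.Second),
		WriteTimeout: durationFromEnv("EXAMPLE_CONTROL_WRITE_TIMEOUT", 1*time.Second),
	}
	fmt.Println(srv.ReadTimeout, srv.WriteTimeout)
}
```

Running it with, for example, `EXAMPLE_CONTROL_READ_TIMEOUT=30s` set in the environment changes only the read timeout.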
Christian Schwarz
af2905d245 docs: apt repo: deploy gpg to /usr/share/keyrings and use 'signed-by' in repo definition
gpg --dearmor because of note in https://wiki.debian.org/DebianRepository/UseThirdParty

fixes https://github.com/zrepl/zrepl/issues/529
2021-12-18 15:14:33 +01:00
Christian Schwarz
c3f0041efd zrepl test placeholder: fix panic if dataset does not exist
fixes https://github.com/zrepl/zrepl/issues/406
2021-12-18 15:14:33 +01:00
Christian Schwarz
083f6001eb build: freebsd armv7 and arm64 binaries
fixes https://github.com/zrepl/zrepl/issues/539
2021-12-18 15:14:33 +01:00
Christian Schwarz
2d57ec6ee0 docs: changelog: mention upstream ashift 9 => 12 send/recv bug 2021-12-18 15:14:33 +01:00
Christian Schwarz
fb6a9be954 fix encrypt-on-receive with placeholders
fixes https://github.com/zrepl/zrepl/issues/504

Problem:
  plain send + recv with root_fs encrypted + placeholders causes plain recvs
  whereas user would expect encrypt-on-recv
Reason:
  We create placeholder filesystems with -o encryption=off.
  Thus, children received below those placeholders won't inherit
  encryption of root_fs.
Fix:
  We'll have three values for `recv.placeholders.encryption: unspecified (default) | off | inherit`.
  When we create a placeholder, we will fail the operation if  `recv.placeholders.encryption = unspecified`.
  The exception is if the placeholder filesystem is to encode the client identity ($root_fs/$client_identity) in a pull job.
  Those are created in `inherit` mode if the config field is `unspecified` so that users who don't need
  placeholders are not bothered by these details.

Future Work:
  Automatically warn existing users of encrypt-on-recv about the problem
  if they are affected.
  The problem that I hit during implementation of this is that the
  `encryption` prop's `source` doesn't quite behave like other props:
  `source` is `default` for `encryption=off` and `-` when `encryption=on`.
  Hence, we can't use `source` to distinguish the following 2x2 cases:
  (1) placeholder created with explicit -o encryption=off
  (2) placeholder created without specifying -o encryption
  with
  (A) an encrypted parent at creation time
  (B) an unencrypted parent at creation time
2021-12-18 15:12:47 +01:00
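The decision rule from the "Fix" paragraph of fb6a9be954 can be sketched like this. The type, constant, and function names are hypothetical; only the three values and the client-identity exception come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// PlaceholderEncryption models recv.placeholders.encryption.
type PlaceholderEncryption int

const (
	Unspecified PlaceholderEncryption = iota // default
	Off
	Inherit
)

var errUnspecified = errors.New("recv.placeholders.encryption is unspecified; set it to off or inherit")

// resolve returns the mode to use when creating a placeholder.
// isClientIdentity is true for the $root_fs/$client_identity filesystem of a
// pull job, which falls back to Inherit instead of failing.
func resolve(cfg PlaceholderEncryption, isClientIdentity bool) (PlaceholderEncryption, error) {
	if cfg != Unspecified {
		return cfg, nil
	}
	if isClientIdentity {
		return Inherit, nil
	}
	return Unspecified, errUnspecified
}

func main() {
	mode, err := resolve(Unspecified, true)
	fmt.Println(mode == Inherit, err)
	_, err = resolve(Unspecified, false)
	fmt.Println(err)
}
```

Failing on `unspecified` forces users who rely on placeholders to choose explicitly, while the exception keeps setups without placeholders unaffected.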
Christian Schwarz
c1e2c9826f trace: hint debug env var in error when crashing due to active child tasks
refs https://github.com/zrepl/zrepl/issues/542
2021-12-05 18:57:43 +01:00
Christian Schwarz
b00b61e967 status: user-visible replication step number should start at 1
fixes https://github.com/zrepl/zrepl/issues/538
2021-11-21 15:32:18 +01:00
Christian Schwarz
ac147b5a6f replication: report whether a filesystem is active vs. blocked on something
- `BlockedOn` prop in JSON report
- Bring back the `*` in front of the filesystem report as an activity indicator.

fixes https://github.com/zrepl/zrepl/issues/505
2021-11-14 17:34:32 +01:00
Samy Mahmoudi
1850a332ed docs: prune: improve docs for 'grid' rule
- Substitute full words for both string name 'gridspec' and short form 'grid spec'
- Fix alignment and make spacing more consistent
- Fix how snapshots fall into buckets in the example so that it really reflects right-exclusiveness

closes https://github.com/zrepl/zrepl/pull/535
2021-11-14 17:34:32 +01:00
Christian Schwarz
20ff9717bc fix mis-spelled send option for embedded data
fixes https://github.com/zrepl/zrepl/issues/522
2021-11-14 17:34:32 +01:00
Christian Schwarz
c2fbf93365 daemon: provide os.Environ() in zrepl status
Useful for debugging.

fixes https://github.com/zrepl/zrepl/issues/534
2021-11-14 17:34:32 +01:00
Christian Schwarz
cf5e8e8f26 docs: add runbook on how to migrate sending side to new zpool
fixes https://github.com/zrepl/zrepl/issues/525
2021-11-14 17:34:32 +01:00
Christian Schwarz
c600cc1f60 skip timing-sensitive tests on CircleCI
We had too many spurious test failures in the past.
But on a developer machine, the tests don't usually fail because the
system isn't loaded as much.
So, only disable test on CircleCI.
2021-11-14 17:34:32 +01:00
Lapo Luchini
c6a9ebc71c job/active: add "last completed" metric for error reporting
use case:

    So that I can use a more resilient alerting such as "last complete was sent more than 24h ago".

fixes https://github.com/zrepl/zrepl/issues/516
closes https://github.com/zrepl/zrepl/pull/530
2021-11-10 17:35:12 +01:00
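The alerting idea from the use case in c6a9ebc71c, expressed as a small Go check. This is a sketch under assumptions: it treats the "last completed" value as a Unix timestamp in seconds, and `isStale` is a hypothetical helper. In practice the comparison would more likely live in a monitoring rule than in Go code.

```go
package main

import "fmt"

// isStale reports whether the last completed run is older than maxAgeSeconds.
// All arguments are in seconds; the first two are Unix timestamps.
func isStale(lastCompletedUnix, nowUnix, maxAgeSeconds float64) bool {
	return nowUnix-lastCompletedUnix > maxAgeSeconds
}

func main() {
	const day = 24 * 60 * 60
	now := 1_636_560_000.0 // arbitrary example timestamp
	fmt.Println(isStale(now-2*day, now, day)) // completed two days ago
	fmt.Println(isStale(now-3600, now, day))  // completed one hour ago
}
```

Alerting on the age of the last success also fires when the job stops running entirely, which an error counter alone would miss.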
Christian Schwarz
1f0f2f8569 pruner + docs: less confusing type names, some comments, better docs for keep: not_replicated
fixes https://github.com/zrepl/zrepl/issues/524
2021-10-10 21:11:38 +02:00
Christian Schwarz
5104ad3d0b build: use go 1.17 for testing & release builds 2021-10-09 16:51:08 +02:00
Christian Schwarz
a6dbda1ea8 go1.17: run goimports to support the new //go:build lines 2021-10-09 16:51:08 +02:00
Christian Schwarz
1edb8014bc build: circleci: stop storing artifacts
When we need artifacts, we use MinIO anyway.
And we have accumulated about 1TiB of (free) CircleCI artifact storage.
Don't need to waste space unnecessarily.
2021-10-09 16:13:58 +02:00
Christian Schwarz
845195b7ed bandwidth limiting: fix crash with SnapJob
zrepl daemon panics when the snap job triggers

fixup for f5f269bfd5 (bandwidth limiting)
fixes #521

Oct 01 16:14:56 cstp zrepl[56563]: panic: invalid config`BandwidthLimit` field invalid: BucketCapacity must not be zero
Oct 01 16:14:56 cstp zrepl[56563]:         panic: end span: span still has active child spans
Oct 01 16:14:56 cstp zrepl[56563]: goroutine 38 [running]:
Oct 01 16:14:56 cstp zrepl[56563]: github.com/zrepl/zrepl/daemon/logging/trace.WithSpan.func2()
Oct 01 16:14:56 cstp zrepl[56563]:         /home/cs/zrepl/zrepl/daemon/logging/trace/trace.go:341 +0x2ea
Oct 01 16:14:56 cstp zrepl[56563]: github.com/zrepl/zrepl/daemon/logging/trace.WithTaskAndSpan.func1()
Oct 01 16:14:56 cstp zrepl[56563]:         /home/cs/zrepl/zrepl/daemon/logging/trace/trace_convenience.go:40 +0x2e
Oct 01 16:14:56 cstp zrepl[56563]: panic(0xcee9c0, 0xc000676730)
Oct 01 16:14:56 cstp zrepl[56563]:         /home/cs/go1.16.6/src/runtime/panic.go:965 +0x1b9
Oct 01 16:14:56 cstp zrepl[56563]: github.com/zrepl/zrepl/endpoint.NewSender(0xf5bbc0, 0xc0003840c0, 0xc0000b2c90, 0x4, 0xc0002c5958, 0x0, 0x0, 0x0, 0xc000068cf8)
Oct 01 16:14:56 cstp zrepl[56563]:         /home/cs/zrepl/zrepl/endpoint/endpoint.go:68 +0x1ec
Oct 01 16:14:56 cstp zrepl[56563]: github.com/zrepl/zrepl/daemon/job.(*SnapJob).doPrune(0xc00039e000, 0xf6e3b8, 0xc0006541b0)
Oct 01 16:14:56 cstp zrepl[56563]:         /home/cs/zrepl/zrepl/daemon/job/snapjob.go:179 +0x198
Oct 01 16:14:56 cstp zrepl[56563]: github.com/zrepl/zrepl/daemon/job.(*SnapJob).Run(0xc00039e000, 0xf6e3b8, 0xc0001d83c0)
Oct 01 16:14:56 cstp zrepl[56563]:         /home/cs/zrepl/zrepl/daemon/job/snapjob.go:127 +0x329
Oct 01 16:14:56 cstp zrepl[56563]: github.com/zrepl/zrepl/daemon.(*jobs).start.func1(0xc0006a4100, 0xf6e3b8, 0xc00022a0f0, 0xf72d18, 0xc00039e000)
Oct 01 16:14:56 cstp zrepl[56563]:         /home/cs/zrepl/zrepl/daemon/daemon.go:255 +0x15b
Oct 01 16:14:56 cstp zrepl[56563]: created by github.com/zrepl/zrepl/daemon.(*jobs).start
Oct 01 16:14:56 cstp zrepl[56563]:         /home/cs/zrepl/zrepl/daemon/daemon.go:251 +0x425
Oct 01 16:14:56 cstp systemd[1]: zrepl.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Oct 01 16:14:56 cstp systemd[1]: zrepl.service: Failed with result 'exit-code'.
2021-10-09 15:52:38 +02:00
Christian Schwarz
2c9fcd7c14 rpc/dataconn: always close send stream returned from Sender.Send()
discovered while debugging #457
2021-10-09 15:43:31 +02:00
Christian Schwarz
4f9b63aa09 rework size estimation & dry sends
- use control connection (gRPC)
- use uint64 everywhere => fixes https://github.com/zrepl/zrepl/issues/463
- [BREAK] bump protocol version

closes https://github.com/zrepl/zrepl/pull/518
fixes https://github.com/zrepl/zrepl/issues/463
2021-10-09 15:43:27 +02:00
Christian Schwarz
a8e92971d0 zfs: rewrite SendStream, fix bug in Close() on FreeBSD, add platformtests
This commit was motivated by https://github.com/zrepl/zrepl/issues/495
where, on FreeBSD with OpenZFS 2.0, a SendStream.Close() call might wait indefinitely for `zfs send` to exit.
The reason is that, due to the refactoring done for redacted send & recv
(30af21b025),
the `dump_bytes` function, which writes to the pipe, executes in a separate thread (synctask taskq) iff not `HAVE_LARGE_STACKS`.
The `zfs send` process/thread waits for that taskq thread using an uninterruptible primitive.
So when we SIGKILL `zfs send`, that signal doesn't reach the right thread to interrupt the pipe write.

Theoretically this affects both Linux and FreeBSD, but most Linux users `HAVE_LARGE_STACKS` and since https://github.com/penzfs/zfs/pull/12350/files OpenZFS on FreeBSD `HAVE_LARGE_STACKS` as well.
However, at least until FreeBSD 13.1, possibly for the entire 13 lifecycle, we're going to have to live with that oddity.

Measures taken in this commit:
- Report the behavior as an upstream bug https://github.com/openzfs/zfs/issues/12500
- Change SendStream code so that it closes zrepl's read-end of the pipe (see comment in code)
- Clean up and make explicit SendStream's state handling
- Write extensive platformtests for SendStream
    - They pass on my Linux install and on FreeBSD 12
    - FreeBSD 13 still needs testing.

fixes https://github.com/zrepl/zrepl/issues/495
2021-09-19 20:11:31 +02:00
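Why closing the read end of the pipe helps, shown with a plain `os.Pipe` on a Unix-like system. This sketch is not zrepl's SendStream code: `writerSeesBrokenPipe` is a hypothetical helper, and the write is done by the same process instead of a `zfs send` child.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// writerSeesBrokenPipe creates a pipe, closes the read end, and reports
// whether a subsequent write to the write end fails with EPIPE.
func writerSeesBrokenPipe() bool {
	r, w, err := os.Pipe()
	if err != nil {
		return false
	}
	defer w.Close()
	// Closing the reader means no process can drain the pipe any more.
	if err := r.Close(); err != nil {
		return false
	}
	_, err = w.Write([]byte("stream data"))
	return errors.Is(err, syscall.EPIPE)
}

func main() {
	fmt.Println("write failed with EPIPE:", writerSeesBrokenPipe())
}
```

A writer blocked on a full pipe is woken up with the same error once the last reader goes away, so this approach does not depend on a signal reaching the right thread.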
Christian Schwarz
b54e477602 platformtest: fix 'active child tasks' panic for ReceiveForceRollbackWorksUnencrypted
Revealed by rework of SendStream in a prior commit.
2021-09-19 20:03:01 +02:00
Christian Schwarz
959fb08a89 platformtest: fix replication tests (SizeEstimationConcurrency field in PlannerPolicy was not set)
fixup of 0ceea1b792 (parallel replication knobs)
2021-09-19 20:03:01 +02:00
Christian Schwarz
6ac012aa3c platformtest: work around missing feature detection for test 'ReplicationPropertyReplicationWorks' 2021-09-19 20:03:01 +02:00
Christian Schwarz
3e93b31f75 platformtest: fix bandwidth-limiting-related panics (missing BucketCapacity in sender/receiver config)
panic while running test: invalid config`Ratelimit` field invalid: BucketCapacity must not be zero
main.runTestCase.func1.1
	/home/cs/zrepl/zrepl/platformtest/harness/harness.go:190
runtime.gopanic
	/home/cs/go1.13/src/runtime/panic.go:679
github.com/zrepl/zrepl/endpoint.NewSender
	/home/cs/zrepl/zrepl/endpoint/endpoint.go:68
github.com/zrepl/zrepl/platformtest/tests.replicationInvocation.Do
	/home/cs/zrepl/zrepl/platformtest/tests/replication.go:87
github.com/zrepl/zrepl/platformtest/tests.ReplicationFailingInitialParentProhibitsChildReplication
	/home/cs/zrepl/zrepl/platformtest/tests/replication.go:925
main.runTestCase.func1
	/home/cs/zrepl/zrepl/platformtest/harness/harness.go:193
main.runTestCase
	/home/cs/zrepl/zrepl/platformtest/harness/harness.go:194
main.HarnessRun
	/home/cs/zrepl/zrepl/platformtest/harness/harness.go:107
main.main
	/home/cs/zrepl/zrepl/platformtest/harness/harness.go:42
runtime.main
	/home/cs/go1.13/src/runtime/proc.go:203
runtime.goexit
	/home/cs/go1.13/src/runtime/asm_amd64.s:1357

fixup for f5f269bfd5 (bandwidth limiting)
2021-09-19 20:03:01 +02:00
Christian Schwarz
08df208149 bandwidth limiting: use correct field name on error
fixup for f5f269bfd5 (bandwidth limiting)
2021-09-19 20:03:01 +02:00
Christian Schwarz
936ed73a45 rpc/dataconn: fix log message in closeErr case
refs #457
2021-09-19 18:40:39 +02:00
Christian Schwarz
8fee536260 transport/tcp: ipmap tests: remove tests that cover CIDR normalization
They broke on Go 1.17.
See
https://www.bleepingcomputer.com/news/security/go-rust-net-library-affected-by-critical-ip-address-validation-vulnerability/
for context.

fixes #514
2021-09-15 08:43:08 +02:00
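For context on the Go 1.17 change behind 8fee536260: since that release, `net.ParseIP` and `net.ParseCIDR` reject IPv4 components with leading zeros. The sketch below only demonstrates that behavior. Whether the removed tests relied on exactly this input form is an assumption, and `parses` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net"
)

// parses reports whether the standard library accepts s as an IP address.
func parses(s string) bool {
	return net.ParseIP(s) != nil
}

func main() {
	for _, s := range []string{"192.168.1.1", "192.168.010.1"} {
		fmt.Printf("%q accepted: %v\n", s, parses(s))
	}
	// ParseCIDR applies the same rule to the address part.
	_, _, err := net.ParseCIDR("192.168.010.0/24")
	fmt.Println("CIDR with leading zero rejected:", err != nil)
}
```

On Go 1.17 and later the second address and the CIDR are rejected; older releases accepted them and read the component as decimal.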
Christian Schwarz
01b4792974 fix make deb-docker for all platforms but amd64 2021-09-13 22:54:21 +02:00
Christian Schwarz
ad80bb3735 status: byteprogresshistory: disable averaging as workaround for #497
refs #497
2021-09-12 20:08:44 +02:00