Commit Graph

1052 Commits

Author SHA1 Message Date
Yannick Dylla
1da8f848f2 snapper: support custom timestamp format
fixes https://github.com/zrepl/zrepl/issues/465
closes https://github.com/zrepl/zrepl/pull/639
2022-10-27 00:19:06 +02:00
Christian Schwarz
6ed4626df9 grafana dashboard: remove zrepl version number from title
fixes https://github.com/zrepl/zrepl/issues/624
2022-10-27 00:19:06 +02:00
Christian Schwarz
c07f9ec62e build: use go 1.19 for testing & release builds
New docker image since the old one was deprecated, according
to https://discuss.circleci.com/t/go-lang-docker-image-circleci-golang-1-19-is-missing/44961
2022-10-27 00:19:06 +02:00
Christian Schwarz
fd5b0e6831 build: update golangci-lint
The previous commits were done in response to updating to
the version that we now pin in this commit.
We do the update after the fixes so that each commit builds.
2022-10-27 00:19:06 +02:00
Christian Schwarz
a4cea1b4f3 go1.19: zfs.SendStream.Close() after EOF would return context cancellation error
Before upgrading to Go 1.19, these platform tests would sproadically
fail due to the reason outlined in the comment

  github.com/zrepl/zrepl/platformtest/tests.SendStreamMultipleCloseAfterEOF
  github.com/zrepl/zrepl/platformtest/tests.SendStreamCloseAfterEOFRead
2022-10-27 00:19:06 +02:00
Christian Schwarz
c0b52b92d5 systemd: set GOTRACEBACK=crash so that we have core dumps
They are useful, not least to debug issues with debugging
SIGSYS caused by overly restrictive settings in the unit file.
(See previous commit for an example.)
2022-10-26 22:39:18 +02:00
Christian Schwarz
12018b3685 go1.19: adjust systemd unit to allow setrlimit
Go 1.19 uses it during startup.

From the Go changelog:

> On Unix operating systems, Go programs that import package os now
> automatically increase the open file limit (RLIMIT_NOFILE) to the
> maximum allowed value; that is, they change the soft limit to match the
> hard limit. This corrects artificially low limits set on some systems
> for compatibility with very old C programs using the select system call.
> Go programs are not helped by that limit, and instead even simple
> programs like gofmt often ran out of file descriptors on such systems
> when processing many files in parallel. One impact of this change is
> that Go programs that in turn execute very old C programs in child
> processes may run those programs with too high a limit. This can be
> corrected by setting the hard limit before invoking the Go program.
2022-10-26 22:39:18 +02:00
Christian Schwarz
a91fb873e4 fix incorrect use of sort.StringSlice
A newer version of staticheck found these:

> SA4029: sort.StringSlice is a type, not a function, and
> sort.StringSlice(variants) doesn't sort your values; consider using
> sort.Strings instead (staticcheck)
2022-10-24 22:22:41 +02:00
Christian Schwarz
a6aa610165 run go1.19 gofmt and make adjustments as needed
(Go 1.19 expanded doc comment syntax)
2022-10-24 22:22:41 +02:00
Christian Schwarz
6c87bdb9fb go1.19: switch to new nolint directive that is compatible with Go 1.19 gofmt 2022-10-24 22:22:11 +02:00
Christian Schwarz
b9250a41a2 go1.18: address net.Error.Temporary() deprecation
Go 1.18 deprecated net.Error.Temporary().
This commit cleans up places where we use it incorrectly.
Also, the rpc layer defines some errors that implement

  interface { Temporary() bool }

I added comments to all of the implementations to indicate
whether they will be required if net.Error.Temporary is ever
ever removed in the future.

For HandshakeError, the Temporary() return value is actually
important. I moved & rewrote a (previously misplaced) comment
there.

The ReadStreamError changes were
1. necessary to pacify newer staticcheck and
2. technically, an error can implement Temporary()
   without being net.Err. This applies to some syscall
   errors in the standard library.

Reading list for those interested:
- https://github.com/golang/go/issues/45729
- https://groups.google.com/g/golang-nuts/c/-JcZzOkyqYI
- https://man7.org/linux/man-pages/man2/accept.2.html

Note: This change was prompted by staticheck:

> SA1019: neterr.Temporary has been deprecated since Go 1.18 because it
> shouldn't be used: Temporary errors are not well-defined. Most
> "temporary" errors are timeouts, and the few exceptions are surprising.
> Do not use this method. (staticcheck)
2022-10-24 22:21:52 +02:00
Christian Schwarz
a967986a18 fixup: fix hooks unit tests
The previous commit c743c7b03f
broke the hooks unit tests.

GitHub was not configured to require passing tests for master merge.
Didn't notice it locally due to Go's test caching.
I amended this before pushing this change.
2022-10-09 15:36:00 +02:00
Christian Schwarz
c743c7b03f refactor snapper & support cron-based snapshotting
fixes https://github.com/zrepl/zrepl/issues/554
refs https://github.com/zrepl/zrepl/discussions/547#discussioncomment-1936126
2022-09-25 19:23:44 +02:00
Christian Schwarz
a9c61b4b0b zrepl status UI: include w shortcut to wrap lines in help bar 2022-09-25 19:23:44 +02:00
Christian Schwarz
206d359dcd docs: sendrecvoptions: fix heading level for section on placeholders 2022-09-25 18:23:54 +02:00
Christian Schwarz
2d8c3692ec rework resume token validation to allow resuming from raw sends of unencrypted datasets
Before this change, resuming from an unencrypted dataset with
send.raw=true specified wouldn't work with zrepl due to overly
restrictive resume token checking.

An initial PR to fix this was made in https://github.com/zrepl/zrepl/pull/503
but it didn't address the core of the problem.
The core of the problem was that zrepl assumed that if a resume token
contained `rawok=true, compressok=true`, the resulting send would be
encrypted. But if the sender dataset was unencrypted, such a resume would
actually result in an unencrypted send.
Which could be totally legitimate but zrepl failed to recognize that.

BACKGROUND
==========

The following snippets of OpenZFS code are insightful regarding how the
various ${X}ok values in the resume token are handled:

- 6c3c5fcfbe/module/zfs/dmu_send.c (L1947-L2012)
- 6c3c5fcfbe/module/zfs/dmu_recv.c (L877-L891)
- https://github.com/openzfs/zfs/blob/6c3c5fc/lib/libzfs/libzfs_sendrecv.c#L1663-L1672

Basically, some zfs send flags make the DMU send code set some DMU send
stream featureflags, although it's not a pure mapping, i.e, which DMU
send stream flags are used depends somewhat on the dataset (e.g., is it
encrypted or not, or, does it use zstd or not).

Then, the receiver looks at some (but not all) feature flags and maps
them to ${X}ok dataset zap attributes.

These are funnelled back to the sender 1:1 through the resume_token.

And the sender turns them into lzc flags.

As an example, let's look at zfs send --raw.
if the sender requests a raw send on an unencrypted dataset, the send
stream (and hence the resume token) will not have the raw stream
featureflag set, and hence the resume token will not have the rawok
field set. Instead, it will have compressok, embedok, and depending
on whether large blocks are present in the dataset, largeblockok set.

WHAT'S ZREPL'S ROLE IN THIS?
============================

zrepl provides a virtual encrypted sendflag that is like `raw`,
but further ensures that we only send encrypted datasets.

For any other resume token stuff, it shoudn't do any checking,
because it's a futile effort to keep up with ZFS send/recv features
that are orthogonal to encryption.

CHANGES MADE IN THIS COMMIT
===========================

- Rip out a bunch of needless checking that zrepl would do during
  planning. These checks were there to give better error messages,
  but actually, the error messages created by the endpoint.Sender.Send
  RPC upon send args validation failure are good enough.
- Add platformtests to validate all combinations of
  (Unencrypted/Encrypted FS) x (send.encrypted = true | false) x (send.raw = true | false)
  for cases both non-resuming and resuming send.

Additional manual testing done:
1. With zrepl 0.5, setup with unencrypted dataset, send.raw=true specified, no send.encrypted specified.
2. Observe that regular non-resuming send works, but resuming doesn't work.
3. Upgrade zrepl to this change.
4. Observe that both regular and resuming send works.

closes https://github.com/zrepl/zrepl/pull/613
2022-09-25 17:32:02 +02:00
Christian Schwarz
7769263c2e platformtest: add QueueSubtest functionality
Use it from a top-level test case to queue the
execution of sub-tests after this test case is complete.

Note that the testing harness executes the subtest
_after_ the current top-level test. Hence, the subtest
cannot use any ZFS state of the top-level test.
2022-09-25 17:10:53 +02:00
Christian Schwarz
89f7c76c4e lint: allow empty else branches 2022-09-25 17:10:53 +02:00
jtagcat
c7771f98f5 docs: improve overview
There were and still is too many words. It's a very white paper vibe.
Docs needs to be more brief, exact, and on-point.

closes https://github.com/zrepl/zrepl/pull/618
2022-07-31 15:50:53 +02:00
jtagcat
299f1c906e docs: overview: clarify configs _are_ ordered
Previously with unordered list, and 'are considered'
left if unsure whether one or all files are 'considered'.
In reality, the first valid is used, so an ordered list and
perhaps better wording communicates this fact.

refs https://github.com/zrepl/zrepl/pull/618
2022-07-31 15:33:23 +02:00
Kiss Károly
d3f68ae4e8 replication: ignore bookmarks when computing incremental path
fixes https://github.com/zrepl/zrepl/issues/490
closes https://github.com/zrepl/zrepl/pull/619

Co-authored-by: Christian Schwarz <me@cschwarz.com>
2022-07-31 15:25:19 +02:00
Christian Schwarz
193abbe6b1 fix active child tasks panic with endpoint.ListAbstractionsStreamed
The goroutine that does endTask() for
"list-abstractions-streamed-producer" can be preempted
after it has closed the out and outErrs channel,
but before it calls endTask().
If the parent ("handler") then gets scheduled and
and ends itself, it will observe an active child task
"list-abstractions-streamed-producer".

This is easy to demo by injecting a sleep here:

  --- a/endpoint/endpoint_zfs_abstraction.go
  +++ b/endpoint/endpoint_zfs_abstraction.go
  @@ -575,6 +576,7 @@ func ListAbstractionsStreamed(ctx context.Context, query ListZFSHoldsAndBookmark
          ctx, endTask := trace.WithTask(ctx, "list-abstractions-streamed-producer")
          go func() {
                  defer endTask()
  +               defer time.Sleep(10 * time.Second)
                  defer close(out)
                  defer close(outErrs)

fixes https://github.com/zrepl/zrepl/issues/607
2022-07-17 21:44:03 +02:00
Goran Mekić
02b215128e build: consistently use $(MAKE) when invoking it recursively
Not for the `docker run ... make ...` commands though!

closes https://github.com/zrepl/zrepl/pull/615
2022-07-12 00:18:38 +02:00
Christian Schwarz
dc03db7423 rpc/grpcclientidentity/authlistener_grpc_adaptor: don't assume peer.Addr is set
On Illumos, getpeername doesn't work from Go on socketpair sockets.
That's why .RemoteAddr() returns nil on such a socket.
And that `nil` ultimately lands in the `p.Addr`.
So, `p.Addr.String()` would deref `nil`, leading to

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0xaea33e]

    goroutine 614 [running]:
    github.com/zrepl/zrepl/rpc/grpcclientidentity.NewInterceptors.func1({0xf1e158, 0xc000631200}, {0xd514c0, 0xc000631230}, 0xc000032740, 0xc000524348)
    	/dpool/export/home/mills/Downloads/code/oi-userland-gh/components/sysutils/zrepl/build/amd64/rpc/grpcclientidentity/authlistener_grpc_adaptor.go:121 +0x13e
    github.com/zrepl/zrepl/replication/logic/pdu._Replication_ListFilesystems_Handler({0xdb30c0, 0xc00001a630}, {0xf1e158, 0xc000631200}, 0xc00052b7a0, 0xc000522000)
    	/dpool/export/home/mills/Downloads/code/oi-userland-gh/components/sysutils/zrepl/build/amd64/replication/logic/pdu/pdu_grpc.pb.go:186 +0x16a
    google.golang.org/grpc.(*Server).processUnaryRPC(0xc00016e700, {0xf2bc00, 0xc0000f2780}, 0xc00011c200, 0xc000522150, 0x1497c78, 0x0)
    	/dpool/export/home/mills/Downloads/code/oi-userland-gh/components/sysutils/zrepl/zrepl-0.5.0/gopath/pkg/mod/google.golang.org/grpc@v1.35.0/server.go:1217 +0xe28
    google.golang.org/grpc.(*Server).handleStream(0xc00016e700, {0xf2bc00, 0xc0000f2780}, 0xc00011c200, 0x0)
    	/dpool/export/home/mills/Downloads/code/oi-userland-gh/components/sysutils/zrepl/zrepl-0.5.0/gopath/pkg/mod/google.golang.org/grpc@v1.35.0/server.go:1540 +0xcb3
    google.golang.org/grpc.(*Server).serveStreams.func1.2(0xc000373b70, 0xc00016e700, {0xf2bc00, 0xc0000f2780}, 0xc00011c200)
    	/dpool/export/home/mills/Downloads/code/oi-userland-gh/components/sysutils/zrepl/zrepl-0.5.0/gopath/pkg/mod/google.golang.org/grpc@v1.35.0/server.go:878 +0xad
    created by google.golang.org/grpc.(*Server).serveStreams.func1
    	/dpool/export/home/mills/Downloads/code/oi-userland-gh/components/sysutils/zrepl/zrepl-0.5.0/gopath/pkg/mod/google.golang.org/grpc@v1.35.0/server.go:876 +0x1ec

fixes https://github.com/zrepl/zrepl/issues/598
2022-07-10 23:59:40 +02:00
Cole Helbling
1df0f8912a Add --skip-cert-check flag to zrepl configcheck to prevent checking cert files
It may be desirable to check that a config is valid without checking for
the existence of certificate files (e.g. when validating a config inside
a sandbox without access to the cert files).

This will be very useful for NixOS so that we can check the config file
at nix-build time (e.g. potentially without proper permissions to read cert
files for a TLS connection).

fixes https://github.com/zrepl/zrepl/issues/467
closes https://github.com/zrepl/zrepl/pull/587
2022-07-08 20:18:41 +02:00
3nprob
e4112d888c add ZREPL_DESTROY_MAX_BATCH_SIZE env var to control max batch destroy size
fixes #508
closes https://github.com/zrepl/zrepl/pull/604
2022-06-30 09:22:26 +02:00
Christian Schwarz
53f9bd6d88 docs: update CLI usage to --mode raw & remove outdated "Limitations" section
fixes https://github.com/zrepl/zrepl/issues/609
2022-06-28 00:17:34 +02:00
JMoVS
43c2a0d9b0 docs: clarity on the section that covers more complex setups
closes https://github.com/zrepl/zrepl/pull/596
2022-06-27 22:41:12 +02:00
Christian Schwarz
e0c7ceedd5 prevent transient zrepl status error: Post "http://unix/status": EOF
See the comment added to client.go in this commit.

fixes https://github.com/zrepl/zrepl/issues/483
fixes https://github.com/zrepl/zrepl/issues/262
fixes https://github.com/zrepl/zrepl/issues/379
fixes https://github.com/zrepl/zrepl/issues/379
2022-06-26 14:39:35 +02:00
Christian Schwarz
2642c64303 make initial replication policy configurable (most_recent, all, fail)
Config:

```
- type: push
  ...
  conflict_resolution:
    initial_replication: most_recent | all | fali
```

The ``initial_replication`` option determines which snapshots zrepl
replicates if the filesystem has not been replicated before.
If ``most_recent`` (the default), the initial replication will only
transfer the most recent snapshot, while ignoring previous snapshots.
If all snapshots should be replicated, specify ``all``.
Use ``fail`` to make replication of the filesystem fail in case
there is no corresponding fileystem on the receiver.

Code-Level Changes, apart from the obvious:
- Rework IncrementalPath()'s return signature.
  Now returns an error for initial replications as well.
- Rename & rework it's consumer, resolveConflict().

Co-authored-by: Graham Christensen <graham@grahamc.com>

Fixes https://github.com/zrepl/zrepl/issues/550
Fixes https://github.com/zrepl/zrepl/issues/187
Closes https://github.com/zrepl/zrepl/pull/592
2022-06-26 14:36:59 +02:00
JMoVS
1acafabb5b docs: Fix typo in disjoing to disjoint
Signed-off-by: Justin Scholz <git@justinscholz.de>
2022-05-07 22:13:56 +02:00
Christian Schwarz
19b2deb2cf run go mod tidy; go version go1.17.2 linux/amd64
Juding from the (now deleted) comments in go.mod, this might break 1.12
build. If it does, the CI will catch it as it currently build using
1.17 and 1.12.

fixes https://github.com/zrepl/zrepl/issues/586
2022-05-07 21:59:51 +02:00
Christian Schwarz
ce6701fb33 status: fix over-counted step when status != stepping
This is a fixup of

  commit b00b61e967
  Author: Christian Schwarz <me@cschwarz.com>
  Date:   Sun Nov 21 15:15:23 2021 +0100

      status: user-visible replication step number should start at 1

fixes https://github.com/zrepl/zrepl/issues/589
refs https://github.com/zrepl/zrepl/issues/538
2022-04-24 15:24:39 +02:00
Christian Schwarz
0121929164 build: use git+https to fix lazy.sh docdep failures
CircleCI fails like so:

    #!/bin/bash -eo pipefail
    ./lazy.sh docdep

    pip3 is /home/circleci/.pyenv/shims/pip3
    Installing doc build dependencies
    Obtaining sphinxcontrib-versioning from git+git://github.com/rwblair/sphinxcontrib-versioning.git@7e3885a389a809e17ea55261316b7b0e98dbf98f#egg=sphinxcontrib-versioning (from -r ./docs/requirements.txt (line 28))
      Cloning git://github.com/rwblair/sphinxcontrib-versioning.git (to revision 7e3885a389a809e17ea55261316b7b0e98dbf98f) to ./src/sphinxcontrib-versioning
      Running command git clone --filter=blob:none --quiet git://github.com/rwblair/sphinxcontrib-versioning.git /home/circleci/project/src/sphinxcontrib-versioning
      fatal: remote error:
        The unauthenticated git protocol on port 9418 is no longer supported.
      Please see https://github.blog/2021-09-01-improving-git-protocol-security-github/ for more information.
      error: subprocess-exited-with-error

      × git clone --filter=blob:none --quiet git://github.com/rwblair/sphinxcontrib-versioning.git /home/circleci/project/src/sphinxcontrib-versioning did not run successfully.
      │ exit code: 128
      ╰─> See above for output.

      note: This error originates from a subprocess, and is likely not a problem with pip.
    error: subprocess-exited-with-error

    × git clone --filter=blob:none --quiet git://github.com/rwblair/sphinxcontrib-versioning.git /home/circleci/project/src/sphinxcontrib-versioning did not run successfully.
    │ exit code: 128
    ╰─> See above for output.

    note: This error originates from a subprocess, and is likely not a problem with pip.

    Exited with code exit status 1

    CircleCI received exit code 1
2022-03-20 20:23:01 +01:00
Christian Schwarz
bc96f8f212 build/circleci: update to Ubuntu 20.04 image for release-* jobs
Background: `machine: true` is deprecated:

    https://circleci.com/docs/2.0/images/linux-vm/14.04-to-20.04-migration/
2022-02-15 22:55:25 +01:00
Christian Schwarz
459508c9d9 docs: sendrecvoptions: placeholders: fix wrong link name and add summarizing config snippet for recv.placeholders
fixes https://github.com/zrepl/zrepl/issues/573
2022-02-05 10:59:33 +01:00
Lapo Luchini
4a27cc63a8 prometheus: convert zrepl_version_daemon to zrepl_start_time metric
closes https://github.com/zrepl/zrepl/pull/556
fixes #553
2022-01-20 19:33:18 +01:00
Christian Schwarz
0a6840273a build: add tag-release Make target 2022-01-20 19:25:22 +01:00
madbrain76
76ef84f83b docs: fix typo in backup_to_external_disk.rst
closes https://github.com/zrepl/zrepl/pull/568
2022-01-20 19:25:03 +01:00
Christian Schwarz
66946df756 docs: continous_server_backup: simplify by removing need for recv.placeholder 2022-01-09 12:51:00 +01:00
Andrew Gunnerson
556fac3002 docs: document fan-out replication & add quick-start guide
closes https://github.com/zrepl/zrepl/pull/552
fixes https://github.com/zrepl/zrepl/issues/551

Signed-off-by: Andrew Gunnerson <chillermillerlong@hotmail.com>
Co-authored-by: Christian Schwarz <me@cschwarz.com>
2022-01-09 12:45:09 +01:00
Christian Schwarz
1ad7df2df3 docs: badges & links to Matrix chat room
fixes https://github.com/zrepl/zrepl/issues/488
2022-01-09 12:05:19 +01:00
Christian Schwarz
a3d010c5f0 util/optionaldeadline: disable scheduler latency-sensitive tests in CircleCI 2021-12-30 14:41:06 +01:00
Christian Schwarz
12503dc55a rpc/dataconn/timeoutconn: disable TestPartialWriteMockConn in CircleCI 2021-12-18 18:02:34 +01:00
Christian Schwarz
7d10a71cc0 0.5 changelog + front page update 2021-12-18 17:36:54 +01:00
Christian Schwarz
3d3d1b5679 quickstart: sample config uses placeholders, so provide sample value for recv.placeholder.encryption 2021-12-18 17:18:58 +01:00
Christian Schwarz
5240ab4949 docs: quickstart: make users aware that the example rules apply to all snaps, not just zrepl's
fixes https://github.com/zrepl/zrepl/issues/540
2021-12-18 16:30:15 +01:00
Christian Schwarz
19aebd399f docs: add a note that FreeBSD jail zfs userland needs to be kept in sync with kernel module
fixes https://github.com/zrepl/zrepl/issues/500
2021-12-18 16:06:26 +01:00
Christian Schwarz
04e03f4d06 platformtest: retry zpool export if 'pool is busy'
On Ubuntu, something seems to be holding on to the pool for too long.
2021-12-18 15:58:24 +01:00
Christian Schwarz
2e2a8a1d5d docs: add docs on how to run platform tests
fixes https://github.com/zrepl/zrepl/issues/478
2021-12-18 15:55:22 +01:00