Commit Graph

1068 Commits

Author SHA1 Message Date
Christian Schwarz
4fae0bb68e grafana: update dashboard to Grafana 9.3.6
... by importing the old version of the dashboard JSON into Grafana 9.3.6, then
re-exporting it.
2023-02-26 11:28:57 +01:00
Guillermo Ramos
9777a441e9
dist: add openrc service file
closes https://github.com/zrepl/zrepl/pull/664
2023-01-27 23:59:45 +01:00
InsanePrawn
1a72edea5d docs/jobs: add replication- conflict_resolution-options to active job types 2023-01-26 00:09:28 +01:00
Christian Schwarz
96db636582 build: circleci: don't trigger periodic full pipeline build for problame/circleci-build 2023-01-08 12:35:59 +01:00
Christian Schwarz
190ab7c08d build: circleci: stop using minio for artifact storage
CircleCI artifacts are available publicly.
And regarding expiration of artifacts, it doesn't really
matter because I delete minio artifacts after 30d as well.
2022-12-30 14:24:23 +01:00
Christian Schwarz
6be133f55d remove unused JobDebugSettings along with docs
For this kind of debugging, we switched to env vars a while ago.
For example, ZREPL_RPC_DEBUG.

I don't think we have a substitute for the RPCLog stuff.
However, NetConnLogger is still in the codebase.

obsoletes https://github.com/zrepl/zrepl/pull/661
2022-12-22 18:13:45 +01:00
Christian Schwarz
5ffd470596 docs: update comment on overriding mountpoint properties during zfs recv of ZVOLs
fixes https://github.com/zrepl/zrepl/issues/430
2022-12-10 12:53:24 +01:00
Christian Schwarz
2119dc40ab docs: update supporters list 2022-12-10 12:00:57 +01:00
Christian Schwarz
0df1c4cdcc docs: changelog: move donation banner to 0.6 release 2022-11-01 09:57:24 +01:00
Christian Schwarz
2658695a35 build: bump minimum Go version to 1.18, as a dependency in ./tools requires it
https://app.circleci.com/pipelines/github/zrepl/zrepl/6085/workflows/bf5b11f2-8dc4-40a2-bb7a-fcf3cf8205d4/jobs/42340

  ...
  build github.com/golangci/golangci-lint/cmd/golangci-lint: cannot load io/fs: cannot find module providing package io/fs
  go install github.com/wadey/gocovmerge
  go: downloading github.com/wadey/gocovmerge v0.0.0-20160331181800-b5bfa59ec0ad
  go: extracting github.com/wadey/gocovmerge v0.0.0-20160331181800-b5bfa59ec0ad
  go install golang.org/x/tools/cmd/goimports
  # golang.org/x/mod/module
  ../../go/pkg/mod/golang.org/x/mod@v0.6.0/module/module.go:147:5: undefined: errors.As
  note: module requires Go 1.17
  go install golang.org/x/tools/cmd/stringer
  # golang.org/x/tools/go/internal/gcimporter
  ../../go/pkg/mod/golang.org/x/tools@v0.2.0/go/internal/gcimporter/iimport.go:520:9: undefined: constant.Make
  ../../go/pkg/mod/golang.org/x/tools@v0.2.0/go/internal/gcimporter/iimport.go:616:9: undefined: constant.Make
  note: module requires Go 1.18
  go install google.golang.org/grpc/cmd/protoc-gen-go-grpc
  go: downloading google.golang.org/grpc v1.46.2
  go: extracting google.golang.org/grpc v1.46.2
  go: downloading google.golang.org/grpc/cmd/protoc-gen-go-grpc v1.1.0
  go: extracting google.golang.org/grpc/cmd/protoc-gen-go-grpc v1.1.0
  go install google.golang.org/protobuf/cmd/protoc-gen-go

  Exited with code exit status 123
2022-10-31 20:13:36 +01:00
Christian Schwarz
1ac1635b3d build: circleci: update CA certs in go 1.12 image 2022-10-31 20:13:26 +01:00
Christian Schwarz
4a2806f6d1 build: fix deb-docker performance on newer Docker
See comment in Makefile
2022-10-27 00:47:12 +02:00
Christian Schwarz
0a264b9b41 docs: add announcement for next release 2022-10-27 00:19:06 +02:00
Christian Schwarz
a3379d6785 docs: finalize 0.6 changelog 2022-10-27 00:19:06 +02:00
Christian Schwarz
6260b75031 snapper: fix delayed snapshots caused by system suspend/resume
See explainer comment in periodic.go for details.

fixes https://github.com/zrepl/zrepl/issues/611
2022-10-27 00:19:06 +02:00
Christian Schwarz
3ffb69bfb0 config: support zrepl's day and week units for snapshotting.interval
Originally, I had a patch that would replace all usages of
time.Duration in package config with the new config.Duration
types, but:
1. these are all timeouts/retry intervals that have default values.
   Most users don't touch them, and if they do, they don't need
   day or week units.
2. go-yaml's error reporting for yaml.Unmarshaler is inferior to
   built-in types (line numbers are missing, so the error would not have
   sufficient context)
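
A minimal sketch of how day and week units could be layered on top of Go's
time.ParseDuration. The helper name parseInterval and the "d"/"w" suffixes
are assumptions for illustration; this is not the config.Duration
implementation from this commit.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseInterval is a hypothetical helper. It accepts everything that
// time.ParseDuration accepts, plus integer counts with an assumed
// "d" (24h) or "w" (7*24h) suffix.
func parseInterval(s string) (time.Duration, error) {
	units := map[string]time.Duration{
		"d": 24 * time.Hour,
		"w": 7 * 24 * time.Hour,
	}
	for suffix, unit := range units {
		if strings.HasSuffix(s, suffix) {
			n, err := strconv.Atoi(strings.TrimSuffix(s, suffix))
			if err != nil {
				return 0, fmt.Errorf("invalid interval %q: %w", s, err)
			}
			return time.Duration(n) * unit, nil
		}
	}
	// No standard Go duration unit ends in "d" or "w", so falling
	// through to the standard parser is unambiguous.
	return time.ParseDuration(s)
}

func main() {
	for _, in := range []string{"10m", "1d", "2w"} {
		d, err := parseInterval(in)
		fmt.Println(in, d, err)
	}
}
```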

fixes https://github.com/zrepl/zrepl/issues/486
2022-10-27 00:19:06 +02:00
Yannick Dylla
1da8f848f2 snapper: support custom timestamp format
fixes https://github.com/zrepl/zrepl/issues/465
closes https://github.com/zrepl/zrepl/pull/639
2022-10-27 00:19:06 +02:00
Christian Schwarz
6ed4626df9 grafana dashboard: remove zrepl version number from title
fixes https://github.com/zrepl/zrepl/issues/624
2022-10-27 00:19:06 +02:00
Christian Schwarz
c07f9ec62e build: use go 1.19 for testing & release builds
New docker image since the old one was deprecated, according
to https://discuss.circleci.com/t/go-lang-docker-image-circleci-golang-1-19-is-missing/44961
2022-10-27 00:19:06 +02:00
Christian Schwarz
fd5b0e6831 build: update golangci-lint
The previous commits were done in response to updating to
the version that we now pin in this commit.
We do the update after the fixes so that each commit builds.
2022-10-27 00:19:06 +02:00
Christian Schwarz
a4cea1b4f3 go1.19: zfs.SendStream.Close() after EOF would return context cancellation error
Before upgrading to Go 1.19, these platform tests would sporadically
fail due to the reason outlined in the comment

  github.com/zrepl/zrepl/platformtest/tests.SendStreamMultipleCloseAfterEOF
  github.com/zrepl/zrepl/platformtest/tests.SendStreamCloseAfterEOFRead
2022-10-27 00:19:06 +02:00
Christian Schwarz
c0b52b92d5 systemd: set GOTRACEBACK=crash so that we have core dumps
They are useful, not least for debugging
SIGSYS caused by overly restrictive settings in the unit file.
(See previous commit for an example.)
2022-10-26 22:39:18 +02:00
Christian Schwarz
12018b3685 go1.19: adjust systemd unit to allow setrlimit
Go 1.19 uses it during startup.

From the Go changelog:

> On Unix operating systems, Go programs that import package os now
> automatically increase the open file limit (RLIMIT_NOFILE) to the
> maximum allowed value; that is, they change the soft limit to match the
> hard limit. This corrects artificially low limits set on some systems
> for compatibility with very old C programs using the select system call.
> Go programs are not helped by that limit, and instead even simple
> programs like gofmt often ran out of file descriptors on such systems
> when processing many files in parallel. One impact of this change is
> that Go programs that in turn execute very old C programs in child
> processes may run those programs with too high a limit. This can be
> corrected by setting the hard limit before invoking the Go program.
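
To observe the limits that the changelog entry above refers to, a process
can read its own RLIMIT_NOFILE values. This is only a sketch for Unix
systems; nofileLimit is a hypothetical helper and not part of this commit.

```go
package main

import (
	"fmt"
	"syscall"
)

// nofileLimit returns the soft and hard RLIMIT_NOFILE values of the
// current process. It relies on syscall.Getrlimit and is Unix-only.
func nofileLimit() (soft, hard uint64, err error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, 0, err
	}
	return uint64(rl.Cur), uint64(rl.Max), nil
}

func main() {
	soft, hard, err := nofileLimit()
	if err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	fmt.Printf("RLIMIT_NOFILE soft=%d hard=%d\n", soft, hard)
}
```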
2022-10-26 22:39:18 +02:00
Christian Schwarz
a91fb873e4 fix incorrect use of sort.StringSlice
A newer version of staticcheck found these:

> SA4029: sort.StringSlice is a type, not a function, and
> sort.StringSlice(variants) doesn't sort your values; consider using
> sort.Strings instead (staticcheck)
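
The difference that the staticcheck message describes can be sketched as
follows. The helper sortedCopy and the sample data are made up for
illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedCopy returns a sorted copy of in and leaves in untouched.
func sortedCopy(in []string) []string {
	out := append([]string(nil), in...)
	sort.Strings(out) // sorts out in place
	return out
}

func main() {
	variants := []string{"c", "a", "b"}

	// This is only a type conversion; nothing is sorted.
	_ = sort.StringSlice(variants)
	fmt.Println(variants)

	// Either of these actually sorts the values:
	//   sort.Strings(variants)
	//   sort.StringSlice(variants).Sort()
	fmt.Println(sortedCopy(variants))
}
```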
2022-10-24 22:22:41 +02:00
Christian Schwarz
a6aa610165 run go1.19 gofmt and make adjustments as needed
(Go 1.19 expanded doc comment syntax)
2022-10-24 22:22:41 +02:00
Christian Schwarz
6c87bdb9fb go1.19: switch to new nolint directive that is compatible with Go 1.19 gofmt 2022-10-24 22:22:11 +02:00
Christian Schwarz
b9250a41a2 go1.18: address net.Error.Temporary() deprecation
Go 1.18 deprecated net.Error.Temporary().
This commit cleans up places where we use it incorrectly.
Also, the rpc layer defines some errors that implement

  interface { Temporary() bool }

I added comments to all of the implementations to indicate
whether they will be required if net.Error.Temporary is ever
removed in the future.

For HandshakeError, the Temporary() return value is actually
important. I moved & rewrote a (previously misplaced) comment
there.

The ReadStreamError changes were
1. necessary to pacify newer staticcheck and
2. technically, an error can implement Temporary()
   without being net.Error. This applies to some syscall
   errors in the standard library.

Reading list for those interested:
- https://github.com/golang/go/issues/45729
- https://groups.google.com/g/golang-nuts/c/-JcZzOkyqYI
- https://man7.org/linux/man-pages/man2/accept.2.html

Note: This change was prompted by staticcheck:

> SA1019: neterr.Temporary has been deprecated since Go 1.18 because it
> shouldn't be used: Temporary errors are not well-defined. Most
> "temporary" errors are timeouts, and the few exceptions are surprising.
> Do not use this method. (staticcheck)
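
A rough sketch of the two kinds of checks discussed above: testing for a
timeout through net.Error, and testing for a Temporary() method
structurally. The types timeoutErr and handshakeError below are
hypothetical stand-ins, not the error types defined in zrepl's rpc layer.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// isTimeout reports whether err, or an error it wraps, is a net.Error
// that timed out.
func isTimeout(err error) bool {
	var ne net.Error
	return errors.As(err, &ne) && ne.Timeout()
}

// isTemporary looks for the Temporary() method structurally, so it also
// matches errors that do not implement the full net.Error interface.
func isTemporary(err error) bool {
	var te interface{ Temporary() bool }
	return errors.As(err, &te) && te.Temporary()
}

// timeoutErr is a hypothetical error that satisfies net.Error.
type timeoutErr struct{}

func (timeoutErr) Error() string   { return "i/o timeout" }
func (timeoutErr) Timeout() bool   { return true }
func (timeoutErr) Temporary() bool { return true }

// handshakeError is a hypothetical error that only has Temporary().
type handshakeError struct{ temporary bool }

func (e *handshakeError) Error() string   { return "handshake failed" }
func (e *handshakeError) Temporary() bool { return e.temporary }

func main() {
	wrapped := fmt.Errorf("read: %w", timeoutErr{})
	fmt.Println(isTimeout(wrapped), isTemporary(wrapped))

	hs := &handshakeError{temporary: true}
	fmt.Println(isTimeout(hs), isTemporary(hs))
}
```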
2022-10-24 22:21:52 +02:00
Christian Schwarz
a967986a18 fixup: fix hooks unit tests
The previous commit c743c7b03f
broke the hooks unit tests.

GitHub was not configured to require passing tests for master merge.
Didn't notice it locally due to Go's test caching.
I amended this before pushing this change.
2022-10-09 15:36:00 +02:00
Christian Schwarz
c743c7b03f refactor snapper & support cron-based snapshotting
fixes https://github.com/zrepl/zrepl/issues/554
refs https://github.com/zrepl/zrepl/discussions/547#discussioncomment-1936126
2022-09-25 19:23:44 +02:00
Christian Schwarz
a9c61b4b0b zrepl status UI: include w shortcut to wrap lines in help bar 2022-09-25 19:23:44 +02:00
Christian Schwarz
206d359dcd docs: sendrecvoptions: fix heading level for section on placeholders 2022-09-25 18:23:54 +02:00
Christian Schwarz
2d8c3692ec rework resume token validation to allow resuming from raw sends of unencrypted datasets
Before this change, resuming from an unencrypted dataset with
send.raw=true specified wouldn't work with zrepl due to overly
restrictive resume token checking.

An initial PR to fix this was made in https://github.com/zrepl/zrepl/pull/503
but it didn't address the core of the problem.
The core of the problem was that zrepl assumed that if a resume token
contained `rawok=true, compressok=true`, the resulting send would be
encrypted. But if the sender dataset was unencrypted, such a resume would
actually result in an unencrypted send.
Which could be totally legitimate but zrepl failed to recognize that.

BACKGROUND
==========

The following snippets of OpenZFS code are insightful regarding how the
various ${X}ok values in the resume token are handled:

- 6c3c5fcfbe/module/zfs/dmu_send.c (L1947-L2012)
- 6c3c5fcfbe/module/zfs/dmu_recv.c (L877-L891)
- https://github.com/openzfs/zfs/blob/6c3c5fc/lib/libzfs/libzfs_sendrecv.c#L1663-L1672

Basically, some zfs send flags make the DMU send code set some DMU send
stream featureflags, although it's not a pure mapping, i.e., which DMU
send stream flags are used depends somewhat on the dataset (e.g., is it
encrypted or not, or, does it use zstd or not).

Then, the receiver looks at some (but not all) feature flags and maps
them to ${X}ok dataset zap attributes.

These are funnelled back to the sender 1:1 through the resume_token.

And the sender turns them into lzc flags.

As an example, let's look at zfs send --raw.
If the sender requests a raw send on an unencrypted dataset, the send
stream (and hence the resume token) will not have the raw stream
featureflag set, and hence the resume token will not have the rawok
field set. Instead, it will have compressok, embedok, and depending
on whether large blocks are present in the dataset, largeblockok set.

WHAT'S ZREPL'S ROLE IN THIS?
============================

zrepl provides a virtual encrypted sendflag that is like `raw`,
but further ensures that we only send encrypted datasets.

For any other resume token stuff, it shouldn't do any checking,
because it's a futile effort to keep up with ZFS send/recv features
that are orthogonal to encryption.

CHANGES MADE IN THIS COMMIT
===========================

- Rip out a bunch of needless checking that zrepl would do during
  planning. These checks were there to give better error messages,
  but actually, the error messages created by the endpoint.Sender.Send
  RPC upon send args validation failure are good enough.
- Add platformtests to validate all combinations of
  (Unencrypted/Encrypted FS) x (send.encrypted = true | false) x (send.raw = true | false)
  for cases both non-resuming and resuming send.
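
The test matrix and the single remaining encryption check can be sketched
as below. This is an illustration under assumptions: sendCase, allCases and
checkEncrypted are hypothetical names, and the real validation lives in the
endpoint.Sender.Send RPC and the platformtests.

```go
package main

import (
	"errors"
	"fmt"
)

// sendCase is one cell of the matrix described above.
type sendCase struct {
	fsEncrypted   bool // is the sender filesystem encrypted?
	sendEncrypted bool // send.encrypted
	sendRaw       bool // send.raw
	resuming      bool // resuming vs. non-resuming send
}

// allCases enumerates every combination of the four booleans.
func allCases() []sendCase {
	bools := []bool{false, true}
	var cases []sendCase
	for _, fe := range bools {
		for _, se := range bools {
			for _, sr := range bools {
				for _, re := range bools {
					cases = append(cases, sendCase{fe, se, sr, re})
				}
			}
		}
	}
	return cases
}

// checkEncrypted models the only property the "encrypted" send flag is
// meant to guarantee: an encrypted send requires an encrypted dataset.
func checkEncrypted(c sendCase) error {
	if c.sendEncrypted && !c.fsEncrypted {
		return errors.New("send.encrypted=true but filesystem is not encrypted")
	}
	return nil
}

func main() {
	for _, c := range allCases() {
		fmt.Printf("%+v -> %v\n", c, checkEncrypted(c))
	}
}
```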

Additional manual testing done:
1. With zrepl 0.5, setup with unencrypted dataset, send.raw=true specified, no send.encrypted specified.
2. Observe that regular non-resuming send works, but resuming doesn't work.
3. Upgrade zrepl to this change.
4. Observe that both regular and resuming sends work.

closes https://github.com/zrepl/zrepl/pull/613
2022-09-25 17:32:02 +02:00
Christian Schwarz
7769263c2e platformtest: add QueueSubtest functionality
Use it from a top-level test case to queue the
execution of sub-tests after this test case is complete.

Note that the testing harness executes the subtest
_after_ the current top-level test. Hence, the subtest
cannot use any ZFS state of the top-level test.
2022-09-25 17:10:53 +02:00
Christian Schwarz
89f7c76c4e lint: allow empty else branches 2022-09-25 17:10:53 +02:00
jtagcat
c7771f98f5 docs: improve overview
There were, and still are, too many words. It's a very white paper vibe.
Docs need to be more brief, exact, and on-point.

closes https://github.com/zrepl/zrepl/pull/618
2022-07-31 15:50:53 +02:00
jtagcat
299f1c906e docs: overview: clarify configs _are_ ordered
Previously, the unordered list and the wording 'are considered'
left it unclear whether one or all files are 'considered'.
In reality, the first valid one is used, so an ordered list and
perhaps better wording communicate this fact.

refs https://github.com/zrepl/zrepl/pull/618
2022-07-31 15:33:23 +02:00
Kiss Károly
d3f68ae4e8 replication: ignore bookmarks when computing incremental path
fixes https://github.com/zrepl/zrepl/issues/490
closes https://github.com/zrepl/zrepl/pull/619

Co-authored-by: Christian Schwarz <me@cschwarz.com>
2022-07-31 15:25:19 +02:00
Christian Schwarz
193abbe6b1 fix active child tasks panic with endpoint.ListAbstractionsStreamed
The goroutine that does endTask() for
"list-abstractions-streamed-producer" can be preempted
after it has closed the out and outErrs channel,
but before it calls endTask().
If the parent ("handler") then gets scheduled
and ends itself, it will observe an active child task
"list-abstractions-streamed-producer".

This is easy to demo by injecting a sleep here:

  --- a/endpoint/endpoint_zfs_abstraction.go
  +++ b/endpoint/endpoint_zfs_abstraction.go
  @@ -575,6 +576,7 @@ func ListAbstractionsStreamed(ctx context.Context, query ListZFSHoldsAndBookmark
          ctx, endTask := trace.WithTask(ctx, "list-abstractions-streamed-producer")
          go func() {
                  defer endTask()
  +               defer time.Sleep(10 * time.Second)
                  defer close(out)
                  defer close(outErrs)
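
The window for this race follows from Go running deferred calls in
last-in, first-out order: both channels are closed before endTask runs. A
tiny sketch of that ordering, where the function name deferOrder and the
recorded strings are illustrative only:

```go
package main

import "fmt"

// deferOrder registers three deferred calls in the same order as the
// goroutine in the diff above and records the order in which they run.
func deferOrder() []string {
	var order []string
	func() {
		defer func() { order = append(order, "endTask") }()
		defer func() { order = append(order, "close(out)") }()
		defer func() { order = append(order, "close(outErrs)") }()
	}()
	return order
}

func main() {
	// Deferred calls run last-in, first-out, so "endTask" is recorded last.
	fmt.Println(deferOrder())
}
```

A consumer that reacts to the closed channels can therefore run, and end
its own task, before the producer's endTask has been called.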

fixes https://github.com/zrepl/zrepl/issues/607
2022-07-17 21:44:03 +02:00
Goran Mekić
02b215128e build: consistently use $(MAKE) when invoking it recursively
Not for the `docker run ... make ...` commands though!

closes https://github.com/zrepl/zrepl/pull/615
2022-07-12 00:18:38 +02:00
Christian Schwarz
dc03db7423 rpc/grpcclientidentity/authlistener_grpc_adaptor: don't assume peer.Addr is set
On Illumos, getpeername doesn't work from Go on socketpair sockets.
That's why .RemoteAddr() returns nil on such a socket.
And that `nil` ultimately lands in the `p.Addr`.
So, `p.Addr.String()` would deref `nil`, leading to

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0xaea33e]

    goroutine 614 [running]:
    github.com/zrepl/zrepl/rpc/grpcclientidentity.NewInterceptors.func1({0xf1e158, 0xc000631200}, {0xd514c0, 0xc000631230}, 0xc000032740, 0xc000524348)
    	/dpool/export/home/mills/Downloads/code/oi-userland-gh/components/sysutils/zrepl/build/amd64/rpc/grpcclientidentity/authlistener_grpc_adaptor.go:121 +0x13e
    github.com/zrepl/zrepl/replication/logic/pdu._Replication_ListFilesystems_Handler({0xdb30c0, 0xc00001a630}, {0xf1e158, 0xc000631200}, 0xc00052b7a0, 0xc000522000)
    	/dpool/export/home/mills/Downloads/code/oi-userland-gh/components/sysutils/zrepl/build/amd64/replication/logic/pdu/pdu_grpc.pb.go:186 +0x16a
    google.golang.org/grpc.(*Server).processUnaryRPC(0xc00016e700, {0xf2bc00, 0xc0000f2780}, 0xc00011c200, 0xc000522150, 0x1497c78, 0x0)
    	/dpool/export/home/mills/Downloads/code/oi-userland-gh/components/sysutils/zrepl/zrepl-0.5.0/gopath/pkg/mod/google.golang.org/grpc@v1.35.0/server.go:1217 +0xe28
    google.golang.org/grpc.(*Server).handleStream(0xc00016e700, {0xf2bc00, 0xc0000f2780}, 0xc00011c200, 0x0)
    	/dpool/export/home/mills/Downloads/code/oi-userland-gh/components/sysutils/zrepl/zrepl-0.5.0/gopath/pkg/mod/google.golang.org/grpc@v1.35.0/server.go:1540 +0xcb3
    google.golang.org/grpc.(*Server).serveStreams.func1.2(0xc000373b70, 0xc00016e700, {0xf2bc00, 0xc0000f2780}, 0xc00011c200)
    	/dpool/export/home/mills/Downloads/code/oi-userland-gh/components/sysutils/zrepl/zrepl-0.5.0/gopath/pkg/mod/google.golang.org/grpc@v1.35.0/server.go:878 +0xad
    created by google.golang.org/grpc.(*Server).serveStreams.func1
    	/dpool/export/home/mills/Downloads/code/oi-userland-gh/components/sysutils/zrepl/zrepl-0.5.0/gopath/pkg/mod/google.golang.org/grpc@v1.35.0/server.go:876 +0x1ec
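
A defensive pattern for this kind of failure is to check the net.Addr
interface value before calling String() on it. The helper addrString, the
placeholder text and the fakeAddr type are assumptions for illustration;
the actual fix in authlistener_grpc_adaptor.go may look different.

```go
package main

import (
	"fmt"
	"net"
)

// addrString returns a printable form of a and tolerates a nil
// interface value instead of dereferencing it.
func addrString(a net.Addr) string {
	if a == nil {
		return "<unknown>"
	}
	return a.String()
}

// fakeAddr is a hypothetical net.Addr used only for the demo.
type fakeAddr string

func (f fakeAddr) Network() string { return "fake" }
func (f fakeAddr) String() string  { return string(f) }

func main() {
	var missing net.Addr // what a socketpair peer may yield on some platforms
	fmt.Println(addrString(missing))
	fmt.Println(addrString(fakeAddr("10.0.0.1:8888")))
}
```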

fixes https://github.com/zrepl/zrepl/issues/598
2022-07-10 23:59:40 +02:00
Cole Helbling
1df0f8912a Add --skip-cert-check flag to zrepl configcheck to prevent checking cert files
It may be desirable to check that a config is valid without checking for
the existence of certificate files (e.g. when validating a config inside
a sandbox without access to the cert files).

This will be very useful for NixOS so that we can check the config file
at nix-build time (e.g. potentially without proper permissions to read cert
files for a TLS connection).

fixes https://github.com/zrepl/zrepl/issues/467
closes https://github.com/zrepl/zrepl/pull/587
2022-07-08 20:18:41 +02:00
3nprob
e4112d888c add ZREPL_DESTROY_MAX_BATCH_SIZE env var to control max batch destroy size
fixes #508
closes https://github.com/zrepl/zrepl/pull/604
2022-06-30 09:22:26 +02:00
Christian Schwarz
53f9bd6d88 docs: update CLI usage to --mode raw & remove outdated "Limitations" section
fixes https://github.com/zrepl/zrepl/issues/609
2022-06-28 00:17:34 +02:00
JMoVS
43c2a0d9b0 docs: clarity on the section that covers more complex setups
closes https://github.com/zrepl/zrepl/pull/596
2022-06-27 22:41:12 +02:00
Christian Schwarz
e0c7ceedd5 prevent transient zrepl status error: Post "http://unix/status": EOF
See the comment added to client.go in this commit.

fixes https://github.com/zrepl/zrepl/issues/483
fixes https://github.com/zrepl/zrepl/issues/262
fixes https://github.com/zrepl/zrepl/issues/379
2022-06-26 14:39:35 +02:00
Christian Schwarz
2642c64303 make initial replication policy configurable (most_recent, all, fail)
Config:

```
- type: push
  ...
  conflict_resolution:
    initial_replication: most_recent | all | fail
```

The ``initial_replication`` option determines which snapshots zrepl
replicates if the filesystem has not been replicated before.
If ``most_recent`` (the default), the initial replication will only
transfer the most recent snapshot, while ignoring previous snapshots.
If all snapshots should be replicated, specify ``all``.
Use ``fail`` to make replication of the filesystem fail in case
there is no corresponding filesystem on the receiver.
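
The three policies can be pictured with the following sketch. It is only
an illustration: pickInitial and its string-based snapshot list are
hypothetical and do not mirror the IncrementalPath() or resolveConflict()
code touched by this commit.

```go
package main

import (
	"errors"
	"fmt"
)

// pickInitial returns the snapshots an initial replication would
// transfer under the given policy. snaps is ordered oldest to newest.
func pickInitial(policy string, snaps []string) ([]string, error) {
	switch policy {
	case "most_recent":
		if len(snaps) == 0 {
			return nil, nil
		}
		return snaps[len(snaps)-1:], nil
	case "all":
		return snaps, nil
	case "fail":
		return nil, errors.New("no corresponding filesystem on the receiver")
	default:
		return nil, fmt.Errorf("unknown initial_replication policy %q", policy)
	}
}

func main() {
	snaps := []string{"@a", "@b", "@c"}
	for _, p := range []string{"most_recent", "all", "fail"} {
		got, err := pickInitial(p, snaps)
		fmt.Println(p, got, err)
	}
}
```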

Code-Level Changes, apart from the obvious:
- Rework IncrementalPath()'s return signature.
  Now returns an error for initial replications as well.
- Rename & rework its consumer, resolveConflict().

Co-authored-by: Graham Christensen <graham@grahamc.com>

Fixes https://github.com/zrepl/zrepl/issues/550
Fixes https://github.com/zrepl/zrepl/issues/187
Closes https://github.com/zrepl/zrepl/pull/592
2022-06-26 14:36:59 +02:00
JMoVS
1acafabb5b docs: Fix typo in disjoing to disjoint
Signed-off-by: Justin Scholz <git@justinscholz.de>
2022-05-07 22:13:56 +02:00
Christian Schwarz
19b2deb2cf run go mod tidy; go version go1.17.2 linux/amd64
Judging from the (now deleted) comments in go.mod, this might break the 1.12
build. If it does, the CI will catch it as it currently builds using
1.17 and 1.12.

fixes https://github.com/zrepl/zrepl/issues/586
2022-05-07 21:59:51 +02:00
Christian Schwarz
ce6701fb33 status: fix over-counted step when status != stepping
This is a fixup of

  commit b00b61e967
  Author: Christian Schwarz <me@cschwarz.com>
  Date:   Sun Nov 21 15:15:23 2021 +0100

      status: user-visible replication step number should start at 1

fixes https://github.com/zrepl/zrepl/issues/589
refs https://github.com/zrepl/zrepl/issues/538
2022-04-24 15:24:39 +02:00
Christian Schwarz
0121929164 build: use git+https to fix lazy.sh docdep failures
CircleCI fails like so:

    #!/bin/bash -eo pipefail
    ./lazy.sh docdep

    pip3 is /home/circleci/.pyenv/shims/pip3
    Installing doc build dependencies
    Obtaining sphinxcontrib-versioning from git+git://github.com/rwblair/sphinxcontrib-versioning.git@7e3885a389a809e17ea55261316b7b0e98dbf98f#egg=sphinxcontrib-versioning (from -r ./docs/requirements.txt (line 28))
      Cloning git://github.com/rwblair/sphinxcontrib-versioning.git (to revision 7e3885a389a809e17ea55261316b7b0e98dbf98f) to ./src/sphinxcontrib-versioning
      Running command git clone --filter=blob:none --quiet git://github.com/rwblair/sphinxcontrib-versioning.git /home/circleci/project/src/sphinxcontrib-versioning
      fatal: remote error:
        The unauthenticated git protocol on port 9418 is no longer supported.
      Please see https://github.blog/2021-09-01-improving-git-protocol-security-github/ for more information.
      error: subprocess-exited-with-error

      × git clone --filter=blob:none --quiet git://github.com/rwblair/sphinxcontrib-versioning.git /home/circleci/project/src/sphinxcontrib-versioning did not run successfully.
      │ exit code: 128
      ╰─> See above for output.

      note: This error originates from a subprocess, and is likely not a problem with pip.
    error: subprocess-exited-with-error

    × git clone --filter=blob:none --quiet git://github.com/rwblair/sphinxcontrib-versioning.git /home/circleci/project/src/sphinxcontrib-versioning did not run successfully.
    │ exit code: 128
    ╰─> See above for output.

    note: This error originates from a subprocess, and is likely not a problem with pip.

    Exited with code exit status 1

    CircleCI received exit code 1
2022-03-20 20:23:01 +01:00