Commit Graph

68 Commits

Author SHA1 Message Date
Christian Schwarz
d8d1d25ec2 docs: changelog for 0.6.1 2023-09-10 11:13:14 +00:00
Christian Schwarz
0df1c4cdcc docs: changelog: move donation banner to 0.6 release 2022-11-01 09:57:24 +01:00
Christian Schwarz
0a264b9b41 docs: add announcement for next release 2022-10-27 00:19:06 +02:00
Christian Schwarz
a3379d6785 docs: finalize 0.6 changelog 2022-10-27 00:19:06 +02:00
Yannick Dylla
1da8f848f2 snapper: support custom timestamp format
fixes https://github.com/zrepl/zrepl/issues/465
closes https://github.com/zrepl/zrepl/pull/639
2022-10-27 00:19:06 +02:00
Christian Schwarz
c743c7b03f refactor snapper & support cron-based snapshotting
fixes https://github.com/zrepl/zrepl/issues/554
refs https://github.com/zrepl/zrepl/discussions/547#discussioncomment-1936126
2022-09-25 19:23:44 +02:00
Christian Schwarz
2d8c3692ec rework resume token validation to allow resuming from raw sends of unencrypted datasets
Before this change, resuming from an unencrypted dataset with
send.raw=true specified wouldn't work with zrepl due to overly
restrictive resume token checking.

An initial PR to fix this was made in https://github.com/zrepl/zrepl/pull/503
but it didn't address the core of the problem.
The core of the problem was that zrepl assumed that if a resume token
contained `rawok=true, compressok=true`, the resulting send would be
encrypted. But if the sender dataset was unencrypted, such a resume would
actually result in an unencrypted send.
Which could be totally legitimate but zrepl failed to recognize that.

BACKGROUND
==========

The following snippets of OpenZFS code are insightful regarding how the
various ${X}ok values in the resume token are handled:

- 6c3c5fcfbe/module/zfs/dmu_send.c (L1947-L2012)
- 6c3c5fcfbe/module/zfs/dmu_recv.c (L877-L891)
- https://github.com/openzfs/zfs/blob/6c3c5fc/lib/libzfs/libzfs_sendrecv.c#L1663-L1672

Basically, some zfs send flags make the DMU send code set some DMU send
stream featureflags, although it's not a pure mapping, i.e, which DMU
send stream flags are used depends somewhat on the dataset (e.g., is it
encrypted or not, or, does it use zstd or not).

Then, the receiver looks at some (but not all) feature flags and maps
them to ${X}ok dataset zap attributes.

These are funnelled back to the sender 1:1 through the resume_token.

And the sender turns them into lzc flags.

As an example, let's look at zfs send --raw.
if the sender requests a raw send on an unencrypted dataset, the send
stream (and hence the resume token) will not have the raw stream
featureflag set, and hence the resume token will not have the rawok
field set. Instead, it will have compressok, embedok, and depending
on whether large blocks are present in the dataset, largeblockok set.

WHAT'S ZREPL'S ROLE IN THIS?
============================

zrepl provides a virtual encrypted sendflag that is like `raw`,
but further ensures that we only send encrypted datasets.

For any other resume token stuff, it shoudn't do any checking,
because it's a futile effort to keep up with ZFS send/recv features
that are orthogonal to encryption.

CHANGES MADE IN THIS COMMIT
===========================

- Rip out a bunch of needless checking that zrepl would do during
  planning. These checks were there to give better error messages,
  but actually, the error messages created by the endpoint.Sender.Send
  RPC upon send args validation failure are good enough.
- Add platformtests to validate all combinations of
  (Unencrypted/Encrypted FS) x (send.encrypted = true | false) x (send.raw = true | false)
  for cases both non-resuming and resuming send.

Additional manual testing done:
1. With zrepl 0.5, setup with unencrypted dataset, send.raw=true specified, no send.encrypted specified.
2. Observe that regular non-resuming send works, but resuming doesn't work.
3. Upgrade zrepl to this change.
4. Observe that both regular and resuming send works.

closes https://github.com/zrepl/zrepl/pull/613
2022-09-25 17:32:02 +02:00
Kiss Károly
d3f68ae4e8 replication: ignore bookmarks when computing incremental path
fixes https://github.com/zrepl/zrepl/issues/490
closes https://github.com/zrepl/zrepl/pull/619

Co-authored-by: Christian Schwarz <me@cschwarz.com>
2022-07-31 15:25:19 +02:00
3nprob
e4112d888c add ZREPL_DESTROY_MAX_BATCH_SIZE env var to control max batch destroy size
fixes #508
closes https://github.com/zrepl/zrepl/pull/604
2022-06-30 09:22:26 +02:00
Christian Schwarz
e0c7ceedd5 prevent transient zrepl status error: Post "http://unix/status": EOF
See the comment added to client.go in this commit.

fixes https://github.com/zrepl/zrepl/issues/483
fixes https://github.com/zrepl/zrepl/issues/262
fixes https://github.com/zrepl/zrepl/issues/379
fixes https://github.com/zrepl/zrepl/issues/379
2022-06-26 14:39:35 +02:00
Lapo Luchini
4a27cc63a8 prometheus: convert zrepl_version_daemon to zrepl_start_time metric
closes https://github.com/zrepl/zrepl/pull/556
fixes #553
2022-01-20 19:33:18 +01:00
Christian Schwarz
7d10a71cc0 0.5 changelog + front page update 2021-12-18 17:36:54 +01:00
Christian Schwarz
2d57ec6ee0 docs: changelog: mention upstream ashift 9 => 12 send/recv bug 2021-12-18 15:14:33 +01:00
Christian Schwarz
20ff9717bc fix mis-spelled send option for embedded data
fixes https://github.com/zrepl/zrepl/issues/522
2021-11-14 17:34:32 +01:00
InsanePrawn
b2c6e51a43 client/signal: Revert "add signal 'snapshot', rename existing signal 'wakeup' to 'replication'"
This was merged to master prematurely as the job components are not decoupled well enough
for these signals to be useful yet.

This reverts commit 2c8c2cfa14.

closes #452
2021-03-25 22:26:17 +01:00
Christian Schwarz
299808aaaf docs: 0.4 changelog: mention the 0.3.1 fix to prune:grid
fixes #400
refs 3a4e841c73
2021-03-14 21:13:26 +01:00
Christian Schwarz
7071adc774 docs: 0.4 changelog 2021-03-14 20:56:02 +01:00
Christian Schwarz
a58ce74ed0 implement new 'zrepl status'
Primary goals:

- Scrollable output ( fixes #245 )
- Sending job signals from status view
- Filtering of output by filesystem

Implementation:

- original TUI framework: github.com/rivo/tview
- but: tview is quasi-unmaintained, didn't support some features
- => use fork https://gitlab.com/tslocum/cview
- however, don't buy into either too much to avoid lock-in

- instead: **port over the existing status UI drawing code
  and adjust it to produce strings instead of directly
  drawing into the termbox buffer**

Co-authored-by: Calistoc <calistoc@protonmail.com>
Co-authored-by: InsanePrawn <insane.prawny@gmail.com>

fixes #245
fixes #220
2021-03-14 18:24:25 +01:00
Calistoc
2c8c2cfa14 add signal 'snapshot', rename existing signal 'wakeup' to 'replication' 2021-03-14 18:16:23 +01:00
Christian Schwarz
180eaea195 docs: 0.3.1 changelog 2020-11-01 14:19:03 +01:00
Christian Schwarz
6b4c6fc062 [#357] docs: update quickstart + tls transport to produce keypairs with subject alternative names
fixes #357
2020-08-22 03:05:30 +02:00
Hans Schulz
83fdffbcef replication: prometheus metric for number of failed replications in last attempt
- package replication: metric
- Grafana panel
- wiring
- changelog

Signed-off-by: Christian Schwarz <me@cschwarz.com>

closes #341
2020-08-04 01:19:44 +02:00
Christian Schwarz
30cdc1430e replication + endpoint: replication guarantees: guarantee_{resumability,incremental,nothing}
This commit

- adds a configuration in which no step holds, replication cursors, etc. are created
- removes the send.step_holds.disable_incremental setting
- creates a new config option `replication` for active-side jobs
- adds the replication.protection.{initial,incremental} settings, each
  of which can have values
    - `guarantee_resumability`
    - `guarantee_incremental`
    - `guarantee_nothing`
  (refer to docs/configuration/replication.rst for semantics)

The `replication` config from an active side is sent to both endpoint.Sender and endpoint.Receiver
for each replication step. Sender and Receiver then act accordingly.

For `guarantee_incremental`, we add the new `tentative-replication-cursor` abstraction.
The necessity for that abstraction is outlined in https://github.com/zrepl/zrepl/issues/340.

fixes https://github.com/zrepl/zrepl/issues/340
2020-07-26 20:32:35 +02:00
Christian Schwarz
b3e856f40d docs: changelog: 0.3: fix broken issue link 2020-06-22 12:30:42 +02:00
Christian Schwarz
8e1937fe75 doc: fixup 0.3 changelog 05f1237a6d 2020-06-14 18:29:37 +02:00
Christian Schwarz
dab222d95f docs: GitHub Sponsors link 2020-06-14 15:26:05 +02:00
Christian Schwarz
05f1237a6d docs: 0.3 changelog 2020-06-14 15:26:05 +02:00
Christian Schwarz
1b39e9d03c docs: update & extend replication overview wrt step holds + bookmarks 2020-06-14 15:21:36 +02:00
Christian Schwarz
0920a40751 reorganize shell completion generator command + support zsh
fixes #308
2020-04-18 19:23:04 +02:00
InsanePrawn
44bd354eae Spellcheck all files
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
2020-02-24 16:06:09 +01:00
Christian Schwarz
58c08c855f new features: {resumable,encrypted,hold-protected} send-recv, last-received-hold
- **Resumable Send & Recv Support**
  No knobs required, automatically used where supported.
- **Hold-Protected Send & Recv**
  Automatic ZFS holds to ensure that we can always resume a replication step.
- **Encrypted Send & Recv Support** for OpenZFS native encryption.
  Configurable at the job level, i.e., for all filesystems a job is responsible for.
- **Receive-side hold on last received dataset**
  The counterpart to the replication cursor bookmark on the send-side.
  Ensures that incremental replication will always be possible between a sender and receiver.

Design Doc
----------

`replication/design.md` doc describes how we use ZFS holds and bookmarks to ensure that a single replication step is always resumable.

The replication algorithm described in the design doc introduces the notion of job IDs (please read the details on this design doc).
We reuse the job names for job IDs and use `JobID` type to ensure that a job name can be embedded into hold tags, bookmark names, etc.
This might BREAK CONFIG on upgrade.

Protocol Version Bump
---------------------

This commit makes backwards-incompatible changes to the replication/pdu protobufs.
Thus, bump the version number used in the protocol handshake.

Replication Cursor Format Change
--------------------------------

The new replication cursor bookmark format is: `#zrepl_CURSOR_G_${this.GUID}_J_${jobid}`
Including the GUID enables transaction-safe moving-forward of the cursor.
Including the job id enables that multiple sending jobs can send the same filesystem without interfering.
The `zrepl migrate replication-cursor:v1-v2` subcommand can be used to safely destroy old-format cursors once zrepl has created new-format cursors.

Changes in This Commit
----------------------

- package zfs
  - infrastructure for holds
  - infrastructure for resume token decoding
  - implement a variant of OpenZFS's `entity_namecheck` and use it for validation in new code
  - ZFSSendArgs to specify a ZFS send operation
    - validation code protects against malicious resume tokens by checking that the token encodes the same send parameters that the send-side would use if no resume token were available (i.e. same filesystem, `fromguid`, `toguid`)
  - RecvOptions support for `recv -s` flag
  - convert a bunch of ZFS operations to be idempotent
    - achieved through more differentiated error message scraping / additional pre-/post-checks

- package replication/pdu
  - add field for encryption to send request messages
  - add fields for resume handling to send & recv request messages
  - receive requests now contain `FilesystemVersion To` in addition to the filesystem into which the stream should be `recv`d into
    - can use `zfs recv $root_fs/$client_id/path/to/dataset@${To.Name}`, which enables additional validation after recv (i.e. whether `To.Guid` matched what we received in the stream)
    - used to set `last-received-hold`
- package replication/logic
  - introduce `PlannerPolicy` struct, currently only used to configure whether encrypted sends should be requested from the sender
  - integrate encryption and resume token support into `Step` struct

- package endpoint
  - move the concepts that endpoint builds on top of ZFS to a single file `endpoint/endpoint_zfs.go`
    - step-holds + step-bookmarks
    - last-received-hold
    - new replication cursor + old replication cursor compat code
  - adjust `endpoint/endpoint.go` handlers for
    - encryption
    - resumability
    - new replication cursor
    - last-received-hold

- client subcommand `zrepl holds list`: list all holds and hold-like bookmarks that zrepl thinks belong to it
- client subcommand `zrepl migrate replication-cursor:v1-v2`
2020-02-14 22:00:13 +01:00
Christian Schwarz
5b50a66c6c daemon/snapper: refactor sync-up algorithm + warn about FSes awaiting first sync point
refs https://github.com/zrepl/zrepl/issues/256
2020-01-15 19:20:37 +01:00
Juergen Hoetzel
d35e2400b2 transport/{TCP,TLS}: optional IP_FREEBIND / IP_BINDANY bind socketops
Allows to bind to an address even if it is not actually (yet or ever)
configured. Fixes #238

Rationale:
https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/#whatdoesthismeanformeadeveloper
2020-01-04 17:21:48 +01:00
Christian Schwarz
0261dbfe3d docs: 0.2.1 changelog 2019-11-20 20:16:41 +01:00
Christian Schwarz
3c03f21419 docs: SEPA hint, supporters, fix publish script 2019-10-03 11:57:19 +02:00
Christian Schwarz
8af824df41 docs: promote monetary support in changelog 2019-09-29 19:04:53 +02:00
Christian Schwarz
58ab25919e platformtest: dedicated pool per test, Makefile target, maintainer notice
fixes #216
fixes #211
2019-09-29 18:48:44 +02:00
Christian Schwarz
215848f476 docs: 0.2 changelog 2019-09-28 17:50:07 +02:00
Ross Williams
729c83ee72 pre- and post-snapshot hooks
* stack-based execution model, documented in documentation
* circbuf for capturing hook output
* built-in hooks for postgres and mysql
* refactor docs, too much info on the jobs page, too difficult
  to discover snapshotting & hooks

Co-authored-by: Ross Williams <ross@ross-williams.net>
Co-authored-by: Christian Schwarz <me@cschwarz.com>

fixes #74
2019-09-27 21:25:59 +02:00
Christian Schwarz
07956c2299 zfs,endpoint: use zfs destroy batch syntax if available
refs #72
2019-09-14 13:43:46 +02:00
Christian Schwarz
77d3a1ad4d build: drop go Dep, switch to modules, support Go 1.13
bump enumer to v1.1.1
bump golangci-lint to v1.17.1

no `go mod tidy` because 1.13 and 1.12 seem to alter each other's output

fixes #112
2019-09-14 13:36:44 +02:00
Christian Schwarz
234a327a03 build: Linux arm64 support
* protoc zip fetching
* Makefile:
    * GOOS and GOARCH
    * run vet on all targets

Note: freebsd/arm64 is apparently not supported

fixes #180
refs #181
2019-06-23 15:25:26 +02:00
Christian Schwarz
5138681c13 docs: 0.1.1 changelog 2019-04-06 12:36:06 +02:00
Christian Schwarz
bbcfb47d28 docs: changelog: more maintainer notices 2019-03-30 18:53:28 +01:00
Christian Schwarz
2f2e6e6a00 receiving side: placeholder as simple on|off property 2019-03-20 20:26:30 +01:00
Christian Schwarz
6f7467e8d8 Merge branch 'InsanePrawn-master' into 'master' 2019-03-20 19:45:00 +01:00
Christian Schwarz
f2a6193735 Merge branch 'problame/replication_refactor' 2019-03-20 19:44:15 +01:00
Christian Schwarz
283c2821cd docs: add SnapJob to 0.1 changelog 2019-03-18 14:53:12 +01:00
Christian Schwarz
84019238df docs: changelog: remove whitespace & churn 2019-03-18 14:50:32 +01:00
Christian Schwarz
cef865f5ce docs: changelog 0.1: document forgotten bugfixes & features 2019-03-18 14:50:32 +01:00