863 Commits

Author SHA1 Message Date
Christian Schwarz
b3e856f40d docs: changelog: 0.3: fix broken issue link 2020-06-22 12:30:42 +02:00
Christian Schwarz
8e1937fe75 doc: fixup 0.3 changelog 05f1237a6d 2020-06-14 18:29:37 +02:00
Christian Schwarz
073514fc21 docs/publish.sh: only render latest (patch+rc) version for each (major,minor) version 2020-06-14 18:24:20 +02:00
Christian Schwarz
dab222d95f docs: GitHub Sponsors link 2020-06-14 15:26:05 +02:00
Christian Schwarz
a827894274 docs: add backup-to-external-disk quick-start guide and convert existing tutorial to quick-start guide
refs #219
fixes #329
2020-06-14 15:26:05 +02:00
Christian Schwarz
9a8d813d14 docs: fix typo in cli help for zfs-abstraction subcommand 2020-06-14 15:26:05 +02:00
Christian Schwarz
e391fa94f9 dist/grafana: update grafana dashboard
- uses version metric for 'instances up'
- displays active task count
- displays send abstractions cache entry count
- in general, graphs have a shorter y axis for better overview

fixes #332
2020-06-14 15:26:05 +02:00
Christian Schwarz
509185dfbe prometheus: expose zrepl version as const metric 2020-06-14 15:26:05 +02:00
Christian Schwarz
4b1b7a8561 envconst: queryable report of resolved variables + integration into zrepl status --raw
fixes #299
refs #186
2020-06-14 15:26:05 +02:00
Christian Schwarz
b330ccca5d transport/ssh: bump go-netssh version to fix ssh client process leaks
fixes #322
2020-06-14 15:26:05 +02:00
Christian Schwarz
05f1237a6d docs: 0.3 changelog 2020-06-14 15:26:05 +02:00
Christian Schwarz
1c270b7e39 add option to disable step holds for incremental sends
This is a stop-gap solution until we re-write the pruner to support
rules for removing step holds.

Note that disabling step holds for incremental sends does not affect
zrepl's guarantee that incremental replication is always possible:

Suppose you yank the external drive during an incremental @from -> @to step:

* restarting that step or future incrementals @from -> @to_later will be possible
  because the replication cursor bookmark points to @from until the step is complete
* resuming @from -> @to will work as long as the pruner on your internal pool doesn't come around to destroy @to.
    * in that case, the replication algorithm should determine that the resumable state
      on the receiving side is useless because @to no longer exists on the sending side,
      and consequently clear it, and restart an incremental step @from -> @to_later

refs #288
2020-06-14 15:26:05 +02:00
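The resume-or-restart reasoning in the message of commit 1c270b7e39 above can be sketched as a small decision function. This is only an illustration of the described behavior: `nextAction`, the `action` constants, and the map-based inputs are hypothetical and do not mirror zrepl's replication code.

```go
package main

import "fmt"

type action string

const (
	startStep   action = "start"             // no resumable state on the receiver
	resumeStep  action = "resume"            // @to still exists on the sender
	restartStep action = "clear-and-restart" // @to was pruned on the sender
)

// nextAction decides how to continue after an interrupted incremental step.
// senderSnaps is the set of snapshots that still exist on the sending side.
// resumeTo is the @to snapshot recorded in the receiver's resumable state,
// or "" if there is no such state.
func nextAction(senderSnaps map[string]bool, resumeTo string) action {
	if resumeTo == "" {
		return startStep
	}
	if senderSnaps[resumeTo] {
		return resumeStep
	}
	// The resumable state is useless: clear it and start a new step
	// from @from (kept alive by the replication cursor) to a later snapshot.
	return restartStep
}

func main() {
	// @to was destroyed by the pruner while the drive was disconnected.
	snaps := map[string]bool{"@from": true, "@to_later": true}
	fmt.Println(nextAction(snaps, "@to")) // prints "clear-and-restart"
}
```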
Christian Schwarz
1b39e9d03c docs: update & extend replication overview wrt step holds + bookmarks 2020-06-14 15:21:36 +02:00
Christian Schwarz
655a2e5404 docs/configuration/overview.rst: fix wrong headline hierarchy 2020-06-14 15:21:36 +02:00
Christian Schwarz
9c80eea045 docs: update supporters 2020-06-14 15:21:36 +02:00
Christian Schwarz
175ad1dd0b zfs: ZFSListFilesystemVersions: remove handling of io.ErrUnexpectedEOF
ZFSListChan returns (*DatasetDoesNotExist) for the case mentioned in the comment
2020-06-14 15:21:36 +02:00
Christian Schwarz
728e97700f zfs: fix error message formatting for send args validation 2020-06-14 15:21:36 +02:00
Christian Schwarz
94a0fbf953 [#321] platformtest: add test for zfs.ZFSHolds 2020-06-14 15:21:36 +02:00
Christian Schwarz
b056e7b2b9 [#321] endpoint: ListAbstractions: actually emit one Abstraction per matching hold 2020-06-14 15:21:36 +02:00
Christian Schwarz
6e927f20f9 [#321] platformtest: minimal integration tests for package replication
# Conflicts:
#	platformtest/tests/generated_cases.go
2020-06-14 15:21:36 +02:00
Christian Schwarz
301f163a44 [#321] platformtest: generate test case list + coverage tooling 2020-06-14 15:21:36 +02:00
Christian Schwarz
474652ea51 [#321] platformtest: fix test ListFilesystemVersionsZeroExistIsNotAnError 2020-06-14 15:21:36 +02:00
Christian Schwarz
1bc731e782 [#316] endpoint: delete unreachable code 2020-06-14 15:21:36 +02:00
Christian Schwarz
292b85b5ef [#316] endpoint / replication protocol: more robust step-holds and replication cursor management
- drop HintMostRecentCommonAncestor rpc call
    - it is wrong to put faith into the active side of the replication to always make that call
      (we might not trust it, ref pull setup)
- clean up step holds + step bookmarks + replication cursor bookmarks on
  send RPC instead
    - this makes it symmetric with Receive RPC
- use a cache (endpoint.sendAbstractionsCache) to avoid the cost of
  listing the on-disk endpoint abstractions state on every step

The "create" methods for endpoint abstractions (CreateReplicationCursor, HoldStep) are now fully
idempotent and return an Abstraction.

Notes about endpoint.sendAbstractionsCache:
- fills lazily from disk state on first `Get` operation
- fill from disk is generally only attempted once
    - unless the `ListAbstractions` fails, in which case the fill from
      disk is retried on next `Get` (the current `Get` will observe a
      subset of the actual on-disk abstractions)
    - or unless the `Invalidate` method is called
- it is a global (zrepl process-wide) cache

fixes #316
2020-06-14 15:21:36 +02:00
Christian Schwarz
dce98d50da [#316] endpoint.Receiver.ListFilesystems: early-exit if root_fs is not imported
- discovered during investigation of #316
- this is not the fix for #316, as a malicious receiver who doesn't
  implement the behavior added by this patch could still cause leakage
  of step holds on the sender

refs #316
2020-05-19 11:30:02 +02:00
Christian Schwarz
10a14a8c50 [#307] add package trace, integrate it with logging, and adopt it throughout zrepl
package trace:

- introduce the concept of tasks and spans, tracked as linked list within ctx
    - see package-level docs for an overview of the concepts
    - **main feature 1**: unique stack of task and span IDs
        - makes it easy to follow a series of log entries in concurrent code
    - **main feature 2**: ability to produce a chrome://tracing-compatible trace file
        - either via an env variable or a `zrepl pprof` subcommand
        - this is not a CPU profile, we already have go pprof for that
        - but it is very useful to visually inspect where the
          replication / snapshotter / pruner spends its time
          ( fixes #307 )

usage in package daemon/logging:

- goal: every log entry should have a trace field with the ID stack from package trace

- make `logging.GetLogger(ctx, Subsys)` the authoritative `logger.Logger` factory function
    - the context carries a linked list of injected fields which
      `logging.GetLogger` adds to the logger it returns
    - `logging.GetLogger` also uses package `trace` to get the
      task-and-span-stack and injects it into the returned logger's fields
2020-05-19 11:30:02 +02:00
Christian Schwarz
bcb5965617 [#307] rpc: proper handling of context cancellation for transportmux + dataconn
- prior to this patch, context cancellation would leave rpc.Server open
- this did not cause problems because the context was only cancelled by SIGINT,
  which was immediately followed by os.Exit
2020-05-18 19:46:24 +02:00
Christian Schwarz
f772b3d39f [#277] endpoint: Receiver.Receive: error message explaining problem with placeholders and encryption 2020-05-18 19:46:24 +02:00
Christian Schwarz
27db8c0afe [#277] endpoint: Receiver.Receive: better logging + placeholder state error early exit 2020-05-18 19:46:24 +02:00
Christian Schwarz
0e5c77d2be [#277] rpc + zfs: drop zfs.StreamCopier, use io.ReadCloser instead 2020-05-18 19:46:24 +02:00
Christian Schwarz
0280727985 [#277] replication/driver: enforce ordering during initial replication in order to support encrypted send
fixes #277
2020-05-18 19:46:24 +02:00
Christian Schwarz
b4abebce00 rpc/dataconn/timeoutconn: tests: relax deadline in timeout tests 2020-05-18 19:46:24 +02:00
Christian Schwarz
d89afe58d4 build: circleci: trigger debian binary package builds 2020-05-18 19:39:27 +02:00
Christian Schwarz
c855546b9f build: circleci: fix GitHub API auth method (use Authorization header)
existing API will be deprecated in June/July 2020
2020-05-18 19:39:27 +02:00
Christian Schwarz
7d6ee4c166 daemon: expose prometheus metrics on pprof listener (useful for debugging) 2020-05-18 19:39:27 +02:00
Bruce Smith
2fbd9d8f8c transport/tcp: support for CIDR-mask based ACLs + client-identities
Co-authored-by: Christian Schwarz <me@cschwarz.com>

fixes #235
close #265
2020-05-15 21:17:01 +02:00
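To illustrate what CIDR-mask based ACLs with client identities (commit 2fbd9d8f8c above) involve, here is a minimal sketch that maps a connecting address to an identity through the first matching prefix. The `entry` type, `newEntry`, `identify`, and the first-match rule are assumptions for this example; the log does not show zrepl's actual matching logic or config syntax.

```go
package main

import (
	"fmt"
	"net/netip"
)

// entry pairs a CIDR prefix with the client identity it grants.
type entry struct {
	prefix netip.Prefix
	ident  string
}

// newEntry parses cidr and normalizes it to its masked form.
func newEntry(cidr, ident string) (entry, error) {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return entry{}, fmt.Errorf("invalid CIDR %q: %w", cidr, err)
	}
	return entry{prefix: p.Masked(), ident: ident}, nil
}

// identify returns the identity of the first entry whose prefix contains addr.
// Addresses that match no entry are rejected.
func identify(acl []entry, addr string) (string, bool) {
	ip, err := netip.ParseAddr(addr)
	if err != nil {
		return "", false
	}
	for _, e := range acl {
		if e.prefix.Contains(ip) {
			return e.ident, true
		}
	}
	return "", false
}

func main() {
	var acl []entry
	for _, r := range [][2]string{{"192.168.1.7/32", "laptop"}, {"10.0.0.0/8", "internal"}} {
		e, err := newEntry(r[0], r[1])
		if err != nil {
			panic(err)
		}
		acl = append(acl, e)
	}
	fmt.Println(identify(acl, "10.1.2.3")) // internal true
	fmt.Println(identify(acl, "8.8.8.8"))  // (empty identity) false
}
```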
Christian Schwarz
18e101a04e platformtest: FailNow on Errorf 2020-05-15 21:04:52 +02:00
Christian Schwarz
e594421322 replication/logic: log filesystem during replication steps 2020-05-15 20:55:59 +02:00
Christian Schwarz
0d4bfda2fb endpoint.ListAbstractionsError: fix stack overflow in .Error()
fixes #320
refs #318
2020-05-15 20:41:22 +02:00
Christian Schwarz
b83c026cdc replication/driver: fs.debug() helper that automatically prefixes with fs name 2020-05-15 20:25:54 +02:00
Christian Schwarz
46caf31075 replication/driver: rename receiver variable (fs *fs) to (f *fs) 2020-05-15 20:25:54 +02:00
Christian Schwarz
2b9d696b49 replication/driver: envconst for experimental parallel replication
refs #140
refs #302
2020-05-15 20:25:54 +02:00
Christian Schwarz
70afff6e3b endpoint: log %#v recv options 2020-05-15 20:25:54 +02:00
Christian Schwarz
01bbda13e5 build: Makefile: GO_EXTRA_BUILDFLAGS 2020-05-15 20:25:54 +02:00
John Ramsden
c5a8f6635f docs: add FreeBSD jail tutorial + reorg 'installation' section 2020-05-02 13:43:00 +02:00
Christian Schwarz
6f441c55dc fixup e0b5bd7: crash on endpoint.ListStale if replication-cursor-v1 bookmark present
```
cs@cstp:[~/zrepl/zrepl]: artifacts/zrepl-linux-amd64 zfs-abstraction release-stale --dry-run
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x9de971]

goroutine 1 [running]:
github.com/zrepl/zrepl/endpoint.listStaleFiltering(0xc00012b700, 0x5, 0x8, 0x0, 0x13ccb58)
        /endpoint/endpoint_zfs_abstraction.go:736 +0x281
github.com/zrepl/zrepl/endpoint.ListStale(0xe3ae20, 0xc00026b740, 0x0, 0xe27c60, 0x13ccb58, 0xc0001d8690, 0x0, 0x0, 0x0, 0x1, ...)
        /endpoint/endpoint_zfs_abstraction.go:698 +0x3fd
github.com/zrepl/zrepl/client.doZabsReleaseStale(0xe3ae20, 0xc00026b740, 0x13a28a0, 0xc000151be0, 0x0, 0x1, 0x5705b4, 0xc0002686e0)
        /client/zfsabstractions_release.go:83 +0x1a0
github.com/zrepl/zrepl/cli.(*Subcommand).run(0x13a28a0, 0xc000264c80, 0xc000151be0, 0x0, 0x1)
        /cli/cli.go:104 +0xf5
github.com/spf13/cobra.(*Command).execute(0xc000264c80, 0xc000151bd0, 0x1, 0x1, 0xc000264c80, 0xc000151bd0)
        GOROOT/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:760 +0x2aa
github.com/spf13/cobra.(*Command).ExecuteC(0x13a43c0, 0x0, 0x0, 0x0)
        GOROOT/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:846 +0x2ea
github.com/spf13/cobra.(*Command).Execute(...)
        GOROOT/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:794
github.com/zrepl/zrepl/cli.Run()
        /cli/cli.go:151 +0x2d
main.main()
        /main.go:24 +0x20
```
2020-05-02 12:50:52 +02:00
Christian Schwarz
7b34d6cba5
Merge pull request #311 from zrepl/problame/zfscmd-fixes-backported-from-307-tracing-wip
zfscmd & zfs fixes created during WIP on #307
2020-04-21 14:24:42 +02:00
Christian Schwarz
70f9c6482f zfs: context propagation to ZFSListFilesystemVersions
fixup of 9568e46f05
2020-04-21 14:10:53 +02:00
Christian Schwarz
aed6149c8c zfscmd: fix crash in zfscmd_prometheus.go due to incorrectly extracted ProcessState
fixup of 96e188d7c4
refs #196
refs #301

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x9a472a]

goroutine 15826 [running]:
os.(*ProcessState).systemTime(...)
        /home/cs/go1.13/src/os/exec_unix.go:98
os.(*ProcessState).SystemTime(...)
        /home/cs/go1.13/src/os/exec.go:141
github.com/zrepl/zrepl/zfs/zfscmd.waitPostPrometheus(0xc000c04800, 0xe21ce0, 0xc000068270, 0xbf9f80d88107e861, 0x19bae710e6, 0x13a8b60)
        /home/cs/zrepl/zrepl/zfs/zfscmd/zfscmd_prometheus.go:69 +0x22a
github.com/zrepl/zrepl/zfs/zfscmd.(*Cmd).waitPost(0xc000c04800, 0xe21ce0, 0xc000068270)
        /home/cs/zrepl/zrepl/zfs/zfscmd/zfscmd.go:155 +0x18a
github.com/zrepl/zrepl/zfs/zfscmd.(*Cmd).CombinedOutput(0xc000c04800, 0xc0004b8270, 0xd02eea, 0x3, 0xc0001f6c40, 0x3)
        /home/cs/zrepl/zrepl/zfs/zfscmd/zfscmd.go:40 +0xb3
github.com/zrepl/zrepl/zfs.ZFSRelease(0xe36aa0, 0xc0004b8270, 0xc0009a3a40, 0x13, 0xc0004a5d00, 0x1, 0x1, 0xed62eb221, 0x13a8b60)
        /home/cs/zrepl/zrepl/zfs/holds.go:102 +0x2a7
github.com/zrepl/zrepl/endpoint.ReleaseStep(0xe36aa0, 0xc0004b8270, 0xc0004befc0, 0xe, 0xd08482, 0x8, 0xc0001cb02f, 0x2, 0x1eeea3bff89dc90b, 0x134d6, ...)
        /home/cs/zrepl/zrepl/endpoint/endpoint_zfs_abstraction_step.go:130 +0x367
github.com/zrepl/zrepl/endpoint.(*Sender).SendCompleted.func2(0xc000459190, 0xc000390e30, 0xc00041fd80, 0xc0004befc0, 0xe, 0xd08482, 0x8, 0xc0001cb02f, 0x2, 0x1eeea3bff89dc90b, ...)
        /home/cs/zrepl/zrepl/endpoint/endpoint.go:419 +0x1c3
created by github.com/zrepl/zrepl/endpoint.(*Sender).SendCompleted
        /home/cs/zrepl/zrepl/endpoint/endpoint.go:413 +0x776
2020-04-21 14:10:25 +02:00
Christian Schwarz
0834a184b8 zfscmd: do not do duplicate waitPre callbacks
it just makes sense that if we only dispatch one waitPost, we should
also only dispatch one waitPre
2020-04-21 14:10:18 +02:00