zrepl

mirror of https://github.com/zrepl/zrepl.git synced 2024-11-21 16:03:32 +01:00

Author	SHA1	Message	Date
InsanePrawn	fa4e048169	readme: fix typo Signed-off-by: InsanePrawn <insane.prawny@gmail.com>	2020-08-21 22:05:05 +02:00
Christian Schwarz	4f9f21f7f2	logger: fix go-1.15-discovered conversion from int to string logger/datastructures.go:85:10: conversion from Level (int) to string yields a string of one rune	2020-08-12 21:44:02 +02:00
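The vet message above flags `string(x)` on an integer type: Go treats it as a code-point conversion, so `string(rune(65))` is `"A"`, not `"65"`. The commit message does not say how the logger was changed. The sketch below shows one common way to avoid the pattern. `Level`, its constants, and the names returned by `String()` are hypothetical stand-ins, not zrepl's logger API.

```go
package main

import (
	"fmt"
	"strconv"
)

// Level is a hypothetical int-based log level standing in for the logger's type.
type Level int

const (
	Debug Level = iota
	Info
	Warn
	Error
)

// String returns a readable name. Unknown values fall back to their decimal
// form via strconv.Itoa rather than string(l), which vet flags (Go 1.15+)
// because it would yield a single rune.
func (l Level) String() string {
	switch l {
	case Debug:
		return "debug"
	case Info:
		return "info"
	case Warn:
		return "warn"
	case Error:
		return "error"
	default:
		return strconv.Itoa(int(l))
	}
}

func main() {
	// string(rune(65)) is "A": the code-point interpretation vet warns about.
	fmt.Println(string(rune(65)), Level(65).String()) // A 65
	fmt.Println(Info)                                 // info
}
```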
Christian Schwarz	480176ba2d	rpc/dataconn: fix go1.15-discovered recursive Error() method impl rpc/dataconn/dataconn_client.go:69:9: Sprintf format %s with arg e causes recursive Error method call	2020-08-12 21:41:31 +02:00
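The vet finding above describes an `Error()` method that passes its own receiver to `fmt.Sprintf("%s", e)`. Because fmt calls `Error()` on any `error` value, the method calls itself until the stack overflows. Below is a minimal sketch of the usual fix, which formats the underlying fields instead. `RemoteError` and its `msg` field are hypothetical and not the actual type in `dataconn_client.go`.

```go
package main

import "fmt"

// RemoteError is a hypothetical error type carrying a message from the peer.
type RemoteError struct {
	msg string
}

// Error formats the struct's fields, not e itself. Writing
// fmt.Sprintf("%s", e) here would make fmt call e.Error() again, recursively.
func (e *RemoteError) Error() string {
	return fmt.Sprintf("remote error: %s", e.msg)
}

func main() {
	var err error = &RemoteError{msg: "dataset does not exist"}
	fmt.Println(err) // remote error: dataset does not exist
}
```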
Christian Schwarz	1190c0f6d2	docs: supporters: update	2020-08-12 21:38:23 +02:00
Christian Schwarz	720a284db5	dist/grafana: fix endpoint abstractions cache metric panel fixup of `30cdc14`	2020-08-04 01:36:31 +02:00
Hans Schulz	83fdffbcef	replication: prometheus metric for number of failed replications in last attempt - package replication: metric - Grafana panel - wiring - changelog Signed-off-by: Christian Schwarz <me@cschwarz.com> closes #341	2020-08-04 01:19:44 +02:00
Christian Schwarz	0ee7a49d31	[#289 ] zfs: workaround for OpenZFS 0.7 dry send info with zero estimated size fixes #289	2020-07-26 20:32:35 +02:00
Christian Schwarz	02db5994fe	[#345 ] fix broken identification of parent-fs for initial replication ordering fixup of `02807279` fixes #345	2020-07-26 20:32:35 +02:00
Christian Schwarz	1d7a84e8ae	build: extract debian binary packaging workflow trigger into a reusable script	2020-07-26 20:32:35 +02:00
Christian Schwarz	a0e3dc7040	[#348 ] replication: add platformtest to check behavior on recv fail while still sending Regression test for #348	2020-07-26 20:32:35 +02:00
Christian Schwarz	43495d70c7	[#348 ] replication: return errors if .Send rpc returns a nil SendRes discovered while developing the next commit in this series	2020-07-26 20:32:35 +02:00
Christian Schwarz	fa586b493c	[#348 ] fix crash on early-recv error * The SendStream.Close() was not called by dataconn.Server, which left the zfs send process dangling. * When the source job's ctx interceptor closed the task, the dangling zfs send was detected by the trace package and panicked. 020-07-25T19:54:41-04:00 [ERRO][latitude][rpc.data][cyZj$J3Ca$J3Ca.CJwB]: cannot write send stream err="frameconn: shutting down" panic: end task: 1 active child tasks: end task: task still has active child tasks goroutine 196966 [running]: github.com/zrepl/zrepl/daemon/logging/trace.WithTask.func1.1(0xc000320680, 0xcde000) /home/jeremy/go/src/github.com/zrepl/zrepl/daemon/logging/trace/trace.go:221 +0x2f7 github.com/zrepl/zrepl/daemon/logging/trace.WithTask.func1() /home/jeremy/go/src/github.com/zrepl/zrepl/daemon/logging/trace/trace.go:237 +0x38 github.com/zrepl/zrepl/daemon/logging/trace.WithTaskAndSpan.func1() /home/jeremy/go/src/github.com/zrepl/zrepl/daemon/logging/trace/trace_convenience.go:41 +0x37 github.com/zrepl/zrepl/daemon/job.(PassiveSide).Run.func1(0xdcf780, 0xc0000a3560, 0xdc65a0, 0xc00035e620, 0xc0000a34d0) /home/jeremy/go/src/github.com/zrepl/zrepl/daemon/job/passive.go:194 +0x2e7 github.com/zrepl/zrepl/rpc.NewServer.func3(0xdcf780, 0xc0001ce4b0, 0xdc65e0, 0xc00035e600, 0xc0000a34d0) /home/jeremy/go/src/github.com/zrepl/zrepl/rpc/rpc_server.go:82 +0xd5 github.com/zrepl/zrepl/rpc/dataconn.(Server).serveConn(0xc0000a2ba0, 0xc00018eca0) /home/jeremy/go/src/github.com/zrepl/zrepl/rpc/dataconn/dataconn_server.go:149 +0x3be github.com/zrepl/zrepl/rpc/dataconn.(Server).Serve.func3(0xc0000b8180, 0xc0000a2ba0, 0xc00018eca0) /home/jeremy/go/src/github.com/zrepl/zrepl/rpc/dataconn/dataconn_server.go:108 +0x5d created by github.com/zrepl/zrepl/rpc/dataconn.(Server).Serve /home/jeremy/go/src/github.com/zrepl/zrepl/rpc/dataconn/dataconn_server.go:106 +0x24a 2020-07-25T19:58:55-04:00 [ERRO][latitude][rpc.data][Pt4F$gCWT$gCWT.fzhc]: cannot write send stream err="frameconn: shutting down" panic: end task: 1 active child tasks: end task: task still has active child tasks fixes #348	2020-07-26 20:32:35 +02:00
Christian Schwarz	30cdc1430e	replication + endpoint: replication guarantees: guarantee_{resumability,incremental,nothing} This commit - adds a configuration in which no step holds, replication cursors, etc. are created - removes the send.step_holds.disable_incremental setting - creates a new config option `replication` for active-side jobs - adds the replication.protection.{initial,incremental} settings, each of which can have values - `guarantee_resumability` - `guarantee_incremental` - `guarantee_nothing` (refer to docs/configuration/replication.rst for semantics) The `replication` config from an active side is sent to both endpoint.Sender and endpoint.Receiver for each replication step. Sender and Receiver then act accordingly. For `guarantee_incremental`, we add the new `tentative-replication-cursor` abstraction. The necessity for that abstraction is outlined in https://github.com/zrepl/zrepl/issues/340. fixes https://github.com/zrepl/zrepl/issues/340	2020-07-26 20:32:35 +02:00
Christian Schwarz	27673a23e9	config: add test for fromdefaults behavior	2020-07-26 20:32:35 +02:00
Christian Schwarz	95fc299733	daemon/job: test that sample configs are buildable	2020-07-26 20:32:35 +02:00
Christian Schwarz	4e702eedc9	cmd: zfs-abstraction list --json: fix panic (was panicking because `abstractions` is in fact a channel)	2020-07-26 20:32:35 +02:00
Christian Schwarz	8ff83f2f1a	[#342 ] endpoint: always create unencrypted placeholder filesystems This "breaks" the use case of receiving an unencrypted send into an encrypted receiver by setting the receiver's `root_fs`'s `encryption=on`. "breaks" in air-quotes because we have not yet released a version of zrepl with encrypted send support. We will bring back the feature outlined above in a future release. See https://github.com/zrepl/zrepl/issues/342#issuecomment-657231818 and following.	2020-07-26 20:32:35 +02:00
Christian Schwarz	4b8f0ad112	docs: supporters: update	2020-06-22 13:36:00 +02:00
Brian Candler	dbc8bbeb6a	docs: config: prune: example: keep manual snapshots on receiver Fixes #335 closes #336 Signed-off-by: Christian Schwarz <me@cschwarz.com>	2020-06-22 12:32:03 +02:00
Christian Schwarz	b3e856f40d	docs: changelog: 0.3: fix broken issue link	2020-06-22 12:30:42 +02:00
Christian Schwarz	8e1937fe75	doc: fixup 0.3 changelog `05f1237a6d`	2020-06-14 18:29:37 +02:00
Christian Schwarz	073514fc21	docs/publish.sh: only render latest (patch+rc) version for each (major,minor) version.	2020-06-14 18:24:20 +02:00
Christian Schwarz	dab222d95f	docs: GitHub Sponsors link	2020-06-14 15:26:05 +02:00
Christian Schwarz	a827894274	docs: add backup-to-external-disk quick-start guide and convert existing tutorial to quick-start guide refs #219 fixes #329	2020-06-14 15:26:05 +02:00
Christian Schwarz	9a8d813d14	docs: fix typo in cli help for zfs-abstraction subcommand	2020-06-14 15:26:05 +02:00
Christian Schwarz	e391fa94f9	dist/grafana: update grafana dashboard - uses version metric for 'instances up' - displays active task count - displays send abstractions cache entry count - in general, graphs have a shorter y axis for better overview fixes #332	2020-06-14 15:26:05 +02:00
Christian Schwarz	509185dfbe	prometheus: expose zrepl version as const metric	2020-06-14 15:26:05 +02:00
Christian Schwarz	4b1b7a8561	envconst: queryable report of resolved variables + integration into zrepl status --raw fixes #299 refs #186	2020-06-14 15:26:05 +02:00
Christian Schwarz	b330ccca5d	transport/ssh: bump go-netssh version to fix ssh client process leaks fixes #322	2020-06-14 15:26:05 +02:00
Christian Schwarz	05f1237a6d	docs: 0.3 changelog	2020-06-14 15:26:05 +02:00
Christian Schwarz	1c270b7e39	add option to disable step holds for incremental sends This is a stop-gap solution until we re-write the pruner to support rules for removing step holds. Note that disabling step holds for incremental sends does not affect zrepl's guarantee that incremental replication is always possible: Suppose you yank the external drive during an incremental @from -> @to step: * restarting that step or future incrementals @from -> @to_later will be possible because the replication cursor bookmark points to @from until the step is complete * resuming @from -> @to will work as long as the pruner on your internal pool doesn't come around to destroy @to. * in that case, the replication algorithm should determine that the resumable state on the receiving side is useless because @to no longer exists on the sending side, and consequently clear it, and restart an incremental step @from -> @to_later refs #288	2020-06-14 15:26:05 +02:00
Christian Schwarz	1b39e9d03c	docs: update & extend replication overview wrt step holds + bookmarks	2020-06-14 15:21:36 +02:00
Christian Schwarz	655a2e5404	docs/configuration/overview.rst: fix wrong headline hierarchy	2020-06-14 15:21:36 +02:00
Christian Schwarz	9c80eea045	docs: update supporters	2020-06-14 15:21:36 +02:00
Christian Schwarz	175ad1dd0b	zfs: ZFSListFilesystemVersions: remove handling of io.ErrUnexpectedEOF ZFSListChan returns (*DatasetDoesNotExist) for the case mentioned in the comment	2020-06-14 15:21:36 +02:00
Christian Schwarz	728e97700f	zfs: fix error message formatting for send args validation	2020-06-14 15:21:36 +02:00
Christian Schwarz	94a0fbf953	[#321 ] platformtest: add test for zfs.ZFSHolds	2020-06-14 15:21:36 +02:00
Christian Schwarz	b056e7b2b9	[#321 ] endpoint: ListAbstractions: actually emit one Abstraction per matching hold	2020-06-14 15:21:36 +02:00
Christian Schwarz	6e927f20f9	[#321 ] platformtest: minimal integration tests for package replication # Conflicts: # platformtest/tests/generated_cases.go	2020-06-14 15:21:36 +02:00
Christian Schwarz	301f163a44	[#321 ] platformtest: generate test case list + coverage tooling	2020-06-14 15:21:36 +02:00
Christian Schwarz	474652ea51	[#321 ] platformtest: fix test ListFilesystemVersionsZeroExistIsNotAnError	2020-06-14 15:21:36 +02:00
Christian Schwarz	1bc731e782	[#316 ] endpoint: delete unreachable code	2020-06-14 15:21:36 +02:00
Christian Schwarz	292b85b5ef	[#316 ] endpoint / replication protocol: more robust step-holds and replication cursor management - drop HintMostRecentCommonAncestor rpc call - it is wrong to put faith into the active side of the replication to always make that call (we might not trust it, ref pull setup) - clean up step holds + step bookmarks + replication cursor bookmarks on send RPC instead - this makes it symmetric with Receive RPC - use a cache (endpoint.sendAbstractionsCache) to avoid the cost of listing the on-disk endpoint abstractions state on every step The "create" methods for endpoint abstractions (CreateReplicationCursor, HoldStep) are now fully idempotent and return an Abstraction. Notes about endpoint.sendAbstractionsCache: - fills lazily from disk state on first `Get` operation - fill from disk is generally only attempted once - unless the `ListAbstractions` fails, in which case the fill from disk is retried on next `Get` (the current `Get` will observe a subset of the actual on-disk abstractions) - the `Invalidate` method is called - it is a global (zrepl process-wide) cache fixes #316	2020-06-14 15:21:36 +02:00
Christian Schwarz	dce98d50da	[#316 ] endpoint.Receiver.ListFilesystems: early-exit if root_fs is not imported - discovered during investigation of #316 - this is not the fix for #316, as a malicious receiver who doesn't implement the behavior added by this patch could still cause leakage of step holds on the sender refs #316	2020-05-19 11:30:02 +02:00
Christian Schwarz	10a14a8c50	[#307 ] add package trace, integrate it with logging, and adopt it throughout zrepl package trace: - introduce the concept of tasks and spans, tracked as linked list within ctx - see package-level docs for an overview of the concepts - main feature 1: unique stack of task and span IDs - makes it easy to follow a series of log entries in concurrent code - main feature 2: ability to produce a chrome://tracing-compatible trace file - either via an env variable or a `zrepl pprof` subcommand - this is not a CPU profile, we already have go pprof for that - but it is very useful to visually inspect where the replication / snapshotter / pruner spends its time ( fixes #307 ) usage in package daemon/logging: - goal: every log entry should have a trace field with the ID stack from package trace - make `logging.GetLogger(ctx, Subsys)` the authoritative `logger.Logger` factory function - the context carries a linked list of injected fields which `logging.GetLogger` adds to the logger it returns - `logging.GetLogger` also uses package `trace` to get the task-and-span-stack and injects it into the returned logger's fields	2020-05-19 11:30:02 +02:00
Christian Schwarz	bcb5965617	[#307 ] rpc: proper handling of context cancellation for transportmux + dataconn - prior to this patch, context cancellation would leave rpc.Server open - did not make problems because context was only cancelled by SIGINT, which was immediately followed by os.Exit	2020-05-18 19:46:24 +02:00
Christian Schwarz	f772b3d39f	[#277 ] endpoint: Receiver.Receive: error message explaining problem with placeholders and encryption	2020-05-18 19:46:24 +02:00
Christian Schwarz	27db8c0afe	[#277 ] endpoint: Receiver.Receive: better logging + placeholder state error early exit	2020-05-18 19:46:24 +02:00
Christian Schwarz	0e5c77d2be	[#277 ] rpc + zfs: drop zfs.StreamCopier, use io.ReadCloser instead	2020-05-18 19:46:24 +02:00
Christian Schwarz	0280727985	[#277 ] replication/driver: enforce ordering during initial replication in order to support encrypted send fixes #277	2020-05-18 19:46:24 +02:00
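The commit above enforces an ordering during initial replication so that a filesystem is received only after its parent. The related fix in `02db5994fe` concerns identifying the parent filesystem correctly. Below is a minimal sketch of such an ordering that groups filesystems into "waves": a filesystem's wave index is the number of its ancestors that are also being replicated, so every parent lands in an earlier wave than its children. This wave grouping is an illustrative technique chosen here, not necessarily how zrepl's replication driver schedules steps. `parentOf` and `waves` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// parentOf returns the parent dataset name of fs ("pool/a/b" -> "pool/a"),
// and false for a pool root such as "pool".
func parentOf(fs string) (string, bool) {
	i := strings.LastIndex(fs, "/")
	if i < 0 {
		return "", false
	}
	return fs[:i], true
}

// waves groups filesystems so each one appears in a later wave than any of
// its replicated ancestors. Members of one wave are independent of each other.
func waves(fss []string) [][]string {
	inSet := make(map[string]bool, len(fss))
	for _, fs := range fss {
		inSet[fs] = true
	}
	var out [][]string
	for _, fs := range fss {
		// The wave index counts ancestors that are also being replicated.
		w := 0
		for p, ok := parentOf(fs); ok; p, ok = parentOf(p) {
			if inSet[p] {
				w++
			}
		}
		for len(out) <= w {
			out = append(out, nil)
		}
		out[w] = append(out[w], fs)
	}
	for _, wave := range out {
		sort.Strings(wave)
	}
	return out
}

func main() {
	fss := []string{"pool/data/b", "pool/data", "pool/data/a", "pool/other"}
	fmt.Println(waves(fss)) // [[pool/data pool/other] [pool/data/a pool/data/b]]
}
```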


1132 Commits