zrepl

mirror of https://github.com/zrepl/zrepl.git synced 2024-11-21 16:03:32 +01:00

Author	SHA1	Message	Date
Christian Schwarz	e7497ab3d0	LICENSE + docs: adjust copyright	2018-10-13 17:34:05 +02:00
Christian Schwarz	59a4e2db5f	replication: regenerate pdu.pb with new protoc-gen-go	2018-10-13 17:23:39 +02:00
Christian Schwarz	2c994e879c	filters: fix broken error message reported by go vet on go 1.11	2018-10-13 17:17:34 +02:00
Christian Schwarz	de2768c91d	build: produce darwin binaries	2018-10-13 16:57:25 +02:00
Christian Schwarz	fb6f58b735	client/status: switch to package tcell, which works with Solaris. Can't cross-compile Solaris binaries, though: tcell for Solaris needs cgo.	2018-10-13 16:57:05 +02:00
Christian Schwarz	be4e244f1f	build: fixup `af3d96dab8`: syntax error in builddep install	2018-10-13 16:29:33 +02:00
Christian Schwarz	074f989547	Merge branch 'replication_rewrite' (in fact it's a 90% rewrite)	2018-10-13 16:26:23 +02:00
Christian Schwarz	87c8957889	build: fixup `be962998ba`: broken makefile	2018-10-13 16:22:19 +02:00
Christian Schwarz	f6cf23779f	docs: Remove stale TIP for dry-run zrepl test subcommand. Won't make it to 0.1	2018-10-13 16:22:19 +02:00
Christian Schwarz	92a1a6d2ca	docs: fix wrong subcommand for configcheck	2018-10-13 16:22:19 +02:00
Christian Schwarz	63169c51b7	add 'test filesystems' subcommand for testing filesystem filters	2018-10-13 16:22:19 +02:00
Christian Schwarz	5c3c83b2cb	cli: refactor to allow definition of subcommands next to their implementation	2018-10-13 16:22:19 +02:00
Christian Schwarz	aeb87ffbcf	daemon/job/active: push mode: awful hack for handling of concurrent snapshots + stale remote operation. We have the problem that there are legitimate use cases where a user does not want their machine to fill up with snapshots, even if it means unreplicated snapshots must be destroyed. This can be expressed by not configuring the keep rule `not_replicated` for the snapshot-creating side. This commit only addresses push mode because we don't support pruning in the source job. We advise users in the docs to use push mode if they have the above use case, so this is fine - at least for 0.1. Ideally, the replication.Replication would communicate to the pruner which snapshots are currently part of the replication plan, and then we'd need some conflict resolution to determine whether it's more important to destroy the snapshots or to replicate them (destroy should win?). However, we don't have the infrastructure for this yet (we could parse the replication report, but that's just ugly). And we want to get 0.1 out, so showtime for a dirty hack: We start replication, and ideally, replication and pruning are done before new snapshots have been taken. If so: great. However, what happens if snapshots have been taken and we are not done with replication and / or pruning? * If replication is making progress according to its state, let it run. This covers the important situation of initial replication, where replication may easily take longer than a single snapshotting interval. * If replication is in an error state, cancel it through context cancellation. * As with the pruner below, the main problem here is that status output will only contain "context cancelled" after the cancellation, instead of showing the reason why it was cancelled. Not nice, but oh well, the logs provide enough detail for this niche situation... * If we are past replication, we're still pruning * Leave the local (send-side) pruning alone. 
Again, we only implement this hack for push, so we know sender is local, and it will only fail hard, not retry. * If the remote (receiver-side) pruner is in an error state, cancel it through context cancellation. * Otherwise, let it run. Note that every time we "let it run", we tolerate a temporary excess of snapshots, but given sufficiently aggressive timeouts and the assumption that the snapshot interval is much greater than the timeouts, this is not a significant problem in practice.	2018-10-12 22:47:06 +02:00
Christian Schwarz	a85abe8bae	client/status: improve hiding of data if current state makes it obsolete	2018-10-12 22:47:06 +02:00
Christian Schwarz	d584e1ac54	daemon/job/active: fix race in updateTasks. Even if concurrent updates strictly modify different members of the tasks struct, the copying + lock-drop still constitutes a race condition: The last updater always wins and sets tasks to its copy + changes. This eliminates the other updater's changes.	2018-10-12 22:15:07 +02:00
Christian Schwarz	af3d96dab8	use enumer generate tool for state strings	2018-10-12 22:10:49 +02:00
Christian Schwarz	89e0103abd	move wakeup subcommand into signal subcommand and add reset subcommand	2018-10-12 20:50:56 +02:00
Christian Schwarz	025fbda984	client/status: only show progress bar in non-planning states	2018-10-12 16:00:37 +02:00
Christian Schwarz	9bb7b19c93	pruner: handle replication cursor being older than any snapshot correctly	2018-10-12 15:29:07 +02:00
Christian Schwarz	cb83a26c90	replication: wakeup + retry handling: make wakeups work in retry wait states - handle wakeups in Planning state - fsrep.Replication yields immediately in RetryWait - once the queue only contains fsrep.Replication in retryWait: transition replication.Replication into WorkingWait state - handle wakeups in WorkingWait state, too	2018-10-12 13:12:28 +02:00
Christian Schwarz	d17ecc3b5c	replication/fsrep: report Pending[0] problem as fsrep problem in RetryWait state	2018-10-12 12:45:37 +02:00
Christian Schwarz	f9d24d15ed	move wakeup mechanism into separate package	2018-10-12 12:44:40 +02:00
Christian Schwarz	1fb59c953a	implement transport protocol handshake (even before streamrpc handshake)	2018-10-11 21:21:46 +02:00
Christian Schwarz	be962998ba	move serve and connecter into transports package	2018-10-11 21:21:46 +02:00
Christian Schwarz	a97684923a	refactor: socketpair into utils package (useful elsewhere)	2018-10-11 21:17:43 +02:00
Christian Schwarz	1643198713	docs: reflect changes in replication_rewrite branch	2018-10-11 18:03:18 +02:00
Christian Schwarz	125b561df3	rename root_dataset to root_fs for receiving-side jobs	2018-10-11 18:03:18 +02:00
Christian Schwarz	0c3a694470	fixup: add test for global section	2018-10-11 17:52:19 +02:00
Christian Schwarz	525a875825	main: better descriptions for root subcommands	2018-10-11 17:52:19 +02:00
Christian Schwarz	4e16952ad9	snapshotting: support 'periodic' and 'manual' mode. 1. Change config format to support multiple types of snapshotting modes. 2. Implement a hacky way to support periodic or completely manual snapshots. In manual mode, the user has to trigger replication using the wakeup mechanism after they took snapshots using their own tooling. As indicated by the comment, a more general solution would be desirable, but we want to get the release out and 'manual' mode is a feature that some people requested...	2018-10-11 15:59:23 +02:00
Christian Schwarz	14febbeb4c	config: skip files that do not end in .yml	2018-10-11 13:09:04 +02:00
Christian Schwarz	93c90cd705	pruning: fix YAML representation of PruneKeepRegex	2018-10-11 13:07:52 +02:00
Christian Schwarz	01668a989e	transport local: named listeners + struct renaming	2018-10-11 13:06:47 +02:00
Christian Schwarz	976c1f3929	util.IOCommand: add stderr logging for unexpected crashes in calls to ProcessState.Sys() Crashes observed on a FreeBSD 11.2 system 2018-09-27T05:08:39+02:00 [INFO][csnas]: start replication invocation="62" 2018-09-27T05:08:39+02:00 [INFO][csnas][repl]: start planning invocation="62" 2018-09-27T05:08:58+02:00 [INFO][csnas][repl]: start working invocation="62" 2018-09-27T05:09:57+02:00 [INFO][csnas]: start pruning sender invocation="62" 2018-09-27T05:10:11+02:00 [INFO][csnas]: start pruning receiver invocation="62" 2018-09-27T05:10:32+02:00 [INFO][csnas]: wait for wakeups 2018-09-27T06:08:39+02:00 [INFO][csnas]: start replication invocation="63" 2018-09-27T06:08:39+02:00 [INFO][csnas][repl]: start planning invocation="63" 2018-09-27T06:08:44+02:00 [INFO][csnas][repl]: start working invocation="63" 2018-09-27T06:08:49+02:00 [ERRO][csnas][repl]: receive request failed (might also be error on sender) invocation="63" filesystem="<REDACTED>" err="concurrent use of RPC connection" step="<REDACTED>(@zrepl_20180927_030838_000 => @zrepl_20180927_040835_000)" errType="errors.errorString" panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x7d484b] goroutine 3938545 [running]: os.(ProcessState).os.sys(...) /usr/lib/golang/src/os/exec_posix.go:78 os.(ProcessState).Sys(...) 
/usr/lib/golang/src/os/exec.go:157 github.com/zrepl/zrepl/util.(IOCommand).doWait(0xc4201b2d80, 0xc420070060, 0xc420070060) /go/github.com/zrepl/zrepl/util/iocommand.go:91 +0x4b github.com/zrepl/zrepl/util.(IOCommand).Read(0xc4201b2d80, 0xc420790000, 0x8000, 0x8000, 0x800c76d90, 0x0, 0xc420067c10) /go/github.com/zrepl/zrepl/util/iocommand.go:82 +0xe4 github.com/zrepl/zrepl/util.(ByteCounterReader).Read(0xc4202dc580, 0xc420790000, 0x8000, 0x8000, 0x8c6900, 0x7cb201, 0xc420790000) /go/github.com/zrepl/zrepl/util/io.go:118 +0x51 github.com/zrepl/zrepl/vendor/github.com/problame/go-streamrpc.(chunkBuffer).readChunk(0xc42057e3c0, 0x800d1bbf0, 0xc4202dc580, 0xc420790000, 0x8000, 0x8000) /go/github.com/zrepl/zrepl/vendor/github.com/problame/go-streamrpc/stream.go:58 +0x5e github.com/zrepl/zrepl/vendor/github.com/problame/go-streamrpc.writeStream(0xa04620, 0xc4204a9c20, 0x9fe340, 0xc4200d6380, 0x800d1bbf0, 0xc4202dc580, 0x8000, 0xc42000e000, 0x900420) /go/github.com/zrepl/zrepl/vendor/github.com/problame/go-streamrpc/stream.go:101 +0x1ce github.com/zrepl/zrepl/vendor/github.com/problame/go-streamrpc.(Conn).send(0xc4200d6380, 0xa04620, 0xc4204a9c20, 0xc42057e2c0, 0xc42013d570, 0x800d1bbf0, 0xc4202dc580, 0x0, 0x0) /go/github.com/zrepl/zrepl/vendor/github.com/problame/go-streamrpc/main.go:374 +0x557 github.com/zrepl/zrepl/vendor/github.com/problame/go-streamrpc.(Client).RequestReply.func1(0x999741, 0x7, 0xc4200d6380, 0xa04620, 0xc4204a9c20, 0xc42013d570, 0xa00aa0, 0xc4202dc580, 0xc420516480) /go/github.com/zrepl/zrepl/vendor/github.com/problame/go-streamrpc/client.go:169 +0x148 created by github.com/zrepl/zrepl/vendor/github.com/problame/go-streamrpc.(Client).RequestReply /go/github.com/zrepl/zrepl/vendor/github.com/problame/go-streamrpc/client.go:167 +0x227	2018-09-27 12:06:59 +02:00
Christian Schwarz	75e42fd860	pruner: implement Report method + display in status command	2018-09-24 19:25:40 +02:00
Christian Schwarz	2990193512	replication: export SleepUntil in report	2018-09-24 19:23:53 +02:00
Christian Schwarz	75ba5874a5	active side: track activities in Run() as atomically updated member	2018-09-24 19:23:53 +02:00
Christian Schwarz	9e941d5be5	pruning: implement 'grid' keep rule	2018-09-24 17:33:16 +02:00
Christian Schwarz	328ac687f6	Remove obsolete cmd/** package + subpackages	2018-09-24 14:48:12 +02:00
Christian Schwarz	1ce0c69e4f	implement local replication using new local transport. The new local transport uses socketpair() and a switchboard based on client identities. The special local job type is gone, which is good since it does not fit into the 'Active/Passive side' + 'mode' concept used to implement the duality of push/sink | pull/source.	2018-09-24 14:43:53 +02:00
Christian Schwarz	f3e8eda04d	fixup `4e04f8d3d2`: snapper with separate stopped state for clean shutdown would tight loop in ErrorWait	2018-09-24 14:40:47 +02:00
Christian Schwarz	cf5d63ee88	config: fix broken tests + reduce example configs	2018-09-24 12:41:39 +02:00
Christian Schwarz	4e04f8d3d2	snapper: make error mode an error wait mode. Just because taking one snapshot fails does not mean snapper needs to stop for all others. Since users are advised to monitor error logs, snapshot-taking errors can still be addressed. The ErrorWait mode allows a potential future Report / Status command to distinguish normal waits from error waits.	2018-09-24 12:36:10 +02:00
Christian Schwarz	d04b9713c4	implement pull + sink modes for active and passive side	2018-09-24 12:36:10 +02:00
Christian Schwarz	6889f441b2	endpoint: support remote ReplicationCursor endpoint	2018-09-24 12:36:10 +02:00
Christian Schwarz	9c86e03384	endpoint Remote: fix broken Send endpoint for DryRun=true	2018-09-24 12:36:10 +02:00
Christian Schwarz	ffe33aff3d	fix pruner: protobuf one-ofs require non-zero value, even if no public fields	2018-09-24 12:36:10 +02:00
Christian Schwarz	e3be120d88	refactor push + source into active + passive 'sides' with push and source 'modes'	2018-09-24 12:36:10 +02:00
Christian Schwarz	9446b51a1f	status: infra for reporting jobs instead of just replication.Report	2018-09-23 21:11:33 +02:00
Christian Schwarz	4a6160baf3	update to streamrpc 0.4 & adjust config (not breaking)	2018-09-23 20:28:30 +02:00


526 Commits
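
Commit d584e1ac54 above describes a lost-update race: an updater copies the tasks struct under a lock, drops the lock, modifies its copy, and writes the whole copy back. The following is a minimal sketch of that pattern and one possible fix, not zrepl's actual code. The names `tasks`, `job`, `updateRacy`, `updateSafe` and the two boolean fields are hypothetical, and the `between` hook exists only to force the problematic interleaving deterministically.

```go
package main

import (
	"fmt"
	"sync"
)

// tasks is a hypothetical stand-in for the struct named in the commit message.
type tasks struct {
	replicationDone bool
	prunerDone      bool
}

// job guards its tasks with a mutex (assumed layout, for illustration only).
type job struct {
	mu    sync.Mutex
	tasks tasks
}

// updateRacy copies tasks under the lock, drops the lock, applies u to the
// copy, then writes the whole copy back. between runs while the lock is
// dropped and simulates a second updater getting in.
func (j *job) updateRacy(u func(*tasks), between func()) {
	j.mu.Lock()
	cp := j.tasks
	j.mu.Unlock()

	if between != nil {
		between()
	}
	u(&cp)

	j.mu.Lock()
	j.tasks = cp // overwrites anything written since the copy was taken
	j.mu.Unlock()
}

// updateSafe applies u while holding the lock, so no update can be lost.
func (j *job) updateSafe(u func(*tasks)) {
	j.mu.Lock()
	defer j.mu.Unlock()
	u(&j.tasks)
}

// lostUpdateDemo forces the interleaving from the commit message: the two
// updaters touch different fields, yet the inner update is lost.
func lostUpdateDemo() tasks {
	j := &job{}
	j.updateRacy(
		func(t *tasks) { t.replicationDone = true },
		func() {
			j.updateRacy(func(t *tasks) { t.prunerDone = true }, nil)
		},
	)
	return j.tasks
}

// safeUpdateDemo runs two concurrent updaters through updateSafe.
func safeUpdateDemo() tasks {
	j := &job{}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); j.updateSafe(func(t *tasks) { t.replicationDone = true }) }()
	go func() { defer wg.Done(); j.updateSafe(func(t *tasks) { t.prunerDone = true }) }()
	wg.Wait()
	return j.tasks
}

func main() {
	fmt.Printf("racy: %+v\n", lostUpdateDemo())
	fmt.Printf("safe: %+v\n", safeUpdateDemo())
}
```

In the racy variant the outer updater writes back a copy taken before the inner update, so `prunerDone` ends up false even though it was set. Holding the lock across the modification is one way to avoid this; whether the commit used this approach or another is not stated in the log.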