26ec29d8b2
snapper: retry on errors during syncUp and log them
...
fixes #138
2019-03-21 17:17:10 +01:00
c0028c1c44
daemon/logging: add replication logic logger
2019-03-21 17:03:34 +01:00
7756c9a55c
config + job: forbid overlapping receiver root_fs
...
refs #136
refs #140
2019-03-21 12:07:55 +01:00
6f7467e8d8
Merge branch 'InsanePrawn-master' into 'master'
2019-03-20 19:45:00 +01:00
5aefc47f71
daemon: remove last traces of watchdog mechanism
2019-03-19 18:15:34 +01:00
158d1175e3
rename SinglePruner to LocalPruner
2019-03-17 21:18:25 +01:00
b25da7b9b0
job: snap: comment fix
2019-03-17 21:07:42 +01:00
5cd2593f52
job: snap: workaround for replication cursor requirement
2019-03-17 21:07:01 +01:00
d8d9e34914
pruner: single: remove unused member considerSnapAtCursorReplicated
2019-03-17 20:57:34 +01:00
17818439a0
Merge branch 'problame/replication_refactor' into InsanePrawn-master
2019-03-17 17:33:51 +01:00
da3ba50a2c
Merge remote-tracking branch 'origin/master' into problame/replication_refactor
2019-03-16 14:48:01 +01:00
4ee00091d6
pull job: support manual-only invocation
2019-03-16 14:24:05 +01:00
aff639e87a
Merge remote-tracking branch 'origin/master' into InsanePrawn-master
2019-03-15 21:05:20 +01:00
a0f301d700
syslog logging: fix priority parsing + add test for default facility
2019-03-15 18:18:16 +01:00
fc311a9fd6
syslog logging: support setting facility in config
2019-03-15 17:55:11 +01:00
7584c66bdb
pruner: remove retry handling + fix early give-up
...
Retry handling is broken since the gRPC changes (wrong error classification).
Will come back at some point, hopefully by merging the replication
driver retry infrastructure.
However, the simpler architecture allows an easy fix for the problem
that the pruner practically gave up on the first error it encountered.
fixes #123
2019-03-13 21:04:39 +01:00
d78d20e2d0
pruner: skip placeholders + FSes without correspondents on source
...
fixes #126
2019-03-13 20:42:37 +01:00
c87759affe
replication/driver: automatic retries on connectivity-related errors
2019-03-13 15:00:40 +01:00
07b43bffa4
replication: refactor driving logic (no more explicit state machine)
2019-03-13 15:00:40 +01:00
796c5ad42d
rpc rewrite: control RPCs using gRPC + separate RPC for data transfer
...
transport/ssh: update go-netssh to new version
=> supports CloseWrite and Deadlines
=> build: require Go 1.11 (netssh requires it)
2019-03-13 13:53:48 +01:00
160a3b6d32
more gofmt, drop snapjob.go_prefmt after it was accidentally added
2018-11-21 22:14:43 +01:00
3cef76d463
Refactor snapJob() to snapJobFromConfig()
2018-11-21 14:37:03 +01:00
e9564a7e5c
Inlined a couple legacy leftover functions from the mode copypasta
2018-11-21 14:35:40 +01:00
b79ad3ddc3
Honour PruneKeepNotReplicated.KeepSnapshotAtCursor in SinglePrunerFactory
2018-11-21 14:17:38 +01:00
d0f898751f
Gofmt snapjob.go
2018-11-21 14:02:21 +01:00
22d9830baa
Fix prometheus with multiple jobs
2018-11-21 04:26:03 +01:00
e10dc129de
Make getPruner() private
2018-11-21 03:39:03 +01:00
dd11fc96db
Touchups in job.go
2018-11-21 03:27:39 +01:00
7de3c0a09a
Removed the references to a pruning 'side' in the singlepruner logging code and the snapjob Prometheus metrics
2018-11-21 02:52:33 +01:00
141e49727c
Missed a last reference to tasks
2018-11-21 02:51:23 +01:00
442d61918b
remove most of the watchdog machinery
2018-11-21 02:42:13 +01:00
58dcc07430
Added SnapJobStatus
2018-11-21 02:08:39 +01:00
19d0916e34
remove snapMode, rename snap_ActiveSide to SnapJob
2018-11-21 01:54:56 +01:00
1265cc7934
pruned unused lines and comments ;)
2018-11-21 01:34:50 +01:00
3d2688e959
Ugly but working initial snapjob implementation
2018-11-20 19:30:15 +01:00
3472145df6
pruner + proto change: better handling of missing replication cursor
...
- don't treat missing replication cursor as an error in protocol
- treat it as a per-fs planning error instead
2018-11-16 12:21:54 +01:00
5e1ea21f85
pruning: add 'Negate' option to KeepRegex and expose it in config
2018-11-16 12:21:54 +01:00
98bc8d1717
daemon/job: explicit notice of ZREPL_JOB_WATCHDOG_TIMEOUT environment variable on cancellation
2018-10-22 11:03:31 +02:00
94427d334b
replication + pruner + watchdog: adjust timeouts based on practical experience
2018-10-21 18:37:57 +02:00
b2844569c8
replication: rewrite error handling + simplify state machines
...
* Remove explicit state machine code for all but replication.Replication
* Introduce explicit error types that satisfy interfaces which provide
sufficient information for replication.Replication to make intelligent
retry + queuing decisions
* Temporary()
* LocalToFS()
* Remove the queue and replace it with a simple array that we sort each
time (yay no generics :( )
2018-10-21 18:37:57 +02:00
fffda09f67
replication + pruner: progress markers during planning
2018-10-21 17:50:08 +02:00
5ec7a5c078
pruner: report: fix broken checks for state (wrong precedence rules)
2018-10-21 13:37:08 +02:00
190c7270d9
daemon/active + watchdog: simplify control flow using explicit ActiveSideState
2018-10-21 12:53:34 +02:00
f704b28cad
daemon/job: track active side state explicitly
2018-10-21 12:52:48 +02:00
5efeec1819
daemon/control: stop logging status endpoint requests
2018-10-20 12:50:31 +02:00
438f950be3
pruner: improve cancellation + error handling strategy
...
Pruner now backs off as soon as there is an error, making that error the
Error field in the pruner report.
The error is also stored in the specific *fs that failed, and we
maintain an error counter per *fs to de-prioritize those fs that failed.
Like with replication, the de-prioritization on errors is to avoid
'getting stuck' with an individual filesystem until the watchdog hits.
2018-10-20 12:46:43 +02:00
50c1549865
pruner: fixup 69bfcb7bed: add missing progress updates for watchdog
2018-10-20 10:58:22 +02:00
f535b2327f
pruner: use envconst to configure retry interval
2018-10-19 17:23:00 +02:00
e63ac7d1bb
pruner: log transitions to error state + log info to confirm pruning is done in active job
2018-10-19 17:23:00 +02:00
359ab2ca0c
pruner: fail on every error that is not net.OpError.Temporary()
2018-10-19 17:23:00 +02:00