zrepl/daemon
Christian Schwarz 69bfcb7bed daemon/active: implement watchdog to handle stuck replication / pruners
ActiveSide.do() can only run sequentially, i.e. we cannot run
replication and pruning in parallel. Why?

* go-streamrpc only allows one active request at a time
(this is bad design and should be fixed at some point)
* replication and pruning are implemented independently, but work on the
same resources (snapshots)

A: pruning might destroy a snapshot that is planned to be replicated
B: replication might replicate snapshots that should be pruned

We do not have any resource management / locking for A and B, but we
have a use case where users don't want their machine to fill up with
snapshots if replication does not work.
That means we _have_ to run the pruners.

A further complication is that we cannot just cancel the replication
context after a timeout and move on to the pruner: it could be initial
replication and we don't know how long it will take.
(And we don't have resumable send & recv yet).

With the previous commits, we can implement the watchdog using context
cancellation.
Note that the 'MadeProgress()' calls can only be placed right before a
non-error state transition. Otherwise, we could end up in a live-lock.
2018-10-19 17:23:00 +02:00
filters          filters: fix broken error message  2018-10-13 17:17:34 +02:00
job              daemon/active: implement watchdog to handle stuck replication / pruners  2018-10-19 17:23:00 +02:00
logging          move serve and connecter into transports package  2018-10-11 21:21:46 +02:00
nethelpers       WIP rewrite the daemon  2018-08-27 22:22:44 +02:00
pruner           daemon/active: implement watchdog to handle stuck replication / pruners  2018-10-19 17:23:00 +02:00
snapper          snapshotting: support 'periodic' and 'manual' mode  2018-10-11 15:59:23 +02:00
streamrpcconfig  update to streamrpc 0.4 & adjust config (not breaking)  2018-09-23 20:28:30 +02:00
transport        transport/tlsclientauth: handle cancellation of dialCtx  2018-10-19 16:08:20 +02:00
control.go       move wakeup subcommand into signal subcommand and add reset subcommand  2018-10-12 20:50:56 +02:00
daemon.go        move wakeup subcommand into signal subcommand and add reset subcommand  2018-10-12 20:50:56 +02:00
main.go          cli: refactor to allow definition of subcommands next to their implementation  2018-10-13 16:22:19 +02:00
pprof.go         privatize pprofServer  2018-08-27 19:13:35 +02:00
prometheus.go    status: infra for reporting jobs instead of just replication.Report  2018-09-23 21:11:33 +02:00