Commit Graph

83 Commits

Author SHA1 Message Date
Christian Schwarz
69bfcb7bed daemon/active: implement watchdog to handle stuck replication / pruners
ActiveSide.do() can only run sequentially, i.e. we cannot run
replication and pruning in parallel. Why?

* go-streamrpc only allows one active request at a time
(this is bad design and should be fixed at some point)
* replication and pruning are implemented independently, but work on the
same resources (snapshots)

A: pruning might destroy a snapshot that is planned to be replicated
B: replication might replicate snapshots that should be pruned

We do not have any resource management / locking for A and B, but we
have a use case where users don't want their machine to fill up with
snapshots if replication does not work.
That means we _have_ to run the pruners.

A further complication is that we cannot just cancel the replication
context after a timeout and move on to the pruner: it could be initial
replication and we don't know how long it will take.
(And we don't have resumable send & recv yet).

With the previous commits, we can implement the watchdog using context
cancellation.
Note that the 'MadeProgress()' calls can only be placed right before a
non-error state transition. Otherwise, we could end up in a live-lock.
2018-10-19 17:23:00 +02:00
Christian Schwarz
4ede99b08c replication: simpler PermanentError state + handle context cancellation 2018-10-19 17:23:00 +02:00
Christian Schwarz
53ac853cb4 client/configcheck: build jobs for checking config and allow selecting what to print 2018-10-18 16:35:29 +02:00
Christian Schwarz
fb6f58b735 client/status: switch to package tcell which works with Solaris
Can't cross compile Solaris binaries though:
tcell for Solaris needs cgo.
2018-10-13 16:57:05 +02:00
Christian Schwarz
63169c51b7 add 'test filesystems' subcommand for testing filesystem filters 2018-10-13 16:22:19 +02:00
Christian Schwarz
5c3c83b2cb cli: refactor to allow definition of subcommands next to their implementation 2018-10-13 16:22:19 +02:00
Christian Schwarz
a85abe8bae client/status: improve hiding of data if current state makes it obsolete 2018-10-12 22:47:06 +02:00
Christian Schwarz
af3d96dab8 use enumer generate tool for state strings 2018-10-12 22:10:49 +02:00
Christian Schwarz
89e0103abd move wakeup subcommand into signal subcommand and add reset subcommand 2018-10-12 20:50:56 +02:00
Christian Schwarz
025fbda984 client/status: only show progress bar in non-planning states 2018-10-12 16:00:37 +02:00
Christian Schwarz
75e42fd860 pruner: implement Report method + display in status command 2018-09-24 19:25:40 +02:00
Christian Schwarz
d04b9713c4 implement pull + sink modes for active and passive side 2018-09-24 12:36:10 +02:00
Christian Schwarz
e3be120d88 refactor push + source into active + passive 'sides' with push and source 'modes' 2018-09-24 12:36:10 +02:00
Christian Schwarz
9446b51a1f status: infra for reporting jobs instead of just replication.Report 2018-09-23 21:11:33 +02:00
Christian Schwarz
9dd662df08 status: raw output subcommand 2018-09-23 14:44:53 +02:00
Christian Schwarz
6c3f442f13 daemon control / jsonclient: fix connection leak due to open request body
Also:
- Defensive measures in control http server (1s timeouts)
(prevent the leak, even if request body is not closed)
- Add prometheus metrics to track control socket latencies
(were used for debugging)
2018-09-13 12:44:46 +02:00
Christian Schwarz
0c2ac3a168 pprof subcommand 2018-09-07 00:04:03 -07:00
Christian Schwarz
bf5099baac version subcommand: unified client & server 2018-09-06 23:52:11 -07:00
Christian Schwarz
1edf020ce7 status command: better handling of 'nothing to do' Complete state 2018-09-06 11:46:02 -07:00
Christian Schwarz
c60ed78bc5 status subcommand: only draw one big progress bar of the entire replication
more details on progress per step in text form
2018-09-06 11:05:32 -07:00
Christian Schwarz
acd2418803 handle DryRun send size estimate errors with bookmarks 2018-09-05 17:41:25 -07:00
Christian Schwarz
6c988d0ebb add small subcommand to validate config 2018-09-05 08:32:38 -07:00
Christian Schwarz
308e5e35fb Multi-client servers + bring back stdinserver support 2018-09-04 16:43:55 -07:00
Christian Schwarz
d55a271ac7 WIP adopt updated yaml-config with 'fromdefaults' struct tag 2018-09-02 15:46:03 -07:00
Anton Schirg
5442d8e7d5 status: calculate max fs name length 2018-08-30 15:21:07 +02:00
Anton Schirg
48feaff054 fix some status display alignment 2018-08-30 15:21:07 +02:00
Anton Schirg
47d8a5a7cd status: only show active not all versions of active filesystem 2018-08-30 12:58:13 +02:00
Anton Schirg
583773025f nicer progress bar 2018-08-30 12:58:13 +02:00
Anton Schirg
98f3f3dfd8 show expected size of current send
Needs to be changed to send sizes for all planned steps
2018-08-30 12:58:13 +02:00
Anton Schirg
6ca11a7391 byte counter for status 2018-08-30 12:54:30 +02:00
Anton Schirg
42056f7a32 status: do not show problem field when none exists 2018-08-30 12:54:30 +02:00
Anton Schirg
6cedd0a2e8 add status command 2018-08-30 12:54:30 +02:00
Anton Schirg
e495824834 move wakeup to client package and extract http client 2018-08-30 12:53:21 +02:00