Commit Graph

233 Commits

Author SHA1 Message Date
Christian Schwarz
7303d91abf WIP state-machine based replication 2018-08-11 12:19:10 +02:00
Christian Schwarz
c1f3076eb3 WIP2 logging done somewhat 2018-08-10 17:06:00 +02:00
Christian Schwarz
74445a0017 fixup 2018-08-08 13:12:50 +02:00
Christian Schwarz
a0b320bfeb streamrpc now requires net.Conn => use it instead of rwc everywhere 2018-08-08 13:09:51 +02:00
Christian Schwarz
1826535e6f WIP 2018-07-15 17:36:53 +02:00
Christian Schwarz
1a8d2c5ebe replication: context support and proper closing of stale readers 2018-07-08 23:31:46 +02:00
Christian Schwarz
8cca0a8547 Initial working version
Summary:
* Logging is still bad
* Test output in a lot of places
* FIXMEs everywhere

Test Plan: None, just review

Differential Revision: https://phabricator.cschwarz.com/D2
2018-06-24 10:44:00 +02:00
Christian Schwarz
0918ef6815 WIP: diffing and replication algorithm 2018-05-02 21:26:24 +02:00
Christian Schwarz
9d7110eaad config: fix shadowed error return values 2018-04-14 11:25:12 +02:00
Christian Schwarz
82ea535692 daemon: expose prometheus in new global.monitoring config section + document it
refs #67
2018-04-14 11:24:47 +02:00
Christian Schwarz
a4da029105 cmd: prometheus job type and Task instrumentation
refs #67
2018-04-13 23:37:53 +02:00
Christian Schwarz
aa3865d0a3 daemon: Job types as dedicated type
refs #67
2018-04-05 22:22:55 +02:00
Christian Schwarz
0895e02844 daemon: Task: track relation to parent job
refs #67
2018-04-05 22:18:22 +02:00
Christian Schwarz
26b436463d ssh+stdinserver: connect: dial_timeout
This is a follow-up to ccd062e
2018-03-04 17:19:41 +01:00
Christian Schwarz
aa92261ea7 bookmarking: prune policy for bookmarks
refs #34
2018-02-17 20:48:31 +01:00
Christian Schwarz
8e34843eb1 autosnap: do not treat zero fs filter results as fatal 2018-02-17 19:27:00 +01:00
Christian Schwarz
bfaf6fdfbb daemon: fix missing newline on parse error 2018-02-17 17:43:55 +01:00
Christian Schwarz
f992fed968 control pprof rewrite: expose pprof metrics via HTTP server controlled from CLI 2018-02-17 16:20:10 +01:00
Christian Schwarz
94967b596c docs: document changes to ssh+stdinserver transport implementation: ccd062e 2018-02-17 15:16:29 +01:00
Christian Schwarz
f3d3a7f5f8 stdinserver: fixup ccd062e: assert socket is in private directory 2018-02-17 14:12:44 +01:00
Christian Schwarz
ccd062e238 ssh+stdinserver: dump sshbytestream for github.com/problame/go-netssh
Cleaner abstractions + underlying go-rwccmd package does proper handling
of asynchronous exits, etc.
2018-02-17 01:08:15 +01:00
Christian Schwarz
6b5bd0a43c job pull + source: fix broken connection teardown
Issue #56 shows zombie SSH processes.
We fix this by actually Close()ing the RWC in job pull.
If this fixes #56 it also fixes #6 --- it's the same issue.

Additionally, debugging around this revealed another issue: just
Close()ing the sshbytestream in job source will apparently outpace the
normal data stream of stdin and stdout (URG or PUSH flags?), leading
to ugly errors in the logs.
With proper TCP connections, we would simply set the connection to
linger and close it, letting the kernel handle the final timeout. Meh.

refs #56
refs #6
2018-02-16 20:57:27 +01:00
Christian Schwarz
921bccb960 job source: use task logger 2018-02-15 23:51:57 +01:00
Christian Schwarz
5f2c14adab zfs: use custom datatype to pass ZFS properties in ZFSSet
refs #55
2018-01-05 18:42:10 +01:00
Christian Schwarz
787675aee8 control status command: only show verbose logs on user request 2017-12-30 13:53:19 +01:00
Christian Schwarz
01e0519b7b control status subcommand: fix typo in usage 2017-12-30 13:44:55 +01:00
Christian Schwarz
8742b7f763 handler: fix typo in log message 2017-12-30 13:29:04 +01:00
Christian Schwarz
56f13741f9 test pattern subcommand: better example command 2017-12-29 22:45:38 +01:00
Christian Schwarz
61842988b9 Task & TaskStatus: DeepCopy(): actually copy lastUpdate field
Otherwise, only changes to the activity level would update the TaskStatus
LastUpdate field.

refs #10
2017-12-29 21:43:12 +01:00
Christian Schwarz
be7176bee7 Puller: fix wrong filesystem log field usage
was introduced in 9465b593
2017-12-29 21:25:42 +01:00
Christian Schwarz
839eccf513 logger.Outlet: WriteEntry must not block
- make TCPOutlet fully asynchronous, dropping messages if connection is
  not fast enough
- syslog is just fine for now, it's local anyway
- stdout same thing

refs #26
2017-12-29 17:21:58 +01:00
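The "WriteEntry must not block" rule above can be met with a bounded queue and a non-blocking send. The following is a minimal sketch of that pattern, not zrepl's actual TCPOutlet: the names Entry, AsyncOutlet, NewAsyncOutlet and Dropped are hypothetical, and WriteEntry returns a bool here purely for illustration.

```go
package main

import "sync/atomic"

// Entry is a hypothetical stand-in for a log entry.
type Entry struct {
	Message string
}

// AsyncOutlet is a hypothetical outlet whose WriteEntry never blocks: entries
// go into a bounded queue drained by a background goroutine and are dropped
// when the queue is full (e.g. because the TCP connection is too slow).
type AsyncOutlet struct {
	dropped uint64 // first field: keeps the 64-bit atomic aligned on 32-bit platforms
	queue   chan Entry
}

// NewAsyncOutlet starts a goroutine that hands queued entries to sink,
// which is allowed to block (for example on a network write).
func NewAsyncOutlet(queueLen int, sink func(Entry)) *AsyncOutlet {
	o := &AsyncOutlet{queue: make(chan Entry, queueLen)}
	go func() {
		for e := range o.queue {
			sink(e)
		}
	}()
	return o
}

// WriteEntry enqueues e without blocking and reports whether it was accepted.
func (o *AsyncOutlet) WriteEntry(e Entry) bool {
	select {
	case o.queue <- e:
		return true
	default:
		atomic.AddUint64(&o.dropped, 1)
		return false
	}
}

// Dropped returns the number of entries discarded so far.
func (o *AsyncOutlet) Dropped() uint64 {
	return atomic.LoadUint64(&o.dropped)
}
```

The trade-off is explicit: under back-pressure, log entries are lost rather than stalling the replication code that emits them.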
Christian Schwarz
acd9aedb98 cmd control status: unify job logs, option to show only one job & always show logs
refs #10
2017-12-27 18:34:24 +01:00
Christian Schwarz
835cf6b12f cmd control status: warn about inactive tasks
refs #10
2017-12-27 18:34:24 +01:00
Christian Schwarz
4b3d83ec1f TaskStatus: add LastUpdate field
refs #10
2017-12-27 18:34:24 +01:00
Christian Schwarz
d13c6e3fc3 job local: refactor + use Task API
refs #10
2017-12-27 18:34:24 +01:00
Christian Schwarz
63fa7a67e9 job source: refactor + use Task API
refs #10
2017-12-27 18:34:24 +01:00
Christian Schwarz
7d89d1fb00 job pull: refactor + use Task API
refs #10
2017-12-27 18:34:24 +01:00
Christian Schwarz
b69089a527 Puller: refactor + use Task API
* drop rx byte count functionality
* will be re-added to Task as necessary

refs #10
2017-12-27 14:39:47 +01:00
Christian Schwarz
59e34942d1 Puller: make main interface public
refs #10
2017-12-27 14:39:46 +01:00
Christian Schwarz
91c4a97f72 Pruner: refactor + use Task API
refs #10
2017-12-27 14:39:46 +01:00
Christian Schwarz
13562b48ed IntervalAutosnap: refactor + use Task API
refs #10
2017-12-27 14:39:46 +01:00
Christian Schwarz
58ee796394 adopt Task API: infect datastructures
refs #10
2017-12-27 14:39:46 +01:00
Christian Schwarz
ce351146cf job control: implement JobStatus 2017-12-27 14:39:46 +01:00
Christian Schwarz
14b8d69a63 cmd control status + expose DaemonStatus via control API
refs #10
2017-12-27 14:39:46 +01:00
Christian Schwarz
8c7e373049 daemon: DaemonStatus + JobStatus + dummy implementation
refs #10
2017-12-27 14:39:46 +01:00
Christian Schwarz
2c87b15e83 daemon: Task abstraction + TaskStatus
An instance of Task tracks a single thread of activity that is part of a Job.

While the docs already use this terminology of jobs being composed of tasks,
the code did not have an object to represent these semantics.
Now it does:

* A task t is initialized with a root activity, which is its name
* t can t.Enter() and t.Finish() an activity, building
  a stack of activities
* t's code can get a logger t.Log() whose logTaskField is set to the
  concatenated stack of activities
* t's code can update IO progress it made since leaving idle state
* t's code's log output via t.Log() is captured since leaving idle
  state
  * FIXME: find a way to bound that buffer

refs #10
refs #48
2017-12-27 14:39:46 +01:00
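The activity-stack part of the Task abstraction described above can be sketched as follows. This is a simplified, hypothetical model: it covers only Enter/Finish and the concatenated task field, omits IO progress and log capture, and assumes "." as the separator between activities.

```go
package main

import "strings"

// Task is a hypothetical, simplified model of the Task abstraction:
// a root activity (the task's name) plus a stack of nested activities.
type Task struct {
	activities []string
}

// NewTask initializes a task with its root activity.
func NewTask(name string) *Task {
	return &Task{activities: []string{name}}
}

// Enter pushes a nested activity onto the stack.
func (t *Task) Enter(activity string) {
	t.activities = append(t.activities, activity)
}

// Finish pops the innermost activity; the root activity is never popped.
func (t *Task) Finish() {
	if len(t.activities) > 1 {
		t.activities = t.activities[:len(t.activities)-1]
	}
}

// LogTaskField returns the concatenated activity stack, i.e. the value a
// logger obtained via t.Log() would carry in its task field.
func (t *Task) LogTaskField() string {
	return strings.Join(t.activities, ".")
}
```

Deriving the field from the stack at log time means nested code never has to thread its own prefix through to the logger.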
Christian Schwarz
d7f3fb93ae bash completions: hidden subcommand + integrate into Makefile 2017-12-27 14:39:46 +01:00
Christian Schwarz
ebf209427a logging: support ignoring fields in HumanFormatter
should be refactored into the logger one day so that the implementation of
ignoring is not duplicated in each outlet.

refs #10
2017-12-27 14:39:46 +01:00
Christian Schwarz
261d095108 logger: support forking of outlets
refs #10
2017-12-27 13:50:07 +01:00
Christian Schwarz
583a63a68f refactor: encapsulate pulling in a struct
refs #10
2017-12-24 15:23:28 +01:00
Christian Schwarz
896f31bbf3 'zrepl version' and 'zrepl control version' subcommand + maintainer README
Version is autodetected on build using git.
If it cannot be detected with git, an override must be provided.

For traceability of distros, distro packagers should override it as
well, which is why I added a README entry for package maintainers.

refs #35
2017-11-18 21:12:48 +01:00
Christian Schwarz
bfbab9382e fixup: remove unused StdoutOutlet function
refs #28
2017-11-17 00:36:48 +01:00
Christian Schwarz
2bfcfa5be8 logging: first outlet receives logger error message
Abandons stderr special-casing:

* looks weird on shell and IO redirection to same file because of
interleaving of stdout and stderr
* better than a separate dedicated outlet because it does not require
additional configuration

fixes #28

BREAK SEMANTICS CONFIG
2017-11-17 00:25:38 +01:00
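Routing logger errors to the first outlet, as described above, might look like the sketch below. It is an illustration under assumptions, not zrepl's logger: Logger, Outlet, the error message format and the two toy outlets are hypothetical, and the report to the first outlet is best effort (its own error is ignored to avoid recursion).

```go
package main

import (
	"errors"
	"fmt"
)

// Entry is a hypothetical stand-in for a log entry.
type Entry struct {
	Message string
}

// Outlet is a hypothetical sink for log entries.
type Outlet interface {
	WriteEntry(e Entry) error
}

// Logger writes each entry to all outlets. Instead of special-casing stderr,
// failures of any outlet are reported on the first configured outlet.
type Logger struct {
	outlets []Outlet
}

func (l *Logger) Log(msg string) {
	e := Entry{Message: msg}
	for i, o := range l.outlets {
		if err := o.WriteEntry(e); err != nil {
			// Best effort: the result of the error report itself is ignored.
			_ = l.outlets[0].WriteEntry(Entry{Message: fmt.Sprintf("outlet %d: %v", i, err)})
		}
	}
}

// recordingOutlet and failingOutlet are toy outlets for illustration.
type recordingOutlet struct{ entries []Entry }

func (r *recordingOutlet) WriteEntry(e Entry) error {
	r.entries = append(r.entries, e)
	return nil
}

type failingOutlet struct{}

func (failingOutlet) WriteEntry(Entry) error { return errors.New("connection refused") }
```

No extra configuration is needed: whatever outlet the user lists first also receives the logger's own diagnostics.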
Christian Schwarz
a7f70a566d logger: write internal / outlet errors to an error outlet
refs #28
2017-11-16 23:49:47 +01:00
Christian Schwarz
b576253ea8 logging: fixup 4763486: implementation would parse 'date' instead of 'time' field in config 2017-11-15 11:14:20 +01:00
Christian Schwarz
476348689a logging: stdout outlet: include time in output if tty or forced through config 2017-11-15 11:04:34 +01:00
Christian Schwarz
ed68bffea5 bookmark every snapshot
replication logic already supports bookmarks \o/

refs #34
2017-11-13 10:59:46 +01:00
Christian Schwarz
51af880701 refactor: parametrize PrefixFilter VersionType check
refs #34
2017-11-13 10:59:22 +01:00
Christian Schwarz
cef63ac176 logging: stdout formatter: use logfmt package to format non-special stdout fields + handle errors
refs #40
2017-11-13 10:58:07 +01:00
Christian Schwarz
f3433df617 cmd/sampleconf/zrep.yml: remove it, it's from the stone ages 2017-10-05 21:48:18 +02:00
Christian Schwarz
161ce3b3c3 autosnap: fix log level when fs filter does not match any fs 2017-10-05 21:22:17 +02:00
Christian Schwarz
83bb97a845 control job: wrong error on context done 2017-10-05 21:20:01 +02:00
Christian Schwarz
40919d06c2 source job: fix erroneous log message when calling accept() on a closed listener 2017-10-05 21:19:42 +02:00
Christian Schwarz
c48069ce88 retention grid: interval length monotonicity: exception for keep=all
fixes #6
2017-10-05 20:34:35 +02:00
Christian Schwarz
72d288567e mappings: fix aliasing bug with '<' wildcards
In contrast to any 'something<' mapping, a '<' mapping cannot be unique.
Thus, '<' mappings are just an append to the target, which is exactly
what we get when trimming the empty prefix ''.

Otherwise, given the mapping

{ "<": "storage/backups/app-srv" }

Before (clearly a conflict):
zroot     => storage/backups/app-srv
storage   => storage/backups/app-srv
After:
zroot     => storage/backups/app-srv/zroot
storage   => storage/backups/app-srv/storage

However, mapping a subtree wildcard directly is still possible, just
not with the root wildcard:

{
    "<":             "storage/backups/app-srv",
    "zroot/var/db<": "storage/db_replication/app-srv"
}

fixes #22
2017-10-05 20:10:05 +02:00
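The fixed '<' semantics can be sketched as "trim the matched prefix, append the remainder to the target". This is a hypothetical illustration, not zrepl's mapping code: mapFilesystem and matchesSubtree are invented names, only subtree wildcards are handled, and the rule that the longest matching prefix wins is an assumption of the sketch.

```go
package main

import "strings"

// mapFilesystem maps fs using subtree-wildcard patterns of the form "prefix<".
func mapFilesystem(mappings map[string]string, fs string) (string, bool) {
	var bestPrefix, bestTarget string
	found := false
	for pattern, target := range mappings {
		if !strings.HasSuffix(pattern, "<") {
			continue // only subtree wildcards are handled in this sketch
		}
		prefix := strings.TrimSuffix(pattern, "<")
		if !matchesSubtree(prefix, fs) {
			continue
		}
		// Assumption: the most specific (longest) matching prefix wins.
		if !found || len(prefix) > len(bestPrefix) {
			bestPrefix, bestTarget, found = prefix, target, true
		}
	}
	if !found {
		return "", false
	}
	// Trim the matched prefix and append the remainder to the target.
	// For the root wildcard the prefix is "", so the whole name is appended.
	rest := strings.TrimPrefix(strings.TrimPrefix(fs, bestPrefix), "/")
	if rest == "" {
		return bestTarget, true
	}
	return bestTarget + "/" + rest, true
}

// matchesSubtree reports whether fs is prefix itself or lies below it.
// The empty prefix (root wildcard) matches every filesystem.
func matchesSubtree(prefix, fs string) bool {
	if prefix == "" {
		return true
	}
	return fs == prefix || strings.HasPrefix(fs, prefix+"/")
}
```

With the mapping from the example, zroot maps to storage/backups/app-srv/zroot and zroot/var/db/mysql maps to storage/db_replication/app-srv/mysql; matching on "/" boundaries keeps zroot/var/dbx out of the zroot/var/db subtree.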
Christian Schwarz
b5d46e2ec3 impl: don't reference m.entries again 2017-10-05 18:55:02 +02:00
Christian Schwarz
83d450b1f2 config: support days (d) and weeks (w) in durations
fixes #18
2017-10-05 15:17:37 +02:00
Christian Schwarz
3e647c14c0 config: source job: rename field 'datasets' to 'filesystems'
While filesystems is also not the right term (since it excludes ZVOLs),
we want to stay consistent with comments & terminology used in docs.

BREAK CONFIG

fixes #17
2017-10-05 13:39:05 +02:00
Christian Schwarz
b95260f4b5 config: logging: defaults + definition as list
* Stdout logger as default logger
* Clearer keyword / value separation
* Allows multiple outlet definitions

BREAK CONFIG

fixes #20
fixes #19
2017-10-05 13:31:16 +02:00
Christian Schwarz
e6d08149ef docs: update 'mapping & filter syntax' + more elaborate sampleconf 2017-10-02 18:29:58 +02:00
Christian Schwarz
45670a7e5d make vet happy: 'don't leak contexts' 2017-09-30 16:39:52 +02:00
Christian Schwarz
aab43af27c tcp outlet: fix error handling on write failure
Also: clarify semantics of RetryInterval
2017-09-30 16:38:48 +02:00
Christian Schwarz
0cbee78b40 fix unreachable code & missing stringer-generated code 2017-09-30 16:31:55 +02:00
Christian Schwarz
03955196a9 cmd: config: build identity map
not necessary with one cert but good practice
2017-09-24 16:25:41 +02:00
Christian Schwarz
54b391f77c tcp outlet: add newline after each entry
otherwise tools like Graylog don't parse it
2017-09-24 16:24:43 +02:00
Christian Schwarz
c1a5b04065 TLS support for TCP logger 2017-09-24 14:34:50 +02:00
Christian Schwarz
d5df354e64 sampleconf for supported logging 2017-09-24 02:10:29 +02:00
Christian Schwarz
fae34f5927 implement logfmt formatter 2017-09-24 02:09:50 +02:00
Christian Schwarz
c4c38d5b23 add syslog outlet 2017-09-24 02:05:41 +02:00
Christian Schwarz
e0e362c4ff dump logrus and roll our own logger instead 2017-09-24 00:57:52 +02:00
Christian Schwarz
c31ec8c646 convert more code to structured logging 2017-09-23 17:52:29 +02:00
Christian Schwarz
83edcb3889 experimental TCP hook for logrus 2017-09-23 12:58:13 +02:00
Christian Schwarz
9465b593f9 cmd: configurable logrus formatters
We lost the nice context-stack [jobname][taskname][...] at the beginning
of each log line when switching to logrus.

Define some field names that define these contexts.
Write a human-friendly formatter that presents these field names like
the solution we had before logrus.

Write some other formatters for logfmt and json output along the way.

Limit ourselves to stdout logging for now.
2017-09-23 11:24:36 +02:00
Christian Schwarz
3ff9e6d2f7 structured logging for control job 2017-09-23 11:07:08 +02:00
Christian Schwarz
bfcba7b281 cmd: logging using logrus 2017-09-22 17:01:54 +02:00
Christian Schwarz
a459f0a0f6 go-yaml: direct dependency on github repo 2017-09-22 15:29:54 +02:00
Christian Schwarz
e87ce3f7cf cmd: no context + logging for config parsing 2017-09-22 14:13:30 +02:00
Christian Schwarz
458c28e1d0 cmd: UNIX sockets: try to autoremove stale sockets 2017-09-18 00:16:28 +02:00
Christian Schwarz
eaed271a00 cmd: config: remove annoying parser logs 2017-09-18 00:16:28 +02:00
Christian Schwarz
3eaba92025 cmd: introduce control socket & subcommand
Move pprof debugging there.
2017-09-18 00:16:28 +02:00
Christian Schwarz
aea62a9d85 cmd: extract listening on a UNIX socket in a private directory into a helper func 2017-09-17 23:41:51 +02:00
Christian Schwarz
1a62d635a6 cmd: test: would always run testCmdGlobalInit 2017-09-17 23:40:40 +02:00
Christian Schwarz
9cd83399d3 cmd: remove global state in main.go
* refactoring
* Now supporting default config locations
2017-09-17 18:32:00 +02:00
Christian Schwarz
4ac7e78e2b cmd: config: was using wrong reference to config 2017-09-17 17:45:02 +02:00
Christian Schwarz
71650819d3 cmd: remove stderrFile option 2017-09-17 17:25:24 +02:00
Christian Schwarz
6a05e101cf WIP daemon:
Implement
* pruning on source side
* local job
* test subcommand for doing a dry-run of a prune policy

* use a non-blocking callback from autosnap to trigger the depending
jobs -> avoids races, looks saner in the debug log
2017-09-16 21:13:19 +02:00
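A "non-blocking callback from autosnap to trigger the depending jobs" is commonly built from a channel of capacity 1 and a select with a default branch. The sketch below shows that pattern; Trigger, Fire and C are hypothetical names, and the coalescing of repeated signals is a property of this sketch, not a documented zrepl behavior.

```go
package main

// Trigger lets a producer (e.g. autosnap) signal consumers (e.g. pull or
// prune jobs) without ever waiting on them. The buffer of size 1 coalesces
// repeated signals while the consumer is still busy.
type Trigger struct {
	c chan struct{}
}

func NewTrigger() *Trigger {
	return &Trigger{c: make(chan struct{}, 1)}
}

// Fire never blocks; if a signal is already pending, this one merges into it.
func (t *Trigger) Fire() {
	select {
	case t.c <- struct{}{}:
	default:
	}
}

// C is what a depending job selects on to learn that new work is available.
func (t *Trigger) C() <-chan struct{} {
	return t.c
}
```

Because Fire cannot block, autosnap's progress never depends on how quickly the triggered jobs react, which removes one source of ordering races.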
Christian Schwarz
b168274048 fixup dmf tests 2017-09-16 20:32:01 +02:00
Christian Schwarz
cd4e09ebb3 cmd: handler: privatise & rename variables 2017-09-16 20:27:08 +02:00
Christian Schwarz
e3ec093d53 cmd: handler: check FilesystemVersionFilter as part of ACL 2017-09-16 20:24:46 +02:00
Christian Schwarz
dc3378e890 cmd: daemon: use closure-local variable when starting job 2017-09-16 20:21:05 +02:00