Commit Graph

194 Commits

Author SHA1 Message Date
Christian Schwarz
91c4a97f72 Pruner: refactor + use Task API
refs #10
2017-12-27 14:39:46 +01:00
Christian Schwarz
13562b48ed IntervalAutosnap: refactor + use Task API
refs #10
2017-12-27 14:39:46 +01:00
Christian Schwarz
58ee796394 adopt Task API: infect datastructures
refs #10
2017-12-27 14:39:46 +01:00
Christian Schwarz
ce351146cf job control: implement JobStatus 2017-12-27 14:39:46 +01:00
Christian Schwarz
14b8d69a63 cmd control status + expose DaemonStatus via control API
refs #10
2017-12-27 14:39:46 +01:00
Christian Schwarz
8c7e373049 daemon: DaemonStatus + JobStatus + dummy implementation
refs #10
2017-12-27 14:39:46 +01:00
Christian Schwarz
2c87b15e83 daemon: Task abstraction + TaskStatus
An instance of Task tracks a single thread of activity that is part of a Job.

While the docs already use this terminology of jobs being composed of tasks,
the code did not have an object to represent these semantics.
Now it does:

* A task t is initialized with a root activity, which is its name
* t can t.Enter() and t.Finish() an activity, building
  a stack of activities
* t's code can get a logger t.Log() whose logTaskField is set to the
  concatenated stack of activities
* t's code can update IO progress it made since leaving idle state
* t's code's log output via t.Log() is captured since leaving idle
  state
  * FIXME: find a way to bound that buffer
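
The activity stack described in the list above can be sketched roughly as follows. This is an illustrative sketch, not the code from this commit: only Enter(), Finish() and Log() are named in the message, while the struct fields, NewTask and ActivityPath are hypothetical. Log is also simplified to take a message directly instead of returning a logger.

```go
package main

import (
	"fmt"
	"strings"
)

// Task tracks a single thread of activity as a stack of activity names.
// Hypothetical sketch; the field names are assumptions.
type Task struct {
	activities []string // activities[0] is the root activity (the task's name)
	captured   []string // log lines captured via Log()
}

// NewTask initializes a task with its root activity.
func NewTask(name string) *Task {
	return &Task{activities: []string{name}}
}

// Enter pushes a sub-activity onto the stack.
func (t *Task) Enter(activity string) {
	t.activities = append(t.activities, activity)
}

// Finish pops the innermost activity; the root activity is never popped.
func (t *Task) Finish() {
	if len(t.activities) > 1 {
		t.activities = t.activities[:len(t.activities)-1]
	}
}

// ActivityPath returns the concatenated stack of activities.
func (t *Task) ActivityPath() string {
	return strings.Join(t.activities, ".")
}

// Log records a message tagged with the current activity path.
// The buffer is unbounded here, which is the problem the FIXME points at.
func (t *Task) Log(msg string) {
	t.captured = append(t.captured, fmt.Sprintf("[%s] %s", t.ActivityPath(), msg))
}

func main() {
	task := NewTask("prune")
	task.Enter("list_versions")
	task.Log("listing")
	task.Finish()
	task.Log("done")
	for _, line := range task.captured {
		fmt.Println(line)
	}
}
```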

refs #10
refs #48
2017-12-27 14:39:46 +01:00
Christian Schwarz
d7f3fb93ae bash completions: hidden subcommand + integrate into Makefile 2017-12-27 14:39:46 +01:00
Christian Schwarz
ebf209427a logging: support ignoring fields in HumanFormatter
Should be moved into the logger one day so that the ignoring logic
is not duplicated in each outlet.

refs #10
2017-12-27 14:39:46 +01:00
Christian Schwarz
261d095108 logger: support forking of outlets
refs #10
2017-12-27 13:50:07 +01:00
Christian Schwarz
583a63a68f refactor: encapsulate pulling in a struct
refs #10
2017-12-24 15:23:28 +01:00
Christian Schwarz
896f31bbf3 'zrepl version' and 'zrepl control version' subcommand + maintainer README
Version is autodetected on build using git
If it cannot be detected with git, an override must be provided.

For traceability of distros, the distro packagers should override as
well, which is why I added a README entry for package maintainers.

refs #35
2017-11-18 21:12:48 +01:00
Christian Schwarz
bfbab9382e fixup: remove unused StdoutOutlet function
refs #28
2017-11-17 00:36:48 +01:00
Christian Schwarz
2bfcfa5be8 logging: first outlet receives logger error message
Abandons stderr special-casing:

* looks weird on shell and IO redirection to same file because of
interleaving of stdout and stderr
* better than a separate dedicated outlet because it does not require
additional configuration

fixes #28

BREAK SEMANTICS CONFIG
2017-11-17 00:25:38 +01:00
Christian Schwarz
a7f70a566d logger: write internal / outlet errors to an error outlet
refs #28
2017-11-16 23:49:47 +01:00
Christian Schwarz
b576253ea8 logging: fixup 4763486: implementation would parse 'date' instead of 'time' field in config 2017-11-15 11:14:20 +01:00
Christian Schwarz
476348689a logging: stdout outlet: include time in output if tty or forced through config 2017-11-15 11:04:34 +01:00
Christian Schwarz
ed68bffea5 bookmark every snapshot
replication logic already supports bookmarks \o/

refs #34
2017-11-13 10:59:46 +01:00
Christian Schwarz
51af880701 refactor: parametrize PrefixFilter VersionType check
refs #34
2017-11-13 10:59:22 +01:00
Christian Schwarz
cef63ac176 logging: stdout formatter: use logfmt package to format non-special stdout fields + handle errors
refs #40
2017-11-13 10:58:07 +01:00
Christian Schwarz
f3433df617 cmd/sampleconf/zrep.yml: remove it, it's from the stone ages 2017-10-05 21:48:18 +02:00
Christian Schwarz
161ce3b3c3 autosnap: fix log level when fs filter does not match any fs 2017-10-05 21:22:17 +02:00
Christian Schwarz
83bb97a845 control job: wrong error on context done 2017-10-05 21:20:01 +02:00
Christian Schwarz
40919d06c2 source job: fix erroneous log message when accept() on closed listener 2017-10-05 21:19:42 +02:00
Christian Schwarz
c48069ce88 retention grid: interval length monotonicity: exception for keep=all
fixes #6
2017-10-05 20:34:35 +02:00
Christian Schwarz
72d288567e mappings: fix aliasing bug with '<' wildcards
In contrast to any 'something<' mapping, a '<' mapping cannot be unique.
Thus, '<' mappings are just an append to target, which is exactly
what we get when trimming empty prefix ''.

Otherwise, given mapping

{ "<": "storage/backups/app-srv" }

Before (clearly a conflict)
zroot     => storage/backups/app-srv
storage   => storage/backups/app-srv
After:
zroot     => storage/backups/app-srv/zroot
storage   => storage/backups/app-srv/storage

However, mapping directly with subtree wildcard is still possible, just
not with the root wildcard

{
    "<":             "storage/backups/app-srv",
    "zroot/var/db<": "storage/db_replication/app-srv"
}
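
The trimming rule described in this message can be sketched as follows. mapDataset is a hypothetical helper written for illustration, not the DatasetMapFilter code; it handles only patterns ending in '<' and assumes that the most specific (longest) matching prefix wins.

```go
package main

import (
	"fmt"
	"strings"
)

// mapDataset maps a source dataset path to a target path.
// Hypothetical sketch: only subtree wildcards (patterns ending in "<") are
// handled, and the longest matching prefix is assumed to win.
func mapDataset(mappings map[string]string, source string) (string, bool) {
	bestLen := -1
	target := ""
	for pattern, t := range mappings {
		if !strings.HasSuffix(pattern, "<") {
			continue // exact patterns are left out of this sketch
		}
		prefix := strings.TrimSuffix(pattern, "<")
		if prefix != "" && source != prefix && !strings.HasPrefix(source, prefix+"/") {
			continue // source is not inside this subtree
		}
		if len(prefix) <= bestLen {
			continue // a more specific pattern already matched
		}
		// Trim the matched prefix and append the remainder to the target.
		// For the root wildcard the prefix is empty, so the whole source
		// path is appended.
		rest := strings.TrimPrefix(strings.TrimPrefix(source, prefix), "/")
		bestLen = len(prefix)
		target = t
		if rest != "" {
			target = t + "/" + rest
		}
	}
	return target, bestLen >= 0
}

func main() {
	mappings := map[string]string{
		"<":             "storage/backups/app-srv",
		"zroot/var/db<": "storage/db_replication/app-srv",
	}
	for _, src := range []string{"zroot", "storage", "zroot/var/db/pg"} {
		dst, ok := mapDataset(mappings, src)
		fmt.Println(src, "=>", dst, ok)
	}
}
```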

fixes #22
2017-10-05 20:10:05 +02:00
Christian Schwarz
b5d46e2ec3 impl: don't reference m.entries again 2017-10-05 18:55:02 +02:00
Christian Schwarz
83d450b1f2 config: support days (d) and weeks (w) in durations
fixes #18
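
Go's time.ParseDuration has no day or week unit, so a config parser needs its own handling for them. A minimal sketch, assuming durations are written as a single integer followed by one unit letter and that a day is exactly 24 hours; parseDuration, durationRE and unitLengths are hypothetical names, not taken from this commit.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"time"
)

// durationRE accepts a non-negative integer followed by one unit letter.
var durationRE = regexp.MustCompile(`^(\d+)([smhdw])$`)

// unitLengths maps each unit letter to its length. A day is treated as
// exactly 24 hours and a week as 7 days (an assumption of this sketch).
var unitLengths = map[string]time.Duration{
	"s": time.Second,
	"m": time.Minute,
	"h": time.Hour,
	"d": 24 * time.Hour,
	"w": 7 * 24 * time.Hour,
}

// parseDuration converts strings such as "10m", "3d" or "2w" to a Duration.
func parseDuration(s string) (time.Duration, error) {
	m := durationRE.FindStringSubmatch(s)
	if m == nil {
		return 0, fmt.Errorf("invalid duration %q", s)
	}
	n, err := strconv.ParseInt(m[1], 10, 64)
	if err != nil {
		return 0, err
	}
	return time.Duration(n) * unitLengths[m[2]], nil
}

func main() {
	for _, s := range []string{"10m", "3d", "2w", "1y"} {
		d, err := parseDuration(s)
		fmt.Println(s, d, err)
	}
}
```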
2017-10-05 15:17:37 +02:00
Christian Schwarz
3e647c14c0 config: source job: rename field 'datasets' to 'filesystems'
While filesystems is also not the right term (since it excludes ZVOLs),
we want to stay consistent with comments & terminology used in docs.

BREAK CONFIG

fixes #17
2017-10-05 13:39:05 +02:00
Christian Schwarz
b95260f4b5 config: logging: defaults + definition as list
* Stdout logger as default logger
* Clearer keyword / value separation
* Allows multiple outlet definitions

BREAK CONFIG

fixes #20
fixes #19
2017-10-05 13:31:16 +02:00
Christian Schwarz
e6d08149ef docs: update 'mapping & filter syntax' + more elaborate sampleconf 2017-10-02 18:29:58 +02:00
Christian Schwarz
45670a7e5d make vet happy: 'don't leak contexts' 2017-09-30 16:39:52 +02:00
Christian Schwarz
aab43af27c tcp outlet: fix error handling on write failure
Also: clarify semantics of RetryInterval
2017-09-30 16:38:48 +02:00
Christian Schwarz
0cbee78b40 fix unreachable code & missing stringer-generated code 2017-09-30 16:31:55 +02:00
Christian Schwarz
03955196a9 cmd: config: build identity map
not necessary with one cert but good practice
2017-09-24 16:25:41 +02:00
Christian Schwarz
54b391f77c tcp outlet: add newline after each entry
otherwise tools like graylog don't parse it
2017-09-24 16:24:43 +02:00
Christian Schwarz
c1a5b04065 TLS support for TCP logger 2017-09-24 14:34:50 +02:00
Christian Schwarz
d5df354e64 sampleconf for supported logging 2017-09-24 02:10:29 +02:00
Christian Schwarz
fae34f5927 implement logfmt formatter 2017-09-24 02:09:50 +02:00
Christian Schwarz
c4c38d5b23 add syslog outlet 2017-09-24 02:05:41 +02:00
Christian Schwarz
e0e362c4ff dump logrus and roll our own logger instead 2017-09-24 00:57:52 +02:00
Christian Schwarz
c31ec8c646 convert more code to structured logging 2017-09-23 17:52:29 +02:00
Christian Schwarz
83edcb3889 experimental TCP hook for logrus 2017-09-23 12:58:13 +02:00
Christian Schwarz
9465b593f9 cmd: configurable logrus formatters
We lost the nice context-stack [jobname][taskname][...] at the beginning
of each log line when switching to logrus.

Define some field names that define these contexts.
Write a human-friendly formatter that presents these field names like
the solution we had before logrus.

Write some other formatters for logfmt and json output along the way.

Limit ourselves to stdout logging for now.
2017-09-23 11:24:36 +02:00
Christian Schwarz
3ff9e6d2f7 structured logging for control job 2017-09-23 11:07:08 +02:00
Christian Schwarz
bfcba7b281 cmd: logging using logrus 2017-09-22 17:01:54 +02:00
Christian Schwarz
a459f0a0f6 go-yaml: direct dependency on github repo 2017-09-22 15:29:54 +02:00
Christian Schwarz
e87ce3f7cf cmd: no context + logging for config parsing 2017-09-22 14:13:30 +02:00
Christian Schwarz
458c28e1d0 cmd: UNIX sockets: try to autoremove stale sockets 2017-09-18 00:16:28 +02:00
Christian Schwarz
eaed271a00 cmd: config: remove annoying parser logs 2017-09-18 00:16:28 +02:00
Christian Schwarz
3eaba92025 cmd: introduce control socket & subcommand
Move pprof debugging there.
2017-09-18 00:16:28 +02:00
Christian Schwarz
aea62a9d85 cmd: extract listening on a UNIX socket in a private directory into a helper func 2017-09-17 23:41:51 +02:00
Christian Schwarz
1a62d635a6 cmd: test: would always run testCmdGlobalInit 2017-09-17 23:40:40 +02:00
Christian Schwarz
9cd83399d3 cmd: remove global state in main.go
* refactoring
* Now supporting default config locations
2017-09-17 18:32:00 +02:00
Christian Schwarz
4ac7e78e2b cmd: config: was using wrong reference to config 2017-09-17 17:45:02 +02:00
Christian Schwarz
71650819d3 cmd: remove stderrFile option 2017-09-17 17:25:24 +02:00
Christian Schwarz
6a05e101cf WIP daemon:
Implement
* pruning on source side
* local job
* test subcommand for doing a dry-run of a prune policy

* use a non-blocking callback from autosnap to trigger the depending
jobs -> avoids races, looks saner in the debug log
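
One common way to get such a non-blocking trigger in Go is a select with a default case. The sketch below is an assumption about the general technique, not the code in this commit; notify and didSnapshot are hypothetical names.

```go
package main

import "fmt"

// notify performs a non-blocking send: if no receiver is ready and the
// buffer is full, the notification is dropped instead of blocking the caller.
// It reports whether the notification was delivered.
func notify(ch chan<- struct{}) bool {
	select {
	case ch <- struct{}{}:
		return true
	default:
		return false
	}
}

func main() {
	// A buffer of one means "at least one snapshot happened since the
	// dependent job last looked"; further notifications coalesce.
	didSnapshot := make(chan struct{}, 1)
	fmt.Println(notify(didSnapshot))
	fmt.Println(notify(didSnapshot))
	<-didSnapshot // the dependent job picks up the trigger
	fmt.Println(notify(didSnapshot))
}
```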
2017-09-16 21:13:19 +02:00
Christian Schwarz
b168274048 fixup dmf tests 2017-09-16 20:32:01 +02:00
Christian Schwarz
cd4e09ebb3 cmd: handler: privatise & rename variables 2017-09-16 20:27:08 +02:00
Christian Schwarz
e3ec093d53 cmd: handler: check FilesystemVersionFilter as part of ACL 2017-09-16 20:24:46 +02:00
Christian Schwarz
dc3378e890 cmd: daemon: use closure-local variable when starting job 2017-09-16 20:21:05 +02:00
Christian Schwarz
36b66f6fd7 cmd: mapfilter: support rejecting mappings
breaking config
2017-09-16 19:43:02 +02:00
Christian Schwarz
e70b6f3071 WIP: recurring jobs
Done:

* implement autosnapper that asserts interval between snapshots
* implement pruner

* job pull: pulling + pruning
* job source: autosnapping + serving

TODO

* job source: pruning
* job local: everything
* fatal errors such as serve that cannot bind socket must be more
visible
* couldn't things that need a snapshotprefix just use an interface
Prefixer() instead? then we could have prefixsnapshotfilter and not
duplicate it every time...
* either go full context.Context or not at all...? just wait because
community climate around it isn't that great and we only need it for
cancellation? roll our own?
2017-09-15 19:35:19 +02:00
Christian Schwarz
c6ca1efaae cmd: fix typo 2017-09-15 19:34:38 +02:00
Christian Schwarz
0acb2e9ec0 cmd: fix missing error message 2017-09-15 19:32:09 +02:00
Christian Schwarz
5faafbb1b4 cmd: noprune prune policy 2017-09-15 19:32:09 +02:00
Christian Schwarz
e2149de840 cmd: automatic inverting of DatasetMapFilter 2017-09-13 22:55:23 +02:00
Christian Schwarz
1deaa459c8 config: unify job debugging options 2017-09-11 15:45:10 +02:00
Christian Schwarz
93a58a36bf util: add PrefixLogger 2017-09-11 15:37:45 +02:00
Christian Schwarz
d76d3db0b3 handler: remove unused SinkMappingFunc 2017-09-11 13:51:19 +02:00
Christian Schwarz
0a53b2415f signal handling for source job 2017-09-11 13:50:35 +02:00
Christian Schwarz
ce25c01c7e implement stdinserver command + corresponding server
How it works:

`zrepl stdinserver CLIENT_IDENTITY`
 * connects to the socket in $global.serve.stdinserver.sockdir/CLIENT_IDENTITY
 * sends its stdin / stdout file descriptors to the `zrepl daemon` process (see cmsg(3))
 * does nothing more

This enables a setup where `zrepl daemon` is not directly exposed to the
internet but instead all traffic is tunnelled through SSH.
The server with the source job has an authorized_keys file entry for the
public key used by the corresponding pull job

 command="/mnt/zrepl stdinserver CLIENT_IDENTITY" ssh-ed25519 AAAAC3NzaC1E... zrepl@pullingserver
2017-09-11 13:48:07 +02:00
Christian Schwarz
f3689563b5 config: restructure in 'jobs' and 'global' section 2017-09-11 13:43:18 +02:00
Christian Schwarz
73c9033583 WIP: Switch to new config format.
Don't use jobrun for daemon, just call JobDo() once, the job must
organize stuff itself.

Sacrifice all the oneshot commands, they will be reintroduced as
client-calls to the daemon.
2017-09-10 17:53:54 +02:00
Christian Schwarz
8bf3516003 Extend sampleconf, explain what stdinserver serve type does. 2017-09-10 16:01:45 +02:00
Christian Schwarz
0df47b0b0a move config.go to config_old.go 2017-09-09 21:57:20 +02:00
Christian Schwarz
b2f3645bfd alternative prototype for new config format 2017-09-07 11:18:06 +02:00
Christian Schwarz
98fc59dbd5 prototype new config format 2017-09-06 12:46:33 +02:00
Christian Schwarz
64b4901eb0 cmd test: dump config using pretty printer 2017-09-02 12:52:56 +02:00
Christian Schwarz
7e442ea0ea cmd: remove legacy NoMatchError 2017-09-02 12:40:22 +02:00
Christian Schwarz
70258fbada cmd: add 'test' subcommand
configbreak
2017-09-02 12:30:03 +02:00
Christian Schwarz
287e0620ba mapfilter: actually set filterOnly property 2017-09-02 12:22:34 +02:00
Christian Schwarz
8f03e97d47 prototype daemon 2017-09-02 11:08:24 +02:00
Christian Schwarz
4a00bef40b prune: use zfs destroy with sanity check 2017-09-02 11:08:24 +02:00
Christian Schwarz
fee2071514 autosnap: fix pathname 2017-09-02 11:08:24 +02:00
Christian Schwarz
e048386cd5 cmd: add repeat config option to Prune 2017-09-02 11:08:24 +02:00
Christian Schwarz
8a96267ef4 jobrun: use notificationChannel instead of logger for communicating events 2017-09-02 11:08:24 +02:00
Christian Schwarz
f8979d6e83 jobrun/cmd: implement jobrun.Job for config objects 2017-09-02 11:08:24 +02:00
Christian Schwarz
582ae83da3 cmd: remove RunCmd 2017-09-01 19:29:19 +02:00
Christian Schwarz
3070d156a3 jobrun: rename to jobmetadata 2017-09-01 19:29:19 +02:00
Christian Schwarz
6ab05ee1fa reimplement io.ReadWriteCloser based RPC mechanism
The existing ByteStreamRPC requires writing RPC stub + server code
for each RPC endpoint. Does not scale well.

Goal: adding a new RPC call should

- not require writing an RPC stub / handler
- not require modifications to the RPC lib

The wire format is inspired by HTTP2, the API by net/rpc.

Frames are used for framing messages, i.e. a message is made of multiple
frames which are glued together using a frame-bridging reader / writer.
This roughly corresponds to HTTP2 streams, although we're happy with
just one stream at any time and the resulting non-need for flow control,
etc.

Frames are typed using a header. The two most important types are
'Header' and 'Data'.

The RPC protocol is built on top of this:

- Client sends a header         => multiple frames of type 'header'
- Client sends request body     => multiple frames of type 'data'
- Server reads a header         => multiple frames of type 'header'
- Server reads request body     => multiple frames of type 'data'
- Server sends response header  => ...
- Server sends response body    => ...

An RPC header is serialized JSON and always the same structure.
The body is of the type specified in the header.

The RPC server and client use some semi-fancy reflection techniques to
automatically infer the data type of the request/response body based on
the method signature of the server handler; or the client parameters,
respectively.
This boils down to a special-case for io.Reader, which are just dumped
into a series of data frames as efficiently as possible.
All other types are (de)serialized using encoding/json.
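
The framing idea can be sketched as follows. The wire layout here (type byte, last-frame flag, 32-bit length) is an assumption made for this sketch, since the message does not specify it, and all names are hypothetical. Unlike the frame-bridging reader described above, readMessage buffers the whole message for brevity.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

type FrameType uint8

const (
	FrameHeader FrameType = 1
	FrameData   FrameType = 2
)

// frameHeader precedes every frame. The layout is assumed, not documented.
type frameHeader struct {
	Type FrameType
	Last uint8  // 1 if this is the final frame of the message
	Len  uint32 // payload length in bytes
}

// writeMessage splits payload into frames of at most maxPayload bytes.
func writeMessage(w io.Writer, t FrameType, payload []byte, maxPayload int) error {
	if maxPayload <= 0 {
		return errors.New("maxPayload must be positive")
	}
	for {
		n := len(payload)
		if n > maxPayload {
			n = maxPayload
		}
		h := frameHeader{Type: t, Len: uint32(n)}
		if n == len(payload) {
			h.Last = 1
		}
		if err := binary.Write(w, binary.BigEndian, h); err != nil {
			return err
		}
		if _, err := w.Write(payload[:n]); err != nil {
			return err
		}
		if h.Last == 1 {
			return nil
		}
		payload = payload[n:]
	}
}

// readMessage glues frames of type want together until the last-frame flag.
func readMessage(r io.Reader, want FrameType) ([]byte, error) {
	var msg bytes.Buffer
	for {
		var h frameHeader
		if err := binary.Read(r, binary.BigEndian, &h); err != nil {
			return nil, err
		}
		if h.Type != want {
			return nil, fmt.Errorf("got frame type %d, want %d", h.Type, want)
		}
		if _, err := io.CopyN(&msg, r, int64(h.Len)); err != nil {
			return nil, err
		}
		if h.Last == 1 {
			return msg.Bytes(), nil
		}
	}
}

// roundTrip writes one message into an in-memory buffer and reads it back.
func roundTrip(t FrameType, payload []byte, maxPayload int) ([]byte, error) {
	var conn bytes.Buffer
	if err := writeMessage(&conn, t, payload, maxPayload); err != nil {
		return nil, err
	}
	return readMessage(&conn, t)
}

func main() {
	hdr, err := roundTrip(FrameHeader, []byte(`{"method":"example"}`), 8)
	if err != nil {
		panic(err)
	}
	body, err := roundTrip(FrameData, []byte("request body"), 8)
	if err != nil {
		panic(err)
	}
	fmt.Printf("header=%s body=%s\n", hdr, body)
}
```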

The RPC layer and Frame Layer log some arbitrary messages that proved
useful during debugging. By default, they log to a non-logger, which
should not have a big impact on performance.

pprof analysis shows the implementation spends its CPU time
        60% waiting for syscalls
        30% in memmove
        10% ...

On an Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz, Linux 4.12, the
implementation achieved ~3.6GiB/s.

Future optimization may include splice(2) / vmsplice(2) on Linux, although
this doesn't fit so well with the heavy use of io.Reader / io.Writer
throughout the codebase.

The existing hackaround for local calls was re-implemented to fit the
new interface of RPCServer and RPCClient.
The 'R'PC method invocation is a bit slower because reflection is
involved in between, but otherwise performance should be no different.

The RPC code currently does not support multipart requests and thus does
not support the equivalent of a POST.

Thus, the switch to the new rpc code had the following fallout:

- Move request objects + constants from rpc package to main app code
- Sacrifice the hacky 'push = pull me' way of doing push
-> need to further extend RPC to support multipart requests or
     something to implement this properly with additional interfaces
-> should be done after replication is abstracted better than separate
     algorithms for doPull() and doPush()
2017-09-01 19:24:53 +02:00
Christian Schwarz
676ac41677 fix leaking channel when closing connection 2017-08-09 21:03:05 +02:00
Christian Schwarz
4e45b4090b pull log output: optimize to be readable by humans 2017-08-06 18:28:05 +02:00
Christian Schwarz
cba083cadf Make zfs.DatasetPath json.Marshaler and json.Unmarshaler
Had to resort to using pointers to zfs.DatasetPath everywhere... Should
find a better solution for that.
2017-08-06 16:22:15 +02:00
Christian Schwarz
2ce07c9342 rework filters & mappings
config defines a single datastructure that can act both as a Map and as a Filter
(DatasetMapFilter)

Cleanup wildcard syntax along the way (also changes semantics).
2017-08-06 16:21:54 +02:00
Christian Schwarz
3fac6a67df extract PullACL check into function 2017-08-06 16:21:54 +02:00
Christian Schwarz
4732fdd4cc Implement placeholder filesystems.
Note the docs on the placeholder user property introduced with this
commit. The solution is not really satisfying, but I couldn't think of a
better one off the top of my head.
2017-08-06 16:21:54 +02:00
Christian Schwarz
8eb4a2ba44 Rudimentary progress reporting on send / recv side. 2017-08-06 16:21:54 +02:00
Christian Schwarz
d1999fc17c Remove months as a possible time interval unit as it is too volatile.
Thanks to @erdgeist for pointing that out.

refs #2
2017-07-09 00:38:16 +02:00
Dirk Engling
5afbedbd87 Shrink the 'monthly' interval from 32 weeks to 32 days 2017-07-09 00:11:02 +02:00