Commit Graph

53 Commits

Author SHA1 Message Date
Christian Schwarz
84eefa57bc rpc/grpcclientidentity: remove hard-coded deadline in listener adatper causing crash
Verified once again that grpc.DialContext is indeed non-blocking.
However, it checks in a defer stmt that the passed dial is not ctx.Done().
That is highly unusual if the dial is non-blocking.
But it might still happen, maybe because of machine suspend during the function call and before the defer stmt is executed.

panic:
context deadline exceeded
goroutine 49 [running]:
github.com/zrepl/zrepl/rpc/grpcclientidentity/grpchelper.ClientConn(0x1906ea0, 0xc0003ea1e0, 0x1921620, 0xc0002da660, 0x0)
        /gopath/src/github.com/zrepl/zrepl/rpc/grpcclientidentity/grpchelper/authlistener_grpc_adaptor_wrapper.go:49 +0x38c
github.com/zrepl/zrepl/rpc.NewClient(0x1906f00, 0xc0002d60f0, 0x1921620, 0xc0002da640, 0x1921620, 0xc0002da660, 0x1921620, 0xc0002da6a0, 0x1921620)
        /gopath/src/github.com/zrepl/zrepl/rpc/rpc_client.go:53 +0x199
github.com/zrepl/zrepl/daemon/job.(*modePush).ConnectEndpoints(0xc0000d1e90, 0x1921620, 0xc0002da640, 0x1921620, 0xc0002da660, 0x1921620, 0xc0002da6a0, 0x1906f00, 0xc0002d60f0)
        /gopath/src/github.com/zrepl/zrepl/daemon/job/active.go:105 +0x15d
github.com/zrepl/zrepl/daemon/job.(*ActiveSide).do(0xc0000d6120, 0x1918720, 0xc00020f170)
        /gopath/src/github.com/zrepl/zrepl/daemon/job/active.go:356 +0x236
github.com/zrepl/zrepl/daemon/job.(*ActiveSide).Run(0xc0000d6120, 0x1918720, 0xc00009c660)
        /gopath/src/github.com/zrepl/zrepl/daemon/job/active.go:347 +0x289
github.com/zrepl/zrepl/daemon.(*jobs).start.func1(0xc0000fc880, 0x1921620, 0xc0002da120, 0x191a320, 0xc0000d6120, 0x1918720, 0xc0002d6a80)
2019-10-10 14:02:12 +02:00
Christian Schwarz
a6b578b648 rpc/dataconn/stream: Conn: handle concurrent Close calls + goroutine leak fix
* Add Close() in closeState to identify the first closer
* Non-first closers get an error
* Reads and Writes from the Conn get an error if the conn was closed
  during the Read / Write was running
* The first closer starts _separate_ goroutine draining the c.frameReads channel
* The first closer then waits for the goroutine that fills c.frameReads
  to exit

refs 3bfe0c16d0
fixes #174

readFrames would block on `reads <-`
   but only after that would stream.Conn.readFrames close c.waitReadFramesDone
   which was too late because stream.Conn.Close would wait for c.waitReadFramesDone to be closed before draining the channel
                              ^^^^^^ (not frameconn.Conn, that closed successfully)

   195 @ 0x1032ae0 0x1006cab 0x1006c81 0x1006a65 0x15505be 0x155163e 0x1060bc1
           0x15505bd       github.com/zrepl/zrepl/rpc/dataconn/stream.readFrames+0x16d             github.com/zrepl/zrepl/rpc/dataconn/stream/stream.go:220
           0x155163d       github.com/zrepl/zrepl/rpc/dataconn/stream.(*Conn).readFrames+0xbd      github.com/zrepl/zrepl/rpc/dataconn/stream/stream_conn.go:71

   195 @ 0x1032ae0 0x10078c8 0x100789e 0x100758b 0x1552678 0x1557a4b 0x1556aec 0x1060bc1
           0x1552677       github.com/zrepl/zrepl/rpc/dataconn/stream.(*Conn).Close+0x77           github.com/zrepl/zrepl/rpc/dataconn/stream/stream_conn.go:191
           0x1557a4a       github.com/zrepl/zrepl/rpc/dataconn.(*Server).serveConn.func1+0x5a      github.com/zrepl/zrepl/rpc/dataconn/dataconn_server.go:93
           0x1556aeb       github.com/zrepl/zrepl/rpc/dataconn.(*Server).serveConn+0x87b           github.com/zrepl/zrepl/rpc/dataconn/dataconn_server.go:176
2019-09-29 19:05:54 +02:00
Christian Schwarz
8c88e168c1 rpc/dataconn/client: ReqRecv to log level Debug
reported by @avg-l
2019-09-28 11:49:20 +02:00
Christian Schwarz
a78c854404 rpc/dataconn/frameconn: mask ECONNRESET error on Close()
fixes #190
2019-09-28 11:49:20 +02:00
Christian Schwarz
77d3a1ad4d build: drop go Dep, switch to modules, support Go 1.13
bump enumer to v1.1.1
bump golangci-lint to v1.17.1

no `go mod tidy` because 1.13 and 1.12 seem to alter each other's output

fixes #112
2019-09-14 13:36:44 +02:00
Christian Schwarz
3bfe0c16d0 rpc/dataconn/stream: fix goroutine leaks & transitive buffer leaks
fixes #174
2019-09-07 22:22:05 +02:00
Christian Schwarz
254a292362 rpc/timeoutconn: platform-independent sizes for computing syscall.Iovec.Len
refs #184
2019-08-19 18:11:25 +02:00
Christian Schwarz
95e16f01c1 rpc/dataconn: audit and comment use of unsafe for readv usage
Go 1.13 will add a more precise escape analysis:
https://tip.golang.org/doc/go1.13#compiler

Reviewed usage of unsafe to ensure we adhere to the unsafe.Pointer safety rules:
https://tip.golang.org/pkg/unsafe/#Pointer
2019-08-19 18:11:25 +02:00
Christian Schwarz
d6304f4f1d rpc/dataconn: fix I/O timeout on variable receive rate
refs #162
2019-03-31 14:20:06 +02:00
Christian Schwarz
4d2765e5f2 rpc: actually return an error if ReqSend fails 2019-03-28 22:17:12 +01:00
Christian Schwarz
5b97953bfb run golangci-lint and apply suggested fixes 2019-03-27 13:12:26 +01:00
Christian Schwarz
afed762774 format source tree using goimports 2019-03-22 19:41:12 +01:00
Christian Schwarz
b2c5ffcaea rpc: dataconn: handle incorrect handler return values
refs #137
2019-03-16 14:47:29 +01:00
Christian Schwarz
2c3b3c093d rpc: do not leak grpc state change logger goroutine 2019-03-15 16:18:01 +01:00
Christian Schwarz
ab3e783168 rpc: treat protocol handshake errors as permanent
treat handshake errors as permanent on the client

The issue was observed by 100% CPU usage due to lack ofrate-limiting in
dataconn.ReqPing retries=> safeguard that
2019-03-15 16:18:01 +01:00
Christian Schwarz
b85ec52387 rpc/ctrl: nicer perr info debug log messages 2019-03-13 19:20:04 +01:00
Christian Schwarz
c87759affe replication/driver: automatic retries on connectivity-related errors 2019-03-13 15:00:40 +01:00
Christian Schwarz
07b43bffa4 replication: refactor driving logic (no more explicit state machine) 2019-03-13 15:00:40 +01:00
Christian Schwarz
0230c6321f rpc/dataconn: microbenchmark 2019-03-13 13:57:21 +01:00
Christian Schwarz
796c5ad42d rpc rewrite: control RPCs using gRPC + separate RPC for data transfer
transport/ssh: update go-netssh to new version
    => supports CloseWrite and Deadlines
    => build: require Go 1.11 (netssh requires it)
2019-03-13 13:53:48 +01:00
Christian Schwarz
8cca0a8547 Initial working version
Summary:
* Logging is still bad
* test output in a lot of placed
* FIXMEs every where

Test Plan: None, just review

Differential Revision: https://phabricator.cschwarz.com/D2
2018-06-24 10:44:00 +02:00
Christian Schwarz
75fd21e454 make generate: stringer was updated and now uses strconv instead of fmt
bd4635fd25 (diff-0415b5286e4cf3e373f349d917e5e039)
2018-04-01 15:30:04 +02:00
Christian Schwarz
4cf910874d rpc: make DataType a stringer, fixing debug messages 2018-02-18 13:33:53 +01:00
Christian Schwarz
0cbee78b40 fix unreachable code & missing stringer-generated code 2017-09-30 16:31:55 +02:00
Christian Schwarz
fa4d2098a8 rpc: re-architect connection teardown
Tear down occurs on each protocol level, stack-wise.

Open RWC
Open ML (with NewMessageLayer)
Open RPC (with NewServer/ NewClient)
Close RPC (with Close() from Client())
Close ML
* in Server: after error / receive of Close request
* in Client: after getting ACK for Close request from Server
Close RWC

To achieve this, a DataType for RPC control messages was added, which
has a separate set of endpoints. Not exactly pretty, but works for now.

The necessity of the RST frame remains to be determined. However, it is
nice to have a way to signal the other side something went terribly
wrong in the middle of an operation. Example: A frameBridingWriter fails
to read the next chunk of a file it is supposed to send, it can just
send an RST frame to signal this operation failed... Wouldn't trailers
make sense then?
2017-09-11 10:54:56 +02:00
Christian Schwarz
6ab05ee1fa reimplement io.ReadWriteCloser based RPC mechanism
The existing ByteStreamRPC requires writing RPC stub + server code
for each RPC endpoint. Does not scale well.

Goal: adding a new RPC call should

- not require writing an RPC stub / handler
- not require modifications to the RPC lib

The wire format is inspired by HTTP2, the API by net/rpc.

Frames are used for framing messages, i.e. a message is made of multiple
frames which are glued together using a frame-bridging reader / writer.
This roughly corresponds to HTTP2 streams, although we're happy with
just one stream at any time and the resulting non-need for flow control,
etc.

Frames are typed using a header. The two most important types are
'Header' and 'Data'.

The RPC protocol is built on top of this:

- Client sends a header         => multiple frames of type 'header'
- Client sends request body     => mulitiple frames of type 'data'
- Server reads a header         => multiple frames of type 'header'
- Server reads request body     => mulitiple frames of type 'data'
- Server sends response header  => ...
- Server sends response body    => ...

An RPC header is serialized JSON and always the same structure.
The body is of the type specified in the header.

The RPC server and client use some semi-fancy reflection tequniques to
automatically infer the data type of the request/response body based on
the method signature of the server handler; or the client parameters,
respectively.
This boils down to a special-case for io.Reader, which are just dumped
into a series of data frames as efficiently as possible.
All other types are (de)serialized using encoding/json.

The RPC layer and Frame Layer log some arbitrary messages that proved
useful during debugging. By default, they log to a non-logger, which
should not have a big impact on performance.

pprof analysis shows the implementation spends its CPU time
        60% waiting for syscalls
        30% in memmove
        10% ...

On a Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz CPU, Linux 4.12, the
implementation achieved ~3.6GiB/s.

Future optimization may include spice(2) / vmspice(2) on Linux, although
this doesn't fit so well with the heavy use of io.Reader / io.Writer
throughout the codebase.

The existing hackaround for local calls was re-implemented to fit the
new interface of PRCServer and RPCClient.
The 'R'PC method invocation is a bit slower because reflection is
involved inbetween, but otherwise performance should be no different.

The RPC code currently does not support multipart requests and thus does
not support the equivalent of a POST.

Thus, the switch to the new rpc code had the following fallout:

- Move request objects + constants from rpc package to main app code
- Sacrifice the hacky 'push = pull me' way of doing push
-> need to further extend RPC to support multipart requests or
     something to implement this properly with additional interfaces
-> should be done after replication is abstracted better than separate
     algorithms for doPull() and doPush()
2017-09-01 19:24:53 +02:00
Christian Schwarz
44b77a8ef9 rpc: always log goodbye 2017-08-09 21:03:12 +02:00
Christian Schwarz
4e45b4090b pull log output: optimize to be readable by humans 2017-08-06 18:28:05 +02:00
Christian Schwarz
cba083cadf Make zfs.DatasetPath json.Marshaler and json.Unmarshaler
Had to resort to using pointers to zfs.DatasetPath everywhere... Should
find a better solution for that.
2017-08-06 16:22:15 +02:00
Christian Schwarz
8eb4a2ba44 Rudimentary progress reporting on send / recv side. 2017-08-06 16:21:54 +02:00
Christian Schwarz
35dcfc234e Implement push support.
Pushing is achieved by inverting the roles on the established
connection, i.e. the client tells the server what data it should pull
from the client (PullMeRequest).

Role inversion is achieved by moving the server loop to the serverLoop
function of ByteStreamRPC, which can be called from both the Listen()
function (server-side) and the PullMeRequest() client-side function.

A donwside of this PullMe approach is that the replication policies
become part of the rpc, because the puller must follow the policy.
2017-05-20 18:17:08 +02:00
Christian Schwarz
3c7f782dac rpc: remove FilesystemRequest.Direction (unused) 2017-05-20 17:43:49 +02:00
Christian Schwarz
40fe7e643d cmd: Move replication logic to separate file. 2017-05-20 17:29:37 +02:00
Christian Schwarz
48a4e8033a rpc: close outgoing SSH connection on exit. 2017-05-14 14:11:19 +02:00
Christian Schwarz
74719ad846 rpc: chunk JSON parts of communication + refactoring
JSONDecoder was buffering more of connection data than just the JSON.
=> Unchunker didn't bother and just started unchunking.

While chaining JSONDecoder.Buffered() and the connection using
ChainedReader works, it's still not a clean architecture.

=> Every JSON message is now wrapped in a chunked stream
   (chunked and unchunked)
   => no special-cases
=> Keep ChainedReader, might be useful later on...
2017-05-13 15:33:46 +02:00
Christian Schwarz
feabf1abcd rpc: logging for bytestream listener 2017-05-13 15:25:09 +02:00
Christian Schwarz
cd8796aed4 rpc: Initial|IncrementalTransferRequest transfer zfs data structures 2017-05-07 12:20:56 +02:00
Christian Schwarz
c71be910f9 rpc: fix incremental transfer request handling 2017-05-07 11:53:47 +02:00
Christian Schwarz
82beea94d5 rpc: missing response header for InitialTransferRequest 2017-05-06 23:43:55 +02:00
Christian Schwarz
9b871fb7c0 rpc: more detailed errors 2017-05-06 10:58:23 +02:00
Christian Schwarz
f005ce318d Purge model package, not really used anyways. 2017-05-03 17:26:45 +02:00
Christian Schwarz
514f9aa123 rpc: hex-ints for RequestType definitions 2017-05-03 17:13:08 +02:00
Christian Schwarz
b8e7ddd61f rpc fixup msising directions filesystemrequest 2017-05-03 17:12:31 +02:00
Christian Schwarz
43f67d2b7c rpc: add FilesystemVersionsRequest 2017-05-03 17:12:15 +02:00
Christian Schwarz
b87829817a rpc: bytestream: listen: filesystems request: reply with header 2017-04-30 23:47:12 +02:00
Christian Schwarz
8bdcdd5ec6 rpc: bytestream: listen: consistent error handling 2017-04-30 23:47:12 +02:00
Christian Schwarz
9fdd1ea909 rpc: fix panic when parsing filesystem response 2017-04-30 23:47:12 +02:00
Christian Schwarz
08370689c8 rpc: implement respondWithError 2017-04-30 23:47:11 +02:00
Christian Schwarz
d9ecfc8eb4 Gofmt megacommit. 2017-04-26 20:29:54 +02:00
Christian Schwarz
4494afe47f Finish implementation of RPC. 2017-04-16 21:38:31 +02:00