Commit Graph

41 Commits

Author SHA1 Message Date
Christian Schwarz
740ab4b1b2 chore: io/ioutil has been deprecated 2024-09-08 23:19:45 +00:00
Christian Schwarz
def510abfd chore: require go 1.22/1.23, upgrade protobuf, upgrade all deps
Go upgrade:
- Go 1.23 is current => use that for release builds
- Go 1.22 is less than one year old, it's desirable to support it.
- The [`Go Toolchains`](https://go.dev/doc/toolchain) stuff is available
  in both of these (would also be in Go 1.21). That is quite nice stuff,
  but required some changes to how we versions we use in CircleCI and
  the `release-docker` Makefile target.

Protobuf upgrade:
- Go to protobuf GH release website
- Download latest locally
- run `sha256sum`
- replace existing pinned hashes
- `make generate`

Deps upgrade:
- `go get -t -u all`
- repository moves aren't handled well automatically, fix manually
- repeat until no changes
2024-09-08 20:49:09 +00:00
Christian Schwarz
39f8ff62f0 address updated golangci-lint errors: ineffectual assignment to err (ineffassign) 2023-09-10 11:12:46 +00:00
Christian Schwarz
a6aa610165 run go1.19 gofmt and make adjustments as needed
(Go 1.19 expanded doc comment syntax)
2022-10-24 22:22:41 +02:00
Christian Schwarz
6c87bdb9fb go1.19: switch to new nolint directive that is compatible with Go 1.19 gofmt 2022-10-24 22:22:11 +02:00
Christian Schwarz
b9250a41a2 go1.18: address net.Error.Temporary() deprecation
Go 1.18 deprecated net.Error.Temporary().
This commit cleans up places where we use it incorrectly.
Also, the rpc layer defines some errors that implement

  interface { Temporary() bool }

I added comments to all of the implementations to indicate
whether they will be required if net.Error.Temporary is ever
ever removed in the future.

For HandshakeError, the Temporary() return value is actually
important. I moved & rewrote a (previously misplaced) comment
there.

The ReadStreamError changes were
1. necessary to pacify newer staticcheck and
2. technically, an error can implement Temporary()
   without being net.Err. This applies to some syscall
   errors in the standard library.

Reading list for those interested:
- https://github.com/golang/go/issues/45729
- https://groups.google.com/g/golang-nuts/c/-JcZzOkyqYI
- https://man7.org/linux/man-pages/man2/accept.2.html

Note: This change was prompted by staticheck:

> SA1019: neterr.Temporary has been deprecated since Go 1.18 because it
> shouldn't be used: Temporary errors are not well-defined. Most
> "temporary" errors are timeouts, and the few exceptions are surprising.
> Do not use this method. (staticcheck)
2022-10-24 22:21:52 +02:00
Cole Helbling
1df0f8912a Add --skip-cert-check flag to zrepl configcheck to prevent checking cert files
It may be desirable to check that a config is valid without checking for
the existence of certificate files (e.g. when validating a config inside
a sandbox without access to the cert files).

This will be very useful for NixOS so that we can check the config file
at nix-build time (e.g. potentially without proper permissions to read cert
files for a TLS connection).

fixes https://github.com/zrepl/zrepl/issues/467
closes https://github.com/zrepl/zrepl/pull/587
2022-07-08 20:18:41 +02:00
Christian Schwarz
12503dc55a rpc/dataconn/timeoutconn: disable TestPartialWriteMockConn in CircleCI 2021-12-18 18:02:34 +01:00
Christian Schwarz
c600cc1f60 skip timing-sensitive tests on CircleCI
We had too many spurious test failures in the past.
But on a developer machine, the tests don't usually fail because the
system isn't loaded as much.
So, only disable test on CircleCI.
2021-11-14 17:34:32 +01:00
Christian Schwarz
a6dbda1ea8 go1.17: run goimports to supports the new //go:build lines 2021-10-09 16:51:08 +02:00
Christian Schwarz
2c9fcd7c14 rpc/dataconn: always close send stream returned from Sender.Send()
discovered while debugging #457
2021-10-09 15:43:31 +02:00
Christian Schwarz
4f9b63aa09 rework size estimation & dry sends
- use control connection (gRPC)
- use uint64 everywhere => fixes https://github.com/zrepl/zrepl/issues/463
- [BREAK] bump protocol version

closes https://github.com/zrepl/zrepl/pull/518
fixes https://github.com/zrepl/zrepl/issues/463
2021-10-09 15:43:27 +02:00
Christian Schwarz
936ed73a45 rpc/dataconn: fix log message in closeErr case
refs #457
2021-09-19 18:40:39 +02:00
Christian Schwarz
efe7b17d21 Update to protobuf v1.25 and grpc 1.35; bump CI to go1.12
From:
github.com/golang/protobuf v1.3.2
google.golang.org/grpc v1.17.0

To:
github.com/golang/protobuf v1.4.3
google.golang.org/grpc v1.35.0
google.golang.org/protobuf v1.25.0

About the two protobuf packages:
https://developers.google.com/protocol-buffers/docs/reference/go/faq
> Version v1.4.0 and higher of github.com/golang/protobuf wrap the new
implementation and permit programs to adopt the new API incrementally. For
example, the well-known types defined in github.com/golang/protobuf/ptypes are
simply aliases of those defined in the newer module. Thus,
google.golang.org/protobuf/types/known/emptypb and
github.com/golang/protobuf/ptypes/empty may be used interchangeably.

Notable Code Changes in zrepl:
- generate protobufs now contain a mutex so we can't copy them by value
  anymore
- grpc.WithDialer is deprecated => use grpc.WithContextDialer instead

Go1.12 is now actually required by some of the dependencies.
2021-01-25 00:39:01 +01:00
Christian Schwarz
596a39c0f5 bump golangci-lint to 1.35.2 and fix resulting lint errors
GO111MODULE=on golangci-lint run ./...
endpoint/endpoint.go:487:9: S1039: unnecessary use of fmt.Sprintf (gosimple)
                panic(fmt.Sprintf("ClientIdentityKey context value must be set"))
                      ^
platformtest/platformtest_ops.go:259:41: S1039: unnecessary use of fmt.Sprintf (gosimple)
                                return nil, &LineError{scan.Text(), fmt.Sprintf("unexpected tokens at EOL")}
                                                                    ^
platformtest/platformtest_ops.go:266:41: S1039: unnecessary use of fmt.Sprintf (gosimple)
                                return nil, &LineError{scan.Text(), fmt.Sprintf("unexpected tokens at EOL")}
                                                                    ^
util/optionaldeadline/optionaldeadline_test.go:97:50: SA1029: should not use built-in type string as key for value; define your own type to avoid collisions (staticcheck)
        pctx := context.WithValue(context.Background(), "key", "value")
                                                        ^
rpc/rpc_debug.go:8:5: var `debugEnabled` is unused (unused)
rpc/dataconn/dataconn_debug.go:8:5: var `debugEnabled` is unused (unused)
rpc/dataconn/frameconn/frameconn.go:42:9: S1039: unnecessary use of fmt.Sprintf (gosimple)
                panic(fmt.Sprintf("frame header is 8 bytes long"))
                      ^
platformtest/platformtest_ops.go:322:40: S1039: unnecessary use of fmt.Sprintf (gosimple)
                        return nil, &LineError{scan.Text(), fmt.Sprintf("unexpected tokens at EOL")}
2021-01-25 00:16:01 +01:00
Mathias Fredriksson
d118bcc717 rpc: fix data race in timeoutconn
- `timeoutconn` handles state, yet calls to Read/Write make a copy of
  that state (non-pointer receiver) so any outbound calls will not have
  the state updated

- Even without the copy issue, the renew methods can in edge cases set a
  new deadline _after_ DisableTimeouts have been called, consider the
  following racy behavior:
    1. `renewReadDeadline` is called, checks `renewDeadlinesDisabled`
       (not disabled)
    2. `DisableTimeouts` is called, sets `renewDeadlinesDisabled`
    3. `DisableTimeouts` invokes `c.SetDeadline`
    4. `renewReadDeadline` invokes `c.SetReadDeadline`

To fix the above, the `Conn` receiver was made to be a pointer
everywhere and access to renewDeadlinesDisabled is now guarded
by an RWMutex instead of using atomics.

closes #415
2021-01-25 00:16:01 +01:00
Mathias Fredriksson
2d545c87cd rpc: fix read mutex unlock in dataconn stream
closes #413
2021-01-25 00:16:01 +01:00
Christian Schwarz
480176ba2d rpc/dataconn: fix go1.15-discovered recursive Error() method impl
rpc/dataconn/dataconn_client.go:69:9: Sprintf format %s with arg e causes recursive Error method call
2020-08-12 21:41:31 +02:00
Christian Schwarz
fa586b493c [#348] fix crash on early-recv error
* The SendStream.Close() was not called by dataconn.Server, which left the zfs send process dangling.
* When the source job's ctx interceptor closed the task, the dangling zfs send was detect by the trace package and panicked.

    020-07-25T19:54:41-04:00 [ERRO][latitude][rpc.data][cyZj$J3Ca$J3Ca.CJwB]: cannot write send stream err="frameconn: shutting down"
    panic: end task: 1 active child tasks: end task: task still has active child tasks

    goroutine 196966 [running]:
    github.com/zrepl/zrepl/daemon/logging/trace.WithTask.func1.1(0xc000320680, 0xcde000)
            /home/jeremy/go/src/github.com/zrepl/zrepl/daemon/logging/trace/trace.go:221 +0x2f7
    github.com/zrepl/zrepl/daemon/logging/trace.WithTask.func1()
            /home/jeremy/go/src/github.com/zrepl/zrepl/daemon/logging/trace/trace.go:237 +0x38
    github.com/zrepl/zrepl/daemon/logging/trace.WithTaskAndSpan.func1()
            /home/jeremy/go/src/github.com/zrepl/zrepl/daemon/logging/trace/trace_convenience.go:41 +0x37
    github.com/zrepl/zrepl/daemon/job.(*PassiveSide).Run.func1(0xdcf780, 0xc0000a3560, 0xdc65a0, 0xc00035e620, 0xc0000a34d0)
            /home/jeremy/go/src/github.com/zrepl/zrepl/daemon/job/passive.go:194 +0x2e7
    github.com/zrepl/zrepl/rpc.NewServer.func3(0xdcf780, 0xc0001ce4b0, 0xdc65e0, 0xc00035e600, 0xc0000a34d0)
            /home/jeremy/go/src/github.com/zrepl/zrepl/rpc/rpc_server.go:82 +0xd5
    github.com/zrepl/zrepl/rpc/dataconn.(*Server).serveConn(0xc0000a2ba0, 0xc00018eca0)
            /home/jeremy/go/src/github.com/zrepl/zrepl/rpc/dataconn/dataconn_server.go:149 +0x3be
    github.com/zrepl/zrepl/rpc/dataconn.(*Server).Serve.func3(0xc0000b8180, 0xc0000a2ba0, 0xc00018eca0)
            /home/jeremy/go/src/github.com/zrepl/zrepl/rpc/dataconn/dataconn_server.go:108 +0x5d
    created by github.com/zrepl/zrepl/rpc/dataconn.(*Server).Serve
            /home/jeremy/go/src/github.com/zrepl/zrepl/rpc/dataconn/dataconn_server.go:106 +0x24a
    2020-07-25T19:58:55-04:00 [ERRO][latitude][rpc.data][Pt4F$gCWT$gCWT.fzhc]: cannot write send stream err="frameconn: shutting down"
    panic: end task: 1 active child tasks: end task: task still has active child tasks

fixes #348
2020-07-26 20:32:35 +02:00
Christian Schwarz
10a14a8c50 [#307] add package trace, integrate it with logging, and adopt it throughout zrepl
package trace:

- introduce the concept of tasks and spans, tracked as linked list within ctx
    - see package-level docs for an overview of the concepts
    - **main feature 1**: unique stack of task and span IDs
        - makes it easy to follow a series of log entries in concurrent code
    - **main feature 2**: ability to produce a chrome://tracing-compatible trace file
        - either via an env variable or a `zrepl pprof` subcommand
        - this is not a CPU profile, we already have go pprof for that
        - but it is very useful to visually inspect where the
          replication / snapshotter / pruner spends its time
          ( fixes #307 )

usage in package daemon/logging:

- goal: every log entry should have a trace field with the ID stack from package trace

- make `logging.GetLogger(ctx, Subsys)` the authoritative `logger.Logger` factory function
    - the context carries a linked list of injected fields which
      `logging.GetLogger` adds to the logger it returns
    - `logging.GetLogger` also uses package `trace` to get the
      task-and-span-stack and injects it into the returned logger's fields
2020-05-19 11:30:02 +02:00
Christian Schwarz
bcb5965617 [#307] rpc: proper handling of context cancellation for transportmux + dataconn
- prior to this patch, context cancellation would leave rpc.Server open
- did not make problems because context was only cancelled by SIGINT,
  which was immediately followed by os.Exit
2020-05-18 19:46:24 +02:00
Christian Schwarz
0e5c77d2be [#277] rpc + zfs: drop zfs.StreamCopier, use io.ReadCloser instead 2020-05-18 19:46:24 +02:00
Christian Schwarz
b4abebce00 rpc/dataconn/timeoutconn: tests: relax deadline in timeout tests 2020-05-18 19:46:24 +02:00
InsanePrawn
44bd354eae Spellcheck all files
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
2020-02-24 16:06:09 +01:00
Christian Schwarz
02b3b4f80c fix some typos 2020-02-17 18:02:04 +01:00
Christian Schwarz
4994b7a9ea rpc/dataconn + build: support GOOS={solaris,illumos} 2019-11-16 22:07:47 +01:00
Christian Schwarz
a6b578b648 rpc/dataconn/stream: Conn: handle concurrent Close calls + goroutine leak fix
* Add Close() in closeState to identify the first closer
* Non-first closers get an error
* Reads and Writes from the Conn get an error if the conn was closed
  during the Read / Write was running
* The first closer starts _separate_ goroutine draining the c.frameReads channel
* The first closer then waits for the goroutine that fills c.frameReads
  to exit

refs 3bfe0c16d0
fixes #174

readFrames would block on `reads <-`
   but only after that would stream.Conn.readFrames close c.waitReadFramesDone
   which was too late because stream.Conn.Close would wait for c.waitReadFramesDone to be closed before draining the channel
                              ^^^^^^ (not frameconn.Conn, that closed successfully)

   195 @ 0x1032ae0 0x1006cab 0x1006c81 0x1006a65 0x15505be 0x155163e 0x1060bc1
           0x15505bd       github.com/zrepl/zrepl/rpc/dataconn/stream.readFrames+0x16d             github.com/zrepl/zrepl/rpc/dataconn/stream/stream.go:220
           0x155163d       github.com/zrepl/zrepl/rpc/dataconn/stream.(*Conn).readFrames+0xbd      github.com/zrepl/zrepl/rpc/dataconn/stream/stream_conn.go:71

   195 @ 0x1032ae0 0x10078c8 0x100789e 0x100758b 0x1552678 0x1557a4b 0x1556aec 0x1060bc1
           0x1552677       github.com/zrepl/zrepl/rpc/dataconn/stream.(*Conn).Close+0x77           github.com/zrepl/zrepl/rpc/dataconn/stream/stream_conn.go:191
           0x1557a4a       github.com/zrepl/zrepl/rpc/dataconn.(*Server).serveConn.func1+0x5a      github.com/zrepl/zrepl/rpc/dataconn/dataconn_server.go:93
           0x1556aeb       github.com/zrepl/zrepl/rpc/dataconn.(*Server).serveConn+0x87b           github.com/zrepl/zrepl/rpc/dataconn/dataconn_server.go:176
2019-09-29 19:05:54 +02:00
Christian Schwarz
8c88e168c1 rpc/dataconn/client: ReqRecv to log level Debug
reported by @avg-l
2019-09-28 11:49:20 +02:00
Christian Schwarz
a78c854404 rpc/dataconn/frameconn: mask ECONNRESET error on Close()
fixes #190
2019-09-28 11:49:20 +02:00
Christian Schwarz
77d3a1ad4d build: drop go Dep, switch to modules, support Go 1.13
bump enumer to v1.1.1
bump golangci-lint to v1.17.1

no `go mod tidy` because 1.13 and 1.12 seem to alter each other's output

fixes #112
2019-09-14 13:36:44 +02:00
Christian Schwarz
3bfe0c16d0 rpc/dataconn/stream: fix goroutine leaks & transitive buffer leaks
fixes #174
2019-09-07 22:22:05 +02:00
Christian Schwarz
254a292362 rpc/timeoutconn: platform-independent sizes for computing syscall.Iovec.Len
refs #184
2019-08-19 18:11:25 +02:00
Christian Schwarz
95e16f01c1 rpc/dataconn: audit and comment use of unsafe for readv usage
Go 1.13 will add a more precise escape analysis:
https://tip.golang.org/doc/go1.13#compiler

Reviewed usage of unsafe to ensure we adhere to the unsafe.Pointer safety rules:
https://tip.golang.org/pkg/unsafe/#Pointer
2019-08-19 18:11:25 +02:00
Christian Schwarz
d6304f4f1d rpc/dataconn: fix I/O timeout on variable receive rate
refs #162
2019-03-31 14:20:06 +02:00
Christian Schwarz
5b97953bfb run golangci-lint and apply suggested fixes 2019-03-27 13:12:26 +01:00
Christian Schwarz
afed762774 format source tree using goimports 2019-03-22 19:41:12 +01:00
Christian Schwarz
b2c5ffcaea rpc: dataconn: handle incorrect handler return values
refs #137
2019-03-16 14:47:29 +01:00
Christian Schwarz
c87759affe replication/driver: automatic retries on connectivity-related errors 2019-03-13 15:00:40 +01:00
Christian Schwarz
07b43bffa4 replication: refactor driving logic (no more explicit state machine) 2019-03-13 15:00:40 +01:00
Christian Schwarz
0230c6321f rpc/dataconn: microbenchmark 2019-03-13 13:57:21 +01:00
Christian Schwarz
796c5ad42d rpc rewrite: control RPCs using gRPC + separate RPC for data transfer
transport/ssh: update go-netssh to new version
    => supports CloseWrite and Deadlines
    => build: require Go 1.11 (netssh requires it)
2019-03-13 13:53:48 +01:00