Increased the default send timeout from 2 to 5
Added a max of 4 retries
with an increased timeout after the second attempt
using the grpc client context and
checking the error value for canceled context
Refactored updateServerStates and calculateState
added some checks to ensure we are not sending connecting on context canceled
removed some state updates from the RunClient function
Fix the status indication in the client service. The status of the
management server and the signal server was incorrect if the network
connection was broken. Basically the status update was not used by
the management and signal library.
Before defining if we will use direct or proxy connection we will exchange a
message with the other peer if the modes match we keep the decision
from the shouldUseProxy function otherwise we skip using direct connection.
Added a feature support message to the signal protocol
Added additional common blacklisted interfaces
Updated the signal protocol to pass the peer port and netbird version
Co-authored-by: braginini <bangvalo@gmail.com>
This PR fixes a race condition that happens
when agents connect to a Signal stream, multiple
times within a short amount of time. Common on
slow and unstable internet connections.
Every time an agent establishes a new connection
to Signal, Signal creates a Stream and writes an entry
to the registry of connected peers storing the stream.
Every time an agent disconnects, Signal removes the
stream from the registry.
Due to unstable connections, the agent could detect
a broken connection, and attempt to reconnect to Signal.
Signal will override the stream, but it might detect
the old broken connection later, causing peer deregistration.
It will deregister the peer leaving the client thinking
it is still connected, rejecting any messages.
Right now Signal Service runs the Let'sEncrypt manager on port 80
and a gRPC server on port 10000. There are two separate listeners.
This PR combines these listeners into one with a cmux lib.
The gRPC server runs on either 443 with TLS or 80 without TLS.
Let's Encrypt manager always runs on port 80.
The Management client will try reconnecting in case.
of network issues or non-permanent errors.
If the device was off-boarded, then the client will stop retrying.
* feature: support new management service protocol
* chore: add more logging to track networkmap serial
* refactor: organize peer update code in engine
* chore: fix lint issues
* refactor: extract Signal client interface
* test: add signal client mock
* refactor: introduce Management Service client interface
* chore: place management and signal clients mocks to respective packages
* test: add Serial test to the engine
* fix: lint issues
* test: unit tests for a networkMapUpdate
* test: unit tests Sync update
* fix: too many open files caused by agent not being closed after unsuccessful attempts to start a peer connection (happens when no network available)
* fix: minor refactor to consider signal status
* refactor: move goroutine that runs Signal Client Receive to the engine for better control
* chore: fix comments typo
* test: fix golint
* chore: comments update
* chore: consider connection state=READY in signal and management clients
* chore: fix typos
* test: fix signal ping-pong test
* chore: add wait condition to signal client
* refactor: add stream status to the Signal client
* refactor: defer mutex unlock