* update cli commands to respect an empty string and handle different from undefined
* remove test for unintended behaviour
* remove test for unintended behaviour
This PR adds `gosec` linter with the following checks disabled:
- G102: Bind to all interfaces
- G107: Url provided to HTTP request as taint input
- G112: Potential slowloris attack
- G114: Use of net/http serve function that has no support for setting timeouts
- G204: Audit use of command execution
- G401: Detect the usage of DES, RC4, MD5 or SHA1
- G402: Look for bad TLS connection settings
- G404: Insecure random number source (rand)
- G501: Import blocklist: crypto/md5
- G505: Import blocklist: crypto/sha1
We have complaints related to the checks above. They have to be addressed separately.
* Add gocritic linter
`gocritic` provides diagnostics that check for bugs, performance, and style issues
We disable the following checks:
- commentFormatting
- captLocal
- deprecatedComment
This PR contains many `//nolint:gocritic` to disable `appendAssign`.
Most operating systems add a /32 route for the default gateway address to its routing table
This will allow routes to be configured into the system even when the incoming range contains the default gateway.
In case a range is a sub-range of an existing route and this range happens to contain the default gateway it attempts to create a default gateway route to prevent loop issues
With this change we should be able to collect and expose the following histograms:
* `management.updatechannel.create.duration.ms` with `closed` boolean label
* `management.updatechannel.create.duration.micro` with `closed` boolean label
* `management.updatechannel.close.one.duration.ms`
* `management.updatechannel.close.one.duration.micro`
* `management.updatechannel.close.multiple.duration.ms`
* `management.updatechannel.close.multiple.duration.micro`
* `management.updatechannel.close.multiple.channels`
* `management.updatechannel.send.duration.ms` with `found` and `dropped` boolean labels
* `management.updatechannel.send.duration.micro` with `found` and `dropped` boolean labels
* `management.updatechannel.get.all.duration.ms`
* `management.updatechannel.get.all.duration.micro`
* `management.updatechannel.get.all.peers`
- dupword checks for duplicate words in the source code
- durationcheck checks for two durations multiplied together
- forbidigo forbids identifiers
- mirror reports wrong mirror patterns of bytes/strings usage
- misspell finds commonly misspelled English words in comments
- predeclared finds code that shadows one of Go's predeclared identifiers
- thelper detects Go test helpers without t.Helper() call and checks the consistency of test helpers
Handle original search domains in resolv.conf type implementations.
- parse the original resolv.conf file
- merge the search domains
- ignore the domain keyword
- append any other config lines (sortstlist, options)
- fix read origin resolv.conf from bkp in resolvconf implementation
- fix line length validation
- fix number of search domains validation
The no rules matched message is operating system language specific, and can cause errors
Now we check if firewall is reachable by the app and then if the rule is returned or not in two different calls:
isWindowsFirewallReachable
isFirewallRuleActive
Supporting search domains will allow users to define match domains to also
be added to a list of search domains in their systems
Fix Windows registry key configuration for search domains using a key within the netbird interface path
Restructure data handling for improved performance and flexibility.
Introduce 'G'-prefixed fields to represent Gorm relations, simplifying resource management.
Eliminate complexity in lookup tables for enhanced query and write speed.
Enable independent operations on data structures, requiring adjustments in the Store interface and Account Manager.
added intergration with JumpCloud User API. Use the steps in setup.md for configuration.
Additional changes:
- Enhance compatibility for providers that lack audience support in the Authorization Code Flow and the Authorization - - Code Flow with Proof Key for Code Exchange (PKCE) using NETBIRD_DASH_AUTH_USE_AUDIENCE=falseenv
- Verify tokens by utilizing the client ID when audience support is absent in providers
The use of reflection should generally be minimized in Go code because
it can make the code less readable, less type-safe, and potentially slower.
In this particular case we can simply rely on type switch.
Implement user deletion across all IDP-ss. Expires all user peers
when the user is deleted. Users are permanently removed from a local
store, but in IDP, we remove Netbird attributes for the user
untilUserDeleteFromIDPEnabled setting is not enabled.
To test, an admin user should remove any additional users.
Until the UI incorporates this feature, use a curl DELETE request
targeting the /users/<USER_ID> management endpoint. Note that this
request only removes user attributes and doesn't trigger a delete
from the IDP.
To enable user removal from the IdP, set UserDeleteFromIDPEnabled
to true in account settings. Until we have a UI for this, make this
change directly in the store file.
Store the deleted email addresses in encrypted in activity store.
* shutdown the pkce server on user cancellation
* Refactor openURL to exclusively manage authentication flow instructions and browser launching
* Refactor authentication flow initialization based on client OS
The NewOAuthFlow method now first checks the operating system and if it is a non-desktop Linux, it opts for Device Code Flow. PKCEFlow is tried first and if it fails, then it falls back on Device Code Flow. If both unsuccessful, the authentication process halts and error messages have been updated to provide more helpful feedback for troubleshooting authentication errors
* Replace log-based Linux desktop check with process check
To verify if a Linux OS is running a desktop environment in the Authentication utility, the log-based method that checks the XDG_CURRENT_DESKTOP env has been replaced with a method that checks directly if either X or Wayland display server processes are running. This method is more reliable as it directly checks for the display server process rather than relying on an environment variable that may not be set in all desktop environments.
* Refactor PKCE Authorization Flow to improve server handling
* refactor check for linux running desktop environment
* Improve server shutdown handling and encapsulate handlers with new server multiplexer
The changes enhance the way the server shuts down by specifying a context with timeout of 5 seconds, adding a safeguard to ensure the server halts even on potential hanging requests. Also, the server's root handler is now encapsulated within a new ServeMux instance, to support multiple registrations of a path
In case the 53 UDP port is not an option to bind then we hijack the DNS traffic with eBPF, and we forward the traffic to the listener on a custom port. With this implementation, we should be able to listen to DNS queries on any address and still set the local host system to send queries to the custom address on port 53.
Because we tried to attach multiple XDP programs to the same interface, I did a refactor in the WG traffic forward code also.
Add a default firewall rule to allow netbird traffic to be handled
by the access control managers.
Userspace manager behavior:
- When running on Windows, a default rule is add on Windows firewall
- For Linux, we are using one of the Kernel managers to add a single rule
- This PR doesn't handle macOS
Kernel manager behavior:
- For NFtables, if there is a filter table, an INPUT rule is added
- Iptables follows the previous flow if running on kernel mode. If running
on userspace mode, it adds a single rule for INPUT and OUTPUT chains
A new checkerFW package has been introduced to consolidate checks across
route and access control managers.
It supports a new environment variable to skip nftables and allow iptables tests
This PR showcases the implementation of additional linter rules. I've updated the golangci-lint GitHub Actions to the latest available version. This update makes sure that the tool works the same way locally - assuming being updated regularly - and with the GitHub Actions.
I've also taken care of keeping all the GitHub Actions up to date, which helps our code stay current. But there's one part, goreleaser that's a bit tricky to test on our computers. So, it's important to take a close look at that.
To make it easier to understand what I've done, I've made separate changes for each thing that the new linters found. This should help the people reviewing the changes see what's going on more clearly. Some of the changes might not be obvious at first glance.
Things to consider for the future
CI runs on Ubuntu so the static analysis only happens for Linux. Consider running it for the rest: Darwin, Windows
The ephemeral manager keep the inactive ephemeral peers in a linked list. The manager schedule a cleanup procedure to the head of the linked list (to the most deprecated peer). At the end of cleanup schedule the next cleanup to the new head.
If a device connect back to the server the manager will remote it from the peers list.
Switches the order of initialization in the OAuth flow within
the NewOAuthFlow method. Instead of initializing the
Device Authorization Flow first, it now initializes
the PKCE Authorization Flow first, and falls back
to the Device Authorization Flow if the PKCE initialization fails.
In case the route management feature is not supported
then do not create unnecessary firewall and manager instances.
This can happen if the nftables nor iptables is not available on the host OS.
- Move the error handling to upper layer
- Remove fake, useless implementations of interfaces
- Update go-iptables because In Docker the old version can not
determine well the path of executable file
- update lib to 0.70
In case of 'always-on' feature has switched on, after the reboot the service do not start properly in all cases.
If the device is in offline state (no internet connection) the auth login steps will fail and the service will stop.
For the auth steps make no sense in this case because if the OS start the service we do not have option for
the user interaction.
* Move ebpf code to its own package to avoid crash issues in Android
Older versions of android crashes because of the bytecode files
Even when they aren't loaded as it was our case
* move c file to own folder
* fix lint
Enhance the user experience by enabling authentication to Netbird using Single Sign-On (SSO) with any Identity Provider (IDP) provider. Current client offers this capability through the Device Authorization Flow, however, is not widely supported by many IDPs, and even some that do support it do not provide a complete verification URL.
To address these challenges, this pull request enable Authorization Code Flow with Proof Key for Code Exchange (PKCE) for client logins, which is a more widely adopted and secure approach to facilitate SSO with various IDP providers.
EBPF proxy between TURN (relay) and WireGuard to reduce number of used ports used by the NetBird agent.
- Separate the wg configuration from the proxy logic
- In case if eBPF type proxy has only one single proxy instance
- In case if the eBPF is not supported fallback to the original proxy Implementation
Between the signature of eBPF type proxy and original proxy has
differences so this is why the factory structure exists
* Optimize rules with All groups
* Use IP sets in ACLs (nftables implementation)
* Fix squash rule when we receive optimized rules list from management
* Add DNS list argument for mobile client
* Write testable code
Many places are checked the wgInterface != nil condition.
It is doing it just because to avoid the real wgInterface creation for tests.
Instead of this involve a wgInterface interface what is moc-able.
* Refactor the DNS server internal code structure
With the fake resolver has been involved several
if-else statement and generated some unused
variables to distinguish the listener and fake
resolver solutions at running time. With this
commit the fake resolver and listener based
solution has been moved into two separated
structure. Name of this layer is the 'service'.
With this modification the unit test looks
simpler and open the option to add new logic for
the permanent DNS service usage for mobile
systems.
* Remove is running check in test
We can not ensure the state well so remove this
check. The test will fail if the server is not
running well.
* ACL firewall manager fix/improvement
Fix issue with rule squashing, it contained issue when calculated
total amount of IPs in the Peer map (doesn't included offline peers).
That why squashing not worked.
Also this commit changes the rules apply behaviour. Instead policy:
1. Apply all rules from network map
2. Remove all previous applied rules
We do:
1. Apply only new rules
2. Remove outdated rules
Why first variant was implemented: because when you have drop policy
it is important in which order order you rules are and you need totally
clean previous state to apply the new. But in the release we didn't
include drop policy so we can do this improvement.
* Print log message about processed ACL rules
Reduce the peer status notifications
When receive new network map invoke multiple notifications for
every single peers. It cause high cpu usage We handle the in a
batch the peer notification in update network map.
- Remove the unnecessary UpdatePeerFQDN calls in addNewPeer
- Fix notification in RemovePeer function
- Involve FinishPeerListModifications logic
Works only with userspace implementation:
1. Configure host to solve DNS requests via a fake DSN server address in the Netbird network.
2. Add to firewall catch rule for these DNS requests.
3. Resolve these DNS requests and respond by writing directly to wireguard device.
Prevent peer updates if the status is not changing from disconnected to connected and vice versa.
Fixed route score calculation, added tests and changed the log message
fixed installer /usr/local/bin creation
* Extend protocol and firewall manager to handle old management
* Send correct empty firewall rules list when delete peer
* Add extra tests for firewall manager and uspfilter
* Work with inconsistent state
* Review note
* Update comment
Add new feature to notify the user when new client route has arrived.
Refactor the initial route handling. I move every route logic into the route
manager package.
* Add notification management for client rules
* Export the route notification for Android
* Compare the notification based on network range instead of id.
Adds functionality to support Identity Provider (IdP) managers
that do not support a complete verification URI in the
device authentication flow.
In cases where the verification_uri_complete field is empty,
the user will be prompted with their user_code,
and the verification_uri field will be used as a fallback
This PR brings support of a shared port between stun (ICE agent) and
the kernel WireGuard
It implements a single port mode for execution with kernel WireGuard
interface using a raw socket listener.
BPF filters ensure that only STUN packets hit the NetBird userspace app
Removed a lot of the proxy logic and direct mode exchange.
Now we are doing an extra hole punch to the remote WireGuard
port for best-effort cases and support to old client's direct mode.
This PR adds supports for the WireGuard userspace implementation
using Bind interface from wireguard-go.
The newly introduced ICEBind struct implements Bind with UDPMux-based
structs from pion/ice to handle hole punching using ICE.
The core implementation was taken from StdBind of wireguard-go.
The result is a single WireGuard port that is used for host and server reflexive candidates.
Relay candidates are still handled separately and will be integrated in the following PRs.
ICEBind checks the incoming packets for being STUN or WireGuard ones
and routes them to UDPMux (to handle hole punching) or to WireGuard respectively.
Refactored updateServerStates and calculateState
added some checks to ensure we are not sending connecting on context canceled
removed some state updates from the RunClient function
Some IDP requires different scope requests and
issue access tokens for different purposes
This change allow for remote configurable scopes
and the use of ID token
On Android, because of the hard SELinux policies can not list the
interfaces of the ICE package. Without it can not generate a host type
candidate. In this pull request, the list of interfaces comes via the Java
interface.
Fix the status indication in the client service. The status of the
management server and the signal server was incorrect if the network
connection was broken. Basically the status update was not used by
the management and signal library.
Before defining if we will use direct or proxy connection we will exchange a
message with the other peer if the modes match we keep the decision
from the shouldUseProxy function otherwise we skip using direct connection.
Added a feature support message to the signal protocol
The peer login expiration ACL check introduced in #714
filters out peers that are expired and agents receive a network map
without that expired peers.
However, the agents should see those peers in status "Disconnected".
This PR extends the Agent <-> Management protocol
by introducing a new field OfflinePeers
that contain expired peers. Agents keep track of those and display
then just in the Status response.
The ConnStatus is a custom type based on iota
like an enum. The problem was nowhere used to the
benefits of this implementation. All ConnStatus
instances has been compared with strings. I
suppose the reason to do it to avoid a circle
dependency. In this commit the separated status
package has been moved to peer package.
Remove unused, exported functions from engine
Code cleaning in the config.go of the client. This change keep the
logic in original state. The name of the exported function was not
covered well the internal workflow. Without read the comment was not
understandable what is the difference between the GetConfig and
ReadConfig. By the way both of them doing write operation.
Small code cleaning in the iface package. These changes necessary to
get a clean code in case if we involve more platforms. The OS related
functions has been distributed into separate files and it has been
mixed with not OS related logic. The goal is to get a clear picture
of the layer between WireGuard and business logic.
* Disable upstream DNS resolver after several tries and fails
* Add tests for upstream fails
* Use an extra flag to disable domains in DNS upstreams
* Fix hashing IPs of nameservers for updates.
avoid sending admin or management URLs on service start
as it doesn't have an input
Parse management and admin URL when needed
Pass empty admin url on commands to prevent default overwrite
Adding --external-ip-map and --dns-resolver-address to up command and shorthand option to global flags.
Refactor get and read config functions with new ConfigInput type.
updated cobra package to latest release.
This PR adds system activity tracking.
The management service records events like
add/remove peer, group, rule, route, etc.
The activity events are stored in the SQLite event store
and can be queried by the HTTP API.
If peer is deleted in the console,
we set its state as needs login
On Down command we clean any previous state errors
this prevents need for daemon restart
Removed state error wrapping when engine exits, log is enough
Use stdout and stderr log path only if on Linux and attempt to create the path
Update status system with FQDN fields and
status command to display the domain names of remote and local peers
Set some DNS logs to tracing
update readme file
Added host configurators for Linux, Windows, and macOS.
The host configurator will update the peer system configuration
directing DNS queries according to its capabilities.
Some Linux distributions don't support split (match) DNS or custom ports,
and that will be reported to our management system in another PR
Due to peer reconnects when restarting the Management service,
there are lots of SaveStore operations to update peer status.
Store.SavePeerStatus stores peer status separately and the
FileStore implementation stores it in memory.
Added DNS update protocol message
Added sync to clients
Update nameserver API with new fields
Added default NS groups
Added new dns-name flag for the management service append to peer DNS label
If the gateway address would be nil which is
the case on macOS, we return the preferredSrc
added tests for getExistingRIBRouteGateway function
update log message
* Add additional check for needed kernel modules
* Check if wireguard and tun modules are loaded
If modules are loaded return true, otherwise attempt to load them
* fix state check
* Add module function tests
* Add test execution in container
* run client package tests on docker
* add package comment to new file
* force entrypoint
* add --privileged flag
* clean only if tables where created
* run from within the directories
Handle routes updates from management
Manage routing firewall rules
Manage peer RIB table
Add get peer and get notification channel from the status recorder
Update interface peers allowed IPs
Added additional common blacklisted interfaces
Updated the signal protocol to pass the peer port and netbird version
Co-authored-by: braginini <bangvalo@gmail.com>
Support Generic OAuth 2.0 Device Authorization Grant
as per RFC specification https://www.rfc-editor.org/rfc/rfc8628.
The previous version supported only Auth0 as an IDP backend.
This implementation enables the Interactive SSO Login feature
for any IDP compatible with the specification, e.g., Keycloak.
This PR fixes a race condition that happens
when agents connect to a Signal stream, multiple
times within a short amount of time. Common on
slow and unstable internet connections.
Every time an agent establishes a new connection
to Signal, Signal creates a Stream and writes an entry
to the registry of connected peers storing the stream.
Every time an agent disconnects, Signal removes the
stream from the registry.
Due to unstable connections, the agent could detect
a broken connection, and attempt to reconnect to Signal.
Signal will override the stream, but it might detect
the old broken connection later, causing peer deregistration.
It will deregister the peer leaving the client thinking
it is still connected, rejecting any messages.
All the existing agents by default connect to port 33073 of the
Management service. This value is also stored in the local config.
All the agents won't switch to the new port 443
unless explicitly specified in the config.
We want the transition to be smooth for our users, therefore
this PR adds logic to check whether the old port 33073 can be
changed to 443 and updates the config automatically.
This PR is a part of an effort to use standard ports (443 or 80) that are usually allowed by default in most of the environments.
Right now Management Service runs the Let'sEncrypt manager on port 443, HTTP API server on port 33071,
and a gRPC server on port 33073. There are three separate listeners.
This PR combines these listeners into one.
With this change, the HTTP and gRPC server runs on either 443 with TLS or 80 without TLS
by default (no --port specified).
Let's Encrypt manager always runs on port 443 if enabled.
The backward compatibility server runs on port 33073 (with TLS or without).
HTTP port 33071 is obsolete and not used anymore.
Newly installed agents will connect to port 443 by default instead of port 33073 if not specified otherwise.
This PR fixes issues with the terminal when
running netbird ssh to a remote agent.
Every session looks up a user and loads its
profile. If no user is found, the connection is rejected.
The default user is root.
The Management client will try reconnecting in case.
of network issues or non-permanent errors.
If the device was off-boarded, then the client will stop retrying.
This PR adds support for SSH access through the NetBird network
without managing SSH skeys.
NetBird client app has an embedded SSH server (Linux/Mac only)
and a netbird ssh command.
Before this change, NetBird Agent wasn't handling
peer interface configuration changes dynamically.
Also, remote peer configuration changes have
not been applied (e.g. AllowedIPs changed).
Not a very common cause, but still it should be handled.
Now, Agent reacts to PeerConfig changes sent from the
management service and restarts remote connections
if AllowedIps have been changed.
* GetClientID method and increase interval on slow_down err
* Reuse existing authentication flow if is not expired
Created a new struct to hold additional info
about the flow
If there is a waiting sso running, we cancel its context
* Run the up command on a goroutine
* Use time.Until
* Use proper ctx and consistently use goroutine for up/down
Send Desktop UI client version as user-agent to daemon
This is sent on every login request to the management
Parse the GRPC context on the system package and
retrieves the user-agent
Management receives the new UIVersion field and
store in the Peer's system meta
UI and CLI Clients are now able to use SSO login by default
we will check if the management has configured or supports SSO providers
daemon will handle fetching and waiting for an access token
Oauth package was moved to internal to avoid one extra package at this stage
Secrets were removed from OAuth
CLI clients have less and better output
2 new status were introduced, NeedsLogin and FailedLogin for better messaging
With NeedsLogin we no longer have endless login attempts
The management will validate the JWT as it does in the API
and will register the Peer to the user's account.
New fields were added to grpc messages in management
and client daemon and its clients were updated
Peer has one new field, UserID,
that will hold the id of the user that registered it
JWT middleware CheckJWT got a splitter
and renamed to support validation for non HTTP requests
Added test for adding new Peer with UserID
Lots of tests update because of a new field
Agent systray UI has been extended with
a setting window that allows configuring
management URL, admin URL and
supports pre-shared key.
While for the Netbird managed version
the Settings are not necessary, it helps
to properly configure the self-hosted version.
Add method for rotating access token with refresh tokens
This will be useful for catching expired sessions and
offboarding users
Also added functions to handle secrets. They have to be revisited
as some tests didn't run on CI as they waited some user input, like password
Updates test workflows with serial execution to avoid collision
of ports and resource names.
Also, used -exec sudo flag for UNIX tests and removed not-needed
limits configuration on Linux and added a 5 minutes timeout.
Updated the multi-peer tests in the client/internal/engine_test.go
to provide proper validation when creating or starting
a peer engine instance fails.
As some operations of the tests running on windows
are slow, we will experiment with disabling the Defender before
restoring cache and checkout a repository, then we reenable
it to run the tests.
disabled extra logs for windows interface
When stopping engine, all peer conns have to be closed
and for each peer WireGuard iface is called
to remove WireGuard peer.
This operation happens in a goroutine causing
Engine to remove the whole WireGuard interface before.
Therefore consequent calls to RemovePeer are unsuccessful.
This fix just adds a small delay before removing interface.