* Add status anonymization
* Add OS/arch to the status command
* Use human-friendly last-update status messages
* Add debug bundle command to collect (anonymized) logs
* Add debug log level command
* And debug for a certain time span command
Now that we have the latency between peers available we can use this data to consider when choosing the best route. This way the route with the routing peer with the lower latency will be preferred over others with the same target network.
* Add Linux legacy routing if ip rule functionality is not available
* Ignore exclusion route errors if host has no route
* Exclude iOS from route manager
* Also retrieve IPv6 routes
* Ignore loopback addresses not being in the main table
* Ignore "not supported" errors on cleanup
* Fix regression in ListenUDP not using fwmarks
All routes are now installed in a custom netbird routing table.
Management and wireguard traffic is now marked with a custom fwmark.
When the mark is present the traffic is routed via the main routing table, bypassing the VPN.
When the mark is absent the traffic is routed via the netbird routing table, if:
- there's no match in the main routing table
- it would match the default route in the routing table
IPv6 traffic is blocked when a default route IPv4 route is configured to avoid leakage.
* adding peer healthcheck
* generate proto file
* fix return in udp mux and replace with continue
* use ice agent for latency checks
* fix status output
* remove some logs
* fix status test
* revert bind and ebpf code
* fix error handling on binding response callback
* extend error handling on binding response callback
---------
Co-authored-by: Maycon Santos <mlsmaycon@gmail.com>
* Fix using wrong array index in log to avoid potential panic
* Increase gRPC connection timeout and add the timeout resolv.conf option
This makes sure the dns client is able to failover to a second
configured nameserver, if present. That is the case then when using the
dns `file` manager and a resolv.conf file generated for netbird.
* On file backup restore, remove the first NS if it's the netbird NS
* Bump dns mangager discovery message from debug to info to ease debugging
This changes the default behavior for new peers, by requiring the agent to be executed with allow-server-ssh set to true in order for the management configuration to take effect.
With these changes, the command up supports the flag --disable-auto-connect that allows users to disable auto connection on the client after a computer restart or when the daemon restarts.
This PR implements the following posture checks:
* Agent minimum version allowed
* OS minimum version allowed
* Geo-location based on connection IP
For the geo-based location, we rely on GeoLite2 databases which are free IP geolocation databases. MaxMind was tested and we provide a script that easily allows to download of all necessary files, see infrastructure_files/download-geolite2.sh.
The OpenAPI spec should extensively cover the life cycle of current version posture checks.
* Make sure our iOS dialer does not get overwritten
* set dial timeout for both clients on ios
---------
Co-authored-by: Pascal Fischer <pascal@netbird.io>
sends an extra server reflexive candidate to the remote peer with our related port (usually the Wireguard port)
this is useful when a network has an existing port forwarding rule for the Wireguard port and the local peer and avoids creating a 1:1 NAT on the local network.
Ensure we use WG address instead of loopback addresses for eBPF.
- First try to use 53 port
- Try to use 5053 port on WG interface for eBPF
- Try to use 5053 on WG interface or loopback interface
In the case of disabled stub listeren the list of name servers is unordered. The solution is to configure the resolv.conf file directly instead of dbus API.
Because third-party services also can manipulate the DNS settings the agent watch the resolv.conf file and keep it up to date.
- apply file type DNS manager if in the name server list does not exist the 127.0.0.53 address
- watching the resolv.conf file with inotify service and overwrite all the time if the configuration has changed and it invalid
- fix resolv.conf generation algorithm
* Adds management, signal, and relay (STUN/TURN) health probes to the status command.
* Adds a reason when the management or signal connections are disconnected.
* Adds last wireguard handshake and received/sent bytes per peer
This PR aims to integrate Rosenpass with NetBird. It adds a manager for Rosenpass that starts a Rosenpass server and handles the managed peers. It uses the cunicu/go-rosenpass implementation. Rosenpass will then negotiate a pre-shared key every 2 minutes and apply it to the wireguard connection.
The Feature can be enabled by setting a flag during the netbird up --enable-rosenpass command.
If two peers are both support and have the Rosenpass feature enabled they will create a post-quantum secure connection. If one of the peers or both don't have this feature enabled or are running an older version that does not have this feature yet, the NetBird client will fall back to a plain Wireguard connection without pre-shared keys for those connections (keeping Rosenpass negotiation for the rest).
Additionally, this PR includes an update of all Github Actions workflows to use go version 1.21.0 as this is a requirement for the integration.
---------
Co-authored-by: braginini <bangvalo@gmail.com>
Co-authored-by: Maycon Santos <mlsmaycon@gmail.com>
Add netstack support for the agent to run it without privileges.
- use interface for tun device
- use common IPC for userspace WireGuard integration
- move udpmux creation and sharedsock to tun layer
* starting engine by passing file descriptor on engine start
* inject logger that does not compile
* logger and first client
* first working connection
* support for routes and working connection
* small refactor for better code quality in swift
* trying to add DNS
* fix
* updated
* fix route deletion
* trying to bind the DNS resolver dialer to an interface
* use dns.Client.Exchange
* fix metadata send on startup
* switching between client to query upstream
* fix panic on no dns response
* fix after merge changes
* add engine ready listener
* replace engine listener with connection listener
* disable relay connection for iOS until proxy is refactored into bind
* Extract private upstream for iOS and fix function headers for other OS
* Update mock Server
* Fix dns server and upstream tests
* Fix engine null pointer with mobile dependencies for other OS
* Revert back to disabling upstream on no response
* Fix some of the remarks from the linter
* Fix linter
* re-arrange duration calculation
* revert exported HostDNSConfig
* remove unused engine listener
* remove development logs
* refactor dns code and interface name propagation
* clean dns server test
* disable upstream deactivation for iOS
* remove files after merge
* fix dns server darwin
* fix server mock
* fix build flags
* move service listen back to initialize
* add wgInterface to hostManager initialization on android
* fix typo and remove unused function
* extract upstream exchange for ios and rest
* remove todo
* separate upstream logic to ios file
* Fix upstream test
* use interface and embedded struct for upstream
* set properly upstream client
* remove placeholder
* remove ios specific attributes
* fix upstream test
* merge ipc parser and wg configurer for mobile
* fix build annotation
* use json for DNS settings handover through gomobile
* add logs for DNS json string
* bring back check on ios for private upstream
* remove wrong (and unused) line
* fix wrongly updated comments on DNSSetting export
---------
Co-authored-by: Maycon Santos <mlsmaycon@gmail.com>
* update cli commands to respect an empty string and handle different from undefined
* remove test for unintended behaviour
* remove test for unintended behaviour
This PR adds `gosec` linter with the following checks disabled:
- G102: Bind to all interfaces
- G107: Url provided to HTTP request as taint input
- G112: Potential slowloris attack
- G114: Use of net/http serve function that has no support for setting timeouts
- G204: Audit use of command execution
- G401: Detect the usage of DES, RC4, MD5 or SHA1
- G402: Look for bad TLS connection settings
- G404: Insecure random number source (rand)
- G501: Import blocklist: crypto/md5
- G505: Import blocklist: crypto/sha1
We have complaints related to the checks above. They have to be addressed separately.
* Add gocritic linter
`gocritic` provides diagnostics that check for bugs, performance, and style issues
We disable the following checks:
- commentFormatting
- captLocal
- deprecatedComment
This PR contains many `//nolint:gocritic` to disable `appendAssign`.
Most operating systems add a /32 route for the default gateway address to its routing table
This will allow routes to be configured into the system even when the incoming range contains the default gateway.
In case a range is a sub-range of an existing route and this range happens to contain the default gateway it attempts to create a default gateway route to prevent loop issues
With this change we should be able to collect and expose the following histograms:
* `management.updatechannel.create.duration.ms` with `closed` boolean label
* `management.updatechannel.create.duration.micro` with `closed` boolean label
* `management.updatechannel.close.one.duration.ms`
* `management.updatechannel.close.one.duration.micro`
* `management.updatechannel.close.multiple.duration.ms`
* `management.updatechannel.close.multiple.duration.micro`
* `management.updatechannel.close.multiple.channels`
* `management.updatechannel.send.duration.ms` with `found` and `dropped` boolean labels
* `management.updatechannel.send.duration.micro` with `found` and `dropped` boolean labels
* `management.updatechannel.get.all.duration.ms`
* `management.updatechannel.get.all.duration.micro`
* `management.updatechannel.get.all.peers`
- dupword checks for duplicate words in the source code
- durationcheck checks for two durations multiplied together
- forbidigo forbids identifiers
- mirror reports wrong mirror patterns of bytes/strings usage
- misspell finds commonly misspelled English words in comments
- predeclared finds code that shadows one of Go's predeclared identifiers
- thelper detects Go test helpers without t.Helper() call and checks the consistency of test helpers
Handle original search domains in resolv.conf type implementations.
- parse the original resolv.conf file
- merge the search domains
- ignore the domain keyword
- append any other config lines (sortstlist, options)
- fix read origin resolv.conf from bkp in resolvconf implementation
- fix line length validation
- fix number of search domains validation
The no rules matched message is operating system language specific, and can cause errors
Now we check if firewall is reachable by the app and then if the rule is returned or not in two different calls:
isWindowsFirewallReachable
isFirewallRuleActive
Supporting search domains will allow users to define match domains to also
be added to a list of search domains in their systems
Fix Windows registry key configuration for search domains using a key within the netbird interface path
Restructure data handling for improved performance and flexibility.
Introduce 'G'-prefixed fields to represent Gorm relations, simplifying resource management.
Eliminate complexity in lookup tables for enhanced query and write speed.
Enable independent operations on data structures, requiring adjustments in the Store interface and Account Manager.
added intergration with JumpCloud User API. Use the steps in setup.md for configuration.
Additional changes:
- Enhance compatibility for providers that lack audience support in the Authorization Code Flow and the Authorization - - Code Flow with Proof Key for Code Exchange (PKCE) using NETBIRD_DASH_AUTH_USE_AUDIENCE=falseenv
- Verify tokens by utilizing the client ID when audience support is absent in providers
The use of reflection should generally be minimized in Go code because
it can make the code less readable, less type-safe, and potentially slower.
In this particular case we can simply rely on type switch.
Implement user deletion across all IDP-ss. Expires all user peers
when the user is deleted. Users are permanently removed from a local
store, but in IDP, we remove Netbird attributes for the user
untilUserDeleteFromIDPEnabled setting is not enabled.
To test, an admin user should remove any additional users.
Until the UI incorporates this feature, use a curl DELETE request
targeting the /users/<USER_ID> management endpoint. Note that this
request only removes user attributes and doesn't trigger a delete
from the IDP.
To enable user removal from the IdP, set UserDeleteFromIDPEnabled
to true in account settings. Until we have a UI for this, make this
change directly in the store file.
Store the deleted email addresses in encrypted in activity store.
* shutdown the pkce server on user cancellation
* Refactor openURL to exclusively manage authentication flow instructions and browser launching
* Refactor authentication flow initialization based on client OS
The NewOAuthFlow method now first checks the operating system and if it is a non-desktop Linux, it opts for Device Code Flow. PKCEFlow is tried first and if it fails, then it falls back on Device Code Flow. If both unsuccessful, the authentication process halts and error messages have been updated to provide more helpful feedback for troubleshooting authentication errors
* Replace log-based Linux desktop check with process check
To verify if a Linux OS is running a desktop environment in the Authentication utility, the log-based method that checks the XDG_CURRENT_DESKTOP env has been replaced with a method that checks directly if either X or Wayland display server processes are running. This method is more reliable as it directly checks for the display server process rather than relying on an environment variable that may not be set in all desktop environments.
* Refactor PKCE Authorization Flow to improve server handling
* refactor check for linux running desktop environment
* Improve server shutdown handling and encapsulate handlers with new server multiplexer
The changes enhance the way the server shuts down by specifying a context with timeout of 5 seconds, adding a safeguard to ensure the server halts even on potential hanging requests. Also, the server's root handler is now encapsulated within a new ServeMux instance, to support multiple registrations of a path
In case the 53 UDP port is not an option to bind then we hijack the DNS traffic with eBPF, and we forward the traffic to the listener on a custom port. With this implementation, we should be able to listen to DNS queries on any address and still set the local host system to send queries to the custom address on port 53.
Because we tried to attach multiple XDP programs to the same interface, I did a refactor in the WG traffic forward code also.
Add a default firewall rule to allow netbird traffic to be handled
by the access control managers.
Userspace manager behavior:
- When running on Windows, a default rule is add on Windows firewall
- For Linux, we are using one of the Kernel managers to add a single rule
- This PR doesn't handle macOS
Kernel manager behavior:
- For NFtables, if there is a filter table, an INPUT rule is added
- Iptables follows the previous flow if running on kernel mode. If running
on userspace mode, it adds a single rule for INPUT and OUTPUT chains
A new checkerFW package has been introduced to consolidate checks across
route and access control managers.
It supports a new environment variable to skip nftables and allow iptables tests
This PR showcases the implementation of additional linter rules. I've updated the golangci-lint GitHub Actions to the latest available version. This update makes sure that the tool works the same way locally - assuming being updated regularly - and with the GitHub Actions.
I've also taken care of keeping all the GitHub Actions up to date, which helps our code stay current. But there's one part, goreleaser that's a bit tricky to test on our computers. So, it's important to take a close look at that.
To make it easier to understand what I've done, I've made separate changes for each thing that the new linters found. This should help the people reviewing the changes see what's going on more clearly. Some of the changes might not be obvious at first glance.
Things to consider for the future
CI runs on Ubuntu so the static analysis only happens for Linux. Consider running it for the rest: Darwin, Windows
The ephemeral manager keep the inactive ephemeral peers in a linked list. The manager schedule a cleanup procedure to the head of the linked list (to the most deprecated peer). At the end of cleanup schedule the next cleanup to the new head.
If a device connect back to the server the manager will remote it from the peers list.
Switches the order of initialization in the OAuth flow within
the NewOAuthFlow method. Instead of initializing the
Device Authorization Flow first, it now initializes
the PKCE Authorization Flow first, and falls back
to the Device Authorization Flow if the PKCE initialization fails.
In case the route management feature is not supported
then do not create unnecessary firewall and manager instances.
This can happen if the nftables nor iptables is not available on the host OS.
- Move the error handling to upper layer
- Remove fake, useless implementations of interfaces
- Update go-iptables because In Docker the old version can not
determine well the path of executable file
- update lib to 0.70
In case of 'always-on' feature has switched on, after the reboot the service do not start properly in all cases.
If the device is in offline state (no internet connection) the auth login steps will fail and the service will stop.
For the auth steps make no sense in this case because if the OS start the service we do not have option for
the user interaction.
* Move ebpf code to its own package to avoid crash issues in Android
Older versions of android crashes because of the bytecode files
Even when they aren't loaded as it was our case
* move c file to own folder
* fix lint
Enhance the user experience by enabling authentication to Netbird using Single Sign-On (SSO) with any Identity Provider (IDP) provider. Current client offers this capability through the Device Authorization Flow, however, is not widely supported by many IDPs, and even some that do support it do not provide a complete verification URL.
To address these challenges, this pull request enable Authorization Code Flow with Proof Key for Code Exchange (PKCE) for client logins, which is a more widely adopted and secure approach to facilitate SSO with various IDP providers.
EBPF proxy between TURN (relay) and WireGuard to reduce number of used ports used by the NetBird agent.
- Separate the wg configuration from the proxy logic
- In case if eBPF type proxy has only one single proxy instance
- In case if the eBPF is not supported fallback to the original proxy Implementation
Between the signature of eBPF type proxy and original proxy has
differences so this is why the factory structure exists
* Optimize rules with All groups
* Use IP sets in ACLs (nftables implementation)
* Fix squash rule when we receive optimized rules list from management
* Add DNS list argument for mobile client
* Write testable code
Many places are checked the wgInterface != nil condition.
It is doing it just because to avoid the real wgInterface creation for tests.
Instead of this involve a wgInterface interface what is moc-able.
* Refactor the DNS server internal code structure
With the fake resolver has been involved several
if-else statement and generated some unused
variables to distinguish the listener and fake
resolver solutions at running time. With this
commit the fake resolver and listener based
solution has been moved into two separated
structure. Name of this layer is the 'service'.
With this modification the unit test looks
simpler and open the option to add new logic for
the permanent DNS service usage for mobile
systems.
* Remove is running check in test
We can not ensure the state well so remove this
check. The test will fail if the server is not
running well.
* ACL firewall manager fix/improvement
Fix issue with rule squashing, it contained issue when calculated
total amount of IPs in the Peer map (doesn't included offline peers).
That why squashing not worked.
Also this commit changes the rules apply behaviour. Instead policy:
1. Apply all rules from network map
2. Remove all previous applied rules
We do:
1. Apply only new rules
2. Remove outdated rules
Why first variant was implemented: because when you have drop policy
it is important in which order order you rules are and you need totally
clean previous state to apply the new. But in the release we didn't
include drop policy so we can do this improvement.
* Print log message about processed ACL rules
Reduce the peer status notifications
When receive new network map invoke multiple notifications for
every single peers. It cause high cpu usage We handle the in a
batch the peer notification in update network map.
- Remove the unnecessary UpdatePeerFQDN calls in addNewPeer
- Fix notification in RemovePeer function
- Involve FinishPeerListModifications logic
Works only with userspace implementation:
1. Configure host to solve DNS requests via a fake DSN server address in the Netbird network.
2. Add to firewall catch rule for these DNS requests.
3. Resolve these DNS requests and respond by writing directly to wireguard device.