- Resolve issue where a network join was not being properly checked for success without using external tools
- Resolve issue where initial boot was not being checked properly
- Now output errors when zerotier fails to start
Closes #1581
cc @altano for inspiration for this patch
Signed-off-by: Erik Hollensbe <git@hollensbe.org>
Proactively seek and distribute external surface addresses
This patch introduces a new "self-awareness" behavior which proactively queries peers for external surface addresses and distributes them via PUSH_DIRECT_PATHS. This has the effect of making ZT more responsive to interface changes.
Current behavior:
Previously, this type of information was only mediated via RENDEZVOUS and was only triggered when the client detected that it no longer had a single alive path to a peer. While PUSH_DIRECT_PATHS would correctly (and often) send local addresses, this was not the case for external addresses collected from response HELLOs. This would lead to situations where only one physical address was distributed to peers. Additionally, if a new physical interface were made available to the client, the client would correctly bind to it but never seek information about its external mapping from a peer, so the new interface would remain unknown to other peers until all paths on the previous interface had expired, which can take a couple of minutes. In traditional usage of ZT this is not usually a problem, but it becomes one in the following scenarios:
- Network interfaces go up and down while ZT is running (e.g. switching to LTE or WiFi from a wired connection)
- Network interfaces are added or removed in multipath setups
Proposed behavior:
I propose that normal full HELLOs be sent not only on the first interface in use but on all interfaces. This causes planets to respond with a HELLO containing the surface address for each interface. We then collect each address using SelfAwareness::whoami() and distribute them via the normal PUSH_DIRECT_PATHS mechanism.
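As a rough sketch of that flow (the type and function names here, such as `LocalSocket`, `InetAddress`, `helloAllInterfaces`, and `sendFullHello`, are hypothetical stand-ins; only SelfAwareness and PUSH_DIRECT_PATHS come from the description above, and the real ZeroTier send path is more involved):

```cpp
#include <functional>
#include <vector>

struct InetAddress { /* externally visible IP:port reported by the planet */ };
struct LocalSocket { /* one bound local interface/socket */ };

struct SelfAwareness {
    // Record the surface address a planet reported for the socket that sent
    // the HELLO; roughly the bookkeeping that whoami() is later consulted for.
    void iam(const LocalSocket & /*via*/, const InetAddress & /*surface*/) {}
};

// Previously a full HELLO went out only on the first bound socket.  The
// proposal: send one per socket so each interface's external surface is
// reported back in the planet's response and can then be advertised to
// peers via the normal PUSH_DIRECT_PATHS mechanism.
void helloAllInterfaces(const std::vector<LocalSocket> &sockets,
                        const std::function<void(const LocalSocket &)> &sendFullHello)
{
    for (const LocalSocket &s : sockets)
        sendFullHello(s);   // one full HELLO per bound interface, not just the first
}
```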
Rate gate ECHO per Path instead of per Peer
In multipath scenarios user traffic is used to judge the aliveness of a path. If the user traffic is too infrequent to establish aliveness within a given time window (say 500 ms), the bonding layer will send extra ECHOs at a maximum rate of one per failoverInterval / 3 (~166 ms) per path. This patch significantly relaxes the rate-limiting of ECHOs to prevent a non-multipath node from dropping those ECHOs, which would cause multipath nodes to erroneously judge paths to that node as dead.
Details
This patch decreases the rate-limit interval from 1000 ms per peer to ~166 ms (a factor of 6) and rate limits ECHOs per Path instead of per Peer. This allows the rate limit to scale with the number of established paths to a peer.
As a result, if all 64 path slots are in use, a total of 64 × 6 = 384 ECHOs per second will be allowed in the most aggressive case, where failoverInterval is set to 500 ms.
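A minimal sketch of per-Path gating (not the actual ZeroTier classes; the class and member names below are hypothetical): each path tracks when it last accepted an ECHO, so the total allowed ECHO rate grows with the number of established paths rather than being fixed per peer.

```cpp
#include <cstdint>

class Path {
public:
    // Allow at most one ECHO per 'intervalMs' on this path.  166 ms here is
    // failoverInterval (500 ms) / 3, the most aggressive case described above.
    bool rateGateEchoRequest(int64_t now, int64_t intervalMs = 166)
    {
        if ((now - _lastEchoRequestReceived) >= intervalMs) {
            _lastEchoRequestReceived = now;
            return true;   // accept and answer this ECHO
        }
        return false;      // too soon since the last ECHO seen on this path
    }

private:
    int64_t _lastEchoRequestReceived = 0;
};
```

With this per-path gate, each path admits roughly 6 ECHOs per second, which is where the 64 × 6 = 384 per-second figure above comes from when all path slots are occupied.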