This is necessary since the builder no longer passes the platform flag
to the build. This would lead to dockerfiles that are mixing multi and single
arch stages to pull the wrong architecture images, particularly when
trying to build images in emulated builds (e.g. armv7hf built on aarch64).
Moving the full build to multi-arch solves this as the docker engine is
capable of chosing the right architecture from the manifest.
Relatest-to: balena-io/balena-builder#1010
Change-type: patch
Preloaded devices can require that the device is pinned to the preloaded
release on provisioning. However if the provisioned release gets
released in the future, that would lead to the device remaining in "VPN
only" state forever as the provisioning process could not finish due to
pinning failure.
This commit changes the behavior so if the release does not exist, the
pinning step is skipped and the device follows the fleet pinning state.
Closes: #2133
Change-type: patch
This is necessary since the builder no longer passes the platform flag
to the build. This would lead to dockerfiles that are mixing multi and single
arch stages to pull the wrong architecture images, particularly when
trying to build images in emulated builds (e.g. armv7hf built on aarch64).
Moving the full build to single-arch solves this as the docker engine is
capable of chosing the right architecture from the manifest. Once some
of the builder issues are fixed, we should move to #2141
Relates-to: balena-io/balena-builder#1010
Change-type: patch
The issue with the original Supervisor implementation of the firewall is that
on Supervisor start, the Supervisor flushes the INPUT chain of the filter table.
This doesn't play well with services that add to the INPUT chain on startup that
may start up before the Supervisor, such as certain NetworkManager connection
profiles. This change only replaces the BALENA-FIREWALL rule in the INPUT chain,
preserving the other rules as well as their order.
Closes: #1482
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
We have seen a few times devices with duplicated network names for some
reason. While we don't know the cause the networks get duplicates, this
can be disruptive for updates as trying to create a container referencing a duplicate
network results in a 400 error from the engine.
This commit finds and removes duplicate networks via the state engine,
this means that even if somehow a container could be referencing a
network that has been duplicated later somehow, this will remove the
container first.
While thies doesn't solve the problem of duplicate networks being
created in the first place, it will fix the state of the system to
correct the inconsistency.
Change-type: minor
Closes: #590
We have seen a few times devices with duplicated network names for some
reason. While we don't know the cause the networks get duplicates,
this is disruptive of updates, as the supervisor usually queries
resource by name, resulting in a 400 error from the engine because of
the ambiguity.
This replaces those queries by name to queries by id. This includes
network removal. If a `removeNetwork` step is generated, the supervisor
opts to remove all instances of the network with the same name as it
cannot easily resolve the ambiguity.
This doesn't solve the problem of ambiguous networks, because even if
networks are referenced by id when creating a container, the engine will
throw an error (see https://github.com/balena-os/balena-supervisor/issues/590#issuecomment-1423557871)
Change-type: patch
Relates-to: #590
This includes:
- proxyvisor.js
- references in docs
- references device-state, api-binder, compose modules, API
- references in tests
The commit also adds a migration to remove the 4 dependent device tables from the DB.
Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
The wait-for-it script used during tests would setup a timer
that would send SIGUSR2 to the parent process after the timer ends.
Since node was ignoring additional signals, the timer ending would have
no effect after the node process had replaced the start script. However
when node has pid != 1, SIGUSR2 default behavior is to terminate the
process, meaning the tests would fail after 30 seconds.
The script is now updated so the timer is killed once the services are
ready for the tests.
As reported by issue #2100, the supervisor was not correctly reacting to
`SIGTERM` sent by the engine when terminating the process (for instance
before a reboot). This would lead to the supervisor requiring an
additional 10 seconds to terminate (after which the engine will send a
`SIGKILL`).
The reason for this is explained by the following info coming from Node
> Node.js was not designed to run as PID 1 which leads to unexpected behaviour when running inside of Docker. For example, a Node.js process running as PID 1 will not respond to `SIGINT` (`CTRL-C`) and similar signals. [reference](https://github.com/nodejs/docker-node/blob/main/docs/BestPractices.md#handling-kernel-signals)
On internal testing, it was discovered that simply adding a listener for
the signal on the Node process was enough to handle the signal, even
when the process runs as PID 1.
This adds a listener for `SIGTERM` before starting the supervisor main
loop.
Closes: #2100
Change-type: patch