We have seen a few times devices with duplicated network names for some
reason. While we don't know the cause the networks get duplicates, this
can be disruptive for updates as trying to create a container referencing a duplicate
network results in a 400 error from the engine.
This commit finds and removes duplicate networks via the state engine,
this means that even if somehow a container could be referencing a
network that has been duplicated later somehow, this will remove the
container first.
While thies doesn't solve the problem of duplicate networks being
created in the first place, it will fix the state of the system to
correct the inconsistency.
Change-type: minor
Closes: #590
We have seen a few times devices with duplicated network names for some
reason. While we don't know the cause the networks get duplicates,
this is disruptive of updates, as the supervisor usually queries
resource by name, resulting in a 400 error from the engine because of
the ambiguity.
This replaces those queries by name to queries by id. This includes
network removal. If a `removeNetwork` step is generated, the supervisor
opts to remove all instances of the network with the same name as it
cannot easily resolve the ambiguity.
This doesn't solve the problem of ambiguous networks, because even if
networks are referenced by id when creating a container, the engine will
throw an error (see https://github.com/balena-os/balena-supervisor/issues/590#issuecomment-1423557871)
Change-type: patch
Relates-to: #590
This includes:
- proxyvisor.js
- references in docs
- references device-state, api-binder, compose modules, API
- references in tests
The commit also adds a migration to remove the 4 dependent device tables from the DB.
Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
The wait-for-it script used during tests would setup a timer
that would send SIGUSR2 to the parent process after the timer ends.
Since node was ignoring additional signals, the timer ending would have
no effect after the node process had replaced the start script. However
when node has pid != 1, SIGUSR2 default behavior is to terminate the
process, meaning the tests would fail after 30 seconds.
The script is now updated so the timer is killed once the services are
ready for the tests.
As reported by issue #2100, the supervisor was not correctly reacting to
`SIGTERM` sent by the engine when terminating the process (for instance
before a reboot). This would lead to the supervisor requiring an
additional 10 seconds to terminate (after which the engine will send a
`SIGKILL`).
The reason for this is explained by the following info coming from Node
> Node.js was not designed to run as PID 1 which leads to unexpected behaviour when running inside of Docker. For example, a Node.js process running as PID 1 will not respond to `SIGINT` (`CTRL-C`) and similar signals. [reference](https://github.com/nodejs/docker-node/blob/main/docs/BestPractices.md#handling-kernel-signals)
On internal testing, it was discovered that simply adding a listener for
the signal on the Node process was enough to handle the signal, even
when the process runs as PID 1.
This adds a listener for `SIGTERM` before starting the supervisor main
loop.
Closes: #2100
Change-type: patch
The Raspberry Pi config.txt file defines the use of colon to configure
variables of the same name in different ports, for instance on those
devices with two hdmi ports. This syntax was previously not supported by
the supervisor. This change relaxes the syntax validation on config vars
to allow the use of the colon character.
Relates-to: #1573, #2046
Change-type: minor
This includes:
- /v1/apps/:appId/(stop|start)
- /v2/applications/:appId/(restart|stop|start)-service
Signed-off-by: Christina Ying Wang <christina@balena.io>
This also adds a 500 response with the old key if the API key
refresh was unsuccessful. Previously, if the key refresh was
unsuccessful, this would result in an UnhandledPromiseRejection.
This is a new interface.
Signed-off-by: Christina Ying Wang <christina@balena.io>