While ordering is important in the RPI firmware configuration file (config.txt),
some dtparams are by default considered part of the base DT overlay
if they are not used by other overlays.
Unfortunately the [list of dtparams](https://github.com/raspberrypi/firmware/blob/master/boot/overlays/README#L133)
is too long to add all of them as exceptions, but we can add the params
used in the default config.txt provided in OS images. This avoids reboots
when updating to this new supervisor and ensures the provisioning
config.txt is correctly parsed as variables.
While this addition handles the most common scenarios, there is still a
chance that a user may have used other base overlay dtparams in the initial
config, in which case those will be interpreted according to their
relative ordering.
Change-type: patch
DT overlays and DT params need to be consumed in the order in which they
appear in the file. DT params apply to the last dtoverlay defined in the
file, or to the base overlay if no dtoverlay precedes them.
This commit updates config.txt parsing to respect this ordering, and it
also ensures global dtparams are written first so they cannot be
overridden by later overlays.
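A minimal sketch of the ordering rule, assuming a simplified `key=value` line format with `#` comments. The names `parseDtEntries`, `renderDtEntries` and the `alwaysBase` set (standing in, in simplified form, for the base-param exceptions) are hypothetical, not the Supervisor's parser. Each dtparam attaches to the most recent `dtoverlay`, or to the base overlay, and rendering writes base params before any overlay:

```typescript
// Hypothetical data model for illustration only.
interface Overlay {
  name: string;
  params: string[];
}

interface DtConfig {
  baseParams: string[]; // dtparams applied to the base overlay
  overlays: Overlay[]; // overlays in file order, with their own dtparams
}

// Parse config.txt text in order. A dtparam applies to the most recent
// dtoverlay, or to the base overlay if none has been seen yet. Params whose
// key is in `alwaysBase` are kept on the base overlay regardless of position
// (a simplification of the exception list).
function parseDtEntries(
  text: string,
  alwaysBase: ReadonlySet<string> = new Set(),
): DtConfig {
  const cfg: DtConfig = { baseParams: [], overlays: [] };
  let current: Overlay | undefined;
  for (const raw of text.split('\n')) {
    const line = raw.trim();
    if (line === '' || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    if (eq < 0) continue;
    const key = line.slice(0, eq).trim();
    const value = line.slice(eq + 1).trim();
    if (key === 'dtoverlay') {
      // `dtoverlay=name,p1=v1` carries inline params for that overlay
      const [name, ...inline] = value.split(',');
      current = { name, params: inline };
      cfg.overlays.push(current);
    } else if (key === 'dtparam') {
      for (const param of value.split(',')) {
        const paramKey = param.split('=')[0];
        if (current && !alwaysBase.has(paramKey)) current.params.push(param);
        else cfg.baseParams.push(param);
      }
    }
  }
  return cfg;
}

// Render base dtparams first so that no later dtoverlay can claim them.
function renderDtEntries(cfg: DtConfig): string[] {
  const lines = cfg.baseParams.map((p) => `dtparam=${p}`);
  for (const overlay of cfg.overlays) {
    lines.push(`dtoverlay=${overlay.name}`);
    lines.push(...overlay.params.map((p) => `dtparam=${p}`));
  }
  return lines;
}
```

Writing base params first is what keeps them from being reassigned: once they precede every `dtoverlay` line, a re-parse of the rendered file attaches them to the base overlay again.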
Because of the stricter parsing method, it is possible that existing
HOST_CONFIG vars do not match the interpretation of the parser. If
that's the case, the supervisor will re-apply the target state which
will cause the device to reboot.
Change-type: major
Fixes behavior for release updates that remove a service present in the
current state and add a new service in the target state.
Change-type: patch
Closes: #2095
Signed-off-by: Christina Ying Wang <christina@balena.io>
This is meant to allow users to configure their device to
resolve `.local` queries via dnsmasq by modifying config.json, e.g.
`"dnsServers": "/bob.local/172.17.0.33"`.
Previously this would fail, as mDNS lookups always came first.
Change-type: minor
The `updateMetadata` step renames the container to match the target
release when the service doesn't change between releases. We have seen
this step fail because of an engine bug that seems to relate to the
engine keeping stale references after container restarts. The only way
around this issue is to remove the old container and create it again.
This commit implements that workaround in the updateMetadata step.
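A minimal sketch of the fallback, assuming a hypothetical `Engine` interface (not dockerode's API) and a simplified `create` that only takes the target name. The rename is attempted first; if the engine rejects it, the old container is removed and a new one is created under the target name:

```typescript
// Hypothetical minimal engine interface; not dockerode's API.
interface Engine {
  rename(containerId: string, newName: string): Promise<void>;
  remove(containerId: string): Promise<void>;
  create(name: string): Promise<string>; // resolves to the new container id
}

// Try a plain rename first; if the engine rejects it (e.g. because of stale
// references after a restart), remove the container and create it again
// under the target name. Resolves to the id of the container now in use.
async function updateMetadata(
  engine: Engine,
  containerId: string,
  targetName: string,
): Promise<string> {
  try {
    await engine.rename(containerId, targetName);
    return containerId;
  } catch {
    await engine.remove(containerId);
    return engine.create(targetName);
  }
}
```

Keeping the rename as the first attempt means the more expensive recreate path only runs when the engine bug is actually hit.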
Change-type: minor
Relates-to: balena-os/balena-engine#261
Whenever the Supervisor reports current state, it diffs the current state
with its last reported current state. However, when the Supervisor starts
up, there is no last reported state, since that last report is stored in
process memory. Caching the last report in a location that survives
Supervisor restarts will reduce the current report bandwidth used on startup.
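A minimal sketch of seeding the diff from a persisted copy of the last report. `PersistentCache`, `LAST_REPORT_KEY`, `pendingReport` and `markReported` are hypothetical names; the real storage location is not specified here. The comparison is shallow, over top-level fields only:

```typescript
type Report = Record<string, unknown>;

// Hypothetical store that survives Supervisor restarts (e.g. a file or a
// database table); not an existing Supervisor API.
interface PersistentCache {
  get(key: string): Report | undefined;
  set(key: string, value: Report): void;
}

const LAST_REPORT_KEY = 'lastReportedState'; // hypothetical key name

// Top-level fields of `current` whose serialized value differs from `last`.
// JSON comparison is a simplification: it is sensitive to object key order.
function diffReport(last: Report, current: Report): Report {
  const delta: Report = {};
  for (const [key, value] of Object.entries(current)) {
    if (JSON.stringify(last[key]) !== JSON.stringify(value)) {
      delta[key] = value;
    }
  }
  return delta;
}

// Compute what to send. After a restart the diff is seeded from the
// persisted report instead of an empty in-memory value.
function pendingReport(cache: PersistentCache, current: Report): Report {
  return diffReport(cache.get(LAST_REPORT_KEY) ?? {}, current);
}

// Call only after the API accepted the report, so a failed send is retried.
function markReported(cache: PersistentCache, current: Report): void {
  cache.set(LAST_REPORT_KEY, current);
}
```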
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
Intermediate state is used when executing device actions such as a
volume purge. It is a type of state apply, yet applyInProgress is not
set to true while it runs.
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
PR #2217 removed the expose configuration but also caused a regression
where ports set via the `ports` configuration would no longer get
exposed to the host, despite port mappings being set. This fixes that
issue by exposing only those ports coming from port mappings.
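A minimal sketch of deriving the exposed ports from port mappings alone. `PortMapping` and `exposedPortsFromMappings` are hypothetical; the returned object follows the Docker Engine API `ExposedPorts` shape (`"<port>/<protocol>": {}`):

```typescript
// Hypothetical simplified port mapping, e.g. parsed from `ports: ["8080:80"]`.
interface PortMapping {
  hostPort: number;
  containerPort: number;
  protocol: 'tcp' | 'udp';
}

// Build the Docker Engine API `ExposedPorts` object only from the configured
// port mappings, ignoring any image `EXPOSE` metadata.
function exposedPortsFromMappings(
  mappings: PortMapping[],
): Record<string, Record<string, never>> {
  const exposed: Record<string, Record<string, never>> = {};
  for (const m of mappings) {
    // Several host ports may map to the same container port; one key suffices.
    exposed[`${m.containerPort}/${m.protocol}`] = {};
  }
  return exposed;
}
```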
Change-type: patch
The docker EXPOSE directive and corresponding docker-compose `expose`
service configuration serve as documentation/metadata that a container
listens on a certain port, which may be used for service discovery, but
they have no real impact on the ability of
other containers on the same network to access the exposed service via
the port. In newer engine implementations, this property may conflict
with other network configurations, and prevent the container from being
started by the docker engine (see #2211).
This PR removes code that would manage the expose property and takes the
property out of the whitelist. A composition with the `expose` property
will result in the log message `Ignoring unsupported or unknown compose fields: expose`.
While this change should not have operational impact, it still removes
a previously supported configuration and as such there is a chance of it
being a breaking change for some applications. For this reason it is
being published as a new major version.
Change-type: major
Closes: #2211
This reverts commit 0c7bad779291e15e419166a2c66c2a21dd06aa83, as that
change causes a service restart loop. The supervisor cannot distinguish
between ports exposed via the `EXPOSE` directive and the docker-compose
`expose` property. Because of this, in the case of `network_mode:
service:<...>` the current state and target state never match, leading
to a service restart loop.
Change-type: patch
The supervisor exposes ports configured using the `EXPOSE` directive in
the Dockerfile when configuring the container for runtime. This can
cause issues when using `network_mode: service:<service name>`, as the
expose configuration is not compatible with that network mode. This
fix skips ports exposed by the image for that particular network mode.
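A minimal sketch of the check, assuming a hypothetical helper `imageExposedPorts` that receives the image's exposed ports (as found in an image inspect's `Config.ExposedPorts`) and the service's `network_mode` value:

```typescript
// Return the image-declared exposed ports to apply to the container, or
// none when the service shares another service's network namespace.
function imageExposedPorts(
  imageExposed: Record<string, object> | undefined,
  networkMode: string | undefined,
): Record<string, object> {
  // `network_mode: service:<name>` is not compatible with exposing ports.
  if (networkMode?.startsWith('service:')) return {};
  return { ...(imageExposed ?? {}) };
}
```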
Change-type: patch
Relates-to: #2211
Memory tests have shown performance improvements from using the native method.
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
This change is mainly for the hostOS
to know if update locks should be ignored
when updating to a newer version.
Change-type: patch
Signed-off-by: jaomaloy <jao.maloy@balena.io>
The node-dbus module is unmaintained and a blocker for the update to
Node 18. Switching to our own node bindings for systemd solves this
issue.
Relates-to: Shouqun/node-dbus#241
Change-type: patch
We need the supervisor to be able to manage config.txt changes for the
Revolution Pi Connect S.
Change-type: patch
Signed-off-by: Florin Sarbu <florin@balena.io>
It's not an official status from container inspects, and the Supervisor
doesn't set it internally anywhere. It's better to remove it entirely, as
the Supervisor sets internal service statuses through a global event
emitter (reportNewStatus), which makes things difficult to test.
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
The previous implementation of container status parsing in #2170 was too general,
because it relied on the mistaken assumption that a container would have a status of
`Stopped` if it was manually stopped. This turned out to be untrue, as manually stopped
containers were also getting restarted by the Supervisor due to their inspect status of
`exited`. Given this, parsing the exit message became unavoidable, as there is no other
clear way to distinguish a container that has been manually stopped and shouldn't be started
from a container experiencing the Engine-host race condition issue (again, see #2170).
Since we're just parsing the exit error message, we don't need to worry about different behaviors
amongst restart policies, as any container with the error message on exit should be started.
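A minimal sketch of the decision, using the `Status` and `Error` fields of the Docker Engine API container state. The exact error text produced by the host race is not quoted in this change, so `HOST_RACE_ERROR` is a hypothetical placeholder pattern and `shouldStartAfterRace` a hypothetical name:

```typescript
// Subset of the Docker Engine API container state (from `docker inspect`).
interface ContainerState {
  Status: string; // e.g. 'running', 'exited'
  Error: string; // engine-reported error, empty when none
}

// HYPOTHETICAL placeholder: replace with the engine's actual error message.
const HOST_RACE_ERROR = /no such (network interface|device)/i;

// Start the container again only if it exited with the race error message,
// regardless of restart policy. A manually stopped container has no such
// error and is left alone.
function shouldStartAfterRace(state: ContainerState): boolean {
  return state.Status === 'exited' && HOST_RACE_ERROR.test(state.Error);
}
```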
Change-type: patch
Closes: #2178
Signed-off-by: Christina Ying Wang <christina@balena.io>
It was returning stale information; in particular, the download progress
of the target release images was never updated.
Change-type: patch
Closes: #2174
As explained in the comments of this commit, a container with the restart policy
of 'on-failure' with a non-zero exit code matches the conditions for the race, so
the Supervisor will also attempt to start it. A container with the 'no' restart
policy that has been started once will not be started again. If a container with
'no' has never been started, its service status will be 'Installed' and the Supervisor
will already keep trying to start it until it succeeds, so a service with 'no' doesn't
require special handling.
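A minimal sketch of the restart-policy rule described above. `ExitedContainer` and `matchesRaceConditions` are hypothetical names, and treating `always` and `unless-stopped` as always matching is an assumption drawn from the base race-condition handling:

```typescript
type RestartPolicy = 'no' | 'always' | 'unless-stopped' | 'on-failure';

// Hypothetical summary of an exited container.
interface ExitedContainer {
  restartPolicy: RestartPolicy;
  exitCode: number;
}

// Whether an exited container matches the conditions for the host race,
// and so should be started again by the Supervisor.
function matchesRaceConditions(c: ExitedContainer): boolean {
  switch (c.restartPolicy) {
    case 'always':
    case 'unless-stopped':
      return true; // assumption: covered by the base race handling
    case 'on-failure':
      return c.exitCode !== 0;
    case 'no':
      // Already started once: never started again. A never-started service
      // is 'Installed' and handled by the normal start logic instead.
      return false;
  }
}
```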
Signed-off-by: Christina Ying Wang <christina@balena.io>
There exists a race condition between Engine and a host resource that may not
be immediately created. In this race condition, if a container's compose config
depends on the existence of that host resource, such as a network interface, and the
Engine tries to create & start the container before the host resource is created, the
Engine will not reattempt to start the container, regardless of the restart policy.
This is undesirable behavior, but it seems to be the behavior as implemented by Docker.
To rectify this, the Supervisor state funnel noops for a grace period of 1 minute
after starting a container to check that the container's status has become 'running'.
If the container exits because of the race condition, the status becomes 'exited' and the
Supervisor will attempt to generate another start step. This noop-wait-start step loop
will repeat until the container is able to start.
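A minimal sketch of one possible reading of the noop-wait-start loop. `StartedService`, `nextStep` and the step names are hypothetical; only the 1-minute grace period comes from the description above:

```typescript
const GRACE_PERIOD_MS = 60_000; // 1 minute grace period

// Hypothetical snapshot of a service the Supervisor has asked to start.
interface StartedService {
  status: string; // container status, e.g. 'running', 'exited', 'created'
  startedAt: number; // ms since epoch when the last start step ran
}

type Step = 'none' | 'noop' | 'start';

// Within the grace period, emit noops so the state funnel waits for the
// container to reach 'running'. After it, an exited container gets another
// start step; any other status keeps waiting. Repeating this forms the loop.
function nextStep(svc: StartedService, now: number): Step {
  if (svc.status === 'running') return 'none';
  if (now - svc.startedAt < GRACE_PERIOD_MS) return 'noop';
  return svc.status === 'exited' ? 'start' : 'noop';
}
```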
If the container is never able to start, there was a problem creating the host resource,
and that should be fixed at the host level.
This commit does not handle the case of services with restart policies "no" or "on-failure"
which encounter this host race, as metadata from container inspects needs to be introduced
during step calculation in order to figure out whether services with those restart policies
need to be started. This will be fixed in a future PR.
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
A bug in service comparison meant that a device already running
a service from a new release with network changes would never stop the
running service, so the remaining services would get stuck forever in
`Downloaded` state.
This fixes the comparison so the service will get killed in this case,
particularly allowing devices to recover from #1576
Change-type: patch
Previously, an `updateMetadata` step would take
precedence over a `kill` step when network changes were present. This
would lead to an inconsistent state if an update included a
network and a container change.
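A minimal sketch of the corrected precedence, assuming a hypothetical `ServiceSpec` with only an image, a release id and network names. Network differences are checked before the metadata-only comparison, so they always yield a `kill` step, including when the running service already belongs to the target release:

```typescript
// Hypothetical simplified service description.
interface ServiceSpec {
  image: string;
  releaseId: string;
  networks: string[];
}

type ServiceStep = 'noop' | 'kill' | 'updateMetadata';

function sameNetworks(a: string[], b: string[]): boolean {
  const sa = [...a].sort();
  const sb = [...b].sort();
  return sa.length === sb.length && sa.every((n, i) => n === sb[i]);
}

// Changes that need a new container (image or networks) are checked first,
// so a rename can never hide them.
function stepFor(current: ServiceSpec, target: ServiceSpec): ServiceStep {
  if (
    current.image !== target.image ||
    !sameNetworks(current.networks, target.networks)
  ) {
    return 'kill';
  }
  if (current.releaseId !== target.releaseId) return 'updateMetadata';
  return 'noop';
}
```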
Closes: #1576
Change-type: patch
These tests use the supervisor API to check that applying a target state
allows the device to eventually get to the desired target configuration.
These are high-level tests that work with real images and containers
using docker-in-docker (dind).
Change-type: patch
The supervisor allows the target image to be an image without a
registry (e.g. `alpine:latest`). While this really only happens in
local mode, we don't want to pass credentials to the default registry,
as those credentials are meant for the balena registry and would fail.
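A minimal sketch of detecting a registry-less image name before attaching credentials. `registryOf` approximates Docker's reference-parsing rule (the first path component is a registry host only if it contains `.` or `:`, is `localhost`, or has uppercase letters); `authFor` and `RegistryAuth` are hypothetical names:

```typescript
// Approximates Docker's rule for splitting a registry host off an image name.
function registryOf(image: string): string | undefined {
  const slash = image.indexOf('/');
  if (slash < 0) return undefined; // e.g. 'alpine:latest'
  const first = image.slice(0, slash);
  if (
    first.includes('.') ||
    first.includes(':') ||
    first === 'localhost' ||
    first !== first.toLowerCase()
  ) {
    return first;
  }
  return undefined; // e.g. 'library/alpine' resolves to the default registry
}

// Hypothetical credential shape.
interface RegistryAuth {
  username: string;
  password: string;
}

// Only pass credentials when the image names an explicit registry.
function authFor(image: string, creds: RegistryAuth): RegistryAuth | undefined {
  return registryOf(image) === undefined ? undefined : creds;
}
```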
Change-type: patch