balena-supervisor

mirror of https://github.com/balena-os/balena-supervisor.git synced 2025-03-14 00:06:49 +00:00

Author	SHA1	Message	Date
Felipe Lalanne	6e6a796da5	Add special case for base DTO params on RPI config While ordering is important in the RPI firmware configuration file (config.txt), some dt params are by default considered part of the base dt overlay if they are not used by other overlays. Unfortunately the [list of dtparams](https://github.com/raspberrypi/firmware/blob/master/boot/overlays/README#L133) is too long to add all of them as exceptions, but we can add the params used in the default config.txt provided in OS images, to avoid reboots when updating to this new supervisor and correctly parsing the provisioning config.txt as variables. While this addition handles most common scenarios, there is still a chance a user may have used other base overlay dt params in the initial config, in which case those will be interpreted according to the relative ordering. Change-type: patch	2024-02-08 15:48:10 -03:00
Felipe Lalanne	9546a1a3b1	Fix processing of dtoverlay/dtparams on config.txt DT overlays and DT params need to be consumed in the order that they appear on the file. DT params apply to the last dtoverlay defined on the file, or to the base overlay. This commit updates config.txt parsing to consider this ordering, and it also ensures global dtparams are written first so they cannot be overridden by later overlays. Because of the more strict parsing method, it is possible that existing HOST_CONFIG vars do not match the interpretation of the parser. If that's the case, the supervisor will re-apply the target state which will cause the device to reboot. Change-type: major	2024-02-08 15:46:07 -03:00
Felipe Lalanne	a8e371f0c9	Refactor config-txt backend Cleans up code and adds better type detection	2024-02-07 20:39:41 -03:00
Christina Ying Wang	3afcef2969	Respect update strategies app-wide instead of at the service level Fixes behavior for release updates which remove a service in current state and add a new service in target state. Change-type: patch Closes: #2095 Signed-off-by: Christina Ying Wang <christina@balena.io>	2024-01-29 12:26:28 -08:00
Felipe Lalanne	dec39a35d4	Try MDNS lookup only if regular DNS lookup fails This is meant to allow users to configure their device to resolve `.local` queries via dnsmasq by modifying config.json, e.g. `"dnsServers": "/bob.local/172.17.0.33"`. This would fail before as MDNS lookups would always come first. Change-type: minor	2024-01-03 14:42:23 -03:00
Felipe Lalanne	7a39da92b7	Refactor mdns lookup code in app entry Change-type: patch	2024-01-03 14:42:23 -03:00
Felipe Lalanne	3ea8d4727a	Force remove container if updateMetadata fails The `updateMetadata` step renames the container to match the target release when the service doesn't change between releases. We have seen this step fail because of an engine bug that seems to relate to the engine keeping stale references after container restarts. The only way around this issue is to remove the old container and create it again. This implements that workaround during the updateMetadata step to deal with that issue. Change-type: minor Relates-to: balena-os/balena-engine#261	2023-11-22 14:16:44 -03:00
Christina Ying Wang	eb8ad11cd7	Cache last reported current state to /mnt/root/tmp Whenever the Supervisor reports current state, it diffs the current state with its last reported current state. However, when the Supervisor starts up, there is no last reported state, since that last report is stored in process memory. Caching the last report in a location that survives Supervisor restarts will reduce the current report bandwidth used on startup. Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-11-14 16:15:36 -08:00
Christina Ying Wang	d440776881	Convert current state types to io-ts Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-11-08 16:00:54 -08:00
Christina Ying Wang	a993b3e7af	Set applyInProgress to true while applying intermediate state Intermediate state is utilized when executing device actions such as a volume purge. It's a type of state apply, but despite that, applyInProgress is not true. Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-10-25 10:32:10 -07:00
Felipe Lalanne	9bd216327f	Expose ports from port mappings on services PR #2217 removed the expose configuration but also caused a regression where ports set via the `ports` configuration would no longer get exposed to the host, despite port mappings being set. This fixes that issue by exposing only those ports coming from port mappings. Change-type: patch	2023-10-24 15:04:39 -03:00
Felipe Lalanne	416170bc05	Ignore `expose` service compose configuration The docker EXPOSE directive and corresponding docker-compose `expose` service configuration serve as documentation/metadata that a container listens on a certain port that may be used for service discovery but it doesn't have any real impact on the ability for other containers on the same network to access the exposed service via the port. In newer engine implementations, this property may conflict with other network configurations, and prevent the container from being started by the docker engine (see #2211). This PR removes code that would manage the expose property and takes the property out of the whitelist. A composition with the `expose` property will result in the log message `Ignoring unsupported or unknown compose fields: expose`. While this change should not have operational impact, it still removes a previously supported configuration and as such there is a chance of it being a breaking change for some applications. For this reason it is being published as a new major version. Change-type: major Closes: #2211	2023-10-23 11:41:32 -03:00
Felipe Lalanne	b107868765	Add note regarding API jitter on target state poll Change-type: patch	2023-10-23 14:11:20 +01:00
Pagan Gazzard	e15205301c	Switch some _.includes usage to native versions Change-type: patch	2023-10-16 14:30:25 -03:00
Pagan Gazzard	a4a9a17c1a	Switch _.assign usage to native versions Change-type: patch	2023-10-16 14:30:25 -03:00
Pagan Gazzard	d0cb54537f	Switch _.isNaN usage to native versions Change-type: patch	2023-10-16 14:30:25 -03:00
Pagan Gazzard	3bfdc4454e	Switch _.isUndefined usage to native versions Change-type: patch	2023-10-16 14:30:25 -03:00
Pagan Gazzard	8e23091aa9	Switch _.isNull usage to native versions Change-type: patch	2023-10-16 14:30:25 -03:00
Pagan Gazzard	ca3faebfc9	Switch _.isNumber usage to native versions Change-type: patch	2023-10-16 14:30:25 -03:00
Pagan Gazzard	20df54668c	Switch _.isArray usage to native versions Change-type: patch	2023-10-16 14:30:25 -03:00
Pagan Gazzard	3fe8a22fb0	Switch _.isString usage to native versions Change-type: patch	2023-10-16 14:30:25 -03:00
Felipe Lalanne	3e828dcc52	Revert "Do not expose ports from image if service network mode" This reverts commit 0c7bad779291e15e419166a2c66c2a21dd06aa83, as that change causes a service restart loop. The supervisor cannot distinguish between ports exposed via the `EXPOSE` directive and the docker-compose `expose` property. Because of this, in the case of `network-mode: service:<...>` the current state and target state never match, leading to a service restart loop. Change-type: patch	2023-10-16 13:06:50 -03:00
Pagan Gazzard	766cce89c7	Convert multiple bluebird uses to native promises Change-type: patch	2023-10-16 11:40:45 +01:00
Felipe Lalanne	0c7bad7792	Do not expose ports from image if service network mode The supervisor exposes ports configured using the `EXPOSE` directive in the dockerfile when configuring the container for runtime. This can cause issues if using `network_mode: service:<service name>` as the expose configuration is not compatible with that network mode. This fix now skips image exposed ports for that particular network mode. Change-type: patch Relates-to: #2211	2023-10-12 18:03:42 -03:00
Pagan Gazzard	3d73bf3e91	Use mutation for adding service/image ids to logs to reduce allocations Change-type: patch	2023-10-11 15:39:19 -03:00
Pagan Gazzard	d685ccacb2	Keep the container lock for the entire duration of attaching logs Change-type: patch	2023-10-11 15:39:19 -03:00
Pagan Gazzard	74d374b5ad	Remove unnecessary async on handling journald stderr entries Change-type: patch	2023-10-11 15:39:19 -03:00
Pagan Gazzard	e3806ec018	Avoid unnecessary work in systemd log row handling for invalid logs Change-type: patch	2023-10-11 15:39:19 -03:00
Pagan Gazzard	894bdeeeb6	Remove unused docker logs logging code Change-type: patch	2023-10-11 14:20:33 +01:00
Christina Ying Wang	06d4775178	Use native structuredClone instead of _.cloneDeep Memory tests have shown performance improvements to using the native method. Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-09-29 12:29:50 -07:00
jaomaloy	ab513cc021	Dump target-state to hostOS tmp dir This change is mainly for the hostOS to know if update locks should be ignored when updating to a newer version. Change-type: patch Signed-off-by: jaomaloy <jao.maloy@balena.io>	2023-09-14 11:03:34 +08:00
Felipe Lalanne	327dc31ef0	Replace node-dbus with @balena/systemd The node-dbus module is unmaintained and a blocker for the update to Node 18. Switching to our own node bindings for systemd solves this issue Relates-to: Shouqun/node-dbus#241 Change-type: patch	2023-08-16 15:58:52 -04:00
Alexandru Costache	512240c544	backends: Add Jetson Orin NANO custom device-tree support Signed-off-by: Alexandru Costache <alexandru@balena.io> Change-type: patch	2023-07-11 18:11:32 +03:00
Florin Sarbu	8d2b310af8	Add revpi-connect-s to Raspberry Pi variants We need the supervisor to be able to manage config.txt changes for the Revolution Pi Connect S. Change-type: patch Signed-off-by: Florin Sarbu <florin@balena.io>	2023-07-05 13:55:29 +02:00
Christina Ying Wang	38fe8dae75	Remove the 'Stopped' status for services It's not an official status from container inspects, and the Supervisor doesn't set it internally anywhere. It's better to remove it entirely as the method by which Supervisor sets internal service statuses is by using a global event emitter (reportNewStatus) which makes things difficult to test. Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-28 11:17:13 -04:00
Christina W	71d24d6e33	Parse container exit error message instead of status The previous implementation in #2170 of parsing the container status was too general, because it relied on the mistaken assumption that a container would have a status of `Stopped` if it was manually stopped. This turned out to be untrue, as manually stopped containers were also getting restarted by the Supervisor due to their inspect status of `exited`. With this, parsing the exit message became unavoidable as there are no other clear ways to discern a container that has been manually stopped and shouldn't be started from a container experiencing the Engine-host race condition issue (again, see #2170). Since we're just parsing the exit error message, we don't need to worry about different behaviors amongst restart policies, as any container with the error message on exit should be started. Change-type: patch Closes: #2178 Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-22 14:43:17 -07:00
Felipe Lalanne	12eac04484	Fix /v2/applications/state endpoint It was returning stale information, particularly the download progress of the target release images never got updated. Change-type: patch Closes: #2174	2023-06-19 17:16:36 -04:00
Christina Ying Wang	9e249e6ae8	Remove unnecessary async/await from method Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-19 11:11:26 -07:00
Christina Ying Wang	6e6f79c71d	Decrease wait time before start from 60s to 30s 60 seconds to wait may be excessively long. Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-19 11:11:26 -07:00
Christina Ying Wang	ace642ea0f	Improve naming of a util function & add unit test isOlderThan -> isValidDateAndOlderThan See: https://github.com/balena-os/balena-supervisor/pull/2170#discussion_r1226809686 Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-19 11:11:26 -07:00
Christina Ying Wang	ab80f198d8	Add exitCode property to Service class Since we need to conditionally query the service's exit code during step inference, adding the exitCode property keeps the step inference function pure. See: https://github.com/balena-os/balena-supervisor/pull/2170#discussion_r1226805153 Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-19 11:11:26 -07:00
Christina Ying Wang	2537eb8189	Handle the case of 'on-failure' restart policy As explained in the comments of this commit, a container with the restart policy of 'on-failure' with a non-zero exit code matches the conditions for the race, so the Supervisor will also attempt to start it. A container with the 'no' restart policy that has been started once will not be started again. If a container with 'no' has never been started, its service status will be 'Installed' and the Supervisor will already try to start it until success, so the service with 'no' doesn't require special handling. Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-05 11:05:58 -07:00
Christina Ying Wang	7f32141958	Handle Engine-host race condition for "always" and "unless-stopped" restart policy There exists a race condition between Engine and a host resource that may not be immediately created. In this race condition, if a container's compose config depends on the existence of that host resource, such as a network interface, and the Engine tries to create & start the container before the host resource is created, the Engine will not reattempt to start the container, regardless of the restart policy. This is undesirable behavior but seems to be the behavior as implemented by Docker. To rectify this, the Supervisor state funnel noops for a grace period of 1 minute after starting a container to see that the container's status has become 'running'. If the container exits because of the race condition, the status becomes 'exited' and the Supervisor will attempt to generate another start step. This noop-wait-start step loop will repeat until the container is able to start. If the container is never able to start, there was a problem in the host in the creation of the host resource, and that should be fixed at the host level. This commit does not handle the case of services with restart policies "no" or "on-failure" which encounter this host race, as metadata from container inspects needs to be introduced during step calculation in order to figure out whether services with those restart policies need to be started. This will be fixed in a future PR. Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-05-31 11:32:19 -07:00
Felipe Lalanne	8656bd62f7	Add `arch.sw` to the valid container requirements Change-type: minor	2023-05-09 15:44:26 -04:00
Felipe Lalanne	f1f09e0e27	Allow using slug to validate hw.device-type contract This also adds the hw.device-type test case to the unit tests. Change-type: patch	2023-05-09 15:20:18 -04:00
Felipe Lalanne	5fdd689590	Fix service comparison when creating component steps A bug in service comparison meant that a device already running a service from a new release with network changes would never stop the running service, so remaining services would forever get stuck in `Downloaded` state. This fixes the comparison so the service will get killed in this case, particularly allowing devices to recover from #1576 Change-type: patch	2023-04-26 11:58:48 -04:00
Felipe Lalanne	7aecaae8b0	Skip updateMetadata step if there are network changes Previous behavior meant that an `updateMetadata` step would take precedence over a `kill` step when network changes were present. This would lead to an inconsistent state if an update included a network and a container change. Closes: #1576 Change-type: patch	2023-04-25 14:47:00 -04:00
Felipe Lalanne	138aec5de4	Add integration tests for state-engine These tests use the supervisor API to check that applying a target state allows the device to eventually get to the desired target configuration. These are high-level tests that work with real images and containers using dind. Change-type: patch	2023-04-25 14:47:00 -04:00
Felipe Lalanne	c1207cbbff	Do not pass auth to images with no registry The supervisor allows the target image to be an image without a registry (e.g. `alpine:latest`), while this really only happens while in local mode, we don't want to pass credentials to the default registry as those credentials are meant for balena registry and will otherwise fail. Change-type: patch	2023-04-25 14:47:00 -04:00
Felipe Lalanne	6c031299d6	Remove safeStateClone function This function is no longer needed with the latest changes to getCurrentState Change-type: patch	2023-04-20 14:58:58 -04:00
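Commits 9546a1a3b1 and 6e6a796da5 describe order-sensitive config.txt parsing: a `dtparam` applies to the most recent `dtoverlay`, or to the base overlay if none precedes it, while a few well-known params are always treated as base params. The sketch below models that rule under stated assumptions. The `Overlay` shape, the function names, and the contents of `BASE_PARAMS` are hypothetical illustrations, not the supervisor's actual code or exception list.

```typescript
// Parsed overlay: the entry named "base" collects dtparams for the base device tree.
interface Overlay {
  name: string;
  params: string[];
}

// Assumption: an illustrative subset of base-overlay params that stay global
// wherever they appear; NOT the supervisor's actual exception list.
const BASE_PARAMS = new Set(['i2c_arm', 'spi', 'audio']);

// "i2c_arm=on" -> "i2c_arm"
function paramName(value: string): string {
  return value.split('=')[0].trim();
}

function parseOverlays(configTxt: string): Overlay[] {
  const base: Overlay = { name: 'base', params: [] };
  const overlays: Overlay[] = [base];
  // dtparams attach to the most recent dtoverlay, or to base if none yet.
  let current = base;
  for (const raw of configTxt.split('\n')) {
    const line = raw.trim();
    if (line === '' || line.startsWith('#')) {
      continue;
    }
    const eq = line.indexOf('=');
    if (eq < 0) {
      continue;
    }
    const key = line.slice(0, eq).trim();
    const value = line.slice(eq + 1).trim();
    if (key === 'dtoverlay') {
      current = { name: value, params: [] };
      overlays.push(current);
    } else if (key === 'dtparam') {
      const target = BASE_PARAMS.has(paramName(value)) ? base : current;
      target.params.push(value);
    }
  }
  return overlays;
}

// Assumes base is the first entry (as parseOverlays returns), so global
// dtparams are written before any dtoverlay and cannot be captured by it.
function writeOverlays(overlays: Overlay[]): string {
  const lines: string[] = [];
  for (const overlay of overlays) {
    if (overlay.name !== 'base') {
      lines.push(`dtoverlay=${overlay.name}`);
    }
    for (const param of overlay.params) {
      lines.push(`dtparam=${param}`);
    }
  }
  return lines.join('\n');
}
```

With this rule, `dtparam=spi=on` written after `dtoverlay=gpio-fan` still lands on the base overlay, while `dtparam=gpiopin=18` stays with `gpio-fan`. A param not in the exception set is attributed purely by position, which is the residual risk commit 6e6a796da5 mentions.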
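Commit dec39a35d4 reorders hostname resolution so that regular DNS is tried first and mDNS only afterwards, letting a dnsmasq rule such as `/bob.local/172.17.0.33` win. A minimal sketch of that fallback, assuming resolvers are passed in as plain async functions and that mDNS is only attempted for `.local` names; `Resolver` and `lookupWithFallback` are hypothetical names, not the supervisor's API.

```typescript
// Hypothetical resolver signature: resolves a hostname to an IP address.
type Resolver = (hostname: string) => Promise<string>;

// Try regular DNS first; fall back to mDNS only for `.local` names.
async function lookupWithFallback(
  hostname: string,
  dnsLookup: Resolver,
  mdnsLookup: Resolver,
): Promise<string> {
  try {
    return await dnsLookup(hostname);
  } catch (dnsError) {
    // Assumption: non-.local names have no mDNS answer, so surface the DNS error.
    if (!hostname.endsWith('.local')) {
      throw dnsError;
    }
    return await mdnsLookup(hostname);
  }
}
```

The cost of this ordering is an extra failed DNS query before every mDNS-only `.local` lookup, in exchange for letting user-configured DNS override mDNS.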
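Commit eb8ad11cd7 notes that each current-state report is diffed against the last reported state, and that without a cached copy the first report after a restart has nothing to diff against. The sketch below shows a shallow, top-level diff under that assumption; `StateReport` and `diffReport` are hypothetical, and persisting the cache (to /mnt/root/tmp, per the commit) is left out.

```typescript
type StateReport = Record<string, unknown>;

// Returns only the top-level keys whose values changed since the last report.
// Values are compared via JSON.stringify, which is sensitive to key order:
// fine for a sketch, not a general deep-equality check. Keys removed since
// the last report are not represented in the result.
function diffReport(current: StateReport, lastReported?: StateReport): StateReport {
  if (lastReported === undefined) {
    // Nothing cached (e.g. first report after a restart): send everything.
    return { ...current };
  }
  const diff: StateReport = {};
  for (const key of Object.keys(current)) {
    if (JSON.stringify(current[key]) !== JSON.stringify(lastReported[key])) {
      diff[key] = current[key];
    }
  }
  return diff;
}
```

Caching `lastReported` across restarts turns the startup report from the full state into just the changed keys.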
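Commit 9bd216327f exposes only the container ports that come from `ports` mappings. The sketch below derives the Docker Engine API `ExposedPorts` object (keys of the form `"<port>/<protocol>"` mapped to empty objects) from a simplified mapping list. The `PortMapping` shape and the function name are hypothetical.

```typescript
// Hypothetical, simplified port mapping, e.g. from `ports: ["8080:80/tcp"]`.
interface PortMapping {
  externalPort: number;
  internalPort: number;
  protocol: 'tcp' | 'udp';
}

// Docker Engine API `ExposedPorts` shape: { "<port>/<protocol>": {} }.
type ExposedPorts = Record<string, Record<string, never>>;

function exposedPortsFromMappings(mappings: PortMapping[]): ExposedPorts {
  const exposed: ExposedPorts = {};
  for (const m of mappings) {
    // Only the container-side port is exposed; the host side goes in PortBindings.
    exposed[`${m.internalPort}/${m.protocol}`] = {};
  }
  return exposed;
}
```

Because nothing is taken from the compose `expose` field or the image's `EXPOSE`, this stays consistent with commit 416170bc05 ignoring `expose`.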
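Commits 7f32141958, 2537eb8189, ace642ea0f, 6e6f79c71d and 71d24d6e33 together describe how the Supervisor recovers from the Engine-host race. It waits (noops) for a grace period after starting a container, and afterwards it starts an exited container again only if its exit error message indicates the race. The sketch below combines those rules. The actual error text is not given in the log, so the matcher is passed in as a hypothetical predicate; `nextStep`, `Step` and `ContainerSnapshot` are illustrative names, and the body of `isValidDateAndOlderThan` is an assumption based only on its name.

```typescript
// Grace period after a start, per commit 6e6f79c71d (lowered from 60 s to 30 s).
const START_GRACE_PERIOD_MS = 30_000;

// Hypothetical step names for this sketch.
type Step = 'start' | 'noop' | 'none';

interface ContainerSnapshot {
  status: 'Installed' | 'running' | 'exited';
  startedAt?: Date;
  exitErrorMessage?: string;
}

// Only valid Dates count; an Invalid Date is never "older than" anything.
function isValidDateAndOlderThan(date: unknown, ms: number, now: number = Date.now()): boolean {
  if (!(date instanceof Date) || isNaN(date.getTime())) {
    return false;
  }
  return now - date.getTime() > ms;
}

function nextStep(
  c: ContainerSnapshot,
  isRaceConditionError: (message: string) => boolean,
  now: number = Date.now(),
): Step {
  if (c.status === 'Installed') {
    return 'start'; // never started: keep trying until it succeeds
  }
  if (c.status === 'running') {
    return 'none';
  }
  // Exited: wait out the grace period before deciding anything.
  if (c.startedAt !== undefined && !isValidDateAndOlderThan(c.startedAt, START_GRACE_PERIOD_MS, now)) {
    return 'noop';
  }
  // Restart only if the exit error matches the race, so manually
  // stopped containers stay stopped regardless of restart policy.
  return c.exitErrorMessage !== undefined && isRaceConditionError(c.exitErrorMessage) ? 'start' : 'none';
}
```

Keying the decision on the exit error message rather than on the restart policy mirrors the reasoning in 71d24d6e33: an inspect status of `exited` alone cannot distinguish a manual stop from the race.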
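Commit c1207cbbff stops passing balena registry credentials when the target image has no registry component (e.g. `alpine:latest`). A sketch of that check, assuming Docker's reference convention that the first path component names a registry only if it contains `.` or `:` or is `localhost`. The `Credentials` type and both function names are hypothetical.

```typescript
interface Credentials {
  username: string;
  password: string;
}

// Docker convention: "host/..." is a registry only if the first component
// contains '.' or ':' or is "localhost"; otherwise it is a Docker Hub path.
function hasRegistry(image: string): boolean {
  const slash = image.indexOf('/');
  if (slash < 0) {
    return false; // e.g. "alpine:latest"
  }
  const first = image.slice(0, slash);
  return first.includes('.') || first.includes(':') || first === 'localhost';
}

// Only attach credentials when the image names an explicit registry.
function authFor(image: string, creds: Credentials): Credentials | undefined {
  return hasRegistry(image) ? creds : undefined;
}
```

Returning `undefined` for registry-less images avoids sending credentials meant for one registry to the default registry, where they would fail.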


1556 Commits