balena-supervisor

mirror of https://github.com/balena-os/balena-supervisor.git synced 2025-02-06 19:20:12 +00:00

Author	SHA1	Message	Date
Christina Ying Wang	3fd035c5bd	Patch default dtparam handling in config.txt This commit completes the list of default / board-wide dtparams to include some `baudrate` and `vc` i2c params. Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2024-02-21 12:45:29 -08:00
Christina Ying Wang	e22253ce6e	Patch config.txt backend to return array configs correctly Previously, getBootConfig() of the config.txt backend was omitting array configurations such as gpio settings, thus resulting in the SV mistakenly assuming that boot config had not been applied, since gpio would not be in current config.txt config but would be in target config. This resulted in SV entering an infinite loop of attempting to apply the gpio config when it wasn't necessary. Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2024-02-16 18:12:33 -08:00
Felipe Lalanne	6e6a796da5	Add special case for base DTO params on RPI config While ordering is important in the RPI firmware configuration file (config.txt), some dt params are by default considered part of the base dt overlay if they are not used by other overlays. Unfortunately the [list of dtparams](https://github.com/raspberrypi/firmware/blob/master/boot/overlays/README#L133) is too long to add all of them as exceptions, but we can add the params used in the default config.txt provided in OS images, to avoid reboots when updating to this new supervisor and correctly parsing the provisioning config.txt as variables. While this addition handles most common scenarios, there is still a chance a user may have used other base overlay dt params in the initial config, in which case those will be interpreted according to the relative ordering. Change-type: patch	2024-02-08 15:48:10 -03:00
Felipe Lalanne	55a8c5bf90	Add tests for dtoverlay management in config.txt	2024-02-07 20:38:44 -03:00
Christina Ying Wang	3afcef2969	Respect update strategies app-wide instead of at the service level Fixes behavior for release updates which removes a service in current state and adds a new service in target state. Change-type: patch Closes: #2095 Signed-off-by: Christina Ying Wang <christina@balena.io>	2024-01-29 12:26:28 -08:00
Felipe Lalanne	87b195685a	Use the state-helper functions in app module tests	2024-01-29 12:25:55 -08:00
Felipe Lalanne	6ee606806d	Fix docker utils tests for docker v25 From docker 25, the engine will validate IPAM config. This would cause the docker utils test to fail since the network/subnet configuration was incorrect. Change-type: patch	2024-01-25 15:05:12 -03:00
Felipe Lalanne	9bd216327f	Expose ports from port mappings on services PR #2217 removed the expose configuration but also caused a regression where ports set via the `ports` configuration would no longer get exposed to the host, despite port mappings being set. This fixes that issue by exposing only those ports coming from port mappings. Change-type: patch	2023-10-24 15:04:39 -03:00
Felipe Lalanne	416170bc05	Ignore `expose` service compose configuration The docker EXPOSE directive and corresponding docker-compose `expose` service configuration serve as documentation/metadata that a container listens on a certain port that may be used for service discovery, but they don't have any real impact on the ability for other containers on the same network to access the exposed service via the port. In newer engine implementations, this property may conflict with other network configurations, and prevent the container from being started by the docker engine (see #2211). This PR removes code that would manage the expose property and takes the property out of the whitelist. A composition with the `expose` property will result in the log message `Ignoring unsupported or unknown compose fields: expose`. While this change should not have operational impact, it still removes a previously supported configuration and as such there is a chance of it being a breaking change for some applications. For this reason it is being published as a new major version. Change-type: major Closes: #2211	2023-10-23 11:41:32 -03:00
Pagan Gazzard	c9f032e13a	Switch _.isFunction usage to native versions Change-type: patch	2023-10-16 14:30:25 -03:00
Pagan Gazzard	20df54668c	Switch _.isArray usage to native versions Change-type: patch	2023-10-16 14:30:25 -03:00
Felipe Lalanne	3e828dcc52	Revert "Do not expose ports from image if service network mode" This reverts commit 0c7bad779291e15e419166a2c66c2a21dd06aa83, as that change causes a service restart loop. The supervisor cannot distinguish between ports exposed via the `EXPOSE` directive and the docker-compose `expose` property. Because of this, in the case of `network-mode: service:<...>` the current state and target state never match, leading to a service restart loop. Change-type: patch	2023-10-16 13:06:50 -03:00
Pagan Gazzard	766cce89c7	Convert multiple bluebird uses to native promises Change-type: patch	2023-10-16 11:40:45 +01:00
Felipe Lalanne	0c7bad7792	Do not expose ports from image if service network mode The supervisor exposes ports configured using the `EXPOSE` directive in the dockerfile when configuring the container for runtime. This can cause issues if using `network_mode: service:<service name>` as the expose configuration is not compatible with that network mode. This fix now skips image exposed ports for that particular network mode. Change-type: patch Relates-to: #2211	2023-10-12 18:03:42 -03:00
Pagan Gazzard	894bdeeeb6	Remove unused docker logs logging code Change-type: patch	2023-10-11 14:20:33 +01:00
Christina Ying Wang	bc1d251e66	Revert os-release path to /mnt/root /mnt/boot/os-release isn't always accurate so /mnt/root should be the source of truth. Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-10-09 14:02:02 -07:00
Felipe Lalanne	327dc31ef0	Replace node-dbus with @balena/systemd The node-dbus module is unmaintained and a blocker for the update to Node 18. Switching to our own node bindings for systemd solves this issue Relates-to: Shouqun/node-dbus#241 Change-type: patch	2023-08-16 15:58:52 -04:00
Felipe Lalanne	8f17c30de6	Replace dbus test service with mock-systemd-bus This avoids unnecessary mocking and tests against the real systemd API Change-type: patch	2023-08-16 14:46:58 -04:00
Alexandru Costache	512240c544	backends: Add Jetson Orin NANO custom device-tree support Signed-off-by: Alexandru Costache <alexandru@balena.io> Change-type: patch	2023-07-11 18:11:32 +03:00
Christina Ying Wang	38fe8dae75	Remove the 'Stopped' status for services It's not an official status from container inspects, and the Supervisor doesn't set it internally anywhere. It's better to remove it entirely as the method by which Supervisor sets internal service statuses is by using a global event emitter (reportNewStatus) which makes things difficult to test. Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-28 11:17:13 -04:00
Christina W	71d24d6e33	Parse container exit error message instead of status The previous implementation in #2170 of parsing the container status was too general, because it relied on the mistaken assumption that a container would have a status of `Stopped` if it was manually stopped. This turned out to be untrue, as manually stopped containers were also getting restarted by the Supervisor due to their inspect status of `exited`. With this, parsing the exit message became unavoidable as there are no other clear ways to discern a container that has been manually stopped and shouldn't be started from a container experiencing the Engine-host race condition issue (again, see #2170). Since we're just parsing the exit error message, we don't need to worry about different behaviors amongst restart policies, as any container with the error message on exit should be started. Change-type: patch Closes: #2178 Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-22 14:43:17 -07:00
Christina Ying Wang	7eba48f8b8	Improve tests surrounding Engine-host race patch See: #2170 Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-19 11:11:26 -07:00
Christina Ying Wang	9e249e6ae8	Remove unnecessary async/await from method Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-19 11:11:26 -07:00
Christina Ying Wang	6e6f79c71d	Decrease wait time before start from 60s to 30s 60 seconds to wait may be excessively long. Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-19 11:11:26 -07:00
Christina Ying Wang	ace642ea0f	Improve naming of a util function & add unit test isOlderThan -> isValidDateAndOlderThan See: https://github.com/balena-os/balena-supervisor/pull/2170#discussion_r1226809686 Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-19 11:11:26 -07:00
Christina Ying Wang	2537eb8189	Handle the case of 'on-failure' restart policy As explained in the comments of this commit, a container with the restart policy of 'on-failure' with a non-zero exit code matches the conditions for the race, so the Supervisor will also attempt to start it. A container with the 'no' restart policy that has been started once will not be started again. If a container with 'no' has never been started, its service status will be 'Installed' and the Supervisor will already try to start it until success, so the service with 'no' doesn't require special handling. Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-05 11:05:58 -07:00
Christina Ying Wang	95f3e13d50	Add extra delay after state engine integration tests This ensures target state has settled (since it seems that the 'applied' status that's reported isn't 100% accurate and the actual Engine state may lag behind slightly) Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-05-31 11:33:27 -07:00
Christina Ying Wang	7f32141958	Handle Engine-host race condition for "always" and "unless-stopped" restart policy There exists a race condition between Engine and a host resource that may not be immediately created. In this race condition, if a container's compose config depends on the existence of that host resource, such as a network interface, and the Engine tries to create & start the container before the host resource is created, the Engine will not reattempt to start the container, regardless of the restart policy. This is undesirable behavior but seems to be the behavior as implemented by Docker. To rectify this, the Supervisor state funnel noops for a grace period of 1 minute after starting a container to see that the container's status has become 'running'. If the container exits because of the race condition, the status becomes 'exited' and the Supervisor will attempt to generate another start step. This noop-wait-start step loop will repeat until the container is able to start. If the container is never able to start, there was a problem in the host in the creation of the host resource, and that should be fixed at the host level. This commit does not handle the case of services with restart policies "no" or "on-failure" which encounter this host race, as metadata from container inspects needs to be introduced during step calculation in order to figure out whether services with those restart policies need to be started. This will be fixed in a future PR. Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-05-31 11:32:19 -07:00
Felipe Lalanne	2758e190b2	Fix `sw.arch` typo when testing contracts Change-type: patch	2023-05-11 13:07:26 -04:00
Felipe Lalanne	8656bd62f7	Add `arch.sw` to the valid container requirements Change-type: minor	2023-05-09 15:44:26 -04:00
Felipe Lalanne	f1f09e0e27	Allow using slug to validate hw.device-type contract This also adds the hw.device-type test case to the unit tests. Change-type: patch	2023-05-09 15:20:18 -04:00
Felipe Lalanne	a884a58b4c	Simplify and move lib/contract.spec.ts to tests/unit Improve contract tests to remove dependence on stubs and unnecessary system calls. Change-type: patch	2023-05-09 15:20:12 -04:00
Felipe Lalanne	7b8b187c74	Create tests with recovery from #1576 Devices affected by the bug described in #1576 are also stuck with some services in the `Downloaded` state, because the state engine does not detect that the running services should be killed on a network change even if they belong to a new release. This is a bug, which can be replicated by the tests in this commit. Change-type: patch	2023-04-26 11:58:42 -04:00
Felipe Lalanne	0a358a4463	Add replication of issue using unit tests Change-type: patch	2023-04-25 14:47:00 -04:00
Felipe Lalanne	138aec5de4	Add integration tests for state-engine These tests use the supervisor API to check that applying a target state allows the device to eventually get to the desired target configuration. These are high-level tests that work with real images and containers using dind. Change-type: patch	2023-04-25 14:47:00 -04:00
Felipe Lalanne	3d43f7e3b3	Simplify doRestart and doPurge actions The actions now work by passing an intermediate state to the state engine. - doPurge first removes the user app from the target state and passes that to the state engine for purging. Since intermediate state doesn't remove images, this will have the effect of basically re-installing the app. - doRestart modifies the target state by first removing only the services from the current state but keeping volumes and networks. This has the same effect as before where services were stopped one by one Change-type: patch	2023-04-20 14:58:58 -04:00
Felipe Lalanne	27f0d2e655	Improve net alias comparison to prevent unwanted restarts Network aliases are now compared checking that the target state is a subset of the current state. This will prevent service restarts due to additional aliases created by docker in the container. Closes: #2134 Change-type: patch	2023-04-20 14:58:58 -04:00
Felipe Lalanne	cb98133717	Exclude containerId from service network aliases When getting the service from the docker container, remove the containerId from the list of aliases (which gets added by docker). This will make it easier to use the current service state as a target. This will help us remove the `safeStateClone` function in the API in a future commit Change-type: patch	2023-04-20 14:58:58 -04:00
Felipe Lalanne	967cb7747f	Make local mode image management work as in cloud mode There were multiple places in the state engine that skipped some operations while in local mode. In reality, all that's needed while in local mode is to skip image and volume deletion. This commit simplifies application-manager and compose app to be more local mode agnostic and instead makes image deletion and volume deletion configurable via function arguments. This also has the benefit of making the treatment of local mode applications more similar to cloud mode applications, allowing API endpoints to function the same way in both modes. Change-type: patch	2023-04-20 14:58:58 -04:00
Felipe Lalanne	7b68ee4c4f	Do not restart balena-hostname on rename The OS since v2.82.6 will monitor changes to config.json and restart the relevant services to apply the changes. There is no need to trigger restart of the services via the supervisor. Users on older OS versions will need to update their OS or restart the services manually as OS loses support after 2y. Change-type: patch Closes: #2160	2023-04-20 11:43:35 -04:00
Christina Ying Wang	b9e1464d96	Fix assertion error in restart-service From: `c0b4fafe84` Restart-service checks that both services have restarted in its test assertion, which is incorrect as restart-service should only restart one service. Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-04-07 14:40:15 -07:00
Alexandru Costache	6b67db98e5	backends: Add Jetson Orin NX custom device-tree support Signed-off-by: Alexandru Costache <alexandru@balena.io> Change-type: patch	2023-04-07 18:12:31 +03:00
Christina Ying Wang	4c948c8854	Mount data and state partitions on container startup Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-03-27 12:07:01 -07:00
Christina Ying Wang	49ee1042a8	Mount boot partition into container on Supervisor start As the Supervisor is a privileged container, it has access to host /dev, and therefore has access to boot, data, and state balenaOS partitions. This commit sets up the framework for the following: - Finds the /dev partition that corresponds to each partition based on partition label - Mounts the partitions into set mountpoints in the device - Removes reliance on env vars and mountpoints provided by host's start-balena-supervisor script - Simplifies host path querying by centralizing these queries through methods in lib/host-utils.ts This particular commit changes the env vars for, and mounts, the boot partition. Since the Supervisor would no longer rely on container `run` arguments provided by a host script, this change moves Supervisor closer to being able to start itself (Supervisor-as-an-app). Change-type: minor Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-03-27 12:07:01 -07:00
Christina Ying Wang	9522c15ecd	Change constants imports to remove 'require' Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-03-27 12:07:01 -07:00
Christina Ying Wang	84a9e7e9ac	Replace BALENA-FIREWALL rule in INPUT chain instead of flushing The issue with the original Supervisor implementation of the firewall is that on Supervisor start, the Supervisor flushes the INPUT chain of the filter table. This doesn't play well with services that add to the INPUT chain on startup that may start up before the Supervisor, such as certain NetworkManager connection profiles. This change only replaces the BALENA-FIREWALL rule in the INPUT chain, preserving the other rules as well as their order. Closes: #1482 Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-03-01 13:42:07 -08:00
Pagan Gazzard	d356f979d3	Always lower case the cpu id to avoid bouncing between casing when reporting Change-type: patch	2023-02-15 13:54:40 +00:00
Felipe Lalanne	89175432af	Find and remove duplicate networks We have seen a few times devices with duplicated network names for some reason. While we don't know what causes the networks to be duplicated, this can be disruptive for updates as trying to create a container referencing a duplicate network results in a 400 error from the engine. This commit finds and removes duplicate networks via the state engine, which means that even if a container is somehow referencing a network that was duplicated later, the container will be removed first. While this doesn't solve the problem of duplicate networks being created in the first place, it will fix the state of the system to correct the inconsistency. Change-type: minor Closes: #590	2023-02-10 20:24:36 -05:00
Christina Ying Wang	c4f9d72172	Remove dependent devices content in codebase This includes: - proxyvisor.js - references in docs - references device-state, api-binder, compose modules, API - references in tests The commit also adds a migration to remove the 4 dependent device tables from the DB. Change-type: minor Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-02-06 19:34:02 -08:00
Ruben Keulemans	9a1cde7f44	Support `since` and `until` in supervisor journalctl wrapper API. Signed-off-by: Ruben Keulemans ruben.keulemans@protonmail.com Change-type: minor Closes: #2083	2023-02-01 09:17:10 +01:00
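
Commits cb98133717 and 27f0d2e655 above describe two related ideas: dropping the alias that docker adds from the container ID, and treating network aliases as matching when the target aliases are a subset of the current ones. The following is a minimal sketch of that logic, not the supervisor's actual code. The function names `withoutContainerId` and `aliasesMatch` are hypothetical, and the assumption that the docker-added alias is a prefix of the full container ID is ours.

```typescript
// Hypothetical helpers illustrating the alias handling described in the
// commit messages; names and signatures are assumptions, not supervisor APIs.

// Drop any alias that is a prefix of the container ID (assumed to be how
// the docker-added alias looks, e.g. the short 12-character ID).
function withoutContainerId(aliases: string[], containerId: string): string[] {
  return aliases.filter(
    (alias) => alias.length === 0 || !containerId.startsWith(alias),
  );
}

// The target matches when every target alias is present in the current
// state; extra aliases in the current state do not count as a difference.
function aliasesMatch(current: string[], target: string[]): boolean {
  const currentSet = new Set(current);
  return target.every((alias) => currentSet.has(alias));
}
```

With a subset check like this, an extra alias present only in the current state would not by itself trigger a service restart, which is the effect the commit message describes.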

439 Commits
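
The Engine-host race commits (7f32141958, 2537eb8189, ace642ea0f, 6e6f79c71d, 71d24d6e33) describe a noop-wait-start loop: after a start attempt the Supervisor waits out a grace period (later reduced from 60s to 30s), then generates another start step if the container exited with the race-related error. The sketch below illustrates that decision under stated assumptions. Only the name `isValidDateAndOlderThan` comes from the commit log; its signature, the `ContainerSnapshot` shape, the `nextStep` function, and the `exitErrorMatchesRace` flag (standing in for the exit-message parsing, whose exact text is not given in the log) are hypothetical.

```typescript
// Illustrative only: types and step names are assumptions made for this sketch.
type Step = 'none' | 'noop' | 'start';

interface ContainerSnapshot {
  status: string; // e.g. 'running' or 'exited', as reported by a container inspect
  startedAt: Date | null; // when the last start attempt happened
  exitErrorMatchesRace: boolean; // result of parsing the exit error message
}

// True only for a valid Date that lies more than `seconds` before `now`.
function isValidDateAndOlderThan(
  date: unknown,
  seconds: number,
  now: Date = new Date(),
): boolean {
  if (!(date instanceof Date) || Number.isNaN(date.getTime())) {
    return false;
  }
  return now.getTime() - date.getTime() > seconds * 1000;
}

const GRACE_PERIOD_SECONDS = 30; // reduced from 60s in commit 6e6f79c71d

function nextStep(c: ContainerSnapshot, now: Date = new Date()): Step {
  // Running containers, and exits unrelated to the race, need no action here.
  if (c.status !== 'exited' || !c.exitErrorMatchesRace) {
    return 'none';
  }
  // Wait out the grace period before attempting another start.
  return isValidDateAndOlderThan(c.startedAt, GRACE_PERIOD_SECONDS, now)
    ? 'start'
    : 'noop';
}
```

Keying the decision on the exit error rather than on the restart policy mirrors the reasoning in commit 71d24d6e33, which notes that a manually stopped container also reports an inspect status of `exited`.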