balena-supervisor

mirror of https://github.com/balena-os/balena-supervisor.git synced 2024-12-22 15:02:23 +00:00

Author	SHA1	Message	Date
Christina Ying Wang	6e185fbd44	Don't follow symlinks when checking for lockfiles The Supervisor should only care whether a lockfile exists or not. This also fixes an edge case where a user symlinked a lockfile to a nonexistent file, causing the Supervisor to enter an error loop as it was not able to `stat` the nonexistent file. Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2024-04-12 10:34:46 -04:00
Christina Ying Wang	f863075bdc	Add memory usage healthcheck This healthcheck fails when Supervisor memory usage is above a threshold based on initial memory measurements after device state has settled. Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2024-04-11 18:16:47 -07:00
Christina Ying Wang	8ac2ce4677	Respect lockOverride when taking locks Signed-off-by: Christina Ying Wang <christina@balena.io>	2024-04-06 00:59:04 -07:00
Christina Ying Wang	fd7d58f89a	Clean up lockfiles on takeLock step failure We don't want any Supervisor lockfiles to remain on the device when a takeLock step fails because this would interfere with the user app. Signed-off-by: Christina Ying Wang <christina@balena.io>	2024-04-04 14:07:47 -07:00
Christina Ying Wang	fb1bd33ab6	Refine update locking interface * Remove Supervisor lockfile cleanup SIGTERM listener * Modify lockfile.getLocksTaken to read files from the filesystem * Remove in-memory tracking of locks taken in favor of filesystem * Require both `(resin-)updates.lock` to be locked with `nobody` UID for service to count as locked by the Supervisor Signed-off-by: Christina Ying Wang <christina@balena.io>	2024-04-04 14:07:47 -07:00
Christina Ying Wang	10f294cf8e	Add takeLock to state funnel A takeLock step should be generated before any of the following steps: * kill * start * stop * updateMetadata * restart * handover ALL services in an app will be locked for any of the above actions, unless the action is generated through Supervisor API's `POST /v2/applications/:appId/(start\|stop\|restart)-service` endpoints, in which case only the target service will be locked. A lock will be taken for a service before it starts by creating the directory in /tmp before the Engine creates it through bind mounts. Also, the commit simplifies the generation of service kill steps from network/volume changes or removals. Signed-off-by: Christina Ying Wang <christina@balena.io>	2024-04-04 14:07:47 -07:00
Christina Ying Wang	cf8d8cedd7	Simplify lock interface to prep for adding takeLock to state funnel This commit changes a few things: * Pass `force` to `takeLock` step directly. This allows us to remove the `lockFn` used by app manager's action executors, setting takeLock as the main interface to interact with the update lock module. Note that this commit by itself will not pass tests, as no update locking occurs where it once did. This will be amended in the next commit. * Remove locking functions from doRestart & doPurge, as this is the only area where skipLock is required. * Remove `skipLock` interface, as it's redundant with the functionality of `force`. The only time `skipLock` is true is in doRestart/doPurge, as those API methods are already run within a lock function. We removed the lock function which removes the need for skipLock, and in the next commit we'll add locking as a composition step to replace the functionality removed here. * Remove some methods not in use, such as app manager's `stopAll`. Signed-off-by: Christina Ying Wang <christina@balena.io>	2024-04-04 14:07:47 -07:00
Christina Ying Wang	af6359f7ae	Take lock before updating service metadata Change-type: minor Signed-off-by: Christina Ying Wang <christina@balena.io>	2024-04-04 14:07:47 -07:00
Christina Ying Wang	e6df78a22b	Implement takeLock composition step + tests This commit only implements the action that a takeLock step results in. It does not add takeLock step generation logic to the state funnel yet. Signed-off-by: Christina Ying Wang <christina@balena.io>	2024-04-04 14:07:47 -07:00
Christina Ying Wang	f2843e1382	Add update lock release functionality to state funnel releaseLock is a step that will be inferred if there are services in target state, and if some of those services have locks taken by the Supervisor. The releaseLock composition step calls the method of the same name in the updateLock module, which takes the exclusive process lock before disposing all Supervisor lockfiles in the target appId. This is half of the update lock incorporation into the state funnel, as we also need to introduce a takeLock step which triggers during crucial stages of device state transition. Signed-off-by: Christina Ying Wang <christina@balena.io>	2024-04-04 14:07:47 -07:00
Christina Ying Wang	d18a740a40	Add methods for easier checking of lockfile existence Signed-off-by: Christina Ying Wang <christina@balena.io>	2024-04-04 14:07:47 -07:00
Felipe Lalanne	6217546894	Update typescript to v5 This also updates code to use the default import syntax instead of `import * as` when the imported module exposes a default. This is needed with the latest typescript version. Change-type: patch	2024-03-05 15:33:56 -03:00
Felipe Lalanne	988a1c9e9a	Update @balena/lint to v7 This updates balena lint to the latest version to enable eslint support and unblock Typescript updates. This is a huge number of changes as the linting rules are much more strict now, requiring modifying most files in the codebase. This commit also bumps the test dependency `rewire` as that was interfering with the update of balena-lint Change-type: patch	2024-03-01 18:27:30 -03:00
Felipe Lalanne	bda1bac04c	Add support for repeated overlays RPI firmware configuration allows repeating overlays to define configurations on multiple devices. For instance, for configuring multiple `ads` devices, `config.txt` needs to be set up this way ``` dtoverlay=ads1115,addr=0x48 dtoverlay=ads1115,addr=0x49 ``` Before this change, the supervisor would interpret both lines as belonging to the same overlay, preventing users from configuring multiple devices, and leading to a loop when trying to apply configurations with repeated overlays coming from the cloud side. Change-type: minor	2024-02-27 14:52:41 -03:00
Christina Ying Wang	3fd035c5bd	Patch default dtparam handling in config.txt This commit completes the list of default / board-wide dtparams to include some `baudrate` and `vc` i2c params. Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2024-02-21 12:45:29 -08:00
Christina Ying Wang	e22253ce6e	Patch config.txt backend to return array configs correctly Previously, getBootConfig() of the config.txt backend was omitting array configurations such as gpio settings, thus resulting in the SV mistakenly assuming that boot config had not been applied, since gpio would not be in current config.txt config but would be in target config. This resulted in SV entering an infinite loop of attempting to apply the gpio config when it wasn't necessary. Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2024-02-16 18:12:33 -08:00
Felipe Lalanne	6e6a796da5	Add special case for base DTO params on RPI config While ordering is important in the RPI firmware configuration file (config.txt), some dt params are by default considered part of the base dt overlay if they are not used by other overlays. Unfortunately the [list of dtparams](https://github.com/raspberrypi/firmware/blob/master/boot/overlays/README#L133) is too long to add all of them as exceptions, but we can add the params used in the default config.txt provided in OS images, to avoid reboots when updating to this new supervisor and correctly parsing the provisioning config.txt as variables. While this addition handles most common scenarios, there is still a chance a user may have used other base overlay dt params in the initial config, in which case those will be interpreted according to the relative ordering Change-type: patch	2024-02-08 15:48:10 -03:00
Felipe Lalanne	55a8c5bf90	Add tests for dtoverlay management in config.txt	2024-02-07 20:38:44 -03:00
Christina Ying Wang	3afcef2969	Respect update strategies app-wide instead of at the service level Fixes behavior for release updates which removes a service in current state and adds a new service in target state. Change-type: patch Closes: #2095 Signed-off-by: Christina Ying Wang <christina@balena.io>	2024-01-29 12:26:28 -08:00
Felipe Lalanne	6ee606806d	Fix docker utils tests for docker v25 From docker 25, the engine will validate IPAM config. This would cause the docker utils test to fail since the network/subnet configuration was incorrect. Change-type: patch	2024-01-25 15:05:12 -03:00
Felipe Lalanne	9bd216327f	Expose ports from port mappings on services PR #2217 removed the expose configuration but also caused a regression where ports set via the `ports` configuration would no longer get exposed to the host, despite portmappings being set. This fixes that issue by exposing only those ports coming from port mappings. Change-type: patch	2023-10-24 15:04:39 -03:00
Christina Ying Wang	bc1d251e66	Revert os-release path to /mnt/root /mnt/boot/os-release isn't always accurate so /mnt/root should be the source of truth. Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-10-09 14:02:02 -07:00
Felipe Lalanne	327dc31ef0	Replace node-dbus with @balena/systemd The node-dbus module is unmaintained and a blocker for the update to Node 18. Switching to our own node bindings for systemd solves this issue Relates-to: Shouqun/node-dbus#241 Change-type: patch	2023-08-16 15:58:52 -04:00
Felipe Lalanne	8f17c30de6	Replace dbus test service with mock-systemd-bus This avoids unnecessary mocking and tests against the real systemd API Change-type: patch	2023-08-16 14:46:58 -04:00
Alexandru Costache	512240c544	backends: Add Jetson Orin NANO custom device-tree support Signed-off-by: Alexandru Costache <alexandru@balena.io> Change-type: patch	2023-07-11 18:11:32 +03:00
Christina Ying Wang	38fe8dae75	Remove the 'Stopped' status for services It's not an official status from container inspects, and the Supervisor doesn't set it internally anywhere. It's better to remove it entirely as the method by which Supervisor sets internal service statuses is by using a global event emitter (reportNewStatus) which makes things difficult to test. Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-28 11:17:13 -04:00
Christina W	71d24d6e33	Parse container exit error message instead of status The previous implementation in #2170 of parsing the container status was too general, because it relied on the mistaken assumption that a container would have a status of `Stopped` if it was manually stopped. This turned out to be untrue, as manually stopped containers were also getting restarted by the Supervisor due to their inspect status of `exited`. With this, parsing the exit message became unavoidable as there are no other clear ways to discern a container that has been manually stopped and shouldn't be started from a container experiencing the Engine-host race condition issue (again, see #2170). Since we're just parsing the exit error message, we don't need to worry about different behaviors amongst restart policies, as any container with the error message on exit should be started. Change-type: patch Closes: #2178 Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-22 14:43:17 -07:00
Christina Ying Wang	7eba48f8b8	Improve tests surrounding Engine-host race patch See: #2170 Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-19 11:11:26 -07:00
Christina Ying Wang	6e6f79c71d	Decrease wait time before start from 60s to 30s 60 seconds to wait may be excessively long. Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-19 11:11:26 -07:00
Christina Ying Wang	95f3e13d50	Add extra delay after state engine integration tests This ensures target state has settled (since it seems that the 'applied' status that's reported isn't 100% accurate and the actual Engine state may lag behind slightly) Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-05-31 11:33:27 -07:00
Christina Ying Wang	7f32141958	Handle Engine-host race condition for "always" and "unless-stopped" restart policy There exists a race condition between Engine and a host resource that may not be immediately created. In this race condition, if a container's compose config depends on the existence of that host resource, such as a network interface, and the Engine tries to create & start the container before the host resource is created, the Engine will not reattempt to start the container, regardless of the restart policy. This is undesirable behavior but seems to be the behavior as implemented by Docker. To rectify this, the Supervisor state funnel noops for a grace period of 1 minute after starting a container to see that the container's status has become 'running'. If the container exits because of the race condition, the status becomes 'exited' and the Supervisor will attempt to generate another start step. This noop-wait-start step loop will repeat until the container is able to start. If the container is never able to start, there was a problem in the host in the creation of the host resource, and that should be fixed at the host level. This commit does not handle the case of services with restart policies "no" or "on-failure" which encounter this host race, as metadata from container inspects needs to be introduced during step calculation in order to figure out whether services with those restart policies need to be started. This will be fixed in a future PR. Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-05-31 11:32:19 -07:00
Felipe Lalanne	2758e190b2	Fix `sw.arch` typo when testing contracts Change-type: patch	2023-05-11 13:07:26 -04:00
Felipe Lalanne	8656bd62f7	Add `arch.sw` to the valid container requirements Change-type: minor	2023-05-09 15:44:26 -04:00
Felipe Lalanne	a884a58b4c	Simplify and move lib/contract.spec.ts to tests/unit Improve contract tests to remove dependence on stubs and unnecessary system calls. Change-type: patch	2023-05-09 15:20:12 -04:00
Felipe Lalanne	7b8b187c74	Create tests with recovery from #1576 Devices affected by the bug described in #1576 are also stuck with some services in the `Downloaded` state, because the state engine does not detect that the running services should be killed on a network change even if they belong to a new release. This is a bug, which can be replicated by the tests in this commit Change-type: patch	2023-04-26 11:58:42 -04:00
Felipe Lalanne	138aec5de4	Add integration tests for state-engine These tests use the supervisor API to check that applying a target state allows the device to eventually get to the desired target configuration. These are high-level tests that work with real images and containers using dind. Change-type: patch	2023-04-25 14:47:00 -04:00
Felipe Lalanne	3d43f7e3b3	Simplify doRestart and doPurge actions The actions now work by passing an intermediate state to the state engine. - doPurge first removes the user app from the target state and passes that to the state engine for purging. Since intermediate state doesn't remove images, this will have the effect of basically re-installing the app. - doRestart modifies the target state by first removing only the services from the current state but keeping volumes and networks. This has the same effect as before where services were stopped one by one Change-type: patch	2023-04-20 14:58:58 -04:00
Felipe Lalanne	967cb7747f	Make local mode image management work as in cloud mode There were multiple places in the state engine that skipped some operations while in local mode. In reality, all that's needed while in local mode is to skip image and volume deletion. This commit simplifies application-manager and compose app to be more local mode agnostic, instead making image deletion and volume deletion configurable via function arguments. This also has the benefit of making the treatment of local mode applications more similar to cloud mode applications, allowing API endpoints to function the same way in both modes. Change-type: patch	2023-04-20 14:58:58 -04:00
Felipe Lalanne	7b68ee4c4f	Do not restart balena-hostname on rename The OS since v2.82.6 will monitor changes to config.json and restart the relevant services to apply the changes. There is no need to trigger restart of the services via the supervisor. Users on older OS versions will need to update their OS or restart the services manually as OS loses support after 2y. Change-type: patch Closes: #2160	2023-04-20 11:43:35 -04:00
Christina Ying Wang	b9e1464d96	Fix assertion error in restart-service From: `c0b4fafe84` Restart-service checks that both services have restarted in its test assertion, which is incorrect as restart-service should only restart one service. Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-04-07 14:40:15 -07:00
Alexandru Costache	6b67db98e5	backends: Add Jetson Orin NX custom device-tree support Signed-off-by: Alexandru Costache <alexandru@balena.io> Change-type: patch	2023-04-07 18:12:31 +03:00
Christina Ying Wang	4c948c8854	Mount data and state partitions on container startup Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-03-27 12:07:01 -07:00
Christina Ying Wang	49ee1042a8	Mount boot partition into container on Supervisor start As the Supervisor is a privileged container, it has access to host /dev, and therefore has access to boot, data, and state balenaOS partitions. This commit sets up the framework for the following: - Finds the /dev partition that corresponds to each partition based on partition label - Mounts the partitions into set mountpoints in the device - Removes reliance on env vars and mountpoints provided by host's start-balena-supervisor script - Simplifies host path querying by centralizing these queries through methods in lib/host-utils.ts This particular commit changes env vars for, and mounts, the boot partition. Since the Supervisor would no longer rely on container `run` arguments provided by a host script, this change moves Supervisor closer to being able to start itself (Supervisor-as-an-app). Change-type: minor Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-03-27 12:07:01 -07:00
Christina Ying Wang	9522c15ecd	Change constants imports to remove 'require' Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-03-27 12:07:01 -07:00
Christina Ying Wang	84a9e7e9ac	Replace BALENA-FIREWALL rule in INPUT chain instead of flushing The issue with the original Supervisor implementation of the firewall is that on Supervisor start, the Supervisor flushes the INPUT chain of the filter table. This doesn't play well with services that add to the INPUT chain on startup that may start up before the Supervisor, such as certain NetworkManager connection profiles. This change only replaces the BALENA-FIREWALL rule in the INPUT chain, preserving the other rules as well as their order. Closes: #1482 Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-03-01 13:42:07 -08:00
Felipe Lalanne	89175432af	Find and remove duplicate networks We have seen devices with duplicated network names a few times. While we don't know why the networks get duplicated, this can be disruptive for updates, as trying to create a container referencing a duplicate network results in a 400 error from the engine. This commit finds and removes duplicate networks via the state engine. This means that even if a container is somehow referencing a network that was later duplicated, the container will be removed first. While this doesn't solve the problem of duplicate networks being created in the first place, it will fix the state of the system to correct the inconsistency. Change-type: minor Closes: #590	2023-02-10 20:24:36 -05:00
Christina Ying Wang	c4f9d72172	Remove dependent devices content in codebase This includes: - proxyvisor.js - references in docs - references device-state, api-binder, compose modules, API - references in tests The commit also adds a migration to remove the 4 dependent device tables from the DB. Change-type: minor Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-02-06 19:34:02 -08:00
Christina Ying Wang	e1bacda580	Update host-config, route, and action tests for host config endpoints Change-type: minor Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-01-11 15:48:13 -08:00
Christina Ying Wang	250684d651	Use actions & write tests for GET /v1/device Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-01-11 15:48:13 -08:00
Christina Ying Wang	72c683d5ff	Use actions & write tests for GET /v1/apps/:appId Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-01-11 15:48:13 -08:00
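
As an illustration of the lockfile fix in 6e185fbd44, the sketch below shows why checking with `lstat` rather than `stat` avoids errors on a lockfile symlinked to a nonexistent target. It is not the Supervisor's actual code: the helper name `lockfileExists` and the temporary paths are hypothetical, and only Node's standard `fs`, `os`, and `path` modules are assumed.

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// Hypothetical helper (not the Supervisor's API): reports whether a
// directory entry exists at `p`, without following symlinks.
function lockfileExists(p: string): boolean {
	try {
		// lstat inspects the link itself, so a dangling symlink still counts.
		fs.lstatSync(p);
		return true;
	} catch (e) {
		if ((e as NodeJS.ErrnoException).code === 'ENOENT') {
			return false;
		}
		throw e;
	}
}

// Usage: a lockfile symlinked to a target that does not exist.
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'lock-demo-'));
const lockPath = path.join(tmpDir, 'updates.lock');
fs.symlinkSync(path.join(tmpDir, 'does-not-exist'), lockPath);

// The symlink-following check reports the lock as absent...
const followed = fs.existsSync(lockPath);
// ...while the lstat-based check sees the lockfile entry itself.
const notFollowed = lockfileExists(lockPath);
console.log({ followed, notFollowed });
```

The design choice is that only the presence of the lockfile entry matters, not what it points to, which matches the commit's statement that the Supervisor "should only care whether a lockfile exists or not".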


94 Commits
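
To make the repeated-overlay behavior described in bda1bac04c concrete, here is a minimal sketch of parsing `config.txt` so that each `dtoverlay=` line becomes its own entry rather than being merged by overlay name. The `Overlay` type and the `parseOverlays` function are hypothetical names for illustration and do not reflect the Supervisor's config.txt backend.

```typescript
// Hypothetical shape for one parsed overlay line.
interface Overlay {
	name: string;
	params: string[];
}

// Hypothetical parser: every `dtoverlay=` line yields a separate entry,
// so two lines naming the same overlay stay distinct.
function parseOverlays(configTxt: string): Overlay[] {
	const overlays: Overlay[] = [];
	for (const raw of configTxt.split('\n')) {
		const line = raw.trim();
		if (!line.startsWith('dtoverlay=')) {
			continue;
		}
		const [name = '', ...params] = line.slice('dtoverlay='.length).split(',');
		if (name === '') {
			// A bare `dtoverlay=` names no overlay; skipped in this sketch.
			continue;
		}
		overlays.push({ name, params });
	}
	return overlays;
}

// Usage with the example from the commit message.
const parsed = parseOverlays(
	['dtoverlay=ads1115,addr=0x48', 'dtoverlay=ads1115,addr=0x49'].join('\n'),
);
console.log(parsed);
```

Keeping the result as an ordered array, rather than a map keyed by overlay name, is what preserves both `ads1115` entries along with their relative order.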