balena-supervisor

mirror of https://github.com/balena-os/balena-supervisor.git synced 2025-02-25 19:21:38 +00:00

Author	SHA1	Message	Date
Christina Ying Wang	7f32141958	Handle Engine-host race condition for "always" and "unless-stopped" restart policy There exists a race condition between Engine and a host resource that may not be immediately created. In this race condition, if a container's compose config depends on the existence of that host resource, such as a network interface, and the Engine tries to create & start the container before the host resource is created, the Engine will not reattempt to start the container, regardless of the restart policy. This is undesirable behavior but seems to be the behavior as implemented by Docker. To rectify this, the Supervisor state funnel noops for a grace period of 1 minute after starting a container to see that the container's status has become 'running'. If the container exits because of the race condition, the status becomes 'exited' and the Supervisor will attempt to generate another start step. This noop-wait-start step loop will repeat until the container is able to start. If the container is never able to start, there was a problem in the host in the creation of the host resource, and that should be fixed at the host level. This commit does not handle the case of services with restart policies "no" or "on-failure" which encounter this host race, as metadata from container inspects needs to be introduced during step calculation in order to figure out whether services with those restart policies need to be started. This will be fixed in a future PR. Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-05-31 11:32:19 -07:00
Felipe Lalanne	8656bd62f7	Add `arch.sw` to the valid container requirements Change-type: minor	2023-05-09 15:44:26 -04:00
Felipe Lalanne	f1f09e0e27	Allow using slug to validate hw.device-type contract This also adds the hw.device-type test case to the unit tests. Change-type: patch	2023-05-09 15:20:18 -04:00
Felipe Lalanne	5fdd689590	Fix service comparison when creating component steps A bug in service comparison would make it that a device already running a service from a new release with network changes would never stop the running service so remaining services would forever get stuck in `Downloaded` state. This fixes the comparison so the service will get killed in this case, particularly allowing devices to recover from #1576 Change-type: patch	2023-04-26 11:58:48 -04:00
Felipe Lalanne	7aecaae8b0	Skip updateMetadata step if there are network changes Previous behavior would make it that an `updateMetadata` step would take precedence over a `kill` step when network changes are present. This would lead to an inconsistent state if an update included a network and a container change. Closes: #1576 Change-type: patch	2023-04-25 14:47:00 -04:00
Felipe Lalanne	138aec5de4	Add integration tests for state-engine These tests use the supervisor API to check that applying a target state allows the device to eventually get to the desired target configuration. These are high-level tests that work with real images and containers using dind. Change-type: patch	2023-04-25 14:47:00 -04:00
Felipe Lalanne	c1207cbbff	Do not pass auth to images with no registry The supervisor allows the target image to be an image without a registry (e.g. `alpine:latest`). While this really only happens in local mode, we don't want to pass credentials to the default registry, as those credentials are meant for the balena registry and will otherwise fail. Change-type: patch	2023-04-25 14:47:00 -04:00
Felipe Lalanne	6c031299d6	Remove safeStateClone function This function is no longer needed with the latest changes to getCurrentState Change-type: patch	2023-04-20 14:58:58 -04:00
Felipe Lalanne	36311ef7a1	Get rid of targetVolatile in app manager Target volatile doesn't make sense now that we can use the current state as a target. It wasn't actually being used for anything anymore apparently Change-type: patch	2023-04-20 14:58:58 -04:00
Felipe Lalanne	1e0dd381f5	Make pausingApply a private member of device-state This simplifies this module interface and hides implementation details from the rest of the code. The function `applyIntermediateTarget` will now call `pausingApply` before applying the target. API actions no longer need to call `pausingApply`. Change-type: patch	2023-04-20 14:58:58 -04:00
Felipe Lalanne	3d43f7e3b3	Simplify doRestart and doPurge actions The actions now work by passing an intermediate state to the state engine. - doPurge first removes the user app from the target state and passes that to the state engine for purging. Since intermediate state doesn't remove images, this will have the effect of basically re-installing the app. - doRestart modifies the target state by first removing only the services from the current state but keeping volumes and networks. This has the same effect as before where services were stopped one by one Change-type: patch	2023-04-20 14:58:58 -04:00
Felipe Lalanne	43630e5267	Fix network appUuid inference in local mode Local mode uses a numeric `appUuid` which was messing up parsing the network name. This fixes this issue so the current state can be used as a target state Change-type: patch	2023-04-20 14:58:58 -04:00
Felipe Lalanne	b1fc4e1761	Get image name from DB when getting the app current state The Service class in `compose/service.ts` cannot get the image name from the image id when building the object from the container metadata. We query the metadata in the application manager getCurrentApps method so the current state can be used as target by API methods Change-type: patch	2023-04-20 14:58:58 -04:00
Felipe Lalanne	27f0d2e655	Improve net alias comparison to prevent unwanted restarts Network aliases are now compared checking that the target state is a subset of the current state. This will prevent service restarts due to additional aliases created by docker in the container. Closes: #2134 Change-type: patch	2023-04-20 14:58:58 -04:00
Felipe Lalanne	cb98133717	Exclude containerId from service network aliases When getting the service from the docker container, remove the containerId from the list of aliases (which gets added by docker). This will make it easier to use the current service state as a target. This will help us remove the `safeStateClone` function in the API in a future commit Change-type: patch	2023-04-20 14:58:58 -04:00
Felipe Lalanne	f2ca7dbb6a	Skip image delete when applying intermediate state This replaces the previous flag `isApplyingIntermediate` on application manager and simplifies the interface of the state engine to make temporary changes to the general app state. Change-type: patch	2023-04-20 14:58:58 -04:00
Felipe Lalanne	967cb7747f	Make local mode image management work as in cloud mode There were multiple places in the state engine that skipped some operations while in local mode. In reality, all that's needed while in local mode is to skip image and volume deletion. This commit simplifies application-manager and compose app to be more local mode agnostic, instead making the image deletion and volume deletion configurable via function arguments. This also has the benefit of making the treatment of local mode applications more similar to cloud mode applications, allowing API endpoints to function the same way in both modes. Change-type: patch	2023-04-20 14:58:58 -04:00
Felipe Lalanne	76d5be64e5	Remove ignoreImages argument from getRequiredSteps The argument was unused and hence unnecessary. This is just a bit of cleanup Change-type: patch	2023-04-20 14:58:58 -04:00
Felipe Lalanne	7b68ee4c4f	Do not restart balena-hostname on rename The OS since v2.82.6 will monitor changes to config.json and restart the relevant services to apply the changes. There is no need to trigger restart of the services via the supervisor. Users on older OS versions will need to update their OS or restart the services manually as OS loses support after 2y. Change-type: patch Closes: #2160	2023-04-20 11:43:35 -04:00
Felipe Lalanne	6764641426	Log uncaught promise exceptions on the app entry Node 15 [changed the way it treats unhandled promise rejections](https://github.com/nodejs/node/blob/main/doc/changelogs/CHANGELOG_V15.md#throw-on-unhandled-rejections---33021) from a warning to a throw. For this reason errors like a corrupt migration directory, that happens when trying to roll back to a previous supervisor version were no longer showing a message but dumping the full minimized code into the journal logs. This PR adds a catchall on app.ts to log the exception and throw an exit code of 1. Change-type: patch	2023-04-10 11:18:35 -04:00
Alexandru Costache	6b67db98e5	backends: Add Jetson Orin NX custom device-tree support Signed-off-by: Alexandru Costache <alexandru@balena.io> Change-type: patch	2023-04-07 18:12:31 +03:00
Christina Ying Wang	4c948c8854	Mount data and state partitions on container startup Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-03-27 12:07:01 -07:00
Christina Ying Wang	49ee1042a8	Mount boot partition into container on Supervisor start As the Supervisor is a privileged container, it has access to host /dev, and therefore has access to boot, data, and state balenaOS partitions. This commit sets up the framework for the following: - Finds the /dev partition that corresponds to each partition based on partition label - Mounts the partitions into set mountpoints in the device - Removes reliance on env vars and mountpoints provided by host's start-balena-supervisor script - Simplifies host path querying by centralizing these queries through methods in lib/host-utils.ts This particular commit changes the env vars for, and mounts, the boot partition. Since the Supervisor would no longer rely on container `run` arguments provided by a host script, this change moves Supervisor closer to being able to start itself (Supervisor-as-an-app). Change-type: minor Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-03-27 12:07:01 -07:00
Christina Ying Wang	9522c15ecd	Change constants imports to remove 'require' Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-03-27 12:07:01 -07:00
Christina Ying Wang	37371d89dc	Add missing log backend field assignment in logger init Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-03-23 14:07:35 -07:00
Christina Ying Wang	36e46d80a6	Use log endpoint subdomain if it exists in config.json See: https://github.com/balena-io/open-balena-api/pull/1288 Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-03-14 12:56:32 -07:00
Felipe Lalanne	f6435814cd	Skip pin device step if release was deleted Preloaded devices can require that the device is pinned to the preloaded release on provisioning. However, if the provisioned release gets deleted in the future, that would lead to the device remaining in "VPN only" state forever as the provisioning process could not finish due to pinning failure. This commit changes the behavior so if the release does not exist, the pinning step is skipped and the device follows the fleet pinning state. Closes: #2133 Change-type: patch	2023-03-13 10:03:00 -03:00
Christina Ying Wang	84a9e7e9ac	Replace BALENA-FIREWALL rule in INPUT chain instead of flushing The issue with the original Supervisor implementation of the firewall is that on Supervisor start, the Supervisor flushes the INPUT chain of the filter table. This doesn't play well with services that add to the INPUT chain on startup that may start up before the Supervisor, such as certain NetworkManager connection profiles. This change only replaces the BALENA-FIREWALL rule in the INPUT chain, preserving the other rules as well as their order. Closes: #1482 Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-03-01 13:42:07 -08:00
Pagan Gazzard	d356f979d3	Always lower case the cpu id to avoid bouncing between casing when reporting Change-type: patch	2023-02-15 13:54:40 +00:00
Felipe Lalanne	89175432af	Find and remove duplicate networks We have seen a few times devices with duplicated network names for some reason. While we don't know why the networks get duplicated, this can be disruptive for updates as trying to create a container referencing a duplicate network results in a 400 error from the engine. This commit finds and removes duplicate networks via the state engine. This means that even if a container is somehow referencing a network that was later duplicated, this will remove the container first. While this doesn't solve the problem of duplicate networks being created in the first place, it will fix the state of the system to correct the inconsistency. Change-type: minor Closes: #590	2023-02-10 20:24:36 -05:00
Felipe Lalanne	180c4ff31a	Reference networks by Id instead of by name We have seen a few times devices with duplicated network names for some reason. While we don't know why the networks get duplicated, this is disruptive to updates, as the supervisor usually queries resources by name, resulting in a 400 error from the engine because of the ambiguity. This replaces those queries by name with queries by id. This includes network removal. If a `removeNetwork` step is generated, the supervisor opts to remove all instances of the network with the same name as it cannot easily resolve the ambiguity. This doesn't solve the problem of ambiguous networks, because even if networks are referenced by id when creating a container, the engine will throw an error (see https://github.com/balena-os/balena-supervisor/issues/590#issuecomment-1423557871) Change-type: patch Relates-to: #590	2023-02-10 20:24:36 -05:00
Christina Ying Wang	c4f9d72172	Remove dependent devices content in codebase This includes: - proxyvisor.js - references in docs - references device-state, api-binder, compose modules, API - references in tests The commit also adds a migration to remove the 4 dependent device tables from the DB. Change-type: minor Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-02-06 19:34:02 -08:00
Christina Ying Wang	9b26fc263a	patch: Convert internal timestamp passed to journalctl from number to string See: https://github.com/balena-os/balena-supervisor/pull/2084 Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-02-06 15:59:16 -08:00
Ruben Keulemans	9a1cde7f44	Support `since` and `until` in supervisor journalctl wrapper API. Signed-off-by: Ruben Keulemans ruben.keulemans@protonmail.com Change-type: minor Closes: #2083	2023-02-01 09:17:10 +01:00
Felipe Lalanne	6683bca07d	Add SIGTERM listener on application start As reported by issue #2100, the supervisor was not correctly reacting to `SIGTERM` sent by the engine when terminating the process (for instance before a reboot). This would lead to the supervisor requiring an additional 10 seconds to terminate (after which the engine will send a `SIGKILL`). The reason for this is explained by the following info coming from Node > Node.js was not designed to run as PID 1 which leads to unexpected behaviour when running inside of Docker. For example, a Node.js process running as PID 1 will not respond to `SIGINT` (`CTRL-C`) and similar signals. [reference](https://github.com/nodejs/docker-node/blob/main/docs/BestPractices.md#handling-kernel-signals) On internal testing, it was discovered that simply adding a listener for the signal on the Node process was enough to handle the signal, even when the process runs as PID 1. This adds a listener for `SIGTERM` before starting the supervisor main loop. Closes: #2100 Change-type: patch	2023-01-31 10:43:19 -03:00
Felipe Lalanne	67d1503b54	Allow using colon character in config vars The Raspberry Pi config.txt file defines the use of colon to configure variables of the same name in different ports, for instance on those devices with two hdmi ports. This syntax was previously not supported by the supervisor. This change relaxes the syntax validation on config vars to allow the use of the colon character. Relates-to: #1573, #2046 Change-type: minor	2023-01-20 15:48:32 -03:00
Pagan Gazzard	63cc4ad58c	Correctly use the extended got instance for target-state In practice this fixes the missing user-agent Change-type: patch	2023-01-18 11:40:06 +00:00
Christina Ying Wang	e1bacda580	Update host-config, route, and action tests for host config endpoints Change-type: minor Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-01-11 15:48:13 -08:00
Christina Ying Wang	250684d651	Use actions & write tests for GET /v1/device Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-01-11 15:48:13 -08:00
Christina Ying Wang	72c683d5ff	Use actions & write tests for GET /v1/apps/:appId Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-01-11 15:48:13 -08:00
Christina Ying Wang	198d9ad638	Write update action and tests, remove isReadyForUpdate check See: #1924 Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-01-11 15:48:13 -08:00
Christina Ying Wang	85392f2a85	Move reboot/shutdown to actions and related tests to integration Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-01-11 15:48:13 -08:00
Christina Ying Wang	c6cf6a0136	Use executeServiceAction for v1/v2 service action endpoints This includes: - /v1/apps/:appId/(stop|start) - /v2/applications/:appId/(restart|stop|start)-service Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-01-09 18:20:24 -08:00
Christina Ying Wang	fcd28591c6	Add tests for doPurge action and v1/v2 app purge routes Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-01-09 16:25:27 -08:00
Christina Ying Wang	a24d5acf7f	Add tests for doRestart action and v1/v2 app restart routes Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-01-09 16:25:27 -08:00
Christina Ying Wang	d6298b2643	Use regenerateKey action for POST /v1/regenerate-api-key This also adds a 500 response with the old key if the API key refresh was unsuccessful. Previously, if the key refresh was unsuccessful, this would result in an UnhandledPromiseRejection. This is a new interface. Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-01-09 16:25:27 -08:00
Christina Ying Wang	c7db3189ad	Use identify action for POST /v1/blink Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-01-09 16:01:43 -08:00
Christina Ying Wang	e351ed9803	Use runHealthchecks action for GET /v1/healthy Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-01-09 16:01:43 -08:00
Pagan Gazzard	fd135214fe	Use `got` for fetching the target state in order to have brotli support Change-type: patch	2022-12-15 21:15:11 +00:00
Felipe Lalanne	77cd15f131	Sanitize output when writing to redsocks.conf Properties with values including quotes (`"`) would not get sanitized and written verbatim on the config file, causing redsocks to fail. Closes: #2072	2022-12-07 18:36:25 +00:00
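
The alias comparison described in commit 27f0d2e655 (the target aliases must be a subset of the current aliases) can be sketched roughly as follows. This is only an illustration under stated assumptions: aliases are treated as plain string arrays, and the function name `targetAliasesSatisfied` and the sample alias values are hypothetical, not taken from the Supervisor codebase.

```typescript
// Hypothetical helper, not the Supervisor's actual implementation.
// Returns true when every alias requested by the target state is already
// present on the running container. Extra aliases on the container (for
// example ones added by the engine) are ignored.
function targetAliasesSatisfied(
  currentAliases: string[],
  targetAliases: string[],
): boolean {
  const current = new Set(currentAliases);
  return targetAliases.every((alias) => current.has(alias));
}

// The container carries an extra engine-added alias; the target is still a
// subset, so no restart would be needed under this rule.
const unchanged = targetAliasesSatisfied(
  ['main', 'api', '3f2a9c1b0d7e'],
  ['main', 'api'],
);

// The target asks for an alias the container does not have, so the
// service would be considered changed.
const changed = !targetAliasesSatisfied(['main'], ['main', 'api']);
```

A subset check, rather than strict equality, is what allows the current state to be reused as a target without triggering restarts on aliases the target never asked for.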

1564 Commits