balena-supervisor

mirror of https://github.com/balena-os/balena-supervisor.git synced 2024-12-19 13:47:54 +00:00

Author	SHA1	Message	Date
Pagan Gazzard	20df54668c	Switch _.isArray usage to native versions Change-type: patch	2023-10-16 14:30:25 -03:00
Felipe Lalanne	3e828dcc52	Revert "Do not expose ports from image if service network mode" This reverts commit `0c7bad7792`, as that change causes a service restart loop. The supervisor cannot distinguish between ports exposed via the `EXPOSE` directive and the docker-compose `expose` property. Because of this, in the case of `network-mode: service:<...>` the current state and target state never match, leading to a service restart loop. Change-type: patch	2023-10-16 13:06:50 -03:00
Pagan Gazzard	766cce89c7	Convert multiple bluebird uses to native promises Change-type: patch	2023-10-16 11:40:45 +01:00
Felipe Lalanne	0c7bad7792	Do not expose ports from image if service network mode The supervisor exposes ports configured using the `EXPOSE` directive in the dockerfile when configuring the container for runtime. This can cause issues if using `network_mode: service:<service name>` as the expose configuration is not compatible with that network mode. This fix now skips image exposed ports for that particular network mode. Change-type: patch Relates-to: #2211	2023-10-12 18:03:42 -03:00
Pagan Gazzard	894bdeeeb6	Remove unused docker logs logging code Change-type: patch	2023-10-11 14:20:33 +01:00
Christina Ying Wang	bc1d251e66	Revert os-release path to /mnt/root /mnt/boot/os-release isn't always accurate so /mnt/root should be the source of truth. Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-10-09 14:02:02 -07:00
Felipe Lalanne	327dc31ef0	Replace node-dbus with @balena/systemd The node-dbus module is unmaintained and a blocker for the update to Node 18. Switching to our own node bindings for systemd solves this issue Relates-to: Shouqun/node-dbus#241 Change-type: patch	2023-08-16 15:58:52 -04:00
Felipe Lalanne	8f17c30de6	Replace dbus test service with mock-systemd-bus This avoids unnecessary mocking and tests against the real systemd API Change-type: patch	2023-08-16 14:46:58 -04:00
Alexandru Costache	512240c544	backends: Add Jetson Orin NANO custom device-tree support Signed-off-by: Alexandru Costache <alexandru@balena.io> Change-type: patch	2023-07-11 18:11:32 +03:00
Christina Ying Wang	38fe8dae75	Remove the 'Stopped' status for services It's not an official status from container inspects, and the Supervisor doesn't set it internally anywhere. It's better to remove it entirely as the method by which Supervisor sets internal service statuses is by using a global event emitter (reportNewStatus) which makes things difficult to test. Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-28 11:17:13 -04:00
Christina W	71d24d6e33	Parse container exit error message instead of status The previous implementation in #2170 of parsing the container status was too general, because it relied on the mistaken assumption that a container would have a status of `Stopped` if it was manually stopped. This turned out to be untrue, as manually stopped containers were also getting restarted by the Supervisor due to their inspect status of `exited`. With this, parsing the exit message became unavoidable as there are no other clear ways to discern a container that has been manually stopped and shouldn't be started from a container experiencing the Engine-host race condition issue (again, see #2170). Since we're just parsing the exit error message, we don't need to worry about different behaviors amongst restart policies, as any container with the error message on exit should be started. Change-type: patch Closes: #2178 Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-22 14:43:17 -07:00
Christina Ying Wang	7eba48f8b8	Improve tests surrounding Engine-host race patch See: #2170 Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-19 11:11:26 -07:00
Christina Ying Wang	9e249e6ae8	Remove unnecessary async/await from method Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-19 11:11:26 -07:00
Christina Ying Wang	6e6f79c71d	Decrease wait time before start from 60s to 30s 60 seconds to wait may be excessively long. Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-19 11:11:26 -07:00
Christina Ying Wang	ace642ea0f	Improve naming of a util function & add unit test isOlderThan -> isValidDateAndOlderThan See: https://github.com/balena-os/balena-supervisor/pull/2170#discussion_r1226809686 Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-19 11:11:26 -07:00
Christina Ying Wang	2537eb8189	Handle the case of 'on-failure' restart policy As explained in the comments of this commit, a container with the restart policy of 'on-failure' with a non-zero exit code matches the conditions for the race, so the Supervisor will also attempt to start it. A container with the 'no' restart policy that has been started once will not be started again. If a container with 'no' has never been started, its service status will be 'Installed' and the Supervisor will already try to start it until success, so the service with 'no' doesn't require special handling. Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-05 11:05:58 -07:00
Christina Ying Wang	95f3e13d50	Add extra delay after state engine integration tests This ensures target state has settled (since it seems that the 'applied' status that's reported isn't 100% accurate and the actual Engine state may lag behind slightly) Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-05-31 11:33:27 -07:00
Christina Ying Wang	7f32141958	Handle Engine-host race condition for "always" and "unless-stopped" restart policy There exists a race condition between Engine and a host resource that may not be immediately created. In this race condition, if a container's compose config depends on the existence of that host resource, such as a network interface, and the Engine tries to create & start the container before the host resource is created, the Engine will not reattempt to start the container, regardless of the restart policy. This is undesirable behavior but seems to be the behavior as implemented by Docker. To rectify this, the Supervisor state funnel noops for a grace period of 1 minute after starting a container to see that the container's status has become 'running'. If the container exits because of the race condition, the status becomes 'exited' and the Supervisor will attempt to generate another start step. This noop-wait-start step loop will repeat until the container is able to start. If the container is never able to start, there was a problem in the host in the creation of the host resource, and that should be fixed at the host level. This commit does not handle the case of services with restart policies "no" or "on-failure" which encounter this host race, as metadata from container inspects needs to be introduced during step calculation in order to figure out whether services with those restart policies need to be started. This will be fixed in a future PR. Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-05-31 11:32:19 -07:00
Felipe Lalanne	2758e190b2	Fix `sw.arch` typo when testing contracts Change-type: patch	2023-05-11 13:07:26 -04:00
Felipe Lalanne	8656bd62f7	Add `arch.sw` to the valid container requirements Change-type: minor	2023-05-09 15:44:26 -04:00
Felipe Lalanne	f1f09e0e27	Allow using slug to validate hw.device-type contract This also adds the hw.device-type test case to the unit tests. Change-type: patch	2023-05-09 15:20:18 -04:00
Felipe Lalanne	a884a58b4c	Simplify and move lib/contract.spec.ts to tests/unit Improve contract tests to remove dependence on stubs and unnecessary system calls. Change-type: patch	2023-05-09 15:20:12 -04:00
Felipe Lalanne	7b8b187c74	Create tests with recovery from #1576 Devices affected by the bug described in #1576 are also stuck with some services in the `Downloaded` state, because the state engine does not detect that the running services should be killed on a network change even if they belong to a new release. This is a bug, which can be replicated by the tests in this commit Change-type: patch	2023-04-26 11:58:42 -04:00
Felipe Lalanne	0a358a4463	Add replication of issue using unit tests Change-type: patch	2023-04-25 14:47:00 -04:00
Felipe Lalanne	138aec5de4	Add integration tests for state-engine These tests use the supervisor API to check that applying a target state allows the device to eventually get to the desired target configuration. These are high-level tests that work with real images and containers using dind. Change-type: patch	2023-04-25 14:47:00 -04:00
Felipe Lalanne	3d43f7e3b3	Simplify doRestart and doPurge actions The actions now work by passing an intermediate state to the state engine. - doPurge first removes the user app from the target state and passes that to the state engine for purging. Since intermediate state doesn't remove images, this will have the effect of basically re-installing the app. - doRestart modifies the target state by first removing only the services from the current state but keeping volumes and networks. This has the same effect as before where services were stopped one by one Change-type: patch	2023-04-20 14:58:58 -04:00
Felipe Lalanne	27f0d2e655	Improve net alias comparison to prevent unwanted restarts Network aliases are now compared checking that the target state is a subset of the current state. This will prevent service restarts due to additional aliases created by docker in the container. Closes: #2134 Change-type: patch	2023-04-20 14:58:58 -04:00
Felipe Lalanne	cb98133717	Exclude containerId from service network aliases When getting the service from the docker container, remove the containerId from the list of aliases (which gets added by docker). This will make it easier to use the current service state as a target. This will help us remove the `safeStateClone` function in the API in a future commit Change-type: patch	2023-04-20 14:58:58 -04:00
Felipe Lalanne	967cb7747f	Make local mode image management work as in cloud mode There were multiple places in the state engine that skipped some operations while in local mode. In reality, all that's needed while in local mode is to skip image and volume deletion. This commit simplifies application-manager and compose app to be more local mode agnostic and instead makes the image deletion and volume deletion configurable via function arguments. This also has the benefit of making the treatment of local mode applications more similar to cloud mode applications, allowing for API endpoints to function the same way in both modes. Change-type: patch	2023-04-20 14:58:58 -04:00
Felipe Lalanne	7b68ee4c4f	Do not restart balena-hostname on rename The OS since v2.82.6 will monitor changes to config.json and restart the relevant services to apply the changes. There is no need to trigger restart of the services via the supervisor. Users on older OS versions will need to update their OS or restart the services manually as OS loses support after 2y. Change-type: patch Closes: #2160	2023-04-20 11:43:35 -04:00
Christina Ying Wang	b9e1464d96	Fix assertion error in restart-service From: `c0b4fafe84` Restart-service checks that both services have restarted in its test assertion, which is incorrect as restart-service should only restart one service. Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-04-07 14:40:15 -07:00
Alexandru Costache	6b67db98e5	backends: Add Jetson Orin NX custom device-tree support Signed-off-by: Alexandru Costache <alexandru@balena.io> Change-type: patch	2023-04-07 18:12:31 +03:00
Christina Ying Wang	4c948c8854	Mount data and state partitions on container startup Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-03-27 12:07:01 -07:00
Christina Ying Wang	49ee1042a8	Mount boot partition into container on Supervisor start As the Supervisor is a privileged container, it has access to host /dev, and therefore has access to boot, data, and state balenaOS partitions. This commit sets up the framework for the following: - Finds the /dev partition that corresponds to each partition based on partition label - Mounts the partitions into set mountpoints in the device - Removes reliance on env vars and mountpoints provided by host's start-balena-supervisor script - Simplifies host path querying by centralizing these queries through methods in lib/host-utils.ts This particular commit changes the env vars for, and mounts, the boot partition. Since the Supervisor would no longer rely on container `run` arguments provided by a host script, this change moves Supervisor closer to being able to start itself (Supervisor-as-an-app). Change-type: minor Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-03-27 12:07:01 -07:00
Christina Ying Wang	9522c15ecd	Change constants imports to remove 'require' Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-03-27 12:07:01 -07:00
Christina Ying Wang	84a9e7e9ac	Replace BALENA-FIREWALL rule in INPUT chain instead of flushing The issue with the original Supervisor implementation of the firewall is that on Supervisor start, the Supervisor flushes the INPUT chain of the filter table. This doesn't play well with services that add to the INPUT chain on startup that may start up before the Supervisor, such as certain NetworkManager connection profiles. This change only replaces the BALENA-FIREWALL rule in the INPUT chain, preserving the other rules as well as their order. Closes: #1482 Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-03-01 13:42:07 -08:00
Pagan Gazzard	d356f979d3	Always lower case the cpu id to avoid bouncing between casing when reporting Change-type: patch	2023-02-15 13:54:40 +00:00
Felipe Lalanne	89175432af	Find and remove duplicate networks We have seen a few times devices with duplicated network names for some reason. While we don't know why the networks get duplicated, this can be disruptive for updates as trying to create a container referencing a duplicate network results in a 400 error from the engine. This commit finds and removes duplicate networks via the state engine, this means that even if somehow a container could be referencing a network that has been duplicated later somehow, this will remove the container first. While this doesn't solve the problem of duplicate networks being created in the first place, it will fix the state of the system to correct the inconsistency. Change-type: minor Closes: #590	2023-02-10 20:24:36 -05:00
Christina Ying Wang	c4f9d72172	Remove dependent devices content in codebase This includes: - proxyvisor.js - references in docs - references device-state, api-binder, compose modules, API - references in tests The commit also adds a migration to remove the 4 dependent device tables from the DB. Change-type: minor Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-02-06 19:34:02 -08:00
Ruben Keulemans	9a1cde7f44	Support `since` and `until` in supervisor journalctl wrapper API. Signed-off-by: Ruben Keulemans ruben.keulemans@protonmail.com Change-type: minor Closes: #2083	2023-02-01 09:17:10 +01:00
Felipe Lalanne	4d74505087	Fix wait-for-it script to work with external signals The wait-for-it script used during tests would set up a timer that would send SIGUSR2 to the parent process after the timer ends. Since node was ignoring additional signals, the timer ending would have no effect after the node process had replaced the start script. However when node has pid != 1, SIGUSR2 default behavior is to terminate the process, meaning the tests would fail after 30 seconds. The script is now updated so the timer is killed once the services are ready for the tests.	2023-01-31 10:43:19 -03:00
Felipe Lalanne	67d1503b54	Allow using colon character in config vars The Raspberry Pi config.txt file defines the use of colon to configure variables of the same name in different ports, for instance on those devices with two hdmi ports. This syntax was previously not supported by the supervisor. This change relaxes the syntax validation on config vars to allow the use of the colon character. Relates-to: #1573, #2046 Change-type: minor	2023-01-20 15:48:32 -03:00
Christina Ying Wang	e1bacda580	Update host-config, route, and action tests for host config endpoints Change-type: minor Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-01-11 15:48:13 -08:00
Christina Ying Wang	250684d651	Use actions & write tests for GET /v1/device Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-01-11 15:48:13 -08:00
Christina Ying Wang	72c683d5ff	Use actions & write tests for GET /v1/apps/:appId Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-01-11 15:48:13 -08:00
Christina Ying Wang	198d9ad638	Write update action and tests, remove isReadyForUpdate check See: #1924 Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-01-11 15:48:13 -08:00
Christina Ying Wang	85392f2a85	Move reboot/shutdown to actions and related tests to integration Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-01-11 15:48:13 -08:00
Christina Ying Wang	c6cf6a0136	Use executeServiceAction for v1/v2 service action endpoints This includes: - /v1/apps/:appId/(stop|start) - /v2/applications/:appId/(restart|stop|start)-service Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-01-09 18:20:24 -08:00
Christina Ying Wang	fcd28591c6	Add tests for doPurge action and v1/v2 app purge routes Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-01-09 16:25:27 -08:00
Christina Ying Wang	a24d5acf7f	Add tests for doRestart action and v1/v2 app restart routes Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-01-09 16:25:27 -08:00
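
Commit 27f0d2e655 above describes comparing network aliases by checking that the target state is a subset of the current state, so that aliases added by docker (such as the container id) do not trigger a restart. A minimal sketch of that idea follows. The function name `aliasesMatch` and the sample alias values are hypothetical, and this is not the supervisor's actual code.

```typescript
// Hypothetical helper, assumed for illustration only.
// Returns true when every alias requested in the target state is already
// present on the running container, ignoring any extra aliases on it.
function aliasesMatch(current: string[], target: string[]): boolean {
  const currentSet = new Set(current);
  return target.every((alias) => currentSet.has(alias));
}

// An extra alias on the running container (here a made-up container id)
// does not count as a difference.
const unchanged = aliasesMatch(['main', '0123abcd'], ['main']);

// An alias requested by the target but missing from the container does.
const changed = !aliasesMatch(['main'], ['main', 'api']);

console.log(unchanged, changed);
```

A strict equality check on the two lists would report a difference in the first case, which is the unwanted-restart scenario the commit message mentions.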


429 Commits