balena-supervisor

mirror of https://github.com/balena-os/balena-supervisor.git synced 2024-12-24 15:56:40 +00:00

Author	SHA1	Message	Date
Christina Ying Wang	10f294cf8e	Add takeLock to state funnel A takeLock step should be generated before any of the following steps: * kill * start * stop * updateMetadata * restart * handover ALL services in an app will be locked for any of the above actions, unless the action is generated through Supervisor API's `POST /v2/applications/:appId/(start\|stop\|restart)-service` endpoints, in which case only the target service will be locked. A lock will be taken for a service before it starts by creating the directory in /tmp before the Engine creates it through bind mounts. Also, the commit simplifies the generation of service kill steps from network/volume changes or removals. Signed-off-by: Christina Ying Wang <christina@balena.io>	2024-04-04 14:07:47 -07:00
Christina Ying Wang	cf8d8cedd7	Simplify lock interface to prep for adding takeLock to state funnel This commit changes a few things: * Pass `force` to `takeLock` step directly. This allows us to remove the `lockFn` used by app manager's action executors, setting takeLock as the main interface to interact with the update lock module. Note that this commit by itself will not pass tests, as no update locking occurs where it once did. This will be amended in the next commit. * Remove locking functions from doRestart & doPurge, as this is the only area where skipLock is required. * Remove `skipLock` interface, as it's redundant with the functionality of `force`. The only time `skipLock` is true is in doRestart/doPurge, as those API methods are already run within a lock function. We removed the lock function which removes the need for skipLock, and in the next commit we'll add locking as a composition step to replace the functionality removed here. * Remove some methods not in use, such as app manager's `stopAll`. Signed-off-by: Christina Ying Wang <christina@balena.io>	2024-04-04 14:07:47 -07:00
Christina Ying Wang	af6359f7ae	Take lock before updating service metadata Change-type: minor Signed-off-by: Christina Ying Wang <christina@balena.io>	2024-04-04 14:07:47 -07:00
Christina Ying Wang	f2843e1382	Add update lock release functionality to state funnel releaseLock is a step that will be inferred if there are services in target state, and if some of those services have locks taken by the Supervisor. The releaseLock composition step calls the method of the same name in the updateLock module, which takes the exclusive process lock before disposing all Supervisor lockfiles in the target appId. This is half of the update lock incorporation into the state funnel, as we also need to introduce a takeLock step which triggers during crucial stages of device state transition. Signed-off-by: Christina Ying Wang <christina@balena.io>	2024-04-04 14:07:47 -07:00
Felipe Lalanne	988a1c9e9a	Update @balena/lint to v7 This updates balena lint to the latest version to enable eslint support and unblock Typescript updates. This is a huge number of changes as the linting rules are much more strict now, requiring modifying most files in the codebase. This commit also bumps the test dependency `rewire` as that was interfering with the update of balena-lint Change-type: patch	2024-03-01 18:27:30 -03:00
Felipe Lalanne	87b195685a	Use the state-helper functions in app module tests	2024-01-29 12:25:55 -08:00
Felipe Lalanne	9bd216327f	Expose ports from port mappings on services PR #2217 removed the expose configuration but also caused a regression where ports set via the `ports` configuration would no longer get exposed to the host, despite portmappings being set. This fixes that issue by exposing only those ports coming from port mappings. Change-type: patch	2023-10-24 15:04:39 -03:00
Felipe Lalanne	416170bc05	Ignore `expose` service compose configuration The docker EXPOSE directive and corresponding docker-compose `expose` service configuration serve as documentation/metadata that a container listens on a certain port that may be used for service discovery but it doesn't have any real impact on the ability for other containers on the same network to access the exposed service via the port. In newer engine implementations, this property may conflict with other network configurations, and prevent the container from being started by the docker engine (see #2211). This PR removes code that would manage the expose property and takes the property out of the whitelist. A composition with the `expose` property will result in the log message `Ignoring unsupported or unknown compose fields: expose`. While this change should not have operational impact, it still removes a previously supported configuration and as such there is a chance of it being a breaking change for some applications. For this reason it is being published as a new major version. Change-type: major Closes: #2211	2023-10-23 11:41:32 -03:00
Felipe Lalanne	3e828dcc52	Revert "Do not expose ports from image if service network mode" This reverts commit `0c7bad7792`, as that change causes a service restart loop. The supervisor cannot distinguish between ports exposed via the `EXPOSE` directive and the docker-compose `expose` property. Because of this, in the case of `network-mode: service:<...>` the current state and target state never match, leading to a service restart loop. Change-type: patch	2023-10-16 13:06:50 -03:00
Felipe Lalanne	0c7bad7792	Do not expose ports from image if service network mode The supervisor exposes ports configured using the `EXPOSE` directive in the dockerfile when configuring the container for runtime. This can cause issues if using `network_mode: service:<service name>` as the expose configuration is not compatible with that network mode. This fix now skips image exposed ports for that particular network mode. Change-type: patch Relates-to: #2211	2023-10-12 18:03:42 -03:00
Christina W	71d24d6e33	Parse container exit error message instead of status The previous implementation in #2170 of parsing the container status was too general, because it relied on the mistaken assumption that a container would have a status of `Stopped` if it was manually stopped. This turned out to be untrue, as manually stopped containers were also getting restarted by the Supervisor due to their inspect status of `exited`. With this, parsing the exit message became unavoidable as there are no other clear ways to discern a container that has been manually stopped and shouldn't be started from a container experiencing the Engine-host race condition issue (again, see #2170). Since we're just parsing the exit error message, we don't need to worry about different behaviors amongst restart policies, as any container with the error message on exit should be started. Change-type: patch Closes: #2178 Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-22 14:43:17 -07:00
Christina Ying Wang	7eba48f8b8	Improve tests surrounding Engine-host race patch See: #2170 Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-19 11:11:26 -07:00
Christina Ying Wang	9e249e6ae8	Remove unnecessary async/await from method Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-19 11:11:26 -07:00
Christina Ying Wang	ace642ea0f	Improve naming of a util function & add unit test isOlderThan -> isValidDateAndOlderThan See: https://github.com/balena-os/balena-supervisor/pull/2170#discussion_r1226809686 Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-19 11:11:26 -07:00
Christina Ying Wang	2537eb8189	Handle the case of 'on-failure' restart policy As explained in the comments of this commit, a container with the restart policy of 'on-failure' with a non-zero exit code matches the conditions for the race, so the Supervisor will also attempt to start it. A container with the 'no' restart policy that has been started once will not be started again. If a container with 'no' has never been started, its service status will be 'Installed' and the Supervisor will already try to start it until success, so the service with 'no' doesn't require special handling. Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-06-05 11:05:58 -07:00
Felipe Lalanne	7b8b187c74	Create tests with recovery from #1576 Devices affected by the bug described in #1576 are also stuck with some services in the `Downloaded` state, because the state engine does not detect that the running services should be killed on a network change even if they belong to a new release. This is a bug, which can be replicated by the tests in this commit Change-type: patch	2023-04-26 11:58:42 -04:00
Felipe Lalanne	0a358a4463	Add replication of issue using unit tests Change-type: patch	2023-04-25 14:47:00 -04:00
Felipe Lalanne	27f0d2e655	Improve net alias comparison to prevent unwanted restarts Network aliases are now compared checking that the target state is a subset of the current state. This will prevent service restarts due to additional aliases created by docker in the container. Closes: #2134 Change-type: patch	2023-04-20 14:58:58 -04:00
Felipe Lalanne	cb98133717	Exclude containerId from service network aliases When getting the service from the docker container, remove the containerId from the list of aliases (which gets added by docker). This will make it easier to use the current service state as a target. This will help us remove the `safeStateClone` function in the API in a future commit Change-type: patch	2023-04-20 14:58:58 -04:00
Felipe Lalanne	967cb7747f	Make local mode image management work as in cloud mode There were multiple places in the state engine that skipped some operations while in local mode. In reality, all that's needed while in local mode is to skip image and volume deletion. This commit simplifies application-manager and compose app to be more local mode agnostic and instead makes the image deletion and volume deletion configurable via function arguments. This also has the benefit of making the treatment of local mode applications more similar to cloud mode applications, allowing for API endpoints to function the same way in both modes. Change-type: patch	2023-04-20 14:58:58 -04:00
Felipe Lalanne	89175432af	Find and remove duplicate networks We have seen a few times devices with duplicated network names for some reason. While we don't know why the networks get duplicated, this can be disruptive for updates as trying to create a container referencing a duplicate network results in a 400 error from the engine. This commit finds and removes duplicate networks via the state engine, this means that even if a container could be referencing a network that has been duplicated later, this will remove the container first. While this doesn't solve the problem of duplicate networks being created in the first place, it will fix the state of the system to correct the inconsistency. Change-type: minor Closes: #590	2023-02-10 20:24:36 -05:00
Christina Ying Wang	c4f9d72172	Remove dependent devices content in codebase This includes: - proxyvisor.js - references in docs - references device-state, api-binder, compose modules, API - references in tests The commit also adds a migration to remove the 4 dependent device tables from the DB. Change-type: minor Signed-off-by: Christina Ying Wang <christina@balena.io>	2023-02-06 19:34:02 -08:00
Christina Ying Wang	f558be0a16	Create default network as config-only when services have host networking This eliminates chances of host-Docker address collision for apps such as the Supervisor where all services have host networking. Closes: #2062 Change-type: patch Signed-off-by: Christina Ying Wang <christina@balena.io>	2022-11-16 10:19:36 -08:00
pipex	aa3002f909	Migrate docker-util tests Merged docker-utils and delta tests into a single test suite. They are now run as part of the integration tests using the real engine. Change-type: patch	2022-10-19 12:05:52 -03:00
pipex	620bcae53a	Migrate simple legacy tests to test/unit and test/integration Change-type: patch	2022-10-18 20:36:53 -03:00
Felipe Lalanne	b81294431e	Migrate compose/app and compose/app-manager tests compose/app is run as part of the unit test suite compose/application-manager is run as part of the integration test suite	2022-09-28 10:37:41 -03:00
Felipe Lalanne	a4da25c1ef	Disable logs globally using mocha hooks	2022-09-28 10:37:41 -03:00
Felipe Lalanne	a5a24e6462	Split compose/service tests into unit/integration	2022-09-28 10:37:41 -03:00
Felipe Lalanne	cdc9868d29	Split compose/network test in unit/integration Integration tests are run in the engine instead of mockerode.	2022-09-28 10:37:40 -03:00
Felipe Lalanne	4113dde45d	Split compose/volume tests into unit/integration This also needs to modify the test environment as database migrations will look for `config.json` in the location given by the variable `CONFIG_MOUNT_POINT`. The volume tests now run against the actual docker engine setup via dind Change-type: patch	2022-09-28 10:37:40 -03:00

30 Commits