Commit Graph

16 Commits

Author SHA1 Message Date
Christina Ying Wang
10f294cf8e Add takeLock to state funnel
A takeLock step should be generated before any of the following steps:
* kill
* start
* stop
* updateMetadata
* restart
* handover

ALL services in an app will be locked for any of the above actions,
unless the action is generated through Supervisor API's
`POST /v2/applications/:appId/(start|stop|restart)-service` endpoints,
in which case only the target service will be locked.

A lock will be taken for a service before it starts by creating the
directory in /tmp before the Engine creates it through bind mounts.

Also, the commit simplifies the generation of service kill
steps from network/volume changes or removals.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
Christina Ying Wang
cf8d8cedd7 Simplify lock interface to prep for adding takeLock to state funnel
This commit changes a few things:

* Pass `force` to `takeLock` step directly. This allows us to remove
the `lockFn` used by app manager's action executors, setting takeLock
as the main interface to interact with the update lock module. Note
that this commit by itself will not pass tests, as no update locking
occurs where it once did. This will be amended in the next commit.

* Remove locking functions from doRestart & doPurge, as this is
the only area where skipLock is required.

* Remove `skipLock` interface, as it's redundant with the functionality
of `force`. The only time `skipLock` is true is in doRestart/doPurge,
as those API methods are already run within a lock function. We removed
the lock function which removes the need for skipLock, and in the next
commit we'll add locking as a composition step to replace the
functionality removed here.

* Remove some methods not in use, such as app manager's `stopAll`.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
Christina Ying Wang
af6359f7ae Take lock before updating service metadata
Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
Felipe Lalanne
6217546894 Update typescript to v5
This also updates code to use the default import syntax instead of
`import * as` when the imported module exposes a default. This is needed
with the latest typescript version.

Change-type: patch
2024-03-05 15:33:56 -03:00
Felipe Lalanne
988a1c9e9a Update @balena/lint to v7
This updates balena lint to the latest version to enable eslint support
and unblock Typescript updates. This is a huge number of changes as the
linting rules are much more strict now, requiring modifiying most files
in the codebase. This commit also bumps the test dependency `rewire` as
that was interfering with the update of balena-lint

Change-type: patch
2024-03-01 18:27:30 -03:00
Christina Ying Wang
3afcef2969 Respect update strategies app-wide instead of at the service level
Fixes behavior for release updates which removes a service in current state
and adds a new service in target state.

Change-type: patch
Closes: #2095
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-01-29 12:26:28 -08:00
Christina Ying Wang
38fe8dae75 Remove the 'Stopped' status for services
It's not an official status from container inspects, and the Supervisor
doesn't set it internally anywhere. It's better to remove it entirely as the
method by which Supervisor sets internal service statuses is by using a global
event emitter (reportNewStatus) which makes things difficult to test.

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-06-28 11:17:13 -04:00
Christina W
71d24d6e33 Parse container exit error message instead of status
The previous implementation in #2170 of parsing the container status was too general,
because it relied on the mistaken assumption that a container would have a status of
`Stopped` if it was manually stopped. This turned out to be untrue, as manually stopped
containers were also getting restarted by the Supervisor due to their inspect status of
`exited`. With this, parsing the exit message became unavoidable as there are no other
clear ways to discern a container that has been manually stopped and shouldn't be started
from a container experiencing the Engine-host race condition issue (again, see #2170).

Since we're just parsing the exit error message, we don't need to worry about different behaviors
amongst restart policies, as any container with the error message on exit should be started.

Change-type: patch
Closes: #2178
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-06-22 14:43:17 -07:00
Christina Ying Wang
7eba48f8b8 Improve tests surrounding Engine-host race patch
See: #2170
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-06-19 11:11:26 -07:00
Christina Ying Wang
6e6f79c71d Decrease wait time before start from 60s to 30s
60 seconds to wait may be excessively long.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-06-19 11:11:26 -07:00
Christina Ying Wang
7f32141958 Handle Engine-host race condition for "always" and "unless-stopped" restart policy
There exists a race condition between Engine and a host resource that may not
be immediately created. In this race condition, if a container's compose config
depends on the existence of that host resource, such as a network interface, and the
Engine tries to create & start the container before the host resource is created, the
Engine will not reattempt to start the container, regardless of the restart policy.
This is undesireable behavior but seems to be the behavior as implemented by Docker.

To rectify this, the Supervisor state funnel noops for a grace period of 1 minute
after starting a container to see that the container's status has become 'running`.
If the container exits because of the race condition, the status becomes 'exited' and the
Supervisor will attempt to generate another start step. This noop-wait-start step loop
will repeat until the container is able to start.

If the container is never able to start, there was a problem in the host in the creation of the
host resource, and that should be fixed at the host level.

This commit does not handle the case of services with restart policies "no" or "on-failure"
which encounter this host race, as metadata from container inspects needs to be introduced
during step calculation in order to figure out whether services with those restart policies
need to be started. This will be fixed in a future PR.

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-05-31 11:32:19 -07:00
Felipe Lalanne
967cb7747f Make local mode image management work as in cloud mode
There were multiple places in the state engine that skipped some
operations while in local mode. In reality, all it's needed while in
local mode is to skip image and volume deletion.

This commit simplifies application-manager and compose app to be more
local mode agnostic and instead making the image deletion and volume
deletion configurable via function arguments.

This also has the benefit to make the treatment of local mode
applications more similar to cloud mode applications, allowing for
API endpoints to function the same way both modes.

Change-type: patch
2023-04-20 14:58:58 -04:00
Christina Ying Wang
e1bacda580 Update host-config, route, and action tests for host config endpoints
Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-01-11 15:48:13 -08:00
Christina Ying Wang
8174ea9643 Simplify getting images for cleanup
getImagesForCleanup used to query the Engine for the Supervisor
image, which is unnecessary given that the Supervisor has access
to constants.supervisorImage. Thus, this Engine query is removed.

The method is simplified and made more clear, and
imageManager.isCleanupNeeded doesn't need to be stubbed in tests.

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2022-11-16 12:52:49 -08:00
Christina Ying Wang
1034aa70e6 Convert ensureSupervisorNetwork to native Promises
Also remove system interface check from ensureSupervisorNetwork.

Previously `ensure` was a Bluebird promise which wasn't awaited in
its composition step. This has been here for some time and may contribute
to issues with duplicate networks. The conversion to native Promises
allows `ensure` to be awaited, hopefully reducing instances of duplicate
networks.

Removing the system interface check for /sys/class/net/supervisor0
because it's superfluous given that the Engine creates the interface
with NetworkManager. It also makes testing a lot more difficult to set up
as /sys/class/net isn't a directory that can be written to for emulating
system interface creation / removal.

Relates-to: https://github.com/balena-os/balena-supervisor/issues/1110
Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
2022-11-08 16:06:10 -08:00
Felipe Lalanne
b81294431e Migrate compose/app and compose/app-manager tests
compose/app is run as part of the unit test suite
compose/application-manager is run as part of the integration test suite
2022-09-28 10:37:41 -03:00