Init supports boolean values, and is not included in the config when
not defined.
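A minimal sketch of the described behavior (the helper and the mapping to Docker's `HostConfig.Init` are illustrative assumptions, not the actual supervisor code):
```
// Illustrative only: include Init in the generated config solely when the
// compose-level `init` value was explicitly defined.
function initHostConfig(init?: boolean): { Init?: boolean } {
  return init === undefined ? {} : { Init: init };
}

// initHostConfig()      -> {}
// initHostConfig(true)  -> { Init: true }
// initHostConfig(false) -> { Init: false }
```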
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
This moves from throwing an error when an app is rejected due to unmet
requirements (because of contracts) to storing the target with a
`rejected` flag in the database.
The application manager filters rejected apps when calculating steps to
prevent them from affecting the current state. The state engine uses the
rejection info to generate the state report.
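A rough sketch of the filtering idea (types and names are illustrative, not the actual implementation):
```
// Rejected targets stay in the database for state reporting but are
// excluded from step calculation.
interface TargetApp {
  appUuid: string;
  rejected: boolean;
}

function appsForStepCalculation(targetApps: TargetApp[]): TargetApp[] {
  return targetApps.filter((app) => !app.rejected);
}
```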
Change-type: minor
This makes the LogBackend `log` method async in preparation for
upcoming changes that will use backpressure from the connection to delay
logging coming from containers.
This also removes the unnecessary imageId from the LogMessage type.
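A hedged sketch of the new method shape (the LogMessage fields shown are illustrative):
```
// log() now returns a Promise so callers can await it and the backend can
// exert backpressure on container log streams.
interface LogMessage {
  message: string;
  timestamp: number;
  isSystem?: boolean;
}

abstract class LogBackend {
  public abstract log(message: LogMessage): Promise<void>;
}
```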
Change-type: patch
This removes the supervisor's dependence on the containerLogs database
for remembering the last sent timestamp. This commit instead uses the
supervisor startup time as the initial time for log retrieval.
This might result in some missing logs for services that start before
the supervisor after a boot, or if the supervisor restarts.
However, this seems like an acceptable trade-off, as the current
implementation seems to make things worse in resource-constrained
environments.
We'll move storing the last sent timestamp to a better storage medium in
a future commit.
Change-type: minor
This fixes a regression in the supervisor state engine computation
(introduced in v16.2.0) that occurs when the target state removes a
network at the same time that a service referencing that network is
changed. For example, going from
```
services:
  one:
    image: alpine:3.18
    networks: ['balena']
networks:
  balena:
```
to
```
services:
  one:
    image: alpine:latest
```
would never reach the target state, as killing the service in order to
remove the network is prioritized, but one of the invariants in the
target state calculation is to not kill any services until all images
have been downloaded. These two instructions were in contradiction,
leading to a deadlock.
The fix only adds removal steps for services that depend on a changing
network or volume when the service container is not already being
removed.
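An illustrative sketch of that rule (types and names are assumptions, not the actual implementation):
```
interface ServiceRef {
  serviceName: string;
}

// Only generate a kill step for a dependent service if the update isn't
// already removing that service's container.
function removalStepsForDependents(
  dependents: ServiceRef[],
  alreadyBeingRemoved: Set<string>,
): Array<{ action: 'kill'; service: ServiceRef }> {
  return dependents
    .filter((svc) => !alreadyBeingRemoved.has(svc.serviceName))
    .map((svc) => ({ action: 'kill' as const, service: svc }));
}
```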
Change-type: patch
This reduces circular dependencies from 250 to 80 by ensuring that
modules that only require types do not import the full module with all
its dependencies.
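An example of the pattern (module path and fields are illustrative):
```
// `import type` is erased at compile time, so this module no longer pulls
// in the service module's runtime dependencies.
import type { Service } from './compose/service';

export function getServiceName(svc: Service): string {
  return svc.serviceName;
}
```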
Change-type: patch
This splits `App`, `Network`, `Service` and `Volume`, which used to be
defined as classes, into an exported interface and a class
implementation that is not exported. This allows working with just the
types in some cases and prevents circular dependencies when importing.
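A sketch of the split (fields are illustrative):
```
// The interface is exported for typing; the implementing class stays
// private to the module, breaking the import cycle for type-only users.
export interface Network {
  name: string;
  appUuid: string;
}

class NetworkImpl implements Network {
  constructor(
    public name: string,
    public appUuid: string,
  ) {}
}

export function createNetwork(name: string, appUuid: string): Network {
  return new NetworkImpl(name, appUuid);
}
```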
Change-type: patch
This bumps dockerode, removes resin-docker-build in favor of
@balena/compose, and updates docker-delta and docker-progress packages.
Change-type: patch
* Remove Supervisor lockfile cleanup SIGTERM listener
* Modify lockfile.getLocksTaken to read files from the filesystem
* Remove in-memory tracking of locks taken in favor of filesystem
* Require both `(resin-)updates.lock` to be locked with the `nobody` UID
for a service to count as locked by the Supervisor (see the sketch below)
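A minimal sketch of the filesystem-based check, assuming a per-service lock directory and the conventional `nobody` UID of 65534:
```
import { promises as fs } from 'fs';

const NOBODY_UID = 65534; // assumption for illustration

// A service only counts as locked by the Supervisor when both lockfiles
// exist and are owned by `nobody`.
async function isLockedBySupervisor(lockDir: string): Promise<boolean> {
  const stats = await Promise.all(
    ['updates.lock', 'resin-updates.lock'].map((f) =>
      fs.stat(`${lockDir}/${f}`).catch(() => null),
    ),
  );
  return stats.every((s) => s != null && s.uid === NOBODY_UID);
}
```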
Signed-off-by: Christina Ying Wang <christina@balena.io>
A takeLock step should be generated before any of the following steps:
* kill
* start
* stop
* updateMetadata
* restart
* handover
ALL services in an app will be locked for any of the above actions,
unless the action is generated through Supervisor API's
`POST /v2/applications/:appId/(start|stop|restart)-service` endpoints,
in which case only the target service will be locked.
A lock will be taken for a service before it starts by creating the
directory in /tmp before the Engine creates it through bind mounts.
Also, the commit simplifies the generation of service kill
steps from network/volume changes or removals.
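An illustrative sketch of the pre-start lock directory creation (the path layout is an assumption):
```
import { promises as fs } from 'fs';

// Create the per-service lock directory so it already exists when the
// Engine bind-mounts it into the container.
async function ensureLockDir(
  appId: number,
  serviceName: string,
): Promise<void> {
  await fs.mkdir(`/tmp/balena-supervisor/services/${appId}/${serviceName}`, {
    recursive: true,
  });
}
```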
Signed-off-by: Christina Ying Wang <christina@balena.io>
This commit changes a few things:
* Pass `force` to `takeLock` step directly. This allows us to remove
the `lockFn` used by app manager's action executors, setting takeLock
as the main interface to interact with the update lock module. Note
that this commit by itself will not pass tests, as no update locking
occurs where it once did. This will be amended in the next commit.
* Remove locking functions from doRestart & doPurge, as this is
the only area where skipLock is required.
* Remove `skipLock` interface, as it's redundant with the functionality
of `force`. The only time `skipLock` is true is in doRestart/doPurge,
as those API methods are already run within a lock function. We removed
the lock function which removes the need for skipLock, and in the next
commit we'll add locking as a composition step to replace the
functionality removed here.
* Remove some methods not in use, such as app manager's `stopAll`.
Signed-off-by: Christina Ying Wang <christina@balena.io>
This commit only implements the action that a takeLock step
results in. It does not add takeLock step generation logic
to the state funnel yet.
Signed-off-by: Christina Ying Wang <christina@balena.io>
releaseLock is a step that will be inferred if there are services in
the target state, and if some of those services have locks taken by the
Supervisor.
The releaseLock composition step calls the method of the same name in
the updateLock module, which takes the exclusive process lock before
disposing of all Supervisor lockfiles for the target appId.
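A hedged sketch of the inference rule (types are illustrative):
```
interface LockState {
  appId: number;
  servicesWithLocks: string[];
}

// Emit a releaseLock step only when services exist in the target state and
// the Supervisor still holds locks for the app.
function inferReleaseLock(
  targetServiceCount: number,
  locks: LockState[],
): Array<{ action: 'releaseLock'; appId: number }> {
  if (targetServiceCount === 0) {
    return [];
  }
  return locks
    .filter((l) => l.servicesWithLocks.length > 0)
    .map((l) => ({ action: 'releaseLock' as const, appId: l.appId }));
}
```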
This is half of the update lock incorporation into the state funnel, as
we also need to introduce a takeLock step which triggers during crucial
stages of device state transition.
Signed-off-by: Christina Ying Wang <christina@balena.io>
This also updates code to use the default import syntax instead of
`import * as` when the imported module exposes a default. This is needed
with the latest TypeScript version.
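An example of the syntax change (the module is illustrative):
```
// Before:
// import * as express from 'express';

// After (default import, required by the newer TypeScript setup):
import express from 'express';

const app = express();
```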
Change-type: patch
This updates balena-lint to the latest version to enable ESLint support
and unblock TypeScript updates. This results in a large number of
changes, as the linting rules are much stricter now, requiring
modifications to most files in the codebase. This commit also bumps the
test dependency `rewire`, as that was interfering with the update of
balena-lint.
Fixes behavior for release updates that remove a service in the current
state and add a new service in the target state.
Change-type: patch
Closes: #2095
Signed-off-by: Christina Ying Wang <christina@balena.io>
The `updateMetadata` step renames the container to match the target
release when the service doesn't change between releases. We have seen
this step fail because of an engine bug that seems to relate to the
engine keeping stale references after container restarts. The only way
around this issue is to remove the old container and create it again.
This implements that workaround during the updateMetadata step to deal
with that issue.
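A hedged sketch of the workaround using dockerode (not the actual supervisor code; only a subset of the container config is carried over here):
```
import Docker from 'dockerode';

// Instead of renaming the container, remove it and create a fresh one
// with the target name, copying over the relevant configuration.
async function recreateWithTargetName(
  docker: Docker,
  containerId: string,
  targetName: string,
): Promise<void> {
  const container = docker.getContainer(containerId);
  const info = await container.inspect();
  await container.remove({ force: true });
  const created = await docker.createContainer({
    name: targetName,
    Image: info.Config.Image,
    Cmd: info.Config.Cmd,
    Env: info.Config.Env,
    Labels: info.Config.Labels,
  });
  await created.start();
}
```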
Change-type: minor
Relates-to: balena-os/balena-engine#261
PR #2217 removed the expose configuration but also caused a regression
where ports set via the `ports` configuration would no longer get
exposed to the host, despite port mappings being set. This fixes that
issue by exposing only those ports coming from the port mappings.
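An illustrative sketch of deriving the exposed ports only from the port mappings (the shapes are simplified):
```
// Keys of the port bindings look like '80/tcp'; only those become
// ExposedPorts entries, rather than anything from the image's EXPOSE.
function exposedPortsFromMappings(
  portBindings: Record<string, Array<{ HostPort: string }>>,
): Record<string, {}> {
  const exposed: Record<string, {}> = {};
  for (const portAndProto of Object.keys(portBindings)) {
    exposed[portAndProto] = {};
  }
  return exposed;
}
```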
Change-type: patch
The Docker EXPOSE directive and the corresponding docker-compose
`expose` service configuration serve as documentation/metadata
indicating that a container listens on a certain port, which may be used
for service discovery, but they have no real impact on the ability of
other containers on the same network to access the exposed service via
that port. In newer engine implementations, this property may conflict
with other network configurations and prevent the container from being
started by the Docker engine (see #2211).
This PR removes code that would manage the expose property and takes the
property out of the whitelist. A composition with the `expose` property
will result in the log message `Ignoring unsupported or unknown compose fields: expose`.
While this change should not have operational impact, it still removes
a previously supported configuration and as such there is a chance of it
being a breaking change for some applications. For this reason it is
being published as a new major version.
Change-type: major
Closes: #2211
This reverts commit 0c7bad779291e15e419166a2c66c2a21dd06aa83, as that
change causes a service restart loop. The supervisor cannot distinguish
between ports exposed via the `EXPOSE` directive and the docker-compose
`expose` property. Because of this, in the case of `network_mode:
service:<...>` the current state and target state never match, leading
to a service restart loop.
Change-type: patch
The supervisor exposes ports configured using the `EXPOSE` directive in
the Dockerfile when configuring the container for runtime. This can
cause issues when using `network_mode: service:<service name>`, as the
expose configuration is not compatible with that network mode. This fix
skips image-exposed ports for that particular network mode.
Change-type: patch
Relates-to: #2211
Memory tests have shown performance improvements from using the native method.
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
It's not an official status from container inspects, and the Supervisor
doesn't set it internally anywhere. It's better to remove it entirely,
as the method by which the Supervisor sets internal service statuses is
a global event emitter (reportNewStatus), which makes things difficult
to test.
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
The previous implementation in #2170 of parsing the container status was too general,
because it relied on the mistaken assumption that a container would have a status of
`Stopped` if it was manually stopped. This turned out to be untrue, as manually stopped
containers were also getting restarted by the Supervisor due to their inspect status of
`exited`. With this, parsing the exit message became unavoidable as there are no other
clear ways to discern a container that has been manually stopped and shouldn't be started
from a container experiencing the Engine-host race condition issue (again, see #2170).
Since we're just parsing the exit error message, we don't need to worry about different behaviors
amongst restart policies, as any container with the error message on exit should be started.
Change-type: patch
Closes: #2178
Signed-off-by: Christina Ying Wang <christina@balena.io>
As explained in the comments of this commit, a container with the restart policy
of 'on-failure' with a non-zero exit code matches the conditions for the race, so
the Supervisor will also attempt to start it. A container with the 'no' restart
policy that has been started once will not be started again. If a container with
'no' has never been started, its service status will be 'Installed' and the Supervisor
will already try to start it until success, so the service with 'no' doesn't require
special handling.
Signed-off-by: Christina Ying Wang <christina@balena.io>
There exists a race condition between the Engine and a host resource that may not
be immediately created. In this race condition, if a container's compose config
depends on the existence of that host resource, such as a network interface, and the
Engine tries to create & start the container before the host resource is created, the
Engine will not reattempt to start the container, regardless of the restart policy.
This is undesirable behavior, but it seems to be the behavior as implemented by Docker.
To rectify this, the Supervisor state funnel noops for a grace period of 1 minute
after starting a container, waiting for the container's status to become 'running'.
If the container exits because of the race condition, the status becomes 'exited' and the
Supervisor will attempt to generate another start step. This noop-wait-start step loop
will repeat until the container is able to start.
If the container is never able to start, then there was a problem with the host's
creation of the host resource, and that should be fixed at the host level.
This commit does not handle the case of services with restart policies "no" or "on-failure"
which encounter this host race, as metadata from container inspects needs to be introduced
during step calculation in order to figure out whether services with those restart policies
need to be started. This will be fixed in a future PR.
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
A bug in service comparison meant that a device already running a
service from a new release with network changes would never stop the
running service, so the remaining services would get stuck forever in
the `Downloaded` state.
This fixes the comparison so the service gets killed in this case, in
particular allowing devices to recover from #1576.
Change-type: patch
Previously, an `updateMetadata` step would take precedence over a
`kill` step when network changes were present. This could lead to an
inconsistent state if an update included both a network and a container
change.
Closes: #1576
Change-type: patch
Target volatile no longer makes sense now that we can use the current
state as a target, and it apparently wasn't being used for anything
anymore.
Change-type: patch
The actions now work by passing an intermediate state to the state
engine.
- doPurge first removes the user app from the target state and passes
that to the state engine for purging. Since the intermediate state
doesn't remove images, this has the effect of essentially re-installing
the app.
- doRestart modifies the target state by removing only the services from
the current state while keeping volumes and networks. This has the same
effect as before, where services were stopped one by one.
Change-type: patch
Local mode uses a numeric `appUuid`, which was breaking the parsing of
the network name. This fixes the issue so that the current state can be
used as a target state.
Change-type: patch