This healthcheck fails when Supervisor memory usage is above a threshold
based on initial memory measurements after device state has settled.
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
A takeLock step should be generated before any of the following steps:
* kill
* start
* stop
* updateMetadata
* restart
* handover
ALL services in an app will be locked for any of the above actions,
unless the action is generated through Supervisor API's
`POST /v2/applications/:appId/(start|stop|restart)-service` endpoints,
in which case only the target service will be locked.
A lock will be taken for a service before it starts by creating the
directory in /tmp before the Engine creates it through bind mounts.
Also, the commit simplifies the generation of service kill
steps from network/volume changes or removals.
Signed-off-by: Christina Ying Wang <christina@balena.io>
This commit changes a few things:
* Pass `force` to `takeLock` step directly. This allows us to remove
the `lockFn` used by app manager's action executors, setting takeLock
as the main interface to interact with the update lock module. Note
that this commit by itself will not pass tests, as no update locking
occurs where it once did. This will be amended in the next commit.
* Remove locking functions from doRestart & doPurge, as this is
the only area where skipLock is required.
* Remove `skipLock` interface, as it's redundant with the functionality
of `force`. The only time `skipLock` is true is in doRestart/doPurge,
as those API methods are already run within a lock function. We removed
the lock function which removes the need for skipLock, and in the next
commit we'll add locking as a composition step to replace the
functionality removed here.
* Remove some methods not in use, such as app manager's `stopAll`.
Signed-off-by: Christina Ying Wang <christina@balena.io>
releaseLock is a step that will be inferred if there are services
in target state, and if some of those services have locks taken by
the Supervisor.
The releaseLock composition step calls the method of the same name
in the updateLock module, which takes the exclusive process lock before
disposing all Supervisor lockfiles in the target appId.
This is half of the update lock incorporation into the state funnel, as
we also need to introduce a takeLock step which triggers during crucial
stages of device state transition.
Signed-off-by: Christina Ying Wang <christina@balena.io>
This also updates code to use the default import syntax instead of
`import * as` when the imported module exposes a default. This is needed
with the latest typescript version.
Change-type: patch
This updates balena lint to the latest version to enable eslint support
and unblock Typescript updates. This is a huge number of changes as the
linting rules are much more strict now, requiring modifiying most files
in the codebase. This commit also bumps the test dependency `rewire` as
that was interfering with the update of balena-lint
Change-type: patch
PR #2217 removed the expose configuration but also caused a regresion
where ports set via the `ports` configuration would no longer get
exposed to the host, despite portmappings being set. This fixes that
issue by exposing only those ports comming from port mappings.
Change-type: patch
The docker EXPOSE directive and corresponding docker-compose `expose`
service configuration serves as documentation/metadata that a container
listens on a certain port that may be used for service discovery but it doesn't
have any real impact on the ability for
other containers on the same network to access the exposed service via
the port. In newer engine implementations, this property may conflict
with other network configurations, and prevent the container from being
started by the docker engine (see #2211).
This PR removes code that would manage the expose property and takes the
property out of the whitelist. A composition with the `expose` property
will result in the log message `Ignoring unsupported or unknown compose fields: expose`.
While this change should not have operational impact, it still removes
a previously supported configuration and as such there is a chance of it
being a breaking change for some applications. For this reason it is
being published as a new major version.
Change-type: major
Closes: #2211
This reverts commit 0c7bad7792, as that
change causes a service restart loop. The supervisor cannot distinguish
between ports exposed via the `EXPOSE` directive and the docker-compose
`expose` property. Because of this, in the case of `network-mode:
service:<...>` the current state and target state never match, leading
to a service restart loop.
Change-type: patch
The supervisor exposes ports configured using the `EXPOSE` directive in
the dockerfile when configuring the container for runtime. This can
cause issues if using `network_mode: service:<service name>` as the
expose configuration is not compatible with that network mode. This
fix now skips image exposed ports for that particular network mode.
Change-type: patch
Relates-to: #2211
The previous implementation in #2170 of parsing the container status was too general,
because it relied on the mistaken assumption that a container would have a status of
`Stopped` if it was manually stopped. This turned out to be untrue, as manually stopped
containers were also getting restarted by the Supervisor due to their inspect status of
`exited`. With this, parsing the exit message became unavoidable as there are no other
clear ways to discern a container that has been manually stopped and shouldn't be started
from a container experiencing the Engine-host race condition issue (again, see #2170).
Since we're just parsing the exit error message, we don't need to worry about different behaviors
amongst restart policies, as any container with the error message on exit should be started.
Change-type: patch
Closes: #2178
Signed-off-by: Christina Ying Wang <christina@balena.io>
As explained in the comments of this commit, a container with the restart policy
of 'on-failure' with a non-zero exit code matches the conditions for the race, so
the Supervisor will also attempt to start it. A container with the 'no' restart
policy that has been started once will not be started again. If a container with
'no' has never been started, its service status will be 'Installed' and the Supervisor
will already try to start it until success, so the service with 'no' doesn't require
special handling.
Signed-off-by: Christina Ying Wang <christina@balena.io>
Devices affected by the bug described in 1576, are also stuck with some
services in the `Downloaded` state, because the state engine does not
detect that the running services should be killed on a network change
even if they belong to a new release. This is a bug, which can be
replicated by the tests in this commit
Change-type: patch
Network aliases are now compared checking that the target state is a
subset of the current state. This will prevent service restarts due to
additional aliases created by docker in the container.
Closes: #2134
Change-type: patch
When getting the service from the docker container, remove the
containerId from the list of aliases (which gets added by docker). This
will make it easier to use the current service state as a target.
This will help us remove the `safeStateClone` function in the API in a
future commit
Change-type: patch
There were multiple places in the state engine that skipped some
operations while in local mode. In reality, all it's needed while in
local mode is to skip image and volume deletion.
This commit simplifies application-manager and compose app to be more
local mode agnostic and instead making the image deletion and volume
deletion configurable via function arguments.
This also has the benefit to make the treatment of local mode
applications more similar to cloud mode applications, allowing for
API endpoints to function the same way both modes.
Change-type: patch
As the Supervisor is a privileged container, it has access to host /dev, and therefore has access
to boot, data, and state balenaOS partitions. This commit sets up the framework for the following:
- Finds the /dev partition that corresponds to each partition based on partition label
- Mounts the partitions into set mountpoints in the device
- Removes reliance on env vars and mountpoints provided by host's start-balena-supervisor script
- Simplifies host path querying by centralizing these queries through methods in lib/host-utils.ts
This particular changes env vars for and mounts the boot partition.
Since the Supervisor would no longer rely on container `run` arguments provided by a host script,
this change moves Supervisor closer to being able to start itself (Supervisor-as-an-app).
Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
We have seen a few times devices with duplicated network names for some
reason. While we don't know the cause the networks get duplicates, this
can be disruptive for updates as trying to create a container referencing a duplicate
network results in a 400 error from the engine.
This commit finds and removes duplicate networks via the state engine,
this means that even if somehow a container could be referencing a
network that has been duplicated later somehow, this will remove the
container first.
While thies doesn't solve the problem of duplicate networks being
created in the first place, it will fix the state of the system to
correct the inconsistency.
Change-type: minor
Closes: #590
This includes:
- proxyvisor.js
- references in docs
- references device-state, api-binder, compose modules, API
- references in tests
The commit also adds a migration to remove the 4 dependent device tables from the DB.
Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
The Raspberry Pi config.txt file defines the use of colon to configure
variables of the same name in different ports, for instance on those
devices with two hdmi ports. This syntax was previously not supported by
the supervisor. This change relaxes the syntax validation on config vars
to allow the use of the colon character.
Relates-to: #1573, #2046
Change-type: minor
The supervisor had to chroot into the host root in order to read the
journal logs. This won't be possible anymore once the supervisor becomes
an app. This commit copies the journalctl binary and necessary libraries
from a debian image into the supervisor image in order to be able to use
the tool on runtime.
Change-type: patch
This eliminates chances of host-Docker address collision for apps such
as the Supervisor where all services have host networking.
Closes: #2062
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
This means that configuration backend tests no longer use stubs and
(mostly) avoid internal dependencies in the tests. Instead of stubs and
mock-fs, the tests use [testfs](https://github.com/balena-io-modules/mocha-pod#working-with-the-filesystem)
which allows working with a real filesystem and ensuring everything is
re-set between tests.
This is the last change needed in order to be able to merge #1971. Here is the list of changes
- [x] Migrate splash image backend tests
- [x] Migrate extlinux backend tests
- [x] Migrate config.txt backend tests
- [x] Migrate extra-uenv config tests
- [x] Migrate odmdata config tests
- [x] Migrate config utils tests
- [x] Migrate device-config tests
Change-type: patch
This excludes route tests or refactoring. Also, created tests
for API middleware.
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
Merged docker-utils and delta tests into a single test suite. They are
now ran as part of the integration tests using the real engine.
Change-type: patch
Update-lock tests now use the actual filesystem for testing, instead of
relying on stubs and spies.
This commit also fixes a small bug with update-lock that would cause a
`PromiseRejectionHandledWarning` when the lock callback would throw.
This also needs to modify the test environment as database migrations
will look for `config.json` in the location given by the variable
`CONFIG_MOUNT_POINT`.
The volume tests now run against the actual docker engine setup via dind
Change-type: patch