Commit Graph

77 Commits

Author SHA1 Message Date
026dc0aed2 Release locks when removing apps
This prevents leftover locks that can prevent other operations from
taking place.

Change-type: patch
2025-03-06 11:50:31 -03:00
e416ad0daf Add support for io.balena.update.requires-reboot
This label can be used by user services to indicate that a reboot is
required after the install of a service in order to fully apply an update.

Change-type: minor
2025-01-14 11:20:35 -03:00
8e6c0fcad7 Wait for service dependencies to be running
This fixes a regression where dependencies would only be started in
order and would start the dependent service if its dependency had been
started at some point in the past, regardless of the running condition.

This makes the behavior more consistent with docker compose where the
[dependency needs to be
running or healthy](69a83d1303/pkg/compose/convergence.go (L441)) for the service to be started.

Change-type: patch
2024-12-13 16:22:11 -03:00
fb6fa9b16c Add ability to stream logs from host services to cloud
Add `os-power-mode.service`, `nvpmodel.service`, and `os-fan-profile.service`
which report status from applying power mode and fan profile configs as read
from config.json. The Supervisor sets these configs in config.json for these
host services to pick up and apply.

Also add host log streaming from `jetson-qspi-manager.service` as it
will very soon be needed for Jetson Orins.

Relates-to: #2379
See: balena-io/open-balena-api#1792
See: balena-os/balena-jetson-orin#513
Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-12-06 07:45:43 -08:00
9c09329b86 Clean up remaining locks on state settle
Locks could remain from a previous supervisor run that didn't get to
settle the state. This ensures that cleanup will happen for remaining
locks every time the state is settled.

Change-type: patch
2024-11-27 16:40:58 -03:00
3c6e9dd209 Refactor update-locks implementation
The refactor simplifies the implementation and ensures that locks per
app can only be held by one supervisor task at the time.

Change-type: patch
2024-11-27 16:40:50 -03:00
ed1c18e369 Add support for init field from compose
Init supports boolean values, and is not included in the config when
not defined.

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-09-26 10:39:59 -03:00
48e526ec43 Refactor contracts validation code
This updates the interfaces on lib/contracts and the validation in
the application-manager module.
2024-08-30 10:52:11 -04:00
eaa07e97a9 Add support for redsocks dnsu2t config
Users may specify dnsu2t config by including a `dns` field
in the `proxy` section of PATCH /v1/device/host-config's body:
```
{
  network: {
    proxy: {
      dns: '1.1.1.1:53',
    }
  }
}
```

If `dns` is a string, ADDRESS and PORT are required and should be
in the format `ADDRESS:PORT`. The endpoint with error with
code 400 if either ADDRESS or PORT are missing.

`dns` may also be a boolean. If true, defaults will be configured.
If false, the dns configuration will be removed.

If `proxy` is patched to empty, `dns` will be removed regardless
of its current or input configs, as `dns` depends on an active
redsocks proxy to function.

Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-08-28 14:01:51 -07:00
8bf346a6fd Parse dnsu2t block to dns config
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-08-28 13:51:46 -07:00
b775f8f14d Stringify dns subsection of redsocks input config to dnsu2t
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-08-28 13:51:46 -07:00
e724f60beb Strip additional fields from HostConfiguration type
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-08-28 13:51:46 -07:00
51e59725f8 Add unit test for usingInferStepsLock
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-08-26 13:44:51 -07:00
a255001c2e Verify that LED_FILE exists on blinking setup
Before v1, the blinking module would not throw when the passed led file
does not exist. This change checks for file existence and defaults to
`/dev/null` otherwise

Change-type: patch
2024-08-07 15:33:07 -04:00
f99ccb58c6 Remove unnecessary exports from host-config
This limits the host-config interface to necessary methods
only

Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-07-03 16:47:51 -07:00
53f5641ef1 Refactor host-config to be its own module
The host-config module exposes the following interfaces: get,
patch, and parse.

`get` gets host configuration such as redsocks proxy configuration
and hostname and returns it in an object of type HostConfiguration.

`patch` takes an object of type HostConfiguration or LegacyHostConfiguration
and updates the hostname and redsocks proxy configuration, optionally
forcing the patch through update locks.

`parse` takes a user input of unknown type and parses it into type
HostConfiguration or LegacyHostConfiguration for patching, erroring if
parse was unsuccessful.

LegacyHostConfiguration is a looser typing of the user input which does
not validate values of the five known proxy fields of type, ip, port,
username, and password. We should stop supporting it in the next
major Supervisor API release.

Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-07-03 16:47:51 -07:00
be986a62a5 Add HostConfig.parse method
Parses input from PATCH /v1/device/host-config into either
type HostConfiguration, or if LegacyHostConfiguration if
input is of an acceptable shape (for backwards compatibility).

Once input has been determined to be of type HostConfiguration,
we can easily extract ProxyConfig from the object for patching,
stringifying, and writing to redsocks.conf.

Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-07-03 16:47:51 -07:00
f17f7efe60 Add HostConfig.patchProxy method
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-07-03 16:47:49 -07:00
9c6681bb23 Add RedsocksConf.stringify method
`stringify` takes a RedsocksConfig, an internal object
representation of the redsocks.conf file, and transforms
it into a valid string that can be written to redsocks.conf.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-07-03 16:45:52 -07:00
1e224be0cd Add RedsocksConf.parse method
This is part of the host-config refactor which
enables easier encoding to / decoding from `redsocks.conf`.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-07-03 16:45:06 -07:00
5d93f358aa Create replicating test
This adds a test to check the case where a service image changes along
with networks being removed.
2024-06-24 15:54:19 -04:00
94de4006a0 Split compose types into interface and implementation
This splits `App`, `Network`, `Service` and `Volume` which used to be
defined as classes into an interface and a class implementation that is
not exported. This will allow to work with just the types in some cases
and prevent circular dependencies when importing.

Change-type: patch
2024-05-27 14:36:03 -04:00
9c968b8d06 Move lib/fs-utils tests to testfs
This removes mock-fs as a dependency

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-24 13:37:45 -07:00
7bb5900649 Update systeminformation to 5.22.7
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-22 12:03:40 -07:00
f863075bdc Add memory usage healthcheck
This healthcheck fails when Supervisor memory usage is above a threshold
based on initial memory measurements after device state has settled.

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-11 18:16:47 -07:00
10f294cf8e Add takeLock to state funnel
A takeLock step should be generated before any of the following steps:
* kill
* start
* stop
* updateMetadata
* restart
* handover

ALL services in an app will be locked for any of the above actions,
unless the action is generated through Supervisor API's
`POST /v2/applications/:appId/(start|stop|restart)-service` endpoints,
in which case only the target service will be locked.

A lock will be taken for a service before it starts by creating the
directory in /tmp before the Engine creates it through bind mounts.

Also, the commit simplifies the generation of service kill
steps from network/volume changes or removals.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
cf8d8cedd7 Simplify lock interface to prep for adding takeLock to state funnel
This commit changes a few things:

* Pass `force` to `takeLock` step directly. This allows us to remove
the `lockFn` used by app manager's action executors, setting takeLock
as the main interface to interact with the update lock module. Note
that this commit by itself will not pass tests, as no update locking
occurs where it once did. This will be amended in the next commit.

* Remove locking functions from doRestart & doPurge, as this is
the only area where skipLock is required.

* Remove `skipLock` interface, as it's redundant with the functionality
of `force`. The only time `skipLock` is true is in doRestart/doPurge,
as those API methods are already run within a lock function. We removed
the lock function which removes the need for skipLock, and in the next
commit we'll add locking as a composition step to replace the
functionality removed here.

* Remove some methods not in use, such as app manager's `stopAll`.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
af6359f7ae Take lock before updating service metadata
Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
f2843e1382 Add update lock release functionality to state funnel
releaseLock is a step that will be inferred if there are services
in target state, and if some of those services have locks taken by
the Supervisor.

The releaseLock composition step calls the method of the same name
in the updateLock module, which takes the exclusive process lock before
disposing all Supervisor lockfiles in the target appId.

This is half of the update lock incorporation into the state funnel, as
we also need to introduce a takeLock step which triggers during crucial
stages of device state transition.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
d18a740a40 Add methods for easier checking of lockfile existence
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
6217546894 Update typescript to v5
This also updates code to use the default import syntax instead of
`import * as` when the imported module exposes a default. This is needed
with the latest typescript version.

Change-type: patch
2024-03-05 15:33:56 -03:00
988a1c9e9a Update @balena/lint to v7
This updates balena lint to the latest version to enable eslint support
and unblock Typescript updates. This is a huge number of changes as the
linting rules are much more strict now, requiring modifiying most files
in the codebase. This commit also bumps the test dependency `rewire` as
that was interfering with the update of balena-lint

Change-type: patch
2024-03-01 18:27:30 -03:00
87b195685a Use the state-helper functions in app module tests 2024-01-29 12:25:55 -08:00
9bd216327f Expose ports from port mappings on services
PR #2217 removed the expose configuration but also caused a regresion
where ports set via the `ports` configuration would no longer get
exposed to the host, despite portmappings being set. This fixes that
issue by exposing only those ports comming from port mappings.

Change-type: patch
2023-10-24 15:04:39 -03:00
416170bc05 Ignore expose service compose configuration
The docker EXPOSE directive and corresponding docker-compose `expose`
service configuration serves as documentation/metadata that a container
listens on a certain port that may be used for service discovery but it doesn't
have any real impact on the ability for
other containers on the same network to access the exposed service via
the port. In newer engine implementations, this property may conflict
with other network configurations, and prevent the container from being
started by the docker engine (see #2211).

This PR removes code that would manage the expose property and takes the
property out of the whitelist. A composition with the `expose` property
will result in the log message `Ignoring unsupported or unknown compose fields: expose`.

While this change should not have operational impact, it still removes
a previously supported configuration and as such there is a chance of it
being a breaking change for some applications. For this reason it is
being published as a new major version.

Change-type: major
Closes: #2211
2023-10-23 11:41:32 -03:00
3e828dcc52 Revert "Do not expose ports from image if service network mode"
This reverts commit 0c7bad7792, as that
change causes a service restart loop. The supervisor cannot distinguish
between ports exposed via the `EXPOSE` directive and the docker-compose
`expose` property. Because of this, in the case of `network-mode:
service:<...>` the current state and target state never match, leading
to a service restart loop.

Change-type: patch
2023-10-16 13:06:50 -03:00
0c7bad7792 Do not expose ports from image if service network mode
The supervisor exposes ports configured using the `EXPOSE` directive in
the dockerfile when configuring the container for runtime. This can
cause issues if using `network_mode: service:<service name>` as the
expose configuration is not compatible with that network mode. This
fix now skips image exposed ports for that particular network mode.

Change-type: patch
Relates-to: #2211
2023-10-12 18:03:42 -03:00
71d24d6e33 Parse container exit error message instead of status
The previous implementation in #2170 of parsing the container status was too general,
because it relied on the mistaken assumption that a container would have a status of
`Stopped` if it was manually stopped. This turned out to be untrue, as manually stopped
containers were also getting restarted by the Supervisor due to their inspect status of
`exited`. With this, parsing the exit message became unavoidable as there are no other
clear ways to discern a container that has been manually stopped and shouldn't be started
from a container experiencing the Engine-host race condition issue (again, see #2170).

Since we're just parsing the exit error message, we don't need to worry about different behaviors
amongst restart policies, as any container with the error message on exit should be started.

Change-type: patch
Closes: #2178
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-06-22 14:43:17 -07:00
7eba48f8b8 Improve tests surrounding Engine-host race patch
See: #2170
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-06-19 11:11:26 -07:00
9e249e6ae8 Remove unnecessary async/await from method
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-06-19 11:11:26 -07:00
ace642ea0f Improve naming of a util function & add unit test
isOlderThan -> isValidDateAndOlderThan

See: https://github.com/balena-os/balena-supervisor/pull/2170#discussion_r1226809686
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-06-19 11:11:26 -07:00
2537eb8189 Handle the case of 'on-failure' restart policy
As explained in the comments of this commit, a container with the restart policy
of 'on-failure' with a non-zero exit code matches the conditions for the race, so
the Supervisor will also attempt to start it. A container with the 'no' restart
policy that has been started once will not be started again. If a container with
'no' has never been started, its service status will be 'Installed' and the Supervisor
will already try to start it until success, so the service with 'no' doesn't require
special handling.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-06-05 11:05:58 -07:00
2758e190b2 Fix sw.arch typo when testing contracts
Change-type: patch
2023-05-11 13:07:26 -04:00
8656bd62f7 Add arch.sw to the valid container requirements
Change-type: minor
2023-05-09 15:44:26 -04:00
f1f09e0e27 Allow using slug to validate hw.device-type contract
This also adds the hw.device-type test case to the unit tests.

Change-type: patch
2023-05-09 15:20:18 -04:00
a884a58b4c Simplify and move lib/contract.spec.ts to tests/unit
Improve contract tests to remove dependence on stubs and unnecessary
system calls.

Change-type: patch
2023-05-09 15:20:12 -04:00
7b8b187c74 Create tests with recovery from #1576
Devices affected by the bug described in 1576, are also stuck with some
services in the `Downloaded` state, because the state engine does not
detect that the running services should be killed on a network change
even if they belong to a new release. This is a bug, which can be
replicated by the tests in this commit

Change-type: patch
2023-04-26 11:58:42 -04:00
0a358a4463 Add replication of issue using unit tests
Change-type: patch
2023-04-25 14:47:00 -04:00
27f0d2e655 Improve net alias comparison to prevent unwanted restarts
Network aliases are now compared checking that the target state is a
subset of the current state. This will prevent service restarts due to
additional aliases created by docker in the container.

Closes: #2134
Change-type: patch
2023-04-20 14:58:58 -04:00
cb98133717 Exclude containerId from service network aliases
When getting the service from the docker container, remove the
containerId from the list of aliases (which gets added by docker). This
will make it easier to use the current service state as a target.

This will help us remove the `safeStateClone` function in the API in a
future commit

Change-type: patch
2023-04-20 14:58:58 -04:00