Commit Graph

1588 Commits

Author SHA1 Message Date
Christina Ying Wang
1dcd156fc8 Update @balena/contrato to 0.9.4
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-30 16:39:49 -07:00
Pagan Gazzard
4adf710520 Update @types dependencies
Change-type: patch
2024-04-29 16:29:07 +01:00
Felipe Lalanne
ae823fea18 Update docker related dependencies
This bumps dockerode, removes resin-docker-build in favor of
@balena/compose, and updates docker-delta and docker-progress packages.

Change-type: patch
2024-04-26 12:03:04 -04:00
Felipe Lalanne
6f02b17968 Refactor MDNS resolver into a module
Also add integration tests for the resolver functionality to prevent
regressions.

Change-type: patch
2024-04-23 19:23:32 -04:00
Felipe Lalanne
ad52561de5 Fix mdnsResolver import
The `mdns-resolver` module does not provide a default export. Using
default import notation causes the `resolve` function to not be
found, breaking MDNS resolution.

Change-type: patch
2024-04-23 19:23:32 -04:00
Christina Ying Wang
14bdc522c1 Gracefully handle multiple reboot/shutdown requests
Since HTTP's server.close() is async, there is a slim chance
for two instances of /v1/reboot or /v1/shutdown to be processed.
If the server is already closed when server.close() is called,
the call throws ERR_SERVER_NOT_RUNNING
which doesn't need to be surfaced to the user. This change
only allows one server.close() attempt to occur at a time.
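
The "one attempt at a time" idea can be sketched as below. This is a minimal illustration, not the Supervisor's code: `singleFlight` and `closeServer` are hypothetical names, and the wrapped action stands in for a promisified `server.close()`.

```typescript
// Hypothetical helper: concurrent callers share one in-flight attempt
// instead of each starting their own.
function singleFlight<T>(action: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | null = null;
  return () => {
    if (inFlight !== null) {
      return inFlight;
    }
    const attempt = action();
    inFlight = attempt;
    // Free the slot once the attempt settles, whether it succeeded or failed.
    const clear = () => {
      if (inFlight === attempt) {
        inFlight = null;
      }
    };
    attempt.then(clear, clear);
    return attempt;
  };
}

// Stand-in for a promisified server.close(); it only counts invocations.
let closeCalls = 0;
const closeServer = singleFlight(async () => {
  closeCalls += 1;
});

// Two overlapping requests (e.g. two /v1/reboot calls) share one attempt.
const first = closeServer();
const second = closeServer();
```

Because the second caller receives the same promise, the underlying close runs once and a late `ERR_SERVER_NOT_RUNNING` never has a chance to surface.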

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-23 12:59:44 -07:00
Christina Ying Wang
6e185fbd44 Don't follow symlinks when checking for lockfiles
The Supervisor should only care whether a lockfile exists or
not. This also fixes an edge case where a user symlinked a lockfile
to a nonexistent file, causing the Supervisor to enter an error
loop as it was not able to `stat` the nonexistent file.
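
A minimal sketch of the difference, assuming a hypothetical `lockfileExists` helper rather than the Supervisor's real lockfile module: `lstat` inspects the link itself, so a symlink to a nonexistent target still counts as an existing lockfile.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical helper: true if something exists at the path itself,
// regardless of what a symlink at that path points to.
function lockfileExists(lockPath: string): boolean {
  try {
    fs.lstatSync(lockPath); // does not follow symlinks
    return true;
  } catch (e) {
    if ((e as { code?: string }).code === 'ENOENT') {
      return false;
    }
    throw e;
  }
}

// Demonstration with a dangling symlink in a throwaway directory.
function demo(): { followingLinks: boolean; notFollowingLinks: boolean } {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'lock-'));
  try {
    const lock = path.join(dir, 'updates.lock');
    fs.symlinkSync(path.join(dir, 'does-not-exist'), lock);
    return {
      followingLinks: fs.existsSync(lock), // follows the link to a missing target
      notFollowingLinks: lockfileExists(lock),
    };
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}
```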

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-12 10:34:46 -04:00
Christina Ying Wang
f863075bdc Add memory usage healthcheck
This healthcheck fails when Supervisor memory usage is above a threshold
based on initial memory measurements after device state has settled.

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-11 18:16:47 -07:00
Christina Ying Wang
8ac2ce4677 Respect lockOverride when taking locks
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-06 00:59:04 -07:00
Christina Ying Wang
b7922e6875 Fix some RegEx io-ts types
io-ts types that were generated using `shortStringWithRegex` were testing
against `VAR_NAME_REGEX`, instead of the Regex that was specified when
generating the type. This affected `DockerName` such that service names with
a dash in the middle were returning as false when passed through the
`DockerName.is` type guard, affecting how `getServicesLockedByAppId` was
returning a map of locked services.
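
The bug class can be illustrated without io-ts. In this simplified model the two regular expressions are hypothetical stand-ins (the real patterns live in the Supervisor codebase), and the guards are plain functions rather than io-ts codecs.

```typescript
// Hypothetical stand-ins for the real patterns.
const VAR_NAME_REGEX = /^[a-zA-Z_][a-zA-Z0-9_]*$/;
const DOCKER_NAME_REGEX = /^[a-zA-Z0-9][a-zA-Z0-9_.-]*$/;

type Guard = (value: unknown) => value is string;

// Buggy shape: the regex argument is ignored in favor of VAR_NAME_REGEX.
function guardIgnoringRegex(_regex: RegExp): Guard {
  return (value: unknown): value is string =>
    typeof value === 'string' && VAR_NAME_REGEX.test(value);
}

// Fixed shape: the guard tests against the regex it was created with.
function guardUsingRegex(regex: RegExp): Guard {
  return (value: unknown): value is string =>
    typeof value === 'string' && regex.test(value);
}

const buggyDockerName = guardIgnoringRegex(DOCKER_NAME_REGEX);
const fixedDockerName = guardUsingRegex(DOCKER_NAME_REGEX);

// A service name with a dash in the middle.
const rejectedByBug = !buggyDockerName('my-service');
const acceptedByFix = fixedDockerName('my-service');
```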

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-06 00:20:34 -07:00
Christina Ying Wang
7220e994dc Log takeLock and releaseLock steps as system events
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
Christina Ying Wang
fd7d58f89a Clean up lockfiles on takeLock step failure
We don't want any Supervisor lockfiles to remain on the device
when a takeLock step fails because this would interfere with the user app.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
Christina Ying Wang
fb1bd33ab6 Refine update locking interface
* Remove Supervisor lockfile cleanup SIGTERM listener
* Modify lockfile.getLocksTaken to read files from the filesystem
* Remove in-memory tracking of locks taken in favor of filesystem
* Require both `(resin-)updates.lock` to be locked with `nobody` UID
  for service to count as locked by the Supervisor

Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
Christina Ying Wang
10f294cf8e Add takeLock to state funnel
A takeLock step should be generated before any of the following steps:
* kill
* start
* stop
* updateMetadata
* restart
* handover

ALL services in an app will be locked for any of the above actions,
unless the action is generated through Supervisor API's
`POST /v2/applications/:appId/(start|stop|restart)-service` endpoints,
in which case only the target service will be locked.
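
As a rough sketch of the rule above: the `Step` shape and the `withTakeLock` function are hypothetical, and prepending the lock step to the plan is an assumption about how the state funnel orders steps.

```typescript
interface Step {
  action: string;
  services: string[];
}

// Actions that the commit message lists as requiring a prior takeLock step.
const LOCK_REQUIRED = new Set([
  'kill',
  'start',
  'stop',
  'updateMetadata',
  'restart',
  'handover',
]);

// Hypothetical planner hook: if any planned step needs the lock and some
// services of the app are not yet locked, put a takeLock step first.
function withTakeLock(
  planned: Step[],
  appServices: string[],
  locked: Set<string>,
): Step[] {
  const needsLock = planned.some((step) => LOCK_REQUIRED.has(step.action));
  const unlocked = appServices.filter((name) => !locked.has(name));
  if (!needsLock || unlocked.length === 0) {
    return planned;
  }
  return [{ action: 'takeLock', services: unlocked }, ...planned];
}
```

For the per-service API endpoints, a caller would pass only the target service as `appServices`, so only that service is locked.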

A lock will be taken for a service before it starts by creating the
directory in /tmp before the Engine creates it through bind mounts.

Also, the commit simplifies the generation of service kill
steps from network/volume changes or removals.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
Christina Ying Wang
cf8d8cedd7 Simplify lock interface to prep for adding takeLock to state funnel
This commit changes a few things:

* Pass `force` to `takeLock` step directly. This allows us to remove
the `lockFn` used by app manager's action executors, setting takeLock
as the main interface to interact with the update lock module. Note
that this commit by itself will not pass tests, as no update locking
occurs where it once did. This will be amended in the next commit.

* Remove locking functions from doRestart & doPurge, as this is
the only area where skipLock is required.

* Remove `skipLock` interface, as it's redundant with the functionality
of `force`. The only time `skipLock` is true is in doRestart/doPurge,
as those API methods are already run within a lock function. We removed
the lock function which removes the need for skipLock, and in the next
commit we'll add locking as a composition step to replace the
functionality removed here.

* Remove some methods not in use, such as app manager's `stopAll`.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
Christina Ying Wang
af6359f7ae Take lock before updating service metadata
Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
Christina Ying Wang
e6df78a22b Implement takeLock composition step + tests
This commit only implements the action that a takeLock step
results in. It does not add takeLock step generation logic
to the state funnel yet.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
Christina Ying Wang
f2843e1382 Add update lock release functionality to state funnel
releaseLock is a step that will be inferred if there are services
in target state, and if some of those services have locks taken by
the Supervisor.

The releaseLock composition step calls the method of the same name
in the updateLock module, which takes the exclusive process lock before
disposing all Supervisor lockfiles in the target appId.

This is half of the update lock incorporation into the state funnel, as
we also need to introduce a takeLock step which triggers during crucial
stages of device state transition.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
Christina Ying Wang
7cfc42e197 Separate rwlock functionality from update-lock for clarity
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
Christina Ying Wang
d18a740a40 Add methods for easier checking of lockfile existence
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
Christina Ying Wang
b9a6a6b685 Improve types & remove some lodash from state engine
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
Shreya Patel
b5dbef82d7 Add revpi-connect-4 to RPi variants
We need the supervisor to be able to manage config.txt changes for the
RevPi Connect 4.

Change-type: patch
Signed-off-by: Shreya Patel <shreya@dynamicdevices.co.uk>
2024-03-27 11:55:15 +00:00
Pagan Gazzard
20e57f7f16 Log the full error on device state report failure as it is more useful
The message can be an empty string or similarly unhelpful, therefore
logging the entire error means that we will have whatever the message
may be along with the stack trace and other info that will be helpful
even when the message is not.

Change-type: patch
2024-03-25 15:17:09 -03:00
Pagan Gazzard
6b0500cdbc Set @balena/es-version to es2022 to match tsconfig.json
Change-type: patch
2024-03-25 16:56:27 +00:00
Pagan Gazzard
5cd37e73ac Increase the timeout for auto select family to 5000ms to avoid issues
On slower networks the default of 250ms can cause problems as all
attempts will fail rather than only the ones for interfaces that do not
actually work correctly. Increasing this timeout to 5000ms will help to
avoid these issues
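
One way to raise this timeout process-wide is through Node's `net` module, sketched below. Whether the Supervisor sets it this way or through another mechanism is not stated in the commit message, so treat this as an assumption.

```typescript
import * as net from 'node:net';

// Raise the per-attempt connection timeout used by autoSelectFamily
// from Node's default of 250ms to 5000ms for all new sockets.
net.setDefaultAutoSelectFamilyAttemptTimeout(5000);

const attemptTimeoutMs = net.getDefaultAutoSelectFamilyAttemptTimeout();
```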

Change-type: patch
2024-03-25 15:05:13 +00:00
Felipe Lalanne
08727ed2b5 Remove dependency on @balena/happy-eyeballs
Node 20 now implements the happy eyeballs algorithm as part of its core
`net` module, with the [autoSelectFamily](https://nodejs.org/docs/latest-v20.x/api/net.html#netgetdefaultautoselectfamily) option of `socket.connect`. This option defaults to `true`, meaning that a separate
implementation of happy eyeballs is no longer needed.

Change-type: patch
2024-03-06 15:16:33 -03:00
Felipe Lalanne
b77dba2046 Update Node to v20
This updates the supervisor runtime to the latest Node LTS version. There
are no breaking changes related to this bump.

Change-type: patch
2024-03-06 12:29:54 -03:00
Felipe Lalanne
6217546894 Update typescript to v5
This also updates code to use the default import syntax instead of
`import * as` when the imported module exposes a default. This is needed
with the latest typescript version.

Change-type: patch
2024-03-05 15:33:56 -03:00
Felipe Lalanne
988a1c9e9a Update @balena/lint to v7
This updates balena lint to the latest version to enable eslint support
and unblock Typescript updates. This is a huge number of changes as the
linting rules are much more strict now, requiring modifying most files
in the codebase. This commit also bumps the test dependency `rewire` as
that was interfering with the update of balena-lint.

Change-type: patch
2024-03-01 18:27:30 -03:00
Felipe Lalanne
bda1bac04c Add support for repeated overlays
RPI firmware configuration allows repeating overlays to define
configurations on multiple devices. For instance, for configuring
multiple `ads` devices, `config.txt` needs to be set up this way:

```
dtoverlay=ads1115,addr=0x48
dtoverlay=ads1115,addr=0x49
```

Before this change, the supervisor would interpret both lines as
belonging to the same overlay, preventing users from configuring multiple
devices, and leading to a loop when trying to apply configurations with
repeated overlays coming from the cloud side.
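
A minimal sketch of the fixed interpretation, assuming a hypothetical `parseOverlays` function (not the Supervisor's config.txt backend): every `dtoverlay` line opens its own entry, so repeated names are kept apart instead of being merged.

```typescript
interface Overlay {
  name: string;
  params: string[];
}

const OVERLAY_PREFIX = 'dtoverlay=';

// Hypothetical parser: one entry per dtoverlay line, in file order.
function parseOverlays(configTxt: string): Overlay[] {
  const overlays: Overlay[] = [];
  for (const raw of configTxt.split('\n')) {
    const line = raw.trim();
    if (!line.startsWith(OVERLAY_PREFIX)) {
      continue;
    }
    const parts = line.slice(OVERLAY_PREFIX.length).split(',');
    overlays.push({ name: parts[0] ?? '', params: parts.slice(1) });
  }
  return overlays;
}

const parsed = parseOverlays(
  ['dtoverlay=ads1115,addr=0x48', 'dtoverlay=ads1115,addr=0x49'].join('\n'),
);
```

Keying overlays by name in a map would collapse the two `ads1115` lines into one entry, which is the behavior this commit moves away from.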

Change-type: minor
2024-02-27 14:52:41 -03:00
Christina Ying Wang
3fd035c5bd Patch default dtparam handling in config.txt
This commit completes the list of default / board-wide dtparams
to include some `baudrate` and `vc` i2c params.

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-02-21 12:45:29 -08:00
Christina Ying Wang
e22253ce6e Patch config.txt backend to return array configs correctly
Previously, getBootConfig() of the config.txt backend was omitting
array configurations such as gpio settings, thus resulting in the SV
mistakenly assuming that boot config had not been applied, since gpio
would not be in current config.txt config but would be in target config.
This resulted in SV entering an infinite loop of attempting to apply the
gpio config when it wasn't necessary.

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-02-16 18:12:33 -08:00
Felipe Lalanne
6e6a796da5 Add special case for base DTO params on RPI config
While ordering is important in the RPI firmware configuration file (config.txt),
some dt params are by default considered part of the base dt overlay
if they are not used by other overlays.
Unfortunately the [list of dtparams](https://github.com/raspberrypi/firmware/blob/master/boot/overlays/README#L133)
is too long to add all of them as exceptions, but we can add the params
used in the default config.txt provided in OS images, to avoid reboots
when updating to this new supervisor and correctly parsing the
provisioning config.txt as variables.

While this addition handles most common scenarios, there is still a
chance a user may have used other base overlay dt params in the initial
config, in which case those will be interpreted according to the
relative ordering.

Change-type: patch
2024-02-08 15:48:10 -03:00
Felipe Lalanne
9546a1a3b1 Fix processing of dtoverlay/dtparams on config.txt
DT overlays and DT params need to be consumed in the order that they
appear on the file. DT params apply to the last dtoverlay defined on the
file, or to the base overlay.

This commit updates config.txt parsing to consider this ordering, and it
also ensures global dtparams are written first so they cannot be
overridden by later overlays.

Because of the more strict parsing method, it is possible that existing
HOST_CONFIG vars do not match the interpretation of the parser. If
that's the case, the supervisor will re-apply the target state which
will cause the device to reboot.
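
The ordering rule in this commit can be sketched as follows. The `DtConfig` shape and both functions are hypothetical, and the sketch ignores details such as comments, filters, and inline overlay parameters.

```typescript
interface DtConfig {
  baseParams: string[];
  overlays: { name: string; params: string[] }[];
}

const DTOVERLAY = 'dtoverlay=';
const DTPARAM = 'dtparam=';

// Hypothetical parser: a dtparam line belongs to the most recent dtoverlay,
// or to the base overlay if no dtoverlay has appeared yet.
function parseDt(lines: string[]): DtConfig {
  const config: DtConfig = { baseParams: [], overlays: [] };
  for (const line of lines) {
    if (line.startsWith(DTOVERLAY)) {
      config.overlays.push({ name: line.slice(DTOVERLAY.length), params: [] });
    } else if (line.startsWith(DTPARAM)) {
      const params = line.slice(DTPARAM.length).split(',');
      const last = config.overlays[config.overlays.length - 1];
      const target = last ? last.params : config.baseParams;
      target.push(...params);
    }
  }
  return config;
}

// Hypothetical writer: base dtparams go first so that no overlay can claim them.
function serializeDt(config: DtConfig): string[] {
  const out = config.baseParams.map((p) => `${DTPARAM}${p}`);
  for (const overlay of config.overlays) {
    out.push(`${DTOVERLAY}${overlay.name}`);
    for (const p of overlay.params) {
      out.push(`${DTPARAM}${p}`);
    }
  }
  return out;
}
```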

Change-type: major
2024-02-08 15:46:07 -03:00
Felipe Lalanne
a8e371f0c9 Refactor config-txt backend
Cleans up code and adds better type detection
2024-02-07 20:39:41 -03:00
Christina Ying Wang
3afcef2969 Respect update strategies app-wide instead of at the service level
Fixes behavior for release updates which remove a service in current state
and add a new service in target state.

Change-type: patch
Closes: #2095
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-01-29 12:26:28 -08:00
Felipe Lalanne
dec39a35d4 Try MDNS lookup only if regular DNS lookup fails
This is meant to allow users to configure their device to
resolve `.local` queries via dnsmasq by modifying config.json, e.g.
`"dnsServers": "/bob.local/172.17.0.33"`.

This would fail before as MDNS lookups would always come first
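
The new lookup order can be sketched with injected resolvers. `withMdnsFallback` and the `Lookup` type are hypothetical, and restricting the fallback to `.local` names is an assumption rather than something the commit message states.

```typescript
type Lookup = (hostname: string) => Promise<string>;

// Hypothetical wrapper: try regular DNS first and fall back to MDNS only
// when that fails and the name is in the .local domain.
function withMdnsFallback(dnsLookup: Lookup, mdnsLookup: Lookup): Lookup {
  return async (hostname: string) => {
    try {
      return await dnsLookup(hostname);
    } catch (err) {
      if (!hostname.endsWith('.local')) {
        throw err;
      }
      return mdnsLookup(hostname);
    }
  };
}
```

With this order, a `.local` name answered by dnsmasq never reaches the MDNS resolver.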

Change-type: minor
2024-01-03 14:42:23 -03:00
Felipe Lalanne
7a39da92b7 Refactor mdns lookup code in app entry
Change-type: patch
2024-01-03 14:42:23 -03:00
Felipe Lalanne
3ea8d4727a Force remove container if updateMetadata fails
The `updateMetadata` step renames the container to match the target
release when the service doesn't change between releases. We have seen
this step fail because of an engine bug that seems to relate to the
engine keeping stale references after container restarts. The only way
around this issue is to remove the old container and create it again.
This implements that workaround during the updateMetadata step to deal
with that issue.

Change-type: minor
Relates-to: balena-os/balena-engine#261
2023-11-22 14:16:44 -03:00
Christina Ying Wang
eb8ad11cd7 Cache last reported current state to /mnt/root/tmp
Whenever the Supervisor reports current state, it diffs the current state
with its last reported current state. However, when the Supervisor starts
up, there is no last reported state, since that last report is stored in
process memory. Caching the last report in a location that survives
Supervisor restarts will reduce the current report bandwidth used on startup.

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-11-14 16:15:36 -08:00
Christina Ying Wang
d440776881 Convert current state types to io-ts
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-11-08 16:00:54 -08:00
Christina Ying Wang
a993b3e7af Set applyInProgress to true while applying intermediate state
Intermediate state is utilized when executing device actions such as a
volume purge. It's a type of state apply, but despite that,
applyInProgress is not true.

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-10-25 10:32:10 -07:00
Felipe Lalanne
9bd216327f Expose ports from port mappings on services
PR #2217 removed the expose configuration but also caused a regression
where ports set via the `ports` configuration would no longer get
exposed to the host, despite portmappings being set. This fixes that
issue by exposing only those ports coming from port mappings.
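
A rough sketch of deriving exposed ports from port mappings only. The helper names are hypothetical, and the sketch handles just the compose short syntax (`HOST:CONTAINER`, optionally with a leading IP and a trailing `/protocol`), not port ranges or the long syntax.

```typescript
// Extracts "<containerPort>/<protocol>" from a short-syntax mapping such as
// "8080:80" or "127.0.0.1:53:53/udp". The protocol defaults to tcp.
function containerPort(mapping: string): string {
  const [ports = '', protocol = 'tcp'] = mapping.split('/');
  const segments = ports.split(':');
  const port = segments[segments.length - 1] ?? '';
  return `${port}/${protocol}`;
}

// Hypothetical helper: the set of ports to expose is exactly the set of
// container ports that appear in the mappings, without duplicates.
function exposedPorts(mappings: string[]): string[] {
  return Array.from(new Set(mappings.map(containerPort)));
}

const exposed = exposedPorts(['8080:80', '127.0.0.1:53:53/udp', '80']);
```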

Change-type: patch
2023-10-24 15:04:39 -03:00
Felipe Lalanne
416170bc05 Ignore expose service compose configuration
The docker EXPOSE directive and the corresponding docker-compose `expose`
service configuration serve as documentation/metadata that a container
listens on a certain port, which may be used for service discovery, but
they have no real impact on the ability of other containers on the same
network to access the exposed service via
the port. In newer engine implementations, this property may conflict
with other network configurations, and prevent the container from being
started by the docker engine (see #2211).

This PR removes code that would manage the expose property and takes the
property out of the whitelist. A composition with the `expose` property
will result in the log message `Ignoring unsupported or unknown compose fields: expose`.
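
The whitelist behavior can be sketched like this. The `SUPPORTED_FIELDS` set is a small hypothetical subset, and `sanitizeService` is an illustrative name. Only the log message text is taken from the commit message.

```typescript
// Hypothetical subset of supported compose fields; `expose` is not in it.
const SUPPORTED_FIELDS = new Set(['image', 'ports', 'environment', 'volumes']);

interface Sanitized {
  config: Record<string, unknown>;
  warning: string | null;
}

// Keeps whitelisted fields and reports the rest in a single log message.
function sanitizeService(input: Record<string, unknown>): Sanitized {
  const config: Record<string, unknown> = {};
  const ignored: string[] = [];
  for (const key of Object.keys(input)) {
    if (SUPPORTED_FIELDS.has(key)) {
      config[key] = input[key];
    } else {
      ignored.push(key);
    }
  }
  const warning =
    ignored.length > 0
      ? `Ignoring unsupported or unknown compose fields: ${ignored.join(', ')}`
      : null;
  return { config, warning };
}

const sanitized = sanitizeService({ image: 'alpine', expose: ['80'] });
```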

While this change should not have operational impact, it still removes
a previously supported configuration and as such there is a chance of it
being a breaking change for some applications. For this reason it is
being published as a new major version.

Change-type: major
Closes: #2211
2023-10-23 11:41:32 -03:00
Felipe Lalanne
b107868765 Add note regarding API jitter on target state poll
Change-type: patch
2023-10-23 14:11:20 +01:00
Pagan Gazzard
e15205301c Switch some _.includes usage to native versions
Change-type: patch
2023-10-16 14:30:25 -03:00
Pagan Gazzard
a4a9a17c1a Switch _.assign usage to native versions
Change-type: patch
2023-10-16 14:30:25 -03:00
Pagan Gazzard
d0cb54537f Switch _.isNaN usage to native versions
Change-type: patch
2023-10-16 14:30:25 -03:00
Pagan Gazzard
3bfdc4454e Switch _.isUndefined usage to native versions
Change-type: patch
2023-10-16 14:30:25 -03:00
Pagan Gazzard
8e23091aa9 Switch _.isNull usage to native versions
Change-type: patch
2023-10-16 14:30:25 -03:00