131 Commits

Author SHA1 Message Date
Christina Ying Wang
8c69166271
Support target state apply cancellation
The current target state apply is cancelled when either:
- /v1/update is called with cancel: true
- A different target state is received from the cloud (with a non-304 status)

Following apply cancellation, a target state apply is re-triggered. This ensures
that the user can force a device out of a dead-locked situation where a long-running
task such as an image fetch fails to cede control back to the Supervisor, which is
the behavior observed in an Engine bug with infinite pull retries with a bad network.

Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
2025-05-28 07:46:10 -07:00
Christina Ying Wang
0af915d815
Pass AbortSignal to image pull functions
When abortController.abort() is called, this signal is passed down
to the functions that interface with Docker Engine for image pulls,
cancelling those pulls.

The next commit will limit when abortController.abort() is called.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2025-05-27 11:09:19 -07:00
Felipe Lalanne
4318272844
Remove unsupported fields from contract requirements
A contract including extra requirement fields, such as "name" would fail
validation. This PR removes any extra fields from the validated contract
to prevent services with these extra fields from getting rejected by the
contract validation.

Change-type: patch
2025-05-15 17:38:03 -04:00
Christina Ying Wang
b596c77ce2
Add Docker network label if custom ipam config
In a target release where the only change is the addition or removal
of a custom ipam config, the Supervisor does not recreate the network
due to ignoring ipam config differences when comparing current and target
network (in network.isEqualConfig). This commit implements the addition of
a network label if the target compose object includes a network with custom
ipam. With the label, the Supervisor will detect a difference between a
network with a custom ipam and a network without, without needing to compare
the ipam configs themselves.

This is a major change, as devices running networks with custom ipam configs
will have their networks recreated to add the network label.

Closes: #2251
Change-type: major
See: https://balena.fibery.io/Work/Project/Fix-Supervisor-not-recreating-network-when-passed-custom-ipam-config-1127
Signed-off-by: Christina Ying Wang <christina@balena.io>
2025-03-24 14:55:19 -07:00
Felipe Lalanne
7764f98c9d
Start a dependent if all dependencies are started
The previous behavior required that dependencies were running beefore
starting the dependent service. This made it that services dependent on
a one-shot service would not get started and goes against the default
docker behavior.

Depending on a service to be running will require the implementation of
[long syntax depends_on](https://docs.docker.com/reference/compose-file/services/#long-syntax-1) and the condition
`service_healthy`.

Change-type: patch
Closes: #2409
2025-03-20 14:51:32 -03:00
Pagan Gazzard
49163e92a0 Decrease balenaCloud api request timeout from 15m to 59s
This was mistakenly increased due to confusion between the timeout for
requests to the supervisor's api vs the timeout for requests from the
supervisor to the balenaCloud api. This separates the two configs and
documents the difference between the timeouts whilst also decreasing
the timeout for balenaCloud api requests to the correct/expected value

Change-type: patch
2025-03-04 12:29:18 +00:00
Felipe Lalanne
51f1fb0f30
Refactor device-config as part of device-state
Move the device-config module to the device-state folder and export only
those functions that are needed elsewhere in the codebase

This moves us closer to making the device-state module the only way to
modify application and configuration.

Change-type: patch
2025-01-09 14:31:43 -03:00
Felipe Lalanne
8e6c0fcad7
Wait for service dependencies to be running
This fixes a regression where dependencies would only be started in
order and would start the dependent service if its dependency had been
started at some point in the past, regardless of the running condition.

This makes the behavior more consistent with docker compose where the
[dependency needs to be
running or healthy](69a83d1303/pkg/compose/convergence.go (L441)) for the service to be started.

Change-type: patch
2024-12-13 16:22:11 -03:00
Christina Ying Wang
828bd22ba0 Add PowerFanConfig config backend
This config backend uses ConfigJsonConfigBackend to update
os.power and os.fan subfields under the "os" key, in order
to set power and fan configs. The expected format for os.power
and os.fan settings is:
```
{
  os: {
    power: {
      mode: string
    },
    fan: {
      profile: string
    }
  }
}
```

There may be other keys in os which are not managed by the Supervisor,
so PowerFanConfig backend doesn't read or write to them. Extra keys in os.power
and os.fan are ignored when getting boot config and removed when setting
boot config.

After this backend writes to config.json, host services os-power-mode
and os-fan-profile pick up the changes, on reboot in the former's case
and at runtime in the latter's case. The changes are applied by the host
services, which the Supervisor does not manage aside from streaming
their service logs to the dashboard.

Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-12-09 18:43:51 -08:00
Christina Ying Wang
54fcfa22a7 Support "os" key with object values in ConfigJsonConfigBackend
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-12-09 18:29:26 -08:00
Christina Ying Wang
9ec45a724b Add tests for ConfigJsonConfigBackend
Also deprecate path-getting method, and remove OS version check.
The OS version itself is not used in ConfigJsonConfigBackend, so
it seems the OS version check is to confirm the existence of config.json
during class init, because OS version is a field that's always there
in a valid config.json.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-12-09 18:29:26 -08:00
Christina Ying Wang
c610710f03 Move logger.ts into logging/index.ts
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-12-05 21:55:09 -08:00
Felipe Lalanne
a2d4b31b23
Take update locks for host-config changes
This adds update-lock support to hostname changes via the host-config
endpoint, in addition to proxy changes as changing the hostname may
cause an engine restart from the OS.

Change-type: minor
2024-12-03 15:07:24 -03:00
Felipe Lalanne
9c09329b86
Clean up remaining locks on state settle
Locks could remain from a previous supervisor run that didn't get to
settle the state. This ensures that cleanup will happen for remaining
locks every time the state is settled.

Change-type: patch
2024-11-27 16:40:58 -03:00
Felipe Lalanne
3c6e9dd209
Refactor update-locks implementation
The refactor simplifies the implementation and ensures that locks per
app can only be held by one supervisor task at the time.

Change-type: patch
2024-11-27 16:40:50 -03:00
Felipe Lalanne
d8f54c05e7
Refactor lockfile module
Updated interfaces for clarity

Change-type: patch
2024-11-15 18:25:50 -03:00
Christina Ying Wang
3d3f659f16 Delete apps not in target from db by appUuid instead of appId
Resolve an issue in balenaMachine instances that were installed at <v14.1.0,
in which a Supervisor app with random UUID is kept in the target db due to its appId
being the same, even after the BM instance has upgraded to v14.1.0 which patches
the correct reserved Supervisor app UUIDs in. This results in two Supervisors running
on devices under the BM instance which persists after BM upgrade.

See: https://balena.fibery.io/search/T7ozi#Inputs/Pattern/Two-supervisors-are-running-on-device-3370
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-11-04 14:15:55 -08:00
Felipe Lalanne
e9a52e6786 Store rejected apps in the database
This moves from throwing an error when an app is rejected due to unmet
requirements (because of contracts) to storing the target with a
`rejected` flag on the database.

The application manager filters rejected apps when calculating steps to
prevent them from affecting the current state. The state engine uses the
rejection info to generate the state report.

Change-type: minor
2024-08-30 10:52:11 -04:00
Felipe Lalanne
227fee9941 Set the app update status when reporting state
Change-type: minor
2024-08-30 10:52:11 -04:00
Christina Ying Wang
eaa07e97a9 Add support for redsocks dnsu2t config
Users may specify dnsu2t config by including a `dns` field
in the `proxy` section of PATCH /v1/device/host-config's body:
```
{
  network: {
    proxy: {
      dns: '1.1.1.1:53',
    }
  }
}
```

If `dns` is a string, ADDRESS and PORT are required and should be
in the format `ADDRESS:PORT`. The endpoint with error with
code 400 if either ADDRESS or PORT are missing.

`dns` may also be a boolean. If true, defaults will be configured.
If false, the dns configuration will be removed.

If `proxy` is patched to empty, `dns` will be removed regardless
of its current or input configs, as `dns` depends on an active
redsocks proxy to function.

Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-08-28 14:01:51 -07:00
Felipe Lalanne
b088b78a3e
Do not write noProxy to redsocks.conf
This fixes a regression introduced by the refactor in #2329 where
`noProxy` was being included in the data added to redsocks.conf.

Change-type: patch
2024-08-08 11:59:20 -04:00
Felipe Lalanne
a255001c2e
Verify that LED_FILE exists on blinking setup
Before v1, the blinking module would not throw when the passed led file
does not exist. This change checks for file existence and defaults to
`/dev/null` otherwise

Change-type: patch
2024-08-07 15:33:07 -04:00
Felipe Lalanne
f38714d40f
Cleanup images after state-engine tests
Tests on GitHub started failing recently because of leftover images from
the state engine test suite. This fixes that issue to allow tests to
pass.

Change-type: patch
2024-07-16 16:33:52 -04:00
Christina Ying Wang
f99ccb58c6 Remove unnecessary exports from host-config
This limits the host-config interface to necessary methods
only

Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-07-03 16:47:51 -07:00
Christina Ying Wang
53f5641ef1 Refactor host-config to be its own module
The host-config module exposes the following interfaces: get,
patch, and parse.

`get` gets host configuration such as redsocks proxy configuration
and hostname and returns it in an object of type HostConfiguration.

`patch` takes an object of type HostConfiguration or LegacyHostConfiguration
and updates the hostname and redsocks proxy configuration, optionally
forcing the patch through update locks.

`parse` takes a user input of unknown type and parses it into type
HostConfiguration or LegacyHostConfiguration for patching, erroring if
parse was unsuccessful.

LegacyHostConfiguration is a looser typing of the user input which does
not validate values of the five known proxy fields of type, ip, port,
username, and password. We should stop supporting it in the next
major Supervisor API release.

Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-07-03 16:47:51 -07:00
Christina Ying Wang
be986a62a5 Add HostConfig.parse method
Parses input from PATCH /v1/device/host-config into either
type HostConfiguration, or if LegacyHostConfiguration if
input is of an acceptable shape (for backwards compatibility).

Once input has been determined to be of type HostConfiguration,
we can easily extract ProxyConfig from the object for patching,
stringifying, and writing to redsocks.conf.

Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-07-03 16:47:51 -07:00
Christina Ying Wang
1e224be0cd Add RedsocksConf.parse method
This is part of the host-config refactor which
enables easier encoding to / decoding from `redsocks.conf`.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-07-03 16:45:06 -07:00
Christina Ying Wang
725d7790fb Move noProxy handling to separate module
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-06-28 11:34:27 -07:00
Christina Ying Wang
0cf5a4bf18 Move hostname get/set to separate "module" (directory)
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-06-28 11:34:21 -07:00
Felipe Lalanne
5d93f358aa
Create replicating test
This adds a test to check the case where a service image changes along
with networks being removed.
2024-06-24 15:54:19 -04:00
Felipe Lalanne
9497eed380
Move device-state/target state to api-binder/poll
This goes in the direction of grouping modules by responsibility. The
api-binder module is the middleware between the device and the backend,
thus the target state polling code makes more sense there.

Change-type: patch
2024-06-03 11:40:46 -04:00
Felipe Lalanne
ac2db38742 Move api-keys module to src/lib
This removes circular dependencies between the device-api module and
the compose module, reducing total circular dependencies to 15

Change-type: patch
2024-05-27 14:36:03 -04:00
Felipe Lalanne
234e0de075 Move composition types to compose/types
This reduces circular dependencies from 250 to 80 by ensuring that
modules that only require types do not import the full module with all
its dependencies.

Change-type: patch
2024-05-27 14:36:03 -04:00
Felipe Lalanne
94de4006a0 Split compose types into interface and implementation
This splits `App`, `Network`, `Service` and `Volume` which used to be
defined as classes into an interface and a class implementation that is
not exported. This will allow to work with just the types in some cases
and prevent circular dependencies when importing.

Change-type: patch
2024-05-27 14:36:03 -04:00
Christina Ying Wang
9c968b8d06 Move lib/fs-utils tests to testfs
This removes mock-fs as a dependency

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-24 13:37:45 -07:00
Felipe Lalanne
6f02b17968 Refactor MDNS resolver into a module
Also add integration tests for the resolver functionality to prevent
regressions.

Change-type: patch
2024-04-23 19:23:32 -04:00
Christina Ying Wang
57207c3539 Add additional update lock tests for lockOverride & force
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-15 14:46:44 -04:00
Christina Ying Wang
6e185fbd44 Don't follow symlinks when checking for lockfiles
The Supervisor should only care whether a lockfile exists or
not. This also fixes an edge case where a user symlinked a lockfile
to a nonexistent file, causing the Supervisor to enter an error
loop as it was not able to `stat` the nonexistent file.

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-12 10:34:46 -04:00
Christina Ying Wang
f863075bdc Add memory usage healthcheck
This healthcheck fails when Supervisor memory usage is above a threshold
based on initial memory measurements after device state has settled.

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-11 18:16:47 -07:00
Christina Ying Wang
8ac2ce4677 Respect lockOverride when taking locks
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-06 00:59:04 -07:00
Christina Ying Wang
fd7d58f89a Clean up lockfiles on takeLock step failure
We don't want any Supervisor lockfiles to remain on the device
when a takeLock step fails because this would interfere with the user app.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
Christina Ying Wang
fb1bd33ab6 Refine update locking interface
* Remove Supervisor lockfile cleanup SIGTERM listener
* Modify lockfile.getLocksTaken to read files from the filesystem
* Remove in-memory tracking of locks taken in favor of filesystem
* Require both `(resin-)updates.lock` to be locked with `nobody` UID
  for service to count as locked by the Supervisor

Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
Christina Ying Wang
10f294cf8e Add takeLock to state funnel
A takeLock step should be generated before any of the following steps:
* kill
* start
* stop
* updateMetadata
* restart
* handover

ALL services in an app will be locked for any of the above actions,
unless the action is generated through Supervisor API's
`POST /v2/applications/:appId/(start|stop|restart)-service` endpoints,
in which case only the target service will be locked.

A lock will be taken for a service before it starts by creating the
directory in /tmp before the Engine creates it through bind mounts.

Also, the commit simplifies the generation of service kill
steps from network/volume changes or removals.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
Christina Ying Wang
cf8d8cedd7 Simplify lock interface to prep for adding takeLock to state funnel
This commit changes a few things:

* Pass `force` to `takeLock` step directly. This allows us to remove
the `lockFn` used by app manager's action executors, setting takeLock
as the main interface to interact with the update lock module. Note
that this commit by itself will not pass tests, as no update locking
occurs where it once did. This will be amended in the next commit.

* Remove locking functions from doRestart & doPurge, as this is
the only area where skipLock is required.

* Remove `skipLock` interface, as it's redundant with the functionality
of `force`. The only time `skipLock` is true is in doRestart/doPurge,
as those API methods are already run within a lock function. We removed
the lock function which removes the need for skipLock, and in the next
commit we'll add locking as a composition step to replace the
functionality removed here.

* Remove some methods not in use, such as app manager's `stopAll`.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
Christina Ying Wang
af6359f7ae Take lock before updating service metadata
Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
Christina Ying Wang
e6df78a22b Implement takeLock composition step + tests
This commit only implements the action that a takeLock step
results in. It does not add takeLock step generation logic
to the state funnel yet.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
Christina Ying Wang
f2843e1382 Add update lock release functionality to state funnel
releaseLock is a step that will be inferred if there are services
in target state, and if some of those services have locks taken by
the Supervisor.

The releaseLock composition step calls the method of the same name
in the updateLock module, which takes the exclusive process lock before
disposing all Supervisor lockfiles in the target appId.

This is half of the update lock incorporation into the state funnel, as
we also need to introduce a takeLock step which triggers during crucial
stages of device state transition.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
Christina Ying Wang
d18a740a40 Add methods for easier checking of lockfile existence
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-04-04 14:07:47 -07:00
Felipe Lalanne
6217546894 Update typescript to v5
This also updates code to use the default import syntax instead of
`import * as` when the imported module exposes a default. This is needed
with the latest typescript version.

Change-type: patch
2024-03-05 15:33:56 -03:00
Felipe Lalanne
988a1c9e9a Update @balena/lint to v7
This updates balena lint to the latest version to enable eslint support
and unblock Typescript updates. This is a huge number of changes as the
linting rules are much more strict now, requiring modifiying most files
in the codebase. This commit also bumps the test dependency `rewire` as
that was interfering with the update of balena-lint

Change-type: patch
2024-03-01 18:27:30 -03:00