1660 Commits

Author SHA1 Message Date
Felipe Lalanne
026dc0aed2
Release locks when removing apps
This prevents leftover locks that can prevent other operations from
taking place.

Change-type: patch
2025-03-06 11:50:31 -03:00
Felipe Lalanne
6d00be2093
Log non-API errors during state poll
The supervisor was failing silently if an error happened while establishing the
connection (e.g. requesting the socket).

Change-type: patch
2025-03-04 10:46:45 -03:00
Felipe Lalanne
f8bdb14335
Fix target poll healthcheck
The Target.lastFetch time, which the healthcheck compares against, was
reset any time a poll was attempted, regardless of the outcome. This
changes the behavior so that the time is reset only on a successful poll.
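Here is a minimal sketch of the fixed behavior. It assumes a hypothetical `TargetPoller` class with an injectable clock, and only `lastFetch` is named in the commit. A failed fetch leaves `lastFetch` unchanged, so the healthcheck starts failing once polls have been failing for longer than the allowed window.

```typescript
// Sketch only: `TargetPoller`, `pollOnce` and `isHealthy` are hypothetical names.
class TargetPoller {
  // Time (ms) of the last *successful* poll.
  lastFetch: number;

  constructor(
    private readonly fetchTarget: () => Promise<void>,
    private readonly now: () => number = Date.now,
  ) {
    this.lastFetch = this.now();
  }

  async pollOnce(): Promise<void> {
    try {
      await this.fetchTarget();
      // Only a successful poll refreshes the timestamp.
      this.lastFetch = this.now();
    } catch {
      // A failed poll leaves lastFetch as-is, so the healthcheck can notice.
    }
  }

  isHealthy(maxAgeMs: number): boolean {
    return this.now() - this.lastFetch <= maxAgeMs;
  }
}
```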

Change-type: patch
2025-03-04 10:45:31 -03:00
Pagan Gazzard
49163e92a0 Decrease balenaCloud api request timeout from 15m to 59s
This was mistakenly increased due to confusion between the timeout for
requests to the supervisor's api vs the timeout for requests from the
supervisor to the balenaCloud api. This separates the two configs and
documents the difference between the timeouts whilst also decreasing
the timeout for balenaCloud api requests to the correct/expected value.

Change-type: patch
2025-03-04 12:29:18 +00:00
Christina Ying Wang
2dc9d275b1 Don't revert to regular pull if delta server 401
If the Supervisor receives a 401 Unauthorized from the delta server
when requesting a delta image location, we should surface the error
instead of falling back to a regular pull immediately, as there could
be an issue with the delta auth token, which refreshes after
DELTA_TOKEN_TIMEOUT (10min), or some other edge case.
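Combined with the earlier 4xx fallback (1fc242200f), the response handling can be sketched as follows. The function name and outcome labels are hypothetical. Status codes outside 4xx are deliberately left unmodeled, since neither commit describes them.

```typescript
// Hypothetical outcome labels; the real Supervisor code is structured differently.
type DeltaRequestOutcome = 'regular-pull' | 'surface-error' | 'other';

function onDeltaServerResponse(status: number): DeltaRequestOutcome {
  if (status === 401) {
    // Possibly a stale delta token: surface the error instead of falling back.
    return 'surface-error';
  }
  if (status >= 400 && status < 500) {
    // The delta server cannot supply the resource: fall back immediately.
    return 'regular-pull';
  }
  // Not covered by these two commits.
  return 'other';
}
```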

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2025-02-24 16:17:15 -08:00
Christina Ying Wang
341111f1f9 Retry DELTA_APPLY_RETRY_COUNT (3) times during delta apply fail before reverting to regular pull
This prevents an image download error loop where the delta image on the delta server is present,
but some aspect of the delta image or the base image on the device does not match up, causing
the delta to fail to be applied to the base image.

Delta apply errors are thrown by the Engine without status codes (although they should include them),
so if an error with a status code is raised during this time, throw an error to the handler
indicating that the delta should be retried until it succeeds. Errors with status codes raised during
this time are largely network related, so falling back to a regular pull won't improve anything.

Upon delta apply errors exceeding DELTA_APPLY_RETRY_COUNT, revert to a regular pull.
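Here is a sketch of the retry decision. It assumes a hypothetical `onDeltaApplyError` helper that receives the error and the number of apply failures so far, including the current one. `DELTA_APPLY_RETRY_COUNT` comes from the commit; everything else is illustrative. The commit does not say whether status-code errors count toward the limit, so in this sketch they do not.

```typescript
const DELTA_APPLY_RETRY_COUNT = 3;

type DeltaApplyDecision = 'retry-delta' | 'regular-pull';

// `failures` counts delta-apply failures for this image so far, including this one.
function onDeltaApplyError(err: { statusCode?: number }, failures: number): DeltaApplyDecision {
  if (err.statusCode != null) {
    // Errors with a status code are most likely network related; a regular
    // pull would not help, so keep retrying the delta.
    return 'retry-delta';
  }
  // Engine-side apply errors: retry up to the limit, then fall back.
  return failures > DELTA_APPLY_RETRY_COUNT ? 'regular-pull' : 'retry-delta';
}
```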

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2025-02-11 12:19:53 -08:00
Christina Ying Wang
1fc242200f Revert to regular pull immediately on delta server failure (code 400s)
If the delta server responds immediately with HTTP 4xx upon requesting a delta image,
this means the server is not able to supply the resource, so fall back to a regular pull
immediately.

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2025-02-11 10:58:51 -08:00
Felipe Lalanne
f71f98777c
Update network-manager to v1
Change-type: patch
2025-01-23 23:40:52 -03:00
Felipe Lalanne
85fc5784bc
Update contrato to v0.12.0
Change-type: patch
2025-01-15 18:56:24 -03:00
Felipe Lalanne
e416ad0daf
Add support for io.balena.update.requires-reboot
This label can be used by user services to indicate that a reboot is
required after the service is installed in order to fully apply an update.

Change-type: minor
2025-01-14 11:20:35 -03:00
Felipe Lalanne
75127c6074
Move reboot breadcrumb check to device-state
This was on device-config before, but we'll need to set the reboot
breadcrumb from the application-manager as well when we introduce
`requires-reboot` as a label.

Change-type: patch
2025-01-09 14:31:55 -03:00
Felipe Lalanne
51f1fb0f30
Refactor device-config as part of device-state
Move the device-config module to the device-state folder and export only
those functions that are needed elsewhere in the codebase.

This moves us closer to making the device-state module the only way to
modify application and configuration.

Change-type: patch
2025-01-09 14:31:43 -03:00
Felipe Lalanne
8e6c0fcad7
Wait for service dependencies to be running
This fixes a regression where services were started in dependency order,
but a dependent service would be started if its dependency had been
started at some point in the past, regardless of whether it was still running.

This makes the behavior more consistent with docker compose where the
[dependency needs to be
running or healthy](69a83d1303/pkg/compose/convergence.go (L441)) for the service to be started.
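Here is a simplified sketch of the readiness check. It assumes a hypothetical `ServiceState` shape that holds only a container status, and it does not model health-based conditions. The check looks at whether each dependency is running now, not whether it was started at some earlier point.

```typescript
// Hypothetical, simplified view of a service's current container state.
interface ServiceState {
  status: 'running' | 'exited' | 'created';
}

// A dependency only counts if it is running *now*; having been started at
// some point in the past is not enough.
function canStart(dependsOn: string[], current: Map<string, ServiceState>): boolean {
  return dependsOn.every((name) => current.get(name)?.status === 'running');
}
```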

Change-type: patch
2024-12-13 16:22:11 -03:00
Christina Ying Wang
2f2b2e1c50 Don't require reboot if setting fan control
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-12-09 18:43:57 -08:00
Christina Ying Wang
828bd22ba0 Add PowerFanConfig config backend
This config backend uses ConfigJsonConfigBackend to update
os.power and os.fan subfields under the "os" key, in order
to set power and fan configs. The expected format for os.power
and os.fan settings is:
```
{
  os: {
    power: {
      mode: string
    },
    fan: {
      profile: string
    }
  }
}
```

There may be other keys in os which are not managed by the Supervisor,
so the PowerFanConfig backend neither reads nor writes them. Extra keys in os.power
and os.fan are ignored when getting boot config and removed when setting
boot config.
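The read and write rules above can be sketched by treating config.json as a plain object. `getPowerFan` and `setPowerFan` are hypothetical names, not the backend's actual methods. The sketch does not handle writing the file or validating the values.

```typescript
type Json = Record<string, any>;

// Read power mode / fan profile from config.json's "os" key.
// Extra keys inside os.power and os.fan are ignored.
function getPowerFan(config: Json): { powerMode?: string; fanProfile?: string } {
  return {
    powerMode: config.os?.power?.mode,
    fanProfile: config.os?.fan?.profile,
  };
}

// Write them back without mutating the input. Unmanaged keys under "os"
// (and elsewhere in config.json) are preserved; os.power / os.fan are
// rewritten with only the managed subfield, dropping any extra keys.
function setPowerFan(config: Json, target: { powerMode?: string; fanProfile?: string }): Json {
  const os: Json = { ...(config.os ?? {}) };
  if (target.powerMode !== undefined) {
    os.power = { mode: target.powerMode };
  }
  if (target.fanProfile !== undefined) {
    os.fan = { profile: target.fanProfile };
  }
  return { ...config, os };
}
```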

After this backend writes to config.json, host services os-power-mode
and os-fan-profile pick up the changes, on reboot in the former's case
and at runtime in the latter's case. The changes are applied by the host
services, which the Supervisor does not manage aside from streaming
their service logs to the dashboard.

Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-12-09 18:43:51 -08:00
Christina Ying Wang
54fcfa22a7 Support "os" key with object values in ConfigJsonConfigBackend
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-12-09 18:29:26 -08:00
Christina Ying Wang
9ec45a724b Add tests for ConfigJsonConfigBackend
Also deprecate the path-getting method and remove the OS version check.
The OS version itself is not used in ConfigJsonConfigBackend, so the
check appears to have served only to confirm that config.json exists
during class init, as the OS version is a field that is always present
in a valid config.json.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-12-09 18:29:26 -08:00
Christina Ying Wang
8f3eeff72d Stream logs from last SV's State.FinishedAt, process uptime otherwise
This will catch any container or host logs between Supervisor runs. If
FinishedAt is invalid (0), the last sent timestamp is already set (i.e.
this isn't the first time logMonitor.start() has been called), or
the Supervisor container metadata couldn't be acquired, use the
Supervisor process uptime as the default. This has the downside of
missing any logs generated during SV downtime, but at least
means the log-streamer can proceed without error.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-12-06 07:46:38 -08:00
Christina Ying Wang
fb6fa9b16c Add ability to stream logs from host services to cloud
Add `os-power-mode.service`, `nvpmodel.service`, and `os-fan-profile.service`
which report status from applying power mode and fan profile configs as read
from config.json. The Supervisor sets these configs in config.json for these
host services to pick up and apply.

Also add host log streaming from `jetson-qspi-manager.service` as it
will very soon be needed for Jetson Orins.

Relates-to: #2379
See: balena-io/open-balena-api#1792
See: balena-os/balena-jetson-orin#513
Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-12-06 07:45:43 -08:00
Christina Ying Wang
c610710f03 Move logger.ts into logging/index.ts
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-12-05 21:55:09 -08:00
Christina Ying Wang
e62e245fc7 Modify log monitor logging to be more generic
Includes other host services in addition to balena.service

Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-12-05 09:11:04 -08:00
Felipe Lalanne
a2d4b31b23
Take update locks for host-config changes
This adds update-lock support to hostname changes via the host-config
endpoint, in addition to proxy changes, as changing the hostname may
cause the OS to restart the engine.

Change-type: minor
2024-12-03 15:07:24 -03:00
Felipe Lalanne
8b3b9a5b7b
Respect lockOverride when using withLock
2024-11-27 16:40:58 -03:00
Felipe Lalanne
9c09329b86
Clean up remaining locks on state settle
Locks could remain from a previous supervisor run that didn't get to
settle the state. This ensures that cleanup will happen for remaining
locks every time the state is settled.

Change-type: patch
2024-11-27 16:40:58 -03:00
Felipe Lalanne
3c6e9dd209
Refactor update-locks implementation
The refactor simplifies the implementation and ensures that locks per
app can only be held by one supervisor task at a time.
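One way to get the "one task per app at a time" guarantee within a single process is a promise-chained mutex keyed by app ID. The sketch below assumes that approach. `AppTaskQueue` and `runExclusive` are hypothetical names, and the sketch does not model the lockfiles that update locks also use.

```typescript
// Serialize tasks per app ID so at most one runs at a time for a given app.
class AppTaskQueue {
  private tails = new Map<number, Promise<void>>();

  async runExclusive<T>(appId: number, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(appId) ?? Promise.resolve();
    let release!: () => void;
    const done = new Promise<void>((resolve) => {
      release = resolve;
    });
    // Queue behind whatever is currently holding (or waiting for) this app.
    const tail = previous.then(() => done);
    this.tails.set(appId, tail);

    await previous;
    try {
      return await task();
    } finally {
      release();
      // Drop the entry if nobody queued behind this task.
      if (this.tails.get(appId) === tail) {
        this.tails.delete(appId);
      }
    }
  }
}
```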

Change-type: patch
2024-11-27 16:40:50 -03:00
Felipe Lalanne
d8f54c05e7
Refactor lockfile module
Update interfaces for clarity.

Change-type: patch
2024-11-15 18:25:50 -03:00
Christina Ying Wang
7e1cafa866 Firewall: allow DNS requests from custom Docker bridge networks
DNS requests were only allowed through the `balena0` interface, which is
the default Docker bridge used by containers that don't have a custom
bridge. However, the Supervisor creates a custom bridge for all
containers unless another network mode is specified, so DNS requests
from those containers were not allowed.

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-11-08 17:02:34 -08:00
Christina Ying Wang
3d3f659f16 Delete apps not in target from db by appUuid instead of appId
Resolve an issue in balenaMachine instances installed at <v14.1.0, where
a Supervisor app with a random UUID is kept in the target db because its
appId matches, even after the BM instance is upgraded to v14.1.0, which
patches in the correct reserved Supervisor app UUIDs. This results in two
Supervisors running on devices under the BM instance, and the problem
persists after the BM upgrade.
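A small filter shows the difference between matching on appId and matching on UUID. `AppRow` and `appsToDelete` are hypothetical names; the actual change operates on the Supervisor's database.

```typescript
interface AppRow {
  appId: number;
  uuid: string;
}

// Select stored apps that are no longer in the target. Matching on uuid
// (not appId) means a stale row that shares an appId with a target app,
// but has a different uuid, is still removed.
function appsToDelete(stored: AppRow[], target: AppRow[]): AppRow[] {
  const targetUuids = new Set(target.map((app) => app.uuid));
  return stored.filter((app) => !targetUuids.has(app.uuid));
}
```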

See: https://balena.fibery.io/search/T7ozi#Inputs/Pattern/Two-supervisors-are-running-on-device-3370
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-11-04 14:15:55 -08:00
Christina Ying Wang
ed1c18e369
Add support for init field from compose
Init supports boolean values, and is not included in the config when
not defined.
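Here is a sketch of the "only include when defined" rule. It assumes the value ends up in the Docker Engine API's `HostConfig.Init` field, and the helper name is hypothetical.

```typescript
// Minimal subset of the Docker Engine API HostConfig used here.
interface HostConfigSubset {
  Init?: boolean;
}

// Copy compose `init` into the host config only when it was defined.
function initToHostConfig(init: boolean | undefined): HostConfigSubset {
  return init === undefined ? {} : { Init: init };
}
```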

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-09-26 10:39:59 -03:00
Felipe Lalanne
e9a52e6786 Store rejected apps in the database
This moves from throwing an error when an app is rejected due to unmet
requirements (because of contracts) to storing the target with a
`rejected` flag on the database.

The application manager filters rejected apps when calculating steps to
prevent them from affecting the current state. The state engine uses the
rejection info to generate the state report.

Change-type: minor
2024-08-30 10:52:11 -04:00
Felipe Lalanne
227fee9941 Set the app update status when reporting state
Change-type: minor
2024-08-30 10:52:11 -04:00
Felipe Lalanne
48e526ec43 Refactor contracts validation code
This updates the interfaces on lib/contracts and the validation in
the application-manager module.
2024-08-30 10:52:11 -04:00
Felipe Lalanne
e9f460fd75 Add update status to types
Change-type: minor
2024-08-30 10:52:11 -04:00
Felipe Lalanne
788afee9a1
Remove unused patchDevice function
This function was a remnant of the dependent devices code that was
removed in #2105.

Change-type: patch
2024-08-29 10:34:43 -04:00
Christina Ying Wang
eaa07e97a9 Add support for redsocks dnsu2t config
Users may specify dnsu2t config by including a `dns` field
in the `proxy` section of PATCH /v1/device/host-config's body:
```
{
  network: {
    proxy: {
      dns: '1.1.1.1:53',
    }
  }
}
```

If `dns` is a string, ADDRESS and PORT are required and should be
in the format `ADDRESS:PORT`. The endpoint will error with
code 400 if either ADDRESS or PORT is missing.

`dns` may also be a boolean. If true, defaults will be configured.
If false, the dns configuration will be removed.

If `proxy` is patched to empty, `dns` will be removed regardless
of its current or input configs, as `dns` depends on an active
redsocks proxy to function.
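The validation rules above can be sketched as a small parser. The `DnsConfig` result type, `BadRequestError` and `dnsForProxy` are hypothetical. The default dnsu2t values are not reproduced, and the parser applies only the "missing ADDRESS or PORT" check from the description.

```typescript
// Hypothetical result type; the real Supervisor writes a redsocks dnsu2t block.
type DnsConfig =
  | { kind: 'set'; address: string; port: string }
  | { kind: 'defaults' }
  | { kind: 'remove' };

class BadRequestError extends Error {
  readonly statusCode = 400;
}

function parseDns(dns: string | boolean): DnsConfig {
  if (typeof dns === 'boolean') {
    // true: configure defaults; false: remove the dns configuration.
    return dns ? { kind: 'defaults' } : { kind: 'remove' };
  }
  // Split on the last ':' so the port is whatever follows it.
  const sep = dns.lastIndexOf(':');
  const address = sep === -1 ? dns : dns.slice(0, sep);
  const port = sep === -1 ? '' : dns.slice(sep + 1);
  if (address === '' || port === '') {
    throw new BadRequestError(`Invalid dns "${dns}": expected ADDRESS:PORT`);
  }
  return { kind: 'set', address, port };
}

// dns depends on an active redsocks proxy, so an empty proxy always removes it.
// `undefined` here means "leave the current dns config unchanged".
function dnsForProxy(proxy: { dns?: string | boolean; [key: string]: unknown }): DnsConfig | undefined {
  if (Object.keys(proxy).length === 0) {
    return { kind: 'remove' };
  }
  return proxy.dns === undefined ? undefined : parseDns(proxy.dns);
}
```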

Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-08-28 14:01:51 -07:00
Christina Ying Wang
8bf346a6fd Parse dnsu2t block to dns config
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-08-28 13:51:46 -07:00
Christina Ying Wang
b775f8f14d Stringify dns subsection of redsocks input config to dnsu2t
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-08-28 13:51:46 -07:00
Christina Ying Wang
e724f60beb Strip additional fields from HostConfiguration type
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-08-28 13:51:46 -07:00
Christina Ying Wang
51e59725f8 Add unit test for usingInferStepsLock
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-08-26 13:44:51 -07:00
Christina Ying Wang
3cebfa9f78 Revert PR #2364
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-08-22 14:31:35 -07:00
Christina Ying Wang
fc6927e53d Avoid unnecessary config calls during Supervisor init
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2024-08-20 19:11:14 -07:00
Felipe Lalanne
b088b78a3e
Do not write noProxy to redsocks.conf
This fixes a regression introduced by the refactor in #2329 where
`noProxy` was being included in the data added to redsocks.conf.

Change-type: patch
2024-08-08 11:59:20 -04:00
Felipe Lalanne
a255001c2e
Verify that LED_FILE exists on blinking setup
Before v1, the blinking module would not throw when the passed LED file
did not exist. This change checks that the file exists and defaults to
`/dev/null` otherwise.
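Here is a minimal sketch of the existence check using Node's `fs/promises` `access`. `resolveLedFile` is a hypothetical name, and the Supervisor may structure this differently.

```typescript
import { access } from 'node:fs/promises';

// Fall back to /dev/null when the configured LED file is missing, so that
// writes from the blink logic are discarded instead of failing.
async function resolveLedFile(ledFile: string): Promise<string> {
  try {
    // Default mode (F_OK) only checks that the path exists.
    await access(ledFile);
    return ledFile;
  } catch {
    return '/dev/null';
  }
}
```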

Change-type: patch
2024-08-07 15:33:07 -04:00
Felipe Lalanne
d789e5bb77
Avoid leaking memory on deep promise recursions
The following pattern
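```ts
import { setTimeout } from 'timers/promises';

const delay = 1000; // polling interval in ms (illustrative value)

async function longRunning() {
  // do something
  await setTimeout(delay);
  // Awaiting the recursive call keeps this promise pending until the next
  // one settles, so the chain grows with every iteration.
  await longRunning();
}
```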

is regularly used for long-running operations in the supervisor (e.g.
polling the target state). We recently discovered that this pattern can
slowly leak memory, as it essentially creates an infinite promise chain.
Using `void longRunning()` instead breaks the chain and avoids the issue.
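For comparison, here is a sketch of the fixed pattern. The counter and the `maxIterations` stop condition are hypothetical additions that let the example terminate. In a real loop, errors must be caught inside the function, because nothing awaits the detached call.

```typescript
import { setTimeout } from 'timers/promises';

// Hypothetical counter and stop condition so this sketch terminates.
let iterations = 0;

async function longRunning(delayMs: number, maxIterations: number): Promise<void> {
  iterations++;
  // do something, e.g. poll the target state
  await setTimeout(delayMs);
  if (iterations < maxIterations) {
    // Not awaited: this call's promise resolves now instead of waiting on
    // the next iteration, so no chain of pending promises builds up.
    void longRunning(delayMs, maxIterations);
  }
}
```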

This commit fixes all those instances where the pattern was used.

Change-type: patch
2024-07-31 18:39:29 -04:00
Felipe Lalanne
8bc08750e9
Use promises for setup/writing for logging backend
The balena logging backend now uses async functions to set up the
connection and write messages to the request stream. This adds some
backpressure on `log` calls by the log monitor module, preventing a
very aggressive container from causing the supervisor to waste CPU cycles
just dropping messages.
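With a Node writable stream, backpressure usually means waiting for `'drain'` whenever `write()` returns `false`. The helper below is a hypothetical sketch of that idea, not the logging backend's actual code.

```typescript
import { once } from 'node:events';
import { Writable } from 'node:stream';

// When the stream's internal buffer is full, wait for 'drain' so a fast
// producer (e.g. a chatty container) is made to wait for the connection
// instead of running ahead of it.
async function writeWithBackpressure(stream: Writable, chunk: string): Promise<void> {
  if (!stream.write(chunk)) {
    await once(stream, 'drain');
  }
}
```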

Change-type: patch
2024-07-30 10:51:19 -04:00
Felipe Lalanne
f3fcb0db7a
Improve the LogBackend interface
This makes the LogBackend `log` method async in
preparation for upcoming changes that will use backpressure from the
connection to delay logging coming from containers.

This also removes the unnecessary imageId field from the LogMessage type.

Change-type: patch
2024-07-30 10:51:19 -04:00
Felipe Lalanne
5af948483a
Use stream pipeline instead of pipe
This also removes the use of JSONStream from the monitor module
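The following sketch contrasts with `.pipe()`. The promise-based `pipeline` from `stream/promises` forwards errors from any stage, destroys all streams on failure, and resolves once the destination finishes. `uppercaseAll` and its `failOn` parameter are hypothetical and exist only to show both paths.

```typescript
import { Readable, Transform, Writable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

async function uppercaseAll(lines: string[], failOn?: string): Promise<string[]> {
  const out: string[] = [];
  await pipeline(
    Readable.from(lines),
    new Transform({
      objectMode: true,
      transform(chunk, _encoding, callback) {
        if (chunk === failOn) {
          // An error here rejects the whole pipeline, unlike with .pipe().
          callback(new Error(`failed on ${String(chunk)}`));
          return;
        }
        callback(null, String(chunk).toUpperCase());
      },
    }),
    new Writable({
      objectMode: true,
      write(chunk, _encoding, callback) {
        out.push(String(chunk));
        callback();
      },
    }),
  );
  return out;
}
```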

Change-type: patch
2024-07-30 10:51:19 -04:00
Felipe Lalanne
dbacca977a
Do not use DB to store container logs info
This removes the dependence of the supervisor on the containerLogs
database for remembering the last sent timestamp. This commit instead
uses the supervisor startup time as the initial time for log retrieval.
This might result in some logs missing for services that start
before the supervisor after a boot, or if the supervisor restarts.
However, this seems like an acceptable trade-off, as the current
implementation seems to make things worse in resource-constrained
environments.

We'll move storing the last sent timestamp to a better storage medium in
a future commit.

Change-type: minor
2024-07-30 10:51:18 -04:00
Pagan Gazzard
4976578a83 Improve log message typing
Change-type: patch
2024-07-17 11:14:17 +01:00
Pagan Gazzard
c5d0eafea9 Logs: only truncate the message if it's possible it will need it
Change-type: patch
2024-07-16 18:09:12 -04:00