The Target.lastFetch time compared when performing the healthcheck
resets any time a poll is attempted no matter the outcome. This changes
the behavior so the time is reset only on a successful poll.
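
A minimal sketch of the intended behavior, for illustration only. Apart
from `lastFetch`, every name here (`PollState`, `pollOnce`, `fetchTarget`)
is hypothetical and not taken from the Supervisor source:

```typescript
// Hypothetical state holder; only `lastFetch` is named in this change
interface PollState {
	lastFetch: number; // timestamp (ms) of the last successful poll
}

async function pollOnce(
	state: PollState,
	fetchTarget: () => Promise<void>,
	now: () => number = Date.now,
): Promise<boolean> {
	try {
		await fetchTarget();
		// Only a successful poll refreshes the time read by the healthcheck
		state.lastFetch = now();
		return true;
	} catch {
		// A failed attempt leaves lastFetch untouched
		return false;
	}
}
```

With this shape, repeated failed polls let `lastFetch` grow stale, so a
healthcheck comparing against it can notice the failure.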
Change-type: patch
This was mistakenly increased due to confusion between the timeout for
requests to the supervisor's api vs the timeout for requests from the
supervisor to the balenaCloud api. This separates the two configs and
documents the difference between the timeouts whilst also decreasing
the timeout for balenaCloud api requests to the correct/expected value.
Change-type: patch
If the Supervisor receives a 401 Unauthorized from the delta server
when requesting a delta image location, we should surface the error
instead of falling back to a regular pull immediately, as there could
be an issue with the delta auth token, which refreshes after
DELTA_TOKEN_TIMEOUT (10min), or some other edge case.
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
This prevents an image download error loop where the delta image on the delta server is present,
but some aspect of the delta image or the base image on the device does not match up, causing
the delta to fail to be applied to the base image.
Delta apply errors don't raise status codes as they are thrown from the Engine (although they should),
so if an error with a status code is raised during this time, throw an error to the handler
indicating that the delta should be retried until success. Errors with status codes raised during
this time are largely network related, so falling back to a regular pull won't improve anything.
Upon delta apply errors exceeding DELTA_APPLY_RETRY_COUNT, revert to a regular pull.
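
The decision described above can be sketched roughly as follows. This is
an illustration under assumptions: the value given to
`DELTA_APPLY_RETRY_COUNT`, the `DeltaError` shape, and the
`nextActionOnDeltaError` helper are all hypothetical, and `applyFailures`
is assumed to count apply failures including the current one.

```typescript
// Assumed value for illustration; the real constant lives in the Supervisor
const DELTA_APPLY_RETRY_COUNT = 3;

type DeltaAction = 'retry-delta' | 'regular-pull';

// Hypothetical error shape: Engine apply errors carry no status code
interface DeltaError {
	statusCode?: number;
}

function nextActionOnDeltaError(
	err: DeltaError,
	applyFailures: number,
): DeltaAction {
	if (err.statusCode != null) {
		// Largely network related, so a regular pull would not help
		return 'retry-delta';
	}
	// Engine apply error: keep retrying until the count is exceeded
	return applyFailures > DELTA_APPLY_RETRY_COUNT
		? 'regular-pull'
		: 'retry-delta';
}
```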
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
If the delta server responds immediately with HTTP 4xx upon requesting a delta image,
this means the server is not able to supply the resource, so fall back to a regular pull
immediately.
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
This label can be used by user services to indicate that a reboot is
required after the install of a service in order to fully apply an update.
Change-type: minor
This was on device-config before, but we'll need to set the reboot
breadcrumb from the application-manager as well when we introduce
`requires-reboot` as a label.
Change-type: patch
Move the device-config module to the device-state folder and export only
those functions that are needed elsewhere in the codebase.
This moves us closer to making the device-state module the only way to
modify application and configuration.
Change-type: patch
This fixes a regression where dependencies would only be started in
order and would start the dependent service if its dependency had been
started at some point in the past, regardless of the running condition.
This makes the behavior more consistent with docker compose where the
[dependency needs to be
running or healthy](69a83d1303/pkg/compose/convergence.go (L441)) for the service to be started.
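
As a rough sketch of the check (a loose reading of "running or healthy",
not the actual implementation; `ContainerState` and `canStart` are
hypothetical names):

```typescript
// Hypothetical view of a dependency's current container state
interface ContainerState {
	running: boolean;
	healthy?: boolean; // only meaningful when a healthcheck is defined
}

function canStart(
	dependsOn: string[],
	current: Record<string, ContainerState | undefined>,
): boolean {
	return dependsOn.every((name) => {
		const dep = current[name];
		// A dependency that is missing, or that only ran at some point in
		// the past, does not satisfy the condition
		return dep != null && (dep.running || dep.healthy === true);
	});
}
```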
Change-type: patch
This config backend uses ConfigJsonConfigBackend to update
os.power and os.fan subfields under the "os" key, in order
to set power and fan configs. The expected format for os.power
and os.fan settings is:
```
{
	os: {
		power: {
			mode: string
		},
		fan: {
			profile: string
		}
	}
}
```
There may be other keys in os which are not managed by the Supervisor,
so PowerFanConfig backend doesn't read or write to them. Extra keys in os.power
and os.fan are ignored when getting boot config and removed when setting
boot config.
After this backend writes to config.json, host services os-power-mode
and os-fan-profile pick up the changes, on reboot in the former's case
and at runtime in the latter's case. The changes are applied by the host
services, which the Supervisor does not manage aside from streaming
their service logs to the dashboard.
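
The key handling described above could look roughly like this. It is a
sketch under assumptions rather than the backend's code: `readPowerFan`
and `writePowerFan` are hypothetical helpers, and dropping `os.power` or
`os.fan` when the target omits them is an assumed behavior.

```typescript
type JsonObject = Record<string, unknown>;
type PowerFan = { power?: { mode: string }; fan?: { profile: string } };

function isObject(v: unknown): v is JsonObject {
	return typeof v === 'object' && v !== null && !Array.isArray(v);
}

// Read only the managed subfields; extra keys in os.power/os.fan are ignored
function readPowerFan(os: JsonObject): PowerFan {
	const result: PowerFan = {};
	const power = os['power'];
	const fan = os['fan'];
	if (isObject(power) && typeof power['mode'] === 'string') {
		result.power = { mode: power['mode'] };
	}
	if (isObject(fan) && typeof fan['profile'] === 'string') {
		result.fan = { profile: fan['profile'] };
	}
	return result;
}

// Replace os.power and os.fan, leaving unmanaged keys in os untouched
function writePowerFan(os: JsonObject, target: PowerFan): JsonObject {
	const next: JsonObject = { ...os };
	delete next['power'];
	delete next['fan'];
	return { ...next, ...readPowerFan(target) };
}
```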
Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
Also deprecate path-getting method, and remove OS version check.
The OS version itself is not used in ConfigJsonConfigBackend, so
it seems the OS version check is to confirm the existence of config.json
during class init, because OS version is a field that's always there
in a valid config.json.
Signed-off-by: Christina Ying Wang <christina@balena.io>
This will catch any container or host logs between Supervisor runs. If
FinishedAt is invalid (0), the last sent timestamp is already set (i.e.
this isn't the first time logMonitor.start() has been called), or
the Supervisor container metadata couldn't be acquired, use the
Supervisor process uptime as the default. This has the downside of
missing any logs generated during Supervisor downtime, but at least
means the log-streamer can proceed without error.
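
A sketch of the selection logic as described, following the wording above
literally. `LogStartInputs` and `initialLogTimestamp` are hypothetical
names, and treating a non-positive or unparseable `FinishedAt` as invalid
is an assumption:

```typescript
interface LogStartInputs {
	finishedAt?: string; // from container metadata; undefined if unavailable
	lastSentTimestamp: number; // 0 until logMonitor.start() has run once
	nowMs: number;
	uptimeMs: number; // Supervisor process uptime
}

function initialLogTimestamp(i: LogStartInputs): number {
	// Default: the moment the current Supervisor process started
	const processStart = i.nowMs - i.uptimeMs;
	if (i.lastSentTimestamp > 0 || i.finishedAt == null) {
		return processStart;
	}
	const finished = Date.parse(i.finishedAt);
	// Unparseable or zero-valued FinishedAt falls back to the default
	return isFinite(finished) && finished > 0 ? finished : processStart;
}
```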
Signed-off-by: Christina Ying Wang <christina@balena.io>
Add `os-power-mode.service`, `nvpmodel.service`, and `os-fan-profile.service`
which report status from applying power mode and fan profile configs as read
from config.json. The Supervisor sets these configs in config.json for these
host services to pick up and apply.
Also add host log streaming from `jetson-qspi-manager.service` as it
will very soon be needed for Jetson Orins.
Relates-to: #2379
See: balena-io/open-balena-api#1792
See: balena-os/balena-jetson-orin#513
Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
This adds update-lock support to hostname changes via the host-config
endpoint, in addition to proxy changes, as changing the hostname may
cause an engine restart from the OS.
Change-type: minor
Locks could remain from a previous supervisor run that didn't get to
settle the state. This ensures that cleanup will happen for remaining
locks every time the state is settled.
Change-type: patch
We only allow DNS requests through `balena0` interface, but this
is the default Docker bridge which is used for containers that
don't have a custom bridge. However, the Supervisor creates a
custom bridge for all containers unless another network mode is
specified.
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
Resolve an issue in balenaMachine (BM) instances that were installed at <v14.1.0,
in which a Supervisor app with a random UUID is kept in the target db because its appId
is unchanged, even after the BM instance has upgraded to v14.1.0, which patches in
the correct reserved Supervisor app UUIDs. This results in two Supervisors running
on devices under the BM instance, which persists after the BM upgrade.
See: https://balena.fibery.io/search/T7ozi#Inputs/Pattern/Two-supervisors-are-running-on-device-3370
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
Init supports boolean values, and is not included in the config when
not defined.
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
This moves from throwing an error when an app is rejected due to unmet
requirements (because of contracts) to storing the target with a
`rejected` flag on the database.
The application manager filters rejected apps when calculating steps to
prevent them from affecting the current state. The state engine uses the
rejection info to generate the state report.
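
Roughly, and with hypothetical names (`TargetApp`, `appsForSteps`,
`rejectedForReport`) standing in for the real modules, the split looks
like this:

```typescript
// Hypothetical target record; only the `rejected` flag comes from this change
interface TargetApp {
	appId: number;
	name: string;
	rejected: boolean;
}

// Application manager side: rejected apps never contribute steps
function appsForSteps(targets: TargetApp[]): TargetApp[] {
	return targets.filter((app) => !app.rejected);
}

// State engine side: the same flag feeds the state report
function rejectedForReport(targets: TargetApp[]): number[] {
	return targets.filter((app) => app.rejected).map((app) => app.appId);
}
```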
Change-type: minor
Users may specify dnsu2t config by including a `dns` field
in the `proxy` section of PATCH /v1/device/host-config's body:
```
{
	network: {
		proxy: {
			dns: '1.1.1.1:53',
		}
	}
}
```
If `dns` is a string, ADDRESS and PORT are required and should be
in the format `ADDRESS:PORT`. The endpoint will error with
code 400 if either ADDRESS or PORT is missing.
`dns` may also be a boolean. If true, defaults will be configured.
If false, the dns configuration will be removed.
If `proxy` is patched to empty, `dns` will be removed regardless
of its current or input configs, as `dns` depends on an active
redsocks proxy to function.
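
The input rules above can be sketched as a small parser. This is an
illustration, not the endpoint's code: `DnsChange`, `BadRequestError`,
and `parseDns` are hypothetical, and requiring a numeric PORT is an
assumption beyond what is stated.

```typescript
type DnsChange =
	| { action: 'set'; address: string; port: number }
	| { action: 'defaults' }
	| { action: 'remove' };

// Hypothetical error type mapped to an HTTP 400 response
class BadRequestError extends Error {
	readonly statusCode = 400;
}

function parseDns(dns: string | boolean): DnsChange {
	if (typeof dns === 'boolean') {
		// true configures defaults, false removes the dns configuration
		return dns ? { action: 'defaults' } : { action: 'remove' };
	}
	const idx = dns.lastIndexOf(':');
	const address = idx === -1 ? dns : dns.slice(0, idx);
	const port = idx === -1 ? '' : dns.slice(idx + 1);
	if (address === '' || !/^\d+$/.test(port)) {
		throw new BadRequestError('dns must be in the format ADDRESS:PORT');
	}
	return { action: 'set', address, port: Number(port) };
}
```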
Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
Before v1, the blinking module would not throw when the passed LED file
did not exist. This change checks for file existence and defaults to
`/dev/null` otherwise.
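
A minimal sketch of the check; `resolveLedFile` is a hypothetical helper
and the injectable `exists` parameter exists only to make the example
easy to exercise:

```typescript
import { existsSync } from 'fs';

function resolveLedFile(
	ledFile: string,
	exists: (path: string) => boolean = existsSync,
): string {
	// Fall back to /dev/null so blinking becomes a no-op instead of throwing
	return exists(ledFile) ? ledFile : '/dev/null';
}
```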
Change-type: patch
The following pattern
```ts
async function longRunning() {
	// do something
	await setTimeout(delay);
	await longRunning();
}
```
is regularly used for long-running operations on the Supervisor (e.g.
polling target state). We have
recently discovered that this pattern can slowly leak memory, as it
essentially creates an infinite promise chain. Using `void longRunning()` breaks
the chain and avoids the issue.
This commit fixes all those instances where the pattern was used.
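
For comparison with the pattern above, a sketch of the fixed form. The
`work` and `shouldStop` parameters are hypothetical additions that make
the example bounded and testable, and `setTimeout` is assumed to be the
promise-based one from `timers/promises`:

```typescript
import { setTimeout } from 'timers/promises';

async function longRunning(
	delay: number,
	work: () => void,
	shouldStop: () => boolean,
): Promise<void> {
	work();
	if (shouldStop()) {
		return;
	}
	await setTimeout(delay);
	// `void` starts the next iteration without awaiting it, so this call's
	// promise settles now instead of waiting on every future iteration
	void longRunning(delay, work, shouldStop);
}
```

Each call resolves once it has scheduled its successor, so no promise
has to stay pending for the lifetime of the loop.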
Change-type: patch
The balena logging backend now uses async functions to set up the
connection and write messages to the request stream. This adds some
backpressure on `log` calls by the log monitor module, to prevent a
very aggressive container causing the Supervisor to waste CPU cycles just
dropping messages.
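
One common way to get this kind of backpressure from a Node.js writable
stream is to wait for `drain` when `write()` returns false. The sketch
below shows that general idiom; `writeWithBackpressure` is a hypothetical
helper and this is not necessarily how the backend is written:

```typescript
import { once } from 'events';
import { Writable } from 'stream';

async function writeWithBackpressure(
	stream: Writable,
	message: string,
): Promise<void> {
	// write() returns false once the internal buffer is over highWaterMark
	if (!stream.write(message)) {
		// Pause the caller until the stream has flushed its buffer
		await once(stream, 'drain');
	}
}
```

A caller that awaits each write is slowed to the pace of the
connection rather than queuing messages that will later be dropped.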
Change-type: patch
This makes the LogBackend `log` method an async method in
preparation for upcoming changes that will use backpressure from the
connection to delay logging coming from containers.
This also removes the unnecessary imageId from the LogMessage type.
Change-type: patch
This removes the dependence of the supervisor on the containerLogs
database for remembering the last sent timestamp. This commit instead
uses the supervisor startup time as the initial time for log retrieval.
This might result in some logs missing for services that may start
before the supervisor after a boot, or if the supervisor restarts.
However this seems like an acceptable trade-off as the current
implementation seems to make things worse in resource-constrained
environments.
We'll move storing the last sent timestamp to a better storage medium in
a future commit.
Change-type: minor