The supervisor used to rely on specific event reporting for identifying
issues at runtime. As the platform has grown, it has become much more
difficult to get any signal from the event noise. Recently the API side
for these events has been disabled, meaning these events only
contribute to bandwidth consumption. This commit disables the
event reporting feature of the supervisor which will be most likely
replaced by something like Sentry in the near future.
Change-type: minor
The supervisor supports target state `running: false` for services.
This state indicates that the service should be stopped if already
running, or that the container should just be created and never started
if the container does not exist. This commit fixes the latter behavior.
Although nothing in our platform currently sends this target state, this
enables some potential use cases, e.g. only starting some services
in manufacturing and starting the rest of the services when the device
actually connects.
Change-type: patch
Closes: #2014
`withDefault` is a type helper that allows to create a type that
defaults to a default value when trying to decode a nullish value.
That type was not correctly working with boolean types, causing `false`
values to be replaced by true. This would specifically cause issues when
parsing the target state, where a `running: false` in a service would
become a `running: true` due to the type decoding.
Change-type: patch
Under some conditions, an aarch64 device may get a reference to a armv7hf
supervisor on the target state. One of the ways this can happen is if
an aarch64 device is added to an armv7hf fleet and the target supervisor
is set before the device fully provisions.
If that happens, the previous filtering for the supervisor app (which
relied on the architecture in device-type.json) would
fail and the user would end up with two supervisor containers, one
running correctly and the other crash looping.
This fixes the filtering and just checks if the supervisor uuid/service
name belongs to a group of known uuids.
Closes: #2006
Change-type: patch
The supervisor uses the following pattern for async module
initialization
```typescript
// module.ts
export const initialised = (async () => {
// do some async initialization
})();
// somewhere else
import * as module from 'module';
async function setup() {
await module.initialise;
}
```
The above pattern means that whenever the module is imported, the
initialisation procedure will be ran, which is an anti-pattern.
This converts any instance of this pattern into a function
```typescript
export const initialised = _.once(async () => {
// do some async initialization
});
```
And anywhere else on the code it replaces the call with a
```typescript
await module.initialised();
```
Change-type: patch
This mitigates an edge case bug introduced in v13.1.3 where services that
are slow to exit may get stuck in a state of Downloaded if a service var is
changed then reverted rapidly. More detailed description in linked issue.
Change-type: patch
Closes: #1991
Signed-off-by: Christina Wang <christina@balena.io>
State report errors contribute to the supervisor failing healthchecks
and being restarted by the engine. There is not evidence of this
improving the connectivity situation and it is likely to make things
worst for the API as the first report is much more expensive than
subsequent partial reports.
Change-type: patch
Closes: #1986
Some libraries, like [proper-lockfile](https://www.npmjs.com/package/proper-lockfile)
use directories instead of files for locking. This PR allows the supervisor to be able to
work with those types of locks when lock override is requested.
Closes: #1978
Change-type: patch
Host config shouldn't be tied to applications in the first place, but
needs to be done so because it uses update locks to determine when it's
safe to patch host config, and update locks are tied to apps.
Change-type: patch
Signed-off-by: Christina Wang <christina@balena.io>
We don't need to read the host's hostname through /mnt/root/etc/hostname,
because the hostname is written to config.json on a change. When the hostname
has never changed, it won't be found in config.json, so we can default to
the Supervisor container's /etc/hostname as it will match the host's
/etc/hostname, the network mode being `host`.
Closes: #1968
Change-type: patch
Signed-off-by: Christina Wang <christina@balena.io>
This fixes a race condition that could occur with the first current
state report, where if the device managed to send the current state
report first, then the device name on the cloud would be set to `local`
(see #1959).
Closes: #1959
Change-type: patch
balena-compose already supports this, and with this PR, Supervisor can
have the option of using HostConfig.Mounts for internal bind mounts such as
ones added by feature labels. This will be handled in a future PR.
The only blocker to having users use long syntax is adding this feature
to target state. This PR does not add that feature.
Relates-to: https://github.com/balena-os/balena-supervisor/pull/1780
Relates-to: https://github.com/balena-os/balena-engine/issues/220
Relates-to: #1933
Change-type: patch
Signed-off-by: Christina Wang <christina@balena.io>
Splash image backend would throw if the image is not a valid png during
the write step. This could prevent the device from provisioning if some
corruption happens at some point.
Change-type: patch
This commit updates all backends that write to /mnt/boot to do it
through a new `lib/host-utils` module. Writes are now done using write +
sync as rename is not an atomic operation in vfat.
The change also applies for writes through the `/v1/host-config`
endpoint.
Finally this change includes some improvements on tests.
Change-type: patch
This will ensure the restart policy specified is not violated
Change-type: patch
Closes: #1668
Signed-off-by: 20k-ultra <3946250+20k-ultra@users.noreply.github.com>
When disposing of resources which include Supervisor-created lockfiles,
only dispose of lockfiles for the specified user application.
Signed-off-by: Christina Wang <christina@balena.io>
The linked issue describes the Supervisor not cleaning up locks it creates due
to crashing at just the wrong time. After internal discussion we decided to
differentiate Supervisor-created lockfiles from user-created lockfiles by using
the `nobody` UID (65534) for Supervisor-created lockfiles.
As the existing NPM lockfile lib does not allow creating lockfiles atomically
with different UIDs, we move to using the lockfile binary, which is part of the
procmail package. To allow nonroot users to write to lock directories, permissions
are changed to allow write access by nonroot users.
See: https://www.flowdock.com/app/rulemotion/r-resinos/threads/gWMgK5hmR26TzWGHux62NpgJtVl
Change-type: minor
Closes: #1758
Signed-off-by: Christina Wang <christina@balena.io>
dmidecode for alpine 3.11 doesn't work in this device type. This change
moves to using `/proc/device-tree/product-sn` and
`/proc/device-tree/product-name` for these devices.
Resolves: #1916
Change-type: patch
Migration `M00008` had a bug with the check for legacy apps, which
resulted in devices that had at some point been updated from a single
container supervisor to get the error
```
Undefined binding(s) detected when compiling UPDATE. Undefined column(s): [appUuid] query
```
This adds a new migration with the fix to ensure broken fix the
inconsistent database state.
Change-type: patch
Closes: #1913
If an app is not in the target state means the supervisor no longer
has permissions to that app hence it cannot report on it. When moving
between apps, there is a transitional period where containers and images
from both apps can be in the current state, therefore filtering is
needed to prevent getting 401 errors from the API.
Starting with v3 state endpoint, the supervisor may receive the configuration
for the supervisor service on the target state. This commit allows the
supervisor to filter out the supervisor container from the current and target
state to let the update-balena-supervisor script handle the creation and update
of the supervisor container.
Updating and creating the supervisor container will be handled by a
future commit