The state engine and preloading is performed before the device gets a
chance to register, while this is desirable for preloaded apps, it
introduces a delay on registration which is known to cause issues since
the VPN is also trying to connect at the same time.
This triggers a simultaneous start of the device engine, the API binder
and the supevisor API to avoid delays.
Change-type: patch
Update-lock tests now use the actual filesystem for testing, instead of
relying on stubs and spies.
This commit also fixes a small bug with update-lock that would cause a
`PromiseRejectionHandledWarning` when the lock callback would throw.
Now the tests are ran against the actual docker engine instead of
against mockerode.
The new tests actually caught a bug in
`volumeManager.removeOrphanedVolumes`, where that function would try to
remove volumes for stopped containers, causing an exception.
This commit also fixes that bug.
The supervisor used to rely on specific event reporting for identifying
issues at runtime. As the platform has grown, it has become much more
difficult to get any signal from the event noise. Recently the API side
for these events has been disabled, meaning these events only
contribute to bandwidth consumption. This commit disables the
event reporting feature of the supervisor which will be most likely
replaced by something like Sentry in the near future.
Change-type: minor
The supervisor supports target state `running: false` for services.
This state indicates that the service should be stopped if already
running, or that the container should just be created and never started
if the container does not exist. This commit fixes the latter behavior.
Although nothing in our platform currently sends this target state, this
enables some potential use cases, e.g. only starting some services
in manufacturing and starting the rest of the services when the device
actually connects.
Change-type: patch
Closes: #2014
`withDefault` is a type helper that allows to create a type that
defaults to a default value when trying to decode a nullish value.
That type was not correctly working with boolean types, causing `false`
values to be replaced by true. This would specifically cause issues when
parsing the target state, where a `running: false` in a service would
become a `running: true` due to the type decoding.
Change-type: patch
Under some conditions, an aarch64 device may get a reference to a armv7hf
supervisor on the target state. One of the ways this can happen is if
an aarch64 device is added to an armv7hf fleet and the target supervisor
is set before the device fully provisions.
If that happens, the previous filtering for the supervisor app (which
relied on the architecture in device-type.json) would
fail and the user would end up with two supervisor containers, one
running correctly and the other crash looping.
This fixes the filtering and just checks if the supervisor uuid/service
name belongs to a group of known uuids.
Closes: #2006
Change-type: patch
The supervisor uses the following pattern for async module
initialization
```typescript
// module.ts
export const initialised = (async () => {
// do some async initialization
})();
// somewhere else
import * as module from 'module';
async function setup() {
await module.initialise;
}
```
The above pattern means that whenever the module is imported, the
initialisation procedure will be ran, which is an anti-pattern.
This converts any instance of this pattern into a function
```typescript
export const initialised = _.once(async () => {
// do some async initialization
});
```
And anywhere else on the code it replaces the call with a
```typescript
await module.initialised();
```
Change-type: patch
This mitigates an edge case bug introduced in v13.1.3 where services that
are slow to exit may get stuck in a state of Downloaded if a service var is
changed then reverted rapidly. More detailed description in linked issue.
Change-type: patch
Closes: #1991
Signed-off-by: Christina Wang <christina@balena.io>
State report errors contribute to the supervisor failing healthchecks
and being restarted by the engine. There is not evidence of this
improving the connectivity situation and it is likely to make things
worst for the API as the first report is much more expensive than
subsequent partial reports.
Change-type: patch
Closes: #1986
Some libraries, like [proper-lockfile](https://www.npmjs.com/package/proper-lockfile)
use directories instead of files for locking. This PR allows the supervisor to be able to
work with those types of locks when lock override is requested.
Closes: #1978
Change-type: patch
Host config shouldn't be tied to applications in the first place, but
needs to be done so because it uses update locks to determine when it's
safe to patch host config, and update locks are tied to apps.
Change-type: patch
Signed-off-by: Christina Wang <christina@balena.io>
We don't need to read the host's hostname through /mnt/root/etc/hostname,
because the hostname is written to config.json on a change. When the hostname
has never changed, it won't be found in config.json, so we can default to
the Supervisor container's /etc/hostname as it will match the host's
/etc/hostname, the network mode being `host`.
Closes: #1968
Change-type: patch
Signed-off-by: Christina Wang <christina@balena.io>
This fixes a race condition that could occur with the first current
state report, where if the device managed to send the current state
report first, then the device name on the cloud would be set to `local`
(see #1959).
Closes: #1959
Change-type: patch
balena-compose already supports this, and with this PR, Supervisor can
have the option of using HostConfig.Mounts for internal bind mounts such as
ones added by feature labels. This will be handled in a future PR.
The only blocker to having users use long syntax is adding this feature
to target state. This PR does not add that feature.
Relates-to: https://github.com/balena-os/balena-supervisor/pull/1780
Relates-to: https://github.com/balena-os/balena-engine/issues/220
Relates-to: #1933
Change-type: patch
Signed-off-by: Christina Wang <christina@balena.io>
Splash image backend would throw if the image is not a valid png during
the write step. This could prevent the device from provisioning if some
corruption happens at some point.
Change-type: patch
This commit updates all backends that write to /mnt/boot to do it
through a new `lib/host-utils` module. Writes are now done using write +
sync as rename is not an atomic operation in vfat.
The change also applies for writes through the `/v1/host-config`
endpoint.
Finally this change includes some improvements on tests.
Change-type: patch
This will ensure the restart policy specified is not violated
Change-type: patch
Closes: #1668
Signed-off-by: 20k-ultra <3946250+20k-ultra@users.noreply.github.com>
When disposing of resources which include Supervisor-created lockfiles,
only dispose of lockfiles for the specified user application.
Signed-off-by: Christina Wang <christina@balena.io>