Devices affected by the bug described in 1576, are also stuck with some
services in the `Downloaded` state, because the state engine does not
detect that the running services should be killed on a network change
even if they belong to a new release. This is a bug, which can be
replicated by the tests in this commit
Change-type: patch
These tests use the supervisor API to check that applying a target state
allows the device to eventually get to the desired target configuration.
This are high-level tests that work with real images and containers
using dind.
Change-type: patch
The actions now work by passing an intermediate state to the state
engine.
- doPurge first removes the user app from the target state and passes
that to the state engine for purging. Since intermediate state doesn't
remove images, this will have the effect of basically re-installing
the app.
- doRestart modifies the target state by first removing only the
services from the current state but keeping volumes and networks. This
has the same effect as before where services were stopped one by one
Change-type: patch
Network aliases are now compared checking that the target state is a
subset of the current state. This will prevent service restarts due to
additional aliases created by docker in the container.
Closes: #2134
Change-type: patch
When getting the service from the docker container, remove the
containerId from the list of aliases (which gets added by docker). This
will make it easier to use the current service state as a target.
This will help us remove the `safeStateClone` function in the API in a
future commit
Change-type: patch
There were multiple places in the state engine that skipped some
operations while in local mode. In reality, all it's needed while in
local mode is to skip image and volume deletion.
This commit simplifies application-manager and compose app to be more
local mode agnostic and instead making the image deletion and volume
deletion configurable via function arguments.
This also has the benefit to make the treatment of local mode
applications more similar to cloud mode applications, allowing for
API endpoints to function the same way both modes.
Change-type: patch
The OS since v2.82.6 will monitor changes to config.json and restart
the relevant services to apply the changes. There is no need to trigger
restart of the services via the supervisor. Users on older OS versions
will need to update their OS or restart the services manually as OS
loses support after 2y.
Change-type: patch
Closes: #2160
From: c0b4fafe84
Restart-service checks that both services have restarted in its test assertion, which is
incorrect as restart-service should only restart one service.
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
As the Supervisor is a privileged container, it has access to host /dev, and therefore has access
to boot, data, and state balenaOS partitions. This commit sets up the framework for the following:
- Finds the /dev partition that corresponds to each partition based on partition label
- Mounts the partitions into set mountpoints in the device
- Removes reliance on env vars and mountpoints provided by host's start-balena-supervisor script
- Simplifies host path querying by centralizing these queries through methods in lib/host-utils.ts
This particular changes env vars for and mounts the boot partition.
Since the Supervisor would no longer rely on container `run` arguments provided by a host script,
this change moves Supervisor closer to being able to start itself (Supervisor-as-an-app).
Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
The issue with the original Supervisor implementation of the firewall is that
on Supervisor start, the Supervisor flushes the INPUT chain of the filter table.
This doesn't play well with services that add to the INPUT chain on startup that
may start up before the Supervisor, such as certain NetworkManager connection
profiles. This change only replaces the BALENA-FIREWALL rule in the INPUT chain,
preserving the other rules as well as their order.
Closes: #1482
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
We have seen a few times devices with duplicated network names for some
reason. While we don't know the cause the networks get duplicates, this
can be disruptive for updates as trying to create a container referencing a duplicate
network results in a 400 error from the engine.
This commit finds and removes duplicate networks via the state engine,
this means that even if somehow a container could be referencing a
network that has been duplicated later somehow, this will remove the
container first.
While thies doesn't solve the problem of duplicate networks being
created in the first place, it will fix the state of the system to
correct the inconsistency.
Change-type: minor
Closes: #590
This includes:
- proxyvisor.js
- references in docs
- references device-state, api-binder, compose modules, API
- references in tests
The commit also adds a migration to remove the 4 dependent device tables from the DB.
Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
The wait-for-it script used during tests would setup a timer
that would send SIGUSR2 to the parent process after the timer ends.
Since node was ignoring additional signals, the timer ending would have
no effect after the node process had replaced the start script. However
when node has pid != 1, SIGUSR2 default behavior is to terminate the
process, meaning the tests would fail after 30 seconds.
The script is now updated so the timer is killed once the services are
ready for the tests.
The Raspberry Pi config.txt file defines the use of colon to configure
variables of the same name in different ports, for instance on those
devices with two hdmi ports. This syntax was previously not supported by
the supervisor. This change relaxes the syntax validation on config vars
to allow the use of the colon character.
Relates-to: #1573, #2046
Change-type: minor
This includes:
- /v1/apps/:appId/(stop|start)
- /v2/applications/:appId/(restart|stop|start)-service
Signed-off-by: Christina Ying Wang <christina@balena.io>
This also adds a 500 response with the old key if the API key
refresh was unsuccessful. Previously, if the key refresh was
unsuccessful, this would result in an UnhandledPromiseRejection.
This is a new interface.
Signed-off-by: Christina Ying Wang <christina@balena.io>
The supervisor had to chroot into the host root in order to read the
journal logs. This won't be possible anymore once the supervisor becomes
an app. This commit copies the journalctl binary and necessary libraries
from a debian image into the supervisor image in order to be able to use
the tool on runtime.
Change-type: patch
This means that dynamic import statements will emit actual `import`
statements rather than being translated to `require`, the benefit being
that we can now import ES modules via dynamic imports
Change-type: patch
This PR changes the way the supervisor reads and writes files from /mnt/boot. Reads will
now use the [fatrw utility](https://github.com/balena-os/fatrw/) as a way to minimize corruption of
files in the boot partition, and thus preventing possible bricking of the device.
Since this basically changes the way a lot of configurations are read, this work was being blocked because of
the way tests were being done. While there still remain a couple of legacy tests to be migrated, this PR disables
test:legacy tests when running npm run test, as the work on refactoring those tests is in progress (see #2048) and
fatrw integration is of higher priority.
Change-type: minor
getImagesForCleanup used to query the Engine for the Supervisor
image, which is unnecessary given that the Supervisor has access
to constants.supervisorImage. Thus, this Engine query is removed.
The method is simplified and made more clear, and
imageManager.isCleanupNeeded doesn't need to be stubbed in tests.
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
This eliminates chances of host-Docker address collision for apps such
as the Supervisor where all services have host networking.
Closes: #2062
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
This means that configuration backend tests no longer use stubs and
(mostly) avoid internal dependencies in the tests. Instead of stubs and
mock-fs, the tests use [testfs](https://github.com/balena-io-modules/mocha-pod#working-with-the-filesystem)
which allows working with a real filesystem and ensuring everything is
re-set between tests.
This is the last change needed in order to be able to merge #1971. Here is the list of changes
- [x] Migrate splash image backend tests
- [x] Migrate extlinux backend tests
- [x] Migrate config.txt backend tests
- [x] Migrate extra-uenv config tests
- [x] Migrate odmdata config tests
- [x] Migrate config utils tests
- [x] Migrate device-config tests
Change-type: patch
Also remove system interface check from ensureSupervisorNetwork.
Previously `ensure` was a Bluebird promise which wasn't awaited in
its composition step. This has been here for some time and may contribute
to issues with duplicate networks. The conversion to native Promises
allows `ensure` to be awaited, hopefully reducing instances of duplicate
networks.
Removing the system interface check for /sys/class/net/supervisor0
because it's superfluous given that the Engine creates the interface
with NetworkManager. It also makes testing a lot more difficult to set up
as /sys/class/net isn't a directory that can be written to for emulating
system interface creation / removal.
Relates-to: https://github.com/balena-os/balena-supervisor/issues/1110
Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
Previously it was set at /mnt/root/sys/class/net, which is
the same as /sys/class/net because Supervisor has a network
mode of `host`.
Signed-off-by: Christina Ying Wang <christina@balena.io>
Use isNotFoundError which converts an error of the default type
`unknown` into NotFoundError if the error is an instance of NotFoundError.
Thrown errors are of type `unknown` by default so we should use methods
with type guards for better type narrowing.
Signed-off-by: Christina Ying Wang <christina@balena.io>
This allows to test that the supervisor build actually runs and opens up the
possibility of running more exhaustive API tests against a working supervisor.
Change-type: patch
This excludes route tests or refactoring. Also, created tests
for API middleware.
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
Merged docker-utils and delta tests into a single test suite. They are
now ran as part of the integration tests using the real engine.
Change-type: patch
When code that is unit tested is part of a file that imports modules which
depend on the dbus module, this breaks the unit test environment because there
is no system socket set up, as the unit test mocha config doesn't import fixtures.ts.
For example, if we change src/compose/utils to import device-config or api-binder, both
of those modules import lib/dbus which invokes a dbus.getBus call at the root level. This
is problematic for unit testing.
We can get around the root-level dbus.getBus call by initializing dbus only when it's first
needed. The mocked-dbus test setup code can also be removed in favor of legacy mocha
hooks, which makes the dbus stubbing in the legacy test environment more clear.
We can remove these legacy hooks when all the legacy tests are migrated to unit/integration.
Signed-off-by: Christina Ying Wang <christina@balena.io>
This commit also changes the test:integration command to run integration
tests as different processes. This allows to avoid any test leaking into
each-other because of the use of singletons. This however has the side
effect of tests being slower, but that is a forcing function to refactor
the code.
Update-lock tests now use the actual filesystem for testing, instead of
relying on stubs and spies.
This commit also fixes a small bug with update-lock that would cause a
`PromiseRejectionHandledWarning` when the lock callback would throw.
Now the tests are ran against the actual docker engine instead of
against mockerode.
The new tests actually caught a bug in
`volumeManager.removeOrphanedVolumes`, where that function would try to
remove volumes for stopped containers, causing an exception.
This commit also fixes that bug.
This also needs to modify the test environment as database migrations
will look for `config.json` in the location given by the variable
`CONFIG_MOUNT_POINT`.
The volume tests now run against the actual docker engine setup via dind
Change-type: patch
The supervisor used to rely on specific event reporting for identifying
issues at runtime. As the platform has grown, it has become much more
difficult to get any signal from the event noise. Recently the API side
for these events has been disabled, meaning these events only
contribute to bandwidth consumption. This commit disables the
event reporting feature of the supervisor which will be most likely
replaced by something like Sentry in the near future.
Change-type: minor
`withDefault` is a type helper that allows to create a type that
defaults to a default value when trying to decode a nullish value.
That type was not correctly working with boolean types, causing `false`
values to be replaced by true. This would specifically cause issues when
parsing the target state, where a `running: false` in a service would
become a `running: true` due to the type decoding.
Change-type: patch
The supervisor uses the following pattern for async module
initialization
```typescript
// module.ts
export const initialised = (async () => {
// do some async initialization
})();
// somewhere else
import * as module from 'module';
async function setup() {
await module.initialise;
}
```
The above pattern means that whenever the module is imported, the
initialisation procedure will be ran, which is an anti-pattern.
This converts any instance of this pattern into a function
```typescript
export const initialised = _.once(async () => {
// do some async initialization
});
```
And anywhere else on the code it replaces the call with a
```typescript
await module.initialised();
```
Change-type: patch
This allows to run integration tests during development and on CI
with the right dependencies. There are several changes that this
involves, but the gist of it is that a test environment is setup using
`docker-compose.test.yml`. This file is loaded by `resin-ci` during the
build, and ensures that integration tests are ran after setting up all
requirements. This commit also defines a test environment command that
can be setup using `npm run test:env` in order to run tests in a local
development machine.
This sets up the new `test/unit` and `test/integration` folders
and starts classification of some of the test files.
Note that unit tests include, `fs-utils` and `system-info` tests.
While these tests interact with the filesystem, the implementation
of these modules is simple enough, and the tests are fast enough to
allow these tests to fall under the `unit` test category (according to
test/README)
Change-type: patch
We are refactoring the supervisor test suite into unit tests (for
algorithms an domain model tests) and integration
tests (for interaction with out-of-process dependencies).
This means the current test suite needs to be classified into
these two categories, and fixed whenever possible.
This commit moves the test suite under the `test/legacy` folder, this
folder should be progressively migrated and eventually removed.
Subsequent commits will begin to split these files into unit and
integration whenever possible.
Depends-on: #1996
Change-type: patch
This replaces all relative paths in the test suite (e.g
`../src/compose/service.ts`) with the aliased path configured through
tsconfig.
This is a big change but it doesn't affect any functionality
Currently, tests only can import source code modules through relative
paths `../../`. This makes it very difficult to refactor and organize
tests in folders as the paths change.
[tsconfig-paths](https://www.npmjs.com/package/tsconfig-paths) allows to
reference the source through an alias defined in the "paths" section of
tsconfig.json
The supervisor used to perform tests both for the transpiled code (after
tsc) and one for the typescript code (using
ts-node/register/transpile-only). There is not really a reason for this
and this added complexity to the test configuration. This used to make
testing harder, as the built code didn't include source maps, meaning
the tests did not point to the right code.
Since we want to split tests in unit and integration tests as the next
test improvement, it makes sense to simplify these commands before
adding more complexity.
Change-type: patch
This mitigates an edge case bug introduced in v13.1.3 where services that
are slow to exit may get stuck in a state of Downloaded if a service var is
changed then reverted rapidly. More detailed description in linked issue.
Change-type: patch
Closes: #1991
Signed-off-by: Christina Wang <christina@balena.io>
Some libraries, like [proper-lockfile](https://www.npmjs.com/package/proper-lockfile)
use directories instead of files for locking. This PR allows the supervisor to be able to
work with those types of locks when lock override is requested.
Closes: #1978
Change-type: patch
We don't need to read the host's hostname through /mnt/root/etc/hostname,
because the hostname is written to config.json on a change. When the hostname
has never changed, it won't be found in config.json, so we can default to
the Supervisor container's /etc/hostname as it will match the host's
/etc/hostname, the network mode being `host`.
Closes: #1968
Change-type: patch
Signed-off-by: Christina Wang <christina@balena.io>