73 Commits

Author SHA1 Message Date
Christina Ying Wang
bc1d251e66 Revert os-release path to /mnt/root
/mnt/boot/os-release isn't always accurate so /mnt/root should be the source of truth.

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-10-09 14:02:02 -07:00
Felipe Lalanne
327dc31ef0 Replace node-dbus with @balena/systemd
The node-dbus module is unmaintained and a blocker for the update to
Node 18. Switching to our own node bindings for systemd solves this
issue

Relates-to: Shouqun/node-dbus#241
Change-type: patch
2023-08-16 15:58:52 -04:00
Felipe Lalanne
8f17c30de6 Replace dbus test service with mock-systemd-bus
This avoids unnecessary mocking and tests against the real systemd API

Change-type: patch
2023-08-16 14:46:58 -04:00
Alexandru Costache
512240c544 backends: Add Jetson Orin NANO custom device-tree support
Signed-off-by: Alexandru Costache <alexandru@balena.io>
Change-type: patch
2023-07-11 18:11:32 +03:00
Christina Ying Wang
38fe8dae75 Remove the 'Stopped' status for services
It's not an official status from container inspects, and the Supervisor
doesn't set it internally anywhere. It's better to remove it entirely as the
method by which Supervisor sets internal service statuses is by using a global
event emitter (reportNewStatus) which makes things difficult to test.

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-06-28 11:17:13 -04:00
Christina W
71d24d6e33 Parse container exit error message instead of status
The previous implementation in #2170 of parsing the container status was too general,
because it relied on the mistaken assumption that a container would have a status of
`Stopped` if it was manually stopped. This turned out to be untrue, as manually stopped
containers were also getting restarted by the Supervisor due to their inspect status of
`exited`. With this, parsing the exit message became unavoidable as there are no other
clear ways to discern a container that has been manually stopped and shouldn't be started
from a container experiencing the Engine-host race condition issue (again, see #2170).

Since we're just parsing the exit error message, we don't need to worry about different behaviors
amongst restart policies, as any container with the error message on exit should be started.

Change-type: patch
Closes: #2178
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-06-22 14:43:17 -07:00
Christina Ying Wang
7eba48f8b8 Improve tests surrounding Engine-host race patch
See: #2170
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-06-19 11:11:26 -07:00
Christina Ying Wang
6e6f79c71d Decrease wait time before start from 60s to 30s
60 seconds to wait may be excessively long.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-06-19 11:11:26 -07:00
Christina Ying Wang
95f3e13d50 Add extra delay after state engine integration tests
This ensures target state has settled (since it seems that the 'applied' status
that's reported isn't 100% accurate and the actual Engine state may lag behind slightly)

Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-05-31 11:33:27 -07:00
Christina Ying Wang
7f32141958 Handle Engine-host race condition for "always" and "unless-stopped" restart policy
There exists a race condition between Engine and a host resource that may not
be immediately created. In this race condition, if a container's compose config
depends on the existence of that host resource, such as a network interface, and the
Engine tries to create & start the container before the host resource is created, the
Engine will not reattempt to start the container, regardless of the restart policy.
This is undesirable behavior but seems to be the behavior as implemented by Docker.

To rectify this, the Supervisor state funnel noops for a grace period of 1 minute
after starting a container to see that the container's status has become 'running'.
If the container exits because of the race condition, the status becomes 'exited' and the
Supervisor will attempt to generate another start step. This noop-wait-start step loop
will repeat until the container is able to start.

If the container is never able to start, there was a problem in the host in the creation of the
host resource, and that should be fixed at the host level.

This commit does not handle the case of services with restart policies "no" or "on-failure"
which encounter this host race, as metadata from container inspects needs to be introduced
during step calculation in order to figure out whether services with those restart policies
need to be started. This will be fixed in a future PR.
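
The noop-wait-start loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Supervisor's actual step calculation: `ContainerSnapshot`, `nextStep`, and `GRACE_PERIOD_MS` are hypothetical names, and the sketch assumes that a new start step is only generated once the grace period has elapsed.

```typescript
// Hypothetical types for illustration; the Supervisor models services differently.
type RestartPolicy = 'always' | 'unless-stopped' | 'no' | 'on-failure';
type Step = 'noop' | 'start';

interface ContainerSnapshot {
  status: 'created' | 'running' | 'exited';
  restartPolicy: RestartPolicy;
  // Epoch milliseconds of the last start attempt (assumed to be tracked somewhere).
  lastStartAttemptAt: number;
}

// Grace period of 1 minute, as stated in the commit message.
const GRACE_PERIOD_MS = 60 * 1000;

function nextStep(container: ContainerSnapshot, now: number): Step {
  // A running container needs no further action.
  if (container.status === 'running') {
    return 'noop';
  }
  // This commit only covers the "always" and "unless-stopped" policies.
  if (
    container.restartPolicy !== 'always' &&
    container.restartPolicy !== 'unless-stopped'
  ) {
    return 'noop';
  }
  // Still inside the grace period: keep waiting for the status to settle.
  if (now - container.lastStartAttemptAt < GRACE_PERIOD_MS) {
    return 'noop';
  }
  // Grace period elapsed and the container exited: try starting it again.
  return container.status === 'exited' ? 'start' : 'noop';
}
```

Calling `nextStep` on every state-funnel pass yields the repeating noop, wait, start sequence until the container reports 'running'.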

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-05-31 11:32:19 -07:00
Felipe Lalanne
2758e190b2 Fix sw.arch typo when testing contracts
Change-type: patch
2023-05-11 13:07:26 -04:00
Felipe Lalanne
8656bd62f7 Add arch.sw to the valid container requirements
Change-type: minor
2023-05-09 15:44:26 -04:00
Felipe Lalanne
a884a58b4c Simplify and move lib/contract.spec.ts to tests/unit
Improve contract tests to remove dependence on stubs and unnecessary
system calls.

Change-type: patch
2023-05-09 15:20:12 -04:00
Felipe Lalanne
7b8b187c74 Create tests with recovery from #1576
Devices affected by the bug described in #1576 are also stuck with some
services in the `Downloaded` state, because the state engine does not
detect that the running services should be killed on a network change
even if they belong to a new release. This is a bug, which can be
replicated by the tests in this commit

Change-type: patch
2023-04-26 11:58:42 -04:00
Felipe Lalanne
138aec5de4 Add integration tests for state-engine
These tests use the supervisor API to check that applying a target state
allows the device to eventually get to the desired target configuration.

These are high-level tests that work with real images and containers
using dind.

Change-type: patch
2023-04-25 14:47:00 -04:00
Felipe Lalanne
3d43f7e3b3 Simplify doRestart and doPurge actions
The actions now work by passing an intermediate state to the state
engine.

- doPurge first removes the user app from the target state and passes
  that to the state engine for purging. Since intermediate state doesn't
  remove images, this will have the effect of basically re-installing
  the app.

- doRestart modifies the target state by first removing only the
  services from the current state but keeping volumes and networks. This
  has the same effect as before where services were stopped one by one
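
The two intermediate states described above can be sketched as pure functions over a simplified target state. The `App` and `TargetState` shapes and the function names `purgeIntermediate` and `restartIntermediate` are hypothetical and exist only for illustration; the Supervisor's real target state is richer than this.

```typescript
// Hypothetical, heavily simplified shapes for illustration only.
interface App {
  services: string[];
  volumes: string[];
  networks: string[];
}
type TargetState = Record<string, App>;

// doPurge: drop the app from the target entirely. Applying this intermediate
// state and then the original target effectively re-installs the app.
function purgeIntermediate(target: TargetState, appId: string): TargetState {
  const result: TargetState = {};
  for (const [id, app] of Object.entries(target)) {
    if (id !== appId) {
      result[id] = app;
    }
  }
  return result;
}

// doRestart: keep volumes and networks, remove only the services.
function restartIntermediate(target: TargetState, appId: string): TargetState {
  const app = target[appId];
  if (app === undefined) {
    return target;
  }
  return { ...target, [appId]: { ...app, services: [] } };
}
```

In both cases the intermediate state would be applied first and the unmodified target state afterwards, so the state engine does the stopping and starting rather than the action itself.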

Change-type: patch
2023-04-20 14:58:58 -04:00
Felipe Lalanne
967cb7747f Make local mode image management work as in cloud mode
There were multiple places in the state engine that skipped some
operations while in local mode. In reality, all that is needed while in
local mode is to skip image and volume deletion.

This commit simplifies application-manager and compose app to be more
local mode agnostic, and instead makes image deletion and volume
deletion configurable via function arguments.

This also has the benefit of making the treatment of local mode
applications more similar to cloud mode applications, allowing
API endpoints to function the same way in both modes.

Change-type: patch
2023-04-20 14:58:58 -04:00
Felipe Lalanne
7b68ee4c4f Do not restart balena-hostname on rename
The OS since v2.82.6 will monitor changes to config.json and restart
the relevant services to apply the changes. There is no need to trigger
restart of the services via the supervisor. Users on older OS versions
will need to update their OS or restart the services manually, as OS
releases lose support after 2 years.

Change-type: patch
Closes: #2160
2023-04-20 11:43:35 -04:00
Christina Ying Wang
b9e1464d96 Fix assertion error in restart-service
From: c0b4fafe84
Restart-service checks that both services have restarted in its test assertion, which is
incorrect as restart-service should only restart one service.

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-04-07 14:40:15 -07:00
Alexandru Costache
6b67db98e5 backends: Add Jetson Orin NX custom device-tree support
Signed-off-by: Alexandru Costache <alexandru@balena.io>
Change-type: patch
2023-04-07 18:12:31 +03:00
Christina Ying Wang
4c948c8854 Mount data and state partitions on container startup
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-03-27 12:07:01 -07:00
Christina Ying Wang
49ee1042a8 Mount boot partition into container on Supervisor start
As the Supervisor is a privileged container, it has access to host /dev, and therefore has access
to boot, data, and state balenaOS partitions. This commit sets up the framework for the following:

- Finds the /dev partition that corresponds to each partition based on partition label
- Mounts the partitions into set mountpoints in the device
- Removes reliance on env vars and mountpoints provided by host's start-balena-supervisor script
- Simplifies host path querying by centralizing these queries through methods in lib/host-utils.ts

This particular commit changes the env vars for, and mounts, the boot partition.

Since the Supervisor would no longer rely on container `run` arguments provided by a host script,
this change moves Supervisor closer to being able to start itself (Supervisor-as-an-app).

Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-03-27 12:07:01 -07:00
Christina Ying Wang
9522c15ecd Change constants imports to remove 'require'
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-03-27 12:07:01 -07:00
Christina Ying Wang
84a9e7e9ac Replace BALENA-FIREWALL rule in INPUT chain instead of flushing
The issue with the original Supervisor implementation of the firewall is that
on Supervisor start, the Supervisor flushes the INPUT chain of the filter table.
This doesn't play well with services that add to the INPUT chain on startup that
may start up before the Supervisor, such as certain NetworkManager connection
profiles. This change only replaces the BALENA-FIREWALL rule in the INPUT chain,
preserving the other rules as well as their order.
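
The difference between flushing and replacing can be illustrated on an in-memory model of the chain. This is a sketch under assumptions, not the Supervisor's firewall code: `Rule` and `replaceFirewallRule` are hypothetical names, and appending the rule when it is not yet present is an assumed fallback.

```typescript
// One entry per rule in the INPUT chain of the filter table, in chain order.
// Hypothetical model for illustration only.
interface Rule {
  target: string; // jump target, e.g. 'ACCEPT' or 'BALENA-FIREWALL'
  spec: string; // remaining match arguments, kept opaque here
}

function replaceFirewallRule(input: Rule[], replacement: Rule): Rule[] {
  const index = input.findIndex((rule) => rule.target === 'BALENA-FIREWALL');
  if (index === -1) {
    // Assumption: when the rule is not present yet, add it at the end.
    return [...input, replacement];
  }
  // Swap only the BALENA-FIREWALL rule; every other rule keeps its position.
  return input.map((rule, i) => (i === index ? replacement : rule));
}
```

Rules added by other services before the Supervisor starts survive this operation, which is the property that flushing the chain could not provide.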

Closes: #1482
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-03-01 13:42:07 -08:00
Felipe Lalanne
89175432af Find and remove duplicate networks
We have seen devices with duplicated network names a few times. While
we don't know why the networks get duplicated, this can be disruptive
for updates, as trying to create a container referencing a duplicate
network results in a 400 error from the engine.

This commit finds and removes duplicate networks via the state engine.
This means that even if a container is somehow referencing a
network that was later duplicated, the container will be removed
first.

While this doesn't solve the problem of duplicate networks being
created in the first place, it will fix the state of the system to
correct the inconsistency.
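
Detecting the duplicates amounts to grouping networks by name. The sketch below is a minimal illustration: `NetworkInfo` and `findDuplicateNetworks` are hypothetical names, and it deliberately leaves open which copies get removed and how containers are handled, since the commit message does not spell that out.

```typescript
// Hypothetical minimal view of a network as reported by the engine.
interface NetworkInfo {
  id: string;
  name: string;
}

// Returns every network whose name occurs more than once in the list.
function findDuplicateNetworks(networks: NetworkInfo[]): NetworkInfo[] {
  const countByName = new Map<string, number>();
  for (const network of networks) {
    countByName.set(network.name, (countByName.get(network.name) ?? 0) + 1);
  }
  return networks.filter((network) => (countByName.get(network.name) ?? 0) > 1);
}
```

Grouping by name rather than by id matters here because the duplicated networks have distinct ids but share a name, which is what makes a by-name reference ambiguous.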

Change-type: minor
Closes: #590
2023-02-10 20:24:36 -05:00
Christina Ying Wang
c4f9d72172 Remove dependent devices content in codebase
This includes:
- proxyvisor.js
- references in docs
- references in device-state, api-binder, compose modules, API
- references in tests

The commit also adds a migration to remove the 4 dependent device tables from the DB.

Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-02-06 19:34:02 -08:00
Christina Ying Wang
e1bacda580 Update host-config, route, and action tests for host config endpoints
Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-01-11 15:48:13 -08:00
Christina Ying Wang
250684d651 Use actions & write tests for GET /v1/device
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-01-11 15:48:13 -08:00
Christina Ying Wang
72c683d5ff Use actions & write tests for GET /v1/apps/:appId
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-01-11 15:48:13 -08:00
Christina Ying Wang
198d9ad638 Write update action and tests, remove isReadyForUpdate check
See: #1924
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-01-11 15:48:13 -08:00
Christina Ying Wang
85392f2a85 Move reboot/shutdown to actions and related tests to integration
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-01-11 15:48:13 -08:00
Christina Ying Wang
c6cf6a0136 Use executeServiceAction for v1/v2 service action endpoints
This includes:
- /v1/apps/:appId/(stop|start)
- /v2/applications/:appId/(restart|stop|start)-service

Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-01-09 18:20:24 -08:00
Christina Ying Wang
fcd28591c6 Add tests for doPurge action and v1/v2 app purge routes
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-01-09 16:25:27 -08:00
Christina Ying Wang
a24d5acf7f Add tests for doRestart action and v1/v2 app restart routes
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-01-09 16:25:27 -08:00
Christina Ying Wang
d6298b2643 Use regenerateKey action for POST /v1/regenerate-api-key
This also adds a 500 response with the old key if the API key
refresh was unsuccessful. Previously, if the key refresh was
unsuccessful, this would result in an UnhandledPromiseRejection.
This is a new interface.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-01-09 16:25:27 -08:00
Christina Ying Wang
c7db3189ad Use identify action for POST /v1/blink
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-01-09 16:01:43 -08:00
Christina Ying Wang
e351ed9803 Use runHealthchecks action for GET /v1/healthy
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-01-09 16:01:43 -08:00
Pagan Gazzard
836f6ab754 Enable node16 module resolution in tsconfig to ease the ESM transition
This means that dynamic import statements will emit actual `import`
statements rather than being translated to `require`, the benefit being
that we can now import ES modules via dynamic imports

Change-type: patch
2022-11-22 11:01:03 -03:00
Christina Ying Wang
e0f77b660d Fix config test typo
Signed-off-by: Christina Ying Wang <christina@balena.io>
2022-11-21 18:18:15 -05:00
Pagan Gazzard
b8891ebb08 Use timers/promises for promisified setTimeout
Change-type: patch
2022-11-21 18:17:34 -05:00
Felipe Lalanne
dade598737 Use fatrw utility for writes to boot partition
This PR changes the way the supervisor reads and writes files from /mnt/boot. Reads and writes will
now use the [fatrw utility](https://github.com/balena-os/fatrw/) to minimize corruption of
files in the boot partition, and thus prevent possible bricking of the device.

Since this basically changes the way a lot of configurations are read, this work was being blocked because of
the way tests were being done. While there still remain a couple of legacy tests to be migrated, this PR disables
test:legacy tests when running npm run test, as the work on refactoring those tests is in progress (see #2048) and
fatrw integration is of higher priority.

Change-type: minor
2022-11-16 21:21:23 -03:00
Christina Ying Wang
8174ea9643 Simplify getting images for cleanup
getImagesForCleanup used to query the Engine for the Supervisor
image, which is unnecessary given that the Supervisor has access
to constants.supervisorImage. Thus, this Engine query is removed.

The method is simplified and made more clear, and
imageManager.isCleanupNeeded doesn't need to be stubbed in tests.

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2022-11-16 12:52:49 -08:00
Christina Ying Wang
f558be0a16 Create default network as config-only when services have host networking
This eliminates chances of host-Docker address collision for apps such
as the Supervisor where all services have host networking.

Closes: #2062
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2022-11-16 10:19:36 -08:00
pipex
827f892c13 Migrate all device config tests to integration.
This means that configuration backend tests no longer use stubs and
(mostly) avoid internal dependencies in the tests. Instead of stubs and
mock-fs, the tests use [testfs](https://github.com/balena-io-modules/mocha-pod#working-with-the-filesystem)
which allows working with a real filesystem and ensuring everything is
re-set between tests.

This is the last change needed in order to be able to merge #1971. Here is the list of changes

- [x] Migrate splash image backend tests
- [x] Migrate extlinux backend tests
- [x] Migrate config.txt backend tests
- [x] Migrate extra-uenv config tests
- [x] Migrate odmdata config tests
- [x] Migrate config utils tests
- [x] Migrate device-config tests

Change-type: patch
2022-11-14 11:12:52 -03:00
Christina Ying Wang
1034aa70e6 Convert ensureSupervisorNetwork to native Promises
Also remove system interface check from ensureSupervisorNetwork.

Previously `ensure` was a Bluebird promise which wasn't awaited in
its composition step. This has been here for some time and may contribute
to issues with duplicate networks. The conversion to native Promises
allows `ensure` to be awaited, hopefully reducing instances of duplicate
networks.

The system interface check for /sys/class/net/supervisor0 is removed
because it's superfluous given that the Engine creates the interface
with NetworkManager. It also makes testing a lot more difficult to set up
as /sys/class/net isn't a directory that can be written to for emulating
system interface creation / removal.

Relates-to: https://github.com/balena-os/balena-supervisor/issues/1110
Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
2022-11-08 16:06:10 -08:00
Christina Ying Wang
57f4dcbcac Change macAddressPath to /sys/class/net
Previously it was set at /mnt/root/sys/class/net, which is
the same as /sys/class/net because Supervisor has a network
mode of `host`.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2022-11-08 15:51:14 -08:00
Christina Ying Wang
fc15ad2554 Fix typo: intialise -> initialize
Missing "i"

Signed-off-by: Christina Ying Wang <christina@balena.io>
2022-11-07 12:09:17 -08:00
Felipe Lalanne
46fa7321c0 Run the built supervisor as part of docker-compose tests
This allows testing that the supervisor build actually runs and opens up the
possibility of running more exhaustive API tests against a working supervisor.

Change-type: patch
2022-11-03 15:45:39 -03:00
Christina Ying Wang
532e75a77e Migrate API tests to unit/integration
This excludes route tests and refactoring. Also creates tests
for API middleware.

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2022-10-25 19:06:39 +00:00
pipex
0befb30018 Migrate firewall tests to integration
2022-10-19 14:09:45 -03:00