Commit Graph

4571 Commits

Author SHA1 Message Date
flowzone-app[bot]
2ddacb6b44
Merge pull request #2176 from balena-os/improve-engine-host-race-fix-implementation
Improve engine host race fix implementation
2023-06-19 19:00:29 +00:00
Christina Ying Wang
7eba48f8b8 Improve tests surrounding Engine-host race patch
See: #2170
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-06-19 11:11:26 -07:00
Christina Ying Wang
9e249e6ae8 Remove unnecessary async/await from method
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-06-19 11:11:26 -07:00
Christina Ying Wang
6e6f79c71d Decrease wait time before start from 60s to 30s
60 seconds to wait may be excessively long.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-06-19 11:11:26 -07:00
Christina Ying Wang
ace642ea0f Improve naming of a util function & add unit test
isOlderThan -> isValidDateAndOlderThan

See: https://github.com/balena-os/balena-supervisor/pull/2170#discussion_r1226809686
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-06-19 11:11:26 -07:00
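A minimal sketch of what a helper named isValidDateAndOlderThan might look like. The signature, the seconds-based threshold, and the optional `now` parameter are assumptions made for illustration and are not taken from the Supervisor source.

```typescript
// Hypothetical sketch: returns true only when `date` is a valid Date
// AND it lies more than `seconds` in the past relative to `now`.
function isValidDateAndOlderThan(
	date: unknown,
	seconds: number,
	now: Date = new Date(),
): boolean {
	// Reject non-Date values and Dates built from unparseable input.
	if (!(date instanceof Date) || isNaN(date.getTime())) {
		return false;
	}
	return now.getTime() - date.getTime() > seconds * 1000;
}
```

The new name makes explicit that an invalid date yields false instead of being treated as "not older".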
Christina Ying Wang
ab80f198d8 Add exitCode property to Service class
Since we need to conditionally query the service's exit code
during step inference, adding the exitCode property keeps the
step inference function pure.

See: https://github.com/balena-os/balena-supervisor/pull/2170#discussion_r1226805153
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-06-19 11:11:26 -07:00
flowzone-app[bot]
7e24f095cc
v14.11.4 2023-06-19 07:56:46 +00:00
flowzone-app[bot]
96d2c6af64
Merge pull request #2177 from balena-os/specify-fs-type-when-mounting-partitions
Specify fs type when mounting partitions to prevent "Can't open blockdev" warnings
2023-06-19 07:55:21 +00:00
Christina Ying Wang
e6662f664c Specify fs type when mounting partitions to prevent "Can't open blockdev" warnings
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-06-16 13:46:41 -07:00
flowzone-app[bot]
0521d97c96
v14.11.3 2023-06-15 19:49:47 +00:00
flowzone-app[bot]
51ad257e7f
Merge pull request #2173 from balena-os/renovate/balena-io-deploy-to-balena-action-0.x
Update balena-io/deploy-to-balena-action action to v0.27.0
2023-06-15 19:48:57 +00:00
Self-hosted Renovate Bot
1675c16622 Update balena-io/deploy-to-balena-action action to v0.27.0
Update balena-io/deploy-to-balena-action

Change-type: patch
Changelog-entry: Update balena-io/deploy-to-balena-action to v0.27.0
2023-06-08 11:15:42 -07:00
Balena CI
d3f9821895
v14.11.2 2023-06-05 18:53:19 +00:00
flowzone-app[bot]
ce9ba9aac1
Merge pull request #2170 from balena-os/handle-engine-host-resource-race-condition
Handle Engine-host race condition
2023-06-05 18:52:38 +00:00
Christina Ying Wang
2537eb8189 Handle the case of 'on-failure' restart policy
As explained in the comments of this commit, a container with the restart policy
of 'on-failure' with a non-zero exit code matches the conditions for the race, so
the Supervisor will also attempt to start it. A container with the 'no' restart
policy that has been started once will not be started again. If a container with
'no' has never been started, its service status will be 'Installed' and the Supervisor
will already try to start it until success, so the service with 'no' doesn't require
special handling.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-06-05 11:05:58 -07:00
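The restart-policy rules described in this commit message can be sketched as a pure predicate. The function name, the `RestartPolicy` type, and the use of `null` for an unknown exit code are hypothetical; only the per-policy outcomes come from the message above.

```typescript
type RestartPolicy = 'always' | 'unless-stopped' | 'on-failure' | 'no';

// Hypothetical helper: should the Supervisor try to start an exited
// container that may have lost the Engine-host race?
function shouldStartExitedContainer(
	policy: RestartPolicy,
	exitCode: number | null,
): boolean {
	if (policy === 'always' || policy === 'unless-stopped') {
		return true;
	}
	if (policy === 'on-failure') {
		// Only a non-zero exit code matches the conditions for the race.
		return exitCode !== null && exitCode !== 0;
	}
	// 'no': a container that has already been started once is left alone.
	return false;
}
```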
Christina Ying Wang
95f3e13d50 Add extra delay after state engine integration tests
This ensures target state has settled (since it seems that the 'applied' status
that's reported isn't 100% accurate and the actual Engine state may lag behind slightly)

Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-05-31 11:33:27 -07:00
Christina Ying Wang
7f32141958 Handle Engine-host race condition for "always" and "unless-stopped" restart policy
There exists a race condition between Engine and a host resource that may not
be immediately created. In this race condition, if a container's compose config
depends on the existence of that host resource, such as a network interface, and the
Engine tries to create & start the container before the host resource is created, the
Engine will not reattempt to start the container, regardless of the restart policy.
This is undesirable behavior but seems to be the behavior as implemented by Docker.

To rectify this, the Supervisor state funnel noops for a grace period of 1 minute
after starting a container to see that the container's status has become 'running'.
If the container exits because of the race condition, the status becomes 'exited' and the
Supervisor will attempt to generate another start step. This noop-wait-start step loop
will repeat until the container is able to start.

If the container is never able to start, there was a problem in the host in the creation of the
host resource, and that should be fixed at the host level.

This commit does not handle the case of services with restart policies "no" or "on-failure"
which encounter this host race, as metadata from container inspects needs to be introduced
during step calculation in order to figure out whether services with those restart policies
need to be started. This will be fixed in a future PR.

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2023-05-31 11:32:19 -07:00
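The noop-wait-start loop described in this commit message can be sketched as a pure step function. The names `nextStep` and `GRACE_PERIOD_MS`, the two-value status type, and tracking a `lastStartAttempt` timestamp are all assumptions for illustration; the real state funnel is likely structured differently.

```typescript
type ContainerStatus = 'running' | 'exited';
type Step = 'start' | 'noop' | null;

// 1 minute grace period, as stated in the commit message.
const GRACE_PERIOD_MS = 60 * 1000;

// Hypothetical step inference for one container.
function nextStep(
	status: ContainerStatus,
	lastStartAttempt: Date,
	now: Date,
): Step {
	if (status === 'running') {
		// The container started successfully; nothing left to do.
		return null;
	}
	const elapsed = now.getTime() - lastStartAttempt.getTime();
	// Wait out the grace period, then generate another start step.
	return elapsed < GRACE_PERIOD_MS ? 'noop' : 'start';
}
```

Calling this repeatedly with fresh status readings reproduces the loop: noop while waiting, start once the grace period has elapsed, and no step once the container is running.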
Balena CI
e6c136d6cd
v14.11.1 2023-05-11 22:07:34 +00:00
flowzone-app[bot]
09f975395e
Merge pull request #2168 from balena-os/fix-contract-arch-test
Fix `sw.arch` typo when testing contracts
2023-05-11 22:06:50 +00:00
Felipe Lalanne
2758e190b2 Fix sw.arch typo when testing contracts
Change-type: patch
2023-05-11 13:07:26 -04:00
Balena CI
ec363c305a
v14.11.0 2023-05-10 12:39:17 +00:00
flowzone-app[bot]
3b6878fd80
Merge pull request #2167 from balena-os/hw-arch-contract
Add `arch.sw` to supported container requirements
2023-05-10 12:38:21 +00:00
Felipe Lalanne
8656bd62f7 Add arch.sw to the valid container requirements
Change-type: minor
2023-05-09 15:44:26 -04:00
Felipe Lalanne
f1f09e0e27 Allow using slug to validate hw.device-type contract
This also adds the hw.device-type test case to the unit tests.

Change-type: patch
2023-05-09 15:20:18 -04:00
Felipe Lalanne
a884a58b4c Simplify and move lib/contract.spec.ts to tests/unit
Improve contract tests to remove dependence on stubs and unnecessary
system calls.

Change-type: patch
2023-05-09 15:20:12 -04:00
Balena CI
eec4d06909
v14.10.11 2023-05-08 20:35:29 +00:00
flowzone-app[bot]
196bc820b1
Merge pull request #2166 from balena-os/hdmi-port-1-docs
Add information about hdmi port 2 config vars
2023-05-08 20:34:43 +00:00
Felipe Lalanne
d5cc8238cb Add information about hdmi port 2 config vars
Support for colon characters was added in v14.6.0, which enabled
configurations for HDMI port 2 (e.g. on the RPi 4). These configurations
are not documented anywhere else so this allows users to be able to
better find the relevant information for working with HDMI.

Change-type: patch
Relates-to: #2090
2023-05-08 15:21:28 -04:00
Felipe Lalanne
ba39cf539e Update table formatting on configurations.md
For better readability in a text editor

Change-type: patch
2023-05-08 15:15:37 -04:00
Balena CI
6148ed6ed1
v14.10.10 2023-05-03 16:01:39 +00:00
flowzone-app[bot]
4087782e80
Merge pull request #2165 from balena-os/mtoman/detect-crypt-mounts
mount-partitions.sh: Add support for encrypted partitions
2023-05-03 16:00:51 +00:00
Michal Toman
0045928944 mount-partitions.sh: Add support for encrypted partitions
After a recent change enforcing all the partitions to be on the same
block device, encrypted partitions are no longer being detected
correctly. This is because the assumption that the parent block device
is a substring of the actually mounted block device does not work
for LUKS devices - the mount will either be /dev/mapper/luks-XXX
or /dev/dm-X while the parent device is still e.g. /dev/sda.

The usual balenaOS boot partition is also split in two - boot and efi.
The boot partition (mounted under /mnt/boot) is encrypted and the efi
partition (mounted under /mnt/efi) is not.

This patch generalizes the detection of the parent device so that
it works with both encrypted and unencrypted partitions.

Change-type: patch
Signed-off-by: Michal Toman <michalt@balena.io>
2023-05-03 16:29:16 +02:00
Balena CI
c8d7b28a7e
v14.10.9 2023-05-03 14:26:38 +00:00
flowzone-app[bot]
cf21b093a6
Merge pull request #2164 from balena-os/klutchell-patch-1
Run test supervisor under a different service name
2023-05-03 14:25:58 +00:00
Kyle Harding
33b29cfa22
Run test supervisor under a different service name
The docker compose V2 spec no longer accepts `network_mode: bridge`,
which means we can no longer override the network configuration of
the `balena-supervisor` service for tests.

For this reason we now create a separate service to run the built
supervisor `balena-supervisor-sut` and run API tests against this
service instead of the default `balena-supervisor`.

Change-type: patch
2023-05-03 09:33:22 -04:00
Balena CI
f6e0683032
v14.10.8 2023-04-26 18:49:44 +00:00
flowzone-app[bot]
bc969c8c89
Merge pull request #2161 from balena-os/network-plus-service-bug
Fix device state not applied when a network change happens during the update
2023-04-26 18:48:55 +00:00
Felipe Lalanne
5fdd689590 Fix service comparison when creating component steps
A bug in service comparison would make it that a device already running
a service from a new release with network changes would never stop the
running service so remaining services would forever get stuck in
`Downloaded` state.

This fixes the comparison so the service will get killed in this case,
particularly allowing devices to recover from #1576

Change-type: patch
2023-04-26 11:58:48 -04:00
Felipe Lalanne
7b8b187c74 Create tests with recovery from #1576
Devices affected by the bug described in #1576 are also stuck with some
services in the `Downloaded` state, because the state engine does not
detect that the running services should be killed on a network change
even if they belong to a new release. This is a bug, which can be
replicated by the tests in this commit.

Change-type: patch
2023-04-26 11:58:42 -04:00
Felipe Lalanne
7aecaae8b0 Skip updateMetadata step if there are network changes
Previous behavior would make it that an `updateMetadata` step would take
precedence over a `kill` step when network changes are present. This
would lead to an inconsistent state if an update included a
network and a container change.

Closes: #1576
Change-type: patch
2023-04-25 14:47:00 -04:00
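The precedence change described in this commit message can be sketched as follows. The `ServiceDiff` shape and the `selectStep` name are hypothetical; the only point taken from the message is that network changes are checked before an `updateMetadata` step is considered.

```typescript
type StepAction = 'kill' | 'updateMetadata' | 'noop';

// Hypothetical summary of how a running service differs from its target.
interface ServiceDiff {
	networkChanged: boolean;
	containerChanged: boolean;
	metadataChanged: boolean;
}

function selectStep(diff: ServiceDiff): StepAction {
	// Network or container changes require recreating the service, so
	// they take precedence over a metadata-only update.
	if (diff.networkChanged || diff.containerChanged) {
		return 'kill';
	}
	if (diff.metadataChanged) {
		return 'updateMetadata';
	}
	return 'noop';
}
```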
Felipe Lalanne
0a358a4463 Add replication of issue using unit tests
Change-type: patch
2023-04-25 14:47:00 -04:00
Felipe Lalanne
138aec5de4 Add integration tests for state-engine
These tests use the supervisor API to check that applying a target state
allows the device to eventually get to the desired target configuration.

These are high-level tests that work with real images and containers
using dind.

Change-type: patch
2023-04-25 14:47:00 -04:00
Felipe Lalanne
c1207cbbff Do not pass auth to images with no registry
The supervisor allows the target image to be an image without a
registry (e.g. `alpine:latest`), while this really only happens while in
local mode, we don't want to pass credentials to the default registry as
those credentials are meant for balena registry and will otherwise fail.

Change-type: patch
2023-04-25 14:47:00 -04:00
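One way to decide whether an image reference names a registry is the convention Docker uses for image names: the part before the first `/` is treated as a registry host only if it contains a `.` or a `:` or equals `localhost`. The sketch below applies that convention; the function names and the `Credentials` shape are hypothetical and not the Supervisor's actual API.

```typescript
// Hypothetical credentials shape, for illustration only.
interface Credentials {
	username: string;
	password: string;
}

function hasRegistry(image: string): boolean {
	const slash = image.indexOf('/');
	if (slash === -1) {
		// e.g. `alpine:latest`: no path component, so no registry.
		return false;
	}
	const first = image.slice(0, slash);
	return (
		first.indexOf('.') !== -1 ||
		first.indexOf(':') !== -1 ||
		first === 'localhost'
	);
}

// Only forward credentials when the image names an explicit registry.
function authForImage(
	image: string,
	creds: Credentials,
): Credentials | undefined {
	return hasRegistry(image) ? creds : undefined;
}
```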
Balena CI
d3be730c8e
v14.10.7 2023-04-21 23:04:21 +00:00
flowzone-app[bot]
48951d0333
Merge pull request #2153 from balena-os/local-mode
Refactor state engine to be able to use current state as target
2023-04-21 23:03:37 +00:00
Felipe Lalanne
6c031299d6 Remove safeStateClone function
This function is no longer needed with the latest changes to
getCurrentState

Change-type: patch
2023-04-20 14:58:58 -04:00
Felipe Lalanne
36311ef7a1 Get rid of targetVolatile in app manager
Target volatile doesn't make sense now that we can use the
current state as a target. Apparently it wasn't being used for anything
anymore.

Change-type: patch
2023-04-20 14:58:58 -04:00
Felipe Lalanne
1e0dd381f5 Make pausingApply a private member of device-state
This simplifies this module interface and hides implementation details
from the rest of the code.

The function `applyIntermediateTarget` will now call `pausingApply`
before applying the target

API actions no longer need to call pausing apply

Change-type: patch
2023-04-20 14:58:58 -04:00
Felipe Lalanne
3d43f7e3b3 Simplify doRestart and doPurge actions
The actions now work by passing an intermediate state to the state
engine.

- doPurge first removes the user app from the target state and passes
  that to the state engine for purging. Since intermediate state doesn't
  remove images, this will have the effect of basically re-installing
  the app.

- doRestart modifies the target state by first removing only the
  services from the current state but keeping volumes and networks. This
  has the same effect as before where services were stopped one by one

Change-type: patch
2023-04-20 14:58:58 -04:00
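The two intermediate targets described in this commit message can be sketched as pure functions over a simplified state. The `AppState` and `DeviceState` shapes and both function names are assumptions for illustration; the Supervisor's real target state is considerably richer.

```typescript
// Hypothetical, heavily simplified state model.
interface AppState {
	services: string[];
	volumes: string[];
	networks: string[];
}
type DeviceState = { [appUuid: string]: AppState };

// doRestart: drop only the services of one app from the current state,
// keeping its volumes and networks.
function restartIntermediate(current: DeviceState, appUuid: string): DeviceState {
	const result: DeviceState = {};
	for (const uuid of Object.keys(current)) {
		const app = current[uuid];
		result[uuid] =
			uuid === appUuid
				? { services: [], volumes: app.volumes.slice(), networks: app.networks.slice() }
				: app;
	}
	return result;
}

// doPurge: remove the whole user app from the target state.
function purgeIntermediate(target: DeviceState, appUuid: string): DeviceState {
	const result: DeviceState = {};
	for (const uuid of Object.keys(target)) {
		if (uuid !== appUuid) {
			result[uuid] = target[uuid];
		}
	}
	return result;
}
```

In both cases the intermediate state would be applied first and the original target afterwards, which is what recreates the services or reinstalls the app.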
Felipe Lalanne
43630e5267 Fix network appUuid inference in local mode
Local mode uses a numeric `appUuid` which was messing up parsing the
network name. This fixes this issue so the current state can be used
as a target state

Change-type: patch
2023-04-20 14:58:58 -04:00