1387 Commits

Author SHA1 Message Date
Felipe Lalanne
39c667803d Fix .local dns resolution when returning multiple addresses
The supervisor performs its own local resolution for `.local`
addresses due to a limitation in [musl](https://wiki.musl-libc.org/future-ideas.html).
The resolution function was not following exactly the nodejs [dns.lookup
specification](https://nodejs.org/api/dns.html#dnslookuphostname-options-callback)
which could cause certain clients to fail (in this case happy-eyeballs). This
updates the function to follow the specification.
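
As a rough illustration of the part of the `dns.lookup` contract involved here: with `all: true` the callback receives an array of `{ address, family }` objects, otherwise it receives a single address and family. The helper name `formatLookupResult` is hypothetical and is not the supervisor's actual function; this is only a minimal sketch of the result shapes.

```typescript
// Shape of one resolved address, as documented for dns.lookup with `all: true`.
interface LookupAddress {
  address: string;
  family: 4 | 6;
}

// Hypothetical helper (not from the supervisor code base): shapes resolved
// addresses the way dns.lookup reports them, depending on the `all` option.
function formatLookupResult(
  addresses: LookupAddress[],
  all: boolean,
): LookupAddress[] | [string, number] {
  if (all) {
    // With `all: true` the callback receives (err, addresses[]).
    return addresses;
  }
  const first = addresses[0];
  if (first === undefined) {
    throw new Error('ENOTFOUND');
  }
  // Otherwise the callback receives (err, address, family) for one result.
  return [first.address, first.family];
}
```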

Change-type: patch
2021-12-16 11:59:54 -03:00
Felipe Lalanne
9015b0e22f Skip initial apply until a target has been set
The supervisor always applies target state on start to ensure that the
device is in the correct state in case of a crash or another reason. This had
the side effect that if the database is deleted, the supervisor would
apply target state (which is empty), stopping services and possibly
causing volume data loss.

This prevents that behavior and ensures that the supervisor only
applies target state if a target has been set either by the cloud, preload or local
mode.
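
The guard described above could look roughly like the following. The `StoredTarget` type, its `setBy` field and the function name are assumptions made for illustration, not the supervisor's real data model.

```typescript
// Hypothetical sources that can set a target state (taken from the text above).
type TargetSource = 'cloud' | 'preload' | 'local';

// Hypothetical record of the stored target state.
interface StoredTarget {
  // Undefined when no target has ever been set, e.g. after the database was deleted.
  setBy: TargetSource | undefined;
  services: string[];
}

// Returns true only when a target was explicitly set, so an empty
// default target is never applied on start.
function shouldApplyOnStart(target: StoredTarget): boolean {
  return target.setBy !== undefined;
}
```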

Change-type: patch
2021-12-13 09:31:00 -03:00
Pagan Gazzard
32e3399f7c Fix the "already delayed by" calculation
Change-type: patch
2021-12-10 15:54:30 +00:00
Pagan Gazzard
6554ff5a64 Add exponential backoff on errors for logs reporting
Change-type: patch
2021-12-09 18:30:04 +00:00
Felipe Lalanne
f6b2ec9677 Improve validation messages for env vars and labels
Change-type: patch
2021-12-02 17:19:50 -03:00
Felipe Lalanne
445aefaa29 Ensure target state errors are sent to the log backend
Closes: #1838
2021-12-02 15:29:37 -03:00
Felipe Lalanne
f6692ab918 Convert target state types to io-ts for better validation
This simplifies target state validation and improves validation
messages.

Change-type: patch
2021-12-02 15:29:37 -03:00
Felipe Lalanne
ca7c22d854 Move lib/types.ts to src/types/basic.ts
2021-12-02 15:29:37 -03:00
Zane Hitchcox
9ed2685f63 Add happy eyeballs
Change-type: patch
2021-11-30 12:43:18 -05:00
Pagan Gazzard
2eb00fa0da Increase request timeout to 59s to better align with our backends
Change-type: patch
2021-11-29 17:14:51 +00:00
Felipe Lalanne
6fd516a930 Fix broken local mode after PR #1824
PR #1824 changed app update behavior to test that all images are there
before moving between releases. This check always fails in local mode
since local mode images are handled differently.

This PR fixes local mode again by skipping the check when `localMode` is
set.

Change-type: patch
2021-11-17 17:54:25 -03:00
Alexandru Costache
3b9c68246e backends/extra-uEnv: Extend custom DTB support for Nano 2GB Devkit
Change-type: patch
Signed-off-by: Alexandru Costache <alexandru@balena.io>
2021-11-17 13:48:19 +01:00
Felipe Lalanne
394377e0a1 Fix delete-then-download strategy
The strategy has been broken for a while but it was not clear how to
fix it before the changes to image management. This PR fixes application
manager to remove images before downloading the new image. This will
only have an effect on changing images.

Closes: #1233
Change-type: patch
2021-11-16 16:40:15 -03:00
Felipe Lalanne
7aedc97ee1 Wait for images to be ready before moving between releases
For download-then-kill strategy, this waits for all changing images on the target
release to be available on device before killing the old services. This
will prevent multicontainer applications from getting into a state where some
services of the new release start running long before others have been
downloaded.

When adding new services to a multicontainer app, the supervisor will
now wait for other changing services to be downloaded before starting
the new service.
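
A minimal sketch of the readiness check, assuming a hypothetical `ImageStatus` record per changing image; the real application manager works on richer state than this.

```typescript
// Hypothetical view of an image that changes between releases.
interface ImageStatus {
  name: string;
  downloaded: boolean;
}

// For download-then-kill: old services may only be killed once every
// changing image of the target release is available on the device.
function canKillOldServices(changingImages: ImageStatus[]): boolean {
  return changingImages.every((image) => image.downloaded);
}
```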

Closes: #1812
Change-type: patch
2021-11-11 14:08:36 -03:00
Felipe Lalanne
969f4225e5 Check config for networks and volumes inside Service
This removes the need for the app module to know about the naming
conventions for networks and volumes since those exist now within the
service itself. This also fixes a small bug where the volume would be
removed before the service itself had been successfully stopped.

Change-type: patch
2021-10-28 10:20:53 -03:00
Alexandru Costache
7d678fa838 backends/extra-uEnv: Extend custom DTB support for Jetson TX2 NX
We just added support for the TX2 NX, which supports u-boot
and thus allows for using custom device-trees. Let's allow
for Jetson TX2 NX and future TX2 NX derived
device types to have device-trees configurable from the dashboard.

Change-type: patch
Signed-off-by: Alexandru Costache <alexandru@balena.io>
2021-08-24 07:24:48 +00:00
Felipe Lalanne
aab000209b Add backoff to state reporting when 503 is received
Current state reporting had a backoff when network or inconsistency
errors were found, but not on API errors. This change adds a backoff
using the `Retry-After` header, if present, to reduce load on the API.
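
One way to compute such a delay is sketched below. The function name, the default base and maximum delays, and the exponential fallback are assumptions for illustration; only the use of the `Retry-After` header comes from the commit message.

```typescript
// Hypothetical helper: picks the delay before the next state report.
// `retryAfter` is the raw Retry-After header value, if the API sent one.
function retryDelayMs(
  attempt: number,
  retryAfter: string | undefined,
  nowMs: number = Date.now(),
  baseMs: number = 1000,
  maxMs: number = 300000,
): number {
  if (retryAfter !== undefined) {
    const value = retryAfter.trim();
    // Retry-After may be a number of seconds...
    if (/^\d+$/.test(value)) {
      return Number(value) * 1000;
    }
    // ...or an HTTP date.
    const dateMs = Date.parse(value);
    if (!Number.isNaN(dateMs)) {
      return Math.max(0, dateMs - nowMs);
    }
  }
  // Assumed fallback: capped exponential backoff.
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}
```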

Change-type: patch
2021-09-28 14:53:26 -04:00
Felipe Lalanne
802f26fe71 Improve network interface filter
The supervisor filters out some network interfaces for mac address
reporting, to remove (balena*,lo,tun*,etc). The previous filter was
matching any interface whose name contained one of the defined filters, making
it stricter than necessary. This commit fixes the issue.
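
The difference between substring matching and whole-name matching can be sketched as follows. The pattern list only contains the examples named above, and both function names are hypothetical.

```typescript
// Hypothetical pattern list, limited to the examples named in the text.
const ignoredPatterns: string[] = ['balena*', 'lo', 'tun*'];

// Converts a glob-like pattern into a regular expression anchored at both
// ends, so `lo` only matches the interface named exactly `lo`.
function toRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// Returns true when the interface should be included in MAC address reporting.
function isReported(iface: string, patterns: string[] = ignoredPatterns): boolean {
  return !patterns.some((pattern) => toRegExp(pattern).test(iface));
}
```

With a substring match, an interface such as `wlo1` would be dropped because its name contains `lo`; with anchored patterns it is kept.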

Change-type: patch
2021-09-24 13:01:17 -03:00
Alex Gonzalez
9e0cbe04c6 api-keys: Remove os variant parameter for authentication check
The current code authenticates unmanaged production devices which makes
no sense. Unmanaged devices do not need to authenticate with the API.

Change-type: patch
Signed-off-by: Alex Gonzalez <alexg@balena.io>
2021-08-05 09:30:35 +00:00
Alex Gonzalez
1abd10a129 os-release: Use developmentMode to ascertain OS variant in new releases
Newer BalenaOS releases have replaced OS variants with a developmentMode
configuration setting. This commit uses this variable to set the OS
variant in the absence of `VARIANT_ID` from the os-release file.
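
A minimal sketch of the fallback, assuming the variant values are `dev` and `prod` and that the os-release file has already been parsed into key/value pairs. The function name is hypothetical.

```typescript
// Hypothetical helper: derives the OS variant, preferring VARIANT_ID from
// the parsed os-release file and falling back to developmentMode.
function getOSVariant(
  osRelease: { [key: string]: string | undefined },
  developmentMode: boolean | undefined,
): string | undefined {
  const variantId = osRelease['VARIANT_ID'];
  if (variantId !== undefined && variantId !== '') {
    return variantId;
  }
  if (developmentMode === undefined) {
    // Neither source is available, so the variant is unknown.
    return undefined;
  }
  // Assumed mapping of the boolean setting to variant names.
  return developmentMode ? 'dev' : 'prod';
}
```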

Change-type: patch
Signed-off-by: Alex Gonzalez <alexg@balena.io>
2021-08-05 09:30:35 +00:00
Alex Gonzalez
4ad7a3ae91 config: Add developmentMode to schema
Add a `developmentMode` configuration variable to the schema. Do not expose
this on the device target state until local key-based authentication is
sorted.

Relates-to: https://jel.ly.fish/e9525e9e-aa74-478c-b931-52951c679f78
Change-type: patch
Signed-off-by: Alex Gonzalez <alexg@balena.io>
2021-08-05 09:30:35 +00:00
Kyle Harding
669866b4c2 Skip restarting services if they are part of conf targets
Some recent changes to the OS allowed some services to restart
automatically when the associated config files are changed.

In these cases we want to avoid restarting the same services
manually from the supervisor.

Change-type: patch
Signed-off-by: Kyle Harding <kyle@balena.io>
2021-08-24 14:03:55 -04:00
peakyDicers
30c728fae2 Removed fire emoji prefix for firewall logs.
Change-type: patch
2021-08-02 17:24:03 -04:00
Felipe Lalanne
6f5f3bc2f3 Fix regression with local mode push
PR #1749 introduced a bug when pushing local target state. An update to
the [image name normalization](f1bd4b8d9b/src/lib/docker-utils.ts (L81))
failed to consider the local image name format. This results in mangling
of image names in the database, i.e. the image `ubuntu:latest` is stored
as `/ubuntu:latest`. This causes an exception to be returned by the
dockerode `getImage('/ubuntu:latest').inspect()` call.

This sends the supervisor into a crash loop and is shown on the supervisor
journal logs as

```
getaddrinfo ENOTFOUND images
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:64:26)
```

Unfortunately if this happens on a user device, since the mangled image
name is already in the database, the easiest way to fix it is to remove the
supervisor database and let the supervisor recreate it. Deleting the
database should be side effect free.

Change-type: patch
2021-08-02 11:52:07 -04:00
Felipe Lalanne
104a8006fb Update apiSecret table to id services by name
It adds a migration replacing the serviceId column by serviceName and
populates serviceNames from services in the target state.
2021-07-28 09:57:38 -04:00
Felipe Lalanne
b67f94802d Remove comparison based on image, release, and service ids
Preparing for the new v3 target state, where the supervisor will make environment
dependent ids optional and rely on using general UUIDs and user known identifiers
for comparison. This PR moves forward in that direction by removing some of those
comparisons for v2 target state.

- imageId to be replaced with imageName
- serviceId to be replaced by serviceName
- releaseId to be replaced by commit (future release_uuid)

This is a backwards compatible change, meaning it doesn't completely get rid of
these identifiers (which are still being used by supervisor API and for state
patch), but will not depend on those identifiers for calculating steps to target state.
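
As an illustration of comparing by user-known identifiers instead of environment-dependent ids, consider the sketch below. The `ServiceRef` type and the function name are hypothetical; only the three replacement identifiers come from the list above.

```typescript
// Hypothetical service description; the numeric id is kept for API
// compatibility but plays no part in the comparison.
interface ServiceRef {
  serviceId: number | undefined;
  serviceName: string;
  imageName: string;
  commit: string;
}

// Two services are considered the same when their name, image name and
// release commit match, regardless of database ids.
function isSameService(current: ServiceRef, target: ServiceRef): boolean {
  return (
    current.serviceName === target.serviceName &&
    current.imageName === target.imageName &&
    current.commit === target.commit
  );
}
```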

Change-type: minor
2021-07-28 09:57:38 -04:00
Felipe Lalanne
77070712a4 Remove image manager appUpdatePollInterval listener
2021-07-28 09:57:36 -04:00
Felipe Lalanne
a1d098d8f3 Refactor image "volatile state" to use state pattern
This replaces stored `volatileState` with a more declarative ImageTask API.
An ImageTask stores volatile image state for operations such as fetching and
removing an image, which cannot be obtained through an engine query and
can be updated while the task is running.

Image controller methods can now use the `reportEvent` method to create
and update the state of a longer running task.
2021-07-28 09:56:38 -04:00
Felipe Lalanne
f1bd4b8d9b Use tags to track supervised images in docker
The image manager module now uses tags instead of docker IDs as the main
way to identify docker images on the engine. That is, if the target
state image has a name `imageName:tag@digest`, the supervisor will always use
the given `imageName` and `tag` (which may be empty) to tag the image on
the engine after fetching. This PR also adds checkups to ensure
consistency is maintained between the database and the engine.
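
Splitting a reference of the form `imageName:tag@digest` can be sketched as follows. This is a simplified parser written for illustration (the name `parseImageRef` is hypothetical) and not the supervisor's own normalization code.

```typescript
// Parts of an image reference of the form `imageName:tag@digest`.
interface ImageRef {
  imageName: string;
  tag: string | undefined;
  digest: string | undefined;
}

// Hypothetical, simplified parser for image references.
function parseImageRef(ref: string): ImageRef {
  let rest = ref;
  let digest: string | undefined;
  const at = rest.indexOf('@');
  if (at !== -1) {
    digest = rest.slice(at + 1);
    rest = rest.slice(0, at);
  }
  // A tag colon can only appear after the last slash; an earlier colon
  // belongs to a registry host and port such as `localhost:5000`.
  const lastSlash = rest.lastIndexOf('/');
  const colon = rest.indexOf(':', lastSlash + 1);
  let tag: string | undefined;
  if (colon !== -1) {
    tag = rest.slice(colon + 1);
    rest = rest.slice(0, colon);
  }
  return { imageName: rest, tag, digest };
}
```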

Using tags simplifies query and removal operations, since
removing the image now means removing tags matching the image name.

Before this change the supervisor relied only on information in the
supervisor database, and used that to remove images by docker ID. However, the docker
id is not a reliable identifier, since images retain the same id between
releases or between services in the same release.

List of squashed commits
- Remove custom type NormalizedImageInfo
- Remove dependency on docker-toolbelt
- Use tags to track supervised images in docker
- Ensure tag removal occurs in sequence
- Only save database image after download confirmed

Relates-to: #1616 #1579
Change-type: patch
2021-07-26 09:52:25 -04:00
Felipe Lalanne
c05c5803f0 Log the delta URL that will be downloaded on update
Change-type: patch
Closes: #1755
2021-07-22 11:05:00 -04:00
Christina Wang
17e740a4ba Allow users to override HUP lock if device is stuck in invalid state
This functionality is needed when breadcrumbs aren't deleted after a HUP
rollback for whatever reason. Also rename HUP lock function.

Change-type: patch
Connects-to: #1459
Signed-off-by: Christina Wang <christina@balena.io>
2021-07-08 12:43:32 +09:00
Felipe Lalanne
e04e64763f Improve testing for supervisor composition modules
This PR cleans up testing for supervisor compose modules. It also fixes broken
tests for application manager and removes a lot of dependencies for those tests
on DB and other unnecessary mocks. There are probably a lot of cases that tests
are missing but this should make writing new tests a lot easier.

This PR also creates a new mock dockerode (mockerode) module that should make it
easier to test operations that interact with the engine. All references
to the old mock-dockerode have not yet been removed but that should come
soon in another PR

List of squashed commits:
- Add tests for network create/remove
- Move compose service tests to test/src/compose and reorganize test descriptions
- Add support for image creation to mockerode
- Add additional tests for compose volumes
- Update mockerode so unimplemented fake methods throw. This is to ensure
  tests using mockerode fail if an unimplemented method is used
- Update tests for volume-manager with mockerode
- Update tests for compose/images
- Simplify tests using mockerode
- Clean up compose/app tests
- Create application manager tests

Change-type: minor
2021-07-05 17:50:52 -04:00
Christina Wang
a9028e58ec Prevent updates/reboots with locks when HUP breadcrumbs present
On HUP, some healthcheck services need to complete before
it's safe for the Supervisor to reboot the device when
applying state changes. rollback-{health|altboot}-breadcrumb
are the two files that Supervisor looks for and locks the device
on when present in this patch.
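
The breadcrumb check can be sketched as below. The directory that holds the breadcrumbs is not stated here, so it is left as a parameter, and the file-existence test is injected; both choices, as well as the function name, are assumptions for illustration.

```typescript
// Breadcrumb file names, expanded from rollback-{health|altboot}-breadcrumb.
const HUP_BREADCRUMBS: string[] = [
  'rollback-health-breadcrumb',
  'rollback-altboot-breadcrumb',
];

// Hypothetical check: updates and reboots stay locked while any breadcrumb
// exists in `dir`. `exists` is injected so the sketch does not depend on a
// particular filesystem API.
function isLockedByHup(dir: string, exists: (path: string) => boolean): boolean {
  return HUP_BREADCRUMBS.some((name) => exists(dir + '/' + name));
}
```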

Not closing issue 1459 because there is a possible case where,
on altboot rollback, the breadcrumbs are not present. 1459
may be closed when this edge case is investigated.

Change-type: patch
Connects-to: #1459
See: https://www.flowdock.com/app/rulemotion/r-supervisor/threads/cL7YfNOLSfTPfw05h59GEW0kfOt
Signed-off-by: Christina Wang <christina@balena.io>
2021-06-30 13:27:03 +09:00
Felipe Lalanne
2fa0d3dc43 Fix supervisor using wrong source for deltas
This fixes a specific issue when the supervisor cannot find the right
source for deltas (e.g. after the DB gets deleted), where legacy
behavior was to look for any image in the app.

Change-type: patch
Relates-to: #1729
2021-06-25 16:24:51 -04:00
Florin Sarbu
7c26480ada Add revpi-connect, revpi-core-3 to Raspberry Pi variants
We need the supervisor to be able to manage config.txt changes for these
Revolution Pi boards too.

Change-type: patch
Signed-off-by: Florin Sarbu <florin@balena.io>
2021-06-18 20:33:27 +09:00
Pagan Gazzard
ee4d919fca Improve target state typings
Change-type: patch
2021-06-08 13:45:44 +01:00
Miguel Casqueira
ab4fb454e0 Refactor debug log when unmanaged volume is found
Change-type: patch
Signed-off-by: Miguel Casqueira <miguel@balena.io>
2021-06-02 13:07:24 -04:00
Miguel Casqueira
55a344dceb Prevent a recursive loop when reporting current state
Closes: #1673
Change-type: patch
Signed-off-by: Miguel Casqueira <miguel@balena.io>
2021-05-28 16:20:27 -04:00
Christina Wang
dcd863eed8 Add toggleable SUPERVISOR_HARDWARE_METRICS config
On devices with bandwidth sensitivity, this config var
disables sending system information such as memory
usage or cpu temp as current state.

Closes: #1645
Change-type: minor
Signed-off-by: Christina Wang <christina@balena.io>
2021-05-13 13:59:07 +09:00
Christina Wang
ea3e50e96e Create & unify src/device-state/current-state tests
Signed-off-by: Christina Wang <christina@balena.io>
2021-05-12 18:33:01 +09:00
Christina Wang
39601473c0 Fix undervoltage regex, add undervoltage tests, move sysinfo suite to test/src
Signed-off-by: Christina Wang <christina@balena.io>
2021-05-12 18:33:01 +09:00
Pagan Gazzard
74ae31fcfd Simplify/optimize filtering non-significant sys info changes
Change-type: patch
2021-05-06 10:59:49 +00:00
Pagan Gazzard
466ff58871 Avoid double omits whilst filtering current state
Change-type: patch
2021-05-06 10:59:23 +00:00
Kyle Harding
164dd7ccc1 Rename meta-resin to meta-balena
Signed-off-by: Kyle Harding <kyle@balena.io>
2021-05-06 17:05:26 +00:00
Kyle Harding
301aa52f03 Backwards compatibility changes for old resin namespaces
Change-type: patch
Signed-off-by: Kyle Harding <kyle@balena.io>
2021-05-06 17:05:26 +00:00
Kyle Harding
09615c9d82 Change container name to balena_supervisor
Change-type: minor
Signed-off-by: Kyle Harding <kyle@balena.io>
2021-05-06 17:05:25 +00:00
Kyle Harding
5faf9d7686 Rename resin-supervisor to balena-supervisor
Change-type: minor
Signed-off-by: Kyle Harding <kyle@balena.io>
2021-05-06 17:05:25 +00:00
Felipe Lalanne
5197a1330d Show warning instead of exception for invalid network config
A previous PR (#1656) fixed validation for network ipam config,
checking that both network and subnet are defined for each ipam config entry
(as described in the docker documentation).

After that PR, the validation throws an exception if the network target state is incorrect,
but this turns out to be the wrong approach, because that exception is also triggered
when querying target state.

This isn't a problem in normal operation, but it is in local mode, because local
mode queries the old target state before sending a new one. Since the query fails,
the CLI can never push the new target state.

This PR replaces the exception with a warning on the logs, since a
misconfigured network won't cause any engine failures, it will just
prevent containers from communicating through the provided network.
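
The "warn instead of throw" approach might look like the sketch below. The entry type, the function name and the idea of passing the required field names as a parameter are assumptions; the exact fields the supervisor checks are not reproduced here.

```typescript
// Hypothetical shape of one ipam config entry from the network target state.
type IpamEntry = { [key: string]: string | undefined };

// Returns warning messages instead of throwing, so code that only reads the
// target state (such as a local mode query) is never interrupted.
function validateIpam(entries: IpamEntry[], requiredKeys: string[]): string[] {
  const warnings: string[] = [];
  entries.forEach((entry, index) => {
    const missing = requiredKeys.filter((key) => entry[key] === undefined);
    if (missing.length > 0) {
      warnings.push('ipam config entry ' + index + ' is missing: ' + missing.join(', '));
    }
  });
  return warnings;
}
```

The caller can then log each returned message and continue, rather than aborting the query.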

A future improvement should move this validation to an earlier point in the process,
so the target state can get rejected before it even gets to a point it
can be used.

Relates-to: #1693
Change-type: patch
2021-05-06 16:27:40 -04:00
Miguel Casqueira
8b0c2347d8 Patch awaiting response when checking if supervisor0 network exists
Change-type: patch
Signed-off-by: Miguel Casqueira <miguel@balena.io>
2021-05-06 14:41:32 +00:00
quentinGllmt
1408fd7bcb Fix parsing driver_opts from compose to docker network creation
Change-type: patch
Signed-off-by: quentinGllmt <quentin@quentingllmt.fr>
2021-05-06 16:50:11 +02:00