1665 Commits

Author SHA1 Message Date
Felipe Lalanne
7425d1110b Add support for GET v3 target state
This change updates types and database format in order to allow
receiving the new format of the target state from the cloud and allow
applications to keep working.

This change also updates metadata in the containers, meaning services
will need to be restarted on supervisor update

Change-type: major
2022-03-22 19:08:02 -03:00
Felipe Lalanne
ccae1f7cb8 Rename application manager getStatus as getLegacyState
With the move to v3 target state and the move forward to remove
database ids from the supervisor, we want to ensure the ids are only
used for legacy support (such as within the API). This change renames
the method and sets it as deprecated
2022-03-22 19:08:02 -03:00
Felipe Lalanne
21c1c006f7 Always add status to image download report
It seems that in some cases the supervisor can report
an image without a `status` field leading to a cloud side 401 response.
See #1905 for more details.

Change-type: patch
2022-03-21 14:39:29 -03:00
Felipe Lalanne
e217ff9027 Only count report connectivity errors for healthcheck
Change-type: patch
2022-03-16 17:34:07 +00:00
20k-ultra
2fdb83839c Move report throttle out of reporting logic
Change-type: patch
Signed-off-by: 20k-ultra <3946250+20k-ultra@users.noreply.github.com>
2022-03-15 22:53:34 -04:00
20k-ultra
b069d6b9d5 Apply target state if loaded from file (apps.json)
Closes: #1895
Change-type: patch
See: https://www.flowdock.com/app/rulemotion/r-supervisor/threads/tSN9BgLxkgJKapbQHQJr-R9yLPM
Signed-off-by: 20k-ultra <3946250+20k-ultra@users.noreply.github.com>
2022-03-14 18:49:29 -04:00
Felipe Lalanne
d1956b69cc Fix check for supervisor0 network
The check for the docker network supervisor0 assumed that if the
interface supervisor0 existed, then the network would exist too. However, this is not
true in the case of docker directory corruption, which would lead to a
loop with `Error: (HTTP code 404) no such network - network supervisor0 not found`.

Change-type: patch
Closes: #1806
2022-02-25 19:46:59 -03:00
Felipe Lalanne
1b54ce8bfd Ignore selinux security opts when comparing services
The moby engine v20.x.y adds some selinux [security configurations](https://docs.docker.com/engine/reference/run/#security-configuration)
depending on the [container configuration](https://github.com/moby/moby/blob/master/daemon/create.go#L214).
This would cause the supervisor to enter a service restart loop as the
current and target service configurations will never match. The
supervisor now ignores selinux specific security options since those are
not supported by balenaOS.

Closes: #1890
Change-type: patch
2022-02-23 18:12:27 -03:00
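To illustrate the comparison described in the commit above, here is a minimal sketch rather than the supervisor's actual code. It assumes security options are plain strings and that SELinux entries can be recognised by a `label=` or `label:` prefix; the function names `withoutSelinuxOpts` and `sameSecurityOpts` are hypothetical.

```typescript
// Hypothetical helper: drop SELinux-specific entries (assumed to start with
// "label=" or "label:") so they never take part in the comparison.
function withoutSelinuxOpts(opts: string[]): string[] {
  return opts.filter((opt) => !/^label[=:]/.test(opt));
}

// Compare two security option lists, ignoring SELinux entries and ordering.
function sameSecurityOpts(current: string[], target: string[]): boolean {
  // filter() returns new arrays, so sorting does not mutate the inputs.
  const a = withoutSelinuxOpts(current).sort();
  const b = withoutSelinuxOpts(target).sort();
  return a.length === b.length && a.every((value, i) => value === b[i]);
}
```

With this shape, a running container that reports an engine-added `label=disable` still matches a target that never asked for it, so no restart is triggered.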
Felipe Lalanne
e7ec42fadc Use a breadcrumb to mark that a reboot is required
As changes to config.json may restart the supervisor before it can
trigger the reboot (or something can kill the supervisor before it can run that step),
the supervisor needs a persistent signal that a reboot is required
(instead of the current transient signal).

With this commit, the supervisor will now create a breadcrumb in the
host `/tmp` folder, that will be checked as the last step of the
configuration changes.
2022-02-15 12:52:48 -03:00
Felipe Lalanne
a2d6db1e1d Update signature of fsUtils.getPathOnHost
The function now returns either a string array if it receives multiple
arguments or a single string if it receives a single argument.
2022-02-15 12:52:46 -03:00
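The signature described above can be expressed with TypeScript overloads. The following is only a sketch of that shape: the `HOST_ROOT` value and the `toHostPath` helper are assumptions made for illustration, and the real implementation is not shown in this log.

```typescript
// Hypothetical host root mount point; the real value is not given in the log.
const HOST_ROOT = '/mnt/root';

// Hypothetical helper: prefix a path with the host root, avoiding "//".
function toHostPath(p: string): string {
  return `${HOST_ROOT}/${p.replace(/^\/+/, '')}`;
}

// Overloads: a single path yields a string, several paths yield an array.
function getPathOnHost(path: string): string;
function getPathOnHost(...paths: string[]): string[];
function getPathOnHost(...paths: string[]): string | string[] {
  const [first, ...rest] = paths;
  if (first !== undefined && rest.length === 0) {
    return toHostPath(first);
  }
  return paths.map(toHostPath);
}
```

Callers passing a single literal argument get a `string` back without casting, while `getPathOnHost('/a', '/b')` is typed as `string[]`.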
Felipe Lalanne
2917f03452 Perform config.json changes sequentially with other config changes
As config.json changes may restart the engine (and hence the supervisor)
in newer OS versions, this ensures that the supervisor does not get
interrupted while writing to backends.
2022-02-15 12:49:03 -03:00
Felipe Lalanne
63cb985c53 Split device-config step calculation into separate functions
2022-02-15 12:49:03 -03:00
Felipe Lalanne
118875e12e Fix apiUpdatePollInterval default to line up with API
2022-02-15 12:49:03 -03:00
Felipe Lalanne
a4d91d381a Create touch and getBootTime utility functions
Change-type: patch
2022-02-15 12:49:03 -03:00
Christina Wang
5f1a77da25 Add update lock check to PATCH /v1/device/host-config
This is necessary with the changes as of balenaOS 2.82.6, which watches config.json
and will restart balena-hostname and some other services automatically on file change.

Change-type: patch
Relates-to: #1876
Signed-off-by: Christina Wang <christina@balena.io>
2022-02-14 22:22:00 +00:00
Christina Wang
4f446103f4 Remove lockingIfNecessary in favor of updateLock.lock
The functionality is pretty much the same, so we don't need the two
functions in two different places.

Signed-off-by: Christina Wang <christina@balena.io>
2022-02-14 22:06:18 +00:00
Felipe Lalanne
72f6cbe4c7 Add support for local ipv6 reporting
With more and more devices in ipv6 only networks, this ensures the
local addresses are reported to the cloud as part of the state patch.

Change-type: patch
2022-02-08 19:06:13 -03:00
Felipe Lalanne
d071cd1507 Use writeAndSync when writing to config.json
`/mnt/boot` is a vfat partition which does not support atomic file
rename. The best course of action is to write and sync as fast as
possible to prevent corruption (although it still may happen)

Change-type: patch
2022-02-01 18:56:18 -03:00
Felipe Lalanne
a0ed00d8f3 Perform safeRename on writeFileAtomic
This forces a sync of the file as soon as the rename happens to prevent
corruption.
2022-02-01 18:56:18 -03:00
Felipe Lalanne
fa0e28de6d Clean up image event reporting
2022-02-01 18:35:50 -03:00
Pagan Gazzard
ae501048f5 Ensure the finish event is always reported when fetching images
Change-type: patch
2022-01-18 11:45:13 +00:00
Felipe Lalanne
f471ad736c Throw if target state gets a 304 without an ETag
The API uses 304 as a mechanism for load management on target state
requests. This may cause the supervisor to receive a 304 response
without having received a copy of the target state first, leading to
issues. This change checks for an ETag when receiving a 304, throwing an
exception otherwise.

Change-type: patch
2022-01-26 11:27:15 -03:00
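The check described above might look roughly like the following. This is a sketch under assumptions: the `TargetResponse` and `CachedTarget` shapes and the `nextTarget` function are hypothetical and only model the rule that a 304 is acceptable when an ETag from an earlier response is already held.

```typescript
// Hypothetical shapes; the supervisor's real types are not shown in the log.
interface TargetResponse {
  statusCode: number;
  etag?: string;
  body?: unknown;
}

interface CachedTarget {
  etag: string | undefined;
  body: unknown;
}

// Decide which target state to keep after a poll of the API.
function nextTarget(res: TargetResponse, cached: CachedTarget): CachedTarget {
  if (res.statusCode === 304) {
    // "Not Modified" is only meaningful if an earlier response left an ETag.
    if (cached.etag == null) {
      throw new Error('Received 304 Not Modified without a cached ETag');
    }
    return cached;
  }
  if (res.statusCode === 200) {
    return { etag: res.etag, body: res.body };
  }
  throw new Error(`Unexpected status code: ${res.statusCode}`);
}
```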
Felipe Lalanne
d06b8e053e Use dmidecode to read cpuid in non ARM devices
Cpu id is set to null so far for non ARM devices (e.g. Intel NUC). This
parses the output of dmidecode to get the cpu id and system model.

Change-type: patch
2022-01-13 22:49:42 +00:00
Felipe Lalanne
c7fc7aacf8 Use dmidecode to read cpuid in non ARM devices
Cpu id is set to null so far for non ARM devices (e.g. Intel NUC). This
parses the output of dmidecode to get the cpu id and system model.

Change-type: patch
2022-01-06 21:01:53 +00:00
Pagan Gazzard
157fd95196 Increase delta request timeout to 59s to better align with our backends
Change-type: patch
2022-01-18 10:02:13 +00:00
Pagan Gazzard
fd1f646073 Fix memoization of registry token request
Change-type: patch
2022-01-17 16:52:43 +00:00
Felipe Lalanne
9c6e5ee11f Remove apps.json after initial preload
This prevents the supervisor from trying to get back to the preloaded target
state if the database is deleted for any reason. It does this by moving the
used apps.json to a backup location.

Change-type: patch
Depends-on: #1841
2021-12-13 20:11:42 +00:00
Felipe Lalanne
08147e6a86 Ensure happy-eyeballs uses supervisor dns lookup
Happy-eyeballs performs [dns lookups](https://github.com/balena-io-modules/happy-eyeballs/blob/master/src/happy-eyeballs.ts#L23)
for the requested addresses, however, because of the order of imports it
was not using the supervisor custom `dns.lookup` that handles `.local`
name resolution, making address resolution fail in those cases.

Moving the import after the `dns.lookup` patch fixes the problem.
2021-12-16 11:59:59 -03:00
Felipe Lalanne
39c667803d Fix .local dns resolution when returning multiple addresses
The supervisor performs its own local resolution for `.local`
addresses due to a limitation in [musl](https://wiki.musl-libc.org/future-ideas.html).
The resolution function was not following exactly the nodejs [dns.lookup
specification](https://nodejs.org/api/dns.html#dnslookuphostname-options-callback)
which could cause certain clients to fail (in this case happy-eyeballs). This
updates the function to follow the specification.

Change-type: patch
2021-12-16 11:59:54 -03:00
Felipe Lalanne
9015b0e22f Skip initial apply until a target has been set
The supervisor always applies target state on start to ensure that the
device is in the correct state after a crash or for any other reason. This had
the side effect that if the database is deleted, the supervisor would
apply target state (which is empty), stopping services and possibly
causing volume data loss.

This prevents that behavior and ensures that the supervisor only
applies target state if a target has been set either by the cloud, preload or local
mode.

Change-type: patch
2021-12-13 09:31:00 -03:00
Pagan Gazzard
32e3399f7c Fix the "already delayed by" calculation
Change-type: patch
2021-12-10 15:54:30 +00:00
Pagan Gazzard
6554ff5a64 Add exponential backoff on errors for logs reporting
Change-type: patch
2021-12-09 18:30:04 +00:00
Felipe Lalanne
f6b2ec9677 Improve validation messages for env vars and labels
Change-type: patch
2021-12-02 17:19:50 -03:00
Felipe Lalanne
445aefaa29 Ensure target state errors are sent to the log backend
Closes: #1838
2021-12-02 15:29:37 -03:00
Felipe Lalanne
f6692ab918 Convert target state types to io-ts for better validation
This simplifies target state validation and improves validation
messages.

Change-type: patch
2021-12-02 15:29:37 -03:00
Felipe Lalanne
ca7c22d854 Move lib/types.ts to src/types/basic.ts 2021-12-02 15:29:37 -03:00
Zane Hitchcox
9ed2685f63 Add happy eyeballs
Change-type: patch
2021-11-30 12:43:18 -05:00
Pagan Gazzard
2eb00fa0da Increase request timeout to 59s to better align with our backends
Change-type: patch
2021-11-29 17:14:51 +00:00
Felipe Lalanne
6fd516a930 Fix broken local mode after PR #1824
PR #1824 changed app update behavior to test that all images are there
before moving between releases. This check always fails in local mode
since local mode images are handled differently.

This PR fixes local mode again by skipping the check when `localMode` is
set.

Change-type: patch
2021-11-17 17:54:25 -03:00
Alexandru Costache
3b9c68246e backends/extra-uEnv: Extend custom DTB support for Nano 2GB Devkit
Change-type: patch
Signed-off-by: Alexandru Costache <alexandru@balena.io>
2021-11-17 13:48:19 +01:00
Felipe Lalanne
394377e0a1 Fix delete-then-download strategy
The strategy has been broken for a while but it was not clear how to
fix it before the changes to image management. This PR fixes application
manager to remove images before downloading the new image. This will
only have an effect on changing images.

Closes: #1233
Change-type: patch
2021-11-16 16:40:15 -03:00
Felipe Lalanne
7aedc97ee1 Wait for images to be ready before moving between releases
For download-then-kill strategy, this waits for all changing images on the target
release to be available on device before killing the old services. This
prevents multicontainer applications from reaching a state where some
services of the new release start running well before others have been
downloaded.

When adding new services to a multicontainer app, the supervisor will
now wait for other changing services to be downloaded before starting
the new service.

Closes: #1812
Change-type: patch
2021-11-11 14:08:36 -03:00
Felipe Lalanne
969f4225e5 Check config for networks and volumes inside Service
This removes the need for the app module to know about the naming
conventions for networks and volumes since those exist now within the
service itself. This also fixes a small bug where the volume would be
removed before the service itself had been successfully stopped.

Change-type: patch
2021-10-28 10:20:53 -03:00
Alexandru Costache
7d678fa838 backends/extra-uEnv: Extend custom DTB support for Jetson TX2 NX
We just added support for the TX2 NX, which supports u-boot
and thus allows for using custom device-trees. Let's allow
for Jetson TX2 NX and future TX2 NX derived
device types to have device-trees configurable from the dashboard.

Change-type: patch
Signed-off-by: Alexandru Costache <alexandru@balena.io>
2021-08-24 07:24:48 +00:00
Felipe Lalanne
aab000209b Add backoff to state reporting when 503 is received
Current state reporting had a backoff when network or inconsistency
errors were found, but not on API errors. This change adds a backoff
using the Retry-After header, if present, to reduce load on the API.

Change-type: patch
2021-09-28 14:53:26 -04:00
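As an illustration of a backoff driven by the `Retry-After` header, the sketch below computes a delay from the header value. The function name `retryDelayMs`, the fallback parameter, and the explicit `nowMs` argument are assumptions made for this example; the commit itself does not show how the supervisor parses the header.

```typescript
// Compute how long to wait before the next report after a 503.
// `retryAfter` is the raw Retry-After header value, if the API sent one.
function retryDelayMs(
  retryAfter: string | undefined,
  fallbackMs: number,
  nowMs: number,
): number {
  if (retryAfter == null) {
    return fallbackMs;
  }
  const value = retryAfter.trim();
  // Form 1: a non-negative number of seconds, e.g. "120".
  if (/^\d+$/.test(value)) {
    return parseInt(value, 10) * 1000;
  }
  // Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = Date.parse(value);
  if (isNaN(dateMs)) {
    return fallbackMs;
  }
  // A date in the past means there is no need to wait any longer.
  return Math.max(0, dateMs - nowMs);
}
```

Passing the current time in as `nowMs` keeps the function deterministic; a caller would typically supply `Date.now()`.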
Felipe Lalanne
802f26fe71 Improve network interface filter
The supervisor filters out some network interfaces for mac address
reporting, to remove (balena*,lo,tun*,etc). The previous filter was
matching any interface containing one of the defined filters, making
it stricter than necessary. This commit fixes the issue.

Change-type: patch
2021-09-24 13:01:17 -03:00
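The difference between a substring match and an anchored match can be sketched as follows. The pattern list here only contains the three names mentioned in the commit message, and `globToRegExp` and `isReportable` are hypothetical names; the supervisor's real filter may be written differently.

```typescript
// Illustrative patterns taken from the commit message; "*" is a wildcard.
const IGNORED_INTERFACES = ['balena*', 'lo', 'tun*'];

// Turn a simple glob into a regular expression anchored at both ends.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
    .replace(/\*/g, '.*'); // expand the wildcard
  return new RegExp(`^${escaped}$`);
}

const ignoredPatterns = IGNORED_INTERFACES.map(globToRegExp);

// An interface is reported unless its whole name matches an ignored pattern.
function isReportable(name: string): boolean {
  return !ignoredPatterns.some((pattern) => pattern.test(name));
}
```

With a substring check, an interface such as `wlo1` would be dropped because it contains `lo`; with anchored patterns only `lo` itself is excluded.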
Alex Gonzalez
9e0cbe04c6 api-keys: Remove os variant parameter for authentication check
The current code authenticates unmanaged production devices which makes
no sense. Unmanaged devices do not need to authenticate with the API.

Change-type: patch
Signed-off-by: Alex Gonzalez <alexg@balena.io>
2021-08-05 09:30:35 +00:00
Alex Gonzalez
1abd10a129 os-release: Use developmentMode to ascertain OS variant in new releases
Newer balenaOS releases have replaced OS variants with a developmentMode
configuration setting. This commit uses this variable to set the OS
variant in the absence of `VARIANT_ID` from the os-release file.

Change-type: patch
Signed-off-by: Alex Gonzalez <alexg@balena.io>
2021-08-05 09:30:35 +00:00
Alex Gonzalez
4ad7a3ae91 config: Add developmentMode to schema
Add a `developmentMode` configuration variable to the schema. Do not expose
this on the device target state until local key-based authentication is
sorted.

Relates-to: https://jel.ly.fish/e9525e9e-aa74-478c-b931-52951c679f78
Change-type: patch
Signed-off-by: Alex Gonzalez <alexg@balena.io>
2021-08-05 09:30:35 +00:00
Kyle Harding
669866b4c2 Skip restarting services if they are part of conf targets
Some recent changes to the OS allowed some services to restart
automatically when the associated config files are changed.

In these cases we want to avoid restarting the same services
manually from the supervisor.

Change-type: patch
Signed-off-by: Kyle Harding <kyle@balena.io>
2021-08-24 14:03:55 -04:00