This will ensure the restart policy specified is not violated
Change-type: patch
Closes: #1668
Signed-off-by: 20k-ultra <3946250+20k-ultra@users.noreply.github.com>
When disposing of resources which include Supervisor-created lockfiles,
only dispose of lockfiles for the specified user application.
Signed-off-by: Christina Wang <christina@balena.io>
The linked issue describes the Supervisor not cleaning up locks it creates due
to crashing at just the wrong time. After internal discussion we decided to
differentiate Supervisor-created lockfiles from user-created lockfiles by using
the `nobody` UID (65534) for Supervisor-created lockfiles.
As the existing NPM lockfile lib does not allow creating lockfiles atomically
with different UIDs, we move to using the lockfile binary, which is part of the
procmail package. To allow nonroot users to write to lock directories, permissions
are changed to allow write access by nonroot users.
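
A minimal sketch of how ownership can separate the two kinds of lockfile when cleaning up for a single application. Only the `nobody` UID (65534) comes from the description above; the `LockfileEntry` shape, the `supervisorLocksForApp` helper and the directory layout used in the usage note are assumptions for illustration.

```typescript
// UID of the `nobody` user, used to mark Supervisor-created lockfiles.
const NOBODY_UID = 65534;

// Hypothetical description of a lockfile found on disk.
interface LockfileEntry {
  path: string;
  uid: number;
}

// Return only the lockfiles under `appLockDir` that the Supervisor created.
// User-created lockfiles (any other UID) and other apps' locks are left alone.
function supervisorLocksForApp(
  entries: LockfileEntry[],
  appLockDir: string,
): LockfileEntry[] {
  const prefix = appLockDir.slice(-1) === '/' ? appLockDir : appLockDir + '/';
  return entries.filter(
    (e) => e.uid === NOBODY_UID && e.path.indexOf(prefix) === 0,
  );
}
```

With a hypothetical layout of one lock directory per app, calling `supervisorLocksForApp(entries, '/tmp/locks/1')` would skip a root-owned lock in that directory as well as any lock belonging to app `10`.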
See: https://www.flowdock.com/app/rulemotion/r-resinos/threads/gWMgK5hmR26TzWGHux62NpgJtVl
Change-type: minor
Closes: #1758
Signed-off-by: Christina Wang <christina@balena.io>
dmidecode for alpine 3.11 doesn't work in this device type. This change
moves to using `/proc/device-tree/product-sn` and
`/proc/device-tree/product-name` for these devices.
Resolves: #1916
Change-type: patch
Migration `M00008` had a bug with the check for legacy apps, which
resulted in devices that had at some point been updated from a single
container supervisor to get the error
```
Undefined binding(s) detected when compiling UPDATE. Undefined column(s): [appUuid] query
```
This adds a new migration with the fix, to repair the
inconsistent database state.
Change-type: patch
Closes: #1913
If an app is not in the target state, the supervisor no longer
has permissions to that app, hence it cannot report on it. When moving
between apps, there is a transitional period where containers and images
from both apps can be in the current state, therefore filtering is
needed to prevent getting 401 errors from the API.
Starting with v3 state endpoint, the supervisor may receive the configuration
for the supervisor service on the target state. This commit allows the
supervisor to filter out the supervisor container from the current and target
state to let the update-balena-supervisor script handle the creation and update
of the supervisor container.
Updating and creating the supervisor container will be handled by a
future commit
Starting with v3 state endpoint, the supervisor can receive
service configuration for services that are meant to be installed as
overlays or filesets on the host, as well as configuration for services
that are meant to be installed on the root partition. This commit just
ignores those services from the target state until support is added
Local mode is still a device level config. Eventually it will become a
property of an app, but for now, we don't want the supervisor trying to
uninstall supervisor or host app when local mode is set
This change makes the `api-binder/report` module more agnostic
to internal device state implementation details, moving necessary
healthchecks and data filtering to getCurrentForReport in device-state.
This also adds generic functions to perform comparison between current
state reports.
The role of the api-binder module is to be the intermediary
between the cloud API and the device-state. For this reason it makes sense to
isolate target state retrieval and current state reporting into this
module. This change just moves current state reporting to the directory.
This is required as we are phasing out app ids and we need to be able to
get app uuid from the current state of the network. The app-id now
exists as a container in new networks
This commit will restart containers as it needs to recreate the network.
Removed redundant `getCurrentAppsForReport` and `getCurrentForComparison` since
the behavior of these methods is already handled by `getCurrentApps` and
`getCurrentState`.
Creates `lib/legacy.ts` and `device-state/legacy.ts` to deal with
migration from legacy target states (single container and v2) for all
apps and for apps.json respectively
This change updates types and database format in order to allow
receiving the new format of the target state from the cloud and allow
applications to keep working.
This change also updates metadata in the containers, meaning services
will need to be restarted on supervisor update
Change-type: major
With the move to v3 target state and the move forward to remove
database ids from the supervisor, we want to ensure the ids are only
used for legacy support (such as within the API). This change renames
the method and sets it as deprecated
It seems that in some cases the supervisor can report
an image without a `status` field leading to a cloud side 401 response.
See #1905 for more details.
Change-type: patch
The check for the docker network supervisor0 assumed that if the
interface supervisor0 existed, then the network would exist too. However this is not
true in the case of docker directory corruption, which would lead to a
loop with `Error: (HTTP code 404) no such network - network supervisor0 not found`.
Change-type: patch
Closes: #1806
As changes to config.json may restart the supervisor before it can
trigger the reboot (or something can kill the supervisor before it can run that step),
the supervisor needs a persistent signal that a reboot is required
(instead of the current transient signal).
With this commit, the supervisor will now create a breadcrumb in the
host `/tmp` folder, that will be checked as the last step of the
configuration changes.
As config.json changes may restart the engine (and hence the supervisor)
in newer OS versions, this ensures that the supervisor does not get
interrupted while writing to backends.
This is necessary with the changes as of balenaOS 2.82.6, which watches config.json
and will restart balena-hostname and some other services automatically on file change.
Change-type: patch
Relates-to: #1876
Signed-off-by: Christina Wang <christina@balena.io>
The functionality is pretty much the same, so we don't need the two
functions in two different places.
Signed-off-by: Christina Wang <christina@balena.io>
With more and more devices in ipv6 only networks, this ensures the
local addresses are reported to the cloud as part of the state patch.
Change-type: patch
`/mnt/boot` is a vfat partition which does not support atomic file
rename. The best course of action is to write and sync as fast as
possible to prevent corruption (although it still may happen)
Change-type: patch
The API uses 304 as a mechanism for load management on target state
requests. This may cause the supervisor to receive a 304 response
without having received a copy of the target state first, leading to
issues. This change checks for an etag when receiving a 304, throwing an
exception otherwise.
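
The check can be sketched as follows. This is an illustration under assumptions, not the supervisor's code: the `TargetCache` shape and the `resolveTarget` helper are hypothetical names.

```typescript
// Hypothetical cache of the last target state received from the API.
interface TargetCache {
  etag?: string;
  body?: unknown;
}

// Decide what to do with a target state response.
// A 304 is only acceptable if a copy tagged with an etag is already held.
function resolveTarget(
  statusCode: number,
  etag: string | undefined,
  body: unknown,
  cache: TargetCache,
): TargetCache {
  if (statusCode === 304) {
    if (cache.etag == null) {
      throw new Error('Received 304 without a cached target state (no etag)');
    }
    return cache; // nothing changed, keep the cached copy
  }
  return { etag, body }; // a fresh copy replaces the cache
}
```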
Change-type: patch
Cpu id is set to null so far for non ARM devices (e.g. Intel NUC). This
parses the output of dmidecode to get the cpu id and system model.
Change-type: patch
This avoids the supervisor trying to get back to the preloaded target
state if the database is deleted for any reason. It does this by moving the
used apps.json to a backup location.
Change-type: patch
Depends-on: #1841
Happy-eyeballs performs [dns lookups](https://github.com/balena-io-modules/happy-eyeballs/blob/master/src/happy-eyeballs.ts#L23)
for the requested addresses, however, because of the order of imports it
was not using the supervisor custom `dns.lookup` that handles `.local`
name resolution, making address resolution fail in those cases.
Moving the import after the `dns.lookup` patch fixes the problem.
The supervisor performs its own local resolution for `.local`
addresses due to a limitation in [musl](https://wiki.musl-libc.org/future-ideas.html).
The resolution function was not following exactly the nodejs [dns.lookup
specification](https://nodejs.org/api/dns.html#dnslookuphostname-options-callback)
which could cause certain clients to fail (in this case happy-eyeballs). This
updates the function to follow the specification.
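
For reference, the callback contract described in the nodejs documentation can be sketched like this: without `all`, the callback receives a single address and its family; with `all: true`, it receives an array of `{ address, family }` objects. The `replyToLookup` helper and its types are hypothetical and only illustrate the contract, they are not the supervisor's resolver.

```typescript
interface LookupAddress {
  address: string;
  family: number; // 4 or 6
}

interface LookupOptions {
  family?: number; // 0 or undefined means "any family"
  all?: boolean;
}

type LookupCallback = (
  err: Error | null,
  address?: string | LookupAddress[],
  family?: number,
) => void;

// Hypothetical helper: hand already-resolved addresses back to the caller
// in the shape a dns.lookup callback expects.
function replyToLookup(
  resolved: LookupAddress[],
  options: LookupOptions,
  callback: LookupCallback,
): void {
  const matches = resolved.filter(
    (a) => !options.family || a.family === options.family,
  );
  if (matches.length === 0) {
    callback(new Error('ENOTFOUND'));
    return;
  }
  if (options.all) {
    callback(null, matches); // array form
    return;
  }
  callback(null, matches[0].address, matches[0].family); // single form
}
```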
Change-type: patch
The supervisor always applies target state on start to ensure that the
device is in the correct state in case of a crash or another reason. This had
the side effect that if the database is deleted, the supervisor would
apply target state (which is empty), stopping services and possibly
causing volume data loss.
This prevents that behavior and ensures that the supervisor only
applies target state if a target has been set either by the cloud, preload or local
mode.
Change-type: patch
PR #1824 changed app update behavior to test that all images are there
before moving between releases. This check always fails in local mode
since local mode images are handled differently.
This PR fixes local mode again by skipping the check when `localMode` is
set.
Change-type: patch
The strategy has been broken for a while but it was not clear how to
fix it before the changes to image management. This PR fixes application
manager to remove images before downloading the new image. This will
only have an effect on changing images.
Closes: #1233
Change-type: patch
For download-then-kill strategy, this waits for all changing images on the target
release to be available on device before killing the old services. This
prevents multicontainer applications from getting into a state where some
services of the new release start running long before others have been
downloaded.
When adding new services to a multicontainer app, the supervisor will
now wait for other changing services to be downloaded before starting
the new service.
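
The gating condition can be sketched as below. The function name and the use of plain image names are assumptions for illustration; the supervisor's own step calculation is more involved.

```typescript
// True when every image that changes in the target release is already
// available on the device, so old services can be killed safely.
function allChangingImagesAvailable(
  currentImages: string[],
  targetImages: string[],
  availableImages: string[],
): boolean {
  // Images present in the target but not in the current release are "changing".
  const changing = targetImages.filter(
    (img) => currentImages.indexOf(img) === -1,
  );
  return changing.every((img) => availableImages.indexOf(img) !== -1);
}
```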
Closes: #1812
Change-type: patch
This removes the need for the app module to know about the naming
conventions for networks and volumes since those exist now within the
service itself. This also fixes a small bug where the volume would be
removed before the service itself had been successfully stopped.
Change-type: patch
We just added support for the TX2 NX, which supports u-boot
and thus allows for using custom device-trees. Let's allow
for Jetson TX2 NX and future TX2 NX derived
device types to have device-trees configurable from the dashboard.
Change-type: patch
Signed-off-by: Alexandru Costache <alexandru@balena.io>
Current state reporting had a backoff when network or inconsistency
errors were found, but not on API errors. This change adds a backoff
using the `Retry-After` header, if present, to reduce load on the API.
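
A sketch of deriving the backoff delay, assuming the standard HTTP `Retry-After` semantics (either a number of seconds or an HTTP date). The `retryDelayMs` helper and the fallback parameter are hypothetical.

```typescript
// Compute how long to wait before the next report attempt.
// Falls back to `fallbackMs` when the header is missing or unparsable.
function retryDelayMs(
  retryAfter: string | undefined,
  fallbackMs: number,
  now: number = Date.now(),
): number {
  if (retryAfter == null || retryAfter.trim() === '') {
    return fallbackMs;
  }
  const value = retryAfter.trim();
  // delay-seconds form, e.g. "120"
  if (/^\d+$/.test(value)) {
    return parseInt(value, 10) * 1000;
  }
  // HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const date = Date.parse(value);
  if (isNaN(date)) {
    return fallbackMs;
  }
  return Math.max(0, date - now);
}
```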
Change-type: patch
The supervisor filters out some network interfaces for mac address
reporting, to remove (balena*,lo,tun*,etc). The previous filter was
matching any interface containing one of the defined filters, making
it stricter than necessary. This commit fixes the issue.
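
To illustrate the difference between substring and anchored matching, here is a minimal sketch. The pattern list is taken loosely from the examples above and the helper names are hypothetical; `*` is treated as "any suffix".

```typescript
// Illustrative patterns only; the real list may differ.
const EXCLUDED_INTERFACES = ['balena*', 'lo', 'tun*'];

// Turn a glob-like pattern into a regular expression anchored at both ends.
function toAnchoredRegExp(pattern: string): RegExp {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&')
    .replace(/\*/g, '.*');
  return new RegExp('^' + escaped + '$');
}

// An interface is reported unless its whole name matches a pattern.
function shouldReportInterface(
  name: string,
  patterns: string[] = EXCLUDED_INTERFACES,
): boolean {
  return !patterns.some((p) => toAnchoredRegExp(p).test(name));
}
```

With a substring match, an interface such as `wlo1` would be dropped because it contains `lo`; with anchored patterns it is kept.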
Change-type: patch
The current code authenticates unmanaged production devices which makes
no sense. Unmanaged devices do not need to authenticate with the API.
Change-type: patch
Signed-off-by: Alex Gonzalez <alexg@balena.io>
Newer BalenaOS releases have replaced OS variants for a developmentMode
configuration setting. This commit uses this variable to set the OS
variant in the absence of `VARIANT_ID` from the os-release file.
Change-type: patch
Signed-off-by: Alex Gonzalez <alexg@balena.io>
Add a `developmentMode` configuration variable to the schema. Do not expose
this on the device target state until local key-based authentication is
sorted.
Relates-to: https://jel.ly.fish/e9525e9e-aa74-478c-b931-52951c679f78
Change-type: patch
Signed-off-by: Alex Gonzalez <alexg@balena.io>
Some recent changes to the OS allowed some services to restart
automatically when the associated config files are changed.
In these cases we want to avoid restarting the same services
manually from the supervisor.
Change-type: patch
Signed-off-by: Kyle Harding <kyle@balena.io>
PR #1749 introduced a bug when pushing local target state. An update to
the [image name normalization](f1bd4b8d9b/src/lib/docker-utils.ts (L81))
failed to consider the local image name format. This results in mangling
of image names in the database, i.e. the image `ubuntu:latest` is stored
as `/ubuntu:latest`. This causes an exception to be returned by the
dockerode `getImage('/ubuntu:latest').inspect()` call.
This sends the supervisor into a crash loop and is shown on the supervisor
journal logs as
```
getaddrinfo ENOTFOUND images
at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:64:26)
```
Unfortunately if this happens on a user device, since the mangled image
name is already in the database, the easiest fix is to remove the
supervisor database and let the supervisor recreate it. Deleting the
database should be side effect free.
Change-type: patch
Preparing for the new v3 target state, where the supervisor will make environment
dependent ids optional and rely on using general UUIDs and user known identifiers
for comparison. This PR moves forward in that direction by removing some of those
comparisons for v2 target state.
- imageId to be replaced with imageName
- serviceId to be replaced by serviceName
- releaseId to be replaced by commit (future release_uuid)
This is a backwards compatible change, meaning it doesn't completely get rid of
these identifiers (which are still being used by supervisor API and for state
patch), but will not depend on those identifiers for calculating steps to target state.
Change-type: minor
This replaces stored `volatileState` with a more declarative ImageTask API.
An ImageTask stores volatile image state for operations that cannot be
obtained through an engine query, such as fetching and removing an
image, state that can be updated while the task is running.
Image controller methods can now use the `reportEvent` method to create
and update the state of a longer running task.
The image manager module now uses tags instead of docker IDs as the main
way to identify docker images on the engine. That is, if the target
state image has a name `imageName:tag@digest`, the supervisor will always use
the given `imageName` and `tag` (which may be empty) to tag the image on
the engine after fetching. This PR also adds checkups to ensure
consistency is maintained between the database and the engine.
Using tags simplifies query and removal operations, since
removing the image now means removing tags matching the image name.
Before this change the supervisor relied only on information in the
supervisor database, and used that to remove images by docker ID. However, the docker
id is not a reliable identifier, since images retain the same id between
releases or between services in the same release.
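
As an illustration of the naming rule, the sketch below splits a reference of the form `imageName:tag@digest` and builds the name used to tag the image. `parseImageRef` and `engineTag` are hypothetical helpers; how an empty tag is treated by the engine is out of scope here.

```typescript
interface ImageRef {
  imageName: string;
  tag: string; // may be empty
  digest: string; // may be empty
}

function parseImageRef(ref: string): ImageRef {
  let rest = ref;
  let digest = '';
  const at = rest.indexOf('@');
  if (at !== -1) {
    digest = rest.slice(at + 1);
    rest = rest.slice(0, at);
  }
  // A colon separates a tag only if it comes after the last '/';
  // otherwise it belongs to a registry host:port.
  const colon = rest.lastIndexOf(':');
  const slash = rest.lastIndexOf('/');
  let tag = '';
  if (colon > slash) {
    tag = rest.slice(colon + 1);
    rest = rest.slice(0, colon);
  }
  return { imageName: rest, tag, digest };
}

// Name used to tag the image after fetching: imageName plus tag, digest dropped.
function engineTag(ref: string): string {
  const parsed = parseImageRef(ref);
  return parsed.tag === '' ? parsed.imageName : parsed.imageName + ':' + parsed.tag;
}
```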
List of squashed commits
- Remove custom type NormalizedImageInfo
- Remove dependency on docker-toolbelt
- Use tags to track supervised images in docker
- Ensure tag removal occurs in sequence
- Only save database image after download confirmed
Relates-to: #1616, #1579
Change-type: patch
This functionality is needed when breadcrumbs aren't deleted after a HUP
rollback for whatever reason. Also rename HUP lock function.
Change-type: patch
Connects-to: #1459
Signed-off-by: Christina Wang <christina@balena.io>
This PR cleans up testing for supervisor compose modules. It also fixes broken
tests for application manager and removes a lot of dependencies for those tests
on DB and other unnecessary mocks. There are probably a lot of cases that tests
are missing but this should make writing new tests a lot easier.
This PR also creates a new mock dockerode (mockerode) module that should make it
easier to test operations that interact with the engine. All references
to the old mock-dockerode have not yet been removed but that should come
soon in another PR
List of squashed commits:
- Add tests for network create/remove
- Move compose service tests to test/src/compose and reorganize test descriptions
- Add support for image creation to mockerode
- Add additional tests for compose volumes
- Update mockerode so unimplemented fake methods throw. This is to ensure
tests using mockerode fail if an unimplemented method is used
- Update tests for volume-manager with mockerode
- Update tests for compose/images
- Simplify tests using mockerode
- Clean up compose/app tests
- Create application manager tests
Change-type: minor
On HUP, some healthcheck services need to complete before
it's safe for the Supervisor to reboot the device when
applying state changes. rollback-{health|altboot}-breadcrumb
are the two files that Supervisor looks for and locks the device
on when present in this patch.
Not closing issue 1459 because there is a possible case where,
on altboot rollback, the breadcrumbs are not present. 1459
may be closed when this edge case is investigated.
Change-type: patch
Connects-to: #1459
See: https://www.flowdock.com/app/rulemotion/r-supervisor/threads/cL7YfNOLSfTPfw05h59GEW0kfOt
Signed-off-by: Christina Wang <christina@balena.io>
This fixes a specific issue when the supervisor cannot find the right
source for deltas (e.g. after the DB gets deleted), where legacy
behavior was to look for any image in the app.
Change-type: patch
Relates-to: #1729
We need the supervisor to be able to manage config.txt changes for these
Revolution Pi boards too.
Change-type: patch
Signed-off-by: Florin Sarbu <florin@balena.io>
On devices with bandwidth sensitivity, this config var
disables sending system information such as memory
usage or cpu temp as current state.
Closes: #1645
Change-type: minor
Signed-off-by: Christina Wang <christina@balena.io>
A previous PR (#1656) fixed validation for network ipam config,
checking that both network and subnet are defined for each ipam config entry
(as described in the docker documentation).
After that PR, the validation throws an exception if the network target state is incorrect,
but this turns out to be the wrong approach, because that exception is also triggered
when querying target state.
This isn't a problem in normal operation, but it is in local mode, because local
mode queries the old target state before sending a new one. Since the query fails,
the CLI can never push the new target state.
This PR replaces the exception with a warning on the logs, since a
misconfigured network won't cause any engine failures, it will just
prevent containers from communicating through the provided network.
A future improvement should move this validation to an earlier point in the process,
so the target state can get rejected before it even gets to a point it
can be used.
Relates-to: #1693
Change-type: patch
This extra info will mean the API is able to immediately set default
config vars based on the os/supervisor version so that they are
available on the first target state fetch rather than having a delay
whilst waiting for the supervisor to report them as part of a state
patch
Update balena-register-device from 6.1.6 to 7.2.0
Change-type: patch
This adds the error message from the API to journal logs to better
identify those cases where patching to the API fails.
Change-type: patch
Relates-to: #1680
'mz' can be safely replaced with fs.promises
and util.promisify for faster native methods.
'mkdirp' after Node v8 uses native fs.mkdir, thus
is redundant. 'body-parser' is deprecated and
contained within express v4.x.
Closes: #1567
Change-type: patch
Signed-off-by: Christina Wang <christina@balena.io>
This commit updates dockerode types to the latest 2.x version, removing the need
for custom composer types for network.
This commit also modifies network tests to use the new types
Change-type: minor
Using safeStateClone within doPurge to applyIntermediateTarget after
successful volume purge has led to various type deficiencies being revealed
in common.js. Add several inline types in common.js to satisfy
the type checker (credit: Page <page@balena.io>). Delete common.d.ts
since it's not required and might mistakenly mask true I/O types of
functions in common.js.
Closes: #1611
Change-type: patch
Signed-off-by: Christina Wang <christina@balena.io>
The device request object was created with untouched fields left unset. When
comparing state to determine if a transition is required this would
result in a mismatch between:
    {
      Driver: '',
      Count: 1,
      DeviceIDs: null,
      Capabilities: [Array],
      Options: null
    }
and
    {
      Count: 1,
      Capabilities: [Array],
    }
Which in turn resulted in the target service being continuously restarted.
The fix is to instantiate the object in full.
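
A sketch of what instantiating the object in full can look like. The defaults for `Driver`, `DeviceIDs` and `Options` mirror the first object shown above; the defaults for `Count` and `Capabilities`, the type definition and the helper name are assumptions.

```typescript
interface DeviceRequest {
  Driver: string;
  Count: number;
  DeviceIDs: string[] | null;
  Capabilities: string[][];
  Options: { [key: string]: string } | null;
}

// Fill every field so that two equivalent requests always have the same shape.
function fullDeviceRequest(partial: Partial<DeviceRequest>): DeviceRequest {
  return {
    Driver: partial.Driver != null ? partial.Driver : '',
    Count: partial.Count != null ? partial.Count : 0, // assumed default
    DeviceIDs: partial.DeviceIDs != null ? partial.DeviceIDs : null,
    Capabilities: partial.Capabilities != null ? partial.Capabilities : [], // assumed default
    Options: partial.Options != null ? partial.Options : null,
  };
}
```

Once both sides of the comparison go through the same constructor, a request built from `{ Count: 1, Capabilities: [...] }` compares equal to the fully populated object and no restart is triggered.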
Connects-to: https://github.com/balena-io/balena-supervisor/issues/1449
Connects-to: ae646a07ec
Change-type: patch
Signed-off-by: Robert Günzler <robertg@balena.io>
Setting this variable to a base64 encoded string will replace the splash
image on the device by rewriting `/mnt/boot/splash/balena-logo.png`.
This will also make a copy of the default balena logo so the splash can
be restored if the variable is removed.
Change-type: minor
Signed-off-by: Felipe Lalanne <felipe@balena.io>
The `ensureRequiredOverlay` function is currently run for any backend.
At this moment this causes no issue, since most configuration backends
are defined per single device type. However, with the option to modify splash
images, which is available for all device types, the function would add
unwanted configuration vars to the splash image configuration. Moving it
to the config txt backend solves this issue.
This PR adds the following
* Supervisor v1 API application actions now return HTTP status code 423 when locks
are preventing the action to be performed. Previously this resulted in a
503 error
* Supervisor API v2 service actions now return HTTP status code 423 when locks are
preventing the action to be performed. Previously, this resulted in an
exception logged by the supervisor and the API query timing out
* Supervisor API `/v2/applications/:appId/start-service` now does not
check for a lock. Lock handling in v2 actions is now performed by each
step executor
* `/v1/apps/:appId/start` now queries the target state and uses that
information to execute the start step (as v2 does). Previously start
resulted in `cannot get appId from undefined`
* Extra tests for API methods
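
The status code mapping from the first two items can be sketched as follows. The error class and the `statusForError` helper are hypothetical names, and using 503 for every other error is an assumption based on the previous v1 behavior described above.

```typescript
// Hypothetical error raised when an update lock prevents an action.
class UpdatesLockedError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'UpdatesLockedError';
  }
}

// Map an error raised by an action to an HTTP status code.
// 423 (Locked) tells the client to retry once the lock is released.
function statusForError(err: Error): number {
  return err.name === 'UpdatesLockedError' ? 423 : 503;
}
```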
Change-type: patch
Connects-to: #1523
Signed-off-by: Felipe Lalanne <felipe@balena.io>
During first time run of the supervisor, the target state is queried
by `reportInitialEnv`. Since this happens early on the initialization
process, this target state report is missed by any listeners and this
can lead to the initial target state not being applied (see #1455).
This PR ensures that target state is re-emitted if there were no
listeners setup on call to update.
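
The idea can be sketched with a small emitter that remembers an update nobody was listening for and delivers it to the first listener that attaches. `ReplayingEmitter` is a hypothetical name and a simplification, not the supervisor's implementation.

```typescript
type Listener<T> = (value: T) => void;

class ReplayingEmitter<T> {
  private listeners: Array<Listener<T>> = [];
  private pending: { value: T } | null = null;

  emit(value: T): void {
    if (this.listeners.length === 0) {
      // Nobody is listening yet: keep the latest value for later.
      this.pending = { value };
      return;
    }
    this.listeners.forEach((l) => l(value));
  }

  on(listener: Listener<T>): void {
    this.listeners.push(listener);
    if (this.pending != null) {
      const missed = this.pending.value;
      this.pending = null;
      listener(missed); // deliver the update that was missed
    }
  }
}
```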
Change-type: patch
Signed-off-by: Felipe Lalanne <felipe@balena.io>
Connects-to: #1455
Some endpoints filter data based on the scope of the API key
used to make the request. When in LocalMode the check was not
being made correctly and all apps were considered out of scope.
Change-type: patch
Signed-off-by: Rich Bayliss <rich@balena.io>
This paves the way for running multiple applications and storing
information related to the application against the application itself. A
couple of hacks have been added to v1 and v2 endpoints to maintain
compatibility but these should eventually be removed with the addition
of a v3 api.
Change-type: minor
Signed-off-by: Cameron Diver <cameron@balena.io>
Currently, when the label `io.balena.features.balena-socket` is set,
the balena engine socket is mounted under `/run/balena-engine.sock`.
This causes a problem when using systemd inside the container, since
this service remounts `/run` and `/run/lock` as tmpfs, causing the
socket to become unavailable.
Making a mount of the socket into `/host/run` solves this issue. This is
the same approach taken with DBUS.
Change-type: patch
Signed-off-by: Felipe Lalanne <felipe@balena.io>
Connects-to: #1494
The source of truth for the device-type should be
device-type.json instead of config.json
Change-type: patch
Signed-off-by: Felipe Lalanne <felipe@balena.io>
Connects-to: #1472
A docker-compose.yml with the following structure
```
version: '2.1'
services:
  app_1:
    build: ./noisy-1
    image: noisy1
  app_2:
    build: ./noisy-1
    image: noisy1
  app_3:
    build: ./noisy-1
    image: noisy1
```
Will lead to the supervisor creating multiple image database entries
with the same dockerId (this is because of how the engine handles this
particular case). This case is not handled by the removal process
leading to image pile up and increased disk usage.
Change-type: patch
Signed-off-by: Felipe Lalanne <felipe@balena.io>
Connects-to: #1434
The memory information reported by the supervisor currently
estimates the value of used memory as `MemTotal - MemFree`.
However, linux systems will try to cache and buffer as much
memory as possible, which will affect the output of `MemFree`
(from /proc/meminfo) and in consequence the memory usage seen
by the user on the dashboard, which will appear much greater than
it is.
The correct calculation should be `MemTotal - MemFree - Buffers - Cached`,
which is the calculation performed by `htop` and the `free` commands.
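
A minimal sketch of the calculation over `/proc/meminfo`-style text. The helper names are hypothetical; values are in kB as printed by the kernel.

```typescript
// Parse "Key:   12345 kB" lines into a key/value map (values in kB).
function parseMeminfo(text: string): { [key: string]: number } {
  const values: { [key: string]: number } = {};
  text.split('\n').forEach((line) => {
    const match = /^(\w+):\s+(\d+)/.exec(line);
    if (match) {
      values[match[1]] = parseInt(match[2], 10);
    }
  });
  return values;
}

// Used memory excluding buffers and page cache.
function usedMemoryKb(text: string): number {
  const m = parseMeminfo(text);
  return m['MemTotal'] - m['MemFree'] - m['Buffers'] - m['Cached'];
}
```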
Change-type: patch
Signed-off-by: Felipe Lalanne <felipe@balena.io>
Connects-to: #1471
With the addition of the system information feature (CPU temp, etc.), if
there weren't any changes in the docker or config state of the device,
updates in system information would not be sent to the API. Now we
attempt to send data once every maxReportFrequency (although this does
not mean that we will be sending data that often, we still only send the
delta, if one exists)
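
The "only send the delta" part can be sketched as a shallow comparison between the last report sent and the current one. The flat `Report` shape, the field names in the usage note and both helpers are assumptions for illustration.

```typescript
type Report = { [key: string]: string | number | boolean | null };

// Keep only the fields whose value differs from what was last sent.
function reportDelta(lastSent: Report, current: Report): Report {
  const delta: Report = {};
  Object.keys(current).forEach((key) => {
    if (current[key] !== lastSent[key]) {
      delta[key] = current[key];
    }
  });
  return delta;
}

// Nothing is sent when the delta is empty, even if the timer fired.
function shouldSend(delta: Report): boolean {
  return Object.keys(delta).length > 0;
}
```

For example, with a last report of `{ cpu_temp: 40, status: 'Idle' }` and a current one of `{ cpu_temp: 45, status: 'Idle' }`, only `cpu_temp` would be sent on the next tick.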
Change-type: patch
Closes: #1481
Signed-off-by: Cameron Diver <cameron@balena.io>
In order to make supervisor upgrades more transparent, let's move away
from this env var since it requires a container restart any time the supervisor
is upgraded. We should ultimately move towards providing the supervisor's
set of capabilities, but that can come later.
Connects-to: #1447
Change-type: major
Signed-off-by: Matthew McGinn <matthew@balena.io>
Due to the singleton work, when performing migration M00005 and there
are apps with services created in the database, a deadlock occurs
during database initialization due to a circular
dependency for generating scoped keys.
Change-type: patch
Signed-off-by: Felipe Lalanne <felipe@balena.io>
Connects-to: #1468
When trying to apply SSDT overlays in Up Board, the supervisor currently
gets stuck in a loop trying to apply target state. See #1465
This was due to a bug in parsing the configuration, which led to
the method bootConfigChangeRequired returning true when no change was
needed.
Change-type: patch
Signed-off-by: Felipe Lalanne <felipe@balena.io>
Connects-to: #1465
Each service, when requesting access to the Supervisor API, will
now get an individual key which can be scoped to specific resources.
In this iteration the default scope will be to the application that
the service belongs to.
We also have a `global` scope which is used by the cloud API when in
managed mode.
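
A sketch of how a scope check might look, assuming a key carries a list of scopes that are either global or tied to one application. The `Scope` type and `canAccessApp` helper are hypothetical.

```typescript
// A key is either global or limited to a single application.
type Scope = { type: 'global' } | { type: 'app'; appId: number };

// True when any scope on the key grants access to the given application.
function canAccessApp(scopes: Scope[], appId: number): boolean {
  return scopes.some(
    (s) => s.type === 'global' || (s.type === 'app' && s.appId === appId),
  );
}
```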
Change-type: patch
Signed-off-by: Rich Bayliss <rich@balena.io>
Due to switching to Alpine the ability to resolve mDNS
hostnames was lost. This patch overrides the lookup and
manually resolves the names.
Change-type: patch
Signed-off-by: Rich Bayliss <rich@balena.io>
We provide a local DNS server for containers to use and this
was not allowed through the firewall when enabled.
Change-type: patch
Signed-off-by: Rich Bayliss <rich@balena.io>
When invoking iptables-restore it can fail. This wasn't handled
and this makes sure that it fails gracefully.
Change-type: patch
Signed-off-by: Rich Bayliss <rich@balena.io>
The host config variable HOST_DISCOVERABILITY can be set to
true or false, controlling the state of the avahi service. This
determines if the device advertises its presence over mDNS.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
Signed-off-by: Rich Bayliss <rich@balena.io>
Controlled by BALENA_HOST_FIREWALL_MODE, the firewall can
either be 'on' or 'off'.
- In the 'off' state, all traffic is allowed.
- In the 'on' state, only traffic for the core services provided
by Balena is allowed.
Change-type: patch
Signed-off-by: Rich Bayliss <rich@balena.io>
When reporting device information, send the MAC address of any
interfaces on the system. Also expose in the Supervisor API at
the route GET /v1/device.
Change-type: patch
Signed-off-by: Rich Bayliss <rich@balena.io>
This is part of the work to make the application-manager module much
less monolithic, in preparation for system apps and more generally
multi-app.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
We were treating the database class as a singleton, but still having to pass
around the db instance. Now we can simply require the db module and have
access to the database handle.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
This allows a response to an input with dport=`supervisor api port` and
is required when the host OS is doing stateful firewalling.
This should not affect things when stateful firewalling is not in
effect, as the standard OUTPUT chain policy is ACCEPT, so we're just
being explicit about it.
Change-type: patch
Backport-to: next, current, sunset
Signed-off-by: Cameron Diver <cameron@balena.io>