That way that this is performed is by first adding a depends_on entry
for the target service if it appears in a network mode. Then when we
generate the docker container for this service, we use the containerId
of the target container and replace the network_mode with
`container:<id>`.
When comparing state, we check that the containerId still points to the
contianerId of the target container, and in this way we ensure that
when a network mode target container changes, we change the dependent
container too.
Change-type: minor
Closes: #851
Signed-off-by: Cameron Diver <cameron@balena.io>
The code before this change could potentially remove a volume which
should not be removed if a container was deleted before the call that
references said volume.
To avoid this, we additionally filter the list of volumes to cleanup by
any that are referenced in the target state. This means that cleanup
will never remove it, as long as it's still supposed to be there,
regardless of if a container references it or not.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
This change also makes sure that in the application-manager workflow we
pass around instances of the Volume class, rather than just the config.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
Since we were comparing the VPN's value before adding the explicit "true", there were cases
were the VPN is off, and therefore "value" didn't match the default, so the supervisor would
create a device specific SUPERVISOR_VPN_CONTROL = true, which is unnecessary and causes issues if
users don't expect this and move the device to an app that has VPN disabled. The correct behavior
is to compare "varValue" and only create a device config var if this value differs from the default.
(This was the behavior before the TS conversion in 01ed7bb103 )
Change-type: patch
Signed-off-by: Pablo Carranza Velez <pablo@balena.io>
This change makes DeviceState to wait until local mode switch is definitely
completed before actually applying the state, which avoids races in state cleanup.
Change-type: patch
Signed-off-by: Roman Mazur <roman@balena.io>
In local mode, we now update device status on the backend,
but omit applications info in our updates.
Closes: #959
Change-type: minor
Signed-off-by: Roman Mazur <roman@balena.io>
Also use the supervisor's own container logging monitoring code when
running livepush on the supervisor container.
Change-type: minor
Signed-off-by: Cameron Diver <cameron@balena.io>
Changes are collected together and exist in memory, for querying and
saving. Once every 10 mins, every changed timestamp is flushed to the
database.
Change-type: patch
Closes: #987
Signed-off-by: Cameron Diver <cameron@balena.io>
This is a massive commit, but nothing related to runtime has actually
changed, only the lint errors have changed.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
Changes in the node engine related to streams would cause the gzip
streams flush function to be called at the wrong times. The sinon fake
timers were also interacting with this.
We use setImmediate to call the flush function, and remove sinon timers
for the logging tests.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
Before this change the first time the cleanup code runs would be before
the migrations have had a chance to execute. This change makes it so
that the cleanup code always runs once the migrations have finished.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
When assigning multiple host ports to a single container port before
this change, the supervisor would incorrectly take only the first host
port into consideration. This change makes it so that every host port
per container port is considered.
Closes: #986
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
Prior to this change, we would `_.uniq` the expose value before adding
values from the port mappings. This could cause ports to get added
twice, which would cause the supervisor to think that there is a
configuration mismatch.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
Even though this would never have attempted to report the state to the
api during local mode, it leaves behind artifacts which would cause the
state to be sometimes reported when exiting local mode. This would cause
the api to reject the update unecessarily.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
This had a bug where it was using the `in` operator on a list. It may
have worked for some cases, but would have failed for others.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
We add a database table, which holds information about the last
timestamp of a log successfully reported to a backend (local or remote).
We then use this value to calculate from which point in time to start
reporting logs from the container. If this is the first time we've seen
a container, we get all logs, and for every log reported we save the
timestamp. If it is not the first time we've seen a container, we
request all logs since the last reported time, ensuring no interruption
of service.
Change-type: minor
Closes: #937
Signed-off-by: Cameron Diver <cameron@balena.io>
Container logging is now handled by a class which attaches and emits
information from the container. We add these to the directory
logging-backends/, and rename it to logging/.
Change-type: minor
Signed-off-by: Cameron Diver <cameron@balena.io>
We also add a catch to any errors when getting configuration, and send 503 in this case, even if it's
unlikely.
Change-type: patch
Signed-off-by: Pablo Carranza Velez <pablo@balena.io>
This commit does two related things:
* We make the poll interval a random time between 0.5 and 1.5 times the configured interval.
* We introduce the BALENA_SUPERVISOR_INSTANT_UPDATE_TRIGGER configuration variable, that defaults to true. If this variable is set
to false, then calls to /v1/update are ignored, and on startup the supervisor waits for a poll interval before getting the target state.
This will help especially on cases where there's a large number of devices on a single network. By disabling instant updates and setting a large
poll interval, we can now achieve a sitation where not all devices apply an update at the same time, which can help avoid
overwhelming the network.
Change-type: minor
Signed-off-by: Pablo Carranza Velez <pablo@balena.io>
When comparing a stopped container after a start request, the container
ID will be present in the target state (where usually it is not). We
were already filtering this value out of the current state, but
neglected to do so for the target state. This change now ensures we
remove it from both alias lists if it exists.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
In an edge case observed in the field, a supervisor's database held two applications
because the device had been moved and the update lock was set in the old app. This causes
the updated supervisor to be unable to start, logging "No compatible releases found in API",
because it can't fetch the release for the app it was moved from.
This commit changes the migration code to iterate through all apps, and remove any for which
we can't get a release.
Change-type: patch
Signed-off-by: Pablo Carranza Velez <pablo@balena.io>
Before this change, service name resolution would only occur in the
default network. This was because we were not explicitly adding aliases
of the service names to the aliases fields.
We also fix the comparison, which would do funny things based on
container IDs, which was correct but unnecessary.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
If a value is requested which does not pass validation, we instead set
it to the default value, to ensure that the state engine continues to
work and move towards the target state.
Change-type: minor
Closes: #938
Signed-off-by: Cameron Diver <cameron@balena.io>
We put the supervisor0 network in the 10.114.104.0/25 subnet to avoid issues when the device
is in a network using the 172.17.* network.
We also ensure we recreate this network if it was created in the incorrect subnet (i.e. if we're updating
from an old supervisor that didn't do this), for which we have to kill any containers using this network.
Closes#731
Change-type: patch
Signed-off-by: Pablo Carranza Velez <pablo@balena.io>
Without this patch, if for some reason device pinning fails (e.g. connectivity goes down) or anything
interrupts the initialization after provisioning completes but before pinning is completed, after a retry
the supervisor would just skip the pinning code, leaving the device unpinned. This patch ensures that the
pinning procedure is run even if the device was already provisioned (as long as the pinning flag has been set,
of course). This matches the behavior that the CoffeeScript code had from before the TypeScript conversion.
Change-type: patch
Signed-off-by: Pablo Carranza Velez <pablo@balena.io>
Otherwise we'll keep doing the rest of the APIBinder init steps, like reporting initial config,
potentially before completing the provisioning.
Change-type: patch
Signed-off-by: Pablo Carranza Velez <pablo@balena.io>
This avoid a race condition, in which config.txt can be cleared if a target state is fetched before the
initial values have been created as config vars.
Change-type: patch
Signed-off-by: Pablo Carranza Velez <pablo@balena.io>
The last update of pinejs-client to pinejs-client-request made the way we were
using $expand on the migration break. This switches to the correct way of doing it now.
Change-type: patch
Signed-off-by: Pablo Carranza Velez <pablo@balena.io>
This endpoint returns the last known device name from the API. This
differs from the BALENA_DEVICE_NAME_AT_INIT env var because this will
not change throughout the runtime of the container.
Closes: #908
Change-type: minor
Signed-off-by: Cameron Diver <cameron@balena.io>
We run the risk of the state engine exiting early when a dependency is
not ready, especially in local mode. This changes forces a noop to be
returned when we are waiting on another service, which is the process
used elsewhere in the state engine.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
This function would usually check that an image is present for a
dependency, but in local mode the images would have never been inserted
into the database.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
To avoid unnecesarilly using resources, we add an exponential backoff
when the noops explicitly come from the device-config module.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
During the conversion to typescript, the VPN active check was being
performed on the directory, and not the file that the VPN creates,
meaning it would always return true (as we explicitly create the
directory on startup if it does not exist).
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
In the case of an airgapped supervisor, with a target state that
requests the vpn be enabled, the supervisor will constantly loop on
trying to set the vpn to on. Unfortunately the vpn requires an internet
connection to be configured, so it will never be turned on.
We add the concept of no-ops to the device-config state change steps,
and don't end the state engine transition while these are present
(similar to how image pulls are implemented).
Change-type: minor
Signed-off-by: Cameron Diver <cameron@balena.io>
When using the label `io.balena.features.balena-api` the supervisor will inject 2 environment
variables into the container:
- BALENA_API_KEY
- BALENA_API_URL
This allows the container to access the currently associated API using the KEY.
Change-type: patch
Signed-off-by: Rich Bayliss <rich@balena.io>
Connects-to: #847
The default value for the delta apply timeout was changed from `''` to
`'0'` (note strings as these are database values) - but if the value
existed in the database already, this would fail validation. We add a
migration which will look explcitily for the failing value and switch it
to the new default.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
Non-200 errors were causing the watchdog to restart the supervisor,
which in some cases could cause a restart loop. Instead we change the
code to only treat communication failures as an error, and report status
code failures directly.
Change-type: patch
Closes: #843
Signed-off-by: Cameron Diver <cameron@balena.io>
We were not allowing newlines previously by virtue of the regex not
allowing them. The docker daemon and supervisor handling code both
support them, so we allow them in the parsing code too.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
We were validating the input configuration values by coercing them to
the correct type, and then using the initial value to be saved (which
currently is always converted to a string).
We now use the coerced value as the actual value we will store, and more
importantly emit. This means that the config.on('change' ...) calls will
always be properly typed, which before this change was not a guarantee
that we could make.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
Adjacent ports are always grouped together by docker when reporting the
container state (from an inspect), so adjacent ports defined in the
compose file would not match as they would not have been normalized.
We make sure to always normalize the input port configuration, so that
it will match the docker output (if it should).
We also don't sort in the fromComposePorts function anymore as that is
handled by the normalize function.
Closes: #897
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
In the original implementation it was possible that the delete did not
wait for the kill step to be finished, so it would not be deleted.
We seperate this process into two steps, to allow for the container to
have stopped before proceeding.
Change-type: patch
Closes: #841
Signed-off-by: Cameron Diver <cameron@balena.io>
We define the type for each config value, and validate the data when
retrieving and setting it.
Change-type: minor
Signed-off-by: Cameron Diver <cameron@balena.io>
Docker always returns ports in ascending order, so if they aren't
specified like that in the compose, a restart loop would occur. This
patch changes the port maps to be stored in ascending order, based on
an alphabetical sort of the internalStart port (not taking into account
the protocol). This is the same as how Docker returns them, so they will
match, regardless of input form.
Change-type: patch
Closes: #824
Signed-off-by: Cameron Diver <cameron@balena.io>
Also change the return format of ApplicationManager.getStatus(), which
does not conform to the above.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
Up to now, there was a slim but non-zero chance that an image would be downloaded between the call to `@getTarget` inside deviceState
(which gets the target state and creates Service objects using information from available images), and the call to
`@images.getAvailable` in ApplicationManager (which is used to determine whether we should keep waiting for a download or start the
service). If this race condition happened, then the ApplicationManager would infer that a service was ready to be started (because
the image appears as available), but would have incomplete information about the service because the image wasn't available when
the Service object was created. The result would be that the service would be started, and then immediately on the next applyTarget
the ApplicationManager would try to kill it and restart it to update it with the complete information from the image.
This patch changes this behavior by ensuring that all of the additional information about the current state, which includes available images,
is gathered *before* building the current and target states that we compare. This means that if the image is downloaded after the call to getAvailable, the Service might be constructed with all the information about the image, but it won't be started until the next pass, because ApplicationManager will treat it as still downloading.
Change-type: patch
Signed-off-by: Pablo Carranza Velez <pablo@balena.io>