Non-200 errors were causing the watchdog to restart the supervisor,
which in some cases could cause a restart loop. Instead we change the
code to only treat communication failures as an error, and report status
code failures directly.
Change-type: patch
Closes: #843
Signed-off-by: Cameron Diver <cameron@balena.io>
We were not allowing newlines previously by virtue of the regex not
allowing them. The docker daemon and supervisor handling code both
support them, so we allow them in the parsing code too.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
We were validating the input configuration values by coercing them to
the correct type, and then using the initial value to be saved (which
currently is always converted to a string).
We now use the coerced value as the actual value we will store, and more
importantly emit. This means that the config.on('change' ...) calls will
always be properly typed, which before this change was not a guarantee
that we could make.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
Adjacent ports are always grouped together by docker when reporting the
container state (from an inspect), so adjacent ports defined in the
compose file would not match as they would not have been normalized.
We make sure to always normalize the input port configuration, so that
it will match the docker output (if it should).
We also don't sort in the fromComposePorts function anymore as that is
handled by the normalize function.
Closes: #897
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
In the original implementation it was possible that the delete did not
wait for the kill step to be finished, so it would not be deleted.
We seperate this process into two steps, to allow for the container to
have stopped before proceeding.
Change-type: patch
Closes: #841
Signed-off-by: Cameron Diver <cameron@balena.io>
We define the type for each config value, and validate the data when
retrieving and setting it.
Change-type: minor
Signed-off-by: Cameron Diver <cameron@balena.io>
Docker always returns ports in ascending order, so if they aren't
specified like that in the compose, a restart loop would occur. This
patch changes the port maps to be stored in ascending order, based on
an alphabetical sort of the internalStart port (not taking into account
the protocol). This is the same as how Docker returns them, so they will
match, regardless of input form.
Change-type: patch
Closes: #824
Signed-off-by: Cameron Diver <cameron@balena.io>
Also change the return format of ApplicationManager.getStatus(), which
does not conform to the above.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@balena.io>
Up to now, there was a slim but non-zero chance that an image would be downloaded between the call to `@getTarget` inside deviceState
(which gets the target state and creates Service objects using information from available images), and the call to
`@images.getAvailable` in ApplicationManager (which is used to determine whether we should keep waiting for a download or start the
service). If this race condition happened, then the ApplicationManager would infer that a service was ready to be started (because
the image appears as available), but would have incomplete information about the service because the image wasn't available when
the Service object was created. The result would be that the service would be started, and then immediately on the next applyTarget
the ApplicationManager would try to kill it and restart it to update it with the complete information from the image.
This patch changes this behavior by ensuring that all of the additional information about the current state, which includes available images,
is gathered *before* building the current and target states that we compare. This means that if the image is downloaded after the call to getAvailable, the Service might be constructed with all the information about the image, but it won't be started until the next pass, because ApplicationManager will treat it as still downloading.
Change-type: patch
Signed-off-by: Pablo Carranza Velez <pablo@balena.io>
We also replace a createTableIfNotExists in the migrations with hasTable then createTable, to
avoid a warning message about it being not recommended.
Change-type: patch
Signed-off-by: Pablo Carranza Velez <pablo@balena.io>
The move from pure CoffeeScript to TypeScript has brought a
few changes to the way transpiling happens. Previously, through
serendipity, the way `startIPAddressUpdate` was called worked
because of the binding convention pre-transpiling.
However, with the move to TypeScript, this has altered and
the assumption that a lack of parentheses would call the
method before supplying a callback into the returned function
is incorrect. The method must be specifically called first.
Connects-to: #836
Change-type: patch
Signed-off-by: Heds Simons <heds@balena.io>
When updating from old supervisors (<7.0.0), we've been so far using a fake id 1 for serviceId, imageId
and releaseId since these were not available in the old supervisor. This causes problems when the supervisor
tries to report these values to the API. Moreover, the app from the legacy supervisor has an image URL
that doesn't include the content hash - this causes the supervisor to believe the image is not really downloaded
and try to fetch it again.
To fix these issues, we add a request to the API when the supervisor starts up and detects that there's a legacy
app that needs to be normalised. We fetch the appropriate release, and use it to populate the resource ids
and the updated image URL.
This should avoid the unnecessary image download, and errors reporting target state after an update.
Change-type: patch
Signed-off-by: Pablo Carranza Velez <pablo@balena.io>
Now you can either pass a serviceName or imageId to restart a specific
service, which is much easier from a device container.
Change-type: minor
Signed-off-by: Cameron Diver <cameron@balena.io>
The supervisor has been doing regular pulls instead of deltas
from scratch for a while now. We remove remaining references to
resin/scratch, and add a handler in docker-utils to fall back
to a regular pull with a null deltaSource (which should never be
called anyways, but is left as a precaution).
Change-type: patch
Signed-off-by: Pablo Carranza Velez <pablo@balena.io>
getAndSetTargetState in the APIBinder had a check for whether the target state has changed.
When triggering an update from the API, we want to *always* call triggerApplyTarget, especially
when the update is forced. Otherwise the API endpoint doesn't work for forced updates (and in general,
will rarely trigger an update unless a change in the API's target state has happened)
Change-type: patch
Signed-off-by: Pablo Carranza Velez <pablo@balena.io>
In commit 19cd310da3 this line was deleted,
probably to avoid deleting local mode apps when setting the API target and
viceversa but we need to delete old apps to avoid problems when moving
the device between apps.
We now filter by source to avoid the problem with local mode too.
Change-type: patch
Signed-off-by: Pablo Carranza Velez <pablo@balena.io>
Otherwise we may skip saving a target image to the db when updating from legacy supervisors,
which in turn prevents from deleting the legacy image entry (with imageId = 1), leaving the
supervisor in a state where it can't report its current state to the API.
While we're at it, we also remove an unused variable in _getStatus.
Change-type: patch
Signed-off-by: Pablo Carranza Velez <pablo@balena.io>
We do this by formatting the keys from the target state before comparing them
with the ones from the current state (that are already formatted to strip the namespace
prefix).
Change-type: patch
Signed-off-by: Pablo Carranza Velez <pablo@balena.io>
This avoids issues on provisioning where the current state
(esp. config.txt) that we want to save is retrieved without
a RESIN_ or BALENA_ prefix, causing those values to be lost.
Change-type: patch
Signed-off-by: Pablo Carranza Velez <pablo@balena.io>
Otherwise old releases (where applications expected tty to be true)
would break.
Change-type: patch
Signed-off-by: Pablo Carranza Velez <pablo@balena.io>
i.e. if we're not provisioned or if the target state is empty (of apps), then we
read apps.json to preload. We then mark that the target state has been set to avoid
trying to preload again if we ever get an empty target state from the API.
Change-type: patch
Signed-off-by: Pablo Carranza Velez <pablo@balena.io>
They will take precedence over any existing RESIN_ variables. We strip both namespaces now
whenever we get the target values.
This also fixes preloading with a legacy config (the interface to get the config keys from
the legacy apps.json was broken).
Change-type: minor
Signed-off-by: Pablo Carranza Velez <pablo@balena.io>
Instead of hardcoding balena.sock we use this variable since the path changes
with the balena -> balenaEngine rename.
We keep also mounting into balena.sock for backwards compatibility (even though
most tools should transparently use the DOCKER_HOST env var).
Change-type: minor
Signed-off-by: Pablo Carranza Velez <pablo@balena.io>
But we keep backwards compatibility by normalizing existing io.resin labels
into io.balena ones, and adding both RESIN_ and BALENA_ env vars for these features.
Change-Type: minor
Signed-off-by: Pablo Carranza Velez <pablo@balena.io>
We update the supervisor API docs to reflect the new `BALENA_` injected env vars.
We also add the new sensitive env vars to the list of vars not sent by the API.
And we remove the unused RESIN_DATA_PATH and RESIN_PROXYVISOR_HOOK_RECEIVER from the
vars the supervisor parses from its own environment.
Signed-off-by: Pablo Carranza Velez <pablo@balena.io>
Also fixes the handover wait (since the service interface for it was missing).
We add some logging to this process too.
Change-type: minor
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
We change the lockfile to /tmp/balena/updates.lock, and the resin-kill-me file to /tmp/balena/handover-complete.
In the host, we change to use /tmp/balena-supervisor instead of /tmp/resin-supervisor.
We add BALENA_ env vars in addition to the RESIN_ env vars.
We keep backwards compatibility by using both paths for the lockfile and handover, and keeping the RESIN_ env vars.
Changelog-entry: Move the handover and lock files to /tmp/balena, rename them, and add BALENA_ env vars
Change-type: minor
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This commit changes a bug where the compose healthcheck would always
overwrite the healthcheck set by the image - even if no compose
healthcheck exists.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@resin.io>
These options were discussed in an arch call and the conclusion was
that they don't need to be in the blacklist.
Change-type: patch
Changelog-entry: Remove a few blacklisted config.txt options
Signed-off-by: Zubair Lutfullah Kakakhel <zubair@resin.io>
We were joining every network on container creation, which is currently
bugged in Docker. We were also joining networks afterwards, so the
non-default networks are joined post-creation, and only the networkMode
container is joined on creation.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@resin.io>
This enables the switch to be added to the compose, and the handling of
docker messages has been changed to ensure that the multiplexed logs
which result are handled properly.
Change-type: minor
Signed-off-by: Cameron Diver <cameron@resin.io>
Stability improvements;
* Printing of unsupported compose fields
* Added a lot of tests
* All compose configuration has a default value, enabling better
comparison
Change-type: minor
Signed-off-by: Cameron Diver <cameron@resin.io>
When using the predicate functions in bluebird `.catch`es from
typescript, the compiler would complain that the predicates do not
accept a function which takes an error. Because these are specific
errors, I've extended the base `Error` class, and added the extra fields
we expect.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@resin.io>
Debounce will mean that in certain cases, the events will never be sent,
whereas with throttle we can be sure that it will be sent a minimum
amount per time slice.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@resin.io>
This was causing a bug where the applications were cloned when
restarting all of them. These clones did not carry over the binds, so
when starting the containers back up, two different versions of `this`
were being used.
Change-type: patch
Closes: #736
Signed-off-by: Cameron Diver <cameron@resin.io>
Device names with newlines cause reboot loops, due to newlines not being
supported by docker. This PR will warn when a device name contains a
newline.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@resin.io>
Currently the service has the device name applied after the docker
config is generated. This means that is has no effect until the next
restart.
This commit ensures that the device name is applied before the docker
config is generated, meaning that the env var gets applied to the
upcoming container.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@resin.io>
The previous approach had the bad side effect of resending tons of logs
in the case of a supervisor restart.
The approach can be improved by storing the last timestamp per container
and re-attaching at the correct point.
Change-type: minor
Signed-off-by: Petros Angelatos <petrosagg@gmail.com>
The resinApiEndpoint config option existed for legacy reasons, where the
apiEndpoint was passed in via env vars, but this is no longer the case,
and the current supervisor wouldn't run on these older versions of
resinOS anymore anyway, so I've removed the references to this legacy
endpoint, as it made reasoning about offline mode weird.
Change-type: minor
Signed-off-by: Cameron Diver <cameron@resin.io>
During the conversion to typescript, the behaviour of the database
handle changed slightly, meaning storing a reference to the models
function also requires a bind to be applied too.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@resin.io>
Also change logsChannelSecret value to be queried with the api backend,
so that logs are not shared between instances. This has been implemented
as the first config function provider with mutability.
Change-type: minor
Closes: #675
Signed-off-by: Cameron Diver <cameron@resin.io>
This function takes the docker output representing ports, and generates
the port map values from them. This means that services can accurately
be compared and next steps can be inferred.
The normalization function ensures that regardless of source, PortMaps
that represent the same port setup will be represented correctly
internally.
Change-type: patch
Closes: #644
Signed-off-by: Cameron Diver <cameron@resin.io>
Before this change, port ranges were iterated and stored as an object
per port mapping. Now the port ranges are stored as ranges until they
need to be converted to objects. The need to convert to objects still
exists as this is the format which the docker remote API expects, but
hopefully this should alleviate bugs like #644 by making the memory more
shorter-lived.
Also added more tests.
Change-type: patch
Closes: #644
Signed-of-by: Cameron Diver <cameron@resin.io>
Also had to change config module to bind `.this` value, due to
differences in setup.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@resin.io>
This commit abstracts all of the boot config code out of the
device-config module, ready to extend with different config backends.
Change-type: patch
Signed-off-by: Cameron Diver <cameron@resin.io>
Resin’s delta server supports Balena deltas as version 3 deltas. This commit adds support for triggering delta generation for Balena deltas, and applying them locally to the device via a simple pull.
The delta version to use when updating has been abstracted away as an env var that is user-defined. The default value is still instructing use of rsync deltas (v2).
Change-Type: minor
The supervisor will now check that a source of an application matches
the current source, and only start it if so.
Change-type: patch
Closes: #658
Signed-off-by: Cameron Diver <cameron@resin.io>
This field will represent the apiEndpoint that the application came
from, (or an empty string for local apps). This means that when
configuring an application to work on a different environment, as long
as the endpoint is different, the supervisor can know not to start the
old application.
Change-type: minor
Connects-to: #658
Signed-off-by: Cameron Diver <cameron@resin.io>
This api endpoint is the endpoint which the intial config was reported.
Also changed the code to detect if the api endpoint has changed, and
therefore whether we need to re-report the initial config, to avoid
losing hardware specific information.
Change-type: minor
Closes: #649
Signed-off-by: Cameron Diver <cameron@resin.io>
We add a bunch of additional unit tests, and also a coverage report using istanbul.
The tests are not meant to cover everything, but they're a first attempt at having *some* unit testing
on the supervisor. There's much to improve but hopefully it helps catch obvious errors.
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This is to combat when a working directory is in the compose file for a
service with a trailing slash. Docker will strip this slash and that
means service comparisons will fail going forward - even if they are the
same.
Change-type: patch
Closes: #635
Signed-off-by: Cameron Diver <cameron@resin.io>
Add webpack config and dependencies to have typescript built, and also
convert src/lib/validation.coffee to typescript.
In this conversion I also added a lot of debugging which should help the
upcoming local mode development.
Change-type: minor
Signed-off-by: Cameron Diver <cameron@resin.io>
It now allows a trailing `b`, as the docker-compose docs specify.
In addition the regex now specifies a case-insensitive flag, to catch
both upper and lower case memory numbers (the rest of the function
supported these already).
Change-type: patch
Closes: #603
Signed-off-by: Cameron Diver <cameron@resin.io>
Otherwise if the hostname on the supervisor container differs from the hostname on the host, the current and target
services will never match.
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
We were getting the correct working dir from the compose or image config, but not really using it.
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This should fix an issue where, on an update that only changes container metadata, the image install for the old image
is kept around on the API.
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
We migrate to a default composition because we need to avoid deleting existing docker images, but
we need to use the legacy-container label to avoid potentially creating a duplicated container when a target state comes in.
(Just like we do for preloaded apps)
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
In some cases we were using early `return res.status(...).send(...)` to send 400 errors
but this happened inside a promise chain that later sent another status and response.
We fix this with the correct indentation of the success response so that an early return doesn't fall there.
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
We weren't passing a "target" to serviceAction, which made the start action fail.
Plus we need to get the container again after starting to get the latest containerId.
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
Turns out shell-quote's parse function also replaces environment variables, which we don't want in this case. So we escape dollar signs
before calling shell-quote's parse function.
Also shell-quote takes some characters like `>` and globs and returns an object - so we return those objects to string form.
This should still be simpler/better than writing our own shlex.split, I hope...
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
* Use the correct defaults for the delta config variables that have them
* Only mount /lib/firmware and /lib/modules if they exist on the host
* hardcode-migrations.js: Nicer line separation
* APIBinder: switch to using a header for authentication, and keep credentials saved in the API clients
* Fix hrtime measurements in milliseconds
* Do not uses classes for routers
* compose: properly initialize networkMode to the first entry in networks if there is one
* Fix some details regarding defaults in validation and service
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
Plus a few bugfixes.
* Add support for cgroup_parent
* Add support for specifying a single value in tmpfs
* Fix support for extra_hosts
* Add support for group_add
* Add support for pid mode (only host and empty value are supported for now)
* Add support for pids_limit
* Add support for security_opt
* Add support for storage_opt
* Add support for userns_mode
* Add support for ipc (except for another container's)
* Add support for mac_address
* Add support for oom_kill_disable
* Add support for 'user' compose option
* Add support for working_dir and fix support for user when image specifies it
* Add support for bind-mounting the balena socket using a label
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
* Get imageId when normalising a dep. app from the DB
* Fix the appId in migrations when updating the supervisor
* Use the update lock to update a service's metadata
* Restart clears volatile target state
* Fix function definition for updateMetadata
* Improve backwards compatibility of /v1/apps/:appId endpoint
* Fix multicontainer deltas to work with resumable-request 2.0
* Fix dependent target normalisation logic
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
* Switch default dependent device type to generic
* Reduce noise in logs
* Limit to 3 simultaneous delta downloads
* Better check for deltaSource
* When checking volume dependencies, do not compare regular (non-named) volumes
* Store imageId for dependent apps, and don't report dependent images with invalid imageIds
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
* Ensure commit is only reported when update has finished
* Change default delay between actions to 100ms
* Fix envArrayToObject for cases where the env var has an equal sign
* Use shell-quote to properly parse string command and entrypoint
* Fix preloading with a legacy apps.json
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
* Fix deleting unneeded image tags
* Fix inspectByName to work with tags besides digests when the image isn't really tagged
* Tag deltas that should have tags, and fix cleanup of dangling images
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This also changes the deviceState object to use promises instead of timeouts to schedule
applying the target state.
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
Plus several small bug fixes:
* Allow target states with apps with no release
* Fix lock override and a TypeError in compareServicesForUpdate
* Lowercase service names when doing migrations and legacy preload
* Fix deltas from scratch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
* Fix validation of 0, fix ulimits, don't compare mem_limit or mem_reservation until OS supports them
* Remove all instances of _.forEach
* ApplicationManager: have separate compareNetworksForUpdate and compareVolumesForUpdate
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
Also several bugfixes:
* Fix VPN control, logging in deviceConfig, and action executors in proxyvisor
* Fix bug in calculation of dependencies due to fields still using snake_case
* Fix snake_case in a migration, and remove unused lib/migration.coffee
* In healthcheck, count deviceState as healthy when a fetch is in progress (as in the non-multicontainer supervisor)
* Set always as default restart policy
* Fix healthcheck, stop_grace_period and mem_limit
* Lint and reduce some cyclomatic complexities
* Namespace volumes and networks by appId, switch default network name to 'default', fix dependencies in networks and volumes, fix duplicated kill steps, fix fat arrow on provisioning
* Check that supervisor network is okay every time we're applying target state
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
Also add support for several networks per container (but with no configuration yet).
Also some bugfixes and implement healthcheck and not disabling VPN on startup.
Change-Type: major
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
Also includes various improvements and bugfixes to services and the migration from legacy /data to volumes.
The switch ti migrations involves a dirty hack for webpack to properly resolve the paths to the migrations js files - it uses an expression
that webpack can't resolve, so we hardcode it to a value and use the ContextReplacementPlugin to make that value resolve to the migrations folder.
The downsides to this approach are:
- a change in knex code would break this
- the migration code is added twice to the supervisor image: once in the migrations folder (because knex needs to loop through the directory to find the files),
and once inside app.js (because I can't make webpack treat them as external)
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
Two cases could've caused deadlocks:
1) Two services use a volume, and one service depends on the other. The volume config changes, but we can't update the volume because we need to kill
both services, and yet we can't kill the dependent service because its dependency isn't ready either.
2) A service with handover strategy uses a volume. The volume config changes. We can't update the volume because the running service is using it, and we can't
start the handover because it depends on the volume being ready. So we need to kill the service to update the volume config.
(Same for networks as with volumes)
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
Change the way we get the network gateway to set up the supervisor API address.
Added support for cap_add, cap_drop and devices.
Some fixes like missing fat arrows and removing leftover code.
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This module provisions the device and takes care of getting the target state from the API, calling deviceState to apply it.
It also reports the current state of the device back to the API.
An important change is that the initial values of the device configuration (e.g. config.txt) are reported to the API, creating new config
variables if no values exist for a particular key. This will allow better management of config.txt by giving visibility to the initial configuration.
Changelog-Entry: Remove support for keeping the provisioning apiKey on Resin OS 1.X. Report initial values from config.txt and other device configuration variables to the Resin API.
Change-Type: major
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This will be quickly replaced by a newer version with a different API, but for now we needed to maintain backwards compatibility (see #508).
This proxyvisor handles dependent apps and devices with a multicontainer parent app.
It also switches to the new update mechanism by inferring and applying updates step by step.
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This commit adds models to manage services, images, volumes and networks.
The main model for this is ServiceManager, which manages the collection of services on the device. It has functions to query what services are running, and to perform actions like starting, killing or performing handovers.
The Service model allows defining the transformations between a container and its service representation, and includes the functions to compare a running service with a target to determine if an update needs to happen.
This model includes the relevant compose file entries for a service that are supported. Bind mounts are disallowed except for the ones that relate to supervisor features, and persistent data is now stored in named volumes.
The Images model allows fetching and removing images, and includes functionality to determine images that have to be cleaned up - now only dangling and old supervisor images are cleaned up automatically, and ApplicationManager
will remove images that correspond to old services that are no longer needed.
The Networks and Volumes models allow managing named networks and volumes that are part of composed applications.
Changelog-Entry: Remove all bind mounts that were specific to 1.X devices. Move the resin-kill-me file for the handover strategy to /tmp/resin. Add environment variables for the location of resin-kill-me and the lockfile. Use running containers to determine what services are running instead of storing them in the internal database. Use named volumes for persistent data.
Change-Type: major
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This commit implements what we used to have in docker-utils.coffee now making use of coffeescript classes.
We remove the cleanup function as this is now handled directly by the ApplicationManager.
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This update lock library allows an application to take a lockfile in several locations (subdirectories inside a base folder). The user of this library must be able
to exclusively create a lockfile in each of the corresponding locations, and if any of the files exist, the locking fails.
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This module takes care of inferring and applying the steps to run multicontainer applications. It will have a Proxyvisor to handle dependent apps and
devices. It understands the relationship between services, networks and volumes to infer the steps in the correct order, also taking update strategies into account.
Changelog-Entry: Allow running docker-compose-like multicontainer applications
Change-Type: major
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This model allows modifying config.txt on raspberry pi devices, as well as logging to display, bandwidth control variables and other supervisor
configuration settings. Configuration values are read from the underlying OS and the supervisor configuration where appropriate (i.e. the Config object), instead of storing the current state
in the database. This means that the supervisor will always use the real values to determine if changes have to be made.
This fixes several issues with config.txt, as the current values are now read from the file, and can be reported on the supervisor's first run (which will be implemented in APIBinder).
It also now treats dtoverlay and dtparam values as a JSON array without the enclosing brackets, for instance:
```
RESIN_HOST_CONFIG_dtparam="audio=on","spi=on"
```
Will produce the following lines in config.txt:
```
dtparam=audio=on
dtparam=spi=on
```
Changelog-Entry: Implement inference of device configuration. Allow array values for dtoverlay and dtparam.
Change-Type: major
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This module can also send logs for dependent devices (by passing a specific channel to the "log" function).
The log types are also moved to a separate module to be used by modules that perform logging.
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This module now uses the native node `os.networkInterfaces()` to retrieve the addresses,
instead of the gosuper endpoint.
We also add the very simple "blink" library that is also used by the Supervisor API.
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This module will take care of applying the target state for the device and reporting its current state.
The state itself is handled by two other modules, ApplicationManager and DeviceConfig. The former will take care of running applications (including the dependent ones
via its Proxyvisor), and the latter will take care of device configuration like config.txt and supervisor configuration variables.
The way state is applied differs radically from the previous approach: the old application.coffee had a big `update` function that took all of the steps from fetching the target state
to running the containers. DeviceState, instead, does an iterative process through `triggerApplyTarget` of inferring the next steps to perform towards the target state, by looking at the current state and asking the ApplicationManager and DeviceConfig for
the next steps. It then applies the next steps and every time a step is completed, it schedules another round of inferring and applying the next steps.
Special care is taken to ensure `applyTarget` is not called simultaneously more than once.
This commit also adds a "device" module to handle reboot and shutdown, and moves gosuper calls to a separate module.
The module also uses a "network" module to manage network-related parts of the device's current state: IP addresses and the connectivity check.
The module implements a "normaliseLegacy" function that allows a migration from the models from older versions of the supervisor to the multicontainer models,
so that in case of a supervisor update we can have minimal downtime and bandwidth consumption when updating to the multicontainer supervisor - this migration allows
us to avoid cleaning up images, and also allows migrating the contents of the old /data for the app.
Changelog-Entry: Infer the current state of the device when applying the target state
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
These modules allow managing the models in the sqlite database and the configuration for the supervisor.
The database will now have a schema version, and the supervisor will normalize any legacy data when migrating
from an older schema (i.e. when doing a supervisor update). This will make model changes cleaner.
If a migration is needed, the DB initialization will return "true" and store the legacy data in a legacyData table. Once the supervisor finishes migrating the data,
it calls `db.finishMigration` to mark the migration complete and clear the legacyData table.
Changes in the models:
* The database implements the tables for multicontainer applications that now have services, networks and volumes as in a docker compose file.
* Dependent apps and devices now have separate tables to store their target states.
* The deviceConfig table now only stores target values, as the current ones will be inferred from the state of the device.
* We keep a table for images as we have no way to label them in docker storage, so we need to keep our own track of what images are relevant for the supervisor.
The Config object allows transparent management of configuration values, mainly through `get`, `getMany` and `set` functions. The values can be stored in config.json or
the database, and this is managed with a schema definition that also defines whether values are mutable and whether they have default values.
Some configuration values are of the "func" type, which means that instead of corresponding to a config.json or database key, they result from a helper function
that aggregates other configuration values or gets the value from other sources, like OS version and supervisor version.
Writes to config.json are atomic if a path to the file via /mnt/root can be found. We keep a write-through cache of the file to avoid unnecessary IO.
Changelog-Entry: Implement the multicontainer app models, and change the supervisor configuration management to avoid duplication between fields in config.json and fields in the internal database
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This will be the top level object in the multicontainer supervisor, using the following objects
to perform its duties:
* A DB object to manage the sqlite database models
* A Config object to manage configuration in sqlite and config.json
* An EventTracker to track events and send them to mixpanel
* A DeviceState object to manage the device state, including containers, device configuration and dependent devices
* An APIBinder object to manage all interactions with the Resin API
* The SupervisorAPI, implemented here, which exposes functionality from the other objects over an HTTP API with apikey authentication.
We also include an iptables module that the SupervisorAPI will use to only allow traffic from certain interfaces.
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
We had previously done this for all the other configuration variables, but for some reason we had missed these two.
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
Turned out that disk I/O can be the bottleneck when applying deltas on some devices. When the disk can’t keep up and consume the downloaded delta, there’s memory bloat due to buffering.
The updated version provides far better reliability when the device is under load and pretty much constant memory consumption with any number of concurrent deltas.
Change-Type: patch
We add an endpoint to the supervisor API that checks the following conditions to determine whether the supervisor is healthy:
* That the update cycle has run fully, in a time that's less than twice the poll interval. Unless we're downloading an image, in which case
we assume it's healthy (otherwise we'd get into the issue of determining a reasonable timeout for the image download, which is already done in a configurable way with delta options and the like).
* That the current state report to the Resin API hasn't failed more than 3 times. Unless the device has no connectivity, or the connectivity check is disabled, in which case we don't know
if the report failed simply because there's no network.
* That the gosuper component is working (since we periodically hit its API to get the IP addresses, we mark it as not working if this API call fails).
We need this endpoint to be unauthenticated for the docker daemon to be able to hit it (though, as the rest of the API, it is protected with iptables rules).
Change-Type: minor
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
I realized we're not deleting config.txt entries because the function checked for the values to apply
not to be empty, instead of just checking if the *changes* are empty.
So this closes#450
(Still not a complete solution to config.txt issues, which will come with the multicontainer PR, but at least it's a step forward)
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
Otherwise, devices where we update from legacy supervisors might have other keys, like RESIN_SUPERVISOR_DELTA, stored in deviceConfig.values,
causing `_.isEqual(values, targetValues)` to always return false.
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
Should only be relevant in really old OS versions, but still this is the correct default.
Fixes#439
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
We used to have a recursion based on Promises and Promise.delay, which caused the promise never to resolve
so eventually the stack would be exhausted.
This fixes it by using a simpler way to check if reporting the state is in progress and using a setImmediate to
call applyState outside of the Promise chain.
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
When sending events to mixpanel, we now use an explicit whitelist for the properties sent with the event, to avoid accidental leakage of any sensitive information.
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This change removes the behavior where we would try to fetch an app image when starting the app. This might cause an unintended
download of an app that is not really needed anymore because we're starting the app on boot and an update cycle would make this image unnecessary.
So now we try to inspect the image, and if this fails we will throw an error, causing the app to be soft-deleted and the next update cycle to properly trigger
a download of whatever image we need from the target state.
We also improve the error catching when fetching an image, to specifically catch an "image not found" error before trying to download - otherwise, any other
random error will cause us to try to download the image again, which will not be a noop if we're using deltas. If there's any other error, the correct behavior
is to throw and retry later.
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
We change the way container cleanup works so that it compares running
app containers with the container names for the known apps. This allows
the cleanup to effectively delete any spurious/duplicated app containers.
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>