* Ensure commit is only reported when update has finished
* Change default delay between actions to 100ms
* Fix envArrayToObject for cases where the env var has an equal sign
* Use shell-quote to properly parse string command and entrypoint
* Fix preloading with a legacy apps.json
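As an illustration of the envArrayToObject fix listed above, here is a minimal Python sketch (the supervisor itself is CoffeeScript, so `env_array_to_object` is a hypothetical analogue, not the real function). It assumes entries have the Docker-style `KEY=value` form and splits on the first equal sign only; skipping entries that contain no `=` is an assumption of this sketch.

```python
def env_array_to_object(env):
    """Convert ["KEY=value", ...] into {"KEY": "value", ...}."""
    result = {}
    for entry in env:
        # Split on the first "=" only, so any "=" inside the value survives.
        key, sep, value = entry.partition("=")
        if sep:  # assumption: entries without "=" are ignored
            result[key] = value
    return result
```

For example, `env_array_to_object(["URL=http://host/?a=b"])` keeps `http://host/?a=b` intact as the value instead of truncating it at the second equal sign.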
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
* Fix deleting unneeded image tags
* Fix inspectByName to work with tags besides digests when the image isn't really tagged
* Tag deltas that should have tags, and fix cleanup of dangling images
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This also changes the deviceState object to use promises instead of timeouts to schedule
applying the target state.
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
Plus several small bug fixes:
* Allow target states with apps with no release
* Fix lock override and a TypeError in compareServicesForUpdate
* Lowercase service names when doing migrations and legacy preload
* Fix deltas from scratch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
* Fix validation of 0, fix ulimits, don't compare mem_limit or mem_reservation until OS supports them
* Remove all instances of _.forEach
* ApplicationManager: have separate compareNetworksForUpdate and compareVolumesForUpdate
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
Also several bugfixes:
* Fix VPN control, logging in deviceConfig, and action executors in proxyvisor
* Fix bug in calculation of dependencies due to fields still using snake_case
* Fix snake_case in a migration, and remove unused lib/migration.coffee
* In healthcheck, count deviceState as healthy when a fetch is in progress (as in the non-multicontainer supervisor)
* Set always as default restart policy
* Fix healthcheck, stop_grace_period and mem_limit
* Lint and reduce some cyclomatic complexities
* Namespace volumes and networks by appId, switch default network name to 'default', fix dependencies in networks and volumes, fix duplicated kill steps, fix fat arrow on provisioning
* Check that supervisor network is okay every time we're applying target state
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
Also add support for several networks per container (but with no configuration yet).
Also includes some bugfixes, implements the healthcheck, and avoids disabling the VPN on startup.
Change-Type: major
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
Also includes various improvements and bugfixes to services and the migration from legacy /data to volumes.
The switch to migrations involves a dirty hack for webpack to properly resolve the paths to the migrations js files - it uses an expression
that webpack can't resolve, so we hardcode it to a value and use the ContextReplacementPlugin to make that value resolve to the migrations folder.
The downsides to this approach are:
- a change in knex code would break this
- the migration code is added twice to the supervisor image: once in the migrations folder (because knex needs to loop through the directory to find the files),
and once inside app.js (because I can't make webpack treat them as external)
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
Two cases could've caused deadlocks:
1) Two services use a volume, and one service depends on the other. The volume config changes, but we can't update the volume because we need to kill
both services, and yet we can't kill the dependent service because its dependency isn't ready either.
2) A service with handover strategy uses a volume. The volume config changes. We can't update the volume because the running service is using it, and we can't
start the handover because it depends on the volume being ready. So we need to kill the service to update the volume config.
(Same for networks as with volumes)
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
Change the way we get the network gateway to set up the supervisor API address.
Added support for cap_add, cap_drop and devices.
Some fixes like missing fat arrows and removing leftover code.
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This module provisions the device and takes care of getting the target state from the API, calling deviceState to apply it.
It also reports the current state of the device back to the API.
An important change is that the initial values of the device configuration (e.g. config.txt) are reported to the API, creating new config
variables if no values exist for a particular key. This will allow better management of config.txt by giving visibility to the initial configuration.
Changelog-Entry: Remove support for keeping the provisioning apiKey on Resin OS 1.X. Report initial values from config.txt and other device configuration variables to the Resin API.
Change-Type: major
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This will be quickly replaced by a newer version with a different API, but for now we needed to maintain backwards compatibility (see #508).
This proxyvisor handles dependent apps and devices with a multicontainer parent app.
It also switches to the new update mechanism by inferring and applying updates step by step.
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This commit adds models to manage services, images, volumes and networks.
The main model for this is ServiceManager, which manages the collection of services on the device. It has functions to query what services are running, and to perform actions like starting, killing or performing handovers.
The Service model allows defining the transformations between a container and its service representation, and includes the functions to compare a running service with a target to determine if an update needs to happen.
This model includes the relevant compose file entries for a service that are supported. Bind mounts are disallowed except for the ones that relate to supervisor features, and persistent data is now stored in named volumes.
The Images model allows fetching and removing images, and includes functionality to determine images that have to be cleaned up - now only dangling and old supervisor images are cleaned up automatically, and ApplicationManager
will remove images that correspond to old services that are no longer needed.
The Networks and Volumes models allow managing named networks and volumes that are part of composed applications.
Changelog-Entry: Remove all bind mounts that were specific to 1.X devices. Move the resin-kill-me file for the handover strategy to /tmp/resin. Add environment variables for the location of resin-kill-me and the lockfile. Use running containers to determine what services are running instead of storing them in the internal database. Use named volumes for persistent data.
Change-Type: major
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This commit implements what we used to have in docker-utils.coffee now making use of coffeescript classes.
We remove the cleanup function as this is now handled directly by the ApplicationManager.
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This update lock library allows an application to take a lockfile in several locations (subdirectories inside a base folder). The user of this library must be able
to exclusively create a lockfile in each of the corresponding locations, and if any of the files exist, the locking fails.
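The locking rule described above can be sketched in Python as follows. This is an illustration under assumptions, not the library's code: the function names, the `example.lock` filename, and the rollback of partially taken locks are all hypothetical. Exclusive creation is done with `O_CREAT | O_EXCL`, which fails if the file already exists.

```python
import os
import tempfile


def take_locks(base_dir, subdirs, lockfile_name="example.lock"):
    """Exclusively create a lockfile in each subdirectory of base_dir.

    Raises FileExistsError if any lockfile already exists. Locks created
    earlier in the same call are removed again (an assumption of this sketch).
    """
    created = []
    try:
        for sub in subdirs:
            path = os.path.join(base_dir, sub, lockfile_name)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            # O_EXCL makes creation fail when the file is already there.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            created.append(path)
    except FileExistsError:
        for path in created:
            os.remove(path)
        raise
    return created


def release_locks(paths):
    """Remove the lockfiles returned by take_locks."""
    for path in paths:
        os.remove(path)
```

A caller would take the locks before touching the locations, do its work, and pass the returned paths to `release_locks` afterwards.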
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This module takes care of inferring and applying the steps to run multicontainer applications. It will have a Proxyvisor to handle dependent apps and
devices. It understands the relationship between services, networks and volumes to infer the steps in the correct order, also taking update strategies into account.
Changelog-Entry: Allow running docker-compose-like multicontainer applications
Change-Type: major
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This model allows modifying config.txt on raspberry pi devices, as well as logging to display, bandwidth control variables and other supervisor
configuration settings. Configuration values are read from the underlying OS and the supervisor configuration where appropriate (i.e. the Config object), instead of storing the current state
in the database. This means that the supervisor will always use the real values to determine if changes have to be made.
This fixes several issues with config.txt, as the current values are now read from the file, and can be reported on the supervisor's first run (which will be implemented in APIBinder).
It also now treats dtoverlay and dtparam values as a JSON array without the enclosing brackets, for instance:
```
RESIN_HOST_CONFIG_dtparam="audio=on","spi=on"
```
Will produce the following lines in config.txt:
```
dtparam=audio=on
dtparam=spi=on
```
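A minimal Python sketch of that expansion, assuming the value is a well-formed comma-separated list of JSON strings (`config_txt_lines` is a hypothetical name, and the supervisor's actual parsing may handle more cases):

```python
import json


def config_txt_lines(key, raw_value):
    """Expand a bracket-less JSON array value into config.txt lines."""
    # Add the enclosing brackets back so the value parses as a JSON array.
    values = json.loads("[" + raw_value + "]")
    return ["{}={}".format(key, value) for value in values]
```

Calling `config_txt_lines("dtparam", '"audio=on","spi=on"')` yields one `dtparam=...` line per array element.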
Changelog-Entry: Implement inference of device configuration. Allow array values for dtoverlay and dtparam.
Change-Type: major
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This module can also send logs for dependent devices (by passing a specific channel to the "log" function).
The log types are also moved to a separate module to be used by modules that perform logging.
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This module now uses the native node `os.networkInterfaces()` to retrieve the addresses,
instead of the gosuper endpoint.
We also add the very simple "blink" library that is also used by the Supervisor API.
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This module will take care of applying the target state for the device and reporting its current state.
The state itself is handled by two other modules, ApplicationManager and DeviceConfig. The former will take care of running applications (including the dependent ones
via its Proxyvisor), and the latter will take care of device configuration like config.txt and supervisor configuration variables.
The way state is applied differs radically from the previous approach: the old application.coffee had a big `update` function that took all of the steps from fetching the target state
to running the containers. DeviceState, instead, does an iterative process through `triggerApplyTarget` of inferring the next steps to perform towards the target state, by looking at the current state and asking the ApplicationManager and DeviceConfig for
the next steps. It then applies the next steps and every time a step is completed, it schedules another round of inferring and applying the next steps.
Special care is taken to ensure `applyTarget` is not called simultaneously more than once.
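The infer-and-apply cycle can be illustrated with a small synchronous Python sketch. Everything here is hypothetical and simplified: state is modelled as a set of running service names, the step format is invented for the example, and the real DeviceState schedules rounds asynchronously rather than looping.

```python
def apply_target(current, target, infer_next_steps, apply_step, max_rounds=100):
    """Repeatedly infer and apply steps until no steps remain."""
    for _ in range(max_rounds):
        steps = infer_next_steps(current, target)
        if not steps:
            return current  # nothing left to do: target reached
        for step in steps:
            current = apply_step(current, step)
    raise RuntimeError("target state not reached after {} rounds".format(max_rounds))


def infer_next_steps(current, target):
    """Toy inference: kill unwanted services first, then start missing ones."""
    to_kill = sorted(current - target)
    if to_kill:
        return [("kill", name) for name in to_kill]
    return [("start", name) for name in sorted(target - current)]


def apply_step(current, step):
    """Toy executor: return the state that results from one step."""
    action, name = step
    return current - {name} if action == "kill" else current | {name}
```

Because each round only looks at the current and target state, a crash or reboot between rounds does not need special recovery logic: the next round simply infers the remaining steps again.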
This commit also adds a "device" module to handle reboot and shutdown, and moves gosuper calls to a separate module.
The module also uses a "network" module to manage network-related parts of the device's current state: IP addresses and the connectivity check.
The module implements a "normaliseLegacy" function that allows a migration from the models from older versions of the supervisor to the multicontainer models,
so that in case of a supervisor update we can have minimal downtime and bandwidth consumption when updating to the multicontainer supervisor - this migration allows
us to avoid cleaning up images, and also allows migrating the contents of the old /data for the app.
Changelog-Entry: Infer the current state of the device when applying the target state
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
These modules allow managing the models in the sqlite database and the configuration for the supervisor.
The database will now have a schema version, and the supervisor will normalize any legacy data when migrating
from an older schema (i.e. when doing a supervisor update). This will make model changes cleaner.
If a migration is needed, the DB initialization will return "true" and store the legacy data in a legacyData table. Once the supervisor finishes migrating the data,
it calls `db.finishMigration` to mark the migration complete and clear the legacyData table.
Changes in the models:
* The database implements the tables for multicontainer applications that now have services, networks and volumes as in a docker compose file.
* Dependent apps and devices now have separate tables to store their target states.
* The deviceConfig table now only stores target values, as the current ones will be inferred from the state of the device.
* We keep a table for images as we have no way to label them in docker storage, so we need to keep our own track of what images are relevant for the supervisor.
The Config object allows transparent management of configuration values, mainly through `get`, `getMany` and `set` functions. The values can be stored in config.json or
the database, and this is managed with a schema definition that also defines whether values are mutable and whether they have default values.
Some configuration values are of the "func" type, which means that instead of corresponding to a config.json or database key, they result from a helper function
that aggregates other configuration values or gets the value from other sources, like OS version and supervisor version.
Writes to config.json are atomic if a path to the file via /mnt/root can be found. We keep a write-through cache of the file to avoid unnecessary IO.
Changelog-Entry: Implement the multicontainer app models, and change the supervisor configuration management to avoid duplication between fields in config.json and fields in the internal database
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This will be the top level object in the multicontainer supervisor, using the following objects
to perform its duties:
* A DB object to manage the sqlite database models
* A Config object to manage configuration in sqlite and config.json
* An EventTracker to track events and send them to mixpanel
* A DeviceState object to manage the device state, including containers, device configuration and dependent devices
* An APIBinder object to manage all interactions with the Resin API
* The SupervisorAPI, implemented here, which exposes functionality from the other objects over an HTTP API with apikey authentication.
We also include an iptables module that the SupervisorAPI will use to only allow traffic from certain interfaces.
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
We had previously done this for all the other configuration variables, but for some reason we had missed these two.
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
It turned out that disk I/O can be the bottleneck when applying deltas on some devices. When the disk can’t keep up and consume the downloaded delta, there’s memory bloat due to buffering.
The updated version provides far better reliability when the device is under load and pretty much constant memory consumption with any number of concurrent deltas.
Change-Type: patch
We add an endpoint to the supervisor API that checks the following conditions to determine whether the supervisor is healthy:
* That the update cycle has run fully, in a time that's less than twice the poll interval. Unless we're downloading an image, in which case
we assume it's healthy (otherwise we'd get into the issue of determining a reasonable timeout for the image download, which is already done in a configurable way with delta options and the like).
* That the current state report to the Resin API hasn't failed more than 3 times. Unless the device has no connectivity, or the connectivity check is disabled, in which case we don't know
if the report failed simply because there's no network.
* That the gosuper component is working (since we periodically hit its API to get the IP addresses, we mark it as not working if this API call fails).
We need this endpoint to be unauthenticated for the docker daemon to be able to hit it (though, as the rest of the API, it is protected with iptables rules).
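The three conditions above can be condensed into a single predicate. The following Python sketch is only an illustration: the function and parameter names are hypothetical, and how the supervisor actually tracks these values is not shown here.

```python
def is_healthy(seconds_since_last_cycle, poll_interval_seconds, downloading,
               failed_reports, connected, connectivity_check_enabled,
               gosuper_ok):
    """Combine the three healthcheck conditions into one boolean."""
    # Update cycle: must have completed within twice the poll interval,
    # unless an image download is in progress.
    cycle_ok = downloading or seconds_since_last_cycle < 2 * poll_interval_seconds
    # State report: at most 3 failures count as healthy; failures are
    # ignored when we cannot tell whether the network is to blame.
    reports_ok = (failed_reports <= 3
                  or not connected
                  or not connectivity_check_enabled)
    return cycle_ok and reports_ok and gosuper_ok
```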
Change-Type: minor
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
I realized we're not deleting config.txt entries because the function checked for the values to apply
not to be empty, instead of just checking if the *changes* are empty.
So this closes #450
(Still not a complete solution to config.txt issues, which will come with the multicontainer PR, but at least it's a step forward)
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
Otherwise, devices where we update from legacy supervisors might have other keys, like RESIN_SUPERVISOR_DELTA, stored in deviceConfig.values,
causing `_.isEqual(values, targetValues)` to always return false.
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
Should only be relevant in really old OS versions, but still this is the correct default.
Fixes #439
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
We used to have a recursion based on Promises and Promise.delay, which caused the promise never to resolve
so eventually the stack would be exhausted.
This fixes it by using a simpler way to check if reporting the state is in progress and using a setImmediate to
call applyState outside of the Promise chain.
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
When sending events to mixpanel, we now use an explicit whitelist for the properties sent with the event, to avoid accidental leakage of any sensitive information.
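A whitelist filter of this kind can be sketched in a few lines of Python. The property names in `ALLOWED_PROPERTIES` are hypothetical placeholders, not the supervisor's actual whitelist.

```python
# Hypothetical whitelist; the real list of allowed properties is not shown here.
ALLOWED_PROPERTIES = frozenset({"appId", "uuid"})


def mask_properties(properties, allowed=ALLOWED_PROPERTIES):
    """Keep only explicitly whitelisted properties before sending an event."""
    return {key: value for key, value in properties.items() if key in allowed}
```

The point of a whitelist over a blacklist is that a newly added property is dropped by default until someone decides it is safe to send.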
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This change removes the behavior where we would try to fetch an app image when starting the app. This might cause an unintended
download of an app that is not really needed anymore because we're starting the app on boot and an update cycle would make this image unnecessary.
So now we try to inspect the image, and if this fails we will throw an error, causing the app to be soft-deleted and the next update cycle to properly trigger
a download of whatever image we need from the target state.
We also improve the error catching when fetching an image, to specifically catch an "image not found" error before trying to download - otherwise, any other
random error will cause us to try to download the image again, which will not be a noop if we're using deltas. If there's any other error, the correct behavior
is to throw and retry later.
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
We change the way container cleanup works so that it compares running
app containers with the container names for the known apps. This allows
the cleanup to effectively delete any spurious/duplicated app containers.
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
By storing the container name before creating the container, we avoid problems
if the supervisor crashes or the device reboots between creating a container and storing its id.
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This means that the supervisor will be less aggressive when the API is experiencing issues, so it stops compounding the problem if the API is being overloaded.
Change-type: patch
Currently preloaded apps don't run because their markedForDeletion field in the database is null. In this commit we set it to false, and we
also change the startup check to also run any apps that have markedForDeletion as null (which should now never happen, but is still good as a backup
plan in case something else fails and to avoid regressions).
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This change improves the check for the DuplicateUuidError that can happen if a device has been provisioned but the API's response hasn't been persisted - the error message
returned from the API has been known to have a few variations (usually an extra dot at the end), so we now use _.startsWith instead of checking for equal strings to make the
supervisor still work under these variations.
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
It appears preloaded apps have been getting restarted because the "apiKey" configuration value was only available after provisioning succeeded. This change ensures the
deviceApiKey that the device will use is injected into the env vars of preloaded apps, ensuring the app is not restarted (unless provisioning fails and the uuid and deviceApiKey are
regenerated, but this should be rare).
We also ensure that whenever an app's RESIN_API_KEY env var is populated, it is *always* done with the deviceApiKey and never with the provisioning apiKey.
Closes #457
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
- Updates resumable-request to 1.0.1
- Updates docker-progress to 2.0.3
- Removes `DEFAULT_DELTA_APPLY_TIMEOUT`; it’s not needed anymore, docker-delta reliably tracks rsync.
- Properly end the update when applying the delta results in an error.
Change-Type: patch
This commit changes the way the source for a delta is determined. We used to do
it by comparing the available tags with the one we want and relying on the format that
includes the app in the image name. Now we explicitly choose a delta source from the previous app
version if we have one, and otherwise use the image from any available app - which will allow us
to have a valid source when moving a device between apps.
For this to work consistently if there's an unexpected reboot, we now avoid deleting an app from the db
until the full update has succeeded. Instead, we mark the app for deletion so that we still have the image stored after the reboot.
This commit also changes a .map to .mapSeries when iterating over appIds for removal/install/update - this avoids parallel treatment
of apps which can cause inconsistencies in the status reported to the API.
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
We've been using `.catch Promise.OperationalError, ...` to catch errors when stopping a container and
detecting whether the error means that the container has already been stopped or removed.
Apparently, after the recent dockerode upgrade these errors are not typed as OperationalError anymore, causing error
messages like "No such container: null" when applying an update. This commit makes us catch all errors and check for their statusCode.
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
Errors from docker-modem that are passed from dockerode can have a "json" or "reason" property,
but that is generally less descriptive than the more standard "message", and can show up in the logs
as `[object Object]`. This commit changes it so that we log err.message if it is non-empty, and otherwise
look for json and reason.
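The lookup order described above might look like this in Python. It is a sketch only: `error_message` is a hypothetical name, the error is modelled as a plain object with optional attributes, and serialising a non-string `json` payload with `json.dumps` is an assumption made to avoid the `[object Object]` problem.

```python
import json
from types import SimpleNamespace


def error_message(err):
    """Prefer err.message; fall back to err.json, then err.reason."""
    message = getattr(err, "message", None)
    if message:
        return str(message)
    for attr in ("json", "reason"):
        detail = getattr(err, attr, None)
        if detail:
            # Serialise structured payloads instead of printing an opaque object.
            return detail if isinstance(detail, str) else json.dumps(detail)
    return str(err)


# Example: an error that only carries a structured "json" payload.
example = error_message(SimpleNamespace(message="", json={"message": "boom"}))
```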
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
Applying a delta update consists of two parts:
1. The request to the delta server for the delta payload (an rsync batch file, plus some prepended Docker metadata). The response is a redirect to a URL that contains the delta (currently S3).
2. The request for the actual download of the delta. The response is streamed directly to rsync, which applies it onto the mounted root filesystem of the final image.
The first step may take a while as it may trigger the generation of the delta if the request is the first one for this combination of src/dest image and the images are large. If the request times out, either because of the delta server taking too long to respond or bad network, the Supervisor automatically schedules a retry to be performed after a while.
Currently, similar behaviour applies to the second step as well -- if the request fails, we immediately bail out and the Supervisor schedules a retry of the whole process (i.e. from step 1). But in this case it means we might have downloaded and applied some or most of the delta when a socket timeout occurs causing us to start all over again, wasting time and bandwidth.
This commit splits the process into the two discrete steps and improves the behaviour on the second step. Specifically:
- makes the Supervisor try to resume the delta download request several times before it bails out and starts the process all over again.
- removes arbitrary timeout which applied over the whole process and meant some deltas would never manage to be applied (because of large delta size and low network bandwidth).
- makes sure any launched rsync processes always exit and any opened streams are consumed and closed.
Most of the improvements are in the two dependencies linked below -- `resumable-request` and `node-docker-delta` -- and this commit merely combines the updated versions of these modules.
Change-Type: minor
Connects-To: #140
Depends-On: https://github.com/resin-io/node-docker-delta/pull/19
Depends-On: https://github.com/resin-io-modules/resumable-request/pull/2
We mark when the device is rebooting and avoid some steps in the update cycle that change the device
state, similarly to when the device is in local mode, to avoid problems with non-atomic operations.
This doesn't solve *all* the potential scenarios of a reboot happening in the middle of an update, but at least
should prevent the case where we start an app container and reboot the device before saving the containerId, potentially
causing a duplicated container issue.
We also correct the API docs to reflect the 202 response when reboot or shutdown are successful.
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
We used to store the uuid which would cause the supervisor to not attempt a provisioning even if offline mode
was turned off. This was to avoid preloaded apps being reloaded constantly leaving multiple containers.
We now avoid persisting the uuid, so that when the supervisor goes out of offline mode it can provision
without the need to wipe out the db. We avoid the problem with preloaded apps by not loading them
if there are apps already stored on the db.
(In the future, apps in the db will only represent target state and we can make preloaded apps be reloaded on every
start, but for now we can't do it as long as we store the containerId on the db - deleting an app on the db
means losing track of its containerId and therefore leaving an orphaned container)
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This makes the Async suffix for docker functions unnecessary. It also allows us to remove dockerode as an
explicit dependency.
Change-Type: minor
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
The test for an exec format error caused an "`err.json.trim` is not a function"
error, so the message shown didn't relate to what the problem actually was.
This makes the test for the exec format error safer.
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
The current setup would cause the check to always fail - the consequence is not *that* bad since
the provisioning key still gets overwritten, but it's better to delete it if we can.
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This allows us to also remove a few npm dependencies and the docker compose binary.
Change-Type: major
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
The supervisor uses an `API_ENDPOINT` environment variable to define what API to register to. Up to now this has been defaulted to `https://api.resin.io`.
(In Resin OS devices this environment variable ultimately comes from config.json).
This commit changes the behavior so that an empty value of that environment variable causes the supervisor to work in "offline mode", i.e. not connected to a remote server.
Basically only preloaded apps and the supervisor API work in this mode.
The config.json `supervisorOfflineMode` field still works for backwards compatibility, but we'll treat it as deprecated and it should be removed eventually.
Change-Type: minor
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
The logic to disable mixpanel initialization in offline mode was inverted :S causing mixpanel
to *only* be initialized when in offline mode.
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This was properly done in the recently added changes in bootstrap.coffee,
but all other references were using "Os" instead of "OS".
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
When requesting a delta, a `Promise.join` promise chain was producing unhandled
errors, since it was a separate promise chain from the parent function, which
was created with `new Promise`. This commit fixes this by creating the new Promise only
when it's needed, avoiding the creation of a separate promise chain.
Closes #432
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This avoids problems when updating the supervisor on an older OS, where the VPN and other
host services still require config.json to have an apiKey field to authenticate.
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
This helps avoid unnecessary writes to the DB which may cause disk wearout.
We also change the error message in this section to show that the error might have happened
when fetching the device config as much as when setting it.
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>