671 Commits

Author SHA1 Message Date
Pablo Carranza Velez
c05474b1a9 Always execute special actions if the value stored in memory doesn't match the target. And when storing target values, only store relevant ones
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-12-10 18:48:30 -08:00
Pablo Carranza Velez
8fc1a0935b Avoid stopping the VPN until a remote target state has been fetched, and retry applying config variables when they fail
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-12-08 00:16:34 -08:00
Pablo Carranza Velez
21a9bb4e82 When listenPort is not specified, use 48484 as default
Should only be relevant in really old OS versions, but still this is the correct default.

Fixes #439

Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-11-23 14:08:32 -08:00
Pablo Carranza Velez
51d6ab01c9 Avoid an indefinite recursion that grows the call stack when reporting the current state fails
We used to have a recursion based on Promises and Promise.delay, which caused the promise never to resolve
so eventually the stack would be exhausted.

This fixes it by using a simpler way to check if reporting the state is in progress and using a setImmediate to
call applyState outside of the Promise chain.

Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-11-02 15:54:09 -07:00
Pablo Carranza Velez
20d95ff024 Add whitelist-based filtering to mixpanel events
When sending events to mixpanel, we now use an explicit whitelist for the properties sent with the event, to avoid accidental leakage of any sensitive information.

Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-10-31 23:22:38 -07:00
Pablo Carranza Velez
34d37814c9 Tunnel all mixpanel events through the resin API
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-10-31 23:22:38 -07:00
Pablo Carranza Velez
ecf7e4206c Avoid fetching an image when it might be available or when starting an app because it might not be necessary
This change removes the behavior where we would try to fetch an app image when starting the app. This might cause an unintended
download of an app that is not really needed anymore because we're starting the app on boot and an update cycle would make this image unnecessary.
So now we try to inspect the image, and if this fails we will throw an error, causing the app to be soft-deleted and the next update cycle to properly trigger
a download of whatever image we need from the target state.

We also improve the error catching when fetching an image, to specifically catch an "image not found" error before trying to download - otherwise, any other
random error will cause us to try to download the image again, which will not be a noop if we're using deltas. If there's any other error, the correct behavior
is to throw and retry later.

Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-10-30 15:25:49 -07:00
Pablo Carranza Velez
0bc23df8c9 Refactor container cleanup to remove all spurious containers
We change the way container cleanup works so that it compares running
app containers with the container names for the known apps. This allows
the cleanup to effectively delete any spurious/duplicated app containers.

Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-10-30 15:25:49 -07:00
Pablo Carranza Velez
bd34a19a79 Use container name instead of id to identify apps, and avoid duplicated containers
By storing the container name before creating the container, we avoid problems
if the supervisor crashes or the device reboots between creating a container and storing its id.

Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-10-30 15:25:49 -07:00
Pablo Carranza Velez
c532344dce If a device is already provisioned but the key exchange fails, retry it until it succeeds
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-10-27 18:40:29 -07:00
Pagan Gazzard
21712ae810 Change the update retry to back off to the standard update check interval
This means that the supervisor will be less aggressive in the case of the api experiencing issues, stopping it from compounding the issue if the api is being overloaded

Change-type: patch
2017-10-24 15:36:43 -07:00
Pablo Carranza Velez
a87c6682a2 Ensure preloaded apps are properly loaded by setting their internal markedForDeletion to false, and run apps that have it set to null
Currently preloaded apps don't run because their markedForDeletion field in the database is null. In this commit we set it to false, and we
also change the startup check to also run any apps that have markedForDeletion as null (which should now never happen, but is still good as a backup
plan in case something else fails and to avoid regressions).

Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-10-23 17:29:41 -07:00
Pablo Carranza Velez
3f198fc6aa Improve the check for when the device has been provisioned but the supervisor doesn't have knowledge of it in its local state
This change improves the check for the DuplicateUuidError that can happen if a device has been provisioned but the API's response hasn't been persisted - the error message
returned from the API has been known to have a few variations (usually an extra dot at the end), so we now use _.startsWith instead of checking for equal strings to make the
supervisor still work under these variations.

Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-10-23 17:28:36 -07:00
Pablo Carranza Velez
d98897cdcf Ensure preloaded apps get the deviceApiKey in the env vars, and apps never get the provisioning key, and improve detection of cases when the device has been pre-provisioned
It appears preloaded apps have been getting restarted because the "apiKey" configuration value was only available after provisioning succeeded. This change ensures the
deviceApiKey that the device will use is injected into the env vars of preloaded apps, ensuring the app is not restarted (unless provisioning fails and the uuid and deviceApiKey are
regenerated, but this should be rare).

We also ensure that whenever an app's RESIN_API_KEY env var is populated, it is *always* done with the deviceApiKey and never with the provisioning apiKey.

Closes #457
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-10-23 17:28:17 -07:00
Akis Kesoglou
78f74d757d Delta improvements
- Updates resumable-request to 1.0.1
- Updates docker-progress to 2.0.3
- Removes `DEFAULT_DELTA_APPLY_TIMEOUT`; it’s not needed anymore, docker-delta reliably tracks rsync.
- Properly end the update when applying the delta results in an error.

Change-Type: patch
2017-10-17 10:43:12 +03:00
Pablo Carranza Velez
31d09e70e4 Explicitly define the source for deltas, allow cross-app deltas, and iterate serially through apps when updating
This commit changes the way the source for a delta is determined. We used to do
it by comparing the available tags with the one we want and relying on the format that
includes the app in the image name. Now we explicitly choose a delta source from the previous app
version if we have one, and otherwise use the image from any available app - which will allow us
to have a valid source when moving a device between apps.

For this to work consistently if there's an unexpected reboot, we now avoid deleting an app from the db
until the full update has succeeded. Instead, we mark the app for deletion so that we still have the image stored after the reboot.

This commit also changes a .map to .mapSeries when iterating over appIds for removal/install/update - this avoids parallel treatment
of apps which can cause inconsistencies in the status reported to the API.

Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-09-14 14:52:06 -07:00
Pablo Carranza Velez
81a6c2f344 Fix problem catching errors when killing a container that doesn't exist
We've been using `.catch Promise.OperationalError, ...` to catch errors when stopping a container and
detecting whether the error means that the container has already been stopped of removed.

Apparently, after the recent dockerode upgrade these errors are not typed as OperationalError anymore, causing error
messages like "No such container: null" when applying an update. This commit makes us catch all errors and check for their statusCode.

Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-09-05 20:17:43 -07:00
Pablo Carranza Velez
dbb4fd8292 Prefer err.message when reporting errors from dockerode, then err.json and err.reason
Errors from docker-modem that are passed from dockerode can have a "json" or "reason" property,
but that is generally less descriptive than the more standard "message", and can show up in the logs
as `[object Object]`. This commit changes it so that we log err.message if it is non-empty, and otherwise
look for json and reason.

Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-09-01 15:08:10 -07:00
Akis Kesoglou
a5980918b4 Forward resume options
Change-Type: patch
2017-08-29 00:02:26 +03:00
Akis Kesoglou
52c55a0c1b Apply a default timeout unless one is given 2017-08-09 11:55:22 +03:00
Akis Kesoglou
1412785886 Try to resume the download of a delta if it fails due to flaky network
Applying a delta update consists of two parts:

1. The request to the delta server for the delta payload (an rsync batch file, plus some prepended Docker metadata). The response is a redirect to a URL that contains the delta (currently S3).
2. The request for the actual download of the delta. The response is streamed directly to rsync, which applies it onto the mounted root filesystem of the final image.

The first step may take a while as it may trigger the generation of the delta if the request is the first one for this combination of src/dest image and the images are large. If the request times out, either because of the delta server taking too long to respond or bad network, the Supervisor automatically schedules a retry to be performed after a while.

Currently, similar behaviour applies to the second step as well -- if the request fails, we immediately bail out and the Supervisor schedules a retry of the whole process (i.e. from step 1). But in this case it means we might have downloaded and applied some or most of the delta when a socket timeout occurs causing us to start all over again, wasting time and bandwidth.

This commit splits the process into the two discreet steps and improves the behaviour on the second step. Specifically:

- makes the Supervisor try to resume the delta download request several times before it bails out and starts the process all over again.
- removes arbitrary timeout which applied over the whole process and meant some deltas would never manage to be applied (because of large delta size and low network bandwidth).
- makes sure any launched rsync processes always exit and any opened streams consumed and closed.

Most of the improvements are in the two dependencies linked below -- `resumable-request` and `node-docker-delta` -- and this commit merely combines the updated versions of these modules.

Change-Type: minor
Connects-To: #140
Depends-On: https://github.com/resin-io/node-docker-delta/pull/19
Depends-On: https://github.com/resin-io-modules/resumable-request/pull/2
2017-08-09 11:55:22 +03:00
Pablo Carranza Velez
6f87b1db18 Avoid starting apps on startup if device has to reboot due to a configuration change
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-08-02 20:07:13 -03:00
Pablo Carranza Velez
42ac7487e7 When the device is about to reboot or shutdown, close the API server and avoid applying updates
We mark when the device is rebooting and avoid some steps in the update cycle that change the device
state, similarly to when the device is in local mode, to avoid problems with non-atomic operations.
This doesn't solve *all* the potential scenarios of a reboot happening in the middle of an update, but at least
should prevent the case where we start an app container and reboot the device before saving the containerId, potentially
causing a duplicated container issue.

We also correct the API docs to reflect the 202 response when reboot or shutdown are successful.

Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-07-27 20:07:24 -03:00
Pablo Carranza Velez
55ad977ede Avoid unhandled errors when in offline mode due to a missing apiEndpoint
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-07-27 12:46:14 -03:00
Pablo Carranza Velez
f0344ca4be Do not persist the uuid when in offline mode, so that the supervisor tries to provision if it goes out of offline mode
We used to store the uuid which would cause the supervisor to not attempt a provisioning even if offline mode
was turned off. This was to avoid preloaded apps being reloaded constantly leaving multiple containers.

We now avoid persisting the uuid, so that when the supervisor goes out of offline mode it can provision
without the need to wipe out the db. We avoid the problem with preloaded apps by not loading them
if there's apps already stored on the db.

(In the future, apps in the db will only represent target state and we can make preloaded apps be reloaded on every
start, but for now we can't do it as long as we store the containerId on the db - deleting an app on the db
means losing track of its containerId and therefore leaving an orphaned container)

Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-07-27 12:46:14 -03:00
Pablo Carranza Velez
7aedd7062d Update docker-delta to 1.1.1, docker-toolbelt to 3.0.1, docker-progress to 2.6.0 to add support for deltas and overlay2
This makes the Async suffix for docker functions unnecessary. It also allows us to remove dockerode as an
explicit dependency.

Change-Type: minor
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-07-27 01:48:35 -03:00
Pablo Carranza Velez
279ab60233 Fix the message shown when docker gives a 500 error when starting a container
The test for an exec format error caused a `err.json.trim` is not a function
error so the message shown didn't relate to what the problem actually was.
This makes the test for the exec format error safer.

Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-07-26 10:41:25 -03:00
Pablo Carranza Velez
1790939046 Use webpack to join all modules
This saves around 13MB in the resulting uncompressed docker image.

Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-07-11 14:01:16 -07:00
Joe Roberts
087e7c3af0
Deprecate edge device type
Change-type: major
2017-07-05 10:20:26 +01:00
Pablo Carranza Velez
8b2138f744 Fix semver comparison for OS version when determining if the device has deviceApiKey support
The current setup would cause the check to always fail - the consequence is not *that* bad since
the provisioning key still gets overwritten, but it's better to delete it if we can.

Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-07-04 02:56:31 -07:00
Pablo Carranza Velez
928df5b140 Allow registering the deviceApiKey in a non-compatible OS by making the apiKey equal the deviceApiKey, and add an fsync to all config.json writes
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-06-30 18:00:01 -07:00
Pablo Carranza Velez
597a2c6b65 Remove the undocumented and unused sideload and compose APIs
This allows us to also remove a few npm dependencies and the docker compose binary.

Change-Type: major
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-06-26 13:08:52 -07:00
Pablo Carranza Velez
18ca98a2ae Fix provisioning key exchange by passing apikey in the request
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-06-26 07:04:43 -07:00
Pablo Carranza Velez
00b53bd03e When apiEndpoint is not defined, work in offline mode
The supervisor uses an `API_ENDPOINT` environment variable to define what API to register to. Up to now this has been defaulted to `https://api.resin.io`.
(In Resin OS devices this environment variable ultimately comes from config.json).
This commit changes the behavior so that an empty value of that environment variable causes the supervisor to work in "offline mode", i.e. not connected to a remote server.
Basically only preloaded apps and the supervisor API work in this mode.

The config.json `supervisorOfflineMode` field still works for backwards compatibility, but we'll treat it as deprecated and it should be removed eventually.

Change-Type: minor
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-06-14 12:57:47 -07:00
Pablo Carranza Velez
1e7bdad7a9 Fix mixpanel initialization when not in offline mode
The logic to disable mixpanel initialization in offline mode was inverted :S causing mixpanel
to *only* be initialized when in offline mode.

Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-06-14 12:48:29 -07:00
Joe Roberts
d4e3e45e52
Dependent device DB migrations 2017-06-14 09:27:47 +01:00
Joe Roberts
786874dbb6
Update dependent device DB
Change-type: patch
2017-06-14 09:27:47 +01:00
Petros Angelatos
171460041f
enable SSL when connecting to pubnub
Fixes #451

Connected-to: pubnub/javascript#89
Change-Type: patch
Signed-off-by: Petros Angelatos <petrosagg@gmail.com>
2017-06-13 19:13:09 +03:00
Pablo Carranza Velez
08c5413413 Fix typo in how hostOSVersionPath was camel-cased
This was properly done in the recently added changes in bootstrap.coffee,
but all other references where using "Os" instead of "OS.

Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-05-11 13:06:38 -07:00
Pablo Carranza Velez
cb0152c5ea Properly handle errors when requesting deltas
When requesting a delta, a `Promise.join` promise chain was producing unhandled
errors since it consisted in a separate promise chain from the parent function which,
was created with `new Promise`. This commit fixes this by creating the new Promise only
when it's needed, avoiding the creation of a separate promise chain.

Closes #432
Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-05-08 15:05:42 -07:00
Pablo Carranza Vélez
ac2531368c Merge pull request #427 from resin-io/dont-update-deviceconfig-if-unchanged
Avoid writing target device config to DB if it hasn't changed
2017-04-27 21:18:38 -07:00
Pablo Carranza Velez
c251de1cd3 Only delete the provisioning key if the supervisor is running on an OS that supports using the deviceApiKey
This avoids problems when updating the supervisor on an older OS, where the VPN and other
host services still require config.json to have an apiKey field to authenticate.

Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-04-27 13:31:25 -07:00
Pablo Carranza Velez
e36fa601ad Avoid writing target device config to DB if it hasn't changed
This helps avoid unnecessary writes to the DB which may cause disk wearout.

We also change the error message in this section to show that the error might have happened
when fetching the device config as much as when setting it.

Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-04-27 09:42:41 -07:00
Pagan Gazzard
42cd3a6b01 Fix an infinite loop that could happen when trying to bootstrap if the key exchange fails
Change-Type: patch
2017-04-26 13:54:15 -07:00
Pagan Gazzard
89ccb6480d Fix the case of being registered with a version of the cli/sdk that does not support device api keys.
Change-Type: patch
2017-04-26 13:52:43 -07:00
Pagan Gazzard
d31ee452d0 Deduplicate the device fetching logic 2017-04-24 12:09:50 -07:00
Pagan Gazzard
1002629a5e Improve key exchange by first checking if an existing device api key is valid. 2017-04-22 15:17:00 -07:00
Pagan Gazzard
477184d72d Add handling for duplicate UUIDs and key exchanging for old user-api-keys
Change-Type: minor
2017-04-20 21:37:27 -07:00
Pagan Gazzard
03ec97ab8d Change to the new device registration method to exchange our provisioning key with a dedicated api key for the device.
Change-Type: minor
2017-04-20 21:37:27 -07:00
Pablo Carranza Velez
4d322c72a0 Issue #420: Avoid supervisor crash without connection by properly memoizing promise-returning functions
device.getID caused a fatal error when connection was down, as the memoization with `promise: true` throws
synchronously. Changing memoizee to use `promise: 'then'` makes the memoization work as expected.

Change-Type: patch
Signed-off-by: Pablo Carranza Velez <pablo@resin.io>
2017-04-05 14:51:26 -07:00