Compare commits

30 Commits

Author SHA1 Message Date
flowzone-app[bot]
01585c688e
v17.0.2 2025-04-02 20:16:11 +00:00
flowzone-app[bot]
eeac56efc3
Merge pull request #2415 from balena-os/local-leftover-locks
Fix search for app leftover locks
2025-04-02 20:15:12 +00:00
Felipe Lalanne
d475b1d830
Fix search for app leftover locks
The leftover locks search was creating an array rather than an object
keyed by the appId. This could affect the lock cleanup and make leftover
locks from one app affect the install of the app in local mode.

Change-type: patch
2025-04-01 17:56:06 -03:00
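To illustrate why the shape of the result matters, the sketch below contrasts an array with an object keyed by appId. The app IDs and the synchronous `hasLeftoverLocks` stand-in are hypothetical (the real check is asynchronous). Only the array-versus-object distinction comes from the commit.

```typescript
// Hypothetical app IDs and lock state, for illustration only.
const currentAppIds = [10, 20];
const appsWithLocks = [20];

// Synchronous stand-in for the Supervisor's asynchronous lock check.
const hasLeftoverLocks = (appId: number): boolean =>
  appsWithLocks.includes(appId);

// Array result: values are indexed by position (0, 1), not by appId.
const asArray = currentAppIds.map((id) => hasLeftoverLocks(id));

// Object result: values are keyed by appId, so a lookup by appId works.
const byAppId = Object.fromEntries(
  currentAppIds.map((id) => [id, hasLeftoverLocks(id)] as [number, boolean]),
);

console.log(asArray[20]); // undefined: the array has no element at index 20
console.log(byAppId[20]); // true
```

With the array, a lookup by appId returns the wrong entry or none at all, which is consistent with the cross-app effects the commit message describes.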
flowzone-app[bot]
49b18b4a37
v17.0.1 2025-03-25 20:41:22 +00:00
Christina Wang
623a1638c1
Merge pull request #2413 from balena-os/clarify-firewall-docs-on-host-network-containers
Clarify firewall docs on behavior with host network containers
2025-03-25 13:40:28 -07:00
Christina Ying Wang
caed4dcca0
Clarify firewall docs on behavior with host network containers
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2025-03-25 13:10:52 -07:00
flowzone-app[bot]
7efdeea0f7
v17.0.0 2025-03-24 22:18:11 +00:00
flowzone-app[bot]
2d1871e16d
Merge pull request #2400 from balena-os/network-custom-ipam-label
Add Docker network label if custom ipam config
2025-03-24 22:17:12 +00:00
Christina Ying Wang
b596c77ce2
Add Docker network label if custom ipam config
In a target release where the only change is the addition or removal
of a custom ipam config, the Supervisor does not recreate the network
due to ignoring ipam config differences when comparing current and target
network (in network.isEqualConfig). This commit implements the addition of
a network label if the target compose object includes a network with custom
ipam. With the label, the Supervisor will detect a difference between a
network with a custom ipam and a network without, without needing to compare
the ipam configs themselves.

This is a major change, as devices running networks with custom ipam configs
will have their networks recreated to add the network label.

Closes: #2251
Change-type: major
See: https://balena.fibery.io/Work/Project/Fix-Supervisor-not-recreating-network-when-passed-custom-ipam-config-1127
Signed-off-by: Christina Ying Wang <christina@balena.io>
2025-03-24 14:55:19 -07:00
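A minimal sketch of the labelling idea, assuming simplified `NetworkConfig` and `IpamEntry` types and a hypothetical `sameLabels` comparison; none of these are the Supervisor's actual types or functions. The label name and the example subnet and gateway are taken from the diff on this page.

```typescript
// Simplified, hypothetical shapes; the real Supervisor types carry more fields.
interface IpamEntry {
  subnet?: string;
  gateway?: string;
}

interface NetworkConfig {
  labels: Record<string, string>;
  ipam: { config: IpamEntry[] };
}

// Label name as added by the commit.
const IPAM_LABEL = 'io.balena.private.ipam.config';

// Return a copy of the config, labelled only when custom ipam entries exist.
function withIpamLabel(config: NetworkConfig): NetworkConfig {
  if (config.ipam.config.length === 0) {
    return config;
  }
  return { ...config, labels: { ...config.labels, [IPAM_LABEL]: 'true' } };
}

// Hypothetical comparison that ignores ipam and looks only at labels.
function sameLabels(a: NetworkConfig, b: NetworkConfig): boolean {
  const keys = [...Object.keys(a.labels), ...Object.keys(b.labels)];
  return keys.every((k) => a.labels[k] === b.labels[k]);
}

const plain = withIpamLabel({ labels: {}, ipam: { config: [] } });
const custom = withIpamLabel({
  labels: {},
  ipam: { config: [{ subnet: '192.168.91.0/24', gateway: '192.168.91.1' }] },
});

console.log(sameLabels(plain, custom)); // false: the label exposes the ipam difference
```

Because the difference shows up in the labels, the comparison never needs to inspect the ipam entries themselves.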
flowzone-app[bot]
8c6e3df7d9
v16.12.9 2025-03-20 18:43:08 +00:00
flowzone-app[bot]
94cdd3fcd7
Merge pull request #2411 from balena-os/service-dependencies
Start a dependent if all dependencies are started
2025-03-20 18:42:13 +00:00
Felipe Lalanne
7764f98c9d
Start a dependent if all dependencies are started
The previous behavior required that dependencies were running before
starting the dependent service. This made it that services dependent on
a one-shot service would not get started and goes against the default
docker behavior.

Depending on a service to be running will require the implementation of
[long syntax depends_on](https://docs.docker.com/reference/compose-file/services/#long-syntax-1) and the condition
`service_healthy`.

Change-type: patch
Closes: #2409
2025-03-20 14:51:32 -03:00
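The difference between "running" and "started" can be sketched as follows. The `DependencyState` type, the helper names, and the status strings other than `Installing` are hypothetical. The example also assumes that Docker reports a zero timestamp (`0001-01-01T00:00:00Z`) as the start time of a container that was created but never started.

```typescript
// Hypothetical, simplified view of a dependency's container state.
interface DependencyState {
  status: string;
  createdAt: Date | null;
  startedAt: Date | null;
}

// A dependency counts as started once its start time is not earlier than
// its creation time, whether or not the container is still running.
function hasBeenStarted(dep: DependencyState): boolean {
  if (
    dep.status === 'Installing' ||
    dep.createdAt == null ||
    dep.startedAt == null
  ) {
    return false;
  }
  return dep.startedAt.getTime() >= dep.createdAt.getTime();
}

// The dependent may start only when every dependency has been started.
function canStartDependent(deps: DependencyState[]): boolean {
  return deps.every(hasBeenStarted);
}

const created = new Date('2025-03-20T10:00:00Z');

// One-shot service: started five seconds after creation and has since exited.
const oneShot: DependencyState = {
  status: 'exited',
  createdAt: created,
  startedAt: new Date('2025-03-20T10:00:05Z'),
};

// Created but never started (assumed zero start time).
const neverStarted: DependencyState = {
  status: 'Installed',
  createdAt: created,
  startedAt: new Date('0001-01-01T00:00:00Z'),
};

console.log(canStartDependent([oneShot])); // true
console.log(canStartDependent([oneShot, neverStarted])); // false
```

Under the previous "must be running" rule, the exited one-shot service would have blocked its dependent indefinitely.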
flowzone-app[bot]
b8032edc04
v16.12.8 2025-03-12 14:50:35 +00:00
flowzone-app[bot]
175872b358
Merge pull request #2408 from balena-os/fix-socket-timeout
Ensure poll socket timeout is defined early
2025-03-12 14:49:34 +00:00
Felipe Lalanne
ae337a1dd7
Remove GOT retries on state poll
The state poll already has retry implementation, making the GOT default
unnecessary.

Change-type: patch
2025-03-12 10:59:16 -03:00
Felipe Lalanne
bdbc6a4ba4
Ensure poll socket timeout is defined early
We have observed that even when setting the socket timeout on the
state poll https request, the timeout is only applied once the socket is
connected. This causes issues with Node's auto family selection (happy
eyeballs), as the default https timeout is 5s which means that larger
[auto select attempt timeout](https://nodejs.org/docs/latest-v22.x/api/net.html#netgetdefaultautoselectfamilyattempttimeout) may result in the socket timing out before all connection attempts have been tried.

This commit sets a different https Agent for state polling, with a
timeout matching the `apiRequestTimeout` used for other request events.

Change-type: patch
2025-03-12 10:59:11 -03:00
flowzone-app[bot]
978652b292
v16.12.7 2025-03-06 19:11:20 +00:00
flowzone-app[bot]
7771c0e96b
Merge pull request #2406 from balena-os/release-locks-on-app-remove
Release locks when removing apps
2025-03-06 19:10:38 +00:00
Felipe Lalanne
026dc0aed2
Release locks when removing apps
This prevents leftover locks that can prevent other operations from
taking place.

Change-type: patch
2025-03-06 11:50:31 -03:00
flowzone-app[bot]
5ef6b054fd
v16.12.6 2025-03-04 14:25:09 +00:00
flowzone-app[bot]
3cca2b7ecd
Merge pull request #2404 from balena-os/polling-improvements
Polling improvements
2025-03-04 14:24:18 +00:00
Felipe Lalanne
3d8bd28f5a
Update GOT to v14.4.6 2025-03-04 10:46:47 -03:00
Felipe Lalanne
6d00be2093
Log non-API errors during state poll
The supervisor was failing silently if an error happened while establishing the
connection (e.g. requesting the socket).

Change-type: patch
2025-03-04 10:46:45 -03:00
Felipe Lalanne
f8bdb14335
Fix target poll healthcheck
The Target.lastFetch time compared when performing the healthcheck
resets any time a poll is attempted no matter the outcome. This changes
the behavior so the time is reset only on a successful poll

Change-type: patch
2025-03-04 10:45:31 -03:00
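A small sketch of the behavior change, using a hypothetical `PollTracker` class with an injected clock so the example is deterministic. The Supervisor itself uses `process.hrtime()` and module-level state rather than a class.

```typescript
// Hypothetical tracker; the clock function is injected for determinism.
class PollTracker {
  private now: () => number;
  private lastSuccessfulFetch: number;

  constructor(now: () => number) {
    this.now = now;
    this.lastSuccessfulFetch = now();
  }

  // Record the outcome of a poll attempt; only successes reset the timer.
  recordAttempt(succeeded: boolean): void {
    if (succeeded) {
      this.lastSuccessfulFetch = this.now();
    }
  }

  // Healthy while the last successful poll is recent enough.
  isHealthy(maxAgeMs: number): boolean {
    return this.now() - this.lastSuccessfulFetch <= maxAgeMs;
  }
}

let clock = 0;
const tracker = new PollTracker(() => clock);

clock = 120000;
tracker.recordAttempt(false); // a failed poll no longer resets the timer
const healthyAfterFailure = tracker.isHealthy(60000);

clock = 130000;
tracker.recordAttempt(true); // a successful poll does
const healthyAfterSuccess = tracker.isHealthy(60000);

console.log(healthyAfterFailure, healthyAfterSuccess); // false true
```

If every attempt reset the timer, as before the fix, a device that polls repeatedly but never succeeds would still pass the healthcheck.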
flowzone-app[bot]
c88cf6a259
v16.12.5 2025-03-04 13:35:28 +00:00
Page-
906ce6dc0d
Merge pull request #2405 from balena-os/fix-api-request-timeout
Decrease balenaCloud api request timeout from 15m to 59s
2025-03-04 13:34:35 +00:00
Pagan Gazzard
49163e92a0
Decrease balenaCloud api request timeout from 15m to 59s
This was mistakenly increased due to confusion between the timeout for
requests to the supervisor's api vs the timeout for requests from the
supervisor to the balenaCloud api. This separates the two configs and
documents the difference between the timeouts whilst also decreasing
the timeout for balenaCloud api requests to the correct/expected value

Change-type: patch
2025-03-04 12:29:18 +00:00
flowzone-app[bot]
f67e45f432
v16.12.4 2025-03-03 13:42:20 +00:00
flowzone-app[bot]
91335051ac
Merge pull request #2403 from balena-os/dont-revert-to-regular-pull-if-401
Don't revert to regular pull if delta server 401
2025-03-03 13:41:29 +00:00
Christina Ying Wang
2dc9d275b1
Don't revert to regular pull if delta server 401
If the Supervisor receives a 401 Unauthorized from the delta server
when requesting a delta image location, we should surface the error
instead of falling back to a regular pull immediately, as there could
be an issue with the delta auth token, which refreshes after
DELTA_TOKEN_TIMEOUT (10min), or some other edge case.

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2025-02-24 16:17:15 -08:00
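The status handling described above can be summarized as a small decision function. The function name and the action labels are hypothetical. In the actual change the two failure paths are distinguished by throwing a generic `Error` versus a `DeltaServerError`, not by returning a value.

```typescript
// Hypothetical labels for the two failure paths.
type DeltaFailureAction = 'surface-error' | 'regular-pull';

// Map a delta server status code to the action taken, or null on success.
function actionForDeltaStatus(statusCode: number): DeltaFailureAction | null {
  if (statusCode === 200) {
    return null;
  }
  // 401 may indicate a stale delta auth token, so the error is surfaced
  // instead of falling back to a regular pull immediately.
  if (statusCode === 401) {
    return 'surface-error';
  }
  // Any other 4xx response reverts to a regular pull immediately.
  if (statusCode >= 400 && statusCode < 500) {
    return 'regular-pull';
  }
  // Remaining non-200 responses raise a generic error.
  return 'surface-error';
}

console.log(actionForDeltaStatus(401)); // surface-error
console.log(actionForDeltaStatus(404)); // regular-pull
```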
28 changed files with 646 additions and 148 deletions

View File

@ -1,3 +1,204 @@
- commits:
- subject: Fix search for app leftover locks
hash: d475b1d8301c83b932ce272d3496bf4aac0ef1ad
body: |
The leftover locks search was creating an array rather than an object
keyed by the appId. This could affect the lock cleanup and make leftover
locks from one app affect the install of the app in local mode.
footer:
Change-type: patch
change-type: patch
author: Felipe Lalanne
nested: []
version: 17.0.2
title: ""
date: 2025-04-02T20:16:09.754Z
- commits:
- subject: Clarify firewall docs on behavior with host network containers
hash: caed4dcca0043f848f6dd5a3d1a2f82a2466e8d6
body: ""
footer:
Change-type: patch
change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
signed-off-by: Christina Ying Wang <christina@balena.io>
author: Christina Ying Wang
nested: []
version: 17.0.1
title: ""
date: 2025-03-25T20:41:20.141Z
- commits:
- subject: Add Docker network label if custom ipam config
hash: b596c77ce2d229e79082cbb1f0022f93806f09ae
body: >
In a target release where the only change is the addition or removal
of a custom ipam config, the Supervisor does not recreate the network
due to ignoring ipam config differences when comparing current and
target
network (in network.isEqualConfig). This commit implements the addition
of
a network label if the target compose object includes a network with
custom
ipam. With the label, the Supervisor will detect a difference between a
network with a custom ipam and a network without, without needing to
compare
the ipam configs themselves.
This is a major change, as devices running networks with custom ipam
configs
will have their networks recreated to add the network label.
footer:
Closes: "#2251"
closes: "#2251"
Change-type: major
change-type: major
See: https://balena.fibery.io/Work/Project/Fix-Supervisor-not-recreating-network-when-passed-custom-ipam-config-1127
see: https://balena.fibery.io/Work/Project/Fix-Supervisor-not-recreating-network-when-passed-custom-ipam-config-1127
Signed-off-by: Christina Ying Wang <christina@balena.io>
signed-off-by: Christina Ying Wang <christina@balena.io>
author: Christina Ying Wang
nested: []
version: 17.0.0
title: ""
date: 2025-03-24T22:18:08.753Z
- commits:
- subject: Start a dependent if all dependencies are started
hash: 7764f98c9d357a1942628e57951266767555f67b
body: |
The previous behavior required that dependencies were running before
starting the dependent service. This made it that services dependent on
a one-shot service would not get started and goes against the default
docker behavior.
Depending on a service to be running will require the implementation of
[long syntax depends_on](https://docs.docker.com/reference/compose-file/services/#long-syntax-1) and the condition
`service_healthy`.
footer:
Change-type: patch
change-type: patch
Closes: "#2409"
closes: "#2409"
author: Felipe Lalanne
nested: []
version: 16.12.9
title: ""
date: 2025-03-20T18:43:06.085Z
- commits:
- subject: Remove GOT retries on state poll
hash: ae337a1dd7743b0ee0a05c32a5ce01965c5bafef
body: |
The state poll already has retry implementation, making the GOT default
unnecessary.
footer:
Change-type: patch
change-type: patch
author: Felipe Lalanne
nested: []
- subject: Ensure poll socket timeout is defined early
hash: bdbc6a4ba4766f9466891497bc02bd33aff1d4c7
body: |
We have observed that even when setting the socket timeout on the
state poll https request, the timeout is only applied once the socket is
connected. This causes issues with Node's auto family selection (happy
eyeballs), as the default https timeout is 5s which means that larger
[auto select attempt timeout](https://nodejs.org/docs/latest-v22.x/api/net.html#netgetdefaultautoselectfamilyattempttimeout) may result in the socket timing out before all connection attempts have been tried.
This commit sets a different https Agent for state polling, with a
timeout matching the `apiRequestTimeout` used for other request events.
footer:
Change-type: patch
change-type: patch
author: Felipe Lalanne
nested: []
version: 16.12.8
title: ""
date: 2025-03-12T14:50:33.204Z
- commits:
- subject: Release locks when removing apps
hash: 026dc0aed29ce7d66cfdd8616d80d1f5daf3ad46
body: |
This prevents leftover locks that can prevent other operations from
taking place.
footer:
Change-type: patch
change-type: patch
author: Felipe Lalanne
nested: []
version: 16.12.7
title: ""
date: 2025-03-06T19:11:18.704Z
- commits:
- subject: Log non-API errors during state poll
hash: 6d00be20930398699da1006176dac1e81b2dbbd6
body: >
The supervisor was failing silently if an error happened while
establishing the
connection (e.g. requesting the socket).
footer:
Change-type: patch
change-type: patch
author: Felipe Lalanne
nested: []
- subject: Fix target poll healthcheck
hash: f8bdb1433508dcaeff12a78d746256041ba1c414
body: |
The Target.lastFetch time compared when performing the healthcheck
resets any time a poll is attempted no matter the outcome. This changes
the behavior so the time is reset only on a successful poll
footer:
Change-type: patch
change-type: patch
author: Felipe Lalanne
nested: []
version: 16.12.6
title: ""
date: 2025-03-04T14:25:06.565Z
- commits:
- subject: Decrease balenaCloud api request timeout from 15m to 59s
hash: 49163e92a013250f72ca7231e11945b465c4dd45
body: |
This was mistakenly increased due to confusion between the timeout for
requests to the supervisor's api vs the timeout for requests from the
supervisor to the balenaCloud api. This separates the two configs and
documents the difference between the timeouts whilst also decreasing
the timeout for balenaCloud api requests to the correct/expected value
footer:
Change-type: patch
change-type: patch
author: Pagan Gazzard
nested: []
version: 16.12.5
title: ""
date: 2025-03-04T13:35:26.801Z
- commits:
- subject: Don't revert to regular pull if delta server 401
hash: 2dc9d275b15a0802264bcd49e2f0dddbbadd2225
body: |
If the Supervisor receives a 401 Unauthorized from the delta server
when requesting a delta image location, we should surface the error
instead of falling back to a regular pull immediately, as there could
be an issue with the delta auth token, which refreshes after
DELTA_TOKEN_TIMEOUT (10min), or some other edge case.
footer:
Change-type: patch
change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
signed-off-by: Christina Ying Wang <christina@balena.io>
author: Christina Ying Wang
nested: []
version: 16.12.4
title: ""
date: 2025-03-03T13:42:18.045Z
- commits:
- subject: Retry DELTA_APPLY_RETRY_COUNT (3) times during delta apply fail before
reverting to regular pull

View File

@ -4,6 +4,53 @@ All notable changes to this project will be documented in this file
automatically by Versionist. DO NOT EDIT THIS FILE MANUALLY!
This project adheres to [Semantic Versioning](http://semver.org/).
# v17.0.2
## (2025-04-02)
* Fix search for app leftover locks [Felipe Lalanne]
# v17.0.1
## (2025-03-25)
* Clarify firewall docs on behavior with host network containers [Christina Ying Wang]
# v17.0.0
## (2025-03-24)
* Add Docker network label if custom ipam config [Christina Ying Wang]
# v16.12.9
## (2025-03-20)
* Start a dependent if all dependencies are started [Felipe Lalanne]
# v16.12.8
## (2025-03-12)
* Remove GOT retries on state poll [Felipe Lalanne]
* Ensure poll socket timeout is defined early [Felipe Lalanne]
# v16.12.7
## (2025-03-06)
* Release locks when removing apps [Felipe Lalanne]
# v16.12.6
## (2025-03-04)
* Log non-API errors during state poll [Felipe Lalanne]
* Fix target poll healthcheck [Felipe Lalanne]
# v16.12.5
## (2025-03-04)
* Decrease balenaCloud api request timeout from 15m to 59s [Pagan Gazzard]
# v16.12.4
## (2025-03-03)
* Don't revert to regular pull if delta server 401 [Christina Ying Wang]
# v16.12.3
## (2025-02-19)

View File

@ -1 +1 @@
16.12.3
17.0.2

View File

@ -2,6 +2,6 @@ name: balena-supervisor
description: 'Balena Supervisor: balena''s agent on devices.'
joinable: false
type: sw.application
version: 16.12.3
version: 17.0.2
provides:
- slug: sw.compose.long-volume-syntax

View File

@ -8,10 +8,10 @@ To switch between firewall modes, the `HOST_FIREWALL_MODE` (with `BALENA_` or le
> [!NOTE] Configuration variables defined in the dashboard will not apply to devices in local mode.
| Mode | Description |
| ---- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- |
| on | Only traffic for core services provided by balena and containers on the host network are allowed. |
| off | All network traffic is allowed. |
| Mode | Description |
| ---- | ----------- |
| on | Only traffic for core services provided by balena are allowed. Any other ports, including those used by containers with host networking, are blocked unless explicitly configured. |
| off | All network traffic is allowed. |
| auto | If there _are_ host network services, behaves as if `FIREWALL_MODE` = `on`. If there _aren't_ host network services, behaves as if `FIREWALL_MODE` = `off`. |
## Issues

31
package-lock.json generated
View File

@ -1,12 +1,12 @@
{
"name": "balena-supervisor",
"version": "16.12.3",
"version": "17.0.2",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "balena-supervisor",
"version": "16.12.3",
"version": "17.0.2",
"license": "Apache-2.0",
"dependencies": {
"@balena/systemd": "^0.5.0",
@ -64,7 +64,7 @@
"express": "^4.21.2",
"fork-ts-checker-webpack-plugin": "^9.0.2",
"fp-ts": "^2.16.5",
"got": "14.4.1",
"got": "^14.4.6",
"husky": "^9.1.7",
"io-ts": "2.2.20",
"io-ts-reporters": "^2.0.1",
@ -1225,13 +1225,13 @@
"license": "MIT"
},
"node_modules/@sindresorhus/is": {
"version": "6.3.1",
"resolved": "https://registry.npmjs.org/@sindresorhus/is/-/is-6.3.1.tgz",
"integrity": "sha512-FX4MfcifwJyFOI2lPoX7PQxCqx8BG1HCho7WdiXwpEQx1Ycij0JxkfYtGK7yqNScrZGSlt6RE6sw8QYoH7eKnQ==",
"version": "7.0.1",
"resolved": "https://registry.npmjs.org/@sindresorhus/is/-/is-7.0.1.tgz",
"integrity": "sha512-QWLl2P+rsCJeofkDNIT3WFmb6NrRud1SUYW8dIhXK/46XFV8Q/g7Bsvib0Askb0reRLe+WYPeeE+l5cH7SlkuQ==",
"dev": true,
"license": "MIT",
"engines": {
"node": ">=16"
"node": ">=18"
},
"funding": {
"url": "https://github.com/sindresorhus/is?sponsor=1"
@ -7054,24 +7054,23 @@
}
},
"node_modules/got": {
"version": "14.4.1",
"resolved": "https://registry.npmjs.org/got/-/got-14.4.1.tgz",
"integrity": "sha512-IvDJbJBUeexX74xNQuMIVgCRRuNOm5wuK+OC3Dc2pnSoh1AOmgc7JVj7WC+cJ4u0aPcO9KZ2frTXcqK4W/5qTQ==",
"version": "14.4.6",
"resolved": "https://registry.npmjs.org/got/-/got-14.4.6.tgz",
"integrity": "sha512-rnhwfM/PhMNJ1i17k3DuDqgj0cKx3IHxBKVv/WX1uDKqrhi2Gv3l7rhPThR/Cc6uU++dD97W9c8Y0qyw9x0jag==",
"dev": true,
"license": "MIT",
"dependencies": {
"@sindresorhus/is": "^6.3.1",
"@sindresorhus/is": "^7.0.1",
"@szmarczak/http-timer": "^5.0.1",
"cacheable-lookup": "^7.0.0",
"cacheable-request": "^12.0.1",
"decompress-response": "^6.0.0",
"form-data-encoder": "^4.0.2",
"get-stream": "^8.0.1",
"http2-wrapper": "^2.2.1",
"lowercase-keys": "^3.0.0",
"p-cancelable": "^4.0.1",
"responselike": "^3.0.0",
"type-fest": "^4.19.0"
"type-fest": "^4.26.1"
},
"engines": {
"node": ">=20"
@ -7110,9 +7109,9 @@
}
},
"node_modules/got/node_modules/type-fest": {
"version": "4.20.0",
"resolved": "https://registry.npmjs.org/type-fest/-/type-fest-4.20.0.tgz",
"integrity": "sha512-MBh+PHUHHisjXf4tlx0CFWoMdjx8zCMLJHOjnV1prABYZFHqtFOyauCIK2/7w4oIfwkF8iNhLtnJEfVY2vn3iw==",
"version": "4.35.0",
"resolved": "https://registry.npmjs.org/type-fest/-/type-fest-4.35.0.tgz",
"integrity": "sha512-2/AwEFQDFEy30iOLjrvHDIH7e4HEWH+f1Yl1bI5XMqzuoCUqwYCdxachgsgv0og/JdVZUhbfjcJAoHj5L1753A==",
"dev": true,
"license": "(MIT OR CC0-1.0)",
"engines": {

View File

@ -1,7 +1,7 @@
{
"name": "balena-supervisor",
"description": "This is balena's Supervisor, a program that runs on IoT devices and has the task of running user Apps (which are Docker containers), and updating them as the balena API informs it to.",
"version": "16.12.3",
"version": "17.0.2",
"license": "Apache-2.0",
"repository": {
"type": "git",
@ -90,7 +90,7 @@
"express": "^4.21.2",
"fork-ts-checker-webpack-plugin": "^9.0.2",
"fp-ts": "^2.16.5",
"got": "14.4.1",
"got": "^14.4.6",
"husky": "^9.1.7",
"io-ts": "2.2.20",
"io-ts-reporters": "^2.0.1",
@ -137,6 +137,6 @@
"yargs": "^17.7.2"
},
"versionist": {
"publishedAt": "2025-02-19T20:51:53.619Z"
"publishedAt": "2025-04-02T20:16:10.284Z"
}
}

View File

@ -63,7 +63,7 @@ export async function healthcheck() {
}
// Check last time target state has been polled
const timeSinceLastFetch = process.hrtime(TargetState.lastFetch);
const timeSinceLastFetch = process.hrtime(TargetState.lastSuccessfulFetch);
const timeSinceLastFetchMs =
timeSinceLastFetch[0] * 1000 + timeSinceLastFetch[1] / 1e6;

View File

@ -3,6 +3,7 @@ import url from 'url';
import { setTimeout } from 'timers/promises';
import Bluebird from 'bluebird';
import type StrictEventEmitter from 'strict-event-emitter-types';
import { Agent } from 'https';
import type { TargetState } from '../types/state';
import { InternalInconsistencyError } from '../lib/errors';
@ -87,7 +88,8 @@ const emitTargetState = (
* We set a value rather than being undeclared because having it undefined
* adds more overhead to dealing with this value without any benefits.
*/
export let lastFetch: ReturnType<typeof process.hrtime> = process.hrtime();
export let lastSuccessfulFetch: ReturnType<typeof process.hrtime> =
process.hrtime();
/**
* Attempts to update the target state
@ -101,11 +103,11 @@ export const update = async (
): Promise<void> => {
await config.initialized();
return Bluebird.using(lockGetTarget(), async () => {
const { uuid, apiEndpoint, apiTimeout, deviceApiKey } =
const { uuid, apiEndpoint, apiRequestTimeout, deviceApiKey } =
await config.getMany([
'uuid',
'apiEndpoint',
'apiTimeout',
'apiRequestTimeout',
'deviceApiKey',
]);
@ -119,6 +121,13 @@ export const update = async (
const got = await getGotInstance();
const { statusCode, headers, body } = await got(endpoint, {
retry: { limit: 0 },
agent: {
https: new Agent({
keepAlive: true,
timeout: apiRequestTimeout,
}),
},
headers: {
Authorization: `Bearer ${deviceApiKey}`,
'If-None-Match': cache?.etag,
@ -126,12 +135,12 @@ export const update = async (
timeout: {
// TODO: We use the same default timeout for all of these in order to have a timeout generally
// but it would probably make sense to tune them individually
lookup: apiTimeout,
connect: apiTimeout,
secureConnect: apiTimeout,
socket: apiTimeout,
send: apiTimeout,
response: apiTimeout,
lookup: apiRequestTimeout,
connect: apiRequestTimeout,
secureConnect: apiRequestTimeout,
socket: apiRequestTimeout,
send: apiRequestTimeout,
response: apiRequestTimeout,
},
});
@ -154,8 +163,6 @@ export const update = async (
// Emit the target state and update the cache
cache.emitted = emitTargetState(cache, force, isFromApi);
}).finally(() => {
lastFetch = process.hrtime();
});
};
@ -188,7 +195,11 @@ const poll = async (
await update();
// Reset fetchErrors because we successfully updated
fetchErrors = 0;
} catch {
lastSuccessfulFetch = process.hrtime();
} catch (e) {
if (!(e instanceof ApiResponseError)) {
log.error('Target state poll failed', e);
}
// Exponential back off if request fails
pollInterval = Math.min(appUpdatePollInterval, 15000 * 2 ** fetchErrors);
++fetchErrors;

View File

@ -41,14 +41,17 @@ export let stateReportErrors = 0;
type StateReportOpts = {
[key in keyof Pick<
config.ConfigMap<SchemaTypeKey>,
'apiEndpoint' | 'apiTimeout' | 'deviceApiKey' | 'appUpdatePollInterval'
| 'apiEndpoint'
| 'apiRequestTimeout'
| 'deviceApiKey'
| 'appUpdatePollInterval'
>]: SchemaReturn<key>;
};
type StateReport = { body: Partial<DeviceState>; opts: StateReportOpts };
async function report({ body, opts }: StateReport) {
const { apiEndpoint, apiTimeout, deviceApiKey } = opts;
const { apiEndpoint, apiRequestTimeout, deviceApiKey } = opts;
if (!apiEndpoint) {
throw new InternalInconsistencyError(
@ -69,7 +72,7 @@ async function report({ body, opts }: StateReport) {
const [{ statusCode, body: statusMessage, headers }] = await request
.patchAsync(endpoint, params)
.timeout(apiTimeout);
.timeout(apiRequestTimeout);
if (statusCode < 200 || statusCode >= 300) {
throw new StatusError(
@ -203,7 +206,7 @@ export async function startReporting() {
// Get configs needed to make a report
const reportConfigs = (await config.getMany([
'apiEndpoint',
'apiTimeout',
'apiRequestTimeout',
'deviceApiKey',
'appUpdatePollInterval',
])) as StateReportOpts;

View File

@ -247,6 +247,16 @@ class AppImpl implements App {
}
}
// Release locks (if any) for all services before settling state
if (state.lock || state.hasLeftoverLocks) {
return [
generateStep('releaseLock', {
appId: this.appId,
lock: state.lock,
}),
];
}
return [];
}
@ -911,19 +921,24 @@ class AppImpl implements App {
volumePairs: Array<ChangingPair<Volume>>,
servicePairs: Array<ChangingPair<Service>>,
): boolean {
// Firstly we check if a dependency is not already running (this is
// Firstly we check if a dependency has already been started (this is
// different to a dependency which is in the servicePairs below, as these
// are services which are changing). We could have a dependency which is
// starting up, but is not yet running.
const depInstallingButNotRunning = _.some(this.services, (svc) => {
const depCreatedButNotStarted = _.some(this.services, (svc) => {
if (target.dependsOn?.includes(svc.serviceName)) {
if (!svc.config.running) {
if (
svc.status === 'Installing' ||
svc.startedAt == null ||
svc.createdAt == null ||
svc.startedAt < svc.createdAt
) {
return true;
}
}
});
if (depInstallingButNotRunning) {
if (depCreatedButNotStarted) {
return false;
}

View File

@ -187,8 +187,12 @@ export async function inferNextSteps(
const currentAppIds = Object.keys(currentApps).map((i) => parseInt(i, 10));
const targetAppIds = Object.keys(targetApps).map((i) => parseInt(i, 10));
const withLeftoverLocks = await Promise.all(
currentAppIds.map((id) => hasLeftoverLocks(id)),
const withLeftoverLocks = Object.fromEntries(
await Promise.all(
currentAppIds.map(
async (id) => [id, await hasLeftoverLocks(id)] as [number, boolean],
),
),
);
const bootTime = getBootTime();

View File

@ -160,6 +160,15 @@ class NetworkImpl implements Network {
configOnly: network.config_only || false,
};
// Add label if there's non-default ipam config
// e.g. explicitly defined subnet or gateway.
// When updating between a release where the ipam config
// changes, this label informs the Supervisor that
// there's an ipam diff that requires recreating the network.
if (net.config.ipam.config.length > 0) {
net.config.labels['io.balena.private.ipam.config'] = 'true';
}
return net;
}

View File

@ -61,6 +61,7 @@ class ServiceImpl implements Service {
public dockerImageId: string | null;
public status: ServiceStatus;
public createdAt: Date | null;
public startedAt: Date | null;
private static configArrayFields: ServiceConfigArrayField[] = [
'volumes',
@ -476,6 +477,7 @@ class ServiceImpl implements Service {
}
svc.createdAt = new Date(container.Created);
svc.startedAt = new Date(container.State.StartedAt);
svc.containerId = container.Id;
svc.exitErrorMessage = container.State.Error;

View File

@ -373,6 +373,7 @@ export interface Service {
// from docker
status: ServiceStatus;
createdAt: Date | null;
startedAt: Date | null;
hasNetwork(networkName: string): boolean;
hasVolume(volumeName: string): boolean;

View File

@ -90,7 +90,7 @@ export const fnSchema = {
'deviceArch',
'deviceType',
'apiEndpoint',
'apiTimeout',
'apiRequestTimeout',
'registered_at',
'deviceId',
'version',
@ -107,7 +107,7 @@ export const fnSchema = {
provisioningApiKey: conf.apiKey,
deviceApiKey: conf.deviceApiKey,
apiEndpoint: conf.apiEndpoint,
apiTimeout: conf.apiTimeout,
apiRequestTimeout: conf.apiRequestTimeout,
registered_at: conf.registered_at,
deviceId: conf.deviceId,
supervisorVersion: conf.version,

View File

@ -12,6 +12,9 @@ export const schemaTypes = {
type: t.string,
default: '',
},
/**
* The timeout for the supervisor's api
*/
apiTimeout: {
type: PermissiveNumber,
default: 15 * 60 * 1000,
@ -118,6 +121,13 @@ export const schemaTypes = {
type: PermissiveBoolean,
default: false,
},
/**
* The timeout for requests to the balenaCloud api
*/
apiRequestTimeout: {
type: PermissiveNumber,
default: 59000,
},
deltaRequestTimeout: {
type: PermissiveNumber,
default: 59000,
@ -218,7 +228,7 @@ export const schemaTypes = {
provisioningApiKey: t.union([t.string, NullOrUndefined]),
deviceApiKey: t.string,
apiEndpoint: t.string,
apiTimeout: PermissiveNumber,
apiRequestTimeout: PermissiveNumber,
registered_at: t.union([PermissiveNumber, NullOrUndefined]),
deviceId: t.union([PermissiveNumber, NullOrUndefined]),
supervisorVersion: t.union([t.string, t.undefined]),

View File

@ -4,6 +4,9 @@ export const schema = {
mutable: false,
removeIfNull: false,
},
/**
* The timeout for the supervisor's api
*/
apiTimeout: {
source: 'config.json',
mutable: false,
@ -120,6 +123,11 @@ export const schema = {
mutable: true,
removeIfNull: false,
},
apiRequestTimeout: {
source: 'db',
mutable: true,
removeIfNull: false,
},
delta: {
source: 'db',
mutable: true,

View File

@ -141,6 +141,11 @@ const configKeys: Dictionary<ConfigOption> = {
varType: 'bool',
defaultValue: 'true',
},
apiRequestTimeout: {
envVarName: 'SUPERVISOR_API_REQUEST_TIMEOUT',
varType: 'int',
defaultValue: '59000',
},
delta: {
envVarName: 'SUPERVISOR_DELTA',
varType: 'bool',

View File

@ -111,10 +111,10 @@ export const exchangeKeyAndGetDevice = async (
opts: Partial<KeyExchangeOpts>,
): Promise<Device> => {
const uuid = opts.uuid;
const apiTimeout = opts.apiTimeout;
if (!(uuid && apiTimeout)) {
const apiRequestTimeout = opts.apiRequestTimeout;
if (!(uuid && apiRequestTimeout)) {
throw new InternalInconsistencyError(
'UUID and apiTimeout should be defined in exchangeKeyAndGetDevice',
'UUID and apiRequestTimeout should be defined in exchangeKeyAndGetDevice',
);
}
@ -122,7 +122,12 @@ export const exchangeKeyAndGetDevice = async (
// valid, because if it is then we can just use that
if (opts.deviceApiKey != null) {
try {
return await fetchDevice(balenaApi, uuid, opts.deviceApiKey, apiTimeout);
return await fetchDevice(
balenaApi,
uuid,
opts.deviceApiKey,
apiRequestTimeout,
);
} catch (e) {
if (e instanceof DeviceNotFoundError) {
// do nothing...
@ -146,7 +151,7 @@ export const exchangeKeyAndGetDevice = async (
balenaApi,
uuid,
opts.provisioningApiKey,
apiTimeout,
apiRequestTimeout,
);
} catch {
throw new ExchangeKeyError(`Couldn't fetch device with provisioning key`);
@ -165,7 +170,7 @@ export const exchangeKeyAndGetDevice = async (
Authorization: `Bearer ${opts.provisioningApiKey}`,
},
})
.timeout(apiTimeout);
.timeout(apiRequestTimeout);
if (res.statusCode !== 200) {
throw new ExchangeKeyError(
@ -220,7 +225,7 @@ export const provision = async (
osVariant: opts.osVariant,
macAddress: opts.macAddress,
}),
).timeout(opts.apiTimeout);
).timeout(opts.apiRequestTimeout);
} catch (err) {
if (
err instanceof deviceRegister.ApiError &&

View File

@ -219,9 +219,17 @@ export async function fetchDeltaWithProgress(
}
break;
case 3:
// If 400s status code, throw a more specific error & revert immediately to a regular pull
// If 400s status code, throw a more specific error & revert immediately to a regular pull,
// unless the code is 401 Unauthorized, in which case we should surface the error by retrying
// the delta server request, instead of falling back to a regular pull immediately.
if (res.statusCode >= 400 && res.statusCode < 500) {
throw new DeltaServerError(res.statusCode, res.statusMessage);
if (res.statusCode === 401) {
throw new Error(
`Got ${res.statusCode} when requesting an image from delta server: ${res.statusMessage}`,
);
} else {
throw new DeltaServerError(res.statusCode, res.statusMessage);
}
}
if (res.statusCode !== 200) {
throw new Error(

View File

@ -1128,11 +1128,20 @@ describe('compose/application-manager', () => {
const { currentApps, availableImages, downloading, containerIdsByAppId } =
createCurrentState({
services: [
await createService({
image: 'dep-image',
serviceName: 'dep',
commit: 'new-release',
}),
await createService(
{
image: 'dep-image',
serviceName: 'dep',
commit: 'new-release',
},
{
state: {
createdAt: new Date(Date.now() - 5 * 1000),
// Container was started 5 seconds after creation
startedAt: new Date(),
},
},
),
],
networks: [DEFAULT_NETWORK],
images: [

View File

@ -67,6 +67,8 @@ describe('compose/network: integration tests', () => {
Labels: {
'io.balena.supervised': 'true',
'io.balena.app-id': '12345',
// This label should be present as we've defined a custom ipam config
'io.balena.private.ipam.config': 'true',
},
Options: {},
ConfigOnly: false,

View File

@ -84,6 +84,7 @@ describe('device-config', () => {
SUPERVISOR_LOCAL_MODE: 'false',
SUPERVISOR_CONNECTIVITY_CHECK: 'true',
SUPERVISOR_LOG_CONTROL: 'true',
SUPERVISOR_API_REQUEST_TIMEOUT: '59000',
SUPERVISOR_DELTA: 'false',
SUPERVISOR_DELTA_REQUEST_TIMEOUT: '59000',
SUPERVISOR_DELTA_APPLY_TIMEOUT: '0',

@ -260,7 +260,33 @@ describe('state engine', () => {
});
});
it('updates an app with two services with a network change', async () => {
it('updates an app with two services with a network change where the only change is a custom ipam config addition', async () => {
const services = {
'1': {
image: 'alpine:latest',
imageId: 11,
serviceName: 'one',
restart: 'unless-stopped',
running: true,
command: 'sleep infinity',
stop_signal: 'SIGKILL',
networks: ['default'],
labels: {},
environment: {},
},
'2': {
image: 'alpine:latest',
imageId: 12,
serviceName: 'two',
restart: 'unless-stopped',
running: true,
command: 'sleep infinity',
stop_signal: 'SIGKILL',
networks: ['default'],
labels: {},
environment: {},
},
};
await setTargetState({
config: {},
apps: {
@ -268,30 +294,10 @@ describe('state engine', () => {
name: 'test-app',
commit: 'deadbeef',
releaseId: 1,
services: {
'1': {
image: 'alpine:latest',
imageId: 11,
serviceName: 'one',
restart: 'unless-stopped',
running: true,
command: 'sleep infinity',
stop_signal: 'SIGKILL',
labels: {},
environment: {},
},
'2': {
image: 'alpine:latest',
imageId: 12,
serviceName: 'two',
restart: 'unless-stopped',
running: true,
command: 'sleep infinity',
labels: {},
environment: {},
},
services,
networks: {
default: {},
},
networks: {},
volumes: {},
},
},
@ -311,6 +317,21 @@ describe('state engine', () => {
]);
const containerIds = containers.map(({ Id }) => Id);
// Network should not have custom ipam config
const defaultNet = await docker.getNetwork('123_default').inspect();
expect(defaultNet)
.to.have.property('IPAM')
.to.not.deep.equal({
Config: [{ Gateway: '192.168.91.1', Subnet: '192.168.91.0/24' }],
Driver: 'default',
Options: {},
});
// Network should not have custom ipam label
expect(defaultNet)
.to.have.property('Labels')
.to.not.have.property('io.balena.private.ipam.config');
await setTargetState({
config: {},
apps: {
@ -318,32 +339,7 @@ describe('state engine', () => {
name: 'test-app',
commit: 'deadca1f',
releaseId: 2,
services: {
'1': {
image: 'alpine:latest',
imageId: 21,
serviceName: 'one',
restart: 'unless-stopped',
running: true,
command: 'sleep infinity',
stop_signal: 'SIGKILL',
networks: ['default'],
labels: {},
environment: {},
},
'2': {
image: 'alpine:latest',
imageId: 22,
serviceName: 'two',
restart: 'unless-stopped',
running: true,
command: 'sh -c "echo two && sleep infinity"',
stop_signal: 'SIGKILL',
networks: ['default'],
labels: {},
environment: {},
},
},
services,
networks: {
default: {
driver: 'bridge',
@ -364,8 +360,8 @@ describe('state engine', () => {
expect(
updatedContainers.map(({ Names, State }) => ({ Name: Names[0], State })),
).to.have.deep.members([
{ Name: '/one_21_2_deadca1f', State: 'running' },
{ Name: '/two_22_2_deadca1f', State: 'running' },
{ Name: '/one_11_2_deadca1f', State: 'running' },
{ Name: '/two_12_2_deadca1f', State: 'running' },
]);
// Container ids must have changed
@ -373,13 +369,145 @@ describe('state engine', () => {
containerIds,
);
expect(await docker.getNetwork('123_default').inspect())
// Network should have custom ipam config
const customNet = await docker.getNetwork('123_default').inspect();
expect(customNet)
.to.have.property('IPAM')
.to.deep.equal({
Config: [{ Gateway: '192.168.91.1', Subnet: '192.168.91.0/24' }],
Driver: 'default',
Options: {},
});
// Network should have custom ipam label
expect(customNet)
.to.have.property('Labels')
.to.have.property('io.balena.private.ipam.config');
});
it('updates an app with two services with a network change where the only change is a custom ipam config removal', async () => {
const services = {
'1': {
image: 'alpine:latest',
imageId: 11,
serviceName: 'one',
restart: 'unless-stopped',
running: true,
command: 'sleep infinity',
stop_signal: 'SIGKILL',
networks: ['default'],
labels: {},
environment: {},
},
'2': {
image: 'alpine:latest',
imageId: 12,
serviceName: 'two',
restart: 'unless-stopped',
running: true,
command: 'sleep infinity',
stop_signal: 'SIGKILL',
networks: ['default'],
labels: {},
environment: {},
},
};
await setTargetState({
config: {},
apps: {
'123': {
name: 'test-app',
commit: 'deadbeef',
releaseId: 1,
services,
networks: {
default: {
driver: 'bridge',
ipam: {
config: [
{ gateway: '192.168.91.1', subnet: '192.168.91.0/24' },
],
driver: 'default',
},
},
},
volumes: {},
},
},
});
const state = await getCurrentState();
expect(
state.apps['123'].services.map((s: any) => s.serviceName),
).to.deep.equal(['one', 'two']);
// Network should have custom ipam config
const customNet = await docker.getNetwork('123_default').inspect();
expect(customNet)
.to.have.property('IPAM')
.to.deep.equal({
Config: [{ Gateway: '192.168.91.1', Subnet: '192.168.91.0/24' }],
Driver: 'default',
Options: {},
});
// Network should have custom ipam label
expect(customNet)
.to.have.property('Labels')
.to.have.property('io.balena.private.ipam.config');
const containers = await docker.listContainers();
expect(
containers.map(({ Names, State }) => ({ Name: Names[0], State })),
).to.have.deep.members([
{ Name: '/one_11_1_deadbeef', State: 'running' },
{ Name: '/two_12_1_deadbeef', State: 'running' },
]);
const containerIds = containers.map(({ Id }) => Id);
await setTargetState({
config: {},
apps: {
'123': {
name: 'test-app',
commit: 'deadca1f',
releaseId: 2,
services,
networks: {
default: {},
},
volumes: {},
},
},
});
const updatedContainers = await docker.listContainers();
expect(
updatedContainers.map(({ Names, State }) => ({ Name: Names[0], State })),
).to.have.deep.members([
{ Name: '/one_11_2_deadca1f', State: 'running' },
{ Name: '/two_12_2_deadca1f', State: 'running' },
]);
// Container ids must have changed
expect(updatedContainers.map(({ Id }) => Id)).to.not.have.members(
containerIds,
);
// Network should not have custom ipam config
const defaultNet = await docker.getNetwork('123_default').inspect();
expect(defaultNet)
.to.have.property('IPAM')
.to.not.deep.equal({
Config: [{ Gateway: '192.168.91.1', Subnet: '192.168.91.0/24' }],
Driver: 'default',
Options: {},
});
// Network should not have custom ipam label
expect(defaultNet)
.to.have.property('Labels')
.to.not.have.property('io.balena.private.ipam.config');
});
it('updates an app with two services with a network removal', async () => {

@ -335,7 +335,7 @@ describe('ApiBinder', () => {
before(async () => {
await initModels(components, '/config-apibinder.json');
previousLastFetch = TargetState.lastFetch;
previousLastFetch = TargetState.lastSuccessfulFetch;
});
after(async () => {

@ -1458,7 +1458,14 @@ describe('compose/app', () => {
services: [
await createService(
{ appId: 1, serviceName: 'dep' },
{ state: { containerId: 'dep-id' } },
{
state: {
containerId: 'dep-id',
createdAt: new Date(Date.now() - 5 * 1000),
// Container was started 5 seconds after creation
startedAt: new Date(),
},
},
),
],
networks: [DEFAULT_NETWORK],
@ -1475,7 +1482,7 @@ describe('compose/app', () => {
.that.deep.includes({ serviceName: 'main' });
});
it('should not start a container when it depends on a service that is not running', async () => {
it('should not start a container when it depends on a service that has not been started yet', async () => {
const current = createApp({
services: [
await createService(
@ -1535,7 +1542,14 @@ describe('compose/app', () => {
services: [
await createService(
{ appId: 1, serviceName: 'dep' },
{ state: { containerId: 'dep-id' } },
{
state: {
containerId: 'dep-id',
createdAt: new Date(Date.now() - 5 * 1000),
// Container was started 5 seconds after creation
startedAt: new Date(),
},
},
),
],
networks: [DEFAULT_NETWORK],
@ -2399,5 +2413,19 @@ describe('compose/app', () => {
const [releaseLockStep] = expectSteps('releaseLock', steps, 1);
expect(releaseLockStep).to.have.property('appId').that.equals(1);
});
it('should infer a releaseLock step when removing an app', async () => {
const current = createApp({
services: [],
networks: [],
});
const steps = current.stepsToRemoveApp({
...defaultContext,
lock: mockLock,
});
const [releaseLockStep] = expectSteps('releaseLock', steps, 1);
expect(releaseLockStep).to.have.property('appId').that.equals(1);
});
});
});

@ -183,6 +183,8 @@ describe('compose/network', () => {
'io.balena.supervised': 'true',
'io.balena.app-id': '12345',
'com.docker.some-label': 'yes',
// This label should be present as we've defined a custom ipam config
'io.balena.private.ipam.config': 'true',
});
expect(dockerConfig.Options).to.deep.equal({
@ -344,12 +346,14 @@ describe('compose/network', () => {
'io.resin.features.something': '123',
'io.balena.features.dummy': 'abc',
'io.balena.supervised': 'true',
'io.balena.private.ipam.config': 'true',
} as NetworkInspectInfo['Labels'],
} as NetworkInspectInfo);
expect(network.config.labels).to.deep.equal({
'io.balena.features.something': '123',
'io.balena.features.dummy': 'abc',
'io.balena.private.ipam.config': 'true',
});
});
});
@ -425,34 +429,32 @@ describe('compose/network', () => {
});
describe('comparing network configurations', () => {
it('ignores IPAM configuration', () => {
const network = Network.fromComposeObject('default', 12345, 'deadbeef', {
ipam: {
driver: 'default',
config: [
{
subnet: '172.20.0.0/16',
ip_range: '172.20.10.0/24',
gateway: '172.20.0.1',
},
],
options: {},
it('distinguishes a network with custom ipam config from a network without', () => {
const customIpam = Network.fromComposeObject(
'default',
12345,
'deadbeef',
{
ipam: {
driver: 'default',
config: [
{
subnet: '172.20.0.0/16',
gateway: '172.20.0.1',
},
],
options: {},
},
},
});
expect(
network.isEqualConfig(
Network.fromComposeObject('default', 12345, 'deadbeef', {}),
),
).to.be.true;
);
const noCustomIpam = Network.fromComposeObject(
'default',
12345,
'deadbeef',
{},
);
// Only ignores ipam.config, not other ipam elements
expect(
network.isEqualConfig(
Network.fromComposeObject('default', 12345, 'deadbeef', {
ipam: { driver: 'aaa' },
}),
),
).to.be.false;
expect(customIpam.isEqualConfig(noCustomIpam)).to.be.false;
});
it('compares configurations recursively', () => {