4977 Commits

Author SHA1 Message Date
14d4b13bbf v17.1.1 v17.1.1 2025-06-09 13:58:37 +00:00
696902a08a Merge pull request #2421 from balena-os/remove-memory-healthcheck
Remove memory healthcheck
2025-06-09 13:57:52 +00:00
8ffdba7d18 Remove memory healthcheck
Known memory leaks in the Supervisor have been fixed since v16.5.1, with the latest tested
version being v16.7.1. Furthermore, in recently reported support cases where the memory healthcheck
triggered, we've snapshotted the heap before & after on devices multiple times
without finding any evidence of memory leaks in the snapshots.

Therefore, it's hypothesized that the heuristic for determining starting memory may
be flawed in that it's not waiting long enough after system startup, or it may
run right after garbage collection has happened. Because of the variability and
difficulty of ascertaining these factors, we suspect an inaccurate memory
baseline may be the cause of the instances of false positives on support.

See: https://balena.zulipchat.com/#narrow/channel/403752-channel.2Fsupport-help/topic/supervisor.20memory.20usage.20above.20threadhold/near/520640885
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2025-06-06 11:44:15 -07:00
f90d72231d v17.1.0 v17.1.0 2025-05-28 15:42:43 +00:00
c33ea5e474 Merge pull request #2417 from balena-os/target-state-cancellation
Support target state apply cancellation
2025-05-28 15:41:54 +00:00
8c69166271 Support target state apply cancellation
The current target state apply is cancelled when either:
- /v1/update is called with cancel: true
- A different target state is received from the cloud (with a non-304 status)

Following apply cancellation, a target state apply is re-triggered. This ensures
that the user can force a device out of a dead-locked situation where a long-running
task such as an image fetch fails to cede control back to the Supervisor, which is
the behavior observed in an Engine bug with infinite pull retries with a bad network.

Change-type: minor
Signed-off-by: Christina Ying Wang <christina@balena.io>
2025-05-28 07:46:10 -07:00
0af915d815 Pass AbortSignal to image pull functions
When abortController.abort() is called, this signal is passed down
to the functions that interface with Docker Engine for image pulls,
cancelling those pulls.

The next commit will limit when abortController.abort() is called.

Signed-off-by: Christina Ying Wang <christina@balena.io>
2025-05-27 11:09:19 -07:00
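
As a rough illustration of the cancellation pattern described above (not the Supervisor's actual code), the sketch below threads an `AbortSignal` through a simulated layer-by-layer pull. `pullLayers` and `PullResult` are hypothetical names invented for this example; only `AbortController` and `AbortSignal` are standard APIs.

```typescript
// Hypothetical sketch: pullLayers and PullResult are illustrative names,
// not functions from the Supervisor or docker-progress.
interface PullResult {
  pulled: string[];
  cancelled: boolean;
}

// Simulates pulling layers one at a time, checking the signal before each step.
function pullLayers(
  layers: string[],
  signal: AbortSignal,
  onLayer?: (layer: string) => void,
): PullResult {
  const pulled: string[] = [];
  for (const layer of layers) {
    if (signal.aborted) {
      // Stop early and hand control back to the caller.
      return { pulled, cancelled: true };
    }
    pulled.push(layer);
    onLayer?.(layer);
  }
  return { pulled, cancelled: false };
}

const controller = new AbortController();
// Abort after layer 'b' completes, standing in for a cancel request
// arriving in the middle of a pull.
const result = pullLayers(['a', 'b', 'c'], controller.signal, (layer) => {
  if (layer === 'b') {
    controller.abort();
  }
});
```

Here `result.cancelled` is `true` and only layers `a` and `b` were pulled, since the signal is checked before each layer.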
4558b5a20b Bump docker-progress to v5.3.1
Signed-off-by: Christina Ying Wang <christina@balena.io>
2025-05-27 11:09:09 -07:00
1d34d90b2d v17.0.5 v17.0.5 2025-05-27 15:32:23 +00:00
d3801b2092 Merge pull request #2420 from balena-os/adjust_used_memory
Exclude reclaimable slab memory from used memory metric
2025-05-27 11:31:40 -04:00
49e91a2639 Exclude reclaimable slab memory from used memory metric
Aligns metric value with used memory reported by the free and htop
utilities.

Change-type: patch
2025-05-25 11:51:06 -04:00
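
A minimal sketch of how such a metric could be computed from `/proc/meminfo`, assuming the classic `free` formula (used = total - free - buffers - cache, where cache includes `SReclaimable`). The commit does not state the exact formula, so the function names and the formula itself are assumptions.

```typescript
// Parses /proc/meminfo-style text into a map of field name -> value in kB.
function parseMeminfo(text: string): Record<string, number> {
  const fields: Record<string, number> = {};
  for (const line of text.split('\n')) {
    const match = /^(\w+):\s+(\d+)/.exec(line);
    if (match) {
      fields[match[1]] = Number(match[2]);
    }
  }
  return fields;
}

// Assumed formula: reclaimable slab memory is treated like page cache
// and therefore excluded from "used".
function usedMemoryKb(m: Record<string, number>): number {
  return m.MemTotal - m.MemFree - m.Buffers - m.Cached - m.SReclaimable;
}

// Made-up sample values for illustration only.
const sample = [
  'MemTotal:        1000 kB',
  'MemFree:          200 kB',
  'Buffers:           50 kB',
  'Cached:           300 kB',
  'SReclaimable:     100 kB',
].join('\n');

const usedKb = usedMemoryKb(parseMeminfo(sample));
```

With the sample values, `usedKb` is 350 rather than the 450 that would be reported if reclaimable slab memory were counted as used.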
f3aa2a2c90 v17.0.4 v17.0.4 2025-05-22 18:47:30 +00:00
58bd22511e Merge pull request #2418 from balena-os/container-contracts-ignore-extra
Remove unsupported fields from contract requirements
2025-05-22 18:46:44 +00:00
4318272844 Remove unsupported fields from contract requirements
A contract including extra requirement fields, such as "name", would fail
validation. This PR removes any extra fields from the validated contract
to prevent services with these extra fields from getting rejected by the
contract validation.

Change-type: patch
2025-05-15 17:38:03 -04:00
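
The idea can be sketched as an allow-list filter applied to each requirement before validation. The list of supported fields below is an assumption for illustration (the commit only names "name" as an example of an extra field), and `stripUnsupportedFields` is a hypothetical helper.

```typescript
// Assumed allow-list; the real set of supported fields is not given in the commit.
const SUPPORTED_FIELDS = ['type', 'slug', 'version'];

type Requirement = Record<string, unknown>;

// Returns a copy of the requirement containing only the supported fields.
function stripUnsupportedFields(requirement: Requirement): Requirement {
  const cleaned: Requirement = {};
  for (const field of SUPPORTED_FIELDS) {
    if (field in requirement) {
      cleaned[field] = requirement[field];
    }
  }
  return cleaned;
}

// Illustrative input: "name" is dropped, the other fields are kept.
const cleanedRequirement = stripUnsupportedFields({
  type: 'sw.supervisor',
  version: '>=16.0.0',
  name: 'balena Supervisor',
});
```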
aecefd400a v17.0.3 v17.0.3 2025-05-13 14:50:16 +00:00
2cb201f808 Merge pull request #2416 from balena-os/container-contracts-refactor
Simplify contract validation module
2025-05-13 14:49:06 +00:00
7c83eaef80 Simplify contract validation module
Use `satisfiesChildContract` instead of Blueprints as the previous
implementation did.

Change-type: patch
2025-05-08 19:33:12 -04:00
01585c688e v17.0.2 v17.0.2 2025-04-02 20:16:11 +00:00
eeac56efc3 Merge pull request #2415 from balena-os/local-leftover-locks
Fix search for app leftover locks
2025-04-02 20:15:12 +00:00
d475b1d830 Fix search for app leftover locks
The leftover locks search was creating an array rather than an object
keyed by the appId. This could affect the lock cleanup and make leftover
locks from one app affect the install of the app in local mode.

Change-type: patch
2025-04-01 17:56:06 -03:00
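
To illustrate the difference, here is a small sketch that collects leftover locks into an object keyed by appId instead of a flat array. The `LeftoverLock` shape, the helper name, and the paths are hypothetical and only meant to show the data structure.

```typescript
// Hypothetical shape for a discovered lock file.
interface LeftoverLock {
  appId: number;
  path: string;
}

// Groups lock paths per app so that cleanup for one app
// never touches locks belonging to another.
function groupLocksByAppId(locks: LeftoverLock[]): Record<number, string[]> {
  const byApp: Record<number, string[]> = {};
  for (const lock of locks) {
    if (byApp[lock.appId] === undefined) {
      byApp[lock.appId] = [];
    }
    byApp[lock.appId].push(lock.path);
  }
  return byApp;
}

// Illustrative paths only.
const locksByApp = groupLocksByAppId([
  { appId: 1, path: '/locks/1/main/updates.lock' },
  { appId: 2, path: '/locks/2/main/updates.lock' },
  { appId: 1, path: '/locks/1/db/updates.lock' },
]);
```

Looking up `locksByApp[1]` returns only the two locks for app 1, which is the property a flat array cannot provide by index.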
49b18b4a37 v17.0.1 v17.0.1 2025-03-25 20:41:22 +00:00
623a1638c1 Merge pull request #2413 from balena-os/clarify-firewall-docs-on-host-network-containers
Clarify firewall docs on behavior with host network containers
2025-03-25 13:40:28 -07:00
caed4dcca0 Clarify firewall docs on behavior with host network containers
Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2025-03-25 13:10:52 -07:00
7efdeea0f7 v17.0.0 v17.0.0 2025-03-24 22:18:11 +00:00
2d1871e16d Merge pull request #2400 from balena-os/network-custom-ipam-label
Add Docker network label if custom ipam config
2025-03-24 22:17:12 +00:00
b596c77ce2 Add Docker network label if custom ipam config
In a target release where the only change is the addition or removal
of a custom ipam config, the Supervisor does not recreate the network
due to ignoring ipam config differences when comparing current and target
network (in network.isEqualConfig). This commit implements the addition of
a network label if the target compose object includes a network with custom
ipam. With the label, the Supervisor will detect a difference between a
network with a custom ipam and a network without, without needing to compare
the ipam configs themselves.

This is a major change, as devices running networks with custom ipam configs
will have their networks recreated to add the network label.

Closes: #2251
Change-type: major
See: https://balena.fibery.io/Work/Project/Fix-Supervisor-not-recreating-network-when-passed-custom-ipam-config-1127
Signed-off-by: Christina Ying Wang <christina@balena.io>
2025-03-24 14:55:19 -07:00
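
A simplified sketch of the approach, assuming a comparison that ignores ipam but does compare labels. The label key, the `NetworkConfig` shape, and both helpers are hypothetical; the real label name and the logic in `network.isEqualConfig` are not shown in the commit message.

```typescript
// Hypothetical, simplified network shape.
interface NetworkConfig {
  driver: string;
  ipam?: { subnet: string };
  labels: Record<string, string>;
}

// Hypothetical label key, for illustration only.
const CUSTOM_IPAM_LABEL = 'example.custom-ipam';

// Adds the marker label when the target network defines a custom ipam config.
function withIpamLabel(target: NetworkConfig): NetworkConfig {
  const labels = { ...target.labels };
  if (target.ipam) {
    labels[CUSTOM_IPAM_LABEL] = 'true';
  }
  return { ...target, labels };
}

// Compares driver and labels; ipam is deliberately ignored.
function isEqualIgnoringIpam(a: NetworkConfig, b: NetworkConfig): boolean {
  const aKeys = Object.keys(a.labels);
  const bKeys = Object.keys(b.labels);
  return (
    a.driver === b.driver &&
    aKeys.length === bKeys.length &&
    aKeys.every((key) => a.labels[key] === b.labels[key])
  );
}

const current: NetworkConfig = { driver: 'bridge', labels: {} };
const target: NetworkConfig = {
  driver: 'bridge',
  ipam: { subnet: '172.28.0.0/16' },
  labels: {},
};

const equalWithoutLabel = isEqualIgnoringIpam(current, target);
const equalWithLabel = isEqualIgnoringIpam(current, withIpamLabel(target));
```

Without the label the two networks compare as equal, so nothing is recreated; with the label the difference is detected even though the ipam configs are never compared.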
8c6e3df7d9 v16.12.9 v16.12.9 2025-03-20 18:43:08 +00:00
94cdd3fcd7 Merge pull request #2411 from balena-os/service-dependencies
Start a dependent if all dependencies are started
2025-03-20 18:42:13 +00:00
7764f98c9d Start a dependent if all dependencies are started
The previous behavior required that dependencies were running before
starting the dependent service. As a result, services depending on
a one-shot service would never get started, which goes against the default
Docker behavior.

Depending on a service to be running will require the implementation of
[long syntax depends_on](https://docs.docker.com/reference/compose-file/services/#long-syntax-1) and the condition
`service_healthy`.

Change-type: patch
Closes: #2409
2025-03-20 14:51:32 -03:00
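
The distinction between "started" and "running" can be sketched as follows. `ServiceState` and `canStartDependent` are hypothetical names, and the state model is deliberately simplified to two booleans.

```typescript
// Simplified, hypothetical service state: a one-shot service that has
// exited is "started" but no longer "running".
interface ServiceState {
  started: boolean;
  running: boolean;
}

// New behavior: a dependent may start once every dependency has been started,
// regardless of whether it is still running.
function canStartDependent(
  dependsOn: string[],
  states: Record<string, ServiceState>,
): boolean {
  return dependsOn.every((name) => states[name]?.started === true);
}

const states: Record<string, ServiceState> = {
  migrate: { started: true, running: false }, // one-shot, already exited
  db: { started: true, running: true },
  cache: { started: false, running: false },
};

const apiCanStart = canStartDependent(['migrate', 'db'], states);
const workerCanStart = canStartDependent(['cache'], states);
```

Under the previous "must be running" rule, the service depending on `migrate` would never start; here `apiCanStart` is `true` while `workerCanStart` is `false`.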
b8032edc04 v16.12.8 v16.12.8 2025-03-12 14:50:35 +00:00
175872b358 Merge pull request #2408 from balena-os/fix-socket-timeout
Ensure poll socket timeout is defined early
2025-03-12 14:49:34 +00:00
ae337a1dd7 Remove GOT retries on state poll
The state poll already has retry implementation, making the GOT default
unnecessary.

Change-type: patch
2025-03-12 10:59:16 -03:00
bdbc6a4ba4 Ensure poll socket timeout is defined early
We have observed that even when setting the socket timeout on the
state poll https request, the timeout is only applied once the socket is
connected. This causes issues with Node's auto family selection (happy
eyeballs), as the default https timeout is 5s, which means that a larger
[auto select attempt timeout](https://nodejs.org/docs/latest-v22.x/api/net.html#netgetdefaultautoselectfamilyattempttimeout) may result in the socket timing out before all connection attempts have been tried.

This commit sets a different https Agent for state polling, with a
timeout matching the `apiRequestTimeout` used for other request events.

Change-type: patch
2025-03-12 10:59:11 -03:00
978652b292 v16.12.7 v16.12.7 2025-03-06 19:11:20 +00:00
7771c0e96b Merge pull request #2406 from balena-os/release-locks-on-app-remove
Release locks when removing apps
2025-03-06 19:10:38 +00:00
026dc0aed2 Release locks when removing apps
This prevents leftover locks that can prevent other operations from
taking place.

Change-type: patch
2025-03-06 11:50:31 -03:00
5ef6b054fd v16.12.6 v16.12.6 2025-03-04 14:25:09 +00:00
3cca2b7ecd Merge pull request #2404 from balena-os/polling-improvements
Polling improvements
2025-03-04 14:24:18 +00:00
3d8bd28f5a Update GOT to v14.4.6 2025-03-04 10:46:47 -03:00
6d00be2093 Log non-API errors during state poll
The supervisor was failing silently if an error happened while establishing the
connection (e.g. requesting the socket).

Change-type: patch
2025-03-04 10:46:45 -03:00
f8bdb14335 Fix target poll healthcheck
The Target.lastFetch time checked by the healthcheck was reset any time
a poll was attempted, no matter the outcome. This changes the behavior
so the time is reset only on a successful poll.

Change-type: patch
2025-03-04 10:45:31 -03:00
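
A minimal sketch of the fixed behavior, assuming a tracker that stores the time of the last successful poll. `TargetPollTracker` and its methods are hypothetical names; times are plain millisecond numbers to keep the example deterministic.

```typescript
// Hypothetical tracker: only successful polls refresh the timestamp,
// so repeated failed attempts eventually make the healthcheck fail.
class TargetPollTracker {
  private lastSuccessfulFetch: number;

  constructor(now: number) {
    this.lastSuccessfulFetch = now;
  }

  recordPoll(success: boolean, now: number): void {
    if (success) {
      this.lastSuccessfulFetch = now;
    }
  }

  isHealthy(now: number, maxAgeMs: number): boolean {
    return now - this.lastSuccessfulFetch <= maxAgeMs;
  }
}

const tracker = new TargetPollTracker(0);
tracker.recordPoll(false, 50_000); // failed attempt: timestamp unchanged
const healthyAfterFailure = tracker.isHealthy(70_000, 60_000);
tracker.recordPoll(true, 70_000); // successful poll: timestamp refreshed
const healthyAfterSuccess = tracker.isHealthy(100_000, 60_000);
```

With the old behavior the failed attempt at 50s would have reset the timer and masked the problem; in this sketch `healthyAfterFailure` is `false` and `healthyAfterSuccess` is `true`.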
c88cf6a259 v16.12.5 v16.12.5 2025-03-04 13:35:28 +00:00
906ce6dc0d Merge pull request #2405 from balena-os/fix-api-request-timeout
Decrease balenaCloud api request timeout from 15m to 59s
2025-03-04 13:34:35 +00:00
49163e92a0 Decrease balenaCloud api request timeout from 15m to 59s
This was mistakenly increased due to confusion between the timeout for
requests to the supervisor's api and the timeout for requests from the
supervisor to the balenaCloud api. This separates the two configs and
documents the difference between the timeouts, whilst also decreasing
the timeout for balenaCloud api requests to the correct/expected value.

Change-type: patch
2025-03-04 12:29:18 +00:00
f67e45f432 v16.12.4 v16.12.4 2025-03-03 13:42:20 +00:00
91335051ac Merge pull request #2403 from balena-os/dont-revert-to-regular-pull-if-401
Don't revert to regular pull if delta server 401
2025-03-03 13:41:29 +00:00
2dc9d275b1 Don't revert to regular pull if delta server 401
If the Supervisor receives a 401 Unauthorized from the delta server
when requesting a delta image location, we should surface the error
instead of falling back to a regular pull immediately, as there could
be an issue with the delta auth token, which refreshes after
DELTA_TOKEN_TIMEOUT (10min), or some other edge case.

Change-type: patch
Signed-off-by: Christina Ying Wang <christina@balena.io>
2025-02-24 16:17:15 -08:00
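
The decision described above could look roughly like the sketch below. `handleDeltaResponse` and `DeltaOutcome` are hypothetical names, and the assumption that every other non-2xx status falls back to a regular pull is a simplification made for this example.

```typescript
// Hypothetical outcomes of a delta image location request.
type DeltaOutcome = 'use-delta' | 'surface-error' | 'regular-pull';

function handleDeltaResponse(statusCode: number): DeltaOutcome {
  if (statusCode >= 200 && statusCode < 300) {
    return 'use-delta';
  }
  if (statusCode === 401) {
    // Possibly a stale delta auth token: report the error rather than
    // immediately falling back to a regular pull.
    return 'surface-error';
  }
  // Assumption: other failures fall back to a regular image pull.
  return 'regular-pull';
}

const outcomes = [200, 401, 500].map(handleDeltaResponse);
```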
b6f0ecba18 v16.12.3 v16.12.3 2025-02-19 20:51:55 +00:00
dd0253ff1f Merge pull request #2396 from balena-os/switch-to-image-pull-if-delta-failure
Switch to image pull if delta failure
2025-02-19 20:50:58 +00:00
5936af37e7 Bump docker-progress to 5.2.4
Signed-off-by: Christina Ying Wang <christina@balena.io>
2025-02-12 13:49:09 -08:00