mirror of https://github.com/balena-os/balena-supervisor.git (synced 2024-12-19 05:37:53 +00:00)
Fix grammar and simplify some phrases
Signed-off-by: Christina Ying Wang <christina@balena.io>
parent a9045d5eda
commit 6d4f7cb32c
@ -1,24 +1,25 @@
|
|||||||
# Working with the Supervisor
|
# Working with the Supervisor
|
||||||
|
|
||||||
Service: `balena-supervisor.service`
|
Service: `balena-supervisor.service`, or `resin-supervisor.service` if OS < v2.78.0
|
||||||
|
|
||||||
The balena Supervisor is the service that carries out the management of the
|
The balena Supervisor is the service that carries out the management of the
|
||||||
software release on a device, including determining when to download updates,
|
software release on a device, including determining when to download updates,
|
||||||
the changing of variables, ensuring services
|
the changing of variables, ensuring services are restarted correctly, etc.
|
||||||
are restarted correctly, etc. It is, in effect, the on-device agent for
|
It is the on-device agent for balenaCloud.
|
||||||
balenaCloud.
|
|
||||||
|
|
||||||
As such, it's imperative that the Supervisor is operational and healthy at all
|
As such, it's imperative that the Supervisor is operational and healthy at all
|
||||||
times, even when a device is not connected via the Internet, as it still
|
times, even when a device is not connected to the Internet, as the Supervisor still
|
||||||
ensures the running of a device that is offline.
|
ensures the running of a device that is offline.
|
||||||
|
|
||||||
The Supervisor itself is a Docker service that runs alongside any installed
|
The Supervisor itself is a Docker service that runs alongside any installed
|
||||||
user services and the healthcheck container (more on that later). One
|
user services and the healthcheck container. One major advantage of running it as
|
||||||
major advantage of running it as a Docker service is that it can be updated
|
a Docker service is that it can be updated just like any other service, although
|
||||||
just like any other service (although actually carrying that out is slightly
|
carrying that out is slightly different than updating user containers (see [Updating the Supervisor](#82-updating-the-supervisor)).
|
||||||
different to updating user containers, see 'Updating the Supervisor').
|
|
||||||
|
|
||||||
Assuming you're still logged into your development device, run the following:
|
Before attempting to debug the Supervisor, it's recommended to upgrade the Supervisor to
|
||||||
|
the latest version, as we frequently release bugfixes and features that may resolve device issues.
|
||||||
|
|
||||||
|
Otherwise, assuming you're still logged into your development device, run the following:
|
||||||
|
|
||||||
```shell
|
```shell
|
||||||
root@debug-device:~# systemctl status balena-supervisor
|
root@debug-device:~# systemctl status balena-supervisor
|
||||||
@ -49,24 +50,24 @@ Aug 19 18:09:18 debug-device balena-supervisor[2486]: [info] Reported current
|
|||||||
```
|
```
|
||||||
|
|
||||||
You can see the Supervisor is just another `systemd` service
|
You can see the Supervisor is just another `systemd` service
|
||||||
(`balena-supervisor.service)`, and that it is started and run by balenaEngine.
|
(`balena-supervisor.service`) and that it is started and run by balenaEngine.
|
||||||
|
|
||||||
Supervisor issues, due to their nature, vary quite significantly. It's also
|
Supervisor issues, due to their nature, vary significantly. Issues may commonly
|
||||||
commonly used to misattribute issues to. As the Supervisor is verbose about its
|
be misattributed to the Supervisor. As the Supervisor is verbose about its
|
||||||
state and actions (such as the download of images), it tends to be suspected of
|
state and actions, such as the download of images, it tends to be suspected of
|
||||||
problems when in fact there are usually other underlying issues. A few examples
|
problems when in fact there are usually other underlying issues. A few examples
|
||||||
are:
|
are:
|
||||||
|
|
||||||
- Networking problems - In the case of the Supervisor reporting failed downloads
|
- Networking problems - The Supervisor reports failed downloads
|
||||||
or attempting to retrieve the same images repeatedly (where in fact instable
|
or attempts to retrieve the same images repeatedly, where in fact unstable
|
||||||
networking is usually the cause).
|
networking is usually the cause.
|
||||||
- Service container restarts - The default policy for service containers is to
|
- Service container restarts - The default policy for service containers is to
|
||||||
restart if they exit, and this sometimes is misunderstood. If a container's
|
restart if they exit, and this is sometimes misunderstood. If a container is
|
||||||
restarting, it's worth ensuring it's not because the container itself is
|
restarting, it's worth ensuring it's not because the container itself is
|
||||||
exiting correctly either due to a bug in the service container code or
|
exiting either due to a bug in the service container code or
|
||||||
because it has correctly come to the end of its running process.
|
because it has correctly come to the end of its running process.
|
||||||
- Staged releases - A fleet/device has been pinned to a particular
|
- Release not being downloaded - For instance, a fleet/device has been pinned
|
||||||
version, and a new push is not being downloaded.
|
to a particular version, and a new push is not being downloaded.
|
||||||
|
|
||||||
It's _always_ worth considering how the system is configured, how releases were
|
It's _always_ worth considering how the system is configured, how releases were
|
||||||
produced, how the fleet or device is configured and what the current
|
produced, how the fleet or device is configured and what the current
|
||||||
@ -78,9 +79,9 @@ Another point to note is that the Supervisor is started using
|
|||||||
ensures that the Supervisor is present by using balenaEngine to find the
|
ensures that the Supervisor is present by using balenaEngine to find the
|
||||||
Supervisor image. If the image isn't present, or balenaEngine doesn't respond,
|
Supervisor image. If the image isn't present, or balenaEngine doesn't respond,
|
||||||
then the Supervisor is restarted. The default period for this check is 180
|
then the Supervisor is restarted. The default period for this check is 180
|
||||||
seconds at the time of writing, but inspect the
|
seconds. Inspecting `/lib/systemd/system/balena-supervisor.service` on-device
|
||||||
`/lib/systemd/system/balena-supervisor.service` file on-device to see what
|
will show whether the timeout period is different for a particular device.
|
||||||
it is for the device you're SSHd into. For example, using our example device:
|
For example:
|
||||||
|
|
||||||
```shell
|
```shell
|
||||||
root@debug-device:~# cat /lib/systemd/system/balena-supervisor.service
|
root@debug-device:~# cat /lib/systemd/system/balena-supervisor.service
|
||||||
@ -127,15 +128,15 @@ Alias=resin-supervisor.service
|
|||||||
|
|
||||||
#### 8.1 Restarting the Supervisor
|
#### 8.1 Restarting the Supervisor
|
||||||
|
|
||||||
It's actually incredibly rare to actually _need_ a Supervisor restart. The
|
It's rare to actually _need_ a Supervisor restart. The Supervisor will attempt to
|
||||||
Supervisor will attempt to recover from issues that occur automatically, without
|
recover from issues that occur automatically, without the requirement for a restart.
|
||||||
the requirement for a restart. If you've got to a point where you believe that
|
When in doubt about whether a restart is required, look at the Supervisor logs and
|
||||||
a restart is required, double check with the other agent on-duty, and if
|
double check with other on-duty support agents if needed. If you are fairly certain, it's generally
|
||||||
required either with the Supervisor maintainer or another knowledgeable engineer
|
safe to restart the Supervisor, as long as the user is aware that some extra bandwidth
|
||||||
before doing so.
|
and device resources will be used on startup.
|
||||||
|
|
||||||
There are instances where the Supervisor is incorrectly restarted when in fact
|
There are instances where the Supervisor is incorrectly restarted when in fact
|
||||||
the issue could be down to corruption of service images, containers, volumes
|
the issue could be the corruption of service images, containers, volumes
|
||||||
or networking. In these cases, you're better off dealing with the underlying
|
or networking. In these cases, you're better off dealing with the underlying
|
||||||
balenaEngine to ensure that anything corrupt is recreated correctly. See the
|
balenaEngine to ensure that anything corrupt is recreated correctly. See the
|
||||||
balenaEngine section for more details.
|
balenaEngine section for more details.
|
||||||
@ -143,8 +144,8 @@ balenaEngine section for more details.
|
|||||||
If a restart is required, ensure that you have gathered as much information
|
If a restart is required, ensure that you have gathered as much information
|
||||||
as possible before a restart, including pertinent logs and symptoms so that
|
as possible before a restart, including pertinent logs and symptoms so that
|
||||||
investigations can occur asynchronously to determine what occurred and how it
|
investigations can occur asynchronously to determine what occurred and how it
|
||||||
may be mitigated in the future. Enabling permanent logging may also be of
|
may be mitigated in the future. Enabling persistent logging may also be beneficial
|
||||||
benefit in cases where symptoms are repeatedly occurring.
|
in cases where symptoms are repeatedly occurring.
|
||||||
|
|
||||||
To restart the Supervisor, simply restart the `systemd` service:
|
To restart the Supervisor, simply restart the `systemd` service:
|
||||||
|
|
||||||
@ -185,7 +186,7 @@ the Supervisor on a device is outdated and is causing an issue. Usually the best
|
|||||||
way to achieve this is via a balenaOS update, either from the dashboard or via
|
way to achieve this is via a balenaOS update, either from the dashboard or via
|
||||||
the command line on the device.
|
the command line on the device.
|
||||||
|
|
||||||
If updating balenaOS is not desirable or a user prefers updating the Supervisor independently, this can easily be accomplished using the [self-service](https://www.balena.io/docs/reference/supervisor/supervisor-upgrades/) Supervisor upgrades. Alternatively, this can be programmatically done by using the Node.js SDK method [device.setSupervisorRelease](https://www.balena.io/docs/reference/sdk/node-sdk/#devicesetsupervisorreleaseuuidorid-supervisorversionorid-%E2%87%92-codepromisecode).
|
If updating balenaOS is not desirable or a user prefers updating the Supervisor independently, this can easily be accomplished using [self-service](https://www.balena.io/docs/reference/supervisor/supervisor-upgrades/) Supervisor upgrades. Alternatively, this can be programmatically done by using the Node.js SDK method [device.setSupervisorRelease](https://www.balena.io/docs/reference/sdk/node-sdk/#devicesetsupervisorreleaseuuidorid-supervisorversionorid-%E2%87%92-codepromisecode).
|
||||||
|
|
||||||
You can additionally write a script to manage this for a fleet of devices in combination with other SDK functions such as [device.getAll](https://www.balena.io/docs/reference/sdk/node-sdk/#devicegetalloptions-%E2%87%92-codepromisecode).
|
You can additionally write a script to manage this for a fleet of devices in combination with other SDK functions such as [device.getAll](https://www.balena.io/docs/reference/sdk/node-sdk/#devicegetalloptions-%E2%87%92-codepromisecode).
|
||||||
|
|
||||||
@ -193,15 +194,15 @@ You can additionally write a script to manage this for a fleet of devices in com
|
|||||||
|
|
||||||
#### 8.3 The Supervisor Database
|
#### 8.3 The Supervisor Database
|
||||||
|
|
||||||
The Supervisor uses a SQLite database to store persistent state (so in the
|
The Supervisor uses a SQLite database to store persistent state, so in the
|
||||||
case of going offline, or a reboot, it knows exactly what state an
|
case of going offline, or a reboot, it knows exactly what state an
|
||||||
app should be in, and which images, containers, volumes and networks
|
app should be in, and which images, containers, volumes and networks
|
||||||
to apply to it).
|
to apply to it.
|
||||||
|
|
||||||
This database is located at
|
This database is located at
|
||||||
`/mnt/data/resin-data/balena-supervisor/database.sqlite` and can be accessed
|
`/mnt/data/resin-data/balena-supervisor/database.sqlite` and can be accessed
|
||||||
inside the Supervisor, most easily by running Node. Assuming you're logged
|
inside the Supervisor container at `/data/database.sqlite` by running Node.
|
||||||
into your device, run the following:
|
Assuming you're logged into your device, run the following:
|
||||||
|
|
||||||
```shell
|
```shell
|
||||||
root@debug-device:~# balena exec -ti balena_supervisor node
|
root@debug-device:~# balena exec -ti balena_supervisor node
|
||||||
@ -330,7 +331,7 @@ root@debug-device:~# rm /mnt/data/resin-data/balena-supervisor/database.sqlite
|
|||||||
This:
|
This:
|
||||||
|
|
||||||
- Stops the Supervisor (and the timer that will attempt to restart it).
|
- Stops the Supervisor (and the timer that will attempt to restart it).
|
||||||
- Removes all current services containers (including the Supervisor).
|
- Removes all current service containers, including the Supervisor.
|
||||||
- Removes the Supervisor database.
|
- Removes the Supervisor database.
|
||||||
(If for some reason the images also need to be removed, run
|
(If for some reason the images also need to be removed, run
|
||||||
`balena rmi -f $(balena images -q)` which will remove all images _including_
|
`balena rmi -f $(balena images -q)` which will remove all images _including_
|
||||||
@ -344,6 +345,6 @@ root@debug-device:~# systemctl start update-balena-supervisor.timer balena-super
|
|||||||
If you deleted all the images, this will first download the Supervisor image
|
If you deleted all the images, this will first download the Supervisor image
|
||||||
again before restarting it.
|
again before restarting it.
|
||||||
At this point, the Supervisor will start up as if the device has just been
|
At this point, the Supervisor will start up as if the device has just been
|
||||||
provisioned (though it will already be registered), and the release will
|
provisioned and already registered, and the device's target release will
|
||||||
be freshly downloaded (if the images were removed) before starting the service
|
be freshly downloaded (if the images were removed) before starting the service
|
||||||
containers.
|
containers.
|
||||||
|
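The updated `Service:` line near the top of this diff says the unit is `balena-supervisor.service`, or `resin-supervisor.service` if the OS is older than v2.78.0. A minimal sketch of that rule as a shell helper follows. The function name `supervisor_unit_for` is hypothetical and not part of balenaOS, and the sketch assumes a `sort` that supports `-V` (version sort) and a plain `MAJOR.MINOR.PATCH` version string.

```shell
#!/bin/sh
# Hypothetical helper (not part of balenaOS): print the Supervisor systemd
# unit name for a given balenaOS version, following the rule in the docs:
# resin-supervisor.service if OS < v2.78.0, otherwise balena-supervisor.service.
# Assumes `sort -V` is available and the version is plain MAJOR.MINOR.PATCH.
supervisor_unit_for() {
    os_version="${1#v}"          # accept "v2.78.0" as well as "2.78.0"
    threshold="2.78.0"

    # The lower of the two versions sorts first under version ordering.
    lowest=$(printf '%s\n%s\n' "$os_version" "$threshold" | sort -V | head -n 1)

    if [ "$lowest" = "$os_version" ] && [ "$os_version" != "$threshold" ]; then
        echo "resin-supervisor.service"
    else
        echo "balena-supervisor.service"
    fi
}
```

On a device this might be used as `systemctl status "$(supervisor_unit_for "$VERSION_ID")"` after sourcing `/etc/os-release`, assuming that file exposes the OS version as `VERSION_ID`. The unit file shown in the diff also carries `Alias=resin-supervisor.service`, so on newer OS versions either name should resolve to the same service.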