===============================================
Release notes for the Genode OS Framework 23.02
===============================================

Genode Labs


With Genode's February release, almost everything goes
|
|
[https://genode.org/about/road-map - according to plan].
|
|
As envisioned on our road map, it features the first ready-to-install
|
|
system image of Sculpt OS for the PinePhone, which is not merely a re-targeted
|
|
version of the PC version but comes with a novel user interface, a new
|
|
mechanism for rapidly switching between different application scenarios, and
|
|
system-update functionality.
|
|
Section [First system image of mobile Sculpt OS (PinePhone)] gives an
|
|
overview and further links about running Genode on your PinePhone.
|
|
|
|
While enabling substantial application workloads on devices as constrained as
|
|
the PinePhone, we engaged in holistic performance optimizations, ranging from
|
|
kernel scheduling (Section [Base-HW microkernel]), over the framework's VFS
|
|
infrastructure (Section [VFS optimization and simplification]), to the
|
|
interfacing of GPU drivers (Section [GPU performance optimizations]).
|
|
|
|
For stationary ARM-based platforms like the MNT-Reform laptop,
interactive graphical virtual machines have become available now, which
brings us close to mirroring the experience of the PC version of Sculpt OS on
such devices (Section [Interactive graphical VMs on ARM]). This development
is accompanied by several device-driver improvements for NXP's i.MX family.
|
|
|
|
For embedded devices based on Xilinx Zynq, the release introduces custom
|
|
FPGA fabric for implementing DMA protection that is normally not covered by
|
|
Zynq SoCs. This line of work - as outlined in
|
|
Section [Custom IP block for DMA protection on AMD/Xilinx Zynq] - exemplifies
|
|
how well Genode and reconfigurable hardware can go hand in hand.
|
|
|
|
Also, PC platforms got their share of attention, benefiting from the
new distinction between Intel's P&E cores and from the basic support for
suspend/resume on both NOVA and Genode's custom base-hw microkernel.
|
|
|
|
When it comes to running applications on top of Genode, the release brings
|
|
good news as well. Our custom Goa tool for streamlining
|
|
application-development work flows received the ability to largely automate
|
|
the porting and packaging of 3rd-party libraries using CMake
|
|
(Section [Build system and tools]).
|
|
|
|
|
|
First system image of mobile Sculpt OS (PinePhone)
|
|
##################################################
|
|
|
|
Just in time for our
|
|
[https://fosdem.org/2023/schedule/event/genode_on_the_pinephone/ - public presentation]
|
|
of Genode on the PinePhone at FOSDEM at the beginning of February,
|
|
we published a first ready-to-use system image:
|
|
|
|
:First system image of mobile Sculpt OS:
|
|
|
|
[https://genodians.org/nfeske/2023-02-01-mobile-sculpt]
|
|
|
|
It features a
|
|
[https://genodians.org/nfeske/2023-01-05-mobile-user-interface - custom user interface],
|
|
voice calls and mobile-data connectivity, on-target software installation and
|
|
system update, device controls (battery, brightness, volume, mic, reset,
|
|
shutdown), and a variety of installable software. Among the installable
|
|
applications, there is the Chromium-based Morph web browser, an OpenGL demo
|
|
using the GPU, tests for the camera and microphone, as well as a light-weight
|
|
Unix-like system shell.
|
|
|
|
The underpinnings of the Genode system image for the PinePhone are nearly
|
|
identical to Sculpt OS on the PC. However, besides the new user interface
|
|
specifically designed for the touch screen of the phone, two noteworthy
|
|
differences set it apart from the regular version of Sculpt OS.
|
|
|
|
[image pinephone_presets]
|
|
|
|
First, the phone variant allows the user to rapidly switch between different
runtime configurations, called presets. This way, the limited resources of the
phone can be accounted for and fully leveraged by each preset individually, while
making the system extremely versatile. The loading of a preset can be imagined
|
|
as the boot into a separate operating system, but it takes only a fraction of
|
|
a second. The structure of the running system is made fully transparent to the
|
|
user by the component graph known from Sculpt OS.
|
|
|
|
[image pinephone_scenarios]
|
|
The variety of presets includes the Morph browser, GLMark2, a system shell,
|
|
a simple oscilloscope, and a camera test.
|
|
|
|
Second, the system is equipped with an on-target system update mechanism that
|
|
allows the user to install new versions of the system image when they become
|
|
available. System updates are secured by cryptographic signatures. The
|
|
mechanism does not only allow for updating the system but also for the
|
|
rollback to any previously downloaded version. This way, the user can try
|
|
out a new version while being able to fall back to the previous one in the
|
|
case of a regression. This reinforces the end user's ultimate control.
|
|
|
|
[image pinephone_update]
|
|
|
|
|
|
Interactive graphical VMs on ARM
|
|
################################
|
|
|
|
The virtual-machine monitor (VMM) using hardware-assisted virtualization on
ARM started as a case study eight years ago for Samsung's Exynos 5250 SoC.
Originally, it supported the virtualization of CPU, timer, interrupt controller,
and a UART device only. Since then, it received several extensions like
support for 64-bit ARMv8 systems and VirtIO devices for network, console, and
block access. With release 22.11, the VMM's I/O device access, RAM
consumption, and CPU count have become configurable.
|
|
|
|
With the current release, we further enhance the VMM for ARM devices to
|
|
provide all the means necessary to become a useful virtualization solution for
|
|
interactive scenarios.
|
|
|
|
[image mnt_interactive_debian_vm]
|
|
Sculpt OS running Debian in a virtual machine on the MNT Reform laptop
|
|
|
|
Two additional VirtIO device models are available now: A GPU model and one for
|
|
input. Both models are mapped to Genode's GUI service under the hood. One can
|
|
extend the configuration of the VMM accordingly:
|
|
|
|
! <config ...>
!   <virtio_device name="fb0"    type="gpu"/>
!   <virtio_device name="event0" type="input"/>
!   ...
! </config>
|
|
|
|
For now, only one GPU and one input device can be declared. Both devices get
|
|
mapped to the very same GUI service, according to the service routing of the
|
|
VMM.
|
|
|
|
Caution: the GPU and input models are still in an experimental state, and there
|
|
are known corner cases, e.g., when the graphical window size of the VMM gets
|
|
changed dynamically.
|
|
|
|
Formerly, the VMM always expected an initial RAM file system to be provided as
|
|
ROM dataspace, which got loaded together with the Linux kernel into the VM's
|
|
memory. Now, it is possible to omit the "initrd_rom" configuration option.
|
|
If omitted, no initrd is provided to the Linux guest.
|
|
|
|
|
|
Custom IP block for DMA protection on AMD/Xilinx Zynq
|
|
#####################################################
|
|
|
|
As a continuation of the hardware-software co-design efforts presented in the
|
|
[https://genode.org/documentation/release-notes/22.11#Hardware-software_co-design_with_Genode_on_Xilinx_Zynq - previous release],
|
|
we turned towards enabling bulk-data transfer between the Zynq's CPU and its
|
|
FPGA. In a first step, we built a custom hardware design that implements a DMA
|
|
loopback device based on Xilinx' AXI DMA IP. Since we were particularly
|
|
interested in testing out the Zynq's accelerator coherency port (ACP), we
|
|
implemented two loopback devices: one attached to the ACP and one to the
|
|
high-performance (HP) AXI port of the Zynq. In order to test the design in
|
|
Genode, we added a port of Xilinx' embeddedsw repository that hosts standalone
|
|
driver code for the Xilinx IP cores. Based on this port, we implemented the
|
|
xilinx_axidma library as a Genode wrapper in order to simplify development of
|
|
custom drivers using Xilinx' AXI DMA IP. A newly written test component takes
|
|
throughput measurements for varying transfer sizes. A more detailed account of
|
|
this story is published in an
|
|
[https://www.hackster.io/johannes-schlatow/using-axi-dma-on-genode-6482d2 - article on hackster.io].
|
|
|
|
Knowing that DMA bypasses any memory protection on the Zynq as it does not
|
|
feature an IOMMU, we further spent some development efforts on implementing a
|
|
custom IP block, called DMA Guard, for protecting against unintended DMA
|
|
transfers from/to the FPGA. The DMA Guard is configured with a limited set of
|
|
address ranges for which DMA transfers will be granted. Any out-of-range
|
|
transfer will be denied. The configuration of the DMA Guard is conducted by
|
|
the Zynq's platform driver based on the allocated DMA buffers. For the time
|
|
being, we applied several changes to the platform driver. These modifications
|
|
are currently hosted in the genode-zynq repository but are going to find their
|
|
way into the generic platform driver for the next release.
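The range check described above can be modeled in a few lines. The following
Python sketch is purely illustrative: the names 'Range' and 'DmaGuardModel',
the limit of four ranges, and the example addresses are assumptions made for
this example. They reflect neither the register interface of the IP block nor
the code of the platform driver.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Range:
    base: int   # first byte address of the permitted window
    size: int   # window length in bytes

    def contains(self, addr: int, length: int) -> bool:
        # A transfer is covered only if it lies entirely inside the window.
        return (length > 0 and self.base <= addr
                and addr + length <= self.base + self.size)


@dataclass
class DmaGuardModel:
    """Hypothetical software model of a guard with a limited set of windows."""
    max_ranges: int = 4                       # assumed limit, not from hardware
    ranges: list = field(default_factory=list)

    def grant(self, base: int, size: int) -> None:
        # Would be called when a DMA buffer is allocated.
        if len(self.ranges) >= self.max_ranges:
            raise RuntimeError("no free range slot")
        self.ranges.append(Range(base, size))

    def revoke(self, base: int) -> None:
        # Would be called when the DMA buffer is freed.
        self.ranges = [r for r in self.ranges if r.base != base]

    def permits(self, addr: int, length: int) -> bool:
        # Any transfer that is not fully inside one window is denied.
        return any(r.contains(addr, length) for r in self.ranges)


guard = DmaGuardModel()
guard.grant(0x1000_0000, 0x4000)   # one allocated DMA buffer (example values)
```

In this model, a transfer that starts inside a granted window but crosses its
end is denied just like a transfer to an entirely unrelated address.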
|
|
|
|
More details about the DMA Guard are covered by the dedicated article:
|
|
[https://www.hackster.io/johannes-schlatow/taking-control-over-dma-transactions-on-zynq-with-genode-fd60b6 - Taking control over DMA transactions on Zynq with Genode].
|
|
To follow this line of work, keep watching our
|
|
[https://www.hackster.io/genode - hackster.io channel].
|
|
|
|
|
|
Base framework and OS-level infrastructure
|
|
##########################################
|
|
|
|
VFS optimization and simplification
|
|
===================================
|
|
|
|
For regular applications executed on Genode, input and output involves the
|
|
virtual file system (VFS). In contrast to traditional monolithic operating
|
|
systems (which host the VFS in the kernel) or traditional microkernel-based
|
|
operating systems (which host the VFS in a dedicated server component),
|
|
Genode's VFS has the form of a library, giving each component an individual
|
|
virtual file system. The feature set of the VFS library is not fixed
|
|
but extensible by so-called VFS plugins that come in the form of optional
|
|
shared libraries. These plugins can implement new file-system types, but also
|
|
expose other I/O facilities as pseudo files. For example, TCP/IP stacks like
|
|
lwIP and lxIP (IP stack ported from Linux) have the form of VFS plugins.
|
|
The extensibility of the VFS gives us extreme flexibility without compromising
|
|
Genode's simplicity.
|
|
|
|
On the other hand, the pervasiveness of the VFS - being embedded in Genode's C
|
|
runtime - puts it on the performance-critical path whenever application I/O is
|
|
involved. The ever-growing sophistication of application workloads like
|
|
running a Chromium-based web browser on the PinePhone puts merciless pressure
|
|
on the VFS, which motivated the following I/O-throughput optimizations.
|
|
|
|
Even though the VFS and various VFS plugins work asynchronously, the batching
|
|
of I/O operations is not consistently effective across different kernels. It
|
|
particularly depends on the kernel's scheduling decision upon the delivery of
|
|
asynchronous notifications. Kernels that eagerly switch to the signal receiver
|
|
may thereby prevent the batching of consecutive write operations. We could
|
|
observe variances of more than an order of magnitude of TCP throughput,
|
|
depending on the used kernel. In the worst case, when executing a kernel that
|
|
eagerly schedules the recipient of each asynchronous notification, the
|
|
application performance is largely dominated by context-switching costs.
|
|
|
|
Based on these observations, we concluded that the influence of the kernel's
|
|
scheduler should better be mitigated by scheduling asynchronous notifications
|
|
less eagerly at the application level. By waking up a remote peer not before
|
|
the application stalls for I/O, all scheduled operations would appear at the
|
|
remote side as one batch.
|
|
|
|
The implementation of this idea required a slight redesign of the VFS,
|
|
replacing the former implicit wakeup of remote peers by explicit wakeup
|
|
signalling. The wakeup signalling is triggered not before the VFS user settles
|
|
down. E.g., for libc-based applications, this is the case when the libc goes
|
|
idle, waiting for external I/O. In the case of a busy writer to a non-blocking
|
|
file descriptor or socket (e.g., lighttpd), the remote peers are woken up once
|
|
a write operation yields an out-count of 0. The deferring of wakeup signals is
|
|
accommodated by the new 'Remote_io' mechanism (_vfs/remote_io.h_) that is
|
|
designated to be used by all VFS plugins that interact with asynchronous
|
|
Genode services for I/O.
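The effect of deferring wakeups can be illustrated with a toy model. The
following Python sketch is not the 'Remote_io' implementation. The 'Peer'
class, the two 'run_*' functions, and the queue capacity of 16 are assumptions
chosen for illustration. It merely counts how often a remote peer gets woken up
when a writer issues 64 operations.

```python
class Peer:
    """Stand-in for a remote service that processes requests when woken up."""

    def __init__(self):
        self.wakeups = 0
        self.processed = 0

    def wakeup(self, queue):
        # One wakeup handles everything that has been queued so far.
        self.wakeups += 1
        self.processed += len(queue)
        queue.clear()


def run_eager(num_writes, peer):
    """Former behavior in this model: implicit wakeup on every operation."""
    queue = []
    for i in range(num_writes):
        queue.append(i)
        peer.wakeup(queue)


def run_deferred(num_writes, peer, queue_capacity=16):
    """Wake the peer only when the writer cannot make progress anymore."""
    queue = []
    for i in range(num_writes):
        if len(queue) == queue_capacity:
            peer.wakeup(queue)      # writer stalls because the queue is full
        queue.append(i)
    if queue:
        peer.wakeup(queue)          # writer goes idle, waiting for I/O


eager, deferred = Peer(), Peer()
run_eager(64, eager)
run_deferred(64, deferred)
print(eager.wakeups, deferred.wakeups)
```

Both peers process the same 64 operations, but the deferred variant needs far
fewer wakeups. On a kernel that switches to the receiver on each notification,
every saved wakeup corresponds to a saved pair of context switches.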
|
|
|
|
Combined with additional adjustments of I/O buffer sizes - like the request
|
|
queue of the file-system session, the TCP send buffer of the lwIP stack, or
|
|
the packet buffer of the NIC session - the VFS optimization almost eliminated
|
|
the variance of the I/O throughput among the different kernels and generally
|
|
improved the performance. On kernels that suffered most from the eager context
|
|
switching, netperf
|
|
[https://github.com/genodelabs/genode/issues/4697#issuecomment-1342542399 - shows a 10x]
|
|
improvement. But even on kernels with more balanced scheduling, the effect is
|
|
impressive.
|
|
|
|
While we were at it, and since this structural change affected all VFS plugins
|
|
and users anyway, we took the opportunity to simplify and modernize other
|
|
aspects of the VFS-related code as well.
|
|
|
|
In particular, the new interface 'Vfs::Env::User' replaces the former
|
|
'Vfs::Io_response_handler'. In contrast to the 'Io_response_handler', which
|
|
had to be called on a 'Vfs_handle', the new interface does not require any
|
|
specific handle. It is merely meant to prompt the VFS user (like the libc) to
|
|
re-attempt stalled I/O operations but it does not provide any immediate hint
|
|
about which of the handles have become ready for reading/writing. This
|
|
decoupling led to welcome simplifications of asynchronously working VFS
|
|
plugins.
|
|
|
|
Furthermore, we removed the 'file_size' type from read/write interfaces. The
|
|
former C-style pair of (pointer, size) arguments to those operations have been
|
|
replaced by 'Byte_range_ptr' and 'Const_byte_range_ptr' argument types, which
|
|
make the code safer and easier to follow. Also, the VFS utilities offered by
|
|
_os/vfs.h_ benefit from this safety improvement.
|
|
|
|
|
|
GPU performance optimizations
|
|
=============================
|
|
|
|
Session interface changes
|
|
-------------------------
|
|
|
|
The GPU session interface was originally developed along the first version of
|
|
our GPU multiplexer for Intel devices. For this reason, the interface
contained Intel-specific nomenclature, like GTT and PPGTT for memory map and
unmap operations. With the introduction of new GPU drivers with different
architectures (e.g., Mali and Vivante), these Intel specifics had to go
away. With the current Genode release, we streamlined the map and unmap
functions to be semantically more correct on all supported hardware. There are
two map functions now: first, _map_cpu_, which maps GPU graphics memory to be
accessed by the CPU, and second, _map_gpu_, which establishes a mapping of
graphics memory within the GPU.
|
|
|
|
Additionally, we removed the concept of buffers (as used by Mesa and Linux
|
|
drivers) to manage graphics memory and replaced it by the notion of video
|
|
memory (VRAM) where VRAM stands for the actual graphics memory used by a GPU -
|
|
be it dedicated on-card memory or system RAM. The change makes it possible
|
|
to separate the graphics-memory management from the buffer management as
|
|
required by the Mesa library.
|
|
|
|
|
|
Intel graphics
|
|
--------------
|
|
|
|
When porting 3D applications using Mesa's OpenGL, we found that Mesa allocates
|
|
and frees a lot of small GPU buffer objects (data in GPU memory) during
|
|
operation. This is suboptimal for component-based systems because the Mesa
|
|
library has to perform an RPC to the GPU multiplexer for each buffer
|
|
allocation and for each buffer mapping. As mentioned above, we changed the
|
|
session semantics from buffer object to video memory and implemented this
|
|
feature within Intel's GPU multiplexer, which now only hands out VRAM. This
|
|
made it possible to move the buffer handling completely to the Mesa client
|
|
side (libdrm). Libdrm now allocates large chunks of video memory (i.e., 16MB)
|
|
and hands out memory for buffer objects from this pool. This brings two
|
|
advantages: First, the client-side VRAM pool acts as cache, which reduces the
|
|
number of RPCs required for memory management significantly. Second, because
|
|
of the larger VRAM allocations (compared to many 4K or 16K allocations before)
|
|
fewer capabilities for the actual dataspaces that back the memory are
|
|
required. Measurements showed that the number of capabilities needed at the
Mesa (client) side can be reduced by almost an order of magnitude this way.
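The idea of serving many small buffer objects from few large VRAM chunks can
be sketched as a simple bump allocator. The following Python model is an
illustration only: the class 'VramPool' and its interface are hypothetical, it
never frees memory, and it is not the allocator used by the libdrm backend.
Each new chunk stands for one RPC to the GPU multiplexer and one dataspace
capability.

```python
CHUNK_SIZE = 16 * 1024 * 1024      # chunk size as mentioned in the text


class VramPool:
    """Bump allocator over large chunks; each chunk costs one simulated RPC."""

    def __init__(self, chunk_size=CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.chunks = 0             # number of simulated RPCs / dataspaces
        self.offset = chunk_size    # forces a chunk allocation on first use

    def alloc(self, size, align=4096):
        if size > self.chunk_size:
            raise ValueError("request exceeds chunk size")
        # Round the current offset up to the requested alignment.
        start = (self.offset + align - 1) // align * align
        if start + size > self.chunk_size:
            self.chunks += 1        # simulated RPC to the GPU multiplexer
            start = 0
        self.offset = start + size
        return (self.chunks - 1, start)   # (chunk index, offset within chunk)


pool = VramPool()
buffers = [pool.alloc(16 * 1024) for _ in range(1024)]
print(pool.chunks)
```

In this model, 1024 buffer objects of 16 KiB each fit into a single chunk, so
only one simulated RPC is needed instead of one per buffer object.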
|
|
|
|
|
|
Mali graphics
|
|
-------------
|
|
|
|
The 22.08 release introduced a
|
|
[https://genode.org/documentation/release-notes/22.08#GPU_and_Mesa_driver_for_Mali-400 - driver]
|
|
for the GPU found in the PinePhone. Since it was merely a rapid prototype, it
|
|
was limited to one client at a time, and was normally started and stopped
|
|
together with its client. With this release, we remedied these limitations and
|
|
enabled support for multiple concurrent clients and also revised our libdrm
|
|
backend for Mesa's Lima driver.
|
|
|
|
We have not yet explored applying the same VRAM optimizations that are employed
|
|
by our Intel graphics stack. One VRAM allocation still corresponds to one
buffer object.
|
|
|
|
|
|
More flexible ACPI-event handling
|
|
=================================
|
|
|
|
The _acpica_ component uses the Intel ACPICA library to parse and interpret
|
|
ACPI tables and AML code. One designated feature is the monitoring of several
|
|
ACPI event sources including optional reporting of information about state
|
|
changes. The supported event sources are:
|
|
|
|
* Lid, which can be open or closed
|
|
* Smart battery (SB), information about battery parameters (e.g., capacity)
|
|
and charging/discharging status
|
|
* ACPI fixed events, e.g., power buttons
|
|
* AC adapters, which reflect power cable plug/unplug
|
|
* Embedded controller (EC), events like Fn-* keys, Lid, AC, SB changes
|
|
* Vendor-specific hardware events, e.g., Fujitsu FUJ02E3 key events
|
|
|
|
Acpica optionally reports information about state changes. These reports can
|
|
be monitored by other components as ROMs. The following configuration
|
|
illustrates the feature:
|
|
|
|
!<start name="acpi_state">
!  <binary name="report_rom"/>
!  <resource name="RAM" quantum="2M"/>
!  <provides> <service name="ROM" /> <service name="Report" /> </provides>
!  <config>
!    <policy label="acpi_event -> acpi_lid"     report="acpica -> acpi_lid"/>
!    <policy label="acpi_event -> acpi_battery" report="acpica -> acpi_battery"/>
!    <policy label="acpi_event -> acpi_fixed"   report="acpica -> acpi_fixed"/>
!    <policy label="acpi_event -> acpi_ac"      report="acpica -> acpi_ac"/>
!    <policy label="acpi_event -> acpi_ec"      report="acpica -> acpi_ec"/>
!    <policy label="acpi_event -> acpi_hid"     report="acpica -> acpi_hid"/>
!  </config>
!</start>
!
!<start name="acpica">
!  <resource name="RAM" quantum="8M"/>
!  <config report="yes"/>
!  <route>
!    <service name="Report"> <child name="acpi_state"/> </service>
!    ...
!  </route>
!</start>
|
|
|
|
One such ACPI monitor component is _acpi_event_ that maps ACPI events to key
|
|
events of a requested Event session based on its configuration. This way, ACPI
|
|
state changes can be processed like ordinary key press-release events via, for
|
|
example, the _event_filter_. The following configuration illustrates how to
|
|
map the ACPI event types to key events:
|
|
|
|
!<start name="acpi_event">
!  <resource name="RAM" quantum="1M"/>
!  <config>
!    <map acpi="lid"   value="CLOSED"    to_key="KEY_SLEEP"/>
!    <map acpi="fixed" value="0"         to_key="KEY_POWER"/>
!    <map acpi="ac"    value="ONLINE"    to_key="KEY_WAKEUP"/>
!    <map acpi="ec"    value="20"        to_key="KEY_BRIGHTNESSUP"/>
!    <map acpi="ec"    value="21"        to_key="KEY_BRIGHTNESSDOWN"/>
!    <map acpi="hid"   value="0x4000000" to_key="KEY_FN_F4"/>
!  </config>
!  <route>
!    <service name="ROM" label="acpi_lid">     <child name="acpi_state"/> </service>
!    <service name="ROM" label="acpi_battery"> <child name="acpi_state"/> </service>
!    <service name="ROM" label="acpi_fixed">   <child name="acpi_state"/> </service>
!    <service name="ROM" label="acpi_ac">      <child name="acpi_state"/> </service>
!    <service name="ROM" label="acpi_ec">      <child name="acpi_state"/> </service>
!    <service name="ROM" label="acpi_hid">     <child name="acpi_state"/> </service>
!    <service name="Event"> <child name="event_filter" label="acpi"/> </service>
!    ...
!  </route>
!</start>
|
|
|
|
In the current release, we replaced the limited list of supported key names by
|
|
a general mechanism, which supports the use of all key names declared in
|
|
_repos/os/include/input/keycodes.h_.
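Conceptually, the '<map>' nodes form a lookup table from an ACPI event source
and value to a key name. The following Python sketch models only this lookup.
It is not the acpi_event implementation, the helper names 'build_key_map' and
'lookup' are made up for this example, and the handling of the ROM reports and
of the Event session is left out.

```python
import xml.etree.ElementTree as ET

# Subset of the <map> nodes shown in the configuration above
CONFIG = """
<config>
  <map acpi="lid"   value="CLOSED" to_key="KEY_SLEEP"/>
  <map acpi="fixed" value="0"      to_key="KEY_POWER"/>
  <map acpi="ec"    value="20"     to_key="KEY_BRIGHTNESSUP"/>
</config>
"""


def build_key_map(config_xml):
    """Return a dict that maps (acpi source, value) to a key name."""
    root = ET.fromstring(config_xml)
    return {(m.get("acpi"), m.get("value")): m.get("to_key")
            for m in root.iter("map")}


def lookup(key_map, source, value):
    # Unmapped events yield None, i.e., no key event would be generated.
    return key_map.get((source, str(value)))


key_map = build_key_map(CONFIG)
print(lookup(key_map, "lid", "CLOSED"))
```

Because the key name is an arbitrary string in this table, supporting all key
names of the keycodes header does not change the structure of the lookup, only
the translation of the name to a key code.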
|
|
|
|
|
|
Base API changes
|
|
================
|
|
|
|
As part of our continuous effort to streamline and simplify the framework's
base API as much as possible, the current release removes the interfaces
_base/blocking.h_, _base/debug.h_, and _base/lock_guard.h_ as those headers
contained parts of the API that have become obsolete by now. As a further
minor change, the 'abs' function of _util/misc_math.h_ got removed.
|
|
|
|
The string utilities _util/string.h_ received the new 'Const_byte_range_ptr'
|
|
type complementing the existing 'Byte_range_ptr'. Both types are designated
|
|
for passing arguments that refer to a byte buffer, e.g., the source buffer of
|
|
a write operation.
|
|
|
|
|
|
On-target system-update and rollback mechanism
|
|
##############################################
|
|
|
|
For the mobile version of Sculpt OS as covered in
|
|
Section [First system image of mobile Sculpt OS (PinePhone)],
|
|
we envisioned easy-to-use system updates that would enable us to quickly
|
|
iterate based on the feedback of early field testers.
|
|
|
|
This topic confronted us with a variety of concerns. To name just a few:
|
|
conventions for booting that would not require changes in the future,
|
|
equipping (system) images with self-reflecting version information, tools for
|
|
generating and publishing digitally-signed images, on-target discovery of new
|
|
image versions, secure downloading and cryptographic checking of new images,
|
|
directing the machine's boot loader to use the new version, and possibly
|
|
reverting to an earlier version.
|
|
|
|
Fortunately, most of these concerns have a lot in common with the problems
|
|
we had to address for Genode's
|
|
[https://genode.org/documentation/release-notes/18.02#On-target_package_installation_and_deployment - package management].
|
|
For example, the off-target and on-target tooling for digital signatures,
|
|
the notion of a depot, and the concept of federated software providers
|
|
(depot users) are established and time-tested by now.
|
|
|
|
|
|
Self-reflecting version information
|
|
-----------------------------------
|
|
|
|
To allow a running Sculpt system to know its own version, the sculpt.run
|
|
script generates an artificial boot module named "build_info", which can be
|
|
evaluated at runtime by the sculpt-manager component.
|
|
|
|
! <build_info genode_version="22.11-260-g89be3404c0d"
!             date="2023-01-19" depot_user="nfeske" board="pinephone"/>
|
|
|
|
|
|
Formalism for generating images and image metadata
|
|
--------------------------------------------------
|
|
|
|
To enable the Sculpt system to easily detect new versions, system images must
|
|
be accompanied by metadata discoverable at a known location. This information
|
|
is provided by a so-called image-index file located at
|
|
_depot/<user>/image/index_. The image index of a depot user lists the
|
|
available images in XML form, e.g.,
|
|
|
|
! <index>
!   <image os="sculpt" board="pinephone" version="2023-01-19">
!     <info text="initial version"/>
!   </image>
!   ...
! </index>
|
|
|
|
The 'os', 'board', and 'version' attributes can be used to infer the file name
|
|
of the corresponding image file. The '<info>' nodes contain a summary of
|
|
changes as information for the end user.
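As an illustration of how an image name can be inferred from the index, the
following Python sketch parses the example index and joins the three
attributes. The naming scheme '<os>-<board>-<version>' is an assumption that
matches image names like "sculpt-pinephone-2023-02-02". The function
'image_names' is hypothetical and not part of any Genode tool.

```python
import xml.etree.ElementTree as ET

INDEX = """
<index>
  <image os="sculpt" board="pinephone" version="2023-01-19">
    <info text="initial version"/>
  </image>
</index>
"""


def image_names(index_xml):
    """Yield (name, list of info texts) for each image listed in the index."""
    for image in ET.fromstring(index_xml).iter("image"):
        # Assumed naming scheme: <os>-<board>-<version>
        name = "-".join(image.get(a) for a in ("os", "board", "version"))
        yield name, [info.get("text") for info in image.iter("info")]


for name, infos in image_names(INDEX):
    print(name, infos)
```

A system updater could compare the names obtained this way against the
image directories already present in the local depot to find out which
versions still need to be downloaded.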
|
|
|
|
The new _gems/run/sculpt_image.run_ script provides assistance with generating
|
|
appropriately named images, placing them into the depot, and presenting a
|
|
template for the manually curated image index.
|
|
|
|
|
|
Signing and publishing
|
|
----------------------
|
|
|
|
For signing and publishing system images and image indices, we extended the
|
|
existing _tool/depot/publish_ tool. To publish a new version of an image
|
|
index:
|
|
|
|
! ./tool/depot/publish <depot-user>/image/index
|
|
|
|
Each system image comes in two forms, a bootable disk image and an archive of
|
|
the boot directory. The bootable disk image can be used to install a new
|
|
system from scratch by copying the image directly to a block device. It
|
|
contains raw block data. The archive of the boot directory contains the
|
|
content needed for an on-target system update to this version. Within the
|
|
depot, this archive has the form of a directory - named after the image - that
|
|
contains the designated content of the boot directory on target. Depending on
|
|
the board, it may contain only a single file loaded by the boot loader (e.g.,
|
|
uImage), or several boot modules, or even the boot-loader configuration. The
|
|
following command publishes both forms:
|
|
|
|
! ./tool/depot/publish <depot-user>/image/<image-name>
|
|
|
|
This results in the following files - accompanied by their respective .sig
files - in the public directory:
|
|
|
|
! <depot-user>/image/<image-name>.img.xz  (disk image)
! <depot-user>/image/<image-name>.tar.xz  (boot archive)
! <depot-user>/image/<image-name>.zip     (disk image)
|
|
|
|
The .zip file contains the .img file. It is provided for users who download
|
|
the image on a system with no support for .xz.
|
|
|
|
|
|
On-target image discovery, download, and verification
|
|
-----------------------------------------------------
|
|
|
|
To enable a running Sculpt system to fetch image index files and images, the
|
|
existing depot-download component accepts the following two new download
|
|
types:
|
|
|
|
! <image_index path="<user>/image/index"/>
|
|
! <image path="<user>/image/<name>"/>
|
|
|
|
Internally, the depot-download subsystem employs the depot-query component to
|
|
determine the missing depot content. This component accepts the following two
|
|
new queries:
|
|
|
|
! <images user="..."/>
|
|
! <image_index user="..."/>
|
|
|
|
If present in the query, depot_query generates reports labeled as "images" and
|
|
"image_index" respectively. These reports are picked up by the depot-download
|
|
component to track the completion of each job. The reported information is
|
|
also used by the system updater to get hold of the images that are ready to
|
|
install.
|
|
|
|
|
|
On-target image installation and rollback
|
|
-----------------------------------------
|
|
|
|
Once downloaded into the local depot of a Sculpt system, the content of the
|
|
boot directory for a given image version is readily available, e.g.,
|
|
|
|
! depot/nfeske/image/sculpt-pinephone-2023-02-02/uImage
|
|
|
|
The installation comes down to copying this content to the _/boot/_ directory.
|
|
On the next reboot, the new image is executed.
|
|
|
|
When subsequently downloading new image versions, the old versions stay
|
|
available in the depot as sibling directories. This allows for an easy
|
|
rollback by copying the boot content of an old version to the _/boot/_
|
|
directory.
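Since installation and rollback are both plain copy operations, they can be
expressed by the same routine. The following Python sketch models this on a
regular file system. It is not how the Sculpt manager implements the update.
The function names and the directory layout passed as arguments are
assumptions for illustration.

```python
import shutil
from pathlib import Path


def install_image(depot, user, image_name, boot_dir):
    """Copy the boot content of a downloaded image into the boot directory."""
    src = Path(depot) / user / "image" / image_name
    if not src.is_dir():
        raise FileNotFoundError(f"image {image_name} not present in depot")
    # dirs_exist_ok lets the files of the current version be overwritten.
    shutil.copytree(src, boot_dir, dirs_exist_ok=True)


def available_versions(depot, user):
    """List the sibling image directories, i.e., the rollback candidates."""
    image_dir = Path(depot) / user / "image"
    return sorted(p.name for p in image_dir.iterdir() if p.is_dir())
```

A rollback is then nothing more than calling 'install_image' with the name of
an older entry returned by 'available_versions'.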
|
|
|
|
|
|
Device drivers
|
|
##############
|
|
|
|
NXP i.MX Ethernet & USB
|
|
=======================
|
|
|
|
The Ethernet driver for i.MX53, i.MX6, and i.MX7 got updated to use a more
|
|
recent Linux kernel version (5.11). These drivers got aligned with the
|
|
source-code base originally ported for the i.MX8 SoC.
|
|
|
|
With the recent approach to porting Linux device drivers, which tries to
preserve the original semantics, it is necessary to provide the correct clock
rates to the driver. Therefore, specific platform drivers for i.MX6 and i.MX7
were created that enable the network-related clocks and export their rate values.
The i.MX53-related platform driver got extended to support these clocks.
|
|
|
|
The USB host-controller driver for the i.MX 8MQ EVK is now able to drive the
|
|
USB-C connector of this board too.
|
|
|
|
|
|
Realtek Wifi
|
|
============
|
|
|
|
As a welcome side effect of switching to the new DDE-Linux approach,
enabling other drivers that are part of the same subsystem has become less
involved. In the past, we mostly focused on getting wireless devices supported
by the iwlwifi driver to work as those are the devices predominantly found in
commodity laptops. That being said, every now and then, one comes across a
different vendor, and especially with the shifting focus on ARM-based systems,
covering those as well became necessary.
|
|
|
|
As a first experiment, we enabled the rtlwifi driver that provides support
|
|
for Realtek-based wireless devices. Due to lacking access to other hardware,
|
|
the driver has so far been tested only with a specific RTL8188EE-based device
|
|
(10ec:8179 rev 01). Of course, some trade-offs were made as power-management
|
|
is currently not available. But getting it to work, nevertheless, took barely
|
|
half a day of work, which is promising.
|
|
|
|
|
|
Platforms
|
|
#########
|
|
|
|
Base-HW microkernel
|
|
===================
|
|
|
|
Cache-maintenance optimization
|
|
------------------------------
|
|
|
|
On ARM systems, the memory view on instructions and data of the CPUs, as well
|
|
as between CPUs and other devices is not necessarily consistent. When dealing
|
|
with DMA transfers of devices, developers of related drivers need to ensure
|
|
that corresponding cache lines are cleaned before a DMA transfer gets
|
|
acknowledged. When dealing with just-in-time compilation, where instructions
|
|
are generated on demand, the data and instruction caches have to be aligned
|
|
too.
|
|
|
|
Until now, the base-API functions for such cache-maintenance operations were
|
|
mapped to kernel system calls specific to base-hw. Only the kernel was allowed
|
|
to execute cache maintenance related instructions. On ARMv8 however, it is
|
|
possible to allow unprivileged components to execute most of these
|
|
instructions.
|
|
|
|
With this release, we have implemented the cache-maintenance functions outside
the kernel on ARMv8 where possible. Several device drivers with a lot
of DMA transactions, e.g., the GPU driver, benefit from this optimization
enormously. The JavaScript engine used in the Morph and Falkon browsers
profits as well.
|
|
|
|
|
|
ACPI suspend & resume
---------------------

In the previous release, we started to support the low-level
[https://genode.org/documentation/release-notes/22.11#Low-level_mechanism_for_suspend_resume_on_PC_platforms - ACPI suspend and resume]
mechanism with Genode for the NOVA kernel. With the current release, we added
the required low-level support to Genode's base-hw kernel for 64-bit x86
platforms. Similar to the base-nova version, on base-hw the
'Pd::managing_system' RPC function of Genode's core roottask is used to
transfer the required ACPI values representing the S3 sleep state to the
kernel. The kernel then takes care to halt all CPUs and flush its state to
memory, before finally suspending the PC using the ACPI mechanism. On resume,
the kernel re-initializes the hardware it uses, e.g., all CPUs, the interrupt
controller, the timer device, and the serial device. One can test-drive the
new feature using the _run/acpi_suspend_ scenario introduced by the previous
release.
|
|
|
|
|
|
Scheduling improvements for interactive workloads
-------------------------------------------------

As Genode conquers the PinePhone, the base-hw kernel, for the first time, has
to perform real-life multimedia on a daily basis given a resource-limited
mobile target. One particularly important and ambitious use case has become
video conferencing in the Morph browser. It combines an already demanding
browser engine with an application that not only streams video and audio in
both directions over the network but also handles video and audio I/O at the
device, all of it fluently and at the same time.

A lot of thinking went into how to optimize this scenario on each level of
abstraction, and one rather low-level lever was the scheduling scheme of the
base-hw kernel. The base-hw scheduling scheme combines absolute priority bands
with execution-time quotas that prevent higher-prioritized subjects from
starving lower ones. There is the notion of a super period, and each subject
owns only a fraction of that super period as quota together with its priority.
Once a subject has depleted its quota, it can't use its priority until the end
of the current super period, when its quota gets re-filled. However, during
that time, the subject is not blocked. It can become active whenever there is
no subject with priority and remaining quota present.

So, this "zero" band below all the priority bands temporarily accommodates all
subjects that have a priority but are out of quota. However, it also contains
subjects that have no priority at all. These might be tasks like a GCC
compilation or a ray tracer, whereas prioritized tasks would be user-input
handlers or the display driver. Now, one difficult problem that arises with
this scheduling scheme is that the system integrator has to decide how much
quota a prioritized task requires. The perfect value can't be determined as it
depends on many factors including the target platform. Therefore, we have to
expect that an important task like the audio driver in the video-conference
scenario may run out of quota shortly before finishing its work.

This is bad enough on its own because the audio driver now has to share the
CPU with many unimportant tasks until the next super period. But it became
even worse because, in the past implementation, subjects always entered the
zero band at the tail position. This meant that, e.g., the remaining audio
handling had to wait at least until all the unprioritized tasks (e.g.,
long-running computations) had used up their zero-band time slice. To mitigate
this situation, we decided that prioritized tasks, upon depleting their quota,
enter the zero band at the head position so that they are scheduled first
whenever the higher bands become idle.

This change relaxes the consequences of quota-depletion events for
time-critical tasks in a typical system with many unprioritized tasks.
At the same time, it should not have a significant impact on the overall
schedule because depletion events are rare and zero-band time slices short.
|
|
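The effect of head versus tail insertion can be illustrated with a small
Python model. It is merely a sketch under simplifying assumptions and not the
kernel's implementation: it models only the zero band as a queue, ignores
quotas, super periods, and time-slice lengths, and all names
('enter_zero_band', 'zero_band_order', the subject labels) are made up for
illustration.

```python
from collections import deque

def enter_zero_band(zero_band, subject, prioritized, head_insertion):
    """Queue a subject in the zero band.

    A prioritized subject that has just depleted its quota goes to the head
    if head_insertion is enabled (new behavior). In all other cases, the
    subject is appended at the tail (old behavior).
    """
    if prioritized and head_insertion:
        zero_band.appendleft(subject)
    else:
        zero_band.append(subject)

def zero_band_order(background, depleted, head_insertion):
    """Return the order in which the zero-band subjects would receive the
    CPU once all higher priority bands are idle."""
    zero_band = deque(background)  # unprioritized tasks already waiting
    enter_zero_band(zero_band, depleted, prioritized=True,
                    head_insertion=head_insertion)
    return list(zero_band)

background_tasks = ["gcc", "raytracer"]
old_order = zero_band_order(background_tasks, "audio_drv", head_insertion=False)
new_order = zero_band_order(background_tasks, "audio_drv", head_insertion=True)
print("tail insertion:", old_order)
print("head insertion:", new_order)
```

In this model, the audio driver waits behind every background task with tail
insertion, whereas head insertion lets it continue as soon as the higher bands
become idle.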
|
|
|
|
NOVA microhypervisor
====================

ACPI suspend & resume
---------------------

As an extension to the principal
[https://genode.org/documentation/release-notes/22.11#Low-level_mechanism_for_suspend_resume_on_PC_platforms - ACPI suspend and resume]
support introduced with the Genode 22.11 release, the NOVA kernel now also
supports the re-enablement of the IOMMU after ACPI resume. Genode has supported
the IOMMU as a hardware feature since
[https://genode.org/documentation/release-notes/13.02#DMA_protection_via_IOMMU - release 13.02]
and extended this support in
[https://genode.org/documentation/release-notes/20.11#NOVA_microhypervisor - release 20.11].
The IOMMU sandboxes device hardware and (malicious or faulty) drivers to
prevent arbitrary DMA transactions.
|
|
|
|
Intel P/E cores
---------------

Starting with [https://en.wikipedia.org/wiki/Intel_Core#12th_generation - Intel CPU generation 12],
Intel introduced CPUs with heterogeneous cores, similar to
[https://en.wikipedia.org/wiki/ARM_big.LITTLE - ARM's big/LITTLE] concept.
The new CPUs have a number of so-called P-cores (performance) and E-cores
(efficient), which differ in their performance and power characteristics.
The CPU cores are
([https://en.wikipedia.org/wiki/Alder_Lake#CPUID_incoherence - should be])
instruction compatible and are nowadays reported as identical via x86's CPUID
instruction. However, an operating system such as Genode must be able
to differentiate the cores in order to make informed decisions about the
placement and scheduling of Genode components.

With the current release, we added support to the NOVA kernel to propagate the
information about P/E cores to Genode's 'core' roottask. In Genode's core,
this information is used to group the CPU cores into Genode's
[https://genode.org/documentation/release-notes/13.08#Management_of_CPU_affinities - affinity space].
With
[https://genode.org/documentation/release-notes/20.05#NOVA_microhypervisor - release 20.05],
we introduced the grouping of hyperthreads on the y-axis, which we keep if
the P-cores have hyperthreading enabled. Following the P-cores and
hyperthreads, all remaining E-cores are placed in the affinity space.
|
|
|
|
The following examples showcase the grouping in the affinity space on the x/y
axes:

Core i7 1270P - 4 P-cores (hyperthreading enabled) and 8 E-cores:

! x-axis     1  2  3  4  5  6  7  8
! ----------------------------------
! y-axis 1 | P\ P\ P\ P\ E  E  E  E
!        2 | P/ P/ P/ P/ E  E  E  E
!
!        hyperthreads \ / of same core

Core i7 1280P - 6 P-cores (hyperthreading enabled) and 8 E-cores:

! x-axis     1  2  3  4  5  6  7  8  9  10
! -----------------------------------------
! y-axis 1 | P\ P\ P\ P\ P\ P\ E  E  E  E
!        2 | P/ P/ P/ P/ P/ P/ E  E  E  E
!
!        hyperthreads \ / of same core
|
|
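The grouping shown in the two examples can be reproduced by a few lines of
Python. This is a sketch for illustration only, not the code of Genode's core.
The function name 'affinity_layout' is hypothetical, coordinates are counted
from 0, and two details are assumptions: that the E-cores fill each column
from top to bottom, and that the affinity space is one row high when
hyperthreading is disabled.

```python
def affinity_layout(p_cores, e_cores, hyperthreading=True):
    """Return a list of (x, y, cpu_type) tuples, one per logical CPU.

    P-cores come first with one column per core and their hyperthreads
    stacked on the y-axis. The E-cores follow in the remaining columns.
    """
    height = 2 if hyperthreading else 1  # assumption: 1 row without HT
    layout = []
    for core in range(p_cores):
        for thread in range(height):
            layout.append((core, thread, "P"))
    for i in range(e_cores):
        # assumption: E-cores fill a column top to bottom before moving on
        layout.append((p_cores + i // height, i % height, "E"))
    return layout

def affinity_width(layout):
    """Number of columns occupied on the x-axis."""
    return max(x for x, _, _ in layout) + 1

i7_1270p = affinity_layout(p_cores=4, e_cores=8)
i7_1280p = affinity_layout(p_cores=6, e_cores=8)
print(affinity_width(i7_1270p), affinity_width(i7_1280p))
```

Under these assumptions, the 1270P occupies 8 columns and the 1280P 10
columns, in line with the diagrams.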
|
|
The information about the P/E cores is visible in the kernel and Genode's
log output and is reported in the 'platform_info' ROM, e.g.

! kernel:
!
! [ 0] CORE:00:00:0 6:9a:3:7 [415] P 12th Gen Intel(R) Core(TM) i7-1270P
! ...
! [15] CORE:00:17:0 6:9a:3:7 [415] E 12th Gen Intel(R) Core(TM) i7-1270P
! ...

! Genode's core:
!
! mapping: affinity space -> kernel cpu id - package:core:thread
! remap (0x0) ->  0 - 0: 0:0 P boot cpu
! remap (0x1) ->  1 - 0: 0:1 P
! remap (1x0) ->  2 - 0: 4:0 P
! remap (1x1) ->  3 - 0: 4:1 P
! remap (2x0) ->  4 - 0: 8:0 P
! remap (2x1) ->  5 - 0: 8:1 P
! remap (3x0) ->  6 - 0:12:0 P
! remap (3x1) ->  7 - 0:12:1 P
! remap (4x0) ->  8 - 0:16:0 E
! remap (4x1) ->  9 - 0:17:0 E
! remap (5x0) -> 10 - 0:18:0 E
! remap (5x1) -> 11 - 0:19:0 E
! remap (6x0) -> 12 - 0:20:0 E
! remap (6x1) -> 13 - 0:21:0 E
! remap (7x0) -> 14 - 0:22:0 E
! remap (7x1) -> 15 - 0:23:0 E
! ...

! platform_info ROM:
!
! ...
! <cpus>
!   <cpu xpos="0" ypos="0" cpu_type="P" .../>
!   ...
!   <cpu xpos="5" ypos="0" cpu_type="E" .../>
!   ...
! </cpus>
! ...
|
|
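A tool that has obtained a copy of the 'platform_info' ROM content could
evaluate the 'cpu_type' attribute as sketched below. The attribute names
'xpos', 'ypos', and 'cpu_type' are taken from the excerpt above. The enclosing
'platform_info' node and the sample data are assumptions made for the sake of a
self-contained example, and the function name 'count_cpu_types' is
hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def count_cpu_types(platform_info_xml):
    """Count the <cpu> nodes per value of their 'cpu_type' attribute."""
    root = ET.fromstring(platform_info_xml.strip())
    return Counter(cpu.get("cpu_type") for cpu in root.iter("cpu"))

# Made-up sample, reduced to the attributes shown in the release notes.
SAMPLE = """
<platform_info>
  <cpus>
    <cpu xpos="0" ypos="0" cpu_type="P"/>
    <cpu xpos="0" ypos="1" cpu_type="P"/>
    <cpu xpos="4" ypos="0" cpu_type="E"/>
  </cpus>
</platform_info>
"""

counts = count_cpu_types(SAMPLE)
print(dict(counts))
```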
|
|
|
|
Build system and tools
######################

Building and packaging CMake-based shared libraries (via Goa)
=============================================================

The [https://github.com/nfeske/goa - Goa] tool streamlines the work of
cross-developing, testing, and publishing Genode application software
using commodity build tools like CMake. The tool is particularly suited for
porting existing 3rd-party software to Sculpt OS.

Until recently, Goa was solely focused on applications whereas the porting of
3rd-party libraries required the traditional approach of hand-crafting build
rules for Genode's build system. This limitation of Goa has now been lifted.

In the new version, a Goa project can host an _api_ file indicating that
the project is a library project. The file contains the list of headers that
comprise the library's public interface. The build artifact of a library
is declared in the _artifacts_ file and is expected to have the form
_<library-name>.lib.so_. The ABI symbols of such a library must be listed
in the file _symbols/<library-name>_. With these bits of information supplied
to Goa, the tool is able to build and publish both the library and the API as
depot archives - ready to use by Genode applications linking to the library.
How all those little pieces work together is best illustrated by the
accompanying
[https://github.com/nfeske/goa/tree/master/examples/cmake_library - example].
For further details, please consult Goa's built-in documentation via 'goa help'
(overview of Goa's sub commands and files) and 'goa help api' (specifics of
the _api_ declaration file).

When porting a library to Genode, one manual step remains, which is the
declaration of the ABI symbols exported by the library. The new sub command
'goa extract-abi-symbols' eases this manual step. It automatically generates a
template for the _symbols/<library-name>_ file from the library's built shared
object. Note, however, that the generated symbols file is expected to be
manually reviewed and tidied up, e.g., by removing library-internal symbols.

_Thanks to Pirmin Duss for having contributed this welcome new feature, which_
_makes Goa much more versatile!_
|
|
|
|
|
|
New tool for querying metadata of ports
=======================================

The integration of third-party software into Genode is implemented via _ports_
that specify how to retrieve, verify, and patch the source code in preparation
for use with our build system. Ports are managed by tools residing in the
_tool/ports_ directory. For example, _tool/ports/prepare_port_ is used to
execute all required preparation steps.

Currently, the base Genode sources support 90 ports (you may try
_tool/ports/list_ yourself) and, thus, it's not trivial to keep track of all
the ports in the repo directories. Therefore, we introduce the
_tool/ports/metadata_ tool to extract information about license, upstream
version, and source URLs of individual ports. The tool can be used as follows:

!./tool/ports/metadata virtualbox6
!
!PORT:    virtualbox6
!LICENSE: GPLv2
!VERSION: 6.1.26
!SOURCE:  http://download.virtualbox.org/virtualbox/6.1.26/VirtualBox-6.1.26.tar.bz2 (virtualbox)
!SOURCE:  http://download.virtualbox.org/virtualbox/6.1.26/VirtualBoxSDK-6.1.26-145957.zip (virtualbox_sdk)
|
|
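Because the tool prints plain 'KEY: value' lines, its output lends itself to
further processing by scripts. The following Python sketch parses output of
the form shown above. It assumes that the format stays as printed in the
example and that only the SOURCE key may occur repeatedly. The function name
'parse_port_metadata' is hypothetical, and the sample text is copied from the
example instead of being obtained by invoking the tool.

```python
def parse_port_metadata(text):
    """Parse 'KEY: value' lines into a dictionary.

    SOURCE entries are collected in a list because a port may refer to
    several downloads.
    """
    info = {"SOURCE": []}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        if not separator:
            continue  # skip blank lines and lines without a key
        key, value = key.strip(), value.strip()
        if key == "SOURCE":
            info["SOURCE"].append(value)
        else:
            info[key] = value
    return info

SAMPLE_OUTPUT = """
PORT: virtualbox6
LICENSE: GPLv2
VERSION: 6.1.26
SOURCE: http://download.virtualbox.org/virtualbox/6.1.26/VirtualBox-6.1.26.tar.bz2 (virtualbox)
SOURCE: http://download.virtualbox.org/virtualbox/6.1.26/VirtualBoxSDK-6.1.26-145957.zip (virtualbox_sdk)
"""

metadata = parse_port_metadata(SAMPLE_OUTPUT)
print(metadata["PORT"], metadata["LICENSE"], metadata["VERSION"])
```

Splitting at the first colon only keeps the colons inside the URLs intact.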
|
|
|
|
Harmonization of the boot concepts across ARM and PC platforms
==============================================================

To make the system-update functionality covered in
Section [On-target system-update and rollback mechanism] equally usable across
PC and ARM platforms, the conventions of booting the platforms had to be
unified.

Traditionally, a bootable disk image for the PC contains a _boot/_ directory.
E.g., when using NOVA, it contains the GRUB boot-loader config + the hypervisor +
the bender pre-boot loader + the banner image + the Genode system image.
This structure corresponds 1:1 to the _boot/_ directory as found on the 3rd
partition of the Sculpt system, which is very nice. A manual system update of
Sculpt comes down to replacing these files. However, on ARM platforms, SD-card
images used to host a _uImage_ file and a U-Boot environment configuration
file in the root directory. These differences complicated both the build-time
tooling and the on-target handling of system updates.

The current release unifies the boot convention by hosting a _boot/_ directory
on all platforms and reinforces the consistent naming of files. On ARM, the
_uImage_ and _uboot.env_ files now always reside under _boot/_. Thanks to this
uniform convention, Genode's new system-update mechanism can expect on all
platforms alike that a system update corresponds to the mere replacement of
the content of the _boot/_ directory.
|
|
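The notion of an update as "mere replacement of the content of the _boot/_
directory" can be sketched in a few lines of Python. This is not Genode's
system-update mechanism but a plain host-side illustration. The function name
'replace_boot_content', the scratch directories, and the placeholder file
contents are made up, and the sketch is neither atomic nor does it offer any
rollback.

```python
import shutil
import tempfile
from pathlib import Path

def replace_boot_content(boot_dir, update_dir):
    """Replace everything inside boot_dir with a copy of update_dir's content."""
    boot = Path(boot_dir)
    boot.mkdir(parents=True, exist_ok=True)
    # remove the old boot files and sub directories
    for entry in boot.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    # copy the new content in place (needs Python 3.8 or newer)
    shutil.copytree(update_dir, boot, dirs_exist_ok=True)

# Demonstration in a scratch directory with placeholder files
with tempfile.TemporaryDirectory() as scratch:
    boot = Path(scratch, "boot")
    update = Path(scratch, "update")
    boot.mkdir()
    update.mkdir()
    (boot / "uImage").write_text("old image")
    (boot / "stale.file").write_text("left over")
    (update / "uImage").write_text("new image")
    (update / "uboot.env").write_text("new environment")

    replace_boot_content(boot, update)
    boot_files = sorted(entry.name for entry in boot.iterdir())
    image_content = (boot / "uImage").read_text()

print(boot_files, image_content)
```

Since the same directory layout is used on ARM and PC, such a routine would
not need any platform-specific case distinction.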
|
|
|
|
Minor run-tool changes
======================

The functionality of the _image/uboot_fit_ plugin has been integrated into the
regular _image/uboot_ plugin as both plugins were quite similar.
FIT images can now be produced by adding the run option '--image-uboot-fit'.
|
|
|