===============================================
Release notes for the Genode OS Framework 15.05
===============================================
Genode Labs
Version 15.05 represents the most substantial release in the history of Genode.
It is packed with profound architectural improvements, new device drivers, the
extension of the supported base platforms, and brand-new documentation.
With the new documentation introduced in Section [Comprehensive architectural documentation],
the project reaches a milestone. On our mission to find the right
architectural abstractions, the past years had a strong research focus. We
conducted countless experiments, gathered experience with highly diverse
hardware platforms and kernels, and explored application scenarios. Our target
audience used to be technology enthusiasts. Now that we have reached a point
where the architecture is mature, it is time to invite a wider audience,
in particular people who are interested in building Genode-based solutions.
The new book "Genode Foundations" equips the reader with the holistic view and
the technological insights needed to get started.
Genode's custom kernel platform, originally conceived as a research vehicle,
has become feature complete. As explained in Section
[Feature completion of our custom kernel (base-hw)], the release contains
three substantial additions. First, with the added support for the 64-bit x86
architecture, the kernel moves beyond the realms of the ARM architecture. This
line of work is particularly exciting because it was conducted outside of
Genode Labs, by the developers of the Muen separation kernel. The second
addition introduces kernel-protected capabilities to the base-hw kernel. This
was the last missing functionality that stood in the way of using the kernel
in security-critical scenarios. Finally, the kernel's scheduler received the
ability to handle thread weights in a dynamic fashion.
By revising the framework's device-driver infrastructure as described in
Section [Revised device-driver infrastructure], this release addresses
long-standing architectural limitations with respect to the effective
confinement of device drivers. This topic encompasses changes in the NOVA
kernel, a redesign of the fundamental interfaces for user-level device
drivers, the design and implementation of a new platform driver, and the
adaptation of the drivers. Speaking of device drivers, version 15.05 comes
with a new AHCI driver, new audio drivers ported from OpenBSD, new SD-card
drivers for the Raspberry Pi and i.MX53, platform support for i.MX6, and
multi-touch support.
The icing on the cake is the added support for the seL4 kernel as Genode base
platform. Section [Proof-of-concept support for the seL4 kernel] covers this
undertaking. Even though this work is still in its infancy, we are happy to
present the first simple Genode scenarios running on this kernel.
Comprehensive architectural documentation
#########################################
The popularity of Genode is slowly but steadily growing. Still, for most
uninitiated who stumble upon it, the project remains largely intangible
because it does not fit well in the established categories of software. With
the current release, we hope to change that. The release is accompanied by
documentation in the form of the book "Genode OS Framework Foundations",
written completely from scratch:
[image genode_foundations_cover]
The book is published under the Creative Commons Attribution + ShareAlike
License (CC-BY-SA) and can be downloaded as
[https://genode.org/documentation/genode-foundations-15-05.pdf - PDF document].
It first presents the motivation behind our project, followed by a thorough
description of the Genode OS architecture. The conceptual material is
complemented with practical information for developers and a discussion of
framework internals. The second part of the book serves as a reference of
Genode's programming interfaces.
[https://genode.org/documentation/genode-foundations-15-05.pdf - Download the book (PDF)...]
In the upcoming weeks, we plan to update the documentation section of the
genode.org website with the new material. Until then, we hope you find the
book enjoyable.
Feature completion of our custom kernel (base-hw)
#################################################
Kernel-protected capabilities
=============================
Capabilities are one of the fundamental concepts used within Genode. Although
this security mechanism was present in the Genode API from the very beginning,
our base-hw kernel could not guarantee the integrity of capabilities so far.
On top of this kernel, capabilities used to be represented as global IDs that
could easily be forged.
With this release, we introduce a major change of base-hw, which now supports
capability ID spaces per component. That means every component, or more
precisely every protection domain, has its own local name space for kernel
objects. When a component invokes a capability to access an RPC object, it
provides the corresponding capability ID to the kernel's system call. The
kernel maintains a tree of capability IDs per protection domain and can
determine whether the provided ID is valid and which kernel object it points
to. As all kernel
objects are constructed on behalf of the core process first, this component
always owns the initial capability during the lifetime of a kernel object.
Other components can obtain capabilities via remote-procedure calls (RPC)
only. Whenever a capability is part of a message transfer between threads,
the kernel translates the capability IDs within the message buffer from one
protection domain's capability space to another. If the target protection
domain does not own the capability during the transfer already, the kernel
creates a new capability ID for the receiving protection domain.
In contrast to other capability-based kernels that Genode supports, the
base-hw kernel manages the capability space on behalf of the components.
Nevertheless, the kernel does not know whether a component is still using a
capability ID once the kernel object behind it has been invalidated.
Components therefore have to inform the kernel when a capability ID is no
longer used so that it can be reused. For this purpose, we introduce the new
system call 'delete_cap', which frees a capability ID from the local
protection domain.
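The following C++ fragment is a minimal sketch of the scheme described above, not the base-hw implementation. The names 'Pd', 'id_for', and 'translate' are hypothetical, kernel objects are reduced to plain integers, and the per-domain tree is modelled by a 'std::map'. It illustrates that an ID is only meaningful within one protection domain and that a transfer either reuses the receiver's existing ID or creates a new one.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using Cap_id = std::uint32_t;
using Object = int;   /* stands in for a kernel object (assumption) */

/* hypothetical model of one protection domain's capability space */
struct Pd
{
	std::map<Cap_id, Object> ids;   /* per-domain tree of capability IDs */
	Cap_id next_id = 1;

	/* report whether 'id' is valid and which object it refers to */
	bool lookup(Cap_id id, Object &obj) const
	{
		auto it = ids.find(id);
		if (it == ids.end()) return false;
		obj = it->second;
		return true;
	}

	/* return the local ID for 'obj', creating one if none exists yet */
	Cap_id id_for(Object obj)
	{
		for (auto const &entry : ids)
			if (entry.second == obj) return entry.first;
		Cap_id const id = next_id++;
		ids[id] = obj;
		return id;
	}

	/* models 'delete_cap': drop the ID from the local space */
	void delete_cap(Cap_id id) { ids.erase(id); }
};

/* translate an ID found in a message from the sender's to the receiver's space */
inline bool translate(Pd const &from, Pd &to, Cap_id id, Cap_id &out)
{
	Object obj;
	if (!from.lookup(id, obj)) return false;   /* forged or stale ID */
	out = to.id_for(obj);
	assert(to.ids.count(out) == 1);
	return true;
}
```

An ID that the sender does not own is rejected before anything reaches the receiver, which is the property the former global IDs could not provide.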
To allocate entries in the capability space of components, the kernel needs
memory. The required memory is taken from the RAM quota a component provides
to its protection-domain session. If the kernel determines that the quota does
not fulfill the requirements when a component wants to receive capabilities,
the corresponding system call returns an error before the actual IPC
operation takes place. The component first has to upgrade the RAM quota before
it can retry its IPC operation. This error handling is transparent to the
developer because it is already implemented in the base library for the
base-hw kernel.
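The upgrade-and-retry procedure can be pictured with the following sketch. It is a simplified model under stated assumptions: the type 'Pd_session', the functions 'kernel_ipc' and 'ipc_with_retry', and the cost of 64 bytes per capability-space entry are all hypothetical and do not reflect the actual base-library code.

```cpp
#include <cassert>
#include <cstddef>

enum class Ipc_result { OK, OUT_OF_QUOTA };

/* hypothetical view of the quota a component donated to its PD session */
struct Pd_session
{
	std::size_t quota;   /* donated RAM quota in bytes */
	std::size_t used;    /* bytes consumed by capability-space entries */
};

constexpr std::size_t CAP_SLOT_COST = 64;   /* assumed bytes per entry */

/* kernel side: check the quota before any message is transferred */
inline Ipc_result kernel_ipc(Pd_session &pd, std::size_t caps_to_receive)
{
	std::size_t const need = caps_to_receive * CAP_SLOT_COST;
	if (pd.quota - pd.used < need)
		return Ipc_result::OUT_OF_QUOTA;
	pd.used += need;
	return Ipc_result::OK;
}

/* library side: upgrade the quota and retry until the IPC succeeds */
inline unsigned ipc_with_retry(Pd_session &pd, std::size_t caps,
                               std::size_t upgrade_step)
{
	assert(upgrade_step > 0);
	unsigned upgrades = 0;
	while (kernel_ipc(pd, caps) == Ipc_result::OUT_OF_QUOTA) {
		pd.quota += upgrade_step;   /* stands in for a quota-upgrade request */
		++upgrades;
	}
	return upgrades;
}
```

Because the check happens before the transfer, a failed attempt leaves no partial state behind, and the caller of 'ipc_with_retry' never observes the error.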
Principal support for the 64-bit x86 architecture
=================================================
_This section was written by Adrian-Ken Rueegsegger and Reto Buerki who_
_conducted the described line of work independently of Genode Labs._
The [https://muen.sk - Muen Separation Kernel (SK) project] is an Open-Source
microkernel, which uses the [https://spark-2014.org/ - SPARK] programming
language to enable light-weight formal methods for high assurance. The 64-bit
x86 kernel, currently consisting of a little over 5'000 LOC, makes extensive
use of the latest Intel virtualization features and has been formally proven
to contain no runtime errors at the source-code level.
As the core team of the Muen SK, we were intrigued by the idea of bringing
Genode to our kernel. In our view, combining Genode with the Muen project
makes perfect sense as it would allow us to leverage the entire OS framework
instead of re-inventing the wheel by implementing yet another user land.
To this end, we met the Genode team in their very cosy office in Dresden.
After a tour of the premises, we got right down to business: Norman gave us a
whirlwind tour of Genode and it was quickly decided that the way forward would
be to run base-hw as a subject on top of Muen. As an intermediate step, we
needed to port base-hw from ARM to Intel x86_64 first.
The Genode team gave us a head start by setting a roadmap and doing the
initial steps of extending the 'create_builddir' tool and adding the
'hw_x86_64' skeleton in a joint coding session. After this productive
workshop, we flew back to Switzerland with a clear picture of how to proceed.
Implementation
~~~~~~~~~~~~~~
We closely followed the roadmap for porting the base-hw kernel to the 64-bit
x86 architecture. The following list discusses the work items in detail,
summarizing the interesting points.
# Assembler startup code
Prior to the addition of our x86_64 port, base-hw was an ARM-only kernel.
Therefore, the boot code for the new platform had to be written from scratch.
Having already written a 64-bit x86 kernel, we were able to reuse its boot-up
code pretty much unchanged.
# Memory management/IA-32e paging
Since transitioning to the IA-32e (long) mode requires paging, an initial set
of static page tables is part of the assembler startup code. For dynamic
memory management, however, a C++ implementation for creating IA-32e
paging structures was required. Similar to the startup code, we could draw
on the experience gained when implementing paging in the Muen project. One
minor obstacle was to get reacquainted with the C++ template mechanism.
Aside from that, there were no other issues and the subsequent implementation
was quite straightforward.
# Assembler mode-switch code
The mode-transition code (MTC) takes care of switching from kernel-
to user-space and back. It consists of architecture-dependent assembly code
accessible to both kernel- and user-land.
A transition from user- to kernel-space occurs either explicitly by the
invocation of a syscall, or when an exception or interrupt occurs. The
mode-transition code saves the current context and restores the kernel state
or vice-versa when returning to user-mode from the kernel. To unify the
exception and syscall code paths on exit, we decided to implement syscall
invocation using the _int 0x80_ method instead of using the _SYSCALL/SYSRET_
machine instructions.
The peculiarities of the x86 architecture needed some attention to detail.
In contrast to ARM, several data structures such as the GDT (Global
Descriptor Table), IDT (Interrupt Descriptor Table) and TSS (Task-State
Segment) are implicitly referenced by the hardware and must be accessible on
entry into the mode-transition code from user-land. Thus, these tables must
be placed in the MTC memory region as otherwise, the hardware would trigger
a page fault.
# Interrupt controller implementation
The interrupt controller handles external interrupts triggered by devices.
After a little detour (see _PIC/PIT detour_ below), we ended up using the
local and I/O APIC for interrupt management. One annoying implementation
detail worth mentioning is the handling of edge-triggered interrupts by the
I/O APIC. As described in the Intel 82093AA I/O Advanced Programmable
Interrupt Controller (IOAPIC) specification, Section 3.4.2, edge-triggered
interrupts are lost if they occur while the mask bit of the corresponding
I/O APIC RTE (Routing Table Entry) is set. Therefore, we chose the pragmatic
approach not to mask edge-sensitive IRQs at all.
The issue of lost IRQs came up when dealing with the user-space PIT
(Programmable Interval Timer): The PIT driver would program the timer with a
short timeout and then unmask the corresponding IRQ line. If the timer fired
prior to completion of the unmask operation, the interrupt would be lost,
which, in turn, resulted in the driver being blocked forever.
# Kernel-timer implementation
The x86 platform provides a variety of timer sources, each of which brings
its own bag of problems. After switching to the LAPIC for interrupt
management, the obvious choice was to use the LAPIC for the kernel timer as
well. The drawback of this timer is that its frequency must be measured
using a secondary source as reference. Luckily, we were able to reuse the
PIT driver, which resulted from our _PIC/PIT detour_ for this purpose.
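The arithmetic behind such a calibration is small. The sketch below assumes that the LAPIC current-count register is read before and after a delay timed by the PIT. The function names and the way the counts are obtained are hypothetical, and hardware access is left out entirely.

```cpp
#include <cassert>
#include <cstdint>

/*
 * The LAPIC timer counts down from its initial count. Measuring how far
 * it got during a delay of known length yields its tick rate.
 */
inline std::uint32_t lapic_ticks_per_ms(std::uint32_t initial_count,
                                        std::uint32_t count_after_delay,
                                        std::uint32_t delay_ms)
{
	assert(delay_ms > 0 && count_after_delay <= initial_count);
	return (initial_count - count_after_delay) / delay_ms;
}

/* convert a timeout in microseconds into a LAPIC initial-count value */
inline std::uint32_t timeout_to_count(std::uint32_t ticks_per_ms,
                                      std::uint32_t timeout_us)
{
	std::uint64_t const ticks =
		static_cast<std::uint64_t>(ticks_per_ms) * timeout_us / 1000;
	return static_cast<std::uint32_t>(ticks);
}
```

For instance, if the counter drops by 500000 ticks during a 50 ms PIT delay, the timer runs at 10000 ticks per millisecond, and a timeout of 2500 microseconds corresponds to an initial count of 25000.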
# FPU support
To allow user-space code to use floating-point arithmetic, we needed to
handle the state of the x87 FPU. Similar to the ARM code, the FPU state is
saved and restored in a lazy manner, meaning the necessary work is only
performed if the FPU is actually used.
After making a small number of additional adjustments to core, we were able to
successfully execute even elaborate run scripts such as 'run/demo' on the
newly ported x86_64 base-hw kernel.
PIC/PIT detour
--------------
As described in the introduction, porting the base-hw kernel to the Intel
x86_64 architecture is only an intermediate step towards the ultimate goal of
bringing Genode to the Muen platform. To this end, we took a pragmatic
approach with regard to hardware drivers that are required for x86_64 but
will be paravirtualized on Muen. The interrupt controller and kernel timer
fall in this category. For the sake of simplicity, we initially decided to
use the 8259 Programmable Interrupt Controller (PIC) and the 8253/8254
Programmable Interval Timer (PIT). We quickly had a working implementation but
later became aware that the only currently available Genode user-land timer on
x86 was the PIT. This was obviously a problem because kernel and user land
require separate timer sources.
After some discussion, we decided to rewrite the kernel interrupt controller
and timer code to use the LAPIC/IOAPIC. This freed up the PIT for use by the
user-land driver. Since we were able to reuse the PIT code for measuring the
LAPIC timer frequency, the detour in fact helped to stabilize the
final implementation. Additionally, these changes lay the foundation for
future 'hw_x86_64' multiprocessor support.
Taking hw_x86_64 for a spin
~~~~~~~~~~~~~~~~~~~~~~~~~~~
In order to try out the new 'hw_x86_64' port, perform the following steps:
! tool/create_builddir hw_x86_64
Prepare the ports required by the demo script:
! tool/ports/prepare_port x86emu
Change to the build directory:
! cd build/hw_x86_64/
Note: Make sure to enable the libports repository by editing the
_etc/build.conf_ file.
Finally, fire up the demo script:
! make run/demo
Limitations
~~~~~~~~~~~
The current implementation of the x86_64 base-hw kernel has the following
limitations:
* No dynamic memory discovery: The amount of memory is hard-coded to 256 MiB.
* No 32-bit support
* No SMP support
These are not fundamental restrictions of the base-hw x86_64 port but simply
missing features that can be implemented in the future.
Sentiments
~~~~~~~~~~
Considering that the base-hw kernel was an ARM-only microkernel, the port to
x86_64 went rather smoothly. In our opinion, this is a testament to the
modularity and the good overall design of the kernel. Architecture-specific
code is well encapsulated and the provided abstractions allow the overriding
of functionality at the appropriate level.
An interesting fact worth mentioning is that while emulators such as Qemu and
Bochs are great tools for development, it is important to perform tests on
real hardware as well. Since the hardware is emulated with varying degrees of
accuracy, subtle differences in behavior can go unnoticed. A recurring source
of potential problems is the initial state of memory. Whereas emulators
usually fill unused memory with zeros, on real hardware the content of
uninitialized memory is undefined. So while code that only partially
initializes memory may run without issues on Qemu, it is quite possible that
it simply fails on real hardware.
After finishing the base-hw port to 64-bit x86, we immediately started working
on the Muen port. As a little spoiler, we can report that the run/demo
scenario is already running as a subject on top of the Muen SK. We hope that
it will be part of the next Genode release.
Last but not least, we would like to thank the guys at Genode Labs for their
support and we are eager to see where this fruitful cooperation will take us.
Dynamic thread weights
======================
With the Genode release 14.11, we introduced an entirely
[https://genode.org/documentation/release-notes/14.11#Trading_CPU_time_between_components_using_the_HW_kernel - new scheduler]
in the base-hw kernel that allows for the trading of CPU time between Genode
components. This scheduler knows two parameters for each scheduling context: A
priority that models the urgency for low-latency execution and a quota that
limits the prioritized execution time of a context during one super period.
The user may adjust these parameters according to their demands by means of
user-land configuration. The inter-component distribution of priority and
quota is configured through configuration files, whereas the
component-internal distribution of computation time is addressed by Genode's
thread API.
However, during the last months, the way of configuring the local distribution
of quota turned out to be unsatisfying for real-world scenarios. To assign
quota to a thread, one had to state a specific percentage of the component
quota at construction time. One disadvantage of this pattern becomes apparent
when looking at the main thread of a component. As the main thread gets
constructed by the component's parent without using the thread API, the
component itself has no means to influence the quota of this thread. The quota
of main threads was therefore always set to zero. Furthermore, a component had
to keep track of previously consumed thread quotas in order not to violate
the local quota limit when creating new threads.
All this begged for a less rigid way of managing local CPU quota. We came to
the conclusion that a component does not want to manage quota distribution
itself but only the importance of threads in the quota distribution, their
so-called _weight_. This thread weight can be any number greater than zero
regardless of the weights of other threads. It gets translated to a portion of
the local quota by setting it into relation to the sum of all local thread
weights. Consequently, all the assigned quota of a component is distributed
among the local threads according to their weights. There is no slack quota
anymore. However, this implies that the quota of all local threads gets
adjusted each time the constellation of local thread weights changes, that is,
when a new thread gets constructed or an existing one gets destructed. So, we
must be able to dynamically reconfigure the quota of a scheduling context,
something the base-hw kernel did not support until now. The new core-restricted
kernel call named 'thread_quota' solves this issue.
But let's get back to the thread API. When not explicitly defined, a thread's
weight is set to 10. So, logically, the main thread of a component always has
the weight of 10. This value initially equips the main thread with all the
quota of the component and should leave enough flexibility when configuring
secondary threads. If the next thread in the component would have the weight
30, the main thread, from that point on, would receive 25% of the quota while
the second thread starts with 75%. Let us go on and add a third thread with
the weight 960. Now, the local quota distribution would be as follows:
Main thread: 1%
Second thread: 3%
Third thread: 96%
Finally, if one of the threads is destructed, its quota logically moves to the
remaining two threads divided according to their weight ratio.
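The numbers of this example can be reproduced with a few lines of C++. The function 'quota_percent' is a hypothetical helper written for illustration only. It is not part of the thread API, and it truncates fractions where the kernel may round differently.

```cpp
#include <cassert>
#include <vector>

/*
 * Translate thread weights into percentages of the component's quota by
 * relating each weight to the sum of all local weights.
 */
inline std::vector<unsigned>
quota_percent(std::vector<unsigned> const &weights)
{
	unsigned long sum = 0;
	for (unsigned w : weights) {
		assert(w > 0);   /* a weight must be greater than zero */
		sum += w;
	}

	std::vector<unsigned> result;
	for (unsigned w : weights)
		result.push_back(static_cast<unsigned>(w * 100UL / sum));
	return result;
}
```

Called with the weights 10 and 30, the helper yields 25 and 75. With the weights 10, 30, and 960, it yields 1, 3, and 96. Removing the third weight brings the remaining two threads back to 25 and 75.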
Now, with the comfort of weight-driven quota distribution, the only remaining
question was how to determine the weights reasonably. We had to provide a
way to translate a concrete need of execution time into a local thread weight.
Two things must be known inside a component to do so: The length of a super
period at the scheduler and how much of this super period the component's quota
is worth. These two values can now be read via a new CPU-session RPC named
'quota'. The values returned are given in microseconds. However, when using
this instrument, one must consider slight rounding errors that can't be
prevented as the values have to pass up to two independent translations from
the source parameter to the microseconds value.
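One conceivable way to derive a weight from such values is sketched below. The function 'weight_for' and its parameters are hypothetical. The component's quota in microseconds is assumed to have been obtained via the 'quota' RPC beforehand, and the sum of the other threads' weights is assumed to be tracked by the component itself.

```cpp
#include <cassert>
#include <cstdint>

/*
 * Smallest weight w for which  w / (w + other_weights) * quota_us  is at
 * least 'need_us', i.e., w >= need_us * other_weights / (quota_us - need_us).
 */
inline std::uint64_t weight_for(std::uint64_t need_us,
                                std::uint64_t quota_us,
                                std::uint64_t other_weights)
{
	assert(need_us < quota_us);

	/* a sole thread receives the whole quota regardless of its weight */
	if (other_weights == 0) return 1;

	std::uint64_t const den = quota_us - need_us;
	std::uint64_t const w   = (need_us * other_weights + den - 1) / den;

	return w ? w : 1;   /* weights must be greater than zero */
}
```

With a component quota of 10000 microseconds and other threads weighing 30 in total, a need of 2500 microseconds results in the weight 10. Rounding up errs on the side of granting slightly more than requested, which also leaves some headroom for the rounding errors mentioned above.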
Revised device-driver infrastructure
####################################
In Genode, core represents the root of the component hierarchy and holds the
reins. This includes possession of system resources not reserved for the
kernel, in particular physical resources like RAM, memory-mapped I/O regions,
I/O ports, and IRQs. Access to resources is gained via session requests, e.g.,
an IO_PORT session permits access to a dedicated region of x86 I/O ports. Core
itself does not define any policy on these resources other than starting its
only child component init, which is qualified to allocate specific resources
via dedicated sessions to core. In turn, init employs a configured system
policy and bootstraps additional system components. From the physical
resources, init manages memory effectively by applying quota restrictions to
RAM sessions. It does not further differentiate I/O resources besides routing
session requests to the rather abstract services for IRQ, IO_MEM, and IO_PORT.
On the other side, device-driver components wish to access registers or drive
DMA transfers for specific devices only. What was missing up to now was the
notion of a _device_ including its I/O resources or role as DMA actuator.
Motivated by enabling message-signalled interrupt (MSI) support on x86
platforms, we addressed several shortcomings and revised our device-driver
infrastructure. First, we noticed that while our ACPI driver (acpi_drv) did a
proper job with parsing ACPI tables relevant for IRQ remapping, polarity, and
trigger information, it did not apply any useful policy. The gathered
information was only propagated to the PCI driver (pci_drv, started as a child
component) by writing the IRQ remapping information into the PCI configuration
space of the devices. Although pci_drv provided the PCI session and thereby
access to dedicated PCI devices, it did not apply device-specific policies
either. The PCI session was merely used by device drivers to retrieve
information about I/O resources, but the session request for the actual
resources was directed to the driver's parent (and routed to core in most
cases). Further, the PCI driver was in charge of allocating DMA-able memory on
behalf of the device driver. This enabled transparent support for IOMMUs on
NOVA, but also lacked proper quota donation. Last, we identified that the
current implementation of handling shared IRQs in core completely contradicted
our goal of transparently handling interrupts as legacy IRQs or MSIs
depending on the capabilities of the device as well as the kernel platform.
At the end of our survey, we eagerly longed for real I/O resource management
in a central component, which provides the notion of a device. I/O resources
are assigned to those devices from the pool of abstract resources available
from core, e.g., dedicated IO_MEM dataspaces for regions of a PCI device. The
approach is not completely new in Genode when looking at certain ARM
platforms, where we have had a platform driver (platform_drv) for quite some
time. Now, we want to generalize this approach to fit both dynamic discovery
(e.g., for the PCI bus) and configuration (e.g., specific ARM SoCs or legacy
devices on PCs). Also, the configuration is expected to support the expression
of policy to restrict device drivers to access designated device resources
only.
The first working step to tackle the issue was to make the IRQ resource
available per device within the PCI driver. Until now, core implemented the
handling of IRQs per platform differently. On some platforms, namely x86, it
had support for shared IRQs, while other platforms got along without this
special feature. The biggest stumbling block was actually the synchronous RPC
interface 'wait_for_irq()', which forced a driver to issue a blocking IPC to
core to wait for IRQs. We simply disposed of this relic of the early L4 times
and changed the IRQ session interface to employ asynchronous IRQ notifications
on all Genode platforms. For that reason, we had to adapt the various core
implementations, the platform drivers, and all device drivers. We generalized
the shared-IRQ implementation on x86 and then moved it from core to
the PCI driver, which will become our platform_drv for x86 in a future step.
After we adapted all x86 drivers to request the IRQ session capability from
the PCI driver, and completed a thorough testing phase of shared IRQ handling,
we finally removed the shared IRQ support from core on all Genode platforms.
Next, we tackled the transformation of the previous PCI session into an x86
platform session (although it is still called PCI session). The platform
session bundles I/O resources of one or more devices per client. Policies
define which of the physical devices are actually visible to and
discoverable by clients. A client discovers devices either by explicitly
naming the device, e.g., for non-PCI devices like the PS/2 controller, or by
iterating over a virtual PCI bus as defined by the policy. Besides device
discovery, a platform session is used for allocating DMA buffers. So, the
platform driver can take care of associating DMA memory regions with physical
devices, which is required as soon as IOMMUs are used by the underlying
kernel.
The result of a successful device discovery is a device capability, which
serves as the key to get access to device-specific resources like IO_MEM,
IO_PORT, and IRQs. The RPC interface provides functions to request dedicated
resource capabilities, which are of the types Io_mem_session_capability,
Io_port_session_capability, and Irq_session_capability.
If the device capability represents a PCI device, the IO_PORT and IO_MEM
resources are discovered by the platform driver by parsing the BARs in the PCI
configuration space. On behalf of the client, the platform driver establishes
the I/O resource sessions to core. For non-PCI devices, a device-specific
implementation is required. For now, only the PS/2 device is supported, which
bundles two IRQ sessions for mouse and keyboard as well as the well-known I/O
ports. The IRQ resources for PCI devices are handled differently. First, the
platform driver parses the PCI config space of a device to detect whether this
device is capable of MSIs. If so, the platform driver tries to open an IRQ
session at core, which succeeds on kernels supporting this feature, namely
Fiasco.OC and NOVA. On kernels lacking MSI support, the request will fail and
the platform driver falls back to allocating legacy IRQs, which are all treated
as shared. In either case, the driver does not need to handle the IRQ/MSI
cases separately as these are handled by the platform driver transparently.
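The decision taken for a PCI device can be condensed into the following sketch. All names are hypothetical. In particular, 'open_msi_session' merely stands in for the IRQ-session request at core, and the model ignores the 'irq_mode' policy attribute.

```cpp
#include <cassert>

enum class Irq_type { MSI, LEGACY_SHARED };

struct Device { bool msi_capable;  unsigned legacy_irq; };
struct Kernel { bool supports_msi; };

/* stands in for the session request at core, which fails without MSI support */
inline bool open_msi_session(Kernel const &kernel)
{
	return kernel.supports_msi;
}

/* pick the interrupt scheme on behalf of the device driver */
inline Irq_type select_irq(Device const &device, Kernel const &kernel)
{
	if (device.msi_capable && open_msi_session(kernel))
		return Irq_type::MSI;

	/* fall back to a legacy IRQ, which is always treated as shared */
	return Irq_type::LEGACY_SHARED;
}

/* usage example covering the three relevant combinations */
inline void example()
{
	Device const msi_device    { true,  11 };
	Device const legacy_device { false, 11 };

	assert(select_irq(msi_device,    Kernel{true})  == Irq_type::MSI);
	assert(select_irq(msi_device,    Kernel{false}) == Irq_type::LEGACY_SHARED);
	assert(select_irq(legacy_device, Kernel{true})  == Irq_type::LEGACY_SHARED);
}
```

The device driver only ever receives an IRQ session capability and therefore does not need to know which branch was taken.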
The policy is provided by 'policy' entries in the config ROM of the pci_drv.
An entry corresponds to a virtual bus containing the listed devices, which is
accessible by drivers with the label configured in the 'label' attribute. PCI
devices are named by a 'pci' entry either explicitly by the attribute triple
'bus', 'device', 'function'
!<policy label="usb_drv">
! <pci bus="0" device="19" function="0"/>
! <pci bus="0" device="5" function="0"/>
!</policy>
or by a device class alias
!<policy label="usb_drv"> <pci class="USB"/> </policy>
In the first example, the USB driver gets access to two devices, e.g., the
xHCI and EHCI controller. This explicit approach is useful if the target
machine and the PCI bus hierarchy are known and security is a concern. Later,
a dynamic device-manager component could update the config at runtime
according to a device-discovery report of the platform driver. The second
option can be used when switching often between machines during development or
when the target machine is unknown in advance. The downside of the flexibility
is that a device driver may get access to devices it can't or should not
drive. For example in a router scenario, the inner network driver should only
drive the inner NIC while the outer driver gains access to the outer network.
Both components would then be connected by a secure routing component only.
Further classes are available and are extended as needed - please consult the
README of the platform driver for a list.
When the ACPI driver is used for Fiasco.OC, NOVA, and base-hw on x86, the
configuration for the PCI driver is constructed out of the ACPI config XML
node. Additionally, an explicit policy entry for the ACPI driver is required,
which permits rewriting potentially all legacy IRQ numbers for PCI devices as
discovered during IRQ-remapping-table parsing.
!<start name="acpi_drv">
! ...
! <config>
! <policy label="acpi_drv">
! <pci class="ALL"/>
! </policy>
! <policy label="usb_drv">
! <pci class="USB"/>
! </policy>
! </config>
!</start>
If, for some reason, MSIs should not or cannot be used, support may be disabled
explicitly by setting the 'irq_mode' attribute to 'nomsi' in the policy XML
node.
!<policy label="usb_drv" irq_mode="nomsi">
The configuration of a non-PCI device is described by a 'device' entry in the
policy.
!<policy label="ps_drv"> <device name="PS2"/> </policy>
With the changes described above, the platform driver is now in the position
to hand out to drivers solely those devices that are explicitly permitted.
Furthermore, the platform driver can transparently discover I/O resources and
set up the appropriate interrupt scheme for devices, which removes this burden
from the device-driver developer.
The next steps in this direction are to co-locate and consolidate the PCI and
ACPI drivers into the platform driver as done partially for some ARM-based
platforms already. Then, the implementation should be generalized to comprise
ARM platforms too, which includes the configuration, the usage of the
regulator session, and the enforcement of policies per device.
Base framework and low-level OS infrastructure
##############################################
API refinements
===============
Our documentation efforts as mentioned in Section
[Comprehensive architectural documentation] provided the right incentive to
revisit the Genode API with the goal to reach API stability over the next
year. This section summarizes the API changes that may affect developers
using the framework.
:Semaphore simplification:
The semaphore at _base/semaphore.h_ used to be a template, which took the
queueing policy as argument. There was a reasonable default, which took a
FIFO queue as policy. Since we introduced the semaphore in 2006, we never
used a different queueing policy. So this degree of flexibility smells like
over-engineering. Hence, we cut it back by hard-wiring the FIFO policy in
the semaphore.
:Moving the packet stream and ring buffer into the Genode namespace:
The packet-stream utilities provided by _os/packet_stream.h_ provide the
common code to realize the transfer of bulk data between components in an
asynchronous fashion. It is used by several session interfaces such as the
NIC session, file-system session, and block session. Until now, however,
the utilities used to reside in the root namespace. Now, we have rectified
this glitch by moving them to the Genode namespace. We did the same for
the commonly used ring-buffer utility provided by _os/ring_buffer.h_.
:Moving 'Xml_node::Attribute' to 'Xml_attribute':
The XML parser used to represent XML attributes with the nested
'Xml_node::Attribute' class. However, the use of non-trivial nested classes
at API level tends to be confusing and difficult to document. Hence, we
decided to promote 'Xml_node::Attribute' to a dedicated top-level class.
:Unification of text-to-data conversion functions:
Until now, the set of functions to extract information from text strings has
grown in a rather evolutionary fashion. It became a somewhat odd mix of function
templates, overloads, and default arguments. To make the Genode API easier
to understand, we longed for a simple and more coherent concept. For this
reason, we changed the 'ascii_to' functionality of _util/string.h_ in two
ways.
First, each 'ascii_to' function has become a plain overloaded function rather
than a specialization of a function template. In some cases, it may actually
be a template, but only if the result type is a template.
Second, the "base" argument has been discarded. It was used to parse
numbers with different integer bases (like 16 for hexadecimal numbers). For
most types, however, the base argument made little sense and was therefore
mostly ignored. Now, the official way to extract integers of different bases
would be to introduce dedicated types similar to the existing
'Number_of_bytes' type.
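The two changes can be illustrated with a minimal standalone sketch. It is not
the code of _util/string.h_: the 'Hex_number' type and the helper functions
are hypothetical names chosen for illustration, and overflow handling is
omitted. Each conversion is a plain overload selected by the result type, and
the integer base is expressed by a dedicated type instead of an argument.

```cpp
#include <cassert>
#include <cstddef>

/* hypothetical wrapper type that selects hexadecimal parsing */
struct Hex_number { unsigned long value = 0; };

/* return the value of digit 'c' in the given base, or -1 if it is no digit */
inline int digit_value(char c, unsigned base)
{
    int v = -1;
    if      (c >= '0' && c <= '9') v = c - '0';
    else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
    return (v >= 0 && static_cast<unsigned>(v) < base) ? v : -1;
}

/* shared helper, returns the number of consumed characters */
inline std::size_t parse_unsigned(char const *s, unsigned long &result,
                                  unsigned base)
{
    assert(s && base >= 2 && base <= 16);
    unsigned long value = 0;
    std::size_t i = 0;
    for (int d; (d = digit_value(s[i], base)) >= 0; i++)
        value = value*base + static_cast<unsigned long>(d);
    if (i) result = value;   /* leave 'result' untouched if nothing matched */
    return i;
}

/* plain overload for decimal numbers */
inline std::size_t ascii_to(char const *s, unsigned long &result)
{
    return parse_unsigned(s, result, 10);
}

/* plain overload for the dedicated hex type, accepts an optional "0x" */
inline std::size_t ascii_to(char const *s, Hex_number &result)
{
    std::size_t const prefix =
        (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) ? 2 : 0;
    std::size_t const n = parse_unsigned(s + prefix, result.value, 16);
    return n ? n + prefix : 0;
}
```

With these overloads, the caller picks the conversion solely through the type
of the result variable, which keeps each signature free of default arguments.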
Support for GPT partitions
==========================
The old-fashioned MBR partition table is on its way out. Its successor, the
GUID partition table (GPT), is increasingly used on recent systems. On some,
namely the ones featuring UEFI firmware without legacy boot support, it is the
only available option. Therefore, we have extended the 'part_blk' server by
adding rudimentary support for GPT so that we are able to use Genode on such
systems.
The support is enabled by configuring 'part_blk' accordingly:
! <start name="part_blk">
! [...]
! <config use_gpt="yes">
! [...]
! </config>
! </start>
It will fall back to trying to use the MBR if it does not find a valid GPT
header.
The current implementation is limited in the following respects. For one, no
endian conversion takes place and it therefore only works on little-endian
platforms. This poses no problem because, for now, Genode does not run on any
big-endian platform anyway. Furthermore, as the GPT specification defines, the
content of the name field is encoded in UTF-16 but 'part_blk' will only
extract valid ASCII-encoded characters. It also ignores all attributes of the
GPT partition entries.
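The fallback and the name handling described above can be sketched as follows.
This is a simplified standalone model, not the 'part_blk' code: the function
names are made up, only the header signature is checked (a real implementation
would presumably also verify the header checksums), and skipping non-ASCII
code units is an assumption about how such characters are treated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

enum class Table_type { GPT, MBR, NONE };

/* the GPT header at LBA 1 starts with the 8-byte signature "EFI PART" */
inline bool gpt_signature_valid(std::uint8_t const *sector1)
{
    return std::memcmp(sector1, "EFI PART", 8) == 0;
}

/* an MBR ends with the boot signature 0x55 0xaa at offsets 510 and 511 */
inline bool mbr_signature_valid(std::uint8_t const *sector0)
{
    return sector0[510] == 0x55 && sector0[511] == 0xaa;
}

/* prefer the GPT if requested and present, otherwise fall back to the MBR */
inline Table_type detect(std::uint8_t const *sector0,
                         std::uint8_t const *sector1, bool use_gpt)
{
    if (use_gpt && gpt_signature_valid(sector1)) return Table_type::GPT;
    if (mbr_signature_valid(sector0))            return Table_type::MBR;
    return Table_type::NONE;
}

/* extract the ASCII characters of a UTF-16LE name field (36 code units) */
inline std::string ascii_name(std::uint8_t const *name, std::size_t units = 36)
{
    assert(name);
    std::string out;
    for (std::size_t i = 0; i < units; i++) {
        unsigned const lo = name[2*i], hi = name[2*i + 1];
        unsigned const unit = lo | (hi << 8);   /* little endian only */
        if (unit == 0) break;                   /* end of name */
        if (unit < 128) out += static_cast<char>(unit);
    }
    return out;
}
```

Note that the byte-wise composition of each code unit is written out
explicitly here, whereas the limitation mentioned above stems from reading
multi-byte fields without any endian conversion.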
Network-link state-change handling
==================================
We extended the NIC session interface with the ability to notify its client
about changes in the link-state of the session. Adding this mechanism was
motivated by the need for requesting new network configuration settings, e.g.,
IP and gateway addresses, when changing the location and switching the
network.
A NIC-session client can now install a signal handler that is called when the
link-state changes. After receiving the signal, the client may query the
current state by executing the 'link_state()' RPC function. In addition, the
NIC driver interface now provides a notification-callback method that is used
to forward link-state changes from the driver to the 'Nic::Session_component'.
The lwIP TCP/IP stack was adapted to that feature and always tries to acquire
new network settings via DHCP when the link state changes.
The following drivers now report link-state changes: dde_ipxe, nic_bridge, and
usb_drv. On the other hand, OpenVPN, Linux nic_drv, and the lan9118 driver do
not support it and always report the link-up state.
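The notification pattern, in which the signal carries no payload and the
client queries the state afterwards, can be modelled in a few lines of
standalone C++. The classes below are stand-ins for illustration and not the
NIC session interface itself: the handler is a plain callback instead of a
Genode signal handler, and the log messages merely mark where a client such as
lwIP would start its DHCP request.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

/* stand-in for the session: state is queried, the signal has no payload */
class Nic_session_model
{
    bool                  _link_up = false;
    std::function<void()> _sigh;

  public:
    /* install the handler that is called on each link-state change */
    void link_state_sigh(std::function<void()> handler)
    {
        assert(handler);   /* an empty handler could never be called */
        _sigh = std::move(handler);
    }

    bool link_state() const { return _link_up; }

    /* called by the driver side whenever it detects the link state */
    void driver_link_state_changed(bool up)
    {
        if (up == _link_up) return;   /* no change, no notification */
        _link_up = up;
        if (_sigh) _sigh();
    }
};

/* stand-in for a client that reacts to link-state changes */
struct Client_model
{
    Nic_session_model       &nic;
    std::vector<std::string> log;

    explicit Client_model(Nic_session_model &nic) : nic(nic)
    {
        nic.link_state_sigh([this] { handle_link_state(); });
    }

    void handle_link_state()
    {
        /* query the current state after receiving the notification */
        log.push_back(nic.link_state() ? "link up: request DHCP lease"
                                       : "link down");
    }
};
```

Keeping the state out of the notification means that a client that handles
several pending signals late still observes the current link state rather
than a stale one.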
File-system utilities
=====================
When we introduced Genode's file-system session interface in
[https://genode.org/documentation/release-notes/12.05#New_file-system_infrastructure - version 12.05],
it was accompanied by a RAM file system as the first implementation. Since
then, a growing number of file-system services were developed, which took the
RAM file system as a blueprint. Over the years, this practice resulted in the
duplication of the utilities that were found worthwhile to reuse. The upcoming
addition of a new 9P file-system service prompted us to make those utilities
part of the public API, located at _os/include/file_system/_.
Device drivers
##############
New AHCI driver with support for native command queueing
========================================================
With Genode 15.05, we completely revised our AHCI driver in order to overcome
some severe limitations of the previous implementation. Specifically, we
wanted to support multiple devices per controller, handle block requests
asynchronously, and consolidate the Exynos5 and the x86 code to enable code
sharing of the AHCI-specific features. We also wanted to improve the driver
performance by taking advantage of modern features like native command
queuing.
In order to achieve these goals, we implemented a generic AHCI driver by
taking advantage of Genode's MMIO framework. The code is shared between x86
and the Exynos5 platform. Additionally, we introduced a 'Platform_hba' class
that takes care of platform-specific initialisation and platform-dependent
functions, like the allocation of DMA memory or the handling of the PCI bus on
x86 platforms.
For supporting multiple devices, we extended Genode's block component by a
root component with multiple-session support. Sessions are routed much like it
is done for our partition server (part_blk) by using 'policy' XML nodes (see
the README file under _repos/os/src/drivers/ahci_).
Since version 15.02, Genode's block component offers support for asynchronous
block requests. The AHCI driver takes full advantage of this interface by
using native-command queuing (NCQ). NCQ allows up to 32 read/write requests to
be executed in parallel. Please note that requests may be processed out of
order because NCQ is implemented on the device side, giving the device vendor
the opportunity to optimize seek times for hard disks. With NCQ support and
asynchronous request processing in place, the driver is able to achieve a
performance that is on par with modern Linux drivers. We measured a throughput
of 75 MB/s for HDDs and 180 MB/s for SSDs when issuing sequential 4 KB
requests.
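The bookkeeping for up to 32 outstanding requests that may complete in any
order can be sketched with a simple bitmap. The class below is a hypothetical
illustration and not taken from the driver: it only tracks which of the 32
command tags are in flight and leaves out the actual command setup.

```cpp
#include <cassert>
#include <cstdint>

/* track up to 32 in-flight commands, one bit per command tag */
class Ncq_slots
{
    std::uint32_t _busy = 0;

  public:
    static constexpr int MAX = 32;

    /* return a free tag, or -1 if all 32 commands are in flight */
    int alloc()
    {
        for (int tag = 0; tag < MAX; tag++) {
            std::uint32_t const bit = std::uint32_t(1) << tag;
            if (!(_busy & bit)) {
                _busy |= bit;
                return tag;
            }
        }
        return -1;
    }

    /* the device reports completions in an arbitrary order */
    void complete(int tag)
    {
        assert(tag >= 0 && tag < MAX);
        _busy &= ~(std::uint32_t(1) << tag);
    }

    /* number of commands currently in flight */
    int in_flight() const
    {
        int n = 0;
        for (std::uint32_t b = _busy; b; b &= b - 1) n++;
        return n;
    }
};
```

When 'alloc' returns -1, an asynchronous block driver would typically defer
the request until a completion frees a tag instead of blocking.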
Feature-wise our AHCI driver offers read/write support for hard disks (HDDs or
SSDs) and experimental read-only support for ATAPI devices (CDROM, DVD, or
Blu-ray devices).
Multi-touch support
===================
One motivation to upgrade to VirtualBox 4.3 with Genode release 14.11 was to
use the multi-touch feature of Windows guests. With this release, we took the
opportunity to investigate and enable the feature using the multi-touch
capable Wacom USB driver introduced with release 15.02.
The first step was to capture the multi-touch input events in our USB port and
extend the input back end to propagate the information via Genode's input
session. We extended the input interface of Genode by a new event type "TOUCH"
(class Input::Event), which stores the absolute coordinates of a touch event
as well as the identifier of the touch contact. Each finger that touches the
screen at a given time is represented as a contact with its own identifier.
Nitpicker, nit_fb and the window manager propagate this new type of event to
clients, which may process them if capable, as is the case for VirtualBox.
Finally, we extended the input back end of our VirtualBox port to process
Genode's input touch events so that the USB models in VirtualBox can utilize
them.
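How a client may keep track of several simultaneous contacts can be sketched
as follows. The 'Touch_event' structure is a simplified stand-in for the
"TOUCH" variant of 'Input::Event', reduced to the contact identifier and the
absolute coordinates mentioned above. The tracker class is hypothetical, and
the handling of released contacts is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

/* simplified stand-in for a touch event: contact ID plus absolute position */
struct Touch_event { int id; int x; int y; };

/* remember the most recent position of each contact */
class Contact_tracker
{
    struct Position { int x; int y; };

    std::map<int, Position> _contacts;

  public:
    void handle(Touch_event const &ev)
    {
        assert(ev.id >= 0);
        _contacts[ev.id] = Position { ev.x, ev.y };
    }

    std::size_t contacts() const { return _contacts.size(); }

    /* look up the last known position of contact 'id' */
    bool position(int id, int &x, int &y) const
    {
        auto const it = _contacts.find(id);
        if (it == _contacts.end()) return false;
        x = it->second.x;
        y = it->second.y;
        return true;
    }
};
```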
To enable the propagation of multi-touch events, the USB driver must be
configured explicitly by setting a "multitouch" attribute to "yes":
!<start name="usb_drv">
! ...
! <config uhci=... ohci=... xhci=...>
! <hid>
! <touchscreen width="1024" height="768" multitouch="yes"/>
! </hid>
! ...
!</start>
To be able to use the multi-touch feature in VirtualBox, make sure to enable a
USB controller model and a USB multi-touch capable device model in your VM
configuration (.vbox file):
!<VirtualBox ...>
! <Machine ...>
! <Hardware ...>
! <HID Pointing="USBMultiTouch" Keyboard="USBKeyboard"/>
! </Hardware>
! ...
! <USB>
! <Controllers>
! <Controller name="OHCI" type="OHCI"/>
! </Controllers>
! </USB>
! </Machine>
! ...
!</VirtualBox>
Audio drivers ported from OpenBSD
=================================
A few years back, we ported OSSv4 to Genode to account for the need of playing
audio on Genode. It worked fine on a handful of sound cards but unfortunately,
it did not work well on more recent Intel HD Audio devices. Though that
shortcoming was more a problem of our own port than of OSSv4 itself, we
decided to replace it rather than trying to fix the port. The rationale behind
this decision is the uncertain future of the OSSv4 project. A driver with an
active upstream development is certainly preferable.
By now, we have gained solid experience in porting drivers from other OSs and
have developed a best practice that serves us well. In the past, we mostly chose
Linux as driver donor. But this time, we went in another direction and picked
OpenBSD. One of the reasons for favouring it is its comprehensive
documentation that helped a lot in implementing the APIs. There is normally
one interface for a specific task used throughout all drivers whereas, on
Linux, there are several interfaces, and each driver tends to use the one that
was popular at the time of its creation. We found the perceived code hygiene
noticeably higher on OpenBSD than on Linux.
Since porting a driver from a foreign OS involves picking the right layer to
extract the driver, we took a closer look at the overall audio architecture of
OpenBSD. At the highest level, it uses the sndio(7) interface. A user-land
daemon _sndiod(1)_ performs stream mixing, format conversion, exposes virtual
devices to its clients, and controls the actual audio device provided in the
form of the audio(4) device-independent driver layer. This layer abstracts the
particular audio-device driver. It provides device-agnostic means to configure
the device and to control the mixer. The device driver plugs into the audio(9)
kernel interface.
Genode contains its own user-land server/client audio interface, namely the
Audio_out session. Therefore, we dismissed the use of the sndio(7) interface
because it would involve porting _sndiod(1)_ as well as changing all our audio
clients. Merely porting the device driver and using the audio(9) kernel
interface directly would have given us the most flexibility indeed but we
would have been in charge of setting up the environment, e.g., DMA buffers
etc., for the device driver. The audio(4) subsystem, on the other hand, does
all this already and provides us with the common device interface, i.e.,
read(2), write(2), and ioctl(2). On these grounds, the audio(4) layer was
selected as the porting target.
The ported drivers are located in _repos/dde_bsd/_. The driver back end
resides in the form of a library in _repos/dde_bsd/src/lib/audio_ whereas the
driver front end providing the Audio_out session is placed at
_repos/dde_bsd/src/drivers/audio_out_. As we did previously with other ported
drivers, we created an emulation header, in this case called _bsd_emul.h_ that
contains all needed definitions and data structures. All vanilla OpenBSD
source files are scanned and symlinks, named after the header files in the
include directives, are created. Each symlink points to the emulation header.
After that, the needed functionality is implemented. Since OpenBSD uses a
rather static approach to how the kernel is configured, i.e., which subsystems
and drivers are included, we needed to provide the parts required by the
autoconf(9) framework. Basically, we provide the config data structure that
contains the drivers (the audio subsystem as well as the audio device drivers)
and implemented some other functionality that normally would be generated by
the config mechanism in vanilla OpenBSD (see
_repos/dde_bsd/src/lib/audio/bsd_emul.c_). The rest of the implementation,
including the memory management and IRQ handling, turned out to be
straightforward.
In addition, the back end also implements the functions declared in the
private 'Audio' namespace (see _repos/dde_bsd/include/audio/audio.h_ and
_repos/dde_bsd/src/lib/audio/driver.cc_). The front end exclusively calls
these functions and has no knowledge of the driver back end ported from
OpenBSD. In this respect, these functions encapsulate the interface exposed by
the audio(4) interface. To play the content of a packet received via the
'Audio_out' session, the front end will simply call 'Audio::play()'. This
function internally calls 'audiowrite()' after preparing the 'struct uio'
argument needed by that function. 'audiowrite()' is called in a non-blocking
fashion. This is necessary because the audio-out driver operates as a
single-threaded event-driven process. If it blocked, it could not handle IRQs
generated by the audio device. Last but not least, the write function copies
the samples into the DMA buffer and calls the device driver to trigger the
playback. After a block from the DMA buffer has been played, the audio device
will generate an interrupt, which will poke the front end. The front end
responds by requesting the playback of the next audio packet.
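The interplay between the non-blocking write and the interrupt-driven
submission of the next packet can be modelled as shown below. The two classes
are illustrative stand-ins, not the code of the driver: packets are reduced to
integers, the DMA buffer to a counter of occupied blocks, and all names are
made up.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

/* stand-in for the driver back end with a fixed number of DMA blocks */
class Backend_model
{
    unsigned const _capacity;
    unsigned       _queued = 0;

  public:
    std::vector<int> played;   /* packets handed to the device, in order */

    explicit Backend_model(unsigned dma_blocks) : _capacity(dma_blocks) { }

    /* non-blocking write: refuse instead of waiting if the buffer is full */
    bool play(int packet)
    {
        if (_queued == _capacity) return false;
        _queued++;
        played.push_back(packet);
        return true;
    }

    /* the device finished one block and raised an interrupt */
    void block_played()
    {
        assert(_queued > 0);
        _queued--;
    }
};

/* stand-in for the single-threaded, event-driven front end */
class Front_end_model
{
    Backend_model  &_backend;
    std::deque<int> _pending;

    void _submit_pending()
    {
        while (!_pending.empty() && _backend.play(_pending.front()))
            _pending.pop_front();
    }

  public:
    explicit Front_end_model(Backend_model &backend) : _backend(backend) { }

    /* a packet arrived via the session */
    void packet_received(int packet)
    {
        _pending.push_back(packet);
        _submit_pending();
    }

    /* the interrupt pokes the front end, which submits the next packet */
    void handle_irq()
    {
        _backend.block_played();
        _submit_pending();
    }

    std::size_t pending() const { return _pending.size(); }
};
```

Because 'play' never waits, the front end always returns to its event loop and
stays responsive to the interrupt that eventually frees a block.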
The driver currently supports Intel HD Audio (Azalia) and Ensoniq AudioPCI
(ES1370) compatible audio devices and is based on OpenBSD 5.7. It can be
tested by executing the run script _repos/dde_bsd/run/audio_out.run_. This run
script needs a sample file. Please refer to _repos/dde_bsd/README_ for the
instructions on how to create such a file.
SD-card drivers for i.MX53 and Raspberry Pi
===========================================
We improved the generic SD-card protocol implementation with the ability
to handle the version 1.0 of the CSD register, which contains the capacity
information of older SD cards.
At _os/src/drivers/sd_card/rpi_, there is a new driver for the SDHCI
controller as featured on the Raspberry Pi. As of now, the driver operates in
PIO mode only. Depending on the block size (512 bytes versus 128 KiB), it has
a throughput of 2 MiB/sec - 10 MiB/sec for reading and 173 KiB/sec - 8 MiB/sec
for writing.
At _os/src/drivers/sd_card/imx53_, there is a new driver for the Freescale
eSDHCv2 SD-card controller as used on the USB Armory platform. The
configuration of the highest available bus frequency and bus width is still
open for further optimization.
Board support for i.MX6-based Wandboard
=======================================
The increasing interest in the combination of Genode and the Freescale i.MX6
SoC motivated us to add official support for a board based on this SoC
to our custom kernel. We settled on the
[https://www.wandboard.org/ - Wandboard Quad] that was developed on a volunteer
basis. Thanks to Praveen Srinivas (IIT Madras, India) and Nikolay Golikov
(Ksys Labs LLC, Russia) who contributed their work on i.MX6. The Wandboard
Quad features 2 GiB of DDR3 RAM and a quad-core Cortex-A9 CPU. So, unlike when
porting i.MX53, our existing kernel drivers for the Cortex-A9 private
peripherals, namely the core-local timer and the ARM Generic Interrupt
Controller, could be reused.
Although the board even supports SMP and the ARM Security Extensions, we don't
make use of these advanced features yet. However, our port is intended to
serve as a starting point for further development in these directions.
To create a build directory for Genode running on Wandboard Quad, use the
following command:
! ./tool/create_builddir hw_wand_quad
USB device-list report
======================
The USB driver is now able to generate a report with a list of all
currently connected devices, which gets updated when devices are added or
removed. This information can be useful to decide if and when a USB session
for a specific device should be opened or closed.
An example report looks as follows:
!<devices>
! <device vendor_id="0x17ef" product_id="0x4816"/>
! <device vendor_id="0x0a5c" product_id="0x217f"/>
! <device vendor_id="0x8087" product_id="0x0020"/>
! <device vendor_id="0x8087" product_id="0x0020"/>
! <device vendor_id="0x1d6b" product_id="0x0002"/>
! <device vendor_id="0x1d6b" product_id="0x0002"/>
!</devices>
The report is named 'devices' and an example policy for the report_rom
component would look like:
!<policy label="vbox -> usb_devices" report="usb_drv -> devices"/>
The report gets generated only when enabled in the configuration of the USB
driver:
!<config>
! <raw>
! <report devices="yes"/>
! </raw>
!</config>
There is no distinction yet for multiple devices of the same type.
Runtime environments
####################
VirtualBox on NOVA
==================
As with the previous releases, we continuously improved our version of
VirtualBox running on top of the NOVA microhypervisor.
Video Acceleration (VBVA)
~~~~~~~~~~~~~~~~~~~~~~~~~
We enabled the "VirtualBox Graphics Adapter" device model, which improves the
performance of screen-region updates in comparison to the standard VGA adapter
device model, and which allows the integration of the guest mouse pointer with
the nitpicker GUI server. The mouse pointer integration has been realized in
two steps. First, we extended VirtualBox to generate a "shape" report with the
detailed information about the mouse pointer shape. The counterpart is a
specialized vbox_pointer application, which receives the shape report as ROM
file (provided by the report_rom component) and draws the mouse pointer
accordingly when a nitpicker view related to VirtualBox is hovered.
USB-device pass-through support
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
With the availability of the
[https://genode.org/documentation/release-notes/15.02#USB_session_interface - USB session interface]
and the new [USB device-list report] feature of the USB driver, it is now
possible to pass a selection of raw USB devices directly to VirtualBox guests.
VirtualBox obtains the list of available USB devices from a ROM module named
'usb_devices', which can be connected to the USB driver's device-list report
using the report_rom component with a policy as follows:
!<policy label="vbox -> usb_devices" report="usb_drv -> devices"/>
The devices to be passed through need to have a matching device filter in the
VirtualBox configuration file ('*.vbox'). For example:
!<USB>
! <Controllers>
! <Controller name="OHCI" type="OHCI"/>
! </Controllers>
! <DeviceFilters>
! <DeviceFilter name="USB Scanner" active="true" vendorId="04a9"
! productId="2220" remote="0"/>
! </DeviceFilters>
!</USB>
The feature was successfully tested with HID devices (mouse, keyboard) and a
flatbed scanner. Mass storage devices are known to have problems, though we
also observed these problems with VirtualBox on Linux without the
closed-source extension pack.
When using this feature, make sure that the USB driver itself does not try to
control the devices to be passed to VirtualBox. For example, when passing
through a HID device, the '<hid/>' config option of the USB driver should not
be set.
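The relation between the device-list report and the device filters can be
sketched as a simple match on vendor and product IDs. This is an assumption
about the matching made for illustration and not the logic of VirtualBox,
which supports further filter criteria. Note that the report denotes IDs with
a "0x" prefix whereas the filter attributes in the example above omit it, so
the sketch compares numeric values rather than strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

/* entry of the device-list report, e.g., "0x04a9" and "0x2220" */
struct Device { std::string vendor_id; std::string product_id; };

/* device filter of the VM configuration, e.g., "04a9" and "2220" */
struct Filter { std::string vendor_id; std::string product_id; bool active; };

/* parse a hexadecimal ID with or without "0x" prefix, throws on bad input */
inline unsigned long hex_id(std::string const &s)
{
    assert(!s.empty());
    return std::stoul(s, nullptr, 16);
}

inline bool matches(Filter const &f, Device const &d)
{
    return f.active
        && hex_id(f.vendor_id)  == hex_id(d.vendor_id)
        && hex_id(f.product_id) == hex_id(d.product_id);
}

/* return all reported devices that match at least one active filter */
inline std::vector<Device> selected(std::vector<Device> const &report,
                                    std::vector<Filter> const &filters)
{
    std::vector<Device> result;
    for (Device const &d : report)
        for (Filter const &f : filters)
            if (matches(f, d)) {
                result.push_back(d);
                break;
            }
    return result;
}
```

Since the report does not distinguish multiple devices of the same type, a
filter of this kind selects all of them.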
Platforms
#########
Proof-of-concept support for the seL4 kernel
============================================
Since last summer when the [https://sel4.systems - seL4 kernel] was released
under the General Public License, we entertained the idea to run Genode on
this kernel. As the name suggests, the seL4 kernel is a member of the L4
family of kernels. But there are two things that set this kernel apart from
all the other family members. First, with the removal of the kernel memory
management from the kernel, it solves a fundamental robustness and security
issue that plagues all other L4 kernels so far. This alone would be reason
enough to embrace seL4. Second, seL4 is the world's first OS kernel that is
formally proven to be correct. That is, it is free of implementation bugs.
This makes the kernel extremely valuable in application areas that highly
depend on the correctness of the kernel.
Since last autumn, we conducted the port of Genode to the seL4 kernel as
background activity. We took the chance to thoroughly document our experience
by the following series of articles:
:[https://genode.org/documentation/articles/sel4_part_1 - Building a simple root task from scratch]:
The first article describes the integration of the kernel code with Genode's
source tree and the steps taken to create a minimalistic root task that runs
on the kernel. It is full of hands-on information about the methodology of
such a porting effort and describes the experience with using the kernel
from the perspective of someone with no prior association with the seL4
project.
:[https://genode.org/documentation/articles/sel4_part_2 - IPC and virtual memory]:
The second part of the article series examines the seL4 kernel interface
with respect to synchronous inter-process communication and the management
of virtual memory.
:[https://genode.org/documentation/articles/sel4_part_3 - Porting the core component]:
The third article presents the steps taken to bring Genode's core and init
components to life. Among the covered topics are the memory and capability
management, inter-component communication, and page-fault handling. The
article closes with a state of development that, in principle, enables simple
Genode scenarios to run on seL4.
With the current release, we have integrated the intermediate result into the
mainline Genode source tree. At the time of the release, Genode's core and
init components are running, and init is able to launch further child
components such as simple test programs. Still, the current level of seL4
support should be understood as a proof of concept that is riddled with interim
solutions and shortcomings. Please refer to the third article linked above for
the details. Functionality-wise the most glaring gap is the unimplemented
support for user-level device drivers, which rules out most of the meaningful
Genode scenarios for the time being. Still, the current version shows that the
combination of seL4 and Genode is viable.
To give Genode a quick spin on the seL4 kernel, you may take the following
steps:
# Download the seL4 kernel:
!./tool/ports/prepare_port sel4
# Create a Genode build directory for seL4:
!./tool/create_builddir sel4_x86_32
# Change to the build directory and start the _base/run/printf.run_ script:
!cd build/sel4_x86_32
!make run/printf
After compiling the Genode components (init, core, and test-printf), the run
script will build the kernel, integrate a boot image, and run the image inside
Qemu. You will be greeted with the output of the test-printf program, which
demonstrates that core, init, and test-printf are running (each in a different
protection domain) and that the components can interact with each other by the
means of capability invocations.
NOVA kernel mechanism for asynchronous notifications
====================================================
The vanilla NOVA kernel provides asynchronous signalling by the means of
semaphores. This mechanism offers a way to transfer one bit of information
from a sender to one receiver at a time. So a thread may block by issuing a
"down" operation on a semaphore and wakes up as soon as the sender issues an
"up" operation. However, Genode's signal abstraction for asynchronous
notification requires that a receiver may potentially receive from multiple
sources at a time, which made this kernel feature unsuitable for direct use
by Genode's signal framework.
Instead, for base-nova, the signalling phase was implemented as an indirection
over core for each Genode signal that got submitted. After an initial
registration at core to ask for incoming signals, a receiver blocks in its own
address space on a per-thread semaphore until a signal becomes available. The
signalling phase looked as follows:
# A signal source (thread) generates a Genode signal by sending a synchronous
message via an RPC to core,
# Core notifies the receiver asynchronously via a kernel semaphore "up"
operation,
# The receiver's blocking IPC returns.
The context information about the signal is delivered with the IPC reply.
Besides all the book keeping in core, this approach requires at least 4
inter-address-space context switches. Ideally, this could be just one context
switch with a proper kernel mechanism in place.
In the course of updating the platform driver and the redesign of Genode's IRQ
session interface to operate asynchronously across all supported kernels, we
took the chance to extend the NOVA kernel to meet Genode's needs more closely.
We extended the NOVA kernel semaphores to support signalling via chained
semaphores. This extension enables the creation of kernel semaphores with a
per-semaphore value, which can be bound to another kernel semaphore. Each
bound semaphore corresponds to a Genode signal context. The per-semaphore
value is used to distinguish different sources of signals. Now, a signal
sender issues a _submit_ operation on a Genode signal capability via a regular
_semaphore-up_ syscall on NOVA. If the kernel detects that the used semaphore
is chained to another semaphore, the up operation is delegated to the chained
one. If a thread is blocked, it gets woken up directly and the per-semaphore
value of the bound semaphore gets delivered. In case no thread is currently
blocked, the signal is stored and delivered as soon as a thread issues the
next _semaphore-down_ operation.
Chaining semaphores is an operation that is limited to a single level, which
avoids attacks targeting endless loops in the kernel. The creation of such
signals can solely be performed if the issuer has a NOVA PD capability with
the semaphore-create permission set. On Genode, this effectively reserves the
operation to core. Furthermore, our solution upholds the invariant of the
original NOVA kernel that a thread may be blocked in only one semaphore at a
time. This makes our extension non-invasive and easily maintainable.
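The semantics of chained semaphores can be illustrated by a small user-level
model. It is not the kernel implementation: blocking is replaced by a
non-blocking 'try_down', pending signals are kept in a queue, permission
checks are omitted, and all names are hypothetical. The model shows how an
"up" on a bound semaphore is delegated to the semaphore it is chained to, how
the per-semaphore value identifies the signal source, and how chaining is
restricted to a single level.

```cpp
#include <cassert>
#include <deque>

class Sem_model
{
    unsigned long             _value;             /* identifies the source */
    Sem_model                *_chained = nullptr; /* target of delegation */
    unsigned                  _sources = 0;       /* semaphores bound to us */
    std::deque<unsigned long> _pending;           /* not yet consumed ups */

  public:
    explicit Sem_model(unsigned long value = 0) : _value(value) { }

    /* bind this semaphore to 'target', refuse more than one level */
    bool chain_to(Sem_model &target)
    {
        if (&target == this || _chained || _sources || target._chained)
            return false;
        _chained = &target;
        target._sources++;
        return true;
    }

    /* delegate to the chained semaphore if there is one */
    void up()
    {
        assert(!_chained || _sources == 0);   /* single-level invariant */
        Sem_model &dst = _chained ? *_chained : *this;
        dst._pending.push_back(_value);
    }

    /* non-blocking stand-in for "down", delivers the value of the source */
    bool try_down(unsigned long &value)
    {
        if (_pending.empty()) return false;
        value = _pending.front();
        _pending.pop_front();
        return true;
    }
};
```

In this model, a receiver waits on one semaphore only and learns from the
delivered value which signal context was triggered, which mirrors the
invariant that a thread may be blocked in only one semaphore at a time.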
We applied the same principle to the delivery of interrupts by the NOVA
kernel, which corresponds to a _semaphore up_ operation. With minor changes,
we are now able to deliver interrupts as ordinary Genode signals. The main
benefits are a vastly simplified IRQ-session implementation in core and the
alleviation of the need for one thread per interrupt. The interrupt gets
directly delivered to the address space of the driver (MSI), or in case of a
shared interrupt, to the PCI driver.
Tool chain and build system
###########################
The tool chain has been updated to Binutils version 2.25 and GCC version 4.9.2.
This update comprises both the cross tool chain running on Linux as
development environment and the tool chain running within Genode's Noux
runtime environment.
To use Genode 15.05, please obtain and install the new binary version of the
tool chain available at [https://genode.org/download/tool-chain] or build it
manually via the _tool/tool_chain_ script.
Removal of deprecated features
##############################
The following parts have been pruned from the Genode source tree:
* We declared the support for Qt4 as deprecated in 2013. Since we switched
to Qt version 5 on Genode long ago, we finally removed the
_repos/qt4/_ repository.
* The _repos/base-host/_ repository was originally envisioned to be the ideal
place to document the framework-internal interfaces between the
kernel-agnostic and kernel-specific parts of the framework. It was
meant to provide mere stub functions that enable the compilation of
Genode-API-compliant code directly using the host compiler. However, it
remained an obscurity. Since it is neither used nor regularly tested, we
decided to remove it.
* The GTA01 platform support was originally added in 2006 to run Genode
on the Gamepark GP2x handheld console. The code remained unused and
unmaintained for several years.
* The original ATAPI driver is superseded by our new AHCI driver, which
in principle also supports ATAPI devices. However, IDE support has been
dropped as it is not relevant on our current-day target platforms.
* The demo device driver (D3M) was created for the OKL4-based live system
released in 2010. Since then, it was in irregular use for a few
demonstration scenarios but has never evolved into a fully-fledged driver
manager. Since all of D3M's functionality except for the probing of boot
media is covered by a combination of other components, we decided to remove
D3M.
* The _linux_drivers_ repository hosted device drivers ported via the
original DDE-Linux approach. We
[https://genode.org/documentation/release-notes/12.05#Re-approaching_the_Linux_device-driver_environment - abandoned this approach]
in 2012. The only remaining code worth keeping is the i915 GPU driver, which
will potentially re-appear in our modern _repos/dde_linux_ repository.
* The _repos/dde_oss_ was an experiment to run the audio drivers of the
OSS project directly on Genode. Unfortunately, the contained Intel HD Audio
driver did not work on any Thinkpad models newer than T60. With the current
release, this repository is superseded by the _repos/dde_bsd_ repository.