mirror of
https://github.com/genodelabs/genode.git
synced 2025-01-25 05:46:57 +00:00
1041 lines
52 KiB
Plaintext
1041 lines
52 KiB
Plaintext
|
|
||
|
|
||
|
===========================================
|
||
|
Genode on seL4 - Porting the core component
|
||
|
===========================================
|
||
|
|
||
|
|
||
|
Norman Feske
|
||
|
|
||
|
|
||
|
This is the third part of a series of hands-on articles about bringing Genode
|
||
|
to the seL4 kernel.
|
||
|
[http://genode.org/documentation/articles/sel4_part_1 - Read the first part here...]
|
||
|
[http://genode.org/documentation/articles/sel4_part_2 - Read the second part here...]
|
||
|
|
||
|
After exercising the seL4 kernel interface using a small toy root task, it
|
||
|
is time to run Genode's core component as seL4 root task.
|
||
|
We face the following challenges:
|
||
|
* Building, linking, and running a skeleton of core on the seL4 kernel,
|
||
|
* Initializing core's allocators,
|
||
|
* Organizing core's capability space (CSpace),
|
||
|
* Realizing core-local memory mappings,
|
||
|
* Initializing core's services while complementing the framework's base
|
||
|
libraries such as the IPC marshalling and thread synchronization mechanisms,
|
||
|
* Finding a way to integrate multiple boot modules into core's ROM service,
|
||
|
* Bootstrapping the init component.
|
||
|
|
||
|
|
||
|
A skeleton of the core component
|
||
|
################################
|
||
|
|
||
|
Core is the root of Genode's component tree. On seL4 it will play the role
|
||
|
of the seL4 root task.
|
||
|
The seL4-specific parts of core will reside at _repos/base-sel4/src/core/_
|
||
|
whereas the kernel-agnostic parts are located at _repos/base/src/core/_.
|
||
|
|
||
|
It is useful to take one of the existing variants of core as a starting point.
|
||
|
With less than 800 lines of code, the variant for the Codezero kernel
|
||
|
(as _repos/base-codezero/src/core/_) is the least complex one. The skeleton
|
||
|
for seL4 can be derived from the Codezero version by mirroring the files
|
||
|
and reducing them to a bare minimum. While doing the latter, we already revisit
|
||
|
the core-internal API that is relevant for our porting work. While stripping
|
||
|
down the files, we replace the bodies of all methods and functions by the
|
||
|
statement 'PDBG("not implemented")'. The PDBG macro will nicely print the name
|
||
|
of the function or method when called.
|
||
|
|
||
|
:_core_rm_session.cc_ and _include/core_rm_session.h_:
|
||
|
The supplements for the core-specific RM session that is used to manage the
|
||
|
virtual address space of core. I.e., it hosts the implementation of the
|
||
|
'attach' method, which makes a dataspace visible within core.
|
||
|
|
||
|
:_thread_start.cc_:
|
||
|
The core-specific parts of the 'Thread' API. All components on top of
|
||
|
core create threads by using core's services. But for core's local
|
||
|
threads (which are needed to implement core's services), those services
|
||
|
are unavailable. So the functions 'start'. 'cancel_blocking', and
|
||
|
'_thread_start', '_init_platform_thread' are implemented differently
|
||
|
for core.
|
||
|
|
||
|
:_rm_session_support.cc_:
|
||
|
The way of how a virtual memory page is unmapped from a virtual address
|
||
|
space differs from kernel to kernel. This file hosts the kernel-specific
|
||
|
implementation of 'Rm_client::unmap'.
|
||
|
|
||
|
:_ram_session_support.cc_:
|
||
|
Each RAM dataspace is cleared when allocated. In order to initialize
|
||
|
its content with zeros, core has to temporarily make the dataspace
|
||
|
visible within its virtual address space. The exact procedure depends
|
||
|
on the used kernel and is expressed by the '_clear_ds' method.
|
||
|
|
||
|
:_include/platform_thread.h_ and _platform_thread.cc_:
|
||
|
The 'Platform_thread' class contains the functionality for creating and
|
||
|
manipulating threads. It is the back end of core's CPU service.
|
||
|
|
||
|
:_include/platform_pd.h_ and _platform_pd.cc_:
|
||
|
The 'Platform_pd' class represents a protection domain on the specific
|
||
|
kernel. It is the back end of core's PD service and provides an interface
|
||
|
to associate threads with their protection domain, and to flush parts
|
||
|
of the protection domain's virtual address space.
|
||
|
|
||
|
:_include/platform.h_ and _platform.cc_:
|
||
|
The 'Platform' class provides a core-internal interface for accessing
|
||
|
platform resources, i.e., physical memory, core-virtual memory,
|
||
|
I/O ports, memory-mapped I/O resources, and boot-time ROM modules.
|
||
|
The 'Platform' constructor performs the root-task initialization.
|
||
|
|
||
|
:_irq_session_component.cc_:
|
||
|
This file contains the kernel-specific way of waiting for device
|
||
|
interrupts and acknowledging them.
|
||
|
|
||
|
:_cpu_session_support.cc_:
|
||
|
This file contains a function to return a user-level thread control
|
||
|
block (aka IPC buffer) of a given thread as RAM dataspace.
|
||
|
|
||
|
:_io_mem_session_support.cc_:
|
||
|
This file contains the implementation of the core-local interface
|
||
|
for making memory-mapped I/O resources available to core. It is
|
||
|
only needed on kernels that employ a traditional L4-style mapping
|
||
|
database. In order to hand out a MMIO page to another component,
|
||
|
root task must keep a local mapping of the page. On seL4, this is
|
||
|
fortunately not needed.
|
||
|
|
||
|
:_include/map_local.h_:
|
||
|
This core-local header contains the functionality to make a given
|
||
|
physical memory range visible within core's local address space.
|
||
|
|
||
|
:_include/util.h_:
|
||
|
This header is the designated place to keep kernel-specific utilities
|
||
|
that simplify the use of the kernel bindings. It also contains a few
|
||
|
policy functions that are called by the generic code. E.g., the function
|
||
|
'contain_map_size_log2' controls the use of large-page mappings.
|
||
|
|
||
|
To compile and link the skeleton of core, the _target.inc_ and _target.mk_
|
||
|
files of the Codezero version could be reused almost unchanged. The only
|
||
|
missing piece is the _core_printf_ library, which contains the console
|
||
|
implementation to be used by core. The minimalistic root task that was
|
||
|
created in the first article already provides the needed functionality
|
||
|
in the form of the _base-sel4/src/base/console/core_console.h_ header.
|
||
|
All we need to do is to create a library description file
|
||
|
_lib/mk/core_printf.mk_ that compiles _base/src/base/console/core_printf.h_
|
||
|
with the seL4-specific _core_console.h_
|
||
|
|
||
|
At the link stage, the linker informs us about a number of unresolvable
|
||
|
symbols. Those are symbols that were not needed for the minimalistic root
|
||
|
task of the previous articles but are required by core. At this point, the
|
||
|
base-common sources used by the minimalistic root task will diverge from the
|
||
|
base-common sources as used by real Genode components such as core. To
|
||
|
preserve the minimalistic root task, we incorporate the previous base-common
|
||
|
version into the test/sel4 target description and leave the _base-common.mk_
|
||
|
library for the real Genode components. Now, the _base-common.mk_ library can
|
||
|
be extended with the missing features, specifically the full implementation
|
||
|
of the thread, pager, server, and signal APIs.
|
||
|
|
||
|
At the point where core can successfully be linked, the skeleton code in
|
||
|
_base-sel4/src/core/_ consists of merely 400 lines of code. To run this
|
||
|
version of core, we create the following simple run script:
|
||
|
|
||
|
! build { core }
|
||
|
! create_boot_directory
|
||
|
! build_boot_image "core"
|
||
|
! append qemu_args " -nographic -m 64 "
|
||
|
! run_genode_until forever
|
||
|
|
||
|
After applying some minor tweaks to _tool/run/boot_dir/sel4_ for instructing
|
||
|
the boot loader to start core instead of the minimalistic root task, we can
|
||
|
execute core via 'make run/core'. We are greeted by the first life sign of
|
||
|
the core skeleton:
|
||
|
|
||
|
!:phys_alloc: Allocator 106d4d8 dump:
|
||
|
!:virt_alloc: Allocator 106e534 dump:
|
||
|
!:io_mem_alloc: Allocator 106f59c dump:
|
||
|
!Could not allocate physical memory region of size 4096
|
||
|
!could not allocate backing store for new context
|
||
|
|
||
|
Core first prints the status of its allocators, which are empty.
|
||
|
Consequently, the first attempt of an allocation fails, which leads us the
|
||
|
next topic, the proper initialization of core's allocators.
|
||
|
|
||
|
|
||
|
Core's allocators
|
||
|
#################
|
||
|
|
||
|
During the platform initialization in the 'Platform' constructor in
|
||
|
_base-sel4/src/core/platform.cc_, core initializes the following
|
||
|
parameters and allocators:
|
||
|
|
||
|
:IRQ allocator:
|
||
|
Core's IRQ service hands out each interrupt only once. To track the
|
||
|
interrupts that are in use, an allocator is used. Regardless of how
|
||
|
many interrupts exists on the actual machine, this allocator
|
||
|
is initialized with the range 0 to 255.
|
||
|
|
||
|
:Physical memory:
|
||
|
Core's physical memory allocator ('_core_mem_alloc.phys_alloc()') keeps
|
||
|
track of unused physical memory.
|
||
|
On seL4, this corresponds to the untyped memory ranges as reported by
|
||
|
seL4's boot-info structure. For each range found in this structure,
|
||
|
we add a corresponding range to the physical-memory allocator.
|
||
|
|
||
|
:I/O memory:
|
||
|
Core's I/O memory allocator ('_io_mem_alloc') maintains the unused
|
||
|
I/O memory regions. It is used by the IO_MEM service for making sure
|
||
|
that each memory-mapped I/O resource is handed out only once.
|
||
|
On most kernels, Genode threats all non-RAM physical-memory ranges as
|
||
|
I/O memory. On seL4, the boot-info structure conveniently reports
|
||
|
device-memory regions, which we can use to initialize the I/O memory
|
||
|
allocator without relying on heuristics.
|
||
|
|
||
|
:Virtual memory:
|
||
|
Core stores the bounds of the available virtual memory per component
|
||
|
in the form of '_vm_base' and '_vm_size'. To detect null-pointer
|
||
|
dereferences, we exclude the first virtual-memory page by setting
|
||
|
'_vm_base' to 0x1000. Because, we address x86_32 for now, we hard-wire
|
||
|
the upper boundary to 2 GiB.
|
||
|
|
||
|
:Core's virtual memory:
|
||
|
Core's virtual-memory allocator ('_core_mem_alloc.virt_alloc()') maintains
|
||
|
the unused virtual-memory regions of core. As for any component, it is
|
||
|
initialized with '_vm_base' and '_vm_size'. However, we have to manually
|
||
|
exclude the virtual-address range of the core image and the so-called
|
||
|
thread context area. The latter is a virtual-address region designated
|
||
|
for the stacks of core's threads.
|
||
|
|
||
|
With the initialized allocators, core produces the following output:
|
||
|
|
||
|
! Starting node #0
|
||
|
! core image:
|
||
|
! virtual address range [01000000,0107f000) size=0x7f000
|
||
|
! VM area at [00001000,80000000)
|
||
|
! :phys_alloc: Allocator 106d4d8 dump:
|
||
|
! Block: [00100000,0010a000) size=0000a000 avail=0000a000 max_avail=03e0a800
|
||
|
! Block: [001f0000,03ffa800) size=03e0a800 avail=03e0a800 max_avail=03e0a800
|
||
|
! => mem_size=65095680 (62 MB) / mem_avail=65095680 (62 MB)
|
||
|
! :virt_alloc: Allocator 106e534 dump:
|
||
|
! Block: [00001000,01000000) size=00fff000 avail=00fff000 max_avail=00fff000
|
||
|
! Block: [0107f000,40000000) size=3ef81000 avail=3ef81000 max_avail=3ef81000
|
||
|
! Block: [50000000,80000000) size=30000000 avail=30000000 max_avail=30000000
|
||
|
! => mem_size=1878523904 (1791 MB) / mem_avail=1878523904 (1791 MB)
|
||
|
! :io_mem_alloc: Allocator 106f59c dump:
|
||
|
! Block: [00001000,00100000) size=000ff000 avail=000ff000 max_avail=000ff000
|
||
|
! Block: [03ffe000,20000000) size=1c002000 avail=1c002000 max_avail=dffff000
|
||
|
! Block: [20001000,00000000) size=dffff000 avail=dffff000 max_avail=dffff000
|
||
|
! => mem_size=4228907008 (4033 MB) / mem_avail=4228907008 (4033 MB)
|
||
|
! bool Genode::map_local(Genode::addr_t, Genode::addr_t, Genode::size_t): not implemented
|
||
|
! bool Genode::map_local(Genode::addr_t, Genode::addr_t, Genode::size_t): not implemented
|
||
|
! could not map phys 101000 at local 401fe000
|
||
|
|
||
|
The circumstances of the first call of 'map_local' can be revealed by adding
|
||
|
an infinite loop to the body of the 'map_local' function and attaching
|
||
|
GDB to Qemu's built-in GDB stub (as described in the
|
||
|
[http://genode.org/documentation/articles/sel4_part_1#A_root_task_for_exercising_the_kernel_interface - in the first article]).
|
||
|
The backtrace tells us that core tries to initialize the main thread
|
||
|
(_base/src/platform/init_main_thread.cc_). That is, it tries to allocate a
|
||
|
stack within the context area. As a side effect of this step, core's
|
||
|
Genode environment ('Genode::env()') gets constructed
|
||
|
(_base/src/core/include/core_env.h_).
|
||
|
The implementation of core's Genode environment differs from the version
|
||
|
used in regular components. In particular, it contains core's entrypoint
|
||
|
that provides core's services. Because this entrypoint is a thread, core
|
||
|
tries to create a new thread along with the thread's context (i.e., a stack).
|
||
|
The thread creation, in turn, triggers a memory allocation within core, which
|
||
|
ultimately results in the attempt to extend the 'core_mem_alloc'.
|
||
|
Consequently, the first call of 'map_local' refers to the memory block added
|
||
|
to core's local memory pool.
|
||
|
|
||
|
To proceed, we need to provide a working implementation of the
|
||
|
'map_local' function. This function will ultimately create and interact with
|
||
|
kernel objects such as page frames. For this reason, we first need to define
|
||
|
the way how core manages the capability space for such kernel objects.
|
||
|
|
||
|
|
||
|
Capability-space organization within core
|
||
|
#########################################
|
||
|
|
||
|
The seL4 kernel provides a flexible concept of hierarchical capability spaces
|
||
|
using guarded page tables. The freedom offered by this concept raises the
|
||
|
question of how to organize core's capability space in the best way.
|
||
|
The insights gathered in the
|
||
|
[http://genode.org/documentation/articles/sel4_part_2 - previous article]
|
||
|
were helpful to come up with following scheme:
|
||
|
|
||
|
* The 1st-level CNode has a size of 2^12. It contains up to 4096 2nd-level
|
||
|
CNodes.
|
||
|
|
||
|
* The 2nd-level CNode (called core CNode) at index 0 contains capability
|
||
|
selectors for core-local kernel objects that are unrelated to memory.
|
||
|
For example, endpoints, threads, and page directories.
|
||
|
For now, we pick a size of 16*1024 = 2^14. To align the capability selectors
|
||
|
at the least significant bit of core's CSpace addresses, we pick a guard
|
||
|
of 32 - 12 (1st level) - 14 (2nd level) = 6.
|
||
|
|
||
|
* The 2nd-level CNode (called core-VM CNode) at index 1 is designated to
|
||
|
contain the of page-frame selectors and page-table selectors currently in
|
||
|
use by core's address space.
|
||
|
|
||
|
* The 2nd-level CNode (phys CNode) at index 0xfff contains page-frame
|
||
|
capabilities whose indices correspond to their respective untyped memory
|
||
|
locations.
|
||
|
With 4 GiB of physical address space, we need a maximum of 2^20 4K frames.
|
||
|
Hence, we dimension the CNode to contain 2^20 frame capabilities.
|
||
|
With one CNode entry being 16 bytes, this CNode takes 16 MiB of
|
||
|
untyped memory.
|
||
|
The guard must be set to 0 (12 + 20 + 0 = 32).
|
||
|
Note that on 64-bit systems, the simple 1:1 correspondence between
|
||
|
frame selectors and physical frame numbers may become
|
||
|
impractical.
|
||
|
|
||
|
* The remaining 2nd-level CNodes at the indices 2 to 4095 contain CNodes
|
||
|
for non-core protection domains, in particular copies of the page-frame
|
||
|
selectors and page-table selectors in use by the corresponding protection
|
||
|
domain.
|
||
|
|
||
|
In order to construct such a CSpace, we need to create kernel objects by
|
||
|
using the kernel's "retype-untyped" operation. This operation takes an
|
||
|
untyped memory range as argument. The argument is a tuple of an untyped
|
||
|
memory selector and an offset within the untyped memory range. Core's
|
||
|
physical memory allocator, however, has no notion of untyped memory ranges
|
||
|
and their associated selectors. For this reason, we will create seL4
|
||
|
kernel objects with the following procedure:
|
||
|
|
||
|
# Calculation of the needed backing store in bytes.
|
||
|
# Allocation of the backing store from core's physical memory allocator.
|
||
|
The allocated range must lie completely within one of seL4's untyped
|
||
|
memory ranges. This can be achieved by performing the allocation
|
||
|
with an alignment that corresponds to the allocation size (natural
|
||
|
alignment).
|
||
|
# Lookup of the untyped memory range by using the physical address and
|
||
|
size of the allocated backing store as a key. The lookup returns
|
||
|
a tuple of an untyped memory selector and an offset relative to the
|
||
|
start of the untyped memory range. Within core, we call this tuple
|
||
|
'Untyped_address'.
|
||
|
# Creation of the kernel object by specifying the untyped address as
|
||
|
argument to the kernel's untyped-retype operation.
|
||
|
|
||
|
With the facility for constructing kernel objects in place, we can
|
||
|
create the CNode hierarchy of core's CSpace. After creating CNodes
|
||
|
for the different levels, the initial capability selectors can be copied
|
||
|
to the new core CNode, and a selector for core's CNode must be inserted into
|
||
|
the top-level CNode. Finally, the top-level CSpace is assigned to the
|
||
|
initial thread using the 'seL4_TCB_SetSpace' operation.
|
||
|
|
||
|
To validate the switch to the new CSpace, we can issue an 'seL4_TCB_Suspend'
|
||
|
operation onto the initial thread. This operation will work only if the
|
||
|
specified thread selector is meaningful.
|
||
|
|
||
|
|
||
|
Core-local memory mappings
|
||
|
##########################
|
||
|
|
||
|
At start time, the only memory present in core's virtual address space
|
||
|
(VSpace) is the core image along with the BSS segment of the ELF executable.
|
||
|
After the basic initialization, however, core requires dynamic memory
|
||
|
allocations that depend on the booted system scenario. In general, a
|
||
|
core-local memory allocation involves the reservation of a physical
|
||
|
memory range, the reservation of a core-virtual memory range of the same
|
||
|
size, and the insertion of MMU mappings from the core-virtual address range
|
||
|
to the physical address range. On seL4, however, a few additional considerations
|
||
|
are needed.
|
||
|
|
||
|
First, physical memory has the form of untyped memory, which cannot be used
|
||
|
as memory unless explicitly converted into page frames using the
|
||
|
retype-untyped operation. This operation results in a number of capability
|
||
|
selectors, one for each page frame. We store those frame selectors in the
|
||
|
phys CNode such that the CNode index equals the page-frame number. This
|
||
|
way, we do not need to dynamically allocate selector slots over the
|
||
|
lifetime of the system and can easily infer the capability selector of the
|
||
|
page frame for a given physical address. The conversion of untyped
|
||
|
memory to page frames and similar functionality will go to a dedicated header
|
||
|
file _untyped_memory.h_. The function 'Untyped_memory::convert_to_page_frames'
|
||
|
takes a physical address and the number of page frames as argument and nicely
|
||
|
wraps the seL4 retype-untyped operation.
|
||
|
|
||
|
Second, to insert a page frame into a virtual address space, we potentially
|
||
|
also need to allocate and insert a page table, depending on where the
|
||
|
frame should be inserted. Initially, the kernel sets up the page table of
|
||
|
core that spans the core image. We have to distinguish the case where a
|
||
|
frame is inserted at an address where a page table is already installed
|
||
|
(somewhere within the initial set of page tables) from the case where
|
||
|
the frame is inserted at a location with no page table installed. To
|
||
|
distinguish both cases, we have to track the information where page
|
||
|
tables are installed. This is accomplished with a data structure
|
||
|
called 'Page_table_registry', which must be instantiated per virtual
|
||
|
address space. This registry maintains a list of
|
||
|
page table selectors along with their corresponding virtual addresses.
|
||
|
When inserting a page frame at a virtual address, the page-table registry
|
||
|
tells us whether or not we need to install a page table first.
|
||
|
|
||
|
Third, since each page-frame selector can be inserted into only one
|
||
|
page table, we need to create a copy of the page-frame selector and insert
|
||
|
the copy into the page table. Otherwise, a page frame could be present in
|
||
|
only a single virtual address space at a time, which contradicts with Genode's
|
||
|
dataspace concept. This raises the question how to allocate the copies
|
||
|
of the page frame selectors and how to name them. The resource demands for
|
||
|
these copies will ultimately depend on the behaviour of the components.
|
||
|
For example, if a component decides to allocate a small dataspace of 4 KiB
|
||
|
and attaches the dataspace a large number of times within its address space,
|
||
|
we need to allocate a copy of the page frame selector for each attachment.
|
||
|
The preallocation of a large-enough CNode as we do for the physical
|
||
|
page-frame selectors is out of question because we would need to preallocate
|
||
|
a 16-MiB CNode for each virtual address space. This is for a 32-bit system.
|
||
|
On a 64-bit system, this approach would become entirely impractical.
|
||
|
Hence, we need to allocate the frame selector copies in an as-needed fashion.
|
||
|
This, however, requires us maintain an associative data structure to
|
||
|
store the relationship between a frame-selector copy and its corresponding
|
||
|
virtual address. This data structure is quite similar to the page-table
|
||
|
registry described above. For this reason, the problem is covered by
|
||
|
extending the page-table registry with the ability to register tuples
|
||
|
of (page frame selector, virtual address).
|
||
|
|
||
|
Fourth, frame selector copies must be allocated from some CNode. Because
|
||
|
the allocation demands depend on the behaviour of the components, we cannot
|
||
|
use a global CNode shared between all components. Instead, we have to
|
||
|
create so-called VM CNode for each virtual address space. This CNode is
|
||
|
used to allocate page-table selectors and frame selector copies for a
|
||
|
particular virtual memory space. Within core, VM CNodes are made visible
|
||
|
as 2nd-level CNodes. One problem remains: The VM Cnode is dimensioned at
|
||
|
creation time of the virtual address space. By intensely populating a
|
||
|
virtual address space, we may run out of selectors. Worse, this condition
|
||
|
will trigger during the page-fault handling. So no error condition can be
|
||
|
reflected to the component. However, fortunately,
|
||
|
if we encounter such a situation, we can safely discard existing selectors
|
||
|
via ('seL4_CNode_Delete') and reuse the selectors for different frames and
|
||
|
page tables. This would result in the eviction of MMU mappings in this
|
||
|
situation. So this concept basically corresponds to a software-loaded TLB
|
||
|
implemented in software.
|
||
|
|
||
|
The problems described above are addressed by the 'Vm_space' class that
|
||
|
takes care of the VM CNode creation and the allocation of VM-specific
|
||
|
selectors from the VM CNode. We can skip the implementation of the eviction
|
||
|
mechanism for now. In order to accommodate the needs of the 'map_local'
|
||
|
function, a 'Vm_space::map' method suffices.
|
||
|
|
||
|
To solve the problem for the initial call of 'map_local' for a memory
|
||
|
block allocated from 'Platform::core_mem_alloc', the platform-specific
|
||
|
implementation of 'Core_mem_allocator::Mapped_mem_allocator::_map_local'
|
||
|
has to perform the retype-untyped operation of the physical memory range
|
||
|
before calling 'map_local' ('map_local.h'). The 'map_local' function
|
||
|
performs the actual VM-space manipulation by calling 'Vm_space::map'.
|
||
|
With this implementation, the first call of 'map_local' succeeds.
|
||
|
However, the second call of 'map_local' fails with the following error:
|
||
|
|
||
|
! map_local from_phys=0x1e5000, to_virt=0x401fe000, num_pages=2
|
||
|
! <<seL4 [decodeCNodeInvocation/107 Te01e9880 @100092c]:
|
||
|
! CNode Copy/Mint/Move/Mutate: Source slot invalid or empty.>>
|
||
|
|
||
|
The virtual address of 0x401fe000 lies within the thread-context area, which
|
||
|
indicates that 'map_local' was called to install a stack. To investigate the
|
||
|
circumstances of the call more closely, it is useful to equip the 'map_local'
|
||
|
function with a "ghetto breakpoint":
|
||
|
|
||
|
! inline bool map_local(addr_t from_phys, addr_t to_virt, size_t num_pages)
|
||
|
! {
|
||
|
! PDBG("map_local from_phys=0x%lx, to_virt=0x%lx, num_pages=%zd",
|
||
|
! from_phys, to_virt, num_pages);
|
||
|
!
|
||
|
! if (to_virt == 0x401fe000)
|
||
|
! for (;;);
|
||
|
! ...
|
||
|
! }
|
||
|
|
||
|
A backtrace at the point where core gets stuck at the "breakpoint" reveals
|
||
|
that core tries to attach a thread context dataspace to the context area by
|
||
|
calling the 'Context_area_rm_session::attach' function. This function
|
||
|
allocates a physical memory range from core's phys alloc, manually creates
|
||
|
a 'Dataspace_component' object, and calls 'map_local'. Clearly, the step for
|
||
|
converting untyped memory to page frames (as we already did for
|
||
|
'core_mem_alloc' is missing here. So we have to modify the function in
|
||
|
perform the conversion. Unfortunately, the 'context_area.cc' is
|
||
|
generic source code located in _repos/base/_. Since we do not want
|
||
|
to touch generic code with the seL4 port, we have to create a copy of the file
|
||
|
to _repose/base-sel4/src/core/_ for now. Of course, this code duplication
|
||
|
is frowned upon. We will eventually need to add the call of a hook function to
|
||
|
the generic implementation. So the 'attach' function could remain generic and
|
||
|
only the hook function would have a platform-specific implementation. However,
|
||
|
to prevent us from becoming side tracked with changing core interfaces, we
|
||
|
leave this step for later. In the base-sel4-local copy of _context_area.cc_,
|
||
|
we insert a call to 'Untyped_memory::convert_to_page_frames'.
|
||
|
|
||
|
|
||
|
Enabling core's services
|
||
|
########################
|
||
|
|
||
|
With the core-internal memory management in place, core succeeds with
|
||
|
allocating the stack for core's RPC entrypoint. This is followed by the
|
||
|
attempt to create and start the entrypoint's thread as indicated by the
|
||
|
following debug message:
|
||
|
|
||
|
!void Genode::Thread_base::_init_platform_thread(...): not implemented
|
||
|
|
||
|
|
||
|
Core-local thread creation
|
||
|
==========================
|
||
|
|
||
|
We have to implement the functionality of _core/thread_start.cc_. The
|
||
|
'_init_platform_thread' function creates the in-kernel thread control block
|
||
|
(TCB) along with the thread's IPC buffer. The seL4 IPC buffer corresponds
|
||
|
to the UTCB (user-level thread control block) as present on other L4 kernels.
|
||
|
Genode places the UTCB of a thread along with the thread's stack in a slot
|
||
|
of the context area. All we need to do to reserve the virtual address range
|
||
|
for the IPC buffer is to tell Genode about the buffer size.
|
||
|
This is done by defining the 'Genode::Native_utcb' type to contain an
|
||
|
array that corresponds to the UTCB size. As we will use a dedicated memory
|
||
|
page for each thread, the array is dimensioned to span 4096 bytes. The
|
||
|
'Thread_base' class stores platform-specific thread meta data in a variable
|
||
|
of the 'Native_thread'. On seL4, we use this variable to store the selector
|
||
|
of the thread's TCB.
|
||
|
|
||
|
The thread creation consists of two steps. The first step is the physical
|
||
|
creation of the thread in the function '_init_platform_thread' and involves
|
||
|
the allocation of the thread's IPC buffer, the creation of the TCB, the
|
||
|
assignment of the IPC buffer to the TCB, the definition of the TCBs
|
||
|
scheduling priority, and the association of the TCB with core's protection
|
||
|
domain. The second step is the start of the thread execution via the
|
||
|
'start' function. It defines the initial register state of the instruction
|
||
|
pointer and stack pointer, and kicks off the execution via 'seL4_TCB_Resume'.
|
||
|
|
||
|
The thread-creation procedure is the first instance where seL4 kernel objects
|
||
|
are created without statically defined capability selectors. Hence, we need
|
||
|
to equip core with an allocator for managing the selectors of the core CNode.
|
||
|
|
||
|
At the current stage, we have not taken any precaution for the synchronization
|
||
|
of threads. Hence, when starting the execution of the new thread, we are
|
||
|
likely to corrupt core-internal state. For this reason, it is sensible to
|
||
|
modify the thread's entry code as follows:
|
||
|
|
||
|
!void Thread_base::_thread_start()
|
||
|
!{
|
||
|
! int dummy;
|
||
|
! PDBG("called, stack at 0x%p, spinning...", &dummy);
|
||
|
! for (;;);
|
||
|
! ..
|
||
|
!}
|
||
|
|
||
|
This way, we will see a life sign of the new thread without causing any
|
||
|
rampage. With the implementation of core's _thread_start.cc_, we are greeted
|
||
|
with the following message:
|
||
|
|
||
|
!static void Genode::Thread_base::_thread_start():
|
||
|
! called, stack at 0x401fef8c, spinning...
|
||
|
|
||
|
The address (in particular the leading 0x4 digit) of the local 'dummy'
|
||
|
variable tells us that the stack of the new thread was properly allocated
|
||
|
within the context area.
|
||
|
|
||
|
|
||
|
Thread synchronization
|
||
|
======================
|
||
|
|
||
|
After the initial thread has created and started core's entrypoint thread, it
|
||
|
blocks until the entrypoint's initialization completes. This startup
|
||
|
synchronization uses Genode's 'Lock' mechanism. The default lock
|
||
|
implementation relies on a kernel mechanism to block and wake up
|
||
|
threads. For example, Genode uses futexes on Linux, kernel semaphores on
|
||
|
NOVA, and vIRQ objects on Fiasco.OC. On seL4, the natural approach would be the
|
||
|
use of an asynchronous endpoint per thread. However, as we are currently
|
||
|
concerned about the core services, we are at risk to become side tracked
|
||
|
by the implementation of the proper lock. Hence, for the current state,
|
||
|
we can use a simple yielding spinlock as an interim solution. Of all supported
|
||
|
kernels, L4/Fiasco is the only kernel that lacks a proper base mechanism for
|
||
|
building synchronization primitives. Hence, we use to employ a yielding
|
||
|
spinlock on this kernel. We can simply borrow the lock implementation from
|
||
|
_base-fiasco/src/base/lock/_ and _base-fiasco/include/base/cancelable_lock.h_.
|
||
|
|
||
|
With the stop-gap lock implementation in place, we can activate
|
||
|
the regular '_thread_start' function and allow core's initialization to
|
||
|
proceed. We are greeted with the following messages:
|
||
|
|
||
|
!void Genode::Ram_session_component::_export_ram_ds(...): not implemented
|
||
|
!void Genode::Ram_session_component::_clear_ds(...): not implemented
|
||
|
!virtual Genode::Rm_session::Local_addr
|
||
|
! Genode::Core_rm_session::attach(...): not implemented
|
||
|
!Caught cap fault in send phase at address 0x0
|
||
|
!while trying to handle:
|
||
|
!vm fault on data at address 0x0 with status 0x6
|
||
|
!in thread 0xe0209880 at address 0x104c8aa
|
||
|
!void Genode::Ipc_istream::_wait(): not implemented
|
||
|
!void Genode::Ipc_ostream::_send(): not implemented
|
||
|
!void Genode::Ipc_istream::_wait(): not implemented
|
||
|
!...
|
||
|
|
||
|
|
||
|
The difficulty in interpreting these messages lies in the fact that we have
|
||
|
now more than one thread running. So we have to carefully analyze each
|
||
|
message individually. But we can already gather quite a bit of information
|
||
|
by just looking at the output:
|
||
|
|
||
|
* Core tries to allocate a RAM dataspace by invoking the RAM service. At
|
||
|
the allocation time of a new RAM dataspace, core clears the underlying
|
||
|
backing store by calling the '_clear_ds' function. Besides the obvious
|
||
|
fact that we have not yet implemented the function, it would be interesting
|
||
|
to learn the exact circumstance of its call.
|
||
|
|
||
|
* The VM fault at address 0 (the message comes from the kernel) is most
|
||
|
likely a subsequent fault of the missing back-end functionality of the RAM
|
||
|
service. It looks like a de-referenced null pointer returned by the
|
||
|
non-implemented 'Core_rm_session::attach' function. A quick look at the
|
||
|
reported instruction pointer 0x104c8aa via 'objdump -lSd' reveals that
|
||
|
the fault happens in the 'malloc' implementation of the C++ runtime's
|
||
|
internal heap. A plausible explanation could be that the C++ runtime
|
||
|
tried to perform a memory allocation (e.g., for allocating an exception
|
||
|
frame), which exceeds the 'initial_block' of the allocator. So the runtime's
|
||
|
heap tries to expand its backing store by allocating and using a new RAM
|
||
|
dataspace. So the calls of the 'Ram_session_component' functions are
|
||
|
indirectly caused by the attempt to expand the C++ runtime's heap.
|
||
|
|
||
|
* The 'Ipc_istream::_wait' function is called by the just-created RPC
|
||
|
entrypoint thread. The messages repeat infinitely. To work on the
|
||
|
RAM dataspace allocation, we will first need to silence those messages
|
||
|
by adding an infinite loop to '_wait' in _base-sel4/src/base/ipc/ipc.cc_.
|
||
|
|
||
|
|
||
|
RAM session supplements
|
||
|
=======================
|
||
|
|
||
|
To enable core to work with RAM dataspaces, we have to focus on the functions
|
||
|
of _ram_session_support.cc_. A RAM dataspace is allocated from core's physical
|
||
|
memory allocator. At allocation time, however, the underlying backing store
|
||
|
is still untyped memory. To use it as RAM dataspace, we need to convert it to
|
||
|
physical page frames as we already did for the core memory allocator. Placing
|
||
|
a call of 'Untyped_memory::convert_to_page_frames' in '_export_ram_ds' does
|
||
|
the trick.
|
||
|
|
||
|
With the implementation of '_clear_ds', we can follow the pattern as used on
|
||
|
other kernels. The OKL4 kernel is particularly useful because - like seL4 -
|
||
|
it populates virtual address spaces via an asynchronous map operation that
|
||
|
targets physical memory. While implementing this function, it becomes
|
||
|
apparent that we need to implement the counterpart of 'map_local' called
|
||
|
'unmap_local'. As a side effect, we need to implement 'Cnode_base::remove'
|
||
|
(for deleting a selector from a CNode) and 'Vm_space::unmap'.
|
||
|
As for the '_clear_ds' function, we can take the OKL4 version of core as
|
||
|
blue print for implementing the 'Core_rm_session::attach' function.
|
||
|
|
||
|
At this point, the logging of 'map_local' and 'unmap_local' operation
|
||
|
starts to produce a lot of noise. So it is a good time to remove this
|
||
|
no-longer needed messages. Now we are rewarded with the first real life
|
||
|
signs of core:
|
||
|
|
||
|
!Genode 15.02-336-g7b52dcd <local changes>
|
||
|
!int main(): --- create local services ---
|
||
|
!int main(): --- start init ---
|
||
|
!Could not open ROM session for module "init"
|
||
|
!ROM module "init" not present
|
||
|
!virtual void Genode::Pager_activation_base::entry(): not implemented
|
||
|
!void Genode::Ipc_istream::_wait(): not implemented
|
||
|
!void Genode::Ipc_ostream::_marshal_capability(...): not implemented
|
||
|
!void Genode::Ipc_client::_call(): not implemented
|
||
|
!void Genode::Ipc_ostream::_marshal_capability(...): not implemented
|
||
|
!void Genode::Ipc_client::_call(): not implemented
|
||
|
!Transfer CPU quota, core -> init, no reference relation
|
||
|
!int main(): transferred 45 MB to init
|
||
|
!void Genode::Ipc_client::_call(): not implemented
|
||
|
!void Genode::Ipc_istream::_unmarshal_capability(...): not implemented
|
||
|
!Genode::Platform_pd::Platform_pd(...): not implemented
|
||
|
!int main(): --- init created, waiting for exit condition ---
|
||
|
!void Genode::Ipc_istream::_wait(): not implemented
|
||
|
|
||
|
Apparently, the initialization of all core services has succeeded. This means
|
||
|
that the core-internal mechanics (threading, synchronization, memory
|
||
|
management) are running fine. Other than that, the output tells us the
|
||
|
following things:
|
||
|
|
||
|
* Core tries to obtain the ELF image for the init component. As we do not
|
||
|
have any ROM module available, this step fails.
|
||
|
|
||
|
* Core spawns the pager thread (Pager activation) for handing incoming page
|
||
|
faults. The page fault handler is not implemented yet.
|
||
|
|
||
|
* The core-internal threads try to communicate with each other using IPC.
|
||
|
The IPC library, however, is not implemented yet. Hence, RPC calls
|
||
|
return immediately with bogus results.
|
||
|
|
||
|
* Core tries to create the protection domain for the init component.
|
||
|
|
||
|
We have to address each of those issues in order to start the init component.
|
||
|
|
||
|
|
||
|
Capability lifetime management and IPC
|
||
|
======================================
|
||
|
|
||
|
Of the open points listed in the previous section, the RPC communication
|
||
|
between threads (also referred to as IPC) is the most challenging one.
|
||
|
Without going into extreme details, the following considerations are
|
||
|
worth mentioning.
|
||
|
|
||
|
|
||
|
Representation of Genode capabilities
|
||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
|
||
|
For the time being, we will represent a Genode capability as a tuple of
|
||
|
a badged seL4 endpoint selector and a unique ID. The endpoint can be
|
||
|
used to invoke the capability whereas the unique ID enables components
|
||
|
to re-identify capabilities. For a discussion of the latter problem,
|
||
|
please refer to the corresponding thread at the
|
||
|
[http://sel4.systems/pipermail/devel/2014-November/000112.html - seL4 mailing list].
|
||
|
It is important to highlight that the unique ID is merely a stop-gap
|
||
|
solution as it has two unwelcome ramifications. First, by using globally
|
||
|
unique values, components can see global information, which could be misused
|
||
|
to construct covert storage channels. Second, the integrity of the unique
|
||
|
IDs is not protected by the kernel. A misbehaving component could forge this
|
||
|
part of the capability representation to interact with RPC objects without
|
||
|
authority. The second problem exists when capabilities are used as arguments
|
||
|
to RPC calls where the receiver of the call has not created the capability
|
||
|
in the first place. Even though this a rare case, it exists within Genode
|
||
|
in the form of session-control functions of Genode's parent interface.
|
||
|
For closing a session, a child component specifies a session capability to
|
||
|
the parent. The session capability, however, was originally created by a
|
||
|
server component, not the parent.
|
||
|
|
||
|
To remedy the problem, there exist a few approaches such as using core as
|
||
|
a proxy for capability delegations, using multiple seL4 selectors to
|
||
|
represent a single Genode capability, or using non-global hint values instead
|
||
|
of unique IDs combined with a selector-compare operation. However, even
|
||
|
though those attempts greatly differ with respect to complexity, performance
|
||
|
overhead, and awkwardness, none of them is both effective and easy to
|
||
|
implement. So we will stick with the unique IDs for now.
|
||
|
|
||
|
|
||
|
Manufacturing and managing capabilities
|
||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
|
||
|
Genode capabilities are created and destroyed by core's CAP service. Hence,
|
||
|
the 'Cap_session_component' interface must be implemented. The 'alloc'
|
||
|
function takes an entrypoint capability as argument and produces a new
|
||
|
object identity by minting the entrypoint's endpoint selector. The badge
|
||
|
of the minted endpoint selector is set to a freshly allocated unique ID.
|
||
|
|
||
|
While working with a Genode capability within the local component, we need
|
||
|
to keep track of lifetime of the Genode capability. If a Genode capability
|
||
|
is no longer in use, we want to remove the corresponding endpoint selector
|
||
|
from the component's CSpace. For this reason, Genode capabilities are
|
||
|
implemented as reference-counted smart pointers. The framework-internal
|
||
|
representation of those smart pointers differs between core and non-core
|
||
|
component. Within core, we need to maintain the information about which CAP
|
||
|
session was used to create a given capability to ensure that each capability
|
||
|
can be freed only via the CAP session that created it. Outside of core, this
|
||
|
information is not needed. Because of this difference between core and
|
||
|
non-core, the 'Capability_space_sel4' implementation in
|
||
|
_base-sel4/base/internal/capability_space_sel4.h_ is implemented as a
|
||
|
class template.
|
||
|
|
||
|
The capability-space management is implemented in a way that cleanly
|
||
|
separates seL4-specific code from kernel-agnostic code. The latter part
|
||
|
will eventually move to the _base_ repository to be adapted by other base
|
||
|
platforms.
|
||
|
|
||
|
|
||
|
Capability invocation
|
||
|
~~~~~~~~~~~~~~~~~~~~~
|
||
|
|
||
|
The functionality needed to call RPC objects is implemented in
|
||
|
_base-sel4/src/base/ipc/_ and the closely related definition of the message
|
||
|
buffer at _base-sel4/include/base/ipc_msgbuf.h_. The _ipc/ipc.cc_ file hosts
|
||
|
the implementation of the low-level IPC mechanism using the seL4 system calls
|
||
|
'seL4_Call', 'seL4_Wait', and 'seL4_ReplyWait'. The conversion between seL4
|
||
|
IPC messages and Genode's message buffer as defined in _ipc_msgbuf.h_ is
|
||
|
performed by the functions 'new_sel4_message' and 'decode_sel4_message'.
|
||
|
|
||
|
|
||
|
Building and linking the first non-core component
|
||
|
#################################################
|
||
|
|
||
|
With the RPC mechanism working between core-local threads, it is time to
|
||
|
consider the creation of the first non-core component, namely init. To build
|
||
|
the init binary, however, there are still a few bits missing.
|
||
|
|
||
|
! sel4_x86_32> make init
|
||
|
! checking library dependencies...
|
||
|
! Library-description file base.mk is missing
|
||
|
|
||
|
The _base.mk_ library complements the _base-common.mk_ library with
|
||
|
functionality that is different from core:
|
||
|
|
||
|
* Console output is directed to core's LOG service instead of the kernel
|
||
|
debugger.
|
||
|
* The context area is managed via a nested RM session, provided by core's
|
||
|
RM service.
|
||
|
* The capability space is managed differently. The lower part of the
|
||
|
capability space is managed by core. It is populated with selectors of
|
||
|
kernel objects that are unrelated to Genode capabilities, e.g., TCB
|
||
|
selectors. The upper part is managed locally by the component.
|
||
|
* The thread API uses core's CPU service.
|
||
|
|
||
|
For creating the _base.mk_ library for seL4, we can take cues from the
|
||
|
other base platforms. It turns out that we don't need to implement much new
|
||
|
code. Most functionality can be taken from the generic _base_ repository.
|
||
|
|
||
|
|
||
|
Boot-time ROM modules
|
||
|
#####################
|
||
|
|
||
|
In contrast to most L4 kernels, seL4 does not support the loading of multiple
|
||
|
boot modules in addition to root task. If we supplied multiple multi-boot
|
||
|
modules, seL4 would regard them as individual system images in a multi-kernel
|
||
|
setup.
|
||
|
|
||
|
For this reason, we need a way to include Genode's initial ROM modules such as
|
||
|
the ELF binary of the init component, the init configuration, and further
|
||
|
ELF binaries into one ELF image with core. Fortunately, seL4 is not the first
|
||
|
kernel where we face such an issue. We can simply reuse the solution that we
|
||
|
created originally for the base-hw kernel:
|
||
|
|
||
|
* Core is built as a library.
|
||
|
* All boot modules are merged into a single object file called
|
||
|
* _boot_modules.o_. The object file
|
||
|
is created from an automatically generated assembly file that uses the
|
||
|
assembler's incbin directive to incorporate binary data. In addition to the
|
||
|
binary data, the assembly file contains the module name as meta data.
|
||
|
A dummy skeleton of a _boot_modules.s_ file resides at
|
||
|
_base-sel4/src/core/_.
|
||
|
* Both the core library and _boot_modules.o_ are linked at the boot
|
||
|
image-creation stage of Genode's run tool. The mechanism is provided by
|
||
|
the _run/tool/boot_dir/sel4_ script snippet, which is based on the hw
|
||
|
version.
|
||
|
* Core's platform initialization code (the constructor of 'Platform')
|
||
|
accesses the meta data provided by _boot_modules.o_ to populate core's
|
||
|
ROM service with ROM modules.
|
||
|
|
||
|
Merging all boot modules with core is only half the way to go. We also need
|
||
|
to make the boot modules known to core's ROM service. The ROM service
|
||
|
expect the physical address and size of each ROM module. But since each
|
||
|
ROM module is just a part of core's virtual address space, we neither
|
||
|
know its physical address nor do we have the underlying page frames present
|
||
|
in core's phys CNode. However, the seL4 boot info tells us the frame
|
||
|
selector range of core (including the boot modules) via the 'userImageFrames'
|
||
|
field. Given the virtual addresses of the start of core and each of the
|
||
|
boot modules, we can calculate the page-frame selectors that belongs to the
|
||
|
corresponding module. Still, we need to make the page selectors available
|
||
|
within the phys CNode in order to access it. Unfortunately, we do not know
|
||
|
the physical address where the 'userImageFrames' reside. In theory, however,
|
||
|
any large-enough range within the phys CNode that is not occupied with RAM
|
||
|
or device memory will do. To determine the unused ranges with the phys
|
||
|
Cnode, we introduce a new allocator called 'unused_phys_alloc'. This
|
||
|
allocator is initialized with the entirety if the address space covered by
|
||
|
the phys CNode. With each range we add to either the physical memory
|
||
|
allocator or the I/O memory allocator, we remove the range from the unused
|
||
|
phys allocator. After the initialization of the allocators completed, we
|
||
|
can allocate an unused range of the phys CNode from the unused phys allocator
|
||
|
and copy the page frames of the boot modules to this range of the phys CNode.
|
||
|
All this magic happens in the function 'Platform::_init_rom_modules'.
|
||
|
|
||
|
After this procedure, the boot modules will appear to core as being part
|
||
|
of the physical address space, even though we technically don't know the
|
||
|
actual physical addresses populated by them.
|
||
|
|
||
|
|
||
|
Bootstrapping the init component
|
||
|
################################
|
||
|
|
||
|
With the IPC and boot-module handling in place, core arrives at the following
|
||
|
point when starting the _base/run/printf.run_ script:
|
||
|
|
||
|
!boot module 'init' (371380 bytes)
|
||
|
!boot module 'test-printf' (249664 bytes)
|
||
|
!boot module 'config' (272 bytes)
|
||
|
!Genode 15.02-348-ge612bff
|
||
|
!int main(): --- create local services ---
|
||
|
!int main(): --- start init ---
|
||
|
!virtual void Genode::Pager_activation_base::entry(): not implemented
|
||
|
!int main(): transferred 43 MB to init
|
||
|
!Genode::Platform_pd::Platform_pd(): not implemented
|
||
|
!Genode::Platform_thread::Platform_thread(): not implemented
|
||
|
!virtual void Genode::Core_rm_session::detach(): not implemented
|
||
|
|
||
|
According to the log messages, core attempts to create the protection domain
|
||
|
and the main thread of the init component. Now that we start to deal with
|
||
|
more than one virtual address space, debugging with Qemu's GDB stub will
|
||
|
become difficult. Since we don't know, which address space is active when
|
||
|
attaching GDB, we cannot unambiguously create back traces or refer to
|
||
|
virtual addresses. A simple work-around for this issue is to use
|
||
|
disjoint link addresses for the core and init programs. By default, we
|
||
|
link components (including init) to 0x01000000. By adding the following line
|
||
|
to core's _target.mk_ file, we force core to use a different virtual-address
|
||
|
range:
|
||
|
! LD_TEXT_ADDR ?= 0x02000000
|
||
|
|
||
|
For bootstrapping the init component, we have to solve the creation of
|
||
|
init's kernel objects, the handling of page faults, the propagation of
|
||
|
the parent capability to the new PD, and the synchronous inter-component
|
||
|
communication between init and core.
|
||
|
|
||
|
|
||
|
Platform_thread and Platform_pd
|
||
|
===============================
|
||
|
|
||
|
Core's 'Platform_pd' and 'Platform_thread' classes use seL4 kernel primitives.
|
||
|
A 'Platform_pd' needs the following ingredients:
|
||
|
|
||
|
* A 'Page_table_registry' (the one we created in
|
||
|
Section [Core-local memory mappings]) is used to keep track of the
|
||
|
memory mappings installed in the PD's virtual address space.
|
||
|
* A seL4 page-directory object represents the PD at the kernel.
|
||
|
* A 'Vm_space' operates on the 'Page_table_registry' and invokes seL4
|
||
|
system calls to populate the seL4 page-directory object with the
|
||
|
corresponding memory mappings.
|
||
|
* A CNode for the CSpace of the PD represents the cap selector space
|
||
|
within the PD.
|
||
|
|
||
|
The CSpace of the PD is divided into two parts. The lower part is managed
|
||
|
by core and will hold selectors to kernel objects created or installed by
|
||
|
core. For example, one unbadged endpoint selector for each thread,
|
||
|
the parent capability, or a fault-handler selector for each thread.
|
||
|
The size of the lower part is defined by the
|
||
|
NUM_CORE_MANAGED_SELECTORS_LOG2 value. To allocate selectors within the
|
||
|
core-managed part of the PD's CSpace, core maintain an allocator '_sel_alloc'
|
||
|
per PD. The upper part of CSpace is managed by the PD itself. It is
|
||
|
designated for holding badged endpoint capabilities.
|
||
|
|
||
|
A 'Platform_thread' consists of an seL4 thread, an IPC buffer, and an unbadged
|
||
|
endpoint. The code for creating and starting threads outside of core is similar
|
||
|
to the code we already created for core-local threads. So we can use the
|
||
|
functionality of _thread_sel4.h_.
|
||
|
|
||
|
With the creation of PDs and non-core threads in place, the main thread
|
||
|
of init tries to fetch its first instruction from init's start of the
|
||
|
text segment (0x01000000). We can see the fault as a kernel message.
|
||
|
|
||
|
|
||
|
Page-fault handling
|
||
|
===================
|
||
|
|
||
|
To allow the init's main thread to execute, we have to resolve its page
|
||
|
faults by implementing the pager-related interfaces within core. In line
|
||
|
with all L4 kernels, the
|
||
|
kernel reflects page faults as IPC messages to the pager as defined by
|
||
|
the 'seL4_TCB_SetSpace' operation. This operation takes a PD-local
|
||
|
fault-handler selector as argument. When the PD raises a page fault, the
|
||
|
faulting thread implicitly sends a message through this selector. To direct
|
||
|
those messages to core's pager endpoint, we have to copy the endpoint selector
|
||
|
of core's pager endpoint into the CSpace of the PD. To enable the pager
|
||
|
endpoint to distinguish the origin of page faults, each fault-handler
|
||
|
selector is badged with a value that reveals the faulting thread.
|
||
|
|
||
|
The resolution of page faults differs from traditional L4 kernels, which
|
||
|
establish memory mappings as a side effect of the reply to synchronous
|
||
|
page-fault messages. In contrast, seL4 requires the pager to install a
|
||
|
mapping using an asynchronous map operation on a page-directory object.
|
||
|
It turns out that core's pager infrastructure is yet very well prepared
|
||
|
to handle this case. However, in order to keep Genode's generic code
|
||
|
untouched during the porting work, I decided to emulate the traditional
|
||
|
approach by modelling the side effect of populating the faulting address space
|
||
|
via the global 'install_mapping' function that takes the faulter's badge
|
||
|
and a mapping as argument. It is called by the
|
||
|
'Ipc_pager::reply_and_wait_for_fault' function. The implementation resides in
|
||
|
'platform_thread.cc'. It looks up the faulting thread using its faulter
|
||
|
badge as key and then installs the mapping into the thread's PD.
|
||
|
This implementation is meant as an interim solution until core accommodates
|
||
|
the asynchronous map semantics in the generic code.
|
||
|
With the page-fault handling in place, we can observe the following log
|
||
|
output:
|
||
|
|
||
|
!PF: ip=0x1000000, pf=0x1000000, write=0
|
||
|
!PF: ip=0x100000b, pf=0x1081040, write=1
|
||
|
!PF: ip=0x103d4e0, pf=0x103d4e0, write=0
|
||
|
!PF: ip=0x1023f00, pf=0x1023f00, write=0
|
||
|
!PF: ip=0x102b640, pf=0x102b640, write=0
|
||
|
|
||
|
From observing the instruction-pointer values, we can infer that the main
|
||
|
thread of the init component starts to execute. It cannot yet interact with
|
||
|
core because we have not taken care of supplying the component with a proper
|
||
|
parent capability. However, thanks to the 'seL4_DebugPutChar' system call that
|
||
|
allows us to output characters directly via the kernel, we can obtain a life
|
||
|
sign from init by adding a message to the early initialization code.
|
||
|
|
||
|
!void prepare_init_main_thread()
|
||
|
!{
|
||
|
! kernel_debugger_outstring("prepare_init_main_thread called\n");
|
||
|
!}
|
||
|
|
||
|
On the next run, we can see this message in-between the page-fault
|
||
|
log messages.
|
||
|
|
||
|
|
||
|
Non-core component startup
|
||
|
==========================
|
||
|
|
||
|
For debugging the non-core startup code, it is useful to temporary
|
||
|
use the core_printf library also for init. This way, init is able
|
||
|
to print messages directly via the kernel.
|
||
|
|
||
|
To allow init to interact with its parent (core), we need to make the
|
||
|
parent capability available to init and implement the management of the
|
||
|
capability space for non-core components. The code for the latter goes
|
||
|
to _base/env/capability_space.cc_ and uses the infrastructure that we
|
||
|
already employ for core. Non-core components define the
|
||
|
'Native_capability::Data' type differently because - unlike core - they do not
|
||
|
need to associate capabilities with their corresponding CAP sessions.
|
||
|
|
||
|
As a prerequisite to get the IPC communication from init to core working, we
|
||
|
need to equip init's main thread with a proper IPC buffer. In particular, we
|
||
|
have to map the IPC buffer into init's virtual memory. By convention, we
|
||
|
install it at the second page (0x1000). To prevent this page to be used for
|
||
|
anything else, we explicitly exclude it from the range of the available
|
||
|
virtual address space by adjusting 'Platform::_vm_base' accordingly.
|
||
|
|
||
|
With the IPC buffer in place, init tries to contact its parent. But at
|
||
|
this point, we have not made the parent capability available to init.
|
||
|
This requires two steps. First, core needs to install the endpoint selector
|
||
|
for the parent capability to init's CSpace. The designated place to
|
||
|
implement this functionality is 'Platform_pd::assign_parent'. Second,
|
||
|
init has to pick up the selector and create a proper Genode capability
|
||
|
out of it. This is done in the seL4-specific implementation
|
||
|
'Genode::parent_cap' in _src/platform/_main_parent_cap.h_.
|
||
|
|
||
|
With the propagation of the parent capability to init, we can observe
|
||
|
init's main thread to successfully initialize the environment of the
|
||
|
init component. It stops when it tries to create secondary threads, namely
|
||
|
the signal handler thread and the entrypoint that will serve a "test-printf"
|
||
|
child of init. To accommodate those threads, core needs to make sure
|
||
|
to map their corresponding IPC buffers. In contrast to the IPC buffer
|
||
|
of the main thread, the virtual address of the secondary thread's IPC buffers
|
||
|
is explicitly specified by the thread-creating code. Hence, core does not
|
||
|
need to rely on predefined virtual addresses for those. To allow a thread
|
||
|
to act as an entrypoint, we need to make it aware of its unbadged endpoint
|
||
|
selector. This selector was allocated within the core-managed part of the PD's
|
||
|
CSpace during the thread creation. So only core knows its value. However
|
||
|
core can share this knowledge with the just-created thread by writing it to
|
||
|
the thread's IPC buffer before starting the thread. Upon startup, the thread
|
||
|
will readily find the needed information in its IPC buffer when entering
|
||
|
the 'Thread_base::_thread_bootstrap' method.
|
||
|
|
||
|
After completing this step and re-enabling the LOG console for non-core
|
||
|
components, we are greeted with the following output:
|
||
|
|
||
|
!boot module 'init' (371556 bytes)
|
||
|
!boot module 'test-printf' (249776 bytes)
|
||
|
!boot module 'config' (272 bytes)
|
||
|
!Genode 15.02-360-gb3cb2ca <local changes>
|
||
|
!int main(): --- create local services ---
|
||
|
!int main(): --- start init ---
|
||
|
!int main(): transferred 43 MB to init
|
||
|
!int main(): --- init created, waiting for exit condition ---
|
||
|
![init] Could not open ROM session for module "ld.lib.so"
|
||
|
![init -> test-printf] -1 = -1 = -1Test succeeded
|
||
|
|
||
|
We have successfully started the first Genode scenario on seL4!
|
||
|
|
||
|
The scenario consists of the three components core, init, and test-printf.
|
||
|
The latter has been started as a child of init. Each component lives in
|
||
|
its dedicated protection domain. So we know at this point
|
||
|
that Genode's recursive structure is working. We have multiple threads
|
||
|
running, which happily communicate with each other via synchronous RPC.
|
||
|
The delegation of capabilities works.
|
||
|
|
||
|
|
||
|
Next steps
|
||
|
==========
|
||
|
|
||
|
The _base/run/printf.run_ script is just the start. It is the simplest
|
||
|
scenario possible. To execute meaningful Genode scenarios, we will need
|
||
|
to address the following points next:
|
||
|
|
||
|
* Support for asynchronous notifications
|
||
|
* Full lock implementation to retire the yielding spinlock
|
||
|
* Interrupts and I/O port access
|
||
|
* Access to memory-mapped I/O registers
|
||
|
* Timer driver
|
||
|
* DMA by user-level device drivers (IOMMU considerations)
|
||
|
* PIC-compatible system-call bindings
|
||
|
|
||
|
|