=========================================================
        Genode on seL4 - Building a simple root task from scratch
        =========================================================


                              Norman Feske


This document is a loose collection of notes about the exploration of the
seL4 and the port of the Genode system to this kernel. The seL4 kernel is
a modern microkernel jointly developed by NICTA and General Dynamics.

:[http://sel4.systems]:

  Website of the seL4 project


A fresh Genode source-code repository
#####################################

Following the convention to host each kernel in its dedicated _base-<kernel>_
source repository, the seL4-specific code will go to _base-sel4_. This way,
we can cleanly hold the seL4-specific code apart from generic Genode code.

For the start, the new repository will contain two things: This notes
document at _doc/_ and the port-description file for downloading the seL4
kernel _(ports/sel4.port)_ accompanied with the corresponding hash file
_(ports/sel4.hash)_.

Since seL4 is hosted on GitHub, writing a port-description file is easy.
We can simply use _base-nova/ports/nova.port_ as a template and adapt it:

! LICENSE   := GPLv2
! VERSION   := git
! DOWNLOADS := sel4.git
!
! URL(sel4) := https://github.com/seL4/seL4.git
! # experimental branch
! REV(sel4) := b6fbb78cb1233aa8549ea3acb90524306f49a8d2
! DIR(sel4) := src/kernel/sel4

There are two branches of seL4. The master branch is the stable branch
that contains the formally verified version of the kernel. The experimental
branch contains features that are not yet ready to be included in the
master branch. Among those features is the support for virtualization.
To anticipate current developments of the seL4 community, I am going with
the experimental branch. For the ports file, I pick the commit ID of the
most recent commit of the experimental branch at the time of porting. Once,
the _ports/sel4.port_ file is in place, we can generate the matching
port hash as follows (from the base directory of the Genode source tree):

! # create an empty hash file
! touch repos/base-sel4/ports/sel4.hash
!
! # update the hash for the current version of the sel4.port file
! ./tool/ports/update_hash sel4

With the _sel4.port_ file in place, we can download the seL4 kernel via:
! ./tool/ports/prepare_port sel4

After invoking this command, the kernel source will be located at
_contrib/sel4-<hash>/src/kernel/sel4_.


Building the kernel for the first time
######################################

For getting acquainted with the code base, the README.md file provides a
good starting point. It seems to contain exactly the information that I need
at this point. As a first test, I am going to build the kernel for the pc99
platform using the Genode tool chain. According to the README, the following
command should work

! make TOOLPREFIX=/usr/local/genode-gcc/bin/genode-x86- \
!      ARCH=ia32 PLAT=pc99

On the first attempt, the following error occurs:

! Traceback (most recent call last):
!   File "./tools/invocation_header_gen.py", line 18, in <module>
!     import tempita
! ImportError: No module named tempita
! make: *** [arch/api/invocation.h] Error 1

This problem could be easily solved by installing the 'python-tempita'
package. However, further down the road, the build process stops with
the following message:

! src/arch/ia32/kernel/boot_sys.c:75:26: error: ‘CONFIG_MAX_NUM_IOAPIC’ undeclared here (not in a function)
! src/plat/pc99/machine/hardware.c: In function ‘maskInterrupt’:
! src/plat/pc99/machine/hardware.c:36:9: error: implicit declaration of function ‘pic_mask_irq’ [-Werror=implicit-function-declaration]
! src/plat/pc99/machine/hardware.c:36:9: error: nested extern declaration of ‘pic_mask_irq’ [-Werror=nested-externs]

A 'grep' for 'MAX_NUM_IOAPIC' reveals that this definition is normally
generated by the kernel configuration program as there is a match in
_src/plat/pc99/Kconfig_. At this point, I am wondering if I am on the wrong
track altogether. On the seL4 download page, the kernel is mentioned as just
one of several "projects". But since it is the only one of the list of
projects that I am actually interested in, I wanted to focus on only this one.
But apparently, attempts to configure the kernel via 'make menuconfig' are not
supposed to work if checking out the repository in a free-standing fashion.

Sooner or later, I would have to look behind the curtain of the seL4 build
system. So why not now? Passing 'BUILD_VERBOSE=1 V=1' to 'make' is quite
helpful in this situation. A look into _include/config.h_ reveals that the
use of the autoconf-generated header _autoconf.h_ is optional. This is
nice. So I can leave the kernel-configuration magic out of the loop and just
manually specify the config definitions at the build command line. After
a few iterations, I came up with the following command-line arguments:

! make TOOLPREFIX=/usr/local/genode-gcc/bin/genode-x86- \
!      ARCH=ia32 PLAT=pc99 BUILD_VERBOSE=1 V=1 \
!      LDFLAGS+=-Wl,-melf_i386 \
!      LDFLAGS+=-nostdlib \
!      LDFLAGS+=-Wl,-nostdlib \
!      CFLAGS+=-m32 \
!      CONFIG_KERNEL_EXTRA_CPPFLAGS+=-DCONFIG_MAX_NUM_IOAPIC=1 \
!      CONFIG_KERNEL_EXTRA_CPPFLAGS+=-DCONFIG_IRQ_IOAPIC=1

The 'CONFIG_IRQ_IOAPIC' flag is needed to specify whether the legacy PIC
or the IOPIC should be used. It is picked up by for conditionally compiling
the code of _src/plat/pc99/machine/hardware.c_.

As the result of the build process, we get a freshly baked 'kernel.elf'
file.

Of course, we don't want Genode users to manually build the kernel in this
way. So we add a "kernel" target to our _base-sel4_ repository. The kernel
target comes in the form of a _src/kernel/target.mk_ file and a library
_lib/mk/x86_32/kernel.mk_. The _target.mk_ file is just a pseudo target
with a dependency on the _kernel.mk_ library. There may be multiple versions
of the _kernel.mk_ library. The build-system configuration determines the
version that is used. E.g., when we set up the build directory for x86_32,
the _lib/mk/x86_32/kernel.mk_ one will be used. The _kernel.mk_ file looks
as follows:

! SEL4_DIR = $(call select_from_ports,sel4)/src/kernel/sel4
!
! ifeq ($(called_from_lib_mk),yes)
! all: build_kernel
! endif
!
! LINKER_OPT_PREFIX := -Wl,
!
! build_kernel:
! 	$(VERBOSE)$(MAKE) TOOLPREFIX=$(CROSS_DEV_PREFIX) \
! 	          ARCH=ia32 PLAT=pc99 \
! 	          LDFLAGS+=-nostdlib LDFLAGS+=-Wl,-nostdlib \
! 	          $(addprefix LDFLAGS+=$(LINKER_OPT_PREFIX),$(LD_MARCH)) \
! 	          $(addprefix CFLAGS+=,$(CC_MARCH)) \
! 	          CONFIG_KERNEL_EXTRA_CPPFLAGS+=-DCONFIG_MAX_NUM_IOAPIC=1 \
! 	          CONFIG_KERNEL_EXTRA_CPPFLAGS+=-DCONFIG_IRQ_IOAPIC=1 \
! 	          SOURCE_ROOT=$(SEL4_DIR) -f$(SEL4_DIR)/Makefile

The pseudo target _base-sel4/src/kernel/target.mk_ exists merely for making
the result from the seL4 build directory visible in the install directory
_(bin/)_.

! TARGET = sel4
! LIBS   = kernel
!
! $(INSTALL_DIR)/$(TARGET):
! 	$(VERBOSE)ln -sf $(LIB_CACHE_DIR)/kernel/kernel.elf $@

Genode's build system works in two stages. At the first (light-weight) stage,
it merely determines library dependencies. At the second stage, the actual
build steps are performed. The condition around the 'all' target ensures that
the 'build_kernel' target will be visited only at the second build stage with
the current working directory set to the library location and all build
variables (such as CROSS_DEV_PREFIX) defined. Fortunately, the seL4 build
system supports out-of-tree builds by defining the SOURCE_ROOT variable.

To test drive the kernel target, we first need to create Genode build
directory prepared for seL4. Later, we will add seL4 support to the regular
'create_builddir' tool. But for now, it suffices to create one manually:

# Create a new directory with an _etc/_ subdirectory.

# Add an _etc/specs.conf_ file with the following content:
  ! SPECS = sel4 x86_32
  This tells the build system the so-called build specification.

# Add an _etc/build.conf_ file with the following content:
  ! GENODE_DIR   := /path/to/your/genode.git
  ! BASE_DIR     := $(GENODE_DIR)/repos/base
  ! REPOSITORIES := $(GENODE_DIR)/repos/base-sel4 \
  !                 $(GENODE_DIR)/repos/base
  ! CONTRIB_DIR  := $(GENODE_DIR)/contrib

  GENODE_DIR must point to the root of Genode's source tree.
  BASE_DIR is used by the build system to find the build-system scripts.
  The CONTRIB_DIR is needed to enable the build system to find the location
  of the seL4 source code. The REPOSITORIES declaration lists all source
  repositories that should be included in the build. Note that the order
  is important. By listing 'base-sel4' before 'base', we allow 'base-sel4'
  to override the content of the 'base' repository.

# Symlink the _<genode-dir>/too/builddir/build.mk_ file to a _Makefile_.
  ! ln -s <genode-dir>/tool/builddir/build.mk Makefile

With the build directory, we can trigger the kernel build via 'make kernel'.
The seL4 build process will be invoked from within the kernel library. Hence,
the library's directory _var/libcache/kernel_ will be used as the build
directory of the kernel.


Starting the kernel in Qemu
###########################

To test-drive the kernel, we need to create a bootable image including a boot
loader, the boot-loader configuration, and the kernel. To spare us the
manual work, Genode comes with a so-called run tool that automates this
procedure. We merely need to supplement the run tool with a script snippet
that describes how a boot image is assembled for the seL4 base platform.
This snippet has to reside at _tool/run/boot_dir/sel4_. To create it, we can
take inspiration from the versions for the other platforms such as NOVA. It
comes down to implementing the function 'run_boot_dir' according to our needs.
For the most part, the function contains the generation of the boot-loader
configuration, i.e., GRUB's _menu.lst_ file. At the current stage, we simply
load the seL4 ELF image as multi-boot kernel. For using the run tool for the
initial test, the following small run script at _base-sel4/run/test.run_ does
the trick:

! create_boot_directory
! build_boot_image ""
! append qemu_args " -nographic -m 64 "
! run_genode_until forever

To invoke it, all we have to do is issuing 'make run/test' from the build
directory. All we see, however, is, well, nothing.

Normally we would expect from the kernel to print a life sign when booting
up. This is suggested by a 'printf' in _arch/arm/kernel/boot.c_. So the lack
of output is most certainly a configuration issue. A look in
_include/plat/pc99/plat/machine/io.h_ indicates that 'kernel_putchar', which
is used as the back end of 'printf' _(src/machine/io.c)_ is compiled-in only
if DEBUG mode is enabled. Adding 'DEBUG=1' to the sel4 build command (in our
_lib/mk/x86_32/kernel.mk_ file) enables this mode.

But enabling the DEBUG option comes with a strange surprise. The kernel build
would fail with the following error:

! kernel.o: In function `insert_dev_p_reg':
! kernel_final.c:(.boot.text+0x131a): undefined reference to `putchar'

The message is strange because the seL4 source code is void of any call to
'putchar'. It turned out that the compiler automatically turned a 'printf'
with a one-character string ("/n") into a call of 'putchar'. Fortunately,
this built-in heuristic can be disabled by passing '-fno-builtin-printf'
to the CFLAGS when compiling seL4. With this fix, we can successfully compile
the kernel in debug mode, boot it in Qemu, and observe the first life sign:

! Boot config: parsing cmdline '/sel4'
! Boot config: console_port of node #0 = 0x3f8
! Boot config: debug_port of node #0 = 0x3f8
! Boot config: max_num_nodes = 1
! Boot config: num_sh_frames = 0x0
! seL4 failed assertion '_ndks_end - _ndks_start <= NDKS_SIZE'
!   at src/arch/ia32/kernel/boot_sys.c:424 in function try_boot_sys

The NDKS section apparently contains those parts of the image that should
always be mapped at the upper part of the virtual address space. The assertion
triggers because the section of the binary is larger (40900 bytes) than the
assumed limit NDKS_SIZE (12288 bytes). According to the linker script
_(src/plat/pc99/linker.lds)_, the section includes the kernel stack, the BSS,
and "COMMON" symbols. When looking at the list of those symbols using 'nm |
sort', the 'ksReadyQueues' symbol raises my eyebrows because it is quite
large (32768 bytes) compared to all others. This symbol belongs to an array,
which is dimensioned as CONFIG_NUM_DOMAINS * CONFIG_NUM_PRIORITIES. As we are
using the default configuration values of _include/config.h_,
CONFIG_NUM_DOMAINS is 16 and CONFIG_NUM_PRIORITIES is 256. Hence, the array
has 4096 entries with 8 bytes for each entry ('tcb_queue_t' with two
pointers). Interestingly, the default value of NUM_DOMAINS in the _Kconfig_ is
1, which would result in a much smaller 'ksReadyQueues' array. In fact,
passing the following argument to the kernel build fixes the assertion:

! CONFIG_KERNEL_EXTRA_CPPFLAGS+=-DCONFIG_NUM_DOMAINS=1

Now, the kernel bootstraps successfully, detects one CPU and tries to
obtain information about the boot modules, which we don't have provided yet.
Hence, the kernel backs out with the following message:

! Boot loader did not provide information about boot modules
! seL4 called fail at src/arch/ia32/kernel/boot_sys.c:711
!   in function boot_sys, saying "boot_sys failed for some reason :(

Well, this is expected. So it is time to take the first baby step of providing
a root task to the system.


A root task for exercising the kernel interface
###############################################

At this point, there are two options. We could either start with a very
minimalistic hello-world program that is completely unrelated to Genode. Such
a root-task from scratch could be used to explore the individual system calls
before jumping into Genode. The second option would be to go directly for
a Genode program, which includes a proper C++ runtime and Genode's generic
linker script.

I went for the latter option and created a simple test program at
_base-sel4/src/test/sel4/_. The _target.mk_ file looks as follows:

! TARGET = test-sel4
! SRC_CC = main.cc
! LIBS   = cxx startup

The 'main' function of the _main.cc_ file does not much except for writing
at an illegal (yet known) address:

! int main()
! {
! 	*(int *)0x1122 = 0;
! 	return 0;
! }

When attempting to build the program by issuing 'make test/sel4' from within
our build directory, the build system will attempt to compile the C++ runtime,
which, in turn, depends on a few Genode headers. Some headers are readily
provided by the _base/_ repository but others are expected to be supplied
by the _base-<kernel>_ repository. The compile errors look like this:

!     COMPILE  malloc_free.o
! In file included from repos/base/include/parent/capability.h:17:0,
!                  from repos/base/include/base/env.h:20,
!                  from repos/base/src/base/cxx/malloc_free.cc:17:
! repos/base/include/base/capability.h:21:31:
!   fatal error: base/native_types.h: No such file or directory

For now, we can supply a dummy version of this header, which contain
preliminary type definitions. To see, which types are expected from which
header file, we can take cues from the other _base-<kernel>_ platforms.

At link time, we will observe plenty of undefined references originating
from the startup code. Most of those references concern basic data
structures such as the AVL tree. Normally, those symbols are provided by
the so-called _base-common_ library, which is used both core and non-core
programs. We can take a copy of a _base-common.mk_ file from one of the
other base platforms as a starting point. I decided to go for the version
provided by _base-nova_. With the base-common library present, we can
replace the 'cxx' and 'startup' libs by the base-common library in the
'LIBS' declaration of our test-sel4 program. Because base-common already
depends on both 'cxx' and 'startup', there is no need to specify those
libraries twice.

On the attempt to compile the base-common library, we will stumble over
further missing headers such as _ipc_msgbuf.h_. Here the _base-linux_
version of this file becomes handy because it is free-standing.

When compiling _lock.cc_, the compiler complaints about the missing
_lock_helper.h_ file. This platform-specific file that is internally used by
the lock implementation is normally expected
at _base-<kernel>/src/base/lock/_. For now, just for getting the binary
linked, we provide empty dummies:

! static inline void thread_yield() { }
! static inline void thread_switch_to(Genode::Thread_base *thread_base) { }
! static inline void thread_stop_myself() { }
! 
! static inline bool thread_check_stopped_and_restart(Genode::Thread_base *)
! {
! 	return false;
! }

The next missing piece is the platform-specific implementation of the IPC
API _(ipc/ipc.cc and ipc/pager.cc)_, which are expected to reside at
_base-sel4/src/base/ipc/_. We just take the simplest version of the other
base platforms (in this case _base-codezero/src/base/ipc/_) and strip it down
to become a mere dummy.

By adding the base-common library, the number of unresolved references
decreased by a great amount. Some functions are still unresolved.
There are many references to 'printf', which is not part of the base-common
library because core uses a different back end (typically a kernel debugger)
than non-core processes (using a session to core's LOG service). Because
our test plays the role of a root task, we can include core's version by
adding _base/src/base/console/core_printf.cc_ as source to our target
description file.

! SRC_CC += base/console/core_printf.cc
! vpath %.cc $(BASE_DIR)/src

The back end of _core_printf.cc_ has to be provided by a header at
_base-sel4/src/base/console/core_console.h_. For now, we just provide
an empty implementation of '_out_char'. To let the compiler find the
header, we need to extend the include search path as follows:

! INC_DIR += $(REP_DIR)/src/base/console

For the unresolved references of 'env_context_area_ram_session' and
'env_context_area_rm_session', the following dummy will do:

! #include <ram_session/ram_session.h>
! #include <rm_session/rm_session.h>
!
! namespace Genode {
! 	Rm_session  *env_context_area_rm_session()  { return nullptr; }
! 	Ram_session *env_context_area_ram_session() { return nullptr; }
! }

The remaining piece of the puzzle is the 'Genode::env()' function, which
is an accessor to the Genode environment. Because the Genode environment does
not exist yet, we provide a dummy 'env' called _mini_env_. Initially, this
implementation of the 'Genode::Env' interface merely returns dummy values
(null pointers and invalid capabilies).

After this step, the test-sel4 target links successfully. To use it as
roottask, we have to modify our _test.run_ script by

# Adding a build step
  ! build { test/sel4 }
# Specifying the test-sel4 binary as boot module so that the our run
  environment includes it in the boot image and appends the file as module
  to the boot-loader configuration.
  ! build_boot_image "test-sel4"

When issuing 'make run/test' now, we get the following messages from the
kernel:

! Detected 1 boot module(s):
!   module #0: start=0x169000 end=0x186754 size=0x1d754
!              name='/genode/test-sel4'
! ELF-loading userland images from boot modules:
!   module #0 for node #0: size=0x27000 v_entry=0x136cc
!                          v_start=0x0 v_end=0x27000
!                          p_start=0x187000 p_end=0x1ae000
! Moving loaded userland images to final location:
!                          from=0x187000 to=0x15e000 size=0x27000
!
! Starting node #0
! Caught cap fault in send phase at address 0x0
! while trying to handle:
! vm fault on data at address 0x9090c3fb with status 0x4
! in thread 0xe0189880 at address 0x314a

Assuming that 'v_start' and 'v_end' stand for the virtual address range
of the loaded binary, those numbers look weird. Normally, Genode binaries
are linked at a much higher address, e.g., 0x1000000. By inspecting the
binary via the readelf command, it turns out that we haven't yet declared the
link address for the seL4 platform. So its time to introduce a so-called spec
file for the "sel4" build-spec value. The new file _base-sel4/mk/spec-sel4.mk_
will be incorporated into the build process whenever the 'SPEC' declaration of
the build directory has the value "sel4" listed. To define the default link
address for the platform, the file has the following content:

! LD_TEXT_ADDR ?= 0x01000000

When issuing 'make test/sel4 VERBOSE=', we can see the link address specified
at the linker command line. Another look at the binary via readelf confirms
that the location of the text segment looks good now. Re-executing the run
script produces the same result as before though. But the last message is
quite helpful:

! vm fault on data at address 0x0 with status 0x4
! in thread 0xe0189880 at address 0x100312b

The last address lies well within the text segment of our binary. It seems
that the kernel has kicked off the execution of our root task (presumably
at the entry point address as found in the ELF binary). At some point, our
"root task" de-references a null pointer. Given that several of the dummy
functions that we just created, return null pointers, this is hardly a
surprise. So let us have a look how far we have come by inspecting the
fault address 0x100312b using
! objdump -lSd bin/test-sel4 | less
In less, we search for the pattern "100312b". We see that the surrounding
code belongs to the heap implementation ('Heap::_allocate_dataspace'). It
seems that someone is trying to use 'Genode::env()->heap()'. To confirm
this assumption, we can use Qemu's GDB stub to obtain a backtrace. We have
to take the following steps:

# We want to halt the execution at the point where the fault would happen
  instead of triggering a page fault. The easiest way to accomplish that
  is to insert an infinite loop at the right spot. The infinite loop serves
  us as a poor man's break point. In our case, we add a 'for (;;);' statement
  right at the beginning of the '_allocate_dataspace' function in
  _repos/base/src/base/heap/heap.cc_. When re-executing the run script after
  this change, we can see that the fault message won't appear.

# We cancel the execution of the run script and start Qemu manually using
  the command-line arguments that are found in the log output (the message
  "spawn qemu ...". To enable Qemu's GDB stub, we have to append "-s" as
  argument.
  ! qemu-system-i386 -nographic -m 64 -cdrom var/run/test.iso -s

# When qemu is running, we start GDB from another shell (changing to
  our build directory) as follows:
  ! gdb bin/test-sel4 -ex "target remote :1234"

# Listing the backtrace via the 'bt' command is quite revealing (output
  slightly edited for brevity):

  ! (gdb) bt
  ! #0  Genode::Heap::_allocate_dataspace
  !     at genode/repos/base/src/base/heap/heap.cc:57
  ! #1  0x010030b9 in Genode::Heap::_unsynchronized_alloc
  !     at genode/repos/base/src/base/heap/heap.cc:181
  ! #2  0x01003155 in Genode::Heap::alloc
  !     at genode/repos/base/src/base/heap/heap.cc:199
  ! #3  0x01010732 in malloc
  ! #4  0x0100f539 in start_fde_sort
  ! #5  init_object
  !     at ../../../../../../contrib/gcc-4.7.2/libgcc/unwind-dw2-fde.c:768
  ! #6  search_object
  !     at ../../../../../../contrib/gcc-4.7.2/libgcc/unwind-dw2-fde.c:958
  ! #7  0x01010375 in _Unwind_Find_registered_FDE
  !     at ../../../../../../contrib/gcc-4.7.2/libgcc/unwind-dw2-fde.c:1022
  ! #8  _Unwind_Find_FDE
  !     at ../../../../../../contrib/gcc-4.7.2/libgcc/unwind-dw2-fde-dip.c:440
  ! #9  0x0100d4c3 in uw_frame_state_for
  ! #10 0x0100dff8 in uw_init_context_1
  !     at ../../../../../../contrib/gcc-4.7.2/libgcc/unwind-dw2.c:1500
  ! #11 0x0100e3ea in _Unwind_RaiseException
  ! #12 0x01011f51 in __cxa_throw ()
  ! #13 0x01013751 in init_main_thread ()
  !     at genode/repos/base/src/platform/init_main_thread.cc:119
  ! #14 0x010134fe in _start ()
  !     at genode/repos/base/src/platform/x86_32/crt0.s:47

We can see that Genode's startup code tries to throw the first C++ exception.
There is a lengthy comment at the corresponding code portion that explains
the rationale behind throwing an exception right from the startup code.
The exception handling, in particular the stack unwinding) is done by the
GCC support library, which eventually calls 'malloc' to allocate some
backing store for metadata. Genode has no 'malloc' function. But for making
the GCC support library happy (which expects a C runtime to be present), our
Genode-specific C++ support code in the form of the 'cxx' library comes with
a library-local version of malloc _(base/src/base/cxx/malloc_free.cc)_. This
malloc implementation uses a separate 'Genode::Heap' instance. The instance
has an initial capacity of 512 bytes. If the allocations exceed this capacity,
this heap will try to expand using 'env()->ram_session()' as backing store.
This is why the '_allocate_dataspace' function was called. At the current
stage, we don't have an implementation of 'env()->ram_session()'. To work
around this early allocation issue, we can simply increase the capacity of
the 'initial_block', let's say by factor 10. When re-executing the run script
now, '_allocate_dataspace' won't be called. Hence, we execution won't get
stuck in our infinite loop (we can remove the loop either way now). Instead,
we get another null-pointer dereference:

! vm fault on data at address 0x0 with status 0x4
! in thread 0xe0189880 at address 0x1003ab7

The procedure to investigate the reason for this page fault is exactly the
same as for the first one, using objdump, infinite loops, and Qemu's GDB stub.

This time, the null-pointer dereference occurs in
_base/src/base/thread/thread.cc_ on the attempt to allocate so-called
thread context for the main thread. Such thread contexts contain the
stacks and meta data of threads. They are placed in a dedicated virtual
memory area that is manually managed by the process.

For our minimalistic root task, we don't have Genode's address-space
management facilities at our disposal yet. So we have to side-step the
'_alloc_context' function somehow. This can be accomplished by using
custom version of _thread.cc_ instead of the one provided by the _base/_
repository. So we remove _thread.cc_ from the base-common library for now
and add a new _thread.cc_ file to our test-sel4 target. The following
dummy stub will do for now:

! static Thread_base::Context *main_context()
! {
! 	enum { STACK_SIZE = sizeof(long)*4*1024 };
! 
! 	static long buffer[STACK_SIZE/sizeof(long)];
! 
! 	/* the context is located beyond the top of the stack */
! 	addr_t const context_addr = (addr_t)buffer + sizeof(buffer)
! 	                          - sizeof(Thread_base::Context);
! 
! 	Thread_base::Context *context = (Thread_base::Context *)context_addr;
! 	context->stack_base = (addr_t)buffer;
! 	return context;
! }
! 
! Thread_base *Thread_base::myself() { return nullptr; }
! 
! Thread_base::Thread_base(const char *name, size_t stack_size, Type type,
!                          Cpu_session *cpu_session)
! :
! 	_cpu_session(cpu_session), _context(main_context())
! {
! 	strncpy(_context->name, name, sizeof(_context->name));
! 	_context->thread_base = this;
! 
! 	_init_platform_thread(type);
! }
! 
! Thread_base::Thread_base(const char *name, size_t stack_size, Type type)
! : Thread_base(name, stack_size, type, nullptr) { }
! 
! Thread_base::~Thread_base() { _deinit_platform_thread(); }

Genode's startup code will change the stack of the main thread prior calling
the main function. This way, the stack can be placed at a dedicated area of
the virtual address space (thread-context area). By not placing the stack
close to the program image, stack overflows won't silently corrupt data but
trigger a page fault. The code above, however, allocates a dummy context for
the main thread in the BSS segment (the 'buffer' used as backing store for the
context is a static variable). This code above is intended as an interim
solution for the initialization of the main thread only.

When executing our run script again, we will get the following message:

! vm fault on data at address 0x1122 with status 0x6
! in thread 0xe0189880 at address 0x1000190

This message is caused by the main function of our test program! In this
function, we deliberately triggered a fault at the address 0x1122 via
the statement '*(int *)0x1122 = 0;'.


Issuing the first system call
#############################

We have successfully started our custom root task but we have not interacted
with the kernel yet. So it is time to take a look at seL4's system-call
interface. The interfaces comes in the form of several header files within
sel4's _libsel4/_ directory. At the first glance, the directory layout looks
straight forward. The generic parts reside in _libsel4/include/_ whereas
the architecture-depending parts are located at _libsel4/arch_include/_.
However, when skimming over the headers, it becomes apparent that some of
them are generated from XML files. Also, some headers are including
a top-level header '<autoconf.h>'.

To make sel4's kernel interface visible to the Genode world, we use a pseudo
library called _platform.mk_. The platform library is built before all other
libraries and targets and thereby gives us a hook to populate the build
directory with a custom include-directory structure. Because the selection of
the kernel-interface header depends on the architecture, we place the
_platform.mk_ file at _lib/mk/x86_32_. To create the platform library, we can
take OKL4's version as a blue print. When the library gets built, the include
directory structure will be created as a side effect. However, we cannot
implement the removal of the _include/_ directory of a side effect of a clean
rule because library description files have no clean rule (the build system
just wipes the respective library directory when cleaning). To complement the
creation of the _include/_ directory structure with a corresponding clean
rule, the _base-sel4/mk/spec-sel4.mk_ file can be extended with such a rule,
which will be globally visible.

The platform library is used as a mere hook to create the include directory
structure within the build directory. To allow a program to actually use
those headers, we'd need to extend the include-search path accordingly.
One way would be to have each target specify the build-directory's local
include path via 'INC_DIR += $(BUILD_BASE_DIR)/include'. Not too bad. However,
to make the use of the syscall headers more convenient, we introduce just
another library called _syscall_. The library is solely used for providing
a so-called library-import file to all targets that use the library. The
import file contains the extension of the include-search path. Additionally,
we extend the 'REP_INC_DIR' with the value "include/sel4". This way, we can
place custom headers (such as a version of _autoconf.h_) within one of the
Genode source repositories under _include/sel4/_. Those headers will appear
to be located at the inlude root scope for such targets. Speaking of
_autoconf.h_, this header is expected by some sel4 includes to distinguish
the debug mode from the non-debug mode. As we want to enable the debug
functionality, we supply our version of the _autoconf.h_ file at
_base-sel4/include/sel4/autoconf.h_ with the definition:

! #define SEL4_DEBUG_KERNEL 1

Besides the _autoconf.h_ file, the kernel-interface headers also require
an _stdint.h_ to be present at the include root scope. So we place a version
of this file at _base-sel4/include/sel4/stdint.h_.

! #include <base/fixed_stdint.h>
! 
! typedef genode_uint32_t uint32_t;
! 
! #ifndef NULL
! #define NULL ((void *)0)
! #endif

For trying out the access to the kernel-interface headers, we let our target
use the syscall library by extending the
_target.mk_ file with 'LIBS += syscall'. Then, we change the main program
as follows.

! /* Genode includes */
! #include <util/string.h>
! 
! /* seL4 includes */
! #include <sel4/arch/syscalls.h>
! 
! int main()
! {
! 	char const *string = "\nMessage printed via the kernel\n";
! 	for (unsigned i = 0; i < Genode::strlen(string); i++)
! 		seL4_DebugPutChar(string[i]);
! 
! 	*(int *)0x1122 = 0;
! 	return 0;
! }

Here we see three things. First, we can use Genode's usual utilities such as
the string functions, which is quite convenient. Second, we include one of
seL4 headers. And third, we try to invoke the 'seL4_DebugPutChar' system call
to print a string before triggering the page fault at address 0x1122. Before
we can run the program, through, we just have to overcome yet another problem.
On the attempt to build it, we get the following message:

!     COMPILE  main.o
! In file included from include/sel4/arch/syscalls.h:15:0,
!                  from genode/repos/base-sel4/src/test/sel4/main.cc:18:
! include/sel4/types.h:16:28: fatal error: sel4/types_gen.h: No such file or directory

A look at the top-level _Makefile_ of seL4 reveals that this file is generated
from a "types.bf" file using a python script called _tools/bitfield_gen.py_.
So we have to add a rule for generating this file to our platform library.

! $(BUILD_BASE_DIR)/include/sel4/types_gen.h: $(LIBSEL4_DIR)/include/sel4/types.bf
! 	$(VERBOSE)python $(LIBSEL4_DIR)/tools/bitfield_gen.py \
! 	                 --environment libsel4 "$<" $@

The next missing header that we stumble over is _assert.h_. So we have to add
a simple version of _assert.h_ to _base-sel4/include/sel4/_. This version
uses Genode's PDBG facility to print error messages, which is, of course, not
expected to work yet.

On the next attempt to build the program, the compilation fails because
of a missing header again:

! In file included from include/sel4/arch/syscalls.h:15:0,
!                  from genode/repos/base-sel4/src/test/sel4/main.cc:18:
! include/sel4/types.h:17:26: fatal error: sel4/syscall.h: No such file or directory

The _libsel4/Makefile_ contains the rule to generate this file. At this point,
I am wondering whether to use this Makefile to add those rules to our
platform library. I decided for the latter because there are not too many
files to generate, I will need to look behind the scenes sooner or later
anyway, and I would need to supply some some additional environment (such
as providing a _common.mk_ file in order to invoke the original Makefile.
The rules for generating headers like _syscall.h_ look similar to the rule
for _types_gen.h_ above.

With _sel4/syscall.h_ in place, we get confronted with another problem:

! include/sel4/arch/syscalls.h: In function ‘void seL4_Send(seL4_CPtr, seL4_MessageInfo_t)’:
! include/sel4/arch/syscalls.h:32:26: error: ‘seL4_GetMR’ was not declared in this scope

Fortunately, this error is simply caused by a missing include directive of
_syscalls.h_. The 'seL4_GetMR' function is provided by _sel4/arch/functions.h_
but this file is never included except for _sel4/sel4.h_. I assume that seL4
users are expected to always include the _sel4/sel4.h_ file instead of
including individual kernel headers. By prepending the include of
'<sel4/arch/functions.h>' to the include of '<sel4/arch/syscalls.h>, the
problem goes away and clears the path for the next one:

! include/sel4/arch/syscalls.h: In function ‘void seL4_DebugPutChar(char)’:
! include/sel4/arch/syscalls.h:478:16: error: ‘seL4_SysDebugPutChar’ was not declared in this scope

This message is accompanied with similar errors for "seL4_SysDebugHalt",
"seL4_SysDebugSnapshot", and "seL4_SysDebugCapIdentify".
A look in the generated _include/sel4/syscall.h_ reveals that those
declarations exists only when building in DEBUG mode. Hence, we need
to add the following line to our _autoconf.h_ file:

! #define DEBUG 1

The next problem is more tricky:

! include/sel4/arch/syscalls.h: In function ‘int main()’:
! include/sel4/arch/syscalls.h:481:6: error: impossible register constraint in ‘asm’

The error refers to the system-call binding for 'seL4_DebugPutChar'. After
twiddling with the asm constraints, it turns out that the error is caused by
the use of an enum value as input argument. The C++ compiler is free to pick
any integer type that it sees fit for representing an enum value. Even though
the seL4 developers use a helper macro (SEL4_FORCE_LONG_ENUM) to force a
certain minimum bit width for the used type, the C++ compiler complains. By
explicitly casting the input argument to 'int', this ambiguity can be resolved
and the compiler becomes happy. Unfortunately, this means that I will have to
patch the system-call bindings to make them fit for the use with C++. But
looking at the bindings, I think that a patch won't be avoidable anyway
because the bindings clobber the EBX register. This means that we won't be
able to use the headers for compiling position-independent code (as is the
case for Genode). For now, we have are not compiling with '-fPIC' enabled but
this issue is clear in front of us.

Patches for the seL4 code will be reside at _base-sel4/src/kernel/_. E.g.,
we just added the current modification of the _syscalls.h_ header by
copying a git diff to the file _base-sel4/src/kernel/syscalls.patch_.
To apply the patch automatically when preparing the seL4 port, we need
to modify the _base-sel4/ports/sel4.port_ file by adding the following
lines:

! PATCHES   := src/kernel/syscalls.patch
! PATCH_OPT := -p1 -d src/kernel/sel4

Since we modified the port-description file, we need to update the
accompanied hash via './tool/ports/update_hash sel4'.

_Edit: After consulting the seL4 mailing list, Adrian Danis pointed out that_
_the actual issue is an off-by-one bug in the SEL4_FORCE_LONG_ENUM macro._
_So instead of explicitly casting all opcodes to integers, the macro_
_can be fixed at one place. Hence, I replaced the syscalls.patch by a_
_macros.patch until the fix appears upstream._

Anyway, after all the steps, our test-sel4 program can be successfully
built. Executing the run script produces the result that we longed for:

! Message printed via the kernel
! Caught cap fault in send phase at address 0x0
! while trying to handle:
! vm fault on data at address 0x1122 with status 0x6
! in thread 0xe0189880 at address 0x10001d8

The first line is produced by our test program. Knowing how to print
characters using the kernel's debug interface, filling out the empty
stub function 'Genode::Core_console::_out_char' in _core_console.h_
is easy. We can replace the main program with this version:

! #include <base/printf.h>
! 
! int main()
! {
! 	PDBG("a message printed via Genode's PDBG");
! 
! 	*(int *)0x1122 = 0;
! 	return 0;
! }

When running it via 'make run/test', it produces the expected result:

! int main(): a message printed via Genode's PDBG


Exploration of the kernel interface
###################################

Now that we have a nice playground in place, it is time to explore the
actual kernel interface step by step. The goal is to get a tangible
feeling for the kernel and to exercise the functionality that is needed
by Genode. Those functionalities are:

* Access to boot information such as the memory layout and the boot modules
* Multi-threading
* Process-local synchronization
* Synchronous inter-process communication between threads
* Address-space creation
* Page-fault handling

At this point, it is useful to take a look at the excellent seL4 reference
manual to learn the concepts behind the kernel interface:

:[http://sel4.systems/Docs/seL4-manual.pdf]:

  seL4 reference manual


Obtaining boot information
==========================

The manual mentions the function 'seL4_GetBootInfo' to obtain a pointer to
a boot-info structure. The function implementation, however, requires a
prior call of 'seL4_InitBootInfo' from the startup code. According to the
manual, the startup code gets the pointer passed in a CPU register. It
presumably registers this pointer via 'seL4_InitBootInfo'. Of course, we
don't want to change Genode's kernel-agnostic startup code by inserting
a call to the 'seL4_InitBootInfo' function. Fortunately, this is not the
first time, Genode has to pick up a register value passed by the kernel to
root task via a CPU register. The startup code already saves the initial
stack pointer, EAX and EDI. The seL4 manual does not state, which register
is used. In the kernel code ('create_initial_thread'), the register is denoted
at "capRegister". A look into _ia32/arch/machine/registerset.h_ reveals that
this register is actually EBX on the x86 architecture. Since EBX is not
saved by Genode's startup code yet, we need to enhance the startup code
a bit to save the initial EBX value in the variable '_initial_bx'.

The boot-info type is declared in _sel4/bootinfo.h_. When including the
header, we have to expand our _sel4/stdint.h_ version by a definition
of 'uint8_t'. Also, the header expects CONFIG_MAX_NUM_BOOTINFO_UNTYPED_CAPS
to be defined. We are just using the kernel's default config value, which
we copy to our _sel4/autoconf.h_ file.

! #define CONFIG_MAX_NUM_BOOTINFO_UNTYPED_CAPS 800

With those changes in place, we can access the boot info with a utility like
this:

! #include <sel4/bootinfo.h>
! 
! static seL4_BootInfo const *boot_info()
! {
! 	extern Genode::addr_t __initial_bx;
! 	return (seL4_BootInfo const *)__initial_bx;
! }

While writing a simple 'dump_boot_info' function, I noticed that some boot-info
fields mentioned in the manual and present on the master branch, have
disappeared in the experimental branch, i.e., 'numDeviceRegions'.

Anyway, the output of 'dump_boot_info' function looks like this:

! --- boot info at 102c000 ---
! ipcBuffer:               102b000
! initThreadCNodeSizeBits: 12
! untyped:                 [38,4d)
!                          [38] [00100000,00107fff]
!                          [39] [00108000,00109fff]
!                          [3a] [001a0000,001bffff]
!                          [3b] [001c0000,001fffff]
!                          [3c] [00200000,003fffff]
!                          [3d] [00400000,007fffff]
!                          [3e] [00800000,00ffffff]
!                          [3f] [01000000,01ffffff]
!                          [40] [02000000,02ffffff]
!                          [41] [03000000,037fffff]
!                          [42] [03800000,03bfffff]
!                          [43] [03c00000,03dfffff]
!                          [44] [03e00000,03efffff]
!                          [45] [03f00000,03f7ffff]
!                          [46] [03f80000,03fbffff]
!                          [47] [03fc0000,03fdffff]
!                          [48] [03fe0000,03feffff]
!                          [49] [03ff0000,03ff7fff]
!                          [4a] [03ff8000,03ffbfff]
!                          [4b] [03ffc000,03ffdfff]
!                          [4c] [00189000,001897ff]


Creating a thread
=================

On seL4, kernel objects are created by turning untyped memory into kernel
memory using the 'seL4_Untype_Retype' function. As a first test. I am going to
manually define the parameters for this function using the information from
the boot info.

But before I can start using this function, we first need to generate some
stub code. We need to generate _sel4/invocation.h_ and its corresponding
_sel4/arch/invocation.h_. The rules in the platform library are quickly added,
taking the seL4 Makefile as inspiration. We also need to generate the
_interfaces/sel4_client.h_ stub code. In the stub code, we find the function
'seL4_Untyped_RetypeAtOffset', which pretty much matches the signature of
'seL4_Untype_Retype' explained in the manual. This is just another difference
from the master branch.

I decided to proceed with invoking 'seL4_Untyped_RetypeAtOffset' by
manually specifying its parameters. At this point, I am having a hard time
with wrapping my mind around seL4's kernel-resource management, in particular
the CNode addressing. My first attempt looked like this:

! {
!   seL4_Untyped const service     = 0x38; /* untyped */
!   int          const type        = seL4_TCBObject;
!   int          const offset      = 0;
!   int          const size_bits   = 0;
!   seL4_CNode   const root        = seL4_CapInitThreadCNode;
!   int          const node_index  = 0;
!   int          const node_depth  = 32;
!   int          const node_offset = 0x100;
!   int          const num_objects = 1;
!
!   int const ret = seL4_Untyped_RetypeAtOffset(service,
!                                               type,
!                                               offset,
!                                               size_bits,
!                                               root,
!                                               node_index,
!                                               node_depth,
!                                               node_offset,
!                                               num_objects);
!   PDBG("seL4_Untyped_RetypeAtOffset returned %d", ret);
! }

Admittedly, I feel a bit flabbergasted by the amount of arguments and not 100%
certain what I am doing here. I put the individual arguments into named
constants instead of directly supplying them to the function to make their
meaning easier to memorize. The 'service' argument refers to one of the
untyped memory capabilities reported by the boot info. This memory will be
turned into a thread control block (TCB). The 'node_offset' is a presumably
free capability slot of the root CScope that is supposed to be free. This is
where we want to store the capability for the newly created thread.

When executing the code, the kernel reports an error like this:
! vm fault on data at address 0x1e8 with status 0x6
! in thread 0xe0189880 at address 0x10002ed

The fault is triggered by the function 'seL4_SetCap', more precisely by the
instruction:
!  mov %eax,%gs:0x1e8(,%ecx,4)

It appears that the seL4 bindings rely on a thread-local-storage facility
via GS-relative addressing. When the kernel switches thread contexts, it
loads a segment with a thread-specific memory location. Since we have not
initialized the GS register with a particular value, we end up addressing
the first page, which is not mapped. The issue could be resolved by
initializing the GS register as follows:

! static inline void init_ipc_buffer()
! {
!   asm volatile ("movl %0, %%gs" :: "r"(IPCBUF_GDT_SELECTOR) : "memory");
! }

The IPCBUF_GDT_SELECTOR is defined by the seL4 headers. On the next attempt
to execute the code, we get a much nicer kernel message:

! <<seL4 [decodeUntypedInvocation/136 Te0189880 @100035a]:
!         Untyped Retype: Destination cap invalid or read-only.>>

In reality, the message looks even much better - it is in color!
The message tells us that the kernel has actually received our request
for retyping untyped memory but the arguments are messed up.
The message comes from _src/object/untyped.c_:

! /* Lookup the destination CNode (where our caps will be placed in). */
! if (nodeDepth == 0) {
!     nodeCap = extraCaps.excaprefs[0]->cap;
! } else {
!     cap_t rootCap = extraCaps.excaprefs[0]->cap;
!     lu_ret = lookupTargetSlot(rootCap, nodeIndex, nodeDepth);
!     if (lu_ret.status != EXCEPTION_NONE) {
!         userError("Untyped Retype: Invalid destination address.");
!         return lu_ret.status;
!     }
!     nodeCap = lu_ret.slot->cap;
! }

Apparently, by specifying the value 32 for the depth argument, we entered
the code path for traversing a CNode tree instead of just inserting a
capability into the root CScope. By changing 'node_depth' to 0, system
call returns successfully:
! int main(): seL4_Untyped_RetypeAtOffset returned 0

Even though the new thread has no valid register set and no defined space,
let us see what happens when we try to start it anyway. This can be done
via the 'seL4_TCB_Resume' call with our just created TCB capability as
argument.

! seL4_TCB_Resume(0x100);

This results in the following exciting output:

! Caught cap fault in send phase at address 0x0
! while trying to handle:
! vm fault on data at address 0x1122 with status 0x6
! in thread 0xe0189880 at address 0x10003a5
! Caught cap fault in send phase at address 0x0
! while trying to handle:
! vm fault on data at address 0x0 with status 0x4
! in thread 0xe0100080 at address 0x0

We see two threads faulting! The main thread faults at our "breakpoint"
0x1122. But there is also another thread, which faults at 0x0. We can see
that the TCB creation via 'seL4_Untyped_RetypeAtOffset' was successful!

Now, turning the situation into something useful seems like a walk in the
park: We need to allocate a stack for the new thread and set up the initial
program counter and stack pointer of the new thread. At this point, I decide
to give the number 0x100 a proper name "SECOND_THREAD_CAP" because it will be
repeatedly used.

! enum { SECOND_THREAD_CAP = 0x100 };

Following the manual, we have to call 'seL4_TCB_WriteRegisters' and
'seL4_TCB_SetSpace'. The following code snippet allocates the stack for
the new thread on the stack of the main thread, initializes the stack
pointer and program counter of the new thread, assigns the new thread to
the same address space as the main thread, and kicks off the execution of
the new thread.

!  long stack[0x1000];
!  {
!    seL4_UserContext regs;
!    Genode::memset(&regs, 0, sizeof(regs));
!
!    regs.eip = (uint32_t)&second_thread_entry;
!    regs.esp = (uint32_t)&stack[0] + sizeof(stack);
!    int const ret = seL4_TCB_WriteRegisters(SECOND_THREAD_CAP, false,
!                                            0, 2, &regs);
!    PDBG("seL4_TCB_WriteRegisters returned %d", ret);
!  }
!
!  {
!    seL4_CapData_t no_cap_data = { { 0 } };
!    int const ret = seL4_TCB_SetSpace(SECOND_THREAD_CAP, 0,
!                      seL4_CapInitThreadCNode, no_cap_data,
!                      seL4_CapInitThreadPD, no_cap_data);
!    PDBG("seL4_TCB_SetSpace returned %d", ret);
!  }
!
!  seL4_TCB_Resume(SECOND_THREAD_CAP);

The entry function of the new thread is supposed to produce a page fault at
the predefined address 0x2244:

! void second_thread_entry()
! {
!   *(int *)0x2244 = 0;
! }

When executing the code, we get the desired result:

! vm fault on data at address 0x1122 with status 0x6
! ...
! vm fault on data at address 0x2244 with status 0x6

From these messages, we can see that both the main thread and the second
thread are faulting at their designated fault addresses.

With two threads in place, we can test-drive the preemptive scheduling of
the kernel by changing the 'second_thread_entry' function to:

! static int volatile cnt = 0;
!
! void second_thread_entry()
! {
!   for (;;)
!     cnt++;
! }

At the end of the main function, we repeatedly print the counter value:

! for (;;)
!   PDBG("cnt = %d", cnt);

When executing the code, the counter values surprisingly stays at the value
0. This is because the just-created new thread has a lower priority than the
main thread. By explicitly assigning the maximum priority to the second
thread, we can enable the preemptive round-robin scheduling:

! seL4_TCB_SetPriority(SECOND_THREAD_CAP, 0xff);

Now, we can see the counter value nicely increasing:

! int main(): cnt = 0
! ...
! int main(): cnt = 0
! int main(): cnt = 2908738
! ...
! int main(): cnt = 2908738
! int main(): cnt = 5876191
! ...
! int main(): cnt = 5876191
! ...

Each thread consumes its entire time slice. This way, the second thread has
the chance to increment the counter circa 3 million times per time slice after
which the main thread has the chance to print the counter about 50 times
before being preempted again.