=====================================================
Introduction to the Genode port for Xilinx MicroBlaze
=====================================================


                         Norman Feske

                         Martin Stein


This file gives an overview of the Genode port for MicroBlaze-based
platforms. For a quick introduction to building and running Genode on such
platforms, please refer to:

!

Xilinx MicroBlaze is a so-called softcore CPU, which is commonly used as
part of FPGA-based system-on-chip designs. At Genode Labs, we regularly use
this IP core, in particular for our Genode FPGA Graphics Project, which is a
GUI software stack and a set of IP cores for implementing fully-fledged
windowed GUIs on FPGAs:

:Website of the Genode FPGA Graphics Project:

  [http://genode-labs.com/products/fpga-graphics]

Ever since we first released the Genode FPGA Graphics Project, we envisioned
combining it with the Genode OS Framework. In spring 2010, Martin Stein
joined our team at Genode Labs and accepted the challenge to bring the
Genode OS Framework to the realm of FPGA-based SoCs. Technically, this
implies porting the framework to the MicroBlaze CPU architecture.

In contrast to most softcore CPUs such as the popular Lattice Mico32, the
MicroBlaze features an MMU, which is a fundamental requirement for
implementing a microkernel-based system. Architecture-wise, MicroBlaze is a
RISC CPU similar to MIPS. Many system parameters of the CPU (caches, certain
arithmetic and shift instructions) can be configured when the SoC is
synthesized.

We found that the relatively simple architecture of this CPU provides a
perfect playground for pursuing some of our ideas about kernel design that
go beyond the scope of current microkernels. So instead of adding MicroBlaze
support to one of the existing microkernels already supported by Genode, we
went for a new kernel design. Deviating from the typical microkernel, which
is a self-sufficient program running in kernel mode that executes user-level
processes on top, our design regards the kernel as a part of Genode's core.
It is not a separate program but a library that implements the glue between
user-level core and the raw CPU. Specifically, it provides the entrypoint
for hardware exceptions, a thread scheduler, an IPC mechanism, and functions
to manipulate virtual address spaces (loading and flushing entries of the
CPU's software-loaded TLB). It does not manage any physical memory resources
or the relationship between processes. This is the job of core.

From the kernel-developer's point of view, the kernel part can be summarized
as follows:

* The kernel provides user-level threads that are scheduled in a round-robin
  fashion.

* Threads can communicate via synchronous IPC.

* There is a mechanism for blocking and waking up threads. This mechanism
  can be used by Genode to implement locking as well as asynchronous
  inter-process communication.

* There is a single kernel thread, which never blocks in the kernel code
  paths. So the kernel acts as a state machine and there is no concurrency
  in the execution paths traversed in kernel mode, vastly simplifying these
  code parts. Because the kernel cannot be preempted, the length of its code
  paths determines the worst-case interrupt latency. However, all code paths
  are extremely short and bounded with regard to execution time. Hence, we
  expect the impact on interrupt latencies to be low.

* The IPC operation transfers payload between UTCBs only. Each thread has a
  so-called user-level thread control block, which is mapped transparently
  by the kernel. Because of this mapping, user-level page faults cannot
  occur during IPC transfers.

* There is no mapping database. Virtual address spaces are manipulated by
  loading and flushing physical TLB entries. There is no caching of mappings
  done in the kernel. All higher-level information about the
  interrelationship of memory and processes is managed by the user-level
  core.

* Core runs in user mode, mapped 1-to-1 to the physical address space except
  for its virtual thread-context area.

* The kernel paths are executed in the physical address space because, on
  MicroBlaze, address translation is disabled when entering kernel mode.
  Because both kernel code and user-level core code observe the same
  address-space layout, both worlds appear to run within a single address
  space.

* User processes can use the entire virtual address space (4G) except for a
  helper page for invoking syscalls and a page containing atomic operations.
  There is no reservation used for the kernel.

* The MicroBlaze architecture lacks an atomic compare-and-swap instruction.
  At user level, this functionality is emulated via delayed preemption. A
  kernel-provided page holds the sequences of operations to be executed
  atomically, and the kernel prevents (actually delays) the preemption of a
  thread that is currently executing instructions at that page (see the
  sketch after this list).

* The MicroBlaze MMU supports several different page sizes (1K up to 16MB).
  Genode fully supports this feature for page sizes >= 4K. This way, the TLB
  footprint can be minimized by choosing sensible alignments of memory
  objects.
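
To illustrate the delayed-preemption mechanism, the following is a minimal
sketch of how the kernel's preemption path could detect that the interrupted
thread executes within the atomic-operations page and postpone the context
switch. All names ('ATOMIC_OPS_PAGE', 'Thread', 'schedule_next_thread') are
assumptions made for illustration, not the actual 'base-mb' interface:

! /* hypothetical location and size of the kernel-provided atomic-ops page */
! static const unsigned long ATOMIC_OPS_PAGE = 0xfffff000;
! static const unsigned long PAGE_SIZE       = 0x1000;
!
! struct Thread
! {
!     unsigned long pc;                 /* program counter saved on kernel entry */
!     bool          preemption_pending; /* set when preemption had to be delayed */
! };
!
! /* hypothetical scheduler entry point */
! void schedule_next_thread() { /* pick the next ready thread */ }
!
! static bool in_atomic_ops_page(unsigned long pc)
! {
!     /* compare the page base of 'pc' against the atomic-ops page */
!     return (pc & ~(PAGE_SIZE - 1)) == ATOMIC_OPS_PAGE;
! }
!
! /* called from the timer interrupt when the current thread would be preempted */
! void handle_preemption(Thread *current)
! {
!     if (in_atomic_ops_page(current->pc)) {
!
!         /* let the thread finish its atomic sequence first */
!         current->preemption_pending = true;
!         return; /* resume 'current' instead of switching threads */
!     }
!     schedule_next_thread();
! }

The 'preemption_pending' flag could then be evaluated as soon as the thread
leaves the page, for instance on the next kernel entry, so that the delayed
context switch is merely postponed, never lost.
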
Current state
=============

The MicroBlaze platform support resides in the 'base-mb' repository. At the
current stage, core is able to successfully start multiple nested instances
of the init process. Most of the critical kernel functionality is working.
This includes inter-process communication, address-space creation,
multi-threading, thread synchronization, page-fault handling, and TLB
eviction.

The nested init scenario runs on Qemu, emulating the Petalogix Spartan 3A
DSP1800 design, as well as on real hardware, tested with the Xilinx Spartan
3A Starter Kit configured with an appropriate MicroBlaze SoC.

This simple scenario already illustrates the vast advantage of using the
different page sizes supported by the MicroBlaze CPU. When using 4KB pages
only, a scenario with three nested init processes produces more than 300,000
page faults. The pressure on the TLB, which holds only 64 entries, is
extremely high. Those entries are constantly evicted so that thrashing
effects are likely to occur. By making use of flexible page sizes (4K, 16K,
64K, 256K, 1M, 4M, 16M), the number of page faults drops to only 1,800,
speeding up the boot time by a factor of 10 (a sketch of such a mapping
strategy is given at the end of this document). On real hardware, execution
speed could be increased significantly further by turning on the instruction
and data caches. However, this feature has not been tested yet.

Beyond the requirements of the nested init scenario, the kernel provides the
allocation, handling, and deallocation of IRQs to the userland, enabling
core to offer IRQ and IO-memory session services. This allows for custom
device-driver implementations within the userland.

Currently, there is no restriction of IPC communication rights. Threads are
addressed using their global thread IDs (in fact, using their respective
indices in the KTCB array). For the future, we plan to add capability-based
delegation of communication rights.
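
As an illustration of the flexible-page-size mapping referred to above, the
following sketch shows a greedy strategy that covers a region with the
largest page sizes its alignment permits. The names 'map_region' and
'load_tlb_entry' are assumptions for illustration; the actual core code may
proceed differently:

! /* supported page sizes, given as log2 values: 16M down to 4K */
! static const unsigned sizes_log2[] = { 24, 22, 20, 18, 16, 14, 12 };
!
! /* hypothetical hook that would program one TLB entry of the given size */
! void load_tlb_entry(unsigned long virt, unsigned long phys,
!                     unsigned size_log2)
! { /* stub, a real implementation would write the TLB registers here */ }
!
! /*
!  * Map a physical to a virtual region, greedily using the largest page
!  * size that fits alignment and remaining size. Assumes that 'virt',
!  * 'phys', and 'size' are multiples of 4K, so the loop always advances.
!  */
! void map_region(unsigned long virt, unsigned long phys, unsigned long size)
! {
!     while (size) {
!         for (unsigned i = 0; i < sizeof(sizes_log2)/sizeof(sizes_log2[0]); i++) {
!             unsigned long page_size = 1UL << sizes_log2[i];
!
!             /* both base addresses must be aligned to the page size */
!             if (((virt | phys) & (page_size - 1)) == 0 && size >= page_size) {
!                 load_tlb_entry(virt, phys, sizes_log2[i]);
!                 virt += page_size; phys += page_size; size -= page_size;
!                 break;
!             }
!         }
!     }
! }

For example, a 16M-aligned 16M region then consumes a single TLB entry
instead of the 4,096 entries needed with 4K pages, which illustrates the
reduced TLB pressure behind the page-fault numbers quoted above.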