On exception, the CPU first checks the IDT in order to find the
associated ISR. The IDT must therefore be placed in the mode transition
pages to make them available for non-core threads.
The limit is set to match the TSS size - 1 and the base address is
hardcoded to the *current* address of the TSS instance (0x3a1100).
TODO: Set the base address using the 'tss' label. If the TSS descriptor
format were not so utterly unusable this would be straightforward.
Changes to the code that indirectly lead to a different location
of the tss result in #GP since the base address will be invalid.
The class Genode::Tss represents a 64-bit Task State Segment (TSS) as
specified by Intel SDM Vol. 3A, section 7.7.
The setup function sets the stack pointers for privilege levels 0-2 to
the kernel stack address. The load function loads the TSS segment
selector into the task register.
Implement user argument setter and getter support functions. The mapping of
the state registers corresponds to the system call parameter passing
convention.
The instruction pointer is the first field of the master context and can
directly be used as a jump argument, which avoids additional register
copy operations.
Point stack to client context region and save registers using push
instructions.
Note that since the push instruction first increments the stack pointer
and then stores the value on the stack, the RSP has to point one field
past RBP before pushing the first register value.
As the kernel entry is called from the interrupt handler the stack
layout is as specified by Intel SDM Vol. 3A, figure 6-8. An additional
vector number is stored at the top of the stack.
Gather the necessary client information from the interrupt stack frame
and store it in the client context.
The new errcode field is used to store the error code that some
interrupts provide (e.g. #PF). Rework mode transition reserved space and
offset constants to match the new CPU_state layout.
The macros are used to assign syscall arguments to specific registers.
Using the AMD64 parameter passing convention avoids additional copying of
variables since the C++ function parameters are already in the right
registers.
The interrupt return instruction in IA-32e mode applies the prepared
interrupt stack frame to set the RFLAGS, CS and SS segment as well as
the RIP and RSP registers. It then continues execution of the user code.
For detailed information refer to Intel SDM Vol. 3A, section 6.14.3.
After activating the client page tables the client context cannot be
accessed any longer. The mode transition buffer however is globally
mapped and can be used to restore the remaining register values.
Set the stack pointer to the R8 field in the client context to enable
restoring registers by popping values of the stack.
After this step the only remaining registers that do not contain client
values are RAX, RSP and RIP.
Note that the client value of RAX is pop'd to the global buffer region as
the register will still be used by subsequent steps. It will be restored to
the value in the buffer area just prior to resuming client code execution.
Set I/O privilege level to 3 to allow core to perform port I/O from
userspace. Also make sure the IF flag is cleared for now until interrupt
handling is implemented.
Setup an IA-32e interrupt stack frame in the mode transition buffer region.
It will be used to perform the mode switch to userspace using the iret
instruction.
For detailed information about the IA-32e interrupt stack frame refer to
Intel SDM Vol. 3A, figure 6-8.
The constants specify offset values of CPU context member variables as
specified by Genode::Cpu_state [1] and Genode::Cpu::Context [2].
[1] - repos/base/include/x86_64/cpu/cpu_state.h
[2] - repos/base-hw/src/core/include/spec/x86/cpu.h
The new entries specify a 64-bit code segment with DPL 3 at index 3 and a
64-bit data segment with DPL 3 at index 4.
These segments are needed for transitioning to user mode.
A pointer to the client context is placed in the mt_client_context_ptr area.
It is used to pass the current client context to the lowlevel mode-switching
assembly code.
IA-32e paging translates 48-bit linear addresses to 52-bit physical
addresses. Translation structures are hierarchical and four levels deep.
The current implementation supports regular 4KB and 1 GB and 2 MB large
page mappings.
Memory typing is not yet implemented since the encoded type bits depend
on the active page attribute table (PAT)*.
For detailed information refer to Intel SDM Vol. 3A, section 4.5.
* The default PAT after power up does not allow the encoding of the
write-combining memory type, see Intel SDM Vol. 3A, section 11.12.4.
* Add common IA-32e paging descriptor type:
The type represents a table entry and encompasses all fields shared by
paging structure entries of all four levels (PML4, PDPT, PD and PT).
* Simplify PT entry type by using common descriptor:
Differing fields are the physical address, the global flag and the memory
type flags.
* Simplify directory entry type by using common descriptor:
Page directory entries (PDPT and PD) have an additional 'page size' field
that specifies if the entry references a next level paging structure or
represents a large page mapping.
* Simplify PML4 entry type by using common descriptor
Top-level paging structure entries (PML4) do not have a 'pat' flag and the
memory type is specified by the 'pwt' and 'pcd' fields only.
* Implement access right merging for directory paging entries
The access rights for translations are determined by the U/S, R/W and XD
flags. Paging structure entries that reference other tables must provide
the superset of rights required for all entries of the referenced table.
Thus merge access rights of new mappings into existing directory entries to
grant additional rights if needed.
* Add cr3 register definition:
The control register 3 is used to set the current page-directory base
register.
* Add cr3 variable to x86_64 Cpu Context
The variable designates the address of the top-level paging structure.
* Return current cr3 value as translation table base
* Set context cr3 value on translation table assignment
* Implement switch to virtual mode in kernel
Activate translation table in init_virt_kernel function by updating the
cr3 register.
* Ignore accessed and dirty flags when comparing existing table entries
These flags can be set by the MMU and must be disregarded.
* Add isr.s assembler file:
The file declares an array of Interrupt Service Routines (ISR) to handle
the exception vectors from 0 to 19, see Intel SDM Vol. 3A, section
6.3.1.
* Add Idt class:
* The class Genode::Idt represents an Interrupt Descriptor Table as
specified by Intel SDM Vol. 3A, section 6.10.
* The setup function initializes the IDT with 20 entries using the ISR
array defined in the isr.s assembly file.
* Setup and load IDT in Genode::Cpu ctor:
The Idt::setup function is only executed once on the BSP.
* Declare ISRs for interrupts 20-255
* Set IDT size to 256
This patch contains the initial code needed to build and bootstrap the
base-hw kernel on x86 64-bit platforms. It gets stuck earlier
because the binary contains 64-bit instructions, but it is started in
32-bit mode. The initial setup of page tables and switch to long mode is
still missing from the crt0 code.
A Nic::Session client can install a signal handler that is used to
propagate changes of the link-state by calling 'link_state_sigh()'.
The actual link state is queried via 'link_state()'.
The nic-driver interface now provides a Driver_notification callback,
which is used to forward link-state changes from the driver to the
Nic::Session_component.
The following drivers now provide real link state: dde_ipxe, nic_bridge,
and usb_drv. Currently, OpenVPN, Linux nic_drv, and lan9118 do not
support link state and always report link up.
Fixes#1327
If a client acknowledges the same packet more than once, the packet also
gets freed more than once. At the second attempt the underlaying
Bit_array will throw an 'Invalid_clear' exception, which results in an
uncaught exception that leads to an abort() call in the freeing
component.
Fixes#1462.
To ease debugging without the need to tweak the kernel every time, and to
support userland developers with useful information this commit extends several
warnings and errors printed by the kernel/core by which thread/application
caused the problem, and what exactly failed.
Fix#1382Fix#1406
The driver for the Freescale eSDHCv2 doesn't support the highest
available bus frequency by now and also the bus width may be set to a
higher value but that needs further checks on the capabilities of the
inserted card.
The commits provide a benchmark as it exists for the OMAP4 SDHC driver.
Fix#1458
The 'continue_hw_accelerated' assertion at the end of the recall handler
can fail in situations which are not problematic, for example if the
'Timer' thread has set the 'VMCPU_FF_TIMER' flag in the meantime and
requested a recall afterwards. Since we don't know for sure if a recall is
requested for the other flags as well, the assertion gets replaced by a
debug message, which gets printed if any of the 'not yet verified as safe'
flags is set.
Fixes#1426
The GUID partition table (GPT) is primarily used by systems using
(U)EFI and is a replacement for the legacy MBR. For now, the current
implementation is able to address up to 128 GUID partition entries
(GPE).
To enable the GPT support in 'part_blk' it has to be configured
accrodingly:
! <start name="part_blk">
! [...]
! <config use_gpt="yes">
! [...]
! </start>
If 'part_blk' is not able to find a valid GPT header it falls back
to using the MBR.
Current limitations:
Since no endian conversion takes place it only works on LE platforms
and of all characters in the UTF-16 encoded name field of an entry
only the ones included in the ASCII encoding are printed. It also
ignores all GPE attributes.
Issue #1429.
The hover reports provides information about the session currently
pointed-to, i.e., hovered session. It can be enabled by the 'hover'
attribute of nitpicker's 'report' configuration element
<report hover="yes" />
Fixes#1442
The bindings for 32bit did not consider that in the syscall_3 function
edx changes due to the assembly instructions and that in the syscall_4
function edx and ecx change. So, the compiler wrongly assumed that the
content of these registers stayed unchanged.
Fixes#1447
In the past, unmap sometimes occured on RM clients that have no thread,
PD, or translation table assigned. However, this shouldn't be the
case anymore.
Fixes#504
* Introduce hw-specific crt0 for core that calls e.g.: init_main_thread
* re-map core's main thread UTCB to fit the right context area location
* switch core's main thread's stack to fit the right context area location
Fix#1440
This decouples the size of the mode transition control region from the
minimal mapping size of the page tables implementation. Rather, the CPU
architecture is able to specify the actual size.
Rationale: For x86_64, we need the mtc region to span two pages in order
to store all the tables required to perform the mode switch.
The size of empty structs differs in C (0 byte) and C++ (1 byte), which
leads to different offsets in compound structures. This fixes the driver
on 32Bit platforms.
Issue #1439.
The wireless stack calls timer_before(foo, timer.expires) and up to now
it was always 0. Let's be save and set this field when scheduling the
timer, although it worked fine so far.
Issue #1439.
We will always see this error message when the driver is started. It
is expected and not an actual error. When the driver is running it will
not allocate larger chunks than the Slab provides. Therefore, we can
safely ignore this message.
Issue #1439.
Some functions in the time manager, for example 'TMTimerSet()' and
'TMTimerStop()' let VirtualBox abort with a failed assertion if the timer
does not change to a 'stable' state after 1000 calls of a mixture of
'yield' and 'sleep'. On Genode, this happens sometimes when the 'EMT'
thread is executing 'TMTimerSet()' and gets interrupted by the 'TAP'
thread, which calls 'TMTimerStop()' and waits for the 'EMT' thread to
finish setting the timer. Since the 'EMT' thread has the lowest priority,
1000 retries can be too few. Without the assertion, these functions would
return an error code, which is often ignored by the caller, so it seems
safer to keep retrying until the function can return successfully.
Fixes#1437
Among others, this function is used in the for_each_set_big() macro,
which is used when configuring the data rate tables. Therefore, this
fixes observed performance issues.
Fixes#1439.
If running multiple VBox VMMs with Windows as guest concurrently then it may
happen that the system seem to hang. It turned out that actually
a VM-exit storm (vmx_exception->handle_exc_nm) causes a endless loop between
kernel and vCPU. Nothing gets scheduled nor interrupts are received anymore.
The referenced kernel commit fixes this issue.
Issue #1343
Drivers like SD-Card, platform, AHCI, and framebuffer are specified as Exynos5
compliant. But they are at least not compliant with Odroid-XU although this is
Exynos5. Thus, prevent tests that rely on such drivers when building for
hw_odoid_xu. Furthermore, make previous Arndale regulator/consts.h,
uart_defs.h, and some Board_base enums available to all Exynos5 builds to
enable at least building the drivers.
Fixes#1419
For the USB-Armory, we use a newer version of Linux (3.18) as for the
i.MX53-QSB. The main difference is, that the newer Linux uses a DTB instead of
ATAGs.
Fixes#1422
The USB Armory is almost the same as the i.MX53-QSB but it uses only
one of the two RAM banks available in i.MX53. Furthermore we use the USB
Armory only with Trustzone enabled.
Ref #1422
With the new run tool, there is no more is_qemu_available function. However,
some scripts still try to use it because only frequently used scripts were
updated by now. The commit replaces the function calls with the new
'have_include power_on/qemu' check.
Ref #1419
The wifi_drv now provides two reports. The first one contains all
accesspoints that were found while scanning the supported frequencies.
The second one reports the state of the driver, i.e., if it is
conntected to an accesspoint or not. In addition to that, the driver
now gets its configuration via a ROM session.
More detailed information are available in 'repos/dde_linux/README'.
Issue #1415.
* enables world-switch using ARM virtualization extensions
* split TrustZone and virtualization extensions hardly from platforms,
where it is not used
* extend 'Vm_session' interface to enable configuration of guest-physical memory
* introduce VM destruction syscall
* add virtual machine monitor for hw_arndale that emulates a simplified version
of ARM's Versatile Express Cortex A15 board for a Linux guest OS
Fixes#1405
To enable support of hardware virtualization for ARM on the Arndale board,
the cpu needs to be prepared to enter the non-secure mode, as long as it does
not already run in it. Therefore, especially the interrupt controller and
some TrustZone specific system registers need to be prepared. Moreover,
the exception vector for the hypervisor needs to be set up properly, before
booting normally in the supervisor mode of the non-secure world.
Ref #1405
To enable the usage of virtualization extension related instructions
there is the need to enable the '-mcpu=cortex_a15' compiler flag on
those cpus. To not conflict with other compiler flags (Ref #810) we've
to disable the '-march=arm_v7a' flag.
Ref #1405
The generalization of interrupt objects in the kernel and the use of
C++ polymorphism instead of explicitely checking for special interrupts
within generic code (Cpu_job::_interrupt) enables the registration of
additional interrupts used by the kernel, which are needed for specific
aspects added to the kernel, like ARM hardware virtualization interrupts.
* Introduce generic base class for interrupt objects handled by the kernel
* Derive an interrupt class for those handled by the user-land
* Implement IPI-specific interrupt class
* Implement timer interrupts using the new generic base class
Ref #1405
Until now, one distinct software generated IRQ per cpu was used to
send signals between cpus. As ARM's GIC has 16 software generated
IRQs only, and they need to be partitioned between secure/non-secure
TrustZone world as well as virtual and non-virtual worlds, we should
save them.
Ref #1405
* name irq controller memory mapped I/O regions consistently
in board descriptions
* move irq controller and timer memory mapped I/O region descriptions
from cpu class to board class
* eliminate artificial distinction between flavors of ARM's GIC
* factor cpu local initialization out of ARM's GIC interface description,
which is needed if the GIC is initialized differently e.g. for TrustZone
Ref #1405
The 'dest' target is renamed in the updated rump version to 'dest.stage'. This
triggered some building steps, even when the targets already existed.
Issue #1409
The handling of MMIO regions now supports more pathological cases with
weird cross references. Also, MMIO regions are releases after the
parsing is done.
Fixes#998
Setting the ACTLR.SMP bit also without SMP support fastens RAM access
significantly. A proper solution would implement SMP support which must enable
the bit anyway.
Fixes#1353
When returning early on directory operations, file systems that might
be able to handle the request but come after the current one are not
tried.
Fixes#1400.
Up to now Noux used the libc sleep functions, which actually is not
possible because the _nanosleep() function implemented by our libc
creates a new thread to handle the timeout. Noux childs may have
only one thread, e.g., the main thread, though. To fix this issue
sleeping is now handled directly by Noux. It is implemented by calling
select(2) with a timeout. This fix is needed for mutt(1), which calls
sleep when it prints a notification for the user.
Fixes#1374.
Since rump now requires large buffers of random numbers (>= 512 bytes), use the
jitterentropy library instead of the slow timer pseudo random number generation.
Fixes#1393
To circumvent compilation errors with the older L4Android Linux kernel
version, the ballooning driver is included in the more recent L4Linux
kernel only. Moreover, to be able to maintain L4Android / L4Linux in a more
convenient way, e.g. to apply patches valid for both versions, we use
the same git clone that is used for L4Linux instead of using the upstream
L4Android version by applying patches.
Fixes#1390
Instead of returning an uint64_t value, return a structured time stamp.
This change is only visible to components using Rtc_session directly.
Fixes#1381.
Up until now 'schedule_timeout' did only wait for the next signal to occur.
However, we might run into situations where there won't occur signals for longer
periods of time. Therefore, we took care of the respective timeout handling.
This commit also adds Genode's tracing support
Issue #1310
This has been broken for a while now. Use correct (global) signal transmission,
do not use local signal transmission, as signals seems to get lost.
Issue #1310
This patch changes the Shared_object::lookup function to use a
reinterpret_cast instead of a static_cast to allow the conversion
from symbol addresses to arbitrary pointers.
By blocking on a timeout, we yield the CPU in order to give a
concurrently running sporadic process a chance to obtain ROM modules.
Otherwise, such requests would be deferred until the ROM prefetcher
completes its operation or in the unlikely event that the prefetcher
gets preempted.
Fixes#1378