Do not mask edge-triggered interrupts to avoid losing them while masked,
see Intel 82093AA I/O Advanced Programmable Interrupt Controller
(IOAPIC) specification, section 3.4.2, "Interrupt Mask":
"When this bit is 1, the interrupt signal is masked. Edge-sensitive
interrupts signaled on a masked interrupt pin are ignored (i.e., not
delivered or held pending)"
Or to quote Linus Torvalds on the subject:
"Now, edge-triggered interrupts are a _lot_ harder to mask, because the
Intel APIC is an unbelievable piece of sh*t, and has the edge-detect
logic _before_ the mask logic, so if a edge happens _while_ the device
is masked, you'll never ever see the edge ever again (unmasking will not
cause a new edge, so you simply lost the interrupt)."
So when you "mask" an edge-triggered IRQ, you can't really mask it at
all, because if you did that, you'd lose it forever if the IRQ comes in
while you masked it. Instead, we're supposed to leave it active, and set
a flag, and IF the IRQ comes in, we just remember it, and mask it at
that point instead, and then on unmasking, we have to replay it by
sending a self-IPI." [1]
[1] - http://yarchive.net/comp/linux/edge_triggered_interrupts.html
Ref #1448
In order to match the I/O APIC configuration, a request for user timer
IRQ 0 is remapped to vector 50 (Board::TIMER_VECTOR_USER), all other
requests are transposed by adding the vector offset 48
(Board::VECTOR_REMAP_BASE).
* Enable the use of the FXSAVE and FXRSTOR instructions, see Intel SDM
Vol. 3C, section 2.5.
* The state of the x87 floating point unit (FPU) is loaded and saved on
demand.
* Make the cr0 control register accessible in the Cpu class. This is in
preparation of the upcoming FPU management.
* Access to the FPU is disabled by setting the Task Switch flag in the cr0
register.
* Access to the FPU is enabled by clearing the Task Switch flag in the cr0
register.
* Implement FPU initialization
* Add is_fpu_enabled helper function
* Add pointer to CPU lazy state to CPU class
* Init FPU when finishing kernel initialization
* Add function to retry FPU instruction:
Similar to the ARM mechanism to retry undefined instructions, implement a
function for retrying an FPU instruction. If a floating-point instruction
causes an #NM exception due to the FPU being disabled, it can be retried
after the correct FPU state is restored, saving the current state and
enabling the FPU in the process.
* Disable FPU when switching to different user context:
This enables lazy save/restore of the FPU since trying to execute a
floating point instruction when the FPU is disabled will cause a #NM
exception.
* Declare constant for #NM exception
* Retry FPU instruction on #NM exception
* Assure alignment of FXSAVE area:
The FXSAVE area is 512-byte memory region that must be 16-byte aligned. As
it turns out the alignment attribute is not honored in all cases so add a
workaround to assure the alignment constraint is met by manually rounding
the start of the FXSAVE area to the next 16-byte boundary if necessary.
The LAPIC timer is programmed in one-shot mode with vector 32
(Board::TIMER_VECTOR_KERNEL). The timer frequency is measured using PIT
channel 2 as reference (50ms delay).
Disable PIT timer channel 0 since BIOS programs it to fire periodically.
This avoids potential spurious timer interrupts.
The implementation initializes the Local APIC (LAPIC) of CPU 0 in xapic
mode (mmio register access) and uses the I/O APIC to remap, mask and
unmask hardware IRQs. The remapping offset of IRQs is 48.
Also initialize the legacy PIC and mask all interrupts in order to
disable it.
For more information about LAPIC and I/O APIC see Intel SDM Vol. 3A,
chapter 10 and the Intel 82093AA I/O Advanced Programmable Interrupt
Controller (IOAPIC) specification
Set bit 9 in the RFLAGS register of user CPU context to enable
interrupts on kernel- to usermode switch.
The location in memory is arbitrary but we use the same address as the
ARM architecture. Adjust references to virtual addresses in the mode
transition pages to cope with 64-bit values.
Use parameter instead of class member variable because it would get
stored into the mtc region otherwise. In a further iteration only the
actual IDT should be saved into the mtc, not the complete class
instance. Currently the class instance size is equal to the IDT table
size.
The class provides the load() function which reloads the GDTR with the
GDT address in the mtc region. This is needed to make the segments
accessible to non-core threads.
Make the _gdt_start label global to use it in the call to
_virt_mtc_addr().
Use the _mt_tss label and the placement new operator to create the
Tss class instance in the mtc region. Update the hard-coded
TSS base address to use the virtual mtc address.
The class Genode::Tss represents a 64-bit Task State Segment (TSS) as
specified by Intel SDM Vol. 3A, section 7.7.
The setup function sets the stack pointers for privilege levels 0-2 to
the kernel stack address. The load function loads the TSS segment
selector into the task register.
Implement user argument setter and getter support functions. The mapping of
the state registers corresponds to the system call parameter passing
convention.
IA-32e paging translates 48-bit linear addresses to 52-bit physical
addresses. Translation structures are hierarchical and four levels deep.
The current implementation supports regular 4KB and 1 GB and 2 MB large
page mappings.
Memory typing is not yet implemented since the encoded type bits depend
on the active page attribute table (PAT)*.
For detailed information refer to Intel SDM Vol. 3A, section 4.5.
* The default PAT after power up does not allow the encoding of the
write-combining memory type, see Intel SDM Vol. 3A, section 11.12.4.
* Add common IA-32e paging descriptor type:
The type represents a table entry and encompasses all fields shared by
paging structure entries of all four levels (PML4, PDPT, PD and PT).
* Simplify PT entry type by using common descriptor:
Differing fields are the physical address, the global flag and the memory
type flags.
* Simplify directory entry type by using common descriptor:
Page directory entries (PDPT and PD) have an additional 'page size' field
that specifies if the entry references a next level paging structure or
represents a large page mapping.
* Simplify PML4 entry type by using common descriptor
Top-level paging structure entries (PML4) do not have a 'pat' flag and the
memory type is specified by the 'pwt' and 'pcd' fields only.
* Implement access right merging for directory paging entries
The access rights for translations are determined by the U/S, R/W and XD
flags. Paging structure entries that reference other tables must provide
the superset of rights required for all entries of the referenced table.
Thus merge access rights of new mappings into existing directory entries to
grant additional rights if needed.
* Add cr3 register definition:
The control register 3 is used to set the current page-directory base
register.
* Add cr3 variable to x86_64 Cpu Context
The variable designates the address of the top-level paging structure.
* Return current cr3 value as translation table base
* Set context cr3 value on translation table assignment
* Implement switch to virtual mode in kernel
Activate translation table in init_virt_kernel function by updating the
cr3 register.
* Ignore accessed and dirty flags when comparing existing table entries
These flags can be set by the MMU and must be disregarded.
* Add isr.s assembler file:
The file declares an array of Interrupt Service Routines (ISR) to handle
the exception vectors from 0 to 19, see Intel SDM Vol. 3A, section
6.3.1.
* Add Idt class:
* The class Genode::Idt represents an Interrupt Descriptor Table as
specified by Intel SDM Vol. 3A, section 6.10.
* The setup function initializes the IDT with 20 entries using the ISR
array defined in the isr.s assembly file.
* Setup and load IDT in Genode::Cpu ctor:
The Idt::setup function is only executed once on the BSP.
* Declare ISRs for interrupts 20-255
* Set IDT size to 256
This patch contains the initial code needed to build and bootstrap the
base-hw kernel on x86 64-bit platforms. It gets stuck earlier
because the binary contains 64-bit instructions, but it is started in
32-bit mode. The initial setup of page tables and switch to long mode is
still missing from the crt0 code.
To ease debugging without the need to tweak the kernel every time, and to
support userland developers with useful information this commit extends several
warnings and errors printed by the kernel/core by which thread/application
caused the problem, and what exactly failed.
Fix#1382Fix#1406
* Introduce hw-specific crt0 for core that calls e.g.: init_main_thread
* re-map core's main thread UTCB to fit the right context area location
* switch core's main thread's stack to fit the right context area location
Fix#1440
This decouples the size of the mode transition control region from the
minimal mapping size of the page tables implementation. Rather, the CPU
architecture is able to specify the actual size.
Rationale: For x86_64, we need the mtc region to span two pages in order
to store all the tables required to perform the mode switch.
* enables world-switch using ARM virtualization extensions
* split TrustZone and virtualization extensions hardly from platforms,
where it is not used
* extend 'Vm_session' interface to enable configuration of guest-physical memory
* introduce VM destruction syscall
* add virtual machine monitor for hw_arndale that emulates a simplified version
of ARM's Versatile Express Cortex A15 board for a Linux guest OS
Fixes#1405
To enable support of hardware virtualization for ARM on the Arndale board,
the cpu needs to be prepared to enter the non-secure mode, as long as it does
not already run in it. Therefore, especially the interrupt controller and
some TrustZone specific system registers need to be prepared. Moreover,
the exception vector for the hypervisor needs to be set up properly, before
booting normally in the supervisor mode of the non-secure world.
Ref #1405
The generalization of interrupt objects in the kernel and the use of
C++ polymorphism instead of explicitely checking for special interrupts
within generic code (Cpu_job::_interrupt) enables the registration of
additional interrupts used by the kernel, which are needed for specific
aspects added to the kernel, like ARM hardware virtualization interrupts.
* Introduce generic base class for interrupt objects handled by the kernel
* Derive an interrupt class for those handled by the user-land
* Implement IPI-specific interrupt class
* Implement timer interrupts using the new generic base class
Ref #1405
Until now, one distinct software generated IRQ per cpu was used to
send signals between cpus. As ARM's GIC has 16 software generated
IRQs only, and they need to be partitioned between secure/non-secure
TrustZone world as well as virtual and non-virtual worlds, we should
save them.
Ref #1405
* name irq controller memory mapped I/O regions consistently
in board descriptions
* move irq controller and timer memory mapped I/O region descriptions
from cpu class to board class
* eliminate artificial distinction between flavors of ARM's GIC
* factor cpu local initialization out of ARM's GIC interface description,
which is needed if the GIC is initialized differently e.g. for TrustZone
Ref #1405
Setting the ACTLR.SMP bit also without SMP support fastens RAM access
significantly. A proper solution would implement SMP support which must enable
the bit anyway.
Fixes#1353