When running the same kernel in a VM as on the host system and the
kernel boot message from the VM appears on the log output, the run tool
assumes that the host machine has rebooted unexpectedly. With this
commit, an unexpected reboot is assumed only if the kernel boot message
appears at the beginning of a line. On base-hw, we enforce a line feed
at the beginning of the boot message as the SPIKE emulator log starts
with the first message of the kernel lacking a line feed.
Fixes#2041
This patch establishes the sole use of generic headers across all
kernels. The common 'native_capability.h' is based on the version of
base-sel4. All traditional L4 kernels and Linux use the same
implementation of the capability-lifetime management. On base-hw, NOVA,
Fiasco.OC, and seL4, custom implementations (based on their original
mechanisms) are used, with the potential to unify them further in the
future.
This change achieves binary compatibility of dynamically linked programs
across all kernels.
Furthermore, the patch introduces a Native_capability::print method,
which allows the easy output of the kernel-specific capability
representation using the base/log.h API.
Issue #1993
This patch alleviates the need for a Native_capability::Dst at the API
level. The former use case of this type as argument to
Deprecated_env::reinit uses the opaque Native_capability::Raw type
instead. The 'Raw' type contains the portion of the capability that is
transferred as-is when delegating the capability (i.e., when installing
the parent capability into a new component, or when installing a new
parent capability into a new forked Noux process). This information can
be retrieved via the new Native_capability::raw method.
Furthermore, this patch moves the functions for retriving the parent
capability to base/internal/parent_cap.h, which is meant to be
implemented in platform-specific ways. It replaces the former set of
startup/internal/_main_parent_cap.h headers.
Issue #1993
Write tick count of next kernel timer to the guest timed events page if
present. This causes the guest VM to be preempted at the requested tick
count and ensures that the guest VM can not monopolize the CPU if no
traps occur.
The base-hw kernel expects a configured switch-event from the guest VM
to base-hw with ID 30 and target vector 32 to be present in the system
policy.
Issue #2016
Switch kernel timer driver to timed event interface. The base-hw kernel
expects a configured self-event with ID 31 and target vector 32 to be
present in the system policy.
ssue #2016
* The Vm thread is always paused and on exception to make sure that guest VM
execution is suspended whenever we handle an interrupt. Also signal the Vm
session to poke waiting threads (e.g. Virtualbox EMT).
* Implement Vm::proceed
Switch to the mode transition assembly code declared at the _vt_vm_entry
label.
Issue #2016
The entry enables interrupts and initiates a handover to the guest VM by
invoking event number one. The sti instruction is placed at the start to
allow exits to Muen before handing off to the VM if window exiting is
requested.
Issue #2016
This patch introduces the Genode::raw function that prints output
directly via a low-level kernel mechanism, if available.
On base-linux, it replaces the former 'raw_write_str' function.
On base-hw, it replaces the former kernel/log.h interface.
Fixes#2012
The sinfo function declared in sinfo_instance.h creates a static sinfo
object instance and returns a pointer to the caller.
- kernel timer and platform support to use sinfo() function to
instantiate sinfo object
- address and size of the base-hw RAM region via the sinfo API
- log_status() function in sinfo API
Instead of introducing a $(BASE_HW_DIR) variable that has to be defined in each
core makefile for the different base-hw targets, this commit replaces the
$(REP_DIR) variable usage in core.inc files with $(BASE_DIR)/../base-hw.
Ref #1955
This patch removes the outdates doc/architecture.txt since the
topics are covered by the book. We keep repos/os/doc/init.txt
because it contains a few details not present in the book (yet).
The patch streamlines the terminology a bit. Furthermore, it
slightly adjusts a few source-code comments to improve the book's
functional specification chapter.
* Adds public timeout syscalls to kernel API
* Kernel::timeout installs a timeout and binds a signal context to it that
shall trigger once the timeout expired
* With Kernel::timeout_max_us, one can get the maximum installable timeout
* Kernel::timeout_age_us returns the time that has passed since the
calling threads last timeout installation
* Removes all device specific back-ends for the base-hw timer driver and
implements a generic back-end taht uses the kernel timeout API
* Adds assertions about the kernel timer frequency that originate from the
requirements of the the kernel timeout API and adjusts all timers
accordingly by using the their internal dividers
* Introduces the Kernel::Clock class. As member of each Kernel::Cpu object
it combines the management of the timer of the CPU with a timeout scheduler.
Not only the timeout API uses the timeout scheduler but also the CPUs job
scheduler for installing scheduling timeouts.
* Introduces the Kernel::time_t type for timer tic values and values inherited
from timer tics (like microseconds).
Fixes#1972
To avoid the need for adapting the names of the core restricted syscalls
each time we add a public syscall (restricted names must always be
greater than public names), let restricted syscall names simply start at
100 (we should never have more than 100 public syscalls).
Ref #1972
Building a kernel test produced an error about a missing config
apparently because of recent changes in the run tool. So, we add
a dummy XML node as config.
Ref #1972
We do not ensure that the Fpu::Context is 16-byte aligned and,
therefore, should not tell the compiler that we did. Otherwise, the GCC
may optimize operations regarding the addresses of members as it did for
if ((addr_t)_fxsave_area & 0xf) ...
With the declared 16-byte alignment the condition will never become
true.
This patch moves the thread operations from the 'Cpu_session'
to the 'Cpu_thread' interface.
A noteworthy semantic change is the meaning of the former
'exception_handler' function, which used to define both, the default
exception handler or a thread-specific signal handler. Now, the
'Cpu_session::exception_sigh' function defines the CPU-session-wide
default handler whereas the 'Cpu_thread::exception_sigh' function
defines the thread-specific one.
To retain the ability to create 'Child' objects without invoking a
capability, the child's initial thread must be created outside the
'Child::Process'. It is now represented by the 'Child::Initial_thread',
which is passed as argument to the 'Child' constructor.
Fixes#1939
Adjust IRTE_COUNT to specify the number of IRTEs and not the index of
the last IRTE entry. This fixes an off-by-one error in the toggle_mask()
function, where the range check for I/O APIC IRQs wrongly ignored IRQ
23.
The custom version merely differs from the generic one with respect to
the session quota. Since we support the dynamic upgrading of sessions,
we don't need to provide the big amount (128KiB) defined by the custom
version.
This patch supplements each existing connection type with an new
constructor that is meant to replace the original one. The new
one takes a reference to the component's environment as argument and
thereby does not rely on the presence of the globally accessible
'env()' interface.
The original constructors are marked as deprecated. Once we have
completely abolished the use of the global 'env()', we will remove them.
Fixes#1960
All core.inc files now use $BASE_HW_DIR instead of $REP_DIR. The former
is defined by the core.mk file. This allows including core.inc files
from other repositories (e.g. genode-world) for additional platform
support.
Fixes#1955
The old implementation cleared all other bits in the SCU control
register when enabling the SCU, which broke the kernel startup on zynq-
based boards.
By only raising the enable bit, we can keep the initial/default state
e.g. as set up by uboot.
Fixes#1953
This patch cleans up the thread API and comes with the following
noteworthy changes:
- Introduced Cpu_session::Weight type that replaces a formerly used
plain integer value to prevent the accidental mix-up of
arguments.
- The enum definition of Cpu_session::DEFAULT_WEIGHT moved to
Cpu_session::Weight::DEFAULT_WEIGHT
- New Thread constructor that takes a 'Env &' as first argument.
The original constructors are now marked as deprecated. For the
common use case where the default 'Weight' and 'Affinity' are
used, a shortcut is provided. In the long term, those two
constructors should be the only ones to remain.
- The former 'Thread<>' class template has been renamed to
'Thread_deprecated'.
- The former 'Thread_base' class is now called 'Thread'.
- The new 'name()' accessor returns the thread's name as 'Name'
object as centrally defined via 'Cpu_session::Name'. It is meant to
replace the old-fashioned 'name' method that takes a buffer and size
as arguments.
- Adaptation of the thread test to the new API
Issue #1954
This patch moves the base library from src/base to src/lib/base,
flattens the library-internal directory structure, and moves the common
parts of the library-description files to base/lib/mk/base.inc and
base/lib/mk/base-common.inc.
Furthermore, the patch fixes a few cosmetic issues (whitespace and
comments only) that I encountered while browsing the result.
Fixes#1952
This patch makes the former 'Process' class private to the 'Child'
class and changes the constructor of the 'Child' in a way that
principally enables the implementation of single-threaded runtime
environments that virtualize the CPU, PD, and RAM services. The
new interfaces has become free from side effects. I.e., instead
of implicitly using Genode::env()->rm_session(), it takes the reference
to the local region map as argument. Also, the handling of the dynamic
linker via global variables is gone. Now, the linker binary must be
provided as constructor argument.
Fixes#1949
This patch replaces the former 'Pd_session::bind_thread' function by a
PD-capability argument of the 'Cpu_session::create_thread' function, and
removes the ancient thread-start protocol via 'Rm_session::add_client' and
'Cpu_session::set_pager'. Threads are now bound to PDs at their creation
time and implicitly paged according to the address space of the PD.
Note the API change:
This patch changes the signature of the 'Child' and 'Process' constructors.
There is a new 'address_space' argument, which represents the region map
representing the child's address space. It is supplied separately to the
PD session capability (which principally can be invoked to obtain the
PD's address space) to allow the population of the address space
without relying on an 'Pd_session::address_space' RPC call.
Furthermore, a new (optional) env_pd argument allows the explicit
overriding of the PD capability handed out to the child as part of its
environment. It can be used to intercept the interaction of the child
with its PD session at core. This is used by Noux.
Issue #1938
This patch integrates three region maps into each PD session to
reduce the session overhead and to simplify the PD creation procedure.
Please refer to the issue cited below for an elaborative discussion.
Note the API change:
With this patch, the semantics of core's RM service have changed. Now,
the service is merely a tool for creating and destroying managed
dataspaces, which are rarely needed. Regular components no longer need a
RM session. For this reason, the corresponding argument for the
'Process' and 'Child' constructors has been removed.
The former interface of the 'Rm_session' is not named 'Region_map'. As a
minor refinement, the 'Fault_type' enum values are now part of the
'Region_map::State' struct.
Issue #1938
The return code of assign_parent remained unused. So this patch
removes it.
The bind_thread function fails only due to platform-specific limitations
such as the exhaustion of ID name spaces, which cannot be sensibly
handled by the PD-session client. If occurred, such conditions used to
be reflected by integer return codes that were used for diagnostic
messages only. The patch removes the return codes and leaves the
diagnostic output to core.
Fixes#1842
When bringing up the kernel on multiple cores, there is a time span
where some cores already have caches enabled and some don't. Core-local
storage that may be used during this time must be aligned at least to
the maximum line size among global caches. Otherwise, a cached core may
unintentionally prefetch data of a yet uncached core into a global
cache. This may corrupt the view of the uncached core as soon as it
enables caches. However, to determine the exact alignment for every
single ARM platform isn't sensible. Instead, we can align to the minimum
page size assuming that a cache never wants to prefetch from multiple
pages at once and thus fulfills "line size <= page size".
Fixes#1937
This is a generalisation approach of the hw_zynq target. As the boards
typically use UART1 instead of UART0 (used by qemu), we have to
distinguish between those. Moreover, in general hw_zynq does not imply
zynq_qemu anymore, so that the support of particular boards can be
placed in third-party or community repositories (e.g. Genode world).
Fixes#1926
Besides unifying the Msgbuf_base classes across all platforms, this
patch merges the Ipc_marshaller functionality into Msgbuf_base, which
leads to several further simplifications. For example, this patch
eventually moves the Native_connection_state and removes all state
from the former Ipc_server to the actual server loop, which not only
makes the flow of control and information much more obvious, but is
also more flexible. I.e., on NOVA, we don't even have the notion of
reply-and-wait. Now, we are no longer forced to pretend otherwise.
Issue #1832
This patch unifies the CPU session interface across all platforms. The
former differences are moved to respective "native-CPU" interfaces.
NOVA is not covered by the patch and still relies on a custom version of
the core-internal 'cpu_session_component.h'. However, this will soon be
removed once the ongoing rework of pause/single-step on NOVA is
completed.
Fixes#1922
This commit introduces the new `Component` interface in the form of the
headers base/component.h and base/entrypoint.h. The os/server.h API
has become merely a compatibilty wrapper and will eventually be removed.
The same holds true for os/signal_rpc_dispatcher.h. The mechanism has
moved to base/signal.h and is now called 'Signal_handler'.
Since the patch shuffles headers around, please do a 'make clean' in the
build directory.
Issue #1832
This commit replaces the stateful 'Ipc_client' type with the plain
function 'ipc_call' that takes all the needed state as arguments.
The stateful 'Ipc_server' class is retained but it moved from the public
API to the internal ipc_server.h header. The kernel-specific
implementations were cleaned up and simplified. E.g., the 'wait'
function does no longer exist. The badge and exception code are no
longer carried in the message buffers but are handled in kernel-specific
ways.
Issue #610
Issue #1832
This patch moves details about the stack allocation and organization
the base-internal headers. Thereby, I replaced the notion of "thread
contexts" by "stacks" as this term is much more intuitive. The fact that
we place thread-specific information at the bottom of the stack is not
worth introducing new terminology.
Issue #1832
By moving the stub implementation to rm_session_client.cc, we can use
the generic base/include/rm_session/client.h for base-linux and
base-nova and merely use platform-specific implementations.
Issue #1832
This patch establishes a common organization of header files
internal to the base framework. The internal headers are located at
'<repository>/src/include/base/internal/'. This structure has been
choosen to make the nature of those headers immediately clear when
included:
#include <base/internal/lock_helper.h>
Issue #1832
This patch integrates the functionality of the former CAP session into
the PD session and unifies the approch of supplementing the generic PD
session with kernel-specific functionality. The latter is achieved by
the new 'Native_pd' interface. The kernel-specific interface can be
obtained via the Pd_session::native_pd accessor function. The
kernel-specific interfaces are named Nova_native_pd, Foc_native_pd, and
Linux_native_pd.
The latter change allowed for to deduplication of the
pd_session_component code among the various base platforms.
To retain API compatibility, we keep the 'Cap_session' and
'Cap_connection' around. But those classes have become mere wrappers
around the PD session interface.
Issue #1841
This patch removes the SIGNAL service from core and moves its
functionality to the PD session. Furthermore, it unifies the PD service
implementation and terminology across the various base platforms.
Issue #1841
The gnat and gprbuild tools are not necessarily in the PATH when
preparing the port since the effective location is specified by the
--image-muen-gnat-path RUN_OPT.
Use the new Sinfo::get_dev_info function to retrieve device information
in the platform-specific get_msi_params function. If the requested
device supports MSI, set the IRQ and MSI address/data register values to
enable MSIs in remappable format (see VT-d specification, section
5.1.2.2).
Currently only one MSI per device is supported as the subhandle in the
data register is always set to 0.
The new Sinfo::get_dev_info function can be used to retrieve information
for a PCI device with given source-id (SID). The function returns false
if no device information for the specified device exists.
The platform-specific get_msi_params function returns MSI parameters for
a device identified by PCI config space address. The function returns
false if either the platform or the device does not support MSI mode of
operation.
Extend the base-hw Irq_session_component class with _is_msi, _address
and _value variables required to support MSI mode of operation.
Return MSI configuration in info() function if _is_msi is set to true.
This commit adds rocket core on the Zynq FPGA support to base HW. It also takes
advantage of the new timer infrastructure introduced with the privileged 1.8 and
adds improved TLB flush support.
fixes#1880
Do not build core-muen_on library without the muen soecifier set.
Do not reference files of the muen contrib directory in the first
pass of make's rule analysis, when parding the muen specific kernel
makefile.
Fix#1859
The new implementation of the FPU and FPU context is taken out to
separate architecture-dependent header files. The generic Cpu_lazy_state
is deleted. There is no hint about the existence of something like an
FPU in the generic non-architexture-dependent code anymore. Instead the
architecture-dependent CPU context of a thread is extended by an FPU
context where supported.
Moreover, the current FPU implementations are enhanced so that threads
that get deleted now release the FPU when still obtaining it.
Fix#1855
This commit enables multi-processing for all Cortex A9 SoCs we currently
support. Moreover, it thereby enables the L2 cache for i.MX6 that was not
enabled until now. However, the QEMU variants hw_pbxa9 and hw_zynq still
only use 1 core, because the busy cpu synchronization used when initializing
multiple Cortex A9 cores leads to horrible boot times on QEMU.
During this work the CPU initialization in general was reworked. From now
on lots of hardware specifics were put into the 'spec' specific files, some
generic hook functions and abstractions thereby were eliminated. This
results to more lean implementations for instance on non-SMP platforms,
or in the x86 case where cache maintainance is a non-issue.
Due to the fact that memory/cache coherency and SMP are closely coupled
on ARM Cortex A9 this commit combines so different aspects.
Fix#1312Fix#1807
On ARM Cortex A9 platforms the external PL310 L2 cache controller
needs to be initialized dependent on the SoC. For instance on Pandaboard
it needs to call the firmware running in TrustZone's secure world,
on i.MX6 it initializes it directly, on other boards it doesn't need
to be initialized at all, because the bootloader already did so.
Therefore, we should implement the PL310 intialization in board specific
code and not in the base class implementation.
Ref #1312
This commit separates certain SMP aspects into 'spec/smp' subdirectories.
Thereby it simplifies non-SMP implementations again, where no locking
and several platform specific maintainance operations are not needed.
Moreover, it moves several platform specifics to appropriated places,
removes dead code from x86, and starts to turn global static pointers
into references that are handed over.
The main thread's UTCB, used during bootstrap of the main thread before
it allocates its context area, needs to be outside the virtual memory
area controlled by the RM session, because it is needed before the main
thread can access its RM session.
Fix#1804
The test threads previously used a stack size independent from the machine
word width. Qemu was previously configured to provide 64Mb of RAM which isn't
sufficient for x86_64.
Ref #1805
Upgrading the quota of a PD session on HW always triggers a "Quota
exceeded" warning. To prevent unecessary debugging effort in the future,
we explain in an in-code comment that the warning is normal.
Ref #1805
When capabilities are delegated to components, they are added to the UTCB of the
target thread. Before the thread is able to take out the capability id out of
the UTCB and adapt the user-level capability reference counter, it might happen
that another thread of the same component deletes the same capability because
its user-level reference counter reached zero. If the kernel then destroys the
capability, before the same capability id is taken out of all UTCBs, an
inconsitent view in the component is the result. To keep an consistent view in
the multi-threading scenario, the kernel now counts how often it puts a
capability into a UTCB. The threads on the other hand hint the kernel when they
took capabilities out of the UTCB, so the kernel can decrement the counter
again. Only when the counter is zero, capabilities can get destructed.
Fix#1623
Likewise on the x86 branch, we have to remove all virtual memory ranges from the
virtual memory allocator that are used by one-by-one mappings of I/O regions
used by the kernel.
Fix#1797
On the USB Armory, we want to secure different devices than on other i.MX53
implementations. Thus, add a board specific configuration that is interpreted
by the kernel Trustzone initialization.
Ref #1497
Enhance the VM state, that can be accessed by a VMM, by a member
'unsigned irq_injection'. In Kernel::Vm::proceed check, whether
irq_injection is set. If so, check whether irq_injection is a
non-secure IRQ. If so, let the PIC raise this IRQ in the VM and reset
irq_injection.
Ref #1497
'block_for_signal' and 'pending_signal' now set pending flag in signal context
in order to determine pending signal. The context list is also used by the
'Signal_receiver' during destruction.
Fixes#1738
Currently, when a signal arrives in the main thread, the signal dispatcher is
retrieved and called from the main thread, the dispatcher uses a proxy object
that in turn sends an RPC to the entry point. This becomes a problem when the
entry point destroys the dispatcher object, before the dispatch function has
been called by the main thread. Therefore, the main thread should simply send an
RPC to the entry point upon signal arrival and the dispatching should be handled
solely by the entry point.
Issue #1738
Destroying an object within the scope of a lambda/functor executed
in the object pool's apply function leads potentially to memory corruption.
Within the scope the corresponding object is locked and unlocked when
leaving the scope. Therefore, it is illegal to free the object's memory meanwhile.
This commit eliminates several places in core that destroyed wrongly in
the object pool's scope.
Fix#1713
* Move the Synced_interface from os -> base
* Align the naming of "synchronized" helpers to "Synced_*"
* Move Synced_range_allocator to core's private headers
* Remove the raw() and lock() members from Synced_allocator and
Synced_range_allocator, and re-use the Synced_interface for them
* Make core's Mapped_mem_allocator a friend class of Synced_range_allocator
to enable the needed "unsafe" access of its physical and virtual allocators
Fix#1697
Instead of holding SPEC-variable dependent files and directories inline
within the repository structure, move them into 'spec' subdirectories
at the corresponding levels, e.g.:
repos/base/include/spec
repos/base/mk/spec
repos/base/lib/mk/spec
repos/base/src/core/spec
...
Moreover, this commit removes the 'platform' directories. That term was
used in an overloaded sense. All SPEC-relative 'platform' directories are
now named 'spec'. Other files, like for instance those related to the
kernel/architecture specific startup library, where moved from 'platform'
directories to explicit, more meaningful places like e.g.: 'src/lib/startup'.
Fix#1673
Instead of returning pointers to locked objects via a lookup function,
the new object pool implementation restricts object access to
functors resp. lambda expressions that are applied to the objects
within the pool itself.
Fix#884Fix#1658
Propagating the user context-pointer from C++ code to the mode
transition assembly doesn't touch any CPU global data. Thus, we can
reduce the in-sync window.
Fixes#1223
Other platforms implement Kernel::Cpu_context stuff in
kernel/cpu_context.cc. On x86_64, it was implemented in
kernel/thread.cc. The commit fixes this inconsistency to the other
platforms.
Ref #1652
The distinction between Kernel::Thread and Kernel::Thread_base is
unnecessary as currently all Hw platforms would have the same content in
the latter class. Thus I've merged Kernel::Thread_base into
Kernel::Thread. Thereby, Kernel::Thread_event can be moved to
kernel/thread.h.
Ref #1652
The Muen-specific PIC implementation provides the irq_occurred()
function which is used to register an IRQ with the PIC upon thread
exception.
The occurred IRQs are stored in a boolean array internally and handed
out to a CPU via take_request().
The driver uses the timer page containing a vector and timer value to
implement the start_one_shot() and value() functions. The timer value
designates the absolute tick count of the next event.
The address of the time page is acquired using the get_memregion_info
Sinfo API function.
The Muen Sinfo API is used to retrieve information about the execution
environment of a subject running on the Muen Separation Kernel.
While the C++ API is defined in sinfo.h, musinfo.h specifies the
internal format of the information stored in the Sinfo pages provided by
the Muen SK. It is a copy of the file contained in the libmusinfo
library of the Muen project. That is the reason why the coding style in
this file differs from the official style.
Move Platform::setup_irq_mode function from x86 platform_support.cc to
x86_64 specific file. This will enable the upcoming x86_64_muen platform
to provide a separate implementation.
The hw_x86_64_muen platform is a x86/64 base-hw kernel which runs as
isolated subject (guest) on the Muen Separation Kernel (SK) [1].
The platform is implemented as an extension to hw_x86_64 replacing the
PIC and timer drivers with paravirtualized variants. The skeleton
contains a dummy PIC and timer implementation for now.
[1] - http://muen.sk
Add spin loop hint by means of the PAUSE instruction since
wait_for_interrupt is called in a busy loop. This should improve processor
performance and reduce power consumption.
Note: HLT cannot be used since it is a privileged instruction and the idle
thread is executed in userspace.
Move the _core_only_mmio_regions function to the
x86_64/platform_support.cc file. This is required to make it overridable
for other platforms deriving from x86.
For most platforms except of NOVA a distinction between pager entrypoint
and pager activation is not needed, and only exists due to historical
reasons. Moreover, the pager thread's execution path is almost identical
between most platforms excluding NOVA, HW, and Fisco.OC. Therefore,
this commit unifies the pager loop for the other platforms, and removes
the pager activation class.
Moves the Bios Data Area header from base-hw to base. Modifies the
base-nova core console that it uses the header as replacement for
the previous BDA bit logic.
Ref #1625
Three things were done:
* Timouts are measured in an asynchronous way to be able to start counters
after the potentially expensive RPC that starts the timeout.
* Timeouts were increased from 45 and 15 seconds to 60 and 20 seconds
because at least on Arndale, results were not stable enough.
* Counting is done on 'unsigned long long' instead of 'unsigned' because
with the higher timeouts, overflows occured.
Fixes#1628
Since the HW-kern-caps commit, there was a bug in the Platform_thread
constructor. When called for a user thread, the constructor stated 0
as CPU quota at the Kernel_object instead of its quota input-paramater.
Fixes#1620
Instead of using the Genode user-level signal API to signal page-faults to
a page-fault handler, use the kernel API directly. Thereby the accounting
of signal contexts needed for each paging subject can be done easily.
Fix#956
Moreover, be strict when calculating the page-table requirements of
core, which is architecture specific, and declare the virtual memory
requirements of core architecture-wise.
Ref #1588
The ~Irq_session_component relied on the IRQ number obtained by the
corresponding kernel IRQ object to mark the IRQ as free at the IRQ
allocator. However, since the kernel IRQ object is initialized not
before the 'sigh' function is called, the IRQ of sessions that
never called 'sigh' could not be freed correctly. This patch fixes
the problem by not relying on the kernel IRQ object for obtaining
the number in the destructor but using the '_irq_number' member
variable instead.
Instead of organizing page tables within slab blocks and allocating such
blocks dynamically on demand, replace the page table allocator with a
simple, static alternative. The new page table allocator is dimensioned
at compile-time. When a PD runs out of page-tables, we simply flush its
current mappings, and re-use the freed tables. The only exception is
core/kernel that should not produce any page faults. Thereby it has to
be ensured that core has enough page tables to populate it's virtual
memory.
A positive side-effect of this static approach is that the accounting
of memory used for page-tables is now possible again. In the dynamic case
there was no protocol existent that solved the problem of donating memory
to core during a page fault.
Fix#1588
This patch enable clients of core's TRACE service to obtain the
execution times of trace subjects (i.e., threads). The execution time is
delivered as part of the 'Subject_info' structure.
Right now, the feature is available solely on NOVA. On all other base
platforms, the returned execution times are 0.
Issue #813
Add a Platform::setup_irq_mode function which enables the IRQ session to
update the trigger mode and polarity of the associated IRQ according to
the session parameters. On ARM this function is a nop.
This change enables the x86_64 platform to support devices which use
arbitrary trigger modes and polarity settings, e.g. AHCI on QEMU and
real hardware.
Fixes#1528.
Because of helping, it is possible that a core thread that wants to
destroy another thread at the kernel is using the scheduling context of
the thread that shall be destroyed at this point in time. When building
without GENODE_RELEASE defined, this always triggers an assertion in the
kernel. But when building with GENODE_RELEASE defined, this might silently
lead to kernel-memory corruption. This commit eliminates the latter case.
Should be reverted as soon as the scheduler is able to remove its head.
Ref #1537
Placement new can be misleading, as we already overload the new operator
to construct objects via pointers to allocators. To prohibit any problems here,
and to use one consistent approach, we can explicitely construct the object
with the already available 'construct_at' template function.
Ref #1443
* Introduce a hw specific Address_space interface for protection
domains, which combines all memory-virtualization related functionality
* Introduce a core-specific Platform_pd object that solves all the hen-egg
problems formerly distributed in kernel and core-platform code
Ref #595
Ref #1443
The assumption that IRQs in the legacy ISA range are always
edge-triggered is wrong. For the free-for-use IRQs it depends on the
actual device which uses the specific IRQ. Therefore, treat IRQs 9, 10
and 11 as level-triggered.
Enable a platform to specify how the MMIO memory allocator is to be
initialized. On ARM the existing behavior is kept while on x86 the I/O
memory is defined as the entire address space excluding the core only
RAM regions. This aligns the hw_x86_64 I/O memory allocator
initialization with how it is done for other x86 kernels such as NOVA or
Fiasco.
Perform lazy-initialization of FPU state when it is enabled for the
first time. This assures that the FXSAVE area (including the stored
MXCSR) is always properly setup and initialized to the platform default
values.
Perform all FPU-related setup in the Cpu class' init_fpu function instead of
the general system bring-up assembly code.
Set all required control register 0 and 4 flags according to Intel SDM Vol. 3A,
sections 9.2 and 9.6 instead of only enabling FPU error reporting and OSFXSR.
In the past, when the user blocked for an IRQ signal, the last signal was
acknowledged automatically thereby unmasking the IRQ. Now, the signal session
got a dedicated RPC for acknowledging IRQs and the HW back-end of that RPC
acknowledged the IRQ signal too. This led to the situation that IRQs were
unmasked twice. However, drivers expect an interrupt to be unmasked only on
the Irq_session::ack_irq and thus IRQ unmasking was moved from
Kernel::ack_signal to a dedicated kernel call.
Fixes#1493
The thread library (thread.cc) in base-foc shared 95% of the code with
the generic implementation except myself(). Therefore, its
implementation is now separated from the other generic sources into
myself.cc, which allows base-foc to use a foc-specific primitive to
enable our base libraries in L4Linux.
Issue #1491
Physical CPU quota was previously given to a thread on construction only
by directly specifying a percentage of the quota of the according CPU
session. Now, a new thread is given a weighting that can be any value.
The physical counter-value of such a weighting depends on the weightings
of the other threads at the CPU session. Thus, the physical quota of all
threads of a CPU session must be updated when a weighting is added or
removed. This is each time the session creates or destroys a thread.
This commit also adapts the "cpu_quota" test in base-hw accordingly.
Ref #1464
This patch adds const qualifiers to the functions Allocator::consumed,
Allocator::overhead, Allocator::avail, and Range_allocator::valid_addr.
Fixes#1481
Instead of handing over object ids to the kernel, which has to find them
in object pools then, core can simply use object pointers to reference
kernel objects.
Ref #1443
Instead of having an ID allocator per object class use one global allocator for
all. Thereby artificial limitations for the different object types are
superfluent. Moreover, replace the base-hw specific id allocator implementation
with the generic Bit_allocator, which is also memory saving.
Ref #1443
The verb "bin" in the context of destroying kernel objects seems pretty
unusual in contrast to "delete". When reading "bin" in the context of
systems software an association to something like "binary" is more likely.
Ref #1443
* Instead of using local capabilities within core's context area implementation
for stack allocation/attachment, simply do both operations while stack gets
attached, thereby getting rid of the local capabilities in generic code
* In base-hw the UTCB of core's main thread gets mapped directly instead of
constructing a dataspace component out of it and hand over its local
capability
* Remove local capability implementation from all platforms except Linux
Ref #1443
The global capability ID counter is not used by NOVA and Fiasco.OC
and in the future not needed by base-hw too. Thereby, remove the static
counter variable from the generic code base and add it where appropriated.
Ref #1443
Enable platform specific allocations and ram quota accounting for
protection domains. Needed to allocate object identity references
in the base-hw kernel when delegating capabilities via IPC.
Moreover, it can be used to account translation table entries in the
future.
Ref #1443
There are lots of places where a numeric argument of an argument string
gets extraced as signed long value and then assigned to an unsigned long
variable. If the value in the string was negative, it would not be
detected as invalid (and replaced by the default value), but become a
positive bogus value.
With this patch, numeric values which are supposed to be unsigned get
extracted with the 'ulong_value()' function, which returns the default
value for negative numbers.
Fixes#1472
There were two bugs. First, the caller of Kernel::await_signal wasn't
re-activated for scheduling. Second, the caller did not memorize that he
doesn't wait on a receiver anymore which had bad side effects on further
signal handling.
Fix#1459
The port uses the Cortex-A9 private timer for the kernel and an EPIT as
user timer. It was successfully tested on the Wandboard Quad and the CuBox-i
with the signal test. It lacks L2-cache and Trustzone support by now.
Thanks to Praveen Srinivas (IIT Madras, India) and Nikolay Golikov (Ksys Labs
LLC, Russia). This work is partially based on their contributions.
Fix#1467
Do not mask edge-triggered interrupts to avoid losing them while masked,
see Intel 82093AA I/O Advanced Programmable Interrupt Controller
(IOAPIC) specification, section 3.4.2, "Interrupt Mask":
"When this bit is 1, the interrupt signal is masked. Edge-sensitive
interrupts signaled on a masked interrupt pin are ignored (i.e., not
delivered or held pending)"
Or to quote Linus Torvalds on the subject:
"Now, edge-triggered interrupts are a _lot_ harder to mask, because the
Intel APIC is an unbelievable piece of sh*t, and has the edge-detect
logic _before_ the mask logic, so if a edge happens _while_ the device
is masked, you'll never ever see the edge ever again (unmasking will not
cause a new edge, so you simply lost the interrupt)."
So when you "mask" an edge-triggered IRQ, you can't really mask it at
all, because if you did that, you'd lose it forever if the IRQ comes in
while you masked it. Instead, we're supposed to leave it active, and set
a flag, and IF the IRQ comes in, we just remember it, and mask it at
that point instead, and then on unmasking, we have to replay it by
sending a self-IPI." [1]
[1] - http://yarchive.net/comp/linux/edge_triggered_interrupts.html
Ref #1448
In order to match the I/O APIC configuration, a request for user timer
IRQ 0 is remapped to vector 50 (Board::TIMER_VECTOR_USER), all other
requests are transposed by adding the vector offset 48
(Board::VECTOR_REMAP_BASE).
* Enable the use of the FXSAVE and FXRSTOR instructions, see Intel SDM
Vol. 3C, section 2.5.
* The state of the x87 floating point unit (FPU) is loaded and saved on
demand.
* Make the cr0 control register accessible in the Cpu class. This is in
preparation of the upcoming FPU management.
* Access to the FPU is disabled by setting the Task Switch flag in the cr0
register.
* Access to the FPU is enabled by clearing the Task Switch flag in the cr0
register.
* Implement FPU initialization
* Add is_fpu_enabled helper function
* Add pointer to CPU lazy state to CPU class
* Init FPU when finishing kernel initialization
* Add function to retry FPU instruction:
Similar to the ARM mechanism to retry undefined instructions, implement a
function for retrying an FPU instruction. If a floating-point instruction
causes an #NM exception due to the FPU being disabled, it can be retried
after the correct FPU state is restored, saving the current state and
enabling the FPU in the process.
* Disable FPU when switching to different user context:
This enables lazy save/restore of the FPU since trying to execute a
floating point instruction when the FPU is disabled will cause a #NM
exception.
* Declare constant for #NM exception
* Retry FPU instruction on #NM exception
* Assure alignment of FXSAVE area:
The FXSAVE area is 512-byte memory region that must be 16-byte aligned. As
it turns out the alignment attribute is not honored in all cases so add a
workaround to assure the alignment constraint is met by manually rounding
the start of the FXSAVE area to the next 16-byte boundary if necessary.
The LAPIC timer is programmed in one-shot mode with vector 32
(Board::TIMER_VECTOR_KERNEL). The timer frequency is measured using PIT
channel 2 as reference (50ms delay).
Disable PIT timer channel 0 since BIOS programs it to fire periodically.
This avoids potential spurious timer interrupts.
The implementation initializes the Local APIC (LAPIC) of CPU 0 in xapic
mode (mmio register access) and uses the I/O APIC to remap, mask and
unmask hardware IRQs. The remapping offset of IRQs is 48.
Also initialize the legacy PIC and mask all interrupts in order to
disable it.
For more information about LAPIC and I/O APIC see Intel SDM Vol. 3A,
chapter 10 and the Intel 82093AA I/O Advanced Programmable Interrupt
Controller (IOAPIC) specification
Set bit 9 in the RFLAGS register of user CPU context to enable
interrupts on kernel- to usermode switch.
Make the local APIC accessible via its MMIO region by adding a 2 MB
large page mapping at 0xfee00000 with memory type UC.
Note: The mapping is added to the initial page tables to make the APIC
usable prior to the activation of core's page tables, e.g. in the
constructor of the timer class.
The location in memory is arbitrary but we use the same address as the
ARM architecture. Adjust references to virtual addresses in the mode
transition pages to cope with 64-bit values.
The interrupt stack must reside in the mtc region in order to use it for
non-core threads. The size of the stack is set to 56 bytes in order to
hold the interrupt stack frame plus the additional vector number that is
pushed onto the stack by the ISR.
Call the _virt_mtc_addr function with the _mt_isrs label to calculate
the ISR base address in Idt::setup. Again, assume the address to be
below 0x10000.
Use parameter instead of class member variable because it would get
stored into the mtc region otherwise. In a further iteration only the
actual IDT should be saved into the mtc, not the complete class
instance. Currently the class instance size is equal to the IDT table
size.
The class provides the load() function which reloads the GDTR with the
GDT address in the mtc region. This is needed to make the segments
accessible to non-core threads.
Make the _gdt_start label global to use it in the call to
_virt_mtc_addr().
Use the _mt_tss label and the placement new operator to create the
Tss class instance in the mtc region. Update the hard-coded
TSS base address to use the virtual mtc address.
On exception, the CPU first checks the IDT in order to find the
associated ISR. The IDT must therefore be placed in the mode transition
pages to make them available for non-core threads.
The limit is set to match the TSS size - 1 and the base address is
hardcoded to the *current* address of the TSS instance (0x3a1100).
TODO: Set the base address using the 'tss' label. If the TSS descriptor
format were not so utterly unusable this would be straightforward.
Changes to the code that indirectly lead to a different location
of the tss result in #GP since the base address will be invalid.
The class Genode::Tss represents a 64-bit Task State Segment (TSS) as
specified by Intel SDM Vol. 3A, section 7.7.
The setup function sets the stack pointers for privilege levels 0-2 to
the kernel stack address. The load function loads the TSS segment
selector into the task register.
Implement user argument setter and getter support functions. The mapping of
the state registers corresponds to the system call parameter passing
convention.
The instruction pointer is the first field of the master context and can
directly be used as a jump argument, which avoids additional register
copy operations.
Point stack to client context region and save registers using push
instructions.
Note that since the push instruction first increments the stack pointer
and then stores the value on the stack, the RSP has to point one field
past RBP before pushing the first register value.
As the kernel entry is called from the interrupt handler the stack
layout is as specified by Intel SDM Vol. 3A, figure 6-8. An additional
vector number is stored at the top of the stack.
Gather the necessary client information from the interrupt stack frame
and store it in the client context.
The new errcode field is used to store the error code that some
interrupts provide (e.g. #PF). Rework mode transition reserved space and
offset constants to match the new CPU_state layout.
The macros are used to assign syscall arguments to specific registers.
Using the AMD64 parameter passing convention avoids additional copying of
variables since the C++ function parameters are already in the right
registers.
The interrupt return instruction in IA-32e mode applies the prepared
interrupt stack frame to set the RFLAGS, CS and SS segment as well as
the RIP and RSP registers. It then continues execution of the user code.
For detailed information refer to Intel SDM Vol. 3A, section 6.14.3.
After activating the client page tables the client context cannot be
accessed any longer. The mode transition buffer however is globally
mapped and can be used to restore the remaining register values.
Set the stack pointer to the R8 field in the client context to enable
restoring registers by popping values of the stack.
After this step the only remaining registers that do not contain client
values are RAX, RSP and RIP.
Note that the client value of RAX is pop'd to the global buffer region as
the register will still be used by subsequent steps. It will be restored to
the value in the buffer area just prior to resuming client code execution.
Set I/O privilege level to 3 to allow core to perform port I/O from
userspace. Also make sure the IF flag is cleared for now until interrupt
handling is implemented.
Setup an IA-32e interrupt stack frame in the mode transition buffer region.
It will be used to perform the mode switch to userspace using the iret
instruction.
For detailed information about the IA-32e interrupt stack frame refer to
Intel SDM Vol. 3A, figure 6-8.
The constants specify offset values of CPU context member variables as
specified by Genode::Cpu_state [1] and Genode::Cpu::Context [2].
[1] - repos/base/include/x86_64/cpu/cpu_state.h
[2] - repos/base-hw/src/core/include/spec/x86/cpu.h
The new entries specify a 64-bit code segment with DPL 3 at index 3 and a
64-bit data segment with DPL 3 at index 4.
These segments are needed for transitioning to user mode.
A pointer to the client context is placed in the mt_client_context_ptr area.
It is used to pass the current client context to the lowlevel mode-switching
assembly code.
IA-32e paging translates 48-bit linear addresses to 52-bit physical
addresses. Translation structures are hierarchical and four levels deep.
The current implementation supports regular 4KB and 1 GB and 2 MB large
page mappings.
Memory typing is not yet implemented since the encoded type bits depend
on the active page attribute table (PAT)*.
For detailed information refer to Intel SDM Vol. 3A, section 4.5.
* The default PAT after power up does not allow the encoding of the
write-combining memory type, see Intel SDM Vol. 3A, section 11.12.4.
* Add common IA-32e paging descriptor type:
The type represents a table entry and encompasses all fields shared by
paging structure entries of all four levels (PML4, PDPT, PD and PT).
* Simplify PT entry type by using common descriptor:
Differing fields are the physical address, the global flag and the memory
type flags.
* Simplify directory entry type by using common descriptor:
Page directory entries (PDPT and PD) have an additional 'page size' field
that specifies if the entry references a next level paging structure or
represents a large page mapping.
* Simplify PML4 entry type by using common descriptor
Top-level paging structure entries (PML4) do not have a 'pat' flag and the
memory type is specified by the 'pwt' and 'pcd' fields only.
* Implement access right merging for directory paging entries
The access rights for translations are determined by the U/S, R/W and XD
flags. Paging structure entries that reference other tables must provide
the superset of rights required for all entries of the referenced table.
Thus merge access rights of new mappings into existing directory entries to
grant additional rights if needed.
* Add cr3 register definition:
The control register 3 is used to set the current page-directory base
register.
* Add cr3 variable to x86_64 Cpu Context
The variable designates the address of the top-level paging structure.
* Return current cr3 value as translation table base
* Set context cr3 value on translation table assignment
* Implement switch to virtual mode in kernel
Activate translation table in init_virt_kernel function by updating the
cr3 register.
* Ignore accessed and dirty flags when comparing existing table entries
These flags can be set by the MMU and must be disregarded.
* Add isr.s assembler file:
The file declares an array of Interrupt Service Routines (ISR) to handle
the exception vectors from 0 to 19, see Intel SDM Vol. 3A, section
6.3.1.
* Add Idt class:
* The class Genode::Idt represents an Interrupt Descriptor Table as
specified by Intel SDM Vol. 3A, section 6.10.
* The setup function initializes the IDT with 20 entries using the ISR
array defined in the isr.s assembly file.
* Setup and load IDT in Genode::Cpu ctor:
The Idt::setup function is only executed once on the BSP.
* Declare ISRs for interrupts 20-255
* Set IDT size to 256