The memory barrier prevents the compiler from changing the program order
of memory accesses in such a way that accesses to the guarded resource
get outside the guarded stage. As cmpxchg() defines the start of the
guarded stage it also represents an effective memory barrier.
On x86, the architecture ensures to not reorder writes with older reads,
writes to memory with other writes (except in cases that are not
relevant for our locks), or read/write instructions with I/O
instructions, locked instructions, and serializing instructions.
However on ARM, the architectural memory model allows not only that
memory accesses take local effect in another order as their program
order but also that different observers (components that can access
memory like data-busses, TLBs and branch predictors) observe these
effects each in another order. Thus, a correct program order isn't
sufficient for a correct observation order. An additional architectural
preservation of the memory barrier is needed to achieve this.
Fixes#692
Invalidating all branch predictors before switching the PD
fixes instability problems on Panda and has not much effect
on the performance of other boards. However, we neither know why
this is a fix nor wether it fixes the real cause of the problem.
fix#1294
Previously, the timer was used to remember the state of the time slices.
This was sufficient before priorities entered the scene as a thread always
received a fresh time slice when he was scheduled away. However, with
priorities this isn't always the case. A thread can be preempted by another
thread due to a higher priority. In this case the low-priority thread must
remember how much time he has consumed from its current time slice because
the timer gets re-programmed. Otherwise, if we have high-priority threads
that block and unblock with high frequency, the head of the next lower
priority would start with a fresh time slice all the time and is never
superseded.
fix#1287
* When flushing the data and unified cache on ARM, clean and invalidate
instead of just cleaning the corresponding cache lines
* After zero-ing a freshly constructed dataspace in core, invalidate
corresponding cache lines from the instruction cache
After modifying mode transition for branch prediction tz_vmm wasn't
working anymore on hw_imx53_tz but the modifications had nothing to do
with the VM code. However, the amount of instructions in the MT before the
VM exception-vector changed. So I tried stuffing the last working version with
NOPs and found that tz_vmm worked for some NOP amounts and for others not.
Thus, I increased the alignment of the VM exception-vector from 16 bytes to 32
bytes, é voila, its working with any amount of NOPs as well as with branch
prediction commits.
ref #474
Previously, we did the protection-domain switches without a transitional
translation table that contains only global mappings. This was fine as long
as the CPU did no speculative memory accesses. However, to enabling branch
prediction triggers such accesses. Thus, if we don't want to invalidate
predictors on every context switch, we need to switch more carefully.
ref #474
When a page fault cannot be resolved, the GDB monitor can get a hint about
which thread faulted by evaluating the thread state object returned by
'Cpu_session::state()'. Unfortunately, with the current implementation,
the signal which informs GDB monitor about the page fault is sent before
the thread state object of the faulted thread has been updated, so it
can happen that the faulted thread cannot be determined immediately
after receiving the signal.
With this commit, the thread state gets updated before the signal is sent.
At least on base-nova it can also happen that the thread state is not
accessible yet after receiving the page fault notification. For this
reason, GDB monitor needs to retry its query until the state is
accessible.
Fixes#1206.
The build config for core is now provided through libraries to enable
implicit config composition through specifiers and thereby avoid
consideration of inappropriate targets.
fix#1199
A subject that inherits from Processor_client not necessarily has the need for
doing a processor-global TLB flush (e.g. VMs). At the other hand the Thread
class (as representation of the only source of TLB flushes) is already one of
the largest classes in base-hw because it provides all the syscall backends
and should therefore not accumulate other aspects without a functional reason.
Hence, I decided to move the aspect of synchronizing a TLB flush over all
processors to a dedicated class named Processor_domain_update.
Additionally a singleton of Processor_domain_update_list is used to enable
each processor to see all update-domain requests that are currently pending.
fix#1174
Commit 6a3368ee that refactored the mode transition assembler path, and
high-level entry point, fundamentally broke that part for the TrustZone VMs.
Instead of jumping to the appropriated address, the instruction value at that
point where used as target address.
Moreover, the TrustZone part of the mode transition page was not included into
the boundary check.
Ref #1182
On ARM it's relevant to not only distinguish between ordinary cached memory
and write-combined one, but also having non-cached memory too. To insert the
appropriated page table entries e.g.: in the base-hw kernel, we need to preserve
the information about the kind of memory from allocation until the pager
resolves a page fault. Therefore, this commit introduces a new Cache_attribute
type, and replaces the write_combined boolean with the new type where necessary.
Don't define assembler constants inside macros, thereby calling the
corresponding macros isn't needed anymore. To prevent having to much
constants included in files where they aren't needed, split macros.s
file into a generic mode_transition.s part, and globally used macros.s.
Fix#1180
Previously this was not done before Thread_base::start(..) in
base-hw as it was not needed to have a valid cap that early. However,
when changing the affinity of a thread we need the cap to be valid
before Thread_base::start(..).
fix#1151
By now the scheduling timer was only refreshed for a new scheduling timeout
when the choosen scheduling context has changed. But we want it to be refreshed
also when the scheduled context yields without an effect to the schedulers
choice (this is the case e.g. when the idle thread gets a scheduling timeout
or a thread yields without any competitor in its priority band).
ref #1151
Fixes an alignment problem introduced by commit "hw: map core on demand"
where physical address alignment wasn't checked anymore, when inserting
a section within the first-level table of ARM's short translation table
format.
Many thanks to Christian Prochaska for helping to debug the problem.
On ARM, when machine instructions get written into the data cache
(for example by a JIT compiler), one needs to make sure that the
instructions get written out to memory and read from memory into
the instruction cache before they get executed. This functionality
is usually provided by a kernel syscall and this patch adds a generic
interface for Genode applications to use it.
Fixes#1153.
This patch changes the top-level directory layout as a preparatory
step for improving the tools for managing 3rd-party source codes.
The rationale is described in the issue referenced below.
Issue #1082