* Instead of always re-loading page tables when a thread context is switched,
only do this when a thread of another user PD is the next target;
core threads are always executed within the last PD's page-table set
* remove the concept of the mode transition
* instead, map the exception vector once in bootstrap code into the kernel's
memory segment
* when a new page directory is constructed for a user PD, copy over the
top-level kernel-segment entries on RISC-V and x86; on ARM, we use a
designated page-directory register for the kernel segment
* transfer the current CPU id from bootstrap to core/kernel in a register
to ease the calculation of the first stack address
* align the CPU-context member of threads and VMs because of x86 constraints
regarding stack-pointer loading
* introduce an Align_at template for members with alignment constraints (see
the sketch below)
* let the x86 hardware do part of the context saving in ISS, by passing
the thread context into the TSS before leaving to user-land
* use one exception vector for all ARM platforms including Arm_v6
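The Align_at utility mentioned above could look roughly like the following
sketch; the names and the 16-byte example constraint are illustrative
assumptions, not the actual base-hw code:

  #include <cstddef>
  #include <cstdint>
  #include <new>
  #include <utility>

  /* illustrative Align_at-style wrapper, not the actual base-hw code */
  template <typename T, std::size_t ALIGN>
  class Align_at
  {
    private:

      char _space[sizeof(T) + ALIGN - 1];  /* storage with room for alignment */
      T   *_obj;                           /* aligned, constructed object */

      static char *_aligned(char *p)
      {
        auto const v = reinterpret_cast<std::uintptr_t>(p);
        return reinterpret_cast<char *>((v + ALIGN - 1) & ~std::uintptr_t(ALIGN - 1));
      }

    public:

      template <typename... ARGS>
      Align_at(ARGS &&... args)
      : _obj(new (_aligned(_space)) T(std::forward<ARGS>(args)...)) { }

      ~Align_at() { _obj->~T(); }

      T       &object()       { return *_obj; }
      T const &object() const { return *_obj; }
  };

  /* usage: a CPU context that x86 requires to be 16-byte aligned */
  struct Cpu_context { std::uint64_t r[32]; };
  struct Thread      { Align_at<Cpu_context, 16> cpu_context; };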
Fix #2091
* introduce new syscall (core-only) to create privileged threads
* take the privilege level of the thread into account
when doing a context switch
* map the kernel segment as accessible for privileged code only
Ref #2091
Always switch to the "exception stack" instead of having a hardware-initiated
stack switch during exceptions/interrupts only when the privilege level changes.
Moreover, this commit increases the exception stack slightly.
Ref #2091
* introduce a central memory map for core/kernel (see the sketch below)
* on 32-bit platforms the kernel/core starts at 0x80000000
* on 64-bit platforms the kernel/core starts at 0xffffffc000000000
* mark kernel/core mappings as global ones (tagged TLB)
* move the exception vector to the beginning of core's binary so that
bootstrap knows from where to map it appropriately
* do not map boot modules into core anymore
* constrain core's virtual heap memory area
* differentiate between the UTCBs of the user's and core's main threads;
the latter now resides inside the kernel segment
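As an illustration of this split, the base addresses could be captured
roughly as follows; the namespace, names, and helper are assumptions, only
the two base addresses stem from the list above:

  #include <cstdint>

  /* illustrative constants for the core/kernel part of the virtual memory map */
  namespace Core_memory_map {

    constexpr std::uint64_t KERNEL_BASE_32BIT = 0x80000000ULL;
    constexpr std::uint64_t KERNEL_BASE_64BIT = 0xffffffc000000000ULL;

    constexpr std::uint64_t KERNEL_BASE =
      (sizeof(void *) == 8) ? KERNEL_BASE_64BIT : KERNEL_BASE_32BIT;

    /* everything below the kernel base belongs to the user part of the space */
    constexpr bool in_kernel_segment(std::uint64_t virt) {
      return virt >= KERNEL_BASE; }
  }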
Ref #2091
Previously, this produced an error output line for each affected packet, but
it is pretty normal for the router to receive packets whose network-layer
protocol it doesn't know. In the default case, these packets shall be
ignored silently.
Ref #2490
One can configure the NIC router to act as DHCP server at the interfaces of a
domain by adding a <dhcp-server> tag to the configuration of the domain like
this:
<domain name="vbox" interface="10.0.1.1/24">
  <dhcp-server ip_first="10.0.1.80"
               ip_last="10.0.1.100"
               ip_lease_time_sec="3600"
               dns_server="10.0.0.2"/>
  ...
</domain>
The attributes ip_first and ip_last define the available IPv4 address
range while ip_lease_time_sec defines the lifetime of an IPv4 address
assignment in seconds. The IPv4 address range must be in the subnet
defined by the interface attribute of the domain tag and must not cover
the IPv4 address in this attribute. The dns_server attribute gives the
IPv4 address of the DNS server, which may also be in another subnet.
The lifetime of an offered assignment equals the configured round-trip time
of the router, while ip_lease_time_sec is applied only if the offer is
requested by the client in time.
The ports/run/virtualbox_nic_router.run script is an example of how to
use the new DHCP server functionality.
Ref #2490
Previously, garbage collection was done only when an incoming packet passed
the Ethernet checks. Now it is always done first when receiving a packet at
an interface.
Ref #2490
If the router has no gateway attribute for a domain (meaning that the router
itself is the gateway) and it gets an ARP request for a foreign IP, it shall
answer with its own IP.
Ref #2490
Do not use twice the RTT for the lifetime of links but use it as configured,
to simplify the usage of the router. Internally, use the
Microseconds/Duration types instead of plain integers.
Ref #2490
The nic_dump component uses wrappers for all supported protocols that
take a packet and a verbosity configuration. Such a wrapper object can
then be used as an argument to a Genode log function and prints the
packet's contents according to the given configuration (see the sketch
below). The configuration is a distinct class to enable the reuse of one
instance for different packets.
There are currently 4 possible configurations for each protocol:
* NONE (no output for this protocol)
* SHORT (only the protocol name)
* COMPACT (the most important information densely packed)
* COMPREHENSIVE (all header information of this protocol)
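A simplified, self-contained model of this wrapper idea is sketched below;
the types are made up and std::ostream stands in for Genode's log/print
interface:

  #include <cstdint>
  #include <iostream>

  struct Dump_config
  {
    enum class Verbosity { NONE, SHORT, COMPACT, COMPREHENSIVE };

    /* one verbosity setting per protocol, reusable for many packets */
    Verbosity eth  { Verbosity::COMPACT };
    Verbosity ipv4 { Verbosity::SHORT   };
  };

  struct Ipv4_packet { std::uint8_t src[4], dst[4]; };

  /* wrapper that couples one packet with a shared configuration */
  struct Dumped_ipv4
  {
    Ipv4_packet const &packet;
    Dump_config const &config;
  };

  std::ostream &operator << (std::ostream &out, Dumped_ipv4 const &d)
  {
    using V = Dump_config::Verbosity;
    switch (d.config.ipv4) {
    case V::NONE:  return out;
    case V::SHORT: return out << "IPV4";
    case V::COMPACT:
    case V::COMPREHENSIVE:  /* a full dump would print all header fields */
      return out << "IPV4 " << +d.packet.src[0] << "." << +d.packet.src[1] << "."
                 << +d.packet.src[2] << "." << +d.packet.src[3] << " -> "
                 << +d.packet.dst[0] << "." << +d.packet.dst[1] << "."
                 << +d.packet.dst[2] << "." << +d.packet.dst[3];
    }
    return out;
  }

  int main()
  {
    Dump_config const config;                        /* one instance ...      */
    Ipv4_packet const a { {10,0,1,1}, {10,0,1,80} }; /* ... reused for        */
    Ipv4_packet const b { {10,0,1,2}, {10,0,1,81} }; /* ... different packets */
    std::cout << Dumped_ipv4{a, config} << "\n" << Dumped_ipv4{b, config} << "\n";
  }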
Ref #2490
Provide utilities for appending new options to an existing DHCP packet
and a utility for finding existing options that returns a typed option
object. Remove the old versions that return untyped options.
Ref #2490
Apply the style rule that an accessor is named like the underlying
value. Provide read and write accessors for each mandatory header attribute.
Fix some incorrect structures in the headers, like the flags field
in Ipv4_packet.
Ref #2490
Encapsulate the enum into a struct so that it is named
Ethernet_frame::Type::Enum, give it the correct storage type
uint16_t, and remove those values that are (AFAIK) not used so far
(in Genode or elsewhere).
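The result looks roughly like the following sketch (only two well-known
EtherType values are shown for illustration):

  #include <cstdint>

  struct Ethernet_frame
  {
    /* wrapped in a struct so the name reads Ethernet_frame::Type::Enum */
    struct Type
    {
      enum Enum : std::uint16_t {
        IPV4 = 0x0800,
        ARP  = 0x0806,
      };
    };
  };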
Ref #2490
Do not stop routing if the transport layer protocol is unknown but
continue with trying IP routing instead. The latter was already
done when no transport routing could be applied but for unknown transport
protocols we caught the exception at the wrong place.
Ref #2490
No starvation of timeout signals
--------------------------------
Add several timeouts < 1 ms to the stress test and check that timeout
handling doesn't become significantly unfair (starvation) in this situation
where some timeouts trigger much faster than they get handled.
Rate limiting for timeout handling in timer
-------------------------------------------
Ensure that the timer does not handle timeouts again within 1000
microseconds after the last handling of timeouts. This makes denial of
service attacks harder. This commit does not limit the rate of timeout
signals arriving at the timer, but it causes the timer to act on them less
often. If a client continuously installs a very small timeout at the
timer, it still causes a signal to be submitted to the timer each time
and some extra CPU time to be spent in the internal handling method. But
only every 1000 microseconds does this internal handling cause user
timeouts to trigger.
If we also wanted to limit the calls of the internal handling method, to
ensure that CPU time beside the RPCs is spent only every 1000
microseconds, things would get more complex. For instance, on NOVA
Time_source::schedule_timeout(0) must be called each time a new timeout
gets installed and becomes head of the scheduling queue. We cannot
simply overwrite the already running timeout with the new one.
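The gate described above can be sketched as follows; all names are
illustrative placeholders, not the actual timer code:

  #include <cstdint>

  struct Timer_root  /* illustrative stand-in for the timer's signal handler */
  {
    static constexpr std::uint64_t MIN_HANDLE_PERIOD_US = 1000;

    std::uint64_t _last_handling_us { 0 };

    std::uint64_t _curr_time_us()                    { return 0; } /* placeholder: time source */
    void          _trigger_user_timeouts()           { }           /* placeholder: client signals */
    void          _schedule_wakeup_at(std::uint64_t) { }           /* placeholder: program timer */

    /* called for every signal from the time source, however often it fires */
    void handle_timeout_signal()
    {
      std::uint64_t const now = _curr_time_us();

      /* CPU time is still spent here on each signal ... */
      if (now - _last_handling_us < MIN_HANDLE_PERIOD_US) {
        /* ... but user timeouts trigger at most every 1000 us */
        _schedule_wakeup_at(_last_handling_us + MIN_HANDLE_PERIOD_US);
        return;
      }
      _last_handling_us = now;
      _trigger_user_timeouts();
    }
  };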
Ref #2490
In the past, a signal context that was chosen for handling by
'Signal_receiver::pending_signal' and always triggered again before
the next call of 'pending_signal' caused all other contexts behind it
in the list to starve. This was the case because 'pending_signal'
always took the first pending context in its context list.
We now avoid this problem by handling pending signals in a round-robin
fashion instead.
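The new behaviour can be illustrated with the following simplified model; it
is not the actual Signal_receiver code, and moving the handled context to the
tail of the list is just one way to obtain the round-robin order:

  #include <list>

  struct Context { bool pending; int id; };

  struct Receiver
  {
    std::list<Context> contexts;

    /* a context that keeps re-triggering can no longer shadow the ones behind it */
    Context *pending_signal()
    {
      for (auto it = contexts.begin(); it != contexts.end(); ++it) {
        if (!it->pending)
          continue;
        /* move the context that is about to be handled to the tail of the list */
        contexts.splice(contexts.end(), contexts, it);
        return &contexts.back();
      }
      return nullptr;
    }
  };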
Ref #2532
Previously, we did not set now_period correctly, but this wasn't conspicuous
because the bug did not trigger before a full period had passed, which on
most platforms is a pretty long time.
Ref #2490
The remote shell facilities are past deprecation and there is an
obligation to prevent their use rather than to support them. This patch
removes the related function definitions from 'unistd.h', which have not
been included in the Genode libc ABI anyway.
Fix #2530
Remove getaddrinfo and freeaddrinfo from the Libc::Plugin and get rid of
the extra libc_resolv library. Remove getaddrinfo/freeaddrinfo symbol
hiding patch for FreeBSD sources. Remove libc_resolv from Makefiles and
run scenarios.
Fix #2273
Linux del_timer() and mod_timer() return whether the timer was pending before
the modification. Additionally, these functions are potentially called
from the handler function of the timer to be modified and, therefore, checking
for timeout != INVALID_TIMEOUT is not sufficient as the timeout is
indeed valid while the handler is executed.
This patch fixes an aliasing problem of the 'close' method signature
that prevented the Input::Root_component::close method from being called.
Consequently, the event-queue state was not reset at session-close time,
which prevented a subsequent session-creation request from succeeding. With
the patch, input servers like ps2_drv, usb_drv that rely on the
Input::Root_component support the dynamic re-opening of sessions. This
happens in particular when using a dynamically configured input filter.
We updated the alarm-scheduler time with the result of
Timer::Connection::curr_time when scheduling new timeouts, but when
handling the signal from the timer server, we updated it with the result
of Timer::Connection::elapsed_us. Mixing times like this could cause a
non-monotonic time value in the alarm scheduler. The alarm scheduler
then assumed that the time value had wrapped and triggered all timeouts
immediately. The problem is fixed by always using
Timer::Connection::curr_time as the time source.
Ref #2490
This recipe copies the entire stdcxx library into the API archive, which
is an interim solution until we introduce a proper ABI for stdcxx. With
this current version, every user of the stdcxx ABI will implicitly build
the stdcxx library.
This patch merges two similar rules, which create content at 'include'
into a single rule. This prevents a possible race condition when
creating archives in parallel.
We moved the stack-area segment 128 MiB behind text and data to comply
with assumptions in the kernel ELF loader.
This commit also reenables static binaries on linux and removes the
unused stack_area.stdlib.ld script.
Fixes #2521
This is a drivers subsystem that starts the most fundamental
(framebuffer, input, block) device drivers dynamically, depending on the
runtime-detected devices. The discovered block devices are reported
as a "block_devices" report.
This patch applies the handling of cursor keys, function keys, and page
up/down keys even if no keymap is defined. This is the case when using
the terminal with character events produced by the input filter.
This patch is a workaround for the apparent problem that noux
applications, which perform execve, implicitly use functionality from
the dynamic linker, not explicitly via the libc. If the binary lacks the
dependency information, noux will fail on the execve attempt. The latter
is the case when the noux package is built as a depot archive where
library dependencies are not traversed over multiple levels.
In nested scenarios like driver_manager.run, the initial session quota
for IO_MEM, IO_PORT, and IRQ sessions is expectedly insufficient.
However, the condition is properly handled by re-attempting the request
with a slightly increased quota. Still, core prints a warning each time
the request is denied for quota reasons, which spams the log. This patch
removes the non-critical message.
Create periodic and one-shot timeouts with the maximum duration
to see if this triggers any corner-case bugs. They must not trigger during
the test.
Ref #2490
If we add an absolute timeout to the back-end alarm scheduler, we must first
call 'handle' at the scheduler to update its internal time value.
Otherwise, it might happen that we add a timeout whose deadline is so big that
it normally belongs to the next time-counter period, but the scheduler thinks
that it belongs to the current period because its time is older than the one
used to calculate the deadline.
Ref #2490
When we have two time values of an unsigned integer type, take their
difference, and want to know within the same value whether it is positive
or negative, we lose at least one half of the value range by casting to
signed integers. This was the case in the alarm scheduler when checking
whether an alarm had already triggered. Even worse, we casted from
'unsigned long' to 'signed int', which caused further loss on at least
x86_64. Thus, big timeouts like ~0UL falsely triggered right away.
Now, we use an extra boolean value to remember in which period of the
time counter we are and to which period of the time counter the deadline
of an alarm belongs. This boolean switches its value each time the time
counter wraps. This way, we can avoid any casting by checking whether the
current time is of the same period as the deadline of the alarm that we
inspect. If so, the alarm is pending if "current time >= alarm
deadline", otherwise it is pending if "current time < alarm deadline".
Ref #2490
If the PIT timer driver gets activated too slowly (e.g., because of a bad
priority configuration), it might miss counter wraps and would then produce
sudden time jumps. The driver now detects this problem dynamically, warns
about it, and adapts the affected values to avoid time jumps.
Ref #2400
The cache directory content is slightly different on each prepare-port
run and is fortunately not used at build time. So, we just remove it at the
end of port preparation.
When using elfweaver to generate boot images, python stores
precompiled modules in the source directory beside the .py files. This
polluted the contrib source tree with binary files specific to the build
host. As a result, the depot create tool picked up the changed source
tree and produced strange new hashes. Now, the tool sources are copied
to the build directory where python can do its optimizations and the
depot stays clean.
The NIC router now always reports the link state "Up" (true) because
the effective link state depends on the targeted remote interface
and thus on the individual routing of each packet. Consequently,
the signal handler for link-state changes is ignored as well.
Ref #2490
IP stacks may treat a network interface as "down" when it states a MAC
address with the I/G bit (bit 40) set to "Group" (value 1) instead of
"Individual" (value 0). This was observed with a TinyCore 8 inside a
VirtualBox VM. Thus, the previously chosen 03:03:03:03:03:00 as base
for the MAC address allocator was a bad choice. Now we use 02:02:02:02:02:00
instead. This also ensures that the MAC addresses are not marked as
"Universal" but as "Local" (bit 41, value 1), which is correct in general
as the router allocates MAC addresses only for virtual networks.
Ref #2490
The timer driver should always have the highest priority to avoid
problems with timers that have low max-counter values, like the PIT
with only 53 ms.
Ref #2400
The NIC dump component didn't support forwarding of link states and link-state
signals until now. Furthermore, it now prints MAC address and link state
on session creation and on every link state change.
Ref #2490
Previously, the uplink session was created at component startup while the
creation of the downlink session was timed by the client component. This
created a time span in which packets from the uplink were dropped at the
nic_dump. Now, the uplink session request is done by the session component
of the downlink.
Ref #2490
Add a "writeable" policy option to the ahci_drv and part_blk Block
servers and default from writeable to ready-only. Should a policy
permit write acesss the session request argument "writeable" may still
downgrade a session to ready-only.
Fix#2469
This should actually never happen. However, if it happens, be a bit more
robust and don't provide the memory for re-use (which would cause tons of
other trouble afterwards).
Issue #2505
Some x86 machines have a LAPIC speed of less than 1000 ticks per millisecond
when configured to use the maximum divider (as was always the case before).
But we need microseconds precision for the timeout framework. Thus,
reduce the divider dynamically until the frequency fulfills our
requirements.
Ref #2400
There are hardware timers whose frequency can't be expressed as a
ticks-per-microsecond integer value because only a ticks-per-millisecond
integer value is precise enough. We don't want to use expensive
floating-point values here but nonetheless want to translate from ticks
to time with microseconds precision. Thus, we split the input in two parts
and translate both parts separately. This way, we can raise precision by
shifting the values to their optimal bit position. Afterwards, the results
are shifted back and merged together again.
As this algorithm is not so trivial anymore and is used by at least three
timer drivers (base-hw/x86_64, base-hw/cortex_a9, timer/pit), move it to a
generic header to avoid redundancy.
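The described split can be sketched as follows; it is a simplified
stand-alone version, the generic header in the source tree may differ in its
details:

  #include <cstdint>
  #include <iostream>

  /*
   * Translate timer ticks to microseconds with integer math only: split the
   * input into an upper and a lower half, scale each half at a bit position
   * that avoids overflow, then shift back and sum the partial results.
   */
  template <typename T>
  T ticks_to_us(T const ticks, T const ticks_per_ms)
  {
    static_assert(sizeof(T) >= 4, "type too small for this split");

    constexpr unsigned HALF  = sizeof(T) * 4;  /* half the type width in bits */
    constexpr unsigned SHIFT = 10;             /* 1000 < 2^10 keeps the scaling in range */

    T const lower = ticks & ((T(1) << HALF) - 1);
    T const upper = ticks >> HALF;

    /* upper half: pre-shift so the multiplication by 1000 cannot overflow */
    T const upper_us = (((upper << (HALF - SHIFT)) * 1000) / ticks_per_ms) << SHIFT;

    /* lower half: small enough to be translated directly */
    T const lower_us = (lower * 1000) / ticks_per_ms;

    return upper_us + lower_us;
  }

  int main()
  {
    /* example: a 1 MHz timer, i.e., 1000 ticks per millisecond */
    std::uint64_t const ticks_per_ms = 1000;
    std::cout << ticks_to_us<std::uint64_t>(1234567, ticks_per_ms) << " us\n";
  }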
Ref #2400
Due to the simplicity of the algorithm that translated from timer ticks
to time, we lost microseconds precision although the timer allows for it.
Ref #2400
Due to the simplicity of the algorithm that translated from timer ticks
to time, we lost microseconds precision although the timer allows for it.
Ref #2400
When synchronizing with the remote time source, we have to take care that the
measured time difference cannot become zero because its real value is smaller
than the measurement granularity. Since the granularity is one microsecond, we
simply go on polling timestamp and time until the microsecond has passed.
This busy waiting should be no problem for the system for two reasons. First,
it is limited to a relatively small amount of time, and second, a busy lock
does not happen because the time source that is responsible for the limiting
factor is explicitly called on each poll.
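The polling loop can be sketched as follows; the helper names are
assumptions, not the actual timer-connection code:

  #include <cstdint>

  struct Time_sync  /* illustrative model */
  {
    virtual ~Time_sync() { }

    virtual std::uint64_t remote_time_us()  = 0;  /* remote time source, bounds the poll rate */
    virtual std::uint64_t local_timestamp() = 0;  /* e.g., the CPU timestamp counter */

    void measure(std::uint64_t &time_us, std::uint64_t &ts)
    {
      std::uint64_t const start_us = remote_time_us();
      std::uint64_t       now_us   = start_us;

      /* poll until at least one microsecond has passed, so the difference
         cannot become zero merely because of the measurement granularity */
      while (now_us == start_us)
        now_us = remote_time_us();

      time_us = now_us;
      ts      = local_timestamp();
    }
  };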
Ref #2400
When building Genode on a Linux system running in a Xen Dom0, the 'xen'
run target can run a Genode scenario in a Xen DomU.
Usage: in build/x86_*/etc/build.conf, define:
RUN_OPT = --include boot_dir/$(KERNEL) --include image/iso --include power_on/xen --include log/xen --include power_off/xen
The Xen DomU runs in HVM mode and loads Genode from an ISO image. Serial
log output is printed to the console and graphical output is shown in an
SDL window.
The Xen DomU is managed using the 'xl' command-line tool and it is
possible to add configuration options in the 'xen_args' variable in a run
script. Common options are:
- disabling the graphical output:
append xen_args { sdl="0" }
- configuring a network device:
append xen_args { vif=\["model=e1000,mac=02:00:00:00:01:01,bridge=xenbr0"\] }
- configuring USB input devices:
append xen_args { usbdevice=\["mouse","keyboard"\] }
Note: the 'xl' tool requires super-user permissions and interactive
password input can be troublesome in combination with 'expect' and is not
practical for automatic tests. For this reason, the current implementation
assumes that no password input is needed when running 'sudo xl', which can
be achieved by creating a file '/etc/sudoers.d/xl' with the content
'user ALL=(root) NOPASSWD: /usr/sbin/xl'
(where 'user' is the Linux user name).
Fixes #2504
When running core as the kernel inside every component, a separate
stack area for core is needed that is different from the user-land
component's one.
Ref #2091
Acquire Signal_context object locks via Object_pool::apply() in the
context of the entrypoint thread instead of in the context of the calling
thread.
Fixes #2485
The files are generated via flex and bison. Until now, this step was
performed when preparing the libc port. Unfortunately, the generated
files have subtle differences depending on the flex/bison versions
installed on the host. For example, the bison version number appears in
the generated code. This, in turn, breaks the hash mechanism of the
depot where a src/libc archive ends up being slightly different when
created on different hosts.
By moving the code generation to the build stage, the src/libc archive
merely contains the nslexer.l and nsparser.y source files but not the
generated files.
- Fix fatal exception handling so that stack traces are dumped
- Add 'include/nim' directories to Nim module search path
- Enable release optimizations for release builds
Fix #2493
This patch removes the assertion about the unexpected call of
'block_for_signal' within core. On Linux, this call is actually
expected because of the handling of SIGCHLD signals by core.
When idle, menu_view de-schedules timer events to save processing time.
Once reactivated by a dialog update, it computes the passed time and
applies the result to the animator. However, the animation was most likely
started by the update not during the sleep. So the passed time must not
be applied to the animation in this case. Otherwise, many animation steps
are computed at once within a single visible frame.
Furthermore, the patch adjusts the REDRAW_PERIOD to 2, which is a better
value for geometric movements as opposed to mere color-blending effects
where the frame rate does not matter so much.
It also refines the nitpicker-buffer relocation in a way that extends
the buffer but does not shrink it. This lowers the interaction with
nitpicker in situations where the dialog size changes a lot.
By applying the text output to the alpha buffer in addition to the pixel
buffer, labels can now appear without the need for an underlying frame
or button.
The new widget allows one to align a child widget within a larger parent
widget by specifying the boolean attributes 'north', 'south', 'east',
and 'west'. If none is specified, the child is centered. If opposite
attributes are specified, the child is stretched.
This improves the output quality of antialiased lines onto a transparent
nitpicker buffer. For antialiased graphics operations, the initial color
leaks through. Leaking 50% gray is better than leaking black, in
particular when drawing white lines.
This patch makes the use of 'List' invisible at the 'Animator'
interface. This allows users of the utility to keep 'Animator::Items' in
a custom 'List' with no aliasing problems.
The VFS library can be used in single-threaded or multi-threaded
environments and depending on that, signals are handled by the same thread
which uses the VFS library or possibly by a different thread. If a VFS
plugin needs to block to wait for a signal, there is currently no way
which works reliably in both environments.
For this reason, this commit makes the interface of the VFS library
nonblocking, similar to the File_system session interface.
The most important changes are:
- Directories are created and opened with the 'opendir()' function and the
directory entries are read with the recently introduced 'queue_read()'
and 'complete_read()' functions (see the sketch after this list).
- Symbolic links are created and opened with the 'openlink()' function and
the link target is read with the 'queue_read()' and 'complete_read()'
functions and written with the 'write()' function.
- The 'write()' function does not wait for signals anymore. This can have
the effect that data written by a VFS library user has not been
processed by a file system server yet when the library user asks for the
size of the file or closes it (both done with RPC functions at the file
system server). For this reason, a user of the VFS library should
request synchronization before calling 'stat()' or 'close()'. To make
sure that a file system server has processed all write request packets
which a client submitted before the synchronization request,
synchronization is now requested at the file system server with a
synchronization packet instead of an RPC function. Because of this
change, the synchronization interface of the VFS library is now split
into 'queue_sync()' and 'complete_sync()' functions.
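The resulting usage pattern can be illustrated with the following simplified
model; it only mimics the shape of the nonblocking split and is not the
actual VFS interface (types and signatures are assumptions):

  #include <string>

  struct Handle  /* stand-in for a VFS handle, not the real Vfs::Vfs_handle */
  {
    enum class Result { OK, QUEUED };

    /* enqueue the operation; returns false if the back end cannot accept it yet */
    bool queue_read(unsigned long count);

    /* fetch the result later, typically after an I/O-progress signal arrived */
    Result complete_read(char *dst, unsigned long count, unsigned long &out_count);
  };

  /* the caller loops over queue/complete instead of blocking inside the library */
  template <typename WAIT_FOR_IO>
  std::string read_first_bytes(Handle &handle, WAIT_FOR_IO const &wait_for_io)
  {
    char buf[512];
    unsigned long n = 0;

    while (!handle.queue_read(sizeof(buf)))
      wait_for_io();   /* how to wait is up to the (single- or multi-threaded) caller */

    while (handle.complete_read(buf, sizeof(buf), n) == Handle::Result::QUEUED)
      wait_for_io();

    return std::string(buf, n);
  }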
Fixes #2399
This patch changes init's service forwarding such that pending requests
are kept unanswered as long as the requested service is not present
(yet). In dynamic-init scenarios, this is needed in situations where the
dynamic init is known to eventually provide the service but the internal
subsystem is not ready yet. Previously, a client that attempted to
request a session in this early phase would get a 'Service_denied'
exception. By deferring the forwarding in this situation, the behaviour
becomes deterministic.
If a matching '<service>' exists but there is no matching policy sub
node, the request is answered with 'Service_denied' - as expected.
A boot module with size 0 previously made Core crash with a page fault in
Region_map_component::attach. This patch prevents the creation of ROM-FS
entries for such modules.
Ref #2490
For most base platforms (except linux and sel4), the initialization of
boot modules is the same. Thus, merge this default implementation in the
new unit base/src/core/platform_rom_modules.cc.
Ref #2490
In Region_map_component::attach, storing the metadata for a region may
throw an exception. Catch it and throw an Invalid_dataspace exception.
Ref #2490
Currently, init does not test whether a service is abandoned on a new
configuration if the service was routed via an any-child route. Trigger
this behaviour in the init test.
Ref #2483
We used a hardware timer locally in the RPI USB driver because a timer
connection was not precise enough to fulfill the host controller's
requirements.
With the modern timer connection interface, however, reading out time at
a connection is microseconds precise and we can remove the local timer.
But we cannot use the same timer connection for doing legacy-interface
stuff like usleep (currently used in LX kit) and modern-interface stuff
like curr_time. Thus, we open two connections for now.
Ref #2400
The calibration of the interpolation parameters was previously only done
periodically every 500 ms. Together with the fact that the parameters
had to be stable for at least 3 calibration steps to enable
interpolation, it took at least 1.5 seconds after establishing a
connection to get microseconds-precise time values.
This is a problem for some drivers that directly start to poll time.
Thus, the timer connection now does a calibration burst as soon as it
switches to the modern mode (the mode with microseconds precision).
During this phase it does several (currently 9) calibration steps
without a delay in between. It is assumed that this is fast enough to not
get interrupted by scheduling. Thus, despite being small, the measured
values should be very stable which is why the burst should in most cases
be sufficient to get the interpolation initialized.
Ref #2400
When in modern mode (with local time interpolation), the timer
connection used to maximize the left shift of its
timestamp-to-microseconds factor. The higher the shift, the more precise
the translation from timestamps to microseconds. If the timestamp
values used for determining the best shift were small - i.e., the delay
between the calibration steps was small - we might have got a pretty big
shift. If we then used the shift with bigger timestamp values - i.e.,
called curr_time seldom or raised the calibration delays - the big shift
value became a problem. The framework had to scale down all measured
timestamps and time values temporarily to stay operative until the next
calibration step.
Thus, we now raise the shift only so far that the resulting factor
fulfills a given minimum. This keeps it as low as possible according
to the precision requirement. Currently, this requirement is set to 8,
meaning that the shifted factor shall be at least 2^8 = 256.
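How the shift can be determined under this rule is sketched below; the names
and the overflow bound are illustrative:

  #include <cstdint>

  /* raise the shift only until the shifted factor meets the minimum of 2^8 */
  unsigned factor_shift(std::uint64_t ts_diff, std::uint64_t us_diff)
  {
    std::uint64_t const min_factor = 1 << 8;  /* required precision of the factor */
    unsigned      const max_shift  = 32;      /* illustrative bound against overflow */

    if (ts_diff == 0)
      return 0;

    unsigned shift = 0;
    while (((us_diff << shift) / ts_diff) < min_factor && shift < max_shift)
      shift++;

    return shift;
  }

With a timestamp counter that is 1000 times faster than the microseconds
time, for example, the loop stops at a shift of 18 because 2^18 / 1000 = 262
>= 256, whereas a shift of 17 would yield only 131.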
Ref #2400
The kernel timer on RPI is able to measure time with microseconds precision.
However, due to a bug, we dropped precision during the ticks-to-time
translation and returned only milliseconds-precise time.
Ref #2400
As the timer session now provides a method 'elapsed_us', there is no more need
for doing any internal calculations with values of milliseconds.
Ref #2400
As timer sessions are not expected to be microseconds precise (because
of RPC latency and scheduling), the session interface provided only a
method 'elapsed_ms' although the back end of this method in the timer
driver works with microseconds.
However, in some cases it makes sense to have a method 'elapsed_us'. The
values it returns might be milliseconds away from the "real" time but it
allows you to work with delays smaller than a millisecond without
getting a zero delta value.
This commit is motivated by the need for fast bursts of calibration
steps for the time interpolation in the new timer connection.
Ref #2400