If the attribute 'interface' is not set in a 'domain' tag, the router tries to
dynamically receive and maintain an IP configuration for that domain by using
DHCP in the client role at all interfaces that connect to the domain. In the
DHCP discover phase, the router simply chooses the first DHCP offer that
arrives. So, no comparison of different DHCP offers is done. In the DHCP
request phase, the server is expected to provide an IP address, a gateway, a
subnet mask, and an IP lease time to the router. If anything substantial goes
wrong during a DHCP exchange, the router discards the outcome of the exchange
and goes back to the DHCP discover phase. Whenever a domain has no valid IP
configuration, it acts only as DHCP client and all other router functionality
is disabled for it. A domain cannot
act as DHCP client and DHCP server at once. So, a 'domain' tag must either
have an 'interface' attribute or must not contain a 'dhcp-server' tag.
Ref #2534
An IPv4 config (for a domain/interface of the router) consists of
an IPv4 address, a subnet prefix specifier, an optional gateway
IPv4 address, and some flags that declare whether these fields and
the config as a whole are valid. To make the handling of those
tightly connected values easier and less error prone, we encapsulate
them in a new class.
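A minimal sketch of such an encapsulation, assuming addresses are kept as 32-bit values in host byte order. The class layout, the helper 'ipv4', and all member names are hypothetical and not taken from the router's sources:

```cpp
#include <cstdint>

/* Build an IPv4 address in host byte order from its four octets */
constexpr std::uint32_t ipv4(std::uint32_t a, std::uint32_t b,
                             std::uint32_t c, std::uint32_t d)
{
    return (a << 24) | (b << 16) | (c << 8) | d;
}

/* Hypothetical stand-in, not the router's actual class */
class Ipv4_config
{
    private:

        std::uint32_t _address       { 0 };
        std::uint8_t  _prefix        { 0 };      /* 24 means /24 */
        std::uint32_t _gateway       { 0 };      /* optional */
        bool          _gateway_valid { false };
        bool          _valid         { false };  /* config as a whole */

    public:

        /* default: no IP configuration present */
        Ipv4_config() { }

        Ipv4_config(std::uint32_t address, std::uint8_t prefix,
                    std::uint32_t gateway = 0)
        : _address(address), _prefix(prefix), _gateway(gateway),
          _gateway_valid(gateway != 0),
          _valid(address != 0 && prefix >= 1 && prefix <= 32) { }

        bool valid()         const { return _valid; }
        bool gateway_valid() const { return _valid && _gateway_valid; }

        std::uint32_t subnet_mask() const
        {
            /* '_valid' guarantees a shift width between 0 and 31 */
            return _valid ? ~std::uint32_t(0) << (32 - _prefix)
                          : std::uint32_t(0);
        }

        bool in_subnet(std::uint32_t ip) const
        {
            return _valid &&
                   (ip & subnet_mask()) == (_address & subnet_mask());
        }
};
```

Keeping the validity flags next to the values means a caller cannot accidentally use an address of a configuration that was never set up.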
Ref #2534
Under certain circumstances, we don't want init's state report to become too
outdated even if there is no change to its config or the sessions of its
children. This is the case if init is requested to provide capability or RAM
info of its children via its state report. Now, init automatically updates
the state report every 1000 ms if the attribute 'child_caps' or
'child_ram' is enabled in the 'report' tag.
Timing itself costs time. Thus, the stressful timeout phase of the
test is not exactly as long as configured but a little bit longer. This is
why the fast timeouts are able to trigger more often than expected
(the timer has a static timeout-rate limit). Normally, we account for this
effect with an error tolerance of 10%. But at least on foc x86_32 (PIT with
a very low max timeout), timing is so expensive that 10% is not enough. We
have to raise it to 11%.
This patch propagates the 'Service_denied' condition of forwarded sessions
to the parent. Without it, the invalid session request stays pending
indefinitely, which leads to the problem described in issue #2542. It
turns out that the solution suggested in the issue text is actually
not needed when applying this fix.
Fixes #2542
The ROM filter did not handle the situation where the generated content
exceeds the size of the initially allocated dataspace for the target
buffer. This patch wraps the XML generation in a retry loop that
expands the buffer as needed.
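The retry pattern can be sketched as follows. This is a generic illustration using standard containers, not the ROM filter's code: 'generate', 'Buffer_exceeded', and the doubling strategy are assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

/* Hypothetical exception raised when the output does not fit */
struct Buffer_exceeded { };

/* Hypothetical generator that refuses to write beyond 'capacity' */
std::size_t generate(char *dst, std::size_t capacity,
                     std::string const &content)
{
    if (content.size() > capacity)
        throw Buffer_exceeded();

    std::copy(content.begin(), content.end(), dst);
    return content.size();
}

/* Retry the generation with a doubled buffer until the content fits */
std::size_t generate_with_retry(std::vector<char> &buffer,
                                std::string const &content)
{
    for (;;) {
        try {
            return generate(buffer.data(), buffer.size(), content);
        }
        catch (Buffer_exceeded const &) {
            buffer.resize(buffer.empty() ? 64 : buffer.size() * 2);
        }
    }
}
```

Starting with a 4-byte buffer and 13 bytes of content, the loop grows the buffer to 8 and then to 16 bytes before the third attempt succeeds.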
This patch makes the specification of screen coordinates more flexible.
First, the 'origin' attribute allows one to refer to either of the four
screen corners without knowing the screen size. Second, the 'width'
and 'height' values now accept negative values, which are relative to
the screen size.
Previously, the router printed an error line for each affected packet. But it
is pretty normal for the router to receive packets whose network-layer
protocol it doesn't know. In the default case, these packets shall be
ignored silently.
Ref #2490
One can configure the NIC router to act as DHCP server at interfaces of a
domain by adding the <dhcp-server> tag to the configuration of the domain like
this:
  <domain name="vbox" interface="10.0.1.1/24">
    <dhcp-server ip_first="10.0.1.80"
                 ip_last="10.0.1.100"
                 ip_lease_time_sec="3600"
                 dns_server="10.0.0.2"/>
    ...
  </domain>
The attributes ip_first and ip_last define the available IPv4 address
range while ip_lease_time_sec defines the lifetime of an IPv4 address
assignment in seconds. The IPv4 address range must be in the subnet
defined by the interface attribute of the domain tag and must not cover
the IPv4 address in this attribute. The dns_server attribute gives the
IPv4 address of the DNS server that might also be in another subnet.
The lifetime of an offered assignment is the configured round trip time of
the router while the ip_lease_time_sec is applied only if the offer is
requested by the client in time.
The ports/run/virtualbox_nic_router.run script is an example of how to
use the new DHCP server functionality.
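The constraints on ip_first and ip_last can be expressed as a small check. The following is a sketch under the assumption that addresses are 32-bit values in host byte order. The function name and the additional requirement ip_first <= ip_last are assumptions of the example, not taken from the router:

```cpp
#include <cstdint>

/* Build an IPv4 address in host byte order from its four octets */
constexpr std::uint32_t ipv4(std::uint32_t a, std::uint32_t b,
                             std::uint32_t c, std::uint32_t d)
{
    return (a << 24) | (b << 16) | (c << 8) | d;
}

/* Hypothetical check mirroring the rules stated for <dhcp-server> */
inline bool dhcp_range_valid(std::uint32_t interface_ip, unsigned prefix,
                             std::uint32_t ip_first, std::uint32_t ip_last)
{
    if (prefix < 1 || prefix > 32 || ip_first > ip_last)
        return false;

    std::uint32_t const mask   = ~std::uint32_t(0) << (32 - prefix);
    std::uint32_t const subnet = interface_ip & mask;

    /* the whole range must lie in the subnet of the interface */
    bool const in_subnet = (ip_first & mask) == subnet &&
                           (ip_last  & mask) == subnet;

    /* the range must not cover the address of the interface itself */
    bool const covers_interface = ip_first <= interface_ip &&
                                  interface_ip <= ip_last;

    return in_subnet && !covers_interface;
}
```

For the configuration shown above, dhcp_range_valid(ipv4(10,0,1,1), 24, ipv4(10,0,1,80), ipv4(10,0,1,100)) holds, whereas a range starting at 10.0.1.1 would be rejected.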
Ref #2490
Previously, garbage collection was done only when an incoming packet passed
the Ethernet checks. Now it is done as the very first step when receiving a
packet at an interface.
Ref #2490
If the router has no gateway attribute for a domain (meaning that the router
itself is the gateway) and it gets an ARP request for a foreign IP, it shall
answer with its own IP.
Ref #2490
Do not use two times the RTT for the lifetime of links but use it as
it is configured to simplify the usage of the router. Internally, use
Microseconds/Duration type instead of plain integers.
Ref #2490
The nic_dump uses a wrapper for all supported protocols that
takes a packet and a verbosity configuration. The wrapper object can
then be used as argument for a Genode log function and prints the
packet's contents according to the given configuration. The
configuration is a distinct class to enable the reuse of one instance
for different packets.
There are currently 4 possible configurations for each protocol:
* NONE (no output for this protocol)
* SHORT (only the protocol name)
* COMPACT (the most important information densely packed)
* COMPREHENSIVE (all header information of this protocol)
Ref #2490
Provide utilities for appending new options to an existing DHCP packet
and a utility for finding existing options that returns a typed option
object. Remove the old version that returned untyped options.
Ref #2490
Apply the style rule that an accessor is named like the underlying
value. Provide read and write accessors for each mandatory header attribute.
Fix some incorrect structures in the headers, such as the flags field
in Ipv4_packet.
Ref #2490
Encapsulate the enum into a struct so that it is named
Ethernet_frame::Type::Enum, give it the correct storage type
uint16_t, and remove those values that are (AFAIK) not used by
now (genode, world).
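The resulting shape can be sketched like this. The nesting Ethernet_frame::Type::Enum and the storage type follow the description above. The enumerator names and the helper 'ethernet_type_known' are assumptions of the example, with the values being the standard EtherType numbers for IPv4 and ARP:

```cpp
#include <cstdint>

struct Ethernet_frame
{
    struct Type
    {
        /* explicit storage type matching the 16-bit EtherType field */
        enum Enum : std::uint16_t {
            IPV4 = 0x0800,
            ARP  = 0x0806,
        };
    };
};

static_assert(sizeof(Ethernet_frame::Type::Enum) == 2,
              "enum must have the size of the EtherType field");

/* Hypothetical helper: is the EtherType one of the listed values? */
inline bool ethernet_type_known(std::uint16_t type)
{
    switch (type) {
    case Ethernet_frame::Type::IPV4:
    case Ethernet_frame::Type::ARP:  return true;
    default:                         return false;
    }
}
```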
Ref #2490
Do not stop routing if the transport layer protocol is unknown but
continue with trying IP routing instead. The latter was already
done when no transport routing could be applied but for unknown transport
protocols we caught the exception at the wrong place.
Ref #2490
No starvation of timeout signals
--------------------------------
Add several timeouts < 1ms to the stress test and check that timeout
handling doesn't become significantly unfair (starvation) in this situation
where some timeouts trigger much faster than they get handled.
Rate limiting for timeout handling in timer
-------------------------------------------
Ensure that the timer does not handle timeouts again within 1000
microseconds after the last handling of timeouts. This makes denial of
service attacks harder. This commit does not limit the rate of timeout
signals handled inside the timer but it causes the timer to do it less
often. If a client continuously installs a very small timeout at the
timer it still causes a signal to be submitted to the timer each time
and some extra CPU time to be spent in the internal handling method. But
only every 1000 microseconds this internal handling causes user timeouts
to trigger.
If we also wanted to limit the calls of the internal handling method
to ensure that CPU time is spent besides the RPCs only every 1000
microseconds, things would get more complex. For instance, on NOVA
Time_source::schedule_timeout(0) must be called each time a new timeout
gets installed and becomes head of the scheduling queue. We cannot
simply overwrite the already running timeout with the new one.
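The effect of the rate limit can be modelled with a small class. This is a simplified sketch and not the timer's code: the class name, the counters, and passing the current time as an argument are assumptions made to keep the example self-contained.

```cpp
#include <cstdint>

/* Hypothetical model of the rate-limited timeout handling */
class Rate_limited_handler
{
    private:

        static constexpr std::uint64_t MIN_PERIOD_US = 1000;

        std::uint64_t _last_handled_us { 0 };
        bool          _handled_once    { false };
        unsigned      _signals         { 0 };
        unsigned      _handlings       { 0 };

    public:

        /*
         * Called for every signal, returns true if user timeouts
         * were processed by this call
         */
        bool handle_signal(std::uint64_t now_us)
        {
            /* each signal still costs CPU time in this method */
            _signals++;

            if (_handled_once &&
                now_us - _last_handled_us < MIN_PERIOD_US)
                return false;

            _last_handled_us = now_us;
            _handled_once    = true;
            _handlings++;
            return true;
        }

        unsigned signals()   const { return _signals; }
        unsigned handlings() const { return _handlings; }
};
```

Signals arriving at 0, 10, 999, 1000, and 1500 microseconds all enter the method, but only the ones at 0 and 1000 lead to the processing of user timeouts.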
Ref #2490
We did not set the correct now_period previously, but this went unnoticed
because the bug triggered only after a full period had passed, which on most
platforms is a pretty long time.
Ref #2490
We update the alarm-scheduler time with results of
Timer::Connection::curr_time when we schedule new timeouts but when
handling the signal from the Timer server we updated the alarm-scheduler
time with the result of Timer::Connection::elapsed_us. Mixing times
like this could cause a non-monotone time value in the alarm scheduler.
The alarm scheduler then thought that the time value wrapped and
triggered all timeouts immediately. The problem was fixed by always
using Timer::Connection::curr_time as time source.
Ref #2490
Create periodic and one-shot timeouts with the maximum duration
to see if this triggers any corner-case bugs. They must not trigger during
the test.
Ref #2490
If we add an absolute timeout to the back-end alarm-scheduler we must first
call 'handle' at the scheduler to update its internal time value.
Otherwise, it might happen that we add a timeout whose deadline is so big that
it normally belongs to the next time-counter period but the scheduler thinks
that it belongs to the current period as its time is older than the one used
to calculate the deadline.
Ref #2490
When we have two time values of an unsigned integer type, compute their
difference, and want to know whether it is positive or negative within
the same value, we lose at least one half of the value range by casting
to signed integers. This was the case in the alarm scheduler when
checking whether an alarm already triggered. Even worse, we cast from
'unsigned long' to 'signed int', which caused further loss on at least
x86_64. Thus, big timeouts like ~0UL falsely triggered immediately.
Now, we use an extra boolean value to remember in which period of the
time counter we are and to which period of the time counter the deadline
of an alarm belongs. This boolean switches its value each time the time
counter wraps. This way, we can avoid any casting by checking whether the
current time is of the same period as the deadline of the alarm that we
inspect. If so, the alarm is pending if "current time >= alarm
deadline", otherwise it is pending if "current time < alarm deadline".
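The period-based check can be written down directly. The following sketch uses hypothetical names ('Alarm', 'alarm_pending') and a 64-bit counter for illustration. Only the comparison rule itself is taken from the description above:

```cpp
#include <cstdint>

/* Hypothetical alarm record: deadline plus the period it belongs to */
struct Alarm
{
    std::uint64_t deadline;
    bool          deadline_period;
};

/*
 * Same period:      pending once the current time reached the deadline.
 * Different period: the counter wrapped relative to the deadline, so
 *                   the comparison is inverted.
 */
inline bool alarm_pending(Alarm const &alarm,
                          std::uint64_t now, bool now_period)
{
    return now_period == alarm.deadline_period ? now >= alarm.deadline
                                               : now <  alarm.deadline;
}
```

With this rule, an alarm with the maximum deadline in the current period stays inactive at time 100, and an alarm whose deadline lies shortly before the wrap is reported as pending right after the counter has wrapped. No cast to a signed type is involved.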
Ref #2490
If the PIT timer driver gets activated too slowly (e.g., because of a bad
priority configuration), it might miss counter wraps and would then produce
sudden time jumps. The driver now detects this problem dynamically, warns
about it, and adapts the affected values to avoid time jumps.
Ref #2400
The NIC router always reports the link state "Up" (true) because
the effective link state depends on the targeted remote interface
and thus on the individual routing for each packet. Consequently,
also the signal handler for state changes gets ignored.
Ref #2490
IP stacks may treat a network interface as "down" when it states a MAC
address with the I/G bit (bit 40) set to "Group" (value 1) instead of
"Individual" (value 0). This was observed with a TinyCore 8 inside a
VirtualBox VM. Thus, the previously chosen 03:03:03:03:03:00 as base
for the MAC address allocator is bad. Now we use 02:02:02:02:02:00
instead. This also ensures that the MAC addresses are not marked as
"Universal" but as "Local" (bit 41, value 1) which is correct in general
as the router allocates MAC addresses only for virtual networks.
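Both bits live in the first octet of the address: the least-significant bit distinguishes group from individual addresses, and the next bit distinguishes locally administered from universal ones. A small sketch with hypothetical helper names shows how the two base addresses differ:

```cpp
#include <cstdint>

struct Mac_address { std::uint8_t octet[6]; };

/* I/G bit: least-significant bit of the first octet, set for "Group" */
inline bool is_group(Mac_address const &mac)
{
    return (mac.octet[0] & 0x01) != 0;
}

/* U/L bit: second bit of the first octet, set for "Local" */
inline bool is_local(Mac_address const &mac)
{
    return (mac.octet[0] & 0x02) != 0;
}
```

For 03:03:03:03:03:00, is_group() returns true, which is why clients may reject it as interface address. For 02:02:02:02:02:00, is_group() returns false while is_local() returns true.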
Ref #2490
The NIC dump component didn't support forwarding of link states and link-state
signals until now. Furthermore, it now prints MAC address and link state
on session creation and on every link state change.
Ref #2490
Previously, the uplink session was created on component startup while the
creation of the downlink session is timed by the client component. This
created a time span in which packets from the uplink were dropped at the
nic_dump. Now the uplink session-request is done by the session component
of the downlink.
Ref #2490
Add a "writeable" policy option to the ahci_drv and part_blk Block
servers and change the default from writeable to read-only. Should a policy
permit write access, the session request argument "writeable" may still
downgrade a session to read-only.
Fix #2469
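The interplay of the policy option and the session argument boils down to a conjunction. The sketch below uses hypothetical names and is not taken from the Block servers:

```cpp
/* Default of the policy option if it is absent: read-only */
constexpr bool POLICY_WRITEABLE_DEFAULT = false;

/*
 * A session is writeable only if the policy permits it and the
 * client did not downgrade the session via its session argument.
 */
inline bool session_writeable(bool policy_writeable,
                              bool requested_writeable)
{
    return policy_writeable && requested_writeable;
}
```

A client can therefore never gain write access through its session argument alone, it can only give up write access that the policy granted.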
There are hardware timers whose frequency can't be expressed as
ticks-per-microsecond integer-value because only a ticks-per-millisecond
integer-value is precise enough. We don't want to use expensive
floating-point values here but nonetheless want to translate from ticks
to time with microseconds precision. Thus, we split the input in two and
translate both parts separately. This way, we can raise precision by
shifting the values to their optimal bit position. Afterwards, the results
are shifted back and merged together again.
As this algorithm is not so trivial anymore and used by at least three
timer drivers (base-hw/x86_64, base-hw/cortex_a9, timer/pit), move it to a
generic header to avoid redundancy.
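A sketch of the split-and-shift translation, assuming 64-bit tick values. The function name, the fixed shift widths, and the use of <cstdint> types are assumptions of the example and may differ from the generic header. The upper half is shifted right before the multiplication so that the multiplication by 1000 cannot overflow, the lower half is shifted left, and both results are shifted back and added:

```cpp
#include <cassert>
#include <cstdint>

inline std::uint64_t timer_ticks_to_us(std::uint64_t ticks,
                                       std::uint64_t ticks_per_ms)
{
    assert(ticks_per_ms != 0);

    constexpr unsigned      HALF_WIDTH = 32;
    constexpr std::uint64_t LSB_MASK   = 0xffffffffull;
    constexpr std::uint64_t MSB_MASK   = ~LSB_MASK;

    /* 1000 < 2^10, so a 10-bit shift leaves room for the factor 1000 */
    constexpr unsigned MSB_RSHIFT = 10;
    constexpr unsigned LSB_LSHIFT = HALF_WIDTH - MSB_RSHIFT;

    /* translate the upper half of the input */
    std::uint64_t const msb =
        ((((ticks & MSB_MASK) >> MSB_RSHIFT) * 1000) / ticks_per_ms)
        << MSB_RSHIFT;

    /* translate the lower half of the input */
    std::uint64_t const lsb =
        ((((ticks & LSB_MASK) << LSB_LSHIFT) * 1000) / ticks_per_ms)
        >> LSB_LSHIFT;

    return msb + lsb;
}
```

With 1193 ticks per millisecond, 597 ticks translate to 500 microseconds, whereas an integer ticks-per-microsecond value of 1 would yield 597.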
Ref #2400
Due to the simplicity of the algorithm that translated from timer ticks
to time, we lost microseconds precision although the timer allows for it.
Ref #2400
When synchronizing with the remote time source, we have to take care that the
measured time difference cannot become zero because its real value is smaller
than the measurement granularity. Since the granularity is one microsecond, we
simply go on polling timestamp and time until the microsecond has passed.
This busy waiting should be no problem for the system for two reasons. First,
it is limited to a relatively small amount of time and second, a busy lock
does not happen because the time source that is responsible for the limiting
factor is explicitly called on each poll.
Ref #2400
The VFS library can be used in single-threaded or multi-threaded
environments and depending on that, signals are handled by the same thread
which uses the VFS library or possibly by a different thread. If a VFS
plugin needs to block to wait for a signal, there is currently no way
which works reliably in both environments.
For this reason, this commit makes the interface of the VFS library
nonblocking, similar to the File_system session interface.
The most important changes are:
- Directories are created and opened with the 'opendir()' function and the
  directory entries are read with the recently introduced 'queue_read()'
  and 'complete_read()' functions.
- Symbolic links are created and opened with the 'openlink()' function and
  the link target is read with the 'queue_read()' and 'complete_read()'
  functions and written with the 'write()' function.
- The 'write()' function does not wait for signals anymore. This can have
  the effect that data written by a VFS library user has not been
  processed by a file system server yet when the library user asks for the
  size of the file or closes it (both done with RPC functions at the file
  system server). For this reason, a user of the VFS library should
  request synchronization before calling 'stat()' or 'close()'. To make
  sure that a file system server has processed all write request packets
  which a client submitted before the synchronization request,
  synchronization is now requested at the file system server with a
  synchronization packet instead of an RPC function. Because of this
  change, the synchronization interface of the VFS library is now split
  into 'queue_sync()' and 'complete_sync()' functions.
Fixes #2399
This patch changes init's service forwarding such that pending requests
are kept unanswered as long as the requested service is not present
(yet). In dynamic-init scenarios, this is needed in situations where the
dynamic init is known to eventually provide the service but the internal
subsystem is not ready yet. Previously, a client that attempted to
request a session in this early phase would get a 'Service_denied'
exception. By deferring the forwarding in this situation, the behaviour
becomes deterministic.
If a matching '<service>' exists but there is no matching policy sub
node, the request is answered with 'Service_denied' - as expected.
The calibration of the interpolation parameters was previously only done
periodically every 500 ms. Together with the fact that the parameters
had to be stable for at least 3 calibration steps to enable
interpolation, it took at least 1.5 seconds after establishing a
connection to get microseconds-precise time values.
This is a problem for some drivers that directly start to poll time.
Thus, the timer connection now does a calibration burst as soon as it
switches to the modern mode (the mode with microseconds precision).
During this phase it does several (currently 9) calibration steps
without a delay in between. It is assumed that this is fast enough to not
get interrupted by scheduling. Thus, despite being small, the measured
values should be very stable which is why the burst should in most cases
be sufficient to get the interpolation initialized.
Ref #2400
When in modern mode (with local time interpolation), the timer
connection used to maximize the left shifting of its
timestamp-to-microseconds factor. The higher the shift the more precise
is the translation from timestamps to microseconds. If the timestamp
values used for determining the best shift were small - i.e. the delay
between the calibration steps was small - we might end up with a pretty big
shift. If we then used the shift with bigger timestamp values - i.e.
called curr_time seldom or raised calibration delays - the big shift
value became a problem. The framework had to scale down all measured
timestamps and time values temporarily to stay operative until the next
calibration step.
Thus, we now raise the shift only as far as needed for the resulting factor
to fulfill a given minimum. This keeps it as low as possible according
to the precision requirement. Currently, this requirement is set to 8
meaning that the shifted factor shall be at least 2^8 = 256.
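The selection of the shift can be sketched as a loop that stops as soon as the factor reaches the minimum. The function and type names, the definition of the factor as (timestamp difference << shift) / microseconds difference, and the cap of the shift at 32 bits are assumptions of this example:

```cpp
#include <cassert>
#include <cstdint>

struct Factor
{
    std::uint64_t value;  /* shifted timestamps-per-microsecond factor */
    unsigned      shift;
};

/* Raise the shift only until the factor is at least 2^min_factor_log2 */
inline Factor calc_factor(std::uint64_t ts_diff, std::uint64_t us_diff,
                          unsigned min_factor_log2 = 8)
{
    /* keep 'ts_diff << shift' within 64 bits for shifts up to 32 */
    assert(ts_diff != 0 && ts_diff <= 0xffffffffull && us_diff != 0);

    std::uint64_t const min_factor = 1ull << min_factor_log2;

    unsigned      shift  = 0;
    std::uint64_t factor = ts_diff / us_diff;

    while (factor < min_factor && shift < 32) {
        shift++;
        factor = (ts_diff << shift) / us_diff;
    }
    return { factor, shift };
}
```

For 2000 timestamp ticks per 1000 microseconds, the loop stops at a shift of 7 with a factor of 256. For 3000000 ticks per 1000 microseconds, the unshifted factor of 3000 already suffices and the shift stays 0.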
Ref #2400
As the timer session now provides a method 'elapsed_us', there is no more need
for doing any internal calculations with values of milliseconds.
Ref #2400
As timer sessions are not expected to be microseconds precise (because
of RPC latency and scheduling), the session interface provided only a
method 'elapsed_ms' although the back end of this method in the timer
driver works with microseconds.
However, in some cases it makes sense to have a method 'elapsed_us'. The
values it returns might be milliseconds away from the "real" time but it
allows you to work with delays smaller than a millisecond without
getting a zero delta value.
This commit is motivated by the need for fast bursts of calibration
steps for the time interpolation in the new timer connection.
Ref #2400
The run script did not consider the routing for the environment ROM
sessions for the test-iso component. It routed all ROM sessions -
including the ones for the executable and the dynamic linker - to
fs_rom. The patch also adds the cap quota definitions required since
version 17.05 and fixes a whitespace inconsistency between the test
program and the run script.
Thanks to Steven Harp for reporting!
This is expected by hardware terminals, i.e., terminal programs connected
to null-modem serial connections. Otherwise, the next line starts at the
column right after the last line.
The new version of the test exercises the combination of fs_report with
ram_fs and fs_rom as a more flexible alternative to report_rom.
It covers two corner cases that remained unaddressed by fs_rom and
ram_fs so far: First, the late installation of a ROM-update signal
handler at fs_rom right before the content of the file is modified.
Second, the case where the requested file is not present on the file
system at the creation time of the ROM session. Here, the ram_fs failed
to inform listeners for the compound directory about the later created
file.
This patch ensures that fs_rom delivers a ROM-update notification in the
case where the underlying file was changed in-between requesting the
initial ROM content and registering the signal handler.
With the introduction of the CONTENT_CHANGED notifications delivered via
the packet stream, the assumption that no more than one READ packet is
in flight at all times no longer holds. If the fs server responds
to a CONTENT_CHANGED packet while the fs_rom expects the completion of a
read request, the '_update_dataspace' method would prematurely return,
leaving the dataspace unpopulated. This patch solves the problem by
specifically waiting for the completion of the read request.
On platforms that use the PIT timer driver, 'elapsed_ms' is pretty
imprecise/unsteady (up to 3 ms deviation) for a reason that is not
clearly determined yet. On Fiasco and Fiasco.OC, which use kernel timing,
it is the same. So, on these platforms, our locally interpolated time
seems to be fine but the reference time is bad. Until this is fixed, we
raise the error tolerance for these platforms in the run script.
Ref #2400
Appending a suffix to report filenames was behavior inherited from
fs_log. It prevents creating files where directories need to be created
later. But unlike logs, only a subset of the hierarchy will report, and
those components that do append a component-local label, so the risk of
collision is low.
By removing the suffix, fs_rom can serve reports back as ROM just as
report_rom does.
Ref #2422
In the timeout framework, we maintain a translation factor value to
translate between time and timestamps. To raise precision, we scale up
the factor when we calculate it and scale down the result of its
application later again. This up and down scaling is achieved through
left and right shifting. Until now, the shift width was statically
chosen. However, some platforms need a big shift width and others a
smaller one. The one static shift width couldn't cover all platforms,
which caused overflows or precision problems.
Now, the shift width is chosen optimally for the actual translation
factor each time it gets re-calculated. This way, we can take care that
the shift always renders the best precision level without the risk for
overflows.
Ref #2400
The result-buffer related members of the fast polling test are
the same for each buffered result type. Thus, we can make the
code easier by providing them through a struct.
Ref #2400
On QEMU, NOVA uses the pretty unstable TSC emulation as primary time
source. Thus, timeouts do not trigger with the common precision (< 50
ms). Use an error tolerance of 200 ms for this platform constellation.
Ref #2400
The fast polling test uses one timer session for raw 'elapsed_ms' calls
and another one for potentially interpolated 'curr_time' calls. It then
compares the two results against each other. However, until now, the
test did not consider that the duration of the session construction may
create a remarkable shift between the local times of the two sessions.
This shift is now determined and compensated before doing any
comparison.
Ref #2400
The multiple-handlers test was checking if handlers at one signal were
activated in a fair manner. But on Qemu, the error tolerance of one was
too small in rare cases (2 of 100 runs). However, having multiple
handlers for the same signal context can be considered deprecated
anyway. With the recommended Signal_handler wrapper for signal sessions,
you can't use this feature. Thus, we removed the multiple-handlers test.
Fixes #2450