In overload situations, i.e., when a sender fills up the entire packet
buffer, we end up in a pattern where the sender receives an ack_avail
signal, releases one packet, allocates and submits one packet, and then
fails to allocate a second packet.
This is especially relevant if the receiver does not batch ack_avail signals
(such as vfs_lwip). In those ping-pong scheduling scenarios, the overhead of
catching the Packet_alloc_failed exception becomes significant. In the case of
the NIC router, we end up in an overload situation whenever the sender is
faster than the receiver: the packet buffer fills up at some point and the NIC
router starts to drop packets. For every dropped packet, we currently have to
catch the Packet_alloc_failed exception.
This commit adds a new method alloc_packet_attempt to Packet_stream_source.
It has almost the same signature as the existing alloc_packet method but
returns an Attempt<Packet_descriptor, Alloc_packet_error> object instead of
throwing. As the method already used the allocator back end without
exceptions, no changes at lower levels were needed. Furthermore, the NIC
router was changed to use the new exception-less alloc_packet_attempt instead
of alloc_packet.
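A minimal usage sketch of the new interface (the shown handlers and the
catch-all treatment of Alloc_packet_error are illustrative assumptions,
not part of this commit):

    _source.alloc_packet_attempt(size).with_result(
        [&] (Packet_descriptor packet) {
            /* success: fill and submit the packet */
            _source.submit_packet(packet); },
        [&] (Alloc_packet_error) {
            /* overload: drop the payload instead of
               catching Packet_alloc_failed */ });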
Ref #4555
Replaces the former implementation of the 'find_by_ip' method of the data
structure for ARP cache entries. This method used to return a reference to the
found object and threw an exception if no matching object was found.
The new implementation doesn't return anything and doesn't throw exceptions.
Instead, it takes two lambda arguments: one for handling the case that a match
was found, called with a reference to the matching object, and another for
handling the case that no object matches.
This way, expensive exception handling can be avoided and object references
stay in a local scope.
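The resulting call pattern looks roughly like this (the entry-type name
and the handler bodies are assumptions for illustration):

    _arp_cache.find_by_ip(ip,
        [&] (Arp_cache_entry const &entry) {
            /* handle the matching entry, e.g., read its MAC address */ },
        [&] {
            /* handle the missing entry, e.g., broadcast an ARP request */ });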
Ref #4555
According to a series of benchmarks on Zynq (base-hw) and x260 (base-nova)
using test-nic_perf_router, increasing 'max_packets_per_signal' has a
significant effect on the packet throughput. By raising the default value
from 32 to 150, we gained a few hundred Mbit/s. Raising the value further
does not seem to have such a strong effect, though.
genodelabs/genode#4555
The checksums of forwarded/routed UDP, TCP, and ICMP used to always be
re-calculated from scratch in the NIC router, although the router changes only
a few packet fields. This commit replaces the old approach, wherever sensible,
with an algorithm for incremental checksum updates as suggested in RFC 1071.
The goal is to improve router performance.
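A minimal sketch of the technique (the helper is an assumption, not the
router's actual code; the update equation follows RFC 1071 as refined by
RFC 1624):

    /* types as in Genode's base/stdint.h */

    /* update a one's-complement internet checksum after one 16-bit
     * word of the summed data changed, without re-summing the packet:
     * HC' = ~(~HC + ~old_word + new_word) */
    uint16_t update_checksum(uint16_t checksum,
                             uint16_t old_word, uint16_t new_word)
    {
        uint32_t sum = (uint16_t)~checksum;
        sum += (uint16_t)~old_word;
        sum += new_word;
        while (sum >> 16)                      /* fold carries into 16 bit */
            sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)~sum;
    }

Multi-word changes, for example a rewritten 32-bit IPv4 address, apply the
helper once per 16-bit word.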
Ref #4555
The checksums of forwarded/routed IPv4 packets used to always be re-calculated
from scratch in the NIC router, although the router changes only a few packet
fields. This commit replaces the old approach, wherever sensible, with an
algorithm for incremental checksum updates as suggested in RFC 1071. The goal
is to improve router performance.
Ref #4555
We used to use 'unsigned long' for the accumulating variable when calculating
internet checksums. However, 'signed long' is more in accordance with RFC 1071
and will allow us to share the same back end for folding once we implement
incremental updating of internet checksums.
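Sketch of the shared folding back end this enables (illustrative, not the
actual code): with a signed accumulator and arithmetic right shifts, one
loop handles both the carries of a plain sum and the borrows that
incremental updates of the form 'sum += new - old' may produce:

    uint16_t fold(long sum)
    {
        /* a negative intermediate sum from an incremental update is
         * folded by the same loop, since 'sum >> 16' then subtracts
         * the outstanding borrow */
        while (sum >> 16)
            sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)sum;
    }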
Ref #4555
Remove the only internally used 'init_sum' argument from the public interface
of 'uint16_t internet_checksum(...)'. The argument leaked an implementation
detail and, in addition, added a default value to the function interface.
Ref #4555
When sending an ICMP ECHO reply, the router merely swaps SRC and DST in the
IPv4 header of the corresponding request. Since the internet checksum is a sum
of 16-bit words, exchanging two summands leaves the sum unchanged, i.e., these
changes cancel each other out in the checksum calculation. Therefore, with
this commit, the router skips updating the IPv4 checksum in this context.
Ref #4555
The router used to update IPv4 checksums when routing via an <ip> rule
despite the fact that it doesn't change any IPv4 header fields in this case.
Ref #4555
The NIC router used to update the IPv4 and layer-4 checksums of a packet for
each interface the packet was sent to (say, all interfaces of the domain the
packet was routed to). However, there was and is no technical reason for not
doing it only once and then iterating over the interfaces with the already
updated packet. This is what this commit does, with the intent of raising the
router's performance.
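In a nutshell (all names are hypothetical, for illustration only):

    /* before: checksums were re-updated for every target interface */
    for (Interface &interface : interfaces) {
        update_checksums(packet);
        interface.send(packet);
    }

    /* after: update once, then send the already updated packet */
    update_checksums(packet);
    for (Interface &interface : interfaces)
        interface.send(packet);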
Ref #4555
The NIC router uses the timer only for relatively coarse-grained timeouts.
It therefore suffices to update and store the current time whenever the NIC
router is signalled and to use the cached time afterwards. This prevents
frequent syscalls or RPCs for acquiring the current time for every packet.
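Sketch of the idea (member and handler names are assumptions; curr_time is
the existing interface of Timer::Connection for querying the time):

    void _handle_packet_stream_signal()
    {
        /* one time query per signal instead of one per packet */
        _cached_time = _timer.curr_time();

        /* per-packet timeout handling reads _cached_time */
        _process_pending_packets();
    }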
genodelabs/genode#4555
The link dissolve timeout used to be updated for every packet. This led to
trigger_once() RPCs that changed the scheduled timeout only marginally but
significantly slowed down the packet throughput.
genodelabs/genode#4555
The wakeup call emits only a single signal because it is assumed that both
signal types are handled by the same signal handler. However, the original
implementation did not properly reset the wakeup_needed variable.
genodelabs/genode#4555
When using signal batching, ack_avail and packet_avail should always
be emitted and be preferred over ready_to_submit and ready_to_ack.
A signal receiver might decide not to register the ready_to_* signals when it
handles congestion by dropping packets. The NIC router is an example of
such a signal receiver.
genodelabs/genode#4555
Enables symmetric multi-processor (SMP) support in the Linux kernel
configuration used as the base of the driver ports for PC. This is done to be
in line with the common usage of x86 drivers today.
Moreover, this commit uses the original kernel sources for the softirq/tasklet
implementation to get rid of the insufficient shadow implementation
in the lx_emul sources.
Ref genodelabs/genode#4562
Prepare the shadow implementations of spinlocks, page-table defines,
and irq_stack assembler macros to be able to enable SMP on x86/PC.
Ref genodelabs/genode#4562
The test runs as an lx_user task and uses several *delay and wait-queue test
cases that happen to be used in real ported Linux drivers. The test shows
the time spent as measured by several time sources, e.g., jiffies, rdtsc,
lx_time_counter_count, etc.
Issue #4540
Use the common implementation already used by the wifi and the (not yet
merged) audio driver.
Avoid the usage of lib/delay.c since lpj and loop_for_jiffies are not
calibrated for the ported drivers as done on native Linux during boot, which
leads to wrong delays for usb and intel_fb.
Issue #4540
Using the 'query_buffer_ppgtt()' function allows for retrieving the
virtual address of a buffer in the PPGTT.
This is intended for components that manage the GPU virtual addresses
themselves rather than leaving that to the client, as is the case with the
lima driver.
Issue #4559
Adhere to the include/linux/rcutiny.h behaviour, which sets the maximum
timeout for rcu_needs_cpu. Without this commit, the timeout value is zero in
most cases (or random, since the pointer refers to an uninitialized stack
variable), which leads to programming very short timeouts again and again,
keeping the system from ever becoming idle.
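For reference, the mirrored rcutiny behaviour roughly looks as follows
(paraphrased from memory of include/linux/rcutiny.h; the exact signature
differs between kernel versions):

    /* tiny RCU never needs the CPU: report the largest possible
     * next-event time instead of leaving *nextevt uninitialized */
    static inline int rcu_needs_cpu(u64 basemono, u64 *nextevt)
    {
        *nextevt = KTIME_MAX;
        return 0;
    }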
Issue #4540