From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Wed, 24 Mar 2021 02:30:55 +0100
Subject: [PATCH] docs: nf_flowtable: update documentation with
 enhancements

This patch updates the flowtable documentation to describe recent
enhancements:

- Offload action is available after the first packets go through the
  classic forwarding path.
- IPv4 and IPv6 are supported. Only TCP and UDP layer 4 are supported at
  this stage.
- The tuple has been augmented to track the VLAN ID and PPPoE session ID.
- Bridge and IP forwarding integration, including bridge VLAN filtering
  support.
- Hardware offload support.
- Describe the [OFFLOAD] and [HW_OFFLOAD] tags in the conntrack table
  listing.
- Replace 'flow offload' with 'flow add' in example rulesets (preferred
  syntax).
- Describe existing cache limitations.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
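For reference, a minimal ruleset sketch combining several of the enhancements
listed above follows. It is an illustration under stated assumptions, not a
tested configuration: the names x, f and y are taken from the documentation
examples, and the device layout is assumed (eth0 is the gateway device, eth1
and eth2 are the real netdevices behind the ports of a bridge). Adding bridge
ports requires Linux kernel 5.13 or later, the 'counter' statement requires
Linux kernel 5.7 or later, and 'flags offload' should be dropped unless the
network devices provide hardware offload support. 'meta l4proto' is used
instead of 'ip protocol' so that IPv6 flows in the inet table match as well.

::

	table inet x {
		flowtable f {
			# assumed layout: eth0 is the gateway, eth1 and eth2
			# are bridge ports; list the real devices only
			hook ingress priority 0; devices = { eth0, eth1, eth2 };
			counter
			flags offload;
		}
		chain y {
			type filter hook forward priority 0; policy accept;
			# only TCP and UDP flows can be placed in the flowtable
			meta l4proto { tcp, udp } flow add @f
		}
	}

Flows added by this rule should then be listed with the [OFFLOAD] tag by
'conntrack -L', or with the [HW_OFFLOAD] tag once the workqueue has offloaded
them to the hardware.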

--- a/Documentation/networking/nf_flowtable.rst
+++ b/Documentation/networking/nf_flowtable.rst
@@ -4,35 +4,38 @@
 Netfilter's flowtable infrastructure
 ====================================

-This documentation describes the software flowtable infrastructure available in
-Netfilter since Linux kernel 4.16.
+This documentation describes the Netfilter flowtable infrastructure, which
+allows you to define a fastpath through the flowtable datapath. This
+infrastructure also provides hardware offload support. The flowtable supports
+the layer 3 IPv4 and IPv6 and the layer 4 TCP and UDP protocols.

 Overview
 --------

-Initial packets follow the classic forwarding path, once the flow enters the
-established state according to the conntrack semantics (ie. we have seen traffic
-in both directions), then you can decide to offload the flow to the flowtable
-from the forward chain via the 'flow offload' action available in nftables.
-
-Packets that find an entry in the flowtable (ie. flowtable hit) are sent to the
-output netdevice via neigh_xmit(), hence, they bypass the classic forwarding
-path (the visible effect is that you do not see these packets from any of the
-netfilter hooks coming after the ingress). In case of flowtable miss, the packet
-follows the classic forward path.
-
-The flowtable uses a resizable hashtable, lookups are based on the following
-7-tuple selectors: source, destination, layer 3 and layer 4 protocols, source
-and destination ports and the input interface (useful in case there are several
-conntrack zones in place).
-
-Flowtables are populated via the 'flow offload' nftables action, so the user can
-selectively specify what flows are placed into the flow table. Hence, packets
-follow the classic forwarding path unless the user explicitly instruct packets
-to use this new alternative forwarding path via nftables policy.
+Once the first packet of the flow successfully goes through the IP forwarding
+path, from the second packet on, you might decide to offload the flow to the
+flowtable through your ruleset. The flowtable infrastructure provides a rule
+action that allows you to specify when to add a flow to the flowtable.
+
+A packet that finds a matching entry in the flowtable (i.e. flowtable hit) is
+transmitted to the output netdevice via neigh_xmit(); hence, packets bypass the
+classic IP forwarding path (the visible effect is that you do not see these
+packets from any of the Netfilter hooks coming after ingress). If there is no
+matching entry in the flowtable (i.e. flowtable miss), the packet follows the
+classic IP forwarding path.
+
+The flowtable uses a resizable hashtable. Lookups are based on the following
+n-tuple selectors: layer 2 protocol encapsulation (VLAN and PPPoE), layer 3
+source and destination, layer 4 source and destination ports and the input
+interface (useful in case there are several conntrack zones in place).
+
+The 'flow add' action allows you to populate the flowtable: the user selectively
+specifies what flows are placed into the flowtable. Hence, packets follow the
+classic IP forwarding path unless the user explicitly instructs flows to use
+this new alternative forwarding path via policy.

-This is represented in Fig.1, which describes the classic forwarding path
-including the Netfilter hooks and the flowtable fastpath bypass.
+The flowtable datapath is represented in Fig.1, which describes the classic IP
+forwarding path including the Netfilter hooks and the flowtable fastpath bypass.

 ::

@@ -67,11 +70,13 @@ including the Netfilter hooks and the fl
 Fig.1 Netfilter hooks and flowtable interactions

 The flowtable entry also stores the NAT configuration, so all packets are
-mangled according to the NAT policy that matches the initial packets that went
-through the classic forwarding path. The TTL is decremented before calling
-neigh_xmit(). Fragmented traffic is passed up to follow the classic forwarding
-path given that the transport selectors are missing, therefore flowtable lookup
-is not possible.
+mangled according to the NAT policy that is specified in the classic IP
+forwarding path. The TTL is decremented before calling neigh_xmit(). Fragmented
+traffic is passed up to follow the classic IP forwarding path given that the
+transport header is missing; in this case, flowtable lookups are not possible.
+TCP RST and FIN packets are also passed up to the classic IP forwarding path to
+release the flow gracefully. Packets that exceed the MTU are also passed up to
+the classic forwarding path to report packet-too-big ICMP errors to the sender.

 Example configuration
 ---------------------
@@ -85,7 +90,7 @@ flowtable and add one rule to your forwa
 		}
 		chain y {
 			type filter hook forward priority 0; policy accept;
-			ip protocol tcp flow offload @f
+			ip protocol tcp flow add @f
 			counter packets 0 bytes 0
 		}
 	}
@@ -103,6 +108,121 @@ flow is offloaded, you will observe that
 does not get updated for the packets that are being forwarded through the
 forwarding bypass.

+You can identify offloaded flows through the [OFFLOAD] tag when listing your
+connection tracking table.
+
+::
+
+	# conntrack -L
+	tcp      6 src=10.141.10.2 dst=192.168.10.2 sport=52728 dport=5201 src=192.168.10.2 dst=192.168.10.1 sport=5201 dport=52728 [OFFLOAD] mark=0 use=2
+
+
+Layer 2 encapsulation
+---------------------
+
+Since Linux kernel 5.13, the flowtable infrastructure discovers the real
+netdevice behind VLAN and PPPoE netdevices. The flowtable software datapath
+parses the VLAN and PPPoE layer 2 headers to extract the ethertype and the
+VLAN ID / PPPoE session ID which are used for the flowtable lookups. The
+flowtable datapath also deals with layer 2 decapsulation.
+
+You do not need to add the PPPoE and the VLAN devices to your flowtable;
+instead, the real device is sufficient for the flowtable to track your flows.
+
+Bridge and IP forwarding
+------------------------
+
+Since Linux kernel 5.13, you can add bridge ports to the flowtable. The
+flowtable infrastructure discovers the topology behind the bridge device. This
+allows the flowtable to define a fastpath bypass between the bridge ports
+(represented as eth1 and eth2 in the example figure below) and the gateway
+device (represented as eth0) in your switch/router.
+
+::
+
+	                 fastpath bypass
+	          .-------------------------.
+	         /                           \
+	         |           IP forwarding   |
+	         |          /             \ \/
+	         |       br0               eth0 ..... eth0
+	         .       / \                         *host B*
+	          -> eth1  eth2
+	              .           *switch/router*
+	              .
+	              .
+	             eth0
+	           *host A*
+
+The flowtable infrastructure also supports bridge VLAN filtering actions such
+as PVID and untagged. You can also stack a classic VLAN device on top of your
+bridge port.
+
+If you would like your flowtable to define a fastpath between your bridge
+ports and your IP forwarding path, you have to add your bridge ports (as
+represented by the real netdevice) to your flowtable definition.
+
+Counters
+--------
+
+The flowtable can synchronize packet and byte counters with the existing
+connection tracking entry by specifying the counter statement in your flowtable
+definition, e.g.
+
+::
+
+	table inet x {
+		flowtable f {
+			hook ingress priority 0; devices = { eth0, eth1 };
+			counter
+		}
+		...
+	}
+
+Counter support is available since Linux kernel 5.7.
+
+Hardware offload
+----------------
+
+If your network device provides hardware offload support, you can turn it on by
+means of the 'offload' flag in your flowtable definition, e.g.
+
+::
+
+	table inet x {
+		flowtable f {
+			hook ingress priority 0; devices = { eth0, eth1 };
+			flags offload;
+		}
+		...
+	}
+
+There is a workqueue that adds the flows to the hardware. Note that a few
+packets might still run over the flowtable software path until the workqueue has
+a chance to offload the flow to the network device.
+
+You can identify hardware offloaded flows through the [HW_OFFLOAD] tag when
+listing your connection tracking table. Please note that the [OFFLOAD] tag
+refers to the software offload mode, so there is a distinction between [OFFLOAD]
+which refers to the software flowtable fastpath and [HW_OFFLOAD] which refers
+to the hardware offload datapath being used by the flow.
+
+The flowtable hardware offload infrastructure also supports DSA
+(Distributed Switch Architecture).
+
+Limitations
+-----------
+
+The flowtable behaves like a cache. The flowtable entries might get stale if
+either the destination MAC address or the egress netdevice that is used for
+transmission changes.
+
+This might be a problem if:
+
+- You run the flowtable in software mode and you combine bridge and IP
+  forwarding in your setup.
+- Hardware offload is enabled.
+
 More reading
 ------------
