Linux Firewalling in 2021 and a Gentle Introduction to NFTables Part II
Posted on Thu 28 October 2021 in Computing
Introduction
Following on from Part I, let's look at an example ruleset I built recently and walk through it.
As you can see, it's a simple ruleset that opens just three ports. It also counts all inbound DNS traffic, rate-limits it, counts requests that exceed the rate limit (we'll call those flooded) and logs flood traffic, rate-limiting the logging too so we don't DoS ourselves with log traffic.
Reference Ruleset
flush ruleset
# ipv4 only firewall / table
table ip main {
# Base input chain - drop by default
chain inbound {
type filter hook input priority 0; policy drop;
# Allow traffic from established and related packets, drop invalid
ct state vmap { established : accept, related : accept, invalid : drop }
# Allow loopback traffic
meta iifname lo accept
# Allow ssh, http (for letsencrypt) and 853 for DoT
tcp dport ssh accept
tcp dport http accept
tcp dport 853 jump inbound_dns
}
# Base output chain. Not required as default is accept
chain outbound {
type filter hook output priority 0; policy accept;
}
# Base forward chain - drop by default
chain forward {
type filter hook forward priority 0; policy drop;
}
# Regular chain for limiting, counting and logging DNS traffic
chain inbound_dns {
counter name counter_all_dns_packets
ct state new add @rate_meter_inbound_dns { ip saddr limit rate 30/minute burst 10 packets } accept
counter name counter_flooded_dns_packets
limit rate 6/minute log prefix "[nftables dns flood]"
}
# Counters and Maps
counter counter_all_dns_packets {
}
counter counter_flooded_dns_packets {
}
set rate_meter_inbound_dns {
type ipv4_addr
flags dynamic
timeout 10m
}
}
Walkthrough
flush ruleset
Flush any existing rules first.
Tables
table ip main {
Tables are namespaces or containers for chains and chains are containers for rules. Multiple tables can be used if necessary.
Here we define a new table with the arbitrary name "main". ip defines a table containing rules for IPv4 traffic. Possible values are ip, ip6, inet, arp, bridge and netdev (inet captures both IPv4 and IPv6 traffic).
So, now we have a table, let's add some chains to it.
Chains
chain inbound {
type filter hook input priority 0; policy drop;
Here we create a "Base Chain" with the arbitrary name "inbound". Base chains (as opposed to "regular" chains) have a type, a hook and a priority.
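The type can be either filter, route or nat. We want to filter IP traffic so the filter type is used accordingly.
The hook will be familiar to users of iptables as this is one place where they are the same (as they are both part of netfilter). The hooks for IPv4 and IPv6 are prerouting, input, forward, output and postrouting.
The priority is a signed integer (so negative values are allowed) e.g. 10, -100. Chains attached to the same hook are evaluated in ascending priority order, so lower values run first.
The policy (drop here) is the verdict applied to packets that reach the end of the base chain without hitting a terminating verdict.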
Rules
As with iptables, we define rules within chains.
A rule consists of something to match, written as an expression, and an action statement to perform upon it.
I'll start by explaining a very simple rule in the ruleset.
meta iifname lo accept
The match portion here is meta iifname lo and accept is the statement.
This is actually known as a "verdict statement". Here are the possible verdict statements, adapted from the wiki:
- accept: Accept the packet and stop the remaining rules evaluation
- drop: Drop the packet and stop the remaining rules evaluation
- queue: Queue the packet to userspace and stop the remaining rules evaluation
- continue: Continue the ruleset evaluation with the next rule
- return: Return from the current chain and continue at the next rule of the last chain. In a base chain it is equivalent to the base chain policy
- jump <chain>: Continue at the first rule of <chain>. It will continue at the next rule after a return statement is issued
- goto <chain>: Similar to jump, but after the new chain the evaluation will continue at the last chain instead of the one containing the goto statement
The match here (meta iifname lo) is a meta type match, which matches information about the packet rather than its contents. So in this case, we match packets arriving on the input interface lo (loopback) and accept them. As stated above, accept is a verdict statement. It is sometimes referred to as a "terminating" statement in nftables because no more rules are evaluated after it. Not all verdict statements are terminating.
tcp dport ssh accept
tcp dport http accept
Hardly worthy of explanation. These are tcp matches on the destination port. Port numbers can be used instead of service names; service names must match those in /etc/services.
tcp dport 853 jump inbound_dns
Here the verdict statement jump tells the rule evaluation to continue with the rules in the named chain inbound_dns. More on that chain later. This is similar to the -j option in iptables.
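To make the jump, goto and return semantics concrete, here is a toy model of chain traversal in Python. It is only a sketch: rule matching is reduced to predicates on a dict, the conntrack rule is left out, and the "within_limit" key is a hypothetical stand-in for the rate-limit check in inbound_dns. It is not how the kernel implements any of this.

```python
def evaluate(chains, base_chain, policy, packet):
    """Walk the chains for one packet and return "accept" or "drop"."""
    return_stack = []                      # positions saved by jump
    chain, index = base_chain, 0
    while True:
        rules = chains[chain]
        if index >= len(rules):
            verdict, target = "return", None   # end of chain: implicit return
        else:
            matches, verdict, target = rules[index]
            if not matches(packet):
                index += 1                 # rule did not match, try the next one
                continue
        if verdict in ("accept", "drop"):
            return verdict                 # terminating verdicts
        if verdict == "continue":
            index += 1
        elif verdict == "jump":
            return_stack.append((chain, index + 1))  # remember where to resume
            chain, index = target, 0
        elif verdict == "goto":
            chain, index = target, 0       # no return position is saved
        elif verdict == "return":
            if not return_stack:
                return policy              # returning from the base chain
            chain, index = return_stack.pop()
        else:
            raise ValueError("unknown verdict: %r" % verdict)


# Simplified version of the reference ruleset (conntrack rule omitted).
CHAINS = {
    "inbound": [
        (lambda p: p["iif"] == "lo", "accept", None),
        (lambda p: p["dport"] == 22, "accept", None),
        (lambda p: p["dport"] == 80, "accept", None),
        (lambda p: p["dport"] == 853, "jump", "inbound_dns"),
    ],
    "inbound_dns": [
        (lambda p: True, "continue", None),             # counter, no verdict
        (lambda p: p["within_limit"], "accept", None),  # hypothetical limit check
        (lambda p: True, "continue", None),             # flood counter and log
    ],
}


def verdict_for(iif, dport, within_limit=True):
    packet = {"iif": iif, "dport": dport, "within_limit": within_limit}
    return evaluate(CHAINS, "inbound", "drop", packet)
```

In this model a packet to port 853 that fails the limit check runs off the end of inbound_dns, returns to inbound, runs off the end of that too and receives the base chain policy.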
ct state vmap { established : accept, related : accept, invalid : drop }
This can look unfriendly at first but is actually very straightforward and very concise.
ct state is the connection tracking match. It matches on the state that conntrack (netfilter's connection tracking subsystem) has assigned to the packet's connection.
The vmap, or verdict map, is a map containing expressions as keys and verdicts as values. Without the use of a vmap, the same one line rule becomes:
ct state established accept
ct state related accept
ct state invalid drop
A question of style ultimately, but I prefer the vmap one.
Rate-limiting, Counters and Logging
We want TCP traffic with a destination port of 853 to be evaluated in its own chain. This is defined by the syntax chain inbound_dns. Nftables refers to this type of chain as a "regular" chain and requires just an arbitrary name. Essentially, regular chains are a means of rule organization.
Let's look at the first rule in this chain:
counter name counter_all_dns_packets
This is a non-verdict and non-terminating statement. It increments a "named" counter which is defined later in the ruleset as:
counter counter_all_dns_packets
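Therefore, every inbound port 853 packet that reaches this chain is counted. Note that packets belonging to established connections are already accepted by the ct state rule at the top of the inbound chain, so in practice this counts new connection attempts. This is perfect for metrics and, by using a named counter, we can do some interrogation from the command line, e.g.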
$>nft list counter main counter_all_dns_packets
table ip main {
counter counter_all_dns_packets {
packets 4106 bytes 262200
}
}
Counters can also be dumped as JSON with the -j flag. Perfect for putting into something like DataDog DogStatsD and generating metrics.
$>nft -j list counter main counter_all_dns_packets
{"nftables": [{"metainfo": {"version": "0.9.8", "release_name": "E.D.S.", "json_schema_version": 1}}, {"counter": {"family": "ip", "name": "counter_all_dns_packets", "table": "main", "handle": 5, "packets": 4107, "bytes": 262264}}]}
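As a rough sketch of how that JSON could be turned into metrics, the Python below pulls the packet and byte totals out of the output shown above. The function name extract_counters is made up for this example, and how you obtain the JSON (for instance by running nft -j from a script) and ship the numbers to a metrics agent is left out.

```python
import json

# Sample output copied from the `nft -j list counter` example above.
SAMPLE = (
    '{"nftables": [{"metainfo": {"version": "0.9.8", "release_name": "E.D.S.", '
    '"json_schema_version": 1}}, {"counter": {"family": "ip", '
    '"name": "counter_all_dns_packets", "table": "main", "handle": 5, '
    '"packets": 4107, "bytes": 262264}}]}'
)


def extract_counters(nft_json):
    """Return {counter_name: (packets, bytes)} for every counter object."""
    document = json.loads(nft_json)
    result = {}
    for item in document.get("nftables", []):
        counter = item.get("counter")
        if counter is not None:          # skip metainfo and other objects
            result[counter["name"]] = (counter["packets"], counter["bytes"])
    return result


if __name__ == "__main__":
    for name, (packets, size) in extract_counters(SAMPLE).items():
        print(name, packets, size)
```

Keying the result by counter name means the same function should work unchanged on output that contains several counters.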
Because this is a non-verdict and non-terminating statement, rule evaluation continues with:
ct state new add @rate_meter_inbound_dns { ip saddr limit rate 30/minute burst 10 packets } accept
This is the most comprehensive and powerful rule in our ruleset. As you can probably see, we want to prevent a traffic-flooding DoS attack by rate-limiting inbound traffic but, importantly, do this per source IP address.
30 packets are allowed per minute (per IP), with a burst-limit of 10 (see below for an explanation of this). All limits are reset after 10 minutes. Furthermore, we also want to count the number of requests that break this rule (flood) and we want to log a selection of those requests.
Those coming from iptables will know this as the hashlimit match. nftables uses dynamic maps and sets to keep state. Because this rate-limiting rule tracks source IP addresses it is therefore dynamic. So, we match new tcp connections and use a "named dynamic set" to store the source IP address, which forms part of the rule match. @ specifies the named set.
The named set is defined with this syntax:
set rate_meter_inbound_dns {
type ipv4_addr
flags dynamic
timeout 10m
}
The man page covers sets. In a nutshell, we create a set to store the IPv4 address as part of the rule. Its contents are dynamic, of course, and any elements older than 10 minutes will be purged. This allows us to reset all rate-limiting limits every 10 minutes.
Because this is a named set, we can also easily interrogate it from the command line.
$>nft list set main rate_meter_inbound_dns
table ip main {
set rate_meter_inbound_dns {
type ipv4_addr
size 65535
flags dynamic,timeout
timeout 10m
elements = { 1.7.23.29 limit rate 30/minute burst 10 packets expires 9m45s892ms, 17.93.41.155 limit rate 30/minute burst 10 packets expires 9m23s708ms }
}
}
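Lastly, because the verdict statement is accept, any packets that match our rate-limit requirements are accepted and no further rules in the chain are evaluated. Any that do not (flood traffic) fall through to our final two rules. Neither of those issues a verdict, so flood packets then return to the inbound chain, reach its end and are dropped by the default policy. The first of the two rules is: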
counter name counter_flooded_dns_packets
Increment the named counter which tracks the number of flood packets (roughly corresponds to the number of DNS requests)
limit rate 6/minute log prefix "[nftables dns flood]"
Our final rule, again, is only matched by flood traffic. This logs to the kernel log with the level WARN by default. Logging every flood packet is asking to DoS ourselves, so to avoid that we rate-limit the logging to 6/minute. Note that the order in which we write the rule is critical here. Writing the rule as...
log prefix "[nftables dns flood]" limit rate 6/minute
will not have the desired effect: statements are evaluated from left to right, so the log statement is applied before the limit match and every packet would be logged.
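The ordering point can be illustrated with a small Python model. This is a deliberately crude sketch: the limit here is just "let the first N packets through", not a real rate limiter, and the helper names (make_limit, log, run_rule) are invented for the example. Each statement returns True to let evaluation carry on to the next statement in the rule, or False to stop.

```python
def make_limit(max_matches):
    """Crude stand-in for a limit match: passes only the first max_matches packets."""
    state = {"seen": 0}

    def limit(packet, log_lines):
        state["seen"] += 1
        return state["seen"] <= max_matches

    return limit


def log(packet, log_lines):
    """Stand-in for the log statement: records a line and never stops the rule."""
    log_lines.append("[nftables dns flood] " + packet)
    return True


def run_rule(statements, packets):
    """Evaluate one rule against each packet, statements left to right."""
    log_lines = []
    for packet in packets:
        for statement in statements:
            if not statement(packet, log_lines):
                break                     # statement failed: rest of rule is skipped
    return log_lines


FLOOD = ["packet-%d" % i for i in range(100)]

limit_then_log = run_rule([make_limit(6), log], FLOOD)
log_then_limit = run_rule([log, make_limit(6)], FLOOD)
print(len(limit_then_log), len(log_then_limit))
```

With the limit first, only the packets that pass it ever reach the log statement. With the log first, every packet is logged before the limit is even consulted.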
Understanding Burst and Rate-limit
This seems to be a source of confusion and misunderstanding, however it is actually fairly simple.
An easy example to understand is rate-limiting bandwidth. Assume a rate-limiting rule of 30MB/minute and a burst of 2MB for a web server serving a website. This does not mean that the download rate is pinned to a constant 512KB/s (30MB/1m). It allows a user to exceed that rate briefly, so the first 2MB of a new connection can be delivered as fast as the link allows. This is ideal for websites where downloading a small amount of initial content should be quick, but for sustained downloads (which could quickly saturate bandwidth with many concurrent users) download speeds would be capped at approx 512KB/s after the initial burst.
I think of the rate-limit and burst like a stamina meter in a video game. Think about sprinting in something like Battlefield or Zelda. The burst-limit is the maximum size of the stamina gauge and the rate-limit is how quickly the gauge recharges. If the timeout in the dynamic set is 10 minutes, then the stamina gauge is reset to full every 10 minutes.
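The stamina gauge is essentially a token bucket, and a few lines of Python show the behaviour for the 30/minute, burst 10 figures used in the ruleset. Treat this as a generic token bucket sketch under those assumptions rather than a reproduction of what the kernel does internally; the class and function names are invented for the example.

```python
class TokenBucket:
    """Generic token bucket: `burst` is the gauge size, the rate refills it."""

    def __init__(self, rate_per_minute, burst):
        self.rate_per_second = rate_per_minute / 60.0
        self.capacity = float(burst)
        self.tokens = float(burst)      # the gauge starts full
        self.last = 0.0                 # time of the previous packet, in seconds

    def allow(self, now):
        """Return True if a packet arriving at time `now` is within the limit."""
        elapsed = now - self.last
        self.last = now
        # Recharge for the elapsed time, but never beyond the gauge size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_second)
        if self.tokens >= 1.0:
            self.tokens -= 1.0          # each packet costs one token
            return True
        return False


def count_allowed(bucket, now, packets):
    """Send `packets` packets at the same instant and count how many pass."""
    return sum(1 for _ in range(packets) if bucket.allow(now))


bucket = TokenBucket(rate_per_minute=30, burst=10)
initial_burst = count_allowed(bucket, now=0.0, packets=15)    # gauge drains
two_seconds_on = count_allowed(bucket, now=2.0, packets=5)    # small recharge
much_later = count_allowed(bucket, now=100.0, packets=15)     # gauge is full again
print(initial_burst, two_seconds_on, much_later)
```

At 30/minute the gauge regains one token every two seconds, which is why only a single packet passes two seconds after the gauge has been emptied, while a long quiet period restores the full burst.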
Command Line Examples
nft -cf <filename> - check syntax is valid without applying
nft -f <filename> - load a ruleset
List the whole ruleset, all counters or all sets:
nft list ruleset
nft list counters
nft list sets
List a named table, counter or set:
nft list table main
nft list counter main counter_all_dns_packets
nft list set main rate_meter_inbound_dns
-j - dumps output as JSON