Any network device, be it a transit router or a host, usually has multiple IP addresses assigned to its interfaces. One of the first things we learn as network engineers is how to determine which IP address is used for locally sourced traffic. However, this default behaviour can be changed in a couple of different ways, and this post briefly documents the available options.
The Default Scenario
Whenever a local application decides to connect to a remote network endpoint, it creates a network socket and provides the minimal set of details required to build and send a network packet. Most often, this information consists of a destination IP and port number, as you can see from the following abbreviated output:
$ strace -e trace=network curl http://example.com
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 6
setsockopt(6, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(6, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
setsockopt(6, SOL_TCP, TCP_KEEPIDLE, [60], 4) = 0
setsockopt(6, SOL_TCP, TCP_KEEPINTVL, [60], 4) = 0
connect(6, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("93.184.216.34")}, 16)
While this output does not show the DNS resolution part (because getaddrinfo() is a library function rather than a syscall), we can see that the only user-specific information the application (curl) provides in the connect() syscall is the remote port, sin_port, and IP address, sin_addr.
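To make the sequence concrete, here is a minimal Python sketch that issues roughly the same calls as the curl trace. Python's socket module maps almost one-to-one onto these syscalls; the function name curl_like_connect is hypothetical, and TCP_KEEPIDLE/TCP_KEEPINTVL are assumed to be available (they are Linux-specific). Note that the caller never mentions a source address.

```python
import socket


def curl_like_connect(host: str, port: int) -> socket.socket:
    """Open a TCP connection with the socket options seen in the curl trace.

    Only the destination is passed in; the kernel picks the source IP and port.
    """
    # getaddrinfo() is a libc function, not a syscall: it resolves host into a sockaddr.
    family, sock_type, proto, _, addr = socket.getaddrinfo(
        host, port, socket.AF_INET, socket.SOCK_STREAM)[0]
    sock = socket.socket(family, sock_type, proto)  # socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)
    try:
        # SOL_TCP in the trace has the same value as IPPROTO_TCP.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # Linux-specific
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60)  # Linux-specific
        sock.connect(addr)  # only sin_port and sin_addr come from the application
    except OSError:
        sock.close()
        raise
    return sock
```

Calling getsockname() on the returned socket shows the source IP and ephemeral port that the kernel filled in on its own.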
What happens next is what we all learned to expect from any operating system, not just Linux:
- Destination IP is looked up in the local routing table.
- The resulting route is used to determine the egress interface.
- The IP of that interface is assigned as the source address for the TCP socket.
This is a sane default: because the chosen IP is assigned to the egress interface, it is the address most likely to work for reaching the destination and receiving its replies.
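One way to watch these three steps without sending any traffic is to connect() a UDP socket: the kernel performs the route lookup and fixes the local address, but, unlike a TCP connect(), nothing is transmitted. A minimal sketch, with kernel_source_ip as a hypothetical helper name:

```python
import socket


def kernel_source_ip(dest_ip: str, port: int = 80) -> str:
    """Return the source IP the kernel selects for traffic to dest_ip.

    connect() on a UDP socket sends no packets, but it triggers the route
    lookup and binds the local address, which getsockname() then reveals.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((dest_ip, port))
        return sock.getsockname()[0]
```

On the host from this post, kernel_source_ip("93.184.216.34") should report the same address as the src field of ip route get 93.184.216.34.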
User-provided IP
In some scenarios, when multiple local IPs are reachable from outside the host, users may want to override the default behaviour. A very common way to do that is to specify an IP address (or interface name) as the traffic source. The following strace output looks exactly the same as above, with one notable exception:
$ strace -e trace=network curl --interface lo http://example.com
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 5
setsockopt(5, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(5, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
setsockopt(5, SOL_TCP, TCP_KEEPIDLE, [60], 4) = 0
setsockopt(5, SOL_TCP, TCP_KEEPINTVL, [60], 4) = 0
setsockopt(5, SOL_SOCKET, SO_BINDTODEVICE, "lo\0", 3) = 0
connect(5, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("93.184.216.34")}, 16)
The setsockopt() syscall allows clients to bind to a specific interface name using the SO_BINDTODEVICE option.
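A minimal Python sketch of the same setsockopt() call follows. Python usually exposes socket.SO_BINDTODEVICE on Linux; the fallback value 25 (from the kernel's generic socket header) is an assumption for builds that do not, and the helper names ifname_optval and bind_to_device are hypothetical. The option value is the NUL-terminated interface name, which is why strace shows "lo\0" with a length of 3.

```python
import socket

# Linux-only option; 25 is an assumed fallback taken from <asm-generic/socket.h>.
SO_BINDTODEVICE = getattr(socket, "SO_BINDTODEVICE", 25)
IFNAMSIZ = 16  # kernel limit for interface names, including the trailing NUL


def ifname_optval(ifname: str) -> bytes:
    """Encode an interface name as a NUL-terminated buffer, e.g. b"lo\\0" (3 bytes)."""
    raw = ifname.encode("ascii") + b"\0"
    if len(raw) > IFNAMSIZ:
        raise ValueError(f"interface name too long: {ifname!r}")
    return raw


def bind_to_device(sock: socket.socket, ifname: str) -> None:
    """Restrict sock to ifname, like setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE, ...).

    Raises OSError (ENODEV) for an unknown name; older kernels may also
    return EPERM unless the process has CAP_NET_RAW.
    """
    sock.setsockopt(socket.SOL_SOCKET, SO_BINDTODEVICE, ifname_optval(ifname))
```

Calling bind_to_device(sock, "lo") before connect() corresponds to the curl --interface lo trace above.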
Another alternative is to bind() the client socket to a specific IP address (192.0.2.2 is one of the IPs on the lo interface), which is what curl does in the following case:
$ strace -e trace=network curl --interface 192.0.2.2 http://example.com
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 5
setsockopt(5, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(5, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
setsockopt(5, SOL_TCP, TCP_KEEPIDLE, [60], 4) = 0
setsockopt(5, SOL_TCP, TCP_KEEPINTVL, [60], 4) = 0
setsockopt(5, SOL_SOCKET, SO_BINDTODEVICE, "192.0.2.2\0", 10) = -1 ENODEV (No such device)
bind(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("192.0.2.2")}, 16) = 0
connect(5, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("93.184.216.34")}, 16)
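The bind() variant can be sketched the same way. This is an illustration, not curl's code, and connect_from is a hypothetical name. Binding to port 0 leaves the port choice to the kernel, which matches sin_port=htons(0) in the trace.

```python
import socket
from typing import Tuple


def connect_from(src_ip: str, dest: Tuple[str, int]) -> socket.socket:
    """Connect to dest using src_ip as the source address, like curl --interface 192.0.2.2."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 asks the kernel for an ephemeral port: bind(..., sin_port=htons(0), ...).
        sock.bind((src_ip, 0))
        sock.connect(dest)
    except OSError:
        sock.close()
        raise
    return sock
```

If src_ip is not assigned to any local interface, bind() fails with EADDRNOTAVAIL, unless non-local binds are allowed (for example via the net.ipv4.ip_nonlocal_bind sysctl).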
The problem with these options is that they are application-specific and thus require explicit per-application configuration. While this may work for a small number of applications, in some scenarios it may be easier to have a global setting that influences this behaviour.
Netlink Route Source IP
Another option, frequently used on L3 multi-homed hosts, is rtnetlink’s src option (the RTA_PREFSRC route attribute). Continuing from the previous example, let’s add a static route for example.com and set the src option to the loopback IP:
$ ip route add 93.184.216.34 via 172.20.20.1 src 192.0.2.2
$ ip route get 93.184.216.34
93.184.216.34 via 172.20.20.1 dev eth0 src 192.0.2.2 uid 0
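Under the hood, ip route add sends an RTM_NEWROUTE message over a NETLINK_ROUTE socket, and the src keyword becomes an RTA_PREFSRC attribute. The sketch below builds such a message by hand with struct; the constants are copied from the Linux UAPI headers, and the exact attribute set iproute2 sends may differ. The helper names are hypothetical, and send_route needs CAP_NET_ADMIN and skips error handling, so treat it as illustrative only.

```python
import socket
import struct

# Constants from <linux/netlink.h> and <linux/rtnetlink.h>.
RTM_NEWROUTE = 24
NLM_F_REQUEST, NLM_F_ACK, NLM_F_EXCL, NLM_F_CREATE = 0x1, 0x4, 0x200, 0x400
RT_TABLE_MAIN, RTPROT_BOOT, RT_SCOPE_UNIVERSE, RTN_UNICAST = 254, 3, 0, 1
RTA_DST, RTA_GATEWAY, RTA_PREFSRC = 1, 5, 7


def rtattr(rta_type: int, payload: bytes) -> bytes:
    """struct rtattr (u16 len, u16 type) plus payload, padded to a 4-byte boundary."""
    length = 4 + len(payload)
    return struct.pack("=HH", length, rta_type) + payload + b"\0" * (-length % 4)


def new_route_msg(dst: str, gateway: str, prefsrc: str, seq: int = 1) -> bytes:
    """Build an RTM_NEWROUTE request similar to: ip route add DST via GW src PREFSRC."""
    # struct rtmsg: family, dst_len, src_len, tos, table, protocol, scope, type, flags
    rtm = struct.pack("=BBBBBBBBI", socket.AF_INET, 32, 0, 0,
                      RT_TABLE_MAIN, RTPROT_BOOT, RT_SCOPE_UNIVERSE, RTN_UNICAST, 0)
    attrs = (rtattr(RTA_DST, socket.inet_aton(dst))
             + rtattr(RTA_GATEWAY, socket.inet_aton(gateway))
             + rtattr(RTA_PREFSRC, socket.inet_aton(prefsrc)))  # the "src" keyword
    body = rtm + attrs
    flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE | NLM_F_EXCL
    # struct nlmsghdr: len, type, flags, seq, pid (0 here)
    hdr = struct.pack("=IHHII", 16 + len(body), RTM_NEWROUTE, flags, seq, 0)
    return hdr + body


def send_route(msg: bytes) -> None:
    """Send the request over NETLINK_ROUTE (needs CAP_NET_ADMIN); the ACK is not checked."""
    with socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE) as nl:
        nl.bind((0, 0))
        nl.send(msg)
        nl.recv(65536)  # NLMSG_ERROR reply; an error code of 0 means success
```

For example, send_route(new_route_msg("93.184.216.34", "172.20.20.1", "192.0.2.2")) run as root should have an effect similar to the ip route add command above.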
Now we can re-run the original curl command without specifying the source IP:
$ tcpdump -enni eth0 host 93.184.216.34 &
$ strace -e trace=network curl http://example.com
...
connect(6, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("93.184.216.34")}, 16)
14:19:00.970631 IP 192.0.2.2.33068 > 93.184.216.34.80: Flags [S]
The kernel has changed the packet’s source IP to the address specified in the ip route add command above. This option can also be configured by an IP routing daemon, for example, with FRR’s route-map set src command or BIRD’s krt_prefsrc route attribute.
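To tie the default behaviour and the src override together, here is a deliberately simplified model of the decision: longest-prefix match on the destination, then RTA_PREFSRC if the route carries one, otherwise an address of the egress interface. This is a toy illustration, not the kernel's algorithm; the real selection also weighs address scope and other details, and the Route and select_source names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Route:
    prefix: str                # destination prefix, e.g. "0.0.0.0/0"
    dev: str                   # egress interface
    src: Optional[str] = None  # RTA_PREFSRC, if configured


def select_source(dest: str, routes: List[Route], addrs: Dict[str, List[str]]) -> str:
    """Simplified source selection: longest-prefix match, then prefsrc or interface IP."""
    dst = ipaddress.ip_address(dest)
    matches = [r for r in routes if dst in ipaddress.ip_network(r.prefix)]
    if not matches:
        raise LookupError(f"no route to {dest}")
    best = max(matches, key=lambda r: ipaddress.ip_network(r.prefix).prefixlen)
    if best.src is not None:
        return best.src        # RTA_PREFSRC overrides the default choice
    return addrs[best.dev][0]  # default: first address on the egress interface
```

For example, with a hypothetical eth0 address of 172.20.20.2 and only a default route, select_source("93.184.216.34", ...) returns 172.20.20.2; adding Route("93.184.216.34/32", "eth0", src="192.0.2.2") makes it return 192.0.2.2 instead, mirroring the tcpdump output above.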