iptables SNAT

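When a private network in Neutron needs to reach the external network, its traffic is NATed with iptables inside the tenant's network namespace. This mechanism is well worth understanding, and since packet captures have the final word in networking, here is a small experiment.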

First, create a VM that has only a private (internal) network.

Next, create a namespace:

lihui@2016 ~# ip netns add lihui_namespace
lihui@2016 ~# ip netns list

Create a veth pair: veth1 becomes the namespace's network interface, and veth0 stays in the VM as its interface.

In the namespace:

lihui@2016 ~# ip link add veth0 type veth peer name veth1
lihui@2016 ~# ip link set veth1 netns lihui_namespace
lihui@2016 ~#
lihui@2016 ~# ip netns exec lihui_namespace ip a
1: lo:  mtu 65536 qdisc noop state DOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: veth1:  mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether f2:c1:57:ad:c1:2f brd ff:ff:ff:ff:ff:ff
lihui@2016 ~# ip netns exec lihui_namespace ifconfig veth1 1.1.1.2 netmask 255.255.255.0 up
lihui@2016 ~# ip netns exec lihui_namespace ip a
1: lo:  mtu 65536 qdisc noop state DOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: veth1:  mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
    link/ether f2:c1:57:ad:c1:2f brd ff:ff:ff:ff:ff:ff
    inet 1.1.1.2/24 brd 1.1.1.255 scope global veth1
       valid_lft forever preferred_lft forever

In the VM:

lihui@2016 ~# ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1400 qdisc pfifo_fast state UP qlen 1000
    link/ether fa:16:3e:b0:1f:86 brd ff:ff:ff:ff:ff:ff
    inet 192.168.38.116/21 brd 192.168.39.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:feb0:1f86/64 scope link
       valid_lft forever preferred_lft forever
4: veth0:  mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 7e:62:95:c8:a9:f8 brd ff:ff:ff:ff:ff:ff
lihui@2016 ~#
lihui@2016 ~# ifconfig veth0 1.1.1.3 netmask 255.255.255.0 up
lihui@2016 ~# ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1400 qdisc pfifo_fast state UP qlen 1000
    link/ether fa:16:3e:b0:1f:86 brd ff:ff:ff:ff:ff:ff
    inet 192.168.38.116/21 brd 192.168.39.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:feb0:1f86/64 scope link
       valid_lft forever preferred_lft forever
4: veth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 7e:62:95:c8:a9:f8 brd ff:ff:ff:ff:ff:ff
    inet 1.1.1.3/24 brd 1.1.1.255 scope global veth0
       valid_lft forever preferred_lft forever
    inet6 fe80::7c62:95ff:fec8:a9f8/64 scope link
       valid_lft forever preferred_lft forever

Because veth0 and veth1 are the two ends of a veth pair and sit on the same subnet, they can reach each other:

lihui@2016 ~# ping 1.1.1.2
PING 1.1.1.2 (1.1.1.2) 56(84) bytes of data.
64 bytes from 1.1.1.2: icmp_req=1 ttl=64 time=0.053 ms
64 bytes from 1.1.1.2: icmp_req=2 ttl=64 time=0.043 ms
^C
--- 1.1.1.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.043/0.048/0.053/0.005 ms
lihui@2016 ~# ip netns exec lihui_namespace ping 1.1.1.3
PING 1.1.1.3 (1.1.1.3) 56(84) bytes of data.
64 bytes from 1.1.1.3: icmp_req=1 ttl=64 time=0.028 ms
64 bytes from 1.1.1.3: icmp_req=2 ttl=64 time=0.046 ms
^C
--- 1.1.1.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.028/0.037/0.046/0.009 ms
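
The namespace and veth setup can be collected into one script. The following is a minimal sketch rather than the exact procedure used in this post: it swaps `ifconfig` for the iproute2 `ip addr` and `ip link` commands, and the `run` helper and `DRY_RUN` variable are hypothetical names introduced only for this example. By default it just prints the commands; set `DRY_RUN=0` and run it as root to apply them.

```shell
#!/bin/sh
# Sketch: build the namespace and veth pair from the walkthrough.
# The namespace name, interface names and addresses come from the text;
# run() and DRY_RUN are illustrative helpers, not part of any tool.
NS=lihui_namespace
DRY_RUN=${DRY_RUN:-1}

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_veth() {
    run ip netns add "$NS"
    run ip link add veth0 type veth peer name veth1
    run ip link set veth1 netns "$NS"
    # Namespace side: 1.1.1.2/24
    run ip netns exec "$NS" ip addr add 1.1.1.2/24 dev veth1
    run ip netns exec "$NS" ip link set veth1 up
    # VM side: 1.1.1.3/24
    run ip addr add 1.1.1.3/24 dev veth0
    run ip link set veth0 up
}

setup_veth
```

Printing first makes it easy to compare the commands with the transcript above before changing any network state.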

Now try to reach the external network from inside the namespace. It fails:

lihui@2016 ~# ip netns exec lihui_namespace ping 114.114.114.114
connect: Network is unreachable

The namespace needs a default gateway. Using veth0 (1.1.1.3) as the gateway lets the packets leave the namespace:

lihui@2016 ~# ip netns exec lihui_namespace route add default gw 1.1.1.3
lihui@2016 ~# ip netns exec lihui_namespace ip r
default via 1.1.1.3 dev veth1
1.1.1.0/24 dev veth1  proto kernel  scope link  src 1.1.1.2
lihui@2016 ~# ip netns exec lihui_namespace ping 114.114.114.114
PING 114.114.114.114 (114.114.114.114) 56(84) bytes of data.
^C
--- 114.114.114.114 ping statistics ---
11 packets transmitted, 0 received, 100% packet loss, time 10078ms

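The pings still get no reply: 1.1.1.2 is on a different subnet from the VM's uplink, so the packets cannot go any further. The VM's eth0, however, is known to reach the outside (via NAT). So add an SNAT rule with iptables in the VM that rewrites the source IP address of packets coming from the namespace to eth0's IP address, so that they can be forwarded out: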

lihui@2016 ~# iptables -t nat -A POSTROUTING -s 1.1.1.0/24 -j SNAT --to-source 192.168.38.116
lihui@2016 ~# iptables -t nat -vnL
Chain PREROUTING (policy ACCEPT 2 packets, 80 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 SNAT       all  --  *      *       1.1.1.0/24           0.0.0.0/0            to:192.168.38.116

But the packets are still not forwarded:

lihui@2016 ~# ip netns exec lihui_namespace ping 114.114.114.114
PING 114.114.114.114 (114.114.114.114) 56(84) bytes of data.
^C
--- 114.114.114.114 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1007ms

The reason is that IP forwarding is disabled by default in the Linux kernel, so it has to be turned on:

lihui@2016 ~# cat /proc/sys/net/ipv4/ip_forward
0
lihui@2016 ~# echo 1 > /proc/sys/net/ipv4/ip_forward
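
Writing to /proc takes effect immediately but does not survive a reboot. Below is a small sketch for checking the flag from a script. `forwarding_enabled` is a hypothetical helper, and its optional path argument exists only so the function can be tried against a scratch file. The comments list the `sysctl` form of the same setting; persisting it through /etc/sysctl.conf is the common convention, assuming the distribution reads that file at boot.

```shell
#!/bin/sh
# forwarding_enabled [FILE]: succeed if the kernel flag file contains 1.
forwarding_enabled() {
    file=${1:-/proc/sys/net/ipv4/ip_forward}
    [ "$(cat "$file" 2>/dev/null)" = "1" ]
}

if forwarding_enabled; then
    echo "IPv4 forwarding is on"
else
    echo "IPv4 forwarding is off"
    # To enable it as root (same effect as writing 1 to the /proc file):
    #   sysctl -w net.ipv4.ip_forward=1
    # To keep it across reboots, add this line to /etc/sysctl.conf:
    #   net.ipv4.ip_forward = 1
fi
```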

Try again, and the packets now go through:

lihui@2016 ~# ip netns exec lihui_namespace ping 114.114.114.114
PING 114.114.114.114 (114.114.114.114) 56(84) bytes of data.
64 bytes from 114.114.114.114: icmp_req=1 ttl=78 time=5.48 ms
64 bytes from 114.114.114.114: icmp_req=2 ttl=80 time=5.18 ms
64 bytes from 114.114.114.114: icmp_req=3 ttl=79 time=5.22 ms
^C
--- 114.114.114.114 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 5.181/5.294/5.480/0.145 ms

Now capture packets at three points, veth1, veth0 and eth0, to confirm the flow.

veth1, inside the namespace:

lihui@2016 ~# ip netns exec lihui_namespace tcpdump -i veth1 icmp -en
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on veth1, link-type EN10MB (Ethernet), capture size 65535 bytes
02:23:40.955006 f2:c1:57:ad:c1:2f > 7e:62:95:c8:a9:f8, ethertype IPv4 (0x0800), length 98: 1.1.1.2 > 114.114.114.114: ICMP echo request, id 18231, seq 1, length 64
02:23:40.960659 7e:62:95:c8:a9:f8 > f2:c1:57:ad:c1:2f, ethertype IPv4 (0x0800), length 98: 114.114.114.114 > 1.1.1.2: ICMP echo reply, id 18231, seq 1, length 64
02:23:41.956927 f2:c1:57:ad:c1:2f > 7e:62:95:c8:a9:f8, ethertype IPv4 (0x0800), length 98: 1.1.1.2 > 114.114.114.114: ICMP echo request, id 18231, seq 2, length 64
02:23:41.962097 7e:62:95:c8:a9:f8 > f2:c1:57:ad:c1:2f, ethertype IPv4 (0x0800), length 98: 114.114.114.114 > 1.1.1.2: ICMP echo reply, id 18231, seq 2, length 64
02:23:42.958418 f2:c1:57:ad:c1:2f > 7e:62:95:c8:a9:f8, ethertype IPv4 (0x0800), length 98: 1.1.1.2 > 114.114.114.114: ICMP echo request, id 18231, seq 3, length 64
02:23:42.963529 7e:62:95:c8:a9:f8 > f2:c1:57:ad:c1:2f, ethertype IPv4 (0x0800), length 98: 114.114.114.114 > 1.1.1.2: ICMP echo reply, id 18231, seq 3, length 64

veth0, the gateway:

lihui@2016 ~# tcpdump -i veth0 icmp -en
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on veth0, link-type EN10MB (Ethernet), capture size 65535 bytes
02:23:40.955012 f2:c1:57:ad:c1:2f > 7e:62:95:c8:a9:f8, ethertype IPv4 (0x0800), length 98: 1.1.1.2 > 114.114.114.114: ICMP echo request, id 18231, seq 1, length 64
02:23:40.960656 7e:62:95:c8:a9:f8 > f2:c1:57:ad:c1:2f, ethertype IPv4 (0x0800), length 98: 114.114.114.114 > 1.1.1.2: ICMP echo reply, id 18231, seq 1, length 64
02:23:41.956936 f2:c1:57:ad:c1:2f > 7e:62:95:c8:a9:f8, ethertype IPv4 (0x0800), length 98: 1.1.1.2 > 114.114.114.114: ICMP echo request, id 18231, seq 2, length 64
02:23:41.962095 7e:62:95:c8:a9:f8 > f2:c1:57:ad:c1:2f, ethertype IPv4 (0x0800), length 98: 114.114.114.114 > 1.1.1.2: ICMP echo reply, id 18231, seq 2, length 64
02:23:42.958427 f2:c1:57:ad:c1:2f > 7e:62:95:c8:a9:f8, ethertype IPv4 (0x0800), length 98: 1.1.1.2 > 114.114.114.114: ICMP echo request, id 18231, seq 3, length 64
02:23:42.963527 7e:62:95:c8:a9:f8 > f2:c1:57:ad:c1:2f, ethertype IPv4 (0x0800), length 98: 114.114.114.114 > 1.1.1.2: ICMP echo reply, id 18231, seq 3, length 64

eth0: packets to 114.114.114.114 now carry eth0's IP address as their source IP:

lihui@2016 ~# tcpdump -i eth0 icmp -en
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
02:23:40.955030 fa:16:3e:b0:1f:86 > fa:16:3e:92:36:49, ethertype IPv4 (0x0800), length 98: 192.168.38.116 > 114.114.114.114: ICMP echo request, id 18231, seq 1, length 64
02:23:40.960647 fa:16:3e:92:36:49 > fa:16:3e:b0:1f:86, ethertype IPv4 (0x0800), length 98: 114.114.114.114 > 192.168.38.116: ICMP echo reply, id 18231, seq 1, length 64
02:23:41.956957 fa:16:3e:b0:1f:86 > fa:16:3e:92:36:49, ethertype IPv4 (0x0800), length 98: 192.168.38.116 > 114.114.114.114: ICMP echo request, id 18231, seq 2, length 64
02:23:41.962084 fa:16:3e:92:36:49 > fa:16:3e:b0:1f:86, ethertype IPv4 (0x0800), length 98: 114.114.114.114 > 192.168.38.116: ICMP echo reply, id 18231, seq 2, length 64
02:23:42.958447 fa:16:3e:b0:1f:86 > fa:16:3e:92:36:49, ethertype IPv4 (0x0800), length 98: 192.168.38.116 > 114.114.114.114: ICMP echo request, id 18231, seq 3, length 64
02:23:42.963516 fa:16:3e:92:36:49 > fa:16:3e:b0:1f:86, ethertype IPv4 (0x0800), length 98: 114.114.114.114 > 192.168.38.116: ICMP echo reply, id 18231, seq 3, length 64

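SNAT rewrites the source address after the routing decision and just before the packet leaves the local network stack; the destination address is left unchanged. The host records the translation in its NAT table, and when the reply comes back it uses that entry to rewrite the destination address back to the original source address and delivers the packet to the internal host. This is how many internal hosts can share one public address to reach the Internet.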
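
The rewrite-and-remember behaviour can be mimicked with a toy model. This is only an illustration of the idea, not how the kernel implements it (the kernel keeps this state in its connection-tracking table). The function names, the `PKT` and `NAT_TABLE` variables, and the "source destination id" packet format are all made up for this sketch; the addresses and the ICMP id are taken from the captures above.

```shell
#!/bin/sh
# Toy model of SNAT plus its translation table, for illustration only.
PUBLIC_IP=192.168.38.116   # eth0 address from the walkthrough
NAT_TABLE=""               # one entry per line: "<icmp id> <remote ip> <original source>"
PKT=""                     # result of the last rewrite: "<src> <dst> <id>"

# snat_out SRC DST ID: rewrite the source of an outgoing packet and remember it.
snat_out() {
    case $1 in
        1.1.1.*)
            NAT_TABLE="${NAT_TABLE}$3 $2 $1
"
            PKT="$PUBLIC_IP $2 $3" ;;
        *)
            PKT="$1 $2 $3" ;;
    esac
}

# nat_in SRC DST ID: restore the original destination of a reply, if known.
nat_in() {
    PKT="$1 $2 $3"
    # Look for an entry whose id and remote address match this reply.
    orig=$(printf '%s' "$NAT_TABLE" |
        awk -v id="$3" -v remote="$1" '$1 == id && $2 == remote { print $3; exit }')
    if [ -n "$orig" ] && [ "$2" = "$PUBLIC_IP" ]; then
        PKT="$1 $orig $3"
    fi
}

snat_out 1.1.1.2 114.114.114.114 18231
echo "out: $PKT"
nat_in 114.114.114.114 192.168.38.116 18231
echo "in:  $PKT"
```

The outgoing packet leaves with eth0's address as its source, and the reply gets 1.1.1.2 back as its destination, which matches what the eth0 and veth0 captures show.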

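When a packet enters a Linux system, it first passes through the PREROUTING chain of the mangle table, where it may be modified before routing. It then enters the PREROUTING chain of the nat table, where DNAT and similar rewrites of the destination address happen. A gateway is the typical case: packets returning from the external network are all addressed to the gateway's external address, because the sender does not know which internal machine wants them, so the gateway must change the destination to the correct internal address instead of its own. Next comes the routing decision: is the packet for this machine, or does it only need to be forwarded? If it is to be forwarded, it goes to the FORWARD chain of the mangle table, where fields such as TOS may be changed, and then to the FORWARD chain of the filter table, where the usual forwarding filter rules apply. After that it passes through the POSTROUTING chain of the mangle table for any further changes, and then the POSTROUTING chain of the nat table, where the source address may be rewritten. Here a gateway replaces the internal source IP with its own external IP, so hosts outside see the packet as coming from the gateway. Finally the packet is handed to the network device and sent onto the wire.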
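
One way to follow a forwarded packet along this path is to list each chain's packet counters in traversal order. The sketch below only prints the listing commands, reusing the `iptables -t ... -vnL` form seen earlier. `FORWARD_PATH` and `print_counter_cmds` are names invented for this example, and only the tables mentioned in the text are included.

```shell
#!/bin/sh
# Forwarded-packet path from the text: "table chain" pairs in traversal order.
FORWARD_PATH="mangle PREROUTING
nat PREROUTING
mangle FORWARD
filter FORWARD
mangle POSTROUTING
nat POSTROUTING"

# Print (not run) one iptables listing command per hop, in order.
print_counter_cmds() {
    echo "$FORWARD_PATH" | while read -r table chain; do
        echo "iptables -t $table -vnL $chain"
    done
}

print_counter_cmds
```

Keep in mind that nat-table rules are consulted only for the first packet of a connection, so their counters grow more slowly than those in the mangle and filter tables.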

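MASQUERADE is a special form of SNAT, suited to links such as ADSL where the address is assigned dynamically and may change: it uses whatever address the outgoing interface currently has.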

The iptables rule above can also be written as:

iptables -t nat -A POSTROUTING -s 1.1.1.0/24 -o eth0 -j MASQUERADE
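
The choice between the two targets can be captured in a small helper. This is a sketch under assumptions: `nat_rule` is a hypothetical function that only builds the rule arguments, and the final line prints, rather than runs, a command that adds the rule only when `iptables -C` reports that it is not already present.

```shell
#!/bin/sh
# nat_rule SUBNET OUT_IF [STATIC_IP]: print the POSTROUTING rule arguments.
# With a fixed address use SNAT; without one fall back to MASQUERADE.
nat_rule() {
    if [ -n "${3:-}" ]; then
        echo "-s $1 -o $2 -j SNAT --to-source $3"
    else
        echo "-s $1 -o $2 -j MASQUERADE"
    fi
}

# Dry run: show a command that appends the rule only if it is missing.
# iptables -C checks whether a matching rule already exists.
rule=$(nat_rule 1.1.1.0/24 eth0)
echo "iptables -t nat -C POSTROUTING $rule || iptables -t nat -A POSTROUTING $rule"
```

Calling `nat_rule 1.1.1.0/24 eth0 192.168.38.116` yields the SNAT form instead, which is the usual choice when the address is fixed, as eth0's is in this experiment.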

 
