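While running a high-concurrency performance test with 100+ virtual machines, some of the VMs turned out to be unreachable during the automation script run, even though OpenVPN was connected. The odd part was that the unreachable ones all sat in the same address range of one Network, and earlier test runs had worked fine, so perhaps some related operation done in between had caused it.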
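The scenario: on the node machine, call the API to create cloud hosts, start the OpenVPN process and connect to the private network. The result: the 10.177.80.xx range is reachable, but the 10.177.81.xx range is not.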
```
$ ping 10.177.81.124
PING 10.177.81.124 (10.177.81.124) 56(84) bytes of data.
From 10.177.81.66 icmp_seq=1 Destination Host Unreachable
From 10.177.81.66 icmp_seq=2 Destination Host Unreachable
From 10.177.81.66 icmp_seq=3 Destination Host Unreachable
^C
--- 10.177.81.124 ping statistics ---
5 packets transmitted, 0 received, +3 errors, 100% packet loss, time 4024ms
pipe 3
$ ping 10.177.80.29
PING 10.177.80.29 (10.177.80.29) 56(84) bytes of data.
64 bytes from 10.177.80.29: icmp_req=1 ttl=63 time=1.57 ms
64 bytes from 10.177.80.29: icmp_req=2 ttl=63 time=0.605 ms
64 bytes from 10.177.80.29: icmp_req=3 ttl=63 time=0.592 ms
64 bytes from 10.177.80.29: icmp_req=4 ttl=63 time=0.539 ms
^C
--- 10.177.80.29 ping statistics ---
```
First, check the network and its CIDR:
```
+--------------------------------------+------------------------------------------+-----------------------------------------------------+
| id                                   | name                                     | subnets                                             |
+--------------------------------------+------------------------------------------+-----------------------------------------------------+
| f00eff19-afd5-4be6-ae16-dcb51c0455f7 | private_10e5051a1cee4f8ebb0e8b5d877de581 | bc9fc957-2fdf-4663-9189-65fc9b0a559d 10.177.80.0/23 |
+--------------------------------------+------------------------------------------+-----------------------------------------------------+
```
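The Network's CIDR is 10.177.80.0/23, which covers both the 80 and the 81 ranges, so there should be no problem at all reaching either of them.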
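To make the CIDR arithmetic concrete: a /23 spans two consecutive /24 blocks. The snippet below is only an illustrative sketch using Python's standard `ipaddress` module. It checks that both test addresses from the ping output fall inside 10.177.80.0/23.

```python
import ipaddress

# Subnet CIDR as reported by `neutron net-list` above
subnet = ipaddress.ip_network("10.177.80.0/23")

# A /23 runs from 10.177.80.0 to 10.177.81.255 (512 addresses)
print(subnet.network_address, subnet.broadcast_address, subnet.num_addresses)

# Both the reachable and the unreachable host belong to the same subnet
for host in ("10.177.80.29", "10.177.81.124"):
    print(host, ipaddress.ip_address(host) in subnet)
```

Since both addresses are in the same L2 subnet behind the same Neutron router, the difference in reachability has to come from somewhere else.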
Query the subnet details:
```
$ neutron subnet-show bc9fc957-2fdf-4663-9189-65fc9b0a559d
+------------------+------------------------------------------------------------------------------+
| Field            | Value                                                                        |
+------------------+------------------------------------------------------------------------------+
| allocation_pools | {"start": "10.177.80.2", "end": "10.177.81.254"}                             |
| cidr             | 10.177.80.0/23                                                               |
| dns_nameservers  |                                                                              |
| enable_dhcp      | True                                                                         |
| enable_dns       | True                                                                         |
| gateway_ip       | 10.177.80.1                                                                  |
| host_routes      | {"destination": "10.177.8.0/22", "nexthop": "10.177.80.1", "order": 10}      |
|                  | {"destination": "10.177.82.0/23", "nexthop": "10.177.80.1", "order": 10}     |
|                  | {"destination": "169.254.169.254/32", "nexthop": "10.177.80.1", "order": 10} |
| id               | bc9fc957-2fdf-4663-9189-65fc9b0a559d                                         |
| ip_version       | 4                                                                            |
| name             | private_10e5051a1cee4f8ebb0e8b5d877de581                                     |
| network_id       | f00eff19-afd5-4be6-ae16-dcb51c0455f7                                         |
| tenant_id        | 10e5051a1cee4f8ebb0e8b5d877de581                                             |
+------------------+------------------------------------------------------------------------------+
```
Pinging the gateway 10.177.80.1 works, which makes this even stranger.
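Honestly, this is where I should have looked at the node machine's routing table. I didn't. I had a brain freeze and went to capture packets on the TAP device instead: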
```
$ sudo tcpdump -i tap850f5c20-3b icmp -en
tcpdump: WARNING: tap850f5c20-3b: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tap850f5c20-3b, link-type EN10MB (Ethernet), capture size 65535 bytes
```
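Still not a single packet. So the problem lies in the stage where the node machine, connected over OpenVPN, sends the packets out. But the gateway was reachable, so once again I didn't give it much thought.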
Connecting to the same VPN from my local machine, the 81 range turned out to be reachable:
```
lihui@MacBook ~ ping 10.177.81.124
PING 10.177.81.124 (10.177.81.124): 56 data bytes
64 bytes from 10.177.81.124: icmp_seq=0 ttl=63 time=1.701 ms
64 bytes from 10.177.81.124: icmp_seq=1 ttl=63 time=2.252 ms
64 bytes from 10.177.81.124: icmp_seq=2 ttl=63 time=1.430 ms
64 bytes from 10.177.81.124: icmp_seq=3 ttl=63 time=1.350 ms
64 bytes from 10.177.81.124: icmp_seq=4 ttl=63 time=1.778 ms
^C
--- 10.177.81.124 ping statistics ---
```
So I was stumped again and assumed something had gone wrong with my Router. I spent ages reading the Neutron logs and found nothing.
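Only then did it occur to me: the 81 range was unreachable only from the node machine, and the packets were lost right as they left it, so could it be a routing problem? I checked the node machine's routing table, and a thousand you-know-what horses instantly stampeded through my head: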
```
$ ip r
default via 115.236.124.1 dev eth2.101
10.160.252.0/22 via 10.177.0.223 dev eth0.100
10.177.0.0/22 dev eth0.100 proto kernel scope link src 10.177.0.39
10.177.4.0/22 dev eth0.101 proto kernel scope link src 10.177.4.39
10.177.8.0/22 dev eth0.102 proto kernel scope link src 10.177.8.39
10.177.12.0/22 dev eth0.103 proto kernel scope link src 10.177.12.39
10.177.80.0/23 via 10.177.82.1 dev tun0
10.177.81.0/24 dev tapc63e8890-67 proto kernel scope link src 10.177.81.66
10.177.81.0/24 dev tap1a7cb844-10 proto kernel scope link src 10.177.81.67
10.177.82.0/23 dev tun0 proto kernel scope link src 10.177.82.4
115.236.124.0/24 dev eth2.101 proto kernel scope link src 115.236.124.39
```
According to these rules, traffic to 10.177.81.xx leaves through the two tap devices. Check which route is actually chosen:
```
$ ip r get 10.177.81.124
10.177.81.124 dev tapc63e8890-67 src 10.177.81.66
    cache
$ ip r get 10.177.80.29
10.177.80.29 via 10.177.82.1 dev tun0 src 10.177.82.3
    cache
```
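This is why the 80 range is reachable while the 81 range is not. The two 10.177.81.0/24 routes are more specific than 10.177.80.0/23, so longest-prefix matching sends traffic for the 81 range to the local tap device instead of into the VPN tunnel.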
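The kernel's choice can be reproduced with a toy longest-prefix-match lookup. Below is a simplified sketch in Python, assuming only the prefixes and devices from the `ip r` output above. Metrics, route types and the kernel's real tie-breaking are not modelled: on equal prefix length this sketch simply keeps the first entry, which here happens to match the device `ip r get` reported.

```python
import ipaddress

# (prefix, outgoing device) pairs copied from the node's `ip r` output;
# "default" is written as 0.0.0.0/0
ROUTES = [
    ("0.0.0.0/0", "eth2.101"),
    ("10.160.252.0/22", "eth0.100"),
    ("10.177.0.0/22", "eth0.100"),
    ("10.177.4.0/22", "eth0.101"),
    ("10.177.8.0/22", "eth0.102"),
    ("10.177.12.0/22", "eth0.103"),
    ("10.177.80.0/23", "tun0"),
    ("10.177.81.0/24", "tapc63e8890-67"),
    ("10.177.81.0/24", "tap1a7cb844-10"),
    ("10.177.82.0/23", "tun0"),
    ("115.236.124.0/24", "eth2.101"),
]

def lookup(dst, routes=ROUTES):
    """Return the device of the longest matching prefix (first entry wins ties)."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, dev in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, dev)
    return best[1] if best else None

print(lookup("10.177.80.29"))   # only the /23 matches -> tun0
print(lookup("10.177.81.124"))  # the /24 beats the /23 -> tap device

# With the tap routes removed, the 81 range falls back to the tunnel
print(lookup("10.177.81.124", [r for r in ROUTES if not r[1].startswith("tap")]))
```

The point of the sketch is that nothing is "broken" in the routing table itself: a more specific route always wins, regardless of which interface it points to.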
The correct path is out through the VPN tunnel. Look at the two ends of the tun interface:
Client side:
```
$ ip a show tun0
2082: tun0: mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100
    link/none
    inet 10.177.82.3/23 brd 10.177.83.255 scope global tun0
       valid_lft forever preferred_lft forever
```
Receiving side:
```
$ sudo ip netns exec qrouter-7f95fc78-6964-49d5-8f60-79f795011c82 ip a show tun0
22: tun0: mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100
    link/none
    inet 10.177.82.1/23 scope global tun0
       valid_lft forever preferred_lft forever
```
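Why 10.177.82.1 works as the next hop: both tun0 addresses sit in the same /23, so the node's connected route `10.177.82.0/23 dev tun0` makes the router end directly reachable. A small illustrative check with Python's `ipaddress`, using only the two addresses shown above:

```python
import ipaddress

node_tun = ipaddress.ip_interface("10.177.82.3/23")    # tun0 on the node machine
router_tun = ipaddress.ip_interface("10.177.82.1/23")  # tun0 inside the qrouter namespace

# Both ends share the connected network 10.177.82.0/23, so the next hop in
# "10.177.80.0/23 via 10.177.82.1 dev tun0" is on-link over tun0
print(node_tun.network, router_tun.network)
print(router_tun.ip in node_tun.network)
```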
Those two tap devices had been attached to the node machine earlier while I was testing the network. Just delete them:
```
$ sudo ovs-vsctl del-port tapc63e8890-67
$ sudo ovs-vsctl del-port tap1a7cb844-10
$ ip r
default via 115.236.124.1 dev eth2.101
10.160.252.0/22 via 10.177.0.223 dev eth0.100
10.177.0.0/22 dev eth0.100 proto kernel scope link src 10.177.0.39
10.177.4.0/22 dev eth0.101 proto kernel scope link src 10.177.4.39
10.177.8.0/22 dev eth0.102 proto kernel scope link src 10.177.8.39
10.177.12.0/22 dev eth0.103 proto kernel scope link src 10.177.12.39
10.177.80.0/23 via 10.177.82.1 dev tun0
10.177.82.0/23 dev tun0 proto kernel scope link src 10.177.82.3
115.236.124.0/24 dev eth2.101 proto kernel scope link src 115.236.124.39
```
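Leftover test ports like these are easy to forget. One possible safeguard is to scan the node's routing table for routes on other devices that are more specific than a route through the tunnel. The sketch below is a hypothetical helper, not part of the original troubleshooting. It assumes the plain-text `ip route` format shown above and treats every non-tunnel route nested inside a tunnel route as suspicious.

```python
import ipaddress

def shadowing_routes(route_lines, vpn_dev="tun0"):
    """Hypothetical helper: return (prefix, dev, vpn_prefix) for routes on other
    devices that are more specific than a route pointing at vpn_dev."""
    parsed = []
    for line in route_lines:
        fields = line.split()
        if len(fields) < 3 or "dev" not in fields:
            continue
        try:
            # First field is the destination; "default" and route-type
            # keywords such as "blackhole" are not valid prefixes and are skipped
            net = ipaddress.ip_network(fields[0], strict=False)
        except ValueError:
            continue
        i = fields.index("dev")
        if i + 1 < len(fields):
            parsed.append((net, fields[i + 1]))

    vpn_nets = [net for net, dev in parsed if dev == vpn_dev]
    hits = []
    for net, dev in parsed:
        if dev == vpn_dev:
            continue
        for vpn_net in vpn_nets:
            if (net.version == vpn_net.version
                    and net.prefixlen > vpn_net.prefixlen
                    and net.subnet_of(vpn_net)):
                hits.append((str(net), dev, str(vpn_net)))
    return hits

# Excerpt of the routing table before the tap ports were deleted
BEFORE = """\
default via 115.236.124.1 dev eth2.101
10.177.80.0/23 via 10.177.82.1 dev tun0
10.177.81.0/24 dev tapc63e8890-67 proto kernel scope link src 10.177.81.66
10.177.81.0/24 dev tap1a7cb844-10 proto kernel scope link src 10.177.81.67
10.177.82.0/23 dev tun0 proto kernel scope link src 10.177.82.4
"""

for hit in shadowing_routes(BEFORE.splitlines()):
    print(hit)  # one tuple per tap route shadowing 10.177.80.0/23
```

On a live node the input could come from `subprocess.run(["ip", "route"], capture_output=True, text=True).stdout.splitlines()`. After the two `del-port` commands, the helper should return an empty list for the table above.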
Verify the 81 range again. Connectivity is fine now:
```
$ ip r get 10.177.81.124
10.177.81.124 via 10.177.82.1 dev tun0 src 10.177.82.3
    cache
$ ping 10.177.81.124
PING 10.177.81.124 (10.177.81.124) 56(84) bytes of data.
64 bytes from 10.177.81.124: icmp_req=1 ttl=63 time=1.35 ms
64 bytes from 10.177.81.124: icmp_req=2 ttl=63 time=0.520 ms
64 bytes from 10.177.81.124: icmp_req=3 ttl=63 time=0.498 ms
64 bytes from 10.177.81.124: icmp_req=4 ttl=63 time=0.547 ms
```
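Since the test itself is driven by automation scripts over 100+ VMs, the same `ip r get` check could be run before the test starts. Below is a minimal sketch, assuming Python 3.7+ and iproute2 on the node. The function names `route_device` and `uses_tunnel` are hypothetical. It only parses the `dev` field of `ip route get` output and compares it with the tunnel interface.

```python
import subprocess

def route_device(route_get_output):
    """Return the outgoing device named in `ip route get` output, or None."""
    fields = route_get_output.split()
    if "dev" in fields:
        i = fields.index("dev")
        if i + 1 < len(fields):
            return fields[i + 1]
    return None

def uses_tunnel(addr, tunnel_dev="tun0"):
    """Ask the kernel which route it would pick for addr (requires iproute2)."""
    result = subprocess.run(["ip", "route", "get", addr],
                            capture_output=True, text=True, check=True)
    return route_device(result.stdout) == tunnel_dev
```

For example, `[a for a in vm_addresses if not uses_tunnel(a)]` (with `vm_addresses` being whatever list of VM IPs the test script already has) would have flagged every 10.177.81.xx VM before the tap ports were removed.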
OK, problem found. The root cause: I was being dumb.