Analyzing a Connectivity Anomaly Between Segments of a Private Network over OpenVPN

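While running a high-concurrency performance test with 100+ virtual machines, the automation scripts found that some VMs were unreachable even though the OpenVPN connection was up. The strange part was that the reachable and unreachable VMs were all on the same network, and earlier test runs had been fine, so some operation in between had probably caused this.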

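Scenario: on the node machine, call the API to create cloud hosts, start the OpenVPN process, and then connect to the private network. The result was that the 10.177.80.xx segment was reachable, but the 10.177.81.xx segment was not.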

$ ping 10.177.81.124
PING 10.177.81.124 (10.177.81.124) 56(84) bytes of data.
From 10.177.81.66 icmp_seq=1 Destination Host Unreachable
From 10.177.81.66 icmp_seq=2 Destination Host Unreachable
From 10.177.81.66 icmp_seq=3 Destination Host Unreachable
^C
--- 10.177.81.124 ping statistics ---
5 packets transmitted, 0 received, +3 errors, 100% packet loss, time 4024ms
pipe 3
$ ping 10.177.80.29
PING 10.177.80.29 (10.177.80.29) 56(84) bytes of data.
64 bytes from 10.177.80.29: icmp_req=1 ttl=63 time=1.57 ms
64 bytes from 10.177.80.29: icmp_req=2 ttl=63 time=0.605 ms
64 bytes from 10.177.80.29: icmp_req=3 ttl=63 time=0.592 ms
64 bytes from 10.177.80.29: icmp_req=4 ttl=63 time=0.539 ms
^C
--- 10.177.80.29 ping statistics ---

First, check the network and its CIDR

+--------------------------------------+------------------------------------------+-----------------------------------------------------+
| id                                   | name                                     | subnets                                             |
+--------------------------------------+------------------------------------------+-----------------------------------------------------+
| f00eff19-afd5-4be6-ae16-dcb51c0455f7 | private_10e5051a1cee4f8ebb0e8b5d877de581 | bc9fc957-2fdf-4663-9189-65fc9b0a559d 10.177.80.0/23 |
+--------------------------------------+------------------------------------------+-----------------------------------------------------+

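According to the network's CIDR, 10.177.80.0/23 covers both 10.177.80.xx and 10.177.81.xx, so the two segments should be able to reach each other without any problem.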
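The claim that one /23 covers both segments can be checked with a few lines of Python. This is only a sketch using the standard `ipaddress` module; the helper name `in_subnet` is made up for illustration, and the addresses are the ones from the ping tests and the subnet listing above.

```python
import ipaddress

# CIDR taken from the network listing above.
subnet = ipaddress.ip_network("10.177.80.0/23")

# The two hosts pinged earlier, one per "segment".
hosts = ["10.177.80.29", "10.177.81.124"]


def in_subnet(host: str, net: ipaddress.IPv4Network) -> bool:
    """Return True if the host address falls inside the network."""
    return ipaddress.ip_address(host) in net


for host in hosts:
    print(host, in_subnet(host, subnet))

# A /23 spans two consecutive /24 blocks, here the 80 and 81 segments.
print([str(n) for n in subnet.subnets(new_prefix=24)])
```

Both hosts fall inside the subnet, so from Neutron's point of view there is only one segment here; any difference in behavior has to come from somewhere else.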

Query the subnet details

$ neutron subnet-show bc9fc957-2fdf-4663-9189-65fc9b0a559d
+------------------+------------------------------------------------------------------------------+
| Field            | Value                                                                        |
+------------------+------------------------------------------------------------------------------+
| allocation_pools | {"start": "10.177.80.2", "end": "10.177.81.254"}                             |
| cidr             | 10.177.80.0/23                                                               |
| dns_nameservers  |                                                                              |
| enable_dhcp      | True                                                                         |
| enable_dns       | True                                                                         |
| gateway_ip       | 10.177.80.1                                                                  |
| host_routes      | {"destination": "10.177.8.0/22", "nexthop": "10.177.80.1", "order": 10}      |
|                  | {"destination": "10.177.82.0/23", "nexthop": "10.177.80.1", "order": 10}     |
|                  | {"destination": "169.254.169.254/32", "nexthop": "10.177.80.1", "order": 10} |
| id               | bc9fc957-2fdf-4663-9189-65fc9b0a559d                                         |
| ip_version       | 4                                                                            |
| name             | private_10e5051a1cee4f8ebb0e8b5d877de581                                     |
| network_id       | f00eff19-afd5-4be6-ae16-dcb51c0455f7                                         |
| tenant_id        | 10e5051a1cee4f8ebb0e8b5d877de581                                             |
+------------------+------------------------------------------------------------------------------+

Pinging the gateway 10.177.80.1 worked, which made this even stranger.

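At this point I really should have looked at the node machine's routes, but I didn't. In a moment of stupidity I went to capture packets on the TAP device instead.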

$ sudo tcpdump -i tap850f5c20-3b icmp -en
tcpdump: WARNING: tap850f5c20-3b: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tap850f5c20-3b, link-type EN10MB (Ethernet), capture size 65535 bytes

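Still no packets. So the problem was in the stage where the node machine, connected through OpenVPN, sends the packets out. But since the gateway was reachable, once again I didn't give it much thought.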

Connecting to the same VPN from my local machine, I found that the 81 segment was reachable.

lihui@MacBook ~ $ ping 10.177.81.124
PING 10.177.81.124 (10.177.81.124): 56 data bytes
64 bytes from 10.177.81.124: icmp_seq=0 ttl=63 time=1.701 ms
64 bytes from 10.177.81.124: icmp_seq=1 ttl=63 time=2.252 ms
64 bytes from 10.177.81.124: icmp_seq=2 ttl=63 time=1.430 ms
64 bytes from 10.177.81.124: icmp_seq=3 ttl=63 time=1.350 ms
64 bytes from 10.177.81.124: icmp_seq=4 ttl=63 time=1.778 ms
^C
--- 10.177.81.124 ping statistics ---

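So I fooled myself again: I assumed something was wrong with my Router and spent a long time reading the Neutron logs without finding anything.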

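Finally it occurred to me: since the 81 segment was unreachable only from the node machine, and the problem appeared right as packets left the node machine, could the routing be at fault? So I checked the node machine's routing rules, and was instantly at a loss for polite words.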

$ ip r
default via 115.236.124.1 dev eth2.101
10.160.252.0/22 via 10.177.0.223 dev eth0.100
10.177.0.0/22 dev eth0.100  proto kernel  scope link  src 10.177.0.39
10.177.4.0/22 dev eth0.101  proto kernel  scope link  src 10.177.4.39
10.177.8.0/22 dev eth0.102  proto kernel  scope link  src 10.177.8.39
10.177.12.0/22 dev eth0.103  proto kernel  scope link  src 10.177.12.39
10.177.80.0/23 via 10.177.82.1 dev tun0
10.177.81.0/24 dev tapc63e8890-67  proto kernel  scope link  src 10.177.81.66
10.177.81.0/24 dev tap1a7cb844-10  proto kernel  scope link  src 10.177.81.67
10.177.82.0/23 dev tun0  proto kernel  scope link  src 10.177.82.4
115.236.124.0/24 dev eth2.101  proto kernel  scope link  src 115.236.124.39

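According to the routing table, traffic to 10.177.81.xx leaves through the two tap devices: their 10.177.81.0/24 routes are more specific than the 10.177.80.0/23 route via tun0, so they win. Check the actual path: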

$ ip r get 10.177.81.124
10.177.81.124 dev tapc63e8890-67  src 10.177.81.66
    cache

$ ip r get 10.177.80.29
10.177.80.29 via 10.177.82.1 dev tun0  src 10.177.82.3
    cache

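This is why the 80 segment was reachable and the 81 segment was not. It also explains why the Destination Host Unreachable replies in the first ping came from 10.177.81.66, the node's own address on tapc63e8890-67.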
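The route selection seen here is longest-prefix matching. The following is a minimal sketch of it in Python, not the kernel's actual algorithm: the route list is copied from the `ip r` output above (gateways and source addresses omitted), the function name `lookup` is hypothetical, and ties between equal-length prefixes are simply resolved in listing order, which is a simplification.

```python
import ipaddress

# (destination, device) pairs copied from the `ip r` output above.
ROUTES = [
    ("0.0.0.0/0", "eth2.101"),  # the "default" route
    ("10.160.252.0/22", "eth0.100"),
    ("10.177.0.0/22", "eth0.100"),
    ("10.177.4.0/22", "eth0.101"),
    ("10.177.8.0/22", "eth0.102"),
    ("10.177.12.0/22", "eth0.103"),
    ("10.177.80.0/23", "tun0"),
    ("10.177.81.0/24", "tapc63e8890-67"),
    ("10.177.81.0/24", "tap1a7cb844-10"),
    ("10.177.82.0/23", "tun0"),
    ("115.236.124.0/24", "eth2.101"),
]


def lookup(dst, routes):
    """Return (network, device) of the most specific route matching dst."""
    addr = ipaddress.ip_address(dst)
    matches = []
    for cidr, dev in routes:
        net = ipaddress.ip_network(cidr)
        if addr in net:
            matches.append((net, dev))
    # max() keeps the first of several equally long prefixes (simplification).
    return max(matches, key=lambda m: m[0].prefixlen)


print(lookup("10.177.81.124", ROUTES))  # matched by the /24 on a tap device
print(lookup("10.177.80.29", ROUTES))   # only the /23 via tun0 covers it

# With the tap routes gone, the 81 segment falls back to the /23 via tun0.
cleaned = [r for r in ROUTES if not r[1].startswith("tap")]
print(lookup("10.177.81.124", cleaned))
```

The /24 entries never had to be "wrong" on their own; they only had to be more specific than the VPN route to pull the 81 segment away from the tunnel.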

Going out through the VPN tunnel is the correct path. The two ends of the tun device show this.

Requesting end

$ ip a show tun0
2082: tun0:  mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100
    link/none
    inet 10.177.82.3/23 brd 10.177.83.255 scope global tun0
       valid_lft forever preferred_lft forever

Receiving end

$ sudo ip netns exec qrouter-7f95fc78-6964-49d5-8f60-79f795011c82 ip a show tun0
22: tun0:  mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100
    link/none
    inet 10.177.82.1/23 scope global tun0
       valid_lft forever preferred_lft forever

Those two tap devices had been attached to the node machine during earlier network testing. Deleting them is enough.

$ sudo ovs-vsctl del-port tapc63e8890-67
$ sudo ovs-vsctl del-port tap1a7cb844-10
$ ip r
default via 115.236.124.1 dev eth2.101
10.160.252.0/22 via 10.177.0.223 dev eth0.100
10.177.0.0/22 dev eth0.100  proto kernel  scope link  src 10.177.0.39
10.177.4.0/22 dev eth0.101  proto kernel  scope link  src 10.177.4.39
10.177.8.0/22 dev eth0.102  proto kernel  scope link  src 10.177.8.39
10.177.12.0/22 dev eth0.103  proto kernel  scope link  src 10.177.12.39
10.177.80.0/23 via 10.177.82.1 dev tun0
10.177.82.0/23 dev tun0  proto kernel  scope link  src 10.177.82.3
115.236.124.0/24 dev eth2.101  proto kernel  scope link  src 115.236.124.39
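A check along the following lines could flag this kind of shadowing before a test run. It is a rough sketch, assuming the routing table is available as the plain text printed by `ip r`; the function names `parse_routes` and `shadowing_routes` are hypothetical, and the sample text is a subset of the table from before the fix.

```python
import ipaddress

# A subset of the `ip r` output captured before the tap devices were removed.
SAMPLE = """\
default via 115.236.124.1 dev eth2.101
10.177.80.0/23 via 10.177.82.1 dev tun0
10.177.81.0/24 dev tapc63e8890-67  proto kernel  scope link  src 10.177.81.66
10.177.81.0/24 dev tap1a7cb844-10  proto kernel  scope link  src 10.177.81.67
10.177.82.0/23 dev tun0  proto kernel  scope link  src 10.177.82.4
"""


def parse_routes(text):
    """Parse `ip r` style lines into (network, device) pairs."""
    routes = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or "dev" not in fields:
            continue
        dest = "0.0.0.0/0" if fields[0] == "default" else fields[0]
        dev = fields[fields.index("dev") + 1]
        routes.append((ipaddress.ip_network(dest), dev))
    return routes


def shadowing_routes(routes, tunnel_dev):
    """List more specific routes on other devices inside a tunnel route."""
    found = []
    for tunnel_net, dev in routes:
        if dev != tunnel_dev:
            continue
        for net, other_dev in routes:
            if (other_dev != tunnel_dev
                    and net.prefixlen > tunnel_net.prefixlen
                    and net.subnet_of(tunnel_net)):
                found.append((str(net), other_dev, str(tunnel_net)))
    return found


for net, dev, tunnel_net in shadowing_routes(parse_routes(SAMPLE), "tun0"):
    print(f"{net} on {dev} shadows part of {tunnel_net} via tun0")
```

On the table from before the fix this reports both tap routes; on the cleaned table it reports nothing.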

Verify the 81 segment again: connectivity is fine.

$ ip r get 10.177.81.124
10.177.81.124 via 10.177.82.1 dev tun0  src 10.177.82.3
    cache
$ ping 10.177.81.124
PING 10.177.81.124 (10.177.81.124) 56(84) bytes of data.
64 bytes from 10.177.81.124: icmp_req=1 ttl=63 time=1.35 ms
64 bytes from 10.177.81.124: icmp_req=2 ttl=63 time=0.520 ms
64 bytes from 10.177.81.124: icmp_req=3 ttl=63 time=0.498 ms
64 bytes from 10.177.81.124: icmp_req=4 ttl=63 time=0.547 ms

OK, the problem is understood: the cause was my own carelessness.
