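Today the private network of an LXC container could not reach the external network. I first took a quick look at the L3 agent binding, which was fine, and the private networks of other KVM virtual machines under the same tenant all worked, which made me more confident that the problem lay with the LXC itself.
In fact, the port's State was BUILD from the very beginning, but I did not pay attention to it.
Capturing inside the router namespace while pinging the gateway from the LXC showed that no packets arrived at all.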
~$ sudo ip netns exec qrouter-8e8268bf-0202-4401-8a20-70d118791451 ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
48: tun0: mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100
    link/none
    inet 10.180.66.1/23 scope global tun0
       valid_lft forever preferred_lft forever
44450: ha-fdba3c47-12: mtu 1400 qdisc htb state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:8d:06:9b brd ff:ff:ff:ff:ff:ff
    inet 10.180.64.10/23 brd 10.180.65.255 scope global ha-fdba3c47-12
       valid_lft forever preferred_lft forever
    inet 10.180.64.1/23 scope global secondary ha-fdba3c47-12
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe8d:69b/64 scope link
       valid_lft forever preferred_lft forever
44451: qg-8e8268bf-02: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 00:16:3e:fe:08:5f brd ff:ff:ff:ff:ff:ff
    inet 169.254.8.95/18 brd 169.254.63.255 scope global qg-8e8268bf-02
       valid_lft forever preferred_lft forever
    inet6 fe80::216:3eff:fefe:85f/64 scope link
       valid_lft forever preferred_lft forever
~$ sudo ip netns exec qrouter-8e8268bf-0202-4401-8a20-70d118791451 tcpdump -i ha-fdba3c47-12 icmp -en
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ha-fdba3c47-12, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
Pinging the gateway from inside the namespace works fine.
~$ sudo ip netns exec qrouter-8e8268bf-0202-4401-8a20-70d118791451 ping 10.180.64.1
PING 10.180.64.1 (10.180.64.1) 56(84) bytes of data.
64 bytes from 10.180.64.1: icmp_req=1 ttl=64 time=0.058 ms
^C
--- 10.180.64.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.058/0.058/0.058/0.000 ms
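This made it even more certain that the problem was on the path from inside the LXC to the gateway. Capturing on the TAP device also showed no packets at all.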
~$ sudo tcpdump -i tap0aaa1d9c-99 icmp -en
^C
Next, check the state of this port on OVS.
sudo ovs-vsctl show | less
Surprisingly, the tap device in question had no tag at all.
Port "tap0aaa1d9c-99"
    Interface "tap0aaa1d9c-99"
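Scanning `ovs-vsctl show` by eye through `less` is easy to get wrong on a busy node. The following is a minimal sketch, not part of the original troubleshooting, that reads `ovs-vsctl show` output on stdin and prints every port that has no `tag:` line. The function name `untagged_ports` is hypothetical. Note that an untagged port is not always a fault: bridge-internal ports, patch ports and trunk ports normally carry no tag, so the result is only a list of candidates to inspect.

```shell
#!/bin/sh
# untagged_ports: read `ovs-vsctl show` output on stdin and print the
# name of every Port block that contains no "tag:" line.
# (Hypothetical helper name; it only parses text and never calls OVS.)
untagged_ports() {
    awk '
        # Print the port collected so far if no tag line was seen for it
        function emit() { if (port != "" && !tagged) print port }
        $1 == "Bridge" { emit(); port = "" }
        $1 == "Port"   { emit(); port = $2; gsub(/"/, "", port); tagged = 0; next }
        $1 == "tag:"   { tagged = 1 }
        END            { emit() }
    '
}
```

Typical use on the compute node would be `sudo ovs-vsctl show | untagged_ports`, then looking for tap devices in the result.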
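At this point the only thing left was to look at the Neutron logs. The neutron-openvswitch-agent log on the node showed an RPC timeout.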
~$ grep 0aaa1d9c-99da-4c72-88d0-c81c85a9af30 /data//log/neutron/neutron-openvswitch-agent.log --color
(output omitted) ……
2016-09-18 14:57:53.674 32685 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent DeviceListRetrievalError: Unable to retrieve port details for devices: [u'169c8bbe-fc88-49a2-82d6-111ef018daf4', u'3c1e1c4a-3889-49be-8a62-8acfcb6e2f01', u'0aaa1d9c-99da-4c72-88d0-c81c85a9af30', u'c214fc37-3b5b-4ac4-8eba-782d32d446e2', u'e2272d3f-d95f-422e-b9c2-d0f37229f3cd'] because of error: Timeout while waiting on RPC response - topic: "q-plugin", RPC method: "get_devices_details_list" info: "unknown"
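In other words, the RPC call timed out, so the agent never got the port details and the port configuration (including the VLAN tag) was never applied.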
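To check whether a given port was caught up in such a timeout, the log can be filtered on both the port ID and the timeout message. This is a small sketch under the assumption that the message text matches the one quoted above; the function name `rpc_timeouts_for_port` is hypothetical, and the log path differs between deployments.

```shell
#!/bin/sh
# rpc_timeouts_for_port PORT_ID: read an agent log on stdin and print only
# the lines that mention both PORT_ID and an RPC timeout.
# (Hypothetical helper; the message text is taken from the log line above.)
rpc_timeouts_for_port() {
    port_id="$1"
    # -F treats both patterns as fixed strings, not regular expressions
    grep -F -- "$port_id" | grep -F 'Timeout while waiting on RPC response'
}
```

For example: `rpc_timeouts_for_port 0aaa1d9c-99da-4c72-88d0-c81c85a9af30 < /data//log/neutron/neutron-openvswitch-agent.log`. The function returns a non-zero status when nothing matches.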
The fix is to restart the Neutron OVS agent on that node, so that it fetches the port details again.
sudo service neutron-plugin-openvswitch-agent restart
Once the agent has resynced, query the port again: the tag is now there.
~$ sudo ovs-vsctl list port tap0aaa1d9c-99
_uuid               : 16ad8807-c9c0-4718-8bc5-aacc2b954c9b
bond_downdelay      : 0
bond_fake_iface     : false
bond_mode           : []
bond_updelay        : 0
external_ids        : {}
fake_bridge         : false
interfaces          : [d80d3d57-6cfc-46e9-9b57-5fe66648ee7b]
lacp                : []
mac                 : []
name                : "tap0aaa1d9c-99"
other_config        : {}
qos                 : []
statistics          : {}
status              : {}
tag                 : 1
trunks              : []
vlan_mode           : []
The output of `ovs-vsctl show` is correct now as well.
Port "tap0aaa1d9c-99"
    tag: 1
    Interface "tap0aaa1d9c-99"
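After an agent restart it can take a little while before every port is processed again, so instead of re-running the query by hand one could poll for the tag. The sketch below is an assumption-laden illustration: `wait_for_tag` and the `OVS_VSCTL` override variable are hypothetical names, while `ovs-vsctl get Port <name> tag` is the standard way to read a single column and prints `[]` when the tag is unset.

```shell
#!/bin/sh
# wait_for_tag PORT [ATTEMPTS] [DELAY_SECONDS]
# Poll until PORT has a VLAN tag, print the tag and return 0;
# return 1 if it is still untagged after ATTEMPTS tries.
# OVS_VSCTL may be overridden, e.g. OVS_VSCTL="sudo ovs-vsctl".
wait_for_tag() {
    port="$1"
    attempts="${2:-10}"
    delay="${3:-2}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # An unset tag is reported as the empty set "[]"
        tag="$(${OVS_VSCTL:-ovs-vsctl} get Port "$port" tag 2>/dev/null)"
        if [ -n "$tag" ] && [ "$tag" != "[]" ]; then
            echo "$tag"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For the port in this post that would be `OVS_VSCTL="sudo ovs-vsctl" wait_for_tag tap0aaa1d9c-99`. If it still returns 1 after a reasonable wait, the agent log is the next place to look.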
With that, the private network can reach the outside again.
/# ping www.baidu.com
PING www.a.shifen.com (115.239.210.27) 56(84) bytes of data.
64 bytes from 115.239.210.27: icmp_req=1 ttl=56 time=1.64 ms
64 bytes from 115.239.210.27: icmp_req=2 ttl=56 time=1.37 ms
64 bytes from 115.239.210.27: icmp_req=3 ttl=56 time=1.31 ms
64 bytes from 115.239.210.27: icmp_req=4 ttl=56 time=1.33 ms
64 bytes from 115.239.210.27: icmp_req=5 ttl=56 time=1.37 ms
64 bytes from 115.239.210.27: icmp_req=6 ttl=56 time=1.52 ms
As a side note, this is how to get a shell inside the LXC:
sudo virsh -c lxc:/// lxc-enter-namespace 2902f3c8-4d13-490d-97c4-9121cfb28008 --noseclabel /bin/bash