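Today an LXC container's private network could not reach the external network. I first took a quick look at the L3 agent binding, which was fine, and the private networks of the tenant's other KVM VMs were all working, which made it even more likely that the problem lay with the LXC container itself.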
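In fact, the port's status had been BUILD from the start (a correctly wired port normally goes to ACTIVE), but I didn't pay attention to it at the time.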
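Capturing inside the router namespace while pinging the gateway from inside the LXC container showed that not a single packet arrived: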
~$ sudo ip netns exec qrouter-8e8268bf-0202-4401-8a20-70d118791451 ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
48: tun0: mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100
link/none
inet 10.180.66.1/23 scope global tun0
valid_lft forever preferred_lft forever
44450: ha-fdba3c47-12: mtu 1400 qdisc htb state UNKNOWN group default qlen 1000
link/ether fa:16:3e:8d:06:9b brd ff:ff:ff:ff:ff:ff
inet 10.180.64.10/23 brd 10.180.65.255 scope global ha-fdba3c47-12
valid_lft forever preferred_lft forever
inet 10.180.64.1/23 scope global secondary ha-fdba3c47-12
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:fe8d:69b/64 scope link
valid_lft forever preferred_lft forever
44451: qg-8e8268bf-02: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 00:16:3e:fe:08:5f brd ff:ff:ff:ff:ff:ff
inet 169.254.8.95/18 brd 169.254.63.255 scope global qg-8e8268bf-02
valid_lft forever preferred_lft forever
inet6 fe80::216:3eff:fefe:85f/64 scope link
valid_lft forever preferred_lft forever
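Before capturing, it helps to know which interface in the router namespace actually carries the gateway address; above, it is the secondary address on ha-fdba3c47-12. The helper below is a sketch (the function name iface_for_ip is hypothetical) that picks that interface out of ip -o -4 addr show output, assuming its one-line-per-address layout with the interface name in field 2 and address/prefix in field 4.

```shell
# Print the interface(s) carrying a given IPv4 address.
# Reads `ip -o -4 addr show` output on stdin (one address per line):
#   <index>: <ifname>    inet <addr>/<prefix> ...
iface_for_ip() {
  awk -v ip="$1" '$3 == "inet" { split($4, a, "/"); if (a[1] == ip) print $2 }'
}

# Usage on the network node (namespace name taken from the output above):
#   sudo ip netns exec qrouter-8e8268bf-0202-4401-8a20-70d118791451 \
#     ip -o -4 addr show | iface_for_ip 10.180.64.1
```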
~$ sudo ip netns exec qrouter-8e8268bf-0202-4401-8a20-70d118791451 tcpdump -i ha-fdba3c47-12 icmp -en
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ha-fdba3c47-12, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
Pinging the gateway from inside the namespace, on the other hand, worked:
~$ sudo ip netns exec qrouter-8e8268bf-0202-4401-8a20-70d118791451 ping 10.180.64.1
PING 10.180.64.1 (10.180.64.1) 56(84) bytes of data.
64 bytes from 10.180.64.1: icmp_req=1 ttl=64 time=0.058 ms
^C
--- 10.180.64.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.058/0.058/0.058/0.000 ms
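To turn a ping like this into a scriptable pass/fail check, one option is to extract the loss percentage from the summary line. This is a minimal sketch assuming the iputils summary format shown above; ping_loss is a hypothetical name.

```shell
# Print the packet-loss percentage found in ping's summary line (stdin), e.g.
#   "1 packets transmitted, 1 received, 0% packet loss, time 0ms"  ->  0
ping_loss() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Usage:
#   loss=$(sudo ip netns exec qrouter-8e8268bf-0202-4401-8a20-70d118791451 \
#            ping -c 3 10.180.64.1 | ping_loss)
#   [ "$loss" = 0 ] && echo "gateway reachable from the namespace"
```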
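This made it even clearer that the problem was on the path from the LXC container to the gateway. Capturing on the TAP device showed no packets either: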
~$ sudo tcpdump -i tap0aaa1d9c-99 icmp -en
^C
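The TAP device name follows from the Neutron port UUID: it is tap plus the first 11 characters of the UUID (port 0aaa1d9c-99da-4c72-88d0-c81c85a9af30 here becomes tap0aaa1d9c-99), which keeps the name under the kernel's interface-name length limit. A sketch for deriving it; tap_name is a hypothetical helper.

```shell
# Derive the OVS-side tap device name from a Neutron port UUID:
# "tap" + first 11 characters of the UUID (14 characters in total).
tap_name() {
  printf 'tap%s\n' "$(printf '%s' "$1" | cut -c1-11)"
}

# Usage:
#   sudo tcpdump -i "$(tap_name 0aaa1d9c-99da-4c72-88d0-c81c85a9af30)" icmp -en
```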
Next, check the state of this port on OVS:
sudo ovs-vsctl show | less
Surprisingly, the tap device in question had no tag, i.e. the OVS agent had never assigned it a local VLAN:
Port "tap0aaa1d9c-99"
Interface "tap0aaa1d9c-99"
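Spotting a missing tag by eye works for one port; on a busy compute node a small filter helps. This sketch (untagged_taps is a hypothetical name) assumes the layout seen above, where each Port line is followed by an optional tag: line and its Interface lines, and prints every tap port without a tag. Quotes are stripped in case names are printed without them.

```shell
# Read `ovs-vsctl show` output on stdin and print every "tap..." port
# that has no "tag:" line under it.
untagged_taps() {
  awk '
    function flush() {
      if (port ~ /^tap/ && !tagged) print port
      port = ""; tagged = 0
    }
    $1 == "Port"   { flush(); port = $2; gsub(/"/, "", port) }
    $1 == "tag:"   { tagged = 1 }
    $1 == "Bridge" { flush() }
    END            { flush() }
  '
}

# Usage:
#   sudo ovs-vsctl show | untagged_taps
```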
At this point the only option left was to check the logs. The neutron-openvswitch-agent log on this node showed an RPC timeout:
~$ grep 0aaa1d9c-99da-4c72-88d0-c81c85a9af30 /data//log/neutron/neutron-openvswitch-agent.log --color
... (output omitted) ...
2016-09-18 14:57:53.674 32685 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent DeviceListRetrievalError: Unable to retrieve port details for devices: [u'169c8bbe-fc88-49a2-82d6-111ef018daf4', u'3c1e1c4a-3889-49be-8a62-8acfcb6e2f01', u'0aaa1d9c-99da-4c72-88d0-c81c85a9af30', u'c214fc37-3b5b-4ac4-8eba-782d32d446e2', u'e2272d3f-d95f-422e-b9c2-d0f37229f3cd'] because of error: Timeout while waiting on RPC response - topic: "q-plugin", RPC method: "get_devices_details_list" info: "unknown"
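In other words, the agent's get_devices_details_list call to neutron-server (topic q-plugin) timed out, so the port's configuration, including its VLAN tag, was never applied.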
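The error names several devices, not just the one under investigation, so other ports on the same node were probably hit by the same timeout. A sketch for listing every port UUID mentioned in such errors, assuming the u'&lt;uuid&gt;' list format of the log line above; failed_devices is a hypothetical name.

```shell
# Print the de-duplicated port UUIDs named in DeviceListRetrievalError
# lines of an OVS agent log read on stdin.
failed_devices() {
  grep 'DeviceListRetrievalError' \
    | grep -o "u'[0-9a-f-]\{36\}'" \
    | sed -e "s/^u'//" -e "s/'\$//" \
    | sort -u
}

# Usage:
#   failed_devices < /data//log/neutron/neutron-openvswitch-agent.log
```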
The fix was to restart the Neutron OVS agent on this node:
sudo service neutron-plugin-openvswitch-agent restart
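The agent needs some time to resync after the restart, so instead of re-running ovs-vsctl by hand one could poll for the tag. A minimal sketch, assuming ovs-vsctl get Port &lt;name&gt; tag prints [] while the column is empty (as the empty columns in the list output below do); wait_for_tag and the OVS_VSCTL override are hypothetical, the latter only so the function can be tested without a real switch.

```shell
# Poll until OVS reports a VLAN tag on the port; print it and return 0,
# or return 1 after the given number of one-second tries (default 30).
# OVS_VSCTL (hypothetical) overrides the command used; default: sudo ovs-vsctl
wait_for_tag() {
  port=$1; tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    tag=$(${OVS_VSCTL:-sudo ovs-vsctl} get Port "$port" tag 2>/dev/null)
    if [ -n "$tag" ] && [ "$tag" != "[]" ]; then
      echo "$tag"
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# Usage:
#   wait_for_tag tap0aaa1d9c-99 60 || echo "still no tag, check the agent log"
```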
Once the agent had resynced, querying the port again showed the tag:
~$ sudo ovs-vsctl list port tap0aaa1d9c-99
_uuid : 16ad8807-c9c0-4718-8bc5-aacc2b954c9b
bond_downdelay : 0
bond_fake_iface : false
bond_mode : []
bond_updelay : 0
external_ids : {}
fake_bridge : false
interfaces : [d80d3d57-6cfc-46e9-9b57-5fe66648ee7b]
lacp : []
mac : []
name : "tap0aaa1d9c-99"
other_config : {}
qos : []
statistics : {}
status : {}
tag : 1
trunks : []
vlan_mode : []
The ovs-vsctl show output was now correct as well:
Port "tap0aaa1d9c-99"
tag: 1
Interface "tap0aaa1d9c-99"
After this, the private network worked:
/# ping www.baidu.com
PING www.a.shifen.com (115.239.210.27) 56(84) bytes of data.
64 bytes from 115.239.210.27: icmp_req=1 ttl=56 time=1.64 ms
64 bytes from 115.239.210.27: icmp_req=2 ttl=56 time=1.37 ms
64 bytes from 115.239.210.27: icmp_req=3 ttl=56 time=1.31 ms
64 bytes from 115.239.210.27: icmp_req=4 ttl=56 time=1.33 ms
64 bytes from 115.239.210.27: icmp_req=5 ttl=56 time=1.37 ms
64 bytes from 115.239.210.27: icmp_req=6 ttl=56 time=1.52 ms
For reference, this is how to enter the LXC container:
sudo virsh -c lxc:/// lxc-enter-namespace 2902f3c8-4d13-490d-97c4-9121cfb28008 --noseclabel /bin/bash
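If you enter containers often, the command can be wrapped in a small function. This is only a sketch: lxc_enter and the VIRSH override are hypothetical. It inserts -- before the command, which the virsh manual documents for lxc-enter-namespace, so that command arguments starting with a dash are not parsed as virsh options; the command is given as an absolute path, as in the original.

```shell
# Run a command (default /bin/bash) inside a libvirt LXC domain.
# VIRSH (hypothetical) overrides the command used; default: sudo virsh
lxc_enter() {
  dom=$1; shift
  [ "$#" -gt 0 ] || set -- /bin/bash
  ${VIRSH:-sudo virsh} -c lxc:/// lxc-enter-namespace "$dom" --noseclabel -- "$@"
}

# Usage:
#   lxc_enter 2902f3c8-4d13-490d-97c4-9121cfb28008
#   lxc_enter 2902f3c8-4d13-490d-97c4-9121cfb28008 /bin/ping -c 3 10.180.64.1
```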
