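While testing OVS security groups on our cloud network, we ran into a few strange problems. The strangest of them is the one described below.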
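First, a quick word on OpenvSwitch-based security groups. In essence they are a firewall built on OVS, which means rules apply per PORT. They are still split into ingress and egress, which roughly correspond to the INPUT and OUTPUT chains in iptables. The key thing to remember is that ingress is a whitelist: adding a rule declares that packets matching it are allowed in. Egress is a blacklist: outbound traffic flows freely by default, and a rule in that direction stops matching packets from leaving.
The match fields also differ by direction. For the IP address, ingress rules match the source IP and egress rules match the destination IP; for the port, both ingress and egress always match the destination port. So in a test between a source host and a destination host, if the security group is left unmodified, every inbound packet is dropped. Ingress therefore has to be fully opened first, and only then can the rules be verified one by one.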
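To make these matching semantics concrete, here is a minimal Python model of the rules exactly as described above: ingress as a whitelist, egress as a blacklist, the remote IP compared with the source on ingress and the destination on egress, and the port always compared with the destination port. The `Rule`/`Packet` types and the `allowed()` function are hypothetical names for illustration; this is a sketch of the behaviour described here, not Neutron's or OVS's actual code.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List

@dataclass(frozen=True)
class Rule:
    direction: str         # "ingress" or "egress"
    protocol: str          # e.g. "tcp"
    remote_ip_prefix: str  # e.g. "115.236.127.223/32"
    port_min: int
    port_max: int

@dataclass(frozen=True)
class Packet:
    direction: str  # "ingress" = into the VM port, "egress" = out of it
    protocol: str
    src_ip: str
    dst_ip: str
    dst_port: int

def rule_matches(rule: Rule, pkt: Packet) -> bool:
    # The remote IP is the packet's source on ingress, its destination on egress.
    remote = pkt.src_ip if pkt.direction == "ingress" else pkt.dst_ip
    return (rule.direction == pkt.direction
            and rule.protocol == pkt.protocol
            and ip_address(remote) in ip_network(rule.remote_ip_prefix)
            # The port is always the destination port, in either direction.
            and rule.port_min <= pkt.dst_port <= rule.port_max)

def allowed(rules: List[Rule], pkt: Packet) -> bool:
    hit = any(rule_matches(r, pkt) for r in rules)
    if pkt.direction == "ingress":
        return hit       # whitelist: only matching packets get in
    return not hit       # blacklist: matching packets are kept from leaving

# With no rules at all, nothing gets in: ingress must be opened first.
print(allowed([], Packet("ingress", "tcp", "115.236.127.222", "115.236.127.223", 5001)))  # False

# The iperf client sending to the server's default port 5001.
iperf = Packet("egress", "tcp", "115.236.127.222", "115.236.127.223", 5001)
for port in (5001, 5002, 5000):
    rules = [Rule("egress", "tcp", "115.236.127.223/32", port, port)]
    print(port, allowed(rules, iperf))  # 5001 False, 5002 True, 5000 True
```

Under these semantics only a rule on 5001 should stop the iperf traffic; rules on 5000 or 5002 should leave it untouched.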
That covers the basic principle. Now for the strange bug.
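The test case was blocking a port in the egress direction. As mentioned above, egress is open by default, and the port named in the rule is the one to be blocked. We generated traffic with iperf; the only thing you need to know about it here is that the iperf server listens on port 5001 by default.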
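A: Egress rule, protocol tcp, port 5001. Testing confirmed that the traffic was indeed filtered.
B: Egress rule, protocol tcp, port 5002. Traffic was sent and received normally and was not filtered. It looked roughly like this: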
$ neutron security-group-show 61e43f12-12ec-48a1-b2d5-c7392290978c
+----------------------+--------------------------------------------------------------------+
| Field                | Value                                                              |
+----------------------+--------------------------------------------------------------------+
| description          |                                                                    |
| id                   | 61e43f12-12ec-48a1-b2d5-c7392290978c                               |
| name                 | group-egress-tcp-115.236.127.223-5002                              |
| security_group_rules | {                                                                  |
|                      |     "remote_group_id": null,                                       |
|                      |     "direction": "egress",                                         |
|                      |     "remote_ip_prefix": "115.236.127.223/32",                      |
|                      |     "protocol": "tcp",                                             |
|                      |     "tenant_id": "10e5051a1cee4f8ebb0e8b5d877de581",               |
|                      |     "port_range_max": 5002,                                        |
|                      |     "security_group_id": "61e43f12-12ec-48a1-b2d5-c7392290978c",   |
|                      |     "port_range_min": 5002,                                        |
|                      |     "ethertype": "IPv4",                                           |
|                      |     "id": "f2994944-958b-4f06-aed4-4f3dccf903fc"                   |
|                      | }                                                                  |
| tenant_id            | 10e5051a1cee4f8ebb0e8b5d877de581                                   |
+----------------------+--------------------------------------------------------------------+

~# iperf -c 115.236.127.223 -t 10000 -i 1
------------------------------------------------------------
Client connecting to 115.236.127.223, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 115.236.127.222 port 56948 connected with 115.236.127.223 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  57.4 MBytes   481 Mbits/sec
[  3]  1.0- 2.0 sec  57.2 MBytes   480 Mbits/sec
[  3]  2.0- 3.0 sec  56.9 MBytes   477 Mbits/sec
[  3]  3.0- 4.0 sec  57.1 MBytes   479 Mbits/sec
[  3]  4.0- 5.0 sec  57.2 MBytes   480 Mbits/sec
[  3]  5.0- 6.0 sec  56.9 MBytes   477 Mbits/sec
[  3]  6.0- 7.0 sec  57.4 MBytes   481 Mbits/sec
[  3]  7.0- 8.0 sec  57.6 MBytes   483 Mbits/sec

~# iperf -s -i 1
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 115.236.127.223 port 5001 connected with 115.236.127.222 port 56948
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0- 1.0 sec  57.0 MBytes   478 Mbits/sec
[  4]  1.0- 2.0 sec  57.0 MBytes   478 Mbits/sec
[  4]  2.0- 3.0 sec  57.0 MBytes   478 Mbits/sec
[  4]  3.0- 4.0 sec  57.0 MBytes   478 Mbits/sec
[  4]  4.0- 5.0 sec  57.0 MBytes   479 Mbits/sec
[  4]  5.0- 6.0 sec  57.0 MBytes   478 Mbits/sec
[  4]  6.0- 7.0 sec  57.0 MBytes   478 Mbits/sec
[  4]  7.0- 8.0 sec  57.0 MBytes   479 Mbits/sec
C: Egress rule, protocol tcp, port 5000. The traffic was filtered: it was actually dropped!
$ neutron security-group-show a762e5d7-d3c1-490b-92e2-ffe844a80650
+----------------------+--------------------------------------------------------------------+
| Field                | Value                                                              |
+----------------------+--------------------------------------------------------------------+
| description          |                                                                    |
| id                   | a762e5d7-d3c1-490b-92e2-ffe844a80650                               |
| name                 | group-egress-tcp-115.236.127.223-5000                              |
| security_group_rules | {                                                                  |
|                      |     "remote_group_id": null,                                       |
|                      |     "direction": "egress",                                         |
|                      |     "remote_ip_prefix": "115.236.127.223/32",                      |
|                      |     "protocol": "tcp",                                             |
|                      |     "tenant_id": "10e5051a1cee4f8ebb0e8b5d877de581",               |
|                      |     "port_range_max": 5000,                                        |
|                      |     "security_group_id": "a762e5d7-d3c1-490b-92e2-ffe844a80650",   |
|                      |     "port_range_min": 5000,                                        |
|                      |     "ethertype": "IPv4",                                           |
|                      |     "id": "5edefe65-48da-4b52-b257-dd5b147ef1a6"                   |
|                      | }                                                                  |
| tenant_id            | 10e5051a1cee4f8ebb0e8b5d877de581                                   |
+----------------------+--------------------------------------------------------------------+

~# iperf -c 115.236.127.223 -t 10000 -i 1
^C
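Throughout these tests, neither the iperf client nor the server was changed; the only thing modified was the security group rule on the neutron port. In theory, only a rule on port 5001 should filter the traffic, and rules on 5000 and 5002 should both let it through. So having 5000 drop the traffic as well was very strange. Suspecting that some other rule was interfering, I repeated the whole test three times before finally confirming that this was a bug.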
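As for why I picked 5000 and 5002 in particular: simple proximity. A random port would have worked too, but boundary values are where bugs most often show up, so the values immediately on either side of 5001 have a good chance of exposing a problem.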
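There is also a concrete reason neighbouring ports make good probes when port rules are turned into value/mask matches, as OVS flows are. Assuming a bad mask errs by ignoring one or more low bits, it covers an aligned block of at least two ports, and that block always contains `p ^ 1`, which is either `p - 1` or `p + 1`. The sketch below checks this exhaustively; the helper names are hypothetical.

```python
from __future__ import annotations

def neighbours(port: int) -> list[int]:
    """Boundary-value probes: the ports immediately below and above."""
    return [p for p in (port - 1, port + 1) if 0 <= p <= 0xFFFF]

def widened_block(port: int, dropped_bits: int) -> range:
    """Ports matched by `port` under a mask that ignores its low `dropped_bits` bits."""
    size = 1 << dropped_bits
    start = port & ~(size - 1)
    return range(start, start + size)

def caught_by_neighbours(port: int, dropped_bits: int) -> bool:
    """True if probing a neighbour would reveal the over-wide match."""
    block = widened_block(port, dropped_bits)
    return any(n in block for n in neighbours(port))

# Exhaustive check: every over-wide mask on every 16-bit port is exposed
# by probing at least one of its two neighbouring ports.
assert all(caught_by_neighbours(p, b) for p in range(1 << 16) for b in range(1, 17))

print(list(widened_block(5000, 1)))  # [5000, 5001]
print(list(widened_block(5002, 1)))  # [5002, 5003]
```

Note that which neighbour catches the error depends on parity: for an even port the extra port is above it, for an odd port it is below, so testing both sides is what makes the check complete.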
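This bug turns out to be known upstream as well. Unfortunately we adopted this feature too late; otherwise the person reporting it to the community might have been me.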
https://bugs.launchpad.net/neutron/+bug/1611991
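The reporter's test case was presumably the same as mine, except that he used ports 22 and 23. He didn't say whether it was ingress or egress, so it was presumably the default, ingress. He only allowed port 22, yet port 23 was reachable too.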
Seen on master devstack, ubuntu xenial.
Steps to reproduce:
1. Enable ovs firewall in /etc/neutron/plugins/ml2/ml2.conf
   [securitygroup]
   firewall_driver = openvswitch
2. Create a security group with icmp, tcp to 22.
3. Boot a VM, assign a floating ip.
4. Check that port 23 can be accessed via tcp (telnet, nc, etc).
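So the problem is not limited to one or two particular ports; something systematic, most likely in the rule-generation algorithm, is producing wrong rules.
Further down, someone commented: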
The bug is in port masking, 22 is masked by tp_src=0x16/0xfffe which matches number 23 as well. Good catch!
Changed in neutron: importance: Undecided → High
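A masked match compares only the bits that are set in the mask, so `0x16/0xfffe` ignores the lowest bit and matches both 22 and 23. The sketch below enumerates what a value/mask pair matches. The flows from our own test are not shown here, so it is an assumption that the buggy algorithm produced an analogous `0x1388/0xfffe` match for port 5000; if it did, that would explain cases B and C exactly. The helper names are hypothetical.

```python
def matches(port: int, value: int, mask: int) -> bool:
    """Masked match: only the bits set in `mask` are compared."""
    return port & mask == value & mask

def ports_matched(value: int, mask: int) -> list:
    """Every 16-bit port that a value/mask pair matches."""
    return [p for p in range(1 << 16) if matches(p, value, mask)]

# The case from the bug comment: port 22 encoded as 0x16/0xfffe.
print(ports_matched(0x16, 0xFFFE))    # [22, 23]

# Hypothetical: the same one-bit-short mask applied to our test ports.
print(ports_matched(0x1388, 0xFFFE))  # [5000, 5001]: would also hit iperf's 5001 (case C)
print(ports_matched(0x138A, 0xFFFE))  # [5002, 5003]: leaves 5001 alone (case B)
```

In decimal, 5000 is even and 5001 is its odd partner, so dropping the lowest mask bit merges exactly those two ports, while 5002 pairs with 5003.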
That is the same observation. Reading on, after a long back-and-forth among quite a few people, a fix finally landed:
Reviewed: https://review.openstack.org/353782
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=0494f212aa625a03587af3d75e823008f1198012
Submitter: Jenkins
Branch: master

commit 0494f212aa625a03587af3d75e823008f1198012
Author: Inessa Vasilevskaya
Date: Thu Aug 11 02:21:29 2016 +0300

    ovsfw: fix troublesome port_rule_masking

    In several cases port masking algorithm borrowed from
    networking_ovs_dpdk didn't behave correctly. This caused
    non-restricted ports to be open due to wrong tp_src field
    value in resulting ovs rules.

    This was fixed by alternative port masking implementation.
    Functional and unit tests to cover the bug added as well.

    Co-Authored-By: Jakub Libosvar
    Co-Authored-By: IWAMOTO Toshihiro
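This confirms it: the OVS firewall's port_rule_masking was fixed because in some cases its port masking algorithm was wrong. A wrong tp_src value in the generated OVS rules made them cover ports they were never meant to touch. Upstream that meant extra ports were opened; in our egress test it meant an extra port was blocked.
The fix changed quite a lot of code; see the commit: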
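The commit replaced the masking routine with an alternative implementation. A common way to express a port range as masked matches is to split it into aligned power-of-two blocks, each of which is exactly one value/mask pair. The sketch below is that standard decomposition written from scratch (`port_range_to_masks` and `fmt` are hypothetical names); it is not the code from the Neutron commit, but it shows what a correct result looks like.

```python
def port_range_to_masks(lo: int, hi: int) -> list:
    """Split [lo, hi] into the fewest aligned (value, mask) pairs for 16-bit ports."""
    if not 0 <= lo <= hi <= 0xFFFF:
        raise ValueError("invalid port range")
    result = []
    while lo <= hi:
        # Largest power-of-two block aligned at lo (lo == 0 is aligned to everything).
        size = lo & -lo if lo else 1 << 16
        # Shrink the block until it fits inside what is left of the range.
        while size > hi - lo + 1:
            size >>= 1
        result.append((lo, 0xFFFF & ~(size - 1)))
        lo += size
    return result

def fmt(pairs) -> list:
    return ["0x%04x/0x%04x" % p for p in pairs]

print(fmt(port_range_to_masks(22, 22)))      # ['0x0016/0xffff']
print(fmt(port_range_to_masks(5000, 5000)))  # ['0x1388/0xffff']
print(fmt(port_range_to_masks(5000, 5003)))  # ['0x1388/0xfffc']
print(fmt(port_range_to_masks(1, 6)))        # ['0x0001/0xffff', '0x0002/0xfffe', '0x0004/0xfffe', '0x0006/0xffff']
```

The invariant that matters for this bug: a single port always comes out as one exact `0xffff` match, and no block ever extends past `hi`.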
https://git.openstack.org/cgit/openstack/neutron/commit/?id=dd75f7e96afc713b57ad4ab21f01175be7b571fe
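The rule masking algorithm is genuinely painful to look at. I had originally planned a full bug write-up, but given what the cause turned out to be, I'll skip it. In short: the algorithm was wrong.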
The solution is to merge in the reviewed change above, which has already been committed to master:
Reviewed: https://review.openstack.org/353782
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=0494f212aa625a03587af3d75e823008f1198012
After the update, the rule on port 5000 no longer affects traffic to 5001, and iperf runs normally again:
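To confirm a fix like this at the flow level rather than only with iperf, one option is to dump the bridge's flows (for example with `ovs-ofctl dump-flows br-int`) and check that a single-port rule covers no other port. The sketch below assumes that an exact port prints as `tp_dst=5000` and a masked one as `tp_dst=0xVALUE/0xMASK`; the flow strings are hypothetical, abbreviated examples, not output captured from our system, and the helper names are likewise hypothetical.

```python
import re

# Assumed formats: decimal for an exact port, 0xVALUE/0xMASK for a masked one.
TP_DST = re.compile(r"tp_dst=(0x[0-9a-fA-F]+|\d+)(?:/(0x[0-9a-fA-F]+))?")

def tp_dst_ports(flow: str):
    """Set of destination ports a flow's tp_dst match covers, or None if it has none."""
    m = TP_DST.search(flow)
    if m is None:
        return None
    value = int(m.group(1), 0)  # base 0 accepts both "5000" and "0x1388"
    mask = int(m.group(2), 16) if m.group(2) else 0xFFFF
    return {p for p in range(1 << 16) if p & mask == value & mask}

def check_single_port_rule(flows, port):
    """Return (flow, extra_ports) for every flow that covers more than `port`."""
    problems = []
    for flow in flows:
        covered = tp_dst_ports(flow)
        if covered is None:
            continue  # not a port-specific flow
        extra = covered - {port}
        if extra:
            problems.append((flow, sorted(extra)))
    return problems

# Hypothetical, abbreviated flow matches for a rule on port 5000.
buggy = ["tcp,nw_dst=115.236.127.223,tp_dst=0x1388/0xfffe"]
fixed = ["tcp,nw_dst=115.236.127.223,tp_dst=5000"]
print(check_single_port_rule(buggy, 5000))  # [('tcp,nw_dst=115.236.127.223,tp_dst=0x1388/0xfffe', [5001])]
print(check_single_port_rule(fixed, 5000))  # []
```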
$ neutron security-group-show a762e5d7-d3c1-490b-92e2-ffe844a80650
+----------------------+--------------------------------------------------------------------+
| Field                | Value                                                              |
+----------------------+--------------------------------------------------------------------+
| description          |                                                                    |
| id                   | a762e5d7-d3c1-490b-92e2-ffe844a80650                               |
| name                 | group-egress-tcp-115.236.127.223-5000                              |
| security_group_rules | {                                                                  |
|                      |     "remote_group_id": null,                                       |
|                      |     "direction": "egress",                                         |
|                      |     "remote_ip_prefix": "115.236.127.223/32",                      |
|                      |     "protocol": "tcp",                                             |
|                      |     "tenant_id": "10e5051a1cee4f8ebb0e8b5d877de581",               |
|                      |     "port_range_max": 5000,                                        |
|                      |     "security_group_id": "a762e5d7-d3c1-490b-92e2-ffe844a80650",   |
|                      |     "port_range_min": 5000,                                        |
|                      |     "ethertype": "IPv4",                                           |
|                      |     "id": "5edefe65-48da-4b52-b257-dd5b147ef1a6"                   |
|                      | }                                                                  |
| tenant_id            | 10e5051a1cee4f8ebb0e8b5d877de581                                   |
+----------------------+--------------------------------------------------------------------+

~# iperf -s -i 1
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 115.236.127.223 port 5001 connected with 115.236.127.222 port 39420
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0- 1.0 sec  57.0 MBytes   478 Mbits/sec
[  4]  1.0- 2.0 sec  57.0 MBytes   479 Mbits/sec
[  4]  2.0- 3.0 sec  57.0 MBytes   478 Mbits/sec
[  4]  3.0- 4.0 sec  57.0 MBytes   478 Mbits/sec
[  4]  4.0- 5.0 sec  57.0 MBytes   478 Mbits/sec
[  4]  5.0- 6.0 sec  57.0 MBytes   478 Mbits/sec
[  4]  6.0- 7.0 sec  57.0 MBytes   478 Mbits/sec
[  4]  7.0- 8.0 sec  57.0 MBytes   478 Mbits/sec
[  4]  8.0- 9.0 sec  57.0 MBytes   478 Mbits/sec
[  4]  9.0-10.0 sec  57.0 MBytes   478 Mbits/sec
[  4] 10.0-11.0 sec  57.0 MBytes   478 Mbits/sec
[  4] 11.0-12.0 sec  57.0 MBytes   478 Mbits/sec