I have written about part of the metadata mechanism before: http://lihuia.com/2015/11/28/openstack%E9%87%8Cmetadata%E7%9A%84%E4%BD%9C%E7%94%A8/
Just like fetching any other metadata, the guest OS status is obtained through the EC2-style 169.254 metadata endpoint. The only difference is the execution frequency: tasks such as injecting a public key or setting the hostname only need to run once at the beginning, whereas deciding whether the guest operating system is still available requires continuous polling. The OS status therefore has to be checked and reported at a fixed interval, so a scheduled job is configured in cron that collects and reports the OS status every 10 seconds.
The heartbeat follows the same path as any regular metadata request; in the end the heartbeat is written into memcache, and the nova service reads it from there to check and judge the OS status.
The flow is easy to test by hand; the HTTP request issued by the scheduled job is simply:
curl -s -X PUT http://169.254.169.254/heartbeat
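The in-guest agent that issues this request is not shown in this post; purely as an illustration, a 10-second loop around the same PUT could look like the sketch below. The requests dependency, the timeout value and the error handling are my assumptions, not the real agent's code.

#!/usr/bin/env python
# Illustrative sketch only: wraps the curl command above in a 10-second loop.
import time
import requests

HEARTBEAT_URL = 'http://169.254.169.254/heartbeat'

while True:
    try:
        # Tell the metadata service "this guest OS is still alive".
        requests.put(HEARTBEAT_URL, timeout=5)
    except requests.RequestException:
        # Transient failures are ignored; the server side decides "down"
        # purely from how stale the last heartbeat is (see the end of the post).
        pass
    time.sleep(10)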
This still relies on the private (tenant) network. Since the scheduled job fires on its own, we can capture packets directly inside the tenant's router namespace without issuing a request manually, and the packet timestamps conveniently reveal the interval as well.
~$ sudo ip netns exec qrouter-0aeca8b7-8a38-41b1-ace7-ee9918e2b229 tcpdump -i ha-2b8cf5e0-b1 src host 10.180.65.125 and port 80 -en
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ha-2b8cf5e0-b1, link-type EN10MB (Ethernet), capture size 262144 bytes
14:26:01.407697 fa:16:3e:81:0b:1e > fa:16:3e:8e:51:d7, ethertype IPv4 (0x0800), length 74: 10.180.65.125.56726 > 169.254.169.254.80: Flags [S], seq 586087273, win 13600, options [mss 1360,sackOK,TS val 41025798 ecr 0,nop,wscale 3], length 0
14:26:01.408455 fa:16:3e:81:0b:1e > fa:16:3e:8e:51:d7, ethertype IPv4 (0x0800), length 66: 10.180.65.125.56726 > 169.254.169.254.80: Flags [.], ack 2855258046, win 1700, options [nop,nop,TS val 41025798 ecr 3108610719], length 0
14:26:01.412820 fa:16:3e:81:0b:1e > fa:16:3e:8e:51:d7, ethertype IPv4 (0x0800), length 229: 10.180.65.125.56726 > 169.254.169.254.80: Flags [P.], seq 0:163, ack 1, win 1700, options [nop,nop,TS val 41025799 ecr 3108610719], length 163
14:26:01.420102 fa:16:3e:81:0b:1e > fa:16:3e:8e:51:d7, ethertype IPv4 (0x0800), length 66: 10.180.65.125.56726 > 169.254.169.254.80: Flags [.], ack 186, win 1834, options [nop,nop,TS val 41025801 ecr 3108610722], length 0
14:26:01.420208 fa:16:3e:81:0b:1e > fa:16:3e:8e:51:d7, ethertype IPv4 (0x0800), length 66: 10.180.65.125.56726 > 169.254.169.254.80: Flags [F.], seq 163, ack 186, win 1834, options [nop,nop,TS val 41025801 ecr 3108610722], length 0
14:26:01.420732 fa:16:3e:81:0b:1e > fa:16:3e:8e:51:d7, ethertype IPv4 (0x0800), length 66: 10.180.65.125.56726 > 169.254.169.254.80: Flags [.], ack 187, win 1834, options [nop,nop,TS val 41025801 ecr 3108610722], length 0
14:26:11.397906 fa:16:3e:81:0b:1e > fa:16:3e:8e:51:d7, ethertype IPv4 (0x0800), length 74: 10.180.65.125.56728 > 169.254.169.254.80: Flags [S], seq 1171244961, win 13600, options [mss 1360,sackOK,TS val 41028296 ecr 0,nop,wscale 3], length 0
14:26:11.398715 fa:16:3e:81:0b:1e > fa:16:3e:8e:51:d7, ethertype IPv4 (0x0800), length 66: 10.180.65.125.56728 > 169.254.169.254.80: Flags [.], ack 783697473, win 1700, options [nop,nop,TS val 41028296 ecr 3108613217], length 0
14:26:11.398751 fa:16:3e:81:0b:1e > fa:16:3e:8e:51:d7, ethertype IPv4 (0x0800), length 229: 10.180.65.125.56728 > 169.254.169.254.80: Flags [P.], seq 0:163, ack 1, win 1700, options [nop,nop,TS val 41028296 ecr 3108613217], length 163
14:26:11.405364 fa:16:3e:81:0b:1e > fa:16:3e:8e:51:d7, ethertype IPv4 (0x0800), length 66: 10.180.65.125.56728 > 169.254.169.254.80: Flags [.], ack 186, win 1834, options [nop,nop,TS val 41028297 ecr 3108613219], length 0
14:26:11.405465 fa:16:3e:81:0b:1e > fa:16:3e:8e:51:d7, ethertype IPv4 (0x0800), length 66: 10.180.65.125.56728 > 169.254.169.254.80: Flags [F.], seq 163, ack 186, win 1834, options [nop,nop,TS val 41028298 ecr 3108613219], length 0
14:26:11.406003 fa:16:3e:81:0b:1e > fa:16:3e:8e:51:d7, ethertype IPv4 (0x0800), length 66: 10.180.65.125.56728 > 169.254.169.254.80: Flags [.], ack 187, win 1834, options [nop,nop,TS val 41028298 ecr 3108613219], length 0
14:26:21.438077 fa:16:3e:81:0b:1e > fa:16:3e:8e:51:d7, ethertype IPv4 (0x0800), length 74: 10.180.65.125.56729 > 169.254.169.254.80: Flags [S], seq 4243864003, win 13600, options [mss 1360,sackOK,TS val 41030797 ecr 0,nop,wscale 3], length 0
14:26:21.439515 fa:16:3e:81:0b:1e > fa:16:3e:8e:51:d7, ethertype IPv4 (0x0800), length 229: 10.180.65.125.56729 > 169.254.169.254.80: Flags [P.], seq 4243864004:4243864167, ack 3004751206, win 1700, options [nop,nop,TS val 41030806 ecr 3108615727], length 163
14:26:21.439577 fa:16:3e:81:0b:1e > fa:16:3e:8e:51:d7, ethertype IPv4 (0x0800), length 66: 10.180.65.125.56729 > 169.254.169.254.80: Flags [.], ack 1, win 1700, options [nop,nop,TS val 41030806 ecr 3108615727], length 0
14:26:21.447846 fa:16:3e:81:0b:1e > fa:16:3e:8e:51:d7, ethertype IPv4 (0x0800), length 66: 10.180.65.125.56729 > 169.254.169.254.80: Flags [.], ack 186, win 1834, options [nop,nop,TS val 41030808 ecr 3108615729], length 0
14:26:21.447937 fa:16:3e:81:0b:1e > fa:16:3e:8e:51:d7, ethertype IPv4 (0x0800), length 66: 10.180.65.125.56729 > 169.254.169.254.80: Flags [F.], seq 163, ack 186, win 1834, options [nop,nop,TS val 41030808 ecr 3108615729], length 0
14:26:21.449482 fa:16:3e:81:0b:1e > fa:16:3e:8e:51:d7, ethertype IPv4 (0x0800), length 66: 10.180.65.125.56729 > 169.254.169.254.80: Flags [.], ack 187, win 1834, options [nop,nop,TS val 41030809 ecr 3108615730], length 0
Three batches of request packets were captured, at 14:26:01, 14:26:11 and 14:26:21, so a request is sent to 169.254.169.254 every 10 seconds; in other words, some scheduled script is firing. Each batch contains several packets, and all of that is something the scheduled script can control.
A packet addressed to this reserved address should, by itself, go nowhere, so let's look at the iptables rules inside the namespace, in particular the PREROUTING chain.
~$ sudo ip netns exec qrouter-0aeca8b7-8a38-41b1-ace7-ee9918e2b229 iptables -t nat -S
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-N neutron-postrouting-bottom
-N neutron-vpn-agen-OUTPUT
-N neutron-vpn-agen-POSTROUTING
-N neutron-vpn-agen-PREROUTING
-N neutron-vpn-agen-float-snat
-N neutron-vpn-agen-snat
-A PREROUTING -j neutron-vpn-agen-PREROUTING
-A OUTPUT -j neutron-vpn-agen-OUTPUT
-A POSTROUTING -j neutron-vpn-agen-POSTROUTING
-A POSTROUTING -j neutron-postrouting-bottom
-A neutron-postrouting-bottom -j neutron-vpn-agen-snat
-A neutron-vpn-agen-POSTROUTING ! -i qg-0aeca8b7-8a ! -o qg-0aeca8b7-8a -m conntrack ! --ctstate DNAT -j ACCEPT
-A neutron-vpn-agen-PREROUTING -d 169.254.169.254/32 -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 9697
-A neutron-vpn-agen-snat -j neutron-vpn-agen-float-snat
-A neutron-vpn-agen-snat -s 10.180.64.0/23 -j SNAT --to-source 169.254.43.17
We can see that every TCP packet destined for host 169.254.169.254 on port 80 is REDIRECTed to local port 9697, so the next step is to find out which service sits behind that port.
Note that this is a network namespace, and every namespace's network environment is isolated from the others, so don't query this port directly on the L3 node itself; keep running the query inside the namespace.
~$ sudo ip netns exec qrouter-0aeca8b7-8a38-41b1-ace7-ee9918e2b229 lsof -i:9697
COMMAND     PID USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
neutron-n 33788 root    6u  IPv4 682699699      0t0  TCP *:9697 (LISTEN)
This gives us the process PID, so let's look at the process details directly.
~$ ps aux | grep 33788
111776    3584  0.0  0.0  10776  2316 pts/0  S+  14:36   0:00 grep 33788
root     33788  0.2  0.0 126624 26784 ?      S   Nov11   7:55 /srv/stack/neutron/bin/python /srv/stack/neutron/bin/neutron-ns-metadata-proxy --pid_file=/data/neutron/external/pids/0aeca8b7-8a38-41b1-ace7-ee9918e2b229.pid --metadata_proxy_socket=/data/neutron/metadata_proxy --router_id=0aeca8b7-8a38-41b1-ace7-ee9918e2b229 --state_path=/data/neutron --metadata_port=9697 --debug --verbose --log-file=neutron-ns-metadata-proxy-0aeca8b7-8a38-41b1-ace7-ee9918e2b229.log --log-dir=/data/log/neutron
In other words, every packet aimed at 169.254.169.254 is forwarded to the neutron-ns-metadata-proxy service.
The process arguments include a router_id, which means each tenant router has exactly one neutron-ns-metadata-proxy process of its own, and all metadata requests from the VMs behind that router are forwarded to it. The process is spawned when the tenant initializes its network and creates the router, and it listens on port 9697.
The arguments also contain --metadata_proxy_socket=/data/neutron/metadata_proxy; the name suggests a unix socket connection, so let's see which service is on the other end.
~$ sudo netstat -lxp | grep /data/neutron/metadata_proxy
unix  2      [ ACC ]     STREAM     LISTENING     694455933 15085/python        /data/neutron/metadata_proxy
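The proxy and the agent speak plain HTTP over this unix socket. Purely as an illustration of that mechanism (this is not the request the proxy really sends, since the extra headers discussed below are missing and the agent will most likely reject it), the socket can be opened by hand:

#!/usr/bin/env python
# Illustration only: HTTP over a unix domain socket, which is how
# neutron-ns-metadata-proxy talks to neutron-metadata-agent. Without the
# headers the proxy normally injects, the agent will likely refuse the request.
import socket

SOCK_PATH = '/data/neutron/metadata_proxy'   # from --metadata_proxy_socket

s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
s.connect(SOCK_PATH)
s.sendall(b'GET / HTTP/1.1\r\nHost: 169.254.169.254\r\nConnection: close\r\n\r\n')
chunks = []
while True:
    data = s.recv(4096)
    if not data:
        break
    chunks.append(data)
s.close()
print(b''.join(chunks))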
Next, locate the process with PID 15085.
~$ ps aux | grep 15085
111776    3592  0.0  0.0  10776  2416 pts/0  S+  15:38   0:00 grep 15085
neutron  15085  0.3  0.0 161968 55828 ?      S   Nov11   8:10 /srv/stack/neutron/bin/python /srv/stack/neutron/bin/neutron-metadata-agent --config-file=/etc/neutron/neutron.conf --config-file=/etc/neutron/metadata_agent.ini --config-file=/etc/neutron/neutron-pass.conf
Here we find the neutron-metadata-agent service, which is what neutron-ns-metadata-proxy talks to over that unix socket. neutron-ns-metadata-proxy adds headers such as X-Forwarded-For to the request, carrying information like the instance's fixed IP and the router ID. neutron-metadata-agent then uses this information to look up the corresponding instance UUID and tenant ID, in preparation for fetching that instance's metadata. Both pieces of information are needed because a fixed IP on a private network is not unique on its own (different tenants can use the same address), but combined with a router ID it uniquely identifies the tenant and its network resources, and therefore the tenant ID and the instance UUID.
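Neutron's actual lookup code is not reproduced here, but conceptually the resolution works like the sketch below; list_ports_on_router is a hypothetical stand-in for a neutron API query that returns the ports of all networks attached to the router.

# Conceptual sketch, not neutron's real implementation: why (router_id, fixed IP)
# is enough to identify one instance even though fixed IPs can repeat across tenants.
def resolve_instance(router_id, fixed_ip, list_ports_on_router):
    """Return (instance_uuid, tenant_id) for the VM behind this request.

    list_ports_on_router is a hypothetical helper standing in for a neutron
    query that lists the ports of every network attached to the given router.
    """
    for port in list_ports_on_router(router_id):
        for ip in port.get('fixed_ips', []):
            if ip['ip_address'] == fixed_ip:
                # For a VM port, device_id is the nova instance UUID.
                return port['device_id'], port['tenant_id']
    return None, None

Within one router's networks a given fixed IP appears at most once, which is exactly the uniqueness argument made above.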
At this point the request, now carrying the extra information, is forwarded by neutron-metadata-agent to the nova-api-metadata service. This nova service is started together with nova-api, so it can be found on the nova API nodes:
nova     128274  0.3  0.0 186120  59240 ?      S   Nov11  11:38 /srv/stack/nova/bin/python /srv/stack/nova/bin/nova-api-metadata --config-file=/etc/nova/nova.conf --config-file=/etc/nova/nova-pass.conf
nova     128497  1.1  0.0 320628 101436 ?      S   Nov11  36:25 /srv/stack/nova/bin/python /srv/stack/nova/bin/nova-api-metadata --config-file=/etc/nova/nova.conf --config-file=/etc/nova/nova-pass.conf
nova     128498  1.1  0.0 326740 107528 ?      S   Nov11  36:45 /srv/stack/nova/bin/python /srv/stack/nova/bin/nova-api-metadata --config-file=/etc/nova/nova.conf --config-file=/etc/nova/nova-pass.conf
nova     128499  1.1  0.0 325028 105748 ?      S   Nov11  36:27 /srv/stack/nova/bin/python /srv/stack/nova/bin/nova-api-metadata --config-file=/etc/nova/nova.conf --config-file=/etc/nova/nova-pass.conf
nova     128500  1.1  0.0 322648 103200 ?      S   Nov11  36:30 /srv/stack/nova/bin/python /srv/stack/nova/bin/nova-api-metadata --config-file=/etc/nova/nova.conf --config-file=/etc/nova/nova-pass.conf
This is where the request is finally received and handled: the relevant metadata is fetched from the nova database by instance UUID, and the response is then returned layer by layer back to the original requester.
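For the standard metadata path, this last hop can even be reproduced by hand: nova-api-metadata trusts the X-Instance-ID header only if it is accompanied by an HMAC-SHA256 signature computed with the shared secret from metadata_agent.ini. The host, tenant ID and secret below are placeholders I made up for the sketch, not values from this deployment.

#!/usr/bin/env python
# Sketch of the last hop (neutron-metadata-agent -> nova-api-metadata) for the
# standard metadata path. Host, tenant ID and secret are placeholders.
import hashlib
import hmac
import requests

NOVA_METADATA_HOST = '10.0.0.10'          # placeholder: a nova API node
NOVA_METADATA_PORT = 8775                 # default nova-api-metadata port
SHARED_SECRET = 'change-me'               # metadata_proxy_shared_secret in metadata_agent.ini
INSTANCE_UUID = 'c30e547e-e646-4db6-8494-5109b08e93a8'
TENANT_ID = 'tenant-id-placeholder'
FIXED_IP = '10.180.65.125'

# nova verifies X-Instance-ID with an HMAC-SHA256 over the instance UUID.
signature = hmac.new(SHARED_SECRET, INSTANCE_UUID, hashlib.sha256).hexdigest()

resp = requests.get(
    'http://%s:%d/latest/meta-data/' % (NOVA_METADATA_HOST, NOVA_METADATA_PORT),
    headers={
        'X-Forwarded-For': FIXED_IP,
        'X-Instance-ID': INSTANCE_UUID,
        'X-Tenant-ID': TENANT_ID,
        'X-Instance-ID-Signature': signature,
    })
print(resp.status_code)
print(resp.text)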
At this point ordinary metadata is retrieved normally and used for whatever it is needed for; maybe you store it in a file, maybe you feed it into one of your scheduled scripts for follow-up work.
The operating system status is different: it can break at any moment, in which case the status has to be set to down, so a mechanism for checking it in near real time is required.
Let's again start with the HTTP call:
~# curl -s -X PUT http://169.254.169.254/heartbeat | jq .
{
  "c30e547e-e646-4db6-8494-5109b08e93a8_heart": "2016-11-13 08:06:36"
}
The response carries two pieces of information, the instance UUID and a timestamp, mapped one to one, and these results are stored in memcache. From the packet capture at the beginning we know the scheduled job checks the OS status every 10 seconds, so the decision rule can be: fetch the timestamp of the last report from memcache, compare it with the current time, and if the difference exceeds 30 seconds, consider the guest operating system unavailable.
The value stored in memcache can be fetched with a small script; just look at the timestamp:
#!/usr/bin/env python
import datetime
import memcache
import sys

memcached_server = ''   # memcached server address

def get_heartbeat(instance_uuid):
    # The heartbeat is stored under the key "<instance_uuid>_heart".
    cache_key = str(instance_uuid + '_heart')
    memcache_client = memcache.Client([memcached_server])
    last_heartbeat = ""
    if memcache_client is not None:
        cache_value = memcache_client.get(cache_key)
        if cache_value:
            # Value is the timestamp of the last report, e.g. "2016-11-13 08:06:36".
            last_heartbeat = datetime.datetime.strptime(cache_value, '%Y-%m-%d %H:%M:%S')
    print(last_heartbeat)

if __name__ == '__main__':
    uuid = sys.argv[1]
    get_heartbeat(uuid)
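Building on the script above, the 30-second rule described earlier could be applied as in the sketch below; the threshold value and the memcached address are assumptions that need to match your deployment.

#!/usr/bin/env python
# Sketch of the staleness decision: if the last heartbeat stored in memcache is
# older than the threshold (30 seconds here), treat the guest OS as down.
import datetime
import memcache

memcached_server = ''                        # fill in your memcached address
STALE_AFTER = datetime.timedelta(seconds=30)

def guest_os_is_up(instance_uuid):
    client = memcache.Client([memcached_server])
    value = client.get(str(instance_uuid + '_heart'))
    if not value:
        return False                         # never reported: treat as down
    last = datetime.datetime.strptime(value, '%Y-%m-%d %H:%M:%S')
    return datetime.datetime.now() - last <= STALE_AFTER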