Analysis of a Case Where a Cloud Disk Service Failure Prevents VM Deletion

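Anomalies often show up during VM deletion, such as the nova-compute service hanging or libvirtd being interrupted. Other dependent services can fail too, so a VM may become impossible to delete even though the compute host's own services are healthy. Below is one such problem, triggered by a failure in the cloud disk (cinder-volume) service.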

First, a delete operation is issued for the VM. The instance enters the deleting task state:

lihui@MacBook  ~/server/source_txt  nova show 83e35fec-6da8-44c0-9211-e72e8ab95c61
+--------------------------------------------------+-----------------------------------------------------------------------------------------------------------+
| Property                                         | Value                                                                                                     |
+--------------------------------------------------+-----------------------------------------------------------------------------------------------------------+
| OS-DCF:diskConfig                                | MANUAL                                                                                                    |
| OS-EXT-AZ:availability_zone                      | test1.badceph2                                                                                            |
| OS-EXT-SRV-ATTR:host                             | nova30.openstack.org                                                                                   |
| OS-EXT-SRV-ATTR:hypervisor_hostname              | nova30.openstack.org                                                                                   |
| OS-EXT-SRV-ATTR:instance_name                    | instance-0001242d                                                                                         |
| OS-EXT-STS:power_state                           | 0                                                                                                         |
| OS-EXT-STS:task_state                            | deleting                                                                                                  |
| OS-EXT-STS:vm_state                              | active                                                                                                    |
| OS-SRV-USG:launched_at                           | 2016-11-21T02:25:48.000000                                                                                |
| OS-SRV-USG:terminated_at                         | -                                                                                                         |
| accessIPv4                                       |                                                                                                           |
| accessIPv6                                       |                                                                                                           |
| availability_zone                                | test1.badceph2                                                                                            |
| config_drive                                     | 1                                                                                                         |
| created                                          | 2016-11-21T02:24:06Z                                                                                      |
| flavor                                           | flavor_2 (2)                                                                                              |
| hostId                                           | c58c33d798ee65479cfb84695845d537f40079d4796e9313ba6da758                                                  |
| hypervisor_type                                  | qemu                                                                                                      |
| id                                               | 83e35fec-6da8-44c0-9211-e72e8ab95c61                                                                      |
| image                                            | debian_7_x86_64_pub_static_36840.raw (0d26f602-7c69-43f4-aa70-1371bd05b1e1)                               |
| key_name                                         | lihui_yq_test                                                                                               |
| metadata                                         | {}                                                                                                        |
| name                                             | lihui-test1.badceph2:nova30.openstack.org-7                                                              |
| os-extended-volumes:volumes_attached             | [{"id": "b3a7925e-e2de-4b4a-94e9-380dc10b0e36"}]                                                          |
| os-netease-extended-volumes:volumes_attached     | [{"delete_on_terminate": false, "id": "b3a7925e-e2de-4b4a-94e9-380dc10b0e36", "device_name": "/dev/vdd"}] |
| os-server-status                                 | down                                                                                                      |
| os_type                                          | linux                                                                                                     |
| private_9bcf446410594faf884aaab076f43cbf network | 10.18.194.182                                                                                             |
| progress                                         | 0                                                                                                         |
| security_groups                                  | default                                                                                                   |
| status                                           | ACTIVE                                                                                                    |
| tenant_id                                        | 9bcf446410594faf884aaab076f43cbf                                                                          |
| updated                                          | 2016-12-06T06:30:29Z                                                                                      |
| use_ceph                                         | yes                                                                                                       |
| user_id                                          | fea367d530534801bd5332ea6e06fac6                                                                          |
| vncPass                                          | cVMfx6                                                                                                    |
+--------------------------------------------------+-----------------------------------------------------------------------------------------------------------+

But after a long while, the instance unexpectedly left the deleting state, and the VM still existed:

lihui@MacBook  ~/server/source_txt  nova show 83e35fec-6da8-44c0-9211-e72e8ab95c61
+--------------------------------------------------+-----------------------------------------------------------------------------------------------------------+
| Property                                         | Value                                                                                                     |
+--------------------------------------------------+-----------------------------------------------------------------------------------------------------------+
| OS-DCF:diskConfig                                | MANUAL                                                                                                    |
| OS-EXT-AZ:availability_zone                      | test1.badceph2                                                                                            |
| OS-EXT-SRV-ATTR:host                             | nova30.openstack.org                                                                                   |
| OS-EXT-SRV-ATTR:hypervisor_hostname              | nova30.openstack.org                                                                                   |
| OS-EXT-SRV-ATTR:instance_name                    | instance-0001242d                                                                                         |
| OS-EXT-STS:power_state                           | 0                                                                                                         |
| OS-EXT-STS:task_state                            | -                                                                                                         |
| OS-EXT-STS:vm_state                              | active                                                                                                    |
| OS-SRV-USG:launched_at                           | 2016-11-21T02:25:48.000000                                                                                |

The os-netease-extended-volumes:volumes_attached field shows that the VM has one cloud disk attached.
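If you want to pull the attached volumes out of that field in a script, a minimal sketch could look like the following. It assumes the field value is valid JSON, as it appears in the output above; the helper name attached_volumes is made up for illustration.

```python
import json

# Value copied from the os-netease-extended-volumes:volumes_attached row above.
raw_value = (
    '[{"delete_on_terminate": false, '
    '"id": "b3a7925e-e2de-4b4a-94e9-380dc10b0e36", '
    '"device_name": "/dev/vdd"}]'
)


def attached_volumes(value):
    """Return (volume_id, device_name, delete_on_terminate) for each volume."""
    return [
        (vol["id"], vol.get("device_name"), vol.get("delete_on_terminate"))
        for vol in json.loads(value)
    ]


volumes = attached_volumes(raw_value)
for volume_id, device_name, delete_on_terminate in volumes:
    print(volume_id, device_name, delete_on_terminate)
```

Since delete_on_terminate is false here, the volume itself is expected to survive the VM deletion; only its connection to the VM has to be torn down.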

Now for the investigation. Start with the nova-compute log on the compute node:

2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 465, in _process_data
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp     **args)
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/dispatcher.py", line 179, in dispatch
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp     result = getattr(proxyobj, method)(ctxt, **kwargs)
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 413, in decorated_function
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp     return function(self, context, *args, **kwargs)
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 90, in wrapped
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp     payload)
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 73, in wrapped
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp     return f(self, context, *args, **kw)
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 303, in decorated_function
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp     pass
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 289, in decorated_function
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp     return function(self, context, *args, **kwargs)
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 354, in decorated_function
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp     function(self, context, *args, **kwargs)
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 331, in decorated_function
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp     e, sys.exc_info())
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 318, in decorated_function
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp     return function(self, context, *args, **kwargs)
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2219, in terminate_instance
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp     do_terminate_instance(instance, bdms, clean_shutdown)
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/lockutils.py", line 248, in inner
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp     return f(*args, **kwargs)
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2211, in do_terminate_instance
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp     reservations=reservations)
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/hooks.py", line 105, in inner
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp     rv = f(*args, **kwargs)
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2182, in _delete_instance
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp     user_id=user_id)
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2154, in _delete_instance
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp     clean_shutdown=clean_shutdown)
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2097, in _shutdown_instance
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp     connector)
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/volume/cinder.py", line 185, in wrapper
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp     res = method(self, ctx, volume_id, *args, **kwargs)
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/volume/cinder.py", line 293, in terminate_connection
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp     connector)
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/cinderclient/v1/volumes.py", line 368, in terminate_connection
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp     {'connector': connector})
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/cinderclient/v1/volumes.py", line 287, in _action
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp     return self.api.client.post(url, body=body)
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/cinderclient/client.py", line 210, in post
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp     return self._cs_request(url, 'POST', **kwargs)
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/cinderclient/client.py", line 199, in _cs_request
2016-12-06 14:37:11.453 148945 TRACE nova.openstack.common.rpc.amqp     raise exceptions.ConnectionError(msg)

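The traceback shows that the failure happened when nova called cinderclient (terminate_connection raised a ConnectionError), so the next thing to check is the cinder API log: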

2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault Traceback (most recent call last):
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.7/dist-packages/cinder/api/middleware/fault.py", line 77, in __call__
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault     return req.get_response(self.application)
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.7/dist-packages/webob/request.py", line 1296, in send
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault     application, catch_exc_info=False)
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.7/dist-packages/webob/request.py", line 1260, in call_application
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault     app_iter = application(self.environ, start_response)
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 144, in __call__
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault     return resp(environ, start_response)
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.7/dist-packages/keystoneclient/middleware/auth_token.py", line 598, in __call__
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault     return self.app(env, start_response)
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 144, in __call__
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault     return resp(environ, start_response)
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 144, in __call__
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault     return resp(environ, start_response)
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.7/dist-packages/routes/middleware.py", line 131, in __call__
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault     response = self.app(environ, start_response)
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 144, in __call__
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault     return resp(environ, start_response)
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 130, in __call__
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault     resp = self.call_func(req, *args, **self.kwargs)
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 195, in call_func
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault     return self.func(req, *args, **kwargs)
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.7/dist-packages/cinder/api/openstack/wsgi.py", line 898, in __call__
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault     content_type, body, accept)
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.7/dist-packages/cinder/api/openstack/wsgi.py", line 946, in _process_stack
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault     action_result = self.dispatch(meth, request, action_args)
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.7/dist-packages/cinder/api/openstack/wsgi.py", line 1022, in dispatch
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault     return method(req=request, **action_args)
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.7/dist-packages/cinder/api/contrib/volume_actions.py", line 172, in _terminate_connection
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault     self.volume_api.terminate_connection(context, volume, connector)
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.7/dist-packages/cinder/volume/api.py", line 78, in wrapped
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault     return func(self, context, target_obj, *args, **kwargs)
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.7/dist-packages/cinder/volume/api.py", line 473, in terminate_connection
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault     force)
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.7/dist-packages/cinder/volume/rpcapi.py", line 147, in terminate_connection
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault     volume['host']))
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault   File "/usr/lib/python2.7/dist-packages/cinder/openstack/common/rpc/proxy.py", line 129, in call
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault     exc.info, real_topic, msg.get('method'))
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault Timeout: Timeout while waiting on RPC response - topic: "cinder-volume:nova43.openstack.org@ceph", RPC method: "terminate_connection" info: ""
2016-12-06 14:38:50.567 129759 TRACE cinder.api.middleware.fault
2016-12-06 14:38:50.568 129759 INFO cinder.api.middleware.fault [req-7a37b38e-1195-4cf2-874d-f96536296587 d8385714a8374c2395a899a6450ef22b 73ba41a9f57e4af8b6a362295ab92b4a] http://10.185.0.253:8776/v1/73ba41a9f57e4af8b6a362295ab92b4a/volumes/b3a7925e-e2de-4b4a-94e9-380dc10b0e36/action returned with HTTP 500

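The cause is now clear: the cinder API's RPC call to the cinder-volume service that owns the volume timed out, so the API returned HTTP 500. Next, check the state of the service on that node.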
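The Timeout line already names the service to look at. As a rough sketch, the topic and RPC method can be extracted with a regular expression. The pattern below is written against the single log line shown above and is an assumption about its format, not a documented Cinder log schema.

```python
import re

# The Timeout line from the cinder API log above (timestamp prefix dropped).
timeout_line = (
    'Timeout: Timeout while waiting on RPC response - topic: '
    '"cinder-volume:nova43.openstack.org@ceph", '
    'RPC method: "terminate_connection" info: ""'
)

# Assumed layout: topic is "<binary>:<host>", followed by the RPC method name.
TIMEOUT_PATTERN = re.compile(
    r'topic: "(?P<binary>[^:"]+):(?P<host>[^"]+)", '
    r'RPC method: "(?P<method>[^"]+)"'
)


def parse_rpc_timeout(line):
    """Return a dict with binary, host and method, or None if no match."""
    match = TIMEOUT_PATTERN.search(line)
    return match.groupdict() if match else None


info = parse_rpc_timeout(timeout_line)
print(info)
```

The host value extracted this way is the string to grep for in the service list.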

As shown, the service is indeed down:

lihui@MacBook  ~/server/source_txt  cinder service-list | grep nova43.openstack.org
|  98 |  cinder-volume   |  nova43.openstack.org@ceph | test1 | enabled  |  down | 2016-11-23T09:12:32.000000 |
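The same row can be checked in a script. The sketch below splits the row into cells and compares the last update time with the time of the HTTP 500 in the cinder API log. The column names are my assumption, since the grep output carries no header row, and parse_service_row is a hypothetical helper.

```python
from datetime import datetime

# Row copied from the `cinder service-list` output above.
ROW = ("|  98 |  cinder-volume   |  nova43.openstack.org@ceph | test1 "
       "| enabled  |  down | 2016-11-23T09:12:32.000000 |")

# Assumed column names; the grep output above does not include the header.
COLUMNS = ("id", "binary", "host", "zone", "status", "state", "updated_at")


def parse_service_row(line):
    """Split one table row into a dict keyed by the assumed column names."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    return dict(zip(COLUMNS, cells))


service = parse_service_row(ROW)
last_update = datetime.strptime(service["updated_at"], "%Y-%m-%dT%H:%M:%S.%f")

# Time of the HTTP 500 in the cinder API log above.
api_error_time = datetime(2016, 12, 6, 14, 38, 50)
stale_for = api_error_time - last_update

print(service["host"], service["state"], "stale for", stale_for.days, "days")
```

In this case the service had not reported in for about 13 days before the delete was attempted, so the problem long predates the delete request.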

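So the root cause is this: when nova receives the request to delete the VM, it first has to detach the cloud disk attached to that VM. The cinder-volume service for the volume was down at that moment, so the detach could not complete and an error was returned, which in turn meant the VM could not be deleted normally.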
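To make the chain of events concrete, here is a toy model of the flow. It is not Nova's code: all names are made up, and it assumes that one of the decorators seen in the traceback (the decorated_function frames in manager.py) resets task_state when the delete raises, which matches the behaviour observed above.

```python
import functools


class VolumeServiceDown(Exception):
    """Stand-in for the cinderclient ConnectionError seen in the traceback."""


def reverts_task_state(func):
    """Toy decorator: clear task_state if the wrapped step raises."""
    @functools.wraps(func)
    def wrapper(instance, *args, **kwargs):
        try:
            return func(instance, *args, **kwargs)
        except Exception:
            instance["task_state"] = None
            raise
    return wrapper


def terminate_connection(volume_id, volume_service_up):
    """Pretend call to the volume service; fails when the service is down."""
    if not volume_service_up:
        raise VolumeServiceDown("no RPC reply for volume %s" % volume_id)


@reverts_task_state
def terminate_instance(instance, volume_service_up):
    """Detach every volume first, then mark the instance as deleted."""
    instance["task_state"] = "deleting"
    for volume_id in instance["volumes"]:
        terminate_connection(volume_id, volume_service_up)
    instance["vm_state"] = "deleted"
    instance["task_state"] = None


instance = {
    "vm_state": "active",
    "task_state": None,
    "volumes": ["b3a7925e-e2de-4b4a-94e9-380dc10b0e36"],
}

try:
    terminate_instance(instance, volume_service_up=False)
except VolumeServiceDown as exc:
    print("delete failed:", exc)

print(instance["vm_state"], instance["task_state"])
```

With the volume service down, the instance ends up active with no task state, which is exactly what the second nova show output displays. Calling terminate_instance again with volume_service_up=True lets the delete run to completion.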

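To clean up the VM and its cloud disk resources, the volume has to be reassigned to a different cinder-volume service, that is, migrated (through a non-standard interface):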

lihui@MacBook  ~/server/source_txt  cinder host-volumes-migrate --force-volumes-migrate=True nova43.openstack.org@ceph nova34.openstack.org@ceph

After the migration, the volume's cinder-volume host has changed to the target node, as the os-vol-host-attr:host field shows:

lihui@MacBook  ~/server/source_txt  cinder show b3a7925e-e2de-4b4a-94e9-380dc10b0e36
+---------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                   Property                  |                                                                                                       Value                                                                                                       |
+---------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                 attachments                 | [{u'device': u'/dev/nbs/xdjo', u'server_id': u'83e35fec-6da8-44c0-9211-e72e8ab95c61', u'id': u'b3a7925e-e2de-4b4a-94e9-380dc10b0e36', u'host_name': None, u'volume_id': u'b3a7925e-e2de-4b4a-94e9-380dc10b0e36'}] |
|              availability_zone              |                                                                                                       test1                                                                                                       |
|                   bootable                  |                                                                                                       false                                                                                                       |
|                  created_at                 |                                                                                             2016-11-21T06:49:16.000000                                                                                            |
|             display_description             |                                                                                                        None                                                                                                       |
|                 display_name                |                                                                                                    lihui-ceph-100                                                                                                   |
|                      id                     |                                                                                        b3a7925e-e2de-4b4a-94e9-380dc10b0e36                                                                                       |
|                   metadata                  |                                                                                  {u'readonly': u'False', u'attached_mode': u'rw'}                                                                                 |
|            os-vol-host-attr:host            |                                                                                            nova34.openstack.org@ceph                                                                                           |
|        os-vol-mig-status-attr:migstat       |                                                                                                        None                                                                                                       |
|        os-vol-mig-status-attr:name_id       |                                                                                                        None                                                                                                       |
|      os-vol-provider-attr:provider_auth     |                                                                                                        None                                                                                                       |
|    os-vol-provider-attr:provider_geometry   |                                                                                                        None                                                                                                       |
|    os-vol-provider-attr:provider_location   |                                                                                                        None                                                                                                       |
| os-vol-provider-attr:provider_pool_location |                                                                                                        None                                                                                                       |
|         os-vol-tenant-attr:tenant_id        |                                                                                          9bcf446410594faf884aaab076f43cbf                                                                                         |
|                     size                    |                                                                                                        100                                                                                                        |
|                 snapshot_id                 |                                                                                                        None                                                                                                       |
|                 source_volid                |                                                                                                        None                                                                                                       |
|                    status                   |                                                                                                       in-use                                                                                                      |
|                  volume_qos                 |                                                         {u'read_bps': u'86558041', u'write_bps': u'86558041', u'read_iops': u'122', u'write_iops': u'204'}                                                        |
|                 volume_type                 |                                                                                                      ceph_bad                                                                                                     |
+---------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Finally, delete the VM again, and this time it goes through:

lihui@MacBook  ~/server/source_txt  nova delete 83e35fec-6da8-44c0-9211-e72e8ab95c61
 ✘ lihui@MacBook  ~/server/source_txt  nova show 83e35fec-6da8-44c0-9211-e72e8ab95c61
ERROR: No server with a name or ID of '83e35fec-6da8-44c0-9211-e72e8ab95c61' exists.

 

 

 
