There are several handy tools for monitoring disk I/O reads and writes; here is a quick summary.
1:iotop
As the name suggests, it is top with io in front. It is easy to install (just install the package) and just as easy to run.
~# iotop -o
Total DISK READ: 0.00 B/s | Total DISK WRITE: 664.62 M/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
 8410 be/4 root        0.00 B/s    0.00 B/s  0.00 % 77.37 % dd if=/dev/zero of=/root/1Gb.file bs=1M count=1000
 1998 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.15 % [flush-254:0]
 2230 be/4 root        0.00 B/s  456.74 K/s  0.00 %  0.00 % rsyslogd -c5
The -o option lists only the processes that are actually doing I/O.
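When iotop is not installed, a similar per-process view can be approximated from /proc/<pid>/io, which on Linux kernels with task I/O accounting enabled exposes counters such as read_bytes and write_bytes. The following is a minimal Python sketch under that assumption, not a description of how iotop itself works; the helper names and the snapshot values are made up for illustration.

```python
def parse_proc_io(text):
    """Parse the 'key: value' lines of a /proc/<pid>/io file into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def write_rate(before, after, interval_s):
    """Bytes written per second between two snapshots taken interval_s apart."""
    return (after["write_bytes"] - before["write_bytes"]) / interval_s


# Made-up snapshots for illustration. On a real system, read /proc/<pid>/io
# twice, one second apart (root is needed for other users' processes).
snap1 = parse_proc_io("rchar: 100\nwchar: 200\nread_bytes: 0\nwrite_bytes: 4096\n")
snap2 = parse_proc_io("rchar: 100\nwchar: 900\nread_bytes: 0\nwrite_bytes: 1052672\n")
print(write_rate(snap1, snap2, 1.0))
```

Looping this over every numeric directory in /proc and sorting by rate would give a rough equivalent of the iotop -o listing.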
2:iostat
On Debian 7 this one is a bit harder to find. apt-cache search turns it up, and the package to install appears to be this one:
sysstat - system performance tools for Linux
Along with the I/O figures, it also prints the device they belong to.
~# iostat -d -m 1 5
Linux 3.2.0-4-amd64 (haitao-47)   11/07/2015   _x86_64_   (32 CPU)

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
vda             122.72         0.00        60.25        359    7841638

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
vda             359.00         0.00       178.71          0        178

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
vda             619.00         0.00       308.50          0        308

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
vda             473.00         0.00       236.50          0        236

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
vda             805.00         0.00       402.50          0        402
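One thing the tps and MB_wrtn/s columns give you together is a rough average size per transfer. The Python sketch below works that out for the four per-second reports above, skipping the first report because it covers the time since boot. It assumes 1 MB is 1024 KB, and the function name is made up for illustration.

```python
def avg_transfer_kb(tps, mb_per_s):
    """Average size of one transfer in KB, from an iostat tps / MB-per-second pair."""
    if tps == 0:
        return 0.0
    return mb_per_s / tps * 1024


# (tps, MB_wrtn/s) for vda, copied from the four per-second reports above.
samples = [(359.00, 178.71), (619.00, 308.50), (473.00, 236.50), (805.00, 402.50)]
for tps, mb_wrtn in samples:
    size_kb = avg_transfer_kb(tps, mb_wrtn)
    print(f"{tps:8.2f} tps  {mb_wrtn:8.2f} MB/s  ~{size_kb:.0f} KB per transfer")
```

For this workload every sample comes out at roughly half a megabyte per transfer.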
3:iodump
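This is a script written in Perl, and its output is a little unconventional. First prepare the system: clear the dmesg buffer, stop klogd, and turn on the block_dump switch. Once block_dump is enabled, the kernel logs every I/O operation, and the Perl script then analyzes those log messages.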
#!/usr/bin/env perl

=pod

=head1 NAME

iodump - Compute per-PID I/O stats for Linux when iotop/pidstat/iopp are not available.

=head1 SYNOPSIS

Prepare the system:

  dmesg -c
  /etc/init.d/klogd stop
  echo 1 > /proc/sys/vm/block_dump

Start the reporting:

  while true; do sleep 1; dmesg -c; done | perl iodump
  CTRL-C

Stop the system from dumping these messages:

  echo 0 > /proc/sys/vm/block_dump
  /etc/init.d/klogd start

=head1 AUTHOR

Baron Schwartz, inspired by L

=head1 LICENSE

This software is released to the public domain, with no guarantees whatsoever.

=cut

use strict;
use warnings FATAL => 'all';
use English qw(-no_match_vars);
use sigtrap qw(handler finish untrapped normal-signals);

my %tasks;
my $oktorun = 1;
my $line;
while ( $oktorun && (defined ($line = <>)) ) {
   my ( $task, $pid, $activity, $where, $device );
   ( $task, $pid, $activity, $where, $device )
      = $line =~ m/(\S+)\((\d+)\): (READ|WRITE) block (\d+) on (\S+)/;
   if ( !$task ) {
      ( $task, $pid, $activity, $where, $device )
         = $line =~ m/(\S+)\((\d+)\): (dirtied) inode \(.*?\) (\d+) on (\S+)/;
   }
   if ( $task ) {
      my $s = $tasks{$pid} ||= { pid => $pid, task => $task };
      ++$s->{lc $activity};
      ++$s->{activity};
      ++$s->{devices}->{$device};
   }
}

printf("%-15s %10s %10s %10s %10s %10s %s\n",
   qw(TASK PID TOTAL READ WRITE DIRTY DEVICES));
foreach my $task (
   reverse sort { $a->{activity} <=> $b->{activity} } values %tasks
) {
   printf("%-15s %10d %10d %10d %10d %10d %s\n",
      $task->{task}, $task->{pid},
      ($task->{'activity'} || 0),
      ($task->{'read'}     || 0),
      ($task->{'write'}    || 0),
      ($task->{'dirty'}    || 0),
      join(', ', keys %{$task->{devices}}));
}

sub finish {
   my ( $signal ) = @_;
   if ( $oktorun ) {
      print STDERR "# Caught SIG$signal.\n";
      $oktorun = 0;
   }
   else {
      print STDERR "# Exiting on SIG$signal.\n";
      exit(1);
   }
}
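To make the matching logic easier to follow, here is a rough Python sketch of the same per-PID counting. It reuses the two regular expressions from the Perl script as they appear above; the function name, the dictionary layout, and the sample lines are made up for illustration and are not taken from a real kernel log.

```python
import re
from collections import defaultdict

# Same two patterns as the Perl script: block reads/writes, then dirtied inodes.
BLOCK_RE = re.compile(r"(\S+)\((\d+)\): (READ|WRITE) block (\d+) on (\S+)")
DIRTY_RE = re.compile(r"(\S+)\((\d+)\): (dirtied) inode \(.*?\) (\d+) on (\S+)")


def count_io(lines):
    """Return {pid: stats} with per-activity counts, similar to iodump's %tasks hash."""
    tasks = {}
    for line in lines:
        m = BLOCK_RE.search(line) or DIRTY_RE.search(line)
        if not m:
            continue  # not a block_dump message
        task, pid, activity, _where, device = m.groups()
        s = tasks.setdefault(
            pid, {"task": task, "counts": defaultdict(int), "devices": set()}
        )
        s["counts"][activity.lower()] += 1  # one count per matched log line
        s["counts"]["total"] += 1
        s["devices"].add(device)
    return tasks


# Illustrative lines in the shape the regexes expect; not from a real log.
sample = [
    "dd(10679): WRITE block 1024 on vda1",
    "dd(10679): WRITE block 1032 on vda1",
    "rsyslogd(2230): dirtied inode (messages) 652812 on vda1",
    "unrelated kernel message",
]
for pid, s in count_io(sample).items():
    print(s["task"], pid, dict(s["counts"]), sorted(s["devices"]))
```

Each matched log line adds one to the counters, so the totals are numbers of logged operations rather than byte counts.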
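iodump only prints its report once it catches a signal, which feels a little odd. Still, plenty of people recommend it highly, so let's run it the way its documentation says: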
while true; do sleep 1; dmesg -c; done | perl iodump.pl
Press Ctrl+C at the end to get the results:
~# while true; do sleep 1; dmesg -c; done | perl iodump.pl
^C# Caught SIGINT.
TASK                   PID      TOTAL       READ      WRITE      DIRTY DEVICES
dd                   10679       2001          0       2001          0 vda1
dd                   10550       2001          0       2001          0 vda1
dd                   10711       2001          0       2001          0 vda1
dd                   10560       2001          0       2001          0 vda1
dd                   10567       2001          0       2001          0 vda1
dd                   10655       2001          0       2001          0 vda1
dd                   10625       2001          0       2001          0 vda1
dd                   10645       2001          0       2001          0 vda1
dd                   10634       2001          0       2001          0 vda1
dd                   10548       1428          0       1428          0 vda1
flush-254:0           1998        613          0        613          0 vda1
dd                   10731        602          0        602          0 vda1
jbd2/vda1-8            370        331          0        331          0 vda1
sendmail             10595          2          0          2          0 vda1
exim4                10614          1          0          1          0 vda1
python               10605          1          0          1          0 vda1
exim4                10616          1          0          1          0 vda1
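Note that these figures count block operations, not bytes. The block size is fixed when the file system is created, and it can be checked with this command: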
~# stat /boot
  File: `/boot'
  Size: 4096       Blocks: 8          IO Block: 4096   directory
Device: fe01h/65025d    Inode: 652812      Links: 3
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-11-06 17:49:27.866159150 +0800
Modify: 2015-11-07 14:20:21.374693644 +0800
Change: 2015-11-07 14:20:21.374693644 +0800
 Birth: -
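The same block size information can be read from a script. The Python sketch below uses os.stat and os.statvfs from the standard library (Unix only). The function name is made up, and it queries "/" rather than /boot only so that it runs on any machine.

```python
import os


def block_sizes(path):
    """Return (preferred I/O block size, file system block size) in bytes for path."""
    st = os.stat(path)      # st_blksize is what stat(1) prints as "IO Block"
    vfs = os.statvfs(path)  # f_bsize is the block size reported by the file system
    return st.st_blksize, vfs.f_bsize


io_block, fs_block = block_sizes("/")
print(f"IO Block: {io_block} bytes, file system block size: {fs_block} bytes")
```

On a typical ext file system both values are 4096, matching the stat output above, but other file systems may report different numbers.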
4:dstat
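This one is probably the nicest of the lot, and it is handy for watching network traffic too. Installation is simple (just install the package), and so is running it.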
~# dstat
You did not select any stats, using -cdngy by default.
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  0   1  98   0   0   0|2862B   63M|   0     0 |   0     0 |  11k   17k
  0   1  97   2   0   0|   0   371M| 652M  965k|   0     0 |  10k   15k
  0   1  97   1   0   0|   0   114M| 668M 1011k|   0     0 |9942    16k
  0   1  98   1   0   0|   0   115M| 680M 1036k|   0     0 |9914    17k
  0   2  97   1   0   0|   0   152M| 652M  979k|   0     0 |  10k   16k
  0   2  97   0   0   1|   0     0 | 701M 1089k|   0     0 |  13k   18k
  0   2  97   0   0   1|   0     0 | 700M 1077k|   0     0 |  13k   18k
  0   2  97   1   0   0|   0   293M| 641M  962k|   0     0 |  10k   16k
  0   1  97   1   0   0|   0   355M| 599M  891k|   0     0 |8955    14k
  0   1  97   2   0   0|   0   344M| 634M  946k|   0     0 |  11k   16k
  0   4  96   0   0   0|   0  8196k| 688M 1042k|   0     0 |  11k   17k
  0   2  97   1   0   0|   0   203M| 642M  962k|   0     0 |  10k   16k
  0   1  97   1   0   0|   0   391M| 608M  878k|   0     0 |9231    14k
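As you can see, there are not only disk writes at this point but also network traffic, with roughly 600 to 700 MB received in each one-second sample.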
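To compare the disk and network columns numerically, the suffixed values have to be turned into plain byte counts first. The Python sketch below does that under the assumption that dstat's k, M and G suffixes are 1024-based; the function name is made up, and the three sample pairs are copied from the output above.

```python
# Assumption: dstat's unit suffixes are powers of 1024.
UNITS = {"B": 1, "k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def dstat_to_bytes(field):
    """Convert a dstat cell such as '371M', '8196k', '2862B' or '0' to bytes."""
    field = field.strip()
    if field and field[-1] in UNITS:
        return int(float(field[:-1]) * UNITS[field[-1]])
    return int(float(field))


# (disk writ, net recv) pairs taken from three rows of the sample above.
rows = [("371M", "652M"), ("8196k", "688M"), ("0", "701M")]
for writ, recv in rows:
    print(f"disk write {dstat_to_bytes(writ):>12} B   net recv {dstat_to_bytes(recv):>12} B")
```

With both columns in bytes it is easy to see that the inbound network rate stays fairly steady while the disk writes come in bursts.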