blktrace的软件包。
[root@localhost ~]# yum provides blktrace
blktrace-1.2.0-10.el8.x86_64 : Utilities for performing block layer IO tracing in the Linux kernel
Repo : @System
Matched from:
Provide : blktrace = 1.2.0-10.el8
blktrace-1.2.0-10.el8.x86_64 : Utilities for performing block layer IO tracing in the Linux kernel
Repo : BaseOS
Matched from:
Provide : blktrace = 1.2.0-10.el8
blktrace的作用
blktrace是一种块层IO跟踪机制,它提供了关于请求队列操作的详细信息,直至用户空间。有三个主要组件:内核组件、记录内核到用户空间的i/o跟踪信息的实用程序,以及分析和查看跟踪信息的实用程序。此手册页描述了blktrace,它将特定块设备的i/o事件跟踪信息记录到文件中。
DESCRIPTION
blktrace is a block layer IO tracing mechanism which provides detailed information about request queue operations up to user space. There are three major components: a kernel component, a utility
to record the i/o trace information for the kernel to user space, and utilities to analyse and view the trace information. This man page describes blktrace, which records the i/o event trace
information for a specific block device to a file.
The blktrace utility extracts event traces from the kernel (via the relaying through the debug file system). Some background details concerning the run-time behaviour of blktrace will help to
understand some of the more arcane command line options:
- blktrace receives data from the kernel in buffers passed up through the debug file system (relay). Each device being traced has a file created in the mounted directory for the debugfs, which
defaults to /sys/kernel/debug -- this can be overridden with the -r command line argument.
- blktrace defaults to collecting all events that can be traced. To limit the events being captured, you can specify one or more filter masks via the -a option.
Alternatively, one may specify the entire mask utilising a hexadecimal value that is version-specific. (Requires understanding of the internal representation of the filter mask.)
- As noted above, the events are passed up via a series of buffers stored into debugfs files. The size and number of buffers can be specified via the -b and -n arguments respectively.
- blktrace stores the extracted data into files stored in the local directory. The format of the file names is (by default) device.blktrace.cpu, where device is the base device name (e.g, if we
are tracing /dev/sda, the base device name would be sda); and cpu identifies a CPU for the event stream.
The device portion of the event file name can be changed via the -o option.
- blktrace may also be run concurrently with blkparse to produce live output -- to do this specify -o - for blktrace.
- The default behaviour for blktrace is to run forever until explicitly killed by the user (via a control-C, or sending SIGINT signal to the process via invocation the kill (1) utility). Also you
can specify a run-time duration for blktrace via the -w option -- then blktrace will run for the specified number of seconds, and then halt.
blktrace的使用
debugfs
blktrace需要debugfs的支持,如果系统提示debugfs没有mount,需先mount。
执行 mount -t debugfs none /sys/kernel/debug 再执行blktrace命令。
blktrace、blkparse、btt
这三个命令都属于blktrace这个软件包,是一家人。
blktrace实时查看硬盘的信息
[root@localhost blktrace]# blktrace -d /dev/nvme0n1 -o- |blkparse -i -
259,0 58 1 0.000000000 38746 A WSM 1819522465 + 3 <- (253,0) 52456865
259,0 58 2 0.000001051 38746 A WSM 1822850465 + 3 <- (259,3) 1819522465
259,0 143 10 0.000419495 0 C WM 1850159232 + 32 [0]
^CCPU58 (259,0):
Reads Queued: 0, 0KiB Writes Queued: 1, 1KiB
Read Dispatches: 0, 0KiB Write Dispatches: 1, 1KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 0, 0KiB Writes Completed: 0, 0KiB
Read Merges: 0, 0KiB Write Merges: 0, 0KiB
Read depth: 0 Write depth: 10
IO unplugs: 1 Timer unplugs: 0
CPU141 (259,0):
Reads Queued: 0, 0KiB Writes Queued: 9, 92KiB
Read Dispatches: 0, 0KiB Write Dispatches: 9, 92KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 0, 0KiB Writes Completed: 0, 0KiB
Read Merges: 0, 0KiB Write Merges: 0, 0KiB
Read depth: 0 Write depth: 10
IO unplugs: 1 Timer unplugs: 0
CPU143 (259,0):
Reads Queued: 0, 0KiB Writes Queued: 0, 0KiB
Read Dispatches: 0, 0KiB Write Dispatches: 0, 0KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 0, 0KiB Writes Completed: 10, 94KiB
Read Merges: 0, 0KiB Write Merges: 0, 0KiB
Read depth: 0 Write depth: 10
IO unplugs: 0 Timer unplugs: 0
Total (259,0):
Reads Queued: 0, 0KiB Writes Queued: 10, 94KiB
Read Dispatches: 0, 0KiB Write Dispatches: 10, 94KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 0, 0KiB Writes Completed: 10, 94KiB
Read Merges: 0, 0KiB Write Merges: 0, 0KiB
IO unplugs: 2 Timer unplugs: 0
Throughput (R/W): 0KiB/s / 0KiB/s
Events (259,0): 64 entries
Skips: 0 forward (0 - 0.0%)
blktrace收集硬盘的信息
可以先用如下命令采集信息,待所有信息采集完毕后,统一分析所有采集到的数据。搜集信息的命令如下:
blktrace -d /dev/sdb -w 300
注意,这个命令并不是只输出一个文件,他会根据CPU的个数上,每一个CPU都会输出一个文件,如下所示:
[root@localhost home]# blktrace -d /dev/nvme0n1 -w 10
=== nvme0n1 ===
CPU 0: 0 events, 0 KiB data
CPU 1: 0 events, 0 KiB data
CPU 2: 0 events, 0 KiB data
CPU142: 0 events, 0 KiB data
CPU143: 5 events, 1 KiB data
Total: 43 events (dropped 0), 3 KiB data
[root@localhost home]# ls |grep -i nvme |wc -l
144
[root@localhost home]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 144
[root@localhost home]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 144
blkparse 分析采集的数据
[root@localhost home]# blkparse -i nvme0n1
Input file nvme0n1.blktrace.22 added
Input file nvme0n1.blktrace.138 added
Input file nvme0n1.blktrace.139 added
Input file nvme0n1.blktrace.143 added
259,0 138 1 0.000000000 41470 A WSM 887728752 + 23 <- (253,2) 879338096
259,0 138 2 0.000000428 41470 A WSM 891056752 + 23 <- (259,3) 887728752
259,0 138 3 0.000001042 41470 Q WSM 891056752 + 23 [kworker/138:0]
Total (nvme0n1):
Reads Queued: 0, 0KiB Writes Queued: 6, 79KiB
Read Dispatches: 0, 0KiB Write Dispatches: 3, 79KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 0, 0KiB Writes Completed: 3, 79KiB
Read Merges: 0, 0KiB Write Merges: 3, 48KiB
IO unplugs: 2 Timer unplugs: 0
Throughput (R/W): 0KiB/s / 0KiB/s
Events (nvme0n1): 34 entries
Skips: 0 forward (0 - 0.0%)
但blkparse 分析的内容仍然太多,太杂,不利于人的理解,这个时候可以使用btt工具了就。
btt工具的使用
首先使用gblkparse将不同cpu的多个文件聚合成一个文件。
[root@localhost home]# blkparse -i nvme0n1 -d nvme0n1.blktrace.bin
Input file nvme0n1.blktrace.22 added
Input file nvme0n1.blktrace.138 added
Input file nvme0n1.blktrace.139 added
Input file nvme0n1.blktrace.143 added
259,0 138 1 0.000000000 41470 A WSM 887728752 + 23 <- (253,2) 879338096
259,0 138 2 0.000000428 41470 A WSM 891056752 + 23 <- (259,3) 887728752
Total (nvme0n1):
Reads Queued: 0, 0KiB Writes Queued: 6, 79KiB
Read Dispatches: 0, 0KiB Write Dispatches: 3, 79KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 0, 0KiB Writes Completed: 3, 79KiB
Read Merges: 0, 0KiB Write Merges: 3, 48KiB
IO unplugs: 2 Timer unplugs: 0
Throughput (R/W): 0KiB/s / 0KiB/s
Events (nvme0n1): 34 entries
Skips: 0 forward (0 - 0.0%)
[root@localhost home]# ll |grep *.bin
-rw-r--r--. 1 root root 2152 Jul 3 18:54 nvme0n1.blktrace.bin
然后btt就可以分析这个sdb.blktrace.bin了:
[root@localhost home]# btt -i nvme0n1.blktrace.bin
==================== All Devices ====================
ALL MIN AVG MAX N
--------------- ------------- ------------- ------------- -----------
Q2Q 0.000004206 0.000073153 0.000339809 5
Q2G 0.000001058 0.000004088 0.000006848 3
Q2M 0.000000636 0.000001158 0.000002053 3
M2D 0.000009427 0.000014686 0.000019899 3
D2C 0.000080536 0.000092041 0.000108863 6
Q2C 0.000096052 0.000110561 0.000123132 6
==================== Device Overhead ====================
DEV | Q2G G2I Q2M I2D D2C
---------- | --------- --------- --------- --------- ---------
(259, 0) | 1.8488% 0.0000% 0.5237% 0.0000% 83.2490%
---------- | --------- --------- --------- --------- ---------
Overall | 1.8488% 0.0000% 0.5237% 0.0000% 83.2490%
==================== Device Merge Information ====================
DEV | #Q #D Ratio | BLKmin BLKavg BLKmax Total
---------- | -------- -------- ------- | -------- -------- -------- --------
(259, 0) | 6 3 2.0 | 23 53 72 159
==================== Device Q2Q Seek Information ====================
DEV | NSEEKS MEAN MEDIAN | MODE
---------- | --------------- --------------- --------------- | ---------------
(259, 0) | 6 295065809.3 0 | 0(3)
---------- | --------------- --------------- --------------- | ---------------
Overall | NSEEKS MEAN MEDIAN | MODE
Average | 6 295065809.3 0 | 0(3)
==================== Device D2D Seek Information ====================
DEV | NSEEKS MEAN MEDIAN | MODE
---------- | --------------- --------------- --------------- | ---------------
(259, 0) | 3 590131618.7 879337976 | 128(1)
...(2 more)
---------- | --------------- --------------- --------------- | ---------------
Overall | NSEEKS MEAN MEDIAN | MODE
Average | 3 590131618.7 879337976 | 128(1)
...(2 more)
==================== Plug Information ====================
# xfsaild : q activity
0.000340851 3.0
0.000340851 3.4
0.000366808 3.4
0.000366808 3.0
# xfsaild : c activity
blktrace结果的分析
blkparse显示的各指标点示意
Q------->G------------>I--------->M------------------->D--------------------------------->C
|-Q time-|-Insert time-|
|--------- merge time ------------|-merge with other IO|
|----------------scheduler time time-------------------|---driver,adapter,storagetime---|
|----------------------- await time in iostat output -----------------------------------------|
各个指标含义
Q2Q — time between requests sent to the block layer
Q2G — time from a block I/O is queued to the time it gets a request allocated for it
G2I — time from a request is allocated to the time it is Inserted into the device's queue
Q2M — time from a block I/O is queued to the time it gets merged with an existing request
I2D — time from a request is inserted into the device's queue to the time it is actually issued to the device
M2D — time from a block I/O is merged with an exiting request until the request is issued to the device
D2C — service time of the request by the device
Q2C — total time spent in the block layer for a request
Q2I – time it takes to process an I/O prior to it being inserted or merged onto a request queue – Includes split, and remap time
I2D – time the I/O is “idle” on the request queue
D2C – time the I/O is “active” in the driver and on the device
Q2I + I2D + D2C = Q2C
Q2C: Total processing time of the I/O
案例
后续补充。