blktrace简单介绍

blktrace的软件包。

[root@localhost ~]# yum provides blktrace
blktrace-1.2.0-10.el8.x86_64 : Utilities for performing block layer IO tracing in the Linux kernel
Repo        : @System
Matched from:
Provide    : blktrace = 1.2.0-10.el8

blktrace-1.2.0-10.el8.x86_64 : Utilities for performing block layer IO tracing in the Linux kernel
Repo        : BaseOS
Matched from:
Provide    : blktrace = 1.2.0-10.el8

blktrace的作用

blktrace是一种块层IO跟踪机制,它提供了关于请求队列操作的详细信息,直至用户空间。有三个主要组件:内核组件、记录内核到用户空间的i/o跟踪信息的实用程序,以及分析和查看跟踪信息的实用程序。此手册页描述了blktrace,它将特定块设备的i/o事件跟踪信息记录到文件中。

DESCRIPTION
       blktrace is a block layer IO tracing mechanism which provides detailed information about request queue operations up to user space. There are three major components: a kernel component, a utility
       to record the i/o trace information for the kernel to user space, and utilities to analyse and view the trace information.  This man page describes blktrace, which records  the  i/o  event  trace
       information for a specific block device to a file.

       The  blktrace  utility  extracts  event traces from the kernel (via the relaying through the debug file system). Some background details concerning the run-time behaviour of blktrace will help to
       understand some of the more arcane command line options:

       - blktrace receives data from the kernel in buffers passed up through the debug file system (relay). Each device being traced has a file created in the mounted directory for  the  debugfs,  which
         defaults to /sys/kernel/debug -- this can be overridden with the -r command line argument.

       - blktrace defaults to collecting all events that can be traced. To limit the events being captured, you can specify one or more filter masks via the -a option.

         Alternatively, one may specify the entire mask utilising a hexadecimal value that is version-specific. (Requires understanding of the internal representation of the filter mask.)

       - As noted above, the events are passed up via a series of buffers stored into debugfs files. The size and number of buffers can be specified via the -b and -n arguments respectively.

       - blktrace  stores  the extracted data into files stored in the local directory. The format of the file names is (by default) device.blktrace.cpu, where device is the base device name (e.g, if we
         are tracing /dev/sda, the base device name would be sda); and cpu identifies a CPU for the event stream.

         The device portion of the event file name can be changed via the -o option.

       - blktrace may also be run concurrently with blkparse to produce live output -- to do this specify -o - for blktrace.

       - The default behaviour for blktrace is to run forever until explicitly killed by the user (via a control-C, or sending SIGINT signal to the process via invocation the kill (1) utility). Also you
         can specify a run-time duration for blktrace via the -w option -- then blktrace will run for the specified number of seconds, and then halt.

blktrace的使用

debugfs

blktrace需要debugfs的支持,如果系统提示debugfs没有mount,需先mount。
执行 mount -t debugfs none /sys/kernel/debug 再执行blktrace命令。

blktrace、blkparse、btt

这三个命令都属于blktrace这个软件包,是一家人。

blktrace实时查看硬盘的信息

[root@localhost blktrace]# blktrace  -d /dev/nvme0n1 -o- |blkparse -i -
259,0   58        1     0.000000000 38746  A WSM 1819522465 + 3 <- (253,0) 52456865
259,0   58        2     0.000001051 38746  A WSM 1822850465 + 3 <- (259,3) 1819522465

259,0   143       10     0.000419495     0  C  WM 1850159232 + 32 [0]
^CCPU58 (259,0):
 Reads Queued:           0,        0KiB  Writes Queued:           1,        1KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:        1,        1KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:        0,        0KiB
 Read Merges:            0,        0KiB  Write Merges:            0,        0KiB
 Read depth:             0               Write depth:            10
 IO unplugs:             1               Timer unplugs:           0
CPU141 (259,0):
 Reads Queued:           0,        0KiB  Writes Queued:           9,       92KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:        9,       92KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:        0,        0KiB
 Read Merges:            0,        0KiB  Write Merges:            0,        0KiB
 Read depth:             0               Write depth:            10
 IO unplugs:             1               Timer unplugs:           0
CPU143 (259,0):
 Reads Queued:           0,        0KiB  Writes Queued:           0,        0KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:        0,        0KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:       10,       94KiB
 Read Merges:            0,        0KiB  Write Merges:            0,        0KiB
 Read depth:             0               Write depth:            10
 IO unplugs:             0               Timer unplugs:           0

Total (259,0):
 Reads Queued:           0,        0KiB  Writes Queued:          10,       94KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:       10,       94KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:       10,       94KiB
 Read Merges:            0,        0KiB  Write Merges:            0,        0KiB
 IO unplugs:             2               Timer unplugs:           0

Throughput (R/W): 0KiB/s / 0KiB/s
Events (259,0): 64 entries
Skips: 0 forward (0 -   0.0%)

blktrace收集硬盘的信息

可以先用如下命令采集信息,待所有信息采集完毕后,统一分析所有采集到的数据。搜集信息的命令如下:

blktrace -d /dev/sdb -w 300

注意,这个命令并不是只输出一个文件,他会根据CPU的个数上,每一个CPU都会输出一个文件,如下所示:

[root@localhost home]# blktrace -d /dev/nvme0n1 -w 10
=== nvme0n1 ===
  CPU  0:                    0 events,        0 KiB data
  CPU  1:                    0 events,        0 KiB data
  CPU  2:                    0 events,        0 KiB data

  CPU142:                    0 events,        0 KiB data
  CPU143:                    5 events,        1 KiB data
  Total:                    43 events (dropped 0),        3 KiB data
[root@localhost home]# ls |grep -i nvme |wc -l
144
[root@localhost home]# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              144

[root@localhost home]# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              144

blkparse 分析采集的数据

[root@localhost home]# blkparse -i nvme0n1
Input file nvme0n1.blktrace.22 added
Input file nvme0n1.blktrace.138 added
Input file nvme0n1.blktrace.139 added
Input file nvme0n1.blktrace.143 added
259,0   138        1     0.000000000 41470  A WSM 887728752 + 23 <- (253,2) 879338096
259,0   138        2     0.000000428 41470  A WSM 891056752 + 23 <- (259,3) 887728752
259,0   138        3     0.000001042 41470  Q WSM 891056752 + 23 [kworker/138:0]


Total (nvme0n1):
 Reads Queued:           0,        0KiB  Writes Queued:           6,       79KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:        3,       79KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:        3,       79KiB
 Read Merges:            0,        0KiB  Write Merges:            3,       48KiB
 IO unplugs:             2               Timer unplugs:           0

Throughput (R/W): 0KiB/s / 0KiB/s
Events (nvme0n1): 34 entries
Skips: 0 forward (0 -   0.0%)

但blkparse 分析的内容仍然太多,太杂,不利于人的理解,这个时候可以使用btt工具了就。

btt工具的使用

首先使用gblkparse将不同cpu的多个文件聚合成一个文件。

[root@localhost home]# blkparse -i nvme0n1 -d nvme0n1.blktrace.bin
Input file nvme0n1.blktrace.22 added
Input file nvme0n1.blktrace.138 added
Input file nvme0n1.blktrace.139 added
Input file nvme0n1.blktrace.143 added
259,0   138        1     0.000000000 41470  A WSM 887728752 + 23 <- (253,2) 879338096
259,0   138        2     0.000000428 41470  A WSM 891056752 + 23 <- (259,3) 887728752


Total (nvme0n1):
 Reads Queued:           0,        0KiB  Writes Queued:           6,       79KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:        3,       79KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:        3,       79KiB
 Read Merges:            0,        0KiB  Write Merges:            3,       48KiB
 IO unplugs:             2               Timer unplugs:           0

Throughput (R/W): 0KiB/s / 0KiB/s
Events (nvme0n1): 34 entries
Skips: 0 forward (0 -   0.0%)

[root@localhost home]# ll |grep *.bin
-rw-r--r--. 1 root root 2152 Jul  3 18:54 nvme0n1.blktrace.bin

然后btt就可以分析这个sdb.blktrace.bin了:

[root@localhost home]# btt -i nvme0n1.blktrace.bin 
==================== All Devices ====================

            ALL           MIN           AVG           MAX           N
--------------- ------------- ------------- ------------- -----------

Q2Q               0.000004206   0.000073153   0.000339809           5
Q2G               0.000001058   0.000004088   0.000006848           3
Q2M               0.000000636   0.000001158   0.000002053           3
M2D               0.000009427   0.000014686   0.000019899           3
D2C               0.000080536   0.000092041   0.000108863           6
Q2C               0.000096052   0.000110561   0.000123132           6

==================== Device Overhead ====================

       DEV |       Q2G       G2I       Q2M       I2D       D2C
---------- | --------- --------- --------- --------- ---------
 (259,  0) |   1.8488%   0.0000%   0.5237%   0.0000%  83.2490%
---------- | --------- --------- --------- --------- ---------
   Overall |   1.8488%   0.0000%   0.5237%   0.0000%  83.2490%

==================== Device Merge Information ====================

       DEV |       #Q       #D   Ratio |   BLKmin   BLKavg   BLKmax    Total
---------- | -------- -------- ------- | -------- -------- -------- --------
 (259,  0) |        6        3     2.0 |       23       53       72      159

==================== Device Q2Q Seek Information ====================

       DEV |          NSEEKS            MEAN          MEDIAN | MODE           
---------- | --------------- --------------- --------------- | ---------------
 (259,  0) |               6     295065809.3               0 | 0(3)
---------- | --------------- --------------- --------------- | ---------------
   Overall |          NSEEKS            MEAN          MEDIAN | MODE           
   Average |               6     295065809.3               0 | 0(3)

==================== Device D2D Seek Information ====================

       DEV |          NSEEKS            MEAN          MEDIAN | MODE           
---------- | --------------- --------------- --------------- | ---------------
 (259,  0) |               3     590131618.7       879337976 | 128(1)
                                                               ...(2 more)
---------- | --------------- --------------- --------------- | ---------------
   Overall |          NSEEKS            MEAN          MEDIAN | MODE           
   Average |               3     590131618.7       879337976 | 128(1)
                                                               ...(2 more)


==================== Plug Information ====================

#          xfsaild : q activity
  0.000340851   3.0
  0.000340851   3.4
  0.000366808   3.4
  0.000366808   3.0

#          xfsaild : c activity

blktrace结果的分析

blkparse显示的各指标点示意

Q------->G------------>I--------->M------------------->D--------------------------------->C
|-Q time-|-Insert time-|
|--------- merge time ------------|-merge with other IO|
|----------------scheduler time time-------------------|---driver,adapter,storagetime---|

|----------------------- await time in iostat output -----------------------------------------|

各个指标含义

Q2Q — time between requests sent to the block layer
Q2G — time from a block I/O is queued to the time it gets a request allocated for it
G2I — time from a request is allocated to the time it is Inserted into the device's queue
Q2M — time from a block I/O is queued to the time it gets merged with an existing request
I2D — time from a request is inserted into the device's queue to the time it is actually issued to the device
M2D — time from a block I/O is merged with an exiting request until the request is issued to the device
D2C — service time of the request by the device
Q2C — total time spent in the block layer for a request

Q2I – time it takes to process an I/O prior to it being inserted or merged onto a request queue – Includes split, and remap time
I2D – time the I/O is “idle” on the request queue
D2C – time the I/O is “active” in the driver and on the device
Q2I + I2D + D2C = Q2C
Q2C: Total processing time of the I/O

案例

后续补充。

©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 217,657评论 6 505
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 92,889评论 3 394
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 164,057评论 0 354
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 58,509评论 1 293
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 67,562评论 6 392
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 51,443评论 1 302
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 40,251评论 3 418
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 39,129评论 0 276
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 45,561评论 1 314
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 37,779评论 3 335
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 39,902评论 1 348
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 35,621评论 5 345
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 41,220评论 3 328
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 31,838评论 0 22
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,971评论 1 269
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 48,025评论 2 370
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 44,843评论 2 354

推荐阅读更多精彩内容

  • 在Linux系统上,如果I/O发生性能问题,有没有办法进一步定位故障位置呢?iostat等最常用的工具肯定是指望不...
    pandazhong阅读 1,598评论 0 1
  • 出处: https://lwn.net/Articles/738449/The Linux block layer...
    项男阅读 805评论 0 0
  • NAME dnsmasq - A lightweight DHCP and caching DNS server....
    ximitc阅读 2,856评论 0 0
  • 文章主要列出了常见的排查工具,也有部分线上问题案例; JVM 问题排查 JDK工具包 javap 反编译工具jav...
    霸体阅读 703评论 0 0
  • http://lwn.net/Articles/438256/ Since the dawn of time, o...
    项男阅读 336评论 0 0