Symptom
Users reported high write latency on disks behind PMC RAID controllers on multiple machines. The machines run ESXi.
Analysis
Log analysis
2023-11-13T17:34:53.663Z cpu13:2098447)WARNING: ScsiDeviceIO: 1513: Device naa.600508b1001c55603c71cb71e23b8301 performance has deteriorated. I/O latency increased from average value of 25206 microseconds to 757505 microseconds.
2023-11-13T17:34:54.980Z cpu0:2098449)ScsiDeviceIO: 1513: Device naa.600508b1001c55603c71cb71e23b8301 performance has improved. I/O latency reduced from 757505 microseconds to 147922 microseconds.
2023-11-13T17:35:02.615Z cpu62:2098460)ScsiDeviceIO: 1513: Device naa.600508b1001c55603c71cb71e23b8301 performance has improved. I/O latency reduced from 147922 microseconds to 49903 microseconds.
2023-11-13T17:52:47.544Z cpu30:2098457)WARNING: ScsiDeviceIO: 1513: Device naa.600508b1001c55603c71cb71e23b8301 performance has deteriorated. I/O latency increased from average value of 25277 microseconds to 763337 microseconds.
2023-11-13T17:52:48.769Z cpu32:2098446)ScsiDeviceIO: 1513: Device naa.600508b1001c55603c71cb71e23b8301 performance has improved. I/O latency reduced from 763337 microseconds to 149346 microseconds.
2023-11-13T17:53:06.086Z cpu17:2098445)ScsiDeviceIO: 1513: Device naa.600508b1001c55603c71cb71e23b8301 performance has improved. I/O latency reduced from 149346 microseconds to 50366 microseconds.
2023-11-14T16:01:46.047Z cpu31:2098445)WARNING: ScsiDeviceIO: 1513: Device naa.600508b1001c55603c71cb71e23b8301 performance has deteriorated. I/O latency increased from average value of 25397 microseconds to 780078 microseconds.
2023-11-14T16:01:46.618Z cpu0:2098451)ScsiDeviceIO: 1513: Device naa.600508b1001c55603c71cb71e23b8301 performance has improved. I/O latency reduced from 780078 microseconds to 154412 microseconds.
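The logs confirm large I/O latency spikes: the average of about 25 ms (25206 microseconds) jumps to roughly 760 to 780 ms before recovering.
Next, check the disk latency in vCenter.
The result: every affected disk shows very high write latency, while read latency is normal.
The alarms raised in vCenter are likewise all write-latency alarms.
Conclusion:
At every layer the problem is confined to writes, and the latency values are high, so the cache layer is the prime suspect.
Cache investigation
The cache policy of the logical drive is as follows: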
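To summarize these messages across a full log rather than reading them by eye, a minimal Python sketch follows. It assumes the message format matches the lines quoted above; the function names are illustrative and not part of any ESXi tool.

```python
import re

# Pattern for the ScsiDeviceIO latency messages quoted above (format assumed
# from those samples; other ESXi builds may word the message differently).
LATENCY_RE = re.compile(
    r"^(?P<ts>\S+)\s.*Device (?P<dev>naa\.[0-9a-f]+) performance has "
    r"(?P<state>deteriorated|improved)\. I/O latency "
    r"(?:increased from average value of|reduced from) "
    r"(?P<old>\d+) microseconds to (?P<new>\d+) microseconds"
)


def parse_latency_events(lines):
    """Return one dict per latency message; non-matching lines are skipped."""
    events = []
    for line in lines:
        m = LATENCY_RE.search(line)
        if m:
            events.append({
                "timestamp": m["ts"],
                "device": m["dev"],
                "state": m["state"],
                "old_us": int(m["old"]),
                "new_us": int(m["new"]),
            })
    return events


def worst_spike(events):
    """Return the deterioration event with the highest latency, or None."""
    spikes = [e for e in events if e["state"] == "deteriorated"]
    return max(spikes, key=lambda e: e["new_us"], default=None)
```

Passing the lines of a saved log file to `parse_latency_events` and the result to `worst_spike` would give the largest spike and the device it occurred on.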
Logical Device number 1
Logical Device name : VD_1
Disk Name : Not Applicable
Block Size of member drives : 512 Bytes
Array : 1
RAID level : 6
Status of Logical Device : Optimal
Parity Initialization Status : Completed
Size : 122076928 MB
Stripe-unit size : 256 KB
Full Stripe Size : 2048 KB
Interface Type : Serial Attached SCSI
Device Type : Data
Boot Type : None
Heads : 255
Sectors Per Track : 32
Cylinders : 65535
Caching : Enabled
Mount Points : Not Applicable
LD Acceleration Method : Controller Cache
The logical drive already has a cache policy set. The field is explained as follows:
LD Acceleration Method : Setting of the LD acceleration method. Controller cache or SSD I/O Bypass or maxCache.
The controller cache policy is as follows:
Cache Properties
--------------------------------------------------------
Cache Status : Ok
Cache Serial Number : Not Applicable
Cache memory : 3856 MB
Read Cache Percentage : 100 percent
Write Cache Percentage : 0 percent
No-Battery Write Cache : Disabled
Wait for Cache Room : Disabled
Write Cache Bypass Threshold Size : 1040 KB
--------------------------------------------------------
The output above shows that the controller allocates 100% of its cache to reads and none to writes, which is abnormal.
Explanation of this parameter:
Cache Ratio : The controller cache ratio setting determines the controller ability to adjust the amount
of memory for read-ahead cache versus write cache.
Cache Ratio (Read) : Sets the ratio of controller cache memory used for read-ahead cache
versus write cache. Cache ratio values range from 0-100, in increments of 5
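The check on the Cache Properties output above can be automated across many hosts. The following is a minimal Python sketch that assumes the output has been captured as text in the "Key : Value" layout shown above; the function names are hypothetical.

```python
def parse_properties(text):
    """Parse 'Key : Value' lines into a dict; lines without ':' are skipped."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            props[key.strip()] = value.strip()
    return props


def cache_percentages(props):
    """Return (read_pct, write_pct) from values such as '100 percent'."""
    read_pct = int(props["Read Cache Percentage"].split()[0])
    write_pct = int(props["Write Cache Percentage"].split()[0])
    return read_pct, write_pct


def write_cache_unallocated(props):
    """True when no controller cache is assigned to writes."""
    return cache_percentages(props)[1] == 0
```

For the output in this case, `cache_percentages` would yield 100 for reads and 0 for writes, so `write_cache_unallocated` flags the controller.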
Solution
Set the read cache share to 10% and the write cache share to 90%.
Reference command for the adjustment:
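The original command is not preserved in this note. As a hedged sketch only, the Python helper below builds the argument list under the assumption that the controller is managed with arcconf and that its `setcache <controller#> cacheratio <read#> <write#>` syntax applies; the controller number 1 in the usage note is a placeholder. Verify the exact syntax against the arcconf user guide for your controller and firmware before running anything. The range and step checks follow the Cache Ratio description above (0-100, in increments of 5).

```python
def build_cacheratio_command(controller, read_pct, write_pct):
    """Build (but do not run) an assumed arcconf cache-ratio command."""
    for name, pct in (("read", read_pct), ("write", write_pct)):
        # Cache ratio values range from 0 to 100 in increments of 5.
        if not 0 <= pct <= 100 or pct % 5 != 0:
            raise ValueError(f"{name} percentage must be 0-100 in steps of 5")
    if read_pct + write_pct != 100:
        raise ValueError("read and write percentages must sum to 100")
    # Assumed syntax: arcconf setcache <controller#> cacheratio <read#> <write#>
    return [
        "arcconf", "setcache", str(controller),
        "cacheratio", str(read_pct), str(write_pct),
    ]
```

Calling `build_cacheratio_command(1, 10, 90)` produces the arguments for the 10% read / 90% write split recommended here. The helper deliberately stops short of executing the command so the result can be reviewed first.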