OP5 Monitor - How to analyze sysload issues? Troubleshooting high runq-sz, and iops/tps values. – Support - ITRS Group

The following table, taken from Wikipedia, lists typical IOPS figures for common hard disk drive types:

Device	Type	IOPS	Interface
5,400 rpm SATA drives	HDD	~50–80 IOPS	SATA 3 Gbit/s
7,200 rpm SATA drives	HDD	~75–100 IOPS	SATA 3 Gbit/s-SAS 12Gbps
10,000 rpm SAS drives	HDD	~125–150 IOPS	SAS
15,000 rpm SAS drives	HDD	~175–210 IOPS	SAS

If you are experiencing high sysload together with low CPU usage, the disks may be the bottleneck. In that case, look at the tps (transfers per second) value reported by:

# iostat -cd 60

This command will give you output similar to the following, every 60 seconds:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          23.45   11.40   12.74    0.35    0.02   52.03

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
xvdj              0.03         0.30         7.00     232578    5465744
xvde            112.75        39.67      2510.80   30952562 1959249512

As this example shows, xvde is handling about 113 transfers per second. That number is not unusual when you frequently run thousands of checks whose results are written to disk, but it is already above the ~75–100 IOPS that the table lists for a single 7,200 rpm SATA drive.
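To spot such devices without reading every report by eye, the tps column can be filtered with a small helper. The following is a minimal sketch, not part of OP5 Monitor: the function name `flag_busy_devices` is hypothetical, and the threshold you pass in is an assumption that you should take from the table row matching your own disks.

```shell
#!/bin/sh
# flag_busy_devices LIMIT  (hypothetical helper, not shipped with OP5 Monitor)
# Reads iostat device reports on stdin and prints "device tps" for every
# device whose tps value is greater than LIMIT.
flag_busy_devices() {
    awk -v limit="$1" '
        /^Device/ { in_devices = 1; next }   # header line starts a device table
        NF == 0   { in_devices = 0; next }   # blank line ends the table
        in_devices && ($2 + 0) > (limit + 0) { printf "%s %.2f\n", $1, $2 }
    '
}

# Example usage (assumes a threshold of 100 tps per device):
#   iostat -d 60 | flag_busy_devices 100
```

Keep in mind that the first report printed by iostat covers the time since boot, so later reports are more representative of the current load.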

Another indication of this type of issue could be the output of:

# sar -q

Example:

00:00:01 runq-sz %runocc swpq-sz %swpocc
00:05:02    26.4      72     0.0       0
00:10:02    25.9      71     0.0       0
00:15:02    27.4      73     0.0       0
00:20:01    27.3      62     0.0       0
00:25:01    25.5      66     0.0       0

Commonly cited guidance for the runq-sz value is:

The number of kernel threads in memory that are waiting for a CPU to run. Typically, this value should be less than 2. Consistently higher values mean that the system might be CPU-bound.
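To compare a whole sar report against that guideline, the runq-sz samples can be averaged. This is a rough sketch under stated assumptions: the function name `avg_runq` is hypothetical, and it assumes the report contains a header line with a `runq-sz` column, as in the example above.

```shell
#!/bin/sh
# avg_runq  (hypothetical helper)
# Reads "sar -q" style output on stdin and prints the mean runq-sz value
# with one decimal. Exits non-zero if no samples were found.
avg_runq() {
    awk '
        {
            # A header line names the columns; remember where runq-sz sits.
            is_header = 0
            for (i = 1; i <= NF; i++)
                if ($i == "runq-sz") { col = i; is_header = 1 }
            if (is_header) next
        }
        # Data rows: skip any "Average:" summary row and non-numeric cells.
        col && $1 != "Average:" && $col ~ /^[0-9.]+$/ { sum += $col; n++ }
        END {
            if (n == 0) exit 1
            printf "%.1f\n", sum / n
        }
    '
}

# Example usage:
#   sar -q | avg_runq
```

For the five samples shown above this gives a mean of 26.5, far beyond the suggested value of 2.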

If you are seeing high sysload and measurements similar to the above (a runq-sz far above 2 and tps values at or beyond what the table suggests your disks can deliver), your disk setup is probably under-specified. In that case we strongly suggest that you add more IO capacity to your server to lower the load.
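As a rough way to judge whether a disk setup is under-specified, the measured tps can be expressed as a percentage of a naive capacity estimate. The sketch below is only an approximation: the function name `io_load_pct` is hypothetical, and it assumes that capacity is simply the number of disks multiplied by the per-disk IOPS from the table, ignoring RAID overhead, caching and controller limits.

```shell
#!/bin/sh
# io_load_pct MEASURED_TPS DISKS IOPS_PER_DISK  (hypothetical helper)
# Prints the measured tps as a percentage of the naive estimate
# DISKS * IOPS_PER_DISK. Assumption: no RAID penalty, no cache effects.
io_load_pct() {
    awk -v tps="$1" -v disks="$2" -v per_disk="$3" 'BEGIN {
        capacity = disks * per_disk
        if (capacity <= 0) exit 1
        printf "%.0f%%\n", 100 * tps / capacity
    }'
}

# Example usage with the xvde value from the iostat output:
#   io_load_pct 112.75 1 100    # one 7,200 rpm drive at ~100 IOPS
#   io_load_pct 112.75 4 150    # four 10,000 rpm drives at ~150 IOPS each
```

A result close to or above 100% suggests the disks are saturated, while a clearly lower value points to headroom.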
