The following table of typical per-drive IOPS figures is taken from Wikipedia:
| Device | Type | IOPS | Interface |
|---|---|---|---|
| 5,400 rpm SATA drives | HDD | ~50–80 IOPS | SATA 3 Gbit/s |
| 7,200 rpm SATA drives | HDD | ~75–100 IOPS | SATA 3 Gbit/s – SAS 12 Gbit/s |
| 10,000 rpm SAS drives | HDD | ~125–150 IOPS | SAS |
| 15,000 rpm SAS drives | HDD | ~175–210 IOPS | SAS |
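To turn the per-drive numbers into a rough capacity figure for a whole array, you can scale them by drive count and write penalty. The sketch below is a back-of-the-envelope estimate only; the RAID 10 layout, the drive count, and the 50/50 read/write split are all assumptions for illustration, not something the table prescribes.

```shell
#!/bin/sh
# Rough usable-IOPS estimate for a hypothetical array of identical drives.
# Assumptions (not from the table): RAID 10, reads striped across all
# drives, each front-end write costing two back-end writes (mirroring).
DRIVES=4
PER_DRIVE_IOPS=100     # upper bound for a 7,200 rpm SATA drive, per the table
READ_FRACTION=50       # assumed percentage of the workload that is reads

read_iops=$(( DRIVES * PER_DRIVE_IOPS * READ_FRACTION / 100 ))
write_iops=$(( DRIVES * PER_DRIVE_IOPS * (100 - READ_FRACTION) / 100 / 2 ))
echo "usable IOPS estimate: $(( read_iops + write_iops ))"
# -> usable IOPS estimate: 300
```

With four 7,200 rpm drives and this mix, the array tops out around 300 IOPS — a useful baseline to compare against the `tps` figures discussed below.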
If you are experiencing high sysload coupled with low CPU usage, it's possible you need to turn your attention to the tps value produced by:
# iostat -cd 60
This command will give you output similar to the following, every 60 seconds:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          23.45   11.40   12.74    0.35    0.02   52.03

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
xvdj              0.03         0.30         7.00     232578    5465744
xvde            112.75        39.67      2510.80   30952562 1959249512
As you can see in this example, xvde is handling ~113 transfers per second. That rate is not extraordinary if you are frequently running thousands of checks and writing their results to disk.
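Comparing each device's `tps` against the table can be automated with a small awk filter. The sketch below runs against the sample output quoted above; in practice you would pipe `iostat -d 60 2` into the same filter. The 100 IOPS threshold is an assumption taken from the table's upper bound for a 7,200 rpm SATA drive.

```shell
#!/bin/sh
# Flag devices whose sustained tps exceeds what a single 7,200 rpm
# SATA drive can deliver (~100 IOPS, per the table above).
THRESHOLD=100   # assumption: single-drive upper bound from the table

# Sample iostat device lines from the text; replace with live data,
# e.g.:  iostat -d 60 2 | awk ...
iostat_sample='Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
xvdj 0.03 0.30 7.00 232578 5465744
xvde 112.75 39.67 2510.80 30952562 1959249512'

printf '%s\n' "$iostat_sample" | awk -v t="$THRESHOLD" \
    'NR > 1 && $2 + 0 > t { print $1, "exceeds", t, "IOPS at", $2 }'
# -> xvde exceeds 100 IOPS at 112.75
```

Here only xvde is flagged: at ~113 tps it is already beyond what one 7,200 rpm SATA drive can sustain on its own.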
Another indication of this type of issue could be the output of:
# sar -q
00:00:01    runq-sz  %runocc  swpq-sz  %swpocc
00:05:02       26.4       72      0.0        0
00:10:02       25.9       71      0.0        0
00:15:02       27.4       73      0.0        0
00:20:01       27.3       62      0.0        0
00:25:01       25.5       66      0.0        0
The common guidance for the runq-sz value seems to be:
The number of kernel threads in memory that are waiting for a CPU to run. Typically, this value should be less than 2. Consistently higher values mean that the system might be CPU-bound.
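That guideline can be checked mechanically by averaging the runq-sz column. The sketch below uses the sar sample quoted earlier; swap in live `sar -q` output for real monitoring.

```shell
#!/bin/sh
# Average the runq-sz column from `sar -q` output and compare the
# result against the "less than 2" guideline quoted above.
sar_sample='00:00:01 runq-sz %runocc swpq-sz %swpocc
00:05:02 26.4 72 0.0 0
00:10:02 25.9 71 0.0 0
00:15:02 27.4 73 0.0 0
00:20:01 27.3 62 0.0 0
00:25:01 25.5 66 0.0 0'

printf '%s\n' "$sar_sample" | awk \
    'NR > 1 { sum += $2; n++ }
     END { avg = sum / n
           printf "average runq-sz: %.1f\n", avg
           if (avg >= 2) print "consistently above 2: the system is saturated" }'
# -> average runq-sz: 26.5
```

An average run queue of 26.5 is more than an order of magnitude over the guideline, which is consistent with the high-sysload picture painted by the iostat output.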
If you are seeing high sysload together with measurements like those above, and your disk setup appears under-specified according to the table, we strongly suggest adding more IO capacity to the server to lower the load.