Troubleshooting high iowait on Linux

When a system shows high I/O wait (iowait), its CPUs are spending much of their time idle while they wait for outstanding I/O requests, usually to disk, to complete. This hurts performance because processes cannot get data as soon as they need it. When you troubleshoot high iowait on Linux, there are several things you can do to determine what is causing it and then reduce or eliminate it.

What is iowait?

The iowait statistic measures the share of time a CPU spends idle while at least one I/O request is still outstanding. The most common cause is disk I/O. Network filesystems such as NFS can also contribute, but time spent waiting on ordinary network sockets or interprocess communication (IPC) is generally not counted as iowait.

The iowait value will vary depending on your workload and hardware configuration. It is reported as a percentage of CPU time, not in milliseconds. As a rough rule of thumb, a sustained iowait above about 10% deserves investigation: it means that, averaged over the monitored CPUs, more than 100 ms of every second is spent idle waiting for I/O. Left unchecked, this can lead to poor performance and an unresponsive system.
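To make the percentage concrete, here is a minimal sketch of how iowait can be derived from two samples of the aggregate cpu line in /proc/stat. The function name iowait_pct is made up for this example, and the field order (user, nice, system, idle, iowait, irq, softirq, steal) is assumed from the proc(5) man page.

```shell
# iowait_pct BEFORE AFTER
# Each argument is one aggregate "cpu ..." line copied from /proc/stat.
# Prints the percentage of CPU time spent in iowait between the two samples.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Sum user..steal (fields 2-9); guest time is already part of user.
            for (i = 2; i <= 9; i++) total += $i
            if (NR == 1) { t0 = total; w0 = $6 }
            else         { t1 = total; w1 = $6 }
        }
        END {
            dt = t1 - t0
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (w1 - w0) / dt
        }'
}

# Usage on a live Linux system (5-second sample):
#   before=$(head -n 1 /proc/stat); sleep 5; after=$(head -n 1 /proc/stat)
#   iowait_pct "$before" "$after"
```

Because the counters only ever increase, the difference between two samples gives the iowait share for that interval rather than the average since boot.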

Checking whether your machine has high iowait

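  • Run the following command, which collects 10 reports at 10-second intervals and saves them to a file. The -c option adds the CPU report that contains iowait; with -d alone, only per-device statistics are printed. Note the space before >: written as 10>, the shell would treat 10 as a file descriptor number rather than as the report count.

iostat -c -d -k 10 10 > iowait.txt 2>&1

  • Open the file in a text editor and look at the %iowait column in each avg-cpu section. The first report shows averages since boot, so judge the current load from the later ones. The header row looks like this:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

  • To find out which process is generating the I/O, use a per-process tool such as iotop or pidstat -d.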
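Reading a long report by eye is tedious, so the check can be scripted. The sketch below assumes the saved file contains iostat's CPU report (the avg-cpu sections printed with -c) and that numbers use a dot as the decimal separator (for example by running iostat with LC_ALL=C). The function name high_iowait_samples and the default threshold of 10 are choices made for this example.

```shell
# high_iowait_samples FILE [THRESHOLD]
# Prints every avg-cpu sample in FILE whose %iowait exceeds THRESHOLD (default 10).
high_iowait_samples() {
    awk -v limit="${2:-10}" '
        /^avg-cpu:/ {
            # Locate the %iowait column. The values line has no "avg-cpu:"
            # label, so its column index is one lower than in the header.
            col = 0
            for (i = 2; i <= NF; i++) if ($i == "%iowait") col = i - 1
            want = 1
            next
        }
        want && col {
            n++
            if ($col + 0 > limit + 0) printf "sample %d: %%iowait=%s\n", n, $col
        }
        { want = 0 }
    ' "$1"
}

# Usage: high_iowait_samples iowait.txt 10
```

Sample 1 is the since-boot average, so a hit on that line alone says little about the current load.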

Possible causes of high iowait

If you’re seeing high I/O wait times, it’s likely due to one of the following causes:

  • The system has too little RAM. In this case, Linux shrinks the page cache and swaps memory out to disk to free up space, so reads that would have been served from memory go to the disk instead. This slows down file system access and adds swap I/O on top of the normal workload.
  • You have a faulty hard drive or controller card. If your computer hangs or freezes periodically, the drive may be failing and stalling the system each time it retries reads from bad sectors.
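One way to test the memory explanation is to watch the kernel's swap counters. The sketch below assumes the pswpin and pswpout counters in /proc/vmstat (pages swapped in and out since boot); the function names swap_pages and swap_delta are made up for this example.

```shell
# swap_pages [VMSTAT_FILE]
# Prints "pswpin pswpout": pages swapped in and out since boot.
swap_pages() {
    awk '
        $1 == "pswpin"  { pin = $2 }
        $1 == "pswpout" { pout = $2 }
        END { print pin + 0, pout + 0 }
    ' "${1:-/proc/vmstat}"
}

# swap_delta "IN0 OUT0" "IN1 OUT1"
# Prints how many pages were swapped between two swap_pages samples.
swap_delta() {
    printf '%s %s\n' "$1" "$2" | awk '
        { printf "swapped in: %d pages, swapped out: %d pages\n", $3 - $1, $4 - $2 }'
}

# Usage on a live Linux system:
#   s0=$(swap_pages); sleep 10; s1=$(swap_pages); swap_delta "$s0" "$s1"
```

Counters that keep climbing while iowait is high point toward memory pressure. For the failing-drive case, the kernel log (dmesg) and a SMART health check such as smartctl -H from smartmontools are the usual places to look.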

Reduce the Disk I/O

The performance of a server can be improved by reducing the disk I/O.

  • Match the block size of the filesystem, LVM, or RAID array to the typical I/O size of your workload. A mismatch in either direction causes unnecessary I/O overhead, so measure before changing the defaults.
  • Tune the number of requests that Linux will queue per device (the default is commonly 128). When the queue is full, processes submitting I/O block until there is room; requests are not discarded. The value is exposed as nr_requests under /sys/block/<device>/queue/, not through tune2fs or fstab.
  • Use a faster disk (e.g., an SSD) instead of spinning disks, whose head movement adds latency and lowers throughput.
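Before changing the queue size, it helps to see the current value for each device. This is a small sketch that reads the nr_requests file under /sys/block; the function name queue_depths is made up, and the directory argument exists only so the function can be pointed at a test tree.

```shell
# queue_depths [SYSFS_BLOCK_DIR]
# Prints the request queue size of every block device that exposes one.
queue_depths() {
    root="${1:-/sys/block}"
    for dev in "$root"/*; do
        f="$dev/queue/nr_requests"
        [ -r "$f" ] || continue
        printf '%s nr_requests=%s\n' "${dev##*/}" "$(cat "$f")"
    done
}

# Usage: queue_depths
# To change the value (as root, not persistent across reboots), with sda and
# 256 as placeholder device and value:
#   echo 256 > /sys/block/sda/queue/nr_requests
```

A larger queue gives the scheduler more requests to merge and reorder but can increase the latency of individual requests, so it is worth measuring before and after.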

Use multiple disks in a RAID array

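  • If you use multiple disks in a RAID array, reads and writes can be spread across the disks, which can improve the performance of your disk subsystem. Depending on the level, you also gain capacity or redundancy.
  • Levels commonly used for performance are RAID0 (striping, no redundancy), RAID1 (mirroring), and RAID10 (striped mirrors). Linux software RAID (md) also supports RAID4, RAID5, and RAID6, whose parity updates add write overhead.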
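If the machine already uses Linux software RAID, the level of each array can be read from /proc/mdstat. The sketch below assumes the usual layout of that file, where each array line starts with the md device name followed by its state and level; the function name md_levels is made up for this example.

```shell
# md_levels [MDSTAT_FILE]
# Prints each md array with its RAID level, e.g. "md0 raid1".
md_levels() {
    f="${1:-/proc/mdstat}"
    if [ ! -r "$f" ]; then
        echo "no md arrays found ($f is not readable)"
        return 0
    fi
    awk '/^md/ {
        level = "unknown"
        # The level is the first field that looks like raidN or linear.
        for (i = 3; i <= NF; i++)
            if ($i ~ /^(raid[0-9]+|linear)$/) { level = $i; break }
        print $1, level
    }' "$f"
}

# Usage: md_levels
```

Hardware RAID controllers do not appear here; they need the vendor's own tools.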

Optimal scheduler for rotating and SSD disks

The Linux kernel can be configured to use a number of different I/O schedulers. These determine the order in which block I/O requests are merged and dispatched to a device, and the best one to use depends on the type of disk you have installed on your system.

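  • noop scheduler: Suited to SSDs and to devices that do their own scheduling, such as hardware RAID controllers and virtual disks.
  • deadline scheduler: Suited to rotating disks (HDD) where bounded request latency matters, for example on database servers. It also works well on SSDs.
  • cfq scheduler: Suited to rotating disks (HDD) shared by many processes, because it divides disk time fairly among them.

On kernels that use the multi-queue block layer (the only option since Linux 5.0), these are replaced by none, mq-deadline, bfq, and kyber.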

Use of noop scheduler for SSD disks

If you’re using SSD disks, the noop scheduler (called none on multi-queue kernels) is usually the best one to use. The noop scheduler does not reorder requests. Instead, it places them in a simple first-in, first-out queue, merging adjacent requests where possible, and passes them on to the device. This suits SSDs because they have no seek penalty: reordering requests to minimize head movement brings no benefit and only costs CPU time.
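To see which scheduler each disk is using, and whether the kernel considers the disk rotational, the sysfs files queue/scheduler and queue/rotational can be read directly. The sketch below is one way to do that; the function name scheduler_report is made up, and the directory argument exists only so the function can be run against a test tree.

```shell
# scheduler_report [SYSFS_BLOCK_DIR]
# Prints each block device, whether it is rotational, and its active scheduler.
scheduler_report() {
    root="${1:-/sys/block}"
    for dev in "$root"/*; do
        [ -r "$dev/queue/scheduler" ] || continue
        # The active scheduler is the name shown in [brackets].
        active=$(sed -n 's/.*\[\(.*\)\].*/\1/p' "$dev/queue/scheduler")
        rot=$(cat "$dev/queue/rotational" 2>/dev/null)
        case "$rot" in
            1) kind=rotational ;;
            0) kind=non-rotational ;;
            *) kind=unknown ;;
        esac
        printf '%s %s scheduler=%s\n' "${dev##*/}" "$kind" "${active:-none}"
    done
}

# Usage: scheduler_report
# To switch scheduler (as root, not persistent across reboots), with sda as a
# placeholder device name:
#   echo mq-deadline > /sys/block/sda/queue/scheduler
```

Only schedulers listed in the device's scheduler file can be selected, so check the output before writing a new value.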

Use Tuned profiles to automate the process of optimization

Tuned is a userspace daemon, not part of the Linux kernel, that applies predefined tuning profiles to a system. [Tuned](https://tuned-project.org/) profiles such as throughput-performance adjust kernel and disk settings for a given workload. Tuned does not identify which process is causing high iowait, so find the culprit first with a tool such as iotop or pidstat, and then use a Tuned profile alongside any manual tuning.
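Assuming the tuned package and its tuned-adm command-line tool are installed, the current and recommended profiles can be checked as in this sketch. The wrapper function tuned_status is made up for this example; it only reports and changes nothing.

```shell
# tuned_status
# Prints the active and recommended Tuned profiles if tuned-adm is available.
tuned_status() {
    if ! command -v tuned-adm >/dev/null 2>&1; then
        echo "tuned-adm not found; install the tuned package first"
        return 0
    fi
    tuned-adm active    || echo "could not query the active profile (is the tuned service running?)"
    tuned-adm recommend || echo "could not query the recommended profile"
}

# Usage: tuned_status
# To switch profiles (as root), for example to the throughput-oriented one:
#   tuned-adm profile throughput-performance
```

Which profile helps depends on the workload, so compare iowait before and after switching.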

If you understand what iowait is, you can take steps to reduce it.

The problem occurs when the CPU sits idle waiting for an I/O operation to complete.

iowait is caused mainly by disk I/O, including I/O to network filesystems. A high iowait problem can often be traced to a specific application or database process that is accessing storage heavily. For example, if multiple processes compete for the same storage device, or for memory so that the system starts swapping, each request waits longer in the queue. On a database server this shows up as higher query latency, especially at peak usage times (e.g., late afternoon).
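To narrow the problem down to a process without extra tools, the per-process counters in /proc/<pid>/io can be ranked. The sketch below assumes the read_bytes and write_bytes fields described in proc(5); the function name top_io is made up. Without root privileges it only sees processes owned by the current user.

```shell
# top_io [PROC_DIR] [COUNT]
# Prints the COUNT processes (default 5) with the most storage I/O so far,
# as "PID NAME read=BYTES write=BYTES".
top_io() {
    root="${1:-/proc}"
    for dir in "$root"/[0-9]*; do
        [ -r "$dir/io" ] || continue
        name=$(cat "$dir/comm" 2>/dev/null)
        awk -v pid="${dir##*/}" -v name="${name:-?}" '
            $1 == "read_bytes:"  { r = $2 }
            $1 == "write_bytes:" { w = $2 }
            # The leading total is only a sort key and is cut off below.
            END { printf "%.0f %s %s read=%.0f write=%.0f\n", r + w, pid, name, r, w }
        ' "$dir/io" 2>/dev/null
    done | sort -rn | head -n "${2:-5}" | cut -d' ' -f2-
}

# Usage: top_io            (or as root: sudo sh -c '. ./script.sh; top_io')
```

The counters are cumulative since each process started, so a long-running process can rank high without being the current culprit. Running the function twice and comparing the numbers shows who is active now.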

Conclusion

In this article, we have discussed what iowait is and how it can be reduced. We also covered some of the common causes of high iowait. If you are looking for an automated way to apply disk and kernel tuning on your machines, then Tuned may be a good option for you.
