This is my fourth week at Pythian and in Canada, and I’m starting to get back to my normal routine — my personal things are getting sorted and my working environment is set. Here at Pythian I’m on a team of four people together with Christo, Joe, and Virgil. (I should write another post about starting at Pythian — I will, one day.)
Yesterday, I asked Christo to show me how he monitors I/O on Linux. I needed to collect statistics on a large Oracle table on a production box, and wanted to keep an eye on the impact. So we grabbed Joe as well, and all three of us sat around my PC. While we were discussing, Paul was around and showed some interest in the topic — otherwise, why would all three of us be involved? Anyway, Dave and Paul thought that this would be a nice case for a blog post. So here we are…
Indeed, while the technique we discuss here is basic, it gives a good overview and is very easy to use. So let’s get focused… We will use the iostat utility. In case you need more detail, you know where to find it — right, the man pages.
So we will use the following form of the command:
iostat -x [-d] <interval>
- -x option displays extended statistics. You definitely want it.
- -d is optional. It removes CPU utilization to avoid cluttering the output. If you leave it out, you will get the following couple of lines in addition:
avg-cpu:  %user   %nice    %sys %iowait   %idle
           6.79    0.00    3.79   16.97   72.46
- <interval> is the number of seconds iostat waits between each report. Without a specified interval, iostat displays statistics accumulated since the system was up and then exits, which is not useful in our case. Specifying the number of seconds causes iostat to print periodic reports in which IO statistics are averaged over the time period since the previous report. I.e., specifying 5 makes iostat dump 5 seconds of average IO characteristics, every 5 seconds until it’s stopped.
If you have many devices and you want to watch only some of them, you can also specify the device names on the command line:
iostat -x -d sda 5
Now let’s get to the most interesting part — what those cryptic extended statistics are. (For readability, I formatted the report below so that the last two lines are in fact a continuation of the first two.)
Device:    rrqm/s   wrqm/s    r/s    w/s   rsec/s   wsec/s    rkB/s
sda          0.00    12.57  10.18   9.78   134.13   178.84    67.07

            wkB/s  avgrq-sz  avgqu-sz   await   svctm   %util
            89.42     15.68      0.28   14.16    8.88   17.72
- r/s and w/s — respectively, the number of read and write requests issued by processes to the OS for a device.
- rsec/s and wsec/s — sectors read/written per second (each sector is 512 bytes).
- rkB/s and wkB/s — kilobytes read/written per second.
- avgrq-sz — average sectors per request (for both reads and writes). Do the math — (rsec + wsec) / (r + w) = (134.13 + 178.84) / (10.18 + 9.78) = 15.68. If you want it in kilobytes, divide by 2. If you want it separately for reads and writes, do your own math using rkB/s and wkB/s.
- avgqu-sz — average queue length for this device.
- await — average response time (ms) of IO requests to a device. The name is a bit confusing as this is the total response time, including both the wait time in the request queue (let’s call it qutim) and the service time the device spent working on the requests (see the next column, svctm). So the formula is await = qutim + svctm.
- svctm — average time (ms) a device was servicing requests. This is a component of the total response time of IO requests.
- %util — this is a pretty confusing value. The man page defines it as "Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%." A bit difficult to digest. Perhaps it’s better to think of it as the percentage of time the device was servicing requests as opposed to being idle. To understand it better, here is the formula:
utilization = ( (read requests + write requests) * service time in ms / 1000 ms ) * 100%
or
%util = ( r + w ) * svctm / 10 = ( 10.18 + 9.78 ) * 8.88 / 10 = 17.72
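To make the arithmetic above concrete, here is a minimal Python sketch (my own illustration, not part of iostat) that recomputes the derived columns from the sample sda line; the variable names are mine.

# Recompute the derived iostat columns from the per-second figures
# of the sample sda line above. Plain arithmetic, standard library only.
r_s, w_s = 10.18, 9.78            # read and write requests per second
rsec_s, wsec_s = 134.13, 178.84   # sectors read/written per second
svctm_ms = 8.88                   # average service time, ms
await_ms = 14.16                  # average total response time, ms

avgrq_sz = (rsec_s + wsec_s) / (r_s + w_s)   # average request size in sectors
util_pct = (r_s + w_s) * svctm_ms / 10       # %util = IOs/s * svctm(ms) / 10
qutim_ms = await_ms - svctm_ms               # average time spent waiting in the queue

print(f"avgrq-sz = {avgrq_sz:.2f} sectors ({avgrq_sz / 2:.2f} KB)")
print(f"%util    = {util_pct:.2f}%")
print(f"qutim    = {qutim_ms:.2f} ms out of {await_ms} ms await")

Running it gives back the 15.68 sectors, the 17.72%, and about 5.3 ms of queue wait hidden inside await.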
Traditionally, it’s common to assume that the closer to 100% utilization a device is, the more saturated it is. This might be true when the system device corresponds to a single physical disk. However, with devices representing a LUN of a modern storage box, the story might be completely different.
Rather than looking at device utilization, there is another way to estimate how loaded a device is. Look at the non-existent column I mentioned above — qutim — the average time a request spends in the queue. If it’s insignificant compared to svctm, the IO device is not saturated. When it becomes comparable to svctm and goes above it, requests are queued longer and a major part of the response time is actually time spent waiting in the queue.
The figure in the await column should be as close to that in the svctm column as possible. If await goes much above svctm, watch out! The IO device is probably overloaded.
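If you want to keep an eye on this automatically, here is a rough sketch along the same lines. It assumes the classic sysstat column layout shown above, where await, svctm, and %util are the last three columns; newer sysstat releases change the columns, so treat the field indexes as an assumption to verify on your box.

import subprocess

# Watch one device with iostat -x -d and flag intervals where the queue
# wait (await - svctm) exceeds the service time itself.
DEVICE, INTERVAL = "sda", 5   # adjust for your device and reporting period

proc = subprocess.Popen(
    ["iostat", "-x", "-d", DEVICE, str(INTERVAL)],
    stdout=subprocess.PIPE, text=True,
)
for line in proc.stdout:
    fields = line.split()
    if not fields or fields[0] != DEVICE:
        continue                     # skip banner, headers, and blank lines
    await_ms = float(fields[-3])     # assumed layout: ... await svctm %util
    svctm_ms = float(fields[-2])
    qutim_ms = await_ms - svctm_ms
    if svctm_ms > 0 and qutim_ms > svctm_ms:
        print(f"{DEVICE}: await={await_ms} ms, svctm={svctm_ms} ms, "
              f"queue wait={qutim_ms:.1f} ms, likely queueing")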
There is much to say about IO monitoring and interpreting results. Perhaps this is only the first of a series of posts about IO statistics. At Pythian we often come across different environments with specific characteristics and various requirements that our clients have. So stay tuned — more to come.
Update 12-Feb-2007: You might also find the Oracle Disk IO Basics session of Pythian Goodies useful.
37 Comments
Hi Alex,
I would be interested in hearing more about your experience at Pythian. I heard it is a great place to work at.
Cheers
Hi Tobias,
It is a great place to work indeed. I plan to post a bit on this topic soon. Stay tuned! ;-)
Cheers,
Alex
Keep up the good work, Alex.
If anyone wants to load their iostat data into Oracle, there’s a script to massage it into sqlldr format at https://preferisco.blogspot.com/2006/09/loading-iostat-output-into-oracle.html.
Regards Nigel
Great article.
[…] Jeremy Cole shows how to get a visual take on MySQL and I/O statistics on Linux. (Something Pythian’s Alex Gorbachev looked at in an older article on basic IO Monitoring on Linux). […]
[…] sar, sysstat. I made serious progress last week, when Dushyanth from my team shared this post on IO Monitoring on Linux, by the folks over at Pythian, on our internal mailing list. Here are my notes on the […]
I loved this post so much, it prompted me to write one on my blog, using reference material from here and performing some analysis in my post. I have been gazing at iostat outputs for the past few days and I am a little confused about the explanation you have given above. For instance, here is an iostat output from one of my servers –
iostat -dkx 10
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
dm-0 0.00 0.00 8.79 46.95 71.13 187.81 9.29 0.34 6.18 0.85 4.77
Now my confusion is that there were only 55 IO requests issued to the disk, and clearly the disk is not at all utilized. Despite that, how come the await time is so much higher than svctm? Technically, none of the IO requests should have had to wait during that time, since each IO request was taking only 0.85 ms to process.
While you may put this down to requests issued prior to this 10-second interval, etc., I have seen this type of stats in my continuous monitoring, which does not seem to make sense: i.e., a very high await time in comparison to svctm even when the number of requests is low and the disk is not utilized.
Bhavin,
Don’t forget that these are *averaged* results. It’s very easy to draw wrong conclusions from averaged data. One example is that your requests are coming in a bunch at once and clearly wait in the queue.
What is IO response time from applications (database?) telling you?
You might try to reduce the period to one second and see if you can spot the spikes.
Cheers,
Alex
I find that on most of my servers, await is far larger than svctm; it seems that this is quite a common phenomenon. Is it?
[…] iostat is a popular tool amongst the database crowd, so not surprisingly you’ll find a lot of great discussions documenting the use. Depending on your application you will need to focus on different metrics, but […]
That was an excellent post, Alex! Thanks a lot – your explanation of what iostat’s stats really mean is very good.
minor point: some strange characters (wrong encoding?) pop up in the text and the font-size you use is too small..
@rafsoaken: Thanks a lot for your feedback. The post has been fixed as well (minor issue during the move to the new web-site platform).
Great article indeed – just found it googling about looking for articles on i/o monitoring.
I wonder if you guys have any more articles on the subject, especially with regards to mySQL (and maybe innoDB tables?).
@Steve: This is not about MySQL but you might like this video about Oracle IO basics.
Thanks for this great article. It explains iostat in an easy way. I especially enjoyed your use of qutim to calculate how saturated the device is.
Regards,
Kenneth Holter
Hi Alex,
Good explanation of the iostat output.
I have three questions about the iostat output that I hope you can answer:
Service time is corresponding to disk device by itself ?
If we see high service time that more than 10-15 ms often then could we conclude the iosystem is insufficient ?
Sometimes I see low service time (1-2ms) but high await so qutime is high too what is the exact reason of this ?
@Emre:
Thanks for the comment, Emre. Some follow-up below:
Service time is corresponding to disk device by itself ?
Service time is the average value to serve requests to a particular device during the reported period.
If we see high service time that more than 10-15 ms often then could we conclude the iosystem is insufficient ?
No. Depending on your workload patterns it might be absolutely fine to have reasonably high request service times. Especially, if application performance is acceptable. Don’t forget that these are averaged values so it might be that some critical business functions suffer while the rest doesn’t matter much. You can’t see it from iostat.
It also might be that it’s not the IO subsystem that’s too slow but an inefficient workload (bad SQL/design) that results in excessive IO. If you see high service time, it doesn’t mean that you need to tune your subsystem. Most often, you first want to try to eliminate as many IO requests as possible and, if that’s not possible, start tuning the IO subsystem. In both cases, you first need to prove that IO response time is your problem and that by reducing time spent on IO requests you will have a noticeable effect on your application response time. In other words, you need to profile your application / database traffic and conclude that IO represents a major component of your response time.
Sometimes I see low service time (1-2ms) but high await so qutime is high too what is the exact reason of this ?
In this case I bet you also have a high avgqu-sz and a large number of requests. I.e., the device serves requests very quickly but there are lots of them that get queued.
Don’t forget that you are looking at averaged statistics, and in case of an IO request burst, you will have lots of IOs waiting in the queue during these bursts but then no requests outside of these few-second bursts.
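To illustrate that burst effect with toy numbers (a sketch of mine, not real data): if 100 requests hit an empty queue at the same instant and each takes 1 ms to service, the last one waits 99 ms, so the averaged await balloons while svctm stays at 1 ms.

# Toy FIFO model of a burst: all requests arrive at once, each needs the
# same service time, and they are served one after another.
burst_size, svctm_ms = 100, 1.0

queue_waits = [i * svctm_ms for i in range(burst_size)]           # wait before service starts
avg_await = sum(w + svctm_ms for w in queue_waits) / burst_size   # wait + service, averaged
print(f"svctm = {svctm_ms} ms, averaged await = {avg_await:.1f} ms")  # ~50.5 ms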
Finally, you might be hitting some specifics of IO scheduler that are not very efficient for your workload.
Thanks a lot, it was really helpful.
Hi Alex,
We are currently having some performance issues; our current Linux installation is Red Hat 4.1.2-48. It has Oracle 11g installed and, according to the Oracle DBAs, we are having I/O contention. After reading your post and analyzing the iostat results we received, the stats do not seem to point to contention.
My question is, how does the avgqu-sz relate to await? Based on the stats I get, await is greater than svctm, but avgqu-sz is not relevant. Can you help me interpret the numbers below? BTW, the disk is on a SAN setup, and the stats below are based on some of the most questionable numbers.
avgqu-sz await svctm
0.70 107.03 0.77
0.79 15.35 0.45
0.70 12.11 0.10
0.28 10.96 0.97
0.56 9.95 0.60
Hi Alex,
Doing a dd test from Linux to iSCSI LUNs, we get high values for await though the svctm column gives good values:
# dd if=/dev/zero of=/vdbench/test bs=1024k
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdj 0.00 17236.60 0.00 148.20 0.00 67.40 931.42 144.41 915.11 6.75 100.00
sdj1 0.00 17236.60 0.00 148.20 0.00 67.40 931.42 144.41 915.11 6.75 100.00
So here, the virtual qutim value is very very high, hence the average time a request is spending in the queue is quite abnormal. Do you think that increasing the iscsi queue depth on client side (linux) would improve something ?
Well, I may be far away from the root cause of this.
Many thanks.
fred
Without any further analysis, to me it seems you’ve reached the throughput limit for this device – you are processing writes that seem to be almost 512K in size (strace to get exact numbers).
You already have 144 IOs in the device queue and your device simply can’t process them faster than that. If IOs are done sequentially, each is 6.75 ms long, and you are doing 148.2 IOs per second, then in one second you get 1,000 ms worth of IOs. Provided they are all serialized to a single IO thread, there is no way to go faster unless you can split those IOs.
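A quick sanity check of that arithmetic with the numbers from your sdj line (plain multiplication, same idea as the %util formula in the post):

# ~148 writes per second, each taking ~6.75 ms of device time,
# adds up to roughly a full second of service time per second of wall clock.
w_per_s, svctm_ms = 148.2, 6.75
print(w_per_s * svctm_ms)   # ~1000 ms of IO per second, i.e. the device is ~100% busy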
I don’t see how increasing the queue size would help (unless you are talking about a different queue that is set to 1 right now somewhere in the layer below the normal Linux device queue – I didn’t look into iSCSI software much). You could try to look into why your average actual IO size is half of the 1MB max that you are requesting. Maybe it’s a limitation of your iSCSI device or some Linux config – this could be a simple way to increase throughput.
Another place to look is your Linux IO scheduler.
I don’t know if running two dd’s in parallel would make any difference.
Anyway, a dd test is pretty artificial. If you need to simulate an Oracle workload, do yourself a favor and have a look at ORION.
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdj 0.00 17236.60 0.00 148.20 0.00 67.40 931.42 144.41 915.11 6.75 100.00
sdj1 0.00 17236.60 0.00 148.20 0.00 67.40 931.42 144.41 915.11 6.75 100.00
you say:
“You already have 144 IOs in the device queue”
avgqu-sz is not the average queue length; it represents the waiting time (ms) of all requests in the queue.
You can dig into the kernel code.
Mark, I’m not sure how to interpret your comment: how is this metric supposed to be interpreted if you are right about it, and what time exactly does it measure in ms?
Average time spent by an IO request in the queue is await-svctm to the best of my knowledge. What exactly do you extract reading the kernel source?
Alex, as you know most enterprise systems are on filesystems stored on SAN.
how do we run iostat to get information on something like
/dev/mapper/mpath0p1 mounted on /somepartition
@sid: there is no option for that in the iostat command, but you can issue a simple “mount” command to see which devices are mounted on which mountpoints.
Hi Alex,
Hi, what does this mean:
avgqu-sz await svctm %util
1.17 3.66 3.11 99.72
1. Why is the disk hitting almost 100% utilization? Is it all right, and can we say there is no IO bottleneck, as await is almost equal to svctm and avgqu-sz is close to 1?
This is happening on a RAID 5 SAN partition.
2. Iostat shows the SAN disks as dm-0 and dm-1 so what does that mean in terms of IOstat output? Should it be read just as any normal disk?
It’s OK to have 100% utilization (it’s good) as long as response time and throughput are satisfactory. But remember that iostat & sar data is aggregate – so if you have problems that are tens of seconds long (and your periods are several minutes) and an otherwise steady state, you might not see it in sar.
In your case, it might very well be that IO requests are processed by the application mostly synchronously – issuing an IO and waiting on it, then another one and waiting, and so on.
In many cases, you can’t say whether IO is a bottleneck using iostat – it doesn’t tell you how much time IO contributed to your response time. What you see in iostat is what the *average* IO workload is and how your IO subsystem delivers. You need to dig into application instrumentation to understand the bottleneck. In the case of an Oracle database as the application using IO as a service, you can use a 10046 trace.
dm-0 and dm-1 are how your multipathed devices are represented.
[…] Basic I/O Monitoring on Linux […]
Great post! This is an excellent write-up on some of the more obscure and poorly documented stats needed for I/O monitoring.
Hi,
understanding some concepts like major page faults and minor page faults is very necessary for understanding IO usage. Also, some basic Linux tools like vmstat and sar from sysstat can be used to find IO bottlenecks.
The article below might be helpful in understanding some more related concepts.
https://www.slashroot.in/linux-system-io-monitoring
How to read IO ?
Ans.
# iostat -dxm /dev/sda
Linux 2.6.32-279.el6.x86_64 (example.st1.com) 05/11/2013 _x86_64_ (1 CPU)
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 8.55 9.32 4.59 3.10 0.09 0.05 36.39 0.06 7.97 3.16 2.43
============================
d - Device utilization information
m - Statistics in megabytes
n - Statistics for network file systems, e.g. when some DBA applications run from an NFS filer and NFS is not reachable for some reason. (Note: it works only with kernel 2.6 and later.)
x - Extended statistics.
==============================================================
rrqm/s – The number of read requests merged per second that were queued to the device.
wrqm/s – The number of write requests merged per second that were queued to the device.
r/s – The number of read requests that were issued to the device per second.
w/s – The number of write requests that were issued to the device per second.
rsec/s – The number of sectors read from the device per second.
wsec/s – The number of sectors written to the device per second.
rMB/s – The number of megabytes read from the device per second.
wMB/s – The number of megabytes written to the device per second.
avgrq-sz – The average size (in sectors) of the requests that were issued to the device.
avgqu-sz – The average queue length of the requests that were issued to the device.
await – The average time (in milliseconds) for I/O requests.
svctm – The average service time (in milliseconds) for I/O requests.
%util – Percentage of CPU time during which I/O requests were issued to the device. Device saturation occurs when this value is close to 100%.
I hope this will help ..
Hello Alex,
Very interesting article. Thanks!!
We are experiencing a strange issue with Oracle 11.2.0.2 and high IOWaits and High Virtual Memory Paging (according to ADDM).
The waits are significant enough that I get io_getevents (timed out after 600 sec) in the alert log; when our monitor attempts to test a connection from an associated website that connects to the db, connections are denied. I am also seeing the following type of entry in the iostat output for the same time period:
So this line from the output below is typical of the report:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sdaf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 21.80 0.00 0.00 99.11 (Avg Queue size is 21.80 but await and svctim is 0. %util is 99.11)
So on average, there is stuff sitting in the queue but at this point, nothing is being serviced (svctim = 0 and await = 0 (svctim + qutim), but the disk util is at 99.11. Since there are no r + w in the sdaf line, I am not sure why there would be anything in the %util column?
Seems to indicate a problem somewhere… like no requests are being accepted… the mechanism for reading and writing from disk has been disabled…
Also, Oracle indicates that there is significant Virtual Memory Paging at this same time which is listed as the number one issue.
After the issue has resolved, when I look at the same output (iostat from OSWatcher) for 9am there are no lines in the output that have an entry in avgqu-sz and %util, unless there is also a corresponding entry in await, svctim and the r/s or w/s columns.
Any comments?
Thanks
Hi guys,
I am getting Disk Device dm-13 is 100% busy in production database server.
Will it harm database health????Please let me know the resolution if any???
I am stuck in this error for days.
Thanks in Advance
Great explanation.