If your system is running slowly (and this goes for RHEL, Debian and other variants), take a look at this article, which is a simple walkthrough of the tools you can use to track down the problem. These specific examples are from a system running OpenStack, but that’s not important to most of you:
- top – The place to start is generally the ‘top’ command which shows a resource summary and task list.
- iostat – Shows the reads and writes on your disks.
- iotop – A realtime, per-process view of disk I/O, like top for your disks.
- iozone – Generate some test traffic to see how the system reacts.
Top on my compute node, which has been up for 21 days, shows the summary below. Here is how to read the numbers:
- Load Averages – The average number of processes running or waiting for a processor over the last 1, 5 and 15 minutes (on Linux this also counts processes in uninterruptible I/O wait), so higher numbers indicate increased load. A load of 1 is pretty nice, especially because we have multiple processors in this system. There is a lot of misunderstanding of load averages on the Internet. I’ll just say, don’t be concerned about load until it starts to approach 1/2 of your available processors.
- CPU Line – You will want to focus on three places here: the “id” or idle percentage, the “us” or user percentage, and “wa”, which shows the I/O wait time.
- Memory Line – Remember that many operating systems (RHEL included) will go ahead and use otherwise idle memory for buffers and cache, so “used” can look high. The “free” value along with “buffers” (and “cached”, which top prints on the Swap line) shows the space that is really available.
- Swap Line – Generally, swapping is bad because it relies on disks, which are far slower than RAM. If you see the “used” value climbing, look at memory first (see RAM Bound below).
top - 08:19:16 up 21 days, 17:03,  1 user,  load average: 1.15, 0.97, 1.07
Tasks: 289 total,   2 running, 287 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.3%us,  0.6%sy,  0.0%ni, 97.0%id,  0.0%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  165049476k total,  9652204k used, 155397272k free,   181888k buffers
Swap:   5464060k total,        0k used,   5464060k free,  1534804k cached
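The load rule of thumb above can be checked with a few lines of shell. This is only a minimal sketch: `load_is_high` is a made-up helper name, the one-half threshold is simply the rule of thumb from this post, and the live check assumes a Linux system with /proc/loadavg.

```shell
#!/bin/sh
# Sketch: compare a load average against half the number of CPUs.
# load_is_high is a hypothetical helper name, not a standard tool.

# Usage: load_is_high LOAD NCPU
# Prints "high" when LOAD >= NCPU / 2, otherwise "ok".
load_is_high() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN {
        if (load >= ncpu / 2) print "high"; else print "ok"
    }'
}

# Live check, only where /proc/loadavg exists (Linux).
if [ -r /proc/loadavg ]; then
    load1=$(cut -d ' ' -f 1 /proc/loadavg)   # 1-minute load average
    ncpu=$(getconf _NPROCESSORS_ONLN)        # number of online processors
    echo "load=$load1 cpus=$ncpu -> $(load_is_high "$load1" "$ncpu")"
fi
```

Swap in the 5 or 15 minute value (fields 2 and 3 of /proc/loadavg) if you care more about sustained load than short spikes.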
- CPU Bound – If you have too many CPU-intensive processes running, it will probably show up as a high user percentage and a low I/O wait. Confirm this by looking at the task list below, where you’ll see processes consuming high %CPU.
- RAM Bound – RAM issues usually cause a downward spiral of death, and you may not even be able to use “top” if you show up too late to the game. What happens is that free RAM runs out and swap usage climbs. As swap is used more and more, the system gets slower, so you’ll see CPU wait rising along with swapping. Use the “M” key to sort the task list by %MEM.
- I/O Bound – Rule out swapping first. If “wa” is still high with no swap in use, good, you are I/O bound. Actually, you are now in need of another tool like iostat or iotop.
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+  COMMAND
70919 libvirt-  20   0 6312m 1.4g 7608 R   31  0.9 170:55.15  /usr/bin/kvm
76109 libvirt-  20   0 5800m 1.3g 7608 S   31  0.9  83:06.68  /usr/bin/kvm
56981 libvirt-  20   0 5288m 1.3g 7608 S   24  0.9  82:52.20  /usr/bin/kvm
55388 libvirt-  20   0 5585m 2.0g 7672 S   13  1.3 123:55.51  /usr/bin/kvm
78113 root      20   0 72056  18m 3616 R   11  0.0   0:00.33  /usr/bin
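The three cases above boil down to a small decision on three numbers you can read off top: user percentage, I/O wait percentage and swap in use. Here is a rough sketch of that decision. The function name `classify_bottleneck` is hypothetical, and the thresholds (50% user, 20% wait) are assumptions picked for illustration, not values from any tool.

```shell
#!/bin/sh
# Sketch: classify a bottleneck from three numbers read off top.
# The thresholds (50% user, 20% wait) are illustrative assumptions.

# Usage: classify_bottleneck USER_PCT WAIT_PCT SWAP_USED_KB
classify_bottleneck() {
    awk -v us="$1" -v wa="$2" -v swap="$3" 'BEGIN {
        if (swap > 0 && wa >= 20) print "ram-bound"   # swapping plus I/O wait
        else if (wa >= 20)        print "io-bound"    # waiting on disk, no swap
        else if (us >= 50)        print "cpu-bound"   # busy in user space
        else                      print "ok"
    }'
}

# Values from the sample top output: 2.3%us, 0.0%wa, 0k swap used.
classify_bottleneck 2.3 0.0 0
```

Swap is checked first on purpose, since heavy swapping also drives up I/O wait and would otherwise be mistaken for a plain disk problem.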
I’ll be honest, I prefer iotop because it gives a realtime view, so I’m only covering the basics of iostat below:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.91    0.00    1.81    0.10    0.00   92.18

Device:    tps   kB_read/s   kB_wrtn/s   kB_read    kB_wrtn
sdc       0.69        0.19        7.38    358759   13862237
dm-0      0.02        0.00        2.33      1265    4383672
dm-1      0.00        0.00        0.00      1148          0
dm-2      0.91        0.19        5.05    354353    9478184
- tps: transfers (I/O requests issued to the device) per second.
- kB_read/s: kilobytes read per second.
- kB_wrtn/s: kilobytes written per second.
- kB_read: total kilobytes read.
- kB_wrtn: total kilobytes written.
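To pick out the busiest disk from a report like this without reading every row, you can add up the two per-second columns. The sketch below assumes device-only output (as from `iostat -dk`) where columns 3 and 4 are kB_read/s and kB_wrtn/s. `busiest_device` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: find the busiest device in `iostat -dk` style output.
# busiest_device is a hypothetical helper; it assumes columns 3 and 4
# are kB_read/s and kB_wrtn/s, as in the sample report in this post.

busiest_device() {
    awk '
        /^Device/ { in_table = 1; next }   # data rows start after the header
        in_table && NF >= 4 {
            rate = $3 + $4                 # kB read + kB written per second
            if (rate > max) { max = rate; dev = $1 }
        }
        END { if (dev != "") printf "%s %.2f\n", dev, max }
    '
}

# Feed it the sample report from this post.
busiest_device <<'EOF'
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sdc 0.69 0.19 7.38 358759 13862237
dm-0 0.02 0.00 2.33 1265 4383672
dm-1 0.00 0.00 0.00 1148 0
dm-2 0.91 0.19 5.05 354353 9478184
EOF
```

On a live system the same function should work as `iostat -dk | busiest_device`, provided your sysstat version keeps those two columns in the same place.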
If you happen to ever meet Guillaume Chazarain, you have to buy him a beer or give him a hug or something. That’s right, he wrote iotop and I am grateful. Like top, this is an active display, but here we view the disk activity of processes. Guillaume lets you use the left and right arrow keys to change the sort column, which is going to help you quickly isolate your biggest I/O consumers.
Total DISK READ: 0.00 B/s | Total DISK WRITE: 22.54 K/s
  TID  PRIO  USER<     DISK READ  DISK WRITE  COMMAND
 2256  be/4  daemon     0.00 B/s    0.00 B/s  0.00 %  0.00 %  atd
55393  be/4  libvirt-   0.00 B/s    0.00 B/s  0.00 %  0.00 %  kvm -name
Once you get iotop started you may want to run iozone, which will generate some traffic for your system. In my case, when we spin up instances in our OpenStack (/var/lib/nova/instances/) they tend to consume about 500M each, so I’m going to simulate this with iozone. I could also just tell my OpenStack controller to spin up a bunch of these, but iozone is going to take down some numbers, and I can watch all of this with iotop.
iozone -R -l 5 -u 5 -r 32k -s 500m -F /var/lib/nova/instances/iozone1 /var/lib/nova/instances/iozone2 /var/lib/nova/instances/iozone3 /var/lib/nova/instances/iozone4 /var/lib/nova/instances/iozone5 | tee -a /tmp/iozone_results.txt
- -s – Size of each test file (500m here)
- -r – Record size used for each read and write (32k here)
- -F – Where the files will be written (temporary); one file name per process
- -l – Minimum number of simultaneous processes
- -u – Maximum number of simultaneous processes
- -R – Format it nicely for Excel, but also just for cutting and pasting.
- tee – Send the output to a file, but let me see it too.
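Since -F needs one file name per process, typing the list by hand gets tedious as the process count grows. A small sketch of generating it follows. `iozone_files` is a hypothetical helper, and it assumes the directory path contains no spaces.

```shell
#!/bin/sh
# Sketch: build the -F file list for iozone instead of typing each path.
# iozone_files is a hypothetical helper name.

# Usage: iozone_files DIR COUNT
# Prints DIR/iozone1 ... DIR/iozoneCOUNT on one line, space separated.
iozone_files() {
    dir=$1
    count=$2
    i=1
    list=""
    while [ "$i" -le "$count" ]; do
        list="$list${list:+ }$dir/iozone$i"   # add a space only between entries
        i=$((i + 1))
    done
    echo "$list"
}

# Print the command rather than running it, so it can be reviewed first.
n=5
echo "iozone -R -l $n -u $n -r 32k -s 500m -F $(iozone_files /var/lib/nova/instances "$n")"
```

With n=5 this prints the same command used in this post, minus the tee. Changing n keeps -l, -u and the file list in step with each other.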
How do you fix any problems? Well, that’s not in the scope of this post, but I’d be happy to discuss any issues you have and even happier to hear from people who have some refined techniques. Let’s share and learn together.