The Linux implementation is well described in the Linux.com article "CLI Magic: Tracking system performance with sar".
Sadc (system activity data collector) is the program that gathers performance data. It pulls its data out of the virtual /proc filesystem, then it saves the data in a file (one per day) named /var/log/sa/saDD where DD is the day of the month. Two shell scripts from the sysstat package control how the data collector is run. The first script, sa1, controls how often data is collected, while sa2 creates summary reports (one per day) in /var/log/sa/sarDD. Both scripts are run from cron. In the default configuration, data is collected every 10 minutes and summarized just before midnight.
If you suspect a performance problem with a particular program, you can use sadc to collect data on a particular process (with the -x argument), or its children (-X), but you will need to set up a custom script using those flags.
As Dr. Heisenberg showed, the act of measuring something changes it. Any tool that collects performance data has some overall negative impact on system performance, but with sar, the impact seems to be minimal. I ran a test with the sa1 cron job set to gather data every minute (on a server that was not busy) and it didn't cause any serious issues. That may not hold true on a busy system.
Creating reports
If the daily summary reports created by the sa2 script are not enough, you can create your own custom reports using sar. The sar program reads data from the current daily data file unless you specify otherwise. To have sar read a particular data file, use the -f /var/log/sa/saDD option. You can select multiple files by using multiple -f options. Since many of sar's reports are lengthy, you may want to redirect the output to a file.
To create a basic report showing CPU usage and I/O wait time percentage, use sar with no flags. It produces a report similar to this:

01:10:00 PM   CPU   %user   %nice   %system   %iowait   %idle
01:20:00 PM   all    7.78    0.00      3.34     20.94   67.94
01:30:00 PM   all    0.75    0.00      0.46      1.71   97.08
01:40:00 PM   all    0.65    0.00      0.48      1.63   97.23
01:50:00 PM   all    0.96    0.00      0.74      2.10   96.19
02:00:00 PM   all    0.58    0.00      0.54      1.87   97.01
02:10:00 PM   all    0.80    0.00      0.60      1.27   97.33
02:20:01 PM   all    0.52    0.00      0.37      1.17   97.94
02:30:00 PM   all    0.49    0.00      0.27      1.18   98.06
Average:      all    1.85    0.00      0.44      2.56   95.14

If the %idle is near zero, your CPU is overloaded. If the %iowait is large, your disks are overloaded.
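The rule of thumb above can be checked automatically. The sketch below parses the text of a basic CPU report and lists the intervals that look overloaded. The thresholds (idle below 10%, iowait above 20%) are arbitrary assumptions for illustration, not values recommended by sar, and the function names are hypothetical.

```python
CPU_SAMPLE = """\
01:10:00 PM       CPU     %user     %nice   %system   %iowait     %idle
01:20:00 PM       all      7.78      0.00      3.34     20.94     67.94
01:30:00 PM       all      0.75      0.00      0.46      1.71     97.08
Average:          all      1.85      0.00      0.44      2.56     95.14
"""


def parse_cpu_report(text):
    """Turn a basic sar CPU report into a list of dicts, one per interval."""
    rows = []
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if "CPU" in fields and "%idle" in fields:
            columns = fields[fields.index("CPU"):]  # ['CPU', '%user', ...]
            continue
        if columns is None or len(fields) < len(columns) or fields[0] == "Average:":
            continue  # banner, blank line, or summary row
        values = fields[-len(columns):]
        row = {"time": " ".join(fields[:-len(columns)]), "CPU": values[0]}
        for name, value in zip(columns[1:], values[1:]):
            row[name] = float(value)
        rows.append(row)
    return rows


def busy_intervals(rows, min_idle=10.0, max_iowait=20.0):
    """Return timestamps where idle is too low or I/O wait is too high."""
    return [
        row["time"]
        for row in rows
        if row["%idle"] < min_idle or row["%iowait"] > max_iowait
    ]
```

With the sample shown, only the 01:20:00 PM interval is flagged, because its %iowait of 20.94 exceeds the assumed threshold.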
To check the kernel's paging performance, use sar -B, which will produce a report similar to this:

11:00:00 AM   pgpgin/s   pgpgout/s   fault/s   majflt/s
11:10:00 AM       8.90       34.08      0.00       0.00
11:20:00 AM       2.65       26.63      0.00       0.00
11:30:00 AM       1.91       34.92      0.00       0.00
11:40:01 AM       0.26       36.78      0.00       0.00
11:50:00 AM       0.53       32.94      0.00       0.00
12:00:00 PM       0.17       30.70      0.00       0.00
12:10:00 PM       1.22       27.89      0.00       0.00
12:20:00 PM       4.11      133.48      0.00       0.00
12:30:00 PM       0.41       31.31      0.00       0.00
Average:        130.91       27.04      0.00       0.00

Raw paging numbers may not be of concern, but a high number of major faults (majflt/s) indicates that the system needs more memory. Note that majflt/s is only valid with kernel versions 2.5 and later.
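To watch for major faults without reading every row, a short script can pull the majflt/s column out of the report text. This is only a sketch: the function names are hypothetical, and the default threshold of zero (flag any major fault at all) is an assumption you would tune for your own system.

```python
PAGING_SAMPLE = """\
11:00:00 AM  pgpgin/s pgpgout/s   fault/s  majflt/s
11:10:00 AM      8.90     34.08      0.00      0.00
12:20:00 PM      4.11    133.48      0.00      0.00
Average:       130.91     27.04      0.00      0.00
"""


def major_fault_rates(report_text):
    """Return (timestamp, majflt/s) pairs for each interval in a sar -B report."""
    rates = []
    column = None
    for line in report_text.splitlines():
        fields = line.split()
        if "majflt/s" in fields:
            # Index from the right so 12-hour and 24-hour timestamps both work.
            column = fields.index("majflt/s") - len(fields)
            continue
        if column is None or len(fields) < 2 or fields[0] == "Average:":
            continue
        stamp = " ".join(fields[:2]) if fields[1] in ("AM", "PM") else fields[0]
        try:
            rates.append((stamp, float(fields[column])))
        except (ValueError, IndexError):
            continue  # not a data row, for example a banner line
    return rates


def intervals_with_major_faults(report_text, threshold=0.0):
    """Return timestamps whose majflt/s value exceeds the threshold."""
    return [stamp for stamp, rate in major_fault_rates(report_text) if rate > threshold]
```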
For network statistics, use sar -n DEV. The -n DEV option tells sar to generate a report that shows the number of packets and bytes sent and received for each interface. Here is an abbreviated version of the report:

11:00:00 AM   IFACE   rxpck/s   txpck/s   rxbyt/s    txbyt/s
11:10:00 AM      lo      0.62      0.62     35.03      35.03
11:10:00 AM    eth0     29.16     36.71   4159.66   34309.79
11:10:00 AM    eth1      0.00      0.00      0.00       0.00
11:20:00 AM      lo      0.29      0.29     15.85      15.85
11:20:00 AM    eth0     25.52     32.08   3535.10   29638.15
11:20:00 AM    eth1      0.00      0.00      0.00       0.00

To see network errors, try sar -n EDEV, which shows network failures.
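Because the -n DEV report interleaves one row per interface per interval, it can help to collapse it into a single figure per interface. The sketch below averages received plus transmitted bytes per second for each interface. The function name is hypothetical, and the column names rxbyt/s and txbyt/s are taken from the report shown here; other sysstat versions may label these columns differently, so they are passed as parameters.

```python
from collections import defaultdict

NET_SAMPLE = """\
11:00:00 AM     IFACE   rxpck/s   txpck/s   rxbyt/s   txbyt/s
11:10:00 AM        lo      0.62      0.62     35.03     35.03
11:10:00 AM      eth0     29.16     36.71   4159.66  34309.79
11:10:00 AM      eth1      0.00      0.00      0.00      0.00
11:20:00 AM        lo      0.29      0.29     15.85     15.85
11:20:00 AM      eth0     25.52     32.08   3535.10  29638.15
11:20:00 AM      eth1      0.00      0.00      0.00      0.00
"""


def mean_throughput(report_text, rx_col="rxbyt/s", tx_col="txbyt/s"):
    """Return {interface: mean of (rx + tx) bytes per second} over all intervals."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    header = None
    for line in report_text.splitlines():
        fields = line.split()
        if "IFACE" in fields:
            header = fields[fields.index("IFACE"):]
            continue
        if header is None or len(fields) < len(header) or fields[0] == "Average:":
            continue  # banner, blank line, or summary row
        values = dict(zip(header, fields[-len(header):]))
        iface = values["IFACE"]
        totals[iface] += float(values[rx_col]) + float(values[tx_col])
        counts[iface] += 1
    return {iface: totals[iface] / counts[iface] for iface in totals}
```

Calling max(result, key=result.get) on the returned dictionary then names the busiest interface, which is eth0 in the sample.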
Reports on current activity
Sar can also be used to view what is happening with a specific subsystem, such as networking or I/O, almost in real time. By passing a time interval (in seconds) and a count for the number of reports to produce, you can take an immediate snapshot of a system to find a potential bottleneck.
For example, to see the basic report every second for the next 10 seconds, use sar 1 10. You can run any of the reports this way to see near real-time results.
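If you want to capture such a snapshot from a script, one possible approach is to start sar with an interval and a count and read its output line by line. The helper names below are hypothetical, and the sketch assumes sar is on the PATH.

```python
import subprocess
from typing import Iterator, List


def live_sar_command(interval: int, count: int, *report_flags: str) -> List[str]:
    """Build the argv for a live report, e.g. sar 1 10 or sar -n DEV 2 5."""
    if interval < 1 or count < 1:
        raise ValueError("interval and count must be positive integers")
    return ["sar", *report_flags, str(interval), str(count)]


def stream_live_report(interval: int, count: int, *report_flags: str) -> Iterator[str]:
    """Run sar and yield each line of output as soon as it is printed."""
    argv = live_sar_command(interval, count, *report_flags)
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")
```

For instance, looping over stream_live_report(1, 10) would print the basic report once per second for ten seconds, the same as running sar 1 10 at the shell.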
Benchmarking
Even if you have plenty of horsepower to run your applications, you can use sar to track changes in the workload over time. To do this, save the summary reports (by default, only the last seven days are kept) to a different directory over a period of a few weeks or a month. This set of reports can serve as a baseline for the normal system workload. Then compare new reports against the baseline to see how the workload is changing over time. You can automate your comparison reports with AWK or your favorite programming language.
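As one way to automate that comparison, the sketch below reads the Average: row from a set of saved CPU reports and shows how far a new report departs from the baseline mean. The function names are hypothetical, and it assumes each saved report contains the basic CPU section in the layout shown earlier.

```python
from statistics import mean


def average_row(report_text):
    """Return the 'Average:' row of a basic sar CPU report as {column: value}."""
    header = None
    for line in report_text.splitlines():
        fields = line.split()
        if "CPU" in fields and "%idle" in fields:
            header = fields[fields.index("CPU") + 1:]  # ['%user', ..., '%idle']
        elif header and fields and fields[0] == "Average:":
            return dict(zip(header, map(float, fields[-len(header):])))
    raise ValueError("no Average: row found")


def drift_from_baseline(baseline_reports, new_report):
    """Return {column: new average minus mean of the baseline averages}."""
    baseline = [average_row(text) for text in baseline_reports]
    current = average_row(new_report)
    return {
        name: current[name] - mean(day[name] for day in baseline)
        for name in current
    }
```

A positive drift in %user together with a negative drift in %idle suggests the workload is growing. How large a drift matters is a judgment call that depends on your own baseline.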
In large systems management, benchmarking is important to predict when and how hardware should be upgraded. It also provides ammunition to justify your hardware upgrade requests.