Wednesday, April 13, 2011

Profiling Xapian

If you think you've found a bottleneck in Xapian's code, it will be much easier for us to investigate if you can run a profiling tool on your application and we can look at the results and see where most of the time is spent. Slow cases are usually due to a small number of "hot spots" which account for the majority of the time, and it's often easy to obtain a good speed up by looking at these. But we don't know where they are without the profiling data.
The information currently here is mostly Unix-specific, and fairly biased towards Linux - if you have further profiling hints, particularly for other platforms, please add them below.
The first thing to do is find out if the testcase is I/O or CPU bound. To do this, run it under time which will reveal this, and also whether a lot of time is spent in the kernel, for example:
$ time ./test-script

real    0m27.863s
user    0m4.514s
sys     0m0.236s
The exact output format may vary between versions of time, but you should get these 3 statistics.
The above example is I/O bound (the user+sys times are much less than the real time, so assuming the machine is otherwise idle, the difference is due to waiting for I/O).
If we're CPU bound, most profiling tools are suitable. If we're I/O bound, then using a profiling tool which doesn't just look at CPU time will produce much more useful data.
In I/O bound cases, if you're running on Linux, we recommend oprofile. This looks at the whole system, so we see time spent in the kernel, and doing I/O (and slow operations are often partly due to I/O). Also, oprofile has a low overhead so it doesn't change the situation we're trying to measure too much.
The most useful output (from oprofile or other tools) is usually a callgraph which shows how time splits down between functions and the functions they in turn call.

How to profile Xapian using perf

Recent Linux kernels (2.6.31 onwards) have a new profiling framework with a userland tool called "perf":
 http://perf.wiki.kernel.org/index.php/Main_Page
It seems to be very versatile, and likely to supplant oprofile, but we haven't yet tried it with Xapian. If you do, we encourage you to fill in any useful tips here.

How to profile Xapian using oprofile

What follows is a run through of how you might install and use oprofile to generate a callgraph.
Prerequisites:
  • A Linux box (ideally running a 2.6 series kernel for best results)
  • An architecture where oprofile supports callgraphs (x86 and x86_64 seem to)
  • root access
  • If you're running virtualised under Xen, you apparently need to do special things to get oprofile working. We're not sure exactly what, but for some promising-sounding resources see:  http://www.google.com/search?q=oprofile+xen
First, make sure you've got oprofile installed. I'm running Ubuntu, so to install I just do:
sudo apt-get install oprofile
You may also want to ensure you have debug symbols for libraries where time is spent so we can see which functions time is spent in. On Debian and derived distros, debug symbols are often packaged separately in a package named after the library package with -dbg appended. E.g. for libc6 debug symbols install libc6-dbg:
sudo apt-get install libc6-dbg
Debian should be the same as Ubuntu for installing oprofile and debug symbols, but this part may vary for other distros.
From now on, the commands should work for any distro, provided you use sudo (except that in edgy, feisty, and Ubuntu gutsy's oprofile package has a bug and you'll need to use bash opcontrol in all the commands below to work around this).
Once you have it installed:
sudo opcontrol --init
sudo opcontrol --no-vmlinux --callgraph=12
sudo opcontrol --start-daemon
sudo opcontrol --start ; [run testcase] ; sudo opcontrol --stop
opreport --callgraph=12 > oprofile-callgraph.txt
sudo opcontrol --deinit
The output of opreport can be enormous, so please don't post reports to the mailing list as attachments. The best thing to do is open a bug report and attach the report there so that it doesn't get lost.

How to profile Xapian using sysprof

Sysprof is another Linux system-wide profiling tool. It is very easy to use (once installed), but provides less detail than oprofile (in particular, it doesn't provide access to the hardware counters).
First, note that you will need the sysprof kernel module to be installed and loaded before sysprof will work. On Ubuntu, the sysprof module can be installed by running:
sudo apt-get install sysprof-module-source
sudo module-assistant auto-install sysprof-module
It will then need to be loaded by running:
sudo modprobe sysprof-module
Install the sysprof application using:
sudo apt-get install sysprof
When you want to profile, start the sysprof GUI application (under Applications>Programming, or by typing sysprof in a terminal window). To start profiling, click the "Start" button, then start the activity you wish to profile, and once this has run for a sufficient period of time, click "Profile" to stop the collection of information.
A list of functions, and the time spent in them and in their descendants is displayed in the sysprof window. You can click on a function to get a list of callers and callees, and click on a caller to change the view to that function. You can also sort the list by time-in-self or by accumulated time.

Profiling just CPU usage

The following profilers aren't so useful when there's a significant amount of waiting on I/O, but in mostly CPU-bound cases, they can be handy.

gprof

gprof isn't as useful as some of the other profiling options, but it works on more platforms than some, and is easy to enable - just configure with:
./configure CXXFLAGS=-pg LDFLAGS=-pg
If your testcase is a separate program, then you also should also pass -pg when you compile and link it.
Then run the testcase as normal, which will create a file gmon.out in the current directory with the raw profiling data in. To analyses this data, run:
gprof testcase
Which produces a report of stdout (so you probably want to redirect output to a file or a pager).

valgrind's callgrind and cachegrind

These naturally only work on platforms which valgrind supports - see  http://valgrind.org/ for an up-to-date list of these.
No recompiling is required - just prefix your test command like so:
valgrind --tool=cachegrind ./testcase
(or --tool=callgrind for callgrind).