Kojirion

Profiling

There is an oft-repeated quote about optimizing prematurely. There are good reasons for it: it is not a good use of a programmer’s time to worry about the performance of parts of the code that may be insignificant, particularly if this is done at the expense of code clarity.

However, there are cases where performance does matter. Beyond the vague advice to "profile", these are some concrete steps for doing so on a GNU/Linux operating system:

[Figure: callgraph]

This will reveal the functions and lines where most of the application time is spent.

An excellent talk on how to use both of these was given by Chandler Carruth at CppCon 2015.

For reference, the compiler flags used are:

-O3
-std=c++14
-stdlib=libc++
-lc++abi
-Wl,-rpath=/home/chandlerc/lib64
-fno-exceptions
-fno-rtti
-Wall
-pedantic
-Werror
-isystem /home/chandlerc/include
-pthread
-fno-omit-frame-pointer

and the tricks to prevent the compiler from optimizing away the variables/code of interest:


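// Pretend the pointer escapes: the compiler must assume the memory it points to may be read or written elsewhere.
static void escape(void *p) {
    asm volatile("" : : "g"(p) : "memory");
}

// Pretend all memory was read and written: pending stores cannot be dropped or moved past this point.
static void clobber() {
    asm volatile("" : : : "memory");
}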
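As a rough illustration of how these two helpers are typically combined, the sketch below times a loop of `reserve` plus `push_back` on a `std::vector`. The function name `time_push_back` and the use of `std::chrono::steady_clock` are assumptions made for this example; the inline assembly relies on GCC/Clang syntax.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Same helpers as above, repeated so the sketch compiles on its own (GCC/Clang only).
static void escape(void *p) {
    asm volatile("" : : "g"(p) : "memory");
}

static void clobber() {
    asm volatile("" : : : "memory");
}

// Hypothetical helper: runs `iterations` rounds of reserve + push_back
// and returns the elapsed wall time in nanoseconds.
static long long time_push_back(std::size_t iterations) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        std::vector<int> v;
        v.reserve(1);
        escape(v.data());  // the buffer is now treated as visible to outside code
        v.push_back(42);
        clobber();         // the store of 42 must be assumed to be observed
    }
    const auto stop = clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Without the `escape` and `clobber` calls, an optimizing compiler is free to notice that `v` is never observed and remove the work being measured.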

The auto variety reports the time on destruction; when not timing an entire function, create an arbitrary scope with a pair of braces. A meaningful name can be given to the timer output on construction. Then pick up the streamed output with gnuplot/matplotlib to plot the results in real time as the application runs:

[Figure: walltime]

This is by no means a comprehensive guide. Each of the tools mentioned offers rich functionality, and there are others. Furthermore, once the measurements are in, it is time to proceed with optimizations; the most crucial aspects of doing so are understanding the cache-friendliness of the code and the algorithmic complexity of operations on large data structures. Some recent noteworthy talks on performance: