Code Optimizing

From Daya Bay

This article is part of the Offline Documentation.

This topic describes how to optimize your code.

First rule: Don't!

It has been said that premature optimization is the root of all evil. Make your code work right first. Then, maybe, worry about optimization later, and only if you have proved that your code is really the problem.

Google Perftools

To find out which code is causing problems, you should run it under a profiler. For compiled code, Google provides a very good suite of profilers called google-perftools (not to be confused with a similar package of a similar name).

This is a statistical profiler, which means it samples the state of the program periodically. At each sample, the CPU profiler records the call stack and builds up a weighted call graph.
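To make the idea of statistical sampling concrete, here is a toy Python sketch (Unix only). It interrupts a running function on a CPU-time timer (`signal.setitimer` with `ITIMER_PROF`) and records the call stack at each interrupt. It only illustrates the technique and is not how perftools itself is implemented. The workload functions `busy` and `work` are made up for the example.

```python
import collections
import signal


def sample_stacks(func, interval=0.001):
    """Run func() and record its call stack each time SIGPROF fires (Unix only)."""
    samples = []

    def on_sigprof(signum, frame):
        # `frame` is the Python frame that was running when the handler was invoked;
        # walk outwards through the callers to reconstruct the call stack.
        stack = []
        while frame is not None:
            stack.append(frame.f_code.co_name)
            frame = frame.f_back
        samples.append(tuple(reversed(stack)))  # outermost caller first

    previous = signal.signal(signal.SIGPROF, on_sigprof)
    # ITIMER_PROF counts CPU time used by the process, so idle time is not sampled.
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        func()
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0, 0)  # stop the timer
        # Restore the previous handler (SIG_DFL if it was not installed from Python).
        signal.signal(signal.SIGPROF, previous if previous is not None else signal.SIG_DFL)
    return samples


def busy(n):
    # Hypothetical CPU-bound workload.
    total = 0
    for i in range(n):
        total += i * i
    return total


def work():
    busy(3_000_000)


stacks = sample_stacks(work)
# Count how often each function was on top of the stack (its "self" samples).
on_top = collections.Counter(stack[-1] for stack in stacks)
print(len(stacks), "samples; most frequent:", on_top.most_common(3))
```

Functions that use more CPU collect proportionally more samples. Very short routines may collect none at all.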

Build with debug

It is best to rebuild the code with debug symbols included in the compiled objects ("-g" compiler flag). Without these the results will lack information such as line numbers and even function names (if libraries are stripped).

CPU profiling

Google's documentation is available; some key instructions are summarized here.

To run the CPU profiler one just needs to set a couple of environment variables:

bash> CPUPROFILE=the_command.perf LD_PRELOAD=/path/to/libprofiler.so.0 the_command

Specifically, for nuwa.py the line would look like:

bash> CPUPROFILE=the_command.perf LD_PRELOAD=/path/to/libprofiler.so.0 $(which python) $(which nuwa.py) [nuwa.py command line options...]

This will create one or more files named like the_command.perf which will be used later.
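If you launch profiling runs from a script, you can set up the same environment from Python. This is a minimal sketch using only the standard library. The helper names `profiled_env`, `nuwa_argv` and `run_profiled` are hypothetical. `libprofiler` must be the real path to libprofiler.so.0 on your system; the `/path/to/` in the commands above is a placeholder.

```python
import os
import shutil
import subprocess
import sys


def profiled_env(profile_out, libprofiler, frequency=None):
    """Return a copy of the current environment set up for the perftools CPU profiler."""
    env = dict(os.environ)
    env["CPUPROFILE"] = profile_out   # file the profiler writes its samples to
    env["LD_PRELOAD"] = libprofiler   # load libprofiler into the profiled process
    if frequency is not None:
        env["CPUPROFILE_FREQUENCY"] = str(frequency)  # requested samples per second
    return env


def nuwa_argv(options):
    """Command line for a nuwa.py job: the profiled executable is python itself."""
    python = shutil.which("python") or sys.executable
    nuwa = shutil.which("nuwa.py") or "nuwa.py"  # bare name if not found on PATH
    return [python, nuwa, *options]


def run_profiled(argv, profile_out, libprofiler, frequency=None):
    """Run argv under the profiler; raises CalledProcessError if the job fails."""
    env = profiled_env(profile_out, libprofiler, frequency)
    subprocess.run(argv, env=env, check=True)
```

A call would look like `run_profiled(nuwa_argv(your_options), "the_command.perf", "/path/to/libprofiler.so.0")`. Like the shell command, this replaces any existing `LD_PRELOAD` setting.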

Increase Sample Efficiency

Short or infrequent calls can be missed at the default sampling frequency. To increase the sample rate, set:

CPUPROFILE_FREQUENCY=100000
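A quick back-of-the-envelope calculation shows why. The expected number of samples landing in a piece of code is roughly its total CPU time multiplied by the sampling frequency. The statistical (Poisson) uncertainty on a count of N samples is about 1/sqrt(N). The gperftools documentation gives a default of 100 samples per second. The 5 ms routine below is a made-up example. The rate actually achieved may be lower than requested, because it is limited by the operating system's timer.

```python
import math


def expected_samples(cpu_seconds, frequency_hz=100):
    """Approximate number of samples landing in code that used cpu_seconds of CPU."""
    return cpu_seconds * frequency_hz


def relative_uncertainty(n_samples):
    """Poisson estimate of the relative statistical error on a sample count."""
    return 1.0 / math.sqrt(n_samples)


# Hypothetical routine that uses 5 ms of CPU in total over the whole job.
for freq in (100, 100000):
    n = expected_samples(0.005, freq)
    print(f"{freq:>6} Hz: ~{n:g} samples, relative error ~{relative_uncertainty(n):.0%}")
```

At the default rate this routine expects only half a sample, so it may not appear in the profile at all.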

Generating Call Graphs

One then uses pprof to generate a graph showing how much time each function in your program used directly and how much time was spent calling other functions.

pprof --ps the_command the_command.perf > the_command.eps
gv the_command.eps

For nuwa.py jobs, the_command should be $(which python) (or the full path to the python executable).

Note: sometimes gv will not display anything. You can fix this by turning off anti-aliasing: press the "a" key.

Note: the --pdf option of pprof may work better than the PostScript (--ps) option.
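When rendering many profiles, you can script the pprof step too. This sketch simply builds and runs the same command lines shown above (`--pdf` or `--ps`, with pprof's output redirected to a file). The helper names are hypothetical. On some systems the pprof script may be installed under a different name (for example google-pprof), so the name is a parameter.

```python
import shutil
import subprocess
import sys


def pprof_argv(binary, profile, fmt="pdf", pprof="pprof"):
    """Build a pprof command line that renders the call graph as PDF or PostScript."""
    if fmt not in ("pdf", "ps"):
        raise ValueError("fmt must be 'pdf' or 'ps'")
    return [pprof, "--" + fmt, binary, profile]


def render_call_graph(binary, profile, out_path, fmt="pdf", pprof="pprof"):
    """Run pprof and write the rendered graph (printed to stdout by pprof) to out_path."""
    with open(out_path, "wb") as out:
        subprocess.run(pprof_argv(binary, profile, fmt, pprof), stdout=out, check=True)


# For nuwa.py jobs the profiled binary is the Python interpreter.
python_binary = shutil.which("python") or sys.executable
```

For example, `render_call_graph(python_binary, "the_command.perf", "the_command.pdf")` corresponds to `pprof --pdf $(which python) the_command.perf > the_command.pdf`.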

Understanding the Call Graph

[Figure: Call graph produced by the google-perftools CPU profiler.]

The image to the right is an example call graph produced by DetSim. It shows the use of the TouchableToDetectorElement class, which looks up DetDesc DetectorElements based on a Geant4 TouchableHistory.

Each box represents one function (object method). At the top is the function name. The numbers below it give:

  • Total (and percentage of) samples where this function was on top of the call stack.
  • Total (and percentage of) samples where this function was anywhere on the call stack.

The arrows leading from one box to another give the number of samples in which the function at the arrow's tail is calling the function at the arrow's head. Arrows and boxes are also sized according to their counts.
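The counting behind these numbers can be illustrated with a few made-up call stacks. Each stack is listed outermost caller first, as a sampler would record it. This is a simplified sketch of the bookkeeping described above, not pprof's actual code. Recursive functions are counted only once per sample here, and the function names are hypothetical.

```python
from collections import Counter


def call_graph(stacks):
    """Summarise sampled call stacks into per-function and per-edge sample counts."""
    flat = Counter()        # samples where the function was on top of the stack
    cumulative = Counter()  # samples where the function was anywhere on the stack
    edges = Counter()       # samples where caller -> callee appeared on the stack
    for stack in stacks:
        flat[stack[-1]] += 1
        for name in set(stack):  # count a recursive function once per sample
            cumulative[name] += 1
        for caller, callee in set(zip(stack, stack[1:])):
            edges[(caller, callee)] += 1
    return flat, cumulative, edges


# Hypothetical samples: each tuple is one recorded call stack.
stacks = [
    ("main", "lookup", "hash"),
    ("main", "lookup", "compare"),
    ("main", "lookup", "compare"),
    ("main", "report"),
]
flat, cumulative, edges = call_graph(stacks)
total = len(stacks)
for name in cumulative:
    print(f"{name:8} self {flat[name]} ({flat[name] / total:.0%})"
          f"  cumulative {cumulative[name]} ({cumulative[name] / total:.0%})")
for (caller, callee), count in edges.items():
    print(f"{caller} -> {callee}: {count}")
```

Here `main` has no self samples but appears in every stack. This is the typical signature of a top-level driver that does no work itself.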

Interpreting the Call Graph

To make use of the call graph, first look for the leaf nodes that directly take up the most CPU. These will usually be basic functions in the STL or other libraries, which are already highly optimized; they show up large simply because they are used often. However, if they are in your own code, they are candidates for optimization.

Next, follow the call graph backwards from these CPU hogs to understand how they are being called in the first place. There may be ways to avoid calling them so often, for example by caching the results of previous calls.
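Caching only helps when the same inputs recur and the result cannot change between calls. In Python, the standard library's `functools.lru_cache` is one way to do this. The `lookup` function below is hypothetical and stands in for any expensive, repeatable query.

```python
from functools import lru_cache

expensive_calls = 0


@lru_cache(maxsize=None)
def lookup(path):
    """Hypothetical expensive lookup; the body runs only on a cache miss."""
    global expensive_calls
    expensive_calls += 1
    return "object at " + path


for _ in range(1000):
    lookup("/some/path")  # only the first call does the expensive work

print(expensive_calls, lookup.cache_info())
```

The trade-off is memory for the cached results. This is also why the cache must be invalidated if the underlying data can change.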

For example, in the graph to the right, the FindObjectInDirectory() function calls both existDet() and getDet(). The first call checks whether the object is in the store, to avoid an error message from the second call. This results in two store accesses, which are relatively expensive. Instead, a different store function can be used that indicates failure in a testable (and quiet) manner. This fix leads to a speedup of a factor of 2!
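The pattern can be illustrated with a small Python analogy. `CountingStore` and its methods are hypothetical stand-ins, not the real DetDesc/Gaudi store interface. They only count accesses, to show why check-then-get costs two lookups per object found. A single "find" that signals failure through a testable return value costs one.

```python
class CountingStore:
    """Hypothetical object store that counts how often it is accessed."""

    def __init__(self, objects):
        self._objects = dict(objects)
        self.accesses = 0

    def exists(self, path):
        self.accesses += 1
        return path in self._objects

    def get(self, path):
        """Raises KeyError if the object is missing (the 'error message' case)."""
        self.accesses += 1
        return self._objects[path]

    def find(self, path):
        """One access; returns None instead of raising when the object is missing."""
        self.accesses += 1
        return self._objects.get(path)


def lookup_twice(store, path):
    # Check first to avoid the error in get(): two accesses when the object exists.
    return store.get(path) if store.exists(path) else None


def lookup_once(store, path):
    # One access; failure is reported by a testable None return value.
    return store.find(path)


paths = ["present", "missing"] * 500
twice = CountingStore({"present": 1})
once = CountingStore({"present": 1})
for path in paths:
    assert lookup_twice(twice, path) == lookup_once(once, path)
print("check-then-get:", twice.accesses, "accesses; single find:", once.accesses)
```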


Memory Profiling

One can also profile memory usage. [to be added]

Problems with sub-processes

Profiling a process that starts sub-processes may be a problem. For example, generating muon kinematics on the fly by running Muon.exe may result in no kinematics being generated and a crash. More information is in Trac ticket #422.

Python cProfile

Python has several built-in profilers; the recommended one is cProfile. Run it like this:

python -m cProfile -o cprofile.out $(which nuwa.py) [...your favorite nuwa.py args...]

The resulting output file is binary and can be interrogated from Python using the standard pstats module. See the docs.
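For example, `pstats` can load the file and print the most expensive functions. Its `print_callers` and `print_callees` methods give a rough textual equivalent of the call-graph arrows. The `summarize` helper and the toy workload used here to generate a profile are hypothetical. With a real job you would pass `cprofile.out` to `summarize` instead.

```python
import cProfile
import os
import pstats
import tempfile


def summarize(profile_path, limit=20, sort="cumulative"):
    """Print the top `limit` functions from a cProfile output file."""
    stats = pstats.Stats(profile_path)
    stats.strip_dirs().sort_stats(sort).print_stats(limit)
    return stats


def write_demo_profile():
    """Profile a toy workload and save it in the same format that `-o` writes."""
    profiler = cProfile.Profile()
    profiler.enable()
    sorted(str(i) for i in range(100_000))
    profiler.disable()
    path = os.path.join(tempfile.mkdtemp(), "cprofile.out")
    profiler.dump_stats(path)
    return path


stats = summarize(write_demo_profile(), limit=10)
stats.print_callers(5)  # which functions called the top 5 entries
```

Sorting by "cumulative" finds the expensive call chains. Sorting by "tottime" instead finds the leaf functions doing the work, in the same way as the perftools call graph.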

Profiling of r17658

2012-08-01: Understanding CPU time usage for the latest trunk. Details here
