Code Optimizing
This topic describes how to optimize your code.
First rule: Don't!
It has been said that premature optimization is the root of all evil. Make your code work correctly first. Worry about optimization later, and only once you have shown by measurement that your code really is the problem.
Google Perftools
To find out what code is causing problems you should run it under a profiler. For compiled code, Google provides a very good suite of profilers called google-perftools (not to be confused with another package of a similar name).
This is a statistical profiler, which means it samples the state of the program periodically. The CPU profiler records the call stack at each sample and builds up a weighted call graph.
Build with debug
It is best to rebuild the code with debug symbols included in the compiled objects (the "-g" compiler flag). Without these the results will lack information such as line numbers and even function names (if libraries are stripped).
CPU profiling
Google's documentation is available; some key instructions are summarized here.
To run the CPU profiler one just needs to set a couple of environment variables:
bash> CPUPROFILE=the_command.perf LD_PRELOAD=/path/to/libprofiler.so.0 the_command
Specifically for nuwa.py the line would look like:
bash> CPUPROFILE=the_command.perf LD_PRELOAD=/path/to/libprofiler.so.0 $(which python) $(which nuwa.py) [nuwa.py command line options...]
This will create one or more files named like the_command.perf which will be used later.
Increase Sample Efficiency
Short-lived calls can be missed at the default sampling frequency. To increase the sample rate, set:
CPUPROFILE_FREQUENCY=100000
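When many jobs need to be profiled, it can be convenient to set these variables from a script rather than typing them each time. The following is a minimal sketch in Python of one way to do that. The helper names profiler_env and run_profiled are hypothetical and are not part of google-perftools or nuwa.py; only the three environment variable names come from the text above.

```python
import os
import subprocess


def profiler_env(profile_path, libprofiler_path, frequency=None, base_env=None):
    """Return a copy of the environment with the CPU profiler variables set.

    profile_path     -> CPUPROFILE (where the samples are written)
    libprofiler_path -> LD_PRELOAD (the libprofiler shared library)
    frequency        -> CPUPROFILE_FREQUENCY, only set when given
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CPUPROFILE"] = profile_path
    env["LD_PRELOAD"] = libprofiler_path
    if frequency is not None:
        env["CPUPROFILE_FREQUENCY"] = str(frequency)
    return env


def run_profiled(command, profile_path, libprofiler_path, frequency=None):
    """Run command (a list of arguments) under the profiler; return its exit code."""
    env = profiler_env(profile_path, libprofiler_path, frequency)
    return subprocess.call(command, env=env)
```

For example, run_profiled(["the_command"], "the_command.perf", "/path/to/libprofiler.so.0", frequency=100000) would be equivalent to the command line shown earlier with the higher sample rate added.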
Generating Call Graphs
One then uses pprof to generate a graph showing how much time each function in your program used directly and how much time was spent calling other functions.
pprof --ps the_command the_command.perf > the_command.eps
gv the_command.eps
For nuwa.py jobs, the_command should be $(which python) (or the full path to the python executable).
Note: sometimes gv will not display anything. You can fix this by turning off anti-aliasing by hitting the "a" key.
Note: Using the --pdf option of pprof may be better than the postscript option.
Understanding the Call Graph
The image to the right is an example call graph produced by DetSim and shows the use of TouchableToDetectorElement class which looks up DetDesc DetectorElements based on a Geant4 TouchableHistory.
Each box is one function (object method) call. At the top is the function name. The following numbers give
- Total (and percentage of) samples where this function was on top of the call stack.
- Total (and percentage of) samples where this function was anywhere on the call stack.
The arrows leading from one box to another give the number of samples in which the function at the arrow's tail is calling the function at the arrow's head. Arrows and boxes are also sized by their count.
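To make the three kinds of numbers concrete, here is a small Python sketch that computes them from a handful of made-up call stack samples. It only illustrates the counting described above and is not how pprof itself is implemented; the function names in the samples are invented, and pprof's exact accounting (for recursive calls, for instance) may differ.

```python
from collections import Counter


def summarize_samples(samples):
    """Count box and arrow weights from sampled call stacks.

    Each sample is a list of function names, outermost caller first and the
    currently executing function (the top of the stack) last.
    """
    self_counts = Counter()   # samples with the function on top of the stack
    total_counts = Counter()  # samples with the function anywhere on the stack
    edge_counts = Counter()   # samples in which caller is calling callee
    for stack in samples:
        if not stack:
            continue
        self_counts[stack[-1]] += 1
        # set() so a recursive function is counted once per sample
        for name in set(stack):
            total_counts[name] += 1
        for caller, callee in set(zip(stack, stack[1:])):
            edge_counts[(caller, callee)] += 1
    return self_counts, total_counts, edge_counts


# Invented samples, purely for illustration.
samples = [
    ["main", "simulate", "lookup"],
    ["main", "simulate", "lookup", "compare"],
    ["main", "simulate", "lookup", "compare"],
    ["main", "write"],
]

self_counts, total_counts, edge_counts = summarize_samples(samples)
for name, total in total_counts.most_common():
    print(name, "self:", self_counts[name], "total:", total)
```

In these samples "compare" is a leaf with a large self count, while "main" has a large total count but no self count at all, which is the typical shape of a real call graph.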
Interpreting the Call Graph
To make use of the call graph, first look for the leaf nodes that are directly taking up the most CPU. They will usually be basic functions in STL or other libraries that are already highly optimized. They show up large simply because they are called often. However, if they are your own code then they are candidates for optimization.
Next follow the call graph backwards from these CPU hogs and understand how they are being called in the first place. There may be ways to avoid calling them so much, such as caching results of previous calls.
For example, in the graph to the right the FindObjectInDirectory() function calls both existDet() and getDet(). The first call checks whether the object is in the store so as to avoid an error message from the second call. This results in two accesses to the store, which are relatively expensive. Instead, another store function can be used that indicates the error in a testable (and quiet) manner. This fix leads to a speed-up of a factor of 2.
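The pattern behind this fix can be sketched in a few lines of Python. The CountingStore class and both lookup functions below are hypothetical stand-ins written for illustration; they do not reproduce the actual DetDesc store interface. The point is only that checking and then fetching costs two store accesses, while a lookup that reports a miss quietly costs one.

```python
class CountingStore:
    """Hypothetical stand-in for an expensive object store; counts accesses."""

    def __init__(self, objects):
        self._objects = dict(objects)
        self.accesses = 0

    def exists(self, path):
        self.accesses += 1
        return path in self._objects

    def get(self, path):
        # Raises KeyError on a miss, playing the role of the noisy error.
        self.accesses += 1
        return self._objects[path]

    def find(self, path):
        # Returns None on a miss, so the caller can test the result quietly.
        self.accesses += 1
        return self._objects.get(path)


def find_with_check(store, path):
    """Original pattern: one access to check, a second one to fetch."""
    if store.exists(path):
        return store.get(path)
    return None


def find_single_access(store, path):
    """Improved pattern: a single access whose failure is testable."""
    return store.find(path)
```

Both functions return the same result for the same path, so the change is invisible to callers apart from the halved number of store accesses.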
Memory Profiling
One can also profile memory usage. [to be added]
Problems with sub-processes
Profiling a process that starts sub-processes may be a problem. For example, generating muon kinematics on the fly by running Muon.exe may lead to no kinematics being generated and a crash. More information at Trac ticket #422.
Python cProfile
Python has some built-in profilers. The recommended one is cProfile. Run it like:
python -m cProfile -o cprofile.out $(which nuwa.py) [...your favorite nuwa.py args...]
The resulting output file is binary and can be inspected from Python, for example with the standard pstats module. See the docs.
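As an illustration of reading such a file, the sketch below profiles a small stand-in workload, writes the statistics to a file and then prints the most expensive functions using the standard pstats module. The functions slow_sum, work, profile_to_file and top_report are made up for this example, and a Python 3 interpreter is assumed; the same top_report function could be pointed at the cprofile.out file produced by the command above.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately slow stand-in for real work."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def work():
    return [slow_sum(20000) for _ in range(20)]


def profile_to_file(func, path):
    """Run func under cProfile and write the binary statistics to path."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    profiler.dump_stats(path)


def top_report(path, limit=5):
    """Return a text report of the top entries sorted by cumulative time."""
    out = io.StringIO()
    stats = pstats.Stats(path, stream=out)
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return out.getvalue()
```

Sorting by "cumulative" lists functions by the time spent in them and everything they call, which is a reasonable first view; sorting by "tottime" instead shows where time is spent directly.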
Profiling of r17658
2012-08-01: Understand CPU time usage for latest trunk. Details here