Dealing With Memory Leaks
This topic describes some strategies for finding and fixing memory leaks.
What are memory leaks
Memory leaks occur when a program allocates memory but never returns it.
Mismatched new/delete
Every call to new should have a matching delete; otherwise the allocated memory is lost until the OS cleans up after the job finishes.
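The rule above can be sketched as follows. This is a minimal illustration, not code from the offline software: the class Tracked and the three functions are hypothetical names, and the last variant assumes a compiler with C++11 support for std::unique_ptr.

```cpp
#include <memory>

// Hypothetical class that counts live instances so a leak becomes observable.
struct Tracked {
    static int live;
    Tracked()  { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Leaks: the object created with new is never deleted.
void leaky() {
    Tracked* t = new Tracked;
    (void)t;  // the pointer goes out of scope, the object stays allocated
}

// Matched: the new is paired with a delete.
void matched() {
    Tracked* t = new Tracked;
    delete t;
}

// RAII: std::unique_ptr calls delete when it goes out of scope,
// including on an early return or an exception.
void raii() {
    std::unique_ptr<Tracked> t(new Tracked);
}
```

After calling leaky(), Tracked::live stays one higher than before; after matched() or raii() it returns to its previous value. Letting a smart pointer own the object is one way to make the pairing automatic.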
Mismatched pointer/array new/delete
You can allocate an array of objects like:
MyClass* array = new MyClass[100];
You should not delete it like:
delete array;     // WRONG!
This is undefined behavior; in practice it typically destroys only the first object. Instead do:
delete [] array;  // Right!
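As an illustration of the array form, the sketch below pairs new[] with delete [] and, as an alternative, lets std::vector own the elements so no explicit delete is needed. The instance counter added to MyClass and the function names are assumptions made for this example only.

```cpp
#include <vector>

// Stand-in for MyClass, with a hypothetical instance counter added
// so that the number of destructor calls can be checked.
struct MyClass {
    static int live;
    MyClass()               { ++live; }
    MyClass(const MyClass&) { ++live; }
    ~MyClass()              { --live; }
};
int MyClass::live = 0;

// Array new paired with array delete.
void with_array_delete() {
    MyClass* array = new MyClass[100];
    delete [] array;  // runs all 100 destructors, then frees the block
}

// std::vector owns its elements and destroys them at scope exit.
void with_vector() {
    std::vector<MyClass> array(100);
}
```

In both functions MyClass::live is back at its starting value on return. The vector form removes the chance of picking the wrong delete.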
valgrind
Valgrind can intercept all allocation/deallocation calls to look for mismatches. It can also look for errors due to incorrect memory access such as reading uninitialized memory. An example of running it is:
valgrind --tool=memcheck --demangle=yes --num-callers=50 --error-limit=no --leak-check=full --show-reachable=yes $(which python) your_job_script.py > valgrind.log 2>&1
It will print out errors as they are found followed by a summary of leaks in order from smallest to largest.
Another example with explicit nuwa.py usage:
valgrind -v --tool=memcheck --demangle=yes --num-callers=50 --error-limit=no --leak-check=full --show-reachable=yes $(which python) $(which nuwa.py) -G $XMLDETDESCROOT/DDDB/dayabay_dry.xml --history=off -n 50 -H 584736 -R 637485 -m FullChain -o out/test_r11090_04.root > valgrind.log 2>&1
To suppress ROOT-related complaints, see ROOT Issues below.
The valgrind user manual describes the valgrind output, how to redirect it and how to suppress complaints that you don't want to fix (such as those in the C++ library).
Problem with self managed memory
Some code, for example Geant4, ROOT, Python and BOOST (which is used by GOD-generated DataModel classes), manages its own memory by allocating large chunks from the OS and dishing out pieces when needed. This is done to optimize performance. Since the large chunks are correctly released when the job is shutting down, valgrind cannot detect that pieces of this memory may actually have been leaked.
The tcmalloc tool, described below, can help with this.
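To make the problem concrete, here is a deliberately simplified sketch of self-managed memory. The Pool class and leak_inside_pool() are hypothetical and far simpler than the allocators in Geant4, ROOT, Python or BOOST; the point is only that the pool obtains one large chunk, so a tool watching malloc/free level calls sees a single allocation that is correctly freed, even though a slot inside it was never returned.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal pool: takes one large chunk and hands out fixed-size slots.
class Pool {
public:
    Pool(std::size_t slot_size, std::size_t n_slots)
        : slot_size_(slot_size), chunk_(slot_size * n_slots), next_(0), in_use_(0) {}

    // Hand out the next unused slot, or a null pointer when the chunk is exhausted.
    void* allocate() {
        if (next_ + slot_size_ > chunk_.size()) return 0;
        void* p = &chunk_[next_];
        next_ += slot_size_;
        ++in_use_;
        return p;
    }

    // Mark a slot as returned (this sketch does not reuse slots).
    void release(void*) { --in_use_; }

    // Number of slots handed out and not yet returned.
    std::size_t in_use() const { return in_use_; }

private:
    std::size_t slot_size_;
    std::vector<char> chunk_;  // the single large allocation seen by malloc-level tools
    std::size_t next_;
    std::size_t in_use_;
};

// Takes two slots but returns only one: a leak at the pool level.
std::size_t leak_inside_pool() {
    Pool pool(64, 16);
    void* a = pool.allocate();
    void* b = pool.allocate();
    pool.release(a);
    (void)b;                // never released
    return pool.in_use();   // one slot is still outstanding
}   // the chunk itself is freed here, so a malloc-level checker sees no leak
```

The pool's own bookkeeping reports one outstanding slot, while the chunk is released cleanly when the pool is destroyed.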
ROOT Issues
Valgrind will report some issues inside of ROOT code which the ROOT team say can be ignored. To cause valgrind to ignore them you can add to the command line:
--suppressions=$ROOTSYS/etc/valgrind-root.supp
tcmalloc
Google has made available tcmalloc as part of its Perf Tools collection (RACF info: BNL_RACF_Cluster#Google_PerfTools). This collection also includes a CPU profiler described in the topic on Code Optimizing. The tcmalloc tool can be used in two ways.
HEAPCHECK
Example run:
LD_PRELOAD=/usr/lib/libtcmalloc.so HEAPCHECK=normal $(which python) your_job_script.py
It will then tell you what "pprof" command to run. Something like:
pprof $(which python) "/tmp/python.2042._main_-end.heap" --inuse_objects --lines --edgefraction=1e-10 --nodefraction=1e-10 --heapcheck --ps > heapcheck.ps
It will say to use "--gv". Using "--ps" instead lets you control where the PS file goes (i.e., to stdout).
HEAPPROFILE
Run the jobs to collect the profiling info:
LD_PRELOAD=/usr/lib/libtcmalloc.so HEAPPROFILE=heap.prof $(which python) your_job_script.py
This will produce files like:
heap.prof.XXXX.heap
Process the profiling info into a graph showing who is leaking what:
pprof $(which python) heap.prof.00*.heap --ps > heapprof.ps
Or, to see who is allocating what, add:
--alloc_space
Guard Malloc on Mac OS X
Note: I have never been able to get this to run past initialization for debug nuwa simulation jobs. djaffe 23aug2011
On a Mac you can try Guard Malloc as suggested by Simon Blyth (see #769 for some discussion).
Here is how it can be invoked (the example is from a Mac Pro running 10.6.7):
[djaffe@bnlku24 OPW]$ export DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylib
[djaffe@bnlku24 OPW]$ gdb --args $(which python) $(which nuwa.py) -R 7 -n 21 -m"fmcpmuon --use-pregenerated-muons --use-basic-physics --Enable-Debug --wsLimit=1 --wsWeight=1 --adVolumes=['oil','lso','gds'] --adLimits=[1,3000,1000] --adWeights=[1,10,10]" -o 232bisagainmonkey.root
GuardMalloc: Allocations will be placed on 16 byte boundaries.
GuardMalloc: - Some buffer overruns may not be noticed.
GuardMalloc: - Applications using vector instructions (e.g., SSE or Altivec) should work.
GuardMalloc: GuardMalloc version 23
GuardMalloc: Allocations will be placed on 16 byte boundaries.
GuardMalloc: - Some buffer overruns may not be noticed.
GuardMalloc: - Applications using vector instructions (e.g., SSE or Altivec) should work.
GuardMalloc: GuardMalloc version 23
GuardMalloc: Allocations will be placed on 16 byte boundaries.
GuardMalloc: - Some buffer overruns may not be noticed.
GuardMalloc: - Applications using vector instructions (e.g., SSE or Altivec) should work.
GuardMalloc: GuardMalloc version 23
GuardMalloc: Allocations will be placed on 16 byte boundaries.
GuardMalloc: - Some buffer overruns may not be noticed.
GuardMalloc: - Applications using vector instructions (e.g., SSE or Altivec) should work.
GuardMalloc: GuardMalloc version 23
GuardMalloc: Allocations will be placed on 16 byte boundaries.
GuardMalloc: - Some buffer overruns may not be noticed.
GuardMalloc: - Applications using vector instructions (e.g., SSE or Altivec) should work.
GuardMalloc: GuardMalloc version 23
GuardMalloc: Allocations will be placed on 16 byte boundaries.
GuardMalloc: - Some buffer overruns may not be noticed.
GuardMalloc: - Applications using vector instructions (e.g., SSE or Altivec) should work.
GuardMalloc: GuardMalloc version 23
GNU gdb 6.3.50-20050815 (Apple version gdb-1515) (Sat Jan 15 08:33:48 UTC 2011)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared libraries ..... done
(gdb) set env MallocScribble 1
(gdb) set env MallocStackLoggingNoCompact 1
(gdb) set env MallocStackLogging 1
(gdb) run
Starting program: /Users/dayabay/offline-10.6.7/external/Python/2.7/i386-darwin-gcc42-dbg/bin/python /Users/dayabay/offline-10.6.7/NuWa-trunk/dybgaudi/InstallArea/scripts/nuwa.py -R 7 -n 21 -mfmcpmuon\ --use-pregenerated-muons\ --use-basic-physics\ --Enable-Debug\ --wsLimit=1\ --wsWeight=1\ --adVolumes=\[\'oil\',\'lso\',\'gds\'\]\ --adLimits=\[1,3000,1000\]\ --adWeights=\[1,10,10\] -o 232bisagainmonkey.root
GuardMalloc[bash-45954]: recording malloc stacks to disk using standard recorder
As noted in #769, it is incredibly slow, at least when using simulation with debug nuwa.
MemStatAuditor
Gaudi provides an "Auditor" object that can help understand where memory is going. You use it like:
FIXME: t.b.d.
Hephaestus
nuwa.py -K
will invoke Hephaestus.