Dealing With Memory Leaks

From Daya Bay
Revision as of 13:07, 21 September 2011 by Djaffe (talk | contribs) (→‎Other)

This describes some strategies to find and fix memory leaks.

What are memory leaks

Memory leaks occur when a program allocates memory but never returns it.

Mismatched new/delete

Every call to new should have a matching delete; otherwise the allocated memory is lost until the OS reclaims it when the job finishes.
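As a minimal sketch of the rule above, the following compares a function that forgets its delete with two that do not. The class Tracked and the helper leakedBy are hypothetical names invented for this illustration, and the std::unique_ptr variant assumes a C++11 compiler.

```cpp
#include <cassert>
#include <memory>

// Hypothetical class that counts live instances so a leak becomes visible.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Leaks: the pointer goes out of scope without a matching delete.
void leaky() {
    Tracked* t = new Tracked;
    (void)t;  // no delete, so the destructor never runs
}

// Correct: the new is paired with a delete.
void paired() {
    Tracked* t = new Tracked;
    delete t;
}

// Alternative (assumes C++11): the smart pointer calls delete for us.
void owned() {
    std::unique_ptr<Tracked> t(new Tracked);
}

// Returns how many Tracked objects a function left behind.
int leakedBy(void (*fn)()) {
    const int before = Tracked::live;
    fn();
    assert(Tracked::live >= before);
    return Tracked::live - before;
}
```

Calling leakedBy(leaky) reports one object left behind, while leakedBy(paired) and leakedBy(owned) report none.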

Mismatched pointer/array new/delete

You can allocate an array of objects like:

MyClass* array = new MyClass[100];

You should not delete it like:

// WRONG!
delete array;

This is undefined behavior; in practice it typically destroys only the 1st object. Instead do:

// Right!
delete [] array;
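A small sketch of the correct form, using a hypothetical instance-counting class Counted (not part of any NuWa code) to confirm that delete [] destroys every element. It also shows std::vector as one way to avoid the manual delete altogether. The wrong form is deliberately not executed here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical class that counts live instances.
struct Counted {
    static int live;
    Counted() { ++live; }
    Counted(const Counted&) { ++live; }
    ~Counted() { --live; }
};
int Counted::live = 0;

// Allocates n objects with new[] and releases them with delete [].
// Returns the number of objects still alive afterwards.
int leftAfterArrayDelete(std::size_t n) {
    const int before = Counted::live;
    Counted* array = new Counted[n];
    assert(Counted::live == before + static_cast<int>(n));
    delete [] array;  // runs the destructor of every element
    return Counted::live - before;
}

// Same job with std::vector, which releases its elements automatically.
int leftAfterVector(std::size_t n) {
    const int before = Counted::live;
    {
        std::vector<Counted> v(n);
        assert(Counted::live == before + static_cast<int>(n));
    }  // v goes out of scope here and destroys all n elements
    return Counted::live - before;
}
```

Both functions should return 0, meaning no element outlived its container.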

valgrind

Valgrind can intercept all allocation/deallocation calls to look for mismatches. It can also look for errors due to incorrect memory access such as reading uninitialized memory. An example of running it is:

valgrind --tool=memcheck --demangle=yes --num-callers=50 --error-limit=no --leak-check=full --show-reachable=yes $(which python) your_job_script.py > valgrind.log 2>&1

It will print out errors as they are found, followed by a summary of leaks ordered from smallest to largest.
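To get a feel for the leak summary, it can help to run valgrind on a tiny program with known leaks. The sketch below is such a toy, with hypothetical names (g_cache, makeReachable, makeLost). When both functions are called from a main() and the program is run with --leak-check=full --show-reachable=yes, the block held by the global pointer should be reported as "still reachable" and the dropped block as "definitely lost".

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical global that still points at its block when the program exits.
static char* g_cache = 0;

// Allocates a block that is never freed but stays reachable via g_cache.
// Returns the number of bytes allocated (0 if already allocated).
std::size_t makeReachable() {
    if (g_cache) return 0;
    const std::size_t n = 64;
    g_cache = new char[n];
    assert(g_cache != 0);  // new throws on failure, so this always holds
    std::memset(g_cache, 0, n);
    return n;
}

// Allocates a block and drops the only pointer to it: a true leak.
// Returns the number of bytes allocated.
std::size_t makeLost() {
    const std::size_t n = 128;
    char* p = new char[n];
    std::memset(p, 0, n);
    return n;  // p goes out of scope, nothing can free the block now
}
```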

Another example with explicit nuwa.py usage:

 valgrind -v --tool=memcheck --demangle=yes --num-callers=50 --error-limit=no --leak-check=full --show-reachable=yes $(which python) $(which nuwa.py)  -G $XMLDETDESCROOT/DDDB/dayabay_dry.xml --history=off -n 50 -H 584736 -R 637485 -m FullChain  -o out/test_r11090_04.root > valgrind.log 2>&1

To suppress ROOT-related complaints, see ROOT Issues below.

The valgrind user manual describes the valgrind output, how to redirect it and how to suppress complaints that you don't want to fix (such as those in the C++ library).

Problem with self managed memory

Some code, for example Geant4, ROOT, Python and BOOST (which is used by GOD-generated DataModel classes), manages its own memory by allocating large chunks from the OS and dishing out pieces when needed. This is done to optimize performance. Since the large chunks are correctly released when the job is shutting down, valgrind cannot detect that pieces handed out from them may actually have been leaked. The tcmalloc tool can help with this.
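The effect can be sketched with a toy allocator. The Pool class below is hypothetical and is not how Geant4, ROOT, Python or BOOST implement their allocators; it ignores alignment and never reuses released pieces. It only illustrates why a piece that is never returned to the pool is invisible to a tool that watches the underlying chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pool: takes one big chunk up front and hands out pieces of it.
class Pool {
public:
    explicit Pool(std::size_t bytes) : chunk_(bytes), used_(0), outstanding_(0) {}

    // Hands out the next n bytes of the chunk, or 0 if they do not fit.
    void* allocate(std::size_t n) {
        if (n == 0 || used_ + n > chunk_.size()) return 0;
        void* p = &chunk_[used_];
        used_ += n;
        ++outstanding_;
        return p;
    }

    // Marks one piece as returned (this toy never reuses the space).
    void release(void*) {
        assert(outstanding_ > 0);
        --outstanding_;
    }

    // Number of pieces handed out and not yet returned.
    std::size_t outstanding() const { return outstanding_; }

private:
    std::vector<char> chunk_;  // the single large allocation
    std::size_t used_;
    std::size_t outstanding_;
};

// Takes two pieces from a pool and returns only one of them.
std::size_t piecesNeverReturned() {
    Pool pool(1024);
    void* a = pool.allocate(100);
    void* b = pool.allocate(200);
    pool.release(a);
    (void)b;  // b is never released: a leak from the pool's point of view
    return pool.outstanding();
    // When pool is destroyed the whole chunk is freed, so the underlying
    // allocation is balanced even though one piece was never given back.
}
```

Here piecesNeverReturned() reports one outstanding piece, yet the only allocation visible from outside the pool, the chunk itself, is freed correctly.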

ROOT Issues

Valgrind will report some issues inside of ROOT code which the ROOT team say can be ignored. To cause valgrind to ignore them you can add to the command line:

--suppressions=$ROOTSYS/etc/valgrind-root.supp

tcmalloc

Google has made tcmalloc available as part of its Perf Tools collection (RACF info: BNL_RACF_Cluster#Google_PerfTools). This collection also includes a CPU profiler described in the topic on Code Optimizing. The tcmalloc tool can be used in two ways.

HEAPCHECK

Example run:

LD_PRELOAD=/usr/lib/libtcmalloc.so HEAPCHECK=normal $(which python) your_job_script.py

It will then tell you what "pprof" command to run. Something like:

pprof $(which python) "/tmp/python.2042._main_-end.heap" --inuse_objects --lines --edgefraction=1e-10 --nodefraction=1e-10 --heapcheck --ps > heapcheck.ps

It will say to use "--gv". Using "--ps" instead lets you control where the PS file goes (i.e., to stdout, which you can redirect).

HEAPPROFILE

Run the jobs to collect the profiling info:

LD_PRELOAD=/usr/lib/libtcmalloc.so HEAPPROFILE=heap.prof $(which python) your_job_script.py

This will produce files like:

heap.prof.XXXX.heap

Process the profiling info into a graph showing who is leaking what:

pprof $(which python) heap.prof.00*.heap --ps > heapprof.ps

Or, to see who is allocating what, add:

--alloc_space

Guard Malloc on Mac OS X

Note: I have never been able to get this to run past initialization for debug nuwa simulation jobs. djaffe 23aug2011

On Mac OS X you can try Guard Malloc, as suggested by Simon Blyth (see #769 for some discussion).

Here is how it can be invoked (this example is from a Mac Pro running 10.6.7):

[djaffe@bnlku24 OPW]$ export DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylib
[djaffe@bnlku24 OPW]$ gdb --args $(which python) $(which nuwa.py) -R 7 -n 21   -m"fmcpmuon --use-pregenerated-muons --use-basic-physics --Enable-Debug --wsLimit=1 --wsWeight=1 --adVolumes=['oil','lso','gds'] --adLimits=[1,3000,1000] --adWeights=[1,10,10]" -o 232bisagainmonkey.root 
GuardMalloc: Allocations will be placed on 16 byte boundaries.
GuardMalloc:  - Some buffer overruns may not be noticed.
GuardMalloc:  - Applications using vector instructions (e.g., SSE or Altivec) should work.
GuardMalloc: GuardMalloc version 23
GuardMalloc: Allocations will be placed on 16 byte boundaries.
GuardMalloc:  - Some buffer overruns may not be noticed.
GuardMalloc:  - Applications using vector instructions (e.g., SSE or Altivec) should work.
GuardMalloc: GuardMalloc version 23
GuardMalloc: Allocations will be placed on 16 byte boundaries.
GuardMalloc:  - Some buffer overruns may not be noticed.
GuardMalloc:  - Applications using vector instructions (e.g., SSE or Altivec) should work.
GuardMalloc: GuardMalloc version 23
GuardMalloc: Allocations will be placed on 16 byte boundaries.
GuardMalloc:  - Some buffer overruns may not be noticed.
GuardMalloc:  - Applications using vector instructions (e.g., SSE or Altivec) should work.
GuardMalloc: GuardMalloc version 23
GuardMalloc: Allocations will be placed on 16 byte boundaries.
GuardMalloc:  - Some buffer overruns may not be noticed.
GuardMalloc:  - Applications using vector instructions (e.g., SSE or Altivec) should work.
GuardMalloc: GuardMalloc version 23
GuardMalloc: Allocations will be placed on 16 byte boundaries.
GuardMalloc:  - Some buffer overruns may not be noticed.
GuardMalloc:  - Applications using vector instructions (e.g., SSE or Altivec) should work.
GuardMalloc: GuardMalloc version 23
GNU gdb 6.3.50-20050815 (Apple version gdb-1515) (Sat Jan 15 08:33:48 UTC 2011)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared libraries ..... done

(gdb) set env MallocScribble 1
(gdb) set env MallocStackLoggingNoCompact 1
(gdb) set env MallocStackLogging 1
(gdb) run
Starting program: /Users/dayabay/offline-10.6.7/external/Python/2.7/i386-darwin-gcc42-dbg/bin/python /Users/dayabay/offline-10.6.7/NuWa-trunk/dybgaudi/InstallArea/scripts/nuwa.py -R 7 -n 21 -mfmcpmuon\ --use-pregenerated-muons\ --use-basic-physics\ --Enable-Debug\ --wsLimit=1\ --wsWeight=1\ --adVolumes=\[\'oil\',\'lso\',\'gds\'\]\ --adLimits=\[1,3000,1000\]\ --adWeights=\[1,10,10\] -o 232bisagainmonkey.root
GuardMalloc[bash-45954]: recording malloc stacks to disk using standard recorder


As noted in #769, it is incredibly slow, at least when using simulation with debug nuwa.

MemStatAuditor

Gaudi provides an "Auditor" object that can help understand where memory is going. You use it like:

FIXME: t.b.d.

Hephaestus

nuwa.py -K

will invoke Hephaestus

Valgrind_nosetest

Other

FAQ:How much memory is nuwa using?