This article describes tools and techniques for identifying memory leaks in long-running Python programs:
- Is it a Leak?
- Sources of Leaks
- A Bit About (C)Python Memory Management
- Reference Counts
- Garbage Collection
- The Big Picture
- CPython’s Object Allocator (pymalloc)
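The reference-counting and garbage-collection topics in the list above can be shown with a minimal sketch. It assumes CPython, since other implementations do not expose reference counts the same way. `Node`, `refcount_demo` and `cycle_demo` are hypothetical names used only for illustration. `sys.getrefcount` and `gc.collect` are standard library functions.

```python
import gc
import sys


class Node:
    """Tiny hypothetical object that can point at another Node, so we can build a cycle."""

    def __init__(self) -> None:
        self.other = None


def refcount_demo() -> tuple[int, int]:
    """Show that binding a second name to an object raises its reference count by one."""
    obj = Node()
    before = sys.getrefcount(obj)  # the absolute value includes temporary references
    alias = obj  # a second strong reference
    after = sys.getrefcount(obj)
    del alias
    return before, after


def cycle_demo() -> int:
    """Build a two-object reference cycle, drop our names, and let the cycle collector find it."""
    gc.collect()  # start from a clean slate
    a, b = Node(), Node()
    a.other, b.other = b, a  # a -> b -> a: reference counts alone never reach zero
    del a, b  # unreachable now, but still alive until the cycle collector runs
    return gc.collect()  # number of unreachable objects found


if __name__ == "__main__":
    before, after = refcount_demo()
    print(f"refcount before alias: {before}, after alias: {after}")
    print(f"unreachable objects found in the cycle: {cycle_demo()}")
```

A cycle like this is why reference counting alone is not enough. The objects are freed only when the cycle collector runs, or never, if a long-lived container still references one of them.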
Here is a visualisation of CPython's memory allocator layers, from top to bottom, taken from the comment in the CPython source file Objects/obmalloc.c:
```
    _____   ______   ______       ________
   [ int ] [ dict ] [ list ] ... [ string ]       Python core         |
+3 | <----- Object-specific memory -----> | <-- Non-object memory --> |
    _______________________________       |                           |
   [   Python's object allocator   ]      |                           |
+2 | ####### Object memory ####### | <------ Internal buffers ------> |
    ______________________________________________________________    |
   [          Python's raw memory allocator (PyMem_ API)          ]   |
+1 | <----- Python memory (under PyMem manager's control) ------> |   |
    __________________________________________________________________
   [    Underlying general-purpose allocator (ex: C library malloc)   ]
 0 | <------ Virtual memory allocated for the python process -------> |
   =========================================================================
    _______________________________________________________________________
   [                OS-specific Virtual Memory Manager (VMM)               ]
-1 | <--- Kernel dynamic storage allocation & management (page-based) ---> |
    __________________________________   __________________________________
   [                                  ] [                                  ]
-2 | <-- Physical memory: ROM/RAM --> | | <-- Secondary storage (swap) --> |
```
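To get a rough view of the object allocator layer (+2 in the diagram) from Python code, CPython 3.4+ provides `sys.getallocatedblocks()`, which returns the number of memory blocks currently allocated by the interpreter. The sketch below assumes a standard CPython build with pymalloc enabled. On builds that cannot compute the figure, the function is allowed to return 0. `blocks_delta` is a hypothetical helper. `sys._debugmallocstats()` is an underscore-prefixed, CPython-only function that prints arena and pool statistics to stderr.

```python
import gc
import sys


def blocks_delta(n: int = 100_000) -> tuple[int, int]:
    """Return (blocks before, blocks after) while n small objects are alive.

    Hypothetical helper: two-element tuples of ints are small requests, so on
    a pymalloc build they are served by Python's object allocator.
    """
    gc.collect()  # reduce noise from garbage that is already pending
    before = sys.getallocatedblocks()
    keep = [(i, i + 1) for i in range(n)]  # hold references so nothing is freed yet
    after = sys.getallocatedblocks()
    del keep  # release them; the count should roughly fall back toward `before`
    return before, after


if __name__ == "__main__":
    before, after = blocks_delta()
    print(f"allocated blocks: {before} -> {after}")
    # CPython-only and underscore-prefixed: dumps pymalloc arena/pool statistics to stderr.
    sys._debugmallocstats()
```

A block count that keeps climbing across otherwise identical iterations of a long-running loop is one hint that objects are being retained. Interpreter caches make single readings noisy, so compare trends rather than individual values.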
He found that the more HTTP client requests he did, the more memory his Node process would consume, but it was really slow.
[...] Then I ran Node with UMEM_DEBUG set to record various important information about the memory allocations
[...] Every hour, it grabbed the output of pmap -x and a core file and stored those in Joyent Manta
[...] In MDB there's a particularly helpful command ::findleaks that will show you the memory addresses and the stack traces for leaked memory, not unlike using valgrind, but without all the performance penalty.
[...] At this point we knew that we were looking for something in v0.10 that called MakeCallback but that didn't first have a HandleScope on the stack. I then worked up this simple DTrace script.
- Input injection
- Parsing XML
- Assert statements
- Timing attacks
- A polluted site-packages or import path
- Temporary files
- Using yaml.load
- Pickles
- Using the system Python runtime and not patching it
- Not patching your dependencies
Using asyncio.Task.all_tasks (asyncio.all_tasks() since Python 3.7; the Task method was removed in 3.9) + tracemalloc
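A minimal sketch of combining the two follows. It lists tasks that are still pending on the event loop and uses `tracemalloc` snapshots to see which source lines allocated memory in the meantime. It assumes Python 3.9+ and uses the module-level `asyncio.all_tasks()` rather than the removed `asyncio.Task.all_tasks`. `forgotten_worker` and `find_leaks` are hypothetical names. `Snapshot.compare_to(old, "lineno")` returns `StatisticDiff` objects sorted by the size of the change.

```python
import asyncio
import tracemalloc


async def forgotten_worker(buffer: list[bytes]) -> None:
    """Hypothetical task that never finishes and keeps growing its buffer."""
    while True:
        buffer.append(b"x" * 1024)  # grows for as long as the task stays alive
        await asyncio.sleep(0)  # yield to the event loop, then loop again


async def find_leaks(spawn: int = 5, ticks: int = 50) -> tuple[int, list[tracemalloc.StatisticDiff]]:
    """Start never-ending tasks, then report how many are still pending
    and which source lines allocated the most memory in the meantime."""
    tracemalloc.start()
    baseline = tracemalloc.take_snapshot()

    # Keep handles so the demo can clean up; in a real leak they are often dropped.
    workers = [asyncio.create_task(forgotten_worker([])) for _ in range(spawn)]
    for _ in range(ticks):
        await asyncio.sleep(0)  # give the workers a chance to run

    # Every unfinished task on this loop, minus the one running find_leaks itself.
    pending = [t for t in asyncio.all_tasks() if t is not asyncio.current_task()]

    snapshot = tracemalloc.take_snapshot()
    top = snapshot.compare_to(baseline, "lineno")[:5]  # largest changes first
    tracemalloc.stop()

    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return len(pending), top


if __name__ == "__main__":
    count, top = asyncio.run(find_leaks())
    print(f"{count} tasks still pending")
    for stat in top:
        print(stat)
```

In a long-running service the same idea applies periodically. Take a snapshot at intervals, diff it against an earlier one, and check whether the number of pending tasks keeps growing; the line inside `forgotten_worker` should dominate the diff here.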
GNU libiberty
The GNU C Library