Tracing exceptions in multiprocessing in Python

I had problems with debugging my programme using multiprocessing.Pool.

Traceback (most recent call last):
  File "src/homologies2mysql_multi.py", line 294, in <module>
    main()
  File "src/homologies2mysql_multi.py", line 289, in main
    o.noupload, o.verbose)
  File "src/homologies2mysql_multi.py", line 242, in homologies2mysql
    for i, data in enumerate(p.imap_unordered(worker, pairs), 1):
  File "/usr/lib64/python2.6/multiprocessing/pool.py", line 520, in next
    raise value
ValueError: need more than 1 value to unpack

I could run it without multiprocessing, but then I’d have to wait some days for the program to reach the point where it crashes.
Luckily, Python is equipped with traceback, that allows handy tracing of exceptions.
Then, you can add a decorator to problematic function, that will report nice error message:

import traceback, functools, multiprocessing
 
def trace_unhandled_exceptions(func):
    @functools.wraps(func)
    def wrapped_func(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except:
            print 'Exception in '+func.__name__
            traceback.print_exc()
    return wrapped_func
 
@trace_unhandled_exceptions
def go():
    print(1)
    raise Exception()
    print(2)
 
p = multiprocessing.Pool(1)
 
p.apply_async(go)
p.close()
p.join()

The error message will look like:

1
Exception in go
Traceback (most recent call last):
  File "<stdin>", line 5, in wrapped_func
  File "<stdin>", line 4, in go
Exception

Solution found on StackOverflow.

Multiprocessing in Python and garbage collection

Working with multiple threads in Python often leads to high RAM consumption. Unfortunately, automatic garbage collection in child processes isn’t working well. But there are two alternatives:

  • When using Pool(), you can specify no. of task after which the child will be restarted resulting in memory release.
p = Pool(processes=4, maxtasksperchild=1000)
  • If you use Process(), you can simply delete unwanted objects call gc.collect() inside the child. Note, this may slow down your child process substantially!