Posts tagged performance

Monitoring GPU usage

If you (like me) happen to be a performance freak, you are most likely well aware of process viewers like htop. Since I started working with GPU computing, I have missed an htop-like tool tailored to monitoring GPU usage. This becomes more of an issue if you're working in multi-GPU setups.

You can use nvidia-smi, which ships with the NVIDIA drivers, but it's not very interactive.
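If you want the numbers from nvidia-smi inside a script rather than in a terminal, one option is to call it in CSV query mode and parse the result. The snippet below is only a minimal sketch: the helper names (`parse_gpu_csv`, `query_gpus`) and the choice of fields are my own assumptions for illustration, and it assumes GPU names contain no commas.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi; this particular selection is an assumption.
QUERY_FIELDS = ("index", "name", "utilization.gpu", "memory.used", "memory.total")


def _to_int(value):
    """Return an int, or None for placeholders such as "[N/A]"."""
    return int(value) if value.isdigit() else None


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        # Assumes the GPU name itself contains no comma.
        index, name, util, mem_used, mem_total = [f.strip() for f in line.split(",")]
        rows.append(
            {
                "index": _to_int(index),
                "name": name,
                "util_percent": _to_int(util),
                "memory_used_mib": _to_int(mem_used),
                "memory_total_mib": _to_int(mem_total),
            }
        )
    return rows


def query_gpus():
    """Run nvidia-smi once; return [] if the binary is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return []
    command = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    output = subprocess.run(command, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(output)


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

Calling `query_gpus()` in a loop with a short sleep gives a crude poller, but it is still a long way from an interactive, htop-like view.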

gpustat provides a nice, interactive view of the running processes and the resources used across your GPUs, but you'll need to switch between windows if you also want to monitor CPU usage.

nvidia-smi output

Read more ...


Python code profiling and accelerating your calculations with numba

You wrote up your excellent idea as a Python program/module, but you are unsatisfied with its performance. Chances are high that most of us have been there at least once. I was there last week.

I found an excellent method for outlier detection, Extended Isolation Forest (eIF). eIF was initially written in Python and later optimised in Cython (using C++). The C++ version is ~40x faster than the vanilla Python version, but it lacks the ability to save the model, which is crucial for my project. Since adding model saving to the C++ version is a rather complicated business, I decided to optimise the Python code. Initially I hoped for a ~5-10x speed improvement. The final effect surprised me: the rewritten Python code was ~40x faster than the initial version, matching the performance of the C++ version!

How is that possible? Speeding up your code isn't trivial. First, you need to find out which parts of your code are slow (so-called code profiling). Once you know that, you can start tinkering with the code itself (code optimisation).
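The profile-then-optimise loop can be sketched roughly as follows. Note the assumptions: I use the standard-library cProfile here instead of line_profiler so that the snippet runs without extra packages, the toy function `slow_sum_of_squares` and the helper `profile_report` are made up for illustration, and numba is treated as optional (if it is not installed, the "fast" version is simply the original function).

```python
import cProfile
import io
import pstats

try:
    from numba import njit  # optional dependency
except ImportError:
    def njit(func):
        """Fallback: without numba, return the function unchanged."""
        return func


def slow_sum_of_squares(n):
    """Toy hot loop standing in for the slow part of a real program."""
    total = 0.0
    for i in range(n):
        total += i * i
    return total


# Same function, compiled by numba on first call when numba is available.
fast_sum_of_squares = njit(slow_sum_of_squares)


def profile_report(func, *args, top=5):
    """Profile one call and return the `top` entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


if __name__ == "__main__":
    # Step 1: find out where the time goes.
    print(profile_report(slow_sum_of_squares, 200_000))
    # Step 2: check that the optimised version still gives the same answer.
    assert fast_sum_of_squares(1000) == slow_sum_of_squares(1000)
```

cProfile only reports time per function; to see which individual lines are slow you would need a line-level profiler such as line_profiler.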

line_profiler output

Read more ...


Multiprocessing in Python and garbage collection

Working with multiple processes in Python often leads to high RAM consumption. Unfortunately, automatic garbage collection in child processes doesn't work well. But there are two alternatives:

When using Pool(), you can specify the number of tasks after which a child will be restarted (the maxtasksperchild argument), which releases its memory.

If you use Process(), you can simply delete unwanted objects and call gc.collect() inside the child. Note that this may slow down your child process substantially!
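Both alternatives can be sketched like this. It is a minimal illustration rather than a recipe: the workload (`summarise_chunk`), the worker names, and the values chosen for `processes` and `maxtasksperchild` are all assumptions made up for the example.

```python
import gc
from multiprocessing import Pool, Process, Queue


def summarise_chunk(size):
    """Toy task: build a temporary list and reduce it to one number."""
    data = list(range(size))
    return sum(data)


def pool_example(sizes, tasks_per_child=2):
    """Alternative 1: restart each worker after `tasks_per_child` tasks."""
    with Pool(processes=2, maxtasksperchild=tasks_per_child) as pool:
        # chunksize=1 makes every item count as one task for the restart counter.
        return pool.map(summarise_chunk, sizes, chunksize=1)


def collecting_worker(sizes, queue):
    """Alternative 2: free objects explicitly inside the child."""
    for size in sizes:
        data = list(range(size))
        queue.put(sum(data))
        del data      # drop the reference to the large object...
        gc.collect()  # ...and force a collection (this costs time)


def process_example(sizes):
    queue = Queue()
    child = Process(target=collecting_worker, args=(sizes, queue))
    child.start()
    results = [queue.get() for _ in sizes]
    child.join()
    return results


if __name__ == "__main__":
    sizes = [10, 100, 1000, 10_000]
    print(pool_example(sizes))
    print(process_example(sizes))
```

Restarting workers has a cost too, so a very low `maxtasksperchild` trades memory for process start-up overhead.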

Read more ...