19–25 Oct 2024
Europe/Zurich timezone

GIL-free scaling of Uproot in Python 3.13

23 Oct 2024, 14:42
18m
Large Hall A

Talk, Track 5 - Simulation and analysis tools, Parallel (Track 5)

Speaker

Jim Pivarski (Princeton University)

Description

Uproot is a Python library for ROOT I/O that uses NumPy and Awkward Array to represent and perform computations on bulk data. However, Uproot uses pure Python to navigate through ROOT's data structures to find the bulk data, which can be a performance issue in metadata-intensive I/O: (a) many small files, (b) many small TBaskets, and/or (c) low compression overhead. Worse, these performance issues can't be alleviated by multithreading, because the Python virtual machine holds a lock around every instruction it executes, so only one thread can run Python code at a time. This lock is infamously known as the Global Interpreter Lock (GIL).
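The effect described above can be seen with a small, self-contained sketch. It does not use Uproot or ROOT's real on-disk layout: `make_records` and `count_records` are hypothetical helpers that build and walk a made-up length-prefixed record format in pure Python, standing in for metadata navigation. On a build of Python with the GIL enabled, the threaded timing is expected to be close to the serial one rather than several times faster; actual numbers depend on the machine and interpreter.

```python
import struct
import time
from concurrent.futures import ThreadPoolExecutor


def make_records(n):
    """Build n made-up records: 4-byte big-endian length, then the payload."""
    return b"".join(struct.pack(">I", i % 7) + b"x" * (i % 7) for i in range(n))


def count_records(blob):
    """Walk the length-prefixed records in pure Python and count them."""
    pos = 0
    count = 0
    while pos < len(blob):
        (length,) = struct.unpack_from(">I", blob, pos)
        pos += 4 + length  # skip the header and the payload
        count += 1
    return count


def timed(fn):
    """Return (result, elapsed seconds) for a zero-argument callable."""
    start = time.perf_counter()
    out = fn()
    return out, time.perf_counter() - start


blobs = [make_records(20_000) for _ in range(4)]

# Same work, first one blob after another, then one blob per thread.
serial, t_serial = timed(lambda: [count_records(b) for b in blobs])
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded, t_threaded = timed(lambda: list(pool.map(count_records, blobs)))

print(f"serial: {t_serial:.3f} s, 4 threads: {t_threaded:.3f} s")
```

Both runs produce identical counts; only the wall-clock time is of interest.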

Python 3.13, released this month, introduces a fundamental new feature: a single Python process can run multiple interpreters, each in its own thread, each with its own (thread-local!) GIL. Subinterpreters are an intermediate choice between share-everything threads and share-nothing processes. Subinterpreters can only share Python objects through FIFO Queues (or, equivalently, Channels), and not by reference. However, they can freely operate on shared array data. Similar solutions can be cobbled together with multiple Python processes, using multiprocessing.Queue and multiprocessing.shared_memory.SharedMemory, but these rely on POSIX pipes and shared memory, depend on ulimit settings, and are much slower than subinterpreter communication.
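The communication pattern in this paragraph (small messages through a FIFO queue, bulk array data in a shared buffer that is never copied) can be sketched with the standard library alone. This is an illustration under stated assumptions, not Uproot's implementation and not the subinterpreter API: the worker here is an ordinary thread so that the example runs from a single file, and `worker` and `run_demo` are hypothetical names. In the multi-process variant the text mentions, the worker would be a `multiprocessing.Process` and the queues `multiprocessing.Queue`; the shared-memory calls would stay the same.

```python
import queue
import threading
from multiprocessing import shared_memory

ITEM_SIZE = 8  # bytes per element in the native "d" (float64) format


def worker(tasks, results):
    """Receive (name, count) descriptors and reply with each buffer's sum."""
    while True:
        task = tasks.get()
        if task is None:  # sentinel: no more work
            break
        name, count = task
        shm = shared_memory.SharedMemory(name=name)  # attach by name, no copy
        try:
            # Release the typed view before closing the shared block.
            with shm.buf[: count * ITEM_SIZE].cast("d") as view:
                total = sum(view)
        finally:
            shm.close()
        results.put(total)


def run_demo(chunks):
    """Place each non-empty list of floats in shared memory; return the sums."""
    tasks, results = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(tasks, results))
    thread.start()
    blocks = []
    try:
        for values in chunks:
            shm = shared_memory.SharedMemory(
                create=True, size=len(values) * ITEM_SIZE
            )
            blocks.append(shm)
            with shm.buf[: len(values) * ITEM_SIZE].cast("d") as view:
                for i, value in enumerate(values):
                    view[i] = value
            # Only the small descriptor travels through the queue.
            tasks.put((shm.name, len(values)))
    finally:
        tasks.put(None)
        thread.join()
    # A single worker reading a FIFO queue answers in submission order.
    totals = [results.get() for _ in range(results.qsize())]
    for shm in blocks:
        shm.close()
        shm.unlink()  # free the block once nobody needs it
    return totals


print(run_demo([[1.0, 2.0, 3.0], [0.5, 0.25]]))
```

The queue carries only a name and a length, so its cost does not grow with the size of the array; that is the property the paragraph relies on when it says workers can freely operate on shared array data.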

In this talk, I'll show how Uproot takes advantage of subinterpreters to improve scaling for metadata-intensive I/O.

Author

Jim Pivarski (Princeton University)
