Starting from the TOP-C parallelization of Geant4 (based on MPI),
we semi-automatically create a thread-parallel Geant4 that uses
event-level parallelism in a master-worker style. We currently
address two issues:
1) detecting global variables and data structures, which must be
made thread-local. We modify the parser of the gcc compiler to
do this.
2) handling the random number generator engines from CLHEP. This is needed to
obtain reproducible results by assigning a known random seed to each thread.
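The two issues above can be illustrated with a minimal sketch in standard C++. It is not the actual Geant4 or CLHEP code: `eventsProcessed`, `seedForWorker`, `simulateEvents`, and `runWorkers` are hypothetical names, `std::mt19937_64` stands in for a CLHEP engine, and the seed rule (base seed plus worker index) is an assumption made only for illustration.

```cpp
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Issue 1: a variable that used to be process-global. Declaring it
// thread_local gives every worker thread its own private copy.
thread_local std::uint64_t eventsProcessed = 0;

// Issue 2 (assumed seed rule): derive a known seed for each worker
// from a fixed base seed and the worker index.
std::uint64_t seedForWorker(std::uint64_t baseSeed, unsigned workerId) {
    return baseSeed + workerId;
}

// Hypothetical stand-in for simulating events. std::mt19937_64 replaces
// a CLHEP engine here; the XOR of the draws serves as a checksum.
std::uint64_t simulateEvents(std::uint64_t seed, unsigned nEvents) {
    std::mt19937_64 engine(seed);  // one engine per call, hence per thread
    std::uint64_t checksum = 0;
    for (unsigned i = 0; i < nEvents; ++i) {
        checksum ^= engine();
        ++eventsProcessed;         // touches only this thread's copy
    }
    return checksum;
}

// The master starts the workers, each with its own seed, and collects
// one checksum per worker after joining them.
std::vector<std::uint64_t> runWorkers(std::uint64_t baseSeed,
                                      unsigned nWorkers,
                                      unsigned eventsPerWorker) {
    std::vector<std::uint64_t> results(nWorkers, 0);
    std::vector<std::thread> workers;
    for (unsigned id = 0; id < nWorkers; ++id) {
        workers.emplace_back([&results, baseSeed, id, eventsPerWorker] {
            // Each worker writes to a distinct slot, so no lock is needed.
            results[id] = simulateEvents(seedForWorker(baseSeed, id),
                                         eventsPerWorker);
        });
    }
    for (auto& w : workers) w.join();
    return results;
}
```

Under these assumptions, two calls to `runWorkers` with the same base seed return identical checksums regardless of how the threads are scheduled, because no worker shares generator state or counters with another.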
Very preliminary tests show linear speedup with the number of cores,
up to the four cores of a quad-core processor. Future work will consider
moving some of the thread-local data back into process-global data,
in order to reduce the memory image size (eliminating separate copies per thread),
and to further ensure scalability for large experiments. We have also
demonstrated that our checkpointing package, DMTCP, works in this
thread-parallel environment running on CERN 64-bit Scientific Linux.