Investigating CPU performance
Old decoding - second OpenMP section

- Each OMP thread gets a sector --> thread iterates over rows
- 36 threads active, 92 inactive
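
The thread counts above follow from the loop shape. A minimal sketch, assuming 36 sectors and a 128-thread team (both inferred from 36 active + 92 inactive); `kRowsPerSector`, `decodeRow`, `decodeBySector` and `activeThreads` are hypothetical names, not the actual decoder code:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Assumed sizes: 36 sectors is inferred from the "36 threads active" bullet;
// kRowsPerSector is an arbitrary placeholder.
constexpr int kSectors = 36;
constexpr int kRowsPerSector = 100;

// Placeholder for the real per-row decoding work.
inline int decodeRow(int sector, int row) { return sector * kRowsPerSector + row; }

// Old scheme: one parallel iteration per sector, rows handled serially inside.
// Without OpenMP enabled the pragma is ignored and the loop runs serially.
std::vector<int> decodeBySector() {
    std::vector<int> out(static_cast<std::size_t>(kSectors) * kRowsPerSector);
    #pragma omp parallel for schedule(static)
    for (int sector = 0; sector < kSectors; ++sector) {
        for (int row = 0; row < kRowsPerSector; ++row) {
            out[static_cast<std::size_t>(sector) * kRowsPerSector + row] = decodeRow(sector, row);
        }
    }
    return out;
}

// With a static schedule, no more threads than parallel iterations can get work.
inline int activeThreads(int parallelIterations, int teamSize) {
    return std::min(parallelIterations, teamSize);
}
```

Under these assumptions `activeThreads(36, 128)` gives 36, which leaves 128 - 36 = 92 threads without work.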
New decoding - second kernel

- Each OMP thread gets a row of a sector --> iterates over sectorRows
- Every thread is active
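
One way to picture the row-level split is a single parallel loop over a flattened (sector, row) index. This is only a sketch under assumed sizes; `kSectors`, `kRowsPerSector`, `decodeRow` and `decodeByRow` are hypothetical names, and the real kernel may be organised differently:

```cpp
#include <cstddef>
#include <vector>

// Assumed sizes: 36 sectors, and an arbitrary placeholder row count.
constexpr int kSectors = 36;
constexpr int kRowsPerSector = 100;

// Placeholder for the real per-row decoding work.
inline int decodeRow(int sector, int row) { return sector * kRowsPerSector + row; }

// New scheme: one parallel iteration per sector row. The iteration space has
// kSectors * kRowsPerSector entries, so a large team can be kept busy.
std::vector<int> decodeByRow() {
    const int sectorRows = kSectors * kRowsPerSector;
    std::vector<int> out(static_cast<std::size_t>(sectorRows));
    #pragma omp parallel for schedule(static)
    for (int sr = 0; sr < sectorRows; ++sr) {
        const int sector = sr / kRowsPerSector;  // recover the sector index
        const int row = sr % kRowsPerSector;     // recover the row within it
        out[static_cast<std::size_t>(sr)] = decodeRow(sector, row);
    }
    return out;
}
```

A nested sector/row loop with `collapse(2)` would give the same iteration space. Either way each thread handles much smaller work items than in the per-sector scheme.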
Results

| Mean (ms) ± std. dev. | First kernel | Second kernel | Total time |
|---|---|---|---|
| Old decoding | 158.47 ± 11.32 | 18.21 ± 3.16 | 179.18 ± 12.83 |
| New decoding | 190.54 ± 4.02 | 81.85 ± 4.37 | 273.06 ± 7.31 |

Playing with OMP_PLACES and OMP_PROC_BIND
- OMP_PLACES=sockets/cores/threads
- OMP_PROC_BIND=spread
- OMP_NUM_THREADS=64
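
To check that these environment variables actually reach the runtime, a small helper can query the OpenMP runtime at start-up. A sketch, assuming an OpenMP 4.5 or newer runtime for `omp_get_num_places`; `OmpSettings` and `queryOmpSettings` are hypothetical names:

```cpp
#include <string>
#ifdef _OPENMP
#include <omp.h>
#endif

// Snapshot of the settings the OpenMP runtime reports.
struct OmpSettings {
    int maxThreads;        // reflects OMP_NUM_THREADS
    int numPlaces;         // number of places derived from OMP_PLACES
    std::string procBind;  // reflects OMP_PROC_BIND
};

OmpSettings queryOmpSettings() {
    // Fallback values used when the code is built without OpenMP.
    OmpSettings s{1, 0, "unavailable"};
#ifdef _OPENMP
    s.maxThreads = omp_get_max_threads();
#if _OPENMP >= 201511  // omp_get_num_places was added in OpenMP 4.5
    s.numPlaces = omp_get_num_places();
#endif
    switch (omp_get_proc_bind()) {
        case omp_proc_bind_false:  s.procBind = "false";  break;
        case omp_proc_bind_true:   s.procBind = "true";   break;
        case omp_proc_bind_close:  s.procBind = "close";  break;
        case omp_proc_bind_spread: s.procBind = "spread"; break;
        default:                   s.procBind = "other";  break;
    }
#endif
    return s;
}
```

Launched with, for example, `OMP_PLACES=cores OMP_PROC_BIND=spread OMP_NUM_THREADS=64`, the helper should report 64 threads and a `spread` binding if the runtime honoured the request.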

| Mean (ms) ± std. dev. | First kernel | Second kernel | Total time |
|---|---|---|---|
| Old decoding | 204.47 ± 3.37 | 19.16 ± 1.47 | 225.71 ± 3.46 |
| New decoding | 269.95 ± 9.23 | 44.43 ± 0.97 | 315.02 ± 9.37 |