Notes to the reviewer:

Par 3.2
- last block. Let me understand: "making the protocol actually push would require a higher memory usage, so you are not following this option". Have I understood correctly?

Yes, you understood correctly. The ratio between the maximum and the average event size depends strongly on the subdetector and can easily be very unfavourable. That is why we would like to reduce the wasted memory as much as possible.
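
To make the memory argument concrete, the sketch below estimates the fraction of buffer memory left unused when every buffer slot is sized for the largest possible event. It assumes, for illustration only, that a push protocol would have to reserve slots of the maximum size; the function name and the sizes are hypothetical and are not taken from DAQPIPE or from any measurement.

```python
def wasted_fraction(avg_size, max_size):
    """Fraction of a max-sized buffer slot left unused by an average-sized event.

    Both sizes must use the same unit (e.g. kB). Illustrative helper only.
    """
    if not 0 < avg_size <= max_size:
        raise ValueError("expected 0 < avg_size <= max_size")
    return 1.0 - avg_size / max_size


# Hypothetical subdetector: average event 100 kB, largest event 400 kB.
# A slot sized for the largest event is, on average, 75% empty.
print(wasted_fraction(100, 400))  # prints 0.75
```

The larger the maximum-to-average ratio of a subdetector, the closer this fraction gets to 1.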

- I would replace "fragment compositions" with "event fragments"

To calculate the size of the full event, it is sufficient to receive the fragment compositions, so I would keep the current wording.

Par 4
- Why do you use iperf for comparison? Is it impossible to use the DAQ application (DAQPIPE) configured with a standard communication library (i.e. non-RDMA based)? I'm not 100% convinced that a comparison with a different application (iperf) is appropriate. At least, are you sure that iperf has no user space overheads? In other words, is the reported iperf CPU percentage the "%system" part (i.e. kernel space)?

The iperf comparison is only meant to give a figure for the CPU usage required by a 40 Gb/s transmission over Ethernet. We have two implementations of the code that do not use RDMA-based libraries, but they do not run at full line speed. In addition, the architecture of the transport itself changes with the communication library: IBVerbs, for example, requires several additional threads to check the completion queues and works in a non-blocking way. This makes a comparison of CPU usage between the different implementations less meaningful. For these reasons, in my opinion, the comparison between iperf and the IBVerbs implementation gives a good figure for the CPU usage benefit that an RDMA implementation can provide. To exclude any user space overhead, the iperf CPU usage is calculated taking into account only the kernel space counters.
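
As an illustration of what "kernel space counters only" can mean on Linux, the sketch below computes a kernel-space CPU percentage from two snapshots of the aggregate "cpu" line in /proc/stat. It is a minimal example under stated assumptions and not the procedure actually used for the table: the function names are hypothetical, and counting the system, irq and softirq fields as kernel time is an assumption made here.

```python
def read_cpu_counters(path="/proc/stat"):
    """Return the first eight counters of the aggregate 'cpu' line (Linux only).

    Field order: user, nice, system, idle, iowait, irq, softirq, steal.
    """
    with open(path) as stat_file:
        fields = stat_file.readline().split()
    return [int(value) for value in fields[1:9]]


def kernel_cpu_percent(before, after):
    """Kernel-space share of CPU time between two counter snapshots.

    Assumption: kernel time = system + irq + softirq. User-space time
    (user, nice) is deliberately excluded from the numerator.
    """
    delta = [a - b for a, b in zip(after, before)]
    total = sum(delta)
    if total <= 0:
        raise ValueError("snapshots must be taken at increasing times")
    _user, _nice, system, _idle, _iowait, irq, softirq, _steal = delta
    return 100.0 * (system + irq + softirq) / total
```

In practice one would call read_cpu_counters() before and after the transfer and pass both snapshots to kernel_cpu_percent().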

- In the table: are the errors negligible?

The fluctuations in performance are comparable with those we observe with other network benchmarking programs such as iperf and ib_write_bw. We can therefore consider the errors negligible, because they are compatible with the fluctuations of the network system itself.

- The performance would depend on the fragment size (event or multi-event). Did you do tests with different fragment sizes?

Yes, several fragment sizes have been tested. The value of 100 MB was chosen because it gave the best performance and is a feasible number for the real system.

Best regards

Flavio Pisani