Speaker
Description
The Advanced Photon Source (APS) in the USA has demonstrated high-precision training and inference on Nvidia GPU clusters using the PtychoNN algorithm combined with the ePIE conjugate gradient method. Building on that idea, we developed a new model, W1-Net, which trains faster and infers with higher precision. We then ported the model to a DCU cluster, where its performance was initially only 1/6 of that on an Nvidia A100 GPU. Profiling the training process showed that the low speed was caused by atomic operations in the training function. After tuning the code, the training time was reduced by half compared with the previous model. Apart from the DCU, we also trained the model on HUAWEI NPU cards. This paper presents the profiling results on a cluster of 8 HUAWEI Ascend 910 NPUs.
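The profiling step described above can be illustrated with a minimal sketch. It uses Python's standard `cProfile` module, not the accelerator profilers used in this work, and every function name (`train_step`, `forward`, `scatter_accumulate`) is a hypothetical stand-in rather than part of W1-Net. The serialized accumulation loop only mimics the behaviour of an atomic add so that it shows up as the hotspot in the report.

```python
import cProfile
import io
import pstats


def scatter_accumulate(indices, values, size):
    """Hypothetical hotspot: element-by-element accumulation into shared bins."""
    out = [0.0] * size
    for i, v in zip(indices, values):
        out[i] += v  # serialized update, standing in for an atomic add
    return out


def forward(values):
    """Hypothetical cheap stage, standing in for the rest of the step."""
    return [2.0 * v for v in values]


def train_step(n=20000, bins=16):
    """Hypothetical training step: a cheap stage followed by the hotspot."""
    values = forward([float(k) for k in range(n)])
    indices = [k % bins for k in range(n)]
    return scatter_accumulate(indices, values, bins)


def profile_report(steps=5, top=10):
    """Profile several steps and return the pstats report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(steps):
        train_step()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


report = profile_report()
print(report)
```

Reading the report from the top down shows which function accounts for most of the step time, which is the kind of evidence that would point to an accumulation routine as the bottleneck.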
Significance
Training runs on HUAWEI Ascend 910 heterogeneous computing cards, whose architecture differs from that of Nvidia GPUs, and the training speed can be comparable to that of the Nvidia A100 GPU.
References
Title: 'W1-Net: A fast training and highly scalable ptychography convolutional neural network'. This paper is under review at 'The European Physical Journal Plus'.
Experiment context, if any
The data comes from the PtychoNN repository: https://github.com/mcherukara/PtychoNN