
Linux HPC User Meeting

Europe/Zurich

513/1-024 (CERN)

● HPC service news

  • Please refer to the slides with HPC service updates.
  • Nils gave a short introduction and set the context for the HPC service. Applications that can run on a single host with 1-48 cores should run under the HTCondor batch service. The HPC SLURM cluster, the focus of this meeting, is intended for MPI jobs spanning multiple nodes.
  • This was followed by a short explanation of the CephFS HPC storage back-end by Dan.
  • Pablo then outlined recent and upcoming changes to the SLURM HPC batch service. The default storage back-end "/hpcscratch" will move to the CephFS cluster currently called "/bescratch" in early May (the date of the intervention will be announced on the IT Service Status Board). Pablo also described the possibility of using the Intel tool suite to profile MPI applications (this works with the recommended Mvapich3 as well as Intel MPI). These Intel tools can be useful for users who develop their own MPI applications.
  • The SLURM partitions (queues) will be reviewed, and a maximum run-time of 1 week is proposed. As some applications do not have checkpointing and a longer run-time would be desirable, a possible compromise would be to keep a run-time of 1 week for the inf-long partition and still allow 2-3 weeks on batch-long.
    • It should be noted that for HTCondor, a run-time beyond the one week offered by the NextWeek job flavour can be achieved by setting the job run time (+MaxRunTime = {number of seconds}) in the HTCondor job submission file; see the HTCondor sketch after this list.
  • Q: How to copy files to EOS from the job script? A: Following the deployment of AUKS, Kerberos credentials are available inside the job, so files can be copied with the eos cp command (see the eos cp sketch after this list). Please refer to the EOS FAQ for information about EOS.
  • Q: Regarding profiling: how can one see what a CPU is doing when one MPI rank takes longer than the others? A: By using CPU-level profiling tools. How to do this for Python-level code (i.e. to see which Python functions take more time in a parallel environment) is, however, not clear. This kind of CPU profiling requires some debug information and output from the application; it does not work out of the box.
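
  As an illustration of the run-time setting mentioned above, here is a minimal sketch of an HTCondor submission file. The file and executable names are hypothetical, and the exact attribute spelling should be checked against the CERN batch documentation.

      # my_job.sub - hypothetical HTCondor submission file requesting a longer run time
      executable  = run_simulation.sh
      output      = job.out
      error       = job.err
      log         = job.log
      +MaxRunTime = 1209600   # requested run time in seconds (here 14 days)
      queue 1

  The job would then be submitted with "condor_submit my_job.sub".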

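  As an illustration of the EOS copy mentioned above, a job script could end with a command such as the one below. The file name and EOS path are placeholders; with AUKS deployed, the job's Kerberos credential should already be available.

      # at the end of the job script: copy a result file to the user's EOS area
      eos cp output.dat /eos/user/<initial>/<username>/results/output.dat
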
● Engineering applications and HPC

  • Maria (slides) summarized the migration of engineering applications from the former Windows HPC service to the Linux batch services on HTCondor and the HPC SLURM cluster.
  • Ansys Mechanical + Workbench, Comsol and CST are all now running on LXBATCH on large HTCondor nodes.
  • Ansys Fluent, the MPI-enabled CST solver and LS-Dyna are MPI applications that scale well as distributed applications and run under SLURM.
  • Q: What about Ansys/EMAG and HFSS? To be checked by the engineering software team. If the required Ansys modules are part of the Ansys installation on Linux, they could be used with HTCondor.

● ABP users HPC usage

  • Xavier gave a summary of BE-ABP applications running on the HPC infrastructure and the user experience. (Ref. presentation and also plots in the agenda.)
  • PyHeadtail and PyECLOUD are the heaviest ABP applications for now, typically spanning 20-30 nodes.
  • The COMBI application (hybrid OpenMP/MPI) runs distributed on HPC/SLURM for multi-bunch studies and on HTCondor for single-bunch studies.
  • For the PyOrbit application, the current environment with a shared software distribution on AFS works well. What would be a possible replacement? CVMFS or something more lightweight? Can EOS-fuse handle distributed applications for lxplus, batch and Linux workstations for small teams? To be addressed with the IT storage group.
  • Q: Would it be possible to get larger head nodes for post-processing? A: Yes, we could expand to lxplus-like machines if needed; otherwise CPU-intensive post-processing could run on a batch machine.
  • The requirement to have /bescratch also available on the "batch" nodes will be addressed by the migration to the new /hpcscratch.

● AWAKE HPC usage

  • Hossein gave an overview of the AWAKE HPC use cases and requirements.  (Ref. presentation and plots of simulation results in the agenda.)
  • Studies of larger beams would require many nodes, e.g. 70 full cluster nodes. As the runs would be limited to a couple of days, this should be tried when there is available HPC cluster capacity.

● AOB and discussion

Following the switch to lxplus7 and CC7 as default OS for lxplus, it may be necessary to set MPI transport to "sockets" for local MPI tests on lxplus. (This was not necessary on SLC6.)
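
For example, assuming the local test uses Open MPI (the equivalent option differs for Intel MPI or MVAPICH), the TCP/"sockets" transport can be forced as sketched below; the binary name is a placeholder.

    # force TCP ("sockets") transport for a small local MPI test on lxplus (Open MPI syntax)
    mpirun --mca btl self,tcp -np 4 ./my_mpi_test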

The Intel tool suite at CERN also includes Python profiling:

https://software.intel.com/en-us/articles/profiling-python-with-intel-vtune-amplifier-a-covariance-demonstration
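
As a rough, hypothetical illustration of the command-line usage described in the article above (the script and result-directory names are placeholders, and the collector names should be checked against the installed VTune Amplifier version):

    # profile a Python script with VTune Amplifier and print a hotspots report
    amplxe-cl -collect hotspots -result-dir my_result -- python my_script.py
    amplxe-cl -report hotspots -result-dir my_result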

● Agenda
    • 14:00-14:20  HPC service news (20m)
      Update on the HPC service infrastructure. Change of scratch home directories and required user actions.
      Speakers: Dan van der Ster (CERN), Nils Hoimyr (CERN), Pablo Llopis Sanmillan (CERN)
    • 14:20-14:30  Engineering applications and HPC (10m)
      Engineering applications migrated from Windows to Linux HPC.
      Speaker: Maria Alandes Pradillo (CERN)
    • 14:30-14:40  ABP users HPC usage (10m)
      Speaker: Xavier Buffat (CERN)
    • 14:40-14:50  AWAKE HPC usage (10m)
      HPC applications for AWAKE - requirements.
      Speakers: Alexey Petrenko (Budker Institute of Nuclear Physics (RU)), Dr Hossein Saberi (Institute for Research in Fundamental Sciences (IR))
    • 14:50-15:10  AOB and discussion (20m)