
Linux HPC User Meeting

Europe/Zurich

513/1-024 (CERN)

● HPC service news

  • Please refer to the slides with HPC service updates.
  • Nils gave a short introduction and set the context for the HPC service. Applications that can run on a single host with 1-48 cores should run under the HTCondor batch service. The HPC SLURM cluster, the focus of this meeting, is intended for MPI jobs spanning multiple nodes.
  • This was followed by a short explanation of the CephFS HPC storage back-end by Dan.
  • Pablo then outlined recent and upcoming changes to the SLURM HPC batch service. The default storage back-end "/hpcscratch" will move to the CephFS cluster currently called "/bescratch" in early May (the date of the intervention will be announced on the IT Service Status Board). Pablo also described the possibility of using the Intel tool suite to profile MPI applications (this works with the recommended Mvapich3 as well as Intel MPI). These Intel tools can be useful for users who develop their own MPI applications.
  • The SLURM partitions (queues) will be reviewed, and a maximum run-time of 1 week is proposed. As some applications do not have checkpointing and a longer run-time would be desirable, a possible compromise would be to keep a run-time of 1 week for the inf-long partition and still allow 2-3 weeks on batch-long.
    • It should be noted that for HTCondor, a run-time beyond the one week offered by the NextWeek job flavour can be achieved by setting the job run time (+MaxRunTime = {number of seconds}) in the HTCondor job submission file; see the HTCondor sketch after this list.
  • Q: How to copy files to EOS from the job script? A: Following the deployment of AUKS, Kerberos credentials are available inside the job, so files can be copied with the eos cp command (see the eos cp sketch after this list). Please refer to the EOS FAQ for information about EOS.
  • Q: Regarding profiling: how can one see what a CPU is doing when one MPI rank takes longer than the others? A: By using CPU-level profiling tools. How to do this for Python-level code (i.e. to see which Python functions take more time in a parallel environment) is, however, not clear. This kind of CPU profiling requires some debug information and output from the application; it does not work out of the box.
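
  As an illustration of the run-time setting mentioned above, here is a minimal sketch of an HTCondor submission file. The file and executable names are hypothetical, and the exact attribute spelling should be checked against the CERN batch documentation.

      # my_job.sub - hypothetical HTCondor submission file requesting a longer run time
      executable  = run_simulation.sh
      output      = job.out
      error       = job.err
      log         = job.log
      +MaxRunTime = 1209600   # requested run time in seconds (here 14 days)
      queue 1

  The job would then be submitted with "condor_submit my_job.sub".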

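  As an illustration of the EOS copy mentioned above, a job script could end with a command such as the one below. The file name and EOS path are placeholders; with AUKS deployed, the job's Kerberos credential should already be available.

      # at the end of the job script: copy a result file to the user's EOS area
      eos cp output.dat /eos/user/<initial>/<username>/results/output.dat
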
● Engineering applications and HPC

  • Maria (slides) summarized the migration of engineering applications from the former Windows HPC service to the Linux batch services on HTCondor and the HPC SLURM cluster.
  • Ansys Mechanical + Workbench, Comsol and CST are all now running on LXBATCH on large HTCondor nodes.
  • Ansys Fluent, the MPI-enabled CST solver and LS-Dyna are MPI applications that scale well as distributed applications and run under SLURM.
  • Q: What about Ansys/EMAG and HFSS? To be checked by the engineering software team. If the required Ansys modules are part of the Ansys installation on Linux, they could be used with HTCondor.

● ABP users HPC usage

  • Xavier gave a summary of BE-ABP applications running on the HPC infrastructure and the user experience. (Ref. presentation and also plots in the agenda.)
  • PyHeadtail and PyECLOUD are the heaviest ABP applications for now, typically spanning 20-30 nodes.
  • The COMBI application (hybrid OpenMP/MPI) runs distributed on HPC/SLURM for multi-bunch studies and on HTCondor for single-bunch studies.
  • For the PyOrbit application, the current environment with a shared software distribution on AFS works well. What would be a possible replacement? CVMFS or something more lightweight? Can EOS-fuse handle distributed applications for lxplus, batch and Linux workstations for small teams? To be addressed with the IT storage group.
  • Q: Would it be possible to get larger head nodes for post-processing? A: Yes, we could expand to lxplus-like machines if needed; otherwise CPU-intensive post-processing could run on a batch machine.
  • The requirement to have /bescratch also available on the "batch" nodes will be addressed by the migration to the new /hpcscratch.

● AWAKE HPC usage

  • Hossein gave an overview of the AWAKE HPC use cases and requirements.  (Ref. presentation and plots of simulation results in the agenda.)
  • Studies of larger beams would require many nodes, e.g. 70 full cluster nodes. As the runs would be limited to a couple of days, this should be tried when there is available HPC cluster capacity.

● AOB and discussion

Following the switch to lxplus7 and CC7 as default OS for lxplus, it may be necessary to set MPI transport to "sockets" for local MPI tests on lxplus. (This was not necessary on SLC6.)
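
For example, assuming the local test uses Open MPI (the equivalent option differs for Intel MPI or MVAPICH), the TCP/"sockets" transport can be forced as sketched below; the binary name is a placeholder.

    # force TCP ("sockets") transport for a small local MPI test on lxplus (Open MPI syntax)
    mpirun --mca btl self,tcp -np 4 ./my_mpi_test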

The Intel tool suite at CERN also includes Python profiling:

https://software.intel.com/en-us/articles/profiling-python-with-intel-vtune-amplifier-a-covariance-demonstration
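
As a rough, hypothetical illustration of the command-line usage described in the article above (the script and result-directory names are placeholders, and the collector names should be checked against the installed VTune Amplifier version):

    # profile a Python script with VTune Amplifier and print a hotspots report
    amplxe-cl -collect hotspots -result-dir my_result -- python my_script.py
    amplxe-cl -report hotspots -result-dir my_result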

● Agenda
    • 14:00-14:20  HPC service news (20m)
      Update on the HPC service infrastructure. Change of scratch home directories and required user actions.
      Speakers: Dan van der Ster (CERN), Nils Hoimyr (CERN), Pablo Llopis Sanmillan (CERN)
    • 14:20-14:30  Engineering applications and HPC (10m)
      Engineering applications migrated from Windows to Linux HPC.
      Speaker: Maria Alandes Pradillo (CERN)
    • 14:30-14:40  ABP users HPC usage (10m)
      Speaker: Xavier Buffat (CERN)
    • 14:40-14:50  AWAKE HPC usage (10m)
      HPC applications for AWAKE - requirements.
      Speakers: Alexey Petrenko (Budker Institute of Nuclear Physics (RU)), Dr Hossein Saberi (Institute for Research in Fundamental Sciences (IR))
    • 14:50-15:10  AOB and discussion (20m)