Description
A HEP computing center typically runs at least one batch system. At IHEP, for example, we have used three: PBS, HTCondor, and Slurm. After running PBS as the local batch system for 10 years, we replaced it with HTCondor (for HTC) and Slurm (for HPC). The migration raised problems on both the user and the admin side.
On the user side, the new batch systems bring new sets of commands that users have to learn and remember. Some users, in particular, have to use HTCondor and Slurm at the same time. Furthermore, HTCondor and Slurm offer more functionality than PBS, which makes them more complicated to use than the simple PBS commands.
On the admin side, HTCondor gives users more freedom, which creates problems for admins. Admins have to solve several problems: preventing users from requesting resources they are not allowed to use, checking that the required attributes are correct, determining which site a job is requesting (the Slurm cluster, remote sites, virtual machine sites), and so on.
HepJob was developed to meet these requirements. It provides users with a set of simple commands: hep_sub, hep_q, hep_rm, etc. During submission, HepJob checks that all job attributes are correct; assigns the proper resources to the user, based on user and group information obtained from the management database; routes the job to the targeted site; and carries out the remaining submission steps.
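The submission steps described above (check attributes, assign resources from user and group information, route to a site) can be illustrated with a minimal Python sketch. This is not HepJob's actual code: the `USER_DB` dictionary standing in for the management database, the `JobRequest` fields, and every function name other than `hep_sub` are hypothetical, chosen only for illustration. The only real names are the native submit commands `condor_submit` and `sbatch`.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the management database:
# user -> group and the sites that user is allowed to request.
USER_DB = {
    "alice": {"group": "physics", "sites": {"htcondor", "slurm"}},
    "bob": {"group": "astro", "sites": {"htcondor"}},
}

# Native submit commands of the two batch systems named in the abstract.
SUBMIT_COMMAND = {"htcondor": "condor_submit", "slurm": "sbatch"}


class SubmissionError(Exception):
    """Raised when a job request fails a check (illustrative only)."""


@dataclass
class JobRequest:
    # All fields are assumptions; the abstract does not list HepJob's attributes.
    user: str
    script: str
    site: str = "htcondor"
    memory_mb: int = 2000


def check_attributes(job: JobRequest) -> None:
    """Step 1: reject requests whose attributes are not valid."""
    if job.site not in SUBMIT_COMMAND:
        raise SubmissionError(f"unknown site: {job.site}")
    if job.memory_mb <= 0:
        raise SubmissionError("memory_mb must be positive")


def assign_resources(job: JobRequest) -> str:
    """Step 2: look up the user and return the group to charge the job to."""
    record = USER_DB.get(job.user)
    if record is None:
        raise SubmissionError(f"unknown user: {job.user}")
    if job.site not in record["sites"]:
        raise SubmissionError(f"{job.user} may not use {job.site}")
    return record["group"]


def route(job: JobRequest) -> str:
    """Step 3: pick the native submit command for the targeted site."""
    return SUBMIT_COMMAND[job.site]


def hep_sub(job: JobRequest) -> dict:
    """Run the checks in order and return a submission plan (nothing is executed)."""
    check_attributes(job)
    group = assign_resources(job)
    return {"group": group, "backend": route(job), "script": job.script}
```

For example, `hep_sub(JobRequest("alice", "run.sh", site="slurm"))` returns a plan that targets `sbatch`, while the same request from `bob` raises `SubmissionError` because his hypothetical record does not include the Slurm site. A real tool would additionally have to generate the backend-specific job description and invoke the command, which this sketch leaves out.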
Users can get started with HepJob very easily, and admins can implement many preventive measures in HepJob.