8th Open Science Practitioners Forum: Training and Onboarding

Time zone: Europe/Zurich
Zoom Meeting ID: 62949945337
Host: Merten Dahlkemper

News

  • Clemens explains the OSPF e-group
  • Antonia Winkler is the first Data Steward at CERN; she previously did a PhD on several Open Science activities
    • She will reach out to several of you in the coming weeks to define the role and see how she can best provide support in all matters related to research data management
  • OSSB took place on 12th of May
    • Mandates of all the Open Science bodies have been reviewed
    • The Open Science Policy will be revised, with the target of submission to the CERN Council by the end of this year
  • Today’s topic is training and onboarding. The idea is to bring together stakeholders from across the organisation
    • How can we work together on the basics, so that not everyone needs to invent their own material?
  • Next meetings
    • In September: Either Open Science infrastructures or input for the policy revision
    • In November: In-person event jointly with OSPO
    • Any other suggestions: ospf-organisers@cern.ch

Central training initiatives

HSF training group

  • Trainings should be
    • unified
    • reusable
    • sustainable
  • More than 25 training modules and more than 65 community contributors
  • designed for self-study
  • Range from core skills to analysis tools, analysis reproducibility, advanced topics
  • Flagship format: software basics training
    • virtual & hybrid formats
  • Python for analysis training
    • targeted at learners with some experience, who want to apply their skills
  • Analysis reproducibility training
    • Run once every year
    • CI/CD with GitHub/GitLab, Docker and REANA
  • C++ course
    • Delivered at major HEP labs
  • Upcoming events at BNL and Fermilab: https://indico.cern.ch/category/11386/
  • Newest addition: Responsible use of agentic AI in HEP
  • New instructors and mentors are always needed
  • How are asynchronous workshops perceived by participants?
    • In the asynchronous trainings, there is always at least one day on which people can reach out for one-on-one Zoom calls, and issues are addressed as far as possible
    • Over the last 2-3 years, participation in online events has been declining; in some cases, 50% of registrants did not attend
    • One practice that has emerged from this observation: in-person trainings tend to be at intermediate or advanced level, while the basic trainings are left to asynchronous events, as they require less effort

HEP training portal

  • Newly created website built by Kenneth Rioja and Stefan Roiser as a spin-off of the EVERSE project (see below)
  • Motivation: Trainings are very decentralized
  • Website was created to collect all the different training contents
  • Featuring HSF training contents and the CERN School of Computing
  • Looking into creating a long-term training initiative for HEP
  • Ingests materials from different sources, such as Indico, Google Spreadsheets, GitHub, or CDS Videos
  • Content can also be embedded into a website via an HTML widget
  • To add a resource, log in via CERN SSO (or guest access) if you have training resources of your own; contact contact.heptraining@cern.ch if you know of resources from others or want to become a source of content
  • Different “subspaces” can be created, such as “experiment-heptraining.cern.ch”, which can then serve as a single point of reference for an individual experiment/project/section
  • Contents can also be pulled from different disciplines
  • A summer student will implement the possibility of making subspaces accessible only to the collaborations in question.
  • The platform is a spin-off of the EVERSE project
    • Goal of the project is to create a training and recognition framework for software trainings
    • For example, to include automatic crediting on ORCID for teaching HEP trainings
  • So far, Kenneth is the only curator; a team of expert curators would be appreciated
  • What is the general scope of the portal? Should there also be internal collaboration trainings on there?
    • Yes, with restricted access, one use case is to make internal trainings accessible only to the group concerned. The project is foreseen to be finished by the end of August

Training and onboarding initiatives in the experiments

ATLAS

  • Challenge: 5500+ members, of whom 1200+ are doctoral students
  • Three offers
    • Analysis software tutorials: 3x per year (this year April, July, November)
      • target group: PhD students & early career researchers
      • 3x per year: in-person week and online week 2 weeks later
      • self-guided material, so in principle also possible in self-study
      • Onboarding (installing everything) on an induction day before the tutorial starts
      • walk through one real analysis (example chosen by organisers; heavy steps are pre-cooked so no one has to wait for hours)
      • All materials on atlas-software.docs.cern.ch, which replaced the older internal wiki (one single source of truth)
      • lectures are recorded and posted
      • The tutorial is updated before each event; feedback feeds into the material
      • matured pedagogy (giving code examples which have to be completed; one coherent analysis instead of disconnected demos)
      • Largely positive feedback (75% of participants say that the level of the material is “just right”)
    • Advanced & local tutorials
      • Going deeper, such as Athena development, multi-threading, ML etc
      • Less frequent (every 1-2 years)
      • Local tutorials (e.g. at SLAC) built on main tutorial
    • ATLAS lecture series: Weekly lectures introducing the experiment
  • Lessons learned
    • Teach one analysis from end to end
    • Put everything in one source
    • Lower barriers
    • Living material
  • Many students prefer to ask questions anonymously, so a Google sheet is provided where they can do so

ALICE

  • No ALICE-specific onboarding
  • For onboarding (such as setting up accounts etc) rely on written documentation
  • annual collaboration-wide analysis tutorial
    • preparation day before to set up repositories etc
    • linking internal documentation, as well as external resources such as Git Book for Git & GitHub
    • Advertised HEP C++ course
    • Machine Learning session for community-specific software packages
    • dedicated Mattermost channels to ask questions
  • Analysis framework documentation with links to Git and GitHub tutorials

Questions:

  • Does every newcomer get these resources for self-study?
    • Yes, this is the usual approach and it has been working well so far. Feedback is also collected and the documentation is updated accordingly. There are also worked examples on how to use the analysis software, and there are dedicated Mattermost channels for questions
  • Is there a reason why you suggest these specific Git and GitHub tutorials (as there are many such tutorials)?
    • No specific reason; it was suggested by someone, there were no objections, so it was implemented.
  • Any reason why GitHub and not GitLab?
    • Most of the central ALICE code is on GitHub; private repositories might be on GitLab
  • Is once per year sufficient for analysis tutorials?
    • 2x/year might be useful, as things get outdated and the materials are usually only updated for the tutorial, but so far there hasn’t been enough pressure from the community

CMS

  • CMS Induction course
    • runs for 3 days, giving newcomers a general overview of the experiment
    • One big challenge: How to find things (as web search usually doesn’t work)
  • Software and analysis training
    • Hands-on Advanced tutorial sessions
      • Single-day or half-day sessions
      • Hybrid, at Fermilab, therefore time zone problem
    • CMS schools (such as Data Analysis schools, or physics objects schools)
      • Main event
      • 2-3 times per year for Data analysis, others a bit less frequent
      • in-person only
  • A lot of collaboration with HSF & IRIS-HEP
    • Some things have to be specific for CMS

LHCb

  • Historically, the main idea has been the starterkit (established 2015)
    • Anything going into the LHCb starterkit is strictly LHCb software; anything else goes directly into HSF
    • community workshop in November before LHCb week
  • Starterkit lessons
    • first analysis steps from the starterkit workshops
    • second analysis steps more in-depth
  • Starterkit workshop in person
    • partly software basics (git, python etc)
    • partly first analysis steps from the lessons
  • Change: Training and Documentation have been attached to a software project, so the effort is no longer run entirely by volunteers
  • New starterkit is now written in MkDocs, which gives the option to translate, as 15% of the collaboration is based in China
  • Documentation of the organisation process has now been written down
  • Sister events to the starterkit events are hosted in the US and China (the events in China are held in Chinese)
  • 24% of starterkit visits view a Chinese-language page
  • Beyond the starterkit, there are also several other training initiatives, such as talks at the LHCb weeks
  • There are also ad-hoc longer trainings on bespoke topics (e.g., Run 3 software for those who are familiar with the Run 2 software, or quantum computing in HEP)
  • During LS2, and upcoming LS3: hackathons for developing LHCb software
  • Q: Do you collect feedback on where you need to improve the training?
    • This is done after every training, and materials are updated accordingly. Usually feedback is fairly positive, but it’s always possible to improve

Small experiments

n_TOF:

  • The situation is very different, as every experiment is different
  • Learning Hub is mandatory from the CERN side
  • people are already experts in their individual experiment
  • no general software, therefore no general training
  • learned about HSF today, and general trainings, which could be recommended to new students
  • currently trying to develop common software for data capture

FASER:

  • About a dozen students joining every year, therefore no central onboarding
  • Central resources sound very useful to recommend to new students

Discussion

  • General take-aways:
    • Keeping material up to date
      • Before every training in ATLAS, the docs are updated (about every three months)
    • Attaching training and documentation to software projects (such as LHCb) seems very useful for sustainability
    • Less is more
      • It seems very useful to only have one analysis to discuss, as it’s easier to maintain
    • Having teams which can respond to feedback
  • It makes sense to use what’s already there
    • HSF materials are already there and they’re open to use
    • On the HEP training portal, we can set up a directory of listed trainings, or something like a learning path for each experiment (if there’s interest from the experiments)
  • HSF tries to be experiment-agnostic, also to help the small experiments. Maintaining the materials takes a lot of time and needs people-power. Maybe AI can help (but it won’t replace people).
  • Issue with in-person vs. virtual: a decline in online participation has been observed, but in-person events require resources. However, in-person events are worthwhile for networking purposes alone. Funding is an issue.
  • Question from Micha (INSPIRE manager): Are information-retrieval tools like INSPIRE or PDG mentioned to newcomers in the trainings?
    • In CMS, the PDG is mentioned, but INSPIRE is not, at least not explicitly. However, it would make sense for students to know a bit about it, for example so that they know how to track their own publications when updating their CV. Sooner or later researchers end up on INSPIRE, but at least in CMS this is not systematically enforced at the moment

Timetable

  • 15:00-15:10: Welcome & News from OS Office (10m)
    • Speakers: Anne Gentil-Beccot (CERN), Clemens Lange (Paul Scherrer Institute (CH))
  • 15:10-15:30: Centralised Training Initiatives
  • 15:30-16:25: Training and onboarding within experiments
  • 16:25-16:55: Discussion
    • Convener: Clemens Lange (Paul Scherrer Institute (CH))
    • 16:25: Coordination of training initiatives (30m)
      • Guiding questions:
        • What resources do you use in experiment-specific trainings which could or should be used in central trainings?
        • How else can we use synergies between existing training initiatives?
  • 16:55-17:00: AoB (5m)
    • Speaker: Clemens Lange (Paul Scherrer Institute (CH))