ADC Weekly

Europe/Zurich
3162/2-E01 (CERN)

Alessandro Di Girolamo (CERN), I Ueda (Department of Particle Physics-University of Tokyo)
    • 15:40 15:45
      possible delay 5m
    • 15:45 16:00
      Hot topics
      • 15:45
        Managing Panda Resources at Tier-1s 5m
        Announcement to Tier-1s
        Mail forwarded to cloud-all
        • Changing the main panda resource at a T1 requires a careful procedure.
        • The information concerns T1s only.
        • The message was forwarded to cloud support so that they are aware of it.
      • 15:50
        GDP 10m
        Speakers: Andrej Filipcic (Jozef Stefan Institute (SI)), Dr Rodney Walker (Ludwig-Maximilians-Univ. Muenchen (DE))
        Slides
    • 16:00 16:15
      AMOD/ADCoS report 15m
      Speakers: Alexey Sedov (Universitat Autònoma de Barcelona (ES)), Helmut Wolters (LIP Coimbra, Portugal)
      Slides
      • The issue reported as a BNL proxy expiration was caused by problems with the CERN VOMS server. The slide will be corrected.
      • Should the VOMS proxy renewal include a step that checks the validity of the new proxy before it replaces the old one?
        • D.Cameron: rather than adding such a check, it would be better to keep a backup and roll back manually.
        • This is rather a problem in voms-proxy-init, which should not return an invalid proxy. A fix will be requested from the developers.
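To make the two options above concrete, here is a minimal Python sketch of a renewal step that validates the new proxy before replacing the old one and also keeps a backup for manual rollback. It is an illustration only, not the procedure in use: `renew_proxy`, the `create` and `is_valid` callables, and the `.bak` suffix are all hypothetical names. In practice `create` would wrap the proxy-generation command and `is_valid` whatever validity check is chosen.

```python
import os
import shutil
import tempfile
from typing import Callable


def renew_proxy(proxy_path: str,
                create: Callable[[str], None],
                is_valid: Callable[[str], bool]) -> bool:
    """Replace proxy_path with a fresh proxy only if the new one validates.

    create(path)   -- hypothetical: writes a new proxy file to `path`
    is_valid(path) -- hypothetical: returns True if the proxy at `path` is usable
    """
    directory = os.path.dirname(os.path.abspath(proxy_path))
    # Write the candidate next to the target so the final rename stays
    # on the same filesystem.
    fd, candidate = tempfile.mkstemp(dir=directory, prefix=".proxy-new-")
    os.close(fd)
    try:
        create(candidate)
        if not is_valid(candidate):
            return False  # the old proxy is left untouched
        if os.path.exists(proxy_path):
            # Keep a backup so a manual rollback remains possible.
            shutil.copy2(proxy_path, proxy_path + ".bak")
        os.replace(candidate, proxy_path)
        return True
    finally:
        # Clean up the candidate if it was not moved into place.
        if os.path.exists(candidate):
            os.remove(candidate)
```

With this shape, a failed renewal leaves the old proxy in service until it expires, and a bad proxy that passes the check can still be reverted from the backup copy.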
    • 16:15 16:30
      Monitoring jobs failing-over to FAX 15m
      Speaker: Ilija Vukotic (University of Chicago (US))
      ADC Weekly, June 18
      Slides
      • At the last S&C week, it was announced that the FAX team had activated input-file fail-over to FAX for some selected sites. This triggered a discussion, and it was agreed that the FAX team would prepare instructions/procedures. This presentation is meant to address that.
      • Slide 2: "we would suggest all the sites and all the queues to have it on"
        • "we" means FAX team.
        • ADC-ops does not recommend activating the fail-over to FAX before seeing results from stress tests. It is not up to the sites, clouds, or the FAX team to decide to switch it on.
          • Joining FAX is voluntary and up to the sites/clouds, but activating the fail-over could affect other sites, especially T1s, so it should be a decision by ADC-ops.
          • The level of stress needs to be agreed offline in a dedicated discussion by mail.
      • When an SE is down, what happens to the output?
        • Ilija: it needs to be written to another site.
        • This means a development is needed in panda/pilot.
      • HC (AFT/PFT) should exclude queues when SE is down, so the FAX fail-over cannot be used for this case.
        • AFT/PFT should ignore this flag.
      • We need more numbers to monitor:
        • total number of successful jobs, total number of failed jobs, number of jobs that succeeded because of the fail-over, number of jobs that failed despite the fail-over
        • We should also monitor the negative impacts, e.g. if FAX takes a long time, or if jobs fail after an unsuccessful fail-over, computing resources are lost.
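The counters requested above could be computed along these lines. This is a minimal Python sketch under stated assumptions: the `JobRecord` fields and the `summarize` function are hypothetical and do not reflect the actual PanDA or FAX monitoring schema. Wall time spent by jobs that failed after attempting the fail-over is used here as one possible measure of lost computing resources.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class JobRecord:
    # Hypothetical fields; the real job records may look different.
    succeeded: bool
    used_failover: bool
    wall_seconds: float = 0.0


def summarize(jobs: Iterable[JobRecord]) -> Dict[str, float]:
    """Count job outcomes, split by whether the FAX fail-over was used."""
    summary = {
        "successful": 0,
        "failed": 0,
        "successful_via_failover": 0,
        "failed_despite_failover": 0,
        "wasted_wall_seconds": 0.0,
    }
    for job in jobs:
        if job.succeeded:
            summary["successful"] += 1
            if job.used_failover:
                summary["successful_via_failover"] += 1
        else:
            summary["failed"] += 1
            if job.used_failover:
                summary["failed_despite_failover"] += 1
                # Time spent on a job that failed anyway after failing over.
                summary["wasted_wall_seconds"] += job.wall_seconds
    return summary
```

Comparing `successful_via_failover` against `failed_despite_failover` and `wasted_wall_seconds` would give a first view of whether the fail-over helps more than it costs.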
    • 16:30 16:35
      AOB 5m
      • network reports (if any)
        - for T2(D)s against T1s
        - for T1s against T2Ds
      • Analysis Availability Reports
        Slides
      • Draft recommendation for T2 space reservation
        https://twiki.cern.ch/twiki/bin/view/AtlasComputing/StorageSetUp#Space_Reservation_for_Tier_2