ADC Weekly

Name: ADC Weekly
Start: 2013-10-08T15:40:00+02:00
End: 2013-10-08T17:30:00+02:00
Location: CERN

Tuesday 8 Oct 2013, 15:40 → 17:30 Europe/Zurich

3162/2-E01 (CERN)


Alessandro Di Girolamo (CERN), I Ueda (Department of Particle Physics-University of Tokyo)

- 15:40 → 15:45
  
  possible delay 5m
- 15:45 → 16:00
  Hot topics
  - 15:45
    Managing Panda Resources at Tier-1s 5m
    
    Announcement to Tier-1s
    
    Mail forwarded to cloud-all
    
    Changing the main panda resource at a T1 needs a careful procedure.
    
    The information is related only to T1s
    
    The message is forwarded to the cloud support so that they are aware of it
  - 15:50
    
    GDP 10m
    
    Speakers: Andrej Filipcic (Jozef Stefan Institute (SI)), Dr Rodney Walker (Ludwig-Maximilians-Univ. Muenchen (DE))
    
    Slides
- 16:00 → 16:15
  AMOD/ADCoS report 15m
  
  Speakers: Alexey Sedov (Universitat Autònoma de Barcelona (ES)), Helmut Wolters (LIP Coimbra, Portugal)
  
  Slides
  The issue reported as a BNL proxy expiration was caused by CERN VOMS server issues. The slide will be corrected.
  
  Should the VOMS proxy renewal include a step that checks the validity of the new proxy before it replaces the old one?
  
  D. Cameron: rather than adding such a check, it would be better to keep a backup and roll back manually.
  
  This is really a problem in voms-proxy-init: it should not return an invalid proxy. A fix will be requested from the developers.
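The two options discussed above (validate before replacing, and keep a backup for manual roll-back) can be combined in a few lines. This is only a minimal sketch, not the renewal procedure actually in use: `replace_proxy`, `roll_back`, the `.bak` suffix and the `is_valid` callback are hypothetical names, and the actual validity check (for example a call to a VOMS client tool) is left to the caller.

```python
import os
import shutil
from typing import Callable


def replace_proxy(new_proxy: str, live_proxy: str,
                  is_valid: Callable[[str], bool]) -> bool:
    """Install new_proxy over live_proxy only if is_valid(new_proxy) is true.

    Returns True if the live proxy was replaced, False if it was kept.
    is_valid is supplied by the caller (hypothetical hook for the real check).
    """
    if not is_valid(new_proxy):
        # Leave the old proxy in service; the new file stays for inspection.
        return False
    if os.path.exists(live_proxy):
        # Keep a copy so an operator can roll back by hand.
        shutil.copy2(live_proxy, live_proxy + ".bak")
    # os.replace is atomic when both paths are on the same filesystem.
    os.replace(new_proxy, live_proxy)
    return True


def roll_back(live_proxy: str) -> None:
    """Restore the backup written by replace_proxy (manual roll-back step)."""
    os.replace(live_proxy + ".bak", live_proxy)
```

With this shape, a renewal that yields an invalid proxy leaves the running one untouched, and a bad proxy that still slips through can be undone with `roll_back`.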
- 16:15 → 16:30
  Monitoring jobs failing-over to FAX 15m
  
  Speaker: Ilija Vukotic (University of Chicago (US))
  
  ADC Weekly, June 18
  
  Slides
  At the last S&C week, it was announced that the FAX team had activated the input file fail-over to FAX for some selected sites. This triggered a discussion, and it was agreed that the FAX team would prepare instructions/procedures. This presentation is meant to address that.
  
  Slide 2: "we would suggest all the sites and all the queues to have it on"
  
  "we" means FAX team.
  
  ADC-ops does not recommend activating the fail-over to FAX before seeing results from stress tests. It is not up to the sites, clouds, or the FAX team to decide to switch it on.
  
  Joining FAX is voluntary and up to the sites/clouds, but activating the fail-over could affect other sites, especially T1s, and should be a decision by ADC-ops.
  
  The level of stress needs to be agreed offline in a dedicated discussion by mail.
  
  When an SE is down, what happens to the output?
  
  Ilija: it needs to be written to another site.
  
  This would require development in panda/pilot.
  
  HC (AFT/PFT) should exclude queues when the SE is down, so the FAX fail-over cannot be used for this case.
  
  AFT/PFT should ignore this flag.
  
  We need more numbers to monitor:
  
  total number of successful jobs, total number of failed jobs, number of jobs that succeeded because of the fail-over, number of jobs that failed despite the fail-over.
  
  We should also monitor the negative impacts; e.g. if FAX takes a long time and jobs still fail after an unsuccessful fail-over, it is a loss of computing resources.
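As an illustration of the counters requested above, the sketch below aggregates them from a list of job records. The `JobRecord` fields and the `failover_summary` function are hypothetical and do not reflect the actual PanDA or FAX monitoring schema. Wall time spent by jobs that failed despite the fail-over is used here as one possible measure of lost computing resources.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class JobRecord:
    # Hypothetical fields; a real monitoring source would define its own.
    succeeded: bool
    used_failover: bool
    wall_seconds: float


def failover_summary(jobs: Iterable[JobRecord]) -> Dict[str, float]:
    """Count the four job categories and the wall time lost after fail-over."""
    summary = {
        "total_success": 0,
        "total_failed": 0,
        "success_via_failover": 0,
        "failed_despite_failover": 0,
        "wasted_wall_seconds": 0.0,
    }
    for job in jobs:
        if job.succeeded:
            summary["total_success"] += 1
            if job.used_failover:
                summary["success_via_failover"] += 1
        else:
            summary["total_failed"] += 1
            if job.used_failover:
                summary["failed_despite_failover"] += 1
                # Time spent on a job that failed even after the fail-over.
                summary["wasted_wall_seconds"] += job.wall_seconds
    return summary
```

Comparing `success_via_failover` with `failed_despite_failover` and `wasted_wall_seconds` gives a first view of whether the fail-over helps more than it costs.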
- 16:30 → 16:35
  AOB 5m
  - network reports (if any)
    
    - for T2(D)s against T1s
    - for T1s against T2Ds
  - Analysis Availability Reports
    
    Slides
  - Draft recommendation for T2 space reservation
    
    https://twiki.cern.ch/twiki/bin/view/AtlasComputing/StorageSetUp#Space_Reservation_for_Tier_2
