DOMA / TPC Meeting
Topic: WLCG DOMA TPC Meeting
Join Zoom Meeting
https://cern.zoom.us/j/99836057922?pwd=ZFhWN3NpYi9oZmwvM3pIRE9zdzFnZz09
Meeting ID: 998 3605 7922
Passcode: 733660
One tap mobile
+41315280988,,99836057922# Switzerland
+41432107042,,99836057922# Switzerland
Dial by your location
        +41 31 528 09 88 Switzerland
        +41 43 210 70 42 Switzerland
        +41 43 210 71 08 Switzerland
        +33 1 7037 2246 France
        +33 1 7037 9729 France
        +33 1 8699 5831 France
Find your local number: https://cern.zoom.us/u/aeB4ArMgmT
16:00 → 16:10  Network data challenges (10m)
Speakers: Dr Riccardo Di Maria (CERN), Rizart Dona (CERN)
- WLCG DOMA OpenStack project created; it will host the machines that run the tests etc.
- Repo to host testing code: https://gitlab.cern.ch/wlcg-doma/data-challenge-2021
- JIRA to track the activities
- WLCG Grafana Org, Data Challenges folder: https://monit-grafana.cern.ch/dashboards/f/qY7d-gjMz/data-challenges
	- We now have access via this org to the data sources that are described here
- Users that want edit access to this folder should contact Monit via a SNOW ticket
- Starting with FTS based data sources
 
 
16:10 → 16:20  Future uniform tape access (10m)
Speakers: Cedric Caffy (CERN), Mihai Patrascoiu (CERN)

16:20 → 16:30  SRM+HTTP tape access (10m)
Speakers: Mihai Patrascoiu (CERN), Petr Vokac (Czech Technical University (CZ))
Actions: dedicated meeting with TAPE providers
Rucio 1.25.4 comes with support for SRM+GridFTP together with the SRM+HTTP protocol
- only one of these protocols can be configured on an RSE
- FTS transfer protocol preference for SRM must be set to https;gsiftp;root
	- no FTS interface to use different SRM preference for individual transfers
- SRM+GridFTP used only for storage that doesn't support SRM+HTTP at all
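
The effect of the preference string above can be illustrated with a minimal sketch. It assumes the preference is applied as "first protocol in the list that the storage supports"; the function `pick_transfer_protocol` is a hypothetical helper for illustration, not an FTS or Rucio API.

```python
from typing import Iterable, Optional

# Preference order quoted in the minutes for SRM transfers in FTS.
SRM_PROTOCOL_PREFERENCE = "https;gsiftp;root"


def pick_transfer_protocol(
    supported: Iterable[str],
    preference: str = SRM_PROTOCOL_PREFERENCE,
) -> Optional[str]:
    """Return the first preferred protocol the storage supports.

    Hypothetical helper: it only models "first match wins" over a
    semicolon-separated preference string.
    """
    supported_set = {p.strip().lower() for p in supported}
    for proto in preference.split(";"):
        if proto in supported_set:
            return proto
    return None  # no common protocol


# Storage offering both protocols ends up on https (SRM+HTTP).
print(pick_transfer_protocol(["gsiftp", "https"]))
# Storage without SRM+HTTP falls back to gsiftp (SRM+GridFTP).
print(pick_transfer_protocol(["gsiftp"]))
```

With this ordering, SRM+GridFTP is only reached when https is not offered at all, which matches the intent stated in the list above.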
 
- this is sufficient to cover ATLAS use-cases - transfers tape <-> disk
	- motivation - Data Challenges with as little as possible GridFTP (RAL Castor system)
- CMS plans with tape transfers(?)
 
New / additional tape bringonline test
- upload ~10TB dataset with 1GB files to each tape endpoint
- ask dCache/StoRM administrators to clean these files from disk buffer
	- unfortunately storage administrators can't easily remove individual files and cleanup of whole buffer would certainly affect production
- use existing old production data(set) with high probability to be on the tape(?)
		- it would be necessary to use production Rucio instance
- require similar config overwrites (patches for Rucio) used e.g. by ATLAS Functional Tests WebDAV(?)
- we would have to be more careful, but anyway at some point we have to move SRM+HTTP to production
 
 
- add Rucio rule to trigger transfer of NEARLINE file
- don't reuse files, because after test transfer they'll be ONLINE
- once we run out of NEARLINE source files ask again for disk buffer cleanup
	- with current test infrastructure all files will be used ~ in 30 days
- run fewer tests or ask for bigger space to reduce cleanup requests(?)
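
The "use each NEARLINE file only once" bookkeeping can be sketched as below. This is an illustration only: the class `NearlinePool`, the function `days_until_cleanup` and the file names are hypothetical, and decimal units (1TB = 1000GB) are assumed. Under that assumption a 10TB dataset of 1GB files gives 10000 files per endpoint, so the ~30 days quoted above would correspond to roughly 333 files used per day.

```python
from collections import deque
from typing import Deque, Iterable, Optional

TB = 10**12  # decimal units assumed
GB = 10**9


class NearlinePool:
    """Hands out each NEARLINE source file at most once (hypothetical)."""

    def __init__(self, files: Iterable[str]) -> None:
        self._unused: Deque[str] = deque(files)

    def next_source(self) -> Optional[str]:
        """Return an unused file, or None when a buffer cleanup is needed."""
        return self._unused.popleft() if self._unused else None

    def remaining(self) -> int:
        return len(self._unused)


def days_until_cleanup(n_files: int, files_per_day: float) -> float:
    """Days before the pool of never-recalled files is exhausted."""
    return n_files / files_per_day


n_files = (10 * TB) // (1 * GB)  # files per tape endpoint
pool = NearlinePool(f"file_{i:05d}" for i in range(n_files))
first = pool.next_source()  # this file is ONLINE after the test transfer
print(n_files, pool.remaining())
```

Halving the test rate or doubling the dataset doubles the time between cleanup requests, which is the trade-off raised in the last bullet.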
 
- what would be good test for transfers with SRM+HTTP TAPE destination(?)
	- is transfer to normal disk instead of disk buffer sufficient(?)
- how to verify that file really reached tape storage(?)
 
Keep current Functional Tests TAPE(?)
- not very useful to test TAPE
- just SRM+HTTP transfer from tape disk buffer
- concern that files are not really deleted from tapes
	- test files will be physically stored on tapes for years
- currently 200GB/day
 
- modify to SRM+HTTP tests from disks?
	- e.g. "read timeout" issue is visible also for disks
 
 
16:30 → 16:40  XRootD 5.1.x news (10m)
Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))

16:40 → 16:50  Experiments production (10m)
Speakers: Alessandra Forti (University of Manchester (GB)), Diego Davila Foyo (Univ. of California San Diego (US)), Petr Vokac (Czech Technical University (CZ))
ATLAS
- StoRM
	- sites experience stability issues after moving everything to WebDAV (TPC + job stage-out). Need to tune the configurations
		- Improved documentation for WebDAV doors tuning and monitoring (see StoRM section)
- dCache
	- SRR status - mail discussion WLCG + dCache devs
		- hopefully update in next dCache release
	- How to fix files uploaded without right WriteToken GGUS:151836?
- Still missing
	- (US) XRootD sites (XRootD 5.2rc1)
	- RAL Echo update DOMATPC-2 still not very optimistic
		- critical for September Data Challenges
	- (US) HPC & gridftp DTN
		- we need somebody actively working on this topic
			- work in progress on Rucio + Globus Online integration
			- waiting for XRootD 5.2 RSE installation at BNL
				- multihop from FTS to Globus world via this RSE
		- avoid dependency on legacy gridftp by the end of 2021?
	- T3 sites - deadline end of 2021
	- tapes - September 2021
		- minus RAL CASTOR (autumn 2021 start of migration to CTA)
	- 29/92 sites
 
Available Rucio DOMA tests
- Full transfer matrix tested
- Are all these tests still relevant?
	- Experiments rely on their own monitoring
	- Test parameters modification(?) suggestions(?)
- Tests
	- Functional Tests WebDAV & XRootD, 1GB every hour (28 & 16 sites)
	- Functional Tests OIDC, 1GB every hour (7 sites)
	- Functional Tests TAPE, 1GB every hour (10 endpoints)
	- Stress Tests WebDAV & XRootD, 250x 4GB transfers every 4 hours (6 & 4 sites)
		- 0.5PB/week with 1.5Gb/s average throughput in/out per participating site
	- Stress Tests WebDAV NFiles, 10000x 1KB transfers once a day (7 sites)
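
To put the stress test figures on a common scale, the unit conversions can be sketched as below. This only converts units: the minutes do not say how batches are distributed over site pairs, so the sketch does not attempt to reproduce the 0.5PB/week total. Function names are hypothetical and decimal units (1TB = 1000GB, 1Gb = 10^9 bits) are assumed.

```python
HOURS_PER_WEEK = 7 * 24
SECONDS_PER_WEEK = HOURS_PER_WEEK * 3600


def batch_volume_tb_per_week(n_transfers: int, file_gb: float,
                             period_hours: float) -> float:
    """Weekly volume (TB) of one batch repeated every `period_hours`."""
    cycles_per_week = HOURS_PER_WEEK / period_hours
    return n_transfers * file_gb * cycles_per_week / 1000


def rate_to_tb_per_week(gbit_per_s: float) -> float:
    """Weekly volume (TB) moved at a sustained rate given in Gb/s."""
    gbyte_per_s = gbit_per_s / 8
    return gbyte_per_s * SECONDS_PER_WEEK / 1000


# One batch of 250 x 4GB every 4 hours.
print(batch_volume_tb_per_week(250, 4, 4))   # 42.0
# 1.5Gb/s sustained for a week, in TB.
print(round(rate_to_tb_per_week(1.5), 1))    # 113.4
```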
 
CMS
This week we enabled 'davs' in Prod for T2_US (except Vanderbilt). We found some issues at:
- Purdue and Florida: permissions on specific paths (fixed)
- DESY (put in Prod a long time ago): wrong port used (fixed)
Next week I'm planning to enable 'davs' in Prod for Vanderbilt and the T1s.
Current status:
- total sites: 55
- with davs: 50 (90.91%)
- passes manual tests: 44 (80.00%)
- in Prod: 7 (12.73%)
- StoRM
	

16:50 → 16:55  StoRM update (5m)
Speaker: Andrea Ceccanti (Universita e INFN, Bologna (IT))
StoRM 1.11.21
- Released at the end of this week: https://issues.infn.it/jira/projects/STOR/versions/16713
- Updated storage usage report scripts (for sites that do not use quotas or GPFS and want to avoid dus): https://github.com/italiangrid/storm-utils/tree/main/space-reporting
- These are also packaged as an RPM: https://repo.cloud.cnaf.infn.it/repository/storm-rpm-beta/centos7/storm-utils-1.0.0-0.el7.x86_64.rpm
- StoRM WebDAV configuration documentation improved: 

16:55 → 17:00  Token Authorization testbed (5m)
Speaker: Andrea Ceccanti (Universita e INFN, Bologna (IT))
- Since GitHub Actions disables scheduled runs if there's no activity on the repo, I've also deployed a run of the test suite on our Jenkins: https://ci.cloud.cnaf.infn.it/view/wlcg/job/wlcg-jwt-compliance-tests/job/master/
- Reports are accessible to anybody.
- The situation on compliance hasn't improved: https://ci.cloud.cnaf.infn.it/view/wlcg/job/wlcg-jwt-compliance-tests/job/master/18/artifact/reports/reports/20210505_112038/joint-report.html

17:00 → 17:05  AOB (5m)
 