DOMA / TPC Meeting
- Attending: Brian, Wei, Andy, Paul, Fabrizio, Al, Ian, Lucia, Dmitry, Petr, Oliver
- XRootD Protocol updates (Wei):
- X509 TPC works with most storage systems in the current release.
- The exception is EOS. It is currently unclear how they plan to implement this (it may require them to deploy dedicated servers).
- Found a bug in the VOMS plugin.
- Fabrizio: The bug was revealed by a crash in the DPM testbeds. Seems to be a conflict with XrdHttp? Wei: the new VOMS plugin uses a new set of OpenSSL APIs; the crash goes away when XrdHttp is loaded or when the older VOMS plugin is used.
- Production VOMS plugin is 0.3; new one (broken) is 0.6.
- Andy and Fabrizio will look into this.
- Brian: Can we file a bug report since we have a stack trace?
- Katy: Have rolled out a new test machine on the cluster; things are working well so far. Would like to change the tests so that more sites test against it, to gain confidence before rolling out everywhere.
- ACTION ITEM (Brian): Update Rucio configs for new gateway hostname.
- Al:
- All things are ready for releasing Xrootd-based delegation for TPC in dCache 5.2. 5.2 should be out in a "matter of weeks" (1 July?).
- Smoke-tests for XRootD (written in Python) are getting into shape. Will be out next week, but will come back to this subsequently.
- Fabrizio: There's a problem in the heatmap currently. Certificate used by Rucio had an expired VOMS extension?
- Brian: Way back when, OSG didn't require VOMS verification (extension was "verified" separately by downloading a list of groups directly from VOMS Admin). Maybe that's happening here?
- Paul: Still looking at this, but it's possible the proxy itself is valid and the VOMS extension is expired. The proxy could then get a new VOMS extension (valid), resulting in a chain with one VOMS extension that is invalid and one that is valid.
- There are several theories as to why this is, but it will need to be debugged via email.
- Wei: Ale and Mario are putting together a stress test inside ATLAS. More to report later. Tim: Due to how it integrates with AGIS / missing functionality in Rucio, this will be a dedicated stress test for ATLAS.
- Petr: GFAL stopped working recently, but xrdcp directly appears to be fine. When run with debugging, GFAL appears to attempt to delegate. Wei has some ideas - GFAL is perhaps setting the env var too late?
- Tim requests that we post the instructions for doing xrootd TPC with GFAL.
- Petr: GFAL appears to work with dCache (built by hand with recent-ish trunk) but not with DPM. Al: maybe this is because dCache is doing some sort of fallback?
- HTTP updates:
- Minor fixes to the smoke-tests. Fix-up for RHEL7 compat with curl. Brian needs to resolve merge conflicts for the scitokens PR now.
- dCache SciTokens support appears to work fine with the (unmerged) PR.
- ACTION ITEM: Get status update from EOS.
- Caltech joined in the smoke tests but not passing yet (macaroon-related; unclear what the issue is).
- Purdue, Wisconsin, & UCSD are waiting in the wings.
- Nebraska has HTTP transfers working in PhEDEx, but there aren't many sources in CMS for HTTP (few sites touch PhEDEx anymore).
- Paul: DESY is working to roll out HTTP TPC support for production instances. 3 July planned for ATLAS instance.
- ACTION ITEM (Brian): Follow up with DESY CMS contact to export HTTP in PhEDEx.
- ACTION ITEM (Brian): Get info from Mario on what Rucio is lacking to do test transfers. For protocol-specific links (i.e., only test HTTP on a specified link, not for entire destination), this could be done by the next major release (October?).
- Paul: KIT joined the test matrix.
- Paul: TRIUMF DynaFed endpoint is working, but others are still failing tests.
- ACTION ITEM (Paul): Ping mailing list?
- Think that RAL should be working?
- Will be a topic in the July pre-GDB!
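
Paul's theory from the heatmap discussion (a proxy that is itself valid but carries one expired and one valid VOMS extension in its chain) can be illustrated with a minimal Python sketch. This is only a model under stated assumptions: `ChainLink`, `classify_chain`, the subjects, and the timestamps are all hypothetical, and no real X.509 or VOMS parsing is performed. It is not how Rucio, GFAL, or any storage system actually validates a proxy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class ChainLink:
    """One certificate in a proxy chain (hypothetical model, not a real parser)."""
    subject: str
    not_after: datetime                          # expiry of the certificate itself
    voms_not_after: Optional[datetime] = None    # expiry of its VOMS extension, if any


def classify_chain(chain: List[ChainLink], now: datetime) -> Dict[str, List[str]]:
    """Sort the links of a chain into expired certs, expired VOMS, and valid VOMS."""
    report: Dict[str, List[str]] = {
        "expired_certs": [],
        "expired_voms": [],
        "valid_voms": [],
    }
    for link in chain:
        if link.not_after <= now:
            report["expired_certs"].append(link.subject)
        if link.voms_not_after is not None:
            key = "valid_voms" if link.voms_not_after > now else "expired_voms"
            report[key].append(link.subject)
    return report


# Arbitrary reference time, chosen only to make the example deterministic.
now = datetime(2000, 1, 1, 12, 0, tzinfo=timezone.utc)

chain = [
    # First proxy: certificate still valid, VOMS extension already expired.
    ChainLink("proxy-1", now + timedelta(hours=6), now - timedelta(hours=1)),
    # Second proxy derived from the first: picked up a fresh VOMS extension.
    ChainLink("proxy-2", now + timedelta(hours=6), now + timedelta(hours=11)),
]

report = classify_chain(chain, now)
print(report)
```

In this model no certificate is expired, yet the chain contains both an expired and a valid VOMS extension. A verifier that inspects every extension and one that inspects only the most recent one could therefore disagree about the same proxy, which is one possible reading of the theory above.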