GridPP Technical Meeting

Europe/London
Virtual Only

Andrew McNab (University of Manchester), David Colling (Imperial College Sci., Tech. & Med. (GB))
Description

Fortnightly meeting for technical topics looking further ahead than the weekly ops meetings on Tuesdays. There is also a dedicated storage group meeting on Wednesdays. Each topic can go beyond the nominal 5-minute slot whenever necessary.

GridPP Technical Meeting

 

Future storage discussion

 

Sam:

Solutions are experiment-specific. Buying no storage at all is not a good idea, so buy at least some storage; how much is uncertain. Sites are moving away from hardware RAID to things like ZFS, as RAID rebuild times are too long; this is also cheaper. It is not clear whether this is a cache or traditional storage; that is rather experiment-specific. CMS are happy to do things remotely but ATLAS are the opposite (trying to be politically correct). The smaller experiments also need storage. We are looking at federated storage, which means that you need storage to federate. ATLAS would be unhappy with a large number of storageless sites. LHCb really want CPU at sites.
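As an illustration of the software-RAID direction mentioned above, here is a minimal sketch of building a double-parity ZFS pool, assuming the ZFS utilities are installed; the pool name and device list are hypothetical and would need to match the real hardware.

    import subprocess

    # Hypothetical device list for one storage node; match to real hardware.
    DEVICES = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde", "/dev/sdf", "/dev/sdg"]

    def create_raidz2_pool(pool="gridstore", devices=DEVICES):
        """Create a double-parity (raidz2) pool, roughly RAID-6 equivalent.

        ZFS resilvers only live data rather than whole disks, which is
        where the shorter rebuild times come from.
        """
        subprocess.run(["zpool", "create", pool, "raidz2", *devices], check=True)
        # lz4 compression is cheap and usually a net win for bulk data.
        subprocess.run(["zfs", "set", "compression=lz4", pool], check=True)

    if __name__ == "__main__":
        create_raidz2_pool()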

 

Alessandra:

The LHCb Run 3 computing model essentially starts now. She is looking into lightweight sites in ATLAS and is putting together a team to look at this; the team will test things. Duncan commented that ATLAS wanted fewer sites. Skippy noted that really they wanted fewer endpoints, so federated storage might be the solution.

 

Advice to ATLAS sites: sites should try to keep essentially the same volume of storage but on newer hardware. If you are below 400 TB then don't bother; otherwise stay the same size but simplify (e.g. ZFS) on newer hardware. The smaller sites still provide ~3 PB of storage. Birmingham and Sussex are causing lots of ATLAS problems because they don't have the manpower. Alessandra is looking towards a world where ATLAS have squid caches for ATLAS data.
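Since squid caches come up here, a minimal sketch of checking that a site squid answers requests and reports cache hits, assuming the Python requests library is available; the squid host and upstream URL are invented placeholders:

    import requests

    # Hypothetical site squid and upstream server; substitute real values.
    SQUID = "http://squid.example.ac.uk:3128"
    URL = "http://frontier.example.cern.ch:8000/FrontierProd/Frontier"

    def check_squid(url=URL, proxy=SQUID):
        """Fetch a URL through the site squid and report its cache status."""
        r = requests.get(url, proxies={"http": proxy}, timeout=10)
        # Squid adds an X-Cache header saying HIT or MISS when caching.
        print(r.status_code, r.headers.get("X-Cache", "no X-Cache header"))

    if __name__ == "__main__":
        check_squid()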

 

General discussion

Data popularity is playing an important role. It is hard to tell how a pure cache would perform.
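To illustrate why a pure cache is hard to assess: its hit rate depends strongly on how skewed the data popularity is. A toy sketch (all parameters invented) of an LRU cache fed by Zipf-distributed requests:

    import random
    from collections import OrderedDict
    from itertools import accumulate

    def lru_hit_rate(n_files, cache_size, n_accesses, zipf_s, seed=42):
        """Simulate an LRU cache fed by Zipf-distributed file requests."""
        rng = random.Random(seed)
        # Zipf-like popularity: file of rank r is requested with weight 1/r**s.
        cum = list(accumulate(1.0 / (r ** zipf_s) for r in range(1, n_files + 1)))
        cache = OrderedDict()
        hits = 0
        for _ in range(n_accesses):
            f = rng.choices(range(n_files), cum_weights=cum)[0]
            if f in cache:
                hits += 1
                cache.move_to_end(f)  # mark as most recently used
            else:
                cache[f] = True
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict least recently used
        return hits / n_accesses

    if __name__ == "__main__":
        # Cache holding 10% of the files; the hit rate rises sharply with skew.
        for s in (0.5, 1.0, 1.5):
            print(f"zipf s={s}: hit rate {lru_hit_rate(10000, 1000, 100000, s):.2f}")

The hit rate climbs steeply as the popularity skew increases, which is why popularity measurements matter before committing to a cache-only model.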

 

Chris B.:

Diskless T3 discussion: CMS can operate a diskless T3, and PhEDEx can simply have a different endpoint from the site. This is being tested with Oxford and the RAL T2. The one thing that is missing is the ability to manage the job mix; there was a plan for this, but the original plan probably won't work. There was a discussion as to why the site needed to be a registered site at all, and we will take baby steps towards making all data available. Currently only jobs that don't care where the data is, or that don't need data, can run anywhere.
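A minimal sketch of the remote-read model a diskless T3 relies on, assuming the XRootD Python bindings are installed; the redirector hostname and file path are illustrative placeholders, not real data:

    from XRootD import client
    from XRootD.client.flags import OpenFlags

    # Illustrative federation redirector and path; not a real file.
    URL = "root://xrootd.example.cern.ch//store/data/example/file.root"

    def read_remote_bytes(url=URL, nbytes=1024):
        """Open a file via the federation redirector and read its first bytes."""
        f = client.File()
        status, _ = f.open(url, OpenFlags.READ)
        if not status.ok:
            raise IOError(status.message)
        status, data = f.read(offset=0, size=nbytes)
        f.close()
        if not status.ok:
            raise IOError(status.message)
        return data

    if __name__ == "__main__":
        print(len(read_remote_bytes()), "bytes read remotely")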

 

Luke:

At Bristol most jobs run on data that is at the site, but users don't care about data location, and it is possible to bring the firewall to its knees.

Several sites have firewall bypasses. 

Duncan:

QMUL run CMS analysis jobs and they notice some high network usage; there are a number of observations here. XRootD proxies can play an important role.

 

Brian:

We might want to have data-intensive and non-data-intensive as broad categories of jobs.
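A purely illustrative sketch of that categorisation; the job fields and threshold are invented:

    # Invented job metadata; real workload systems carry richer descriptions.
    def classify(job, threshold_gb=10):
        """Split jobs into broad data-intensive / non-data-intensive categories."""
        if job.get("input_gb", 0) >= threshold_gb:
            return "data-intensive"      # steer towards sites with local storage
        return "non-data-intensive"      # can run at diskless/storageless sites

    for job in ({"name": "analysis", "input_gb": 250},
                {"name": "mc-generation", "input_gb": 0}):
        print(job["name"], "->", classify(job))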

 

Tentative Conclusions

- It is probably possible to run a diskless T3, and this is being refined.

- ATLAS sites should renew storage above 400 TB, keeping it as simple as possible, e.g. using ZFS. Below 400 TB, storage can be replaced quite cheaply.

- Storage is needed for other communities, and this is most easily done at sites that are active in those communities. Not all communities can read data remotely.

- We have to remember that effort is being reduced, so we have to make savings here.

 

Simon G.:

 

Would very much welcome suggestions on how to make storage easier to manage, including remote management. Skippy commented that this is coming in DPM release 1.9.x.

 

Other items on the agenda:

Storage news: more detail on the DPM workshop in the ops and storage meetings. People should move to the new DPM release when it comes out in December; it will support moving to SRM-less operation etc. It will also include dpm-tester.

 

    • 11:00 – 11:20
      Federated Storage (or similar) 20m
      Speaker: Samuel Cadellin Skipsey
    • 11:20 – 11:25
      Tier-2 Evolution: Jobs in VMs 5m
      • Vac, Vac-in-a-Box, Vcycle 5m
        Speaker: Andrew McNab (University of Manchester)
      • ATLAS VMs & news 5m
        Speaker: Peter Love (Lancaster University (GB))
      • CMS VMs & news 5m
        Speaker: Andrew David Lahiff (STFC - Rutherford Appleton Lab. (GB))
      • LHCb VMs & news 5m
        Speaker: Andrew McNab (University of Manchester (GB))
      • GridPP DIRAC VMs and other experiments 5m
      • Site updates relating to Vac, Cloud, and VMs 5m

        Anything sites want to report this week

    • 11:25 – 11:30
      Tier-2 Evolution: Storage 5m
      Speaker: Samuel Cadellin Skipsey
    • 11:30 – 11:35
      Other updates from the Storage Group 5m
    • 11:35 – 11:40
      Networking including IPv6 5m
    • 11:40 – 11:45
      Security 5m
      Speaker: Ian Neilson (STFC RAL (GB))
    • 11:45 – 11:50
      GridPP DIRAC service 5m
      Speaker: Daniela Bauer (Imperial College Sci., Tech. & Med. (GB))
    • 11:50 – 11:55
      HEP Software Foundation 5m
      Speaker: Andrew McNab (University of Manchester)
    • 11:55 – 12:00
      AoB 5m