LHCOPN-LHCONE meeting #46 - Virtual meeting

Europe/Zurich
Edoardo Martelli (CERN)
Description

The meeting will be virtual.

Starting time is 15:00 CET - Geneva time


Videoconference
Zoom Meeting ID
97610360540
Host
Edoardo Martelli

===
LHCOPN update

PIC: will upgrade before the summer.
IN2P3: will upgrade the backup link to 100G.
CNAF: working on upgrading the Milan-Bologna link to allow load balancing.
FNAL: in the process of migrating to ESnet6; will upgrade to native 400G.
NDGF: 2x 20G, no plan to upgrade.
SARA: testing a 400G link between Nikhef and SARA.

Summary of news from USCMS-Tier1:
- In total 3x 100G circuits for offsite connectivity (same as before)
- 2x 100G are primarily for LHC; 1x 100G is shared for general IP traffic, with some science and LHC traffic as well
- Transition to ESnet6: ChiExpress has been migrated to the new optical systems (Ciena to Infinera)
- USCMS-T1 is undergoing upgrades to native 400G technology

IN2P3
  The 10G LHCOPN backup link has been shut down.
  Our LHCOPN backup is now on LHCONE.
  We are working with RENATER to have a 100G link for LHCOPN backup.

===
RAL update

New datacentre leaf/spine design, based on Mellanox switches running Cumulus Linux (see the sketch below).
The Tier1 will connect to LHCONE in May/June.
The new network will be ready by summer 2021.
A second 100G link may be added in the middle of Run 3.
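
Illustrative sketch (not RAL's actual configuration): the dimensioning check usually done for a leaf/spine fabric, with assumed port counts and speeds.

  # Illustrative only: dimensioning a generic leaf/spine fabric.
  # The port counts and speeds below are assumptions, not RAL's design.

  def oversubscription(server_ports: int, server_speed_g: int,
                       uplinks: int, uplink_speed_g: int) -> float:
      """Ratio of server-facing to spine-facing capacity on one leaf."""
      south = server_ports * server_speed_g
      north = uplinks * uplink_speed_g
      return south / north

  # Example: 48x 25G server ports and 6x 100G uplinks per leaf -> 2:1.
  print(oversubscription(48, 25, 6, 100))  # 2.0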

===
LHCONE update

No new NRENs, no major changes.
CERNlight traffic increased, due to the connection to SINET (Japan).
Decrease of 20% on the GEANT backbone.
Similar picture on NORDUnet, ESnet and Internet2.
Slight decrease of traffic between EU and US.
Increase of traffic between Asia and the US, probably due to Belle II.
Traffic constant between Asia and EU.
No change between EU and South America; an increase is expected when the new BELLA transatlantic cable is ready.
Overall a 20% reduction.
Smaller decrease on the general Internet (6-10%), even though campuses are empty.

Inder: ESnet has also seen a decrease.

===
Transatlantic network capacity

Less traffic due to the LHC shutdown and the pandemic. When is the right time for an upgrade?
Existing capacity is enough for the start of Run 3; upgrades will happen during Run 3.
For HL-LHC, ATLAS and CMS have requested ~1.4Tbps.
In addition, redundancy is needed because undersea cables can take several weeks to be repaired (see the sketch at the end of this section).
Historical data shows that ~1.5Tbps will be needed by the time of HL-LHC.
Is that capacity feasible? From the cost point of view, yes.
Looking to share spectrum with other OTT providers.
ESnet will also increase its capacity in Europe to match the transatlantic capacity.
Full support from DOE to meet the needs of the LHC experiments.
The main question to the group is: what is the most appropriate investment?
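
Illustration of the redundancy point above (assumed figures, not a procurement plan): with repair times of several weeks, capacity has to be dimensioned so that the requirement is still met with any single cable out of service.

  # Sketch: number of equal-capacity transatlantic links needed so that the
  # requirement is still met with one cable under repair. The ~1.4Tbps figure
  # is from the meeting; the 400G-per-link granularity is an assumption.
  import math

  def links_needed(required_tbps: float, link_tbps: float) -> int:
      # (n - 1) * link_tbps >= required_tbps must hold after one failure
      return math.ceil(required_tbps / link_tbps) + 1

  n = links_needed(required_tbps=1.4, link_tbps=0.4)
  print(n, n * 0.4)  # 5 links, i.e. 2.0 Tbps installed for 1.4 Tbps usable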

===
Belle II update

So far 2PB of data have been collected, stored at KEK and BNL.
1.2PB are expected to be collected in 2021.
A long shutdown will start in July 2022 for the machine upgrade.
30% of the sites are in LHCONE, but they generate 80% of the Belle II traffic.
All data flows are managed by Rucio.
New data distribution scheme: raw data is no longer stored only at BNL, but shared among all the raw data centres.
Because of this, a new round of data challenges, also to simulate the expected increase in data rates (45TB/day; see the sketch below).
Pilot tests are starting now with KIT and CNAF.
Belle II will contribute to the packet marking activity.
The adoption of Rucio has brought many improvements.
The Tier0 is deploying IPv6, but no prefix is advertised yet.
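
For reference, the 45TB/day figure above converted to a sustained line rate (simple arithmetic, decimal units, no headroom assumed):

  # Convert a daily data volume into a sustained average rate (decimal units).
  def tb_per_day_to_gbps(tb_per_day: float) -> float:
      return tb_per_day * 1e12 * 8 / 86400 / 1e9

  print(round(tb_per_day_to_gbps(45), 1))  # ~4.2 Gbps sustained, before headroom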

===
CRIC database

DUNE is also using CRIC.
Shawn: once the data is migrated, the twiki becomes read-only and we will inform sites how to update CRIC.
Dale: use the looking glass to update the prefixes.
Peter: CRIC is clunky; clear instructions should be given.

===
Monitoring update

The latest perfSONAR version is 4.3.4; 4.3 was a major upgrade because of the move to Python 3.
RNP (Brazil) has joined the perfSONAR collaboration.
207 production endpoints at Tier1s/Tier2s.
Improvements in the 100G mesh.
Using new Kibana dashboards; all resources are organized in a new toolkitinfo web page.
Additional monitoring will be needed for the upcoming WLCG data challenges.
Eli: it would be good to get information on packet loss; it will be useful for the WLCG data challenges.
Dale: how is the 100G mesh working? Shawn: we need to make the mesh all green, so that we can use it as a baseline to understand the available bandwidth.
Tim: what are you trying to measure with the 100G probes? Shawn: the main use is to understand if there is any issue in the network. The 100G tests can estimate what the network can actually provide.
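
A minimal sketch of the kind of check discussed above: flag mesh links whose packet loss exceeds a threshold, so the mesh can be brought to "all green" and then used as a baseline. The input format and threshold are hypothetical; real deployments would pull these values from the perfSONAR measurement archive.

  # Hypothetical input: (source, destination, packet-loss fraction) per mesh link.
  LOSS_THRESHOLD = 0.001  # 0.1% packet loss

  measurements = [
      ("ps-t1-a.example.org", "ps-t2-b.example.org", 0.0000),  # made-up values
      ("ps-t1-a.example.org", "ps-t2-c.example.org", 0.0042),
  ]

  for src, dst, loss in measurements:
      status = "OK" if loss <= LOSS_THRESHOLD else "INVESTIGATE"
      print(f"{src} -> {dst}: loss={loss:.4%} {status}")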


==
Juno update

JUNO is a neutrino detector in China.
It detects neutrinos generated by two nearby nuclear power plants.
Status: civil work has started, the caverns are almost ready.
Data taking, foreseen for late 2021, has been postponed.
78 institutes participate in JUNO.
It is estimated to produce 2PB/year.
Datacentres: IHEP (Tier0), JINR and MSU (RU), IN2P3, CNAF.
Data will be moved first to CNAF, then from there to the other data centres.
Rucio will be adopted.

===
New LHCONE AUP

https://docs.google.com/document/d/1BUjk51LZ4ivYzvAGmxEL2obxihRVTfx6JVykT0YHVkU/edit#
Two main meetings to discuss the changes
Change the role of the WLCG management board
New text will be circulated


===
WLCG network challenges

The process started in June 2020 with ESnet collecting requirements from the LHC experiments.
The timing was good, because the HL-LHC computing review was also giving recommendations on improving networking.
The computing model of the experiments at HL-LHC will be different from what they have today.
ATLAS and CMS will each produce 350PB/year, to be exported in real time to the Tier1s.
This gives 4.8Tbps from CERN to the Tier1s, of which 1.25Tbps over the Atlantic (see the sketch below).
The flexible scenario would allow the experiments to exercise the system better and try to improve the efficient use of the resources.
In summary, the big Tier1s are expected to connect to CERN and to their Tier2s at 1Tbps.
Based on these targets, a plan for data challenges is proposed.
The data challenges will use the production infrastructure; the challenges will co-exist with production activities.
The first challenges will start at the end of 2021.
The data challenges are being discussed in the DOMA-TPC sub-working group.
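
Back-of-the-envelope helper for the figures above: 350PB/year spread evenly over a year is only ~90Gbps per experiment; the 4.8Tbps CERN-to-Tier1 target is much higher because the export is concentrated in the data-taking periods and includes provisioning factors (catch-up after failures, headroom), as detailed in the HL-LHC network requirements document linked in the chat.

  # Sketch: convert an annual export volume into a flat average line rate.
  # 350 PB/year/experiment is from the talk; the note above explains why the
  # agreed CERN -> Tier1 target (4.8 Tbps) is well above this flat average.
  SECONDS_PER_YEAR = 365 * 24 * 3600

  def average_gbps(pb_per_year: float) -> float:
      """Average rate in Gbps if the volume were spread evenly over a year."""
      return pb_per_year * 1e15 * 8 / SECONDS_PER_YEAR / 1e9

  print(round(average_gbps(350)))  # ~89 Gbps flat average, per experiment
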
Rob: at a certain point we should express the requirements for Tier2s more clearly
Simone: Tier2s will be part of the challenges from the beginning. The Tier1s will have to demonstrate not only that they can take the data from CERN at the expected rate, but also that they can stream it to all their Tier2s at an equal rate.
Rob: clear numbers should be stated for Tier2s to allow proper planning
Shawn: fully agree on the monitoring. We are aiming to set up a system that allows us to understand how the infrastructure behaves.
Eli: ESnet is interested in participating in and supporting the challenges.
Bruno: the challenges in 2027 target only half of the final required capacity. I'd like to see bigger numbers for 2027.
Simone: the rationale was that if we can fill 50% of the capacity, we are doing as well as today. But I agree that the targets for 2027 can be increased, maybe by this forum.
Harvey: the biggest challenges are not the mere bandwidth utilization, but the software needed to reach higher numbers.
Harvey: we should also consider how to implement the system so that it doesn't interfere with or damage the other sciences that need to use the network.
Simone: I hope the other communities will also benefit from seeing how we implement and use the system.
Dale: be careful in defining those numbers, because if they are understood as needs, people will start building and investing for them.
Magnus: we need to look at the protocols to be able to use the network more efficiently, without being scared of short bursts.

===
Datacentre network architectures

Looking for interest in datacentre network design.
Shawn: it's important to share this information; I'm also going through this decision.
Mark (BNL): also interested in participating.
Lars: should this be broadened to other communities? Maybe GEANT can organize something?
Shawn: WLCG is already a big community; it may become too big a meeting. Better to limit it to the experiments.

Stefano and Edoardo to organize an event on this subject

===
NOTED update

NOTED analyses the transfers generated by FTS, because FTS is the main tool used by WLCG.
Automation was tested with transfers to PIC and TRIUMF: successful.
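
Rough illustration of the idea (hypothetical data format and threshold, not the actual NOTED implementation): aggregate the pending FTS transfers per destination and flag destinations where a network action, e.g. a dynamic circuit, might be worthwhile.

  # Hypothetical pending-transfer list; the real tool consumes FTS monitoring data.
  TRIGGER_BYTES = 500e12  # 500 TB queued towards one destination

  pending = [
      {"dest": "PIC", "bytes_remaining": 320e12},
      {"dest": "PIC", "bytes_remaining": 250e12},
      {"dest": "TRIUMF", "bytes_remaining": 40e12},
  ]

  totals = {}
  for t in pending:
      totals[t["dest"]] = totals.get(t["dest"], 0.0) + t["bytes_remaining"]

  for site, total in totals.items():
      if total >= TRIGGER_BYTES:
          print(f"{site}: {total / 1e12:.0f} TB queued -> candidate for network action")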

===
Research Networking Technical WG update

Contributions not only from WLCG and NRENs, but also from outside, e.g. RFC editors and Linux kernel developers.
Packet marking: many options were considered (multiple addresses, headers, MPLS); the IPv6 flow label seems the most promising.
A packet marking scheme based on the IPv6 flow label has been proposed (see the sketch below).
Standard iperf3 is used for testing.
Collaboration with Fernando Gont, who has developed many IPv6 tools.
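
Minimal sketch of the general idea of packing identifiers into the 20-bit IPv6 flow label; the field split used here (8-bit experiment id, 8-bit activity id, 4 spare bits) is an arbitrary placeholder, not the layout proposed by the working group.

  FLOW_LABEL_BITS = 20  # size of the IPv6 flow label field

  def pack(experiment_id: int, activity_id: int) -> int:
      """Pack ids into a flow-label value (placeholder bit layout)."""
      assert 0 <= experiment_id < 256 and 0 <= activity_id < 256
      label = (experiment_id << 12) | (activity_id << 4)
      assert label < (1 << FLOW_LABEL_BITS)
      return label

  def unpack(label: int) -> tuple:
      return (label >> 12) & 0xFF, (label >> 4) & 0xFF

  label = pack(experiment_id=3, activity_id=7)
  print(hex(label), unpack(label))  # 0x3070 (3, 7)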

===
Autogole update

The AutoGOLE infrastructure is growing; Guam and Hawaii were recently added, plus two more sites.
The SURFnet AutoGOLE connects Moxy (CA), ESnet in Amsterdam, GEANT in Amsterdam, and CERN.
A demonstration was done at SC20.
AutoGOLE could be extended to provide more advanced services beyond layer 2, such as multiple VRFs, circuits for DTN transfers, and perfSONAR monitoring.
Bill: what about AAA? Gerben: authentication is in place, based on certificates or tokens. Authorization is done in the MEICAN dashboard.

=====
GNA-G DIS Working Group update

Working on the challenges presented by the high requirements of HL-LHC.
It is important to keep in mind that there are other science projects that can compete with the LHC for network resources.
Many projects and testbeds:
- AutoGOLE and SENSE
- RARE: working on a router process and P4
- caches for WLCG

====
ROBIN project update

Comparing Rucio/FTS vs Rucio/SENSE/DTNs.
DTNs at CERN and FNAL were used.
ROBIN outperforms FTS by between 2 and 30 times.
Justas: why such big differences? Wenji: the integration between FTS and GridFTP is not very efficient, especially with small files. The ROBIN transfer engine is highly optimized both for the file transfers and for the integration with Rucio. FTS also suffered because of the long RTT between CERN and FNAL (see the sketch below).
Justas: it would be nice to compare all the protocols.
Petr: FTS is moving away from GridFTP, so it should be compared with HTTP-TPC. ROBIN can be very useful for transferring data to HPC centres.
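
To illustrate Wenji's point about small files and long RTT (a simplified model, not a measurement of FTS, GridFTP or ROBIN): if every file pays a fixed number of round trips in per-file setup, small files over a ~120ms path spend most of their time in handshakes.

  # Simplified model: effective throughput when each file pays a fixed
  # per-file setup cost of a few round trips. All numbers are illustrative.
  def effective_gbps(file_gb, link_gbps, rtt_s, handshake_rtts):
      transfer_time = file_gb * 8 / link_gbps   # seconds spent moving data
      overhead = handshake_rtts * rtt_s         # seconds spent in per-file setup
      return file_gb * 8 / (transfer_time + overhead)

  rtt = 0.120  # ~120 ms CERN <-> FNAL, assumed
  for size_gb in (0.1, 1, 10):
      print(size_gb, "GB ->", round(effective_gbps(size_gb, 100, rtt, 4), 1), "Gbps")
  # Small files reach only a fraction of the 100G link; large files approach it.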


====
Zoom chat, day 1

16:37:13 From Harvey B. Newman to Everyone : Here is the (just became public) HL LHC network needs document
16:37:22 From Harvey B. Newman to Everyone : https://indico.cern.ch/event/1004222/contributions/4241102/attachments/2194497/3745965/HL-LHC%20network%20challenges%20-%20V2.pdf
16:37:45 From Harvey B. Newman to Everyone : Also see the Network R&D part that I will refer to tomorrow.
16:54:54 From Tony Cass to Everyone : That network needs doc is on tomorrow's agenda as well, for Simone's slot.
16:59:29 From Harvey B. Newman to Everyone : The Rucio Monitoring system seems a great match to P4 INT (In-band Network Telemetry) to handle flows and "events" in the networks and/or at sites
17:34:29 From Dale Carder to Everyone : Tim, I’d claim the loss testing is much more operationally relevant to operational monitoring than throughput testing.
17:36:04 From Tim Chown to Everyone : yes, that’s a fair point, and even ‘small node’ pS running 1G throughput can give useful indications from its latency tests which report loss.
17:36:46 From Tim Chown to Everyone : the more general question is as we start being able to run pS throughput tests at >10G, how should we approach that?
17:39:08 From Dale Carder to Everyone : Carefully, as not to overwhelm sites w/ test traffic that could out-compete production.
17:39:45 From Eli Dart - ESnet (he/him) to Everyone : From my perspective, there are a couple of very important items in this space
17:40:05 From Eli Dart - ESnet (he/him) to Everyone : One is that, as Dale says, packet loss tells you where the problems are. This is really important.
17:40:38 From Eli Dart - ESnet (he/him) to Everyone : Throughput testing tells you about the basic building block of data transfer, which is a single TCP connection.
17:42:08 From Eli Dart - ESnet (he/him) to Everyone : Parallel testing masks problems, because it’s straightforward to fill up a host interface with multiple parallel streams….and that makes it harder to tell what’s happening (e.g. if loss is limiting you to 5Gbps or less, and you’re running 32-way parallel, you’re just not going to see it)
17:42:28 From Tim Chown to Everyone : Agreed Dale, and there is likely significant overtesting over the same links; it would be good to have more coordinated testing that minimises that, while identifying where the issues (loss) are.
17:42:39 From Eli Dart - ESnet (he/him) to Everyone : Also, short-distance tests can easily cause problems for other traffic - the worst is across a site border or an exchange point
17:42:51 From Eli Dart - ESnet (he/him) to Everyone : we’ve seen this in production
17:43:16 From Tim Chown to Everyone : and yes Eli, I recall a poster at TNC with a chap who had achieved 10Gbps over a very long distance, but had used something massive like 1,024 parallel streams.
17:43:43 From Eli Dart - ESnet (he/him) to Everyone : yup
17:45:11 From Eli Dart - ESnet (he/him) to Everyone : The exchange point and site perimeter issue has implications for perfSONAR host NIC speed. If you do short-distance tests between 100G perfSONAR tests across a 100G site perimeter or 100G exchange point interface, the blast radius can be large.
17:45:33 From Eli Dart - ESnet (he/him) to Everyone : (between 100G perfSONAR test hosts that is)
17:46:25 From Tim Chown to Everyone : we have found it useful to do those short distance tests in some cases to emphasise where issues are. but the point is to test from the other end to both inside and border, and not from border to inside.
17:46:47 From Eli Dart - ESnet (he/him) to Everyone : yes
17:47:01 From Eli Dart - ESnet (he/him) to Everyone : ideally from a distant other end - that usually makes the problem obvious :)
17:48:31 From Tim Chown to Everyone : but i think it’s still useful to develop the knowledge to be able to run pS at >10G.  At the moment we see some 100G NIC pS systems drive more than 2-3G.  Determining why is useful.  Tuning your ferrari is worthwhile even if you only drive it at 55 :)
17:49:18 From Tim Chown to Everyone : *drive no more than
17:49:34 From Eli Dart - ESnet (he/him) to Everyone : yes
17:49:45 From Eli Dart - ESnet (he/him) to Everyone : It might be valuable to just run those hosts at 50G
17:50:25 From Tim Chown to Everyone : indeed, if pS tests well at 50G, you’ll not likely learn more running higher.
17:50:31 From Eli Dart - ESnet (he/him) to Everyone : Depending on the path, it may be that there just isn’t 100G of bandwidth available, so the tests will always encounter loss
17:50:49 From Tim Chown to Everyone : but this is what we’re trying to figure out, what are the meaningful tests, what is the optimal config for these systems.
17:51:30 From Edoardo Martelli to Everyone : https://docs.google.com/document/d/1BUjk51LZ4ivYzvAGmxEL2obxihRVTfx6JVykT0YHVkU/edit#heading=h.lx9lvzy1fmn5

====
Zoom chat, day 2

15:29:31 From Eli Dart - ESnet (he/him) to Everyone : Other science groups (outside particle physics) are adopting or are considering adopting Rucio+FTS….it will be valuable to communicate data challenge results and R&D results to these groups
15:29:42 From Eli Dart - ESnet (he/him) to Everyone : One example is CMB-S4
15:31:26 From Rob Gardner to Everyone : Another is XENON, which is already using Rucio+FTS3
15:34:32 From Harvey B. Newman to Everyone : Yes we emphasize that in SENSE/AutoGOLE and the GNA-G. As Frank pointed out there are about 30 science programs that use FTS/RUCIO
15:35:58 From Gerben van Malenstein to Everyone : In addition, the NOTED project is connecting FTS to AutoGOLE/SENSE: both Joanna and I will be presenting on this today. I expect that groups outside of LHC will be following
15:36:07 From Rob Gardner to Everyone : IceCube, LIGO/IGWN ..
15:36:19 From Harvey B. Newman to Everyone : Yes the "petabyte in a shift" at Tier2s means you need 400G installed at least. This is not obvious.
15:38:21 From Harvey B. Newman to Everyone : It is not "the network" but the network by segment, by VO, by activity, by priority, etc. An operational model related to the monitoring.
15:39:41 From Harvey B. Newman to Everyone : I will say a lot more later. Including the fact that technologies to build this on are existing and/or emerging.
15:46:49 From Enzo Capone to Everyone : 2027 will also (probably) see the start of science production for SKA. That will be a major aspect to take into account (for NRENs, if not directly from the HEP community)
15:52:53 From Harvey B. Newman to Everyone : google P4 INT
15:53:31 From Harvey B. Newman to Everyone : GEANT RARE freertr
16:00:20 From Harvey B. Newman to Everyone : https://docs.freertr.net/
16:10:28 From Lars Fischer to Everyone : In many ways this comes down to having a 2021 best practice DC overview
16:16:55 From Harvey B. Newman to Everyone : https://arxiv.org/pdf/1909.12101.pdf
16:17:19 From Harvey B. Newman to Everyone : Programmable Event Detection for In-Band Network Telemetry
16:37:38 From Dale Carder to Everyone : Great work, Joanna!
16:37:55 From Lars Fischer to Everyone : Thanks for a very interesting presentation, Joanna
18:10:01 From Justas Balcas to Everyone : I am having a hard time understanding these numbers. Looking more at the mdtm website (comparison with GridFTP/FDT): https://mdtm.fnal.gov/Evaluation.html and also at what was presented for 3rd party copy benchmarks (GridFTP vs XRootD):
https://docs.google.com/presentation/d/1lWKNFc7Lf6vItlSNC-Kz5c-4-tiEra14yHS05h_op4g/edit#slide=id.p
https://arxiv.org/abs/2103.12116
The numbers simply do not agree with each other. 20 to 30 times is a huge difference. Is it known why? Also, mdtm does transfer scheduling, not only the transfers themselves. Was FTS configured for high throughput? Is it a tool issue (xrootd) or a scheduling issue of FTS?

 
