FTS3 Steering

Europe/Zurich
31/S-023 (CERN)

Alejandro Alvarez Ayllon (CERN)
  • Release candidate is 3.5.x. Patch version will depend on the iterations while running on the pilot.

From the ATLAS document

  • ATLAS asked how to find out which version is running on a server
    • The REST API publishes it at the root path, e.g. https://fts3.cern.ch:8446/
      • "api": {"major": 3, "minor": 4, "patch": 2}
    • Devs should also find a way of publishing the core version [FTS-703]
  • ATLAS agreed to move part of the production load to the Pilot service
    • Rather than being a one-off, this load can remain indefinitely, providing invaluable feedback to the devs
  • XrdCp transfers from EOS to Castor: used by anybody?
    • Not via FTS, although CMS uses XrdCp directly and is satisfied with the results.
    • The gfal2 xrootd plugin relies on the xrootd libraries, so it is mature enough
    • There are some concerns about the suitability of xrootd outside internal CERN transfers
  • Fair-share: FTS schedules per link, and then by activity. Higher-priority transfers A->D can therefore be starved by lower-priority transfers from other links (X->D), which are scheduled first and exhaust the storage limit of D
    • Breaking the strict ordering of FTS when scheduling may be enough, and easier to implement [FTS-704]
  • A fair number of small files come from ATLAS. Session reuse will help when switching to GsiFTP only.
    • Can FTS decide when to use it?
    • As of today, session reuse is for the whole job, or nothing.
    • Low-hanging fruit: jobs with several small files [FTS-705]
  • Very long term: cross check theoretical bandwidth with achieved throughput. Can this provide feedback for FTS?
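The version lookup discussed in the first item above can be sketched as follows. This is a minimal sketch, assuming the document served at the REST root is JSON containing an "api" object shaped like the fragment quoted in these minutes. The helper name `api_version` is hypothetical, and the network fetch is deliberately left out so that the example runs offline.

```python
import json


def api_version(root_document: str) -> str:
    """Return 'major.minor.patch' from the JSON served at the REST root.

    Assumes an "api" object with "major", "minor" and "patch" fields,
    as in the sample quoted in the minutes.
    """
    api = json.loads(root_document)["api"]
    return "{major}.{minor}.{patch}".format(**api)


# Sample payload built from the fragment quoted in the minutes; a real
# response may carry additional fields.
sample = '{"api": {"major": 3, "minor": 4, "patch": 2}}'
print(api_version(sample))
```

Against a live server the same function would be applied to the body fetched from the root URL; how that request is authenticated is not covered here. Note that this only reports the REST API version, not the core version tracked in [FTS-703].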

Configuration

  • Needs to be harmonized, so that VOs know what the others are doing, what the values are...
    • Need to involve all parties (VOs, devs...)
    • Consultancy from devs may be required for setting the values as well
    • Maybe better to iterate than to keep discussing
  • Agreed on creating a JIRA project to keep track of why changes are done
    • FTS provides an audit, but not tracking of rationales
    • JIRA for the moment; integration to be considered (e.g. automatic ticket creation)
    • [FTS-706]

Other

  • Stalled connections: small improvements at CERN, but not yet 100% solved
    • Sites are to be notified of the required configuration changes once CERN disappears from the alerts
    • Disabling Gridsite passcode files and max requests per client seems to help, but is not a complete fix
  • Deletion:
    • ATLAS, CMS and LHCb do not use, and do not plan to use deletions
    • To be removed [FTS-707]
  • SOAP
    • Only CMS pending migration, but going as planned
    • In ~2 months the REST implementation will go to production (that is, ~beginning of November)
    • Calendar is maintained: [FTS-600], [FTS-601]
      • Monitoring being put in place to trace users still using SOAP
      • SOAP can be shut down progressively before the rollout of 3.6 to detect outliers
  • Downtime may be required for 3.6 and database optimizations
    • No objections by anyone
    • Pilot can be used as pivot
    • No need to drain, but submissions must stop
      • Pollers may keep running, so either read-only access must be provided or 503 statuses returned
    • Dates to be discussed. January doesn't seem to fit.
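If the servers return 503 during the downtime, client-side pollers need to back off instead of failing. The following is a minimal sketch of one way a poller could react. The function name `next_poll_delay`, the default intervals and the choice of exponential backoff are all assumptions made for illustration; none of this describes the behaviour of any existing FTS client.

```python
from typing import Optional


def next_poll_delay(status: int, attempt: int,
                    retry_after: Optional[str] = None,
                    base: float = 30.0, cap: float = 900.0) -> Optional[float]:
    """Seconds to wait before the next poll, or None to give up.

    200 -> poll again after `base` seconds.
    503 -> use a numeric Retry-After header if present, otherwise back
           off exponentially from `base`, capped at `cap`.
    Any other status is treated as a non-retriable error.
    """
    if status == 200:
        return base
    if status == 503:
        if retry_after is not None and retry_after.isdigit():
            return float(retry_after)
        return min(cap, base * 2 ** attempt)
    return None


# Third consecutive 503 without a Retry-After header.
print(next_poll_delay(503, attempt=3))
```

A server that sends a Retry-After header with its 503 responses lets the operators control how hard the pollers hit the service during the intervention.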

 

    • 15:30 - 16:15
      FTS 3.5.0 and roadmap (45m)
      • 3.5
      • Plans for 3.6
      • Deletion, still a requirement?
      • SOAP deprecation
      • Configuration traceability
    • 16:15 - 16:30
      Follow up: Messaging and REST stalled connections (15m)
    • 16:30 - 16:50
      AOB (20m)

      [HIGH]
      - Many things will “come in the next release”: can you please clarify which version is ready to go into production, and can we find out from the sites running FTS which version they run?
      E.g. status of the stalled connections: is this fixed, and can we ask sites to move to that release?
      - FTS pilot: ATLAS could increase the load if needed.
      max/min for optimiser
      - Xrdcp third-party transfers: is it true that other experiments are already using FTS with xrdcp? Is it mature enough for ATLAS to use it, or at least to test it for EOS to Castor RAW file transfers?

      [MEDIUM]

      • Fair-share: we need to apply it per destination and not per link, e.g., express
      • Config between various FTS servers and defaults clearly visible.
      • Config between Optimizer and fixed streams
      • Automatic session reuse. ATLAS is pushing to have more sites SRMless but with GridFTP. If we have a multi-file job with GridFTP, can FTS automatically decide to use GridFTP session reuse?
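The fair-share request (per destination rather than per link) can be illustrated with a toy model. This is only a sketch under strong assumptions: one destination with a fixed number of free slots, one queue per source link, and a single integer priority per transfer where a higher number means more urgent. The function names are hypothetical, and this is not the FTS scheduler.

```python
from typing import Dict, List, Tuple

Transfer = Tuple[str, int]  # (source, priority); higher number = more urgent


def schedule_per_link(queues: Dict[str, List[int]], slots: int) -> List[Transfer]:
    """Visit links in their given order; each link fills the remaining slots."""
    picked: List[Transfer] = []
    for source, priorities in queues.items():
        for prio in sorted(priorities, reverse=True):
            if len(picked) == slots:
                return picked
            picked.append((source, prio))
    return picked


def schedule_per_destination(queues: Dict[str, List[int]], slots: int) -> List[Transfer]:
    """Pool every queued transfer for the destination, then pick by priority."""
    pooled = [(source, prio) for source, prios in queues.items() for prio in prios]
    pooled.sort(key=lambda t: t[1], reverse=True)  # stable: ties keep link order
    return pooled[:slots]


# Link X->D is visited first and holds only low-priority transfers;
# link A->D holds the urgent ones. Destination D has 3 free slots.
queues = {"X": [1, 1, 1], "A": [5, 5]}
print(schedule_per_link(queues, 3))         # [('X', 1), ('X', 1), ('X', 1)]
print(schedule_per_destination(queues, 3))  # [('A', 5), ('A', 5), ('X', 1)]
```

In the per-link run the urgent transfers from A never get a slot, which is the starvation described in the fair-share item. Pooling per destination lets them go first while still leaving the remaining slot to X.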

      [LOW]
      - Network map, at least LHCOPN (Tier-0 Tier-1s)
      - Monitoring: different numbers in different places. Need to be understood.