
FTS3 - Meeting and demo

Europe/Zurich
28/R-015 (CERN)

Alejandro Alvarez Ayllon (CERN)
    • 15:00 15:30
      Transition, current status and release plan 30m
      * Current RC version
      * Discuss versioning schema
      * Discuss release and deployment pace
      * Future goals
        - Short term
        - Medium and long term
      * "Help Wanted"
        - "Ops" discussion
    • 15:30 16:00
      Demo: Web configuration 30m
      Demo of the first version of the Web UI configuration interface
    • 16:00 16:45
      Discussion on requirements, standing issues and priorities 45m
      Time allocated for open discussion, especially experiment issues, requests, and future plans.
      • ATLAS topics 15m
        1. Transactional submission

          1. Rucio state machine: 1) QUEUED → 2) SUBMITTING → 3) FTS submission → 4) SUBMITTED (Rucio DB updated with the FTS job ID), etc.

          2. If something goes wrong between 3) and 4), such as an FTS timeout or a Rucio DB issue, we do not know whether the job was successfully submitted to FTS. Possible recovery options:

            1. An ID that we can pass at submission time and later use to check whether the job exists

            2. A way to check whether a job has been submitted for given destination URL(s)

        1. Get error information from FTS, both via messages and by polling:

          1. Recoverable error or not

          2. Source or destination error

          3. Possibility to provide regexps to classify the error.

        1. Multiple sources in one bulk job

          1. Today we submit one multi-source transfer per job

        1. Issue with messages confirmed yesterday.

          1. Wrong FAILED ↔ SUCCESS transition and race conditions

          2. We no longer consume messages and only poll, which performs worse and adds more load.

         

        1. Many errors when writing to tape (TRIUMF): why does the optimizer not back off?

         

        1. Scalability/performance issue when retrieving all requests completed in the last 2 hours?

          1. 1 hour to get more than 10k requests for the last 2 hours (all FTS servers).
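
The first recovery option under "Transactional submission" (an ID passed at submission time) could be sketched as below. This is only an illustration of the idea, under the assumption that the service can look a job up by a client-generated ID. `FakeTransferService`, `submit`, `find_by_client_id`, `recover` and the example URLs are hypothetical names and do not reflect the FTS3 or Rucio APIs.

```python
import uuid


class FakeTransferService:
    """Hypothetical in-memory stand-in for a transfer service (not the FTS3 API)."""

    def __init__(self):
        self._jobs_by_client_id = {}

    def submit(self, client_id, files):
        # Idempotent: a repeated submission with the same client ID
        # returns the existing job instead of creating a duplicate.
        if client_id not in self._jobs_by_client_id:
            self._jobs_by_client_id[client_id] = {
                "job_id": str(uuid.uuid4()),
                "files": list(files),
            }
        return self._jobs_by_client_id[client_id]["job_id"]

    def find_by_client_id(self, client_id):
        # Returns the job ID if the submission reached the service, else None.
        job = self._jobs_by_client_id.get(client_id)
        return job["job_id"] if job else None


def recover(service, client_id, files):
    """Resolve a request stuck between SUBMITTING and SUBMITTED."""
    job_id = service.find_by_client_id(client_id)
    if job_id is None:
        # The submission never arrived, so submitting again is safe.
        job_id = service.submit(client_id, files)
    return job_id  # the caller can now record it and move to SUBMITTED


service = FakeTransferService()
client_id = str(uuid.uuid4())  # stored on the client side *before* submitting
files = [("srm://source.example/file", "srm://dest.example/file")]  # made-up URLs

first = service.submit(client_id, files)  # imagine the reply is lost (timeout)
recovered = recover(service, client_id, files)  # same job, no duplicate
```

Because the ID is generated and stored before the submission call, the client can always ask "does this ID exist?" after a failure, which is what removes the ambiguity between steps 3) and 4).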
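
For the error-information topic, the request to "provide regexps to classify the error" might look roughly like the sketch below. The rule table, the `classify` helper and the message formats are assumptions made for illustration; the patterns are not taken from real FTS error messages.

```python
import re

# Hypothetical rules: (pattern, side, recoverable). A side of None means
# "take the side from the prefix captured by the pattern".
RULES = [
    (re.compile(r"^SOURCE\b.*no such file", re.I), "source", False),
    (re.compile(r"^DESTINATION\b.*no space left", re.I), "destination", False),
    (re.compile(r"^(SOURCE|DESTINATION)\b.*timed? ?out", re.I), None, True),
]


def classify(message):
    """Return (side, recoverable) for an error message; the first matching rule wins."""
    for pattern, side, recoverable in RULES:
        match = pattern.search(message)
        if match:
            if side is None:
                side = match.group(1).lower()
            return side, recoverable
    return "unknown", None  # unclassified: leave the decision to the caller


result_missing = classify("SOURCE error: No such file or directory")
result_timeout = classify("DESTINATION connection timed out")
result_other = classify("something else entirely")
```

Keeping the rules in a plain list means the client can supply or extend them without code changes, and the order of the list decides which rule wins when several match.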