ATLAS UK Cloud Support

Europe/London
Vidyo

Tim Adye (Science and Technology Facilities Council STFC (GB)), Stewart Martin-Haugh (Science and Technology Facilities Council STFC (GB))

● Outstanding tickets

  • GGUS #143208 Oxford deletions now seem to be working. Alessandra will close the ticket.
  • GGUS #143186 Liverpool have fixed a RAID fault on one of their pool servers and the transfers seem to be working now. Alessandra will close the ticket.
  • GGUS #143106 There seem to be problems with all StoRM sites that use the new Rucio mover method. Alessandra put gfal access back for QMUL. If that fixes the problem, this ticket can be closed. Rod is leading a discussion on the Rucio mover with StoRM. Dan will try Alessandra's davix-http suggestion.
  • GGUS #143094 Glasgow was very busy with transfers. Seems better now. Alessandra will close the ticket.
  • GGUS #142774 Lancaster has been migrating data off a bad server. No news as to whether that is complete, but the transfers seem to be better. Alessandra will close the ticket.

● Other new issues

All transfers from FZK and BNL to Manchester, Lancaster, and ECDF (not ECDF-RDF) are broken. Looks like an IPv6 issue. FZK and BNL also failed with the RAL FTS, and hence all UK sites were affected until ATLAS (and CMS) switched to using the CERN FTS.

Dan is deploying a new ARC6 CE. A test queue is available. Alessandra switched it from BROKEROFF to TEST so it can run HammerCloud tests.


● CentOS 7 migration

RHUL and Sussex are the last UK sites to migrate. All remaining SL6 PanDA queues are now in TEST (production) or BROKEROFF (analysis), so can't run any more jobs. Hence all ATLAS jobs now run Pilot2 on CentOS7.

RHUL moved to CentOS7 and are moving to HTCondor-CE/HTCondor at the same time, so this is taking some time.

Sussex: Patrick is having some trouble setting up the ARC-CE with the Grid Engine. It was suggested that he ask others who also use ARC-CE, e.g. Daniela Bauer and Simon Fayer (IC), Matt Doidge (Lancaster), and Rob Currie (ECDF). Alessandra suggested mailing TB-SUPPORT@JISCMAIL.AC.UK.
Other advice: there is no need to set up an Argus server; use a simple script (e.g. from IC) instead.
To test, try a simple job submission with the ARC tools.


● Singularity

Durham has enabled user namespaces, so is running Singularity from CVMFS.
The only remaining UK CentOS7 site without Singularity is ECDF. They will need to enable user namespaces to run Singularity from CVMFS, but have said they can't do that until the end of the year.


● News round-table

Alessandra: NTR
Dan: NTR
Patrick: NTR
Sam: NTR
Tim: STFC is advertising the ATLAS Tier-1 Liaison post. The new hire will take over from Stewart and Tim, with a transition period running until next April.

    • 10:00 10:10
      Outstanding tickets 10m
    • 10:10 10:20
      Other new issues 10m

    • 10:20 10:40
      Ongoing issues 20m
      • CentOS 7 migration 5m

      • Singularity 5m

    • 10:40 10:50
      News round-table 10m

    • 10:50 11:00
      AOB 10m