ATLAS UK Cloud Support

Europe/London
Vidyo

Tim Adye (Science and Technology Facilities Council STFC (GB)), Stewart Martin-Haugh (Science and Technology Facilities Council STFC (GB))

● Outstanding tickets

GGUS #144759 and #144688: Gareth: Glasgow is complicated at the moment. Working on fixing squids for CVMFS.

GGUS #144716: Sam: RHUL applied DPM fix to space reporting. Tony closed the ticket during the meeting, reporting that this fixes the problem.

CPU

Lancaster usage is down a bit today. Matt reported that the site is inundated with jobs that start and then die, which is eating all the prod pilots. The jobs belong to user janders (John Anders), AP_UPGR (Upgrade simulation jobs). [After the meeting, Stewart emailed the user, who admitted a mistake. He will cancel the jobs and resubmit.]

Dan said that he'd just enabled some new VOs at QMUL, which has suppressed ATLAS jobs temporarily.


● Other new issues

Simon George has requested a cleanup of RHUL LocalGroupDisk. We will follow the usual procedure of contacting users.


● CentOS 7 migration

Dan: ARC-CE up and running at Sussex. They are ready for ATLAS jobs. Elena will set up a test PanDA queue.

Dan mentioned that Patrick is frustrated with getting his GridPP certificate registered for the ATLAS VO. He'll contact ATLAS UK support. Stewart mentioned the alternative of getting a CERN certificate from https://ca.cern.ch/ca/ , though this is not ideal.


● Storageless sites

Elena: Still 10 TB left on Sheffield DataDisk. Need to clean by end of month.
Elena checked Rucio copytool setting in Sheffield AGIS. It's correct for prod queue. She will fix analysis queue.

Decommissioning UKI-SCOTGRID-ECDF-RDF storage is nearly done. See: https://its.cern.ch/jira/browse/ADCINFR-161 .


● Glasgow Ceph storage

Matt: Glasgow needs to decommission the old storage soon, so this is becoming more urgent.
Tim will put his skates back on and set up a test queue as planned.


● News round-table

Dan: Migration to Lustre progressing well.

Elena: NTR

Emanuele: NTR

Gareth: NTR

Matt: Followed up on several issues, reported above.

Sam: NTR

Stewart: NTR

Tim: Tape carousel reprocessing hopefully now starting early next week. RAL has a Castor intervention on Wednesday, so ATLAS won't start RAL tape recalls until after that is complete.
Rucio copytool fixed for use with RAL Echo (last month). Works in RAL test queue. Now ready to enable it for other queues.

    • 10:00 10:10
      Outstanding tickets 10m

    • 10:10 10:20
      Other new issues 10m

    • 10:20 10:40
      Ongoing issues 20m
      • CentOS 7 migration 5m

      • Storageless sites 5m

      • Glasgow Ceph storage 5m

    • 10:40 10:50
      News round-table 10m

    • 10:50 11:00
      AOB 10m