CERN Monit has not been working since 6am, so everything is (or looks) stopped. There is a problem with CERN Ceph storage.
GGUS #144884: Seems to be caused by some user analysis jobs that used too much RAM. We should try to get a better error message than "Job submission to LRMS failed".
GGUS #144759: The Glasgow squids may be misconfigured. Gareth will investigate when he has a chance (new hardware is arriving).
GGUS #144688: Gareth said this is a common issue, where a burst of transfers to the same disk causes transfer errors like this. Once it cools down, things seem OK again. The ticket can be closed, but the issue could recur.
[later] We had requested (ADCINFR-162) a reduction of the old DPM storage so that the old servers could be decommissioned, but the CRC shifter said (on GGUS) they still needed the space. Gareth: uncomfortable keeping it like this. The disks are old and may fail at some point.
Patrick/Dan: everything looks good at Sussex, but they can't tell for sure with Monit down.
Alessandra [later]: Sussex is running payloads, but they fail because they can't access QMUL storage. QMUL is a StoRM site, and the StoRM mover uses POSIX access, which isn't supported yet. Alessandra will discuss with DDM.
Elena still has 8.2 TB left on the Sheffield disks and will push for the last of it to be removed, starting with a post on JIRA.
For access to the RAL disk, Elena has finished switching to the Rucio copytool.
Sam: The current setup is not the final production configuration; more servers will be needed. The plan is to switch to the production cluster once the new servers arrive from Dell, probably in mid-February. Dan also noted issues with deliveries from Dell. The switch will probably mean the current disk loses its data, which is a little concerning because the data now on the disk is marked "primary".
Tim: Added the Ceph DataDisk in AGIS last Thursday. Apparently this is not the correct procedure: DDM needs to do some magic *before* the disk is enabled. Dimitrios fixed this on Monday and switched the disk to type "TEST" (instead of DATADISK).
There were still problems transferring to the disk, which Sam fixed in the voms-mapfile.
Tim then set up a new test queue, and HammerCloud jobs started today. (Elena suggested contacting atlas-adc-expert@cern.ch if HC doesn't run.) The jobs fail: they need to be configured to upload their output through the correct gateway. Sam will give details on JIRA.
Alessandra: NETR [nothing else to report; some comments noted above.]
Dan: Last WN moved SL6->C7. Waiting for Dell storage, hope for delivery in February.
Elena: NETR
Emanuele: NTR
Matt: One Lancaster server is rebuilding 3 disks, but seems OK. Purchasing gpnode.
Patrick: NTR
Sam: NTR
Stewart: LocalGroupDisk is filling up. Identifying people across the UK who have left.
Tim: Switched RAL to use the Rucio copytool. All seems good. Data Carousel reprocessing started on Tuesday without RAL, which had a Castor intervention scheduled for Wednesday. That intervention is done, so RAL can start today.
Vip: NTR