New signup sheet for topical presentations, https://docs.google.com/document/d/1NIc67p3AB2RkYjJsP6Nx_lwPXFX03w1n2SFOgCU47ro/edit?usp=sharing
Begin forwarded message:
From: Romain Wartel <Romain.Wartel@cern.ch>
Subject: Introduction call -- DRAFT Meeting notes
Date: July 16, 2019 at 9:22:37 AM CDT
To: "wlcg-security-SLATE-wg (WLCG SLATE WG)" <wlcg-security-SLATE-wg@cern.ch>
Resent-From: <rwg@uchicago.edu>
DRAFT Meeting notes of today’s call. All corrections, additions very welcome!
Cheers,
Romain.
—
***************************
* Meeting participants:
***************************
- Chris Weaver
- Dave Kelsey
- Frank Wuerthwein
- Igor Sfiligoi
- Jim Basney
- Johannes Elmsheuser
- Lincoln Bryant
- Nikolai Hartmann
- Paul Millar
- Robert Gardner
- Romain Wartel
- Petr Vokac
- Tom Barton
- Stephane Jezequel
- Shawn McKee
- Vincent Brillault
- Brian Bockelman
- Xavier Espinal
- Elizabeth Sexton-Kennedy
***************************
* Agenda
***************************
- Romain explained that several additional parties (including OSG, Fermilab, ESnet) are interested in participating in this working group but have not yet had time to appoint someone.
- Introduction by Chris Weaver (slides on Indico)
Q&A:
Vincent: The logs are stored locally with systemd. Maybe these logs should be forwarded to a central remote system for added security?
Chris: Agree.
Vincent: The portal stores tokens giving significant access to the infrastructure. How’s the security managed on the portal and for token management?
Chris: There are different components. Most of the SLATE development team has access to the different systems. Only the “current” tokens are in memory, and the main database is Amazon DynamoDB, which is accessible only by a couple of people in the SLATE development team.
Vincent: Scanning the container is very nice. Have you also considered scanning for insecure configurations (e.g. exposing unprotected SMB on the Internet)?
Chris: The current scans are in any case quite limited. That said, the number of new SLATE applications is quite low and the review is largely done manually for the moment.
Searching for configuration issues is a good idea. This is not done yet as the tools are currently quite limited.
- General discussion:
Romain: What should this group try to achieve? What are the security aspects that need to be covered by this WG?
Rob: The resource providers would probably like to see an effort to increase the trust in the SLATE service, security model, and overall operational/deployment strategy. SLATE is a significant cultural shift in the way services are operated across a distributed infrastructure.
A WLCG-wide discussion is needed to address the different (security) challenges to overcome, in order to improve service adoption and gain additional capabilities.
Igor: As a site admin, I worry about the permissions I need to give to the SLATE developers to make it work. How can I be sure SLATE will not compromise my Kubernetes system?
Rob: There is documentation available that should address questions around deployment impact, permissions needed, etc.
Romain: The security team would probably also like to see the basics covered (image security, security updates, incident response, etc.)
Romain: Do we need to also discuss security policies and trust framework as part of this working group?
Dave: Yes, we need to review the risks and understand more details around SLATE. Depending on the findings we may or may not need new security policies.
Romain: Do we need a security review?
Dave: Probably. This would enable everybody to understand in more detail how SLATE works and its implications for the resource providers.
Tom: Regarding the security review, Jim Basney and TrustedCI have already started some security review work.
In addition, preparing some kind of “declarations” would help bring additional transparency, which would be very welcome to improve trust from the different parties involved.
Jim: TrustedCI would like to engage with this WG and ideally share tasks. TrustedCI is looking at the security policy aspects around the SLATE infrastructure, as well as image security scanning tools. It would be helpful to understand how to map this to WLCG policy and operational security practices.
Romain: Direct, close cooperation between TrustedCI and this WG is absolutely crucial.
Romain: We should maybe also explore the implications on incident response of the role/responsibility changes implied by the shift in deployment model.
We will address the trust framework, security policies, security architecture, operational security aspects (vulnerability management, incident response, etc.). Any other obvious area to explore?
Rob: More topics will probably come up in the near future!
***************************
* Next meetings
***************************
- Co-locate a side meeting around the September GDB at Fermilab (coordination: Rob, Fermilab)
https://indico.cern.ch/event/739882/
- Co-locate a side meeting around the October NSF Security Summit / WISE meeting (coordination: Tom, Dave, Romain)
https://trustedci.org/2019-nsf-cybersecurity-summit
One ticket (#142370, 22-Jul-2019) for transfer errors on timeout, although the logs seem to show the transfer did happen. The failure rate is now back to normal.
The June WLCG report showed AGLT2 with a low (88%) availability and reliability. We realized one of our gatekeepers (gate03) had stopped accepting test jobs. We are trying to get the numbers amended.
Hardware:
General maintenance: replacing failed dCache storage disks and worker node memory.
MSU is finishing its recovery from the July 1st CRAC shutdown due to a cottonwood problem.
Site problems following the migration to pilot2+containers starting 19 July
UIUC worker issues following ICC PM
Network reconfiguration at IU
IPv6 status
UTA:
1) First part of our recent purchase (storage + compute nodes) starting to get delivered.
2) Migration of UTA_SWT2 to CentOS7 almost done. (Had deferred this update while optimizing slurm configuration at SWT2_CPB). Expect to complete this later today (7/24).
3) Two tickets over the past two weeks (deletion errors at UTA_SWT2 - bad drive in a RAID set; our local tier-3 needed an update to its frontier/squid settings). Both resolved.
OU:
- Overall things running well.
- Had bad RAM DIMM in one xrootd storage server, which caused some deletion and job failures. Just replaced this morning (as well as the motherboard, which seemed to have issues as well). Everything should be back to normal now.
- Getting quotes for compute nodes for remaining hardware funds.