OSCT 5

Europe/Zurich
CCIN2P3

Lyon
Romain Wartel (CERN)
Description
Operation Security Coordination Team meeting - OSCT-5

Registration was at http://egee.in2p3.fr/events/OSCT5/ . Details about accommodation, access etc. are also available there.

For your convenience, here is an additional link with a short list of hotels and their distances to the CC-IN2P3: http://cc.in2p3.fr/rubrique228.html
Note that this is not an exhaustive list and it does *not* imply any recommendation.

Note that the airport at Lyon, named St. Exupéry (abbreviation LYS), is served by various airlines, including low-cost ones.

Audio connection to the meeting:
====================
The audio connection is available half an hour before the meeting starts, on both days (8:30).

If you want to participate in the discussion, you will need either a phone or an H323 client. See the attached file named "Audioconf" for the details (read it, you really will need it if you want to participate).

If you just want to listen, use the following:

On Tuesday,
http://rms.in2p3.fr/compagnon.php?idConf=16210&identConf=51819_OSCT-5-First
or
rtsp://193.48.95.69/conf_h263_g711u_384000_51819_OSCT-5-First

On Wednesday,
http://rms.in2p3.fr/compagnon.php?idConf=16211&identConf=51819_OSCT-5-Second
or
rtsp://193.48.95.69/conf_h263_g711u_384000_51819_OSCT-5-Second

Please send a mail to rumler at the in2p3 domain (in2p3.fr) if you have problems or questions.
AudioConf
No meeting minute-taker was found so the following is a summary of the actions
and decisions made by the Chair.

#################################
# Agenda
#################################

Since unfortunately there was no attendee from the MWSG, the session
"Interaction and feedback to and from the MWSG" was cancelled and replaced
by AOB and other technical discussions.

#################################
#  Status, progress and issues
#################################

Status of actions from each ROC, based on:
https://twiki.cern.ch/twiki/bin/view/LCG/PendingActions

- CERN: As a result of the IT re-organisation at CERN, Romain becomes the OSCT
Security Contact for the CERN region, replacing Remi. David and Louis are also
no longer involved in the OSCT. All this obviously impacted our action list, but
great progress was made in the security service challenges (SSC) area, thanks
to Pal. There was also progress with Pakiti 2 and the SAM test results.

- Russia: (via the phone, but connection problems) Still maintaining the two
OSCT websites. Should be notified of any manual (=AFS) modification on the
public site.

- France: New EGEE incident response lists, with less or no spam. Long awaited,
good progress. The transition should be handled offline with CERN. However,
little progress was made on monitoring the SAM Security test results.

- UKI: SSC2 ongoing on 19 sites, feedback received from 18 sites. Overall
results are positive, although some problems are still detected.
SAM Security tests results highlighted VO-related problems, which must be
escalated in the project (Romain started discussing with the affected VOs).

- DECH: Nobody could attend the meeting to represent the region. From previous
off-line discussions: due to resource issues, it has not been possible to handle
the action list, and it will not be possible at least in the near future.
However, some action is being taken on the SAM Security tests (status: 1 site
took corrective actions, others are still pending).

- Italy: A lot of effort went into reorganising the security communication
channels, but there are still some problems. Does this need to be escalated?
Several security incidents also required effort from the ROC security contact.
With regard to the development of SAM Security tests, not much progress seems
possible due to the limitations of the framework (lack of privileges). Will be
discussed further in the monitoring session.
Some progress was made in collaboration with CERN on SSC3.

- CE: Another SSC2 ongoing. Progress in the monitoring area includes: Pakiti
pilot deployment (300 nodes), SAM test attempts to obtain the list of installed
packages (some problems still not resolved), and traceability improvement work
on the LB. Will be discussed further in the monitoring session.
CE also contributed to the RSS feed (1 article on CRL updates).
Romain: Traceability improvement and monitoring have been identified as
essential activities for us in EGEE-III.

- NE: (via the phone) Commented on the monitoring plan back in January.

- SEE: (via the phone, but connection problems) Still working on accessing the
GOCDB to find sites without CSIRTs.

- APROC: (via the phone, but connection problems) No report received.

- SWE: Another SSC2 ongoing (12 sites). Upgraded the OSCT jabber server.
Progress on SAM and/or on the IR scenario was not possible.


#################################
# Security Challenges
#################################

* Feedback

- Very useful

- Early warnings will be sent to the regions/sites for the next iteration

- Two ROCs noted that some sites may ignore the challenge if it is announced as
a test, but the majority agreed challenges must remain flagged as tests to
avoid disturbing the production work

- Interesting finding: a challenged site realised that traffic from the WNs was
not logged by the site firewall if it used a carefully chosen TCP port!

* Plan before 31/04/2007

- Complete T1/T0 testing (France, CERN) -> Pal, Romain

- Improve the SSC3 so that other ROCs can use it too -> Pal

- Produce a standard assessment form/graphs -> Pal, Romain

* Plan before 31/05/2007

- Identify regional VOs to run SSC3 in the ROCs: FRANCE, CE, UK, SWE and ITALY
at least have such VOs. -> Action on all ROCs

- Test partner T1s (OSG, NDGF) -> Pal, Romain

* Before EGEE 08

- Test major T2s, at least once, and present assessment forms/graphs at EGEE08
-> Action on all ROCs

#################################
# Security in the regions
#################################

Presentation by France, and then Italy, and then Russia (all talks are on the
agenda page)

#################################
# EGEE-III Planning/Changes
#################################

* Organisation for EGEE-III

- Face to face meeting 2/year, organised by the ROCs

- Ops meeting by phone conference: 1/week, time to be decided:
    - OSCT-DC handover
    - Issues in the regions

- Once a month, the meeting will be longer and include in addition a
progress/status report from each ROC

- Quarterly reports will be based on the progress reports from the partners

- Input to the deliverables will be requested from all ROCs

- Main activities coordination will be in the ROCs


* Resources allocation

The base level of effort for each ROC is estimated at 0.3 FTE (~ 8 PM):
     - Day-to-day issues

     - OSCT-DC

     - SAM tests results

     - Work in the region + NGI

     - Input for deliverables

     - JSPG contributions

     - Meeting organisation

In addition, team activities will presumably cover:

     - Monitoring
         - Monitoring coordination (CE?)
         - Monitoring contributions (RUSSIA?, ITALY?)
     - Detect and escalate grid-wide SAM problems

     - Incident response
         - Incident response coordination (SWE?)
         - Incident response channels (FRANCE?)
         - Incident response scenarios
         - Security service challenges (CERN?)


     - Training and dissemination
         - Training and dissemination coordination (UK?)
         - Training and dissemination contributions (ITALY?, SWE?)
         - Website, communication and outreach (RUSSIA?)

     - Global architecture security review (UK?)
       (including traceability and controls, and documenting
        how-tos for these)

     - Audit (VO scheduler, Web applications, etc.)

A more detailed list of tasks will be drafted and circulated in the coming
weeks. Each ROC/partner will also be contacted individually to discuss available
resources for TSA 1.4, and contributions to activities in the team.

#################################
# Monitoring Session 1
#################################

Update on progress (Daniel/Michal, Romain), listing the current tools, including
status and issues. The objective is that all ROC Security Contacts understand
where we are today, and what is available today.

- Presentation on L&B: RB and CE events are logged on the LB server
    * Would it be possible to produce a plan to integrate security traceability
requirements in this, including current work being done by Ales?
-> Action on CE

https://twiki.cern.ch/twiki/bin/view/LCG/LogRetention
https://edms.cern.ch/document/428037/

CE ROC is happy to deal with the missing grid job traceability functionalities.

Another ROC should be involved to investigate whether data mining could detect
pattern changes in job submission.
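
As an illustration of what detecting a pattern change could mean in practice, here
is a minimal sketch. It assumes that hourly job counts (for a user, a VO or a
site) have already been extracted from the logged events; the function name, the
data layout, the window and the threshold are all hypothetical and do not
describe an existing OSCT or L&B tool. It flags an hour when its count is far
from the average of the preceding hours (a simple z-score test).

```python
from statistics import mean, pstdev


def flag_submission_anomalies(hourly_counts, window=24, threshold=3.0):
    """Return the indices of hours whose job count deviates strongly
    from the preceding `window` hours (simple z-score test)."""
    anomalies = []
    for i in range(window, len(hourly_counts)):
        history = hourly_counts[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is flagged.
            if hourly_counts[i] != mu:
                anomalies.append(i)
            continue
        if abs(hourly_counts[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Hypothetical data: a steady pattern followed by a sudden burst.
    counts = [10, 12, 11, 9] * 6 + [200]
    print(flag_submission_anomalies(counts))
```

A real study would need to account for daily and weekly cycles and for
legitimate production bursts; this only shows the basic idea of comparing
current activity with a recent baseline.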

#################################
# Monitoring Session 2
#################################

Objectives of security monitoring: checking site compliance with security
standards (to be defined?), and detecting grid security incidents.

Three types of monitoring:
- SAM security tests: cover all sites, but are limited (in scope, and no root
permission)
- Select useful tools and ask the sites to run them: the "volunteer approach"
will not monitor the sites we want to monitor, and the "forced approach" is too
problematic
- Integrated in the middleware: needs to be maintained/tested/certified,
probably not affordable for the OSCT

A list of the risks, as well as exactly what parameters should be monitored,
should be prepared to check what approaches would be suitable. Probably a mix of
the three is needed, but it is important to establish priority.

At least 24 PM more (in addition to the offer from CE to coordinate) should be
allocated there.

We should investigate the possibility of preparing a security audit tool to be
used by the ROCs and run at the sites. Results could/would then be gathered
centrally.
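
To make the "gathered centrally" idea more concrete, here is a minimal sketch of
the central side. It assumes each site run produces a list of records with the
hypothetical keys 'site', 'check' and 'passed'; neither this record format nor
the check names come from an existing tool.

```python
from collections import defaultdict


def summarise_audit_results(reports):
    """Group failed checks by site.

    `reports` is an iterable of dicts with the hypothetical keys
    'site', 'check' and 'passed'.
    """
    failures = defaultdict(list)
    for report in reports:
        if not report["passed"]:
            failures[report["site"]].append(report["check"])
    # Sort sites and checks so the summary is stable and easy to read.
    return {site: sorted(checks) for site, checks in sorted(failures.items())}


if __name__ == "__main__":
    # Hypothetical example records.
    example = [
        {"site": "site-a", "check": "crl-up-to-date", "passed": True},
        {"site": "site-a", "check": "central-syslog", "passed": False},
        {"site": "site-b", "check": "crl-up-to-date", "passed": False},
    ]
    print(summarise_audit_results(example))
```

Sites with no failed check do not appear in the summary, which keeps the
central view focused on what needs follow-up by the ROC.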

Again, monitoring and traceability are key activities for the team in EGEE-III.

#################################
# RTIR: presentation/usage
#################################

Is this suitable for us to use to deal with security incidents?

Overall feedback: it is not clear that there will be benefits apart from keeping
a history/statistics. A test installation will be made available to get a more
in-depth understanding of the tool.


#################################
# Training and dissemination
#################################

Should improve the structure of the information and obtain more grid-specific
information, per node type.

Proposed structure:
- OSCT Webpage: Should follow the work-list of sysadmin tasks:

    - Installation/configuration of Linux system
        -> "recycle" items from the RSS feed

    - Installation/configuration of grid services
        -> Pointer to this: a framework for experts/developers
        to provide security recommendations per node type on common grid
        services

    - Procedures: containing user jobs, how to block a user, important log
        files, incident handling, IR procedures, log retention, etc.
        -> Needs to be done.

RSS feed would cover these 3 parts.

Carlos: self security audit should be done by the sites

#################################
# AOB
#################################

* xrootd/alice: Currently, in production, anonymous write (but no deletion) is
possible if the surl is known.

The surl can be obtained via the catalogue, access to which is authorised.
Valid surls pointing to alice data may be difficult (but NOT impossible) to
guess. An attacker could also write to other locations, possibly
causing damage to the service.

This is a violation of the logging and traceability policy, which must be
addressed urgently by the VO.
The VO confirmed the issue is actively being worked on, and a solution based on
X509 credentials is prepared, but is not yet available in production (hopefully
in the coming weeks).

* Vulnerability when mapping a whole VO to a single uid account:
Mapping several users simultaneously to the same uid account means any user can
access credentials from other users, at least on the WNs. This is a clear
violation of the logging and traceability policy and therefore must not be
implemented at the sites.
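
A site could check its own static mappings for this situation. The sketch below
assumes a mapping list in the `"DN" account` style, one entry per line; the
function name and the example DNs are hypothetical. Entries whose account
starts with a dot are treated here as references to a pool of accounts rather
than a single uid, and are skipped; this is an assumption of the sketch and
should be adapted to the local setup.

```python
import shlex
from collections import defaultdict


def find_shared_accounts(mapfile_lines):
    """Return {account: [DNs]} for local accounts mapped to more than one DN."""
    users_by_account = defaultdict(list)
    for line in mapfile_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        fields = shlex.split(line)  # handles the quoted DN
        if len(fields) != 2:
            continue  # skip malformed lines in this sketch
        dn, account = fields
        if account.startswith("."):
            continue  # assumption: pool reference, not a single shared uid
        users_by_account[account].append(dn)
    return {acct: dns for acct, dns in users_by_account.items() if len(dns) > 1}


if __name__ == "__main__":
    # Hypothetical entries.
    example = [
        '"/C=CH/O=Example/CN=Alice" vo001',
        '"/C=CH/O=Example/CN=Bob" vo001',
        '"/C=CH/O=Example/CN=Carol" carol',
    ]
    print(find_shared_accounts(example))
```

Any account reported by such a check is one where jobs from different users run
under the same uid and therefore cannot be told apart in the local logs.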

* Central syslog: An operational notice should be prepared to require a "central
syslog service or equivalent". Some documentation should be provided.
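
As a small illustration of what "central syslog" means for a service, the
sketch below uses the Python standard library to forward log records to a
remote syslog server over UDP. The host name, port and logger name are
placeholders, and the function is only an example, not a recommended
configuration.

```python
import logging
import logging.handlers


def make_central_logger(name, host, port=514):
    """Return a logger that forwards its records to a remote syslog
    server over UDP (host and port are placeholders)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        facility=logging.handlers.SysLogHandler.LOG_AUTH,
    )
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # "loghost.example.org" is a hypothetical central log server.
    log = make_central_logger("grid-service", "loghost.example.org")
    log.info("job accepted")
```

In practice the forwarding would more likely be configured in the syslog
daemon of each node, so that all local logs are covered; the point is only
that a copy of the logs leaves the machine as events happen.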

* Log retention:
90 days for all services, 180 days for core services, or 180 days for all services?

Impact/cost is not clear for big sites. Romain will check the situation at CERN,
 assess the cost of such change, and report back on the OSCT list.

(In case some text is produced, it should include the estimated cost of storage.)
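
A rough way to compare the options is to multiply the daily log volume by the
retention period. The sketch below does this for three readings of the
question; the option labels, the split between core and other services, and
the daily volumes are assumptions for illustration only, not measurements from
any site.

```python
# Retention options as (days for core services, days for other services).
# The middle option is one possible reading of the question above.
OPTIONS = {
    "90 days for all services": (90, 90),
    "180 days for core services, 90 for the rest": (180, 90),
    "180 days for all services": (180, 180),
}


def storage_gb(core_gb_per_day, other_gb_per_day, core_days, other_days):
    """Storage needed to keep logs for the given retention periods."""
    return core_gb_per_day * core_days + other_gb_per_day * other_days


def compare_options(core_gb_per_day, other_gb_per_day):
    """Return {option label: storage in GB} for the daily volumes given."""
    return {
        label: storage_gb(core_gb_per_day, other_gb_per_day, core_days, other_days)
        for label, (core_days, other_days) in OPTIONS.items()
    }


if __name__ == "__main__":
    # Hypothetical volumes: 1 GB/day from core services, 4 GB/day from the rest.
    for label, gigabytes in compare_options(1, 4).items():
        print(f"{label}: {gigabytes} GB")
```

With these hypothetical volumes the three options need 450 GB, 540 GB and
900 GB respectively, which shows that extending retention for core services
only is much cheaper than extending it for everything when core services
produce a small share of the logs.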

* Next meeting
- Short operational weekly meeting (will start early April, exact date to be
confirmed)
- The meeting will be longer once a month (first meeting of the month) to
review progress reports from each partner.
  • Tuesday, 18 March
    • 09:00 10:30
      Introduction: status, progress and issues

      Quick review of progress made and issues faced by each ROC in general and on the actions from last meeting:
      https://twiki.cern.ch/twiki/bin/view/LCG/PendingActions

      Each ROC is asked to produce a status report on progress since the last meeting.

      ROC-FR progress report (pdf)
      ROC-FR progress report (ppt)
    • 10:30 11:00
      Coffee break 30m
    • 11:00 12:30
      Security Service Challenges
      slides
      slides
      slides
      slides
    • 12:30 14:00
      Lunch 1h 30m
    • 14:00 15:30
      Security in the regions: presentation from three regions: France, Italy, Russia

      How is security organised in the region?
      How are the security information flows structured?
      Issues, progress, success?

      ROC-FR
      ROC-Russia
      slides
    • 15:30 16:00
      Coffee break 30m
    • 16:00 17:30
      EGEE-III

      How will EGEE-III affect the team?
      What level of support will be available from the ROCs?
      How can the ROC influence the security strategy?
      How can we spread the workload?

  • Wednesday, 19 March
    • 09:00 10:30
      Monitoring

      Brainstorming: SAM usage, Nagios, other tools

      slides
      slides
      slides
    • 10:30 11:00
      Coffee break 30m
    • 11:00 12:30
      Monitoring

      Task identification and allocation

    • 12:30 14:00
      Lunch 1h 30m
    • 14:00 14:30
      RTIR and incident management
      slides
      slides
    • 14:30 15:30
      Training and dissemination

      Priorities, task allocation and next event

    • 15:30 16:00
      Coffee break 30m
    • 16:00 17:30
      (CANCELED: Interaction and feedback to and from the MWSG) REPLACED BY AOB

      Alice usage of xrootd, logs retention extension proposal, next meeting, and AOB