WLCG Cloud Traceability Working Group F2F

31/S-028 (CERN)



Ian Peter Collier (STFC - Rutherford Appleton Lab. (GB))
Face to face meeting of Cloud Traceability working group. Reports on actions since last face to face meeting and discussion of next steps.
  • Claudio Grandi
  • Dave Dykstra
  • David Crooks
  • George Ryall
  • Ian Neilson
  • Ian Peter Collier
  • Maarten Litmaath
  • Raul Cardoso Lopes
  • Vincent Brillault

F2F 9 Jun 2015

09 June 2015 14:02

Minutes/Notes Cloud Traceability WG @CERN 9-Jun-2015
Notes taken by Ian Neilson.
https://indico.cern.ch/event/396202/
Action Areas

IanC: As per slides.

DavidC: Hypervisor & Netflow Logging. As per slides.

Note: lots of traction with ELK at HEPSYSMAN.
Next - testing: OpenSOC, NfSen, SiLK.

Software and hardware sources.

Looking for common solutions and investigating data rates.

Glasgow Cloud: hardware in and OpenStack underway. Will pursue netflow testing.

Comment from Raul:
Hardware from Cisco is expensive.
Brunel working on collection of logs (syslog-ng).
For open source from Cisco, start from Graylog.
Host sFlow project is an option.
Build prototype: collect syslog-ng -> Graylog & export to ?? (Lisp language) (chose Graylog because a plugin could be found).
Working on interfaces using Elasticsearch.
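
As a rough illustration of the syslog-ng -> Graylog path mentioned above, the sketch below builds a GELF 1.1 message (Graylog's JSON log format) from an already parsed log line. This is not the Brunel prototype: the function name, the sample host and the extra field names are hypothetical, and in practice forwarding would normally be done by syslog-ng or a Graylog input rather than custom code.

```python
import json
import time


def build_gelf_message(host, short_message, level=6, timestamp=None, **extra):
    """Return a GELF 1.1 payload as a JSON string.

    Extra keyword arguments become user-defined GELF fields, which must be
    prefixed with an underscore. Field names chosen by callers are illustrative.
    """
    payload = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "timestamp": time.time() if timestamp is None else timestamp,
        "level": level,  # syslog severity; 6 = informational
    }
    for key, value in extra.items():
        if key == "id":
            # GELF does not allow an "_id" field, so drop it rather than send it.
            continue
        payload["_" + key] = value
    return json.dumps(payload, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical hypervisor log event.
    print(build_gelf_message("hv01.example.org", "VM started", vm_uuid="abc"))
```

Keeping identifiers such as a VM UUID as separate fields, rather than inside the message text, is what would make the records searchable per VM later on.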

IanC : Use of capture cards?
Liviu: trying to stay away from dedicated hardware and try to use Bro initially
Liviu: Use nprobe to generate netflows from raw traffic
Raul: Question as to use of data, data segregation, and what to do with "non-grid" data or how to exclude it. Discussion of data protection/privacy compliance.
Maarten: Only an issue if there is some off-site/central service.
IanC: Needed for incident response. Netflow data falls more easily under "Personal Data" than central syslog does. Aggregation needs to be thought through.
Michel: First objective not aggregation but need for traceability at source site.
IanC: Aggregation necessary for tracing
David: How to treat non-grid data differently from grid data? Build this in from the start.
Liviu: Depends on network config, but it may be necessary to filter on IP.
Raul: Brunel may have to change its Security Policy to gather data.
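
The idea of filtering on IP to keep grid and non-grid data apart can be sketched as follows. This is only an illustration under assumed conditions: the subnets are documentation ranges standing in for a site's real grid networks, and the flow record layout (a dict with "src" and "dst") and function names are hypothetical rather than any tool's actual export format.

```python
import ipaddress

# Hypothetical example ranges; a real site would substitute its own subnets.
GRID_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8:100::/48"),
]


def is_grid_address(address, networks=GRID_NETWORKS):
    """Return True if the address lies inside one of the grid subnets."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in networks)


def split_flows(flows, networks=GRID_NETWORKS):
    """Separate flow records into (grid, non_grid) lists.

    A flow counts as grid traffic if either endpoint is in a grid subnet.
    """
    grid, non_grid = [], []
    for flow in flows:
        if is_grid_address(flow["src"], networks) or is_grid_address(flow["dst"], networks):
            grid.append(flow)
        else:
            non_grid.append(flow)
    return grid, non_grid


if __name__ == "__main__":
    sample = [
        {"src": "192.0.2.10", "dst": "198.51.100.5"},
        {"src": "203.0.113.7", "dst": "198.51.100.5"},
    ]
    grid, non_grid = split_flows(sample)
    print(len(grid), "grid flow(s),", len(non_grid), "non-grid flow(s)")
```

Whether non-grid records are then dropped at collection time or stored separately is the policy question raised in the discussion; the filter itself is the same in both cases.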

IanC: Log management tools
RAL building ELK infrastructure (on back of CASTOR work); new hardware etc. is taking longer than planned. Next step is to send logs from the cloud into that.
IBM NY Research - rolling demo of Cloud Sec product, a la OpenSOC. Interesting machine learning approach.
James Adams' presentation at HEPSYSMAN on low-level Logstash tuning.
Raul: volume numbers? Ref: Liviu

IanC: Security Service Challenges for clouds
Sven planning in context of EGI. 

Vincent: definitely gaps in EGI-Cloud

IanC: Quarantine/Deferred deletion
Condor makes it easy. Working on building it into the storage service - hard because of all the hooks needed/not clean - maybe go to the OpenNebula developers.
Tim: Images or instances? A: Images.
Would be good to do same for OpenStack. David: On the plan.
Tim: OpenStack has the functions to defer deletion, but it causes resource problems and the chain to do it is not there.
Michel: Not framework (StratusLab, OpenStack etc) but at libvirt level. Ask developers?
Tim: 300-400 images/hr on CERN production cloud, so problems with volume (especially with local storage on hypervisors). Upstream would probably be happy to talk about this.
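
A minimal sketch of the quarantine/deferred deletion idea, assuming images are plain files on a local filesystem: instead of deleting an image, move it into a quarantine directory, and purge it only once a retention period has passed. The function names, the directory layout and the timestamp-prefix naming are all hypothetical; none of this reflects how Condor, OpenNebula or OpenStack actually implement it.

```python
import os
import shutil
import time


def quarantine_image(image_path, quarantine_dir, now=None):
    """Move an image into quarantine instead of deleting it; return the new path."""
    now = time.time() if now is None else now
    os.makedirs(quarantine_dir, exist_ok=True)
    # Prefix with the quarantine time so expiry does not depend on file mtime.
    name = "%d_%s" % (int(now), os.path.basename(image_path))
    target = os.path.join(quarantine_dir, name)
    shutil.move(image_path, target)
    return target


def purge_expired(quarantine_dir, retention_seconds, now=None):
    """Delete quarantined images older than the retention period.

    Returns the list of file names that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(quarantine_dir)):
        stamp, _, _rest = name.partition("_")
        if not stamp.isdigit():
            continue  # not created by quarantine_image; leave it alone
        if now - int(stamp) >= retention_seconds:
            os.remove(os.path.join(quarantine_dir, name))
            removed.append(name)
    return removed
```

The retention period drives the storage cost: at the quoted 300-400 images/hr, an assumed 24-hour retention would hold roughly 7,200-9,600 images at any time, which is the volume problem raised above.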

DaveD: observing on behalf of OSG security team.
IanC: good, keen not to duplicate effort.
DaveD: more people investigating commercial clouds. Discussion on risk against commercial providers?
Romain: Netflix published entire stack.
Tim: Have open sourced.
Romain: convergence on OpenSOC (+Bro) unique/interesting.


Liviu: Security Operations Centres. Presentation slides.

Haven't pushed all the data to Hadoop yet. Plan to scale out.
Bro is not written in Java but in C, because of efficiency concerns.
Solr has high memory requirements; dedicated machines recommended.
Raul: question over the quality of jars in distributions?
Romain: cannot see other solution, is necessary.
Raul: not only useful for cloud traceability but general problem solving compared to static (nagios) monitoring
David/Vincent: good traction with groups using ELK for other purposes; it will be a small step to use it for security.
Michel/Liviu/DavidC: Will share tech/experience with outside community. Definite collaboration on intelligence feeds.
Raul/Liviu: Not (yet) looked at machine learning. Have looked at Spark for data centre data (temperature etc.), and at machine learning for partial data set modelling.
DavidC: would normal site monitoring and security be separated?
Liviu: Not much overlap between Nagios/Ganglia and OpenSOC.
Vincent: Issues with log merging may be how long logs are stored and ACLs. Elasticsearch does not have many ACL modules (at least free ones).

IanC: Close up

  • Carry actions from previous meetings.

  • Direct experience with OpenSOC would be good in addition to CERN project. Will look at RAL effort. DavidC interested. Romain: it is necessary, but the only alternative to commercial solutions is to get involved in home-grown solutions.

  • IanC: try to use SSCs not just against the federated cloud.

  • Investigate traceability in Vac. DavidC interested in following this up.

    • 14:00 14:15
      Introduction 15m
      Speaker: Ian Peter Collier (STFC - Rutherford Appleton Lab. (GB))
    • 14:15 14:30
      Hypervisor & Netflow Logging 15m
      Speaker: David Crooks (University of Glasgow (GB))
    • 14:30 14:45
      More on hypervisor & net flows 15m
      Speaker: Raul Cardoso Lopes (Brunel University (GB))
    • 14:45 15:00
      Log management 15m
      Speaker: Ian Peter Collier (STFC - Rutherford Appleton Lab. (GB))
    • 15:00 15:15
      Security Challenges & Image Quarantine 15m
      Speaker: Ian Peter Collier (STFC - Rutherford Appleton Lab. (GB))
    • 15:15 15:45
      Coffee Break 30m
    • 15:45 16:30
      Security Operations Centres 45m
      Speaker: Liviu Valsan (CERN)
    • 16:30 17:30
      Discussion 1h
    • 17:30 17:50
      Summary & Actions 20m
      Speaker: Ian Peter Collier (STFC - Rutherford Appleton Lab. (GB))