Hops Hadoop and Q&A with Visiting Guest Speaker Jim Dowling

Name: Hops Hadoop and Q&A with Visiting Guest Speaker Jim Dowling
Start: 2018-04-16T15:00:00+02:00
End: 2018-04-16T16:30:00+02:00
Location: CERN

Monday 16 Apr 2018, 15:00 → 16:30 Europe/Zurich

513/1-024 (CERN)


- 15:00 → 16:00
  
  Hops Hadoop, Hopsworks and Q&A with Guest Speaker 1h
  
  This session follows up on the morning Computing seminar; see https://indico.cern.ch/event/716743/
  
  Hops is a drop-in replacement for Hadoop that can scale the Hadoop Distributed File System (HDFS) to over 1 million ops/s by migrating the NameNode metadata to an external scale-out in-memory database. This talk will introduce recent improvements in HopsFS: storing small files in the database (in both in-memory tables and on-disk tables on SSD), a new scalable block-reporting protocol, support for erasure coding with data locality, and work on multi-data-center replication. For small files (under 64-128 KB), HopsFS can reduce read latency to under 10 ms, while also improving read throughput by 3-4X and write throughput by more than 15X. Our new block-reporting protocol reduces block-reporting traffic by up to 99% for large clusters, at the cost of a small increase in metadata. Our erasure-coding solution is implemented at the block level, preserving data locality. Finally, our ongoing work on geographic replication points a way forward for HDFS in the cloud, providing data-center-level high availability without any performance hit.
  One novel aspect of Hops we will discuss is its use of TLS certificates as an alternative to Kerberos for authentication and authorization. Apart from the better scalability of certificate managers compared to the Kerberos KDC, certificates make it possible to support multi-tenancy and to integrate more easily with devices and clients in external administrative domains. We will also discuss operational support for Hops and how it supports new features such as Anaconda, Spark, Hive, and TensorFlow.
  
  Speaker: Dr Jim Dowling (KTH Royal Institute of Technology in Stockholm)
  
  CERN-data-engineering-apr-2018.pdf
  
  CERN-data-engineering-apr-2018.pptx
- 16:00 → 16:30
  
  Q&A 30m
  
  Q&A on the topics presented in this session and in the morning Computing seminar
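The abstract states that the new HopsFS block-reporting protocol cuts block-report traffic by up to 99% but does not describe how. One general way to get reductions of that order is a hash-bucketed incremental report: the DataNode sends one digest per bucket of blocks, and only buckets whose digest disagrees with the NameNode's view are re-sent in full. The sketch below illustrates that general idea only. It is an assumption, not the HopsFS implementation, and the names `NUM_BUCKETS`, `bucket_hashes`, `stale_buckets`, and `blocks_in` are hypothetical.

```python
import hashlib
from collections import defaultdict

# Hypothetical bucket count; a real system would tune this to the cluster size.
NUM_BUCKETS = 1000


def bucket_of(block_id: int) -> int:
    """Assign a block to a bucket by its ID (illustrative choice of partitioning)."""
    return block_id % NUM_BUCKETS


def bucket_hashes(blocks):
    """Map bucket index -> SHA-256 digest of the sorted (block_id, gen_stamp) pairs in it."""
    buckets = defaultdict(list)
    for block_id, gen_stamp in blocks:
        buckets[bucket_of(block_id)].append((block_id, gen_stamp))
    digests = {}
    for bucket, entries in buckets.items():
        h = hashlib.sha256()
        for block_id, gen_stamp in sorted(entries):
            h.update(f"{block_id}:{gen_stamp};".encode())
        digests[bucket] = h.hexdigest()
    return digests


def stale_buckets(datanode_blocks, namenode_blocks):
    """Return the sorted bucket indices whose digests differ between the two views."""
    dn = bucket_hashes(datanode_blocks)
    nn = bucket_hashes(namenode_blocks)
    return sorted(b for b in set(dn) | set(nn) if dn.get(b) != nn.get(b))


def blocks_in(blocks, wanted_buckets):
    """Select only the blocks belonging to the given buckets (the part re-sent in full)."""
    wanted = set(wanted_buckets)
    return [blk for blk in blocks if bucket_of(blk[0]) in wanted]


if __name__ == "__main__":
    datanode = [(i, 1) for i in range(10_000)]
    namenode = list(datanode)
    namenode[42] = (42, 2)  # the NameNode holds a newer generation stamp for block 42
    stale = stale_buckets(datanode, namenode)
    resend = blocks_in(datanode, stale)
    print(stale, len(resend), "of", len(datanode), "blocks re-sent")
```

In this toy run only one bucket disagrees, so 10 of 10,000 blocks are re-sent instead of the full list. The trade-off is the extra per-bucket digest state, which loosely mirrors the "small increase in metadata" mentioned in the abstract.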
