Hops Hadoop and Q&A with Visiting Guest Speaker Jim Dowling
Monday 16 April 2018, 15:00
Hops Hadoop, Hopsworks and Q&A with Guest Speaker: Jim Dowling (KTH Royal Institute of Technology in Stockholm)
15:00 - 16:00
Room: 513/1-024
This session follows up on the morning Computing seminar (see https://indico.cern.ch/event/716743/). Hops is a drop-in replacement for Hadoop that can scale the Hadoop Distributed File System (HDFS) to over 1 million ops/s by migrating the NameNode metadata to an external, scale-out, in-memory database. This talk will introduce recent improvements in HopsFS: storing small files in the database (in both in-memory and on-disk SSD tables), a new scalable block-reporting protocol, support for erasure coding with data locality, and work on multi-data-center replication. For small files (under 64-128 KB), HopsFS can reduce read latency to under 10 ms while also improving read throughput by 3-4x and write throughput by more than 15x. Our new block-reporting protocol reduces block-reporting traffic by up to 99% for large clusters, at the cost of a small increase in metadata. Our erasure-coding solution is implemented at the block level, which preserves data locality. Our ongoing work on geographic replication points the way forward for HDFS in the cloud, providing data-center-level high availability without a performance penalty. One novel aspect of Hops we will discuss is its use of TLS certificates as an alternative to Kerberos for authentication and authorization. Apart from the better scalability of certificate managers compared to the Kerberos KDC, certificates make it possible to support multi-tenancy and simplify integration with devices and clients in external administrative domains. Finally, we will discuss operational support for Hops and how it supports tools and frameworks such as Anaconda, Spark, Hive, and TensorFlow.
Q&A
16:00 - 16:30
Room: 513/1-024
Q&A on the topics presented in this session and in the morning Computing seminar.