Hops Hadoop and Q&A with Visiting Guest Speaker Jim Dowling

513/1-024 (CERN)



    • 3:00 PM - 4:00 PM
      Hops Hadoop, Hopsworks and Q&A with Guest Speaker 1h

      This session follows up on the morning Computing seminar; see https://indico.cern.ch/event/716743/

      Hops is a drop-in replacement for Hadoop that can scale the Hadoop Filesystem (HDFS) to over 1 million ops/s by migrating the NameNode metadata to an external scale-out in-memory database. This talk will introduce recent improvements in HopsFS: storing small files in the database (in both in-memory and on-disk SSD tables), a new scalable block-reporting protocol, support for erasure coding with data locality, and work on multi-data-center replication. For small files (under 64-128 KB), HopsFS can reduce read latency to under 10 ms, while also improving read throughput by 3-4X and write throughput by >15X. Our new block-reporting protocol reduces block-reporting traffic by up to 99% for large clusters, at the cost of a small increase in metadata. Our solution for erasure coding is implemented at the block level, which preserves data locality. Finally, our ongoing work on geographic replication points to a way forward for HDFS in the cloud, providing data-center-level high availability without any performance hit.
      One novel aspect of Hops we will discuss is its use of TLS certificates as an alternative authentication/authorization mechanism to Kerberos. Apart from the improved scalability of certificate managers compared to the Kerberos KDC, certificates offer the ability to support multi-tenancy and easier integration with devices/clients in external administrative domains. Finally, we will discuss operational support for Hops, and how it supports new features such as Anaconda, Spark, Hive, and TensorFlow.

      Speaker: Dr Jim Dowling (KTH Royal Institute of Technology in Stockholm)
    • 4:00 PM - 4:30 PM
      Q&A 30m

      Q&A on the topics presented in this session and in the morning Computing seminar