Powering TensorFlow with Big Data: A look at Apache Beam & Apache Spark for TensorFlow
This talk will explore how to use TensorFlow in conjunction with Apache Spark, Flink, and Beam to create a full machine learning pipeline, including the feature engineering and data prep components that many might prefer to pretend don't exist. It will look at tools like TensorFlowOnSpark and TFX / TFT / TFDV, and at the challenges of integrating data prep into serving. We'll wrap up by examining changing industry trends, like Apache Arrow, and how they impact cross-language development for things like deep learning.
About the speaker
Holden is a transgender Canadian open source developer advocate at Google with a focus on Apache Spark, Airflow, and related "big data" tools. She is the co-author of Learning Spark, High Performance Spark, and another Spark book that's a bit more out of date. She is a committer and PMC member on Apache Spark and a committer on the SystemML and Mahout projects. She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal.
Luca Canali and Maria Girone, CERN openlab
CERN Computing Seminars and Colloquia