CERN Computing Seminar

CAPES: Unsupervised Storage Performance Tuning Using Neural Network-Based Deep Reinforcement Learning

by Prof. Ethan L. Miller (University of California, Santa Cruz)

31-3-004 - IT Amphitheatre (CERN)




Parameter tuning is an important part of storage performance optimization. Current practice usually involves many tweak-benchmark cycles, which are slow and costly. To address this issue, we developed CAPES, a model-less, unsupervised parameter tuning system based on deep reinforcement learning and driven by a deep neural network (DNN). It is designed to find the optimal values of tunable parameters in computer systems, ranging from a simple client-server system to a large data center, where human tuning can be costly and often fails to achieve optimal performance. CAPES periodically measures a target computer system's state and trains a DNN, using Q-learning, to suggest changes to the system's current parameter values. CAPES is minimally intrusive and can be deployed into a production system to collect training data and suggest tuning actions during the system's daily operation. Evaluation of a prototype on a Lustre system shows an increase in I/O throughput of up to 45% at the saturation point.
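The abstract gives no implementation details, so the following is only a minimal sketch of the Q-learning loop it describes, under several assumptions. It swaps CAPES's DNN for a plain lookup table (tabular Q-learning), and the tuned system is a hypothetical toy: one integer parameter in the range 0 to 9 whose simulated throughput peaks at 6. The names `measure_throughput`, `step`, `q_update`, `train`, and `suggest_action`, the reward (measured throughput after each action), and the learning rate and discount factor are illustrative choices, not part of CAPES.

```python
import random

# Hypothetical toy "system": one tunable integer parameter in [0, 9].
# The throughput curve is invented for illustration; a real deployment
# would measure throughput on the running system instead.
PARAM_MIN, PARAM_MAX = 0, 9
ACTIONS = (-1, 0, +1)  # decrease, keep, or increase the parameter


def measure_throughput(param: int) -> float:
    """Simulated throughput that peaks at param == 6 (assumed curve)."""
    return 100.0 - 2.0 * (param - 6) ** 2


def step(param: int, action: int) -> tuple[int, float]:
    """Apply an action, clamp to the valid range, return (new_param, reward)."""
    new_param = min(PARAM_MAX, max(PARAM_MIN, param + action))
    return new_param, measure_throughput(new_param)


def q_update(q, state, action, reward, next_state, alpha, gamma):
    """One Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (target - q[(state, action)])


def train(steps=20000, alpha=0.5, gamma=0.5, seed=0):
    """Learn a Q-table from a stream of (state, action, reward, next_state)."""
    rng = random.Random(seed)
    # A lookup table stands in for the DNN that CAPES trains.
    q = {(s, a): 0.0 for s in range(PARAM_MIN, PARAM_MAX + 1) for a in ACTIONS}
    state = rng.randint(PARAM_MIN, PARAM_MAX)
    for _ in range(steps):
        # Q-learning is off-policy: exploring with uniformly random actions
        # still lets the table converge toward the optimal Q-values.
        action = rng.choice(ACTIONS)
        next_state, reward = step(state, action)
        q_update(q, state, action, reward, next_state, alpha, gamma)
        state = next_state
    return q


def suggest_action(q, state):
    """Greedy tuning suggestion: the action with the highest learned Q-value."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


if __name__ == "__main__":
    q = train()
    param = 0
    for _ in range(10):
        param, _ = step(param, suggest_action(q, param))
    print("tuned parameter:", param)
```

Starting from a poorly tuned value of 0, the greedy suggestions walk the parameter up to the throughput peak and then keep it there. A table works here only because the toy state is a single small integer; when the state is a high-dimensional vector of system measurements, as in CAPES, the table cannot enumerate it, which is where a neural network approximating Q(s, a) comes in.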

About the speaker

Prof. Miller is the Director of the NSF I/UCRC Center for Research in Storage Systems (CRSS) and the Associate Director of the Storage Systems Research Center at UC Santa Cruz, where he explores issues in file and storage systems and, more generally, distributed systems and operating systems. His current projects include file systems for next-generation storage technologies, archival storage systems, file system security, scalable file system indexing, and exascale storage systems. He is also interested in storage system benchmarks, algorithms for managing storage more efficiently, information retrieval from very large text and multimedia corpora, and other problems in computer systems and security.
