CERN Computing Colloquium

Towards an Exabyte File System

by James Hughes (Huawei)

Salle Bohr 40/S2-B01 (CERN)

James Hughes, Linda Schneider, Pat Patterson, Yang Deping
Huawei Technologies

The demand for storage has proven to be insatiable. One expects that this simply follows market economics: when the price decreases, the demand increases. Storage device prices have decreased by six orders of magnitude over the last 30 years, and the perceived demand is still increasing with no sign of abating. While the price of storage for the individual PC has dropped, the same has not been true for large-scale POSIX file systems or large SQL databases. The challenge of current file systems and databases is that, as the solutions scale up, the price per GB also increases. This antithesis of the economies of scale occurs because the complexity of the system increases at scale. There must be a better solution.

This presentation will cover the trend of relaxing the requirements of POSIX and SQL, fundamentally allowing scale. We also address the realization of this vision by solving the issues of linear operational complexity and archival quality while retaining near-asymptotic low cost. Linear operational complexity requires the complexity of a single unit to be bounded, such that the complexity is the same regardless of whether the unit is the first or the millionth. A 1 EB file system needs to be "archive quality", since the concept of backing up such a system is unthinkable. At 1 EB, the cost of the storage devices themselves is significant, and adding additional cost beyond the storage devices becomes problematic.

The solution that will be presented is the use of a highly scaled DHT (e.g. Cassandra) as a universal, scalable, reliable storage layer for all storage needs at the same time.
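As a rough illustration of the idea of a DHT serving as a replicated storage layer beneath a storage adaptation, the following is a minimal in-memory sketch. It is not Cassandra's implementation and not the system presented in the talk; the names `ToyDHT` and `FileAdapter`, the use of MD5 to place keys on the ring, the replica count of 3 and the chunking scheme are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on the ring (a 128-bit integer from MD5)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ToyDHT:
    """Illustrative consistent-hash ring; each 'node' is just a dict."""

    def __init__(self, node_names, replicas=3, vnodes=64):
        self.replicas = min(replicas, len(node_names))
        self.nodes = {name: {} for name in node_names}
        # Each node owns several points ("virtual nodes") on the ring.
        self.ring = sorted(
            (_hash(f"{name}#{i}"), name)
            for name in node_names
            for i in range(vnodes)
        )
        self.points = [point for point, _ in self.ring]

    def owners(self, key):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        start = bisect.bisect_right(self.points, _hash(key))
        found = []
        for step in range(len(self.ring)):
            name = self.ring[(start + step) % len(self.ring)][1]
            if name not in found:
                found.append(name)
                if len(found) == self.replicas:
                    break
        return found

    def put(self, key, value):
        # Reliability lives here, below any adaptation: write every replica.
        for name in self.owners(key):
            self.nodes[name][key] = value

    def get(self, key):
        # Any replica may answer; a node that lost the key is skipped.
        for name in self.owners(key):
            if key in self.nodes[name]:
                return self.nodes[name][key]
        raise KeyError(key)


class FileAdapter:
    """Hypothetical file-style adaptation layered on the key-value store."""

    def __init__(self, store, chunk_size=4):
        self.store = store
        self.chunk_size = chunk_size

    def write(self, path, data: bytes):
        chunks = [data[i:i + self.chunk_size]
                  for i in range(0, len(data), self.chunk_size)]
        for index, chunk in enumerate(chunks):
            self.store.put(f"{path}/chunk/{index}", chunk)
        self.store.put(f"{path}/meta", len(chunks))

    def read(self, path) -> bytes:
        count = self.store.get(f"{path}/meta")
        return b"".join(
            self.store.get(f"{path}/chunk/{i}") for i in range(count)
        )
```

In this sketch the adapter knows nothing about replication or node placement; it only calls `put` and `get`. Other adaptations (a queue, a table store) could sit on the same `ToyDHT` in the same way, which is the layering the abstract describes.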
That is, a NoSQL key-value store serves as the storage layer below a variety of storage adaptations such as file systems (similar to Amazon S3), NoSQL databases (similar to Amazon SimpleDB), data analytics (similar to Hadoop or Amazon EMR), volume-level services (similar to Amazon's EBS) and even middleware applications like message queue services (similar to Amazon SQS); the list goes on. This places the reliability below the storage API adaptation layer, which fundamentally changes the way storage is scaled. This contrasts with today, where all storage adaptations live in different instantiations of the same paradigm of sector-addressable storage and are scaled up through RAID controllers, SANs and other networking technology.

This presentation will look at the system from the application layer, abandoning the older, less scalable APIs like POSIX and maybe even SQL, through a newer generation of APIs that manage persistent data, down into a scalable, reliable storage system that gets faster as it gets larger. Reliability will be discussed from low-level read/write errors to disaster recovery scenarios, as well as complete hardware lifecycle management. This architecture will be discussed using a simplified model of storing and processing a massive amount of data (similar to what CERN is producing and processing).

Bio: James Hughes

James Hughes is a fellow at Huawei's US technology center and is the lead of the Huawei US Cloud Computing team. Previously, he was a Sun Fellow at Sun Microsystems, where he was the Chief Technologist for the Solaris Operating System. Over the past 30 years he has also been a fellow at Storage Technology Corporation (large storage systems) and Network Systems Corporation (HYPERchannel, HIPPI and supercomputing networks). James has over 20 patents and another 20 patent applications, and is the chair of the IEEE Information Assurance standards organization and the IEEE Technical Committee on Computer Elements.
James is an adjunct professor at both Peking University in China and Indiana University in the United States.
Organized by

Bob Jones, IT Department