In November 2018, running on a mere half-rack of ordinary SuperMicro servers, WekaIO's Matrix Filesystem outperformed 40 racks of specialty hardware on Oak Ridge National Laboratory's Summit system, taking the #1-ranked result in the IO-500 10-Node Challenge. How is that even possible?
This level of performance matters for modern use cases, whether they involve GPU-accelerated servers for artificial intelligence and deep learning or traditional CPU-based servers at massive scale. Teams of researchers and data scientists should be free to focus on their work rather than lose precious time waiting on results delayed by IO bottlenecks. One example use case within HEP where this technology may be most valuable is the production of pre-mixing libraries in experiments like CMS. CMS currently uses a 600 TB "library" to simulate overlapping proton-proton collisions during its simulation campaigns. Producing this library is an IO-limited workflow on every filesystem in use within the experiment today.
In this tech-talk, the architecture of the Matrix filesystem will be put under the microscope and discussed in depth. The talk will include real-world examples of data-intensive workloads, along with a variety of benchmark results that demonstrate the filesystem's versatility and ability to scale.