12–16 Oct 2015
Brookhaven National Laboratory
America/New_York timezone

Accelerating High Performance Cluster Computing Through the Reduction of File System Latency

14 Oct 2015, 17:10
20m
Bldg. 510 - Physics Department Large Seminar Room (Brookhaven National Laboratory)

Upton, NY 11973
Storage & Filesystems

Speaker

Mr David Fellinger (DataDirect Networks, Inc)

Description

The acceleration of high performance computing applications in large clusters has primarily been achieved with a focus on the cluster itself. Lower latency interconnects, more efficient message passing structures, higher performance processors, and general purpose graphics processing units have been incorporated in recent cluster designs. There has also been a great deal of study of processing techniques, such as symmetric multi-processing versus efficient message passing, to accomplish true parallel processing. There have been, however, only incremental changes in parallel file system technology. Clusters perform input/output operations through gateway servers, and a file creation implies locking operations in all parallel file systems. In fact, a file creation is a serial process that locks and assigns V-nodes, I-nodes and extent lists through one server to complete the operation. For years, web users have explored parallel methods of moving data to get around network connection limitations. Applications such as Napster and BitTorrent have used Distributed Hash Table technology to allow true parallel file operations, in which “pieces” can be placed on, or gathered from, a number of service nodes arranged in a redundant fashion. This paper will explore the use of Distributed Hash Table technology to service the data needs of a large scale cluster, allowing the same parallelism in data mobility as is assumed in processing. This new paradigm will displace the concept of the gateway server and will allow data intensive operations in a “non-blocking” construct.
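
The piece placement idea described in the abstract can be illustrated with a minimal sketch. It is not the speaker's implementation: it assumes a consistent-hash ring as the Distributed Hash Table lookup, and the names `HashRing`, `owners`, `place_pieces`, the node labels, the piece size, and the replica count are all hypothetical. Each piece of a file is hashed independently to a set of service nodes, so any client can compute where a piece belongs without consulting a gateway or metadata server.

```python
import hashlib
from bisect import bisect_right


def _hash(key: str) -> int:
    """Map a string key to a 64-bit position on the ring (SHA-1 is an arbitrary choice)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring standing in for a DHT lookup."""

    def __init__(self, nodes, points_per_node=64):
        # Each node gets several points on the ring to even out the load.
        self.ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self.points = [point for point, _ in self.ring]
        self.node_count = len(set(nodes))

    def owners(self, key, replicas=2):
        """Return up to `replicas` distinct nodes responsible for `key`."""
        replicas = min(replicas, self.node_count)
        i = bisect_right(self.points, _hash(key))
        found = []
        while len(found) < replicas:
            node = self.ring[i % len(self.ring)][1]
            if node not in found:
                found.append(node)
            i += 1
        return found


def place_pieces(ring, file_name, data, piece_size=4, replicas=2):
    """Split `data` into pieces and compute the owner nodes of each piece.

    No lock or central server is involved: the placement of every piece
    is derived from its key alone, so pieces can be written in parallel.
    """
    placement = {}
    for index, _offset in enumerate(range(0, len(data), piece_size)):
        placement[index] = ring.owners(f"{file_name}:{index}", replicas)
    return placement


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
    for piece, nodes in place_pieces(ring, "out.dat", b"0123456789").items():
        print(piece, nodes)
```

Because every client derives the same placement from the piece key, readers and writers can contact the owning nodes directly and in parallel; the replica list gives the redundant arrangement the abstract mentions.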

Author

Mr David Fellinger (DataDirect Networks, Inc)
