Deep Storage for Big Scientific Data

Apr 14, 2015, 5:30 PM
C209



Oral presentation, Track 3: Data store and access (Track 3 Session)


David Yu (BNL)


The RHIC and ATLAS Computing Facility (RACF) at Brookhaven National Laboratory (BNL) supports science experiments such as RHIC, as its Tier-0 center, and U.S. ATLAS/LHC, as a Tier-1 center. The volume of scientific data continues to grow exponentially with each upgrade. The RACF currently manages over 50 petabytes of data on robotic tape libraries, and we expect a 50% increase in data next year. Not only do we have to archive high-bandwidth data to tape efficiently, but we also have to handle restoring files from tape in random order. In addition, we have to manage tape resource usage and technology migration, that is, moving data from low-capacity media to newer, high-capacity tape media in order to free space within a tape library. BNL's mass storage system is managed by IBM's High Performance Storage System (HPSS). To restore files from HPSS, we have developed file retrieval scheduling software called TSX. TSX provides dynamic HPSS resource management, schedules jobs efficiently, and offers visibility into real-time staging activities along with advanced error handling, all to maximize tape staging performance.
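
The abstract does not describe how TSX orders its work, so the following is only a minimal sketch of one common approach to the random-restore problem: group pending requests by tape cartridge and read each tape in position order, so that every mount serves as many files as possible. It is not TSX's actual algorithm, and the names `StageRequest`, `order_requests`, and `count_mounts` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class StageRequest(NamedTuple):
    """One pending file restore (all field names are illustrative)."""
    path: str      # file to restore
    tape_id: str   # cartridge holding the file
    position: int  # offset of the file on that tape


def order_requests(requests):
    """Group requests by tape, then sort each group by position on tape."""
    by_tape = defaultdict(list)
    for req in requests:
        by_tape[req.tape_id].append(req)

    # Serve tapes with the most pending files first; ties break on tape_id
    # so the result is deterministic.
    tapes = sorted(by_tape, key=lambda t: (-len(by_tape[t]), t))

    ordered = []
    for tape in tapes:
        ordered.extend(sorted(by_tape[tape], key=lambda r: r.position))
    return ordered


def count_mounts(requests):
    """Count tape mounts if requests are served strictly in the given order."""
    mounts = 0
    current = None
    for req in requests:
        if req.tape_id != current:
            mounts += 1
            current = req.tape_id
    return mounts


# Requests arrive in random order across two tapes.
incoming = [
    StageRequest("a.root", "T1", 30),
    StageRequest("b.root", "T2", 5),
    StageRequest("c.root", "T1", 10),
    StageRequest("d.root", "T2", 20),
    StageRequest("e.root", "T1", 20),
]
scheduled = order_requests(incoming)
```

Served in arrival order, these five requests would alternate between the two tapes on every file; after reordering, each tape is mounted once and read front to back.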


