20th International Conference on Computing in High Energy and Nuclear Physics (CHEP2013)

Name: 20th International Conference on Computing in High Energy and Nuclear Physics (CHEP2013)
Start: 2013-10-14T09:00:00+02:00
End: 2013-10-18T13:00:00+02:00
Location: Amsterdam, Beurs van Berlage

Europe/Amsterdam timezone

CHEP2013 Logistics Management

info@chep2013.org

dCache Billing data analysis with Hadoop

14 Oct 2013, 15:00

45m

Grote zaal (Amsterdam, Beurs van Berlage)

Poster presentation

Data Stores, Data Bases, and Storage Systems

Poster presentations

Speaker: Kai Leffhalm (Deutsches Elektronen-Synchrotron (DE))

The dCache storage system writes billing data into flat files or a relational database. A midsize dCache installation produces one million entries per day, representing 300 MByte. Gathering accounting information over a longer time interval, such as transfer rates per group, per file type or per user, puts an increasing load on the servers holding the billing information. The need to speed up these requests makes new approaches to performance optimization worthwhile. Hadoop is a framework for the distributed processing of large data sets across multiple compute nodes. The essential point in our context is its scalability for big data. Data is distributed over many nodes in the Hadoop Distributed File System (HDFS). Queries are processed in parallel on every node to extract the information, and the partial results are combined in a second step. This is called a MapReduce algorithm. As dCache storage is itself distributed over many storage nodes, running dCache and Hadoop together on every node is an obvious choice. The billing information is transformed into the HDFS structure by a small script. MapReduce functions that produce the results of the most important queries are implemented for each request. We will present the system's setup and performance comparisons of the created queries using PostgreSQL, flat files and Hadoop. The overall gain in performance and its dependence on both the amount of analysed data and the number of machines available for parallelizing the request will be demonstrated.
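To illustrate the MapReduce pattern the abstract refers to, here is a minimal, single-process sketch in Python. It is not the authors' implementation and does not use Hadoop itself. The record layout (`date group file_type bytes seconds`), the sample values and the function names `map_record`, `reduce_group` and `run_mapreduce` are all assumptions made for illustration; real dCache billing entries have a different, richer layout.

```python
from collections import defaultdict

# Hypothetical, simplified billing records: "date group file_type bytes seconds".
# Each inner list stands for the records stored on one node.
SAMPLE_CHUNKS = [
    [
        "2013-10-01 atlas root 1000 2",
        "2013-10-01 cms raw 4000 4",
    ],
    [
        "2013-10-02 atlas root 3000 2",
        "2013-10-02 cms raw 2000 1",
    ],
]


def map_record(line):
    """Map step: emit (group, (bytes, seconds)) for one billing record."""
    _date, group, _file_type, size, seconds = line.split()
    yield group, (int(size), float(seconds))


def reduce_group(group, values):
    """Reduce step: combine all values of one group into a mean rate in bytes/s."""
    total_bytes = sum(b for b, _ in values)
    total_seconds = sum(s for _, s in values)
    return group, total_bytes / total_seconds


def run_mapreduce(chunks):
    """Run map over every chunk, group values by key, then reduce each key."""
    shuffled = defaultdict(list)
    for chunk in chunks:  # in a real cluster, each chunk is mapped on its own node
        for line in chunk:
            for key, value in map_record(line):
                shuffled[key].append(value)
    return dict(reduce_group(key, values) for key, values in shuffled.items())


rates = run_mapreduce(SAMPLE_CHUNKS)
print(rates)
```

The map step only looks at a single record, so it can run independently wherever the data resides; only the much smaller per-group intermediate values need to be brought together for the reduce step. Queries per file type or per user would follow the same shape with a different key.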

Primary author: Kai Leffhalm (Deutsches Elektronen-Synchrotron (DE))

Co-author: Mr Andreas Knoepke (DESY)

Slides

Chep2013-dCache-Hadoop.pdf
