CHEP 2016 Conference, San Francisco, October 8-14, 2016

Name: CHEP 2016 Conference, San Francisco, October 8-14, 2016
Start: 2016-10-10T08:00:00-07:00
End: 2016-10-14T18:00:00-07:00
Location: San Francisco Marriott Marquis


America/Los_Angeles timezone

Using HEP Computing Tools, Grid and Supercomputers for Genome Sequencing Studies

11 Oct 2016, 14:15

15m

GG C2 (San Francisco Marriott Marquis)


Oral, Track 3: Distributed Computing

Alexei Klimentov (Brookhaven National Laboratory (US)) Ruslan Mashinistov (National Research Centre Kurchatov Institute (RU))

PanDA, the Production and Distributed Analysis workload management system, was developed to address the data processing and analysis challenges of the ATLAS experiment at the LHC. Recently PanDA has been extended to run HEP scientific applications on Leadership Class Facilities and supercomputers. The success of projects that use PanDA beyond HEP and the Grid has drawn attention from other compute-intensive sciences such as bioinformatics.

Modern biology uses complex algorithms and sophisticated software that cannot be run without access to significant computing resources. Recent advances in Next Generation Sequencing (NGS) technology have led to growing streams of sequencing data that need to be processed, analysed and made available to bioinformaticians worldwide. Analysis of ancient genome sequencing data with the popular software pipeline PALEOMIX can take a month even on a powerful computing resource. PALEOMIX includes a typical set of software used to process NGS data: adapter trimming, read filtering, sequence alignment, genotyping and phylogenetic or metagenomic analysis. A sophisticated workload management system (WMS) and efficient use of supercomputers can greatly speed up this process.

In this paper we describe the adaptation of the PALEOMIX pipeline to run in a distributed computing environment powered by PanDA. We used PanDA to manage computational tasks on a multi-node parallel supercomputer. To run the pipeline we split the input files into chunks, which are processed on different nodes as separate inputs for PALEOMIX, and finally merge the output files. This is very similar to what ATLAS does to process and simulate data. We dramatically decreased the total walltime thanks to the automation of job (re)submission and brokering within PanDA, as was demonstrated earlier for ATLAS applications on the Grid. Using software tools developed initially for HEP and the Grid can reduce payload execution time for mammoth DNA samples from weeks to days.
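The split, run and merge pattern described above can be illustrated with a minimal sketch. This is not the authors' implementation and uses neither the PanDA nor the PALEOMIX API: a local thread pool stands in for the supercomputer nodes, a retry loop stands in for automated job resubmission, and `toy_payload` is a hypothetical placeholder for the per-chunk pipeline run. All function names and the assumption that the input is FASTQ text (four lines per read) are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

FASTQ_LINES_PER_RECORD = 4  # header, sequence, '+', quality string


def split_fastq(lines: List[str], records_per_chunk: int) -> List[List[str]]:
    """Split FASTQ lines into chunks that contain whole records only."""
    if len(lines) % FASTQ_LINES_PER_RECORD != 0:
        raise ValueError("input does not contain whole FASTQ records")
    step = records_per_chunk * FASTQ_LINES_PER_RECORD
    return [lines[i:i + step] for i in range(0, len(lines), step)]


def run_with_retries(payload: Callable, chunk: List[str], max_attempts: int = 3):
    """Re-run a failed chunk; a stand-in for automated job resubmission."""
    for attempt in range(1, max_attempts + 1):
        try:
            return payload(chunk)
        except RuntimeError:
            if attempt == max_attempts:
                raise


def scatter_gather(lines: List[str], payload: Callable,
                   records_per_chunk: int = 2, workers: int = 4) -> list:
    """Process chunks independently, then merge results in input order."""
    chunks = split_fastq(lines, records_per_chunk)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, so the merge is a plain concatenation.
        parts = list(pool.map(lambda c: run_with_retries(payload, c), chunks))
    return [item for part in parts for item in part]


def toy_payload(chunk: List[str]) -> List[int]:
    """Hypothetical per-chunk step: return the length of each read."""
    return [len(chunk[i + 1])
            for i in range(0, len(chunk), FASTQ_LINES_PER_RECORD)]
```

In a real deployment each chunk would be submitted as a separate job and the merge step would combine the per-chunk output files; the sketch only shows why independent chunks make both parallel execution and per-chunk resubmission straightforward.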

Primary Keyword (Mandatory)	Data processing workflows and frameworks/pipelines
Secondary Keyword (Optional)	Experience/plans from outside experimental HEP/NP
Tertiary Keyword (Optional)	High performance computing

Alexei Klimentov (Brookhaven National Laboratory (US)) Alexey Poyda (National Research Centre Kurchatov Institute (RU)) Ruslan Mashinistov (National Research Centre Kurchatov Institute (RU))

Alexander Novikov (National Research Centre Kurchatov Institute (RU)) Anton Teslyuk (National Research Centre Kurchatov Institute (RU)) Artem Nedoluzhko (National Research Centre Kurchatov Institute (RU)) Eygene Ryabinkin (National Research Centre Kurchatov Institute (RU)) Fedor Sharko (National Research Centre Kurchatov Institute (RU)) Ivan Tertychnyy (National Research Centre Kurchatov Institute (RU)) Kaushik De (University of Texas at Arlington (US)) Tadashi Maeno (Brookhaven National Laboratory (US)) Torre Wenaus (Brookhaven National Laboratory (US))

Highlights-045.pdf

Oral-045.pdf
