Sep 24 – 27, 2019
CERN
Europe/Zurich timezone

Partitioned Interleaved Bloom filters using Optane DC Persistent Memory

Sep 25, 2019, 4:15 PM
30m
80/1-001 - Globe of Science and Innovation - 1st Floor (CERN)

Speaker

Enrico Seiler (Freie Universität Berlin)

Description

The recent improvements in full genome sequencing technologies, commonly subsumed under the term NGS (Next Generation Sequencing), have tremendously increased sequencing throughput. Within 10 years it rose from 21 billion base pairs collected over months to about 400 billion base pairs per day (the current throughput of Illumina's HiSeq 4000).
The cost of producing one million base pairs has likewise fallen from 140,000 dollars to a few cents.
As a result of this development, the number of new data submissions to genomic databases, generated by various biotechnological protocols (ChIP-Seq, RNA-Seq, etc.), has grown dramatically and is expected to keep increasing faster than the cost of storage devices falls and their capacity grows.

The main task in analyzing NGS data is to search for sequencing reads, short sequence patterns (e.g. exon/intron boundary read-through patterns), or expression profiles in large collections of sequences (i.e. a database).
Searching such databases in their entirety is usually only possible via the metadata or a set of results initially obtained from the experiment. Searching (approximately) for a specific genomic sequence in all the data has not been possible in reasonable computational time.

In this work we describe results for our new data structure, called the binning directory, which can distribute approximate search queries. It is based on an extension of our recently introduced Interleaved Bloom Filters (IBF) called x-partitioned IBF (x-PIBF). The results presented here make use of Intel's Optane DC Persistent Memory architecture and achieve significant speedups compared to a disk-based solution.
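
To make the underlying idea concrete, the following is a minimal sketch of a plain Interleaved Bloom Filter, not the x-PIBF or the binning directory from this talk. It assumes the commonly described layout: one Bloom filter per bin, all of equal size, stored so that the bits of every bin for a given hash position are adjacent. A k-mer query then yields one membership bit per bin, and counting k-mer hits per bin indicates which bins a read should be sent to. The class and function names, the choice of BLAKE2b as the hash, and the one-byte-per-bit storage are illustrative assumptions.

```python
import hashlib


class InterleavedBloomFilter:
    """Toy IBF: `bins` Bloom filters of `size` bits each, stored interleaved."""

    def __init__(self, bins, size, num_hashes=2):
        self.bins = bins
        self.size = size
        self.num_hashes = num_hashes
        # Interleaved layout: the `bins` bits belonging to one hash position
        # are adjacent. One byte per bit is used here purely for readability.
        self.bits = bytearray(bins * size)

    def _positions(self, kmer):
        # Derive `num_hashes` positions from salted BLAKE2b digests
        # (an arbitrary choice of hash function for this sketch).
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                kmer.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "little") % self.size

    def insert(self, kmer, bin_index):
        # Set the bit of `bin_index` in every row selected by the hashes.
        for pos in self._positions(kmer):
            self.bits[pos * self.bins + bin_index] = 1

    def query(self, kmer):
        # AND the selected rows: bin b may contain the k-mer only if
        # bit b is set in every one of them.
        result = [1] * self.bins
        for pos in self._positions(kmer):
            row = self.bits[pos * self.bins:(pos + 1) * self.bins]
            result = [a & b for a, b in zip(result, row)]
        return result


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_hits(ibf, read, k):
    """Per-bin count of the read's k-mers that the filter reports as present."""
    counts = [0] * ibf.bins
    for kmer in kmers(read, k):
        for b, hit in enumerate(ibf.query(kmer)):
            counts[b] += hit
    return counts
```

In such a scheme a read would be forwarded only to bins whose count reaches a chosen threshold. Because Bloom filters produce false positives but no false negatives, the bin that actually contains the read is never missed, while other bins may occasionally be searched unnecessarily.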

Co-author

Knut Reinert (Freie Universität Berlin)
