CERN Computing Seminar

Breakthroughs in security, efficiency, and performance with on-die hetero-processing

by Dr Garret Swart (Oracle Corp.)

Europe/Zurich
31/3-004 - IT Amphitheatre (CERN)

Description

The Oracle database is the world's leading data management product. After the acquisition of Sun Microsystems in 2010, engineers from Oracle and Sun started work on a new category of microprocessor designed to process data several times faster, many times more efficiently, and qualitatively safer. This kind of goal cannot be reached by running software unchanged -- we needed to design new hardware and write new software at all levels of the system to utilize it. The approach we took, and are taking, exploits the following ideas:

  • Big is better: Large scale computing systems give access to huge amounts of data without the costs of moving data between systems. This allows for larger tables, bigger sorts, fatter graphs, and more cloud tenants sharing the same resource pool on SPARC systems that scale linearly in cost and performance from 8 to 512 cores, 64 to 4096 threads.
  • Secure is better: Cache-line level memory access checking allows our instrumented memory allocators to manage memory at production speed while detecting bugs and reporting attacks in real time.
  • Information Density is better: With hardware designed for scanning n-gram compressed, bit packed, dictionary and run-length encoded columnar data at full memory bandwidth, we make maximal use of every bit stored and every cache line transferred over the memory channels with no impact on performance.
  • Fast is better: With hardware support for database operators running on specialized streaming processors, we can drive the memory channels at maximum rate, freeing up power and cores for running user computations on the result of these operators.
  • Connected is better: Integrating EDR InfiniBand on-chip and on-board gives us low-latency, high-throughput, one-sided networking.
  • Portable is better: By supporting platform independent acceleration APIs inside the database we can support a wide variety of acceleration techniques and give applications and query planners the information to make the best use of the available hardware.
  • Integrated is better: By supporting and accelerating multiple storage types (In-memory, NFS, NVMe, Exadata, HDFS, Fibre Channel), data formats (row major, column major, graph, JSON, spatial, MIME, Hive), algorithms, query languages, network protocols, and hardware platforms in a single product, we can share resources, increase usability and reduce the cost and the cognitive load in acquiring, storing, securing and understanding data.
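As a rough software illustration of the "Information Density" idea above, the sketch below dictionary-encodes a column, run-length encodes the resulting codes, and then answers an equality count by scanning the runs without expanding them. It is only a minimal sketch under stated assumptions: the function names and the sample column are hypothetical, and it says nothing about how the SPARC hardware or the Oracle database actually implement such scans.

```python
from itertools import groupby


def dictionary_encode(values):
    """Map each distinct value to a small integer code (codes follow sorted order)."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]


def run_length_encode(codes):
    """Collapse consecutive equal codes into (code, run_length) pairs."""
    return [(code, sum(1 for _ in run)) for code, run in groupby(codes)]


def count_equal(dictionary, runs, target):
    """Count rows equal to `target` by scanning runs, never expanding them."""
    if target not in dictionary:
        return 0  # value absent from the dictionary: no row can match
    target_code = dictionary.index(target)
    return sum(length for code, length in runs if code == target_code)


# Hypothetical sample column, chosen only for illustration.
column = ["CH", "CH", "CH", "FR", "FR", "CH", "IT", "IT", "IT", "IT"]
dictionary, codes = dictionary_encode(column)
runs = run_length_encode(codes)
```

The predicate is evaluated on the compressed representation: ten rows are answered by inspecting four runs, which is the kind of saving that a scan over encoded columnar data aims for.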

In this talk, I will describe the experience that drives our acceleration priorities, the constraints and joys of the hardware-software co-design process, the HW features that resulted, and how software engineers have exploited these features in ways we expected and ways we didn't. The industry has seen this approach used in the acceleration of linear algebra and computer graphics, and in this talk we'll see how we apply similar techniques to data processing but with changes to match the lower compute density of the problem space.

About the speaker

Garret Swart works at Oracle designing new products and capabilities for Database, Cloud, Big Data, Java and SPARC systems. He manages a small advanced development team that works with the product teams to prepare new technologies for incorporation into Oracle's products. Recent summer student projects include lock-free multi-master fault-tolerant transactional hash tables, adaptive compression for memory-speed decompressors, ROI-driven adaptive sampling for load balancing, SIMD-optimized arbitrary precision arithmetic, and HW-aware optimization strategies for large-to-large hash joins.

Previously, Garret was a researcher at IBM Almaden, developing technology underlying IBM's BLU and Pure Scale products. He taught and researched database and storage systems at University College Cork, led start-ups developing ERP for service companies and interactive television for couch potatoes, and designed parallel image processing systems at Xerox and distributed operating systems at DEC Systems Research Center, where he led the POSIX pthread standard. He has a PhD from the University of Washington in computational geometry and a BSc from Brown University.


Organised by: Eric Grancher and Miguel Angel Marquina - IT Department
CERN Computing Seminars and Colloquia
