Sep 24 – 27, 2019
CERN
Europe/Zurich timezone

Natural Language Processing with Intel Quantum Simulator

Sep 26, 2019, 11:00 AM
30m
80/1-001 - Globe of Science and Innovation - 1st Floor (CERN)

Speaker

Myles Doyle (Irish Centre for High End Computing)

Description

Natural language processing (NLP) is often used for tasks such as sentiment analysis, relationship extraction and word-sense disambiguation. Most traditional NLP algorithms operate over strings of words and are limited because they analyse the meanings of the individual words in a corpus without information about the grammatical rules of the language. Consequently, the quality of their results often becomes unsatisfactory as problem complexity increases.

An alternative approach called “compositional semantics” incorporates the grammatical structure of sentences into the analysis algorithms. One such model is “distributional compositional semantics” (DisCo), which yields grammatically informed algorithms that compute the meaning of sentences. This approach has been noted to offer significant improvements in the quality of results. However, the main challenge in implementing it is the need for large classical computational resources.

The DisCo model was developed by its authors with direct inspiration from quantum theory, and it comes with two quantum algorithms: the “closest vector problem” algorithm and the “CSC sentence similarity” algorithm. A quantum implementation of these algorithms lowers storage and compute requirements compared to a classical HPC implementation.
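To make the closest-vector task concrete, here is a minimal classical sketch in Python. It is a brute-force analogue for illustration only, not the quantum algorithm referred to above. The function names, the use of cosine similarity as the comparison measure, and the two-dimensional toy vectors are all assumptions introduced for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two real-valued meaning vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)


def closest_vector(query, candidates):
    """Return the label of the candidate vector most similar to the query.

    `candidates` maps a label to a vector; every candidate is compared
    against the query in turn (brute force).
    """
    return max(candidates, key=lambda label: cosine_similarity(query, candidates[label]))


# Hypothetical toy data: two reference meanings and one query vector.
toy_meanings = {
    "positive": [1.0, 0.0],
    "negative": [0.0, 1.0],
}
toy_query = [0.9, 0.2]

if __name__ == "__main__":
    print(closest_vector(toy_query, toy_meanings))
```

The brute-force loop touches every component of every candidate vector, which is the kind of classical cost that grows with the size of the meaning space.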

In this project, the Irish Centre for High-End Computing collaborates with Intel Corporation to implement the two DisCo model quantum algorithms on the Intel Quantum Simulator (Intel-QS) deployed on the Irish national supercomputer. Intel-QS applies a number of single- and multi-node optimizations, including vectorization, multi-threading, cache blocking and the overlapping of computation with communication.

We aim to improve the scalability of Intel-QS beyond the limitations imposed by standard MPI implementations, and we target corpora of the ~1000 most common words using simulations of up to 36 qubits. The implemented solution will be able to compute the meanings of two sentences (built from words in the corpus) and decide whether their meanings match.
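A rough sense of scale for these targets can be worked out in a few lines of Python. The sketch assumes a full state-vector simulation in which each amplitude is stored as a double-precision complex number (16 bytes), and it assumes that words are indexed in plain binary. Neither detail is stated in the abstract, and the function names are hypothetical.

```python
BYTES_PER_AMPLITUDE = 16  # assumption: one complex number stored as two 64-bit floats


def state_vector_bytes(num_qubits):
    """Memory needed to hold all 2**num_qubits amplitudes of a full state vector."""
    return (2 ** num_qubits) * BYTES_PER_AMPLITUDE


def index_qubits(num_items):
    """Smallest number of qubits whose basis states can label num_items items."""
    if num_items < 1:
        raise ValueError("num_items must be at least 1")
    return (num_items - 1).bit_length()


if __name__ == "__main__":
    tib = state_vector_bytes(36) / 2 ** 40
    print(f"36-qubit state vector: {tib:.1f} TiB")
    print(f"qubits to index 1000 words: {index_qubits(1000)}")
```

Under these assumptions a 36-qubit state vector occupies 2^40 bytes (1 TiB), which has to be distributed across the memory of many nodes, and 10 qubits are enough to label a vocabulary of 1000 words.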
