Driven by rapid growth in the quantity, variety, and number of sources of data, many data analytics approaches use the data lake model. Data in the lake varies in size, encoding, age, quality, and so on, and it often lacks consistency or any common schema. Analytics applications pull data from the lake as they see fit, digesting and processing it on demand for the current analytics task. The applications of data lakes are vast, including machine learning, data mining, artificial intelligence, ad-hoc queries, and data visualization. However, the data transformation required by traditional analytical systems is a bottleneck that poses great challenges to the fast processing of raw data, which is critical for many of these applications.
In the presentation, we will discuss how ACCORDA addresses the data transformation bottleneck by applying acceleration, and cover how ACCORDA avoids disrupting existing analytics software through a uniform worker model. This model is enabled by the in-memory integration of our small but highly efficient unstructured data processor, an application-specific instruction-set processor (ASIP). We will also briefly cover how these insights on accelerating analytics could apply to scientific computing, especially to analyses for high-energy physics.