CERN Computing Seminar

Visual mining of semi-structured data

by Dr Jorge Posada (Vicomtech), Dr Marco Quartulli (Vicomtech), Seán Gaines (Vicomtech)

Europe/Zurich
31/3-004 - IT Amphitheatre (CERN)

Description

Background

Vicomtech is visiting CERN to present its activities and explore possible lines of collaboration. As part of the programme, the visitors will give a presentation in three parts:

  • Presentation of Vicomtech – Seán Gaines
  • Description of technologies and specialities – Dr Jorge Posada
  • Details of projects developing visually-based algorithms for the intelligent storage, processing and visualization of, and interaction with, Big Data from massive sources of information – Dr Marco Quartulli

The full programme of the visit is here

Abstract

Mining semi-structured data is fundamental for archive monitoring, understanding and exploitation.

Typical analysis systems are based on a three-tiered architecture, in which efficient databases feed highly parallelised application servers that in turn feed client user interfaces. Yet sharing the analysis, content identification and semantic-level summarization tasks between the two bottom layers of the architecture – the highly parallel application server and a shared database – is key to allowing the user interface to deal interactively with large data volumes.

In this framework, we present approaches based on NoSQL/MapReduce, experiences with column-based scientific DBMSs and considerations on graph databases for the efficient processing of large repositories. Specific attention is devoted to Visual Analytics methodologies that lead to multi-user interfaces built around multiple linked representations, with interaction techniques based on concurrent brushing in multiple dimensions.
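
As a rough illustration of concurrent brushing across linked representations, the minimal Python sketch below keeps one range selection (a "brush") per dimension and treats a record as selected only if it satisfies every active brush. It is not taken from the speakers' systems: the `LinkedViews` class, the field names `size_mb` and `age_days`, and the sample records are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LinkedViews:
    """Hypothetical shared selection state for several linked views."""
    records: list
    # One active brush per dimension: {dimension: (low, high)}
    brushes: dict = field(default_factory=dict)

    def brush(self, dimension, low, high):
        """Set or replace the range brush on one dimension."""
        self.brushes[dimension] = (low, high)

    def clear(self, dimension):
        """Remove the brush on one dimension, if there is one."""
        self.brushes.pop(dimension, None)

    def selected_ids(self):
        """Ids of the records that fall inside every active brush."""
        return [
            r["id"]
            for r in self.records
            if all(low <= r[dim] <= high
                   for dim, (low, high) in self.brushes.items())
        ]


# Made-up archive entries, for illustration only.
records = [
    {"id": 1, "size_mb": 120, "age_days": 3},
    {"id": 2, "size_mb": 800, "age_days": 40},
    {"id": 3, "size_mb": 450, "age_days": 10},
    {"id": 4, "size_mb": 30, "age_days": 200},
]

views = LinkedViews(records)
views.brush("size_mb", 100, 500)   # brush drawn in a size histogram
size_only = views.selected_ids()
views.brush("age_days", 0, 7)      # second brush drawn in an age plot
both = views.selected_ids()
```

Each view would redraw from `selected_ids()` after any brush changes, so a selection made in one representation is immediately reflected in the others. In a system of the scale described here, the filtering would presumably run in the server or database tier rather than in client memory.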

These considerations and experiences provide the core for operational and pre-operational systems for mining multimedia archives, for analysing social streams for cyber-security applications and for search engines dedicated to Earth observation products.

Extensions to content-based global and local image retrieval, to mining compressed streams, to multi-modal search and to mining activity data collected by distributed mobile networks are introduced.


Organised by: John Harvey / PH Department and Miguel Angel Marquina
Computing Seminars / IT Department

Video in CDS