Currently, patient data are geographically dispersed, difficult to access, and often stored in siloed, project-specific databases, which prevents large-scale data aggregation, standardisation, integration/harmonisation and advanced disease modelling. The ELIXIR Cloud and Authentication & Authorisation Infrastructure (AAI) for Human Data Communities project aims to leverage a coordinated network of ELIXIR Nodes to deliver a federated environment that complies with Global Alliance for Genomics and Health (GA4GH) standards. This environment is intended to enable population-scale genomic and phenotypic data analysis across international boundaries, and to serve as a potential infrastructure for 1M Genome analysis across ELIXIR Nodes (member states).
The ELIXIR Cloud & AAI project will lay the groundwork for the foundational capability of “federation” of identities, sensitive data access, trusted hybrid cloud providers and sensitive data analysis services across ELIXIR Nodes. It does so by underpinning the bi-directional conversation between partners with the GA4GH standards and specifications and with ELIXIR trans-national expertise. The project is also developing a framework for secure access to, and analysis of, sensitive human data based on national federations and standardised discovery protocols. A secure authentication and authorisation process, alongside guidelines and compliance processes, is essential to enable the community to use these data without compromising privacy and informed consent.
The project therefore provides three mechanisms to leverage the federated life-science infrastructure of ELIXIR: a globally available, curated repository for bioinformatics software containers and workflows (Biocontainers - GA4GH TRS), a service to discover and resolve the locations of datasets (RDSDS - GA4GH DRS), and a distributed workflow and task execution service (WES-ELIXIR/TESK - GA4GH WES/TES).
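How the three GA4GH service types named above could fit together from a client's point of view can be sketched as follows. This is a minimal illustration, not the project's implementation: the host names, the tool and object identifiers, and the `input_file` workflow parameter are hypothetical, and the endpoint paths are assumed to follow the published TRS v2, DRS v1 and WES v1 conventions. The sketch only builds the requests and performs no network calls.

```python
import json
from urllib.parse import quote, urlparse


def trs_descriptor_url(base: str, tool_id: str, version_id: str,
                       descriptor_type: str = "CWL") -> str:
    """Build the TRS v2 URL of a workflow descriptor (assumed path layout)."""
    return (f"{base.rstrip('/')}/ga4gh/trs/v2/tools/{quote(tool_id, safe='')}"
            f"/versions/{quote(version_id, safe='')}/{descriptor_type}/descriptor")


def drs_uri_to_url(drs_uri: str) -> str:
    """Map a hostname-based DRS URI (drs://host/id) to its HTTPS object URL."""
    parsed = urlparse(drs_uri)
    object_id = parsed.path.lstrip("/")
    if parsed.scheme != "drs" or not parsed.netloc or not object_id:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri!r}")
    return f"https://{parsed.netloc}/ga4gh/drs/v1/objects/{quote(object_id, safe='')}"


def wes_run_request(base: str, workflow_url: str, input_url: str) -> dict:
    """Describe a WES 'run workflow' request: target URL plus form fields."""
    return {
        "url": f"{base.rstrip('/')}/ga4gh/wes/v1/runs",
        "form": {
            "workflow_url": workflow_url,
            "workflow_type": "CWL",
            "workflow_type_version": "v1.0",
            # 'input_file' is a hypothetical parameter name of the workflow.
            "workflow_params": json.dumps(
                {"input_file": {"class": "File", "location": input_url}}
            ),
        },
    }


if __name__ == "__main__":
    # All hosts and identifiers below are placeholders.
    workflow = trs_descriptor_url("https://trs.example.org", "demo-tool", "1.0")
    data = drs_uri_to_url("drs://drs.example.org/abc123")
    request = wes_run_request("https://wes.example.org", workflow, data)
    print(json.dumps(request, indent=2))
```

In a real deployment the client would additionally fetch the DRS object and choose one of its access methods before passing a location to the workflow, and every request would carry credentials issued through the AAI; both steps are omitted here for brevity.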
The ambition of the project is to provide a global ecosystem of joint sensitive data access and analysis services, in which federated resources for life science data are used by national and international projects across all life science disciplines, and in which widespread support for standard components secures their long-term sustainability. Connecting distributed datasets via common standards will give researchers unprecedented opportunities to detect rare signals in complex datasets and will lay the groundwork for the widespread application of advanced data analysis methods in the life sciences.