Large scientific X-ray instruments such as MAX IV [1] or XFEL [2] are massive producers of annual data collections from experiments such as imaging sample materials. MAX IV, for instance, has 16 fully funded beamlines, 6 of which can produce up to 40 Gbps of experimental data during a typical 5 to 8 hour time-slot, resulting in 90 to 144 TB for a particular beamline...
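The quoted volumes follow directly from the line rate: 40 Gbps is 5 GB/s, so a 5 to 8 hour time-slot yields 90 to 144 TB (decimal units). A quick back-of-the-envelope check in plain Python:

```python
# Back-of-the-envelope check of the beamline data volumes quoted above.
# 40 Gbps = 40e9 bits/s = 5e9 bytes/s; a time-slot lasts 5 to 8 hours.
rate_bytes_per_s = 40e9 / 8  # 5 GB/s

for hours in (5, 8):
    tb = rate_bytes_per_s * hours * 3600 / 1e12  # decimal terabytes
    print(f"{hours} h at 40 Gbps -> {tb:.0f} TB")
# -> 90 TB and 144 TB, matching the figures above
```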
Rucio is an open-source software framework that provides scientific
collaborations the functionality to organize, manage, monitor, and
access their distributed data across heterogeneous infrastructures.
Rucio was originally developed to meet the requirements of the
high-energy physics experiment ATLAS, and is continuously extended to
serve a diverse set of scientific communities, from...
Within the Netherlands, iRODS is gaining substantial traction with universities and other research institutes as a tool to help manage large amounts of heterogeneous research data. In this context, iRODS is usually used as middleware, providing value through data virtualization, metadata management and/or rule-driven workflows. This is then typically combined with other tools and technologies to...
SWAN (Service for Web-based ANalysis) is CERN's general-purpose Jupyter notebook service. It offers a preconfigured, fully fledged and easy-to-use environment, integrates CERN's storage, compute and analytics services, and is available with a simple mouse click.
Due to this simplicity, Jupyter usage via SWAN has increased steadily at CERN in recent years (more than 2000 unique users...
Currently, patient data are geographically dispersed, difficult to access, and often stored in siloed project-specific databases, preventing large-scale data aggregation, standardisation, integration/harmonisation and advanced disease modelling. The ELIXIR Cloud and Authentication & Authorisation Infrastructure (AAI) for Human Data Communities project aims to leverage a...
From Jupyter notebooks to web dashboards for big geospatial data analysis
The Joint Research Centre (JRC) of the European Commission has set up the JRC Big Data Platform (JEODPP) as a petabyte-scale infrastructure to enable EC researchers to process and analyse big geospatial data in support of EU policy needs [1]. One of the platform's service layers is the JEO-lab environment [2]...
The CernVM File System (CVMFS) is a service for fast and reliable software distribution on a global scale. It is capable of delivering scientific software onto physical nodes, virtual machines, and HPC clusters by providing POSIX read-only file system access. Files and metadata are downloaded on demand by means of HTTP requests, taking advantage of aggressive caching on the client and at...
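The real CVMFS client is a FUSE file system, but the access pattern described above, fetch a file over plain HTTP only when it is first read and serve all later reads from a local cache, can be sketched in a few lines of stdlib Python. This is an illustration of the pattern only, not the CVMFS protocol; the base URL and cache directory below are made up:

```python
import hashlib
import os
import urllib.request

class OnDemandCache:
    """Toy sketch of download-on-demand with a local cache, the access
    pattern CVMFS uses. The real client mounts a repository read-only
    via FUSE; this class only illustrates the fetch-and-cache idea."""

    def __init__(self, base_url, cache_dir, fetch=None):
        self.base_url = base_url.rstrip("/")
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)
        # Allow injecting a fetch function (e.g. for testing); the
        # default issues a plain HTTP GET.
        self._fetch = fetch or self._http_get

    def _http_get(self, url):
        with urllib.request.urlopen(url) as resp:
            return resp.read()

    def read(self, path):
        # Cache key: hash of the repository path.
        key = hashlib.sha1(path.encode()).hexdigest()
        cached = os.path.join(self.cache_dir, key)
        if os.path.exists(cached):        # cache hit: no network access
            with open(cached, "rb") as f:
                return f.read()
        data = self._fetch(f"{self.base_url}/{path.lstrip('/')}")
        with open(cached, "wb") as f:     # populate the local cache
            f.write(data)
        return data
```

In an actual deployment the client simply reads files under the mounted repository (e.g. `/cvmfs/<repository>/...`), and the FUSE layer performs these on-demand fetches transparently.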
The [Reva][1] project is dedicated to creating a platform that bridges the gap between Cloud Storages and Application Providers by making them talk to each other in an interoperable fashion, leveraging the community-driven CS3 APIs. For this reason, the goal of the project is not to recreate other services but to offer a straightforward way to connect existing services in a simple, portable...
- Accessing large amounts of data in the cloud poses several problems:
  - Many bioinformatics applications require POSIX access, which does not scale well. Re-writing the application is not always an option.
  - Data sitting in the cloud costs money, whether it is being used or not.
- An ideal solution in many cases would be to provide federated data access to data stored on-premises,...