With the HL-LHC, the HEP community will face orders of magnitude more data, at a multi-exabyte scale. To prepare for such unprecedented scientific data collection, the different research sites are combining their diverse resources into integrated Analysis Facility systems.
SWAN, CERN's Service for Web-based ANalysis, is following this approach, evolving from a plain notebook-based...
The Italian National Institute for Nuclear Physics (INFN) has a long history of designing and implementing large-scale computing infrastructures and applications.
INFN has spent the past ten years heavily investing in developing solutions to enable, optimise and simplify transparent access to a multi-site federated Cloud infrastructure. A primary goal of this effort is to provide a generic...
The Joint Research Centre (JRC) of the European Commission has set up the JRC Big Data Analytics Platform (BDAP) as a multi-petabyte scale infrastructure to enable EC researchers to process and analyse big data in support of EU policy needs [1]. One of the service layers of the platform is based on Jupyter notebooks and the Python programming language to enable exploratory visualization and...
CERNBox is a key enabling service for users at CERN and beyond. The service is used by more than 37K users and stores over 15PB of data, representing all the user communities at the laboratory.
In this talk we will explain the current status of the service, describe the challenges we faced in 2022, and look into the future: CERNBox as the gateway for heterogeneous storage spaces at CERN and beyond.
Sunet Drive is a federated and scalable Enterprise File Sync and Share solution that has been developed, deployed, and packaged as part of the European Open Science Cloud and can be transparently extended to new participating organizations. The two main building blocks of Sunet Drive are Nodes and Buckets, both designed to promote data sovereignty and FAIR principles. Participating...
TripleO, https://docs.openstack.org/tripleo-docs/latest/, is a set of tools for the deployment and management of OpenStack. Its strategy consists of using an underlying OpenStack installation (undercloud) to install and manage the main one (overcloud).
It's the installation method used by RDO, https://www.rdoproject.org/.
In our project to deploy a HyperConverged (HCI) OpenStack cloud...
LIQO (https://liqo.io) is an open-source multi-cluster orchestrator that enables the creation of "virtual Kubernetes clusters" spanning across an arbitrary number of real clusters, even crossing multiple administrative boundaries.
Liqo enables the sharing of resources (e.g., CPU, memory, GPUs) and services (e.g., an existing cloud-native service) among different clusters, and facilitates...
Today's requirements in teaching, learning and, ultimately, examination at universities increasingly call for digital alignment and for resilient learning management IT systems.
Within this contribution we want to show the system components and their interaction. We will show what added value the use of sync & share storage provides.
Research Data Services (RDS) is a self-hosted cross-platform interoperability layer which allows research data to be curated, prepared and published directly from an EFSS solution such as Sciebo (ownCloud) or Sunet Drive (Nextcloud). It provides modular interoperability to external data repositories like the Open Science Framework (OSF), InvenioRDM (e.g., Zenodo), Harvard Dataverse, or Doris...
The Max Planck Society runs a customized installation of Seafile called KEEPER (https://keeper.mpdl.mpg.de/) for its scientists, which offers the possibility to certify research data, with or without metadata, by leveraging blockchain technology. Snapshot data and a certificate representing the data on the blockchain are stored on the application side and presented to the user.
This...
After two years of planning by HIFIS, in coordination with the CS3 community, for Virtual Organisations (VO; a Community AAI[1]-based group of any size) as the basis for a new kind of EFSS Federation [2,3], the development of this new feature for the Nextcloud software has been completed, thanks to the strong support of Nextcloud and their subcontractor publicplan.
Admins of Nextcloud...
In the last year Cubbit has delivered many Cubbit cells in Italy. A Cubbit cell is a very simple device that provides an encrypted block storage service. These Cubbit cells connect to each other from different datacenters of different Italian companies. Each cell relies on a different data link and is powered by a different power line. Even Cubbit cells hosted in a specific company do not contain...
ScienceMesh is an interoperable research platform developed for the European Open Science Cloud (EOSC), in the context of the CS3MESH4EOSC project.
It is designed to be an interoperable research platform for seamless sharing and collaboration on data across different EFSS systems, including major open-source platforms such as ownCloud,...
We built a two-way connection between Nextcloud / OC-10 and Reva, which is deployed at many of the sites that are currently connected to the ScienceMesh testnet.
In this short presentation we'll explain how the connection between the EFSS GUI and Reva works in different scenarios.
SeaTable is like a LEGO kit for IT. It enables you to develop and build efficient business processes in the shortest possible time. You can easily design your database structure, store any kind of data, define access rights for your team or externals, and visualize your data with various charts. Automations help to streamline your work.
In this presentation, I will give an overview of the...
Research Drive, the Dutch Sync & Share service based on ownCloud, uses OpenStack Swift S3 as its storage backend. Because the integration of S3 within the software is not that good, we will migrate back to a POSIX-compliant file system, namely CephFS. But how to migrate almost 2 PB of data without too much downtime...
Different high-performance, highly available file systems can store big data (hundreds of PB) and provide high data throughput (hundreds of TB per second). Each of these solutions highlights its advantages, and it is challenging to compare them.
Based on 30 years of storage development experience, Comtrade provided test scenarios to compare these file systems. On the appropriate...
Data are said to live forever; however, their life is a complex journey. Starting at the acquisition or production date, data go through a whole life cycle. During the different epochs of this life cycle, data will be moved, processed, compressed, shipped, and archived.
To ease the management of this data orchestration, modern storage systems provide powerful tools. The foundation of these tools remains...
Enterprise File Sync and Share (EFSS) systems have become an integral part of every researcher's life, handling an abundance of scientific data for multiple projects. Those projects generally span multiple collaborators and can extend over a significant geographic area. However, there is an inherent conflict when handling research data, between the researcher's need to collaborate and share...
Galaxy is the de facto standard workflow manager for bioinformatics providing a complete collaborative platform for researchers. Even though several Galaxy public servers are currently available, there are some situations where users would benefit more from having full administrative control over a private Galaxy instance. These situations include, but are not limited to, worries about data...
How to change the login method for almost 90000 users from 5 different login scenarios and different backends to 1 method with OIDC? Welcome to the world of flows with Keycloak. What could possibly go wrong?
In this talk we describe the 2022 reboot of the ScienceBox project, the demonstrator package for some of CERN’s storage and analysis services. We evolved the original implementation to make use of Helm charts across the entire dependency stack.
We’ve also incorporated the major architectural update to CERNBox, replacing the previous PHP backend with a catalog of distributed...
Deploying Nextcloud at scale requires close monitoring of critical software and infrastructure components. In enterprise environments, Nextcloud is typically run in a clustered setup and it requires both infrastructure and application monitoring. In this talk we are going to discuss the basic elements of monitoring, with a focus on understanding why some metrics are important to monitor to...
One of the main challenges in dealing with large amounts of data is to find a suitable presentation for the different target groups. With the new module External App, SeaTable allows you to build individual frontends for the different stakeholders and process participants in no time.
In this way, processes can be streamlined and the transfer of information can be made more...
Onedata is a distributed, global, high-performance data management system, which provides transparent and unified access to globally distributed storage resources and supports a wide range of use cases from personal data management to data-intensive scientific computations. Due to its fully distributed architecture, Onedata allows for the creation of complex hybrid-cloud infrastructure...
Data Science is a complex field that requires a high level of expertise and collaboration among teams of experts. With the rise of big data, it has become increasingly important to create collaborative workflows that enable data scientists to combine their skills and knowledge to create better results. This, however, can be a challenge in an environment of heterogeneous cloud and storage...