The HEPiX forum brings together worldwide Information Technology staff, including system administrators, system engineers, and managers from High Energy Physics and Nuclear Physics laboratories and institutes, to foster a learning and sharing experience between sites facing scientific computing and data challenges.
Participating sites include BNL, CERN, DESY, FNAL, IHEP, IN2P3, INFN, IRFU, JLAB, KEK, LBNL, NDGF, NIKHEF, PIC, RAL, SLAC, TRIUMF, many other research labs and numerous universities from all over the world.
The workshop was hosted by Nikhef, the Dutch National Institute for Subatomic Physics (formerly known as NIKHEF), and took place at the Amsterdam Science Park Congress Centre, neighbouring the institute at Science Park in Amsterdam, The Netherlands.
Although not part of the HEPiX workshop, two co-located events have been organised: a workshop on ARC on Friday 18 October in the afternoon, and a workshop of the WLCG Security Operations Centre working group from Monday 21 to Wednesday 23 October.
News and updates about organisation, facilities and technologies of the IN2P3 Computing Center.
News from CERN since the HEPiX Spring 2019 workshop.
Presentation of recent developments at Brookhaven National Laboratory's (BNL) Scientific Data & Computing Center (SDCC).
An update on developments at RAL.
Diamond Light Source (DLS) is an X-ray synchrotron on the Rutherford Appleton Laboratory site in Oxfordshire, UK. This site report covers the latest developments at DLS in compute, storage and cloud computing, as well as the infrastructure that underpins them.
TRIUMF Site Report
News and developments from the Nordics
A status update on distributed cloud system development and applications at ASGC, Taiwan, as well as work on operational efficiency, will be reported.
I will give an update on the status and plans of the US ATLAS SWT2 Center.
This is the PIC report for HEPiX Autumn 2019 at Nikhef
The MALT project is a unique opportunity to consolidate all the current CERN telephony services (commercial PBX-based analogue/IP and proprietary IP) into a single IP-based cost-effective service, built on top of existing open source components and local developments, adapted to CERN users' needs, well integrated into the local environment and really multiplatform. This presentation describes the main technical aspects of this project as well as the features offered by this new service at a time when its pilot phase starts.
Moving users’ data is never an easy process. When you migrate to a different file system and would like to meet users’ needs on Windows, Linux and Mac for more than 15 000 accounts, the magic recipe becomes as difficult as the one for making perfect macaroons!
In this presentation, you will learn key facts on how we handle this complex data migration.
E-mail is considered a critical collaboration service. I will share our experience at CERN regarding the technical and organisational challenges of migrating 40 000 mailboxes from Microsoft Exchange to a free and open-source software solution: Kopano.
CERN Web Services are in the process of consolidating web site and web application hosting services using container orchestration.
The Kubernetes Operator pattern has gained a lot of traction recently. It applies Kubernetes principles to custom applications.
I will present how we leverage the Operator pattern in container-based web hosting services to automate the provisioning and management of web sites, web applications and the container infrastructure itself.
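As a hedged illustration of the Operator pattern itself (not the actual CERN implementation), a minimal operator written with the Python kopf framework could watch a hypothetical "WebSite" custom resource and create a matching Deployment; the custom resource group and fields below are invented for the sketch.

```python
# Sketch of the Operator pattern with the "kopf" framework.
# The "webservices.example.org/v1 WebSite" custom resource is hypothetical
# and used only to illustrate the idea.
import kopf
import kubernetes


@kopf.on.startup()
def configure(**kwargs):
    # Assumes the operator runs inside the cluster.
    kubernetes.config.load_incluster_config()


@kopf.on.create('webservices.example.org', 'v1', 'websites')
def create_website(spec, name, namespace, **kwargs):
    """React to a new WebSite object by creating a Deployment for it."""
    image = spec.get('image', 'nginx:stable')
    replicas = spec.get('replicas', 1)

    deployment = kubernetes.client.V1Deployment(
        metadata=kubernetes.client.V1ObjectMeta(name=name),
        spec=kubernetes.client.V1DeploymentSpec(
            replicas=replicas,
            selector=kubernetes.client.V1LabelSelector(match_labels={'app': name}),
            template=kubernetes.client.V1PodTemplateSpec(
                metadata=kubernetes.client.V1ObjectMeta(labels={'app': name}),
                spec=kubernetes.client.V1PodSpec(containers=[
                    kubernetes.client.V1Container(name='web', image=image)]))))

    apps = kubernetes.client.AppsV1Api()
    apps.create_namespaced_deployment(namespace=namespace, body=deployment)
    return {'deployment': name}
```

Such an operator is started with `kopf run operator.py`; the same pattern extends to reconciling the web-hosting infrastructure itself.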
News from GSI IT
We will present an update on our site since the Spring 2019 report, covering our changes in software, tools and operations.
Some of the details to cover include our use of backfilling jobs via BOINC with cgroups, work with our ELK stack at AGLT2, updates on Bro/MISP at the UM site and information about our newest hardware purchases and deployed middleware.
We conclude with a summary of what has worked and what problems we encountered and indicate directions for future work.
A short site presentation in the usual format, as we will be hosting the autumn 2020 HEPiX meeting, so that people can get to know us a bit.
If there is no room left in the agenda, that is fine; just 2-5 minutes would also be fine.
Updates of the KEK projects, including SuperKEKB and J-PARC as well as on the KEK computing research center from the last HEPiX workshop, will be presented.
This will be a quick update on what is happening at NERSC.
The IHEP computing centre has been supporting several HEP experiments for many years; LHCb is the newest experiment supported this year. We have recently upgraded AFS, HTCondor and EOS at IHEP. The presentation covers the current status and future plans.
An update on what's going on at INFN-T1 site
In the past, migrating from one Windows version to the latest one needed a full reinstallation of every single workstation, with all the inconveniences this represents for both users and IT staff.
For Windows 10, Microsoft claimed that the in-place upgrade works fine. How true is this statement?
This presentation will cover real-life feedback.
An update on CERN Linux support distributions and services.
An update on the CentOS community and CERN involvement will be given.
We will discuss updates from the Software Collections, Virtualization and OpenStack SIGs, as well as future plans regarding alternative architectures (ARM SoCs, etc.) and CentOS 8.
The successful series of HTCondor workshops in Europe started in 2014 continued in 2019 with a workshop held from 24 to 27 September at the European Commission's Joint Research Centre in Ispra, Lombardy, Italy. We will give a short report of this workshop.
In this talk we present an HTC cluster which was set up at Bonn University in 2017/2018. On this fully-puppetised cluster all jobs are run inside Singularity containers. Job management is handled by HTCondor, which nicely shields the container setup from the users: they only have to choose the desired OS via a job parameter from an offered collection of container images. The container images, along with various software packages, are provided by a CernVM File System (CVMFS). The data to be analysed is stored on a CephFS file system. The presentation describes how the various components are set up and provides some operational experience with this cluster.
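As a hedged sketch of how such a user-facing job parameter could look, a submission via the HTCondor Python bindings might resemble the following; the `+ContainerOS` attribute name is hypothetical, and the mapping from this attribute to a Singularity image on CVMFS is done in site configuration not shown here.

```python
# Sketch only: submit a job that requests a particular OS image.
# "+ContainerOS" is a hypothetical custom job attribute; the site's HTCondor
# configuration (not shown) maps it to a Singularity image from CVMFS.
import htcondor

submit = htcondor.Submit({
    "executable": "analysis.sh",
    "arguments": "run1.root",
    "output": "job.out",
    "error": "job.err",
    "log": "job.log",
    "request_cpus": "1",
    "request_memory": "2GB",
    "+ContainerOS": '"CentOS7"',   # the user picks the desired OS here
})

schedd = htcondor.Schedd()
with schedd.transaction() as txn:
    cluster_id = submit.queue(txn)
print("Submitted cluster", cluster_id)
```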
The goal of the HTCondor team is to develop, implement, deploy, and evaluate mechanisms and policies that support High Throughput Computing (HTC) on large collections of distributively owned computing resources. Increasingly, the work performed by the HTCondor developers is being driven by its partnership with the High Energy Physics (HEP) community.
This talk will present recent changes and enhancements to HTCondor, including details on some of the enhancements created in recent releases, changes created on behalf of the HEP community, and the upcoming HTCondor development roadmap. We seek to solicit feedback on the roadmap from HEPiX attendees.
The talk provides an overview of the DESY configurations for HTCondor. It focuses on features we need for user registry integration, node maintenance operations and fair-share/quota handling. We are working on integrating Docker, Jupyter and GPUs into our smooth and transparent operating model.
In this talk we will provide details about the scalability limits of the HTCondor file transfer mechanism: how it depends on latency and job completion rate, and how it compares with pure HTTP transfer.
BEIJING-LCG2 is one of the WLCG Tier-2 grid sites. In this talk I will describe how to run a Tier-2 grid site, including deployment, configuration, monitoring, security, troubleshooting, and VO support.
The benchmarking and accounting of compute resources in WLCG needs to be revised in view of the adoption by the LHC experiments of heterogeneous computing resources based on x86 CPUs, GPUs, FPGAs.
After evaluating several alternatives for the replacement of HS06, the HEPIX benchmarking WG has chosen to focus on the development of a HEP-specific suite based on actual software workloads of the LHC experiments, rather than on a standard industrial benchmark like the new SPEC CPU 2017 suite.
This presentation will describe the motivation and implementation of this new benchmark suite, which is based on container technologies to ensure portability and reproducibility. This approach is designed to provide a better correlation between the new benchmark and the actual production workloads of the experiments. It also offers the possibility to separately explore and describe the independent architectural features of different computing resource types, which is expected to be increasingly important with the growing heterogeneity of the HEP computing landscape. In particular, an overview of the initial developments to address the benchmarking of non-traditional computing resources such as HPCs and GPUs will also be provided.
In this presentation we'll discuss the design architecture of the HEP Workload benchmark containers, and the proposed replacement for HEPSPEC06, which is based on these containers. We'll also highlight the development efforts completed thus far, and the tooling being used by the project. Finally we'll detail our plan for extending the existing container benchmark suite to include support for GPU benchmarking.
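To illustrate how per-workload results can be combined into a single figure, the sketch below uses a geometric mean of normalised workload scores; both the aggregation choice and the workload names are assumptions for this example, not the project's final scoring definition.

```python
# Illustrative sketch: combine per-workload benchmark scores into one number.
# The workload names and the use of a geometric mean are assumptions made for
# this example, not the definitive HEP benchmark suite scoring rule.
import math


def aggregate_score(workload_scores):
    """Geometric mean of per-workload scores (each normalised to a reference)."""
    scores = list(workload_scores.values())
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


results = {
    "sim-workload": 0.92,        # hypothetical normalised scores
    "digi-reco-workload": 1.10,
    "gen-workload": 1.05,
}
print(f"aggregate benchmark score: {aggregate_score(results):.3f}")
```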
by invitation
WLCG relies on the network as a critical part of its infrastructure and therefore needs to guarantee effective network usage and prompt detection and resolution of any network issues, including connection failures, congestion and traffic routing. The OSG Networking Area is a partner of the WLCG effort and is focused on being the primary source of networking information for its partners and constituents. We will report on the changes and updates that have occurred since the last HEPiX meeting.
The primary areas to cover include the status of and plans for the WLCG/OSG perfSONAR infrastructure, the WLCG Throughput Working Group and the activities in the IRIS-HEP and SAND projects.
to be filled soon
The information security threats currently faced by WLCG sites are both sophisticated and highly profitable for the actors involved. Evidence suggests that targeted organisations take on average more than six months to detect a cyber attack, with more sophisticated attacks being more likely to pass undetected.
An important way to mount an appropriate response is through the use of a Security Operations Centre (SOC). A SOC can provide detailed traceability information along with the capability to quickly detect malicious activity. The core building blocks of such a SOC are an Intrusion Detection System and a threat intelligence component, required to identify potential cybersecurity threats as part of a trusted community. The WLCG Security Operations Centre Working Group has produced a reference design for a minimally viable Security Operations Centre, applicable at a range of WLCG sites. In addition, another important factor in the sharing of threat intelligence is the formation of appropriate trust groups.
We present the status and progress of the working group so far, including a discussion of the reference SOC design and of the working group's approach to facilitating the collaboration necessary to form these groups, covering both technological and social aspects. Threat intelligence and the formation of trust groups in our community will be the focus of the WLCG SOC WG workshop taking place immediately following HEPiX, during 21-23 October 2019. We emphasise the importance of collaboration not only between WLCG sites, but also between grid and campus teams. This type of broad collaboration is essential given the nature of threats faced by the WLCG, which can often be a result of compromised campus resources.
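As a hedged sketch of the threat-intelligence component, an indicator of compromise could be shared with a trust group through a MISP instance using the PyMISP client; the URL, API key and indicator values below are placeholders, and the real SOC design is not prescribed by this example.

```python
# Sketch: publish a simple indicator of compromise to a MISP instance so that
# other members of the trust group can match it against their IDS data.
# The URL, API key and indicator values below are placeholders.
from pymisp import PyMISP, MISPEvent

misp = PyMISP("https://misp.example.org", "YOUR_API_KEY", ssl=True)

event = MISPEvent()
event.info = "Suspicious SSH brute-force source observed on campus"
event.add_attribute("ip-src", "192.0.2.45")      # offending address (example)
event.add_attribute("comment", "Seen against grid submit hosts")

response = misp.add_event(event)
print("Event pushed to MISP:", response)
```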
This presentation provides an update on the global security landscape since the last HEPiX meeting. It describes the main vectors of risks and compromises in the academic community including lessons learnt, presents interesting recent attacks while providing recommendations on how to best protect ourselves. It also covers security risks management in general, as well as the security aspects of the current hot topics in computing and around computer security.
This talk is based on contributions and input from the CERN Computer Security Team.
High Energy Physics (HEP) experiments have greatly benefited from a strong relationship with Research and Education (R&E) network providers and, thanks to projects such as LHCOPN/LHCONE and REN contributions, have enjoyed significant capacities and high-performance networks for some time. RENs have been able to continually expand their capacities to over-provision the networks relative to the experiments' needs and were thus able to cope with the recent rapid growth of the traffic between sites, both in terms of achievable peak transfer rates and in total amount of data transferred. For some HEP experiments this has led to designs that favour remote data access, where the network is considered an appliance with almost infinite capacity. There are reasons to believe that the network situation will change, due to both technological and non-technological reasons, starting already in the next few years. Various non-technological factors in play are, for example, the anticipated growth of non-HEP network usage as other large-data-volume sciences come online; the introduction of cloud and commercial networking and their respective impact on usage policies and security; as well as technological limitations of optical interfaces and switching equipment.
As the scale and complexity of the current HEP network grows rapidly, new technologies and platforms are being introduced that greatly extend the capabilities of today’s networks. With many of these technologies becoming available, it’s important to understand how we can design, test and develop systems that could enter existing production workflows while at the same time changing something as fundamental as the network that all sites and experiments rely upon. In this talk we’ll give an update on the working group's recent activities, updates from sites and R&E network providers as well as plans for the near-term future.
In August 2018, we upgraded our campus network. We replaced core switches, border routers and distribution switches to provide 1G/10G connectivity with authentication to end nodes. We introduced new firewall sets to segment inner subnets into several groups and transplanted all access control lists from the core switches to the inner firewall.
We report on our migration and operation history of the last year.
The transition of WLCG central and storage services to dual-stack IPv4/IPv6 is progressing well, thus enabling the use of IPv6-only CPU resources as agreed by the WLCG Management Board. More and more WLCG data transfers now take place over IPv6. During this year, the HEPiX IPv6 working group has not only been chasing and supporting the transition to dual-stack services, but has also been encouraging network monitoring providers to allow for filtering of plots by the IP protocol used. The dual-stack deployment does however result in a networking environment which is much more complex than when using just IPv6. Some services, e.g. the EOS storage system at CERN, are using IPv6-only for internal communication, where possible. The group is investigating the removal of the IPv4 protocol in more places. We will present our recent work and future plans.
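A simple way to check from a worker node whether a given service endpoint is usable over both protocols is shown below; the hostname and port are placeholders for this sketch.

```python
# Quick dual-stack check: does a service publish AAAA (IPv6) and A (IPv4)
# records, and can we open a TCP connection over each protocol?
# The hostname and port below are placeholders.
import socket


def check_protocol(host, port, family):
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return "no address published"
    addr = infos[0][4]
    try:
        with socket.socket(family, socket.SOCK_STREAM) as s:
            s.settimeout(5)
            s.connect(addr)
        return f"reachable at {addr[0]}"
    except OSError as exc:
        return f"address {addr[0]} published but connect failed: {exc}"


host, port = "storage.example.org", 1094
print("IPv4:", check_protocol(host, port, socket.AF_INET))
print("IPv6:", check_protocol(host, port, socket.AF_INET6))
```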
We describe the software tool-set being implemented in the context of the NOTED [1] project to better exploit WAN bandwidth for Rucio and FTS data transfers, how it has been developed and the results obtained.
The first component is a generic data-transfer broker that interfaces with Rucio and FTS. It identifies data transfers for which network reconfiguration is both possible and beneficial, translates the Rucio and FTS information into parameters that can be used by network controllers and makes these available via a public interface.
The second component is a network controller that, based on the parameters provided by the transfer broker, decides which actions to apply to improve the path for a given transfer.
Unlike the transfer broker, the network controller described here is tailored to the CERN network as it has to choose the appropriate action given the network configuration and protocols used at CERN. However, this network controller can easily be used as a model for site-specific implementations elsewhere.
The paper describes the design and the implementation of the two tools, the tests performed and the results obtained. It also analyses how the tool-set could be used for WLCG in the context of the DOMA [2] activity.
[1] Network Optimisation for Transport of Experimental Data - CERN project
[2] Data Organisation, Management and Access - WLCG activity
The NSF funded SAND project was created to leverage the rich network-related dataset being collected by OSG and WLCG, including perfSONAR metrics, LHCONE statistics, HTCondor and FTS transfer metrics and additional SNMP data from some ESnet equipment. The goal is to create visualizations, analytics and user-facing alerting and alarming related to the research and education networks used by HEP, WLCG and OSG communities.
We will report on the project status half-way through its initial 2-year funding period and cover what has been achieved as well as highlighting some new collaborations, tools and visualizations.
Due to the amount of data expected from the experiments during RUN3, the CERN Computer Center network has to be upgraded. This presentation will explain the ongoing work around the Computer Center network: change of router models to provide higher 100G port density, upgrade of the links between the experiments and the Computer Center (CDR links), expected closure of the Wigner Computer Center and move of the CPU servers to dedicated containers, creation of a WDM (up to 1 Tbps) connection between the ALICE containers and the main Computer Center, introduction of router redundancy and Layer-2 flexibility with VXLAN, etc.
Network overview concerning the new LHCb containers located at LHC point 8.
A total of 184 switches have been installed, connected to 4 different routers.
A new DWDM line system will be used to connect the IT datacentre extension in the LHCb containers.
This presentation will cover how CERN is proposing to provide the computing capacity needed for the LHC experiments for RUN3 and for RUN4. It will start with some history on the failed attempt to have a second Data Centre ready for RUN3, then describe the solution adopted for RUN3 instead and finally the current plans for RUN4.
The Open Compute Project (OCP) is an organization that shares designs for data centre products among companies.
Its mission is to design and enable the delivery of the most efficient server, storage and data centre hardware designs for scalable computing.
The project was started in 2011 and today includes about 200 members.
This talk will give a report from the 2019 OCP Global Summit, highlighting the most interesting talks and keynotes.
Part of the presentation will focus also on Open19, a specification that defines a cross-industry common server form factor whose goal is to create flexible and economic data centres for operators of all sizes.
Finally, it will also be discussed how OCP and Open19 could be relevant for CERN's computing infrastructure and for the HEPiX community.
The monitoring infrastructure used at the computing centre at DESY, Zeuthen has aged over the years and showed more and more deficits in many areas. In order to cope with current challenges, we decided to build a new monitoring infrastructure designed from scratch using different open-source products such as Prometheus, Elasticsearch, Grafana, etc. The talk will give an overview of our future monitoring landscape as well as a status report on where we are now, which challenges we have hit and an outlook on further developments.
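As a minimal sketch of the kind of custom exporter that can feed such a Prometheus-based setup (the metric name and the probed quantity are examples, not the actual DESY exporters):

```python
# Minimal custom-exporter sketch using the prometheus_client library.
# The metric name and the quantity being measured are examples only; a real
# setup relies on standard exporters plus service-specific ones.
import time
import shutil

from prometheus_client import Gauge, start_http_server

scratch_free_bytes = Gauge(
    "node_scratch_free_bytes",
    "Free space on the local scratch file system in bytes")


def collect_once(path="/tmp"):
    usage = shutil.disk_usage(path)
    scratch_free_bytes.set(usage.free)


if __name__ == "__main__":
    start_http_server(8000)      # Prometheus scrapes http://host:8000/metrics
    while True:
        collect_once()
        time.sleep(30)
```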
A number of co-located meetings were held at Fermilab in early September in the area of Federated Identities and AAI (Authentication and Authorisation Infrastructures) for Physics, including a F2F meeting of the WLCG Authorization Working Group and a mini-FIM4R meeting. This talk gives a high-level overview of these meetings and related recent progress in this area.
Presentation on SciTokens, a distributed authorization framework, and work to integrate distributed authorization technologies such as SciTokens and OAuth 2.0 into HTCondor.
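To illustrate the capability-based token model, the sketch below inspects the "scope" claim of a JWT using the generic PyJWT library (2.x); it is not the SciTokens library itself, the token contents are invented, and a real service must of course verify the signature against the issuer's public key.

```python
# Illustration of capability-based authorization: a job presents a signed JWT
# whose "scope" claim states what it may do, e.g. "read:/atlas".
# This sketch only decodes and inspects claims with the generic PyJWT library;
# signature verification is deliberately skipped here.
import jwt


def read_scopes(token):
    """Return the path prefixes this token authorises for reading."""
    claims = jwt.decode(token, options={"verify_signature": False})
    return [s[len("read:"):] for s in claims.get("scope", "").split()
            if s.startswith("read:")]


# Self-contained demo with a locally signed token (all values are invented).
demo = jwt.encode({"sub": "pilot", "scope": "read:/atlas compute.create"},
                  "dummy-secret", algorithm="HS256")
print(read_scopes(demo))   # -> ['/atlas']
```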
BNL SDCC (Scientific Data and Computing Center) recently enabled an SSO authentication strategy using Keycloak, supporting various SSO authentication protocols (SAML/OIDC/OAuth) and allowing multiple authentication options under one umbrella, including Kerberos authentication, AD (Active Directory) and federated identity authentication via CILogon with InCommon and social-provider login. This solution has been integrated into recent tool and service deployments in the facility for protected resource access and has improved efficiency in the areas of AuthN/AuthZ. This talk will focus on technical overviews and on strategies to tackle the challenges and obstacles encountered with this solution.
I will show collection and presentation tools for monitoring an InfiniBand (IB) network, and discuss the ideas behind some of the collection decisions.
The increase in the scale of LHC computing during Run 3 and Run 4 (HL-LHC) will certainly require radical changes to the computing models and the data processing of the LHC experiments. The working group established by WLCG and the HEP Software Foundation to investigate all aspects of the cost of computing and how to optimise them has continued producing results and improving our understanding of this process. In this contribution we expose our recent developments and results and outline the directions of future work.
Please note that the Platinum sponsor of the workshop, Fujifilm Recording Media, has contributed material that will not be presented, but is available for consultation. See this contribution: https://indico.cern.ch/event/810635/contributions/3596108/
An update on the CERN Database on Demand service, which hosts more than 800 databases for the CERN user community, supporting different open-source systems such as MySQL, PostgreSQL and InfluxDB.
We will present the current status of the platform and the future plans for the service.
The ATLAS Experiment is storing detector and simulation data in raw and derived data formats across more than 150 Grid sites world-wide: currently, in total about 200 PB of disk storage and 250 PB of tape storage is used.
Data have different access characteristics due to various computational workflows. Raw data is only processed about once per year, whereas derived data are accessed continuously by physics researchers. Data can be accessed from a variety of media: data may be streamed from remote locations or cached on local storage using hard disk drives or SSDs, while larger data centers provide the majority of offline storage capability via tape systems. Disk is comparatively more expensive than tape, and even for disks there are different types of drive technologies that vary considerably in price and performance. Slow data access can dramatically increase the cost of computation.
The HL-LHC era data storage requirements are estimated to be several factors bigger than the present forecast of available resources, based on a flat-budget assumption. On the computing side, ATLAS Distributed Computing (ADC) has been very successful in recent years with HPC and HTC integration and with using opportunistic computing resources for Monte Carlo production. On the other hand, equivalent opportunistic storage does not exist for HEP experiments. ADC started the "Data Carousel" and "Hot/Cold Storage" projects to increase the usage of less expensive storage, i.e. tape or even commercial cloud storage, so it is not limited to tape technologies exclusively. Data Carousel orchestrates data processing between workload management, data management, and storage services, with the bulk data resident on offline storage. The processing is executed by staging and promptly processing a sliding window of inputs onto faster buffer storage, such that only a small percentage of input data is available at any one time. With this project we aim to demonstrate that this is a natural way to dramatically reduce our storage costs. The first phase of the project started in the fall of 2018 and concerned I/O tests of the sites' archiving systems. Now we are at Phase II, which requires a tight integration of the workload and data management systems and more intensive data migration between hot (disk) and cold (tape) storage systems. Additionally, the Data Carousel will study the feasibility of running multiple competing workflows from tape. The project is progressing very well and the results will be used before LHC Run 3. In addition, we will present the first results of our R&D project with the Google Cloud Platform for similar studies.
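The sliding-window idea can be sketched as follows; the function names, window size and file names are purely illustrative placeholders, the real orchestration being done by the ATLAS workload and data management systems.

```python
# Purely illustrative sketch of the Data Carousel sliding-window idea:
# only a small fraction of the tape-resident input is staged onto the disk
# buffer at any time, and a processed file is released before the next one
# is requested. The three helpers are hypothetical placeholders for the real
# workload/data management machinery.
from collections import deque


def stage_from_tape(f):      # placeholder for issuing a staging request
    print("staging", f)


def process(f):              # placeholder for the job processing f
    print("processing", f)


def release_from_buffer(f):  # placeholder for freeing the disk-buffer copy
    print("releasing", f)


def run_carousel(input_files, window_size=50):
    pending = deque(input_files)
    staged = deque()

    # Fill the initial window of staged inputs.
    while pending and len(staged) < window_size:
        f = pending.popleft()
        stage_from_tape(f)
        staged.append(f)

    # Process one file, free its buffer space, then top the window back up.
    while staged:
        f = staged.popleft()
        process(f)
        release_from_buffer(f)
        if pending:
            nxt = pending.popleft()
            stage_from_tape(nxt)
            staged.append(nxt)


if __name__ == "__main__":
    run_carousel([f"file_{i:04d}.root" for i in range(200)], window_size=10)
```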
The STFC CASTOR tape service is responsible for the management of over 80 PB of data, including 45 PB generated by the LHC experiments for the RAL Tier-1. In the last few years there have been several disruptive changes that have necessitated, or are necessitating, significant changes to the service. At the end of 2016, Oracle, which provided the tape libraries, drives and media, announced it was leaving the tape market. In 2017, the Echo (Tier-1 disk) storage service entered production and disk-only storage migrated away from CASTOR. Also in 2017, CERN, which provides support for CASTOR, started to test its replacement for CASTOR, called CTA.
Since October 2018, a new shared CASTOR instance has been in production. This instance is a major simplification compared with the previous four. In this presentation I describe the setup and performance of this instance, which includes two sets of failure-tolerant management nodes that ensure improved reliability, and a single unified tape cache that has displayed increased access rates to tape data compared with the previous separate tape cache pools.
In March 2019, a new Spectra Logic Tape robot was delivered to RAL. This uses both LTO and IBM media. I will present the tests that were carried out on this system, which includes multiple sets of dense and sparse tape reads to assess the throughput performance of the library for various use cases.
Finally, I will describe the ongoing work exploring possible new, non-SRM tape management systems that will eventually replace CASTOR.
The IT storage group at CERN provides tape storage to its users in the form of three services, namely TSM, CASTOR and CTA. Both TSM and CASTOR have been running for several decades whereas CTA is currently being deployed for the very first time. This deployment is for the LHC experiments starting with ATLAS this year. This contribution describes the current status of tape storage at CERN and expands on the strategy and architecture of the current deployment of CTA.
In this contribution the evolution of the CERN storage services and their applications will be presented.
The CERN IT Storage group's main mandate is to provide storage for Physics data: to this end an update will be given about CASTOR and EOS, with a particular focus on the ongoing migration from CASTOR to CTA, its successor.
More recently, the Storage group has focused on providing higher-level tools to access, share and interact with the data. CERNBox is at the center of this strategy, as it has evolved to become the CERN apps hub. We will show how the recent (well-known) changes in software licensing have affected the CERN apps portfolio offered to users.
Finally, a new EU-funded project will be briefly presented, which perfectly integrates with the above strategy to expand the CERNBox collaboration to other institutions and enterprises.
CephFS has been used as the shared file system of the HTC cluster for physicists of various fields at Bonn University since the beginning of 2018. The cluster uses IP over InfiniBand. High performance for sequential reads is achieved even though erasure coding and on-the-fly compression are employed. CephFS is complemented by CernVM-FS for software packages and containers, which come with many small files. Operational experience with CephFS and with exporting it via NFS Ganesha to users' desktop machines, upgrade experiences, and design decisions, e.g. concerning the quota setup, will be presented.
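For context on the quota setup, CephFS controls directory quotas through virtual extended attributes; a per-directory limit can be set as in the sketch below, where the path and limits are examples only.

```python
# CephFS enforces directory quotas via virtual extended attributes.
# Example: cap a (hypothetical) group directory at 10 TiB and 5 million files.
import os

path = "/cephfs/groups/examplegroup"            # example path
os.setxattr(path, "ceph.quota.max_bytes", b"10995116277760")   # 10 TiB
os.setxattr(path, "ceph.quota.max_files", b"5000000")
print(os.getxattr(path, "ceph.quota.max_bytes"))
```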
Additionally, Ceph RBD is used as the backend for a libvirt/KVM-based virtualisation infrastructure operated by two institutes and replicated across multiple buildings. Backups are performed via regular snapshots, which allows for differential backups to an external backup storage using open-source tools. Via file system trimming through VirtIO-SCSI and compression of the backups, significant storage is saved. Writeback caching makes it possible to achieve sufficient performance. The system has been tested for resilience in various possible failure scenarios.
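The differential-backup flow can be sketched with the standard rbd command-line tools driven from Python; the pool, image and snapshot names are placeholders and the exact tooling used in production may differ.

```python
# Sketch of a differential RBD backup using the standard "rbd" CLI:
# create a new snapshot and export only the delta since the previous one.
# Pool, image, snapshot names and backup path are placeholders.
import subprocess
from datetime import date

pool, image = "vm-pool", "vm-disk-01"
prev_snap = "backup-2019-10-01"                       # last backed-up snapshot
new_snap = f"backup-{date.today().isoformat()}"

subprocess.run(["rbd", "snap", "create", f"{pool}/{image}@{new_snap}"],
               check=True)
subprocess.run(["rbd", "export-diff", "--from-snap", prev_snap,
                f"{pool}/{image}@{new_snap}",
                f"/backup/{image}-{new_snap}.diff"],
               check=True)
```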
As one of the main data centres in France, the IN2P3 Computing Center (CC-IN2P3, https://cc.in2p3.fr) provides several High Energy Physics and Astroparticle Physics experiments with different storage systems that cover the different needs expressed by these experiments.
The quantity of data stored at CC-IN2P3 is growing exponentially. In 2019, about two billion files are stored. By 2030, this number of files is expected to increase by a factor of eight.
To monitor and supervise these storage systems, several applications leverage file metadata. Information such as size, number of blocks, last access time, or last update time for each storage system is used to export customised views to users, experiments, local experts, and support and management teams. However, these applications are usually monolithic and thus do not scale well. With the load expected by 2030, with a total amount of 4 TB of metadata, this could become problematic.
To improve the scalability of these applications, CC-IN2P3 initiated a research and development project a couple of months ago. The idea is to build on data analytics frameworks such as Hadoop, Spark, etc. to process the expected massive amount of storage metadata in a scalable way.
Our objective is to set up a software architecture that will act as a scalable back end for all the existing and future monitoring and supervision applications. This should speed up the production of day-to-day statistics across all the storage services made available to the users, the experiments, the support team and the management of CC-IN2P3.
A long-term objective of this project is to be able to supervise the whole life cycle of data stored on CC-IN2P3 resources and thus to ensure that the Data Management Plans provided are respected by the experiments.
In this talk we will present the current status of this ongoing project, discuss the technical choices we made, present some preliminary results, and also expose the different issues we encountered along the road to success.
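As a sketch of the kind of scalable metadata query envisaged, the Spark snippet below aggregates per-experiment storage usage; the Parquet input path and the column names (experiment, size_bytes, atime) are assumptions for the example, not the actual CC-IN2P3 schema.

```python
# Sketch of a scalable metadata query with Spark: per-experiment storage usage
# and last-access statistics. Input path and column names are assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("storage-metadata-report").getOrCreate()

meta = spark.read.parquet("/metadata/dumps/2019-10/")

report = (meta.groupBy("experiment")
              .agg(F.count("*").alias("n_files"),
                   F.sum("size_bytes").alias("total_bytes"),
                   F.max("atime").alias("last_access")))

report.orderBy(F.desc("total_bytes")).show(truncate=False)
```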
The Scientific Data & Computing Center (SDCC) at BNL is responsible for accommodating the diverse requirements for storing and processing petabyte-scale data generated by ATLAS, Belle II, PHENIX, STAR, Simons, etc. This talk presents the current operational status of the main storage services supported in the SDCC, summarizes our experience in operating large distributed systems, optimizing for the ATLAS Data Carousel, participating in the Third Party Copy smoke testing of the DOMA working group (DOMA-TPC) and moving towards the new infrastructure for the STAR/PHENIX central storage. The presentation will also highlight our efforts on Ceph pools, XRootD caching and BNL Box.
CERN runs a private OpenStack Cloud with ~300K cores, ~3K users and several OpenStack services.
CERN users can build services from a pool of compute and storage resources using OpenStack APIs such as Ironic, Nova, Magnum, Cinder and Manila.
Consequently, CERN cloud operators face operational challenges at scale in order to offer these services in a stable manner.
In this talk, you will learn about the status of the CERN cloud, new services and plans for expansion.
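For reference, the same OpenStack APIs can be driven programmatically; a minimal sketch with the community openstacksdk is shown below, where the cloud name, image, flavour and network are placeholders rather than any particular site's actual offerings.

```python
# Sketch: using openstacksdk against a cloud defined in clouds.yaml.
# The cloud name, image, flavour and network below are placeholders.
import openstack

conn = openstack.connect(cloud="mycloud")

# List existing servers in the project.
for server in conn.compute.servers():
    print(server.name, server.status)

# Create a small VM (the resources named here are examples only).
image = conn.compute.find_image("CC7 - x86_64")
flavor = conn.compute.find_flavor("m2.small")
network = conn.network.find_network("my-project-net")

server = conn.compute.create_server(
    name="demo-vm", image_id=image.id, flavor_id=flavor.id,
    networks=[{"uuid": network.id}])
server = conn.compute.wait_for_server(server)
print("created", server.name)
```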
The Large High Altitude Air Shower Observatory (LHAASO) experiment of IHEP is located in Daocheng, Sichuan province, at an altitude of 4410 m. It generates a huge amount of data and requires massive storage and large computing power.
This presentation will introduce the current status of the LHAASO computing platform at Daocheng, focusing on virtualization technologies such as Docker and Kubernetes (k8s) and on distributed monitoring technologies used to reduce the operation and maintenance cost and to ensure system availability and stability.
The need for effective distributed data storage has been evident since the beginning of the LHC, and this topic has become particularly vital in light of the preparation for the HL-LHC run and the emergence of data-intensive projects in other domains such as nuclear and astroparticle physics.
The LHC experiments have started an R&D effort within the DOMA project, and we report recent results related to the configuration and testing of federated data storage systems. We will emphasize different system configurations and various approaches to testing storage federations. We are considering the EOS and dCache storage systems as backbone software for data federation and xCache for data caching. We will also report on synthetic tests and experiment-specific tests developed by ATLAS and ALICE for the federated storage prototype in Russia. Recently, the execution of the tests has been automated and is now conducted using the HammerCloud toolkit. The Data Lake project launched in the Russian Federation in 2019 and its prospects will be covered separately.
The Joint Genome Institute (JGI) is part of the US Department of Energy and serves the scientific community with access to high-throughput, high-quality sequencing, DNA synthesis, metabolomics and analysis capabilities. With the ever-increasing complexity of analysis workflows and the demand for burstable compute, it became necessary to be able to shift those workloads between sites. In this talk we will present JAWS, the JGI Analysis and Workflow System, which enables users to model their workflows using the Workflow Definition Language (WDL) and bring them to execution on a number of geographically distributed sites. We will discuss the architecture of JAWS, from the underlying technologies to data transfer and integration with HPC schedulers (e.g. Slurm). We will go into challenges encountered when running at multiple sites, among them integration and identity management, and will present the status quo of our efforts.
We will provide an update on the SLATE project (https://slateci.io), an NSF funded effort to securely enable service orchestration in Science DMZ (edge) networks across institutions. The Kubernetes-based SLATE service provides a step towards a federated operations model, allowing innovation of distributed platforms, while reducing operational effort at resource providing sites.
The presentation will focus on updates since the spring HEPiX meeting and cover our expanding collaboration, the containerized service application catalog, and updates from an engagement with TrustedCI.org and the WLCG security teams to collect issues of concern for a new trust model.