Enterprise tape drives are widely used at major laboratories around the world, such as CERN, US DoE labs and KEK, as well as at commercial data centers. Demands on capacity and I/O speed in the tape market keep growing without limit. Not only drive technology but also media technology is key to meeting such future requirements. Fujifilm is a world-leading company in the market...
We will briefly introduce the history and report the current status of the Computing Research Center at KEK. Many activities and near-future R&D plans, for example on networking, computer security, and private cloud deployment, which have been submitted to this HEPiX workshop, will be summarized.
The Tokyo Tier-2 site, located in the International Center for Elementary Particle Physics (ICEPP) at the University of Tokyo, provides computing resources for the ATLAS experiment in the WLCG.
Updates on the site since the Spring 2017 meeting and a migration plan for the next system upgrade will be presented.
2017 has been a year of change for the Australian HEP site. The loss of a staff member, migration of batch system, and increased use of cloud are just some of the changes happening in Australia. We will provide an update on the happenings in Australia.
ASGC site report on facility deployment, recent activities, collaborations and plans.
This report will talk about the current status and recent updates at IHEP Site since the Spring 2017 report, covering computing, network, storage and other related work.
We will present the latest status of the GSDC, together with the migration plan for its administrative system.
LHCONE is a worldwide network dedicated to the data transfers of HEP experiments. The presentation will explain the origin and architecture of the network, the services and advantages it provides, and the benefits achieved so far. It will also include an update on the latest achievements.
WLCG relies on the network as a critical part of its infrastructure and therefore needs to guarantee effective network usage and prompt detection and resolution of any network issues, including connection failures, congestion and traffic routing. The OSG Networking Area is a partner of the WLCG effort and is focused on being the primary source of networking information for its partners and...
The TransPAC project has a long history of supporting R&E networking, connecting the Asia Pacific region to the United States to facilitate research. This talk will give an overview of the project for those who may not be familiar with it or its activities and a brief sketch of future plans. Then the talk will cover LHCONE connectivity from our perspective and lay out options for how TransPAC...
The Automated GOLE (AutoGOLE) fabric enables research and education networks worldwide to automate their inter-domain service provisioning. By using the AutoGOLE control plane infrastructure, services to other countries can be setup in minutes. Besides automated provisioning we experiment with connecting high-speed Data Transfer Nodes (DTNs) to the AutoGOLE environment. This talk will discuss...
The Global Research Platform is a worldwide software-defined distributed environment designed specifically for data-intensive science. The talk will show how this environment could be used for experiments like the LHC.
Modern science is increasingly data-driven and collaborative in nature, producing petabytes of data that can be shared by tens to thousands of scientists all over the world. NetSage is a project to develop a unified open, privacy-aware network measurement, and visualization service to better understand network usage in support of these large scale applications. New capabilities to measure and...
As the WLCG data sets grow ever bigger, so will network usage. For those of us with limited budgets, it would be nice if network costs did not grow ever bigger too.
As NDGF is one of the few tier-1 sites in WLCG required to pay full networking costs, including transit, we'll look at the cost breakdown of networking for a tier-1 site and talk about where optimizations might be found.
High Energy Physics (HEP) experiments have greatly benefited from a strong relationship with Research and Education Network (REN) providers and, thanks to projects such as LHCOPN/LHCONE and REN contributions, have enjoyed significant capacities and high-performance networks for some time. RENs have been able to continually expand their capacities to over-provision the networks relative to...
A short introduction and status report.
News from CERN since the workshop at the Hungarian Academy of Sciences.
This is the PIC report to HEPiX Fall 2017.
News about GridKa Tier-1 and other KIT IT projects and infrastructure. We'll focus on our experiences with our new 20+PB online storage installation.
News and updates from the NDGF Tier-1 site.
The focus of this report will be the new disk and tape resources, with some performance numbers from both.
Also some site news from our distributed sites.
PDSF, the Parallel Distributed Systems Facility, was moved to Lawrence Berkeley National Lab from Oakland, CA in 2016. The cluster has been in continuous operation since 1996, serving high energy physics research. It is a Tier-1 site for STAR, a Tier-2 site for ALICE and a Tier-3 site for ATLAS.
The PDSF cluster is in transition this year, moving the batch system from UGE to SLURM...
Site report, news and ongoing activities at the Swiss National Supercomputing Centre (CSCS-LCG2), running ATLAS, CMS and LHCb.
We will present an update on the ATLAS Great Lakes Tier-2 (AGLT2) site since the Spring 2017 report including changes to our networking, storage and deployed middleware. This will include the status of our transition to CentOS/SL7 for both our servers and worker nodes, our upgrade of VMware from 5.5 to 6.5 and our upgrade of Lustre to 2.10.1 + ZFS 0.7.1 as well as our work to install Open...
We will present an update on our sites and cover our work with various efforts
like xrootd storage elements, opportunistic usage of general HPC resources,
and containerization.
We will also report on our latest hardware purchases, as well as
the status of network updates.
We conclude with a summary of successes and problems we encountered
and indicate directions for future work.
As a major WLCG/OSG T2 site, the University of Wisconsin-Madison CMS T2 has consistently been delivering highly reliable and productive services towards large-scale CMS MC production/processing, data storage, and physics analysis for the last 11 years. The site utilises high-throughput computing (HTCondor), a highly available storage system (Hadoop), scalable distributed software systems (CVMFS),...
Last year, KEK upgraded its upstream link to 100 Gbps in April and officially started peering with LHCONE in September. KEK can now distribute huge volumes of data to WLCG sites with adequate throughput, although this upgrade did not have a large impact on the firewalls handling ordinary internet usage from the campus network.
We will report on the changes brought by the LHCONE peering and
how we connect our campus network and...
We presented the design and plan of the network architecture updates at IHEP at HEPiX Spring 2017, and the work was finished in August 2017. This report covers the network architecture upgrades, dual-stack IPv6 tests, network measurement and monitoring at IHEP, and network security upgrades.
Network performance is key to the correct operation of any modern datacentre or campus infrastructure. Hence, it is crucial to ensure the devices employed in the network are carefully selected to meet the required needs.
The established benchmarking methodology [1,2] consists of various tests that create perfectly reproducible traffic patterns. This has the advantage of being able to...
This update from the HEPiX IPv6 Working Group will present the activities of the last 6-12 months. In September 2016, the WLCG Management Board approved the group’s plan for the support of IPv6-only CPU, together with the linked requirement for the deployment of production Tier 1 dual-stack storage and other services. A reminder of the requirements for support of IPv6 and the deployment...
Configuration Release Management (CRM) is rapidly gaining popularity among service managers, as it brings version control, automation and lifecycle management to system administrators. At CERN, most of the virtual and physical machines are managed through the Puppet framework, and the networking team is now starting to use it for some of its services.
This presentation will focus on the...
As presented during HEPiX Fall 2016, a full renewal of the CERN Wi-Fi network was launched in 2016 in order to provide a state-of-the-art Campus-wide Wi-Fi Infrastructure. This year, the presentation will give a status and feedback about this overall deployment. It will provide information about the technical choices made, the methodology used for such a deployment, the issues we faced and how...
As presented at HEPiX Fall 2016, CERN is currently in the process of renewing its standalone Wi-Fi Access Points with a new state-of-the-art, controller-based infrastructure. With more than 4000 new Access Points to be installed, it is desirable to keep the existing deployment procedures and tools to avoid repetitive and error-prone actions during configuration and maintenance steps.
This...
The CERN network infrastructure has several links to the outside world. Some are well identified and dedicated to experiments and research traffic (LHCOPN/LHCONE); some are more generic (general internet). For the latter, specific firewall inspection is required for obvious security reasons, but with tens of gigabits per second of traffic, the firewalls' capacity is highly challenged....
News about what happened at DESY during the last months.
News and updates from GSI IT, e.g.:
- status GreenITCube
- new asset management system
This presentation discusses the new responsibilities of the Scientific Data & Computing Center (SDCC) in high-performance computing (HPC) and how we are leveraging effort and resources to improve BNL community's access to local and leadership-class facilities (LCF's).
Techlab is a CERN IT activity aimed at providing facilities for studies that improve the efficiency of the computing architecture and make better use of the processors available today.
It enables HEP experiments, communities and projects to gain access to machines with modern architectures, for example POWER8, GPU and ARM64 systems.
The hardware is periodically updated based on community...
The HEPiX Benchmarking Working Group has worked on a fast benchmark to estimate the compute power provided by a job slot or an IaaS VM. The Dirac Benchmark 2012 (DB12) scales well with the performance of at least ALICE and LHCb workloads when running within a batch job. The group has now started the development of a next-generation long-running benchmark as a successor to the current HS06 metric.
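The idea of such a fast benchmark can be sketched as a short CPU-bound loop whose runtime is converted into a score. The snippet below is an illustration of the approach only, not the actual DB12 code; the function name and the score normalisation are made up for this example.

```python
import random
import time

def fast_cpu_benchmark(iterations=1_000_000):
    """Illustrative fast benchmark: time a fixed CPU-bound workload
    and report work done per second (higher = faster job slot / VM)."""
    start = time.process_time()
    total = 0.0
    for _ in range(iterations):
        total += random.random() * random.random()
    elapsed = time.process_time() - start
    # Normalise to "millions of loop iterations per CPU second".
    return iterations / elapsed / 1e6
```

Running this at job start-up gives a per-slot score in seconds rather than the hours a full HS06 run takes, which is the trade-off such fast benchmarks make.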
Batch services at CERN have diversified such that computing jobs can
be run everywhere, from traditional batch farms, to disk servers, to
people's laptops, to commercial clouds. This talk offers an overview
of the technologies and tools involved.
The migration of the local batch system BIRD required the
adaptation of different properties like the Kerberos / AFS support, the
automation of various operational tasks and the user and project access. The
latter includes, inter alia, fairshare, accounting and resource access. For
this, some newer features of HTCondor had to be used. We are close to the
user release. Building common...
Founded in 1991, CSCS, the Swiss National Supercomputing Centre, develops and provides the key supercomputing capabilities required to solve important problems for science and society. The centre enables world-class research and provides resources to academia, industry and the business sector. Through an agreement with CHIPP, the Swiss Institute of Particle Physics, CSCS hosts a WLCG tier-2...
The increase of the scale of LHC computing expected for Run 3 and even more so for Run 4 (HL-LHC) over the course of the next 10 years will most certainly require radical changes to the computing models and the data processing of the LHC experiments. Translating the requirements of the physics programme into resource needs is an extremely complicated process and subject to significant...
I'll talk about how the data we collect helped the center get through a heat wave in the Berkeley area. This is significant since the Berkeley computing center does not have any mechanical cooling and relies on the external air temperature and water supply. I will discuss what data we thought we needed, what data we actually needed, and how the idea of saving all the data and collecting as much as we can...
In this presentation, we'll give an overview of the Singularity
container system, and our experience with it at the RACF/SDCC at
Brookhaven National Laboratory. We'll also discuss Singularity's
advantages over virtualization and other Linux namespace-based
container solutions in the context of HTC and HPC applications.
Finally, we'll detail our future plans for this software at our
facility.
The University of Victoria HEP group has been successfully running on distributed clouds for several years using the CloudScheduler/HTCondor framework. The system uses clouds in North America and Europe, including commercial clouds. Over the last years, the operation has been very reliable; we regularly run several thousand jobs concurrently for the ATLAS and Belle II experiments....
Docker container virtualization provides an efficient way to create isolated scientific environments, adjusted and optimized for a specific problem or a specific group of users. It allows responsibilities to be separated efficiently, with IT focusing on infrastructure for image repositories, preparation of basic images, container deployment and scaling, and physicists focusing on application...
The interest in the Internet of Things (IoT) is growing exponentially, so multiple technologies and solutions have emerged to connect almost everything. A ‘thing’ can be a car, a thermometer or a robot that, when equipped with a transceiver, will exchange information over the internet with a defined service. IoT therefore comprises a wide variety of use cases with very different...
We've redesigned our HPC/Grid network to be capable of full network function virtualisation, to be prepared for large numbers of 100 Gbps connections, and to be 400G ready. In this talk we want to take you through the design considerations for a fully non-blocking 6 Tbps virtual network, and the type of features we have built in for the cloudification of our clusters using OpenContrail....
CERN networks are dealing with an ever-increasing volume of network traffic. The traffic leaving and entering CERN must be precisely monitored and analysed to properly protect the networks from potential security breaches. To provide the required monitoring capabilities, the Computer Security team and the Networking team at CERN have joined efforts in designing and deploying a scalable...
In March 2017 Echo went into production at the RAL Tier 1, providing over 7 PB of usable storage to WLCG VOs. This talk will present details of the setup and the operational experience gained from running the cluster in production.
Brief introduction, and call for contributions, to a working group on archival storage at WLCG sites
The EGI CSIRT's main goal is, in collaboration with all resource providers, to keep the EGI e-Infrastructure running and secure. During the past years, under the EGI-Engage project, the EGI CSIRT has been driving the infrastructure in terms of incident prevention and response, but also security training. This presentation provides an overview of these activities, focusing on the impact for the...
This presentation gives an overview of the current computer security landscape. It describes the main vectors of compromise in the academic community, including lessons learnt, and reveals the inner mechanisms of the underground economy, exposing how our resources are exploited by organised crime groups, as well as giving recommendations to protect ourselves. By showing how these attacks are both...
Recently, Japanese universities and academic organizations have experienced severe cyber attacks. To mitigate computer security incidents, we are forced to rethink our strategies in terms of security management and network design.
In this talk, we report the current status and present future directions of KEK computer security.
This is a TLP:RED presentation of a case study. Slides and details will not be made publicly available, and attendees have to agree to treat all information presented as confidential and refrain from sharing details on social media or blog. The presentation focuses on an insider attack and concentrates on the technical aspects of the investigation, in particular the network and file system...
In this contribution the vision for the CERN storage services and their applications will be presented.
Traditionally, the CERN IT Storage group has been focusing on storage for Physics data. A status update will be given about CASTOR and EOS, with the recent addition of the Ceph-based storage for High-Performance Computing.
More recently, the evolution has focused on providing higher-level...
NDGF-T1 is transitioning its dCache storage to a model where dCache is no longer run by the sysadmin but as a normal user. This enables centralized management of the software versions and their configs.
This automation is done with three Ansible roles and a playbook to tie them together.
The end result is software running in an environment much like the cloud.
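As a sketch of how such a setup can be structured (host-group and role names here are illustrative, not the actual NDGF playbook), a top-level playbook tying three roles together might look like:

```yaml
# site.yml -- illustrative only; role names are hypothetical
- hosts: dcache_pools
  become: true          # privileges only to set up the account, not to run dCache
  roles:
    - dcache_user       # create the unprivileged account and directories
    - dcache_software   # fetch the centrally managed dCache release and configs
    - dcache_service    # render layout files and start dCache as that user
```

The key point is that `become` is used only for provisioning; the service itself ends up owned and run by the unprivileged account, much like a cloud-style deployment.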
We describe our use of the Dynafed data federator with cloud computing resources. Dynafed (developed by CERN IT) allows a dynamic data federation, based on the webdav protocol, with the possibility to have a single name space for data distributed over all available sites. It also allows a failover to another copy of a file in case the connection to the closest file location gets interrupted...
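The failover behaviour described above can be illustrated with a small client-side sketch: try a list of replica URLs for the same logical file and fall back to the next copy when one fails. This is not Dynafed code; the URLs and function name are hypothetical, and a real client would speak WebDAV and follow the federator's redirects.

```python
import urllib.request
import urllib.error

# Hypothetical replica endpoints behind a single federated namespace.
REPLICAS = [
    "https://site-a.example.org/webdav/data/file.root",
    "https://site-b.example.org/webdav/data/file.root",
]

def fetch_with_failover(urls, timeout=10):
    """Try each replica in turn; return the body of the first that works."""
    last_error = None
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except (urllib.error.URLError, OSError) as err:
            last_error = err  # this copy is unreachable; try the next one
    raise RuntimeError(f"all replicas failed: {last_error}")
```

In the real system the federator presents the single namespace and picks the copy, so clients do not need to know the replica list themselves.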
The CERN Physics Archive is projected to reach 1 Exabyte during LHC Run 3. As the custodial copy of the data archive is stored on magnetic tape, it is very important to CERN to predict the future of tape as a storage medium.
This talk will give an overview of recent developments in tape storage, and a look forward to how the archival storage market may develop over the next decade. The...
It is now a well-known fact in the HEPiX community that the Elastic stack (formerly known as ELK) is
an extremely useful tool for diving into huge log data sets. It has also been presented multiple times
as lacking the security features so often needed in multi-user environments. Although it now provides
a plugin addressing some of those concerns, it requires the acquisition of a commercial...
In this presentation, I will go over CERN's efforts in improving the security and usability of the management interfaces for various server manufacturers.
We present Riemann: a low-latency, transient shared-state stream processor.
This open-source monitoring tool was written by Kyle Kingsbury and is
maintained by the community. Its unique design makes it as flexible as
it gets by melting the walls between configuration and code. Whenever its rich API
doesn't fit the use case, it's as simple as using any library in the Clojure or Java
ecosystem...
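The stream-processing idea behind Riemann can be illustrated in a few lines: a stream is just a function that receives events and forwards the interesting ones to a handler. Riemann's actual streams are written in Clojure; the Python below is only a toy analogue of the concept.

```python
def make_stream(threshold, alert):
    """Toy event stream: forward events whose metric exceeds
    `threshold` to the `alert` handler (a Riemann-like pattern)."""
    def stream(event):
        if event.get("metric", 0) > threshold:
            alert(event)
    return stream

# Usage: collect alerts for CPU events above 90% utilisation.
alerts = []
cpu_stream = make_stream(0.9, alerts.append)
cpu_stream({"service": "cpu", "metric": 0.95})  # triggers the handler
cpu_stream({"service": "cpu", "metric": 0.50})  # filtered out
```

Because the "configuration" is ordinary code, arbitrary filtering, aggregation and routing logic composes the same way; that is the flexibility the abstract refers to.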
Various cluster monitoring tools have been adapted or developed at IHEP, each showing the health status of one device or aspect of the IHEP computing platform separately. For example, Ganglia shows the machine load, Nagios monitors the service status, the job-monitoring tool developed by IHEP counts the job success rate, and so on. However, the monitoring data from these different tools are independent and not easy...
Our cloud deployment at the Wigner Datacenter (WDC) is undergoing significant changes. We are adopting a new infrastructure: an automated OpenStack deployment using TripleO and configuration management tools like Puppet and Ansible. Over the past few months, our team at WDC has been testing TripleO as the base of our OpenStack deployment. We are also planning a centralized monitoring and logging...
The China Spallation Neutron Source (CSNS) is a neutron source facility for studying neutron characteristics and exploring the microstructure of matter; it will also serve as a high-level scientific research platform oriented to multiple academic disciplines. Scientific research on CSNS requires the support of a high-performance computing environment. So, from the research and practice...
CERN has a great number of applications that rely on a database for their daily operations and the IT Database Services group is responsible for current and future databases and their platform for accelerators, experiments and administrative services as well as for scale-out analytics services including Hadoop, Spark and Kafka. This presentation aims to give a summary of the current state of...
Following various A/C incidents in an Oxford computer room, we developed a solution to automatically shut down servers.
The solution has two parts: a service which monitors the temperatures and publishes them on a web page, and a client which runs on the servers and queries the result to determine whether a shutdown is required. Digitemp software and one-wire temperature sensors are used.
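The client side of such a scheme can be sketched as follows. This is an illustration of the pattern only: the URL, threshold and function names are hypothetical, and the published page is assumed to contain a plain-text temperature reading.

```python
import subprocess
import urllib.request

# Assumed: the monitoring service publishes the current room temperature
# as a plain-text number at this (hypothetical) URL.
STATUS_URL = "http://tempmon.example/status.txt"
SHUTDOWN_AT_C = 35.0

def read_temperature(url=STATUS_URL):
    """Fetch the published temperature from the monitoring service."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return float(resp.read().strip())

def check_and_shutdown(temperature, limit=SHUTDOWN_AT_C, dry_run=True):
    """Return True if the limit is exceeded; trigger a shutdown unless dry_run."""
    if temperature <= limit:
        return False
    if not dry_run:
        # Requires appropriate privileges on the server.
        subprocess.run(["shutdown", "-h", "now"], check=False)
    return True
```

Run periodically (e.g. from cron), this keeps the shutdown decision on each server while the sensors and web page stay with the monitoring service.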
The document converter service provides conversion of most office and some engineering applications to PDF, PDF/A or PostScript. The service has been completely rewritten as an OSS [1] and is based on modern IT technology fostered by the CERN IT department. It is implemented as a RESTful API with a containerised approach using the Openshift technology, EOS storage to store documents and jobs,...
CCIN2P3 is one of the largest academic data centres in France. Its main mission is to provide the particle, astroparticle and nuclear physics community with IT services, including large-scale compute and storage capacities. We are a partner for dozens of scientific experiments and hundreds of researchers that make a daily use of these resources.
It is essential for users to have at their...
The CERN Linux Support team is in charge of providing system images for all Scientific Linux and CentOS CERN users. Currently we mostly test new images manually. To streamline their path to production, we are designing a continuous integration and testing framework which will automate image production and allow for more tests, running them more thoroughly and with more flexibility.
Some remarks on the current design with printer subnets and on managing the CUPS configuration via Chef data bags.
... at LAL
Private cloud deployment is ongoing at KEK. Our cloud will support self-service provisioning and will also be integrated with our batch system in order to provide heterogeneous clusters dynamically. This enables us to support various kinds of data analyses and allows elastic resource allocation among the various projects supported at KEK.
In this talk, we will introduce our OpenStack based cloud...
I will report on material describing how HEPiX was born at KEK in 1991.