ACAT 2010

Name: ACAT 2010
Start: 2010-02-22T08:00:00+01:00
End: 2010-02-27T18:00:00+01:00
Location: Jaipur, India

22 Feb 2010, 08:00 → 27 Feb 2010, 18:00 Europe/Zurich

Denis Perret-Gallix (Laboratoire d'Annecy-le-Vieux de Physique des Particules (LAPP))

Description

We are very happy to invite you to this exceptional session of the ACAT series (13th) that will mark a new turning point in the cross-fertilization of hot physics research and computing technology.

Monday 22 February
- Student Session
  - 1
    
    Simulation and Visualisation Techniques
    
    Speaker: Matevz Tadel (CERN)
    
    Slides
  - 2
    
    Statistical Methods, Multivariate Analysis and Pattern Recognition
    
    This lecture will present key statistical concepts and methods for multivariate data analysis and their applications in high energy physics. It will discuss the meaning of multivariate statistical analysis and its benefits, and will present methods for preparing data for multivariate analysis techniques, the generic problems addressed by these techniques, and a few classes of such techniques. The applicability of these methods to pattern recognition problems in high energy physics will be demonstrated with examples from specific physics analyses.
    
    Speaker: Dr Liliana Teodorescu (Brunel University)
    
    Slides
  - 10:20
    
    Coffee Break
  - 3
    
    Multicore Computing
    
    Speaker: Dr Alfio Lazzaro (Universita degli Studi di Milano & INFN, Milano)
    
    Slides
  - 4
    
    Software Development in High Energy Physics: a Critical Look
    
    Speaker: Dr Federico Carminati (CERN)
    
    Slides
  - 5
    
    Internet Law: What Students, Professors, and Software Developers Need to Know
    
    Speaker: Prof. Lawrence Pinsky (UNIVERSITY OF HOUSTON)
    
    Slides
- 13:00
  
  Lunch Break
- Afternoon session
  - 6
    
    Official Opening
    
    Slides
  - 7
    
    High Tea
  - 8
    
    Computing Outside the Box: On Demand Computing & its Impact on Scientific Discovery by Ian FOSTER
    
    Speaker: Ian Foster (Unknown)
  - 9
    
    History of the ROOT System: Conception, Evolution and Experience by Rene BRUN
    
    The ROOT system is now widely used in HEP, Nuclear Physics and many other fields. It is becoming a mature system and the software backbone for most experiments ranging from data acquisition systems, controls, simulation, reconstruction and of course data analysis. The talk will review the history of its conception at a time when HEP was moving from the Fortran era to C++. While the original target was a PAW-like system for data analysis, it became rapidly obvious that a more ambitious system had to be developed as a working alternative to the defunct object-oriented data base systems. Thanks to the collaboration of many individuals, ROOT has been gradually extended to include high quality math libraries, statistical analysis tools, visualization tools for statistics objects, detectors and event displays. The current ideas on the evolution of the system will also be presented.
    
    Slides
  - 19:00
    
    Dinner
    
    Buses will leave Hotel Ramada at 08:00, 10:00 and 11:00. Registration at LNM IIT.
Tuesday 23 February
- Tuesday, 23 February - Plenary Session
  - 10
    
    Pattern recognition and estimation methods for track and vertex reconstruction
    
    The reconstruction of charged tracks and interaction vertices is an important step in the data analysis chain of particle physics experiments. I give a survey of the most popular methods that have been employed in the past and are currently employed by the LHC experiments. Whereas pattern recognition methods are very diverse and rather detector dependent, fitting algorithms offer less variety and can be applied to both track and vertex estimation with minimal changes. In particular, I trace the development from standard least-squares estimators to robust and adaptive estimators in both contexts. I end with an outlook on what I consider the most important issues to be addressed by experiments at future colliders such as the SuperLHC, the upgraded B-factory at KEK, and the ILC.
    
    Speaker: Dr Rudolf Frühwirth (Institute of High Energy Physics, Vienna)
    
    Slides
  - 11
    
    LHC Cloud Computing with CernVM
    
    Using virtualization technology, the entire application environment of an LHC experiment, including its Linux operating system and the experiment's code, libraries and support utilities, can be incorporated into a virtual image and executed under suitable hypervisors installed on a choice of target host platforms. The Virtualization R&D project at CERN is developing CernVM, a virtual machine designed to support the full range of LHC physics computing on a wide range of hypervisors and platforms including end-user laptops, Grid and cluster nodes, volunteer PC's running BOINC, and nodes on the Amazon Elastic Compute Cloud (EC2). CernVM interfaces to the LHC experiments' code repositories by means of a specially tuned network file system CVMFS, ensuring complete compatibility of the application with the developers' native version. CernVM provides mechanisms to minimize virtual machine image sizes and to keep images efficiently up to date when code changes. CernVM also provides interfaces to the LHC experiments' job submission systems and workload management systems (e.g. ATLAS/PanDA, LHCb/DIRAC, ALICE/Alien), allowing clouds of CernVM-equipped worker nodes to be accessed by the experiments without changing their job production procedures. Currently supported clouds include Amazon EC2, private clusters, Tier3 sites, and a cloud of BOINC volunteer PC's which represents a very large potential resource, so far untapped by the LHC experiments. This paper presents the current state of development of CernVM support for LHC cloud computing.
    
    Speaker: Dr Ben Segal (CERN)
    
    Slides
  - 10:20
    
    Coffee Break
  - 12
    
    Analysis of medical images: the MAGIC-5 Project
    
    The MAGIC-5 Project focuses on the development of analysis algorithms for the automated detection of anomalies in medical images, compatible with use in a distributed environment. Presently, two main research subjects are being addressed: the detection of nodules in low-dose high-resolution lung computed tomographies and the analysis of brain MRIs for the segmentation and classification of the hippocampus as an early marker of Alzheimer's disease. MAGIC-5 started as a spin-off of high energy physics software development and involves a community of developers in constant contact with - some of them also involved in - HEP projects. The most relevant results will be presented and discussed, together with a new model, based on virtual ant colonies, for the segmentation of complex structures. The possible use of such a model in HEP is addressed.
    
    Speaker: Dr Piergiorgio Cerello (INFN - TORINO)
    
    Slides
- 12:00
  
  Lunch Break
- Tuesday, 23 February - Computing Technology for Physics Research
  - 13
    
    EU-IndiaGrid2 - Sustainable e-infrastructures across Europe and India
    
    EU-IndiaGrid2 - Sustainable e-infrastructures across Europe and India capitalises on the achievements of the FP6 EU-IndiaGrid project and huge infrastructural developments in India. EU-IndiaGrid2 will act as a bridge across European and Indian e-Infrastructures, leveraging the expertise obtained by partners during the EU-IndiaGrid project. EU-IndiaGrid2 will further the continuous e-Infrastructure evolution in Europe and India, to ensure sustainable scientific, educational and technological collaboration across the two continents. In particular, the Large Hadron Collider (LHC) program represents one of the unique science and research facilities to share between India and Europe in the field of scientific research in general and in the ICT domain in particular. The Indian partners in the project represent both the ALICE and the CMS communities actively engaged in the LHC program. The role of the EU-IndiaGrid project in this specific activity has been widely recognised within the European Commission and the Indian Government, and EU-IndiaGrid2 will continue its action in sustaining this community. The project, approved within the call FP7-INFRASTRUCTURES-2009-1, starts in January 2010 with a duration of 24 months.
    
    Speaker: Dr Alberto Masoni (INFN - Cagliari)
    
    Slides
  - 14
    
    Teaching a Compiler your Coding Rules
    
    Most software libraries have coding rules. They are usually checked by a dedicated tool which is closed source, not free, and difficult to configure. With the advent of clang, part of the LLVM compiler project, an open source C++ compiler is in reach that allows coding rules to be checked by a production grade parser through its C++ API. An implementation for ROOT's coding convention will be presented, demonstrating how to interface with clang's representation of source code, and explaining how to define rules.
    
    Speaker: Axel Naumann (CERN)
    
    Slides
  - 15
    
    Computing at Belle II
    
    The Belle II experiment, a next-generation B factory experiment at KEK, is expected to record a data volume two orders of magnitude larger than that of its predecessor, the Belle experiment. The data size and rate are comparable to or greater than those of the LHC experiments and require a change of the computing model from the Belle approach, where basically all computing resources were provided by KEK, to a more distributed scheme. While we adopt existing grid technologies for our baseline design, we also investigate the possibility of using cloud computing for peak resource demands. An important task of the computing framework is to provide easy and transparent access to data and to facilitate the bookkeeping of processed files and failed jobs. To achieve this we set up a metadata catalog based on AMGA and plan to use it in a bookkeeping service that is based on concepts implemented in the SAM data handling system used at CDF and D0. In this talk we summarize the expected Belle II performance and the resulting computing requirements and show the status and plans of the core components of the computing infrastructure.
    
    Speaker: Takanori Hara (KEK)
    
    Slides
  - 16
    
    BNL Batch and DataCarousel systems at BNL: A tool and UI for efficient access to data on tape with fairshare policy capabilities
    
    The BNL facility, which supports the RHIC experiments as their Tier0 center and, later, ATLAS/LHC as a Tier1 center, had to address early on the issue of efficient access to data held in mass storage. Random access destroys tape performance by causing too frequent, high-latency and time-consuming tape mounts and dismounts. Faced with a high job throughput from multiple RHIC experiments in the early 2000s, the experimental and facility teams were forced to consider ingenious approaches. A tape access “batch” system integrated with the production system was first developed, based on the initial Oak Ridge National Laboratory (ORNL) Batch code. In parallel, a highly customizable layer and UI known as the DataCarousel was developed in-house to provide multi-user fairshare, with group- and user-level policies controlling the sharing of resources. The simple UI, based on a Perl module, made it possible to create user helper scripts that restore datasets to disk, and had all the features necessary to interface with higher-level storage aggregation solutions. Hence, beyond simple access at the data production level, the system was also successfully used in support of numerous data access tools, such as the Scalla/Xrootd MSS plugin back end and, similarly, dCache back-end access to MSS. Today, all RHIC and ATLAS experiments use a combination of the Batch system and the DataCarousel, following a 10-year search for efficient use of resources. In 2005, BNL’s HPSS team decided to add new features, such as improved HPSS resource management, better visibility of real-time staging activities, and statistics on historical data for performance analysis. BNL Batch provides dynamic HPSS resource management and schedules read jobs efficiently, while staging performance can be further optimized at the user level with the DataCarousel (sorting by tape while preserving fairshare policies). In this presentation, we will give an overview of our system and its development and share the findings of our efforts.
    
    Speaker: Mr David YU (BROOKHAVEN NATIONAL LABORATORY)
    
    Slides
  - 15:40
    
    Coffee Break
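
The DataCarousel abstract above mentions sorting staging requests by tape while preserving fairshare policies. The following is only an illustrative sketch of that idea, not the BNL implementation: the `schedule` function, the request tuples, the `weights` table and the policy itself (serve the least-served user next, then drain every pending request on the same tape in one mount) are all assumptions made for this example.

```python
from collections import defaultdict


def schedule(requests, weights):
    """Group tape-staging requests into mounts (hypothetical policy).

    requests: list of (user, filename, tape) tuples in arrival order.
    weights:  dict mapping user -> fairshare weight (default 1.0).
    Returns a list of (tape, [filenames]) tuples, one per tape mount.
    """
    pending = list(requests)
    served = defaultdict(int)  # files already staged per user
    mounts = []
    while pending:
        # Users that still have pending requests, in arrival order.
        users = []
        for user, _, _ in pending:
            if user not in users:
                users.append(user)
        # Fairshare: pick the user with the lowest served/weight ratio.
        user = min(users, key=lambda u: served[u] / weights.get(u, 1.0))
        # Mount the tape holding that user's oldest pending request ...
        tape = next(t for u, _, t in pending if u == user)
        # ... and stage every pending file on that tape in a single mount.
        batch = [r for r in pending if r[2] == tape]
        pending = [r for r in pending if r[2] != tape]
        for u, _, _ in batch:
            served[u] += 1
        mounts.append((tape, [f for _, f, _ in batch]))
    return mounts


requests = [
    ("alice", "a1", "T1"),
    ("alice", "a2", "T2"),
    ("bob", "b1", "T1"),
    ("alice", "a3", "T1"),
    ("bob", "b2", "T3"),
]
mounts = schedule(requests, {"alice": 1.0, "bob": 1.0})
```

In this toy input the five requests are served with three mounts, and the second mount goes to the user who has received fewer files so far rather than to whoever asked first.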
- Tuesday, 23 February - Data Analysis - Algorithms and Tools
  - 17
    
    Likelihood-based Particle Flow Algorithm at CDF for Accurate Energy Measurement and Identification of Hadronically Decaying Tau Leptons
    
    We present a new technique for accurate energy measurement of hadronically decaying tau leptons. The technique was developed and tested at the CDF experiment at the Tevatron. It employs a particle flow algorithm complemented with a likelihood-based method for separating contributions of overlapping energy depositions of spatially close particles. In addition to the superior energy resolution and improved discrimination against backgrounds provided by the method, this technique provides a direct estimate of the uncertainty in the energy measurement of each individual hadronic tau jet. The estimate of the likelihood of the observed detector response for a given particle hypothesis allows improved rejection of difficult light lepton backgrounds. This new technique is now being deployed to improve the sensitivity of the H→ττ search at the Tevatron. With appropriate adjustments, the algorithm can be further extended to the case of generic (quark or gluon) jets as well as adopted at other experiments.
    
    Speaker: Andrey Elagin (Texas A&M University (TAMU))
    
    Slides
  - 18
    
    Classifying extremely imbalanced data sets
    
    Imbalanced data sets containing much more background than signal instances are very common in particle physics, and will also be characteristic of the upcoming analyses of LHC data. Following up the work presented at ACAT 2008, we use the multivariate technique presented there (a rule growing algorithm with the meta-methods bagging and instance weighting) on much more imbalanced data sets, especially a selection of D0 decays without the use of particle identification. It turns out that the quality of the result strongly depends on the number of background instances used for training. We discuss methods to exploit this in order to improve the results significantly, and how to handle and reduce the size of large training sets without loss of result quality in general. We will also comment on how to take into account statistical fluctuations in receiver operating characteristic (ROC) curves when comparing classifier methods.
    
    Speaker: Markward Britsch (Max-Planck-Institut fuer Kernphysik (MPI))
    
    Slides
  - 19
    
    SFrame - A high-performance ROOT-based framework for HEP analysis
    
    In a typical offline data analysis in high-energy-physics a large number of collision events are studied. For each event the reconstruction software of the experiments stores a large number of measured event properties in sometimes complex data objects and formats. Usually this huge amount of initial data is reduced in several analysis steps, selecting a subset of interesting events and observables. In addition, the same selection is applied to simulated MC events and the final results are compared to the data. A fast processing of the events is mandatory for an efficient analysis. In this paper we introduce the SFrame package, a ROOT-based analysis framework, that is widely used in the context of ATLAS data analyses. It features (i) consecutive data reduction in multiple user-defined analysis cycles performing a selection of interesting events and observables, making it easy to calculate and store new derived event variables; (ii) a user-friendly combination of data and MC events using weighting techniques; and in particular (iii) a high-speed processing of the events. We study the timing performance of SFrame and find a highly superior performance compared to other analysis frameworks. More information can be found at: http://sourceforge.net/projects/sframe/
    
    Speaker: Dr Attila Krasznahorkay (New York University)
    
    Slides
  - 20
    
    Online Filtering for Radar Detection of Meteors
    
    The entry of a meteor into Earth’s atmosphere creates an ionized trail that can forward-scatter VHF electromagnetic waves. This fact inspired the RMS (Radio Meteor Scatter) technique, which detects meteors using passive radar. Because radar detection acquires data continuously and generates a significant amount of data, composed mainly of background noise, an online filtering system is very attractive. This work therefore addresses the development of algorithms for the online automatic detection of these signals. In the time domain, the optimal filtering technique is applied. The model assumes that the received signal is masked by additive noise, and both signal and noise statistics are used to design a linear filter that maximizes the signal-to-noise ratio. This filter is known as the matched filter, as detection is performed by correlating the incoming signal with replicas of the target signal components at the receiver end. In the frequency domain, two possibilities are being studied using the short-time fast Fourier transform: narrowband demodulation, which basically consists of demodulating filtered data in order to obtain only the envelope of the signal, and cumulative power spectrum analysis. Demodulation is attractive because phase delays are produced by the reflection of the VHF wave from the various points in the meteor trails and by the different paths the traveling wave finds between the transmitting and receiving antennas. The cumulative spectral power is obtained by integrating the power spectral density function, which drastically reduces the effect of noise. Sets of experimental data are being analyzed, and preliminary results of these techniques, along with their current status of development, will be shown.
    
    Speaker: Mr Eric LEITE (Federal University of Rio de Janeiro)
    
    Slides
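
The time-domain step in the abstract above (correlating the incoming signal with a replica of the target signal) can be sketched as follows. This is a minimal illustration assuming white additive noise, for which the matched filter reduces to a normalized cross-correlation. The exponentially decaying `template`, the noise level, the threshold and the function names are hypothetical choices for the example and are not taken from the RMS work.

```python
import math
import random


def matched_filter(signal, template):
    """Correlate `signal` with `template` at every lag (white-noise case)."""
    m = len(template)
    # Normalize by the template energy so pure noise keeps its variance.
    norm = math.sqrt(sum(t * t for t in template))
    return [
        sum(signal[lag + k] * template[k] for k in range(m)) / norm
        for lag in range(len(signal) - m + 1)
    ]


def detect(signal, template, threshold):
    """Return (best_lag, peak) if the filter output exceeds `threshold`."""
    out = matched_filter(signal, template)
    best_lag = max(range(len(out)), key=out.__getitem__)
    peak = out[best_lag]
    return (best_lag, peak) if peak > threshold else None


# Hypothetical echo shape: a decaying burst (an assumption for illustration).
template = [math.exp(-k / 8.0) for k in range(32)]

# Synthetic record: Gaussian background noise plus one echo at a known lag.
rng = random.Random(0)
noise_sigma = 0.5
signal = [rng.gauss(0.0, noise_sigma) for _ in range(512)]
true_lag = 200
for k, t in enumerate(template):
    signal[true_lag + k] += 3.0 * t

# Flag a detection when the filter output exceeds five noise standard deviations.
result = detect(signal, template, threshold=5 * noise_sigma)
```

Because the template is normalized, the filter output for pure noise has the same standard deviation as the input noise, which is what makes a threshold expressed in units of `noise_sigma` meaningful here.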
  - 15:40
    
    Coffee Break
  - 21
    
    Absorbing systematic effects to obtain a better background model in a search for new physics
    
    This contribution discusses a novel approach to estimate the Standard Model backgrounds based on modifying Monte Carlo predictions within their systematic uncertainties. The improved background model is obtained by altering the original predictions with successively more complex correction functions in signal-free control selections. Statistical tests indicate when sufficient compatibility with data is reached. In this way, systematic effects are absorbed into the new background model. The same correction is then applied on the Monte Carlo prediction in the signal region. Comparing this method to other background estimation techniques shows improvements with respect to statistical and systematic uncertainties. The proposed method can also be applied in other fields beyond high energy physics.
    
    Speaker: Mr Stephan Horner (Albert-Ludwigs-Universitaet Freiburg)
    
    Slides
  - 22
    
    Analysis of Photoluminescence measurement data from interdiffused Quantum Wells by Real coded Quantum inspired Evolutionary Algorithm
    
    Reliable analysis of experimental data is always difficult due to the presence of noise and other types of errors. This paper analyzes data obtained from photoluminescence measurements, after the annealing of interdiffused quantum well heterostructures, using a recently proposed Real coded Quantum inspired Evolutionary Algorithm (RQiEA). The proposed algorithm directly determines the interdiffusion parameters without using an Arrhenius plot. Further, the results obtained are better than those obtained with a Genetic Algorithm and the Least Square Method. The RQiEA is better suited than other state-of-the-art data analysis techniques as it uses real coding rather than binary coding and its search process is inspired by quantum computing. It has also reliably detected an extrinsic interdiffusion process. Photoluminescence is a widely used technique for measuring interdiffusion parameters in semiconductor quantum well heterostructures. This method converts the changes of the confined energy levels (PL peak energy) into a characteristic diffusion length, $L_D$, of the quantum well structure by a linear theoretical model. $L_D^2$ is then plotted against the annealing time, $t$, to determine the interdiffusion coefficient, $D(T)$, using the equation $L_D^2 = 4\,D(T)\,t$. The interdiffusion parameters, viz. the activation energy, $E_a$, and the interdiffusion prefactor, $D_0$, are determined using the Arrhenius equation $D(T) = D_0 \exp(-E_a/(K T))$, where $K$ is the Boltzmann constant and $T$ is the annealing temperature in kelvin. An Evolutionary Algorithm (EA) mimics the process of natural evolution. RQiEA has been designed by integrating superposition and entanglement ideas from quantum computing into EA. It uses adaptive quantum-inspired rotation gates to evolve population qubits. It has been shown that QiEAs are more powerful than EAs as they can better balance exploration and exploitation during the search process.
    
    Speaker: Mr Ashish Mani (Dayalbagh Educational Institute)
    
    Slides
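
For reference, the conventional two-step analysis that the photoluminescence abstract above contrasts with RQiEA can be sketched in a few lines: first fit the slope of the squared diffusion length against annealing time to get the interdiffusion coefficient at each temperature, then fit the Arrhenius relation in logarithmic form. This is not the RQiEA itself. The function names, the synthetic data, and the numerical values chosen for the prefactor and activation energy are assumptions made only for the illustration.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def diffusion_coefficient(times, ld_squared):
    """Least-squares slope through the origin of L_D^2 = 4 D t; returns D."""
    slope = sum(t * y for t, y in zip(times, ld_squared)) / sum(t * t for t in times)
    return slope / 4.0


def arrhenius_fit(temperatures, coefficients):
    """Fit ln D = ln D0 - Ea / (k T) by ordinary least squares.

    Returns (D0, Ea), with Ea in eV when temperatures are in kelvin.
    """
    xs = [1.0 / (K_B * temp) for temp in temperatures]
    ys = [math.log(d) for d in coefficients]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope


# Synthetic, noise-free data generated from made-up parameters (assumptions).
TRUE_D0, TRUE_EA = 1.0e-3, 3.0          # arbitrary length^2/s, eV
times = [30.0, 60.0, 120.0]             # annealing times in seconds
temperatures = [1073.0, 1123.0, 1173.0]  # annealing temperatures in kelvin

coefficients = []
for temp in temperatures:
    d_true = TRUE_D0 * math.exp(-TRUE_EA / (K_B * temp))
    ld_squared = [4.0 * d_true * t for t in times]
    coefficients.append(diffusion_coefficient(times, ld_squared))

d0_fit, ea_fit = arrhenius_fit(temperatures, coefficients)
```

On noise-free input the two fits recover the generating parameters; with real measurements, errors from the first fit propagate into the second, which is the kind of weakness a direct parameter search is meant to avoid.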
- Tuesday, 23 February - Methodology of Computations in Theoretical Physics
  - 23
    
    Status of the FORM project
    
    Currently there is a lot of activity in the FORM project. There is much progress on making it open source. Work is being done on the simplification of lengthy formulas, and routines for dealing with rational polynomials are under construction. In addition, new models of parallelization are being studied to make optimal use of current multi-processor machines.
    
    Speaker: Dr Irina Pushkina (NIKHEF)
    
    Slides
  - 24
    
    Parallel versions of the symbolic manipulation system FORM
    
    The symbolic manipulation program FORM is specialized to handle very large algebraic expressions. Some specific features of its internal structure make FORM very well suited for parallelization. We now have parallel versions of FORM: one is based on POSIX threads and is optimal for modern multicore computers, while another uses MPI and can be used to parallelize FORM on clusters and Massive Parallel Processing systems. Most existing FORM programs will be able to take advantage of the parallel execution without the need for modifications.
    
    Speaker: Mikhail Tentyukov (Karlsruhe University)
    
    Slides
  - 25
    
    Deterministic numerical box and vertex integrations for one-loop hexagon reductions
    
    We provide a fully numerical, deterministic integration at the level of the three- and four-point functions, in the reduction of the one-loop hexagon integral by sector decomposition. For the corresponding two- and three-dimensional integrals we use an adaptive numerical approach applied recursively in two and three dimensions, respectively. The adaptive integration is coupled with an extrapolation for an accurate, automatic treatment of integrand singularities arising from vanishing denominators in the interior of the integration domain. Furthermore, the recursive procedure alleviates extensive memory use as incurred with standard adaptive, multidimensional integration software. Tensor integrals are handled automatically by this technique and the separation of infrared singularities follows naturally by dimensional regularization.
    
    Speaker: Prof. Elise de Doncker (Western Michigan University)
    
    Slides
  - 15:30
    
    Break
  - 26
    
    Recursive reduction of tensorial one-loop Feynman integrals
    
    A new reduction of tensorial one-loop Feynman integrals with massive and massless propagators to scalar functions is introduced. The method is recursive: n-point integrals of rank R are expressed by n-point and (n-1)-point integrals of rank (R-1). The algorithm is realized in a Fortran package.
    
    Speaker: Tord Riemann (DESY)
    
    Slides
  - 27
    
    Automated Computation of One-loop Scattering Amplitudes
    
    The problem of an efficient and automated computation of scattering amplitudes at the one-loop level for processes with more than 4 particles is crucial for the analysis of the LHC data. In this presentation I will review the main features of a powerful new approach for the reduction of one-loop amplitudes that operates at the integrand level. The method, also known as OPP reduction, is an important building block towards a fully automated implementation of this type of calculations. I will illustrate the existing numerical codes available for the reduction and discuss the ongoing efforts to target important issues such as stability, versatility and efficiency of the method.
    
    Speaker: Giovanni Ossola (New York City College of Technology (CUNY))
    
    Slides
- Tuesday, 23 February - Poster Session
  - 28
    
    Poster session
    
    Poster list can be found here: Poster list
- Ian Foster - Public Lecture
  - 29
    
    Ian FOSTER - Public Lecture
    
    Speaker: Ian Foster (Unknown)
Wednesday 24 February
- Wednesday, 24 February - Plenary Session
  - 30
    
    Data access in the High Energy Physics community
    
    In this talk we pragmatically address some general aspects of massive data access in the HEP environment, focusing first on the relationships between the characteristics of the available technologies and the data access strategies that they make possible. Moreover, the upcoming evolution in the computing performance available even at the personal level will likely pose new challenges for the systems that have to feed the computations with data. The talk will then introduce some ideas that will likely constitute the next steps in the evolution of this kind of worldwide distributed system, towards new levels of performance, interoperability and robustness. Efficiently running data-intensive applications can be very challenging at a single site, depending on the scale of the computations; running them in a worldwide distributed environment with chaotic user-related random access patterns needs a design that avoids all the pitfalls that could seriously harm its efficiency.
    
    Speaker: Dr Fabrizio Furano (Conseil Europeen Recherche Nucl. (CERN))
    
    Slides
  - 31
    
    Statistics challenges in HEP
    
    The LHC was built as a discovery machine, whether for a Higgs Boson or Supersymmetry. In this review talk we will concentrate on the methods used in the HEP community to test hypotheses. In a comparative study, we will cover the range from the LEP hybrid "CLs" method and the Bayesian Tevatron exclusion techniques to the LHC frequentist discovery techniques. We will explain how to read all the exclusion and prospective discovery plots with their yellow and green bands and how to include systematics in a significance calculation. The review is intended to be pedagogical.
    
    Speaker: Dr Eilam Gross (Weissman Institute of Physical Sciences)
    
    Slides
  - 10:20
    
    Coffee Break
  - 32
    
    Automation of multi-leg one-loop virtual amplitudes
    
    In recent years, much progress has been made in the computation of one-loop virtual matrix elements for processes involving many external particles. In this talk I will show the importance of NLO-accuracy computations for phenomenologically relevant processes and review the recent progress that will make their automated computation tractable and their inclusion in Monte Carlo tools possible.
    
    Speaker: Dr Daniel Maitre (IPPP, Great Britain)
    
    Slides
- 13:30
  
  Sightseeing Trip
Thursday 25 February
- Thursday, 25 February - Plenary Session
  - 33
    
    Data Transfer Optimization - Going Beyond Heuristics
    
    Scheduling data transfers is frequently realized using heuristic approaches. This is justifiable for on-line systems when extremely fast response is required; however, when sending large amounts of data, such as transferring large files or streaming video, it is worthwhile to do real optimization. This paper describes formal models for various networking problems, with the focus on data networks. In particular, we describe how to model path placement problems, where the task is to select a path for each demand starting at some source node and finishing at the destination node. The demands can be instantaneous, for example in data streaming, or spread in time in so-called bandwidth-on-demand problems. This second problem is a complex combination of path placement and cumulative scheduling problems. As constraint programming is a successful technology for solving scheduling problems, we sketch the basic principles of constraint programming and illustrate how it can be applied to solve the above-mentioned problems. The main advantage of this solving approach is extendibility: the base model can be augmented with additional constraints derived from the specific problem requirements.
    
    Speaker: Prof. Roman Bartak (Charles University in Prague)
    
    Slides
  - 34
    
    How to Navigate Next Generation Programming Models for Next Generation Computer Architecture
    
    Something strange has been happening in the slowly evolving, placid world of high performance computing. Software and hardware vendors have been introducing new programming models at a breakneck pace. At first blush, the proliferation of parallel programming models might seem confusing to software developers, but is it really surprising? In fact, programming models have been rapidly evolving for the better part of two decades, thanks in no small part to the boom in web-based application development frameworks and tools. The fundamental forces driving this are twofold: the first is the importance of domain-specific specialization and optimizations to productively use modern hardware infrastructure, and the second is a base of software developers that is better able to adapt to these programming models and use “the right tool for the job”. The key question that remains, and that must be answered by vendors of these tools, is: What tool is right for you? Intel provides a broad set of programming tools and programming models that is a microcosm of the broader diversity available in the software ecosystem. I will discuss how these tools and programming models relate and interoperate with each other in a way that developers can use to navigate their choices. I will pay particular attention to our work in adding data parallelism in C++ via Intel’s Ct technology in ways that eliminate the traditional modularity tax associated with C++ frameworks. I will also show how this work has been applied in particle physics workloads.
    
    Speaker: Dr Anwar Ghuloum (Intel Corporation)
    
    Slides
  - 10:20
    
    Coffee Break
  - 35
    
    Tools for Dark Matter in Particle Physics and Astrophysics
    
    Speaker: Dr Alexander Pukhov (Moscow State University, Russia)
  - 36
    
    Lattice QCD simulations
    
    The formulation of QCD on a 4-dimensional space-time Euclidean lattice is given. We describe how, with particular implementations of the lattice Dirac operator, the lattice artefacts can be changed from a linear to a quadratic behaviour in the lattice spacing, which allows the continuum limit to be reached faster. We give an account of the algorithmic aspects of the simulations, discuss the supercomputers used and give the computational costs. A few examples of physical quantities which are computed today at almost physical quark masses are presented.
    
    Speaker: Dr Karl Jansen (NIC, DESY, Zeuthen)
    
    Slides
- 12:00
  
  Lunch Break
- Thursday, 25 February - Computing Technology for Physics Research
  - 37
    
    The ALICE Online Data Quality Monitoring
    
    ALICE (A Large Ion Collider Experiment) is the detector designed to study the physics of strongly interacting matter and the quark-gluon plasma in heavy-ion collisions at the CERN Large Hadron Collider (LHC). The online Data Quality Monitoring (DQM) is a critical element of the data acquisition's software chain. It aims to provide shifters with precise and complete information to quickly identify and overcome problems, and as a consequence to ensure acquisition of high quality data. DQM typically involves the online gathering of monitored data, its analysis by user-defined algorithms and its visualization. This paper describes the final design of ALICE’s DQM framework called AMORE (Automatic MOnitoRing Environment), as well as its latest and upcoming features, such as the integration with the offline analysis and reconstruction framework, a better use of multi-core processors through a parallelization effort, and its interface with the eLogBook. The concurrent collection and analysis of data in an online environment requires the framework to be highly efficient, robust and scalable. We will describe what has been implemented to achieve these goals and the benchmarks we carried out to ensure appropriate performance. We then review the wide range of uses people make of this framework, from the basic monitoring of a single sub-detector to the most complex ones within the High Level Trigger farm or using the Prompt Reconstruction, and we describe the various ways of accessing the monitoring results. We conclude with our experience, before and after the LHC restart, of monitoring the data quality in a real-world and challenging environment.
    
    Speaker: Mr Barthelemy von Haller (CERN)
    
    Slides
  - 38
    
    Building Efficient Data Planner for Peta-scale Science
    
    Unprecedented data challenges, both in terms of peta-scale volume and concurrent distributed computing, have arisen with the rise of statistically driven experiments such as those of the high-energy and nuclear physics community. Distributed computing strategies, relying heavily on the presence of data at the proper place and time, have further raised demands for coordination of data movement on the road towards achieving high performance. Whenever diverse usage patterns and priorities are involved, massive data processing will hardly be “fair” to users and will hardly use network bandwidth efficiently unless we address planning and reasoning about data movement and placement. Although several sophisticated and efficient point-to-point data transfer tools exist, global planners and decision makers, answering questions such as “How to bring the required dataset to the user?” or “From which sources to grab the replicated data?”, are for the most part lacking. We present our work on, and the development status of, an automated data planning and scheduling system that ensures fairness and efficiency of data movement by focusing on the minimal time to realize data movement (delegating the data transfer itself to existing transfer tools). Its principal keystones are self-adaptation to network/service alterations, optimal selection of transfer channels, bottleneck avoidance and user fair-share preservation. The planning mechanism is built on a constraint-based model, reflecting real-world restrictions as mathematical constraints, using Constraint Programming and Mixed Integer Programming techniques. In this presentation, we will concentrate on clarifying the overall system from a software engineer's point of view and present the general architecture and interconnection between the centralized and distributed components of the system. While the framework is evolving toward implementing more constraints (such as CPU availability versus storage for a better planning of massive analysis and data production), the current state of our implementation, in use for STAR within a multi-user environment between multiple sites and services, will be presented and the benefits and consequences summarized.
    
    Speaker: Mr Michal ZEROLA (Nuclear Physics Inst., Academy of Sciences)
    
    Slides
  - 39
    
    Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam
    
    Real-time data analysis at next generation experiments is a challenge because of their enormous data rate and size. The Belle II experiment, the upgraded Belle experiment, has to manage a data volume O(100) times the current Belle data size, collected at more than 30 kHz. A sophisticated data analysis is required for efficient data reduction in the high level trigger farm in addition to the offline analysis. On the other hand, a telescope survey with Hyper Suprime-Cam at Subaru Observatory for the search of dark energy also needs to handle a large number of CCD images whose size is comparable with that of Belle II. The feedback of measurement parameters obtained by real-time data processing has never been performed in the past, where the parameter tuning relied entirely on an empirical method. We are now developing a new software framework named "roobasf" to be shared by both Belle II and Hyper Suprime-Cam. The framework has the well-established software-bus architecture and an object persistency interface with ROOT IO. In order to achieve the required real-time performance, parallel processing is widely used to utilize a huge number of network-connected PCs with multi-core CPUs. The parallel processing is performed not only in the trivial event-by-event manner, but also in the pipeline of the application software modules, which are dynamically placed on many PCs. The object data flow over the network is implemented using the Message Passing Interface (MPI), which also provides the system-wide control scheme. The framework adopts Python as the user interface language. The detailed design and the development status of the framework are presented at the conference.
    
    Speaker: Mr Sogo Mineo (University of Tokyo)
    
    Slides
  - 40
    
    Contextualization in Practice: The Clemson Experience
    
    Dynamic virtual organization clusters with user-supplied virtual machines (VMs) have advantages over generic environments. These advantages include the ability for the user to have a priori knowledge of the scientific tools and libraries available to programs executing in the virtualized environment, as well as the other details of the environment. The user can also perform small-scale testing locally, thus saving time and conserving computational resources. However, user-supplied VMs require contextualization in order to operate properly in a given cluster environment. Two types of contextualization are necessary: per-environment and per-session. Examples of per-environment contextualization include one-time configuration tasks such as ensuring availability of ephemeral storage, mounting of a cluster-provided shared filesystem, integration with the cluster's batch scheduler, etc. Also necessary is per-session contextualization such as the assignment of MAC and IP addresses. This paper discusses the challenges, and the techniques used to overcome them, in the contextualization of the STAR VM for the Clemson University cluster environment. Also included are suggestions to VM authors to allow for efficient contextualization of their VMs.
    
    Speaker: Dr Jerome LAURET (BROOKHAVEN NATIONAL LABORATORY)
    
    Slides
  - 15:40
    
    Coffee Break
  - 41
    
    Implementation of new WLCG services into the AliEn Computing model of the ALICE experiment before the data taking
    
    By the time of this conference the LHC ALICE experiment at CERN will have collected a significant amount of data. To process the data that will be produced during the lifetime of the LHC, ALICE has developed over the last years a distributed computing model across more than 90 sites that builds on the overall WLCG (World-wide LHC Computing Grid) service. ALICE implements the different Grid services provided by the gLite middleware into the experiment computing model. During the period 2008-2009 the WLCG project has deployed new versions of some services which are crucial for ALICE computing, such as the gLite3.2 VOBOX, the CREAM-CE and the gLite3.2 WMS. In terms of computing systems, the current LCG-CE used by the four LHC experiments is about to be deprecated in favour of the new CREAM service (Computing Resource Execution And Management). CREAM is a lightweight service created to handle job management operations at the CE level. It is able to accept requests both via the gLite WMS service and via direct submission for transmission to the local batch system. This flexible duality provides the users with a large degree of freedom to adapt the service to their own computing models, but at the same time it requires a careful follow-up of the requirements and tests of the experiments to ensure that their needs are fulfilled before real data taking. ALICE has been the first Grid community to implement CREAM into the experiment computing model and to test it at production level. Since 2008 ALICE has been providing the CERN Grid deployment team and the CREAM developers with important feedback, which has led to the identification of important bugs and issues that were solved before real data taking. In addition, ALICE has also been a leader in testing and implementing other generic services such as the gLite3.2 VOBOX and WMS before their final deployment. In this talk we present a summary of the ALICE experience with these new services, also including the testing results of the other three LHC experiments. The experiments' requirements and the expectations for both the sites and the services themselves are presented in detail. Finally, the operations procedures, which have been elaborated together with the experiment support teams, will be included in this presentation.
    
    Speaker: Fabrizio Furano (CERN IT/DM)
    
    Slides
  - 16:35
- Thursday, 25 February - Data Analysis - Algorithms and Tools
  - 42
    
    ATLAS Second-Level Electron/Jet Neural Discriminator based on Nonlinear Independent Components
    
    The ATLAS online filtering (trigger) system comprises three sequential filtering levels and uses information from the three subdetectors (calorimeters, muon system and tracking). The electron/jet channel is very important for triggering system performance, as interesting signatures (Higgs, SUSY, etc.) may be found efficiently through decays that produce electrons as final-state particles. Electron/jet separation relies very much on calorimeter information, which, in ATLAS, is segmented into seven layers. Due to differences both in depth and cell granularity of these layers, trigger algorithms may benefit from performing feature extraction at the layer level. This work addresses the second level (L2) filtering restricted to calorimeter data. Particle discrimination at L2 is split into two phases: feature extraction, in which detector information is processed with the aim of extracting a compact set of discriminating variables, and an identification step, where particle discrimination is performed over these relevant variables. The Neural Ringer is an alternative electron/jet L2 discriminator. In the Neural Ringer, the feature extraction is performed by building up concentric energy rings from the Region of Interest (RoI) data. At each calorimeter layer, the hottest (most energetic) cell is defined as the first ring, and the following rings are formed around it, so that all cells belonging to a ring have their sampled energies added together and normalized. A total of 100 ring sums fully describes the RoI. Next, a supervised neural classifier, fed from the ring structure, is used for performing the final identification. Independent Component Analysis (ICA) is a signal processing technique that aims at finding linear projections (s=Ax) of the multidimensional input data (x) in a way that the components of s (also called sources) are statistically independent (or at least as independent as possible). The nonlinear extension of ICA (NLICA) provides a more general formulation, as the sources are assumed to be generated by a nonlinear model: s=F(x), where F(.) is a nonlinear mapping. The post-nonlinear (PNL) mixing model is a class of NLICA model that restricts the nonlinear mapping to a cascaded structure, which comprises a linear mapping followed by component-wise nonlinearities (cross-channel nonlinearities are not allowed). In this work, a modification of the Neural Ringer discriminator is proposed by applying the PNL model to the ring structure for both feature extraction and signal compaction. In order to cope with the different characteristics of each calorimeter layer, the feature extraction procedure is here performed in a segmented way (at the layer level). The neural discriminator is then fed from the estimated nonlinear independent components. The proposed algorithm is applied to different L2 datasets. Compared to the Neural Ringer, the proposed approach reduces the number of inputs for the neural classifier (helping to reduce the computational requirements) and also produces higher discrimination performance.
    
    Speaker: Mr Eric LEITE (Federal University of Rio de Janeiro)
    
    Slides
  - 43
    
    High Volume data monitoring with RootSpy
    
    The GlueX experiment will gather data at up to 3 GB/s into a level-3 trigger farm, a rate unprecedented at Jefferson Lab. Monitoring will be done using the cMsg publish/subscribe system to transport ROOT objects over the network using the newly developed RootSpy package. RootSpy can be attached as a plugin to any monitoring program to "publish" its objects on the network without modification to the original code. A description of the RootSpy package will be presented with details of the pub/sub model it employs for ROOT object distribution. Data rates obtained from tests using multi-threaded monitoring programs will also be shown.
    
    Speaker: Dr David Lawrence (Jefferson Lab)
  - 44
    
    mc4qcd: web based analysis and visualization tool for Lattice QCD
    
    mc4qcd is a web-based collaboration tool for the analysis of Lattice QCD data. Lattice QCD computations consist of a large-scale Markov Chain Monte Carlo. Multiple measurements are performed at each MC step. Our system acquires the data by uploading log files, parses them for results of measurements, filters them, mines the data for required information by aggregating results in multiple forms, represents the results as plots and histograms, and further allows refining and interaction by fitting the results. The system computes moving averages and autocorrelations, builds bootstrap samples and bootstrap errors, and allows modeling the data using Bayesian correlated constrained linear and non-linear fits. It can be scripted to allow real-time visualization of results from an ongoing computation. The system is modular and can be easily adapted to automating the workflow of other types of computations.
    
    Speaker: Prof. Massimo Di Pierro (DePaul University)
    
    Slides
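
As an illustration of two of the quantities named in the mc4qcd abstract above (autocorrelations and bootstrap errors), here is a minimal standard-library sketch. It is not taken from mc4qcd: the function names `autocorrelation` and `bootstrap_error` are hypothetical, and the plain bootstrap shown assumes the measurements are already decorrelated (for a correlated Markov chain one would normally block the data first).

```python
import random
import statistics


def autocorrelation(series, lag):
    """Normalised autocorrelation of a measurement series at a given lag."""
    n = len(series)
    mean = statistics.fmean(series)
    var = sum((x - mean) ** 2 for x in series) / n
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag)) / n
    return cov / var


def bootstrap_error(series, n_samples=1000, seed=0):
    """Spread of the mean over bootstrap resamples (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(series)
    means = [statistics.fmean(rng.choices(series, k=n))
             for _ in range(n_samples)]
    return statistics.stdev(means)


# Toy data standing in for one observable measured at each MC step.
toy_rng = random.Random(1)
measurements = [toy_rng.gauss(0.5, 0.1) for _ in range(200)]
rho_1 = autocorrelation(measurements, 1)
error = bootstrap_error(measurements)
```

A lag-1 autocorrelation close to zero suggests successive measurements are nearly independent, in which case the bootstrap error should be close to the naive standard error of the mean.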
  - 45
    
    TMVA - Toolkit for Multivariate Data Analysis
    
    At the dawn of LHC data taking, multivariate data analysis techniques have become the core of many physics analyses. TMVA provides easy access to sophisticated multivariate classifiers and is widely used to study and deploy these for data selection. Beyond classification, most multivariate methods in TMVA perform regression optimization, which can be used to predict data corrections, e.g. for calibration or shower corrections. The tightening of the integration with ROOT provides a common platform for discussion between the user community and the TMVA developers. The talk gives an overview of the new features in TMVA such as regression, multi-class classification and categorization, the extended pre-processing capabilities, and planned further developments.
    
    Speaker: Dr Joerg Stelzer (DESY, Germany)
    
    Slides
  - 15:40
    
    Coffee Break
  - 46
    
    FAST PARALLELIZED TRACKING ALGORITHM FOR THE MUON DETECTOR OF THE CBM EXPERIMENT AT FAIR
    
    Particle trajectory recognition is an important and challenging task in the Compressed Baryonic Matter (CBM) experiment at the future FAIR accelerator at Darmstadt. The tracking algorithms have to process terabytes of input data produced in particle collisions. Therefore, the speed of the tracking software is extremely important for data analysis. In this contribution, a fast parallel track reconstruction algorithm which uses available features of modern processors is presented. These features comprise a SIMD instruction set and multithreading. The first allows several data items to be packed into one register and operated on in parallel, thus achieving more operations per cycle. The second enables the routines to exploit all available CPU cores and hardware threads. This parallelized version of the tracking algorithm has been compared to the initial serial scalar version, which uses a similar approach for tracking. A speed-up factor of 140 was achieved (from 630 msec/event to 4.5 msec/event) for an Intel Core 2 Duo processor at 2.26 GHz.
    
    Speaker: Mr Andrey Lebedev (GSI, Darmstadt / JINR, Dubna)
    
    Slides
  - 47
    
    The RooStats project
    
    RooStats is a project to create advanced statistical tools required for the analysis of LHC data, with emphasis on discoveries, confidence intervals, and combined measurements. The idea is to provide the major statistical techniques as a set of C++ classes with coherent interfaces, which can be used on arbitrary models and datasets in a common way. The classes are built on top of RooFit, which provides very convenient functionality for modeling the probability density functions or the likelihood functions required as inputs for any statistical technique. Furthermore, RooFit provides, via the RooWorkspace class, the functionality for easily creating models, for analysis combination and for digital publication of the likelihood function and the data. We will present in detail the design and the implementation of the different statistical methods of RooStats. These include various classes for interval estimation and for hypothesis testing, depending on different statistical techniques such as those based on the likelihood function, or on frequentist or Bayesian statistics. These methods can be applied to complex problems, including cases with multiple parameters of interest and various nuisance parameters. We will also show some usage examples and describe the results and the statistical plots obtained by running the RooStats methods.
    
    Speakers: Dr Gregory Schott (Karlsruhe Institute of Technology), Dr Lorenzo Moneta (CERN)
    
    Slides
  - 48
    
    Parallelization of the SIMD Ensemble Kalman Filter for Track Fitting Using Ct
    
    A great portion of data mining in a high-energy detector experiment is spent in the complementary tasks of track finding and track fitting. These problems correspond, respectively, to associating a set of measurements to a single particle, and to determining the parameters of the track given a candidate path [Avery 1992]. These parameters usually correspond to the 5-tuple state of the model of motion of a charged particle in a magnetic field. Global algorithms for track fitting have been superseded by recursive least-squares estimation algorithms that, assuming the measurement noise is Gaussian, result in an optimal estimate [Frühwirth and Widl 1992]. However, this assumption hardly ever holds due to energy loss and multiple scattering. Extensions to the Kalman filter have been proposed and implemented [Frühwirth 1997] that use sums of Gaussians to model non-Gaussian distributions. In addition, non-linear filtering is necessary to eliminate the effects of outliers, including misclassified measurements. Track fitting based on sums of Gaussians can be implemented through an ensemble of single Gaussian fits, each of which is the result of a single Kalman filter [Gorbunov et al. 2008]. Efficient parallel implementations of these algorithms are also crucial due to the large amount of data that needs to be processed. Since many separate tracks need to be fitted, the simplest way to parallelize the problem is to fit many independent tracks at once. However, with the ensemble algorithm it is also possible to parallelize across the ensemble within a single track fit. Parallelism mechanisms available in the hardware include multiple nodes and multiple cores. Prior implementations can also make use of the SIMD vector units present in most modern CPUs [Gorbunov et al. 2008]. We have implemented the ensemble approach using Ct. Ct is a generalized data-parallel programming platform that can efficiently target both multiple cores and SIMD vector units from a single high-level specification in C++. Compared to the earlier SIMD implementation, Ct allows for hardware and instruction set portability. We have obtained scalable speedup with respect to a scalar baseline implemented using both single and double precision. In the prior work, a speedup of 1.6x for scalar vs. vectorized versions of the algorithm was reported for double precision. Our results are comparable to these. In addition, we have explored the implementation of sequential Monte Carlo in Ct using particle filtering to model arbitrary probability distributions. However, one challenge in this context is the computation of the likelihood function, which requires estimates of the measurement error for every measurement and their correlation.
    
    References:
    Avery, P. 1992. Applied fitting theory V: Track fitting using the Kalman filter. Tech. rep.
    Frühwirth, R., and Widl, E. 1992. Track-based alignment using a Kalman filter technique. Communications in Computational Physics.
    Frühwirth, R. 1997. Track-fitting with non-Gaussian noise. Communications in Computational Physics (Jan.).
    Gorbunov, S., Kebschull, U., Kisel, I., Lindenstruth, V., and Müller, W. F. J. 2008. Fast SIMDized Kalman filter-based track fit. Computer Physics Communications.
    
    Speaker: Dr Michael D. McCool (Intel/University of Waterloo)
    
    Slides
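
The recursive least-squares estimation mentioned in the Kalman filter abstract above can be sketched in its simplest textbook form. This is a scalar toy, assuming a constant one-dimensional state with Gaussian measurement noise. It is not the 5-parameter track model, the Gaussian-sum ensemble or the Ct implementation from the talk, and the names `kalman_update` and `fit_constant` are hypothetical.

```python
def kalman_update(x, p, z, r):
    """One measurement update for a scalar state.

    x, p: prior estimate and its variance
    z, r: new measurement and its variance
    """
    k = p / (p + r)            # Kalman gain: weight given to the measurement
    x_new = x + k * (z - x)    # move the estimate toward the measurement
    p_new = (1.0 - k) * p      # the variance shrinks after each update
    return x_new, p_new


def fit_constant(measurements, r, x0=0.0, p0=1e6):
    """Filter a sequence of measurements of one constant quantity.

    The large initial variance p0 encodes an almost uninformative prior.
    """
    x, p = x0, p0
    for z in measurements:
        x, p = kalman_update(x, p, z, r)
    return x, p


estimate, variance = fit_constant([1.0, 2.0, 3.0], r=1.0)
```

With an uninformative prior and equal measurement variances, the filtered estimate approaches the plain average of the measurements. An ensemble fit in the sense of the abstract would run several such filters, one per Gaussian component, which is what makes parallelizing across the ensemble possible.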
- Thursday, 25 February - Methodology of Computations in Theoretical Physics
  - 49
    
    Calculating one loop multileg processes. A program for the case of $gg\rightarrow t \bar{t}+gg$
    
    Processes with more than 5 legs have been on experimentalists' wish list for a long time now. This study targets the NLO QCD corrections to such processes at the LHC. Many Feynman diagrams contribute, including those with five- and six-point functions. A Fortran code for the numerical calculation of one-loop corrections for the process $gg\rightarrow t \bar{t}+gg$ is reviewed. A variety of tools such as Diana, Form, Maple and Fortran are used in combination.
    
    Speaker: Dr Theodoros Diakonidis (DESY,Zeuthen)
    
    Slides
  - 50
    
    The automation of subtraction schemes for next-to-leading order calculations in QCD
    
    Tremendous progress has been made in the automation of one-loop (or virtual) contributions to next-to-leading order (NLO) calculations in QCD, using both the conventional Feynman diagram approach and unitarity-based techniques. To obtain rates and distributions for observables at particle colliders at NLO accuracy, the real emission and subtraction terms also have to be included in the calculation. Recently, two variations of subtraction schemes have been automated by several groups. In this talk these two schemes will be reviewed and their implementations discussed.
    
    Speaker: Dr Rikkert Frederix (University Zurich)
    
    Slides
  - 15:00
    
    Coffee Break
  - 51
    
    The FeynSystem: FeynArts, FormCalc, LoopTools
    
    The talk describes the recent additions to the automated Feynman diagram computation system FeynArts, FormCalc, and LoopTools.
    
    Speaker: Thomas Hahn (MPI Munich)
    
    Slides
  - 52
    
    New developments in event generator tuning techniques
    
    Data analyses in hadron collider physics depend on background simulations performed by Monte Carlo (MC) event generators. However, calculational limitations and non-perturbative effects require approximate models with adjustable parameters. In fact, we need to simultaneously tune many phenomenological parameters in a high-dimensional parameter space in order to make the MC generator predictions fit the data. It is desirable to achieve this goal without spending too much time or computing resources iterating parameter settings and comparing the same set of plots over and over again. I will present extensions and improvements to the MC tuning system, Professor, which addresses the aforementioned problems by constructing a fast analytic model of an MC generator which can then be easily fitted to data. Using this procedure it is for the first time possible to get a robust estimate of the uncertainty of generator tunings. Furthermore, we can use these uncertainty estimates to study the effect of new (pseudo-)data on the quality of tunings and therefore decide whether a measurement is worthwhile with a view to generator tuning. The potential of the Professor method outside the MC tuning area is presented as well.
    
    Speaker: Dr James Monk (MCnet/Cedar)
    
    Slides
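
The idea in the tuning abstract above, replacing an expensive generator by a fast analytic model that is then fitted to data, can be sketched for a single parameter and a single observable. This is a toy under stated assumptions and does not use the Professor code: the stand-in `toy_generator`, the helpers `quadratic_through` and `tune`, the three anchor points and the grid scan are all hypothetical choices made for illustration.

```python
def toy_generator(p):
    """Hypothetical stand-in for an expensive MC run at parameter value p."""
    return (p - 1.0) ** 2 + 2.0


def quadratic_through(points):
    """Fast analytic model: the quadratic through three (p, value) anchors."""
    (p0, y0), (p1, y1), (p2, y2) = points

    def model(p):
        # Lagrange form of the interpolating quadratic.
        return (y0 * (p - p1) * (p - p2) / ((p0 - p1) * (p0 - p2))
                + y1 * (p - p0) * (p - p2) / ((p1 - p0) * (p1 - p2))
                + y2 * (p - p0) * (p - p1) / ((p2 - p0) * (p2 - p1)))

    return model


def tune(model, data, error, lo, hi, steps=1000):
    """Scan [lo, hi] and return the parameter value minimising chi-square."""
    best_p, best_chi2 = lo, float("inf")
    for i in range(steps + 1):
        p = lo + (hi - lo) * i / steps
        chi2 = ((model(p) - data) / error) ** 2
        if chi2 < best_chi2:
            best_p, best_chi2 = p, chi2
    return best_p, best_chi2


# Run the "generator" only at the anchor points, then tune on the cheap model.
anchors = [(p, toy_generator(p)) for p in (0.0, 1.0, 3.0)]
fast_model = quadratic_through(anchors)
best_p, best_chi2 = tune(fast_model, data=2.25, error=0.1, lo=1.0, hi=3.0)
```

The design point is that the expensive generator is evaluated only a handful of times, while the minimisation, which needs many evaluations, runs on the cheap parameterisation.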
- Thursday, 25 February - Multicore Panel
  - 53
    Multicore Panel
    
    The multicore panel will review recent activities in the multicore/manycore arena. It will consist of four people kicking off the session by making short presentations, but it will mainly rely on a good interaction with the audience. Panellists: Mohammad Al-Turany (GSI/IT), Anwar Ghuloum (INTEL Labs), Sverre Jarp (CERN/IT), Alfio Lazzaro (CERN/IT).
    
    Speakers: Dr Alfio Lazzaro (Universita degli Studi di Milano & INFN, Milano), Anwar Ghuloum (Intel Corporation), Dr Mohammad Al-Turany (GSI DARMSTADT), Mr Sverre Jarp (CERN)
    
    Slides
    
    MC_Panel_CPUs.pdf
    
    MC_Panel_Intel_ACAT_tools_presentation.pdf
    
    MC_panel_LazzaroA.pdf
    
    MC_Panel_MAT_GPUs.pdf
- 19:30
  
  Social Event
Friday 26 February
- Friday, 26 February - Plenary Session
  - 54
    
    Scientific Computing with Amazon Web Services
    
    In an era where high throughput instruments and sensors are increasingly providing us faster access to new kinds of data, it is becoming very important to have timely access to resources which allow scientists to collaborate and share data while maintaining the ability to process vast quantities of data or run large scale simulations when required. Built on Amazon's vast global computing infrastructure, Amazon Web Services (AWS) provides scientists with a number of highly scalable, highly available infrastructure services that can be used to perform a variety of tasks. The ability to scale storage and analytics resources on-demand has made AWS a platform for a number of scientific challenges including high energy physics, next generation sequencing, and galaxy mapping. A number of scientists are also making a number of algorithms and applications available as Amazon Machine Images, or as applications that can be deployed to Amazon Elastic MapReduce. In this talk, we will discuss the suite of Amazon Web Services relevant to the scientific community, go over some example use cases, and the advantages that cloud computing offers for the scientific community. We will also discuss how we can leverage new paradigms and trends in distributed computing infrastructure and utility models that allow us to manage and analyze big data at scale.
    
    Speaker: Dr Singh Deepak (Business Development Manager - Amazon EC2)
  - 55
    
    Applying CUDA Computing Model To Event Reconstruction Software
    
    Speaker: Dr Mohammad AL-TURANY (GSI DARMSTADT)
    
    Slides
  - 10:20
    
    Coffee Break
  - 56
    
    Application of Many-core Accelerators for Problems in Astronomy and Physics
    
    Recently, many-core accelerators have been developing so fast that these computing devices attract researchers who are always demanding faster computers. Since many-core accelerators such as graphics processing units (GPUs) are nothing but parallel computers, we need to modify an existing application program with specific optimizations (mostly parallelization) for a given accelerator. In this paper, we describe our problem-specific compiler system for many-core accelerators, specifically GPU and GRAPE-DR. GRAPE-DR is another many-core accelerator device that is specifically targeted at scientific applications. In our compiler, we focus on compute-intensive problems expressed as a doubly nested loop. Our compiler asks the user to write the computations in the innermost loop. All details related to parallelization and optimization techniques for a given accelerator are hidden from the user. Our compiler successfully generates the fastest code ever for astronomical N-body simulations, with a performance of 2600 GFLOPS (single precision) on a recent GPU. However, this code, which simply uses a brute-force $O(N^2)$ algorithm, is not practically useful for a system with $N > 100,000$. For larger systems, we need a sophisticated $O(N \log N)$ force evaluation algorithm, e.g., the oct-tree method. We also report our implementation of the oct-tree method on GPU. We successfully ran a simulation of structure formation in the universe very efficiently using the oct-tree method. Another successful application on both GPU and GRAPE-DR is the evaluation of a multi-dimensional integral with quadruple precision. The program generated by our compiler runs at a speed of 5 - 7 GFLOPS on GPU and 3 - 5 GFLOPS on GRAPE-DR. This computation speed is more than 50 times faster than a general purpose CPU.
    
    Speaker: Naohito Nakasato (University of Aizu)
    
    Slides
  - 57
    
    Numerical approach to Feynman diagram calculations: Benefits from new computational capabilities
    
    Speaker: Dr Fukuko YUASA (KEK)
    
    Slides
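
The many-core compiler abstract above (speaker Naohito Nakasato) describes a compute-intensive problem written as a two-nested loop in which the user supplies only the inner-most computation. A minimal Python sketch of that loop shape for brute-force $O(N^2)$ gravity follows. The names `pair_accel` and `accelerations`, the softening length `eps`, and the choice of units with G = 1 are assumptions made for illustration; this is not the interface of the compiler presented in the talk.

```python
import math


def pair_accel(xi, xj, mj, eps=1e-3):
    """Inner-most computation: softened acceleration on body i due to body j (G = 1)."""
    dx = [b - a for a, b in zip(xi, xj)]
    r2 = sum(d * d for d in dx) + eps * eps
    inv_r3 = 1.0 / (r2 * math.sqrt(r2))
    return [mj * d * inv_r3 for d in dx]


def accelerations(pos, mass, eps=1e-3):
    """Two-nested loop over all (i, j) pairs: O(N^2) pair evaluations."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):        # outer loop: each body i is an independent task
        for j in range(n):    # inner loop: sum the contributions of all sources j
            if i == j:
                continue
            a = pair_accel(pos[i], pos[j], mass[j], eps)
            for k in range(3):
                acc[i][k] += a[k]
    return acc
```

The outer iterations do not depend on each other, which is what makes this loop shape a natural fit for an accelerator. The cost grows as N(N - 1) pair evaluations per step, roughly 10^10 for N = 100,000, which is why the abstract turns to an $O(N \log N)$ tree method for larger systems.
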
- 12:00
  
  Lunch Break
- Friday, 26 February - Computing Technology for Physics Research
  - 58
    
    Tools to use heterogeneous Grid schedulers and storage system
    
    The Grid approach provides uniform access to a set of geographically distributed heterogeneous resources and services, enabling projects that would be impossible without massive computing power. Different storage projects have been developed, and a few protocols, such as GsiFtp and SRM (Storage Resource Manager), are used to interact with them. Moreover, during the last few years different Grid projects, such as EGEE, OSG and NorduGrid, have developed different middleware, and each one typically implements its own interface and workflow. For a user community which needs to work through the Grid, interoperability is a key concept. To handle different Grid interfaces, resource heterogeneity and different workflows in a truly transparent way, we have developed two modular tools: BossLite and Storage Element API. These deal with different Grid schedulers and storage systems respectively, by providing a uniform standard interface that hides the differences between the systems they interact with. BossLite transparently interacts with different Grid systems, working as a layer between an application and the middleware. Storage Element API implements and manages the operations that can be performed with the different protocols used in the main Grid storage systems. Both tools are already used in production in the CMS computing tools for distributed analysis and Monte Carlo production. In this paper we show their implementation, how they are used, and performance results.
    
    Speaker: Dr Mattia Cinquilli (INFN, Sezione di Perugia)
    
    Slides
  - 59
    
    PROOF - Best Practices
    
    With PROOF, the parallel ROOT Facility, being widely adopted for LHC data analysis, it becomes more and more important to understand the different parameters that can be tuned to make the system perform optimally. In this talk we will describe a number of "best practices" to get the most out of your PROOF system, based on feedback from several pilot setups. We will describe different cluster configurations (CPU, memory, network, HDD, SSD), PROOF and xrootd configuration options, running a dedicated PROOF system or using PROOF on Demand (PoD) on batch systems. This talk will be beneficial for people setting up Tier-3/4 analysis clusters.
    
    Speaker: Dr Fons Rademakers (CERN)
    
    Slides
  - 60
    
    Optimizing CMS software to the CPU
    
    CMS is a large, general-purpose experiment at the Large Hadron Collider (LHC) at CERN. For its simulation, triggering, data reconstruction and analysis needs, CMS collaborators have developed many millions of lines of C++ code, which are used to create applications run in computer centers around the world. Maximizing the performance and efficiency of the software is highly desirable in order to maximize the physics results obtained from the available computing resources. In the past, the code optimization effort in CMS has focused on improving algorithms, basic C++ issues, excessive dynamic memory use and memory footprint. Optimizing software today, on modern multi-/many-core 64-bit CPUs and their memory architectures, however, requires a more sophisticated approach. This presentation will summarize efforts in CMS to understand how to properly optimize our software for maximum performance on modern hardware. Experience with various tools, lessons learned and concrete results achieved will be described.
    
    Speaker: Dr Peter Elmer (PRINCETON UNIVERSITY)
    
    Slides
  - 61
    
    Optimization of Grid Resources Utilization: QoS-aware client to storage connection in AliEn
    
    In a worldwide distributed system like the ALICE Environment (AliEn) Grid Services, the closeness of the data to the actual computational infrastructure makes a substantial difference in terms of resource utilization efficiency. Applications unaware of the locality of the data or the status of the storage environment can waste network bandwidth on slow networks or fail to access data from remote or non-operational storage elements. In this paper we present an approach to QoS-aware client-to-storage connection by introducing a periodically updated Storage Element Rank Cache. Based on the MonALISA monitoring framework, a Resource Discovery Broker continuously assesses the status of all available Storage Elements in the AliEn Grid. Combining availability with network topology information, rated lists of Storage Elements are offered to any client requesting access to remote data. The lists are centrally cached by AliEn and filtered in the course of user-based authorization and requested QoS flags. This approach shows significant improvements towards an optimized storage and network resource utilization and enhances the client resilience in case of failures.
    
    Speaker: Mr Costin Grigoras (CERN)
    
    Slides
  - 15:40
    
    Coffee Break
  - 62
    
    "NoSQL" databases in CMS Data and Workflow Management
    
    In recent years a new type of database has emerged in the computing landscape. These "NoSQL" databases tend to originate from large internet companies that have to serve simple data structures to millions of customers daily. The databases specialise for certain use cases or data structures and run on commodity hardware, as opposed to large traditional database clusters. In this paper we discuss the current usage of "NoSQL" databases in the CMS data and workload management tools, discuss how we expect our systems to evolve to take advantage of them and how these technologies could be used in a wider context.
    
    Speaker: Andrew Melo (Vanderbilt)
  - 16:35
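
The AliEn abstract in this session (speaker Costin Grigoras) describes rated lists of Storage Elements built from availability and network topology information, then filtered by the requested QoS flags. The idea can be sketched in a few lines of Python. Everything here is hypothetical: the `StorageElement` fields, the integer `network_distance` metric and the function `rank_storage_elements` are illustrative assumptions, not part of AliEn or MonALISA.

```python
from dataclasses import dataclass, field


@dataclass
class StorageElement:
    name: str
    available: bool          # latest monitoring status (hypothetical field)
    network_distance: int    # smaller means closer to the client (hypothetical metric)
    qos_flags: frozenset = field(default_factory=frozenset)


def rank_storage_elements(elements, requested_qos=frozenset()):
    """Return the usable storage elements, closest first."""
    usable = [
        se for se in elements
        if se.available and requested_qos <= se.qos_flags  # subset test on flags
    ]
    # Sort by distance; the name breaks ties so the order is deterministic.
    return sorted(usable, key=lambda se: (se.network_distance, se.name))
```

A client would walk the returned list in order and fall back to the next entry when a connection fails, which is one simple way to obtain the resilience the abstract mentions.
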
- Friday, 26 February - Data Analysis - Algorithms and Tools
  - 63
    
    Fourier Transforms as a tool for Analysis of Hadron-Hadron Collisions.
    
    Hadronic final states in hadron-hadron collisions are often studied by clustering final state hadrons into jets, each jet approximately corresponding to a hard parton. The typical jet size in a high energy hadron collision is between 0.4 and 1.0 in eta-phi. On the other hand, there may be structures of interest in an event that are of a different scale to the jet size. For example, to a first approximation the underlying event is a uniform emission of radiation spanning the entire detector, colour connection effects between hard partons may fill the region between a jet and the proton remnant and hadronisation effects may extend beyond the jets. We consider the possibility of performing a Fourier decomposition on individual events in order to produce a power spectrum of the transverse energy radiated at different angular scales. We attempt to identify correlations in the emission of radiation over distances ranging from the full detector size to approximately 0.2 in eta-phi.
    
    Speaker: Dr James William Monk (Department of Physics and Astronomy - University College London)
    
    Slides
  - 64
    
    Fast Parallel Ring Recognition Algorithm in the RICH Detector of the CBM Experiment at FAIR
    
    The Compressed Baryonic Matter (CBM) experiment at the future FAIR facility at Darmstadt will measure dileptons emitted from the hot and dense phase in heavy-ion collisions. In the case of an electron measurement, a high purity of identified electrons is required in order to suppress the background. Electron identification in CBM will be performed by a Ring Imaging Cherenkov (RICH) detector and Transition Radiation Detectors (TRD). Very fast data reconstruction is extremely important for CBM because of the huge amount of data which has to be handled. In this contribution, the parallel ring recognition algorithm is presented. Modern CPUs have two features which enable parallel programming. First, the SSE technology allows the use of the SIMD execution model. Second, multi-core CPUs enable multithreading. Both features have been used in the ring reconstruction of the RICH detector. A speed-up factor of 20 has been achieved (from 750 ms/event to 38 ms/event) on an Intel Core 2 Duo processor at 2.13 GHz.
    
    Speaker: Semen Lebedev (GSI, Darmstadt / JINR, Dubna)
    
    Slides
  - 65
    
    WatchMan Project - Computer Aided Software Engineering applied to HEP Analysis Code Building for LHC
    
    Much of the code written for high-level data analysis shares similar tasks, e.g. reading the data from given input files, data selection, overlap removal of physical objects, calculation of basic physical quantities and output of the analysis results. Because of this, when writing a new piece of code one too often starts by copying and pasting from old code and then modifying it for specific purposes, ending up with a plethora of classes to maintain, debug and validate. Moreover, the complexity of the software frameworks of today's HEP experiments requires the user to master many technical details before starting to write code. Writing such code for each new analysis is error-prone and time-consuming. A solution to this problem is WatchMan, a "data analysis construction kit" and highly automated analysis code generator. WatchMan takes as input user settings from a GUI or from a text-like steering file, and in a few easy steps it dynamically generates the complete analysis code, ready to be run over data, locally or on the GRID. The package has been implemented in Python and C++, using CASE (Computer Aided Software Engineering) principles. As a first example we interfaced the tool to the framework of the ATLAS experiment, where it has been used for various analyses by several users. The package is nevertheless independent of the experimental framework, and modular interfaces will be provided for other experiments as well.
    
    Speaker: Riccardo Maria Bianchi (Physikalisches Institut, Albert-Ludwigs-Universitaet Freiburg)
    
    Slides
    
    Source Code SVN Repository
    
    Wiki - Docs and tutorials
  - 15:40
    
    Coffee Break
  - 66
    
    Parallel approach to online event reconstruction in the CBM experiment
    
    Future many-core CPU and GPU architectures require relevant changes in the traditional approach to data analysis. Massive hardware parallelism at the levels of cores, threads and vectors has to be adequately reflected in the mathematical, numerical and programming optimization of the algorithms used for event reconstruction and analysis. An investigation of the Kalman filter, which is the core of the reconstruction algorithms in modern HEP experiments, has demonstrated a potential speed increase of several orders of magnitude, if the algorithms are properly optimized and parallelized. The Kalman filter based track fit is used as a benchmark for monitoring the performance of novel CPU and GPU architectures, as well as for investigating modern parallel programming languages. In the CBM experiment at FAIR/GSI all basic reconstruction algorithms have been parallelized. For maximum performance all algorithms use variables in single precision only. In addition, a significant speed-up is provided by localizing data in a high-speed cache memory. Portability of the parallel reconstruction algorithms with respect to different CPU and GPU architectures is supported by special headers and vector classes, which have been developed for using SIMD instruction sets. The reconstruction quality is monitored at each stage in order to keep it at the same level as for the initial scalar versions of the algorithms. Different reconstruction methods implemented in CBM show different degrees of intrinsic parallelism, thus the speed-up varies by up to a few orders of magnitude. The speed-up factors for each stage of the parallelization of the algorithms are presented and discussed.
    
    Speaker: Dr Ivan Kisel (Gesellschaft fuer Schwerionenforschung mbH (GSI))
    
    Slides
  - 67
    
    FATRAS – A Novel Fast Track Simulation Engine for the ATLAS Experiment
    
    Monte Carlo simulation of the detector response is an inevitable part of any kind of analysis which is performed with data from the LHC experiments. These simulated data sets are needed with large statistics and at a high level of precision, which makes their production a CPU-intensive task. ATLAS has thus concentrated on optimizing both full and fast detector simulation techniques to achieve this goal within the computing limits of the collaboration. At the early stages of data-taking, in particular, it is necessary to reprocess the Monte Carlo event samples continuously, while integrating adaptations to the simulation modules to improve the agreement with the data taken from the detector itself. We present a new, fast track simulation engine which establishes a full Monte Carlo simulation based on modules and the geometry of the ATLAS standard track reconstruction application. This is combined with a fast parametric-response simulation of the Calorimeter. This approach shows a high level of agreement with full simulation, while achieving a relative timing gain of about 100. FATRAS was designed to provide a fast feedback cycle for tuning the MC simulation with real data: this includes the material distribution inside the detector, the integration of misalignment and conditions status, as well as calibration at the hit level. We present the concepts of the fast track simulation, but will concentrate mainly on the performance after integrating the feedback from first data taken with the ATLAS detector during the 2009-10 winter months.
    
    Speaker: Sebastian Fleischmann (U. Bonn)
    
    Slides
  - 68
    
    Visual Physics Analysis - Applications in High-Energy- and Astroparticle-Physics
    
    VISPA (Visual Physics Analysis) is a novel development environment to support physicists in prototyping, execution, and verification of data analysis of any complexity. The key idea of VISPA is developing physics analyses using a combination of graphical and textual programming. In VISPA, a multipurpose window provides visual tools to design and execute modular analyses, create analysis templates, and browse physics event data at different steps of an analysis. VISPA aims at supporting both experiment independent and experiment specific analysis steps. It is therefore designed as a portable analysis framework, supporting Linux, Windows and MacOS, with its own data format including physics objects and containers, thus allowing easy transport of analyses between different computers. All components of VISPA are designed for easy integration with experiment specific software to enable physics analysis within the same graphical tools. VISPA has proven to be an easy-to-use and flexible development environment in high energy physics as well as in astroparticle physics analyses. In this talk, we present applications of advanced physics analyses, and thereby explain the underlying software concepts.
    
    Speaker: Andreas Hinzmann (III. Physikalisches Institut A, RWTH Aachen University, Germany)
    
    Slides
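
The Fourier-transform abstract in this session (speaker James William Monk) proposes decomposing individual events into a power spectrum of transverse energy at different angular scales. A minimal sketch of that idea, simplified to one dimension: transverse energy is binned in azimuthal angle and the power of each discrete Fourier mode is computed. The function name `et_power_spectrum`, the binning and the normalization are assumptions for illustration; the analysis in the talk works in eta-phi and may define its spectrum differently.

```python
import cmath
import math


def et_power_spectrum(et_bins):
    """Power |c_m|^2 of the discrete Fourier modes of binned transverse energy.

    et_bins[k] is the transverse energy in the k-th of n equal azimuthal bins.
    Mode m probes structure at an angular scale of about 2*pi/m.
    """
    n = len(et_bins)
    spectrum = []
    for m in range(n // 2 + 1):
        c_m = sum(
            et * cmath.exp(-2j * math.pi * m * k / n)
            for k, et in enumerate(et_bins)
        ) / n
        spectrum.append(abs(c_m) ** 2)
    return spectrum
```

With this normalization a perfectly uniform emission puts all of its power in the m = 0 mode, while two equal back-to-back deposits populate only the even modes, so the spectrum separates a flat underlying event from jet-like structure.
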
- Friday, 26 February - Methodology of Computations in Theoretical Physics
  - 69
    
    IR subtraction schemes
    
    To compute jet cross sections at higher orders in QCD efficiently, one has to deal with infrared divergences. These divergences cancel between virtual and real corrections once the phase space integrals are performed. To use standard numerical integration methods like Monte Carlo, the cancellation of the divergences must be performed explicitly. Usually this is done by constructing appropriate counterterms which are integrated over the unresolved region of the phase space. We will show some new approaches to infrared subtraction techniques for computing NNLO jet cross sections in QCD and possible future phenomenological applications.
    
    Speaker: Dr Paolo Bolzoni (DESY)
    
    Slides
  - 70
    
    Feynman Integral Evaluation by a Sector decomposiTion Approach (FIESTA)
    
    Sector decomposition in its practical aspect is a constructive method used to evaluate Feynman integrals numerically. We present a new program that performs the sector decomposition and then integrates the resulting expression. The program can also be used to expand Feynman integrals automatically in limits of momenta and masses with the use of sector decompositions and Mellin--Barnes representations. The program can run in parallel on modern multicore computers and even across multiple computers. We also present some new numerical results for four-loop massless propagator master integrals.
    
    Speaker: Mikhail Tentyukov (Karlsruhe University)
    
    Slides
  - 71
    
    Sector decomposition via computational geometry
    
    One of the powerful tools for evaluating multi-loop/leg integrals is sector decomposition, which can isolate infrared divergences from parametric representations of the integrals. The aim of this talk is to present a new method to replace iterated sector decomposition, in which the problems are converted into a set of problems in convex geometry, which can then be solved using algorithms from computational geometry. This method never falls into an infinite loop, and some examples show that it yields a relatively small number of generated sectors.
    
    Speaker: Dr Toshiaki KANEKO (KEK, Computing Research Center)
    
    Slides
  - 15:30
    
    Coffee Break
  - 72
    
    Multiple Polylogarithms and Loop Integrals
    
    The importance of multiple polylogarithms (MLP) for the calculation of loop integrals has been pointed out by many authors. We discuss the general relation between MLPs and multi-loop integrals from the viewpoint of computer algebra.
    
    Speaker: Yoshimasa Kurihara (KEK)
    
    Slides
  - 73
    
    Two-Loop Fermionic Integrals in Perturbation Theory on a Lattice
    
    A comprehensive number of one-loop integrals in a theory with Wilson fermions at $r=1$ is computed using the Burgio--Caracciolo--Pelissetto algorithm. With the use of these results, the fermionic propagator in the coordinate representation is evaluated, making it possible to extend the Luscher-Weisz procedure for two-loop integrals to the fermionic case. Computations are performed with FORM and REDUCE packages.
    
    Speaker: Dr Roman Rogalyov (IHEP)
    
    Slides
  - 74
    
    Unstable-particles pair production in modified perturbation theory in NNLO
    
    We consider pair production and decay of fundamental unstable particles in the framework of a modified perturbation theory (MPT) which treats resonant contributions of unstable particles in the sense of distributions. The cross-section of the process is calculated within the NNLO of the MPT in a model that admits an exact solution. Universal massless-particle contributions are taken into consideration. The calculations are carried out by means of a FORTRAN code in double precision, which ensures a per mille accuracy of the computations. A comparison of the outcomes with the exact solution demonstrates an excellent convergence of the MPT series at energies close to and above the maximum of the cross-section. Near the maximum of the cross-section, the discrepancy of the NNLO approximation amounts to a few per mille.
    
    Speaker: Dr Maksim Nekrasov (Institute for High Energy Physics)
    
    Slides
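
The sector decomposition talks in this session (FIESTA and the computational geometry approach) rely on splitting the integration region so that overlapping singularities factorize. A standard textbook illustration, not taken from either talk, is the two-parameter integral below, which is singular only when $x$ and $y$ vanish together. Splitting the unit square into the sectors $y < x$ and $x < y$, and substituting $y = x\,t$ in the first (the second gives the same contribution by symmetry), isolates the divergence in a single factor $x^{-1+\epsilon}$:

```latex
\begin{align*}
I &= \int_0^1 \! dx \int_0^1 \! dy \, (x+y)^{-2+\epsilon} \\
  &= 2 \int_0^1 \! dx \, x^{-1+\epsilon} \int_0^1 \! dt \, (1+t)^{-2+\epsilon}
     \qquad (y = x\,t \text{ in the sector } y < x) \\
  &= \frac{2}{\epsilon} \, \frac{1 - 2^{\epsilon-1}}{1-\epsilon}
   = \frac{1}{\epsilon} + \mathcal{O}(\epsilon^0).
\end{align*}
```

After the split, the $x$ integral carries the whole $1/\epsilon$ pole and the remaining $t$ integral is finite, so it can be expanded in $\epsilon$ and evaluated numerically. Multi-loop integrals need many such splits, which is why the number of generated sectors and the termination of the procedure matter.
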
- Friday, 26 February - Data Management Panel
  - 75
    
    Data Management Panel
    
    Speakers: Alberto Pace (CERN), Andrew Hanushevsky (Unknown), Beob Kyun Kim (KISTI), Dr Rene Brun (CERN), Tony Cass (CERN)
Saturday 27 February
- ACAT 2010 Summary
  - 76
    
    Computing Technology for Physics Research Summary
    
    Speaker: Axel Naumann (CERN)
    
    Slides
  - 77
    
    Data Analysis - Algorithms and Tools Summary
    
    Speaker: Dr Liliana Teodorescu (Brunel University)
    
    Slides
  - 10:20
    
    Coffee Break
  - 78
    
    Methodology of Computations in Theoretical Physics Summary
    
    Speaker: Peter Uwer (Humboldt-Universität zu Berlin)
    
    Slides
  - 79
    
    ACAT 2010 Summary