Analysis Requirements Jamboree

Europe/Zurich
513/1-024 (CERN)

Andrea Rizzi (INFN Sezione di Pisa, Universita' e Scuola Normale Superiore, P), Danilo Piparo (CERN), Paul James Laycock (Brookhaven National Laboratory (US))
Description

Sign up for the group's list: https://groups.google.com/forum/#!aboutgroup/hsf-analysis-wg

See below for the Vidyo coordinates

During this Jamboree we will hear about the concrete analysis steps that lead to publications, in order to understand how they were developed and implemented and to identify commonalities among experiments.

The suggested list of aspects to be covered during the talks is the following:

  • Analysis steps and evolution: What are the main tasks and operations you perform on data? How do they change from day zero of the preliminary analysis studies to the last day before publication? How do you deal with systematics?

  • Sketch of analysis workflow: An overall sketch of the complete analysis flow, even as a cartoon. On which datasets does it start (group reduced ntuples, central datasets)? What runs on them, and where (experiment framework, own program, on a university cluster, on the Grid)? What is the output (histograms, reduced ntuples) and how is it processed (ROOT macro/program, PyROOT script, own analysis framework)?

  • Analysis Interface: The method through which you actually execute the analysis, i.e. the analysis interface of your choice. Multiple options are of course possible; example interfaces are scripts, compiled or dynamically compiled programs, Jupyter notebooks, graphical user interfaces… The interface can of course depend on the step of the analysis being considered.

  • Scaling: The way in which you achieve a competitive turn-around time, i.e. how you make the analysis procedure scale well as the input data size grows. For example, do you exploit mass processing resources such as batch clusters or the Grid and, if so, do you see any shortcomings with this approach? Do you feel that more interactive approaches could boost your productivity? The answer can of course depend on the step of the analysis being considered.

  • Reusability: Specific software developed explicitly for some analysis or group of analyses. If any analysis-specific software was developed in your case, do you think that the effort spent developing this “software setup” was sizeable? If so, do you think there could be opportunities to share pieces of it with others, or at least knowledge about it, to make the creation of such setups less onerous in the future?

  • Missing functionality: Among the operations you needed to carry out, some might have been more difficult than others with the current set of software tools. Which tools did you feel needed improvement? Can you also describe how they could be improved?

  • Preservation and sharing: Long-term preservation of analyses, as well as its very short-term incarnation, sharing, is a concern. What steps did you take to make sure your analysis procedure could be shared with your colleagues? And to ensure long-term reproducibility?