Hi Alexei, and all,

I know I had promised to let you have some info last week on EI developments and needs but that's life... apologies.

The EventIndex infrastructure needs to interact with the new ProdSys-II (I use this term inclusively for all of Deft, Jedi, Panda and other parts that may exist now or in future) for two actions: data collection and data usage. Here are a few personal thoughts and numbers to start a discussion.

A) EI Data collection

Let's first of all assume that EI wants to catalogue all events in any format that are produced by any central and group production job. This means that every running job must send at the end a record for each PERMANENT output file with the list of events in the file and a few other parameters per event. [How many additional parameters may be useful or needed, is a different discussion, see appendix]. The record size may be between (say) 20 and 200 bytes/event.

I measured last week that currently Panda runs about 265k jobs/day, with (on average over a week) 1150 events/job. [The distribution of events/job may vary with time as it is related to the ratio between simulation and reconstruction jobs, but it doesn't really matter here]. Obviously Panda and DDM right now are able to cope with the output files of all these jobs.

The question for ProdSys-II developers is which is the best way to transfer the EI info from the WN where the job has run to a central location at CERN from where it will be put into Hadoop by a separate process. Remember that the size of this info could be in the range between 2 kB and 2 MB per job. It would also be useful to know what could be the constraints on EI info size/event put by different technologies.

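To put the numbers above side by side, here is a rough back-of-envelope sketch. It only multiplies the figures already quoted (265k jobs/day, 1150 events/job, 20 to 200 bytes/event); the function name `ei_volume` is just illustrative, and the result describes an average job, not the real distribution of job sizes.

```python
# Back-of-envelope EI data volume, using only the numbers quoted above.
JOBS_PER_DAY = 265_000      # measured Panda job rate
EVENTS_PER_JOB = 1_150      # average over one week
RECORD_BYTES = (20, 200)    # assumed lower/upper record size per event


def ei_volume(jobs_per_day, events_per_job, bytes_per_event):
    """Return (events/day, bytes per average job, bytes/day)."""
    events_per_day = jobs_per_day * events_per_job
    bytes_per_job = events_per_job * bytes_per_event
    bytes_per_day = events_per_day * bytes_per_event
    return events_per_day, bytes_per_job, bytes_per_day


if __name__ == "__main__":
    for size in RECORD_BYTES:
        events, per_job, per_day = ei_volume(JOBS_PER_DAY, EVENTS_PER_JOB, size)
        print(f"{size:>3} B/event: {events / 1e6:.1f} M events/day, "
              f"{per_job / 1e3:.0f} kB per average job, "
              f"{per_day / 1e9:.1f} GB/day")
```

With these inputs this comes to about 305 million events per day, 23 to 230 kB of EI info for an average job, and roughly 6 to 61 GB per day arriving at CERN. The wider 2 kB to 2 MB range per job would correspond, for example, to jobs of about 100 events at 20 bytes/event up to about 10,000 events at 200 bytes/event; those job sizes are only an illustration, not measured values.
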
From the EI side, what is important is that:
- the records reach CERN in real time at job completion
- the system is robust and will never lose records

By the way, as the IFAE/IFIC group has taken responsibility for this work package within the EI project, expect to receive questions on this topic from Jordi Nadal, Santiago Gonzalez et al. and to interact with them in the near future.

B) Data usage

EI data usage will be approximately equivalent to the current Event Tags, with the main use cases being event picking and skimming. The details need to be discussed further but I think the major infrastructure components already exist and may need only minor adaptations from the ProdSys-II side.

One aspect that we did not address yet is the possible use of EI in the context of the storage federation, as a component of a single event server system. We can perhaps have this discussion later this Summer, if this is considered a useful functionality.

Best wishes to all in Tokyo!

Dario

______________________________________________________________

Appendix: What to store in the EventIndex

The minimal information to be stored for the event picking use case consists of:
a) run/event/triggerstream (i.e. the unique event identifier)
b) GUID of the file containing the event

If the future EDM will provide a fast way to address an event within a RAW and a ROOT file, the GUID is all that's needed. Otherwise some more complex object, like the current event reference in the TagDB, needs to be passed back to the EventIndex. Clearly in this case the complexity of the EI info record increases substantially:
c) reference to the event within the file

The TagDB is currently used also to count the number of events that pass different triggers, or combinations of triggers.
There is no reason to lose this functionality, but the trigger info does not change with event processing, so it can be uploaded only the first time by Tier-0 (differently from the TagDB):
d) trigger info for the event

If we want the EI to be useful for event skimming beyond online trigger selection, it could be useful to add a few quantities considered important but derived from reconstruction. These could be counters (e.g. number of jets) or offline triggers (the event has a di-muon within a given mass range) that can be used later on for skimming:
e) reco variables and offline triggers

How much info we will in the end store depends on real needs and technology constraints, which are the subject of this year's studies.

______________________________________________________________
Dr Dario Barberis
CERN-PH Department               Tel.: +41.22.767.1302
MailBox C29000                   Fax.: +41.22.767.8350
CH-1211 Genève 23 (Switzerland)  Office: 42-2-001
______________________________________________________________