EMI Execution Service F2F

313 (Padova)




The purpose of the meeting is to progress with the EMI Execution Service interface specification.


Monday: Scope, Architecture, Data Staging

Tuesday: Delegation, State model

Wednesday: Operations, Task assignment, next steps

Postponed for next meeting: Job description, Resource Information, Activity Information

Monday July 26, 2010
====================

Attending: Alvise Dorigo, Eric Frizziero, Aleksandr Konstantinov, Balazs Konya, Shahbaz Memon, Massimo Sgaravatto, Martin Skou Andersen, Luigi Zangrando

Discussions and agreements on WHO, WHY, WHEN, HOW.

WHO

The definition of the standard interface for the EMI execution service is one of the highest priorities of the project. The relevant people should therefore commit at least 50% of their time to the definition of the EMI ES specification.

The main people involved in defining the specification are:
- ARC: Aleksandr, Balazs (not full time, not for all technical details), Martin
- gLite: Luigi, Massimo
- UNICORE: Bernd, Shahbaz (Christoph can't participate in such activities anymore)

Since the LB people are planning to use the EMI specification, we will need to show it to them, so they will need to be involved at some point to review it (not needed now).

The delegation part of the specification must also be submitted to the security people.

WHY

We need a common interface for describing, submitting and managing jobs. We therefore need to produce a high-level technical document containing this specification, from which it must be possible to start the implementations.

WHEN

The specification must be ready by October (there is a milestone), but the work should be finalized even earlier (e.g. it would be good to present it at the OGF meeting that will take place in Brussels at the end of October).

Luigi notes that the XML rendering of the Glue2 specification is a prerequisite. Balazs reports that it will be finalized by September at the latest.

HOW

Besides these activities within EMI, there are also the activities within the OGF-PGI WG, which should proceed in parallel. The idea is to be able to push our EMI specification to the OGF-PGI WG when ready.

After this Padova F2F meeting, it is agreed that the EMI ES specification activities will continue in phone conferences (2 per week), to be held on Mondays and Thursdays (from 09.30 to 11.30) starting from September. The Lund phone conference system (accessible via Skype) will be used.

These EMI ES specification activities should be considered part of the JRA1 standardization task. The emi-jra1-standard@eu-emi.eu mailing list will be used.

The document (including the draft versions) and all the other relevant material (meeting notes, etc.) will be made available in the EMI wiki (under the JRA1 standardization task area: https://twiki.cern.ch/twiki/bin/view/EMI/EmiJra1T6Standardization).

It is agreed to always use the "EMI Execution Service (ES)" terminology (e.g. let us not use "AGU" anymore).

It is necessary to identify a person responsible for leading the process (calling meetings, etc.) and for editing the document. Bernd will be asked if he can take on this responsibility.

Scope
-----

Discussion about the scope of our specification activities.

It is agreed that the specification should be relevant for computing elements, while higher level job management services (e.g. the gLite WMS) are out of scope.

The following items are in scope and should therefore be tackled by the specification that we have to define:
- Interface to manage jobs
- Job info (status, detailed info, query)
- Resource information (query)
- Delegation (to satisfy data staging)
- State model
- Data staging
- Job description (“EMI-JSDL”)

Architecture
------------

The architecture presented in v0.4 of the document referred to a monolithic setup with 3 port-types, all provided by the same service:
- Execution (Create, Change, Cancel, Wipe, GetActivityStatus, GetActivityInfo)
- Info (QueryResource, QueryActivity)
- Delegation

It is agreed to have instead a new modular setup, with 2 "modules":
- Activity-Factory
- Activity-Manager

Activity-Factory is used to create jobs and manage resource (CE) information.
Port-types:
- Create port: CreateActivity
- ResourceInfo port: QueryResourceInfo
- Delegation port

Activity-Manager is used to manage jobs and get job info.
Port-types:
- ActivityManagement port: Change, Cancel, Wipe, GetActivityStatus by ID, GetActivityInfo by ID
- ActivityInfo port: QueryActivityInfo, GetActivityStatus by ID, GetActivityInfo by ID
- Delegation port

QueryResourceInfo must report only CE information. It must not report anything about jobs, not even the list of job IDs: the Factory doesn't know anything about the list of active jobs.

QueryResourceInfo must also report the endpoint(s) of the "associated" Activity Manager(s).

It must be further discussed whether GetActivityStatus and GetActivityInfo must be part of both the ActivityManagement and ActivityInfo ports.

Data staging
------------

Discussions about data staging. The text circulated via mail by Aleksandr some weeks ago was used as input to the discussion.

Agreed on the concepts of stage-in, session and stage-out directories.

Agreed on the possible stage-in/stage-out models.

All the outcomes and agreements of these discussions are presented in the "Data staging functionality" section (1.2) of the document v. 0.7.

Still to be decided: if and how collections of files should be supported.

The discussion about “protocol-specific extra variables” will take place when discussing the EMI-JSDL.

Tuesday July 27, 2010
====================

Attending: Paolo Andreetto (just for the discussions related to delegation), Alvise Dorigo, Eric Frizziero, Aleksandr Konstantinov, Balazs Konya, Shahbaz Memon, Massimo Sgaravatto, Martin Skou Andersen, Luigi Zangrando

Delegation
----------

It is agreed that the scope is only X509 token delegation. SAML tokens are reserved for future versions of the specification.

It is agreed that only RFC 3820 proxies are allowed. Any extension is allowed.

It is agreed that there are two possible scenarios for X509 delegation:
- X509 tokens can come directly from the client and be transferred to the ES. This scenario must be supported.
- X509 tokens can be fetched from a credential service (such as MyProxy, SLCS). Implementation of this scenario is optional.

The types of supported credential services must be published as part of the resource description (ES capability).

See the "Delegation" section (1.3) of the document v. 0.7 for more details on the outcomes of the discussions.

Defined the details of the Delegation Port-Type to support the first scenario (see section 7 "Interface: Delegation Port-Type" of the document v. 0.7).

Still to be decided: the association of credentials with data staging elements (to be discussed when discussing the EMI-JSDL). Aleksandr proposes to allow specifying a delegationid per file, per staging element, per job.

Activity State model
--------------------

Discussions and agreement on the state model.

The state model consists of main states and secondary states. A job can only be in one main state but can have multiple secondary states.

Outcomes (i.e. the states that have been defined) are in the "State definitions" section (8.1) of the document v. 0.7.

How to handle the purged state is to be decided when discussing the wipe operation.
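
The rule that a job has exactly one main state and any number of secondary states can be illustrated with a minimal sketch. This is not taken from the specification: the class names, the main state names and the secondary state labels below are hypothetical placeholders, since the agreed states are listed only in section 8.1 of the document v. 0.7.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class MainState(Enum):
    # Hypothetical placeholder names, NOT the states agreed at the meeting.
    ACCEPTED = "accepted"
    PROCESSING = "processing"
    TERMINAL = "terminal"


@dataclass(frozen=True)
class ActivityStatus:
    """Status of one job: a single main state plus a set of secondary states."""

    main: MainState                           # exactly one main state
    secondary: FrozenSet[str] = frozenset()   # zero or more secondary states

    def __post_init__(self):
        # Reject anything that is not a single MainState value.
        if not isinstance(self.main, MainState):
            raise TypeError("main must be a single MainState value")

    def with_secondary(self, name: str) -> "ActivityStatus":
        # Return a new status with one more secondary state attached.
        return ActivityStatus(self.main, self.secondary | {name})


# Usage: one main state, two (hypothetical) secondary states.
status = (
    ActivityStatus(MainState.PROCESSING)
    .with_secondary("staging-pending")
    .with_secondary("hold-requested")
)
```

Keeping the main state as a single enum field and the secondary states as a set makes the "one main, many secondary" constraint structural rather than something each operation has to check.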

Discussions and agreement on the possible state model transitions.

Outcomes are in the "State transitions" section (8.2) of the document v. 0.7 and in what was written on the whiteboard (pictures taken by Aleksandr, to be reported in section 8.2).

The implementation of the "failure recovery transitions" is optional (i.e. some services may support some recoveries).

Wednesday July 28, 2010
=======================

Attending: Alvise Dorigo, Eric Frizziero, Balazs Konya, Shahbaz Memon, Massimo Sgaravatto, Martin Skou Andersen, Luigi Zangrando

Operations
----------

Thorough discussion of the CreateActivity operation. Outcomes are reported in section 2.1 "CreateActivity operation" of the document v. 0.7.

Superficial discussions of all the other operations (outcomes of these discussions are reported in the document v. 0.7).

Some items still to be discussed:
- Why is a cancel operation needed (why can't the ChangeActivity operation be used instead)?
- Do we need to provide operations to suspend and then resume jobs in the LRMS?

Version 0.7 of the document has been produced and uploaded to the wiki (https://twiki.cern.ch/twiki/bin/view/EMI/EmiJra1T6Standardization).

Text marked in yellow refers to items which need to be (further) discussed or need to be rephrased to make them consistent with the rest of the document. Sections whose name is marked in yellow haven't been discussed.

Next actions
------------

- By next Wednesday (Aug 4) Massimo will produce a v. 0.8 document, doing the needed cleaning and making the whole document consistent and readable also for people who did not attend the meeting. It will also have to include the whiteboard snapshots (to be sent to Massimo by Friday), in particular the following ones:
  - architecture (Shahbaz)
  - data staging figure (Alek)
  - 3 directories (Alek)
  - delegation figure (Alek)
  - state transition figure (Alek)
- Luigi and Balazs will provide (by Friday) the drafts/skeletons of the WSDL and JSDL schemas (to be included in the appendix of the document).
- Discuss in the September phone conferences the remaining set of operations, including the HOLD/RESUME problem area.
- Find a person responsible for driving the process and for editing the document (to be confirmed whether this role can be taken by Bernd).
- The first phone conference (scheduled for September 2) will be planned by this person (or by Balazs if no one has been appointed by then).
- Schedule a F2F meeting (two full days and a third half day) in or near Copenhagen. Date: Tuesday 21 September - Thursday 23 September (it is possible to shift by one day, starting on 22 and finishing on 24). To be finalized by the end of the week (Balazs will drive the process).