Attendees:
  WP1: Fab   WP2: Leanne   WP3: Antony   WP4: Piotr   WP5: Jens
  WP6: Cal   WP7: Gareth   WP8: Jeff     WP9: Annalisa, Julian
  WP10: Johan   WP12: Erwin   SCG: -   External: Lee, Sergio

Discussion of EGEE:
Erwin reported on the latest developments in EGEE. The EGEE executive
committee has asked the ATF to provide input for question 2 of the
reviewers:
- slides produced and sent to Bob for the EGEE executive committee.

Open bugs and issues:
- Jeff explained the file access issue. Jeff will send a script; Julian
  will send his conversation with Peter.
  => action on Fab, Jens, and Leanne to look into it.
- long-running job use case: Akos sent a new diagram - Fab will check it.
- Leanne still has to read D9.3.
- long-running job (Fab): after a network interruption of 5-7 minutes the
  job is considered to have failed and will be re-submitted. At this point
  the old job is killed by Condor-G - so it will be killed once the network
  comes up again. If the job does something during the network outage, that
  cannot be prevented.
  => action on Fab: discuss with the Condor/Globus people whether better
     error messages could be obtained from them.
  => action on Fab: add a paragraph to the user guide mentioning the
     problem in the section discussing re-submission.

Architecture review of components to be added to 2.0:
-----------------------------------------------------
WP1: (slides on agenda page)
  Add DAGMan and the job partitioner.
  - DAGMan can only deal with temporal dependencies (job start/end), not
    with data dependencies or general events.
  - The JDL is nested; each single job is separately specified inside the
    overall JDL. Currently the single jobs have to be given explicitly,
    not just as a reference to another JDL - that would be useful.
  - Nested DAGs are possible in principle, but need testing; not sure
    whether they will be supported.
  - Parameterized job descriptions (e.g. 'submit 50 jobs with the same
    executable and parameterized input data') are being studied; not sure
    whether they can be supported.
  Job partitioning:
  - requires jobs to be checkpointable.
  - the semantics of job partitioning are not completely clear; the job
    needs to be specifically prepared to use the feature.
  - only applicable to specific use cases - it is not a general
    parallelization tool.
  - jobs with side-effects may not be partitionable - it is up to the user
    to check.
  - the binding of input data to job steps is not completely clear; it is
    definitely restricted.
  A document on job checkpointing and partitioning is in the WP1 EDMS -
  please read it and discuss on the mailing list.

WP2: (no slides)
  Leanne explained that the only addition to 2.0 will be the full
  deployment of RLS - the replica manager will be modified to work with it.
  Applications should use the replica manager to find out where their data
  is, and not the RLI/LRC interfaces directly. If you use those you will
  have to follow the topology. The following things will not be deployed:
  file pre-fetch, collections, RSS, proxy services. All services will be
  fully integrated with VOMS.
  Deployment: currently 1 LRC per VO. In 2.x: 1 LRC per SE, which should
  handle all VOs; 1 Tomcat container and 1 MySQL instance - the VO LRCs can
  be separate services inside that. Leanne is currently working on such a
  setup. RLI deployment is not yet clear - it depends on the size of the
  testbed (1 per VO, 1 per country, 1 per task, ...). For each new RLI, the
  LRCs have to be configured to send updates to it. LRC updates are
  soft-state (using bloom filters) with a configurable time interval (a
  sketch of this update mechanism follows below).
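To make the soft-state mechanism concrete, here is a minimal sketch in
Python of an LRC periodically pushing a bloom-filter summary of its GUIDs
to the configured RLIs. All class names, method names, filter sizes and the
hash choice are illustrative assumptions; the actual EDG RLS interfaces are
not specified in these minutes.

    import hashlib
    import time


    class BloomFilter:
        """Fixed-size bit array with k hash positions derived from SHA-1."""

        def __init__(self, size_bits=8192, num_hashes=4):
            self.size = size_bits
            self.num_hashes = num_hashes
            self.bits = bytearray(size_bits // 8)

        def _positions(self, item):
            for i in range(self.num_hashes):
                digest = hashlib.sha1(f"{i}:{item}".encode()).digest()
                yield int.from_bytes(digest[:4], "big") % self.size

        def add(self, item):
            for pos in self._positions(item):
                self.bits[pos // 8] |= 1 << (pos % 8)

        def __contains__(self, item):
            return all(self.bits[pos // 8] & (1 << (pos % 8))
                       for pos in self._positions(item))


    class LocalReplicaCatalog:
        """Toy LRC: maps GUIDs to the physical file names held on one SE."""

        def __init__(self):
            self.replicas = {}                      # guid -> list of PFNs

        def register(self, guid, pfn):
            self.replicas.setdefault(guid, []).append(pfn)

        def summary(self):
            """Compress the set of known GUIDs into a bloom filter."""
            bf = BloomFilter()
            for guid in self.replicas:
                bf.add(guid)
            return bf


    class ReplicaLocationIndex:
        """Toy RLI: keeps only the latest summary per LRC (soft state)."""

        def __init__(self):
            self.latest = {}                        # LRC name -> BloomFilter

        def receive_summary(self, lrc_name, bloom_filter):
            self.latest[lrc_name] = bloom_filter    # older state is replaced

        def might_hold(self, lrc_name, guid):
            bf = self.latest.get(lrc_name)
            return bf is not None and guid in bf


    def publish_loop(lrc_name, lrc, rlis, interval_seconds, rounds=1):
        """Soft-state publication: every interval, push a fresh summary to
        each configured RLI; deleted entries simply drop out next round."""
        for _ in range(rounds):
            summary = lrc.summary()
            for rli in rlis:
                rli.receive_summary(lrc_name, summary)
            time.sleep(interval_seconds)


    if __name__ == "__main__":
        lrc, rli = LocalReplicaCatalog(), ReplicaLocationIndex()
        lrc.register("guid-0001", "srm://se.example.org/file1")
        publish_loop("se.example.org", lrc, [rli], interval_seconds=0)
        print(rli.might_hold("se.example.org", "guid-0001"))   # True
        print(rli.might_hold("se.example.org", "guid-9999"))   # almost surely False

The interval argument of publish_loop corresponds to the configurable time
interval mentioned above, and it is the source of the latency concern
raised in the next paragraph: a dependent job may query the RLI before the
producing job's new entries have been pushed.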
  The time interval before new information is pushed to the RLI might be
  critical, in particular with DAGMan scheduling, when a dependent job
  relies on the data having been produced by the other job.
  What is the policy for assigning LFNs to concurrent, competing requests?
  First come, first served. LFNs are stored in the RMC.

WP3: (slides on agenda page)
  Mediator:
  - current: answers simple queries from one table.
  - next week(?): joins over tables which are within one archiver (storing
    full tables).
  - medium to long term: joins with archivers that do not archive whole
    tables; hierarchies of archivers.
  Registry replication:
  - set of registry instances, geographically dispersed.
  - each registry will have all the information about all the producers and
    consumers within a VO; there is no 'master' registry.
  - replication is invoked periodically (the interval might be adjusted
    dynamically depending on the workload inside the system).

WP4: (slides on agenda page)
  LCAS server: does not require root; uses a policy description language
  (PDL). Plug-ins:
  - allowed users (grid-mapfile or allowed_users.db)
  - banned users (ban_user.db)
  - timeslots
  - VOMS authorization based on user certificate and job specification.
    What exactly is the job specification used for? There are open
    questions between WP4 and WP1.
    => action on Piotr and Fab to work that out.
  LCMAPS:
  - provides local credentials for jobs: UNIX credentials, AFS tokens, Krb5.
  - backwards compatible with existing systems (gridmapfile, k5cert).
  - needs to run in privileged mode.
  - has to run in the process space of incoming connections.
  RMS, Monitoring and Fault Tolerance.

WP5: (slides on agenda page)
  High-priority issues:
  - migration tool
  - asynchronous requests
  - SRM v1 interface
  - additional SRM functionality (exists, delete) - these depend on disk
    cache management if done properly
  - collaboration with WP9 and WP10 (the WP10 issues have been discussed in
    a break-out session)
  - support for EDG 2.0
  - documentation
  Jens also explained what will not be in the SE as currently planned.

Use case changes:
-----------------
=> action on Lee: put the diagrams into CVS.
- WP1: the interaction of the job-wrapper with the local logger needs to be
  added.
- WP2: RLS changes are internal; the attribute type now also needs to be
  specified when a new attribute is created.
- WP3: check R-GMA (gin, gout).
- WP4: all internal.
- WP5: check asynchronous calls; update to SRM notation.

Baseline API:
-------------
Leanne received all the interface RPMs needed. A new version is hoped for
by the end of June.

ETT Discussion:
---------------
Cal summarized his document (see text attached to agenda).
Jeff presented work done by a master's student in Leiden (see slides on the
agenda page): in addition to what is listed on the slides, it is also
important to take into account the different priorities assigned to VOs at
specific sites. He proposes to use the Maui simulator, but there are
concerns about the runtime of doing that. The approach uses statistics from
the information in the LRMS on historic jobs. With R-GMA the script could
be run in a canonical producer once it is queried - this might take too
long. It is better to run the script at fixed time intervals and publish
the results in the information system.
Julian: maybe we should think of a different architectural approach, e.g.
the CE actively polling for new jobs at the brokers, i.e. the broker
managing the queues of the CEs.
Question to Sergio: how can the ETT be published per VO in Glue? The ETT
could be multivalued - then the VO needs to be encoded in the value and the
broker (actually not the broker, but the user, who would have to specify an
appropriate expression in the JDL) would have to parse it (see the sketch
below).
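As an illustration of why the multivalued option is considered awkward,
here is a minimal sketch, assuming a hypothetical encoding in which each
value of the attribute is a "vo:seconds" string; the attribute name and
the encoding are illustrative only and are not part of the GLUE schema.

    # Minimal sketch of a per-VO ETT published as a multivalued attribute.
    # The "vo:seconds" encoding is an assumption made for illustration.
    def parse_per_vo_ett(values):
        """Turn ['atlas:1200', 'cms:300', ...] into {'atlas': 1200, 'cms': 300}."""
        ett = {}
        for value in values:
            vo, _, seconds = value.partition(":")
            ett[vo.strip()] = int(seconds)
        return ett


    def ett_for_vo(values, vo, default=float("inf")):
        """What the broker (or the user's JDL expression) would effectively
        have to compute: the ETT for one VO from the multivalued attribute."""
        return parse_per_vo_ett(values).get(vo, default)


    if __name__ == "__main__":
        published = ["atlas:1200", "cms:300", "lhcb:45"]   # hypothetical CE entry
        print(ett_for_vo(published, "cms"))                # -> 300

Even this toy version shows that the selection logic moves from a simple
numeric comparison to string parsing inside the requirements expression,
which is the complication noted in the conclusions below.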
Conclusions:
------------
- A complete architecture change (as discussed by Julian) is not taken into
  account for the moment.
- The simulator developed at NIKHEF looks promising and is worth further
  testing. The main concerns are: runtime, load on the CE, and support for
  all required batch systems. Jeff to report on these issues.
- Computation at the CE (e.g. via a canonical producer) for every job seems
  to be too computationally intensive - it is better to have a daemon that
  periodically produces the data and publishes it into the information
  system.
- It is not yet clear how to publish it in Glue - a multivalued attribute
  seems to be too complicated. Various possibilities should be proposed to
  the broker developers of WP1 to get their feedback.
  => Action on Fab & Sergio: discuss with the broker developers.

Outbound connectivity:
----------------------
WP1:
- job manager contacts the WMS to update the status: the job-wrapper logs
  to a local logger on the CE, which pushes the information forward to the
  LB - so no problem.
- transfer of the input/output sandbox: this currently requires outbound
  connectivity, but with GRAM 1.6 (included in GT 2.2, Condor-G v6.5) files
  could be staged the same way as the executable - this should solve the
  problem. The features are already distributed in the current EDG
  distribution but not turned on - needs testing.
- interactive jobs: these require outbound connectivity - it is not clear
  how this could be prevented. It would be interesting to get the opinion
  of the Condor people on this issue.
  => Action on Fab: ask the Condor people.

WP2:
- service discovery: the replica manager needs to contact the IS.
  Discussed in the WP3 section.
- output data registration: the replica manager client needs:
  - registration in the LRC - OK in 2.1, but not in 2.0.
  - registration of metadata and LFN - needs access to the RMC - not OK.
    - NAT would solve it, or
    - an RMC proxy service is needed, probably on the CE; it would probably
      need performance tuning.
- data lookup: the replica manager client:
  - needs to contact the RMC to resolve LFNs to GUIDs - not OK.
  - needs to contact the RLI to resolve GUIDs to LRCs:
    - scenario 1: one RLI per LRC - OK.
    - scenario 2: sites without RLIs exist - not OK.
      - NAT, or
      - RLS proxy service.
  - needs to contact all (or a subset of) the LRCs to find out the PFNs:
    - scenario 1: only local information is needed - go to the local LRC -
      OK.
    - scenario 2: all replicas are wanted - not OK.
      - NAT, or
      - the RLS proxy service will handle it.
- replicate data to the site:
  - replicate to the local SE:
    - 3rd-party gridFTP - not OK.
      - NAT, or
      - SRMcopy.
- replicate data from the site:
  - store in the local SE (either directly, or copy there from the WN) -
    OK; for registration see above.
  - copy to another SE - see replicate above.
  - SRMcopy: control goes to the source SE.
    - copy file out - OK.
    - copy file in - not OK - need to use SRMprepareToGet instead, and the
      destination SE will try to get the data.

WP3:
- R-GMA server at each site - all connections go via it - so it is OK.
- if the server goes down we lose all clients - fault tolerance could mean
  contacting another server - if it is on the same site - OK - otherwise a
  problem.

Glue Discussion (Sergio) (see slides on agenda page):
-----------------------------------------------------
Important: the upcoming ATF document should be synchronized with GLUE to
have a common terminology.
Sergio first gave a short overview presentation on GLUE. There is currently
no mechanism for ensuring consistency among the different implementations
of the reference UML model.
Then the CERN setup problem was discussed: have 1 batch system and multiple
gatekeepers pointing to it, since the gatekeeper does not scale as well as
the batch system.
VDT would like to integrate the LDAP schema and the EDG information
providers; that is fine from an EDG point of view, but the packaging must
be done in such a way that EDG could take VDT minus the information
providers, since EDG will most probably be ahead of VDT in that respect.
Sergio will find out with Alain Roy.

Sergio suggested a procedure for schema modifications:
- write a short document with the proposed modifications and the rationale
- send the document to the mailing list
- set up a phone conference
- proposals should come from a project rather than from individuals
WP3 should act as the permanent contact point to GLUE, taking part in these
discussions and forwarding change requests to the concerned WP and the ATF.
This needs further discussion inside EDG.

Things that are not of general interest, but only specific to a certain
domain, typically do not go into the schema but end up in extensions.
Extensions are dangerous since they prevent interoperability. EDG should
work to bring its current extensions into the GLUE schema.
=> We need somebody responsible for EDG extensions!!!

Discussion on the Service proposal: it is not clear where the protocol a
service speaks is specified. This needs clarification. Sergio would like
to get some examples. Cal will provide the past email exchange with Steve
Fisher to Sergio.

Authorization information: Sergio pointed out that the currently proposed
scheme has problems since no group information is published. WP1/WP4/SCG
need to work out a suitable format.

Are there attempts inside GLUE to assign exact semantics to the fields in
the schema? - Currently not.
Is the schema capable of expressing whether there is POSIX access between
a CE and an SE? - Yes, via the access point in the bind table.

Architecture document:
----------------------
It is agreed that such a document would be useful. Its structure should be
similar to the one compiled by Lee for the 2nd review. If possible, a
journal publication should be extracted from this document. The
decomposition should be component-based, not WP-wise. It should be finished
by the time of the final review. It will be written in LaTeX and stored in
the ATF CVS area.
Things to be done:
- Editor - Lee proposed. (timeframe: 1 week)
- Decide on a TOC - will start that via email (~2 weeks, starting now,
  based on the TOCs of the two documents mentioned).
- Assign sections to people - (worry about the list of authors).
- First draft by the time of the Heidelberg conference? At least a final
  TOC, a short description of the section contents, and people assigned.
Probably most of the work will be done in January. We will need plenary
ATF meetings to go over the individual contributions and harmonize them.

Next ATF:
Heidelberg Conference: 26th Sep. afternoon - 1st Oct. noon.
Tentative ATF: Saturday 27th Sep. (1/2 day).