Attendees: O. Barring, S. De Witt, F. Donno, G. Lo Presti, G. Oleynik, T. Perelmutov
The meeting was chaired by Flavia.
The meeting focused on the "File exist" problem seen while transferring files via FTS. The problem arises because, after a file/SURL is created via an srmPrepareToPut request and the transfer has started, an SRM system can become overloaded or go into a bad state. The FTS gives up the transfer after a few retries and issues an srmAbort + srmRm that somehow do not reach the SRM; therefore, the original SURL is not removed. When the transfer is later retried at the experiment application layer, the FTS attempt fails with the well-known error: "File exist". The situation can be recovered "manually" by an explicit file removal and a transfer retry. This scenario seems to happen quite frequently.
To make this issue less "severe", it was proposed that the experiment application layer enable the overwrite flag on srmPrepareToPut, using the "-o" FTS option, when the transfer is attempted for the second time, as sketched below.
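As a rough illustration of this proposal, the retry policy at the experiment application layer could look like the following Python sketch. The names below (submit_fts_transfer, TransferFailed) are hypothetical placeholders, not part of any real FTS or SRM client API; overwrite=True stands in for the "-o" FTS option.

    # Sketch only: submit_fts_transfer() and TransferFailed are placeholders
    # for whatever the experiment framework uses to drive FTS.

    class TransferFailed(Exception):
        """Raised when FTS gives up on a transfer after its internal retries."""

    def submit_fts_transfer(source, dest_surl, overwrite=False):
        # Placeholder for the actual FTS submission ("-o" maps to overwrite=True).
        pass

    def transfer(source, dest_surl):
        try:
            submit_fts_transfer(source, dest_surl)       # first attempt, no overwrite
        except TransferFailed:
            # Second attempt with overwrite enabled, so a SURL left over from an
            # aborted first attempt no longer triggers the "File exist" error.
            submit_fts_transfer(source, dest_surl, overwrite=True)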
The overwrite flag is part of the WLCG SRM v2.2 original MoU:
http://cd-docdb.fnal.gov/0015/001583/001/SRMLCG-MoU-day2%5B1%5D.pdf
(see paragraph 2.8 and section 2.8.1 - overwrite flag; and the last point of paragraph 2.8.3)
SRM_FILE_BUSY is a return code that can be returned only when the overwrite flag is enabled on a second srmPrepareToPut request for a SURL whose first srmPrepareToPut request has not yet been followed by an srmPutDone (see the SRM spec, http://sdm.lbl.gov/srm-wg/doc/SRM.v2.2.html#_Toc163229759, for an explanation of the SRM_FILE_BUSY and SRM_DUPLICATION_ERROR status codes at file level).
Because the difference between returning FILE_BUSY and DUPLICATION_ERROR at file level was not clear (overwrite vs. no overwrite), it was decided during the January 2007 WLCG workshop that, if the overwrite flag was enabled on the second srmPrepareToPut request, the SRM server would abort the first srmPrepareToPut request and honor the second one; see point 2 of
https://twiki.cern.ch/twiki/bin/view/SRMDev/IssuesSolvedDuringTheWLCGWorkshop22-26Jan2007
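For clarity, the behavior agreed at that workshop can be summarized in a minimal Python sketch. All names below are illustrative and do not come from any actual SRM implementation; the server bookkeeping is reduced to two dictionaries.

    # Sketch of the server-side handling agreed in January 2007.
    # pending_puts: SURLs with an outstanding srmPrepareToPut (no srmPutDone yet).
    # existing_files: SURLs of completed files.

    def abort_request(request):           # placeholder: abort a pending put request
        pass

    def new_put_request(surl):            # placeholder: register a new put request
        return {"surl": surl}

    def handle_prepare_to_put(surl, overwrite, pending_puts, existing_files):
        if surl in pending_puts:
            if overwrite:
                # January 2007 agreement: abort the first, not-yet-PutDone
                # request and honor the second one.
                abort_request(pending_puts[surl])
                pending_puts[surl] = new_put_request(surl)
                return "SRM_SUCCESS"
            # The historically unclear case (FILE_BUSY vs. DUPLICATION_ERROR);
            # shown as FILE_BUSY here for concreteness only.
            return "SRM_FILE_BUSY"
        if surl in existing_files and not overwrite:
            # The case behind the well-known "File exist" failure.
            return "SRM_DUPLICATION_ERROR"
        pending_puts[surl] = new_put_request(surl)
        return "SRM_SUCCESS"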
Today, it was recognized that executing the Abort+Rm automatically on the server side has some advantages: the operations would be triggered without the SRM/GSI overhead incurred when a client makes such requests. This would reduce the chance that a second transfer attempt fails.
At the moment only DPM implements the proposed solution. CASTOR could provide an implementation once SRM v1 is decommissioned: SRM v1 forces support for "SRM v1" TURLs, which do not allow the necessary encoding of the request/user id. CASTOR could therefore provide the wanted functionality by October 2008.
dCache at the moment has some difficulty implementing the proposed solution. The dCache developers will discuss the matter internally and get back to Flavia by this coming Friday.
As an alternative it was suggested that, when an srmPrepareToPut/overwrite request fails, the FTS could issue an srmAbort + srmRm and then retry immediately, as in the sketch below. (Would the srmRm bulk-deletion problem then come into play here? It is not clear that this would alleviate the problem seen.)
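A minimal sketch of this alternative, assuming hypothetical srm_* wrappers for the corresponding SRM v2.2 operations (none of these names come from a real client library):

    # Sketch only: each srm_* helper stands in for the real SRM v2.2 call.

    def srm_prepare_to_put(surl, overwrite=False):
        # Placeholder: would return (request_token, file_level_status).
        return "token-1", "SRM_SUCCESS"

    def srm_abort_request(token):
        pass                              # placeholder for srmAbortRequest

    def srm_rm(surl):
        pass                              # placeholder for srmRm

    def fts_put_with_cleanup(surl):
        token, status = srm_prepare_to_put(surl, overwrite=True)
        if status != "SRM_SUCCESS":
            srm_abort_request(token)      # abort the failed request...
            srm_rm(surl)                  # ...remove the leftover SURL...
            token, status = srm_prepare_to_put(surl, overwrite=True)  # ...retry now
        return token, status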
Flavia will circulate to this list a very early draft of the SRM v2.2 addendum to the WLCG MoU by Wednesday, April 9th 2008. The developers can send Flavia their first comments by Friday, April 11th. The document will be discussed and agreed by the developers in the week of the 14th of April; the developers will sign off the document from a technical/feasibility point of view only. In the week of the 21st of April, the experiments will comment on the document. After that, the document will be taken to the MB for blessing, and the implementation plan will then start.
The next meeting will be on Monday the 14th of April, followed by daily meetings if needed.
The week of the 21st of April is the week of the WLCG workshop:
http://indico.cern.ch/conferenceDisplay.py?confId=6552
Flavia
From Timur:
==========
Hi Flavia
I argued that the decision to use the overwrite flag in the prepareToPut request in case of a pending previous prepareToPut request is wrong, and that it may lead to confusion on the part of the client. Let me lay out the arguments against this behavior again for the record.
1. This behavior is not specified in the SRM specification. It overloads the meaning of the overwrite flag, which is meant to indicate that overwrite of an existing file is desired, not the interruption of current transfers into the same file.
2. One of the basic principles of the SRM is that it makes every effort not to lie about the available resources, and once it commits the resources, it makes every effort to honor that commitment. If we allow one request to effectively interrupt the one in progress, we are violating this principle: the SRM promised to accept a file, the SRM gave a TURL, and now, through the action of a different user or job, these promises are broken.
3. If one request succeeds and a second one succeeds by preempting the first, it might not be clear which particular client is actually transferring the data, as they both might have the same TURL. So at the end of the transfer, when putDone is executed, it is not clear whose data is in the system and who should receive the error code. The only way to guarantee that we know which request is actually doing the transfer is by using a request-specific TURL that has to be unique even though the target is the same file. This is again not part of the SRM specification and would be a new requirement. This would require a significant modification in the case of dCache and possibly Castor, and would drain resources from other more pressing areas of improvement.
I proposed instead to modify the clients that want this sort of behavior on retries to execute an explicit Abort first, and only then to execute prepareToPut. I believe Shaun agreed with me that in case of a successful abort, there is no need to issue an rm before retrying the transfer. I had a false impression that at the end of the meeting most agreed that this was a better solution.
Thanks,
Timur
Flavia's answer:
============
Timur and all,
Timur Perelmutov wrote:
>
> 1. This behavior is not specified in the SRM specification. It overloads the meaning of the overwrite flag, which is meant to indicate that overwrite of an existing file is desired, not the interruption of current transfers into the same file.
That is why we discussed it and agreed on it at the WLCG Workshop in January 2007:
https://twiki.cern.ch/twiki/bin/view/SRMDev/IssuesSolvedDuringTheWLCGWorkshop22-26Jan2007 (point 2)
> 2. One of the basic principles of the SRM is that it makes every effort not to lie about the available resources, and once it commits the resources, it makes every effort to honor that commitment. If we allow one request to effectively interrupt the one in progress, we are violating this principle: the SRM promised to accept a file, the SRM gave a TURL, and now, through the action of a different user or job, these promises are broken.
We have always stressed that the behavior of SRM should be as close as possible to that of a Unix operating/file system. If a user does not want his/her files to be overwritten, he/she can always protect them by removing write permission from group and others. If other users (or the same one) have permission to write the file, they can overwrite it, just as in Unix. The system gives you a handle that is valid as long as others (or you yourself) do not enforce their rights on the file.
> 3. If one request succeeds and a second one succeeds by preempting the first, it might not be clear which particular client is actually transferring the data, as they both might have the same TURL. So at the end of the transfer, when putDone is executed, it is not clear whose data is in the system and who should receive the error code. The only way to guarantee that we know which request is actually doing the transfer is by using a request-specific TURL that has to be unique even though the target is the same file.
TURLs in general have a pin lifetime. For the same file you can have multiple TURLs (for reading as well) that can expire at different times; this is part of the spec. Castor is in any case making the TURL unique for other internal needs, and it happens that this change also satisfies this possible requirement.
> This is again not part of the SRM specification and would be a new requirement.
In my opinion, whether or not the TURL is unique for the same file could be an implementation detail, and therefore it is not part of the SRM spec.
> This would require a significant modification in the case of dCache and possibly Castor, and would drain resources from other more pressing areas of improvement.
I understand that this change might require a significant modification in dCache and probably "drain resources from other more pressing areas of improvement". Therefore, we decided to wait for the outcome of your internal discussion before coming to a decision. I understood that there could be an easier implementation strategy... as has happened other times when we discussed "pressing issues" in dCache ;-)
> I proposed instead to modify the clients that want this sort of behavior on retries to execute an explicit Abort first, and only then to execute prepareToPut. I believe Shaun agreed with me that in case of a successful abort, there is no need to issue an rm before retrying the transfer.
Well, as was said during our meeting, the invocation of an explicit srmAbort(Request/Files) from the client already takes place, and it does not go through when the storage services are "busy". Furthermore, this would create even more SRM "traffic". The srmRm is necessary in case the transfer was indeed complete but an srmPutDone could not be issued (the client has not received an ack from GridFTP and therefore the status of the file is unknown).
Recently, ATLAS has shown that a bulk srmRm request can also be problematic in some cases for dCache and can impose quite a load on PNFS.
> I had a false impression that at the end of the meeting most agreed that this was a better solution.
My impression of the conclusions of our meeting was that we would wait for your internal investigation, since for the other implementations it did not look like the required change would imply a major rework of the code.
Of course, should this change turn out not to be feasible for dCache, we would have to change the clients somehow, and even there we would have to agree with you on the best strategy to follow.
Please keep in mind that we all understand your difficulty in accommodating this request, especially given the budget cuts at FNAL. Therefore, I believe we are all open to finding a solution that can make everybody happy. Given the SRM effort, I would really hope that this solution will be homogeneous, in terms of functionality, across all implementations.
Thank you very much for your useful comments, Timur.
Flavia
Conclusions:
==========
After a few discussions with the dCache development team we decided that:
1. dCache will try to "optimize" internally the handling of the abort and rm operations. S2-specific tests will be created to verify the optimization.
2. All implementations will return SRM_FILE_BUSY at file level if a second srmPrepareToPut with the overwrite flag enabled is issued before an srmPutDone has been issued on the first request for the same SURL.
3. When FTS is requested to perform a put operation with the overwrite flag enabled and the srmPrepareToPut fails, FTS will try to remove the SURL and then retry the operation (see the sketch below). All this has to be carefully tested to make sure no new problems are introduced.
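Conclusion 2 amounts to the following revision of the server-side sketch given earlier, again with illustrative names only: the overlapping put is now refused with SRM_FILE_BUSY instead of aborting the first request.

    # Revised server-side sketch reflecting conclusion 2, which supersedes the
    # January 2007 behavior sketched earlier in these minutes.

    def handle_prepare_to_put(surl, overwrite, pending_puts, existing_files):
        if surl in pending_puts:
            # Conclusion 2 (overwrite enabled): refuse with SRM_FILE_BUSY rather
            # than aborting the first request. The no-overwrite case is not
            # spelled out in the conclusions; it is shown the same way here.
            return "SRM_FILE_BUSY"
        if surl in existing_files and not overwrite:
            return "SRM_DUPLICATION_ERROR"   # the "File exist" case
        pending_puts[surl] = {"surl": surl}  # accept the new put request
        return "SRM_SUCCESS"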