URL support functionality notes #3
Hi, |
Actually, here I have GETURL finding a 'globus://' url:
Nevertheless, I am missing the step where this URL is downloaded into the tmp_url, which is supposed to be stored in .git/annex/tmp. |
For some reason the file type does not get propagated:

It should be:
.git/annex/tmp/URL-s572--globus&c%%8ca92f91-39fb-4176-bcb-c3b94a808a2c79d140e7725fef792609.mat

The right approach is to save the actual file content into this location, am I right? So the content of globus://path/to/file should be saved in this location? If yes, what is the use of it? |
As we talked about during the weekly Jitsi meeting -- just save the file into the filename git-annex requested (i.e. |
Hi, thank you for your message. I tried it, but it returns a failure at the very end:
The file content is put in my folder under the following name:
What do you think about this? |
That ExitFailure 1 at the very end means a process git-annex ran exited
nonzero, but it might be something that was expected to do so.
Did git-annex display any error message or exit nonzero?
I don't know what to make of that long filename, but if the file got
added to your git repository's working tree by git-annex addurl, it
seems to me that it might have succeeded.
…--
see shy jo
|
Hi, that is the only failure message; I posted everything it logged until the end, so there is no more information. It does download the file, but it places it in the local directory from which I launch the command. Even if I pass the .git/annex/tmp/URL-tmp-key location I receive from annex, it does not like it.

I think that if no file extension is provided for the download target (for example, .git/annex/tmp/URL-tmp-key.txt for downloading a globus .txt file), annex does not recognize it, logs a failure, and places the download in my current local directory. Nevertheless, I should not modify the tmp key; it should be fine if the file extension is not provided, as discussed with @yarikoptic. I am going to investigate further.

This is the only missing step left; all other remote components are done. We can start thinking about whether we want to export data between endpoints, or some other functionality. |
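To make the "write to exactly the path annex hands over" idea concrete, here is a minimal sketch. It is not the remote's actual code: `retrieve_to` and `read_chunks` are hypothetical names, and `read_chunks` stands in for whatever call fetches the bytes from globus. The only point is that the destination path is used verbatim, with no extension added or changed.

```python
import os
from typing import Callable, Iterable


def retrieve_to(dest_path: str, read_chunks: Callable[[], Iterable[bytes]]) -> None:
    """Write downloaded content to exactly dest_path.

    dest_path is assumed to be the file name received from git-annex
    (e.g. something under .git/annex/tmp); it is used as-is.
    read_chunks is a hypothetical callable yielding the file content.
    """
    # Make sure the parent directory exists; do not alter the file name.
    dest_dir = os.path.dirname(dest_path) or "."
    os.makedirs(dest_dir, exist_ok=True)
    with open(dest_path, "wb") as out:
        for chunk in read_chunks():
            out.write(chunk)
```

With a helper like this, the content ends up where annex expects it, whether or not the temporary name carries a file extension.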
git-annex addurl <url>, when not provided a --file, does download the
url to the local directory, and adds that to the git repository.
This is normal operation as far as I can see, unless the git-annex
process exits with a code other than 0.
…--
see shy jo
|
Hi @yarikoptic, I am still concerned about the checkpresent operation, which is mandatory, for this reason: checkpresent(key) should return True if the file corresponding to that key has been stored, as happens with transfer_store(key, filename). The google-drive remote does that: in transfer_store it creates the file to be uploaded, names it by the key, and stores the content there, so that when checkpresent(key) is called it can query the file by the key. In fact, it then does _get_file(key) and checks that the file exists.

In my case I cannot add anything to globus, and checkpresent is an independent call, so I cannot keep a cache, as it gets cleared at every call. I can ask globus for the file, just like _get_keys, but globus does not know what that key is or where the corresponding file is, because we skipped the transfer_store step. As for checking the size: yes, the key has the size in it, but again the missing information is where the file corresponding to that key is.

This is why I implemented looking into git-annex:lower_hash/key.log.web, which stores the path/to/file corresponding to that key. I think the best thing is, given the key, to check the git-annex branch, where we have lower_hash/key.log.web. This file has the globus path that was added via addurl, i.e. globus://id/path/to/file, so I can then ask globus for that path and check its size and whether it is present. Nevertheless, addurl already checks that there is a file in globus corresponding to that globus:// url through the claimurl and checkurl calls, and only if there is one is the .web file generated. Therefore I think I can just check the size to make sure nothing has changed! What do you think? Let me know, thanks again! |
So no, what I will do in checkpresent(key) is call a geturl with the key. If I do not get a globus:// url back, it returns False. If I do, which is the case when I added the url, I check that the size of the remote file matches the size in my key to make sure nothing has changed, and return True if it does. Would you agree with that? It is clean. |
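The check described here could look roughly like the sketch below. Everything in it is an assumption for illustration: `get_urls` stands in for the URL lookup for a key, `remote_size` for a globus query returning the file size (or `None` if the file is missing), and the key layout `BACKEND-sSIZE--NAME` is assumed from the `URL-s572--...` key seen earlier in this thread.

```python
import re
from typing import Callable, Iterable, Optional

# Assumed key layout: BACKEND[-sSIZE][-mMTIME][-SCHUNKSIZE][-CCHUNKNUM]--NAME
_KEY_RE = re.compile(r"^[A-Za-z0-9_]+((?:-[smSC]\d+)*)--")


def size_from_key(key: str) -> Optional[int]:
    """Return the size recorded in the key, or None if the key has no size."""
    match = _KEY_RE.match(key)
    if match is None:
        return None
    for field in match.group(1).split("-"):
        if field.startswith("s"):
            return int(field[1:])
    return None


def check_present(
    key: str,
    get_urls: Callable[[str], Iterable[str]],      # hypothetical URL lookup
    remote_size: Callable[[str], Optional[int]],   # hypothetical globus query
) -> bool:
    """True if some globus:// URL for the key exists with a matching size."""
    urls = [u for u in get_urls(key) if u.startswith("globus://")]
    if not urls:
        return False
    expected = size_from_key(key)
    for url in urls:
        actual = remote_size(url)  # None means the file was not found
        if actual is None:
            continue
        # If the key carries no size, presence alone is accepted.
        if expected is None or actual == expected:
            return True
    return False
```

One design note: the external special remote protocol also has a `CHECKPRESENT-UNKNOWN` reply for the case where the remote cannot be contacted, so a real implementation would probably want to distinguish "file not there" from "could not ask globus" rather than returning False for both.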
Here is the protocol exchange from running `git annex addurl` on an `s3://` URL, which is handled via the datalad special remote:
So the workflow would be
- `CLAIMURL-SUCCESS` for `CLAIMURL` of those URLs which start with `globus://[<globus-name>|<globus-uuid>]/<fileprefix>`, where `<globus-name>` (or `<globus-uuid>`) and `<fileprefix>` are options of the special remote
- `CHECKURL-CONTENTS Size|UNKNOWN Filename` response for a `CHECKURL` query by annex (for those matching URLs)
- `TRANSFER RETRIEVE Key File`: we could analyze the provided Key. If it is a URL key (i.e. it starts with `URL-`) we could actually avoid using GETURLS (as we did in datalad) and just parse that key to extract the URL and the corresponding path to be RETRIEVED

But overall summary -- we should be able to make it work as a proper git annex external special remote with GET/PUT and EXPORT while also supporting regular `annex addurl globus://...` functionality (thus `datalad addurls` could be used to establish "import" of already existing directories on globus, until git annex provides protocol/support for proper "import").

BUT I believe that git-annex might be the one which "registers" the URL for the key upon `addurl URL` (thus storing the ad-hoc globus:// URL in the git-annex branch, in the .web file for the key). Ideally we should avoid that URL being stored, and rather just store the path to the file (and version info), assuming the `globus://[<globus-name>|<globus-uuid>]/<fileprefix>` prefix. That would allow for more flexible management (e.g. renaming the globus endpoint, or renaming/moving `fileprefix`), and minimize storage within the git-annex branch. We might need to clarify that with @joeyh (Q: is it possible for a special remote to announce that a claimed URL shouldn't be stored as a URL for the file?)
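As a rough illustration of the CLAIMURL/CHECKURL part of this workflow, here is a minimal sketch of how a remote might answer those two requests. It is not the datalad or globus remote code: `parse_globus_url`, `claims`, `respond`, and the `stat` callable (assumed to return a size in bytes, or `None` if unknown) are all hypothetical, and `fileprefix` is assumed to be given without a leading slash. Only the reply strings come from the external special remote protocol.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urlsplit


def parse_globus_url(url: str) -> Optional[Tuple[str, str]]:
    """Split globus://<endpoint>/<path> into (endpoint, path), else None."""
    parts = urlsplit(url)
    if parts.scheme != "globus" or not parts.netloc:
        return None
    return parts.netloc, parts.path.lstrip("/")


def claims(url: str, endpoint: str, fileprefix: str) -> bool:
    """True if the URL points at our endpoint and lies under fileprefix."""
    parsed = parse_globus_url(url)
    if parsed is None:
        return False
    url_endpoint, path = parsed
    return url_endpoint == endpoint and path.startswith(fileprefix)


def respond(
    line: str,
    endpoint: str,
    fileprefix: str,
    stat: Callable[[str], Optional[int]],  # hypothetical size lookup on globus
) -> str:
    """Answer a single CLAIMURL or CHECKURL request line."""
    command, _, url = line.partition(" ")
    if command == "CLAIMURL":
        ok = claims(url, endpoint, fileprefix)
        return "CLAIMURL-SUCCESS" if ok else "CLAIMURL-FAILURE"
    if command == "CHECKURL":
        if not claims(url, endpoint, fileprefix):
            return "CHECKURL-FAILURE"
        _, path = parse_globus_url(url)
        size = stat(path)
        size_field = "UNKNOWN" if size is None else str(size)
        filename = path.rsplit("/", 1)[-1]
        return f"CHECKURL-CONTENTS {size_field} {filename}"
    return "UNSUPPORTED-REQUEST"
```

Because the endpoint and prefix are passed in as remote options rather than read back from a stored URL, the same parsing would keep working if only the relative path were recorded, which is the direction suggested above.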