Tutorial for its usage #23
bash execute.sh is at the same level as the download folder. So should I be inside my download folder? Do you mean I can simply go into my download folder and run it? |
No. You should be outside of the download folder. The current directory should look like this:
Then run bash execute.sh. The download process should work from there. Just tested the download feature and it works for me. |
To make my confusion clear: do I have to copy each of your folders the way you have laid them out, and only then can I run it? I was thinking that I would simply run execute.sh and it would work. |
Right, the intended goal here is for execute.sh to handle everything. Did you clone the repository or just download the individual file? Forgive my confusion, but I'm not sure how things are set up on your end. |
"Did you clone the repository" I will clone it and update you. Thank you for clarifying. |
No problem. Please let me know if you run into any other issues. |
I cloned it and it's running. How large will the download be? Also, once it's done, do I have to rerun it every now and then? |
The file should be about 18+GB. I say 18+ because Pubtator Central updates their server monthly; therefore, your downloaded file should be at least 18GB to be correct. |
Thank you for the information. Once it's done I will be back with more questions to bug you again. With regards |
The bash script has been running for about 33 hours now. Is it expanding the file, or what exactly is going on? Since I'm not sure, I haven't terminated the process. I would be glad if you could tell me. |
The 33 hour process is my pipeline converting Pubtator Central's annotations into XML format to be processed later. It is a large file that can take up to a few days to fully process. There is no other solution here but to wait until all the pieces have been completed. |
Unfortunately the machine was restarted. Do I have to start over, or can it resume from where it left off? |
The older version of the code required you to start from scratch. The newly updated version allows you to start from anywhere in the pipeline. I highly recommend using the newly upgraded version and reading the docs for it. It could make your life easier when restarting the parsers. |
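For readers wondering what "start from anywhere in the pipeline" can look like in practice, here is a minimal Python sketch of one common approach: skip any step whose output file already exists. This is an illustration only and is not taken from the repository. The function names (run_steps, write_text) and the step files are hypothetical, and the updated pipeline may well handle restarts differently, so check its docs.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple

# A step is (output file, function that writes that file). Both are hypothetical.
Step = Tuple[Path, Callable[[Path], None]]


def run_steps(steps: List[Step]) -> List[str]:
    """Run steps in order, skipping any whose output file already exists.

    Returns the names of the outputs that were built during this call.
    """
    built = []
    for output, build in steps:
        if output.exists():
            continue  # finished in an earlier run, nothing to redo
        partial = output.with_name(output.name + ".part")
        build(partial)           # write under a temporary name first
        partial.replace(output)  # rename only after the step succeeded
        built.append(output.name)
    return built


def write_text(text: str) -> Callable[[Path], None]:
    """Hypothetical stand-in for a real, long-running pipeline step."""
    def build(path: Path) -> None:
        path.write_text(text)
    return build


workdir = Path(tempfile.mkdtemp())
steps = [
    (workdir / "step1.txt", write_text("downloaded")),
    (workdir / "step2.txt", write_text("converted")),
]
first_run = run_steps(steps)   # builds both outputs
second_run = run_steps(steps)  # builds nothing, both outputs already exist
print(first_run, second_run)
```

Writing to a ".part" file and renaming it afterwards means that a step interrupted by a reboot does not leave behind a file that looks complete.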
Is this an error or something else? Do let me know, I'm not sure. |
This error was generated because Pubtator Central's server sent back an error code. I don't know what caused it, so my suggestion is to try rerunning that part of the pipeline, and if the error comes again I'll take a look. |
"I don't know what caused it, so my suggestion is to try rerunning that part of the pipeline, and if the error comes again I'll take a look." I simply ran this
Shall I run this again? |
No, don't do that. Run this command instead:
python scripts/download_full_text.py \
--input data/pubtator-pmids-to-pmcids.tsv \
--document_batch 100000 \
--output data/pubtator-central-full-text.xml
If you run |
Thank you for the immediate help. This is what I got after running the above code. Sorry for asking such basic questions; I mostly use R, so I'm not sure about the errors.
I did make a new folder and it's running.
|
The error I received after running the above:
|
Basically, the program is sending too many IDs to be processed at once. Change document_batch to 100 or 1000 and run again. The default parameter is too high for Pubtator Central's API. |
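To illustrate why a smaller document_batch helps, here is a rough Python sketch of splitting a long ID list into small request-sized chunks. It is not the code from scripts/download_full_text.py. The helper name batch_ids and the example IDs are made up for illustration, and it only shows the chunking, not the actual API requests.

```python
from typing import Iterator, List, Sequence


def batch_ids(ids: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of ids, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(ids), batch_size):
        yield list(ids[start:start + batch_size])


# Made-up IDs for illustration only; real ones would come from
# data/pubtator-pmids-to-pmcids.tsv.
example_ids = [f"PMC{n}" for n in range(1, 2501)]

batches = list(batch_ids(example_ids, batch_size=1000))
print(len(batches), [len(b) for b in batches])  # 3 [1000, 1000, 500]
```

With a batch size of 1000, the 2500 example IDs go out as three requests instead of one oversized request, which is the effect the smaller document_batch value is meant to have.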
"Basically the program is sending too many ids to be processed. Change document_batch to be 100 or 1000 and run again. The default parameter is too high for Pubtator Central's api." okay i will try small numbers |
Please do have a look. In the folder I do see XML files, around 553 MB, 38 files in total. |
For ease of debugging please upload this file: data/pubtator-pmids-to-pmcids.tsv. I'll need it so I can see what's causing the issue. |
Sorry for the late reply, I'm doing it now. I will share a link since it's more than 10 MB: https://drive.google.com/file/d/1G-6ehkeR_V8IhqiBryCMVe1jGc9GPB8Y/view?usp=sharing |
Hello sir, I would be glad to know what was going wrong on my side. |
Hi @krushnach80 - you have encountered a research project that is in progress but on someone's back burner at the moment. It sounds like you might be better served by directly interacting with the pubtator API or similar if you need faster responses in this case: https://www.ncbi.nlm.nih.gov/research/pubtator/ |
Thank you, sir. I found something that would be easier for me: https://cran.rstudio.com/web/packages/pubtatordb/vignettes/pubtatordb.html but I would love to use your tool as well. |
Can you put up a tutorial for its usage? I do see the repository, but I'm getting confused about what I'm supposed to run. The web version of PubTator is straightforward: I just put in PMIDs and it returns the results. I would be glad if you could put up a tutorial.
I ran this
but this exists here: "https://github.com/greenelab/pubtator/blob/master/download/bioconcepts2pubtator_offsets.gz.log"
I'm not sure what I'm doing wrong.