Tutorial for its usage #23

kcmtest · 2020-08-02T16:18:32Z

Could you add a tutorial on how to use this? I can see the repository, but I'm confused about what I'm supposed to run. The web version of Pubtator is straightforward: I just enter PMIDs and it returns the results. I would be glad if you could add a tutorial.

I ran this

bash execute.sh
wget: download/bioconcepts2pubtatorcentral_offset.gz.log: No such file or directory

But this file exists here: https://github.com/greenelab/pubtator/blob/master/download/bioconcepts2pubtator_offsets.gz.log

I'm not sure what I'm doing wrong.


danich1 · 2020-08-03T14:26:09Z

execute.sh cannot find the download folder, which is why you are getting the No such file or directory error. Please make sure you are running bash execute.sh from the same level as the download folder.

kcmtest · 2020-08-03T14:36:52Z

> bash execute.sh is the same level as the download folder.

So I should be inside my download folder? You mean I can simply go into my download folder and run it?

danich1 · 2020-08-03T14:45:12Z

> So I should be inside my download folder? You mean I can simply go into my download folder and run it?

No. You should be outside of the download folder. The current directory should look like this:

data/
download/
mapper/
scripts/
execute.sh
... (other files)

Then run bash execute.sh. The download process should work from there. Just tested the download feature and it works for me.
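
The layout check described above can be sketched in Python before launching the script. This is only a minimal sketch and not part of the repository: the helper name `missing_entries` is hypothetical, and the expected names are simply the ones listed above.

```python
from pathlib import Path

# Entries taken from the directory listing above; the helper itself is hypothetical.
EXPECTED = ["data", "download", "mapper", "scripts", "execute.sh"]


def missing_entries(root=".", expected=EXPECTED):
    """Return the expected entries that are absent from `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Not at the repository root; missing:", ", ".join(missing))
    else:
        print("Layout looks right; run: bash execute.sh")
```

An empty list means the current directory matches the listing, which is the case after cloning the repository and changing into its top-level folder.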

kcmtest · 2020-08-03T15:33:07Z

To make sure I understand: I have to copy each of your folders the way you have arranged them, and only then can I run it? I was thinking that I would simply run execute.sh and it would work.

danich1 · 2020-08-03T15:58:05Z

> I was thinking that I would simply run execute.sh and it would work.

Right, the intended goal here is for execute.sh to handle everything. Did you clone the repository or just download the individual file? Forgive my confusion, but I'm not sure how things are set up on your end.

kcmtest · 2020-08-03T16:30:50Z

"Did you clone the repository" I will clone it and update you. Thank you for clarifying.

danich1 · 2020-08-03T16:56:02Z

> "Did you clone the repository" I will clone it and update you. Thank you for clarifying.

No problem. Please let me know if you run into any other issues.

kcmtest · 2020-08-03T18:47:32Z

I cloned it and it's running. How large is the download? Also, once it is done, do I have to rerun it every now and then?

danich1 · 2020-08-03T18:56:36Z

> I cloned it and it's running. How large is the download? Also, once it is done, do I have to rerun it every now and then?

The file should be about 18+ GB. I say 18+ because Pubtator Central updates their server monthly, so a complete download should be at least 18 GB.
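
Given that size, it may be worth checking free disk space before starting. The following is a minimal sketch using only the standard library; the helper name `has_free_space` and the 5 GB safety margin are assumptions for illustration, and later pipeline stages write additional files, so the real requirement may well be higher.

```python
import shutil


def has_free_space(path, required_gb):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3


if __name__ == "__main__":
    # 18 GB is the approximate size quoted above; the extra 5 GB is an arbitrary margin.
    needed_gb = 18 + 5
    print("Enough space for the download:", has_free_space(".", needed_gb))
```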

kcmtest · 2020-08-03T19:06:10Z

Thank you for the information. Once it's done, I will be back with questions to bug you again.

With regards

kcmtest · 2020-08-03T20:54:46Z

I will have to read the README file properly before I come back to you. I will run the included test example first.
The download has finished, I think, but something has been running for the last couple of hours and I'm not sure what it is.
I guess it's not downloading anything, so what is it doing?

kcmtest · 2020-08-05T06:46:42Z

The bash script has been running for about 33 hours now. Is it expanding the file, or what exactly is going on? Since I'm not sure, I haven't terminated the process. I would be glad if you could tell me.

danich1 · 2020-08-05T19:15:18Z

> The bash script has been running for about 33 hours now. Is it expanding the file, or what exactly is going on? Since I'm not sure, I haven't terminated the process. I would be glad if you could tell me.

The 33-hour process is my pipeline converting Pubtator Central's annotations into XML format to be processed later. It is a large file that can take up to a few days to fully process. There is no other solution here but to wait until all the pieces have been completed.

kcmtest · 2020-08-06T07:14:54Z

Unfortunately the machine was restarted. Do I have to start again, or can it resume from where it left off?

danich1 · 2020-08-06T15:18:57Z

> Unfortunately the machine was restarted. Do I have to start again, or can it resume from where it left off?

The older version of the code required you to start from scratch. The newly updated version allows you to start from anywhere in the pipeline. I highly recommend using the updated version and reading its docs. It could make your life easier when restarting the parsers.
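
The general idea of starting from anywhere in a pipeline can be illustrated with a short sketch. This is not the repository's implementation: the function `stages_from` and the stage names are hypothetical, loosely based on the steps mentioned in this thread, and the updated version's docs describe how it really works.

```python
def stages_from(stage_names, start_at=None):
    """Return the stages to run, beginning at `start_at` (all of them if None)."""
    stage_names = list(stage_names)
    if start_at is None:
        return stage_names
    if start_at not in stage_names:
        raise ValueError(f"unknown stage: {start_at!r}")
    return stage_names[stage_names.index(start_at):]


# Hypothetical stage names, not taken from the repository.
STAGES = ["download", "convert_to_xml", "download_full_text"]

if __name__ == "__main__":
    # After an interruption during the last stage, earlier stages are skipped.
    print(stages_from(STAGES, start_at="download_full_text"))
```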

kcmtest · 2020-08-10T15:50:00Z

30988903it [58:10:58, 147.95it/s] 
30988894it [11:25:20, 753.61it/s]  
1097it [2:10:37,  7.14s/it]
sys:1: DtypeWarning: Columns (4,10) have mixed types. Specify dtype option on import or set low_memory=False.
1097it [1:44:13,  5.70s/it]
274it [10:05:12, 132.53s/it]
Traceback (most recent call last):
  File "scripts/download_full_text.py", line 124, in <module>
    download_full_text(args.input, args.document_batch, args.temp_dir)
  File "scripts/download_full_text.py", line 58, in download_full_text
    response = call_api(query)
  File "/home/punit/anaconda3/envs/pubtator/lib/python3.8/site-packages/ratelimit/decorators.py", line 113, in wrapper
    return func(*args, **kargs)
  File "/home/punit/anaconda3/envs/pubtator/lib/python3.8/site-packages/ratelimit/decorators.py", line 80, in wrapper
    return func(*args, **kargs)
  File "scripts/download_full_text.py", line 21, in call_api
    raise Exception(response.text)
Exception

Is this an error or something else? Please let me know; I'm not sure.

danich1 · 2020-08-10T16:00:29Z

This error was generated because Pubtator Central's server sent back an error code. I don't know what caused it, so my suggestion is to try rerunning that part of the pipeline; if the error comes up again, I'll take a look.
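
Rerunning after a transient server error can also be automated. The following is a minimal sketch of retrying with exponential backoff. The wrapper `call_with_retries` is hypothetical and not part of the repository; it would wrap a function such as the `call_api` seen in the traceback, which raises an exception when the server returns an error.

```python
import time


def call_with_retries(func, *args, attempts=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call `func`, retrying on any exception with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2**attempt)
```

Retrying only helps with temporary failures. An error caused by the request itself, such as one that is too large, will fail the same way on every attempt.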

kcmtest · 2020-08-10T16:05:30Z

"I don't know what caused it, so my suggestion is try rerunning that part of the pipeline and if the error comes again I'll take a look." I simply ran this:

bash execute.sh

Shall I run this again?

danich1 · 2020-08-10T16:12:33Z

No, don't do that. Run this command instead:

 python scripts/download_full_text.py \
    --input data/pubtator-pmids-to-pmcids.tsv \
    --document_batch 100000 \
    --output data/pubtator-central-full-text.xml

If you run bash execute.sh you will restart everything. Not ideal.

kcmtest · 2020-08-10T16:14:47Z

Thank you for the immediate help.

This is what I got after running the above command. Sorry for asking such basic questions; I mostly use R, so I'm not sure about these errors.

download_full_text.py: error: the following arguments are required: --temp_dir

I made a new folder for it, and now it's running:

python scripts/download_full_text.py --input data/pubtator-pmids-to-pmcids.tsv --document_batch 100000 --output data/pubtator-central-full-text.xml --temp_dir /run/media/punit/data4/tupa/
0it [00:00, ?it/s]

kcmtest · 2020-08-10T16:26:26Z

This is the error I received after running the above:

0it [02:10, ?it/s]
Traceback (most recent call last):
  File "scripts/download_full_text.py", line 124, in <module>
    download_full_text(args.input, args.document_batch, args.temp_dir)
  File "scripts/download_full_text.py", line 58, in download_full_text
    response = call_api(query)
  File "/home/punit/anaconda3/envs/pubtator/lib/python3.8/site-packages/ratelimit/decorators.py", line 113, in wrapper
    return func(*args, **kargs)
  File "/home/punit/anaconda3/envs/pubtator/lib/python3.8/site-packages/ratelimit/decorators.py", line 80, in wrapper
    return func(*args, **kargs)
  File "scripts/download_full_text.py", line 21, in call_api
    raise Exception(response.text)
Exception: <?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>Submitted URI too large!</title>
<link rev="made" href="mailto:info@ncbi.nlm.nih.gov" />
<style type="text/css"><!--/*--><![CDATA[/*><!--*/ 
    body { color: #000000; background-color: #FFFFFF; }
    a:link { color: #0000CC; }
    p, address {margin-left: 3em;}
    span {font-size: smaller;}
/*]]>*/--></style>
</head>

<body>
<h1>Submitted URI too large!</h1>
<p>


    The length of the requested URL exceeds the capacity limit for
	this server. The request cannot be processed.
   
</p>
<p>
If you think this is a server error, please contact
the <a href="mailto:info@ncbi.nlm.nih.gov">webmaster</a>.

</p>

<h2>Error 414</h2>
<address>
  <a href="/">www.ncbi.nlm.nih.gov</a><br />
  <span>Apache</span>
</address>
</body>
</html>

danich1 · 2020-08-10T16:29:45Z

Basically, the program is sending too many IDs to be processed at once. Change document_batch to 100 or 1000 and run again. The default parameter is too high for Pubtator Central's API.
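
The reason a smaller batch avoids Error 414 can be sketched as follows: if the IDs travel in the URL, the URL length grows with the batch size. This is only an illustration under that assumption. The helpers `batched` and `query_length` are hypothetical, the base URL is a placeholder rather than Pubtator Central's real endpoint, and how the script actually builds its query is not shown in this thread.

```python
def batched(ids, batch_size):
    """Yield consecutive slices of `ids`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def query_length(ids, base_url="https://example.org/api?ids="):
    """Length of a URL that carries the IDs as one comma-separated parameter."""
    return len(base_url + ",".join(ids))


if __name__ == "__main__":
    # Made-up IDs, used only to compare URL lengths for different batch sizes.
    ids = [f"PMC{n}" for n in range(1000000, 1100000)]
    for size in (100000, 1000, 100):
        first_batch = next(batched(ids, size))
        print(size, "IDs per request ->", query_length(first_batch), "characters")
```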

kcmtest · 2020-08-10T16:30:54Z

"Basically the program is sending too many ids to be processed. Change document_batch to be 100 or 1000 and run again. The default parameter is too high for Pubtator Central's api."

Okay, I will try smaller numbers.

kcmtest · 2020-08-10T17:58:01Z

python scripts/download_full_text.py --input data/pubtator-pmids-to-pmcids.tsv --document_batch 100 --output data/pubtator-central-full-text.xml --temp_dir /run/media/punit/data4/tupa/
38it [1:26:01, 135.83s/it]
Traceback (most recent call last):
  File "scripts/download_full_text.py", line 124, in <module>
    download_full_text(args.input, args.document_batch, args.temp_dir)
  File "scripts/download_full_text.py", line 58, in download_full_text
    response = call_api(query)
  File "/home/punit/anaconda3/envs/pubtator/lib/python3.8/site-packages/ratelimit/decorators.py", line 113, in wrapper
    return func(*args, **kargs)
  File "/home/punit/anaconda3/envs/pubtator/lib/python3.8/site-packages/ratelimit/decorators.py", line 80, in wrapper
    return func(*args, **kargs)
  File "scripts/download_full_text.py", line 21, in call_api
    raise Exception(response.text)
Exception

Please have a look.

I checked the folder and see XML files there: around 553 MB, 38 files in total.

danich1 · 2020-08-10T18:54:20Z

For ease of debugging, please upload this file: data/pubtator-pmids-to-pmcids.tsv. I'll need it so I can see what's causing the issue.

kcmtest · 2020-08-10T20:34:29Z

> For ease of debugging, please upload this file: data/pubtator-pmids-to-pmcids.tsv. I'll need it so I can see what's causing the issue.

Sorry for the late reply; I'm doing it now. I will share a link since the file is more than 10 MB: https://drive.google.com/file/d/1G-6ehkeR_V8IhqiBryCMVe1jGc9GPB8Y/view?usp=sharing

kcmtest · 2020-08-12T12:47:07Z

Hello sir, I would be glad to know what was going wrong on my side.

cgreene · 2020-08-12T17:26:50Z

Hi @krushnach80 - you have encountered a research project that is in progress but on someone's back burner at the moment. It sounds like you might be better served by directly interacting with the pubtator API or similar if you need faster responses in this case: https://www.ncbi.nlm.nih.gov/research/pubtator/

kcmtest · 2020-08-12T21:16:26Z

Thank you, sir. I found something that would be easier for me: https://cran.rstudio.com/web/packages/pubtatordb/vignettes/pubtatordb.html

But I would love to use your tool as well.

danich1 mentioned this issue Aug 3, 2020

Implement download tracker and pipeline execution change #24

Merged
