Dear Patrice,
As we discussed in the other repo, I adjusted the client so that it can parse citations from text.
The solution became a bit ugly, but it now does the following:
It reads a TXT file as input, with one citation per line.
It groups citations into batches of one thousand (or the specified batch_size) and saves each batch in an XML file named after the input file name plus the corresponding thousand (or batch_size multiple).
At the end, it opens each file and adds the appropriate XML beginning and end.
TXT and PDF file handling are separated after the common function "process".
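
The batching steps above can be sketched roughly as follows. This is only an illustration, not the code in grobid-client.py: the helper names (`read_citations`, `batches`, `batch_output_path`, `wrap_xml`), the `_<number>.xml` naming pattern and the `citations` root element are all assumptions. It also wraps the fragments when the file content is built, whereas the client described here adds the XML beginning and end in a final pass.

```python
from itertools import islice
from pathlib import Path


def read_citations(path):
    """Yield non-empty citation strings, one per line of a TXT file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield line


def batches(items, batch_size=1000):
    """Yield consecutive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def batch_output_path(input_path, batch_index, batch_size=1000):
    """Assumed naming scheme: <input stem>_<upper bound of the batch>.xml."""
    source = Path(input_path)
    upper_bound = (batch_index + 1) * batch_size
    return source.with_name(f"{source.stem}_{upper_bound}.xml")


def wrap_xml(fragments, root="citations"):
    """Wrap already-parsed XML fragments in one root element (assumed name)."""
    body = "\n".join(fragments)
    return f'<?xml version="1.0" encoding="UTF-8"?>\n<{root}>\n{body}\n</{root}>\n'
```

Using a generator for `batches` means the whole input file never has to sit in memory at once, which matters for inputs with millions of lines.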
Issues:
I had to rename the 'input' variable to 'input2' because Python was complaining about the name.
The input file must be given in TXT format.
If more than one worker is specified, the output no longer keeps the order of the citations in the input file.
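
One possible way around the ordering issue, sketched under assumptions: the standard library's `Executor.map` returns results in the order of its inputs even when the calls finish out of order. `parse_citation` below is a hypothetical placeholder for the real request to the service, and nothing here reflects how grobid-client.py actually schedules its workers.

```python
from concurrent.futures import ThreadPoolExecutor
from xml.sax.saxutils import escape


def parse_citation(text):
    """Hypothetical placeholder for the real call to the parsing service."""
    return f"<bibl>{escape(text)}</bibl>"


def parse_in_order(citations, workers=4, parse=parse_citation):
    """Parse citations concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which worker finishes first.
        return list(pool.map(parse, citations))
```

With this approach the single-worker run would only be needed when the service itself cannot handle concurrent requests.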
Examples:
If order matters (--n < 2):
python grobid-client.py --input /path/to/refs/file.txt --n 1
If order does not matter (--n > 1 or default):
python grobid-client.py --input /path/to/refs/file.txt
Parsing 2 million citations with a single worker on a 2015 MacBook Pro took around 6 hours. Not so slow :)
Here is the file https://github.com/darjusp/contribs/blob/master/grobid-client.py