A great way to use biblio-glutton's deeper citation matching capabilities would be for this extension to detect strings on a page that look like citations (e.g., a list of reference strings) and, if they don't already contain identifiers or URLs, run a match query against the biblio-glutton instance.
The current code uses regexes to match identifier strings. Detecting reference/citation strings would probably be harder... look for a sequence of <li> elements, or short paragraphs? Maybe there is existing JavaScript code that already does this sort of fuzzy detection, or code in another language that could be adapted.
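For the "sequence of <li> elements, or short paragraphs" idea, a rough heuristic could score each block's text and flag a run of sibling blocks as a reference list when most of them score high. The TypeScript below is only a sketch: the function names, regexes, length bounds, and thresholds are hypothetical guesses rather than tuned values, and it works on plain strings (e.g. the textContent of each <li>), leaving the DOM walking out.

```typescript
// Hypothetical heuristic: scores a text block (e.g. the textContent of one <li>
// or a short <p>) on how much it looks like a bibliographic reference.
// All patterns, weights and thresholds are illustrative guesses, not tuned values.

const YEAR = /\b(1[89]\d{2}|20\d{2})[a-z]?\b/;                     // 1998, 2019b
const AUTHOR_INITIALS = /\b[A-Z][a-z'-]+,?\s+(?:[A-Z]\.\s?){1,3}/;  // "Smith, J."
const PAGES = /\b\d{1,5}\s*[-–]\s*\d{1,5}\b/;                       // "123-145"
const VOLUME_ISSUE = /\b\d+\s*\(\d+\)/;                              // "12(3)"
const VENUE_HINT = /\b(Journal|Proc\.|Proceedings|Conference|Press|Vol\.|vol\.|pp\.)/;

function citationScore(text: string): number {
  const t = text.replace(/\s+/g, " ").trim();
  // Assumption: references are rarely shorter than ~40 or longer than ~600 chars.
  if (t.length < 40 || t.length > 600) return 0;
  let score = 0;
  if (YEAR.test(t)) score += 2;
  if (AUTHOR_INITIALS.test(t)) score += 2;
  if (PAGES.test(t)) score += 1;
  if (VOLUME_ISSUE.test(t)) score += 1;
  if (VENUE_HINT.test(t)) score += 1;
  // Citations tend to be punctuation-dense (fields separated by commas/periods).
  const punct = (t.match(/[.,;:]/g) ?? []).length;
  if (punct / t.length > 0.03) score += 1;
  return score;
}

// A run of sibling blocks (e.g. all <li> of one <ol>) counts as a reference
// list when at least `minFraction` of its items reach `minScore`.
function looksLikeReferenceList(items: string[], minScore = 4, minFraction = 0.6): boolean {
  if (items.length < 3) return false;
  const hits = items.filter((s) => citationScore(s) >= minScore).length;
  return hits / items.length >= minFraction;
}
```

Treating a whole list as the unit, rather than single strings, means an occasional stray citation-like sentence in running text is less likely to be picked up.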
Other issues (off the top of my head): a full match query is more expensive than an identifier lookup query, and submitting full "raw strings" raises greater privacy concerns than submitting only matched identifiers (e.g., passwords, email addresses, or other strings could accidentally get sent in the API request).
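On the privacy side, one mitigation could be a pre-flight filter that runs before any raw string leaves the browser: strings that already carry a DOI or URL go to the cheaper identifier lookup instead, and strings containing something that looks like an email address, a long digit run, or a key/password-like token are never sent. This is a hypothetical sketch (the `preflight` name and all patterns are assumptions), and regexes like these only reduce the risk; they cannot guarantee that nothing sensitive slips through.

```typescript
// Hypothetical pre-flight filter, run before any raw string is sent for matching.
// It only reduces risk; regexes cannot guarantee nothing sensitive is sent.

const DOI = /\b10\.\d{4,9}\/\S+/;
const URL_LIKE = /\bhttps?:\/\/\S+/i;
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;
// 12+ digits, optionally separated by single spaces/dashes (card, phone, account numbers).
const LONG_DIGITS = /(?:\d[ -]?){12,}/;
// Long tokens mixing letters and digits look more like keys/passwords than words.
const SECRET_LIKE = /\b(?=[A-Za-z0-9_-]*\d)(?=[A-Za-z0-9_-]*[A-Za-z])[A-Za-z0-9_-]{24,}\b/;

type Decision = { send: boolean; reason: string };

function preflight(text: string): Decision {
  if (DOI.test(text) || URL_LIKE.test(text)) {
    // Already handled by the cheaper identifier lookup.
    return { send: false, reason: "has-identifier" };
  }
  if (EMAIL.test(text)) return { send: false, reason: "email" };
  if (LONG_DIGITS.test(text)) return { send: false, reason: "long-digit-run" };
  if (SECRET_LIKE.test(text)) return { send: false, reason: "secret-like-token" };
  return { send: true, reason: "ok" };
}
```

Routing identifier-bearing strings away first also addresses the cost point: only strings that the regex path cannot resolve would ever reach the more expensive match query.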
This would be a bunch of work, and it isn't something I could contribute to in the near future, but I'm opening this issue in case others are interested or know of a project that has already done the hard bits.
Given the amount of work involved, this would indeed be a long-term goal. I've thought about some intermediate steps, and I plan to add the first ones in the near future:
- let the user freely highlight a reference in the web page, then call biblio-glutton's bibliographical reference matching on the selected text to add the matched DOI and OA URL. In this case, the user does the hard part...
- the same for a PDF displayed in the web browser (here the text would go to GROBID first, which is more robust with respect to noisy text),
- let the user select a full bibliographical section, use GROBID to segment it into individual references, and then match each reference with biblio-glutton.
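For the user-triggered steps above, the extension side could stay fairly small. The sketch below assumes biblio-glutton's `/service/lookup` endpoint with its `biblio` parameter for raw reference strings (worth checking against the deployed version), and `GLUTTON_BASE` is a placeholder. `splitReferences` is a hypothetical, deliberately naive splitter for a selected bibliography section; it is not what GROBID does, only a crude stand-in showing where GROBID's segmentation would plug in.

```typescript
// Sketch of the user-triggered flow; the base URL and helper names are hypothetical.

const GLUTTON_BASE = "http://localhost:8080"; // placeholder biblio-glutton instance

// Collapse whitespace in a user selection (line breaks, indentation, etc.).
function normalizeSelection(raw: string): string {
  return raw.replace(/\s+/g, " ").trim();
}

// Build a matching request for one raw reference string, assuming the
// /service/lookup endpoint with its `biblio` parameter.
function gluttonLookupUrl(reference: string, base = GLUTTON_BASE): string {
  return `${base}/service/lookup?biblio=${encodeURIComponent(normalizeSelection(reference))}`;
}

// Naive splitter for a selected bibliography section: breaks on "[n]" or "n."
// markers at line starts if any are present, otherwise on blank lines.
// A crude stand-in for GROBID's reference segmentation, which copes with far noisier text.
function splitReferences(block: string): string[] {
  const marker = /^\s*(?:\[\d+\]|\d+\.)\s+/m;
  const parts = marker.test(block) ? block.split(marker) : block.split(/\n\s*\n/);
  return parts.map(normalizeSelection).filter((s) => s.length > 0);
}
```

In the extension, the selected text would come from something like `window.getSelection().toString()`, and each URL would only be fetched after the user explicitly triggers the action.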
Then, to move further in the direction you point to, we could imagine a lightweight deep learning model for web pages, very similar to GROBID's reference identification model, run by the browser extension via https://github.com/transcranial/keras-js (most of the current GROBID models can already run with Keras). Some tags like <li> could be used as additional features, but the problem then is finding training data.
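To make the "tags as additional features" idea a bit more concrete, each text block could be turned into a fixed-length numeric vector that combines a one-hot encoding of its enclosing tag with a few surface features of the text. Everything here (the tag list, the chosen features, the `blockFeatures` and `TextBlock` names) is a hypothetical illustration; it implies no existing model or training set and does not use the keras-js API.

```typescript
// Hypothetical per-block feature vector for a small "reference vs. other" classifier.
// Feature choice is illustrative only.

const TAGS = ["li", "p", "div", "td", "span"] as const;

interface TextBlock {
  tag: string;  // lower-case tag name of the enclosing element, e.g. "li"
  text: string; // its text content
}

function blockFeatures(block: TextBlock): number[] {
  const t = block.text.replace(/\s+/g, " ").trim();
  const n = Math.max(t.length, 1);
  const count = (re: RegExp): number => (t.match(re) ?? []).length;
  // One slot per known tag; unknown tags get all zeros.
  const tagOneHot = TAGS.map((tag) => (block.tag === tag ? 1 : 0));
  return [
    ...tagOneHot,
    Math.min(t.length / 500, 1),                 // length, clipped to [0, 1]
    count(/\d/g) / n,                            // digit ratio
    count(/[A-Z]/g) / n,                         // capital-letter ratio
    count(/[.,;:]/g) / n,                        // punctuation density
    /\b(1[89]\d{2}|20\d{2})\b/.test(t) ? 1 : 0, // contains a plausible year
    /\b[A-Z]\./.test(t) ? 1 : 0,                 // contains an initial like "J."
  ];
}
```

Keeping every feature in [0, 1] avoids a separate normalization step before feeding the vector to a small model.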
Running the model on a given page could be triggered by the user rather than done automatically in the background like the current regex matching. This would avoid the legitimate privacy concerns you are raising, and it would also keep the model from slowing down the browser, which (unlike the current regex) it would otherwise do.