Replies: 14 comments
-
Thanks for using LitStudy! Looks like Could you maybe provide the rest of the notebook, or do you have the line that creates |
-
Hi stijnh, thanks for the quick response. Sure, here we go:
-
I have defined the DocumentSet as docs_springer in my case, and that seems to have resolved the error: the output is no longer an AttributeError, but instead the following (as below): Does this look correct to you?
-
You would need to do something like this:

docs_springer, docs_not_found = litstudy.refine_scopus(docs_springer)
print(len(docs_springer), "papers found on Scopus")
print(len(docs_not_found), "papers NOT found on Scopus")
-
Great, thanks stijnh. Separately, I was wondering whether the full results from the word distribution can be viewed somehow, as the table output seems to show only a snapshot. Thanks, as always, S
-
@stijnh I'm also a bit confused about ngram_threshold, even after reading the guidance documents. What exactly does an ngram_threshold of 0.8 do? Does it classify something as matching an ngram if 80% of its characters are the same as the reference ngram (included in the corpus)? Sorry for the question, but I can't seem to clarify this on my own, and it would be good to know how LitStudy works here. Thanks, S
-
Hi,
This is the complete table of all ngrams, meaning all the words that contain a "_". Remove .filter(like="_") to list all words.
The parameter The actual processing is done by |
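
To make the threshold question concrete, here is a minimal sketch of how a bigram threshold can work when it is compared against a word co-occurrence score rather than a share of matching characters. It assumes the score is normalized pointwise mutual information (NPMI), which ranges from -1 to 1; this thread does not confirm that LitStudy computes exactly this, and `npmi_scores`, `merge_bigrams`, and the toy documents are hypothetical names and data used only for illustration.

```python
import math
from collections import Counter


def npmi_scores(docs):
    """Return {(a, b): NPMI} for every adjacent word pair in tokenized docs."""
    words = Counter()
    pairs = Counter()
    for tokens in docs:
        words.update(tokens)
        pairs.update(zip(tokens, tokens[1:]))
    total = sum(words.values())

    scores = {}
    for (a, b), n_ab in pairs.items():
        p_a = words[a] / total
        p_b = words[b] / total
        p_ab = n_ab / total
        # NPMI: 1.0 means the two words only ever occur together.
        scores[(a, b)] = math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)
    return scores


def merge_bigrams(tokens, scores, threshold):
    """Join adjacent words with "_" when their score reaches the threshold."""
    out = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and scores.get(pair, -1.0) >= threshold:
            out.append("_".join(pair))
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


# Hypothetical toy corpus, already tokenized and with stop words removed.
docs = [
    ["nature", "solutions", "for", "cities"],
    ["nature", "solutions", "and", "policy"],
    ["urban", "policy", "for", "people"],
]
scores = npmi_scores(docs)
merged = merge_bigrams(docs[0], scores, threshold=0.8)
print(merged)
```

In this toy corpus "nature" and "solutions" always appear together, so the pair scores about 1.0 and is merged at a threshold of 0.8, while "for cities" scores lower and stays as two words. Under this assumption, raising the threshold keeps only the most tightly bound word pairs.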
-
Great, thanks for your help. I have removed the .filter(like="_") and am, as expected, presented with a larger list. How can I view, export, or download this list in its entirety? Thanks again, Sam
-
Hi @stijnh, another quick question from me that might have a simple answer, which is why I am not opening it as a new issue: in the word distribution plot produced below, is the highest result saying that the word 'nature' appears in only 35% of the documents? I am asking because it was a key search term in the original Scopus search, so in theory all of the documents (that is, 100%) should include the word 'nature'. Thanks, as always, for your patience and advice, Sam
-
The thing returned by For example, you can add
Not sure about this one. Maybe sometimes Good luck! |
-
Thanks for sharing this @stijnh. One (final) question, which isn't clear to me from the guidance: how can we change the parameters to search for trigrams? I have a feeling that the top-scoring bigram below, "nature_solutions", is actually "nature-based solutions" or "nature based solutions", and I would like to capture this in the word distribution output.
-
Thanks @stijnh, although I can't seem to get pandas to write the DataFrame to a .csv. Here's what I'm doing: There's no error returned, but nothing is being written to the .csv either...
-
Replace
by
You were creating an empty |
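
For reference, a minimal sketch of viewing and exporting such a table in full with pandas, assuming the word distribution is an ordinary pandas DataFrame. The `word_dist` table, its made-up counts, and the output file name are hypothetical stand-ins, not output from LitStudy.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the word-distribution table (made-up counts).
word_dist = pd.DataFrame(
    {"count": [35, 12, 7]},
    index=["nature", "nature_solutions", "climate"],
)

# Print every row instead of the truncated preview a notebook shows by default.
with pd.option_context("display.max_rows", None):
    print(word_dist)

# Write the DataFrame that actually holds the data to a CSV file.
out_path = os.path.join(tempfile.gettempdir(), "word_distribution.csv")
word_dist.to_csv(out_path)

# Read the file back to confirm that the rows were written.
reloaded = pd.read_csv(out_path, index_col=0)
print(len(reloaded), "rows written to", out_path)
```

Calling `to_csv` on a newly constructed, empty `pd.DataFrame()` instead of on the DataFrame that holds the results also runs without an error but writes no rows, which would be consistent with the symptom described above.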
-
Great, thanks @stijnh. I've now run into a different issue: the .xlsx exported from the DataFrame cannot be opened because of an invalid file extension or file path. This seems to be a known issue that requires a workaround, so I've posted elsewhere. If you are curious, here's the issue
-
AttributeError: 'DocumentSet' object has no attribute 'title' is displayed, even after changing the title within the relevant CSV file (docs_springer) to read 'title'.
Thanks in advance! :)
Sam