Scopus exceeds csv field limit #92

Open
r-wrobel opened this issue Apr 16, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@r-wrobel

Hi, I encountered a bug that is triggered by very long lines in the CSV file.
It seems that Python's csv module, which litstudy uses to read the file, limits each field to 131072 characters by default:

Cell In[33], line 2
----> 2 docs_scopus=litstudy.load_scopus_csv("scopus.csv")

File [\site-packages\litstudy\sources\scopus_csv.py:116] in load_scopus_csv(path)
    114 with robust_open(path) as f:
    115     lines = csv.DictReader(f)
--> 116     docs = [ScopusCsvDocument(line) for line in lines]
    117     return DocumentSet(docs)

File \Lib\csv.py:116, in DictReader.__next__(self)
    113 if self.line_num == 0:
    114     # Used only for its side effect.
    115     self.fieldnames
--> 116 row = next(self.reader)
    117 self.line_num = self.reader.line_num
    119 # unlike the basic reader, we prefer not to return blanks,
    120 # because we will typically wind up with a dict full of None
    121 # values

Error: field larger than field limit (131072)

You can use the DOI 10.1016/C2013-0-19213-6 for testing. In the complete CSV export from Scopus, the line for this record has 182667 characters.
I assume a solution is presented at https://stackoverflow.com/a/15063941
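
As a possible workaround until this is fixed, the limit can be raised with `csv.field_size_limit` before the file is loaded. The snippet below is only a minimal sketch and not litstudy code: the helper name `raise_csv_field_limit`, the column names, and the in-memory CSV are made up for illustration, and the long field merely stands in for an oversized Scopus record.

```python
import csv
import io
import sys


def raise_csv_field_limit(limit=sys.maxsize):
    """Raise the csv module's global field size limit (hypothetical helper).

    On some platforms sys.maxsize does not fit into a C long, in which
    case csv.field_size_limit raises OverflowError, so we back off.
    """
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            limit //= 10


# Synthetic stand-in for an oversized Scopus row (column names are assumptions).
long_field = "x" * 182667
data = "Title,Abstract\r\nExample," + long_field + "\r\n"

# With the default limit, reading the row fails as in the traceback above.
csv.field_size_limit(131072)
try:
    list(csv.DictReader(io.StringIO(data)))
    failed_with_default = False
except csv.Error as exc:
    failed_with_default = True
    print("default limit:", exc)

# After raising the limit, the same data parses.
raise_csv_field_limit()
rows = list(csv.DictReader(io.StringIO(data)))
print("parsed rows:", len(rows))
```

Since the limit is global to the csv module, calling such a helper once before `litstudy.load_scopus_csv("scopus.csv")` should presumably be enough, although I have not checked whether a fix inside `load_scopus_csv` itself would be preferable.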

@isazi added the bug label Apr 18, 2024