[Read an Excel File]: GC overhead limit exceeded #322
Comments
Can you try something along the lines of |
When I used .option("maxRowsInMemory", 20), the resulting DataFrame was empty. |
I'm not surprised by the |
I used .option("maxRowsInMemory", 200), but I still often get an OOM error. My Excel file is only 16 MB. This is the error message. |
@AlexZhang267 I unfortunately can't invest much time into spark-excel at the moment. |
I'm getting the same problem here. |
Hi @kennydataml , |
Unfortunately, I can't get a scrubbed Excel file for you since it's on a client laptop.

    for xlpath in excels:
        csvpath = ...  # xlpath split join yadayda (elided in the original post)
        try:  # exception handling since we don't know the number of sheets
            for i in range(15):  # dynamic number of sheets
                df = (spark.read
                      .format("crealytics ... spark excel yada yada")
                      .option("dataAddress", f"{i}!A1")  # sub sheet index here
                      .option("header", "true")
                      .option("maxRowsInMemory", 100000)
                      .load(xlpath)
                      )
                # write excel to csv
                # repartition is a DataFrame method, so it must come before .write
                (df.repartition(200)  # attempting to circumvent memory issues
                   .write
                   .format("csv")
                   .mode("append")
                   .option("header", "true")
                   .save(csvpath)
                 )
        except Exception as err:
            print(repr(err))

I've narrowed the problem down to only 1 of 8 Excel files, and I can consistently reproduce it on that particular file. It opens just fine in Microsoft Excel, so I'm puzzled why only that one file gives me an issue. |
I have the same issue, and my Excel file is only 2 MB. It also happens on some specific files. |
Hi @NestorAGC123 , |
The problem is that spark-excel reads the file as an input stream, which uses far more memory than reading it as a java.io.File. I have logged https://bz.apache.org/bugzilla/show_bug.cgi?id=65581 for a possible solution. |
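To illustrate the stream-versus-file point above, here is a minimal Python sketch. It is only an analogy and not POI or spark-excel code: it relies on the fact that an .xlsx file is a zip archive, and all function names and the sample archive are hypothetical. With a path, the zip reader can seek and load only the entry it needs; with a forward-only stream, the whole archive has to be buffered in memory before any entry can be read.

```python
import io
import os
import tempfile
import zipfile


def make_sample_archive(path):
    """Write a small zip with two entries, standing in for an .xlsx package."""
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("xl/workbook.xml", "<workbook/>")
        zf.writestr("xl/worksheets/sheet1.xml", "<row/>" * 10000)


def read_entry_from_file(path, name):
    """Path-based access: zipfile seeks in the file and reads only one entry."""
    with zipfile.ZipFile(path) as zf:
        return zf.read(name)


def read_entry_from_stream(stream, name):
    """Stream-based access: buffer the entire archive first, then read the entry.

    Returns the entry bytes and the number of bytes held in the buffer.
    """
    data = stream.read()  # the whole archive is now in memory
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.read(name), len(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "sample.zip")
        make_sample_archive(archive)
        from_file = read_entry_from_file(archive, "xl/workbook.xml")
        with open(archive, "rb") as stream:
            from_stream, buffered = read_entry_from_stream(stream, "xl/workbook.xml")
        print(from_file == from_stream, buffered == os.path.getsize(archive))
```

Both calls return the same entry, but the stream variant holds the full archive in memory even when only a small entry is wanted. The actual memory behaviour of POI depends on its own implementation, so treat this only as an intuition for why the input type matters.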
Is this problem still open? I have the same problem. |
@yanghong, maybe you could open a PR based on the changes in https://bz.apache.org/bugzilla/show_bug.cgi?id=65581 |
Fine, it's POI's limitation. |
Expected Behavior
Current Behavior
Troubleshooting
Your Environment