RangeError: Invalid string length #276

Closed
hugo-nl opened this issue Nov 22, 2024 · 3 comments · Fixed by #322
Assignees
Labels
bug Something isn't working

Comments

@hugo-nl
Collaborator

hugo-nl commented Nov 22, 2024

🐛 When adding long reports, we get a RangeError from ChromaDB.
⚡ A potential fix is to introduce batching for Chroma, since it seems to happen for longer reports.

Ex: https://group.vattenfall.com/globalassets/corporate/who-we-are/sustainability/vattenfall-annual-and-sustainability-report-2023.pdf
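The batching idea above could look roughly like the sketch below. It is only an illustration under stated assumptions: `CollectionLike`, `toBatches`, `addInBatches`, the id scheme, and the default batch size are hypothetical names and values, not the project's code, and the real ChromaDB client may expect a different shape.

```typescript
// Hypothetical minimal shape of a vector-store collection (assumption, not the real client type).
interface CollectionLike {
  add(batch: { ids: string[]; documents: string[] }): Promise<void>
}

// Split an array into consecutive batches of at most `batchSize` items.
function toBatches<T>(items: readonly T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer")
  }
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize))
  }
  return batches
}

// Store report chunks one batch at a time instead of in a single large call.
async function addInBatches(
  collection: CollectionLike,
  reportId: string,
  chunks: readonly string[],
  batchSize = 100, // assumed default, tune for the actual workload
): Promise<void> {
  let offset = 0
  for (const batch of toBatches(chunks, batchSize)) {
    await collection.add({
      ids: batch.map((_, i) => `${reportId}#${offset + i}`),
      documents: batch,
    })
    offset += batch.length
  }
}
```

Awaiting each batch before sending the next keeps only one batch in flight at a time, which trades some speed for a bounded payload size.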

@hugo-nl hugo-nl self-assigned this Nov 22, 2024
@hugo-nl hugo-nl added the bug Something isn't working label Nov 22, 2024
@hugo-nl hugo-nl changed the title from "Bugg: RangeError: Invalid string length" to "RangeError: Invalid string length" Nov 25, 2024
@Greenheart
Contributor

The same error arises for another long report: Atlas Copco, 2023


@Greenheart Greenheart added this to Garbo Nov 27, 2024
@Greenheart Greenheart moved this to Todo in Garbo Nov 27, 2024
@Greenheart
Contributor

Greenheart commented Nov 27, 2024

This consistently happens for reports where we find more than 20 tables. The resulting document is then likely too long to store in ChromaDB.

Could we split the tables into their own collection? Or could we remove duplicate emissions/economy table data by solving #274?

@Greenheart
Contributor

This is not caused by ChromaDB per se, but by the V8 JS runtime, which throws this error when a string would exceed its maximum length. Reference: nodejs/node#35973
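Since the limit applies to a single string, one way to fail early is to check the combined length before joining the parts. This is a minimal sketch: `wouldExceedStringLimit` is a hypothetical helper, and the limit is passed in by the caller (in Node.js it is exposed as `buffer.constants.MAX_STRING_LENGTH`).

```typescript
// Returns true if joining `parts` with `separator` would produce a string
// longer than `maxLength` characters. Only lengths are summed, so the
// oversized string is never actually built.
function wouldExceedStringLimit(
  parts: readonly string[],
  separator: string,
  maxLength: number,
): boolean {
  let total = 0
  for (let i = 0; i < parts.length; i++) {
    total += parts[i].length + (i > 0 ? separator.length : 0)
    if (total > maxLength) return true
  }
  return false
}
```

A caller could use this to decide whether a report and its tables can be stored as one document or need to be split first.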

We should try to batch process the report and its tables. Since we can't store the full length of the string in memory, perhaps we have to refactor this job to use temporary files and streams, to allow processing the report chunk by chunk.
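Processing chunk by chunk might be sketched as a generator that re-chunks incoming text pieces into bounded strings, so the full report is never concatenated. This is an assumption-laden illustration: `rechunk` is a hypothetical helper, and the pieces could come from a file read stream or any other iterable source.

```typescript
// Re-chunk an iterable of text pieces into strings of at most `maxChars`
// characters. The internal buffer holds less than `maxChars` characters
// plus the piece currently being appended.
function* rechunk(pieces: Iterable<string>, maxChars: number): Generator<string> {
  if (!Number.isInteger(maxChars) || maxChars <= 0) {
    throw new RangeError("maxChars must be a positive integer")
  }
  let buffer = ""
  for (const piece of pieces) {
    buffer += piece
    // Emit full-size chunks as soon as enough text has accumulated.
    while (buffer.length >= maxChars) {
      yield buffer.slice(0, maxChars)
      buffer = buffer.slice(maxChars)
    }
  }
  // Emit whatever is left over as a final, shorter chunk.
  if (buffer.length > 0) yield buffer
}
```

Each yielded chunk can be embedded or stored on its own. Note that cutting at a fixed character count can split a table row or a surrogate pair, so a real job would likely cut at a safer boundary such as a line break.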

@Greenheart Greenheart moved this from Todo to Review in Garbo Nov 27, 2024
@github-project-automation github-project-automation bot moved this from Review to Done in Garbo Nov 27, 2024