String length (...) exceeds the maximum length (20000000) #10

Closed
sergiimk opened this issue Feb 11, 2024 · 4 comments
Labels
upstream Issue in upstream dependency


@sergiimk
Member

After migrating to Spark 3.5, we pulled in a newer Jackson version that introduces a default maximum string length of 20,000,000 (about 20 MB).

See: FasterXML/jackson-core#1014

This breaks some of our examples that work with very large GeoJSON fields.

Neither Jackson nor Spark exposes this limit as a configuration option, so the only way to fix it is to fork Spark, which we won't do.

This ticket records the issue, but we will most likely solve it by moving away from Livy toward something like Spark Connect, which transfers data in Arrow format.

@sergiimk sergiimk added the upstream Issue in upstream dependency label Feb 11, 2024
@sergiimk
Member Author

Reported an upstream issue: https://issues.apache.org/jira/browse/SPARK-47150

@CrzEP

CrzEP commented Jun 25, 2024

So, can anyone tell me how to raise maxLenLim to a larger value?

@sergiimk
Member Author

We have not found a workaround. I think the only solution is to patch the Spark code, which is a lot of work.

We only hit this issue when querying large GIS datasets that contain geometry properties, so we are waiting for the upstream issue to be fixed.

I suggest leaving a comment on Spark's Jira.

Meanwhile, our strategy is to depend less and less on Spark in favor of other engines.

@sergiimk
Member Author

sergiimk commented Aug 2, 2024

We took the hard route and patched the default configuration directly in Livy.

Commit:
kamu-data/incubator-livy@d1c898c

Released in engine image version: v0.23.1

@sergiimk sergiimk closed this as completed Aug 2, 2024