Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spark 3.2 #587

Merged
merged 35 commits into from
Jan 19, 2023
Merged

Spark 3.2 #587

merged 35 commits into from
Jan 19, 2023

Conversation

echeipesh
Copy link
Contributor

@echeipesh echeipesh commented Jun 30, 2022

Pushing for Spark 3.2

  • get tests green
  • update to spark-testing-base for more speed
  • get code review and/or cleanup the changes
  • figure out what to do with GeoTrellis layer data source
  • update python package
    • cleanup dependencies: disentangle the release wheel from pweave
    • grok the python environment, consider introducing poetry instead of conda
  • run some jobs in spark 3.2 environment

@pomadchin
Copy link
Member

I'm going to bump the min Spark version in GT as well locationtech/geotrellis#3389
Thogh it should not affect this PR much.

@echeipesh
Copy link
Contributor Author

@pomadchin It'd be nice to get 3.6.3 out with spark 3.2.0 and then 3.7.0 with CES3 and spark 3.2.1. That'd allow this to sync with frameless 0.11 and later with 0.12+.

@pomadchin
Copy link
Member

@echeipesh 🚀 locationtech/geotrellis#3471 + I'm dropping Spark 3.0 over there typelevel/frameless#637 (frameless will be published for 3.1.x, 3.2.x, and 3.3.x)

@@ -48,7 +48,8 @@ import org.scalatest.{BeforeAndAfterAll, Inspectors}

import scala.math.{max, min}

class GeoTrellisDataSourceSpec extends TestEnvironment with BeforeAndAfterAll with Inspectors with DataSourceOptions {
trait GeoTrellisDataSourceSpec extends TestEnvironment with BeforeAndAfterAll with Inspectors with DataSourceOptions {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very little motivation to fix this. Note that the push-down predicates also got busted with move to spark 3.2.

Can't imagine ever using this rather than reading COGs or even Zarr. Unless somebody would like to fix this I would like to suggest to remove this feature altogether to keep things tidy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@metasim @pomadchin What do you guys think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

metasim and others added 18 commits December 14, 2022 13:11
Spark 3.2

Lets get it compiled, spark 2 support is well out the window anyway.
GT settings bring in jackson classes and with shading it was getting weird
this is a starting point
Also untangle the super weird inheritance relationship between the two
Made them more direct. Good for fixing things and better for performance because these versions don't need to create intermediate mask tiles.
This is a change but it's towards less surprising
- fixed weird init order in tests
- all tests share same context now thanks to base
- exclude scala-xml from tests
Copy link
Member

@pomadchin pomadchin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@@ -48,7 +48,8 @@ import org.scalatest.{BeforeAndAfterAll, Inspectors}

import scala.math.{max, min}

class GeoTrellisDataSourceSpec extends TestEnvironment with BeforeAndAfterAll with Inspectors with DataSourceOptions {
trait GeoTrellisDataSourceSpec extends TestEnvironment with BeforeAndAfterAll with Inspectors with DataSourceOptions {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@pomadchin pomadchin force-pushed the spark-3.2 branch 2 times, most recently from d1ecf9b to a2d5a7a Compare January 3, 2023 15:14
avoid reflection which is done at runtime by structural types
It needs more work at another time
@echeipesh
Copy link
Contributor Author

echeipesh commented Jan 19, 2023

I've been looking at the python side of the project enough to form strong enough opinions of it. I think it's actually going to need some work and it doesn't make sense to do it in this PR.

  • make the build PEP 517 compatible
  • ditch use of conda for poetry (we're not building a conda package here anyway)
  • separate the the pweave deps from the release deps
  • decide if shipping jars with .whl still makes sense
    • it was a nice attempt at nicety but unless you start the process from ipython instead of pyspark it is just dead weight.
  • pyrasterframes python files are in the sbt project structure
    • Kind of the only benefit of this is to couple the production of assembly jar with the building of the .whl
    • Not sure the weirdness of this justifies the setup

So that's a lot of moving stuff around potentially and it would be much cleaner to make that it's own PR.

@echeipesh echeipesh merged commit 8331358 into develop Jan 19, 2023
@echeipesh echeipesh mentioned this pull request Jan 19, 2023
@echeipesh echeipesh deleted the spark-3.2 branch February 24, 2023 22:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants