I wanted to pre-generate the index for a very large set of polygons (loaded from a Shapefile) and store it as Parquet so that I can reuse it in frequent production jobs. However, the ZOrderCurve-typed column named "index" appears to be ignored when joining the Parquet data with a list of points.
import org.apache.spark.sql.types._
import magellan.{Point, Polygon}
import org.apache.spark.sql.magellan.dsl.expressions._

val schema = new StructType(Array(
  StructField("latitude", DoubleType, false),
  StructField("longitude", DoubleType, false)
))
val sample = spark.read.schema(schema).option("header", true).csv("./sample.csv.gz")

magellan.Utils.injectRules(spark)

// The index was pre-generated and persisted once, like this:
// spark.read.format("magellan").load("s3://myBucket/my_shapefile_folder")
//   .withColumn("index", $"polygon" index 15)
//   .selectExpr("polygon", "index", "metadata.ID AS id")
//   .write.saveAsTable("shapes")

sample.join(spark.table("shapes"), point($"longitude", $"latitude") within $"polygon").explain()
@zebehringer can you give this PR a try? The issue, I think, is that the column's nullability is reset when Spark SQL writes to Parquet (a bug in Spark SQL), and when we read the data back this causes a schema mismatch.
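If the diagnosis above is right, one possible stopgap until the fix lands is to re-assert the expected nullability on the DataFrame after reading the persisted table back, so its schema matches what the indexed-join rule expects. This is only a sketch under that assumption; the `shapes` table and `index` column names are taken from the snippet above, and forcing nullability this way is not part of Magellan's documented API:

```scala
import org.apache.spark.sql.types.StructType

// Read the persisted shapes back; Parquet may have flipped "index" to nullable.
val persisted = spark.table("shapes")

// Rebuild the schema with "index" forced back to non-nullable.
val expectedSchema = StructType(persisted.schema.map {
  case f if f.name == "index" => f.copy(nullable = false)
  case f                      => f
})

// Recreate the DataFrame with the corrected schema so the join rule can
// recognize the pre-generated index column.
val shapes = spark.createDataFrame(persisted.rdd, expectedSchema)
```

Note that `createDataFrame(rdd, schema)` does not validate the data against the asserted nullability, so this is only safe if the column genuinely contains no nulls, which holds here since the index was computed for every polygon before writing.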
Here's the plan: