Allow using cx indexer without spatial index #54
I agree with this sentiment. |
So I'm guessing that the fact that the implicitly created |
Added explicit |
Taking a look... I'm pretty sure that the partition-level spatial indexes used to persist, but it also makes sense that this isn't something Dask would want a library to rely on 🙂 |
These look like reasonable updates to me. And overall, it might just be easier to remove all implicit spatial index initialization, and just have methods that need one raise an error if it hasn't been initialized yet.
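A minimal sketch of that "raise instead of building implicitly" idea, using a toy class rather than the real spatialpandas types. The names ToyGeoSeries and SindexNotBuiltError are hypothetical and exist only for illustration, and the "index" here is just a sort order, not an actual spatial index.

```python
class SindexNotBuiltError(RuntimeError):
    """Hypothetical error raised when an index is needed but was never built."""


class ToyGeoSeries:
    """Toy stand-in for a geometry container; not the spatialpandas API."""

    def __init__(self, bounds):
        # Each entry is an (xmin, ymin, xmax, ymax) bounding box.
        self.bounds = list(bounds)
        self._sindex = None  # nothing is built until the caller asks for it

    def build_sindex(self):
        """Explicitly build the toy index (row positions ordered by xmin)."""
        self._sindex = sorted(
            range(len(self.bounds)), key=lambda i: self.bounds[i][0]
        )
        return self

    @property
    def sindex(self):
        """Return the index, raising instead of building it on access."""
        if self._sindex is None:
            raise SindexNotBuiltError("call build_sindex() first")
        return self._sindex
```

With this shape, accessing sindex never hides an expensive build: the caller either opted in through build_sindex() or gets an immediate error.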
Did you push these changes? I didn't see them in the diff just now. For DaskGeoDataFrame, I think we'd also want the option to initialize the partition-level spatial indexes with an option to the |
Sorry pushed now. |
The |
I have, yes. It's also very unclear to me, but here are the test cases I'm using (tested before and after this PR respectively):
df.cx[-19986136.0:-18986136.0, 445.5396728515625:10000].compute()
p = df.cx_partitions[-19986136.0:-18986136.0, 445.5396728515625:10000]
# Print the spatial index attached to each partition, then pass the partition through.
def report_sindex(df):
    print(df.geometry.array._sindex)
    return df
p.map_partitions(report_sindex, meta=p._meta).persist()

df = df.build_sindex().persist()
# Same report, this time after explicitly building and persisting the index.
def report_sindex(df):
    print(df.geometry.array._sindex)
    return df
df.map_partitions(report_sindex, meta=df._meta).persist()
Ok, looks good! |
Any opinion whether I should release this as spatialpandas 0.3.7 or 0.4.0? |
Not a strong one. I'd lean toward 0.4.0 since it fixes a bug by adding to the API. |
bcf5187 to 74cfcb3
The problem with using cx on a DaskGeoDataFrame was that it would use cx on each partition, which in turn would spatially index that partition because it was using the sindex attribute. It seems these spatial indexes didn't persist either, so every indexing operation created a completely new sindex, slowing down indexing massively. I now only use the sindex if it already exists. I would like there to be a more explicit API on GeometryArrays, GeoSeries, GeoDataFrames, DaskGeoSeries and DaskGeoDataFrame that says "create a spatial index" rather than relying on the implicit behavior of creating a spatial index when sindex is accessed.
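The behavior described above can be sketched with a toy, one-dimensional example. This is an illustration under stated assumptions, not the code in this PR: ToyPoints, its cx method and the builds counter are hypothetical, and a sorted list stands in for a real spatial index. The point is only that the query uses an index when one was explicitly built and otherwise falls back to a plain scan without building anything.

```python
import bisect


class ToyPoints:
    """Toy stand-in for a geometry array; not the spatialpandas API."""

    def __init__(self, xs):
        self.xs = list(xs)
        self._sindex = None  # no index until explicitly requested
        self.builds = 0      # counts how often an index is constructed

    def build_sindex(self):
        """Explicitly build the toy index once; return self for chaining."""
        if self._sindex is None:
            self._sindex = sorted((x, i) for i, x in enumerate(self.xs))
            self.builds += 1
        return self

    def cx(self, x0, x1):
        """Return positions of points with x0 <= x <= x1."""
        if self._sindex is None:
            # No index available: do a linear scan instead of building one.
            return [i for i, x in enumerate(self.xs) if x0 <= x <= x1]
        # Index available: binary-search the sorted (x, position) pairs.
        lo = bisect.bisect_left(self._sindex, (x0, -1))
        hi = bisect.bisect_right(self._sindex, (x1, len(self.xs)))
        return sorted(i for _, i in self._sindex[lo:hi])
```

Repeated calls to cx on an unindexed object never trigger a build, so the cost of indexing is paid only when build_sindex() is called, and only once.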