Question: Lazy coordinates_to_cells #42

Open
highway900 opened this issue Nov 29, 2023 · 3 comments

Comments

@highway900

I am a new polars user and I am curious how to use the coordinates_to_cells function in a lazy context.

If I do what I think needs to be done, I get the error TypeError: 'Expr' object is not iterable. I can achieve my goal with the eager API, but I am hoping I can do this with the lazy API.

import polars as pl
from h3ronpy.polars.vector import coordinates_to_cells

# Sample Polars DataFrame with latitude and longitude
data = {
    "x": [-74.0060, -118.2437, -87.6298],  # 'x' for longitude
    "y": [40.7128, 34.0522, 41.8781],  # 'y' for latitude
}

res = 8
df = (
    pl.DataFrame(data)
    .lazy()
    .with_columns(
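        # coordinates_to_cells takes latitude first, then longitude.
        # This call raises TypeError: 'Expr' object is not iterable,
        # because the function expects concrete arrays/Series, not expressions.
        coordinates_to_cells(pl.col("y"), pl.col("x"), resarray=res)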
        .h3.cells_to_string()
        .alias(f"h3_{res}")
    )
)
@nmandery
Owner

What's required for this is the ability to call coordinates_to_cells directly on a polars expression (Expr). We already provide polars expressions in https://github.com/nmandery/h3ronpy/blob/ca891fa5dfa8e1ea4dd7006d15d31bd294a45a2a/python/h3ronpy/polars/__init__.py#L57C7-L57C7, but not for this functionality. The problem is that this function requires at least two series as input, and I do not see how this can be achieved using register_expr_namespace (https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.api.register_expr_namespace.html#polars.api.register_expr_namespace). Polars expressions seem to operate on only a single series. Please correct me if that is not the case; I am not up to date with the most recent versions of polars.

What could be done is implementing an extension of a LazyFrame (https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.api.register_lazyframe_namespace.html), but I am not sure how useful this would be. It would only allow calling the function directly on LazyFrames, not from within expressions.

@highway900
Author

Thanks for looking at this. I was mostly looking at using a LazyFrame, not explicitly using expressions. I will have a poke around with register_lazyframe_namespace. I think you answered my question, though: this currently isn't possible, so it's not just my lack of experience with polars that is the problem :)

@BielStela
Contributor

BielStela commented Jun 20, 2024

Hi ^^. To accept multiple arguments, this could be implemented with the polars plugin system, as in https://marcogorelli.github.io/polars-plugins-tutorial/lost_in_space/. Something along the lines of:

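use h3o::{LatLng, Resolution};
use polars::prelude::*;
use pyo3_polars::derive::polars_expr;
use serde::Deserialize;

#[derive(Deserialize)]
struct H3Kwargs {
    resolution: u8,
}

#[polars_expr(output_type = UInt64)]
fn coordinates_to_cells(inputs: &[Series], kwargs: H3Kwargs) -> PolarsResult<Series> {
    let lats = inputs[0].f64()?;
    let lons = inputs[1].f64()?;
    let resolution = Resolution::try_from(kwargs.resolution)
        .map_err(|e| polars_err!(ComputeError: "invalid H3 resolution: {}", e))?;

    // One output value per input row (null for missing or invalid coordinates),
    // so the result lines up row by row with the input columns.
    let cells: UInt64Chunked = lats
        .iter()
        .zip(lons.iter())
        .map(|(lat, lon)| match (lat, lon) {
            (Some(lat), Some(lon)) => LatLng::new(lat, lon)
                .ok()
                .map(|ll| u64::from(ll.to_cell(resolution))),
            _ => None,
        })
        .collect();

    Ok(cells.with_name("cells").into_series())
}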

And then register the function on the Python side with:

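from __future__ import annotations

from pathlib import Path
from typing import TYPE_CHECKING

import polars as pl
from polars.plugins import register_plugin_function

if TYPE_CHECKING:
    # newer polars versions: from polars._typing import IntoExpr
    from polars.type_aliases import IntoExpr


def coordinates_to_cells(lat: IntoExpr, lon: IntoExpr, *, resolution: int) -> pl.Expr:
    # Both columns go to the compiled Rust function within one expression.
    return register_plugin_function(
        plugin_path=Path(__file__).parent,  # directory holding the compiled plugin
        args=[lat, lon],
        kwargs={"resolution": resolution},
        function_name="coordinates_to_cells",
        is_elementwise=True,
    )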

This would allow us to operate on the LazyFrame from the example like so:

In [7]: df.collect()
Out[7]:
shape: (3, 2)
┌───────────┬─────────┐
│ x         ┆ y       │
│ ---       ┆ ---     │
│ f64       ┆ f64     │
╞═══════════╪═════════╡
│ -74.006   ┆ 40.7128 │
│ -118.2437 ┆ 34.0522 │
│ -87.6298  ┆ 41.8781 │
└───────────┴─────────┘

In [8]: df.select(cells=coordinates_to_cells("x", "y", resolution=8)).collect()
Out[8]:
shape: (3, 1)
┌────────────────────┐
│ cells              │
│ ---                │
│ u64                │
╞════════════════════╡
│ 616717907826573311 │
│ 616483633261182975 │
│ 616736054719807487 │
└────────────────────┘

However, this requires custom Rust code that has to be built as a polars plugin :/

I guess it could also be done using the existing h3ronpy function coordinates_to_cells plus some black magic with pl.Expr to extract the series from a multi-column expression, like

df.select(cell = pl.col("x", "y").h3.coordinates_to_cells(resolution=8))

But the documentation falls short here, and I did not find anything similar in the wild.
