indexing of segments #222

Closed
selipot opened this issue Aug 1, 2023 Discussed in #208 · 9 comments · Fixed by #272
Labels
enhancement New feature or request

Comments

@selipot
Member

selipot commented Aug 1, 2023

Discussed in #208

Originally posted by selipot June 28, 2023
I am wondering if we could improve indexing within rows. Starting from a ragged array dataset ds with a given rowsize, we first need to generate a row index, define a slice, and then index that slice:

row_index = np.insert(np.cumsum(ds.rowsize.values), 0, 0)  # start offset of each row, plus the total length
j = 1
row = slice(row_index[j], row_index[j+1])  # slice covering row j
ds.lon[row][n]  # access the n-th data point of row j

How can we simplify this to something like ds.lon[j][n]? That syntax is supported by awkward arrays but not by xarray.

Would one long-term solution be to define new classes such as xarray.RaggedDataArray and xarray.RaggedDataset, for which we could have .isel and .sel methods for double indexing (ds.lon[j][n])?
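As a rough illustration of the double indexing asked for here, the sketch below wraps a flat array and its row sizes in a small helper so that obj[j][n] works. RaggedRows is a hypothetical name invented for this example; it is not part of xarray or of this project, and it only covers integer row indices.

```python
import numpy as np


class RaggedRows:
    """Hypothetical helper: view a flat array as rows of varying length."""

    def __init__(self, data, rowsize):
        self.data = np.asarray(data)
        # row_index[j] is the start of row j and row_index[j + 1] is its end
        self.row_index = np.insert(np.cumsum(rowsize), 0, 0)

    def __len__(self):
        # Number of rows, not number of data points
        return len(self.row_index) - 1

    def __getitem__(self, j):
        if not -len(self) <= j < len(self):
            raise IndexError(f"row {j} is out of range for {len(self)} rows")
        j = j % len(self)  # map negative row indices onto 0..len-1
        return self.data[self.row_index[j]:self.row_index[j + 1]]


# Three rows of lengths 3, 2 and 1 stored back to back
lon = RaggedRows([10.0, 11.0, 12.0, 20.0, 21.0, 30.0], rowsize=[3, 2, 1])
first_point_of_row_1 = lon[1][0]
```

Here lon[1] returns the two values of row 1 as a NumPy view, so lon[1][0] is the first data point of that row without building the slice by hand.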

@selipot selipot added the enhancement New feature or request label Aug 1, 2023
@selipot
Member Author

selipot commented Aug 1, 2023

Alternatively, I'd like to see a function that applies any function to a specific row of a ragged array, perhaps called apply_row. As an example, plot the trajectory in the n-th row:

apply_row(plt.plot, [ds["longitude"], ds["latitude"]], ds["count"], n)

So far, one needs to do

traj_idx = np.insert(np.cumsum(ds["count"].values), 0, 0)  # start offset of each trajectory
sli = slice(traj_idx[n], traj_idx[n+1])  # slice covering the n-th row
plt.plot(ds.longitude[sli], ds.latitude[sli])

That's just too much.

@selipot selipot assigned milancurcic and unassigned milancurcic Aug 1, 2023
@milancurcic
Member

apply_row seems useful, but it also seems to be a special case of apply_ragged. Consider adding a keyword parameter rows to apply_ragged. rows can be None (default), which resolves to the current apply_ragged behavior. Passing an int to rows does what you described as apply_row. Passing an array-like of ints to rows applies the function to the requested rows only.
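The three cases for rows described above could look roughly like the following. This is a serial, self-contained sketch and not the project's apply_ragged: the name apply_ragged_rows is made up for the example, it returns a plain list with one result per selected row, and it assumes non-negative row indices.

```python
import numpy as np


def apply_ragged_rows(func, arrays, rowsize, *args, rows=None, **kwargs):
    """Hypothetical serial stand-in illustrating a `rows` keyword."""
    arrays = [np.asarray(a) for a in arrays]
    # row_index[j] is the start of row j and row_index[j + 1] is its end
    row_index = np.insert(np.cumsum(rowsize), 0, 0)

    if rows is None:
        # Default: apply func to every row
        rows = range(len(row_index) - 1)
    elif isinstance(rows, (int, np.integer)):
        # A single int behaves like the proposed apply_row
        rows = [rows]

    results = []
    for j in rows:
        sl = slice(row_index[j], row_index[j + 1])
        # Pass the j-th row of each input array to func
        results.append(func(*(a[sl] for a in arrays), *args, **kwargs))
    return results


lon = [10.0, 11.0, 12.0, 20.0, 21.0, 30.0]
rowsize = [3, 2, 1]

all_rows = apply_ragged_rows(np.mean, [lon], rowsize)               # every row
one_row = apply_ragged_rows(np.mean, [lon], rowsize, rows=1)        # row 1 only
some_rows = apply_ragged_rows(np.mean, [lon], rowsize, rows=[0, 2]) # rows 0 and 2
```

Keeping a single entry point means the row bookkeeping lives in one place, and selecting a row or a subset of rows becomes an argument rather than a separate function.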

@selipot
Member Author

selipot commented Sep 15, 2023

I like that suggestion @milancurcic, could you work on it?

@philippemiron
Contributor

But in that case, the Matplotlib API (plt.plot) would be called in parallel, which should be avoided...

@milancurcic
Member

I don't understand. Where does Matplotlib come in?

@milancurcic
Member

Oh, Shane's example from the top. Yes, use it with Matplotlib at your own risk.

@milancurcic
Member

I think for an int value of rows it would be OK even with Matplotlib, because the function would be applied only once and thus in one thread. Users can also disable concurrency when calling apply_ragged by passing an executor with max_workers=1.
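To make the max_workers point concrete, the sketch below uses the standard library's concurrent.futures.ThreadPoolExecutor with a single worker, so the per-row calls run one after another in the same worker thread. apply_rows_with_executor is a hypothetical stand-in written for this example, not the project's apply_ragged, and whether a single worker thread is enough for Matplotlib depends on the backend in use.

```python
import threading
from concurrent import futures

import numpy as np


def apply_rows_with_executor(func, array, rowsize, executor):
    """Hypothetical stand-in: map func over the rows using the given executor."""
    array = np.asarray(array)
    row_index = np.insert(np.cumsum(rowsize), 0, 0)
    chunks = [array[row_index[j]:row_index[j + 1]] for j in range(len(rowsize))]
    # executor.map preserves the order of the input rows
    return list(executor.map(func, chunks))


seen_threads = set()


def row_mean(x):
    # Record which thread handled this row, then compute the row mean
    seen_threads.add(threading.get_ident())
    return float(x.mean())


# max_workers=1 means the rows are processed sequentially by one worker thread
with futures.ThreadPoolExecutor(max_workers=1) as single_worker:
    means = apply_rows_with_executor(
        row_mean, [10.0, 11.0, 12.0, 20.0, 21.0, 30.0], [3, 2, 1], single_worker
    )
```

After this runs, seen_threads holds a single thread identifier, which shows that no two rows were processed concurrently.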

@selipot
Member Author

selipot commented Sep 15, 2023

I think this would be a good thing. For testing purposes I find myself wanting to use apply_ragged for a single row, or a small subset of a ragged array. And don't tell me that I can use subset ... :)

@selipot
Member Author

selipot commented Sep 15, 2023

We can add a warning that using this for plotting may not be a good idea.

3 participants