Flatten / Reshape data cube dimensions #308
So, the input could be:
In my opinion the result should have a shape of MxN, resulting from a combination of the dimensions available in the datacube. Practical example:
If the input data also has the time dimension, we need to allow a result like:
Anyway, we will lose the information necessary for reshaping the output of the machine learning algorithm, so maybe we will also need another process to reshape the output, or a more general reshape process that allows flattening the data but also reconstructing it following a sample datacube (the data before flattening, for instance).
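The flatten-then-reconstruct idea described above can be sketched in plain NumPy (a minimal illustration, not an actual openEO process; the cube shape and the "template" variable are assumptions for the example):

```python
import numpy as np

# Hypothetical cube with dimensions (x, y, band) and shape (M, N, B)
cube = np.arange(4 * 5 * 2, dtype=float).reshape(4, 5, 2)

# Flatten to a 2-D matrix: one row per pixel, one column per band
flat = cube.reshape(-1, cube.shape[-1])
print(flat.shape)  # (20, 2)

# The flattened matrix alone no longer carries the original layout.
# Keeping a "sample" cube (the data before flattening) provides the
# shape information needed to reconstruct it afterwards.
template = cube  # stands in for the datacube before flattening
restored = flat.reshape(template.shape)
assert np.array_equal(restored, cube)
```

The same pattern underlies xarray's `stack`/`unstack`, where the dimension metadata plays the role of the template.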
Porting over a discussion from #306 - posted by @jdries:
We've just checked the process description closely and we believe that the behavior is not covered by the process description of
It is possible that I've misunderstood how you envision the flattening approach through apply_dimension, and it would be good to look at an example. What I found in UC3 used for cc @edzer - How does a user flatten in stars? (edit: st_redimension)
Looking at the example above, we also typically solve that one without flattening. The random_forest_inference callback then simply gets the 2 band values per pixel and timestep, and predicts the NDVI. Also our more complex cases based on deep learning work like that; no flattening and reshaping is needed. The big problem lies more with training models, because that is a 'global' operation that cannot be split up using the callback approach. On the other hand, the point sampling through aggregate_spatial does solve the problem of 'flattening' the spatial dimensions.
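The per-pixel callback approach described above can be sketched with NumPy (the cube shape and the NDVI-style predictor are illustrative stand-ins, not the actual random_forest_inference model):

```python
import numpy as np

# Hypothetical cube: (x, y, time, band) with two bands, e.g. red and NIR
cube = np.random.default_rng(0).random((4, 5, 3, 2))

def predict_ndvi(bands):
    """Illustrative stand-in for a per-pixel model: receives the 2 band
    values for one pixel/timestep and returns a single prediction."""
    red, nir = bands
    return (nir - red) / (nir + red)

# Apply the callback along the band dimension: no flattening is needed,
# and the (x, y, time) layout is preserved in the result.
result = np.apply_along_axis(predict_ndvi, -1, cube)
print(result.shape)  # (4, 5, 3)
```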
In my opinion, we would need a reshape process that does raster-cube to vector-cube and vice versa (or two separate processes). When the input is a raster-cube it flattens the data to a vector-cube:

- Input: (x,y,time,band) with shape (M,N,T,B) = (10,20,100,2)
- Input: (x,y,time) with shape (M,N,T) = (10,20,100)
- Input: (x,y,band) with shape (M,N,B) = (10,20,2)

When the input is a vector-cube it reshapes the data to a raster-cube given a target cube. Using combinations of apply_dimension and reduce_dimension might be difficult to understand for someone with a machine learning background and, as we have seen, it does not cover all the possible scenarios.
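A minimal sketch of what such raster-cube to vector-cube flattening could look like, using NumPy as an illustration with the (10,20,100,2) shape from the example above (not an actual openEO process; the coordinate bookkeeping is an assumption about what a vector-cube row would carry):

```python
import numpy as np

M, N, T, B = 10, 20, 100, 2
cube = np.arange(M * N * T * B, dtype=float).reshape(M, N, T, B)

# Flatten (x, y) into rows and (time, band) into columns:
# one row per pixel, T*B feature columns
table = cube.reshape(M * N, T * B)
print(table.shape)  # (200, 200)

# Keep the (x, y) index of each row so the flattened rows can still be
# mapped back to their spatial location, as in a vector-cube
xy = np.indices((M, N)).reshape(2, -1).T  # shape (M*N, 2)

# Reshaping back to the raster-cube, given the target cube's shape
restored = table.reshape(cube.shape)
assert np.array_equal(restored, cube)
```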
Could it be that we have some confusion on the 'vector-cube' concept?
I tried to clarify some of this in #68. |
That's also fine, but to train an ML model we do need this vector-cube, or however we want to call it, to have just 2 dimensions. So it could also have the structure that you mentioned, where each row also has the (x,y) or polygon property that generated it, but we still need a process that reshapes the data back and forth.
Not sure if I agree. To train an ML model (and also for inference), we need to provide a matrix to the model, where the shape of that matrix indeed depends on the model.
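As an illustration of how the matrix shape depends on the model, the same cube can be flattened in different ways depending on what the model treats as a sample (a NumPy sketch with illustrative shapes, not tied to any specific model):

```python
import numpy as np

# Hypothetical cube: (x, y, time, band) = (4, 5, 3, 2)
cube = np.zeros((4, 5, 3, 2))

# A model that sees one pixel's full time series as a sample:
per_pixel = cube.reshape(4 * 5, 3 * 2)        # (20, 6)

# A model that sees each pixel/timestep independently:
per_observation = cube.reshape(4 * 5 * 3, 2)  # (60, 2)

print(per_pixel.shape, per_observation.shape)
```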
My biggest problems with the reshaping proposal:
The main argument for reusing the existing processes is simply that we have them already, and we have to teach our users how to work with them anyway. I agree that these are not the most simple processes, but for EO researchers that have ambitions to use machine learning, and probably deep learning as well, this should be well within their skillset.
There are sometimes use cases that need to "flatten" (or stack, in xarray/pandas terms) data cube dimensions. Right now, VITO uses apply_dimension + target_bands as a workaround, but that may not be fully covered by the specification.
We need to check whether we really want to use that approach long-term; it is a bit weird to use a const operation as a callback.
A better approach could be to actually define a new process.
This is already required by multiple use cases: SRR2 UC3, SRR3 UC8
It has already been discussed as part of two other issues at least: