Add robust polynomial, sum of sinusoids fitting #151
Nice! I presume this is the first step toward your bias corrections? Is scikit-learn already installed due to some other dependency? It is not a direct dependency in the environment file.
Yes, first towards a series of Ups! Didn't check, it was in my local xdem environment for some reason.
Looks very promising!! :-) Regarding the dependency on scikit-learn (and of other packages in general), I believe that unless not importing the module makes xdem useless, e.g. rasterio, we should not make it a hard dependency. So basically, it should only be imported within the functions where it is needed.
For now, I have added an Should we open an issue (improvement) for creating a file for "full environment" and one for "minimal environment"?
My idea was to have the import statement within the function directly. I was discussing with Fabien about it and there are pro/cons to each approach:
Nice functionality! I have some quite small remarks but generally I like it!
@@ -398,6 +408,149 @@ def hillshade(dem: Union[np.ndarray, np.ma.masked_array], resolution: Union[floa
    # The output is scaled by "(x + 0.6) / 1.84" to make it more similar to GDAL.
    return np.clip(255 * (shaded + 0.6) / 1.84, 0, 255).astype("float32")


def get_xy_rotated(raster: gu.georaster.Raster, myang: float) -> tuple[np.ndarray, np.ndarray]:
    """
    Rotate x, y axes of image to get along- and cross-track distances.
This returns pixel coordinates rotated around the lower left corner, right? Where do the "cross-track distances" come in?
It could also forego the raster class (if we want) by using `xdem.coreg._get_x_and_y_coords` on a transform and the shape of the array. I don't know if that is necessary or better; just a suggestion.
Yes, could be a nice idea to combine both with an optional rotation argument.
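For readers following this thread, here is a minimal sketch of what rotating pixel coordinates into along- and cross-track distances could look like. It is not the implementation in this PR: the function name `rotate_xy`, the `origin` argument, the counter-clockwise sign convention and the choice of which rotated axis is called "along" are all assumptions made for illustration.

```python
import numpy as np


def rotate_xy(x, y, angle_deg, origin=(0.0, 0.0)):
    """Express (x, y) in axes rotated counter-clockwise by angle_deg around origin.

    Hypothetical helper for illustration only, not part of xdem.
    """
    theta = np.deg2rad(angle_deg)
    dx = np.asarray(x, dtype=float) - origin[0]
    dy = np.asarray(y, dtype=float) - origin[1]
    # Project each point onto the two rotated axes.
    along = dx * np.cos(theta) + dy * np.sin(theta)
    across = -dx * np.sin(theta) + dy * np.cos(theta)
    return along, across


# A 3 x 4 grid of pixel coordinates with the origin at the lower-left corner.
xx, yy = np.meshgrid(np.arange(4), np.arange(3))
along, across = rotate_xy(xx, yy, angle_deg=30.0)
```

With this convention, a point lying exactly in the track direction has a cross-track distance of zero, which is one way to read the "cross-track distances" in the docstring.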
I think generally it is advised to import at the top-level because running the function multiple times would otherwise run the imports multiple times. This is not very slow, but it's also not instantaneous. See StackOverflow for reference. I think both approaches are fine; as you mention, there are pros and cons to both. I think we should just be consistent, and for now, we have the
My two cents are that Thoughts, @adehecq and @rhugonnet ?
Fully agree,
Just to clarify, where is the information on the dependencies for conda-forge stored? I thought it was in the
But a module that was previously imported is not imported again, so the function will take a little bit more time on the first call, but then it should be almost the same no?
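A small sketch of the "import inside the function" approach discussed above, assuming a hypothetical helper named `import_optional` (not something in xdem). It uses the standard-library `decimal` module as a stand-in for an optional dependency such as scikit-learn, so it runs anywhere. Python caches imported modules in `sys.modules`, so repeated calls return the already-loaded module instead of importing it again.

```python
import importlib
import sys


def import_optional(module_name: str):
    """Import a module at call time, with a clearer error if it is missing.

    Hypothetical helper for illustration only.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Optional dependency '{module_name}' is needed for this function."
        ) from exc


# "decimal" stands in for an optional dependency here.
first = import_optional("decimal")
second = import_optional("decimal")

# The second call is served from the module cache, not re-imported.
print(first is second, "decimal" in sys.modules)
```

The trade-off raised in the thread remains: a missing package only surfaces when the function is called, not when the package is imported.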
Looks great! This makes me think we could use your robust polynomial fit in coreg deramp.
@rhugonnet what's the status of this PR?
On it, trying to homogenize things and push a final version!
Again, I have an assertion error that happens only in CI, while everything passes locally... and this is for a calculation with a fixed
@adehecq @erikmannerfelt All ready for approval to be merged, except for that one test that seems to fail randomly in CI (while it never does locally, the
I'm still desperately trying to understand just the base: why 🙏
After a few hours, still can't trace it back. 😢
    for i in range(2):
        assert coefs[3*i] == pytest.approx(true_coefs[3*i], abs=0.02)
The function call is based on
Great job !
I haven't looked in detail at your latest changes, but if you took into account our comments, it should be alright.
It's annoying you couldn't find the issue behind the test randomly failing...
Polynomial fitting + Sum of sin fitting + Across/Along-track sampling
Resolves #50
2 robust polynomial fitting solutions
1/ One combining `sklearn.linear_model` and `sklearn.preprocessing.PolynomialFeatures`: solves a polynomial with robust estimators Linear Regression, Theil-Sen (median approach), RANSAC or Huber.
2/ Using `scipy.optimize.least_squares` and specific loss function, some are quite robust to outliers (see example in tests).
Both implemented in the same function
Linear can be solved with `scipy` or `sklearn`. While Theil-Sen, RANSAC and Huber only with `sklearn`.
Input/Output
Simple input: `x`, `y` as input, choice of `estimator`, cost function `cost_func` (by default, median absolute error), and a few other options described in the docs.
Simple output: polynomial degree as integer, and coefficients in a vector.
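To illustrate the second route only, here is a minimal sketch of a polynomial fit with `scipy.optimize.least_squares` and a robust loss. It is not the function added in this PR: the name `robust_polyfit`, its arguments, the `soft_l1` loss, the `f_scale` value and the synthetic data are all assumptions chosen for the example.

```python
import numpy as np
from numpy.polynomial import polynomial as P
from scipy.optimize import least_squares


def robust_polyfit(x, y, degree, loss="soft_l1", f_scale=1.0):
    """Fit polynomial coefficients (lowest order first) with a robust loss.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def residuals(coefs):
        # polyval expects coefficients in increasing order of degree.
        return P.polyval(x, coefs) - y

    # Use the ordinary least-squares solution as the starting point.
    x0 = P.polyfit(x, y, degree)
    result = least_squares(residuals, x0, loss=loss, f_scale=f_scale)
    return result.x


# Synthetic degree-2 data with small noise and a few gross outliers.
rng = np.random.default_rng(42)
x = np.linspace(-1, 1, 200)
true_coefs = np.array([2.0, -1.0, 0.5])
y = P.polyval(x, true_coefs) + rng.normal(0, 0.05, x.size)
y[::20] += 10.0

coefs_ols = P.polyfit(x, y, 2)
coefs_robust = robust_polyfit(x, y, 2, f_scale=0.1)
print("OLS:   ", coefs_ols)
print("robust:", coefs_robust)
```

In this setup the ordinary least-squares coefficients are pulled toward the outliers, while the robust loss bounds their influence, so the robust coefficients should stay much closer to `true_coefs`.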
Choosing the best polynomial
Simply using the polynomial that has the lowest cost (less spread between true and predicted values) is known to not be a good approach for choosing the optimal degree, as it can lead to overfitting. Here I wrote a simple function that selects the polynomial of smallest degree within a percentage margin of the "best cost" found by the fit.
So, for instance, if degree 1 fit has a cost of 100, degree 2 fit a cost of 20, degree 3 fit a cost of 5 and then degrees 4 to 6 fits have a cost between 4 and 5, the function (with a margin of 20% by default) will select degree 3 as the optimal solution.
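The selection rule can be sketched in a few lines. This is an illustration rather than the code in the PR: the name `select_degree` is hypothetical, and "within a margin" is assumed here to mean `cost <= best_cost * (1 + margin)`, which may differ from the definition actually used. The costs for degrees 4 to 6 are made-up values in the 4 to 5 range of the example above.

```python
def select_degree(costs, margin=0.2):
    """Return the smallest degree whose cost is within `margin` of the best cost.

    `costs` maps polynomial degree -> cost of the fit at that degree.
    Hypothetical helper for illustration only.
    """
    best_cost = min(costs.values())
    threshold = best_cost * (1 + margin)
    return min(degree for degree, cost in costs.items() if cost <= threshold)


# Costs per degree, following the example in the description.
costs = {1: 100.0, 2: 20.0, 3: 5.0, 4: 4.6, 5: 4.5, 6: 4.4}
print(select_degree(costs))  # 3
```

Here the best cost is 4.4, so the threshold is 5.28 and degree 3 (cost 5) is the smallest degree that qualifies.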
EDIT:
TODOLIST to finalize PR
`poly` and `sumofsin` functions to call new wrapper functions
`kwargs` argument that can be identified to any subfunction call (same logic as in `spatialstats.py`)
`fit.py` following Sort the mess in spatial_tools.py #157