Review - Feature importance #94

Open

jwagemann opened this issue Aug 6, 2019 · 1 comment
@jwagemann
Contributor

  • Could you please comment on feature importance. What did you learn from them? Were they all needed? How do you establish feature importance? How is the subsequent work affected by this?
  • Have you explored spatial correlations among variables?
@tommylees112
Contributor

Could you please comment on feature importance. What did you learn from them? Were they all needed? How do you establish feature importance? How is the subsequent work affected by this?

We are using SHAP values to calculate feature importance. SHAP values can be used to understand what motivated a model to make certain predictions. These values operate at the local level, i.e. they tell us why the model predicted a VHI score of 99 for a specific pixel. Global feature importance can then be derived by aggregating these local explanations.
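
Below is a minimal, illustrative sketch of this workflow using the `shap` library; the random-forest model and synthetic data are stand-ins, not our actual pipeline.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data: 4 hypothetical covariates, target as a VHI stand-in.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(size=200)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Local explanations: one SHAP value per feature per sample, telling us
# why the model predicted that value for that sample (pixel).
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global feature importance: mean absolute SHAP value across all samples.
global_importance = np.abs(shap_values).mean(axis=0)
print(global_importance)  # feature 0 should dominate, then feature 1
```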

The subsequent work has not yet been affected by this. We are currently using these models to interpret relationships and to understand which variables are predictive of agricultural drought (VHI). However, the learned relationships could also be used for feature selection: in order to speed up model training and development, only the important features would be included in future models.
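
Continuing the sketch above, such feature selection could hypothetically amount to keeping only the top-k features by global SHAP importance before retraining:

```python
# Keep the k most important features (by mean |SHAP|) and retrain.
k = 2
top_features = np.argsort(global_importance)[::-1][:k]
X_reduced = X[:, top_features]
model_small = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_reduced, y)
```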

Have you explored spatial correlations among variables?

In order to account for spatial correlations we have the option to append the values of surrounding pixels to the X data (the covariates). This should capture some of the spatial co-variability, and we can increase the number of surrounding pixels that are incorporated into the model. However, we pay for this increased ability to capture spatial relationships with an increase in the number of variables as we raise the surrounding_pixels: int argument.

We therefore currently limit surrounding_pixels to 1, meaning that for each target pixel we include all variables from the 9 pixels in its immediate neighbourhood (the target pixel plus its 8 neighbours). This is a necessary compromise given computational constraints.
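
To make the trade-off concrete, here is an illustrative sketch (not our actual implementation) of appending neighbouring pixel values to the covariates; with surrounding_pixels = s, each pixel's variable count is multiplied by (2s + 1)**2:

```python
import numpy as np

def add_surrounding_pixels(grid, s=1):
    """grid: (height, width, n_vars) covariates on a spatial grid.
    Returns (height, width, n_vars * (2s + 1)**2): each pixel gains the
    variables of every pixel in its (2s + 1) x (2s + 1) neighbourhood,
    with grid edges padded with NaN."""
    h, w, n_vars = grid.shape
    padded = np.full((h + 2 * s, w + 2 * s, n_vars), np.nan)
    padded[s:s + h, s:s + w] = grid
    blocks = [padded[s + dy:s + dy + h, s + dx:s + dx + w]
              for dy in range(-s, s + 1) for dx in range(-s, s + 1)]
    return np.concatenate(blocks, axis=-1)

grid = np.random.rand(10, 10, 5)   # 5 variables on a 10 x 10 grid
X_spatial = add_surrounding_pixels(grid, s=1)
print(X_spatial.shape)             # (10, 10, 45): 9 pixels x 5 variables
```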

However, it is worth noting that the spatial relationships among these 9 input points are not communicated to the model; this is something we are considering for future iterations, perhaps using CNNs.

We are also looking to explore how feature importance varies over space. We have already explored how the importance of features varies over time using SHAP values.
