predict_class and predict_probabilities #368
Sounds reasonable. What do others think? @clausmichele @LukeWeidenwalker
Some thoughts:
Sorry for the wall of text; I realize this is out of scope for this issue, so feel free to redirect this discussion somewhere more suitable.
In the VITO backend we had trouble getting the probabilities out of Spark's RandomForest implementation, while getting the class was straightforward. So I think it's best to have
Indeed, this proposal is only for classification models. In that light, it might be more future-proof to use
On the level of openEO processes, we would at best replicate the sklearn API in some abstracted way, which doesn't seem like a bad thing to me.
The long-term goal is probably to do this, but at the moment we are focused on training and inference as the core ML building blocks. Everything else (evaluation, tuning, ...) is currently expected to be done client-side, which gives the user the most flexibility anyway.
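As a rough illustration of what client-side evaluation could involve, assuming the predicted classes for some labelled reference samples have already been downloaded from a back-end, a confusion matrix and overall accuracy take only a few lines of plain Python. The function names and the sample labels below are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def confusion_matrix(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> List[List[int]]:
    """Rows are reference (true) labels, columns are predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def overall_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted class equals the reference class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical reference labels and the classes a back-end predicted for them.
reference = [0, 1, 1, 2, 2, 2]
predicted = [0, 1, 2, 2, 2, 1]
print(confusion_matrix(reference, predicted, labels=[0, 1, 2]))  # [[1, 0, 0], [0, 1, 1], [0, 1, 2]]
print(overall_accuracy(reference, predicted))
```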
I'd think openEO won't stay relevant unless ML/AI in some form is part of the offering.
I'm +1 on the original proposal. What do others think? How do we proceed?
I also agree with the proposal. We should move forward with the ML processes to keep openEO attractive! I partly agree with Lukas too, but I would say that for more advanced users and models we could still use UDFs.
Dear @m-mohr @clausmichele @edzer @dthiex, here are some thoughts on ML/DL processes in openEO. IMHO, a recommended generic set of functions for ML/DL processes in openEO is:
Some relevant points regarding the above:
The API proposed above would be a minimal set of ML/DL classification functions for openEO. Extensions that could be considered later include: (a) measuring classification uncertainty. Full disclosure: this proposal is based on our six years of experience developing SITS. All of the above functions (and extensions) are implemented and operational in SITS. For openEO, this means that such an API will be readily implementable in a SITS back-end. Hopefully, this would motivate other openEO back-end developers to follow suit.
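Classification uncertainty, listed above as a possible extension, can be derived from per-class probabilities alone. As a minimal sketch, the two measures below (Shannon entropy and a best-versus-second-best margin) are common generic choices picked for illustration; they are not necessarily the measures SITS uses, and the function names are hypothetical.

```python
import math
from typing import Sequence


def entropy(probabilities: Sequence[float]) -> float:
    """Shannon entropy in nats: 0 for a certain prediction, log(n) for a uniform one."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def margin_uncertainty(probabilities: Sequence[float]) -> float:
    """1 minus the gap between the two most probable classes (0 = confident, 1 = tie)."""
    ranked = sorted(probabilities, reverse=True)
    second = ranked[1] if len(ranked) > 1 else 0.0
    return 1.0 - (ranked[0] - second)


# Hypothetical per-pixel class probabilities for three classes.
confident = [0.9, 0.05, 0.05]
ambiguous = [0.4, 0.4, 0.2]
print(entropy(confident), entropy(ambiguous))
print(margin_uncertainty(confident), margin_uncertainty(ambiguous))
```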
Dear @clausmichele
Not sure I agree. UDFs are non-standard and should be discouraged, because they fragment the openEO landscape and thus undermine the purpose of openEO, which is to have a single interface supported by different back-ends.
Potentially interesting for "bring your own model": https://onnx.ai/
@JeroenVerstraelen is working on an implementation of CatBoost-based ML in the VITO backend, and while discussing the details a couple of things came up:
1. A predict_catboost process would be practically identical to predict_random_forest, except for some textual differences in title and descriptions. It turns out that it is not really necessary to define a dedicated predict_ process for each kind of machine learning model: all the model details are embedded in the ml-model object, and you could just use a single predict(data: array, model: ml-model) for all kinds of ML models.
2. …reduce_dimension and the other in apply_dimension. It felt error-prone and confusing to let these two different patterns depend on a rather inconspicuous boolean parameter. It might be better to have separate processes for class prediction and probability prediction.

So with this background, the proposal is to introduce two generic ML prediction processes:
predict_class(data: array, model: ml-model) -> number
predict_probabilities(data: array, model: ml-model) -> array
Both can easily be specified based on the current https://github.com/Open-EO/openeo-processes/blob/draft/proposals/predict_random_forest.json
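To make the two proposed signatures and the two dimension patterns concrete, here is a plain-Python sketch under stated assumptions. Everything in it is hypothetical: the dict-based model object with a "type" field, the toy nearest-centroid model family, the MODEL_FAMILIES registry, and the reduce_bands/apply_bands helpers, which only mimic what reduce_dimension and apply_dimension would do over a bands dimension. Here predict_class simply takes the argmax of predict_probabilities; a real back-end may obtain classes directly, as in the Spark RandomForest case mentioned earlier in the thread. The split resembles the predict / predict_proba methods that scikit-learn classifiers expose.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]
Cube = List[List[Vector]]  # rows x columns x bands (or x classes after apply)


def _nearest_centroid_probabilities(data: Vector, model: Dict) -> Vector:
    """Toy model family: softmax over negative squared distances to class centroids."""
    distances = [sum((d - c) ** 2 for d, c in zip(data, centroid))
                 for centroid in model["centroids"]]
    best = min(distances)  # shift by the smallest distance for numerical stability
    weights = [math.exp(best - dist) for dist in distances]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical registry: the model object names its own family, so one generic
# entry point serves every model type instead of one predict_* process per family.
MODEL_FAMILIES: Dict[str, Callable[[Vector, Dict], Vector]] = {
    "nearest_centroid": _nearest_centroid_probabilities,
}


def predict_probabilities(data: Vector, model: Dict) -> Vector:
    """One probability per class for a single pixel's band values."""
    return MODEL_FAMILIES[model["type"]](data, model)


def predict_class(data: Vector, model: Dict) -> int:
    """Index of the most probable class (assumes class = argmax of probabilities)."""
    probabilities = predict_probabilities(data, model)
    return max(range(len(probabilities)), key=probabilities.__getitem__)


def reduce_bands(cube: Cube, reducer: Callable[[Vector], float]) -> List[List[float]]:
    """Mimics reduce_dimension over bands: each band vector collapses to one value."""
    return [[reducer(pixel) for pixel in row] for row in cube]


def apply_bands(cube: Cube, process: Callable[[Vector], Vector]) -> Cube:
    """Mimics apply_dimension over bands: each band vector is replaced by a new
    vector, here with one entry per class instead of one per band."""
    return [[process(pixel) for pixel in row] for row in cube]


model = {"type": "nearest_centroid", "centroids": [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]}
cube = [[[0.1, 0.0], [0.9, 1.1]],
        [[4.8, 5.2], [0.0, 0.2]]]  # 2 x 2 pixels, 2 bands each

classes = reduce_bands(cube, lambda px: predict_class(px, model))
probabilities = apply_bands(cube, lambda px: predict_probabilities(px, model))
print(classes)                   # [[0, 1], [2, 0]]
print(len(probabilities[0][0]))  # 3 values per pixel: one per class
```

Keeping the two as separate functions means the caller explicitly picks the reducing pattern (one number per pixel) or the applying pattern (one vector per pixel), instead of the output shape silently depending on a boolean flag.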