Add edm4hep::Tensor type for use in ML training and inference #388
base: main
Conversation
Hi @veprbl, thanks for this proposal and apologies for the long delay from our side. We discussed this proposal in today's EDM4hep meeting and we have a few questions regarding how you (plan to) use this. Presumably, there is already some experience with this from EIC? One of the main concerns raised today is that this type is obviously extremely generic, and we were wondering whether it is maybe too generic.
Hi @tmadlener, thanks for your very nice feedback. The reason to introduce the type is to enable ML workflows in our reconstruction pipeline, in which we use immutable PODIO objects both to exchange data between algorithms and to store it on disk. We've already implemented this type in EDM4eic, and there is a reference implementation for inference with ONNX in reconstruction, along with an automated training CI workflow (eic/EICrecon#1618). It would appear that the type can have more general utility outside of ePIC/EIC software. At the same time, we are looking to gather feedback from the greater community, hence this submission. Sharing this type would also help us make a better case for introducing optimizations in PODIO for this use case (e.g. zero-copy facilities).
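To make the idea concrete, here is a minimal sketch of such a tensor value type: an immutable object holding a shape plus flattened row-major data. This is a hypothetical illustration in Python, not the actual EDM4eic/EDM4hep implementation (which is a PODIO-generated C++ type).

```python
# Hypothetical sketch: an immutable tensor value holding a shape
# and flattened row-major data, as a PODIO-style value type might.
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Tensor:
    shape: Tuple[int, ...]   # e.g. (n_candidates, n_features)
    data: Tuple[float, ...]  # flattened, row-major

    def __post_init__(self):
        expected = math.prod(self.shape)
        if len(self.data) != expected:
            raise ValueError(
                f"data length {len(self.data)} != prod(shape) {expected}"
            )

    def at(self, *idx: int) -> float:
        # Compute the row-major linear index from multi-dimensional indices.
        flat = 0
        for j, n in zip(idx, self.shape):
            flat = flat * n + j
        return self.data[flat]
```

Storing the data flat with an explicit shape is what makes the type serializable as plain arrays while still being reinterpretable as an n-dimensional tensor by an inference backend.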
Indeed, several more scalar value types are possible. Following https://onnxruntime.ai/docs/api/c/group___global.html#gaec63cdda46c29b8183997f38930ce38e one could natively add
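For illustration, the element type could be carried as a small discriminator alongside the data. The sketch below is hypothetical; the numeric codes loosely follow the ONNXTensorElementDataType enum linked above, but the names and the size table are assumptions for this example only.

```python
from enum import IntEnum


class ElementType(IntEnum):
    # Hypothetical subset of scalar element types a Tensor could support;
    # values chosen to match the corresponding ONNXTensorElementDataType codes.
    FLOAT32 = 1
    INT64 = 7
    FLOAT64 = 11


# Bytes per element, e.g. for sizing a flat byte buffer.
ELEMENT_SIZE = {
    ElementType.FLOAT32: 4,
    ElementType.INT64: 8,
    ElementType.FLOAT64: 8,
}
```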
I thought about this. My only idea was that a
Some thought was given to this, but I didn't see an invariant to check here.
For the ONNX use case, passing value types that are lists of tensors and maps (which are more like lists of 2-tuples) is also supported. Those need not belong to EDM4hep. There is also a case for supporting a sparse tensor encoding, which would probably be useful to have in EDM4hep.
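A sparse encoding could, for example, follow a COO-style layout: store only the non-zero entries together with their flat indices. This is a hypothetical sketch, not a concrete EDM4hep proposal.

```python
import math


def to_coo(dense):
    """Encode a flat row-major dense list as (index, value) pairs,
    keeping only the non-zero entries (COO-style)."""
    return [(i, v) for i, v in enumerate(dense) if v != 0]


def to_dense(shape, coo):
    """Decode (index, value) pairs back to a flat row-major list."""
    dense = [0] * math.prod(shape)
    for i, v in coo:
        dense[i] = v
    return dense
```

For tensors that are mostly zeros (e.g. one-hot encoded features), this trades O(prod(shape)) storage for O(nnz).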
This is inspired by ONNX, but not tied to it. Torch and CatBoost models exported to ONNX were tested during development. I haven't tried it, but I also don't see why inference with TorchScript/TF Lite would not work with this.
It doesn't do that explicitly. There appears to be support for named dimensions: https://onnxruntime.ai/docs/api/c/struct_ort_1_1detail_1_1_tensor_type_and_shape_info_impl.html
(answering to question in minutes)
If naming is important, specifically in ONNX, one can modify the model to take more tensors (inputs are named); a concatenation ONNX operator would then need to be inserted into the computation graph. More generally, ML feature representations are not always tables, so this may not be functionality that works in every framework.
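The concatenation workaround described above can be illustrated with plain lists (a sketch; in an actual ONNX graph this would be a Concat node combining the named input tensors):

```python
def concat_features(named_inputs, order):
    """Concatenate per-name feature rows into one flat feature vector,
    in a fixed, agreed-upon order (mimicking an ONNX Concat along one axis).

    named_inputs: dict mapping input name -> list of feature values
    order: the input names in the order the model expects them
    """
    out = []
    for name in order:
        out.extend(named_inputs[name])
    return out
```

The fixed `order` argument stands in for the fact that, once concatenated, the association between positions and input names lives only in the model's computation graph.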
BEGINRELEASENOTES
ENDRELEASENOTES
This should help support ML in reconstruction frameworks and allow writing tensors to disk for training with conventional Python ML tools.