
add edm4eic::Tensor #96

Merged: 1 commit into main, Nov 4, 2024

Conversation

veprbl (Member) commented Sep 19, 2024

This adds a new type that facilitates exchange of tensor data for ML applications.

simonge (Contributor) commented Oct 2, 2024

I had some comments on this, but answered them myself while writing this out. I've added them here anyway in case they're useful, but otherwise the type looks fine.

  • If we want to make use of larger-batch inference, we could use the folder/unfolder to stitch together a bunch of events. This would add another variable dimension on top of the event-size batch. Admittedly, this could be flattened out for running the inference and handled by the folder.
  • Any batch treatment of a graph neural network, or of other variable-size inputs, requires a non-static dimension. I remembered that this is passed to the inference as a separate tensor that keeps track of how the data tensor is divided up.
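The fold/unfold idea in the first point can be sketched roughly as follows. This is a hypothetical illustration using NumPy as a stand-in for the EDM tensors; the names `fold`/`unfold` and the shapes are assumptions, not part of edm4eic:

```python
import numpy as np

# Hypothetical sketch: each event yields a fixed-shape feature tensor (here
# shape (4,)). The "folder" stacks events into one batch tensor with a
# leading event dimension; the "unfolder" splits the inference output back
# into per-event results.
events = [np.arange(4, dtype=np.float32) + i for i in range(3)]

# fold: add a batch dimension on top of the per-event shape
batch = np.stack(events)          # shape (3, 4)

# stand-in for the model call on the whole batch
output = batch * 2.0

# unfold: split the output back into per-event tensors
per_event = [output[i] for i in range(len(events))]
```

This flattening keeps every tensor at a concrete shape, with only the leading batch dimension varying per call.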

veprbl (Member, Author) commented Oct 2, 2024

This is a good topic. My comment is that the current implementation in the EICrecon PR does batch inference. As far as the EDM is concerned, there is no difference between batch and single-item evaluations; all tensors are supposed to have a concrete shape. The ONNX model and the from/to-tensor converters are the ones that have to know. The converters obviously need to do the batching, and, like you say, folding/unfolding is indeed a possibility. As for the second point, this is new to me; it sounds like a jagged tensor implementation. I'm not sure that ONNX models are coded like that. ONNX actually supports "sequences" of tensors, which are more likely to be used, according to my intuition. Support for those may require another type in the EDM with a vector member of edm4eic::Tensors.

simonge (Contributor) commented Oct 3, 2024

I have a demonstration model which (currently poorly) predicts the position and time of a hit from a varying-size collection of pixel hits. Here the model takes a TensorFlow ragged tensor input and can be exported using the tf2onnx package:
https://github.com/simonge/AllpixFastSim/blob/GNN/MakeModel/train_predictor_gnn.py#L80

The model actually ends up taking two input tensors of different types. The first is a float32 tensor containing the input values, and the second is an int64 tensor giving the positions in the value array where the different pseudo-clusters are separated.
https://github.com/simonge/AllpixFastSim/blob/GNN/TestModel/plot_input_resolutions_gnn.py#L127

From what I see, I believe this would be completely compatible with the data type and onnx inference algorithm you currently have.
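The two-tensor encoding described above can be sketched as follows. This is a hypothetical illustration with made-up data; `row_splits` follows the convention used by TensorFlow's RaggedTensor, where cluster `i` spans `values[row_splits[i]:row_splits[i+1]]`:

```python
import numpy as np

# Hypothetical sketch of the ragged encoding: a flat float32 tensor of
# values plus an int64 tensor marking where each pseudo-cluster begins
# and ends (row splits).
clusters = [np.array([1.0, 2.0], dtype=np.float32),
            np.array([3.0], dtype=np.float32),
            np.array([4.0, 5.0, 6.0], dtype=np.float32)]

# first input tensor: all values concatenated, shape (6,)
values = np.concatenate(clusters)

# second input tensor: cumulative cluster boundaries, [0, 2, 3, 6]
row_splits = np.cumsum([0] + [len(c) for c in clusters]).astype(np.int64)

# cluster i can be recovered as values[row_splits[i]:row_splits[i + 1]]
recovered = [values[row_splits[i]:row_splits[i + 1]]
             for i in range(len(clusters))]
```

Both tensors have concrete shapes per event, which is why this representation stays compatible with a plain fixed-shape tensor type.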

ruse-traveler left a comment

LGTM! I'm happy with this as is, but I wonder if we might want to add additional element types down the line... For example, I can imagine that having a bool (elementType==9) might be useful for some ML classifiers.

veprbl (Member, Author) commented Nov 3, 2024

> LGTM! I'm happy with this as is, but I wonder if we might want to add additional element types down the line... For example, I can imagine that having a bool (elementType==9) might be useful for some ML classifiers.

I'd like to avoid the bool. It's probably useless (no existing exporter will use it) and might present an edge case when we add any kind of support for zero-copy.
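For reference, the elementType==9 suggested for bool above matches ONNX's `TensorProto.DataType` enum, where `BOOL = 9`. A small sketch of decoding such codes, assuming (hypothetically) that the elementType numbering follows that ONNX enum:

```python
# Subset of ONNX's TensorProto.DataType numbering; the assumption that
# edm4eic::Tensor's elementType follows this enum is hypothetical, inferred
# from the elementType==9 (bool) suggestion in the review above.
ONNX_ELEMENT_TYPES = {
    1: "float32",   # TensorProto.FLOAT
    7: "int64",     # TensorProto.INT64
    9: "bool",      # TensorProto.BOOL
    11: "float64",  # TensorProto.DOUBLE
}

def element_type_name(code: int) -> str:
    """Return a readable name for an elementType code, if known."""
    return ONNX_ELEMENT_TYPES.get(code, "unknown")
```

Keeping to the codes that exporters actually emit (float32, int64) avoids the zero-copy edge cases mentioned above.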

veprbl merged commit 9f0067b into main Nov 4, 2024
5 checks passed
veprbl deleted the pr/add_tensor_type branch November 4, 2024 21:09
3 participants