
Enabling TF on image-classification pipeline. #15030

Merged: 1 commit merged into huggingface:master from tf_image_classification on Jan 6, 2022

Conversation

@Narsil Narsil (Contributor) commented Jan 4, 2022

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@philschmid
@LysandreJik

probs = model_outputs.logits.softmax(-1)[0]
scores, ids = probs.topk(top_k)

if self.framework == "pt":
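
For context, the PR gates the existing torch postprocessing behind a framework check; the TF counterpart of that step presumably does the equivalent with tf.math.top_k. The following is a minimal standalone sketch of that idea (tf_topk_postprocess_sketch is a hypothetical name, and this is an illustration, not necessarily the code merged in this PR):

import tensorflow as tf

def tf_topk_postprocess_sketch(logits, top_k=5):
    # Softmax over the class dimension, then drop the batch dimension (single image).
    probs = tf.math.softmax(logits, axis=-1)[0]
    # tf.math.top_k returns a named tuple with .values and .indices.
    topk = tf.math.top_k(probs, k=top_k)
    return topk.values.numpy(), topk.indices.numpy()
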
Member
Would it make more sense to make the post-processing framework-agnostic, similar to the text-classification pipeline?

Contributor Author (@Narsil)

There's no top_k with a nice API in numpy, so I would rather avoid a native -> np -> native round trip, or an ugly topk implementation in np.

Overall this looks to me like the least bad of the options.

Do you know a clean np version of topk? (That would indeed make things cleaner.)

Member

I think one can do a top_k using np.argpartition and/or np.argsort. Do you mean that there's no np.topk when you say there's no clean version?

@Narsil Narsil (Contributor Author) Jan 5, 2022

I mean np.argsort/np.argpartition is not as clean as torch.topk.

top_k can be done in O(n + k log(k)), and the numpy argpartition approach is apparently the same complexity:

https://stackoverflow.com/questions/6910641/how-do-i-get-indices-of-n-maximum-values-in-a-numpy-array/23734295#23734295

import numpy as np

ind = np.argpartition(a, -top_k)[-top_k:]  # indices of the top_k values, in no particular order
indices = ind[np.argsort(a[ind])]          # order those indices by value (ascending)
values = a[indices]

But it's also pretty unreadable.

I don't think having a single if framework == "tf" in the decoding is that bad.

Member

That's true, but if we want to enable pipelines for jax, or re-use parts of them in optimum, I think we should try to have an aligned way of doing things, and not use framework-specific (pt, tf) processing for one pipeline but not for another.
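
For reference, the framework-agnostic alternative being discussed could be sketched roughly as below, using np.argpartition plus np.argsort to produce a descending-order top-k (numpy_top_k is a hypothetical helper, not code from this PR):

import numpy as np

def numpy_top_k(scores, k):
    # Clamp k to the number of classes.
    k = min(k, scores.shape[-1])
    # argpartition places the k largest entries in the last k positions (unordered).
    ind = np.argpartition(scores, -k)[-k:]
    # Order those k indices by value, descending.
    ind = ind[np.argsort(-scores[ind])]
    return scores[ind], ind

Both torch CPU tensors and TF eager tensors expose .numpy(), so a postprocess step could in principle convert logits to numpy once and share this path; whether that trade-off is worth it is exactly what the thread above debates.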

@Narsil Narsil requested a review from LysandreJik January 4, 2022 12:29
@LysandreJik LysandreJik (Member) left a comment

Ok then LGTM

Thanks for working on it @Narsil

@Narsil Narsil merged commit 5a06118 into huggingface:master Jan 6, 2022
@Narsil Narsil deleted the tf_image_classification branch January 6, 2022 13:16