Added video processing section (Unit 7 - Transformers based models) #351

Merged

Contributor

@mreraser mreraser commented Oct 3, 2024

Co-authored-by: seoulsky-field <seoulsky.field02@gmail.com>

What does this PR do?

Adds a Transformer-based models chapter to the video processing section. The document gives an overview of how Transformer models are applied to video processing, focusing on the Vision Transformer (ViT), its video-specific variant the Video Vision Transformer (ViViT), and the TimeSformer model.

Thank you in advance for your review.

Part of Proposed Outline Revision for Unit 7. Video & Video Processing / discussions #348

Who can review?

@jungnerd @cjfghk5697 @1kmmk1 and anyone who wants to review!

Who can review (Final)

Contributor

@jungnerd jungnerd left a comment

Fixed minor typos and suggested the name of the anchor.
Other than that, everything looks good to me 👍🏻

mreraser and others added 2 commits October 8, 2024 15:16
Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
mreraser and others added 3 commits October 8, 2024 15:54
Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
Owner

@johko johko left a comment

Thanks for the contribution @mreraser !
I think the main basis for most of my comments is that we need to keep in mind that the course should also be easy to read for beginners. I think you sometimes assume a bit too much prior knowledge; adding some more background info here and there would already be really great.

But apart from that it is a great piece of education, I already learned quite a few things just from reading through it once. Thank you so much for the effort 🤗

Comment on lines 20 to 23
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/transformer_based_video_model/unit7_1_vit_architecture.png" alt="Vision transformer architecture"></img>
</div>
<small>ViT architecture. Taken from the <a href= "https://arxiv.org/abs/2010.11929"> original paper</a>.</small>
Owner

I'm not exactly sure how these blocks will look in HF markdown (once again), as currently the preview is missing because of the persisting token error. So I will just assume it is alright for now, but once we can see how it really looks in the docs, you might need to change it.

Contributor Author

Yes I understand, and I’ll make adjustments if any issues arise in the future.

<small>ViViT architecture. Taken from the <a href = "https://arxiv.org/abs/2103.15691">original paper</a>.</small>

### Embedding video clips[[embedding-video-clips]]

Owner

Could you add a few words here about what embeddings are and why they are important (just a short note for beginners)? Please also say why you explain Uniform Frame Sampling and Tubelet Embeddings. Right now I feel like this part is missing some context.

mreraser and others added 2 commits October 24, 2024 15:31
Co-authored-by: Johannes Kolbe <2843485+johko@users.noreply.github.com>
Co-authored-by: Johannes Kolbe <2843485+johko@users.noreply.github.com>
Contributor Author

@johko Thank you very much for your review! 😄 I will carefully read through the details you provided and make the necessary revisions accordingly.

mreraser and others added 10 commits October 24, 2024 15:40
Co-authored-by: Johannes Kolbe <2843485+johko@users.noreply.github.com>
Co-authored-by: Johannes Kolbe <2843485+johko@users.noreply.github.com>
Co-authored-by: Johannes Kolbe <2843485+johko@users.noreply.github.com>
Co-authored-by: Johannes Kolbe <2843485+johko@users.noreply.github.com>
Co-authored-by: Johannes Kolbe <2843485+johko@users.noreply.github.com>
Co-authored-by: Johannes Kolbe <2843485+johko@users.noreply.github.com>
Co-authored-by: Johannes Kolbe <2843485+johko@users.noreply.github.com>
Co-authored-by: Johannes Kolbe <2843485+johko@users.noreply.github.com>
Co-authored-by: Johannes Kolbe <2843485+johko@users.noreply.github.com>
Co-authored-by: Johannes Kolbe <2843485+johko@users.noreply.github.com>
Contributor Author

Hello @johko! 😃

I have carefully reviewed your feedback and addressed the points you mentioned as follows:

  1. Removed all anchor points
  2. Fixed minor typos
  3. Added an explanation of embeddings and their importance
  4. Clarified the reason for discussing Uniform Frame Sampling and Tubelet Embeddings
  5. Provided a definition of spatio-temporal tokens
  6. Explained the term "contextualize"
  7. Defined n_w, n_h, and n_t earlier in the text

Thank you for your guidance, and please let me know if there’s anything else you’d like me to improve!

Owner

@johko johko left a comment

Thank you for the changes 🙂
LGTM 👍

Contributor Author

mreraser commented Nov 13, 2024

> Thank you for the changes 🙂 LGTM 👍

Thank you @johko 👍 I also resolved some toctree conflicts. Have a good one!

Collaborator

@ATaylorAerospace ATaylorAerospace left a comment

Great additions. LGTM!

@ATaylorAerospace ATaylorAerospace merged commit 206b0be into johko:stage Nov 14, 2024
2 checks passed

4 participants