New unified video IO #61
Hi there! Yes! I have been thinking about IO a lot and have wanted to change it since the very inception of this project. Also, note that some of the features rely on cv2 while others rely on torchvision.something.read_video. We need to unify it and, hopefully, make it faster by picking another IO tool. We also need to merge the frame-wise base extractor and the flow base extractor once the IO algorithm is unified.
I thought so, and it is quite plain and easy to understand. I do admit that we could do a better job there. However, first, I need to figure out how FFmpeg converts 30 fps to 25, for instance, or handles some other non-trivial resampling, and, second, we would need to reimplement the "skipping/duplicating" algorithm here. It seems like overkill at this stage, but I think you have a clearer view of how to do it. Could you share it? I would love to read it. On a side note, I dislike having cv2 as a dependency and the fact that it is quite slow, but I like that I can read a video frame by frame and extract features from a batch. Note that if we try to load the whole video with a reader and then process it, we will quickly run out of RAM. Recently, I've been trying many video IO scripts:
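The frame-by-frame-plus-batching pattern mentioned above can be sketched with a generator, so only one batch of frames sits in RAM at a time. This is a minimal illustration with a stub frame source; in practice the stub would wrap something like a cv2.VideoCapture read loop (the names here are illustrative, not from the repository):

```python
from typing import Iterable, Iterator, List


def batched(frames: Iterable, batch_size: int) -> Iterator[List]:
    """Yield lists of at most `batch_size` frames without materializing all frames."""
    batch = []
    for frame in frames:
        batch.append(frame)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


def fake_frames(n: int) -> Iterator[int]:
    """Stub frame source; stands in for reading frames one by one from a video file."""
    for i in range(n):
        yield i


# Process a "video" of 10 frames in batches of 4: batch sizes are 4, 4, 2.
sizes = [len(b) for b in batched(fake_frames(10), 4)]
```

Because both the source and `batched` are generators, peak memory is one batch of frames rather than the whole video.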
I will rename the issue to make it more precise if I understand your suggestion correctly.
Thank you for making the title clearer! Regarding changing the FPS, I need to find more information about it. My initial idea was to first calculate the number of frames needed for the new fps, then use
torchvision.io.VideoReader looks good! But it is really sad that we must use a high version of torch. I think it would be very good if we could try to find the best IO script as a start.
By the way, ChangingFrameRate may be helpful.
I did a benchmark on it. It seems
Oh, thanks a lot! A few things I would like to clarify for myself:
For the first point, yes, I think we need to check whether the readers load the whole video into RAM, and they do not. I don't really understand the difference between the generator and iterator you mentioned, but during file processing these readers probably only read some basic information about the video when they are initialized, after which they move the pointer according to the user input (I'm not sure of the details either). For the second and third points, I think we'd better keep OpenCV for now. I also tested the GPU version of it. We may just need to change the existing IO API, but I still have a question: should we re-encode the video and use all frames after re-encoding? Or should we just read the video and extract the frames we need? Perhaps you have more experience in this area? Do you know how other researchers have handled this?
Iterators keep information in RAM and you just iterate through the values in some way, e.g. over a list like [1, 2, 3]. A generator produces each new value on the fly, which means it does not need to store the whole list/video in RAM. I think the famous example of this distinction is the
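The list-vs-generator distinction above can be shown in a few lines: the list stores every element up front, while the generator object stays tiny no matter how many values it can yield.

```python
import sys

# A list stores every element in memory up front.
as_list = [i * i for i in range(100_000)]

# A generator expression computes each element on demand.
as_gen = (i * i for i in range(100_000))

# The generator object itself is tiny regardless of how many values it yields.
list_size = sys.getsizeof(as_list)  # hundreds of kilobytes
gen_size = sys.getsizeof(as_gen)    # on the order of a hundred bytes

# Both produce the same values when iterated (this also exhausts the generator).
assert sum(as_list) == sum(as_gen)
```

Reading a video through a generator applies the same idea: decoded frames are yielded one at a time instead of being accumulated in a list.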
Usually, researchers just pre-encode the whole folder to the same fps. For instance, scrape YouTube, then re-encode with ffmpeg to 25 fps. I think the reencode_wtih_another_fps does this, but it just does it every time a person processes a video. That is negligible if the user extracts the features once, but it can be problematic if they need 10 different features, as this will be run 10 times. Well, I think it is still ok; the process is very fast and not the bottleneck, so I don't share your concerns, really. Yet, I admit that it is ugly and I would want to have something else. Maybe we could replace it with something more pythonic like ffmpeg-python. It is the same, but you don't need to write the command in an ffmpeg-CLI kind of way. Take a look at ffmpeg-python; do you think it would be nice to replace the current script with it (I could do it, don't worry; the docs for this library are quite horrible, by the way)?
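For reference, a minimal sketch of what such a re-encode step amounts to. The file names are placeholders, and the ffmpeg-python equivalent is shown only as a hedged comment; the actual repository script may differ:

```python
import subprocess


def build_reencode_cmd(src: str, dst: str, fps: int) -> list:
    """Assemble the ffmpeg CLI call that re-encodes `src` to a fixed frame rate."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-i", src,                   # input video
        "-filter:v", f"fps={fps}",   # resample the video stream to `fps`
        "-y", dst,                   # overwrite the output if it exists
    ]


# Rough ffmpeg-python equivalent of the same command (untested sketch):
#   import ffmpeg
#   ffmpeg.input(src).filter("fps", fps=25).output(dst).run()

cmd = build_reencode_cmd("in.mp4", "out.mp4", 25)
# subprocess.run(cmd, check=True)  # requires the ffmpeg binary on PATH
```

The ffmpeg `fps` filter drops or duplicates frames to hit the target rate, which is exactly the "skipping/duplicating" behavior discussed earlier in the thread.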
Thanks for your reply, and I agree with you that using
Ok, I decided to double-check it because I think it wasn't my impression the last time I checked. Thanks for providing the Colab MWE; I could quickly start experimenting. Here is my 2nd honest attempt to use it. By the way, did you compile it for GPU use or install it in some other way?
Sorry for the late reply, but I'm very busy these days as I'm at the beginning of the semester. Your analysis of the

```shell
!sudo apt-get update
!sudo apt-get install -y build-essential python3-dev python3-setuptools make cmake
!sudo apt-get install -y libavcodec-dev libavfilter-dev libavformat-dev libavutil-dev
!git clone --recursive https://github.com/dmlc/decord
%cd decord
!mkdir build
%cd build
!cmake .. -DUSE_CUDA=ON -DCMAKE_BUILD_TYPE=Release
!make
%cd /content/decord/python
!python setup.py install --user
%cd /content
```

I tried the GPU version
I also tried the CPU version with your code. The results are slightly different from yours in the notebook:
The GPU version uses less memory and is significantly faster (over 7 times faster).
Your recent changes are great and have significantly reduced code redundancy. The code is also easier to read now. Here I have another idea that might make it a little better.
The process of extracting video frames is now defined in `base_flow_extractor.py` and `base_framewise_extractor.py`, which use `ffmpeg` to re-encode the video and `opencv` to read it. But users may need a new feature: extracting a fixed number of frames from a video, which is harder to implement within the current framework. Also, users may want to extract features faster, so could you consider using a more efficient API to read videos (e.g., `decord`)? Moreover, specifying fps to extract features requires re-encoding the video and generating temporary files. Is this necessary? Could we meet the fps by extracting frames at intervals rather than re-encoding? The ideal API might look like:
In summary, this new feature could further reduce code redundancy, improve speed, and provide new extraction methods.
I would love to make this improvement, but I've been busy lately and probably won't be able to do it until September.