Add ffmpeg integration #1986

Closed
@mthrok (Collaborator) commented Nov 5, 2021

This PR adds a prototype streaming API based on ffmpeg.
Currently, the Python surface is a minimal implementation and needs refinement.

  • Handles audio
    • Can change the sampling rate on the fly.
    • Can change the sample format on the fly.
      • (Implemented and tested) uint8, int16, int32, and float32.
      • (Implemented but not tested) int64 and float64.
    • Can change the resampling algorithm. (Covered via custom filter.)
  • Handles video
    • Can change the image resolution on the fly.
      • Can change the rescaling algorithm. (Covered via custom filter.)
    • Can change the frame rate on the fly.
      • Can change the algorithm for changing the frame rate. (Covered via custom filter.)
  • Handles multiple input streams.
  • Handles multiple output streams.
    • Can duplicate input streams for different output configurations.
  • avdevice integration
  • Seek mechanism
  • Figure out how to build and package.
    • Split the binary into libtorchaudio and libtorchaudio_ffmpeg.
    • If binding dynamically, figure out the best way to detect the installed ffmpeg across platforms. (pkg-config seems straightforward.)
  • Add an API similar to the existing torchaudio.load function for simple use cases.
    • Add the function definition.
    • Add the parameters.
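
To picture what changing the sample format involves, here is a minimal, standard-library-only Python sketch. It is not the code in this PR, where the conversion happens inside ffmpeg. The function name `pcm_to_float` and the scale factors are assumptions based on the usual PCM convention, and Python floats are 64-bit rather than float32.

```python
import struct

# Assumed convention: unsigned 8-bit samples are centered on 128,
# signed samples are divided by 2**(bits - 1).
# Each entry is (struct code, offset, scale).
_FORMATS = {
    "uint8": ("B", 128, 128.0),
    "int16": ("h", 0, 32768.0),
    "int32": ("i", 0, 2147483648.0),
}


def pcm_to_float(raw, sample_format):
    """Convert little-endian integer PCM bytes to floats in [-1.0, 1.0)."""
    code, offset, scale = _FORMATS[sample_format]
    width = struct.calcsize("<" + code)
    if len(raw) % width:
        raise ValueError("byte length is not a multiple of the sample width")
    count = len(raw) // width
    samples = struct.unpack(f"<{count}{code}", raw)
    return [(s - offset) / scale for s in samples]
```

For example, `pcm_to_float(struct.pack("<3h", -32768, 0, 16384), "int16")` returns `[-1.0, 0.0, 0.5]`.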

@mthrok closed this Nov 6, 2021
@mthrok reopened this Nov 26, 2021
@mthrok changed the title from "Bind ffmpeg" to "Add Streaming API" Nov 26, 2021
@mthrok force-pushed the ffmpeg branch 9 times, most recently from 47c3fe2 to 7ba62b2 November 28, 2021 18:18
@mthrok mentioned this pull request (4 times) Nov 28, 2021
@mthrok force-pushed the ffmpeg branch 2 times, most recently from cf64494 to 5f8095f November 28, 2021 21:07
@mthrok force-pushed the ffmpeg branch 5 times, most recently from cc306c5 to 7a66c5e January 27, 2022 01:33
@facebook-github-bot (Contributor) commented

@mthrok has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mthrok force-pushed the ffmpeg branch 2 times, most recently from 738764e to 6d8e701 January 27, 2022 21:21

This PR adds a prototype streaming API based on ffmpeg.
Currently, the Python surface is a minimal implementation and needs refinement.

- [x] Handles audio
   - [x] Can change the sampling rate on the fly
   - [x] Can change the sample format on the fly.
       - [x] (Implemented and tested) `uint8`, `int16`, `int32`, and `float32`
       - [ ] (Implemented but not tested) `int64` and `float64`.
   - [ ] Can change the resampling algorithm
- [x] Handles video
   - [x] Can change the image resolution on the fly.
       - [ ] ~Can change the rescaling algorithm.~ Covered via custom filter
   - [x] Can change the frame rate on the fly.
       - [ ] ~Can change the algorithm for changing the frame rate.~ Covered via custom filter
- [x] Handles multiple input streams.
- [x] Handles multiple output streams.
   - [x] Can duplicate input streams for different output configurations.
- [x] avdevice integration
- [x] Seek mechanism

- [ ] Figure out how to build and package.
  - [x] Split the binary into `libtorchaudio` and `libtorchaudio_ffmpeg`.
  - [x] If binding dynamically, figure out the best way to detect the installed ffmpeg across platforms. (`pkg-config` seems straightforward.)
  - [ ] If binding statically, figure out which dependencies to include.
     - [ ] Building and shipping all the dependencies of ffmpeg (such as `gnutls`, `x264`, `x265`, etc.) requires a lot of effort.
- [ ] Add an API similar to the existing `torchaudio.load` function for simple use cases.
  - [x] Add the function definition
  - [ ] Add the parameters

Statically binding ffmpeg increases the binary size from 3.6M to 23M.

```
-rwxr-xr-x  1 moto staff 3.6M Nov  5 00:43 torchaudio/lib/libtorchaudio.so
-rw-r--r--  1 moto staff  23M Nov  5 04:23 torchaudio/lib/libtorchaudio.so
```
xiaohui-zhang pushed a commit to xiaohui-zhang/audio that referenced this pull request May 4, 2022
…#2041)

Summary:
Part of pytorch#1986. Splitting the PR for easier review.

Add wrapper classes that automatically release memory allocated by the ffmpeg libraries.
For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md.

Note: Without a change to the build process, the code added here won't be compiled. The build process will be updated later.
- [x] Needs to be imported after updating the TARGETS file.

Pull Request resolved: pytorch#2041

Reviewed By: carolineechen

Differential Revision: D32688964

Pulled By: mthrok

fbshipit-source-id: 165bef5b292dbedae4e9599d53fb2a3f06978db8
xiaohui-zhang pushed a commit to xiaohui-zhang/audio that referenced this pull request May 4, 2022
Summary:
Part of pytorch#1986. Splitting the PR for easier review.

Add a `Decoder` class that manages the `AVCodecContext` resource and processes input `AVPacket`s.
For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md.

Note: Without a change to the build process, the code added here won't be compiled. The build process will be updated later.
Needs to be imported after pytorch#2041.

Pull Request resolved: pytorch#2042

Reviewed By: carolineechen

Differential Revision: D32933294

Pulled By: mthrok

fbshipit-source-id: e443debadb44d491462fb641cd5b7b20c413b5b9
xiaohui-zhang pushed a commit to xiaohui-zhang/audio that referenced this pull request May 4, 2022
Summary:
Part of pytorch#1986. Splitting the PR for easier review.

Add a `FilterGraph` class that is responsible for handling the `AVFilterGraph` structure and applying filters.
For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md.

Note: Without a change to the build process, the code added here won't be compiled. The build process will be updated later.
Needs to be imported after pytorch#2042.

Pull Request resolved: pytorch#2043

Reviewed By: carolineechen

Differential Revision: D32940535

Pulled By: mthrok

fbshipit-source-id: 231e3ad17df2d67b6c7b323e5c89e718a3d48d0d
xiaohui-zhang pushed a commit to xiaohui-zhang/audio that referenced this pull request May 4, 2022
Summary:
Part of pytorch#1986. Splitting the PR for easier review.

Add a `Buffer` class that is responsible for converting `AVFrame` to `Tensor`.
Note: The API to retrieve the buffered Tensors is tentative.
For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md.

Note: Without a change to the build process, the code added here won't be compiled. The build process will be updated later.
Needs to be imported after pytorch#2043.

Pull Request resolved: pytorch#2044

Reviewed By: carolineechen

Differential Revision: D32940553

Pulled By: mthrok

fbshipit-source-id: 8b8b2222ad7b47edc17e9139420e8a71c00d726a
xiaohui-zhang pushed a commit to xiaohui-zhang/audio that referenced this pull request May 4, 2022
Summary:
Add a `Sink` class that bundles `FilterGraph` and `Buffer`. Part of pytorch#1986. Splitting the PR for easier review.

For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md.

Note: Without a change to the build process, the code added here won't be compiled. The build process will be updated later.

Pull Request resolved: pytorch#2111

Reviewed By: carolineechen

Differential Revision: D33350388

Pulled By: mthrok

fbshipit-source-id: 8f42c5fe4be39ef2432c51fc0d0ac72ba3f06a26
xiaohui-zhang pushed a commit to xiaohui-zhang/audio that referenced this pull request May 4, 2022
Summary:
Part of pytorch#1986. Splitting the PR for easier review.

Add a `StreamProcessor` class that bundles `Buffer`, `FilterGraph`, and `Decoder`.
Note: The API to retrieve the buffered Tensors is tentative.
For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md.

Note: Without a change to the build process, the code added here won't be compiled. The build process will be updated later.
Needs to be imported after pytorch#2044.

Pull Request resolved: pytorch#2045

Reviewed By: carolineechen

Differential Revision: D33299858

Pulled By: mthrok

fbshipit-source-id: d85bececed475f45622743f137dd59cb1390ceed
xiaohui-zhang pushed a commit to xiaohui-zhang/audio that referenced this pull request May 4, 2022
Summary:
Part of pytorch#1986. Splitting the PR for easier review.

Add a `Streamer` class that bundles `StreamProcessor` instances and handles input.
For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md.

Note: Without a change to the build process, the code added here won't be compiled. The build process will be updated later.
Needs to be imported after pytorch#2045.

Pull Request resolved: pytorch#2046

Reviewed By: carolineechen

Differential Revision: D33299863

Pulled By: mthrok

fbshipit-source-id: 6470cbe061057c8cb970ce7bb5692be04efb5fe9
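
The commit messages above describe how the classes nest: `Sink` bundles `FilterGraph` and `Buffer`, `StreamProcessor` bundles them with a `Decoder`, and `Streamer` bundles `StreamProcessor` and handles input. The following is a toy Python sketch of that composition only. The real classes are C++ and wrap ffmpeg structures; every method name, the byte-per-sample "codec", and the use of plain callables as filters are assumptions made for illustration.

```python
class Decoder:
    """Stand-in for the class that owns the codec context and decodes packets."""

    def decode(self, packet):
        # Toy "codec": each byte is one unsigned 8-bit sample.
        return [(b - 128) / 128.0 for b in packet]


class FilterGraph:
    """Stand-in for the class that applies a filter chain to a frame."""

    def __init__(self, fn):
        self.fn = fn

    def apply(self, frame):
        return self.fn(frame)


class Buffer:
    """Stand-in for the class that accumulates converted frames."""

    def __init__(self):
        self.samples = []

    def push(self, frame):
        self.samples.extend(frame)

    def pop_all(self):
        out, self.samples = self.samples, []
        return out


class Sink:
    """One output configuration: a FilterGraph feeding a Buffer."""

    def __init__(self, fn):
        self.filter = FilterGraph(fn)
        self.buffer = Buffer()

    def process(self, frame):
        self.buffer.push(self.filter.apply(frame))


class StreamProcessor:
    """One Decoder shared by any number of Sinks."""

    def __init__(self):
        self.decoder = Decoder()
        self.sinks = []

    def add_sink(self, fn):
        sink = Sink(fn)
        self.sinks.append(sink)
        return sink

    def process_packet(self, packet):
        frame = self.decoder.decode(packet)
        for sink in self.sinks:
            sink.process(frame)


class Streamer:
    """Routes each packet to the StreamProcessor of its source stream."""

    def __init__(self):
        self.processors = {}

    def add_output(self, stream_index, fn):
        processor = self.processors.setdefault(stream_index, StreamProcessor())
        return processor.add_sink(fn)

    def process_packet(self, stream_index, packet):
        processor = self.processors.get(stream_index)
        if processor is not None:  # packets from unselected streams are dropped
            processor.process_packet(packet)
```

Calling `add_output` twice with the same stream index attaches two sinks to one decoder, which is one way to read "duplicate input streams for different output configurations": the packet is decoded once and each sink applies its own filter.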
xiaohui-zhang pushed a commit to xiaohui-zhang/audio that referenced this pull request May 4, 2022
Summary:
Part of pytorch#1986. Splitting the PR for easier review.

Add a TorchBind binding for `Streamer`.
For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md.

Note: Without a change to the build process, the code added here won't be compiled. The build process will be updated later.
Needs to be imported after pytorch#2046.

Pull Request resolved: pytorch#2047

Reviewed By: hwangjeff

Differential Revision: D33355190

Pulled By: mthrok

fbshipit-source-id: a3ad4c2822ed3a7ddc19b1aaca9dddabd59ce2f8
@mthrok closed this Jun 4, 2022
@mthrok deleted the ffmpeg branch June 4, 2022 02:30