Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Buffer class #2044

Closed
wants to merge 1 commit into from
Closed

Add Buffer class #2044

wants to merge 1 commit into from

Conversation

mthrok
Copy link
Collaborator

@mthrok mthrok commented Nov 28, 2021

Part of #1986. Splitting the PR for easier review.

Add Buffer class that is responsible for converting AVFrame to Tensor.
Note: The API to retrieve the buffered Tensors is tentative.
For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md.

Note: Without a change to build process, the code added here won't be compiled. The build process will be updated later.
Needs to be imported after #2043.

@facebook-github-bot
Copy link
Contributor

@mthrok has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Comment on lines +10 to +11
std::deque<torch::Tensor> chunks;
AVMediaType media_type;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we add reference to the variables?

Suggested change
std::deque<torch::Tensor> chunks;
AVMediaType media_type;
std::deque<torch::Tensor>& chunks;
AVMediaType& media_type;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just an enum, so it's cheap and safer to copy.

Comment on lines +151 to +154
case AVMEDIA_TYPE_AUDIO:
push_audio_frame(frame);
break;
case AVMEDIA_TYPE_VIDEO:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are AVMEDIA_TYPE_AUDIO and AVMEDIA_TYPE_VIDEO defined? Or it's taken care by ffmpeg?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

torch::Tensor convert_audio_tensor(AVFrame* pFrame) {
// ref: https://ffmpeg.org/doxygen/4.1/filter__audio_8c_source.html#l00215
AVSampleFormat format = static_cast<AVSampleFormat>(pFrame->format);
int num_channels = pFrame->channels;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like num_channels can be defined when initializing the Buffer object? As each Buffer object will receive signal from one Streamer, which fixed the num_channels after initialization.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you, but unfortunately the shape information is not necessarily available at the time Buffer object is initialized, due to a filter graph operation inserted before the buffer, which can change number of audio channels, image size, image channels and rate. The structure passed around does not have the explicit information of these.

Let me figure this out later. along with the buffer size concern you brought up.

} // namespace

void Buffer::push_audio_frame(AVFrame* pFrame) {
chunks.push_back(convert_audio_tensor(pFrame));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to check the size of chunks so that if it exceeds the buffer size, we might want to pop out the last chunk?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let me follow-up this one. I am still figuring out a safe way to process chunks of different rates at the same time. (sample rate / frame rate)

@facebook-github-bot
Copy link
Contributor

@mthrok has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mthrok mthrok deleted the ffmpeg-buffer branch December 24, 2021 01:26
facebook-github-bot pushed a commit that referenced this pull request Dec 29, 2021
Summary:
Part of #1986. Splitting the PR for easier review.

Add StreamProcessor class that bundles `Buffer`, `FilterGraph` and `Decoder`.
Note: The API to retrieve the buffered Tensors is tentative.
For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md.

Note: Without a change to build process, the code added here won't be compiled. The build process will be updated later.
Needs to be imported after #2044.

Pull Request resolved: #2045

Reviewed By: carolineechen

Differential Revision: D33299858

Pulled By: mthrok

fbshipit-source-id: d85bececed475f45622743f137dd59cb1390ceed
@mthrok mthrok mentioned this pull request Jan 5, 2022
30 tasks
xiaohui-zhang pushed a commit to xiaohui-zhang/audio that referenced this pull request May 4, 2022
Summary:
Part of pytorch#1986. Splitting the PR for easier review.

Add Buffer class that is responsible for converting `AVFrame` to `Tensor`.
Note: The API to retrieve the buffered Tensors is tentative.
For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md.

Note: Without a change to build process, the code added here won't be compiled. The build process will be updated later.
Needs to be imported after pytorch#2043.

Pull Request resolved: pytorch#2044

Reviewed By: carolineechen

Differential Revision: D32940553

Pulled By: mthrok

fbshipit-source-id: 8b8b2222ad7b47edc17e9139420e8a71c00d726a
xiaohui-zhang pushed a commit to xiaohui-zhang/audio that referenced this pull request May 4, 2022
Summary:
Part of pytorch#1986. Splitting the PR for easier review.

Add StreamProcessor class that bundles `Buffer`, `FilterGraph` and `Decoder`.
Note: The API to retrieve the buffered Tensors is tentative.
For the overall architecture, see https://github.com/mthrok/audio/blob/ffmpeg/torchaudio/csrc/ffmpeg/README.md.

Note: Without a change to build process, the code added here won't be compiled. The build process will be updated later.
Needs to be imported after pytorch#2044.

Pull Request resolved: pytorch#2045

Reviewed By: carolineechen

Differential Revision: D33299858

Pulled By: mthrok

fbshipit-source-id: d85bececed475f45622743f137dd59cb1390ceed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants