Unble to create a Diart source with a PyAV.AudioFrame as stream.on_next input #168

zucher · 2023-08-10T15:28:56Z

zucher
Aug 10, 2023

Hi,
I'm currently creating a service allowing to retrieve the AudioFrame generate by the aiortc webrtc stack, in order to have a live diarization.

I converting the AudioFrame into mono, and extracting the associated ndarray thanks to:

audio_frame.to_ndarray()

the resulting shape is (1,960)

The resulting error is:
Waveform must have shape (1, samples) but (1, 1, 960) was found

when I squeeze it:

audio_frame.to_ndarray().squeeze()

I have the following error:
Temporal features must be 2D or 3D

So I don't know how to pass over this error, someone already try to do some similar stuff ?

Thanks

juanmc2005 · 2023-08-11T16:15:02Z

juanmc2005
Aug 11, 2023
Maintainer

Hi @zucher, could you provide the full stacktrace so I can know where the error is coming from exactly?

4 replies

zucher Aug 11, 2023
Author

Hi @juanmc2005 ,

with:

for simple to_ndarray():

for squeeze():

zucher Aug 29, 2023
Author

@juanmc2005, Do you need additional informations ?

juanmc2005 Sep 4, 2023
Maintainer

@zucher I am unable to reproduce the error. I'd need to know exactly how you're using diart in this context.

Could you provide a minimal reproducible example that I can run to see the problem?

zucher Sep 4, 2023
Author

Ok I have to do it soon

sorgfresser · 2023-11-12T22:12:29Z

sorgfresser
Nov 12, 2023

Pretty sure that the squeeze() will remove both dimensions with 1.
It might be useful to verify that you do not end up with a shape of (960) which is indeed not 2D. Maybe try squeeze(axis=0) and see if that fixes it.

0 replies

juanmc2005 · 2023-11-13T13:16:19Z

juanmc2005
Nov 13, 2023
Maintainer

@zucher I just realized that the dimensions must be inversed in your example.

From SpeakerSegmentation docstrings:

        Parameters
        ----------
        waveform: TemporalFeatures, shape (samples, channels) or (batch, samples, channels)

So you should simply provide an audio_frame with shape (960, 1) instead of (1, 960).
I'm planning on creating a documentation page soon so this would be easier to debug and work with in the future.

0 replies

zucher · 2023-11-14T08:52:00Z

zucher
Nov 14, 2023
Author

Hi @juanmc2005 , thank you for your reply, unfortunatly after swap , I have also an issue:

with

        audio_frame = av.audio.resampler.AudioResampler(layout="mono").resample(audio_frame)[0]
        self.stream.on_next(audio_frame.to_ndarray().swapaxes(0,1))

My code:

Aiortc audio track interception

import asyncio
from aiortc import MediaStreamTrack
from aiortc.mediastreams import MediaStreamError
from pathlib import Path
import logging.handlers
import logging
import pydub

from diart.blocks.diarization import OnlineSpeakerDiarization
from diart.inference import RealTimeInference
from diart.sinks import RTTMWriter

from .WebRTCAudioSource import WebRTCAudioSource

HERE = Path(__file__).parent

logger = logging.getLogger(__name__)

class AudioDiarization(MediaStreamTrack):
    """
    A audio stream track that only listen.
    """

    kind = "audio"

    def __init__(self, track, channel, transform, event_emitter):
        super().__init__()  # don't forget this!
        self.track = track
        self.transform = transform
        self.ee = event_emitter
        self.channel = channel

        self.diart_init()
        
        self.sound_chunk = pydub.AudioSegment.empty()
        self.silent_count = 0

    @staticmethod
    def create_transformer(track, channel, transform, event_emitter):
        return AudioDiarization(track, channel, transform, event_emitter)
    
    def diart_init( self):
        pipeline = OnlineSpeakerDiarization()
        self.source = WebRTCAudioSource("", 48000)
        inference = RealTimeInference(pipeline, self.source)
        inference.attach_hooks(lambda ann_wav:
                                logger.info(ann_wav[0].to_rttm())
                                )
        prediction = inference()

    async def recv(self):       
        try:
           if self.track.readyState != "live":
                raise MediaStreamError
            
           audio_frame = await self.track.recv()

           self.source.push_audio_frame(audio_frame)

           return audio_frame

        except Exception as e:
            if self.track.readyState == 'ended':
                raise e
            else:
                logger.error(e)

WebRTCAudioSource for diart integration

from diart.sources import AudioSource

from typing import Text, Optional, AnyStr, Dict, Any, Union, Tuple
from rx.subject import Subject
from av.frame import Frame
import av
import numpy as np
import pydub

class WebRTCAudioSource(AudioSource):
    def __init__(self, uri: Text, sample_rate: int):
        super().__init__(uri, sample_rate)
        self.stream = Subject()

    @property
    def duration(self) -> Optional[float]:
        """The duration of the stream if known. Defaults to None (unknown duration)."""
        return None

    def read(self):
        """Start reading the source and yielding samples through the stream."""
        pass

    def close(self):
        """Stop reading the source and close all open streams."""
        pass

    def push_audio_frame(self, audio_frame: Frame):

        audio_frame = av.audio.resampler.AudioResampler(layout="mono").resample(audio_frame)[0]
        self.stream.on_next(audio_frame.to_ndarray().swapaxes(0,1))

The last line is failing

1 reply

juanmc2005 Nov 14, 2023
Maintainer

Ok I just verified and audio sources actually push audio with shape (1, samples).
Can you try to push chunks of the same size, and make that size equal to the step of the pipeline?

If that doesn't work, can you try updating to diart v0.8?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unble to create a Diart source with a PyAV.AudioFrame as stream.on_next input #168

{{title}}

Replies: 4 comments 5 replies

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

Select a reply

Unble to create a Diart source with a PyAV.AudioFrame as stream.on_next input #168

zucher Aug 10, 2023

Replies: 4 comments · 5 replies

juanmc2005 Aug 11, 2023 Maintainer

zucher Aug 11, 2023 Author

zucher Aug 29, 2023 Author

juanmc2005 Sep 4, 2023 Maintainer

zucher Sep 4, 2023 Author

sorgfresser Nov 12, 2023

juanmc2005 Nov 13, 2023 Maintainer

zucher Nov 14, 2023 Author

juanmc2005 Nov 14, 2023 Maintainer

zucher
Aug 10, 2023

Replies: 4 comments 5 replies

juanmc2005
Aug 11, 2023
Maintainer

zucher Aug 11, 2023
Author

zucher Aug 29, 2023
Author

juanmc2005 Sep 4, 2023
Maintainer

zucher Sep 4, 2023
Author

sorgfresser
Nov 12, 2023

juanmc2005
Nov 13, 2023
Maintainer

zucher
Nov 14, 2023
Author

juanmc2005 Nov 14, 2023
Maintainer