Is it possible to pass raw byte data into the pipeline? #68
Hi @chanleii, your question is very similar to #67. I think the same answer applies. As I mentioned in #67, I would be glad to merge a PR with this feature if you want to contribute :)
Hi, I'm also working on a similar implementation. So far I have managed to receive audio data from a websocket and push it into the pipeline, but I can't get the rest of the diarization part to work. The pipeline I built:

    import diart.operators as dops
    from diart.sinks import RTTMWriter
    from diart.sources import AudioSource
    from diart.pipelines import OnlineSpeakerDiarization, PipelineConfig

    config = PipelineConfig()
    pipeline = OnlineSpeakerDiarization(config)
    rttm_writer = RTTMWriter(path="testing.rttm")
    # sample_rate is defined elsewhere in my code
    source = AudioSource("live_streaming", sample_rate)
    observable = pipeline.from_audio_source(source)
    observable.pipe(
        dops.progress(f"Streaming {source.uri}", total=source.length, leave=True),
    ).subscribe(rttm_writer)

The data transfer function:

    import numpy as np

    # pcm2float is an external helper (not shown) that scales integer PCM to floats
    def bytes2nd(data: bytes):
        nd = pcm2float(np.frombuffer(data, dtype=np.int8), dtype='float64')
        nd = np.reshape(nd, (1, -1))  # shape (1, samples)
        return nd
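Suppose the websocket actually delivers 16-bit signed little-endian PCM. That is an assumption: the snippet above reads the bytes as int8 and relies on an external pcm2float helper. In that case, the conversion to a float array of shape (1, samples) can be done with NumPy alone. Below is a minimal sketch. pcm16_bytes_to_chunk is a hypothetical name, not part of diart.

```python
import numpy as np


def pcm16_bytes_to_chunk(data: bytes) -> np.ndarray:
    """Hypothetical helper: raw 16-bit little-endian mono PCM -> float32 array (1, samples)."""
    samples = np.frombuffer(data, dtype="<i2")  # int16, little-endian
    # Scale to [-1.0, 1.0); dividing by 2**15 maps -32768 to exactly -1.0
    audio = samples.astype(np.float32) / 32768.0
    return audio.reshape(1, -1)  # (channels, samples) with a single channel
```

For example, the bytes b"\x00\x00\x00\x40" decode to the samples 0 and 16384, which become 0.0 and 0.5.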
Hi @ckliao-nccu, I see you're instantiating There are 2 key things missing in your code:
In the current state of your code, calling
Hi @juanmc2005, here is my full code:

    import os, io
    import numpy as np
    import soundfile as sf
    from tornado import ioloop
    from tornado.escape import json_decode
    from tornado.web import Application, RequestHandler, url
    from tornado.websocket import WebSocketHandler
    import diart.operators as dops
    from diart.sinks import RTTMWriter
    from diart.sources import AudioSource
    from diart.pipelines import OnlineSpeakerDiarization, PipelineConfig
    from diart.blocks import SpeakerSegmentation, OverlapAwareSpeakerEmbedding

    config = PipelineConfig()
    pipeline = OnlineSpeakerDiarization(config)
    rttm_writer = RTTMWriter(path="testing.rttm")
    # Load the segmentation model to read the sample rate it expects
    segmentation = SpeakerSegmentation.from_pyannote("pyannote/segmentation")
    sample_rate = segmentation.model.get_sample_rate()


    class WSHandler(WebSocketHandler):
        def open(self):
            print("WebSocket opened")
            # One audio source per connection, fed manually in on_message
            self.source = AudioSource("live_streaming", sample_rate)
            self.observable = pipeline.from_audio_source(self.source).pipe(
                dops.progress(f"Streaming {self.source.uri}", total=self.source.length, leave=True),
            ).subscribe(rttm_writer)

        # Message received from websocket.
        def on_message(self, message):
            if message == "complete":
                self.source.stream.on_completed()
            else:
                # Decode headerless 32-bit float mono samples
                data, samplerate = sf.read(
                    io.BytesIO(message),
                    format='RAW',
                    samplerate=sample_rate,
                    channels=1,
                    subtype='FLOAT',
                )
                data = np.asarray(data)
                data = np.reshape(data, (1, -1))
                # Emit chunks
                self.source.stream.on_next(data)

        def on_close(self):
            print("WebSocket closed")


    class MainHandler(RequestHandler):
        def get(self):
            self.render("index.html")


    def main():
        port = os.environ.get("PORT", 8888)
        app = Application(
            [
                url(r"/", MainHandler),
                (r"/ws", WSHandler),
            ]
        )
        print("Starting server at port: %s" % port)
        app.listen(int(port))
        ioloop.IOLoop.current().start()


    if __name__ == "__main__":
        main()
@ckliao-nccu OK, it looks like your implementation should be working.
OK, so the progress bar is showing and it's being updated (if I understand correctly).

    import rx.operators as ops
    from diart import utils  # assumed location of visualize_annotation / visualize_feature

    self.observable = pipeline.from_audio_source(self.source).pipe(
        dops.progress(f"Streaming {self.source.uri}", total=self.source.length, leave=True),
        # Keep only the annotation (`ann` is a typo for `annotation`)
        ops.starmap(lambda annotation, chunk: ann),
        ops.do_action(utils.visualize_annotation(config.duration)),
    ).subscribe(rttm_writer)

and this:

    self.observable = pipeline.from_audio_source(self.source).pipe(
        dops.progress(f"Streaming {self.source.uri}", total=self.source.length, leave=True),
        # Keep only the audio chunk
        ops.starmap(lambda annotation, chunk: chunk),
        ops.do_action(utils.visualize_feature(config.duration)),
    ).subscribe(rttm_writer)
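Both snippets rely on ops.starmap, which unpacks each emitted tuple into the lambda's positional arguments. Assume the pipeline emits (annotation, chunk) pairs, as the lambdas above imply. Then the effect is analogous to Python's built-in itertools.starmap applied to a plain list. The sketch below uses placeholder strings instead of real annotations and waveforms.

```python
from itertools import starmap

# Placeholder stand-ins for the (annotation, chunk) pairs the pipeline emits
emitted = [("annotation_0", "chunk_0"), ("annotation_1", "chunk_1")]

# Keep only the annotation of each pair, like the first snippet
annotations = list(starmap(lambda annotation, chunk: annotation, emitted))

# Keep only the audio chunk of each pair, like the second snippet
chunks = list(starmap(lambda annotation, chunk: chunk, emitted))

print(annotations)  # ['annotation_0', 'annotation_1']
print(chunks)       # ['chunk_0', 'chunk_1']
```

Because one element of each pair is dropped, a sink subscribed afterwards, such as the RTTMWriter, no longer receives the full pair it may expect.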
@ckliao-nccu would it help your use case if diart provided a
Of course it would help! I have already tried writing a
I just realized there was a typo in the code snippet I sent: it should be "annotation" and not "ann".
I'm thinking of adding this to the next release. Would you mind opening a PR with your implementation? Even if it's incomplete, it would save me quite some time; we can mark it as a draft, and then I can modify the branch or make my own.
I noticed that and fixed it. I also found out there was something wrong with my mic the last time I tried. After fixing it, I tried to stitch the chunks into a file with the code below. And the output

    class WSHandler(WebSocketHandler):
        def open(self):
            # Append the raw message bytes as-is (no WAV header is written)
            self.f = open('test.wav', 'ab')

        def on_message(self, message):
            self.f.write(message)

        def on_close(self):
            self.f.close()

My new question is: are there any requirements that data chunks need to meet for diart to process them?
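Appending the raw messages as in the stitching code above produces a file with no WAV header. As a result, test.wav contains only bare samples, and most players will not open it as-is. The sketch below assumes the messages are float32 mono samples, as the subtype='FLOAT' in the earlier server code suggests. One way to listen to them is to collect the chunks, convert them to 16-bit PCM, and write a proper WAV with Python's standard wave module. chunks_to_wav is a hypothetical helper name.

```python
import wave

import numpy as np


def chunks_to_wav(path: str, chunks: list, sample_rate: int) -> None:
    """Hypothetical helper: write raw float32 mono chunks (bytes) to a 16-bit PCM WAV."""
    audio = np.concatenate([np.frombuffer(c, dtype=np.float32) for c in chunks])
    # The wave module only writes integer PCM, so clip to [-1, 1] and convert to int16
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())
```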
Yes, the errors are normal. It's because we've removed the chunk or annotation with the starmap operator and
Yes, you should make sure that the data is a numpy array with shape

    chunk = np.frombuffer(message, dtype="float").reshape(1, -1)
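One detail worth checking in that line: NumPy interprets dtype="float" as float64, which is 8 bytes per sample. If the client sends float32 samples, decoding them as float64 would merge pairs of samples into one garbage value. Both subtype='FLOAT' in the earlier server code and the client snippet in the next comment produce float32. The sketch below decodes explicitly as float32 and returns shape (1, samples). bytes_to_chunk is a hypothetical name.

```python
import numpy as np


def bytes_to_chunk(message: bytes) -> np.ndarray:
    """Hypothetical helper: decode float32 mono samples into shape (1, samples)."""
    if len(message) % 4 != 0:
        raise ValueError("message length is not a multiple of 4 bytes (float32)")
    return np.frombuffer(message, dtype=np.float32).reshape(1, -1)


payload = np.arange(4, dtype=np.float32).tobytes()  # 4 samples = 16 bytes
print(np.dtype("float"))                             # float64
print(bytes_to_chunk(payload).shape)                 # (1, 4)
print(np.frombuffer(payload, dtype="float").shape)   # (2,): wrong sample count
```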
@ckliao-nccu I just added an experimental Would you mind experimenting with it and telling me if you find any problems? I ran some tests locally with a very simple client, and it seems to work as expected. On the client side:

    import base64
    import numpy as np

    # `chunk` is a NumPy array of audio samples, sent as base64-encoded float32 bytes
    message = base64.b64encode(chunk.astype(np.float32).tobytes()).decode("utf-8")
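The client line above sends each chunk as base64 text. Below is a sketch of the matching decode step on the receiving side, assuming the same float32 mono format. encode_chunk and decode_chunk are hypothetical names, not part of diart.

```python
import base64

import numpy as np


def encode_chunk(chunk: np.ndarray) -> str:
    # Hypothetical client-side helper: same expression as the snippet above
    return base64.b64encode(chunk.astype(np.float32).tobytes()).decode("utf-8")


def decode_chunk(message: str) -> np.ndarray:
    # Hypothetical server-side helper: undo base64, reinterpret bytes as float32
    raw = base64.b64decode(message)
    return np.frombuffer(raw, dtype=np.float32).reshape(1, -1)


chunk = np.linspace(-1.0, 1.0, 8)  # float64 samples in [-1, 1]
restored = decode_chunk(encode_chunk(chunk))
print(restored.shape)  # (1, 8)
```

Base64 emits 4 characters for every 3 input bytes, so text messages are roughly a third larger than binary websocket frames carrying the same samples.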
@juanmc2005 yes, it works like a charm!
Happy to hear it works! Let me know if you run into any trouble with it. I'll make sure to include it in the next release.
Would you mind changing the uri and opening a pull request?
I'm currently working on an implementation where I occasionally receive raw byte data from a TCP socket. I want to pass this data into the pipeline, but the current AudioSources seem to be limited to microphone input and audio files. Does the current version support what I'm trying to implement, or do I have to write it myself?
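For the TCP case in the question, one detail matters whichever source class ends up consuming the data. A TCP socket is a byte stream, so a single recv can return fewer bytes than requested or split a sample in half. The sketch below assumes the sender writes float32 mono samples in the receiver's native byte order. recv_exact and read_chunk are hypothetical helpers. Each resulting (1, samples) array could then be pushed into an AudioSource's stream with on_next, the way the websocket handler in this thread does.

```python
import socket

import numpy as np


def recv_exact(sock: socket.socket, n_bytes: int) -> bytes:
    """Hypothetical helper: read exactly n_bytes, looping over partial reads."""
    buf = bytearray()
    while len(buf) < n_bytes:
        part = sock.recv(n_bytes - len(buf))
        if not part:  # peer closed the connection
            raise ConnectionError("socket closed before the chunk was complete")
        buf.extend(part)
    return bytes(buf)


def read_chunk(sock: socket.socket, n_samples: int) -> np.ndarray:
    """Hypothetical helper: read n_samples float32 samples as shape (1, n_samples)."""
    data = recv_exact(sock, n_samples * 4)  # 4 bytes per float32 sample
    return np.frombuffer(data, dtype=np.float32).reshape(1, -1)


# Local demo: a connected socket pair stands in for a real TCP connection
a, b = socket.socketpair()
a.sendall(np.arange(6, dtype=np.float32).tobytes())
print(read_chunk(b, 6).shape)  # (1, 6)
a.close()
b.close()
```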