Audio in wrong WAV file - Pi 4 #731
Thanks @aaknitt - If I got things in there correctly, this should be the diff showing what was added in the commit that broke things: 7036348...669ca00 Just to check - Is the audio ending up in only one of the WAV files, or is there some duplication where the same audio is in 2 different files? Are you able to share the logs from around when you are having the issue? They may help show what is happening. I can add in some additional debug messages if needed. With the latest change it should be writing everything that is in the Recorder out to a WAV... I wonder if it has something to do with the Pi. Would you be able to run one of the versions that is not working on a more powerful computer? |
@robotastic as best I can tell, the last commit without the issue is 71564fa and the first one with the issue is 669ca00. A log file is attached that shows a recording of TGID 58909. Some of the highlights are pasted below. The TGID is granted a channel at 20:36:00 and the last grant update message is received at 20:36:34.5. 5 seconds later, the call is removed due to inactivity. So the call is active for 39 seconds, which is reflected in the JSON file (also attached). The actual length of audio is about 32 seconds, but only 24.2 seconds make it into the WAV associated with this call. The remaining 7.6 seconds end up in the WAV file for the next call on that TGID, which doesn't occur until 21:15. In this setup I'm only recording that single TGID. If my memory is correct, when I'm set up to record more than one TGID, the "missing" audio ends up in the WAV file for the next call that uses the same recorder (vs. the same frequency). I don't think there's duplication of audio going on... it seems like the end of one call gets put at the beginning of the WAV of the next call, and this continues and gets worse and worse over time. I'll try the Docker image for 4.4.1 on an Ubuntu machine when I get a chance.
|
Thanks for sharing - I will go look through the log files. In the snippet you shared, it looks like it is not getting the End Of Transmission message on the voice channel... or more likely, the voice channel is really far behind the control channel. So when the Control Channel stops sending Updates for that call, the voice channel is only halfway through processing the buffer for that call. The Pi 4 should be fast enough to handle one channel easily. I wonder if a very slow memory card could be holding things back? One thing to try is using a RAM FS instead of the SD card for the Capture dir and see if that changes things. |
I was thinking along the same lines, but prior versions work well on the same Pi with the same SD card, etc. |
@robotastic I may be totally barking up the wrong tree here since I'm not well-versed in understanding the diffs and commit history, but it looks like perhaps the change from nonstop_wavfile_sink_impl.cc to transmission_sink.cc could be related? In transmission_sink.cc it only calls wav_write_sample() when state == RECORDING but that wasn't the case in nonstop_wavfile_sink_impl.cc. nonstop_wavfile_sink_impl.cc may even hold it in the RECORDING state until everything gets written? I'm not very confident I'm interpreting any of this correctly, but it seems like this may be a logical place for things to go wrong. |
If you're comfortable modifying the code, changing this line from 0 to 10, rebuilding, and capturing the additional logging while only recording that one TG would probably be super helpful.
|
Will do |
I created a debug branch that turns on some additional debug statements. I have been getting some reports on my DC system that some calls are getting cut off... and the audio is just getting lost. Give this branch a try: |
Here's a log from the debug-call-mismatch branch. Search for 58909, which is the only TGID being recorded. |
If you're able to rebuild it with the verbosity cranked up, I have a hunch. |
verbosity was set to 10 when I built. |
Can you double check? I know he set it back to 0 in his last commit on the debug branch. With it at 10, you should see much more detailed logs similar to this...
|
That looks right. The only thing I can think of is that WinSCP (or whatever tool you're using) isn't working with Git? Can you SSH directly in and verify?
|
@aaknitt That's weird! That should be displaying a ton of debug messages. @robotastic Just thought I'd jot this theory down while it's in my head. I don't have a good way to know if this is really the issue or not...
|
@tadscottsmith I'm not sure I totally follow all of that, but would that theory align with the first WAV file after startup being too short, with some of its audio making its way into the start of the next file? It seems like samples from the transmission aren't making their way into the WAV file before it gets closed out. |
My theory is that things are writing to the WAV file correctly, but the WAV files aren't always being closed correctly, so that multiple transmissions from the same recorder might be getting dumped into the same WAV file. |
Interesting! I could see something like that happening, especially on a simulcast channel. Maybe the TDUs do not get inserted correctly. Let me go walk through that while I check out the Debug Logs.
|
Checked out the logs - it looks like there was only one TDU detected in the voice channel. The rest of the calls ended because they had timed out - which means that they were still recording but an update message hadn't been heard in the past 5 seconds. Let me go look at that code path and see if there is something weird there. It might be a less used code path. |
I'd add a check similar to this. I know there are instances on my system where transmissions aren't ended correctly because there's never a TDU string that doesn't also contain voice frames. |
I added in a second talkgroup to troubleshoot further. Some logs and recordings are attached. It seems like things work pretty well for short transmissions. It's the long transmissions (which is what is always occurring on the paging talkgroup) that seem to cause issues. However, even with the short transmissions there's some funny business going on. For example, Recorder 2 is used for the first transmission on TGID 58910 after startup. That transmission eventually times out. While it's in the process of timing out, Recorder 3 is spooled up for additional traffic on TGID 58910. The next time that Recorder 2 is used, a TDU is detected immediately after the recorder is started. I think it's likely that this is actually the TDU from the end of the previous transmission that got stuck somehow? The part that puzzles me is why Recorder 2 stops getting or processing data prematurely for the original call. It eventually times out, but its TDU seems to show up the next time that recorder is used. I haven't really pieced all of this together (long recordings and misplaced TDUs) into a coherent theory as to what's actually going on yet, but I'm wondering if they're related, as both seem to be examples of recorders ending processing of data prematurely, with that data ending up being processed the next time that recorder is used. |
TL;DR: I made a small change and added some more logs. It probably won't fix things but may give some more clues. Just paging through the logs and taking note of interesting things. From that initial instance, it does look like Rec #2 just stopped for some reason, like there should be audio and it is not recording it. I do have some Transmissions that start with a TDU though - when Rec 3 starts, it has a TDU initially - so things may not be getting stuck in the buffer; it may just be part of the P25 process:
One weird thing is that the 1st call should probably have Concluded after this... record_more_transmissions is set to false. I am going to go check the code:
OK - Checked the code... right now, I don't have a block of code checking for when a Call is Inactive (hasn't received a Call Update in > 1 second) and the Recorder has received a TDU. It looks like I do have code to do that... no idea when I commented it out though! trunk-recorder/trunk-recorder/main.cc Lines 884 to 903 in fe2eecb
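For anyone following along, here is a rough sketch of the kind of check being described, assuming it lives in something like the manage_calls() loop. The type and function names below (Call, Recorder, conclude_call) are placeholders for illustration, not the actual trunk-recorder structures or the code at the lines referenced above:

```cpp
// Hypothetical sketch, not the commented-out code in main.cc.
// Assumption: a call that has gone quiet on the control channel AND whose
// recorder has already seen a TDU can be concluded early instead of waiting
// for the full inactivity timeout.
#include <ctime>
#include <vector>

struct Recorder {
  bool received_termination_flag = false;  // set when a TDU is seen on the voice channel
};

struct Call {
  std::time_t last_update = 0;   // last Call Update heard on the control channel
  Recorder* recorder = nullptr;
  bool concluded = false;
};

void conclude_call(Call& call) {
  call.concluded = true;         // close out the WAV, free the recorder, etc.
}

void manage_calls_sketch(std::vector<Call>& calls, std::time_t now) {
  for (Call& call : calls) {
    const bool inactive = (now - call.last_update) > 1;  // > 1 second without an update
    if (!call.concluded && inactive && call.recorder &&
        call.recorder->received_termination_flag) {
      conclude_call(call);
    }
  }
}
```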
It looks like the Control Channel is already ahead of the Voice Channel here. The Chan Grant comes in before the TDU on the voice channel. Interestingly, this causes the Source ID to be missed initially, since it is in the Grant.
|
For what it's worth, I finally got around to building and running on my Ubuntu machine today, and got the same results. Caveat is that while doing testing I was running two instances...one for my feed (running a much, much older build) while at the same time running a build of the debug-call-mismatch branch. So processor load was higher than it would be normally, but the old build continued to perform normally while the debug-call-mismatch build had the same results that I'm seeing on the Pi (if anything, it's worse). Really puzzling why no one else seems to be seeing the same thing. |
Here are logs from the latest build. This is just recording the single paging TGID. Let me know if it's more useful to collect data with just that single TGID or with another (non-paging) one also moving forward. Edit: I just saw you added another commit around mid-day. The recordings attached are from the morning commit, I'll rebuild and try again. |
Yea - it is all sort of weird. On that one paging channel, one clear issue is that there are no TDUs being received... but that shouldn't matter too much; it should just revert back to the old way of doing things and time out after X number of seconds if it hasn't seen an Update message. It is good that it is also happening on the Ubuntu box. It at least means that it is a general problem and not some weird Pi corner case... which might have explained why other people have not reported it. The only other thing I can think to try is turning on the debug for the control channel. Change this line to 10:
And set this line back to 0:
This should give us everything that is happening on the Voice Channel and maybe give us a sense of why it is doing that. Maybe... I am going to try some experiments on my side too. |
ok it's getting a bit weirder. With the latest build, Ubuntu now seems to be behaving while the Pi is still having issues. Starting to make me doubt my sanity. Log files for each are attached with some overlapping time periods. The call at 9:42 p.m. on TG 58909 (the paging talkgroup) is captured by both. The Ubuntu instance is recording 58909 (paging) and 58910 (dispatch), while the Pi is only capturing 58909. |
Hi @aaknitt I wanted to check back and see how things were looking. Is there any chance that the signal might not be tuned correctly? Trunk Recorder should still handle it better, but that might explain the changing behavior. |
@robotastic sorry, I've been out of town a bunch and haven't been doing any more digging on this lately. I should be able to get back into it if needed, but I've kind of lost track of what the current theory is and what additional data/logs are needed to help track it down. The fact that I can turn the problem on and off by using different build versions makes me think that it's not a hardware/tuning issue, but maybe I'm misunderstanding what you're getting at. Let me know what you need from me to keep the investigation moving. |
No problem - I sort of lost the bubble on it too. If you are able to give it one more try with the ... Good point about different versions handling it differently. It probably is not a tuning thing... unless the newer version has some problem where, if a message gets dropped, it starts misbehaving. (For future me - this looks like the correct link for comparing changes between the 2 versions: 669ca00...71564fa ) One thing I noticed from the last batch you uploaded: the big difference between the Pi version and the Ubuntu one is that the Pi is not getting any Termination flags on the Voice Channel. The Recorder on the Pi Channel is abruptly stopped. I wonder if there is some decoding problem? Or maybe it is falling behind a ton? Pi:
Ubuntu:
In the Pi Logs, there is only 1 Voice Termination Message:
But in the Ubuntu Logs, there are > 1300!! I have no idea what could be causing this... |
Ah! - I must have messed up the search. Good find! So that is one obvious difference. The question is whether there was a change that stopped the TDUs from being generated/processed or if there was a change in the code that makes it fail when TDUs don't come in. |
I can't seem to get the hyperlink quite right, but there are a few instances of call->stop_call(); in main.cc that seemed to be commented out in the new version. I'm having a hard time stepping all the way through it, but any chance they were closing up the previous WAV file when new calls came in? |
Interesting - I had to go through the code to refresh my memory. It looks like call->stop_call(); doesn't change things much. It just sets the Call's State based upon whether the recorder has reached a termination flag yet or not. This gets handled in the manage_calls() loop anyhow, so it wasn't needed anymore. |
Oh boy. I think I'm going to owe everyone some beer for all of the collective time spent on this. I spooled up another Pi from scratch and that one is working perfectly on both versions. I swapped SDRs, power supplies, and SD cards between the two Pis to try to isolate the issue. The problem follows the SD card, nothing else. However, I suspect that it may not be an issue with the physical SD card but rather something with the OS/build configuration on that card. I say this because I did get the problem to occur on Ubuntu once (but was unable to reproduce it). Hard to say for sure though. Still puzzling to me is that I could turn the problem off and on on the troublesome Pi/SD card by switching between older and newer versions. I'd love to be able to home in on the root cause so that others can avoid it moving forward, so if anyone has any ideas on what might be different in the OS, Pi settings, etc., I'm open to doing more experimentation. However, since this doesn't seem to be widespread and doesn't reproduce easily, I'm leaning towards closing the issue for now. Edit: Note that I had the same issue on the troublesome Pi/SD card when running via Docker. Not sure what that means for isolating the issue, but it's pertinent, I think. |
I think, compared to other resources, disk performance is one that hasn't historically been inspected very closely, and it would be a good idea to try and get a handle on what performance TR needs before things start going sideways. It would be nice if we could configure a work/temp directory for active recordings; this would let us mount some small but quick storage/a ramdisk for the performance-sensitive task and then move the file to slower storage once the call has been concluded. Just spitballing here of course. |
No worries at all @aaknitt - it was a real good exercise, and it is always helpful for me to go back and check things. This is definitely really weird either way! Just to double check, was the GNU Radio version different on any of the installs? You can check with ... That is a great point @sally-yachts ! It is probably most noticeable on SD cards, which I think can be pretty slow... but if you have a lot of recorders writing at once, that could also hit a disk I/O threshold. Having documentation for using a ramdisk is a good idea, especially if you are just uploading everything and not planning on storing it locally long term. |
I have the same issue, but it only seems to affect my system when I reduce the bandwidth on my SDRs. At 2.3 MHz or 2.56 MHz I don't have the issue; if I drop the bandwidth to 2 MHz or below, I start having this issue. I'm also running a decently powerful system: an AMD Ryzen 5 PRO 5650GE processor and an M.2 PCIe SSD, using Docker on Ubuntu 22. |
@jessm33 is on to something here. I think I've got my "problem" SD card behaving by messing with the sample rate. Currently working to tease out whether it's an issue with the sample rate itself or related to which frequencies in the system are covered by which SDR, how close a frequency is to the edge of the SDR bandwidth, etc. @jessm33 can you post the details of your SDR config (center frequency & BW) for when it has the issue and when it doesn't, along with the frequencies of the system you're monitoring? |
@aaknitt I'm monitoring 2 systems: Fairfax County: https://www.radioreference.com/db/sid/6957 This issue is very noticeable on the Fairfax County system, as fire dispatches are announced by an automatic voice on 2 different talk groups, each dispatch is at least 30 seconds long, and they are very easy to detect when they end up in the wrong talk group's recording. The active control channel on Fairfax County is 856.262500 MHz. Working centers & rates:
Not working centers & rates:
|
@jessm33 Could you run the debug branch with the not working rates? It'll be pretty verbose logging, but if you could capture the logs surrounding an instance of it happening, it might really help. https://github.com/robotastic/trunk-recorder/tree/debug-call-mismatch |
I'll post more details later tonight, but there seems to be some correlation between choosing a sample rate that results in this message on startup and having the issue present in the newer versions:
On the other hand, if I choose a sampling rate that produces this message on startup, the issue does not seem to be present:
Sample rates that seem ok include 0.9 MHz, 1.4 MHz, 2.4 MHz. An example of a sample rate that's problematic for the newer versions but not the older versions is 2.212504 MHz. |
The two-stage decimator should kick in when a sample rate is an even multiple of 24000, 25000, or 32000. Do things seem ok if you try 2050000 or 2016000? |
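To make that rule concrete, here is a tiny throwaway check (my own illustration, not trunk-recorder code) that flags whether a rate divides evenly by 24000, 25000, or 32000, using the sample rates mentioned in this thread:

```cpp
// Quick sanity check for candidate sample rates, based on the rule quoted
// above (evenly divisible by 24000, 25000, or 32000). Illustration only.
#include <cstdio>

bool two_stage_decimation_possible(long rate) {
  return rate % 24000 == 0 || rate % 25000 == 0 || rate % 32000 == 0;
}

int main() {
  const long rates[] = {2400000, 2400100, 2212504, 2050000,
                        2016000, 1664000, 1792000, 2176000};
  for (long r : rates) {
    std::printf("%8ld : %s\n", r,
                two_stage_decimation_possible(r) ? "two-stage OK" : "falls back");
  }
  return 0;
}
```

Run against the numbers reported here, the problematic rates (2400100, 2212504) fall back while the working ones (2050000, 2016000, 1664000, 1792000, 2176000) divide cleanly, which matches the pattern jessm33 and aaknitt are seeing.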
@tadscottsmith here's a debug log from around a time when this issue happened: at 17:18:29, calls start on TGs 1519 and 12 (the automated dispatch channels); each call should be approx 30 seconds. The audio in the recordings for both TG 1519 and 12 has the end cut off, and audio from one of those talk groups ends up at the beginning of a TG 1107 call. |
Yes, those rates are working correctly. |
These rates 1664000, 1792000, 2176000 are working for me without the error message @aaknitt mentioned or this audio in the wrong wav issue. |
Well that is pretty wild! I would totally believe there could be something up with the sample rate and those filters... the weird part is that it works in one version and not the other. I just double checked using FileMerge on the Mac and there weren't any changes to code that touches the actual signal. @aaknitt Can you just double check that these are the versions you are using for the one that works and the broken one?
@jessm33 Any chance you could try using this commit ( 71564fa ) and seeing if it works better on those rates/centers you were having trouble with? |
@robotastic so far that still seems to be the case, even though it doesn't make much sense to me. A couple more data points: 8e9e770 (June 5), which uses p25_recorder_impl.cc and is prior to the OP25 update (tested on Ubuntu), does have the issue |
@aaknitt Thanks for doing all this testing. And just to double, double-check my understanding: with 669ca00, it no longer works with 2.212504 MHz? Would it be possible to use the Ubuntu machine, keep everything the same, and just change between those 2 versions and see which sample rates work? The goal is to double check that it is a code thing and not a HW thing. It could also be that some RTL-SDR dongles can handle being tuned to sample rates that do not divide evenly. I think you can just do a
|
This is actually very easily reproducible on my Pi using a sample rate (2400100) that can't be decimated and a single recorder. One thing I've noticed is that the transmission sink is really struggling to process the samples in 669ca00. They then just get left in the output_queue and wait for the next call on the same recorder. Here you can see in 71564fa that the transmission_sink easily clears the output_queue and there are plenty of additional TDUs to pass the termination tag and cause the transmission_sink to end the transmission.
And here you can see on 669ca00, that when the call is concluded, there are almost 80,000 samples left in the queue.
That being said, I haven't been able to find a smoking gun as to what change between the two versions is causing the issue. |
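As a way to picture the mechanism being described, here is a deliberately over-simplified toy model (entirely hypothetical names, not trunk-recorder code) showing how samples stranded in a recorder's output queue at call conclusion would surface at the start of the next WAV written by that same recorder:

```cpp
// Toy model of the backlog: if the queue is not cleared when a call is
// concluded, the leftover audio is written at the head of the next call
// that reuses the same recorder. Simplified illustration only.
#include <cstdio>
#include <deque>
#include <vector>

struct RecorderModel {
  std::deque<int> output_queue;             // samples waiting to be written
  std::vector<std::vector<int>> wav_files;  // one vector per "WAV file"

  void start_call() { wav_files.emplace_back(); }

  // Writer falls behind: only `budget` samples get flushed per call.
  void flush(size_t budget) {
    while (budget-- && !output_queue.empty()) {
      wav_files.back().push_back(output_queue.front());
      output_queue.pop_front();
    }
  }

  void conclude_call(bool clear_queue) {
    if (clear_queue) output_queue.clear();  // the behavior that prevents carry-over
  }
};

int main() {
  RecorderModel rec;

  rec.start_call();
  for (int s = 1; s <= 10; ++s) rec.output_queue.push_back(s);  // call #1 audio
  rec.flush(6);                // writer only keeps up with part of it
  rec.conclude_call(false);    // queue NOT cleared -> 4 samples stranded

  rec.start_call();
  for (int s = 101; s <= 104; ++s) rec.output_queue.push_back(s);  // call #2 audio
  rec.flush(100);

  // WAV #2 now begins with samples 7..10 from call #1, matching the symptom.
  for (int v : rec.wav_files[1]) std::printf("%d ", v);
  std::printf("\n");
  return 0;
}
```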
Still digging, but I'm wondering if somewhere there's a STATE mismatch. The newer version only writes if the recorder is in a RECORDING state. trunk-recorder/lib/gr_blocks/transmission_sink.cc Lines 548 to 565 in 669ca00
|
oohhh... good call! I will just go spin up a local version so I can do some debugging too. I am not sure if it is a state mismatch. If it was just that, we would only end up with some gaps in the files; it wouldn't get backed up. The function always returns the total number of items it was given as input, which essentially always clears the items it was given... I think. The warning message should also fire, because the number written would be less than the number of samples that were input. trunk-recorder/lib/gr_blocks/transmission_sink.cc Lines 567 to 573 in 669ca00
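To illustrate the two points above (the RECORDING gate on writing, the block always reporting that it consumed its full input, and the warning that should fire when fewer samples were written), here is a stripped-down stand-in for the work function. The names and the message text are illustrative only, not the actual transmission_sink.cc code:

```cpp
// Simplified stand-in for the dowork()/work() behavior discussed above.
// Gating, return value, and warning are modeled from the description only.
#include <cstdio>

enum class State { IDLE, RECORDING, STOPPED };

int dowork_sketch(int noutput_items, const short* in, State state) {
  int nwritten = 0;
  for (int i = 0; i < noutput_items; ++i) {
    if (state == State::RECORDING) {
      (void)in[i];  // stand-in for wav_write_sample(in[i])
      ++nwritten;
    }
  }
  if (nwritten < noutput_items) {
    // Expected to fire whenever samples were handed in but not written out.
    std::fprintf(stderr, "Warning: received %d samples, only wrote %d\n",
                 noutput_items, nwritten);
  }
  return noutput_items;  // scheduler is told everything was consumed either way
}

int main() {
  short buf[8] = {0};
  dowork_sketch(8, buf, State::RECORDING);  // writes all 8, no warning
  dowork_sketch(8, buf, State::IDLE);       // writes none, warning fires
  return 0;
}
```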
|
@tadscottsmith That was one hell of a find! |
Update: Changing this in the latest master commit makes the issue go away: use: However, I'm sure there may have been a reason for making that change in the first place, so I'm not sure what else might break by changing it back. |
WHOA!! Amazing find - let me go trace things backwards and see where that change hits. Nice work, and thank you so much for keeping at this! |
@robotastic the commit where that change hits is listed in my comment above with the screenshots |
Agreeing with everyone in the Discord chat...the changes made in the last 48 hours seem to have resolved the issue as best I can tell. Nice work @robotastic ! |
Raspberry Pi 4 is having issues with audio from the end of one call ending up in the start of the audio file of a subsequent call. Seems to get worse the longer trunk-recorder is running. Seems to be related to audio from a specific recorder getting stuck and not getting flushed to the current WAV file and ending up in the next WAV for that same recorder.
Issue does not appear to be present in release 4.3.2 but is in 4.4.0 and 4.4.1.
Mar 29 d616a86 commit seems ok
Apr 3 1697f7e commit seems ok
Apr 5 7036348 commit seems ok
Apr 7 71564fa commit seems ok
Apr 7 669ca00 commit has the issue
Apr 7 b2786c9 commit has the issue
Apr 9 dd3e6bf commit has the issue
Normal/correct transmission on paging TGID:
Transmission from another TGID on the same frequency ending up at the end of the same WAV file (this is sometimes present in 4.3.2 and is not the issue in question):
Audio from prior transmission of the same recorder at the beginning of WAV file (this is the issue in question):