Filter data sent to gstreamer using magic bytes #50
Conversation
Would it be feasible to stream the "filtered out" audio to an RTSP audio stream? |
Yes that is my eventual plan. First step filter out from the video. Second step put it into own stream. I'm working on it, no guarantee though. |
Nice sleuthing. If this is the case (and you are probably completely right) then the BC protocol has gone from "obnoxious" to "exceedingly terrible." Why on earth wouldn't you make these two separate message types?

Would you mind explaining each of the magic values in a comment? Even just something like the following would be helpful: /// Video I-Frame: signals start of H264/H265 I-Frame etc. I'm also curious about how you learned these. In particular the "binary packet with no header" is really annoying from an implementation standpoint, since we don't actually know what kind it is except "probably it goes with the last data." This "in band signaling" is problematic for correctness in case a continuation packet happens to have the same magic bytes as a new packet. Is there any indication of how much data goes with each magic value - kind of a "sub packet"? That way the parser could keep track of how much more data goes with the previous video packet. It's okay if you don't know the answers - that's part of reverse engineering :)

Regarding the implementation, it might be better if the byte parsing was moved to the

pub enum BinaryData {
    VideoData(Vec<u8>),
    AudioData(Vec<u8>),
    // do we need Unknown(Vec<u8>) here?
}
pub struct ModernMsg {
    pub xml: Option<BcXml>,
    pub binary: Option<BinaryData>,
}

and then |
The magic codes are based on the work of @twistedddx in issue #4. The format he described matched my own investigations, and I worked it up into this PR. With regards to extra info: after each magic header there is data such as the number of expected bytes in the packet. I believe that if this number is >40000 it gets chunked. We could set it up to expect this chunking but it seems like a lot of extra work. After the magic header in the video stream there is also the text H264. We may be able to use this to auto detect the format. |
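The header layout described above (magic bytes, then an ASCII codec tag such as H264, then an expected-byte count) could be peeked at roughly like this. The exact offsets and the little-endian size field are assumptions drawn from this discussion, not a documented BC spec:

```rust
// Hedged sketch: peek at the fixed-size prefix of a media packet.
// Offsets (magic at 0..4, codec tag at 4..8, size at 8..12) are
// assumptions based on the layouts discussed in this thread.
fn peek_media_header(buf: &[u8]) -> Option<(&[u8], u32)> {
    if buf.len() < 12 {
        return None; // need the whole fixed-size prefix first
    }
    let codec_tag = &buf[4..8]; // e.g. b"H264" or b"H265"
    let data_size = u32::from_le_bytes([buf[8], buf[9], buf[10], buf[11]]);
    Some((codec_tag, data_size))
}

fn main() {
    // A fabricated header: 4 placeholder magic bytes, "H264", size = 40000
    let mut hdr = vec![0x00, 0x00, 0x01, 0x00];
    hdr.extend_from_slice(b"H264");
    hdr.extend_from_slice(&40_000u32.to_le_bytes());
    let (tag, size) = peek_media_header(&hdr).unwrap();
    assert_eq!(tag, b"H264");
    assert_eq!(size, 40_000);
    // Per the discussion, sizes over 40000 would indicate chunking
    println!("tag={:?} size={}", std::str::from_utf8(tag).unwrap(), size);
}
```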
With regards to understanding why no header means a continued packet: also, when I told it not to pass the unknown headers to the video stream and tried to play it, only the top half of the video was valid, indicating that the latter half of the frame was missing. |
Perhaps this will ultimately require a more complex structure for each type to parse correctly. E.g. in the video type we extract data such as the video frame size and type. I also think that the PTS and DTS are in the header, so we could maybe attach proper time stamps. |
It is probably good for us that they don't use a separate header for continuation. I believe most players search the stream for the NAL unit separators (info on h264 stream). This is why passing all the data is currently working: our VLC and ffmpeg are scanning the stream for the NAL start codes, then assuming all the rest (until the next NAL unit start) is the video data. If there was extra data mid NAL unit we would probably be seeing artifacts in the video. |
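For reference, the scan that players perform looks roughly like this: H264 Annex B streams delimit NAL units with 00 00 01 or 00 00 00 01 start codes, and everything up to the next start code is treated as one unit:

```rust
// Minimal illustration of what VLC/ffmpeg do when handed the raw stream:
// scan for Annex B NAL unit start codes and report their offsets.
fn find_nal_starts(stream: &[u8]) -> Vec<usize> {
    let mut starts = Vec::new();
    let mut i = 0;
    while i + 3 <= stream.len() {
        if stream[i] == 0 && stream[i + 1] == 0 {
            if stream[i + 2] == 1 {
                starts.push(i); // 3-byte start code 00 00 01
                i += 3;
                continue;
            }
            if i + 4 <= stream.len() && stream[i + 2] == 0 && stream[i + 3] == 1 {
                starts.push(i); // 4-byte start code 00 00 00 01
                i += 4;
                continue;
            }
        }
        i += 1;
    }
    starts
}

fn main() {
    // Two NAL units: one with a 4-byte and one with a 3-byte start code
    let stream = [0x00, 0x00, 0x00, 0x01, 0x67, 0xAA, 0x00, 0x00, 0x01, 0x68];
    assert_eq!(find_nal_starts(&stream), vec![0, 6]);
}
```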
I've gotten the basics for putting it into the enum you mentioned. Still have to refactor more if we want to properly handle that overflow issue you mentioned |
Something that may be worth knowing before it costs you hours of time: Now that I see these chunked packets during transport maybe the difference is that the official client incorrectly picks up some bytes each time the data was chunked? Maybe there is a continuation byte marker that the client doesn't remove from the video frame. |
Thank you for the info, I will keep that in mind when I get around to keeping track of the sizes. But the current implementation is lazy and doesn't worry about it. If there is a continuation marker maybe we are getting small artifacts... Hmmm |
I'm wondering if filtering all this out will help improve the UDP connections. |
@thirtythreeforty could you help level up my Rust here. My current implementation of parsing more structure from the video data looks like this:

#[derive(Debug, PartialEq, Eq)]
pub struct VideoIFrame {
pub magic: Vec<u8>,
pub video_type: Vec<u8>,
pub data_size: Vec<u8>,
pub unknowna: Vec<u8>,
pub timestamp: Vec<u8>,
pub unknownb: Vec<u8>,
pub clocktime: Vec<u8>,
pub unknownc: Vec<u8>,
pub video_data: Vec<u8>,
}
impl VideoIFrame {
pub fn from_binary(binary: &[u8]) -> VideoIFrame {
VideoIFrame {
magic: binary[0..4].to_vec(),
video_type: binary[4..8].to_vec(),
data_size: binary[8..12].to_vec(),
unknowna: binary[12..16].to_vec(),
timestamp: binary[16..20].to_vec(),
unknownb: binary[20..24].to_vec(),
clocktime: binary[24..28].to_vec(),
unknownc: binary[28..32].to_vec(),
video_data: binary[32..].to_vec(),
}
}
pub fn len(&self) -> usize {
self.video_data.len() + 32
}
pub fn as_slice(&self) -> &[u8] {
let mut result = self.magic.clone();
result.extend(&self.video_type);
result.extend(&self.data_size);
result.extend(&self.unknowna);
result.extend(&self.timestamp);
result.extend(&self.unknownb);
result.extend(&self.clocktime);
result.extend(&self.unknownc);
result.extend(&self.video_data);
result.as_slice() // error: cannot return a borrow of the local `result`
}
}
#[derive(Debug, PartialEq, Eq)]
pub struct VideoPFrame {
pub magic: Vec<u8>,
pub video_type: Vec<u8>,
pub data_size: Vec<u8>,
pub unknowna: Vec<u8>,
pub timestamp: Vec<u8>,
pub unknownb: Vec<u8>,
pub video_data: Vec<u8>,
}
impl VideoPFrame {
pub fn from_binary(binary: &[u8]) -> VideoPFrame {
VideoPFrame {
magic: binary[0..4].to_vec(),
video_type: binary[4..8].to_vec(),
data_size: binary[8..12].to_vec(),
unknowna: binary[12..16].to_vec(),
timestamp: binary[16..20].to_vec(),
unknownb: binary[20..24].to_vec(),
video_data: binary[24..].to_vec(),
}
}
pub fn len(&self) -> usize {
self.video_data.len() + 24
}
pub fn as_slice(&self) -> &[u8] {
let mut result = self.magic.clone();
result.extend(&self.video_type);
result.extend(&self.data_size);
result.extend(&self.unknowna);
result.extend(&self.timestamp);
result.extend(&self.unknownb);
result.extend(&self.video_data);
result.as_slice() // error: cannot return a borrow of the local `result`
}
}
#[derive(Debug, PartialEq, Eq)]
pub enum VideoFrame {
IFrame(VideoIFrame),
PFrame(VideoPFrame),
}
impl VideoFrame {
pub fn len(&self) -> usize {
match self {
VideoFrame::IFrame(binary) => binary.len(),
VideoFrame::PFrame(binary) => binary.len(),
}
}
pub fn as_slice(&self) -> &[u8] {
match self {
VideoFrame::IFrame(binary) => binary.as_slice(),
VideoFrame::PFrame(binary) => binary.as_slice(),
}
}
}
#[derive(Debug, PartialEq, Eq)]
pub enum BinaryData {
VideoData(VideoFrame),
AudioData(Vec<u8>),
InfoData(Vec<u8>),
Unknown(Vec<u8>),
}

The trouble I am having is with as_slice (used as part of the serialisation and the passing to gstreamer). I think this function is meant to represent the data as a single continuous piece, but when the struct is constructed I break it up. In as_slice I try to reconstruct it as a single piece using extend, but the compiler complains that I am returning a borrow of a local variable.
I am thinking that I could instead write it as something like this:

pub struct VideoPFrame {
    pub binary: Vec<u8>,
}
impl VideoPFrame {
pub fn from_binary(binary: &[u8]) -> VideoPFrame {
VideoPFrame {
binary: binary.to_vec(),
}
}
    pub fn magic(&self) -> &[u8] {
        &self.binary[0..4]
    }
    pub fn video_type(&self) -> &[u8] {
        &self.binary[4..8]
    }

Instead, but it doesn't feel as elegant. I suppose on the plus side though we could make:

enum VideoType {
H264,
H265,
Unknown,
}
pub fn video_type(&self) -> VideoType {
    const H264_MAGIC: &[u8] = &[0x48, 0x32, 0x36, 0x34]; // b"H264"
    const H265_MAGIC: &[u8] = &[0x48, 0x32, 0x36, 0x35]; // b"H265"
    // plain identifiers in match arms are bindings, so use guards
    match &self.binary[4..8] {
        h if h == H264_MAGIC => VideoType::H264,
        h if h == H265_MAGIC => VideoType::H265,
        _ => VideoType::Unknown,
    }
}

What do you think would be best? |
I'll push my work so far and you can see it in the wild but it will fail to compile |
So I decided to go with the getters method, where the struct just stores the raw data and there are methods to get the fields from it. I am now having fun trying to get the correct size info; there seem to be some extra bytes somewhere when the frame is split |
I am having a sinking feeling about the extra bytes. The h264 format should have a start code of |
Ok on the plus side it seems unrelated to the number of start code... on the negative side why are the start codes having an extra 0? |
So the number of extra bytes in the packet frame seems to be arbitrary. I have seen 0, 3, 4, 5 and 6 extra bytes without a pattern. This number of extra bytes does not scale linearly with the number of chunks, indicating it is not a continuation marker. Ok this seems a bit dodgy but..... The frames are split like this:
Example of weird count:
In other words the first packet has the odd number of bytes. I am wondering if their rounding logic for the count is incorrect somehow. I could get it to work by....
I have been looking at the beginning and end of each packet, with and without a header and there is no pattern that I can see. No numbers near the beginning or end of the packet correspond to anything like the packet size and the hexes are not ASCII either |
So I am having some success with the altered expected bytes |
So at this point I'm feeling that BC will either:
|
So I have observed the start of another video frame even before the rest of the chunks are processed. So it seems to have dropped the packet for some reason.... Sigh |
More things I have observed in files written to disk: Several frames from another camera can appear mid stream. P or I frames may end prematurely. This matches what you described: BC stops processing an I or P frame if it is interrupted by the start of another frame. Audio is not always a single frame at a time, but extremely often is. BC pads the frames with zeros to byte align to the next 8 byte boundary. But rarely a frame does not start on an 8 byte boundary. Basically there are no rules and data corruption is not uncommon. |
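The 8-byte alignment rule observed above can be sketched as a rounding helper. The rule itself is this thread's observation (with the noted rare exceptions), not a documented spec:

```rust
// Sketch of the observed zero-padding rule: frames are padded with zeros
// so the next frame starts on an 8-byte boundary.
fn padded_len(frame_len: usize) -> usize {
    (frame_len + 7) & !7 // round up to the next multiple of 8
}

fn main() {
    assert_eq!(padded_len(24), 24); // already aligned: no padding
    assert_eq!(padded_len(25), 32); // 7 zero bytes of padding
    assert_eq!(padded_len(0), 0);
    println!("ok");
}
```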
Right - this is basically stopping you from returning a dangling pointer. To do this you would need to return a
I would say the latter is more elegant. You're returning a reference with a lifetime based on the original argument's lifetime (a reference to borrowed data). Without lifetime elision, this looks like:

pub fn magic<'a>(&'a self) -> &'a [u8] {
    &self.binary[0..4]
}

-- here All this said, however, it looks like you're trending further toward writing a real parser, for which I'm currently using Nom. Although I'm not super happy with it, it does give you a couple of niceties, like being able to operate directly on a stream. The deal with Nom is that you return functions ("combinators") that do the actual parsing. This lets the Nom library control parsing execution. Don't worry too much about Nom for now, just wanted to put the bug in your ear. I'd say let's figure out the BC data stream format first, then we can worry about the implementation. I'll put more comments about the format in a second comment since this one is getting long... |
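The combinator idea can be illustrated without Nom itself: a parser is a function from input to (remaining input, parsed value), and bigger parsers are built from smaller ones. Nom's real API differs in the details; this hand-rolled sketch only shows the shape:

```rust
// A parser result: the unconsumed input plus the parsed value.
type PResult<'a, T> = Result<(&'a [u8], T), ()>;

// Returns a parser (a closure) that consumes exactly `n` bytes.
fn take<'a>(n: usize) -> impl Fn(&'a [u8]) -> PResult<'a, &'a [u8]> {
    move |input| {
        if input.len() < n {
            return Err(()); // not enough data yet
        }
        Ok((&input[n..], &input[..n]))
    }
}

// A parser for a little-endian u32, built on top of `take`.
fn le_u32(input: &[u8]) -> PResult<'_, u32> {
    let (rest, b) = take(4)(input)?;
    Ok((rest, u32::from_le_bytes([b[0], b[1], b[2], b[3]])))
}

fn main() {
    let buf = [0xDE, 0xAD, 0x10, 0x00, 0x00, 0x00, 0x99];
    let (rest, magic) = take(2)(&buf).unwrap();
    let (rest, size) = le_u32(rest).unwrap();
    assert_eq!(magic, &buf[0..2]);
    assert_eq!(size, 16);
    assert_eq!(rest, &buf[6..]);
}
```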
I think I've got all your changes in please let me know what you think |
Also, can we remove #![allow(dead_code)] and #![allow(unused_variables)] from lib.rs? Then the linter will pick up unused functions and vars. |
Raw data may be useful, especially if we have it from a few different cameras, but for now just trace level should tell us when it finds a packet it doesn't recognise. |
I'd like to remove the full header check function too. It is there in case of the very small chance that 4 bytes are not enough to pick out the header in the video data, as suggested by @hh1209. However I've never found this necessary, and my back of the envelope calculation suggests that the odds of it happening randomly are 1 in 4,228,250,625 bytes, so once every 4GB. The advanced header check is only really needed during the recovery phase, when we are advancing one byte at a time because we found an invalid packet. 99% of the time (as in all the time in my hour long test, except during camera disconnection caused by my WiFi) we are not in the recovery phase, so the chance of getting a magic packet out of place is even more unlikely than 1 in 4GB. Bottom line: can we remove the advanced header check and clean up the code a bit more? |
That would have been me. My suggestion was born out of my approach, and my code is absolutely terrible. I am a data cabler by trade and have written some code in my spare time. I agree that because your code does not depend on brute force searching every byte for magic bytes, this increased resilience to false positives may be over the top. |
Yes, that sounds okay to me. The packet stream seems reasonably well-formed, and your code copes with the cases where it's not. |
Thank you for the changes removing the continuation kind. This looks great! I would like to rescind my previous feedback about the grey frames and errors. I had managed to get my local Git client into a weird state where I thought I was up to date but I was actually running old code. The tip of this branch is butter smooth for me in ffplay and VLC*. So yes, make any final changes you would like and I am good to merge this! * VLC in TCP mode, that is. Its buffer size is still too small for UDP. |
Ah yes sorry, that's why I shouldn't tag from memory. Anyways, while I agree this is a good idea for brute forcing every byte, we don't use that as a method and mostly use the size field given in the header. Also, revised estimate on the odds, as I forgot to take into account multiple types of magic header:
Total chance: 1/255 * 3/255 * 4/255 * 4/255 = 1.13522E-08, i.e. 1 in 88,088,554 bytes, or roughly once every 88MB
Ok so that's it then, all updated! We also removed the Invalid Media Kind, since the only thing that really checked that was this brute forcing |
Once we realized it's an inner packet stream, have you actually seen any occurrences of truncated packets? We can take that probability to zero if it's parsing out well-formed packets :) Also if there's a very very occasional truncated packet, the scan-for-magic-bytes behavior will smoothly pick the media packet stream back up after the corruption passes by. |
I also need a larger buffer even for my E1, maybe that needs a note on the readme |
I haven't actually seen a truncated packet even during my hour long test |
Well actually I did see it but only during a reconnect |
One last change - can you add a |
During a reconnect, truncated packets would probably be okay. In these kinds of edge cases the Unknown code would kick in. Or the media packet code could simply swallow the garbage until it hits a header. Lots of mitigation behaviors are okay. |
We already have mitigation set up where we just drop the buffer coming from the camera until we hit magic again, so I think that's enough for such cases |
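The drop-until-magic mitigation described here can be sketched as a simple scan. The magic value below is a placeholder for illustration, not the real BC constant:

```rust
// Discard bytes until a known magic sequence reappears, returning the
// tail of the buffer starting at the magic (or an empty slice if the
// magic was not found and everything should be dropped).
fn resync<'a>(buf: &'a [u8], magic: &[u8]) -> &'a [u8] {
    if buf.len() >= magic.len() {
        for i in 0..=buf.len() - magic.len() {
            if &buf[i..i + magic.len()] == magic {
                return &buf[i..]; // resume parsing from the magic bytes
            }
        }
    }
    &[] // no magic found: drop everything and wait for more data
}

fn main() {
    let magic = [0x0A, 0x0B, 0xCD, 0x01]; // placeholder magic value
    let data = [0xFF, 0xEE, 0x0A, 0x0B, 0xCD, 0x01, 0x42];
    assert_eq!(resync(&data, &magic), &data[2..]);
    let nothing: &[u8] = &[];
    assert_eq!(resync(&[0xFFu8, 0xEE], &magic), nothing);
}
```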
Sweet. This has been quite the journey from "grep the extra junk out of the stream" to "rats there's another packet format" to "make a whole new parsing layer." Thank you very much @twistedddx and @QuantumEntangledAndy |
Yeah it's been quite a journey. But I just want to say "Yay" merged! |
I think audio will be a piece of cake now. If you have a branch you want to post, you can, or I can start on it |
Yes audio and auto detect are done. I am just rebasing onto master |
…c_headers Filter data sent to gstreamer using magic bytes
The data sent to gstreamer includes non video related data, such as audio. This causes some players to have to perform extra work to filter out invalid packets (see #49).
This PR uses the headers of each packet to deduce its type and the appropriate action.
Fixes #49
Provided that the same headers are used throughout the Reolink family, this should work for all cameras, but it would be best to test on a few more.