Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mapping out the MpegTsDemuxer flow #6

Open
slifty opened this issue May 8, 2021 · 5 comments
Open

Mapping out the MpegTsDemuxer flow #6

slifty opened this issue May 8, 2021 · 5 comments
Labels
discussion The conversation is the point

Comments

@slifty
Copy link
Member

slifty commented May 8, 2021

Discussion

What do you want to talk about?

The MpegTsDemuxer logic is complicated, but I'd like to try to map it out a little in the spirit of understanding how it works right now. This issue will capture that exploration. Some of the discussion may end up winding up as documentation as well.

Relevant Research

@slifty slifty added the discussion The conversation is the point label May 8, 2021
@slifty
Copy link
Member Author

slifty commented May 8, 2021

The current MpegTsDemuxer API

The sole interaction point to the demuxer is MpegTsDemuxer. Here's how to interact with it...

  • Inputs: An MpegTsDemuxer ingests MpegTs data via process.
  • Outputs: An MpegTsDemuxer accepts a callback on instantiation. that callback is called every time a packet is parsed.

That's basically the API -- in goes stream data, out comes packets. This is in some ways performing the same function as a transform stream, but it doesn't implement that API.

Action Items

@slifty
Copy link
Member Author

slifty commented May 8, 2021

How does processing work?

An MpegTsDemuxer instance tracks a few things as class properties:

  • pids - this is a map of packet identifiers to their associated Stream objects.

  • pmt - this is the Program Map Table. In theory a given mpegts could have multiple program streams. This is called a "multi-program transport stream (MPTS)" but I don't believe MPTS's are supported by the current implementation.

  • leftover - this is a little block of any leftover data from a past process. The leftover is merged into whatever data is passed in next.

  • lview - this is just a helper DataView object that is used to access the contents of leftover. It's passed to the utilities that actually use the data.

  • ptr - this is a pointer, and specifies how much data is sitting in leftover (so if ptr is 0 that means there is no data in leftover)

  • cb - this is a callback which is invoked whenever there's a new Packet extracted from the stream.

A lot of these attributes are passed directly to utility functions WHICH MODIFY THEIR PROPERTIES.

This makes it really hard to know exactly what's going on / debug.

Action Items

  • We should rename these attributes so their purposes are more intuitive.
  • I would like to either (A) eliminate the use of pointers / memory management or more likely (B) keep that efficiency, but encapsulate the complexity in a new class with named methods for data access and manipulation.
  • It's going to be painful, but we really really cannot have our utilities modifying all of these class attributes indirectly. We need to go down the list and refactor the utility methods so they return the values necessary for MpegTsDemuxer to explicitly manage its own state.

@slifty
Copy link
Member Author

slifty commented May 8, 2021

How does processing work (continued)

When new data is processed it is passed a buffer (it can be passed some other stuff which makes it possible to specify what portion of the buffer to use, but at this point I think that just complicates the API. I would rather have the developer be responsible for passing exactly the data they want processed).

The processing then does the following:

  1. Calculate remainder -- this is how much remaining data in the leftover that needs to be merged with the new data.
const remainder = (PACKET_LEN - this.ptr) % PACKET_LEN
  1. I actually think the next step has a bug.
if (remainder > 0) {
	if (len < remainder) {
		this.leftover.set(buffer.subarray(offset, offset + len), this.ptr)
		return 1 // still have an incomplete packet
	}
	...
}

What this is saying, as written, is:

If there was leftover data, and if the length of the new data is less than the length of the leftover data, then combine the leftover data and exit out.

But that's not what we want! For instance, if packet length is 100, the remainder from the last processing is 80, the new data is 70, then 80 + 70 = 150 which is greater than the packet length and we very well could have a complete packet.

What I believe it SHOULD be doing is saying is:

If the leftover data + the new data is less than the packet length then combine the leftover data and exit out because we can't possibly have a complete packet.

The current logic also doesn't update the internal pointer even though it adds data to leftover, which I think means that data would get overwritten. I've opened an upstream issue to get clarification.

Action Items

  • Fix the "leftover packet" logic (assuming I interpreted all this correctly). This might be done by updating the logic or maybe refactoring the concept of leftovers so we just always combine it with the incoming data and attempt to process via a singular path / without short-circuiting.

@slifty
Copy link
Member Author

slifty commented May 8, 2021

How does processing work (continued again)

  1. If there was no remainder then first we define a few things:
  • len which is the length of the buffer plus the offset (honestly I don't understand why offset is being added back to len -- another reason why I think we should remove the concept of offset as a parameter entirely.)
  • offset which is basically being set to the remainder -- this is a bit confusing since if remainder is > 0 then it is possible a whole bunch of stuff has been added to the leftover. Again, I think the leftover logic may just be a bit buggy.
  1. Next, we iterate over the input buffer. This uses some tracking variables:
  • ptr (not to be confused with this.ptr) is what keeps track of the current position in the process buffer.
  • datalen (not to be confused with len from earlier) keeps track of how much data remains to be processed.

On each loop we do some things:

  • this.ptr is updated to be datalen every iteration (which means it is possible this.ptr could be greater than PACKET_LEN and cause stuff like this.
  • If datalen is 0 then we return because there's no more data to process.
  • If datalen is less than PACKET_LEN then we return because there isn't enough data to process a packet.
  • Otherwise we try to demux a packet! Apparently demuxPacket can return a non-zero value which means something is terribly wrong and we need to nope out <-- this is what can lead to this.ptr being greater than PACKET_LEN.

Action Items

  • This loop is the meat and potatoes of the process step and ideally everything else would just be prepping a DataView that combines leftover and buffer and not calling demuxPacket (or returning) from anywhere else.
  • We need to fix the bug where this.ptr can be too large, we also need to change the demuxPacket API so that it does not manage MpegTsDemuxer's state.

@slifty
Copy link
Member Author

slifty commented May 11, 2021

Process has been refactored, but it would still be worth doing a deep dive into how the demuxPacket method works (and it's various spinoffs).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion The conversation is the point
Projects
None yet
Development

No branches or pull requests

1 participant