Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code Helper attempts to index MPEG-TS files in project and eats 100% cpu #21136

Closed
mjbvz opened this issue Jan 11, 2018 · 21 comments · Fixed by #57008
Closed

Code Helper attempts to index MPEG-TS files in project and eats 100% cpu #21136

mjbvz opened this issue Jan 11, 2018 · 21 comments · Fixed by #57008
Labels
Bug A bug in TypeScript Good First Issue Well scoped, documented and has the green light Help Wanted You can do this VS Code Tracked There is a VS Code equivalent to this issue
Milestone

Comments

@mjbvz
Copy link
Contributor

mjbvz commented Jan 11, 2018

TS Template added by @mjbvz

TypeScript Version: 4.3.5

Search Terms

  • video
  • mpeg
  • mp4
  • binary

From @jon-valliere on January 10, 2018 22:54

  • VSCode Version: 1.19.2
  • OS Version: 10.13.2

Steps to Reproduce:

  1. Add a MPEG-TS file (e.g. somefile.ts) outside of your normal code folder but somewhere in the project so it shows up in "Explorer" in VSC
  2. Look at your processes -- intellicode totally stops working and the process eats 100%

The TypeScript processor should be smart enough to be able to detect if the file is text or not.

Copied from original issue: microsoft/vscode#41439

@mjbvz mjbvz self-assigned this Jan 11, 2018
@mjbvz
Copy link
Contributor Author

mjbvz commented Jan 11, 2018

From @usernamehw on January 10, 2018 23:34

I wonder if you can exclude media folder(or files) from tsconfig.json:

{
  // ...
  "exclude": [
    "media"
  ]
}

@mjbvz
Copy link
Contributor Author

mjbvz commented Jan 11, 2018

There are workarounds we can tell users about on the VSCode side, but moving over to TS to see if the server should be smarter detecting if a file is binary when indexing

@mhegazy mhegazy added the Suggestion An idea for TypeScript label Jan 11, 2018
@mhegazy
Copy link
Contributor

mhegazy commented Jan 11, 2018

not sure what we can do here.. one option is to bailout on parsing if we see too many parse errors..

@jon-valliere
Copy link

jon-valliere commented Jan 20, 2018

I have a pretty novel idea: check for data[0] == 0x47 and data[188] == 0x47 // assume this is a mpeg-ts file. You could go through the whole file if you wanted to be "extra" sure.

@shirakaba
Copy link

shirakaba commented Jan 29, 2018

I'm not sure what the convention is about where to continue discussion when an issue is a duplicate, but:

To explain @jon-valliere 's suggestion of reading date[0] == 0x47 (quoted text from issue #21453 ):

The Wikipedia page indicates that an MPEG-TS file is merely a "sequence of packets, without any global header". Thus, a valid MPEG-TS file will start with a header packet (the format is again described on the Wikipedia page), beginning with a sync byte, invariably with the value 0x47 (The character 'G').

The data[188] === 0x47 part checks the header at the start of the next packet. It adds some safety, but also wouldn't identify an (admittedly rare) single-packet MPEG-TS file. Unfortunately, the remaining three bytes of the header can all effectively take any value, so they can't be used to extend the fingerprint.

@mhegazy As an alternative to bailing out on high numbers of parse errors, one could bail out if the file size of the .ts file is above a default (yet configurable) threshold, say, 1MB. Java has a similar restriction, wherein the bytecode generated for any method needs to be under 64kB, although I believe that's a hard limit. We are already doing something similar for large JS projects with --disableSizeLimit (albeit without allowing one to configure an explicit size limit): #7444.

@shirakaba
Copy link

@mjbvz Yes, I've tested just now, and that exclude option does indeed fix the problem. However, it is ultimately no more than a workaround, as it requires the developer to be fully aware of this issue beforehand; the very nature of this problem is that it is very hard to track down due to the cryptic compiler errors. I could just as easily rename my MPEG-TS files to .tsv or .tsa, but I've lost a whole day of work on more than one occasion by not being able to identify what the actual cause of the error is.

@RyanCavanaugh RyanCavanaugh added Bug A bug in TypeScript and removed Suggestion An idea for TypeScript labels Jan 29, 2018
@RyanCavanaugh
Copy link
Member

@mhegazy proposing that we bail on files that are > 500 kB and match the above heuristic?

0x47 is just G, unfortunately, but it's hard to imagine a large TS file that doesn't start with / or i.

@mhegazy
Copy link
Contributor

mhegazy commented Jan 30, 2018

so you think data[0] === data[188] === "g" is not gonna create false positives? should we do it on the server only or on the commandline as well?

@RyanCavanaugh
Copy link
Member

Probably both; just treat a file like that as having invalid encoding or something.

@jon-valliere
Copy link

@shirakaba I wouldn't imagine that a file containing a single 188 byte MPEG-TS packet would cause the TypeScript compiler to hang forever. Not like a 65MB MPEG-TS file does.

@jon-valliere
Copy link

jon-valliere commented Jan 30, 2018

@mhegazy I guess you could check the entire file, at 188 byte intervals for the 0x47 marker. If it exists in the entire file, the chance of a false positive would be insignificant. Maybe the compiler should throw a warning saying that it has detected an MPEG-TS file and print the name just incase there is some crazy situation where there is a false positive.

@mhegazy
Copy link
Contributor

mhegazy commented Jan 30, 2018

A PR would be appreciated. We would follow @jon-valliere's suggestions and check on 188 intervals the input text, then raise an error for an invalid file content if true.

@mhegazy mhegazy added Help Wanted You can do this Good First Issue Well scoped, documented and has the green light labels Jan 30, 2018
@mhegazy mhegazy added this to the Community milestone Jan 30, 2018
@shirakaba
Copy link

shirakaba commented Jan 30, 2018

I guess you could check the entire file, at 188 byte intervals for the 0x47 marker. If it exists in the entire file

MPEG-TS files can be considerably big (eg. a full Blu-Ray disc can hold 25-50 GB as BDAV MPEG-TS – so if exported to regular MPEG-TS without further compression, would give that kind of ballpark – and one hour of a recorded HD television-quality stream can come to ~1.6 GB). I would recommend not checking the entire file if it would cause a large disc read. A few packets should suffice.

Hard disc drives read at ~128 MB/s, and we should aim not to hang on a file for more than a fraction of a second, so how about analysing some (very) small fraction of 128 MB (but as Ryan says, only for files above 500 KB)?

Edit: Before, I accidentally mistook packets to be in KB rather than bytes.

  • 3 × 188 byte packets = 564 bytes; 0.0044 ms read
  • 4 × 188 byte packets = 752 bytes; 0.0059 ms read
  • 5 × 188 byte packets = 940 bytes; 0.0073 ms read
  • 10 × 188 byte packets = 1.88 KB; 0.0147 ms read
  • 20 × 188 byte packets = 3.76 KB; 0.0293 ms read
  • 50 × 188 byte packets = 9.40 KB; 0.0730 ms read
  • 100 × 188 byte packets = 18.80 KB; 0.1470 ms read
  • 1,000 × 188 byte packets = 188 KB; 1.4700 ms read

Again, genuine code files do not usually come much larger than 500 KB, so if we keep seeing the sync byte for even around 5 packets, it should indicate a clear positive. But to err on the safe side, how about 188-byte intervals for the first 9.40 KB (fifty packets), then? This would take less than 1 ms per file, yet still have a very low chance of a false positive.

Remember that we may not be talking just one MPEG-TS file here; someone may drop a whole folder of them into the project, so it's wise to be conservative about how much of each one we should be reading in, even if individually the read time might not be too long.

@jon-valliere
Copy link

@shirakaba I'm not sure where 188 KB packets came from? They're 188 Bytes; maybe you mean 10,000 x 188 Byte packets?

I like your idea about using a limit. Isn't there already a parse limit of 1MB or am I confused? Why not just check up to the configured parse limit?

@shirakaba
Copy link

shirakaba commented Jan 30, 2018

I'm not sure where 188 KB packets came from?

@jon-valliere Good catch – you're right, I accidentally mistook packets to be in KB rather than bytes in my last comment, so I've corrected it with an edit now. Sorry for the confusion.

I like your idea about using a limit. Isn't there already a parse limit of 1MB or am I confused? Why not just check up to the configured parse limit?

For non-TS files, the maximum program size is hard-limited to 20MB (Ryan Cavanaugh explains why shortly after the linked comment): #19326 (comment)

For TS files, I can't find any mention of a parse limit. Here, someone even performs tsc --watch on a 6MB file: #10878 (comment)

As now edited into my comment above, I have changed my suggested cautionary threshold from 10 packets to 50 packets, as this would keep read times for each file below 1 ms yet still provide a good amount of protection against false positives.

Note: If anyone really does need to put a letter 'G' into their file both as the very first character and for every subsequent 188th character for 49 further iterations, then I would suggest a list of good alternatives to the letter 'G' here – Unicode characters based on 'G'.

@jon-valliere
Copy link

@shirakaba The compiler warning about "detection of MPEG-TS file /xxxxx/xxxx.ts" will also notify the user in the unlikely chance that they are attempting to compile a TypeScript file with 'G' in all the wrong places. That way they're not wondering, "why isn't this file compiling."

@tchakabam
Copy link

tchakabam commented Aug 2, 2018

My solution/work-around is quite simple:

  1. I keep source-files in a src foldr and have my tsconfig.json only include those. If I need Typescript or JS files in project root or somewhere else, I add them to include array explicitely.

  2. I keep any MPEG-TS files somewhere else :)

Since there simply is a collision of file-extension conventions here, I don't see how any "real" solution can be found other than telling the compiler where your Typescript files really are and actually avoiding collisions.

An alternative I could recommend would be using .tsx for all TypeScript files, regardless wether they contain JSX or not (since a file that may contain JSX, may also not contain any).

But you could also figure out quite easily if data is MPEG-TS in fact with the specific sync-bytes and headers and be pretty sure it's not a Typescript source. That would be much more elegant than a size-limit or such.

@KnisterPeter
Copy link
Contributor

KnisterPeter commented Sep 7, 2018

@mhegazy Should this be probably done before system.readFile? Since an MPEG-TS file could be quite big it would be better to check it before reading the whole file.
As well there is no code abstraction in the TS host right now for reading parts of a file. How to deal with that?

KnisterPeter added a commit to KnisterPeter/TypeScript that referenced this issue Sep 7, 2018
MPEG-TS files consist of frames each 188 bytes long.
These files should not be parsed as typescript files
and therefore we look into the first 50 frames and
check if we find a 'valid' header.

Closes microsoft#21136
@RyanCavanaugh RyanCavanaugh modified the milestones: Community, Backlog Mar 7, 2019
@rsktash
Copy link

rsktash commented Jan 14, 2022

When using TS and hls at the same time avoid .ts extension for media chunk files.
There is a -hls_segment_filename fileNamePath option for ffmpeg when generating from a media source file

@rsktash
Copy link

rsktash commented Jan 14, 2022

image

@jakebailey
Copy link
Member

After #53667, I suspect that this is fixed as we should be bailing out early on these kinds of files. Is anyone still watching this issue that can verify 5.0 vs 5.1 on this front?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug A bug in TypeScript Good First Issue Well scoped, documented and has the green light Help Wanted You can do this VS Code Tracked There is a VS Code equivalent to this issue
Projects
None yet
9 participants