
Understanding the disableChunked option #1200

Open
vecerek opened this issue Aug 8, 2023 · 3 comments

@vecerek

vecerek commented Aug 8, 2023

Hi there 👋
This is more of a question than a bug report or feature request. There's an undocumented option called disableChunked. I can see from the code that when it's set to true, tokenization is performed in a different manner. However, is there any recommendation on when to use this flag, and why?

I've seen some file type detection for zip files take 9+ seconds in production from time to time. I've noticed that in such cases up to 100 byte-range requests are made. I then tried setting the disableChunked option to true, and that seemed to solve the problem. I'm thinking of always disabling chunked tokenization, but I'd like to know the trade-offs, if any exist 🙏
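
For reference, this is roughly how I'm passing the flag — a minimal sketch assuming the @tokenizer/s3 makeTokenizer signature and a file-type version that exposes fileTypeFromTokenizer; the bucket and key are placeholders:

```ts
import AWS from 'aws-sdk';
import { makeTokenizer } from '@tokenizer/s3';
import { fileTypeFromTokenizer } from 'file-type';

const s3 = new AWS.S3();

// With disableChunked: true, the tokenizer reads the object as one
// ranged stream instead of issuing a separate range request per chunk.
const tokenizer = await makeTokenizer(
  s3,
  { Bucket: 'my-bucket', Key: 'archive.zip' }, // placeholder object
  { disableChunked: true },
);

const fileType = await fileTypeFromTokenizer(tokenizer);
console.log(fileType); // e.g. { ext: 'zip', mime: 'application/zip' }
```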

@vecerek
Author

vecerek commented Sep 29, 2023

An explanation here may help me think more about #1201.

@Borewit
Owner

Borewit commented Sep 29, 2023 via email

@vecerek
Author

vecerek commented Jun 5, 2024

Thanks for the answer, @Borewit! I'd like to know a bit more about the stream version. I see that the code first calls s3request.getRangedRequest([0, 0]) to get access to the ContentRange response header, which is then parsed; the instanceLength of the parsed content range is used as the size of the stream. I understand that when working with streams, one always has to supply the size.

In my case, the s3request.getRangedRequest([0, 0]) call often takes too long. Furthermore, I already have to make a HEAD request for the same object before the call to makeTokenizer. I wonder if there is a way to reuse the size from my own HEAD request in the call to makeTokenizer, to shave off that one slow request.
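
To frame the question, here is a sketch of the workaround I have in mind: bypassing makeTokenizer and feeding the GetObject stream to strtok3 directly, with the size taken from my own HEAD request. The bucket/key are placeholders, and the exact shape of fromStream's second argument (an options object wrapping fileInfo vs. the fileInfo itself) depends on the strtok3 version, so treat this as an assumption rather than confirmed API:

```ts
import { S3Client, HeadObjectCommand, GetObjectCommand } from '@aws-sdk/client-s3';
import { fromStream } from 'strtok3';
import { fileTypeFromTokenizer } from 'file-type';
import type { Readable } from 'node:stream';

const s3 = new S3Client({});
const params = { Bucket: 'my-bucket', Key: 'archive.zip' }; // placeholder object

// I already make this HEAD request anyway; reuse its ContentLength.
const head = await s3.send(new HeadObjectCommand(params));

// Stream the object once, with no extra [0, 0] ranged request to probe the size.
const { Body } = await s3.send(new GetObjectCommand(params));

// Hand the known size to the tokenizer. On older strtok3 versions the
// second argument may be the fileInfo object itself rather than { fileInfo }.
const tokenizer = await fromStream(Body as Readable, {
  fileInfo: { size: head.ContentLength },
});

const fileType = await fileTypeFromTokenizer(tokenizer);
console.log(fileType);
```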
