
Understanding the disableChunked option #1200

Open
vecerek opened this issue Aug 8, 2023 · 3 comments

@vecerek

vecerek commented Aug 8, 2023

Hi there 👋
This is more of a question than a bug report or feature request. There's an undocumented option called disableChunked. I can see from the code that when it's set to true, tokenization is performed in a different manner. However, is there any recommendation on when to use this flag, and why?

I've seen some file type detection for zip files take 9+ seconds in production from time to time. I've noticed that in such cases up to 100 byte-range requests are made. I then tried setting the disableChunked option to true, and that seemed to solve the problem. I'm thinking of always disabling chunked tokenization, but I'd like to know the trade-offs, if any exist 🙏
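
For reference, this is roughly how I'm passing the flag — a minimal sketch assuming the @tokenizer/s3 makeTokenizer signature and a file-type version that exposes fileTypeFromTokenizer; the bucket and key are placeholders:

```ts
import AWS from 'aws-sdk';
import { makeTokenizer } from '@tokenizer/s3';
import { fileTypeFromTokenizer } from 'file-type';

const s3 = new AWS.S3();

// With disableChunked: true, the tokenizer reads the object as one
// ranged stream instead of issuing a separate range request per chunk.
const tokenizer = await makeTokenizer(
  s3,
  { Bucket: 'my-bucket', Key: 'archive.zip' }, // placeholder object
  { disableChunked: true },
);

const fileType = await fileTypeFromTokenizer(tokenizer);
console.log(fileType); // e.g. { ext: 'zip', mime: 'application/zip' }
```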

@vecerek
Author

vecerek commented Sep 29, 2023

An explanation here may help me think more about #1201.

@Borewit
Owner

Borewit commented Sep 29, 2023 via email

@vecerek
Author

vecerek commented Jun 5, 2024

Thanks for the answer, @Borewit! I'd like to know a bit more about the stream version. I see that the code first calls s3request.getRangedRequest([0, 0]) to get access to the ContentRange response header, which is then parsed; the instanceLength of the parsed content range is used as the size of the stream. I understand that when working with streams, one always has to supply the size.

In my case, the s3request.getRangedRequest([0, 0]) call often takes too long. Furthermore, I already have to make a HEAD request for the same object before the call to makeTokenizer. I wonder if there is a way to reuse the size from my own HEAD request in the call to makeTokenizer, to shave off that one slow request.
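
To frame the question, here is a sketch of the workaround I have in mind: bypassing makeTokenizer and feeding the GetObject stream to strtok3 directly, with the size taken from my own HEAD request. The bucket/key are placeholders, and the exact shape of fromStream's second argument (an options object wrapping fileInfo vs. the fileInfo itself) depends on the strtok3 version, so treat this as an assumption rather than confirmed API:

```ts
import { S3Client, HeadObjectCommand, GetObjectCommand } from '@aws-sdk/client-s3';
import { fromStream } from 'strtok3';
import { fileTypeFromTokenizer } from 'file-type';
import type { Readable } from 'node:stream';

const s3 = new S3Client({});
const params = { Bucket: 'my-bucket', Key: 'archive.zip' }; // placeholder object

// I already make this HEAD request anyway; reuse its ContentLength.
const head = await s3.send(new HeadObjectCommand(params));

// Stream the object once, with no extra [0, 0] ranged request to probe the size.
const { Body } = await s3.send(new GetObjectCommand(params));

// Hand the known size to the tokenizer. On older strtok3 versions the
// second argument may be the fileInfo object itself rather than { fileInfo }.
const tokenizer = await fromStream(Body as Readable, {
  fileInfo: { size: head.ContentLength },
});

const fileType = await fileTypeFromTokenizer(tokenizer);
console.log(fileType);
```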
