Handle very small pdf's #580

Merged
merged 1 commit into from
Feb 18, 2023


eric-yuan-vanta
Contributor

@eric-yuan-vanta eric-yuan-vanta commented Feb 10, 2023

Handle tiny PDFs, and add a test PDF smaller than 1350 bytes (the test PDF is MIT-licensed and comes from https://brendanzagaeski.appspot.com/0004.html).

FYI: generating an empty PDF from Google Drive produces a file of similar size (~700 bytes).

resolves #579
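
For readers following along, the idea behind the fix can be sketched roughly as below. This is a minimal, self-contained illustration and not the code merged in core.js: `MemoryTokenizer`, the local `EndOfStreamError` class, and `detectPdfOrAi` are hypothetical stand-ins that only mimic the `ignore` and `readBuffer` calls visible in the diff, and the `AIPrivateData` marker and the returned `ext`/`mime` values are assumptions made for the example.

```javascript
// Hypothetical stand-in for the tokenizer's end-of-stream error.
class EndOfStreamError extends Error {}

// Hypothetical in-memory tokenizer; it only mimics the two calls used below.
class MemoryTokenizer {
	constructor(bytes) {
		this.bytes = bytes;
		this.position = 0;
		this.fileInfo = {size: bytes.length};
	}

	// Skip `length` bytes, failing if the input is shorter than that.
	async ignore(length) {
		if (this.position + length > this.bytes.length) {
			throw new EndOfStreamError('End-Of-Stream');
		}

		this.position += length;
		return length;
	}

	// Fill `target` with whatever is left; a short read is allowed with mayBeLess.
	async readBuffer(target, {mayBeLess = false} = {}) {
		const available = this.bytes.length - this.position;
		const length = Math.min(target.length, available);
		if (!mayBeLess && length < target.length) {
			throw new EndOfStreamError('End-Of-Stream');
		}

		this.bytes.copy(target, 0, this.position, this.position + length);
		this.position += length;
		return length;
	}
}

// Assumes the caller has already matched the '%PDF' signature.
async function detectPdfOrAi(tokenizer) {
	try {
		await tokenizer.ignore(1350);
		const maxBufferSize = 10 * 1024 * 1024;
		const buffer = Buffer.alloc(Math.min(maxBufferSize, tokenizer.fileInfo.size));
		await tokenizer.readBuffer(buffer, {mayBeLess: true});

		// Assumed marker for Adobe Illustrator files saved as PDF.
		if (buffer.includes(Buffer.from('AIPrivateData'))) {
			return {ext: 'ai', mime: 'application/postscript'};
		}
	} catch (error) {
		// A file too small for the Illustrator check is still a PDF;
		// any other error is rethrown.
		if (!(error instanceof EndOfStreamError)) {
			throw error;
		}
	}

	return {ext: 'pdf', mime: 'application/pdf'};
}
```

With this sketch, a ~700 byte input makes `ignore(1350)` throw the end-of-stream error, which is swallowed so that the function falls through to the plain `pdf` result instead of failing.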

@eric-yuan-vanta
Contributor Author

Hey @sindresorhus, I would appreciate a review on this when you get a chance. Thanks!

@eric-yuan-vanta eric-yuan-vanta changed the title Handle tiny pdf's Handle very small pdf's Feb 13, 2023
@sindresorhus sindresorhus requested a review from Borewit February 14, 2023 05:58
@eric-yuan-vanta eric-yuan-vanta requested review from sindresorhus and removed request for Borewit February 14, 2023 17:18
@eric-yuan-vanta
Contributor Author

I re-requested review and GitHub seemed to automatically remove @Borewit. Not sure why!

@eric-yuan-vanta eric-yuan-vanta requested review from Borewit and removed request for sindresorhus February 17, 2023 17:05
throw errors

lint

lint

simplify

wrap all AI reads in try catch

move only ingore to try catch, early return pdf

Revert "move only ingore to try catch, early return pdf"

This reverts commit 3b90419.
const buffer = Buffer.alloc(Math.min(maxBufferSize, tokenizer.fileInfo.size));
await tokenizer.readBuffer(buffer, {mayBeLess: true});
try {
await tokenizer.ignore(1350);
Collaborator

This remains a sensitive point, but it is not introduced by this PR.
Maybe we should drop support for specialized PDF formats, as text-based formats are out of scope.

Development

Successfully merging this pull request may close these issues.

End of stream error when checking the file type of a pdf
3 participants