-
-
Notifications
You must be signed in to change notification settings - Fork 352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend file type (updated) #603
Extend file type (updated) #603
Conversation
Gentle nudge towards @sindresorhus . We are really looking forward to using this feature in the next official version. |
I'm on vacation, so I won't be able to review this for another couple of weeks. |
Thanks a lot for squeezing in time to send change requests, even on vacation! We will then see that we don't rely on quick action from your side. Enjoy your time off! |
Co-authored-by: Sindre Sorhus <sindresorhus@gmail.com>
…/file-type into extend-file-type-updated
I am on holiday as well, this time without a laptop. Back in 2 weeks. |
Another gentle nudge towards @sindresorhus and @Borewit , so this PR does not get lost in the the vortex of time. |
... And yet another gentle nudge to @sindresorhus . I'd really appreciate if you found the time for a review :-) |
Is there anything I can do to support the review of this PR? |
I will try find some time as well to review this PR @FredrikSchaefer. Please give me 2 weeks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for your patience @FredrikSchaefer.
I think you have done a fantastic job. I just have some minor comments.
I will be off the grid for a bit more then a week. So sorry.
core.d.ts
Outdated
|
||
Custom detectors can be used to add new FileTypeResults or to modify return behaviour of existing FileTypeResult detections. | ||
|
||
If the detector returns `undefined`, the `tokenizer.position` should be 0 (unless it's a stream). That allows other detectors to parse the file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if I agree with the "unless it's a stream". Essentially you can iterate to other detectors if you took a bite of the apple. Only peek is allowed, if read you you have consumed the tokenizer, which is very similar to a stream.
I fear this an area where we can expect a lot questions from users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, thanks for the comment.
Yeah, guess you're right about the lot of questions.
I just mindlessly took this information from this previous discussion.
Let me suggest a more detailed explanation here:
If the detector returns undefined
, the tokenizer.position
should typically be 0. This allows easy parsing by other detectors, unless subsequent custom detectors specify otherwise. Additionally, the detector shouldn't consume the tokenizer; while peeking is non-consuming, reading is.
What do you think of this?
I'm really open to anything here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See also my other comment: https://github.com/sindresorhus/file-type/pull/603/files#r1356979704
I suggest something like this:
If the detector returns no_match it is not allowed to read from the tokenizer (the tokeinzer.position must remain 0) otherwise following scanners will read from the wrong file offset.
If the detector return undefined the scanner is certain the file type cannot be determined, neither by other scanners.
no_match represents option 1 explained in here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your point that custom detectors should be able to interrupt detection. However, I see two small downsides of the suggested approach:
-
The wording no_match does not really make clear to me whether it means no match for this detector, or no match at all.
-
The standard FileTypeParser returns undefined when no file type could be recognized. Therefore requiring the custom detectors to return something else is a bit counter intuitive.
I therefore suggest to do it the other way around:
If the detector returns undefined, it is not allowed to read from the tokenizer (the tokenizer.position must remain 0) otherwise following scanners will read from the wrong file offset.
If the detector returns file_type_undetectable, the detector is certain the file type cannot be determined, even by other scanners. The FileTypeParser interrupts the parsing and immediately returns undefined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, one could argue that file_type_undetectable also does not clearly say whether it means file type undetectable for this detector or for all detectors, but it still makes it a bit clearer in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your point that custom detectors should be able to interrupt detection. However, I see two small downsides of the suggested approach:
The wording no_match does not really make clear to me whether it means no match for this detector, or no match at all.
The standard FileTypeParser returns undefined when no file type could be recognized. Therefore requiring the custom detectors to return something else is a bit counter intuitive.
I therefore suggest to do it the other way around:
If the detector returns undefined, it is not allowed to read from the tokenizer (the tokenizer.position must remain 0) otherwise following scanners will read from the wrong file offset.
If the detector returns file_type_undetectable, the detector is certain the file type cannot be determined, even by other scanners. The FileTypeParser interrupts the parsing and immediately returns undefined.
Sounds good to me. The second case can be also be something like, detector started reading but for some reason failed to determine the file-type. Not ideally, but it can happen. If the detector starts reading, there is no way back.
We could also check the position after each custom scanner. It may not be 0 actually, there is also an iterated use case with ID3 header. The position should be remain unchanged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! Just pushed a commit taking care of that check.
Added detectionImpossible to allow interruption by custom detectors Updated doc of fileTypeFromFile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there
Forgive me, wrong account, same person. |
…ested-code-changes Suggested changes to simplify code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks a lot for your effort @FredrikSchaefer
For your final approval @sindresorhus.
@sindresorhus, kind reminder, please proceed with merging if you are happy with this one |
Explain role of `blob` argument in function comment.
e62ca5d
to
4164767
Compare
4164767
to
ca6e449
Compare
Hey There :)
I came across this issue #340 and found that it's just what we need. As the initial PR is way outdated, I decided to start over from scratch.
Resolves #340.