-
Notifications
You must be signed in to change notification settings - Fork 236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Encoding module #439
Encoding module #439
Conversation
@@ -60,7 +60,7 @@ impl Decoder { | |||
/// | |||
/// If you instead want to use XML declared encoding, use the `encoding` feature | |||
pub fn decode_with_bom_removal<'b>(&self, bytes: &'b [u8]) -> Result<Cow<'b, str>> { | |||
let bytes = if bytes.starts_with(b"\xEF\xBB\xBF") { | |||
let bytes = if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aesthetic preference, I feel like this is easier to read and better communicates the non-text-ness.
Codecov Report
@@ Coverage Diff @@
## master #439 +/- ##
==========================================
+ Coverage 51.14% 51.16% +0.02%
==========================================
Files 26 27 +1
Lines 13308 13314 +6
==========================================
+ Hits 6806 6812 +6
Misses 6502 6502
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved, but fix the markdown link
src/encoding.rs
Outdated
/// | Bytes |Detected encoding | ||
/// |-------------|------------------------------------------ | ||
/// |`00 00 FE FF`|UCS-4, big-endian machine (1234 order) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because now the function is public, maybe more explicitly mark rows, where autodetection is supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just removed the non-supported rows entirely. It is unlikely that support for the other encodings will ever be available, at least using this library. Plus anyone can just visit the link to see the full table.
} | ||
} | ||
|
||
// TODO: add some tests for functions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't mind to add tests before merge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered it but I figured it will be easier to do a big testing push at the end. We've only got a few sample documents and entering the data manually would be painful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, it needs to happen eventually - it would just be helpful to have the full picture of how encoding works together in mind while doing that work.
} | ||
|
||
#[cfg(feature = "encoding")] | ||
fn split_at_bom<'b>(bytes: &'b [u8], encoding: &'static Encoding) -> (&'b [u8], &'b [u8]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems redundant, because first part is not used anywhere. Why just cut off the beginning, as was before, is not enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, but I was thinking that A) may want to add some variants that return the BOM the same way that we provide StartText
and B) at the very least it would be useful for testing.
On the other hand, I kinda feel like both StartText
and returning the BOM have limited utility in practice. But it feels like an open question.
I'll leave it as-is for now but I wouldn't be be upset if we end up removing it later.
Move the core functionality for encoding / decoding to a new module and provide some freestanding utilities for decoding buffers.