-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for different encodings #210
Conversation
Codecov Report
@@ Coverage Diff @@
## master #210 +/- ##
==========================================
+ Coverage 95.86% 96.20% +0.34%
==========================================
Files 54 55 +1
Lines 21044 21180 +136
==========================================
+ Hits 20174 20377 +203
+ Misses 870 803 -67
Continue to review full report at Codecov.
|
Add an '--encoding' flag to the CLI options to allow setting a custom encoding. When no encoding is passed, attempt to read the encoding from BOM Tread as UTF-8 if all else fails. This uses the encoding_rs_io crate to do this
src/lib.rs
Outdated
where | ||
T: Read, | ||
{ | ||
fn load_source(&mut self, encoding: Option<&'static Encoding>) -> Result<SourceCode, String> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why mut? this looks strange to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has to do with supporting Read traits. I'm trying to get rid of it again. We might need to re-think the SourceContainer strategy here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you have a different idea for the SourceContainer I'm very open ... they did not turn out ultra-elegant :-)
src/lib.rs
Outdated
|
||
/// SourceContainers offer source-code to be compiled via the load_source function. | ||
/// Furthermore it offers a location-String used when reporting diagnostics. | ||
pub trait SourceContainer { | ||
/// loads and returns the SourceEntry that contains the SourceCode and the path it was loaded from | ||
fn load_source(&self) -> Result<SourceCode, String>; | ||
fn load_source(&mut self, encoding: Option<&'static Encoding>) -> Result<SourceCode, String>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why this suddenly gets mut when we introduce encoding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
read() expects a mutable reader, so if we want to implement for a "Read" trait, we need it to be mutable.
I'll try to look into another strategy here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, the only reason I support Read here is for the tests. So I could probably move this into integration tests that load files instead.
src/lib.rs
Outdated
.map_err(|err| format!("{:}", err))?; | ||
Ok(SourceCode { | ||
source: buffer, | ||
path: "local".into(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this only used in tests? This may cause confusion if it ever appears in the production mode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I wanted to have the actual reading of the strings shared with productive code, so I added "local" for this one. I am not a fan honestly.
Remove the dyn trait as it's not required in this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good change, 3 small requests
src/lib.rs
Outdated
pub type Sources<'a> = [&'a dyn SourceContainer]; | ||
pub trait SourceLocation { | ||
/// returns the location of this source-container. Used when reporting diagnostics. | ||
fn get_location(&self) -> Option<&str>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as discussed, lets force a location.
at the moment we don't see a situation where no location-string would be better than a dummy location string (e.g. "<in_memory>" or "<unit_test>".)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Traits are merged back, and we now have a method to get a String out of the reader.
src/lib.rs
Outdated
/// returns the location of this source-container. Used when reporting diagnostics. | ||
fn get_location(&self) -> &str; | ||
fn load_source(self, encoding: Option<&'static Encoding>) -> Result<SourceCode, String>; | ||
// fn get_location(&self) -> &str; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment stayed here because you wanted to think about joining the two traits again ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed :)
@@ -255,6 +269,22 @@ mod cli_tests { | |||
assert_eq!(parameters.output_format_or_default(), super::DEFAULT_FORMAT); | |||
} | |||
|
|||
#[test] | |||
fn encoding_resolution() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets add a test that defines the behavior if I ask for an unknown decoding via the cli
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Closes #203
Add an '--encoding' flag to the CLI options to allow setting a custom
encoding.
When no encoding is passed, attempt to read the encoding from BOM
Tread as UTF-8 if all else fails.
This uses the encoding_rs_io crate to do this