Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for different encodings #210

Merged
merged 7 commits into from
Jul 27, 2021
Merged

Add support for different encodings #210

merged 7 commits into from
Jul 27, 2021

Conversation

ghaith
Copy link
Collaborator

@ghaith ghaith commented Jul 24, 2021

Closes #203

Add an '--encoding' flag to the CLI options to allow setting a custom
encoding.

When no encoding is passed, attempt to read the encoding from BOM
Tread as UTF-8 if all else fails.
This uses the encoding_rs_io crate to do this

@codecov
Copy link

codecov bot commented Jul 24, 2021

Codecov Report

Merging #210 (63b07dc) into master (862f5e7) will increase coverage by 0.34%.
The diff coverage is 97.04%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #210      +/-   ##
==========================================
+ Coverage   95.86%   96.20%   +0.34%     
==========================================
  Files          54       55       +1     
  Lines       21044    21180     +136     
==========================================
+ Hits        20174    20377     +203     
+ Misses        870      803      -67     
Impacted Files Coverage Δ
src/compile_error.rs 22.22% <0.00%> (-1.59%) ⬇️
src/cli.rs 95.70% <100.00%> (+0.53%) ⬆️
src/lib.rs 86.17% <100.00%> (+45.94%) ⬆️
tests/correctness/external_functions.rs 100.00% <100.00%> (ø)
tests/integration/external_files.rs 100.00% <100.00%> (ø)
tests/tests.rs 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 862f5e7...63b07dc. Read the comment docs.

Add an '--encoding' flag to the CLI options to allow setting a custom
encoding.

When no encoding is passed, attempt to read the encoding from BOM
Tread as UTF-8 if all else fails.
This uses the encoding_rs_io crate to do this
src/lib.rs Outdated
where
T: Read,
{
fn load_source(&mut self, encoding: Option<&'static Encoding>) -> Result<SourceCode, String> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why mut? this looks strange to me

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has to do with supporting Read traits. I'm trying to get rid of it again. We might need to re-think the SourceContainer strategy here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you have a different idea for the SourceContainer I'm very open ... they did not turn out ultra-elegant :-)

src/lib.rs Outdated

/// SourceContainers offer source-code to be compiled via the load_source function.
/// Furthermore it offers a location-String used when reporting diagnostics.
pub trait SourceContainer {
/// loads and returns the SourceEntry that contains the SourceCode and the path it was loaded from
fn load_source(&self) -> Result<SourceCode, String>;
fn load_source(&mut self, encoding: Option<&'static Encoding>) -> Result<SourceCode, String>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why this suddenly gets mut when we introduce encoding?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

read() expects a mutable reader, so if we want to implement for a "Read" trait, we need it to be mutable.
I'll try to look into another strategy here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, the only reason I support Read here is for the tests. So I could probably move this into integration tests that load files instead.

src/lib.rs Outdated
.map_err(|err| format!("{:}", err))?;
Ok(SourceCode {
source: buffer,
path: "local".into(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this only used in tests? This may cause confusion if it ever appears in the production mode

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I wanted to have the actual reading of the strings shared with productive code, so I added "local" for this one. I am not a fan honestly.

@ghaith ghaith marked this pull request as draft July 24, 2021 09:54
Copy link
Collaborator

@riederm riederm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good change, 3 small requests

src/lib.rs Outdated
pub type Sources<'a> = [&'a dyn SourceContainer];
pub trait SourceLocation {
/// returns the location of this source-container. Used when reporting diagnostics.
fn get_location(&self) -> Option<&str>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed, lets force a location.
at the moment we don't see a situation where no location-string would be better than a dummy location string (e.g. "<in_memory>" or "<unit_test>".)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Traits are merged back, and we now have a method to get a String out of the reader.

src/lib.rs Outdated
/// returns the location of this source-container. Used when reporting diagnostics.
fn get_location(&self) -> &str;
fn load_source(self, encoding: Option<&'static Encoding>) -> Result<SourceCode, String>;
// fn get_location(&self) -> &str;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment stayed here because you wanted to think about joining the two traits again ;-)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed :)

@@ -255,6 +269,22 @@ mod cli_tests {
assert_eq!(parameters.output_format_or_default(), super::DEFAULT_FORMAT);
}

#[test]
fn encoding_resolution() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets add a test that defines the behavior if I ask for an unknown decoding via the cli

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@ghaith ghaith marked this pull request as ready for review July 26, 2021 20:57
@ghaith ghaith merged commit 5528671 into master Jul 27, 2021
@ghaith ghaith deleted the encoding branch July 27, 2021 06:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support different Encodings
2 participants