Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Base64 XML variants #16586

Closed
wants to merge 2 commits into from
Closed

Conversation

gioele
Copy link
Contributor

@gioele gioele commented Aug 18, 2014

These two commits implement two variants of Base64 used when the result is to be used as part of an XML element name or as a NMTOKEN.

The second commit changes the signature of the main method in the FromBase64 trait, as the existing code did not take config into account and relied on an assumptions that are not true (i.e. that every Base64 charset uses - for 0x3e and _ for 0x3f).

@alexcrichton
Copy link
Member

Due to the change to from_base64, could you modify the commit message in accordance with our breaking changes policy?

@sfackler
Copy link
Member

It seems strange for from_base64 to take a Config, since only 1 of the 3 fields matters.

gioele added 2 commits August 19, 2014 08:08
This commit changes the signature of the main method in the FromBase64
trait, as the existing code did not take config into account and relied
on an assumptions that are not true for all configs (i.e. that every
Base64 charset uses `-` for 0x3e and `_` for 0x3f).

This change makes the base64 API more symmetric and allows for future
changes to `from_base64` to make it more RFC-compliant.

[breaking-change]
@gioele
Copy link
Contributor Author

gioele commented Aug 19, 2014

@alexcrichton: thanks for the reminder. I have now changed the commits to mention the API change as described in policy and pushed the new commits.

@sfackler: it is not that strange. First it brings back a bit of simmetry to the base64 API. But more importantly, the fact that it uses only one field is because it is not yet complete: in order to be RFC compliant it should raise an error when it sees no newline in a config that requires the presence of newlines.

@gioele
Copy link
Contributor Author

gioele commented Aug 28, 2014

May I have a review on this minor PR, now that the commit messages have been fixed as requested?

@alexcrichton
Copy link
Member

Thanks for updating the commit message @gioele, that looks great!

I'm also a little uncomfortable in putting the config in the commonly-used from_base64 method, although I also think that it is somewhat necessary.

Looking around at other languages, it looks like Ruby has separate methods for each flavor of encode/decode, Go has separate encoder/decoder structures, Java has separate methods, and python has separate methods.

Usability-wise I don't think passing in a config is the best way to go, and it seems like the trends in other languages is not to have one method that takes configuration.

@gioele
Copy link
Contributor Author

gioele commented Aug 30, 2014

Thanks for the review, @alexcrichton.

I agree that having a mandatory Config parameter in from_base64 is not very good usability-wise. But at least it mirrors the fact that to_base64 requires the same parameter, applying the priciple of least surprise.

Talking of API in other design, I would like this method to work like Fixnum#to_s in Ruby. When called without any argument, to_s will produce a string "as expected"

42.to_s # => "42"

If one wants to convert that number to a string, but using another base, they can add the base parameter to it.

42.to_s(base=16) # => "2a"

This kind of API requires the possibility to have default parameters, something that Rust does not have at the moment IIUC.

Two concrete possibility for the API of from/to_base64 may be

  1. a generic method to_base64_with_config(&self, config: Config) and a shortcut for the most common case to_base64(&self) {self.to_base64(STANDARD) }.
  2. same as 1. plus separate methods for each encoding config, e.g., to_base64_url, to_base64_xmlnames.

I think this is the best kind of API as long as default parameters cannot be used.

Should I go forth and implement one of these designs?

@alexcrichton
Copy link
Member

Sorry for being a little slow to getting back to you! It looks like the trend of other languages is to either have a separate encoder/decoder for each type of base64 encoding (a separate trait for us) or a separate pair of methods.

I personally find the from_base64 method confusing as I always forget what the "from" part refers to. I would tend to want to rename this encode/decode as it's clear that the receiver of the method is being encoded or it's being decoded (as opposed to "from" referring to the return value). I'm sure that @aturon would know about other conventions in the standard library as well. The ToHex/FromHex traits would also need to be affected.

So I suppose to speak concretely, I would advocate for something like:

trait Base64Decode {
    fn decode_base64(...);
    fn decode_base64_url(...);
    fn decode_base64_xml(...);
}

Or something along the lines of that.

This should get another opinion though as I'm sure others would differ!

@aturon
Copy link
Member

aturon commented Sep 4, 2014

The use of from_ here does not follow the usual conventions, which use from_ only for constructors. Given that these methods are used as part of serialization, I think encode and decode are a good pairing (rather than trying to shoehorn the conversion method conventions).

As far as Config goes, the Java API uses something a bit like a builder strategy -- there are three entry points to get an encoder (vanilla, MIME, and url), but then the encoder itself has a builder-style method for setting the padding. We've been exploring the builder pattern in Rust, and it might be worth considering something similar here.

In any case, I agree with @sfackler that the decoding methods shouldn't take a Config structure unless all of the fields are relevant, and with @alexcrichton that we should have an ergonomic default. I'd personally be happy with a builder-style interface, or with convenience method(s) that wrap a more general method that take DecodeConfig or some such.

@gioele
Copy link
Contributor Author

gioele commented Sep 5, 2014

I see the Builder patter overcomplex for a Base64 transformation, especially because it would force the addition of one method for each supported type from which you can encode/decode. And BTW, the Java API is not exactly a model to follow for ergonomic interfaces. :)

Here is my proposal, based on what @alexcrichton suggested: a Base64{En,De}code trait for the basic {en,de}coding and other traits for specialized conversions. I propose this split to avoid having big Base64{En,De}code traits that:

  1. force all implementing types to implement many different functions,
  2. require breaking the trait to add new configurations.
trait Base64Decode {
    fn decode_base64(...);
    fn decode_base64_with_config(...);
}

trait Base64DecodeXML {
    fn decode_base64_xml_name(...);
    fn decode_base64_xml_nmtoken(...);
}

trait Base64DecodeURL { ... }

@aturon
Copy link
Member

aturon commented Sep 10, 2014

@gioele Sounds good to me!

@alexcrichton
Copy link
Member

Closing due to inactivity, but feel free to reopen with an updated PR!

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants