Should Base64 decoder have a return type of Uint8List? #31784

Closed
jfphilbin opened this issue Jan 5, 2018 · 13 comments
Labels
area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. library-convert

Comments

@jfphilbin

In Dart version 2.0.0-dev.15.0 the following does not work:

BASE64.decode(s)?.buffer?.asInt32List();

Fixing this would only require changing the declaration List<int> convert in the Base64Decoder class to Uint8List convert. The _Base64.decode method already has type Uint8List.

Is there some reason I'm missing for not doing this?

@lexaknyazev
Contributor

Also dart:io has similar cases (e.g. #31547).

@kevmoo kevmoo added area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. library-convert labels Jan 7, 2018
@lrhn
Member

lrhn commented Jan 8, 2018

I think it would be reasonable to change the return type to Uint8List. The Base64Decoder.convert already says that it returns a Uint8List, and it's a valid override of the Codec<List<int>,String>.decode method.

Fair warning, though - the buffer of the returned Uint8List might be bigger than the actual contents, so you probably want to do:

var bytes = BASE64.decode(s);
var ints = bytes.buffer.asInt32List(0, bytes.length ~/ 4);

so you have the correct number of 32-bit integers.
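A minimal sketch of the caveat above, assuming a current Dart SDK where base64Decode is available. The helper name decodeToInt32List is hypothetical. Rather than relying on the offset and capacity of the decoder's buffer, it copies the bytes into a fresh list first, trading one extra allocation for a buffer that starts at offset 0 and holds exactly the decoded bytes:

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Hypothetical helper: reinterprets decoded base-64 bytes as 32-bit
/// integers in the host's byte order. Trailing bytes that do not fill a
/// whole 32-bit integer are ignored.
Int32List decodeToInt32List(String source) {
  final Uint8List bytes = base64Decode(source);
  // The copy owns a buffer of exactly bytes.length bytes starting at
  // offset 0, so neither alignment nor extra capacity is a concern.
  final Uint8List exact = Uint8List.fromList(bytes);
  return exact.buffer.asInt32List(0, exact.length ~/ 4);
}

void main() {
  // Eight bytes, i.e. two 32-bit integers.
  final encoded = base64Encode([1, 0, 0, 0, 2, 0, 0, 0]);
  final ints = decodeToInt32List(encoded);
  print(ints.length); // 2
  print(ints); // The values depend on the host's endianness.
}
```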

@lexaknyazev
Contributor

lexaknyazev commented Jan 8, 2018

I think it would be reasonable to change the return type to Uint8List.

@lrhn Could other similar cases be changed as well?

  • dart:convert:

    • AsciiCodec, Latin1Codec, Utf8Codec (with their encoders) - all return an instance of Uint8List.
  • dart:io:

    • File.readAsBytesSync, File.readAsBytes, File.readSync, File.read return Uint8List (or a Future with it).
    • File.openRead returns a Stream of Uint8List.
    • BytesBuilder.takeBytes and BytesBuilder.toBytes return Uint8List.
    • ZLibCodec, GZipCodec (with encoders) return Uint8List.
    • RandomAccessFile.readSync, RandomAccessFile.read probably return Uint8List (or a Future with it).
    • Stdin should probably extend Stream<Uint8List>.
    • SystemEncoding.encoder should probably be of Converter<String, Uint8List> type.
    • RawSocket.read, Socket.listen, Datagram.data should probably also use Uint8List instead of List<int>.

@lrhn
Member

lrhn commented Jan 18, 2018

It's probably possible to make AsciiDecoder return a Uint8List, and similarly for the other decoders, but we can't change the type of AsciiCodec from being a Codec<String, List<int>> (through Encoding) since we want to accept any list of integers in the encoding part.
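To illustrate the asymmetry described above, here is a small sketch. It assumes the dart:convert ascii codec of a current SDK; the runtime type check reflects the observation earlier in this thread that the encoders produce Uint8List instances, and it is not a guarantee of the static API:

```dart
import 'dart:convert';
import 'dart:typed_data';

void main() {
  // The byte side of the codec is typed List<int>, so any list of
  // integers is accepted when turning bytes back into a string.
  final List<int> plain = [72, 105];
  print(ascii.decode(plain)); // Hi

  // Going from string to bytes, the static type seen through the
  // Codec interface stays List<int>, whatever the runtime object is.
  final List<int> encoded = ascii.encode('Hi');
  print(encoded is Uint8List);
}
```

If the codec were typed as Codec<String, Uint8List>, the first call would no longer accept a plain list literal.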

@lexaknyazev
Contributor

Yet another case:
UriData.contentAsBytes() always returns Uint8List but declares List<int> as return type.

@jfphilbin
Author

jfphilbin commented Mar 12, 2018

The new top level function base64Decode has the old signature:
List<int> base64Decode(String source)
shouldn't it be changed to:
Uint8List base64Decode(String source)
so that it agrees with base64.decode?

@lrhn
Member

lrhn commented Mar 12, 2018

It should, good catch!

@mathieujobin

Why not return a string? That's what most languages do:

String base64Decode(String source)

Doesn't that make sense?

@lrhn
Member

lrhn commented Jul 2, 2018

A base-64 text represents a sequence of bytes. That's why decoding it returns a list of bytes.
Nothing requires that sequence of bytes to represent string content. It might just be bytes.

You can convert a sequence of bytes to a string again, but there is no single unique way to do that, and any one way we pick might turn out to be wrong.

You could choose to always do UTF-8 decoding of the bytes into a string, but then it would fail if the bytes are not valid UTF-8 (not all byte sequences are). Or you could do LATIN-1 decoding, then all byte sequences are valid, but you might not get what you want (if you wanted UTF-8).

So, it's better to create the bytes and let the user tell us the correct interpretation of those bytes, than it is to guess and sometimes guess wrong.
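The difference between the two guesses can be sketched as follows. The helper tryDecodeUtf8 is a hypothetical name introduced for illustration; utf8, latin1 and base64Decode are from dart:convert:

```dart
import 'dart:convert';

/// Hypothetical helper: returns the UTF-8 interpretation of [bytes],
/// or null when the bytes are not well-formed UTF-8.
String? tryDecodeUtf8(List<int> bytes) {
  try {
    return utf8.decode(bytes);
  } on FormatException {
    return null;
  }
}

void main() {
  // '/w==' is the base-64 form of the single byte 0xFF, which can
  // never occur in well-formed UTF-8.
  final bytes = base64Decode('/w==');

  print(tryDecodeUtf8(bytes)); // null
  // LATIN-1 accepts every byte value; 0xFF becomes U+00FF.
  print(latin1.decode(bytes)); // ÿ
}
```

Both calls operate on the same bytes; only the caller knows which interpretation, if either, is the intended one.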

@mathieujobin

Thank you very much for this excellent answer. It does make sense, and it is right on point with the problem I have. My string is neither UTF-8 nor Latin-1, but BINARY!
How do I get back a binary string from the bytes? (I'm new to Dart.)

thank you

@lrhn
Member

lrhn commented Jul 2, 2018

You have to first explain what a binary string is.

Dart strings are sequences of unsigned 16-bit integers. Your bytes are unsigned 8-bit integers, so you have to say which 16-bit value/character each byte should correspond to.

You can embed the 8-bit integers in a string, using the byte value as the low eight bits of the 16-bit code unit. That is actually what the LATIN-1 decoder does, because LATIN-1 is exactly the first 256 Unicode code points. So: String byteString = latin1.decode(base64Decode(inputString));

If you started with a Python b'some text' string, then it is likely equivalent to a LATIN-1 string. At least, a LATIN-1 decoding will preserve all the bytes.
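As a sketch of that round trip, the following shows that each byte survives as one code unit and can be recovered again. The function names binaryStringFromBase64 and bytesFromBinaryString are hypothetical, chosen only for this example:

```dart
import 'dart:convert';

/// Hypothetical helper: one code unit per decoded byte.
String binaryStringFromBase64(String source) =>
    latin1.decode(base64Decode(source));

/// Hypothetical helper: recovers the bytes from such a string.
/// Throws if the string contains a code unit above 0xFF.
List<int> bytesFromBinaryString(String text) => latin1.encode(text);

void main() {
  final original = [0, 127, 128, 255];
  final text = binaryStringFromBase64(base64Encode(original));

  print(text.length); // 4
  print(text.codeUnits); // [0, 127, 128, 255]
  print(bytesFromBinaryString(text)); // [0, 127, 128, 255]
}
```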

@mathieujobin

Thank you, this is very helpful

@lrhn
Member

lrhn commented Oct 2, 2020

It now returns Uint8List.
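A short sketch of what this enables, assuming a current SDK in which both base64.decode and base64Decode are declared to return Uint8List. The helper firstInt32LittleEndian is a hypothetical name. It uses ByteData.sublistView, which reads only the bytes of the view and has no alignment requirement:

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Hypothetical helper: reads the first four decoded bytes as a
/// little-endian 32-bit integer.
int firstInt32LittleEndian(String source) {
  // No cast is needed: the result is statically a Uint8List.
  final Uint8List bytes = base64.decode(source);
  return ByteData.sublistView(bytes).getInt32(0, Endian.little);
}

void main() {
  // 'AAECAw==' encodes the bytes 0, 1, 2, 3.
  print(firstInt32LittleEndian('AAECAw==')); // 50462976 (0x03020100)
}
```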

@lrhn lrhn closed this as completed Oct 2, 2020

5 participants