Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[breaking change] Make utf8.encode() / Utf8Codec.encode() return more precise Uint8List return type #52801

Closed
mkustermann opened this issue Jun 28, 2023 · 9 comments
Assignees
Labels
area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. breaking-change-approved breaking-change-request This tracks requests for feedback on breaking changes

Comments

@mkustermann
Copy link
Member

Change Intent

Make Utf8Codec.encode() return a Uint8List instead of a List<int>.

Justification

This change will to allow compilers to better optimize code which is accessing the bytes. It also allows being explicit about the guarantees we make: We already guarantee to return a Uint8List but it's not visible in the return type. Some users then do explicit downcasts via utf8.encode(...) as Uint8List, others pass around List<int> which can result in inefficient code.

Impact

The impact will be more precision in static types and smaller code size / higher performance is some cases. It may break existing code if such code is relying on type inference to infer List<int> and would fail if it would now infer Uint8List.

Mitigation

Explicitly pass List<int> as type arguments in places where the automatically inferred Uint8List type is incorrect.

@mkustermann mkustermann added the breaking-change-request This tracks requests for feedback on breaking changes label Jun 28, 2023
@lrhn
Copy link
Member

lrhn commented Jun 28, 2023

This is a follow up on #36900 which intended to make that change, but ended up changing Utf8Encoder.convert instead.

(I do not remember the reason for not changing Utf8Codec.encode at the time, or whether it was just an oversight.)

@itsjustkevin
Copy link
Contributor

@vsmenon @grouma and @Hixie for breaking change review.

@Hixie
Copy link
Contributor

Hixie commented Jun 28, 2023

Very LGTM, been asking for this for years :-)

@kevmoo
Copy link
Member

kevmoo commented Jun 28, 2023

Ditto. Super useful change.

mkustermann added a commit to mkustermann/flutter-packages that referenced this issue Jun 29, 2023
To avoid analyzer warnings when `utf8.encode()` will return the more
precise `Uint8List` type, we use `const Utf8Encoder().convert()` which
already returns `Uint8List`

See dart-lang/sdk#52801
mkustermann added a commit to mkustermann/engine that referenced this issue Jun 29, 2023
To avoid analyzer warnings when utf8.encode() will return the more precise Uint8List type, we use const Utf8Encoder().convert() which already returns Uint8List

See dart-lang/sdk#52801
mkustermann added a commit to flutter/engine that referenced this issue Jun 29, 2023
To avoid analyzer warnings when utf8.encode() will return the more
precise Uint8List type, we use const Utf8Encoder().convert() which
already returns Uint8List

See dart-lang/sdk#52801
@grouma
Copy link
Member

grouma commented Jun 29, 2023

Explicitly pass List as type arguments in places where the automatically inferred Uint8List type is incorrect.

Do we have an idea of how many instances will require this mitigation? For internal developers, I assume the fixes will be driven by the Dart team?

@mkustermann
Copy link
Member Author

Do we have an idea of how many instances will require this mitigation? For internal developers, I assume the fixes will be driven by the Dart team?

The issue with inference - which will make your code compile & run differently seems very rare - I have seen 2 cases so far.

What is more common is that code performs an explicit downcast via utf8.encode(...) as Uint8List. This doesn't compile or run code differently, but will produce static "unnecessary cast" analyzer warnings.

My plan would be to handle those 2 issues preemptively in g3, dart sdk & flutter related repos before landing the actual change.

@grouma
Copy link
Member

grouma commented Jun 29, 2023

SGTM and LGTM.

mkustermann added a commit to flutter/flutter that referenced this issue Jun 29, 2023
)

To avoid analyzer warnings when utf8.encode() will return the more
precise Uint8List type, we use const Utf8Encoder().convert() which
already returns Uint8List

See dart-lang/sdk#52801
@vsmenon
Copy link
Member

vsmenon commented Jun 30, 2023

lgtm

auto-submit bot pushed a commit to flutter/packages that referenced this issue Jun 30, 2023
To avoid analyzer warnings when `utf8.encode()` will return the more precise `Uint8List` type, we use `const Utf8Encoder().convert()` which already returns `Uint8List`

See dart-lang/sdk#52801
@a-siva a-siva added the area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. label Jul 5, 2023
mkustermann added a commit to mkustermann/dart_cli_script that referenced this issue Jul 6, 2023
A breaking change (see [0]) will make `utf8.encode()` return the more precise
`Uint8List` type (instead of the current `List<int>`).

In rare circumstances this can lead to changes in behavior, mainly when
code relies on type inference, a different type got inferred and the code
dependend on the type not being inferred a more precise type.

Here we explicitly use `Stream<List<int>>` instead of relying on type
inference (which would infer `Stream<Uint8List>` in some cases after
[0]). This is necessary as the stream transformer APIs cannot work with
subtypes. Example of code that fails at runtime:

```
  import 'dart:typed_data';
  import 'dart:convert';

  void main() {
    Stream<Uint8List> stream = Stream.fromIterable([]);
    Stream<List<int>> stream2 = stream;
    stream2.transform(utf8.decoder);
         //  ^^^ Will throw due to Utf8Decoder not being subtype of
         //      StreamTransformer<Uint8List, String>.
  }
```

[0] dart-lang/sdk#52801
nex3 added a commit to google/dart_cli_script that referenced this issue Jul 6, 2023
A breaking change (see [0]) will make `utf8.encode()` return the more precise
`Uint8List` type (instead of the current `List<int>`).

In rare circumstances this can lead to changes in behavior, mainly when
code relies on type inference, a different type got inferred and the code
dependend on the type not being inferred a more precise type.

Here we explicitly use `Stream<List<int>>` instead of relying on type
inference (which would infer `Stream<Uint8List>` in some cases after
[0]). This is necessary as the stream transformer APIs cannot work with
subtypes. Example of code that fails at runtime:

```
  import 'dart:typed_data';
  import 'dart:convert';

  void main() {
    Stream<Uint8List> stream = Stream.fromIterable([]);
    Stream<List<int>> stream2 = stream;
    stream2.transform(utf8.decoder);
         //  ^^^ Will throw due to Utf8Decoder not being subtype of
         //      StreamTransformer<Uint8List, String>.
  }
```

[0] dart-lang/sdk#52801

Co-authored-by: Natalie Weizenbaum <nweiz@google.com>
copybara-service bot pushed a commit that referenced this issue Jul 11, 2023
Right now `utf8.encode()` has a static return type of `List<int>`
due to extending `Encoding` (which extends `Codec<String, List<int>>`).

We cannot easily change `Encoding` to extend `Codec<String, Uint8List>`
because that would also change `utf8.decode()` to require `Uint8List`
which would be a breaking change.

So instead we override `utf8.encode()` to have more precise return type.

Some parts of our SDK are run using the checked-in SDK, so it cannot
rely on the changed return type yet (until checked-in SDK is rolled).

So we use `const Utf8Encoder().convert()` as a temporary change, as
that already has `Uint8List` return type.

Issue #52801

TEST=ci

CoreLibraryReviewExempt: More precise return type for existing API
Change-Id: I2861d1f0eb3d292d8e3ec8437c0d441a2d2bd193
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/254903
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Lasse Nielsen <lrn@google.com>
@mkustermann
Copy link
Member Author

The breaking change has landed now.

@github-project-automation github-project-automation bot moved this from In review to Complete in Breaking Changes Jul 11, 2023
mkustermann added a commit to mkustermann/engine that referenced this issue Jul 14, 2023
The change in [0] has propagated now everywhere, so we can use
`utf8.encode()` instead of the longer `const Utf8Encoder.convert()`.

[0] dart-lang/sdk#52801
mkustermann added a commit to mkustermann/engine that referenced this issue Jul 14, 2023
The change in [0] has propagated now everywhere, so we can use
`utf8.encode()` instead of the longer `const Utf8Encoder.convert()`.

Also it cleans up code like

```
  Uint8List bytes;
  bytes.buffer.asByteData();
```

as that is not guaranteed to be correct, the correct version would be

```
  Uint8List bytes;
  bytes.buffer.asByteData(bytes.offsetInBytes, bytes.length);
```

a shorter hand for that is:

```
  Uint8List bytes;
  ByteData.sublistView(bytes);
```

[0] dart-lang/sdk#52801
mkustermann added a commit to mkustermann/engine that referenced this issue Jul 14, 2023
The change in [0] has propagated now everywhere, so we can use
`utf8.encode()` instead of the longer `const Utf8Encoder.convert()`.

Also it cleans up code like

```
  Uint8List bytes;
  bytes.buffer.asByteData();
```

as that is not guaranteed to be correct, the correct version would be

```
  Uint8List bytes;
  bytes.buffer.asByteData(bytes.offsetInBytes, bytes.length);
```

a shorter hand for that is:

```
  Uint8List bytes;
  ByteData.sublistView(bytes);
```

[0] dart-lang/sdk#52801
mkustermann added a commit to flutter/engine that referenced this issue Jul 14, 2023
The change in [0] has propagated now everywhere, so we can use 
`utf8.encode()` instead of the longer `const Utf8Encoder.convert()`.

Also it cleans up code like

```
  Uint8List bytes;
  bytes.buffer.asByteData();
```

as that is not guaranteed to be correct, the correct version would be

```
  Uint8List bytes;
  bytes.buffer.asByteData(bytes.offsetInBytes, bytes.length);
```

a shorter hand for that is:

```
  Uint8List bytes;
  ByteData.sublistView(bytes);
```

[0] dart-lang/sdk#52801
kjlubick pushed a commit to kjlubick/engine that referenced this issue Jul 14, 2023
…ter#43335)

To avoid analyzer warnings when utf8.encode() will return the more
precise Uint8List type, we use const Utf8Encoder().convert() which
already returns Uint8List

See dart-lang/sdk#52801
kjlubick pushed a commit to kjlubick/engine that referenced this issue Jul 14, 2023
…ter#43675)

The change in [0] has propagated now everywhere, so we can use 
`utf8.encode()` instead of the longer `const Utf8Encoder.convert()`.

Also it cleans up code like

```
  Uint8List bytes;
  bytes.buffer.asByteData();
```

as that is not guaranteed to be correct, the correct version would be

```
  Uint8List bytes;
  bytes.buffer.asByteData(bytes.offsetInBytes, bytes.length);
```

a shorter hand for that is:

```
  Uint8List bytes;
  ByteData.sublistView(bytes);
```

[0] dart-lang/sdk#52801
harryterkelsen pushed a commit to harryterkelsen/engine that referenced this issue Jul 20, 2023
…ter#43675)

The change in [0] has propagated now everywhere, so we can use 
`utf8.encode()` instead of the longer `const Utf8Encoder.convert()`.

Also it cleans up code like

```
  Uint8List bytes;
  bytes.buffer.asByteData();
```

as that is not guaranteed to be correct, the correct version would be

```
  Uint8List bytes;
  bytes.buffer.asByteData(bytes.offsetInBytes, bytes.length);
```

a shorter hand for that is:

```
  Uint8List bytes;
  ByteData.sublistView(bytes);
```

[0] dart-lang/sdk#52801
mkustermann added a commit to mkustermann/flutter that referenced this issue Jul 24, 2023
The change in [0] has propagated now everywhere, so we can use
`utf8.encode()` instead of the longer `const Utf8Encoder.convert()`.

Also it cleans up code like

```
  TypedData bytes;
  bytes.buffer.asByteData();
```

as that is not guaranteed to be correct, the correct version would be

```
  TypedData bytes;
  bytes.buffer.asByteData(bytes.offsetInBytes, bytes.lengthInBytes);
```

a shorter hand for that is:

```
  TypedData bytes;
  ByteData.sublistView(bytes);
```

[0] dart-lang/sdk#52801
mkustermann added a commit to flutter/flutter that referenced this issue Jul 24, 2023
)

The change in [0] has propagated now everywhere, so we can use
`utf8.encode()` instead of the longer `const Utf8Encoder.convert()`.

Also it cleans up code like

```
  TypedData bytes;
  bytes.buffer.asByteData();
```

as that is not guaranteed to be correct, the correct version would be

```
  TypedData bytes;
  bytes.buffer.asByteData(bytes.offsetInBytes, bytes.lengthInBytes);
```

a shorter hand for that is:

```
  TypedData bytes;
  ByteData.sublistView(bytes);
```

[0] dart-lang/sdk#52801
copybara-service bot pushed a commit that referenced this issue Jul 24, 2023
The change in [0] has propagated now everywhere, so we can use
`utf8.encode()` instead of the longer `const Utf8Encoder.convert()`.

As the checked-in SDK has been rolled to include [0] we can now rely on
the better return type.

[0] #52801

TEST=ci

CoreLibraryReviewExempt: Minor cleanup.
Change-Id: I2c0144023e03b2c265582d83a7fb9469b02f1570
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/313563
Reviewed-by: Lasse Nielsen <lrn@google.com>
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ben Konyi <bkonyi@google.com>
LouiseHsu pushed a commit to LouiseHsu/flutter that referenced this issue Jul 31, 2023
…ter#130567)

The change in [0] has propagated now everywhere, so we can use
`utf8.encode()` instead of the longer `const Utf8Encoder.convert()`.

Also it cleans up code like

```
  TypedData bytes;
  bytes.buffer.asByteData();
```

as that is not guaranteed to be correct, the correct version would be

```
  TypedData bytes;
  bytes.buffer.asByteData(bytes.offsetInBytes, bytes.lengthInBytes);
```

a shorter hand for that is:

```
  TypedData bytes;
  ByteData.sublistView(bytes);
```

[0] dart-lang/sdk#52801
vashworth pushed a commit to vashworth/flutter that referenced this issue Aug 2, 2023
…ter#130567)

The change in [0] has propagated now everywhere, so we can use
`utf8.encode()` instead of the longer `const Utf8Encoder.convert()`.

Also it cleans up code like

```
  TypedData bytes;
  bytes.buffer.asByteData();
```

as that is not guaranteed to be correct, the correct version would be

```
  TypedData bytes;
  bytes.buffer.asByteData(bytes.offsetInBytes, bytes.lengthInBytes);
```

a shorter hand for that is:

```
  TypedData bytes;
  ByteData.sublistView(bytes);
```

[0] dart-lang/sdk#52801
Leptopoda added a commit to nextcloud/neon that referenced this issue Aug 19, 2023
While the entire utf8 converter has been switched to Uint8List the type annotation still remains List<int> for this release.
Adding the downcast as this behavior is what we need.

A future dart release should also change the type annotations triggering a linter rule.
see: dart-lang/sdk#52801
Signed-off-by: Nikolas Rimikis <rimikis.nikolas@gmail.com>
Leptopoda added a commit to nextcloud/neon that referenced this issue Aug 24, 2023
While the entire utf8 converter has been switched to Uint8List the type annotation still remains List<int> for this release.
Adding the downcast as this behavior is what we need.

A future dart release should also change the type annotations triggering a linter rule.
see: dart-lang/sdk#52801
Signed-off-by: Nikolas Rimikis <rimikis.nikolas@gmail.com>
gkc added a commit to atsign-foundation/at_client_sdk that referenced this issue Jan 15, 2024
…inter to ignore the unnecessary cast. This is to allow us to support dart versions both before and after 3.2.0, specifically for this breaking change dart-lang/sdk#52801 which was introduced in dart 3.2.0 - see https://github.com/dart-lang/sdk/blob/main/CHANGELOG.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. breaking-change-approved breaking-change-request This tracks requests for feedback on breaking changes
Projects
Status: Complete
Development

No branches or pull requests

8 participants