Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce DiplomatStr #367

Merged
merged 3 commits into from
Nov 22, 2023
Merged

Introduce DiplomatStr #367

merged 3 commits into from
Nov 22, 2023

Conversation

robertbastian
Copy link
Collaborator

@robertbastian robertbastian commented Nov 22, 2023

#57

The existing usage of str in Diplomat is that it basically represents WTF8 on the ABI. This PR pushes the validation from the diplomat glue into the Rust library, which might be able to handle invalid UTF-8, so validation might not be necessary.

My plan is to add DiplomatStr16 next. If we want to we can add str back, but this time the glue code would do an unsafe unwrap, and these APIs should only be enabled if the caller can do validation, and should be documented to be UB if used incorrectly.

@Manishearth
Copy link
Contributor

This and the other PR should not be called WTF-whatever, WTF-8 is very specific encoding that has a documented way of storing unpaired surrogates.

This is unvalidated UTF-8, it should be called DiplomatStr and the other one DimplomatStr16.

@robertbastian
Copy link
Collaborator Author

ok

@robertbastian robertbastian changed the title Introduce DiplomatWtf8 Introduce DiplomatStr Nov 22, 2023
@Manishearth Manishearth merged commit 60e78d1 into rust-diplomat:main Nov 22, 2023
5 checks passed
Copy link
Contributor

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if people still have str in their diplomat code?

@robertbastian
Copy link
Collaborator Author

  • We'll make a breaking change
  • I'm planning to add str back, but with loud UB warnings on the C/C++ side

@Manishearth
Copy link
Contributor

They have to upgrade. We've already made a macro-breaking change with the char changes, and both of these bring us closer to the world that matches reality: the strings and characters you get over Diplomat have not been validated and should not be relied on as-is.

@Manishearth
Copy link
Contributor

Manishearth commented Nov 22, 2023

I'd rather add str back after 2.0 when everyone is on the HIR backends because then we can gate this on a CFG-feature. Not strongly opposed to sooner.

We should also have a prefers = DiplomatStr / prefers = DiplomatStr16 CFG thing so that we can hide UTF8 APIs on JS (and rename the utf16 ones)

@robertbastian
Copy link
Collaborator Author

We should also have a prefers = DiplomatStr / prefers = DiplomatStr16 CFG thing so that we can hide UTF8 APIs on JS (and rename the utf16 ones)

I'm doing this manually for Dart at the moment, I'm not sure if automating it with more config is worth the effort for us at least.

@Manishearth
Copy link
Contributor

Not in the short run, no. In the short run we should do exactly what you're doing.

Post 2.0 when we've gotten rid of all AST backends we should design diplomat CFGs that cover all the use cases that crop up and move to that, I'd like to not have the bridge mention backends for any reason other than "this backend does not support this feature currently but could".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants