Add syntax extension fourcc!() #12034
Wouldn't it be better to limit the character values to codepoints 0-255 instead of ASCII? I think there are several places where this could be used (as a literal for file magic numbers) where the ability to insert a value in the range 128-255 would be useful. |
@yuriks Using what encoding? The problem with non-ASCII is you have to declare some encoding to support. And strings in Rust are all UTF-8, which is unusable for this purpose as you need a single-byte encoding. |
You can directly map the code points from 0 to 255 to their byte values and
I believe that a syntax extension like this is the perfect candidate for a loadable syntax extension. It seems a little odd to me to bake this into the compiler itself, but I agree that it's definitely useful! |
@alexcrichton Sounds good, I'll make it loadable. Should loadable syntax extensions live elsewhere? I think the only ones we have so far are in |
Right now we don't have any examples of loadable syntax extensions in the distribution (in terms of procedural syntax extensions), but I'd be more amenable to |
Loadable is a good idea. @yuriks When you say "map the code points [...] to their byte values", map it in what encoding? And you can't use hex escapes either, because those are interpreted by the lexer. |
Source will be parsed as UTF-8, like all other Rust source. What I mean is simply that, for example, Unicode code point 190 will be interpreted as the byte 190, with no special encoding. This is the same as what you're effectively doing for ASCII, but with a larger allowed range. What do you mean about not being able to use hex escapes? Does "foo\xBE" not work in a literal inside a macro?
@yuriks Ok, so what you're saying is interpret it as ISO-Latin-1, because that happens to be the first block of Unicode (codepoints 128-255), and then just throw an error for any characters that fall out of this range? As for hex escapes, the lexer will see |
@kballard Yes, that's what I'm proposing. I don't really see it as an extra encoding/interpretation step: you're just casting the u32 characters to a u8 byte, erroring if they're out of range. This requires a very slight change in the interpretation, to iterate over the string's characters instead of its bytes. I don't think this interpretation is particularly surprising, and it adds flexibility to the macro. Another thing occurred to me: I think the macro should default to big-endian regardless of the architecture's endianness. The u32 value should be the same for a given fourcc on any architecture. Where they differ is during serialization, which is already well handled elsewhere.
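As an illustration of the mapping proposed above, here is a minimal sketch in current Rust rather than the syntax-extension API this PR targets. The function name `fourcc_be` and its error messages are hypothetical and are not taken from the PR. It iterates over characters rather than bytes, rejects any code point above 255, and packs the four resulting bytes in big-endian order.

```rust
/// Hypothetical helper (not the PR's implementation): packs a
/// four-character string into a u32 in big-endian order, treating each
/// code point in 0..=255 as the byte with the same value.
fn fourcc_be(s: &str) -> Result<u32, String> {
    // Iterate over characters, not UTF-8 bytes, so that U+00BE counts
    // as one unit even though it occupies two bytes in the source.
    let chars: Vec<char> = s.chars().collect();
    if chars.len() != 4 {
        return Err(format!("expected 4 characters, found {}", chars.len()));
    }

    let mut value: u32 = 0;
    for c in chars {
        let code_point = u32::from(c);
        if code_point > 0xFF {
            return Err(format!(
                "code point U+{:04X} is outside the range 0-255",
                code_point
            ));
        }
        // The first character ends up in the most significant byte.
        value = (value << 8) | code_point;
    }
    Ok(value)
}
```

Under these assumptions, `fourcc_be("RIFF")` yields `0x52494646` on every architecture, and a string containing a code point above 255 is rejected instead of being silently truncated.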
@yuriks Right now the interpretation is it produces the same byte sequence in memory on all architectures. I'm open to the idea of defaulting to big endian, although we'd want to introduce a new identifier |
The original idea was that |
The specific way it should be represented depends a lot on the use case. FourCCs that will be stored in a struct that is expected to be compared with memcmp would probably use target order; integer comparisons, on the other hand, want the value to be independent of the target's endianness, so no one size fits all. That said, I think defaulting to non-platform-specific behaviour is the sanest default.
(It's implied in my last comment that I agree that it needs a 'target' identifier. I'm less sure of the need for 'host', however.) |
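To make the trade-off above concrete, the following sketch uses the standard library's `u32::from_be_bytes`, `u32::from_le_bytes`, and `u32::from_ne_bytes` in current Rust. The function name `fourcc_variants` is hypothetical and only serves the illustration; it is not part of the PR.

```rust
/// Hypothetical helper: returns the (big-endian, little-endian,
/// target-native) u32 interpretations of the same four bytes.
fn fourcc_variants(tag: [u8; 4]) -> (u32, u32, u32) {
    (
        // Same integer on every architecture; reads in source order.
        u32::from_be_bytes(tag),
        // Same integer on every architecture; bytes reversed.
        u32::from_le_bytes(tag),
        // Integer value depends on the target, but its in-memory
        // bytes always equal `tag`, which suits memcmp-style checks.
        u32::from_ne_bytes(tag),
    )
}
```

For `*b"RIFF"` the big-endian value is `0x52494646` and the little-endian value is `0x46464952` on any target, while the native-order value matches whichever of the two corresponds to the target's byte order.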
Yeah, I'm not sure about |
I updated the PR with a todo list. I'm going to start by converting it to a loadable syntax extension and defaulting to big endian. @yuriks @kballard If either of you have some spare cycles and want to add a |
I can do those. I assume it's best to wait until you convert it to an extension, right?
Why big endian? Little seems more common in computer systems, unless you are using PPCs. |
Big endian converts to an integer in the same order that it reads: |
@yuriks Feel free to do it at your leisure. Converting it to a loadable extension won't significantly change the structure, so merging it in shouldn't be a problem. |
@dguenther I made a PR on your fork changing the default, adding |
Thanks! I'll merge that in tonight once I get home, then make it loadable. |
Syntax extension is loadable and lives in libfourcc. @alexcrichton Is the feature gate still necessary if it's loaded? |
You can remove the feature gate because it's covered under the |
// option. This file may not be copied, modified, or distributed
// except according to those terms.

#[feature(phase)];
For now, all of the `feature(phase)` tests need `// xfail-stage1` for bootstrapping reasons.
@alexcrichton I think I've dealt with these issues. Is it okay for the tests to stay where they are? |
To load the extension and use it:

```rust
#[feature(phase)]
```
This'll need a semicolon after it I believe.
I was getting a doc test failure with the semicolon:
run doc-fourcc [x86_64-apple-darwin]
running 1 test
<anon>:5:18: 5:19 error: expected item but found `;`
<anon>:5 #[feature(phase)];
I think removing the semicolon fixed it, but I'll run tests again to check. Is there a good way to run just the doc tests, for example?
Oh hm, I forgot that.
For now, I think you can leave out the `feature` directive because I don't think that there's a great way to add it otherwise.
@alexcrichton Hmm, if I remove the feature directive, how would I mute the error to prevent the test failure?
```rust,ignore
// code
```
This looks great! Thanks for the work! Just a small comment, and otherwise r=me |
All right, I resolved that last comment. r? @alexcrichton |
Thanks, and nice work! |
Ah I also forgot that I think for you you'll need to |
Sounds good, tests now have |
@alexcrichton Hm, I'm not sure on this one. Any ideas? I'm looking for past examples of the error now.
I'm pretty sure this is #12102 |
For now, you can |
Sounds good. Tests now have |
Hm, now that's a curious error. I'm not entirely sure how For now, I would hazard a guess that |
Well, at least we're finding a few bugs. I've added |
Dunno if this'll work, but I'll let bors sort it out! |
The unused imports didn't cause the failure in this one, did they? |
No, we're still getting the assertion: |
@alexcrichton Was the LLVM assert the thing that you cc'd me on or was it something else? I haven't seen that before. If you're testing something that uses a |
Ah. Nothing special happens with |
fourcc!() allows you to embed FourCC (or OSType) values that are evaluated as u32 literals. It takes a 4-byte ASCII string and produces the u32 that results from interpreting those 4 bytes as a u32, using either the platform-native endianness, or explicitly big or little endian.
It was decided that a consistent result across platforms would be the most useful and least surprising. A "target" option has been added to get the old behaviour of using the target platform's endianness.
Codepoints 128-255 will be interpreted as bytes with their raw codepoint value ('\xAB' -> 0xABu8, etc.). Codepoints > 255 remain forbidden.
Hmm... I added an empty test to the |
I was looking into #9303 and was curious if this would still be valuable. @kballard had already done 99% of the work, so I brought the branch up to date and added a feature gate. Any feedback would be appreciated; I wasn't sure if this should be set up as a syntax extension with `#[macro_registrar]`, and if so, where it should be located.

Original PR is here: #9255

TODO:
* [x] Convert to loadable syntax extension
* [x] Default to big endian
* [x] Add `target` identifier
* [x] Expand to include code points 128-255