Add syntax extension fourcc!() #12034

dguenther · 2014-02-04T22:31:17Z

I was looking into #9303 and was curious if this would still be valuable. @kballard had already done 99% of the work, so I brought the branch up to date and added a feature gate. Any feedback would be appreciated; I wasn't sure if this should be set up as a syntax extension with #[macro_registrar], and if so, where it should be located.

Original PR is here: #9255

TODO:

Convert to loadable syntax extension
Default to big endian
Add target identifier
Expand to include code points 128-255

yuriks · 2014-02-05T02:26:59Z

Wouldn't it be better to limit the character values to codepoints 0-255 instead of ASCII? I think there are several places where this could be used (as a literal for file magic numbers) where the ability to insert a value in the range 128-255 would be useful.

lilyball · 2014-02-05T07:00:40Z

@yuriks Using what encoding? The problem with non-ASCII is you have to declare some encoding to support. And strings in Rust are all UTF-8, which is unusable for this purpose as you need a single-byte encoding.

yuriks · 2014-02-05T07:30:55Z

You can directly map the code points from 0 to 255 to their byte values and
disallow the rest. The user can then use hex escapes to output those bytes.

alexcrichton · 2014-02-05T17:18:06Z

I believe that a syntax extension like this is the perfect candidate for a loadable syntax extension. It seems a little odd to me to bake this into the compiler itself, but I agree that it's definitely useful!

dguenther · 2014-02-05T18:00:16Z

@alexcrichton Sounds good, I'll make it loadable. Should loadable syntax extensions live elsewhere? I think the only ones we have so far are in libstd/macros.rs.

alexcrichton · 2014-02-05T18:32:47Z

Right now we don't have any examples of loadable syntax extensions in the distribution (in terms of procedural syntax extensions), but I'd be more amenable to libfourcc for now as an example.

lilyball · 2014-02-05T19:53:18Z

Loadable is a good idea.

@yuriks When you say "map the code points [...] to their byte values", map it in what encoding? And you can't use hex escapes either, because those are interpreted by the lexer.

yuriks · 2014-02-05T20:21:57Z

Source will be parsed as UTF-8, like all other Rust source. What I mean is simply that, for example, Unicode code point 190 will be interpreted as the byte 190, with no special encoding. This is the same as what you're effectively doing for ASCII, but with a larger allowed range. What do you mean about not being able to use hex escapes? Does "foo\xBE" not work in a literal inside a macro?

lilyball · 2014-02-05T20:24:50Z

@yuriks Ok, so what you're saying is interpret it as ISO-Latin-1, because that happens to be the first block of Unicode (codepoints 128-255), and then just throw an error for any characters that fall out of this range?

As for hex escapes, the lexer will see "foo\xBE" and interpret the \xBE while constructing the AST. The syntax extension will only see the already-parsed string. And for some reason Rust treats \x escapes as if they were \u escapes (but only reading 2 digits instead of 4), so this is equivalent to "foo\u00BE". Of course, if you're declaring that the string should be converted into ISO-Latin-1 before being turned into a fourcc, then that does effectively turn \xBE back into the appropriate byte.

yuriks · 2014-02-05T20:35:36Z

@kballard Yes, that's what I'm proposing. I don't really see it as an extra encoding/interpretation step, you're just casting the u32 characters to a u8 byte, erroring if they're out of range. This requires a very slight change in the interpretation to iterate over the string's characters instead of the bytes.

I don't think this interpretation is particularly surprising and it adds flexibility to the macro.
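The mapping proposed above can be sketched in present-day Rust. This is not the code from the PR (the thread predates Rust 1.0), and `fourcc_bytes` is a hypothetical helper name. It iterates over the string's characters rather than its UTF-8 bytes, casts each code point in the range 0-255 to a byte, and rejects anything larger. Current Rust no longer accepts `"\xBE"` in a string literal, so a caller would write `"\u{BE}"` instead.

```rust
/// Hypothetical helper: maps a 4-character string to 4 raw bytes by
/// treating each code point in 0..=255 as the byte of the same value.
fn fourcc_bytes(s: &str) -> Result<[u8; 4], String> {
    let mut bytes = [0u8; 4];
    let mut count = 0;
    // Iterate over characters (code points), not UTF-8 bytes.
    for c in s.chars() {
        let cp = c as u32;
        if cp > 0xFF {
            return Err(format!("code point U+{:04X} is above U+00FF", cp));
        }
        if count == 4 {
            return Err("string has more than 4 characters".to_string());
        }
        bytes[count] = cp as u8;
        count += 1;
    }
    if count != 4 {
        return Err(format!("expected 4 characters, found {}", count));
    }
    Ok(bytes)
}
```

With this sketch, `fourcc_bytes("fo\u{BE}o")` yields the bytes `66 6F BE 6F`, while a string containing `\u{100}` is rejected.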

Another thing occurred to me: I think the macro should default to big-endian regardless of the architecture's endianness. The u32 value should be the same for a given fourcc on any architecture. Where they differ is during serialization, which is already well handled elsewhere.

lilyball · 2014-02-05T20:39:52Z

@yuriks Right now the interpretation is it produces the same byte sequence in memory on all architectures. I'm open to the idea of defaulting to big endian, although we'd want to introduce a new identifier target that means "use the target endian" (and possibly even host to mean "use the host endian").

lilyball · 2014-02-05T20:40:50Z

The original idea was that fourcc!("foo ") would produce the same 4 bytes in memory regardless of target endianness, but I don't remember anymore why I thought that was preferable to picking a default endianness.

yuriks · 2014-02-05T20:45:53Z

The specific way it should be represented depends a lot on the specific use case. FourCCs that will be stored in a struct that is expected to be compared with memcmp would probably use target order. Integer comparisons, on the other hand, want the value to be independent of the target's endianness, so no one size fits all. That said, I think defaulting to non-platform-specific behaviour is the sanest default.
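The trade-off between the two representations can be illustrated with the standard `u32::from_be_bytes`, `from_le_bytes`, and `from_ne_bytes` functions from current Rust. The function names below are hypothetical and are not part of the PR; this is only a sketch of the two use cases being discussed.

```rust
/// Big-endian: the integer value is the same on every architecture,
/// which suits integer comparisons.
fn fourcc_big_endian(bytes: [u8; 4]) -> u32 {
    u32::from_be_bytes(bytes)
}

/// Little-endian: also architecture-independent, with the byte order reversed.
fn fourcc_little_endian(bytes: [u8; 4]) -> u32 {
    u32::from_le_bytes(bytes)
}

/// Target order: the integer value depends on the target, but the bytes
/// in memory are always the original four, which suits memcmp-style use.
fn fourcc_target(bytes: [u8; 4]) -> u32 {
    u32::from_ne_bytes(bytes)
}
```

For `*b"foo "` the big-endian value is `0x666F6F20` everywhere, whereas the target-order value round-trips through `to_ne_bytes()` to the same four bytes on any platform.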

yuriks · 2014-02-05T20:46:56Z

(It's implied in my last comment that I agree that it needs a 'target' identifier. I'm less sure of the need for 'host', however.)

lilyball · 2014-02-05T20:47:53Z

Yeah, I'm not sure about host, I just suggested it because it's a thing that can be done. I have no idea if there's any actual use for it. Probably not. So let's drop that idea.

dguenther · 2014-02-06T15:53:31Z

I updated the PR with a todo list. I'm going to start by converting it to a loadable syntax extension and defaulting to big endian. @yuriks @kballard If either of you have some spare cycles and want to add a target identifier or handle the code point expansion, that'd be helpful. Otherwise, I might try to land the PR and open issues for those.

yuriks · 2014-02-06T17:00:24Z

I can do those. I assume it's best to wait until you convert it to an extension, right?

adrientetar · 2014-02-06T17:51:01Z

Why big endian? Little seems more common in computer systems, unless you are using PPCs.

yuriks · 2014-02-06T17:56:52Z

Big endian converts to an integer in the same order that it reads: fourcc!("FOO ") == 0x464F4F20u32. Given that the purpose here is converting strings to integers, not sequences of bytes, the fact that common architectures store integers as little endian sequences of bytes doesn't have much bearing, in my opinion.
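The arithmetic behind that equality can be spelled out with shifts. This is a sketch in current Rust, assuming a hypothetical `const fn` named `fourcc_be` rather than the syntax extension itself: the first character lands in the most significant byte, so the hex digits read in the same order as the string.

```rust
/// Hypothetical const helper: packs four bytes into a u32 with the
/// first byte in the most significant position (big-endian reading order).
const fn fourcc_be(code: &[u8; 4]) -> u32 {
    ((code[0] as u32) << 24)
        | ((code[1] as u32) << 16)
        | ((code[2] as u32) << 8)
        | (code[3] as u32)
}

// 'F' = 0x46, 'O' = 0x4F, 'O' = 0x4F, ' ' = 0x20
const FOO: u32 = fourcc_be(b"FOO ");
```

Because the helper is a `const fn`, the result can be used wherever a constant is expected, much like the literal the macro expands to.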

dguenther · 2014-02-06T18:08:57Z

@yuriks Feel free to do it at your leisure. Converting it to a loadable extension won't significantly change the structure, so merging it in shouldn't be a problem.

yuriks · 2014-02-06T22:39:09Z

@dguenther I made a PR on your fork changing the default, adding target and allowing 128-255.

dguenther · 2014-02-06T22:53:18Z

Thanks! I'll merge that in tonight once I get home, then make it loadable.

dguenther · 2014-02-07T17:38:32Z

Syntax extension is loadable and lives in libfourcc. @alexcrichton Is the feature gate still necessary if it's loaded?

alexcrichton · 2014-02-07T17:43:51Z

You can remove the feature gate because it's covered under the feature(phase) gate.

alexcrichton · 2014-02-07T17:44:58Z

src/test/compile-fail/gated-fourcc.rs

+// option. This file may not be copied, modified, or distributed
+// except according to those terms.
+
+#[feature(phase)];


For now, all of the feature(phase) tests need // xfail-stage1 for bootstrapping reasons.

dguenther · 2014-02-07T21:19:18Z

@alexcrichton I think I've dealt with these issues. Is it okay for the tests to stay where they are?

alexcrichton · 2014-02-07T21:36:28Z

src/libfourcc/lib.rs

+To load the extension and use it:
+
+```rust
+#[feature(phase)]


This'll need a semicolon after it I believe.

I was getting a doc test failure with the semicolon:

run doc-fourcc [x86_64-apple-darwin]
running 1 test
<anon>:5:18: 5:19 error: expected item but found `;`
<anon>:5 #[feature(phase)];

I think removing the semicolon fixed it, but I'll run tests again to check. Is there a good way to run just the doc tests, for example?

Oh hm, I forgot that.

For now, I think you can leave out the feature directive because I don't think that there's a great way to add it otherwise.

@alexcrichton Hmm, if I remove the feature directive, how would I mute the error to prevent the test failure?

Open the fence with `rust,ignore` instead of `rust`.

alexcrichton · 2014-02-07T21:38:47Z

This looks great! Thanks for the work!

Just a small comment, and otherwise r=me

dguenther · 2014-02-07T23:31:53Z

All right, I resolved that last comment. r? @alexcrichton

alexcrichton · 2014-02-08T02:57:12Z

Thanks, and nice work!

alexcrichton · 2014-02-08T03:37:29Z

Ah, I also forgot: I think you'll need to xfail-pretty the tests.

dguenther · 2014-02-08T06:42:53Z

Sounds good, tests now have xfail-pretty. Hopefully that solves the error.

dguenther · 2014-02-08T14:31:30Z

@alexcrichton Hm, I'm not sure on this one. Any ideas? I'm looking for past examples of the error now.

/home/rustbuild/src/rust-buildbot/slave/auto-linux-64-x-android-t/build/src/test/run-pass/syntax-extension-fourcc.rs:18:1: 18:19 error: /home/rustbuild/src/rust-buildbot/slave/auto-linux-64-x-android-t/build/obj/x86_64-unknown-linux-gnu/stage2/lib/rustlib/arm-linux-androideabi/lib/libfourcc-9d1510d3-0.10-pre.so: wrong ELF class: ELFCLASS32
/home/rustbuild/src/rust-buildbot/slave/auto-linux-64-x-android-t/build/src/test/run-pass/syntax-extension-fourcc.rs:18 extern mod fourcc;
                                                                                                                        ^~~~~~~~~~~~~~~~~~

yuriks · 2014-02-08T16:33:51Z

I'm pretty sure this is #12102

alexcrichton · 2014-02-08T19:49:31Z

For now, you can // xfail-android the tests (until #12102 is fixed)

dguenther · 2014-02-08T20:07:16Z

Sounds good. Tests now have // xfail-android

alexcrichton · 2014-02-08T23:40:36Z

Hm, now that's a curious error. I'm not entirely sure how #[macro_registrar] factors into testing a crate. (cc @sfackler).

For now, I would hazard a guess that #[cfg(not(test))] on the registrar may help things, but that's just a guess (I've never seen this error before).

dguenther · 2014-02-09T00:14:39Z

Well, at least we're finding a few bugs. I've added #[cfg(not(test))] to the macro registrar.

alexcrichton · 2014-02-09T00:15:20Z

Dunno if this'll work, but I'll let bors sort it out!

dguenther · 2014-02-09T03:46:35Z

The unused imports didn't cause the failure in this one, did they?

yuriks · 2014-02-09T03:52:42Z

No, we're still getting the assertion: Assertion failed: Section->Number != -1 && "Sections with relocations must be real!", file c:/bot/slave/auto-win-32-opt/build/src/llvm/lib/MC/WinCOFFObjectWriter.cpp, line 224

sfackler · 2014-02-09T04:46:00Z

@alexcrichton Was the LLVM assert the thing that you cc'd me on or was it something else? I haven't seen that before.

If you're testing something that uses a #[phase(syntax)] crate, you'll need to put it in run-pass-fulldeps to make sure you have an up-to-date libsyntax, and mark it xfail-fast and xfail-stage1.

alexcrichton · 2014-02-09T04:49:50Z

@sfackler I was just curious if anything special was done during testing for feature(phase) or anything related.

This looks like it's #10872 haunting us again. Perhaps adding a dummy test which doesn't do anything will help? (The test case triggering this assert is built with --test but has no tests.)

sfackler · 2014-02-09T04:58:39Z

Ah. Nothing special happens with phase(syntax) stuff when compiling tests. I don't think (?) anything special should need to be done. It's a bit strange that this would be failing even though something like run-pass-fulldeps/macro-crate.rs works fine.

fourcc!() allows you to embed FourCC (or OSType) values that are evaluated as u32 literals. It takes a 4-byte ASCII string and produces the u32 that results from interpreting those 4 bytes as a u32, using either the platform-native endianness or an explicitly chosen big or little endianness.
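The behaviour described here, combined with the big-endian default and the `target` identifier agreed on in the thread, can be approximated with a declarative macro in current Rust. This is only a sketch under stated assumptions: `fourcc_sketch!` is a hypothetical name, it takes a byte-string literal instead of a string, and the exact argument syntax of the real extension is not shown in this thread.

```rust
// Hypothetical stand-in for the procedural syntax extension.
// It takes a byte-string literal such as b"foo "; a literal that is not
// exactly 4 bytes long fails to type-check against [u8; 4].
macro_rules! fourcc_sketch {
    // No endianness given: default to big-endian.
    ($code:literal) => { fourcc_sketch!($code, big) };
    ($code:literal, big) => { u32::from_be_bytes(*$code) };
    ($code:literal, little) => { u32::from_le_bytes(*$code) };
    // `target` keeps the in-memory byte order of the compilation target.
    ($code:literal, target) => { u32::from_ne_bytes(*$code) };
}
```

Bytes in the range 128-255 are written with `\x` escapes in the byte string, for example `fourcc_sketch!(b"\xC0\xFF\xEE!")`.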

dguenther · 2014-02-09T05:43:34Z

Hmm... I added an empty test to the lib.rs file and moved the run-pass test into run-pass-fulldeps. Worth another shot?


lilyball and others added 4 commits February 8, 2014 23:40

Converted fourcc! to loadable syntax extension

97078d4

Default fourcc! to big-endian.

6381daa

It was decided that a consistent result across platforms would be the most useful and least surprising. A "target" option has been added to get the old behaviour of using the target platform's endianness.

Allow codepoints 128-255 in fourcc!

337e62e

Codepoints with those values will be interpreted as bytes with their raw codepoint value. ('\xAB' -> 0xABu8, etc.) Codepoints > 255 remain forbidden.

bors closed this Feb 9, 2014

bors merged commit 337e62e into rust-lang:master Feb 9, 2014

dguenther deleted the fourcc branch February 10, 2014 18:06
