Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add syntax extension fourcc!() #12034

Merged
merged 4 commits into from
Feb 9, 2014
Merged

Add syntax extension fourcc!() #12034

merged 4 commits into from
Feb 9, 2014

Conversation

dguenther
Copy link
Contributor

I was looking into #9303 and was curious if this would still be valuable. @kballard had already done 99% of the work, so I brought the branch up to date and added a feature gate. Any feedback would be appreciated; I wasn't sure if this should be set up as a syntax extension with #[macro_registrar], and if so, where it should be located.

Original PR is here: #9255

TODO:

  • Convert to loadable syntax extension
  • Default to big endian
  • Add target identifier
  • Expand to include code points 128-255

@yuriks
Copy link
Contributor

yuriks commented Feb 5, 2014

Wouldn't it be better to limit the character values to codepoints 0-255 instead of ASCII? I think there are several places where this could be used (as a literal for file magic numbers) where the ability to insert a value in the range 128-255 would be useful.

@lilyball
Copy link
Contributor

lilyball commented Feb 5, 2014

@yuriks Using what encoding? The problem with non-ASCII is you have to declare some encoding to support. And strings in Rust are all UTF-8, which is unusable for this purpose as you need a single-byte encoding.

@yuriks
Copy link
Contributor

yuriks commented Feb 5, 2014

You can directly map the code points from 0 to 255 to their byte values and
disallow the rest. The user can then use hex escapes to output those bytes.
On Feb 5, 2014 5:01 AM, "Kevin Ballard" notifications@github.com wrote:

@yuriks https://github.com/yuriks Using what encoding? The problem with
non-ASCII is you have to declare some encoding to support. And strings in
Rust are all UTF-8, which is unusable for this purpose as you need a
single-byte encoding.


Reply to this email directly or view it on GitHubhttps://github.com//pull/12034#issuecomment-34142171
.

@alexcrichton
Copy link
Member

I believe that a syntax extension like this is the perfect candidate for a loadable syntax extension. It seems a little odd to me to bake this into the compiler itself, but I agree that it's definitely useful!

@dguenther
Copy link
Contributor Author

@alexcrichton Sounds good, I'll make it loadable. Should loadable syntax extensions live elsewhere? I think the only ones we have so far are in libstd/macros.rs.

@alexcrichton
Copy link
Member

Right now we don't have any examples of loadable syntax extensions in the distribution (in terms of procedural syntax extensions), but I'd be more amenable to libfourcc for now as an example.

@lilyball
Copy link
Contributor

lilyball commented Feb 5, 2014

Loadable is a good idea.

@yuriks When you say "map the code points [...] to their byte values", map it in what encoding? And you can't use hex escapes either, because those are interpreted by the lexer.

@yuriks
Copy link
Contributor

yuriks commented Feb 5, 2014

Source will be parsed as UTF-8, as all other rust source. What I mean is simply that, for example, unicode code-point 190 will be interpreted as the byte 190, no special encoding. This is the same what you're effectively doing for ASCII, but with a larger allowed range. What do you mean about not being able to use hex escapes? Does "foo\xBE" not work in a literal inside a macro?

@lilyball
Copy link
Contributor

lilyball commented Feb 5, 2014

@yuriks Ok, so what you're saying is interpret it as ISO-Latin-1, because that happens to be the first block of Unicode (codepoints 128-255), and then just throw an error for any characters that fall out of this range?

As for hex escapes, the lexer will see "foo\xBE" and interpret the \xBE while constructing the AST. The syntax extension will only see the already-parsed string. And for some reason Rust treats \x escapes as if they were \u escapes (but only reading 2 digits instead of 4), so this is equivalent to "foo\u00BE". Of course, if you're declaring that the string should be converted into ISO-Latin-1 before being turned into a fourcc, then that does effectively turn \xBE back into the appropriate byte.

@yuriks
Copy link
Contributor

yuriks commented Feb 5, 2014

@kballard Yes, that's what I'm proposing. I don't really see it as an extra encoding/interpretation step, you're just casting the u32 characters to a u8 byte, erroring if they're out of range. This requires a very slight change in the interpretation to iterate over the string's characters instead of the bytes.

I don't think this interpretation is particularly surprising and it adds flexibility to the macro.

Another thing occurred to me: I think the macro should default to big-endian regardless of the architcture's endianess. The u32 value should be the same for a given fourcc in any architecture. Where they differ is during serialization, which is already well handled elsewhere.

@lilyball
Copy link
Contributor

lilyball commented Feb 5, 2014

@yuriks Right now the interpretation is it produces the same byte sequence in memory on all architectures. I'm open to the idea of defaulting to big endian, although we'd want to introduce a new identifier target that means "use the target endian" (and possibly even host to mean "use the host endian").

@lilyball
Copy link
Contributor

lilyball commented Feb 5, 2014

The original idea was that fourcc!("foo ") would produce the same 4 bytes in memory regardless of target endianness, but I don't remember anymore why I thought that was preferable to picking a default endianness.

@yuriks
Copy link
Contributor

yuriks commented Feb 5, 2014

The specific way it should be represented depends a lot on the specific use case. FourCCs that will be stored in a struct that is expected to memcmp would probably use target order, on the other hand, integer comparisons want it to be independent of the target's endianess, so no size fits all. That said I think defaulting to non-platform specific behaviour is the sanest default.

@yuriks
Copy link
Contributor

yuriks commented Feb 5, 2014

(It's implied in my last comment that I agree that it needs a 'target' identifier. I'm less sure of the need for 'host', however.)

@lilyball
Copy link
Contributor

lilyball commented Feb 5, 2014

Yeah, I'm not sure about host, I just suggested it because it's a thing that can be done. I have no idea if there's any actual use for it. Probably not. So let's drop that idea.

@dguenther
Copy link
Contributor Author

I updated the PR with a todo list. I'm going to start by converting it to a loadable syntax extension and defaulting to big endian. @yuriks @kballard If either of you have some spare cycles and want to add a target identifier or handle the code point expansion, that'd be helpful. Otherwise, I might try to land the PR and open issues for those.

@yuriks
Copy link
Contributor

yuriks commented Feb 6, 2014

I can do those. I assume it's best to way until you convret it to a extension, right?

@adrientetar
Copy link
Contributor

Why big endian? Little seems more common in computer systems, unless you are using PPCs.

@yuriks
Copy link
Contributor

yuriks commented Feb 6, 2014

Big endian converts to an integer in the same order that it reads: fourcc!("FOO ") == 0x464F4F20u32. Given that the purpose here is converting strings to integers, not sequences of bytes, the fact that common architectures store integers as little endian sequences of bytes doesn't have much bearing, in my opinion.

@dguenther
Copy link
Contributor Author

@yuriks Feel free to do it at your leisure. Converting it to a loadable extension won't significantly change the structure, so merging it in shouldn't be a problem.

@yuriks
Copy link
Contributor

yuriks commented Feb 6, 2014

@dguenther I made a PR on your fork changing the default, adding target and allowing 128-255.

@dguenther
Copy link
Contributor Author

Thanks! I'll merge that in tonight once I get home, then make it loadable.

@dguenther
Copy link
Contributor Author

Syntax extension is loadable and lives in libfourcc. @alexcrichton Is the feature gate still necessary if it's loaded?

@alexcrichton
Copy link
Member

You can remove the feature gate because it's covered under the feature(phase) gate.

// option. This file may not be copied, modified, or distributed
// except according to those terms.

#[feature(phase)];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, all of the feature(phase) tests need // xfail-stage1 for bootstrapping reasons.

@dguenther
Copy link
Contributor Author

@alexcrichton I think I've dealt with these issues. Is it okay for the tests to stay where they are?

To load the extension and use it:

```rust
#[feature(phase)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This'll need a semicolon after it I believe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was getting a doc test failure with the semicolon:

run doc-fourcc [x86_64-apple-darwin]

running 1 test
<anon>:5:18: 5:19 error: expected item but found `;`
<anon>:5 #[feature(phase)];

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think removing the semicolon fixed it, but I'll run tests again to check. Is there a good way to run just the doc tests, for example?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh hm, I forgot that.

For now, I think you can leave out the feature directive because I don't think that there's a great way to add it otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexcrichton Hmm, if I remove the feature directive, how would I mute the error to prevent the test failure?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

 ```rust,ignore
 // code

@alexcrichton
Copy link
Member

This looks great! Thanks for the work!

Just a small comment, and otherwise r=me

@dguenther
Copy link
Contributor Author

All right, I resolved that last comment. r? @alexcrichton

@alexcrichton
Copy link
Member

Thanks, and nice work!

@alexcrichton
Copy link
Member

Ah I also forgot that I think for you you'll need to xfail-pretty the tests.

@dguenther
Copy link
Contributor Author

Sounds good, tests now have xfail-pretty. Hopefully that solves the error.

@dguenther
Copy link
Contributor Author

@alexcrichton Hm, I'm not sure on this one. Any ideas? I'm looking for past examples of the error now.

/home/rustbuild/src/rust-buildbot/slave/auto-linux-64-x-android-t/build/src/test/run-pass/syntax-extension-fourcc.rs:18:1: 18:19 error: /home/rustbuild/src/rust-buildbot/slave/auto-linux-64-x-android-t/build/obj/x86_64-unknown-linux-gnu/stage2/lib/rustlib/arm-linux-androideabi/lib/libfourcc-9d1510d3-0.10-pre.so: wrong ELF class: ELFCLASS32
/home/rustbuild/src/rust-buildbot/slave/auto-linux-64-x-android-t/build/src/test/run-pass/syntax-extension-fourcc.rs:18 extern mod fourcc;
                                                                                                                        ^~~~~~~~~~~~~~~~~~

@yuriks
Copy link
Contributor

yuriks commented Feb 8, 2014

I'm pretty sure this is #12102

@alexcrichton
Copy link
Member

For now, you can // xfail-android the tests (until #12102 is fixed)

@dguenther
Copy link
Contributor Author

Sounds good. Tests now have // xfail-android

@alexcrichton
Copy link
Member

Hm, now that's a curious error. I'm not entirely sure how #[macro_registrar] factors into testing a crate. (cc @sfackler).

For now, I would hazard a guess that #[cfg(not(test))] on the registrar may help things, but that's just a guess (I've never seen this error before).

@dguenther
Copy link
Contributor Author

Well, at least we're finding a few bugs. I've added #[cfg(not(test))] to the macro registrar.

@alexcrichton
Copy link
Member

Dunno if this'll work, but I'll let bors sort it out!

@dguenther
Copy link
Contributor Author

The unused imports didn't cause the failure in this one, did they?

@yuriks
Copy link
Contributor

yuriks commented Feb 9, 2014

No, we're still getting the assertion: Assertion failed: Section->Number != -1 && "Sections with relocations must be real!", file c:/bot/slave/auto-win-32-opt/build/src/llvm/lib/MC/WinCOFFObjectWriter.cpp, line 224

@sfackler
Copy link
Member

sfackler commented Feb 9, 2014

@alexcrichton Was the LLVM assert the thing that you cc'd me on or was it something else? I haven't seen that before.

If you're testing something that uses a #[phase(syntax)] crate, you'll need to put it in run-pass-fulldeps to make sure you have an up to date libsyntax, xfail-fast and xfail-stage1 it.

@alexcrichton
Copy link
Member

@sfackler I was just curious if anything special was done during testing for feature(phase) or anything related.

This look like it's #10872 haunting us again. Perhaps adding a dummy test which doesn't do anything will help this? (the test case triggering this assert has a --test but not tests.

@sfackler
Copy link
Member

sfackler commented Feb 9, 2014

Ah. Nothing special happens with phase(syntax) stuff when compiling tests. I don't think (?) anything special should need to be done. It's a bit strange that this would be failing even though something like run-pass-fulldeps/macro-crate.rs works fine.

lilyball and others added 4 commits February 8, 2014 23:40
fourcc!() allows you to embed FourCC (or OSType) values that are
evaluated as u32 literals. It takes a 4-byte ASCII string and produces
the u32 resulting in interpreting those 4 bytes as a u32, using either
the platform-native endianness, or explicitly as big or little endian.
It was decided that a consistent result across platforms would be the
most useful and least surprising. A "target" option has been added to
get the old behaviour of using the target platform's endianess.
Codepoints with those values will be interpreted as bytes with their
raw codepoint value. ('\xAB' -> 0xABu8, etc.) Codepoints > 255 remain
forbidden.
@dguenther
Copy link
Contributor Author

Hmm... I added an empty test to the lib.rs file and moved the run-pass test into run-pass-fulldeps. Worth another shot?

bors added a commit that referenced this pull request Feb 9, 2014
I was looking into #9303 and was curious if this would still be valuable. @kballard had already done 99% of the work, so I brought the branch up to date and added a feature gate. Any feedback would be appreciated; I wasn't sure if this should be set up as a syntax extension with `#[macro_registrar]`, and if so, where it should be located.

Original PR is here: #9255

TODO:
* [x] Convert to loadable syntax extension
* [x] Default to big endian
* [x] Add `target` identifier
* [x] Expand to include code points 128-255
@bors bors closed this Feb 9, 2014
@bors bors merged commit 337e62e into rust-lang:master Feb 9, 2014
@dguenther dguenther deleted the fourcc branch February 10, 2014 18:06
flip1995 pushed a commit to flip1995/rust that referenced this pull request Feb 26, 2024
Fix issue rust-lang#12034: add autofixes for unnecessary_fallible_conversions

fixes rust-lang#12034

Currently, the `unnecessary_fallible_conversions` lint was capable of autofixing expressions like `0i32.try_into().unwrap()`. However, it couldn't autofix expressions in the form of `i64::try_from(0i32).unwrap()` or `<i64 as TryFrom<i32>>::try_from(0).unwrap()`.

This pull request extends the functionality to correctly autofix these latter forms as well.

changelog: [`unnecessary_fallible_conversions`]: Add autofixes for more forms
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants