Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] add saltwater backend #1782

Closed
wants to merge 5 commits into from
Closed

[WIP] add saltwater backend #1782

wants to merge 5 commits into from

Conversation

jyn514
Copy link
Member

@jyn514 jyn514 commented May 14, 2020

Addresses #1762

Not expecting this to be merged, just looking for feedback on whether this is the right approach.

I forgot that the lexer returns its own Locations, so I'm using those just to get a prototype going. I can switch to using the proper clang locations later (although since they are never displayed maybe it won't matter?)

The lexer currently doesn't allow inputting a single token at a time so I hacked around it by treating the token span as if it were a whole file and assuming there was only one token in the file. I can fix this later, not sure yet how hard it will be on the rcc end.

All the test cases pass locally. Note that this already could fix #1594 if I removed the as i64 cast and instead returned a Literal from parse_int_literal_tokens. It looks like bindgen has a VarType so impl From<Literal> for VarType should be pretty simple. It would be nice to add an UnsignedInt variant as well, then the two would match up almost exactly (rcc has no literal boolean values).

What is parse_macro doing? It looks like it's supposed to parse the entire #define a 1 + 2 directive? Should it also perform macro replacement, and should that include function macros?

cc @emilio

Copy link
Contributor

@emilio emilio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this looks like a sensible approach to me :)

@jyn514

This comment has been minimized.

@jyn514

This comment has been minimized.

@jyn514
Copy link
Member Author

jyn514 commented May 14, 2020

I added parse_macro, which is even more of a work in progress since it doesn't pass in the currently parsed macros. This requires some work on the rcc side.

header_namespace_hpp will fail even if that's fixed: The test case looks like

#define NAMESPACE foobar
namespace NAMESPACE { ... }

It seems that cexpr used to return an error for that define, but rcc instead returns Ok(Literal::Int(0)). Have to look into why, it should be giving an error that foobar is undefined since this happens after preprocessing.

@jyn514
Copy link
Member Author

jyn514 commented May 20, 2020

I added a commit passing in the macro definitions. The failures look simpler to fix now:

diff
---- header_complex_h stdout ----
--- expected: "/home/joshua/src/rust/bindgen/tests/expectations/tests/complex.rs"
+++ generated from: "/home/joshua/src/rust/bindgen/tests/headers/complex.h"
+pub const COMPLEX_TEST: u32 = 0;
---- header_infinite_macro_h stdout ----
--- expected: "/home/joshua/src/rust/bindgen/tests/expectations/tests/infinite-macro.rs"
+++ generated from: "/home/joshua/src/rust/bindgen/tests/headers/infinite-macro.h"
-
-pub const INFINITY: f64 = ::std::f64::INFINITY;
-pub const NAN: f64 = ::std::f64::NAN;
---- header_issue_1676_macro_namespace_prefix_hpp stdout ----
--- expected: "/home/joshua/src/rust/bindgen/tests/expectations/tests/issue-1676-macro-namespace-prefix.rs"
+++ generated from: "/home/joshua/src/rust/bindgen/tests/headers/issue-1676-macro-namespace-prefix.hpp"
+
+pub const nssv_inline_ns: u32 = 0;
---- header_jsval_layout_opaque_1_0_hpp stdout ----
---- header_jsval_layout_opaque_hpp stdout ----
---- header_layout_array_h stdout ----
--- expected: "/home/joshua/src/rust/bindgen/tests/expectations/tests/layout_array.rs"
+++ generated from: "/home/joshua/src/rust/bindgen/tests/headers/layout_array.h"
+pub const __rte_aligned: u32 = 0;
+pub const __rte_cache_aligned: u32 = 0;
+pub const LIST_HEAD: u32 = 0;
---- header_layout_array_too_long_h stdout ----
--- expected: "/home/joshua/src/rust/bindgen/tests/expectations/tests/layout_array_too_long.rs"
+++ generated from: "/home/joshua/src/rust/bindgen/tests/headers/layout_array_too_long.h"
+pub const __rte_aligned: u32 = 0;
+pub const __rte_cache_aligned: u32 = 0;
+pub const TAILQ_ENTRY: u32 = 0;
---- header_layout_large_align_field_h stdout ----
--- expected: "/home/joshua/src/rust/bindgen/tests/expectations/tests/layout_large_align_field.rs"
+++ generated from: "/home/joshua/src/rust/bindgen/tests/headers/layout_large_align_field.h"
+pub const __rte_aligned: u32 = 0;
+pub const __rte_cache_aligned: u32 = 0;
+pub const TAILQ_HEAD: u32 = 0;
+pub const TAILQ_ENTRY: u32 = 0;
---- header_layout_mbuf_1_0_h stdout ----
--- expected: "/home/joshua/src/rust/bindgen/tests/expectations/tests/layout_mbuf_1_0.rs"
+++ generated from: "/home/joshua/src/rust/bindgen/tests/headers/layout_mbuf_1_0.h"
+pub const __rte_aligned: u32 = 0;
+pub const __rte_cache_aligned: u32 = 0;
+pub const __rte_cache_min_aligned: u32 = 0;
---- header_layout_mbuf_h stdout ----
--- expected: "/home/joshua/src/rust/bindgen/tests/expectations/tests/layout_mbuf.rs"
+++ generated from: "/home/joshua/src/rust/bindgen/tests/headers/layout_mbuf.h"
+pub const __rte_aligned: u32 = 0;
+pub const __rte_cache_aligned: u32 = 0;
+pub const __rte_cache_min_aligned: u32 = 0;
---- header_macro_const_h stdout ----
--- expected: "/home/joshua/src/rust/bindgen/tests/expectations/tests/macro_const.rs"
+++ generated from: "/home/joshua/src/rust/bindgen/tests/headers/macro_const.h"
-pub const foo: &'static [u8; 4usize] = b"bar\0";
+pub const foo: &'static [u8; 5usize] = b"bar\0\0";
-pub const INVALID_UTF8: [u8; 5usize] = [240u8, 40u8, 140u8, 40u8, 0u8];
+pub const INVALID_UTF8: [u8; 6usize] = [240u8, 40u8, 140u8, 40u8, 0u8, 0u8];
---- header_namespace_hpp stdout ----
--- expected: "/home/joshua/src/rust/bindgen/tests/expectations/tests/namespace.rs"
+++ generated from: "/home/joshua/src/rust/bindgen/tests/headers/namespace.hpp"
+    pub const NAMESPACE: u32 = 0;
-    pub mod _bindgen_mod_id_17 {
+    pub mod _bindgen_mod_id_18 {
-        pub _base: root::_bindgen_mod_id_17::A,
+        pub _base: root::_bindgen_mod_id_18::A,
  1. I'm adding a \0 character to the end of Token::Str, but bindgen already has those as part of the token.
  2. I'm generating more macros than cexpr was (e.g. COMPLEX_TEST and NAMESPACE). Need to investigate this more.
    • I mistakenly marked function-macros as 0, since I counted (a, b, ..., c) as a comma expression yielding the value of c, then substituted c for 0 and ignored the body. rcc now correctly gives an error if an #if has trailing characters after the expression.
    • I count #define a b as defining a to 0. This is the correct behavior according to the standard, but means bindgen generates more const items than before.
  3. I don't allow const-folding NaN and INFINITY, which is not correct:
<stdin>:1:11 error: invalid program: cannot divide by zero
float f = 1.0 / 0;
          ^^^^^^^
<stdin>:1:11 error: invalid program: cannot divide by zero
float f = 0.0 / 0;
          ^^^^^^^
  1. I give an overflow for header_jsval_layout_opaque_hpp and header_jsval_layout_opaque_1_0_hpp:
tests/headers/jsval_layout_opaque_1_0.hpp:117:71 error: invalid program: positive overflow in expresson
    JSVAL_SHIFTED_TAG_MAX_DOUBLE   = ((((uint64_t)JSVAL_TAG_MAX_DOUBLE) << JSVAL_TAG_SHIFT) | 0xFFFFFFFF),
                                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Also the location error is wrong here. Need to debug both of these issues. This is just jyn514/saltwater#205

@jyn514
Copy link
Member Author

jyn514 commented May 23, 2020

header_error_e0600_cannot_apply_unary_negation_to_u32_h seems to have fixed itself at some point, I think because it's in expression position and not in a macro.

@jyn514
Copy link
Member Author

jyn514 commented May 24, 2020

I found the reason for the overflow in header_jsval_layout_opaque_hpp and header_jsval_layout_opaque_hpp: I consider all enums to be at most i64, but that test case expects supporting up to u64. With -pedantic_errors, clang gives the same error:

enum E {
	A = 18444492278190833663 
};
<stdin>:2:6: error: integer literal is too large to be represented in a signed integer type, interpreting as unsigned [-Werror,-Wimplicitly-unsigned-literal]
<stdin>:2:2: error: ISO C restricts enumerator values to range of 'int' (18444492278190833663 is too large) [-Werror,-Wpedantic]

Do you want me to try and support this?

@jyn514
Copy link
Member Author

jyn514 commented May 28, 2020

@emilio I think everything left requires a policy decision, let me know when you have a chance to look at this :)

To summarize:

  • Several tests use values for an enum that fit into u64 but not i64. This technically breaks the standard but I could probably find a way to support it if I thought about it (the tricky part is deciding what to do for enum e { A = -1, B = UINT_MAX })
  • I have different output than cexpr for #define a b, where b is not a macro definition: I consider it to be the constant 0, as per http://port70.net/~nsz/c/c11/n1570.html#6.10.1p4. It shouldn't be too hard to make this configurable. The issue is that cpp_expr assumes that you're calling it in an #if, but in bindgen's case, it's actually an arbitrary macro expansion. See fe546e6 for full diff of changes.

@bors-servo
Copy link

☔ The latest upstream changes (presumably d2e0407) made this pull request unmergeable. Please resolve the merge conflicts.

@jyn514
Copy link
Member Author

jyn514 commented Jun 21, 2020

I fixed the conflicts and also switched to a published version of rcc so that CI shows meaningful failures instead of blocking network access. Note that rcc was renamed to saltwater recently so I changed some stuff related to that as well.

I'm still interested in seeing this in bindgen, but I've been having trouble getting in contact with anyone. What are the next steps? Are you still interested in using a different backend for parsing expressions? I'm not sure it makes sense to keep spending time on this if you're not interested.

@jyn514
Copy link
Member Author

jyn514 commented Jun 21, 2020

The CI failure doesn't seem related to this PR (https://travis-ci.com/github/rust-lang/rust-bindgen/jobs/351921557#L426):

   Compiling bindgen-integration v0.1.0 (/home/travis/build/rust-lang/rust-bindgen/bindgen-integration)
error[E0133]: use of extern static is unsafe and requires unsafe function or block

   --> src/lib.rs:240:5

    |

240 |     assert_eq!(bindings::CONST_VALUE, 3);

    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ use of extern static

    |

    = note: extern statics are not controlled by the Rust type system: invalid data, aliasing violations or data races will cause undefined behavior

@emilio
Copy link
Contributor

emilio commented Jun 21, 2020

I still think this is interesting, sorry for not getting back (I'm just very swamped with real life to invest a lot of time on bindgen r/n).

@emilio
Copy link
Contributor

emilio commented Jun 21, 2020

That being said, looks like those failures are real regressions. Here's another example from automation:

-pub const k: EasyToOverflow = 2147483648;
+extern "C" {
+    pub static k: EasyToOverflow;
+}

Seems like we were able to evaluate the constant before and now we're falling back to generating a static.

@emilio
Copy link
Contributor

emilio commented Jun 21, 2020

(Which is a bit odd, as we're using clang to evaluate those, in theory...)

These patches look generally reasonable. When running tests locally I get:

---- header_jsval_layout_opaque_1_0_hpp stdout ----
thread 'header_jsval_layout_opaque_1_0_hpp' panicked at 'saltwater failed to parse clang token: Locatable { data: IntegerOverflow { is_signed: Some(true) }, location: Location { span: Span { start: 0, end: 18 }, file: FileId(1) } }', src/clang.rs:822:33
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- header_jsval_layout_opaque_hpp stdout ----
thread 'header_jsval_layout_opaque_hpp' panicked at 'saltwater failed to parse clang token: Locatable { data: IntegerOverflow { is_signed: Some(true) }, location: Location { span: Span { start: 0, end: 18 }, file: FileId(1) } }', src/clang.rs:822:33

It seems saltwater is choking at:

#define JSVAL_TAG_MASK               0xFFFF800000000000LL

Which is kind of a blocker because this is real code, which ought to work. I... don't think that's an integer overflow?

The NAMESPACE = 0 changes are also a bit concerning... Though I guess not too much, and I guess we still prevent duplicate definitions if they're used in multiple places so it's probably ok, if not ideal.

@jyn514
Copy link
Member Author

jyn514 commented Jun 21, 2020

Here's another example from automation:

Do you know how to reproduce that locally? In a typed context that works fine:

$ swcc -c
typedef unsigned long long EasyToOverflow;
const EasyToOverflow k = 0x80000000;
const EasyToOverflow k_expr = 1ULL << 60;
const EasyToOverflow wow = 1ULL << 31;
$ swcc -c
// ok, let's try again with int instead
int i = 0x80000000;
<stdin>:1:9 warning: conversion to int loses precision (-2147483648 != 2147483648)
int i = 0x80000000;
        ^^^^^^^^^^

So my suspicion is that this is trying to parse it as a signed integer instead of unsigned.

Which is kind of a blocker because this is real code, which ought to work. I... don't think that's an integer overflow?

#1782 (comment) (tl;dr: it overflows for long but not for unsigned long)

The NAMESPACE = 0 changes are also a bit concerning... Though I guess not too much, and I guess we still prevent duplicate definitions if they're used in multiple places so it's probably ok, if not ideal.

I'm not quite sure how to avoid them and still do macro replacement ... see #1782 (comment). Maybe I could add a special mode that treats #define a b as an error?

@jyn514
Copy link
Member Author

jyn514 commented Jun 21, 2020

Another idea is on the bindgen side, I could try first parsing as a signed integer, and if that doesn't work, try unsigned. Let me test that out real quick.

@jyn514 jyn514 changed the title [WIP] add rcc backend [WIP] add saltwater backend Jun 21, 2020
@emilio
Copy link
Contributor

emilio commented Jun 21, 2020

It seems it depends on the libclang version. So libclang 3.8 should reproduce that error.

@jyn514
Copy link
Member Author

jyn514 commented Jun 22, 2020

Another idea is on the bindgen side, I could try first parsing as a signed integer, and if that doesn't work, try unsigned. Let me test that out real quick.

This seemed to work, at least locally. We'll see what pops up in CI.

@jyn514
Copy link
Member Author

jyn514 commented Jun 22, 2020

The remaining failures are the ones that only happen with llvm < 4. It only happens in parse_int_literal_tokens - I think it's the bug mentioned in parse_macro where trailing tokens are included when they shouldn't be. The fix should be the same as parse_macro, try discarding the last token.

@jyn514
Copy link
Member Author

jyn514 commented Jun 22, 2020

Yay this is passing except for MSRV 🎉

About minimum versions, the following saltwater deps require later than 1.34:

  • codespan-reporting: this shouldn't be a dependency at all, fixed by Allow using codespan without codespan-reporting brendanzab/codespan#251 and Bump dependency versions jyn514/saltwater#480
  • hexponent: I talked to @pythondude325 and he thinks this should be pretty easy to fix (#[no_std] will only be supported on recent rust versions) Just waiting on me to bump the version
  • lasso: I pinged @Kixiron but haven't heard back. Nothing here looks like a blocker though, it's mostly parts of std that could be rewritten with slightly more code. Waiting on Add compatibility with 1.34 Kixiron/lasso#3
  • target_lexicon: this one will be tricky, they make heavy use of #[nonexhaustive] and Self::variant. I'll opened Consider setting a lower MSRV bytecodealliance/target-lexicon#58 on the off chance it will help. If worst comes to worst, I don't think bindgen actually needs sizeof(int) in macros (??) so I might be able to make a reduced version of the parser which disallows sizeof, but I have a lot of conditional compilation already and I'd like to avoid that if possible. and we came up with a plan using features. Waiting on target_lexicon to publish a new version and cranelift to update to that version.
  • saltwater itself: just need to remove some matches! and a redundant default-run.

@emilio
Copy link
Contributor

emilio commented Jun 22, 2020

I think bumping the msrv if needed should also be fine, fwiw. We have the check to make sure we notice when we change it.

@pythongirl325
Copy link

hexponent version 0.3.1 now supports Rust version 1.34.0.

@Kixiron
Copy link
Member

Kixiron commented Jul 13, 2020

lasso version 0.3.0 now supports Rust v1.34

@kulp kulp mentioned this pull request Jul 14, 2020
@kulp
Copy link
Member

kulp commented Jul 20, 2020

@jyn514, now that you have done all the hard work of getting MSRV requirements lowered ... bumping the MSRV to 1.40.0 is being forced elsewhere (#1826 (comment)). I mention it just in case you get to updating the MSRV before #1826 does.

@jyn514
Copy link
Member Author

jyn514 commented Jul 20, 2020

After all that hard work :( Oh well, I'll try and rebase this this week.

@jyn514
Copy link
Member Author

jyn514 commented Jul 20, 2020

@emilio this is ready for review

@bors-servo
Copy link

☔ The latest upstream changes (presumably 94bce16) made this pull request unmergeable. Please resolve the merge conflicts.

- Rewrite parse_macro using rcc
- Pass in macro definitions to rcc
- Remove cexpr
- Remove null-terminator from rcc tokens
Just a few more harmless constants
- Don't try to add 'u' if it's already there
This was the same LLVM bug noted in parse_macro, but it seems that cexpr
allowed trailing tokens while saltwater does not.
@jyn514
Copy link
Member Author

jyn514 commented Sep 6, 2020

Still waiting on review. This doesn't actually fix any signedness bugs yet though, maybe I should work on that in the meantime. I tried using the parsed U64/I64 from saltwater (ignoring 'default macro signedness') but it gave really weird results:

-pub const MIN_I32_Minus1: i64 = -2147483649;
-pub const LONG12: i64 = 123456789012;
-pub const LONG_12: i64 = -123456789012;
+pub const MIN_I32_Minus1: u32 = 18446744071562067967;
+pub const LONG12: u64 = 123456789012;
+pub const LONG_12: u32 = 18446743950252762604;

I might need to fix jyn514/saltwater#385 first.

@bors-servo
Copy link

☔ The latest upstream changes (presumably 727dc63) made this pull request unmergeable. Please resolve the merge conflicts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Signedness mismatch
7 participants