Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate test-suite for the low-level Storable instances #22

Open
edsko opened this issue Aug 1, 2024 · 18 comments
Open

Generate test-suite for the low-level Storable instances #22

edsko opened this issue Aug 1, 2024 · 18 comments
Assignees

Comments

@edsko
Copy link
Collaborator

edsko commented Aug 1, 2024

When we generate low-level data types with Storable instances, we should generate a test suite that verifies that does a "self-test": generate some C code that accepts a pointer to memory, fills it with some values, then on the Haskell side use the Storable instance to verify that all values are in the right place, and construct an error message if not.

Note that such a test failure would be a bug in hs-bindgen, so users would not be able to do much with it other than report it (or try to fix it), but at least we'd catch the problem.

(Rust bindgen generates tests also; it might be useful to take a closer look at exactly they are testing, so see if we can/should do the same.)

@edsko edsko added this to the 1: `Storable` instances milestone Aug 1, 2024
@edsko
Copy link
Collaborator Author

edsko commented Aug 1, 2024

The cross-compilation story here is more complicated, if we want to be able to generate bindings for target B but run the test suite on target A; probably we simply don't want to support that, at least not initially.

@TravisCardwell
Copy link
Collaborator

I am working with the simple_structs example, and the fields offsets are not what I expect.

I expect S2 to have a size of 12 bytes with an alignment of 4 bytes, with offsets 0, 4, and 8 for the respective fields.

clang_Cursor_getOffsetOfField is returning offsets of 0, 32, and 64 (generated code) even though the size and alignment of the structure are as expected.

  • Experimenting in C, the offsets are as I expect.
  • Experimenting with hsc2hs, the offsets are as I expect.
  • Experimenting with the generated offsets, my tests fail.

Is this an issue, or am I doing something wrong?

@TravisCardwell
Copy link
Collaborator

The Rust bindgen tutorial is out of date. It no longer generates tests, but it runs "build-time tests" instead. Doing the tutorial, the tests are in the generated bindings.rs file and look like this:

const _: () = {
    ["Size of bz_stream"][::std::mem::size_of::<bz_stream>() - 80usize];
    ["Alignment of bz_stream"][::std::mem::align_of::<bz_stream>() - 8usize];
    ["Offset of field: bz_stream::next_in"][::std::mem::offset_of!(bz_stream, next_in) - 0usize];
    ...
};

The tests are put inside of const _: () = { ... }; blocks to ensure that they do not pollute the outer scope. There are separate blocks per struct.

Each test looks like ["test description"][expression];, where the test passes when the expression evaluates to zero. If an expressions does not evaluate to zero, it is an invalid index of the array and results in a build-time failure.

The following things are tested:

  • Size of structs
  • Alignment of structs
  • Offset of struct fields

@phadej
Copy link
Collaborator

phadej commented Nov 12, 2024

Based on https://juliainterop.github.io/Clang.jl/stable/api/ getOffsetOfField returns offset in bits, not bytes (and that makes sense when you consider packed structs)

@edsko
Copy link
Collaborator Author

edsko commented Nov 12, 2024

Based on https://juliainterop.github.io/Clang.jl/stable/api/ getOffsetOfField returns offset in bits, not bytes (and that makes sense when you consider packed structs)

I see. I wonder if we should make this more explicit in our Hs implementation (data Offset = Bytes .. | Bits ..)? And then leave the Bits case unsupported for now..?

@TravisCardwell
Copy link
Collaborator

Great find, @phadej! Thanks!

To confirm, I tested the following structure and indeed see the expected offsets of 0, 8, and 16 bits.

struct foo {
  char a;
  char b;
  char c;
};

I tested the following bit-field structure and get -1 field offsets for all fields. The libclang documentation says that this is returned when the cursor kind is not a field declaration, but clang-ast-dump confirms that the cursor kind is CXCursor_FieldDecl for all fields.

#include <stdbool.h>
#include <stdint.h>

struct datetime {
  unsigned _BitInt(6) dt_seconds :6;
  unsigned _BitInt(6) dt_minutes :6;
  unsigned _BitInt(5) dt_hour    :5;
  unsigned _BitInt(5) dt_day     :5;
  unsigned _BitInt(4) dt_month   :4;
  bool                dt_is_dst  :1;
  uint16_t            dt_year;
};

This seems to be a C23 support issue. I tested the following C89 version and get the expected results.

struct datetime {
  unsigned dt_seconds :6;
  unsigned dt_minutes :6;
  unsigned dt_hour    :5;
  unsigned dt_day     :5;
  unsigned dt_month   :4;
  unsigned dt_is_dst  :1;
  unsigned dt_year;
};

In addition to documenting the bits unit of clang_Cursor_getOffsetOfField, I added clang_Cursor_isBitField and clang_getFieldDeclBitWidth so that I can confirm the behavior in clang-ast-dump.

I imagine that we will need to handle the Storable instances for records with bit fields specially. Perhaps we need to use peekByteOff and pokeByteOff, parsing/constructing the packed bytes manually... If so, we likely need to deal with the endianness of the platform.

I am not sure if we should add an Offset type or keep the unit as bits like libclang. This would be good to consider when implement support for bit-fields.

For now, I just did the following:

  • Added documentation to HsBindgen.C.AST.Type.StructField

  • Added assertions to HsBindgen.C.Fold.Type.mkStructField

    unless (fieldOffset `mod` 8 == 0) $ error "bit-fields not supported yet"
  • Converted from bits to bytes in HsBindgen.Translation.LowLevel.structDecs

PR #275

@TravisCardwell
Copy link
Collaborator

TravisCardwell commented Nov 13, 2024

I manually wrote tests to get a feel for the factors that we need to consider in the design.

acme-c/acme.h
#ifndef ACME_H
#define ACME_H

typedef struct foo {
  char a;
  int b;
  float c;
} foo;

void acme_dump_foo(foo*);

#endif
acme-c/acme.c
#include <stdio.h>

#include "acme.h"

void acme_dump_foo(foo* x) {
  printf("foo.a: %c\n", x->a);
  printf("foo.b: %d\n", x->b);
  printf("foo.c: %f\n", x->c);
  printf("#foo   (%02lu) %p\n", sizeof *x,   x);
  printf("#foo.a (%02lu) %p\n", sizeof x->a, &x->a);
  printf("#foo.b (%02lu) %p\n", sizeof x->b, &x->b);
  printf("#foo.c (%02lu) %p\n", sizeof x->c, &x->c);
}
experiment-quickcheck.cabal
name:           experiment-quickcheck
version:        0.0.0.0
cabal-version:  1.24
build-type:     Simple

library
  hs-source-dirs: src
  exposed-modules:
      Acme
  include-dirs: acme-c
  install-includes:
      acme.h
  c-sources:
      acme-c/acme.c
  build-depends:
      base
  default-language: Haskell2010
  ghc-options: -Wall

executable experiment-quickcheck
  hs-source-dirs: app
  main-is: Main.hs
  build-depends:
      base
    , experiment-quickcheck
  default-language: Haskell2010
  ghc-options: -Wall

test-suite test-hs-bindgen
  type: exitcode-stdio-1.0
  hs-source-dirs: test-hs-bindgen/src
  main-is: Spec.hs
  other-modules:
      Acme.Test
      Acme.Test.FFI
      Acme.Test.Instances
  include-dirs:
      acme-c
      test-hs-bindgen/cbits
  install-includes:
      acme.h
      test-acme.h
  c-sources:
      test-hs-bindgen/cbits/test-acme.c
  build-depends:
      base
    , experiment-quickcheck
    , QuickCheck
    , tasty
    , tasty-quickcheck
  default-language: Haskell2010
  ghc-options: -Wall
src/Acme.hs
{-# LANGUAGE CApiFFI #-}
{-# LANGUAGE RecordWildCards #-}

module Acme
  ( CFoo(..)
  , dumpFoo
  ) where

import Foreign as F
import qualified Foreign.C as FC

------------------------------------------------------------------------------

data CFoo = MkCFoo
  { cFoo_a :: FC.CChar
  , cFoo_b :: FC.CInt
  , cFoo_c :: FC.CFloat
  }
  deriving (Eq, Show)

instance F.Storable CFoo where
  sizeOf    _ = 12
  alignment _ = 4

  peek ptr = do
    cFoo_a <- F.peekByteOff ptr 0
    cFoo_b <- F.peekByteOff ptr 4
    cFoo_c <- F.peekByteOff ptr 8
    pure MkCFoo{..}

  poke ptr MkCFoo{..} = do
    F.pokeByteOff ptr 0 cFoo_a
    F.pokeByteOff ptr 4 cFoo_b
    F.pokeByteOff ptr 8 cFoo_c

------------------------------------------------------------------------------

dumpFoo :: CFoo -> IO ()
dumpFoo x = F.alloca $ \ptr -> do
    F.poke ptr x
    cDumpFoo ptr

foreign import capi "acme.h acme_dump_foo"
  cDumpFoo :: F.Ptr CFoo -> IO ()
app/Main.hs
module Main (main) where

import qualified Acme

------------------------------------------------------------------------------

main :: IO ()
main = Acme.dumpFoo $ Acme.MkCFoo 120 11 3.14159
test-hs-bindgen/cbits/test-acme.h
#ifndef TEST_ACME_H
#define TEST_ACME_H

#include <stdbool.h>

#include "acme.h"

void test_CFoo_CToHs(char, int, float, foo*);

bool test_CFoo_HsToC(foo*, char, int, float);

#endif
test-hs-bindgen/cbits/test-acme.c
#include <stdbool.h>

#include "acme.h"
#include "test-acme.h"

void test_CFoo_CToHs(char a, int b, float c, foo* target) {
  target->a = a;
  target->b = b;
  target->c = c;
}

bool test_CFoo_HsToC(foo* value, char a, int b, float c) {
  return value->a == a && value->b == b && value->c == c;
}
test-hs-bindgen/src/Spec.hs
module Main (main) where

import Test.Tasty (defaultMain, testGroup)

import qualified Acme.Test

------------------------------------------------------------------------------

main :: IO ()
main = defaultMain $ testGroup "test-hs-bindgen"
    [ Acme.Test.tests
    ]
test-hs-bindgen/src/Acme/Test/Instances.hs
{-# LANGUAGE RecordWildCards #-}
{-# OPTIONS_GHC -Wno-orphans #-}

module Acme.Test.Instances () where

import qualified Acme

import qualified Test.QuickCheck as QC

------------------------------------------------------------------------------

instance QC.Arbitrary Acme.CFoo where
  arbitrary = do
    cFoo_a <- QC.arbitrary
    cFoo_b <- QC.arbitrary
    cFoo_c <- QC.arbitrary
    pure Acme.MkCFoo{..}
test-hs-bindgen/src/Acme/Test/FFI.hs
{-# LANGUAGE CApiFFI #-}

module Acme.Test.FFI
  ( testCFooCToHs
  , testCFooHsToC
  ) where

import qualified Foreign as F
import qualified Foreign.C as FC

import qualified Acme

------------------------------------------------------------------------------

testCFooCToHs :: FC.CChar -> FC.CInt -> FC.CFloat -> IO Acme.CFoo
testCFooCToHs a b c = F.alloca $ \ptr -> do
    test_CFoo_CToHs a b c ptr
    F.peek ptr

foreign import capi unsafe "test-acme.h test_CFoo_CToHs"
  test_CFoo_CToHs ::
       FC.CChar
    -> FC.CInt
    -> FC.CFloat
    -> F.Ptr Acme.CFoo
    -> IO ()

------------------------------------------------------------------------------

testCFooHsToC :: Acme.CFoo -> FC.CChar -> FC.CInt -> FC.CFloat -> IO Bool
testCFooHsToC x a b c = F.alloca $ \ptr -> do
    F.poke ptr x
    (/= 0) <$> test_CFoo_HsToC ptr a b c

foreign import capi unsafe "test-acme.h test_CFoo_HsToC"
  test_CFoo_HsToC ::
       F.Ptr Acme.CFoo
    -> FC.CChar
    -> FC.CInt
    -> FC.CFloat
    -> IO FC.CBool
test-hs-bindgen/src/Acme/Test.hs
module Acme.Test (tests) where

import qualified Test.QuickCheck.Monadic as QCM
import Test.Tasty (TestTree, testGroup)
import Test.Tasty.QuickCheck (Property, testProperty)

import qualified Acme

import qualified Acme.Test.FFI as FFI
import Acme.Test.Instances ()

------------------------------------------------------------------------------

testCFoo :: TestTree
testCFoo = testGroup "CFoo"
    [ testProperty "C->Haskell" prop_CToHs
    , testProperty "Haskell->C" prop_HsToC
    ]
  where
    prop_CToHs :: Acme.CFoo -> Property
    prop_CToHs x = QCM.monadicIO $ do
      x' <- QCM.run $
        FFI.testCFooCToHs (Acme.cFoo_a x) (Acme.cFoo_b x) (Acme.cFoo_c x)
      QCM.assert $ x' == x

    prop_HsToC :: Acme.CFoo -> Property
    prop_HsToC x = QCM.monadicIO $ do
      isSuccess <- QCM.run $
        FFI.testCFooHsToC x (Acme.cFoo_a x) (Acme.cFoo_b x) (Acme.cFoo_c x)
      QCM.assert isSuccess

------------------------------------------------------------------------------

tests :: TestTree
tests = testGroup "Acme"
    [ testCFoo
    ]

I used QuickCheck for this experiment. It has few dependencies, and there are (already) Arbitrary instances for C types. We need to be able to create Abitrary instances for any user types that we would like to test. I wonder if QuickCheck sounds good, or if a different library (such as hedgehog) would be preferred.

I did not use pkgconfig for this experiment. The .cabal file explicitly specifies C headers and sources. My understanding is that using pkgconfig is very beneficial when linking to external packages that may have dependencies on other packages, but it does not work on Windows. Using pkgconfig for local C source requires exporting environment variables while building as well as during execution (of tests). Explicitly specifying the C header and sources like in this experiment works well for local C source. I have not tried mixing the two: using pkgconfig to configure external dependencies and explicitly specifying C headers and sources of generated tests. This is something that is probably worth trying out.

Allocation and deallocation of memory in tests is all done on the Haskell side, using alloca.

The C->Haskell test, given an arbitrary value, calls a C function with the values of the fields and a pointer to a structure value. The C function populates the structure value. The QuickCheck property is equality of the value populated in C with the original arbitrary value.

The Haskell->C test, given an arbitrary value, calls a C function with a pointer to the structure value and the expected values of the fields. The C function checks that each field is equal to the expected value.

I wish that I could output detailed error messages on failure, but QuickCheck does not support that AFAIK.

On the C side, we need equality for each field type for any type that we would like to test. This is trivial in this experiment because only primitive types are used, but it complicates things when supporting other types.

When we generate tests, perhaps the goal is to generate the whole test-hs-bindgen tree on the filesystem? Perhaps we should first recursively delete the directory if it already exists? Users would be able to explicitly manage the content using revision control (one of the major benefits of the preprocessor backend). Perhaps we should avoid touching the user's .cabal file? We could create a test-hs-bindgen/README.md file with a bit of documentation, including a suggested test-suite stanza.

Thoughts? Is this moving in the right direction? Please feel free to share any corrections and suggestions.

@TravisCardwell
Copy link
Collaborator

TravisCardwell commented Nov 13, 2024

I added unit tests that check the size and alignment.

Test suite test-hs-bindgen: RUNNING...
test-hs-bindgen
  Acme
    CFoo
      sizeof:     OK
      alignof:    OK
      C->Haskell: OK
        +++ OK, passed 100 tests.
      Haskell->C: OK
        +++ OK, passed 100 tests.

All 4 tests passed (0.00s)
Test suite test-hs-bindgen: PASS

The alignment check uses stdalign.h, added in C11. I wonder what minimum standard of C we should target in the generated code...

@TravisCardwell
Copy link
Collaborator

I created a hedgehog version of the manual tests.

I implemented some generators for the C types used in the test. The generator for CFloat includes special IEEE values: negative zero, infinity, and negative infinity. It does not include NaN. The tests compare values to ensure that they match on the Haskell and C sides, but all NaN comparisons are false. We could use isnan on the C side and isNaN on the Haskell side to implement equality that would work for us. On the Haskell side, we probably do not want to do that in the Eq instances for user types, but we could write separate equality functions for use in tests. For now, I simply do not generate NaN.

The QuickCheck instance does not include special IEEE values at all (which is somewhat disappointing), so Nan is not an issue there.

FWIW, I prefer hedgehog over QuickCheck.

@TravisCardwell
Copy link
Collaborator

I updated my hedgehog tests to allow NaN, in case we want to do that.

On the Haskell side, it uses a RepEq (representational equality) type class. Floating values have special implementations, but instances for other primitives just use the default (==). I wrote the CFoo instance by hand for this experiment, but we can use (sop) generics to generate instances in such cases.

On the C side, two library functions are added for checking floats and doubles for representational equality. The generated test code just needs to use those functions where appropriate. We likely should create a rep_eq_ function for each user type (structure) instead of inlining it. Such functions compare values of structure fields with the expected values; they do not compare two structures. Even if we do not allow NaN, we should create such (eq_) functions, so it allowing NaN does not add much code on the C side.

We will likely have library functions that are needed by the generated tests. We could create a package for the Haskell side, but the test generator needs to create the C headers and source files anyway, so perhaps it is best to do the same for Haskell modules.

The current experiment tests a single "generated" module, but one of our goals is to support generation of multiple modules. If we need to generate instances (such as Arbitrary instances if we use QuickCheck and RepEq instances if we want to test NaN), it may be easiest to put all orphan instances in a single module even if the modules are tested separately.

@TravisCardwell
Copy link
Collaborator

I tried to create instances for the newtypes defined in Foreign.C.Types and ran into the following issues:

  • CBool wraps Word8, but this representation does not seem to match my system. For now, I test with two values, 0 (False) and 1 (True), where any other value is considered True.
  • CUSeconds is not defined in my (time.h or) sys/types.h even though it is in my man time_t documentation. For now, I am not supporting this type.

If there are any suggestions for what types to support in the test library, or which include files I need, I would be happy to hear them.

@phadej
Copy link
Collaborator

phadej commented Nov 20, 2024

@TravisCardwell what this prints on your system:

#include <stdio.h>
#include <stdbool.h>

int main() {
	printf("%ld\n", sizeof(bool));
	return 0;
}

on my machine it prints 1.

@TravisCardwell
Copy link
Collaborator

Thanks @phadej!

It prints 1 for me as well. The unit is bytes, so it matches Haskell's CBool type, which is a newtype wrapper around Word8.

The behavior is not like Word8, however, at least on my system.

Assignment:

#include <stdbool.h>
#include <stdio.h>

int main() {
  bool b = false;
  unsigned char* c = (unsigned char*) &b;
  printf("%d %d\n", b, *c);
  b = true;
  printf("%d %d\n", b, *c);
  b = 2;
  printf("%d %d\n", b, *c);
  return 0;
}

Attempting to assign 2 results in 1.

0 0
1 1
1 1

Addition:

#include <stdbool.h>
#include <stdio.h>

int main() {
  bool b = false;
  printf("%d\n", b);
  b += 1;
  printf("%d\n", b);
  b += 1;
  printf("%d\n", b);
  b += 2;
  printf("%d\n", b);
  return 0;
}

Addition does not overflow like integral types.

0
1
1
1

My current (WIP) testing implementation defines instances for CBool so that it behaves like this, only allowing 0 (False) and 1 (True) values. I am not defining a separate type because Word8 has the appropriate size/alignment.

One more test
#include <stdbool.h>
#include <stdio.h>

typedef struct foo {
  bool a;
  bool b;
  bool c;
} foo;

int main() {
  foo x = { .a = true, .b = false, .c = true };
  printf("foo:   size %lu addr %p\n", sizeof x, &x);
  printf("foo.a: size %lu addr %p value %d\n", sizeof x.a, &x.a, x.a);
  printf("foo.b: size %lu addr %p value %d\n", sizeof x.b, &x.b, x.b);
  printf("foo.c: size %lu addr %p value %d\n", sizeof x.c, &x.c, x.c);
  return 0;
}
foo:   size 3 addr 0x7ffea40c0b54
foo.a: size 1 addr 0x7ffea40c0b54 value 1
foo.b: size 1 addr 0x7ffea40c0b55 value 0
foo.c: size 1 addr 0x7ffea40c0b56 value 1

@TravisCardwell
Copy link
Collaborator

In development of hs-bindgen-testlib, I initially planned on supporting all of the types defined in Foreign.C.Types. I am unsure about some of the types, however, so I decided to only implement the types that I have high confidence about. We can add support for other types as needed.

  • CChar
  • CSChar
  • CUChar
  • CShort
  • CUShort
  • CInt
  • CUInt
  • CLong
  • CULong
  • CPtrdiff
  • CSize
  • CWchar
  • CSigAtomic
  • CLLong
  • CULLong
  • CBool
  • CIntPtr
  • CUIntPtr
  • CIntMax
  • CUIntMax
  • CClock
  • CTime
  • CUSeconds
    • Not in time.h or sys/types.h on my system
  • CSUSeconds
    • Not in time.h on my system
    • Is in sys/types.h on my system, but that is not one of our standard headers. Should we add it?
  • CFloat
  • CDouble

On a related note, I suggested in chat that we defined all supported types in a hs-bindgen-stdlib (or hs-bindgen-patterns). We can re-export types from Foreign.C and provide our own definitions/instances as appropriate. If we always use those types, not importing types from Foreign.C directly, I think it could work well.

@TravisCardwell
Copy link
Collaborator

With the current state of hs-bindgen-testlib, my manual tests are as follows:

test-hs-bindgen
  Acme
    CFoo
      HsSizeOfXEqCSizeOfX:                OK
      HsAlignOfXEqCAlignOfX:              OK
      PokePeekXSameSemanticsX:            OK
        +++ OK, passed 100 tests.
      HsGenSeq1CEq:                       OK
      HsGenSeq1SameSemanticsCGenSeq1:     OK
      HsPreturbNXSameSemanticsCPreturbNX: OK
        +++ OK, passed 100 tests.
  • HsSizeOfXEqCSizeOfX: the size of a type is the same in Haskell and C
  • HsAlignOfXEqCAlignOfX: the alignment of a type is the same in Haskell and C
  • PokePeekXSameSemanticsX: the poke then peek of a value is semantically equal to the value
  • HsGenSeq1CEq: a value sequentially generated in Haskell has the expected value in C
    • This is just a unit test. If we want to shrink by setting fields to zero, we would need to communicate to the C side which fields should be set/zero.
    • Since GenSeq only produces normal values, it should be fine to compare using == on the C side.
  • HsGenSeq1SameSemanticsCGenSeq1: a value sequentially generated in C has the expected value in Haskell
    • This is just a unit test, as the tested value is generated on the C side.
    • I do not know if we will generate/derive Eq instances for the generated Haskell data types, but SameSemantics can be used to compare on the Haskell side.
  • HsPreturbNXSameSemanticsCPreturbNX: a value preturbed using Haskell has the same semantics as the value preturbed using C (using the same size)
    • This property benefits from shrinking, both of the size and the arbitrarily generated value.
    • This property preturbs on both the C and Haskell side, which I think is better than using inverse preturbation. Note, however, that inverse preturbation is implemented via the size parameter: sameSemantics (preturb (negate size) (preturb size x)) x

Notes:

  • When possible, C implementations of functionality that needs to be the same on the Haskell side are implemented in functions so that they can be tested by the hs-bindgen-testlib test suite.
  • Haskell signatures for such functions use C types, so that values have the same bounds. For example, the preturb size parameter has type CLong, which has the same size as a Haskell Int on my system. It would be better to use a stdint such as int64_t, but Foreign.C.Types does not include such types. Perhaps this can be resolved when we implement our own Haskell types (in hs-bindgen-stdlib)...
  • All hs-bindgen-testlib C library functions are prefixed with hsbg_. I read that parameter names are often prone to macro issues, so some people prefix parameter names a well, but I have not seen that much and am not currently prefixing parameter names or variable names within functions.
  • What style shall we use for our C code? I am currently using clang-format with the default LLVM style.
  • Static analysis on the C code:
    • clang-tidy caught my mistaken use of abs instead of labs. This reminds me that I am not seeing compilation warnings when only building as part of the Haskell build process. Static analysis is especially valuable in this case, IMHO!
    • Cppcheck did not have any suggestions.
    • cpplint checks Google coding standards, which includes formatting that is different from the LLVM style.
      • Interesting: This tool recommends including the path name in the header guard (HS_BINDGEN_TESTLIB_CLIB_HS_BINDGEN_TESTLIB_H_ instead of just HS_BINDGEN_TESTLIB_H). I did not apply this recommendation.
      • Google style is different from LLVM style. With the current code, the differences are few and can be ignored, but perhaps formatting with Google would allow us to make easier use of this tool...
        • Quoted (local) includes go above bracketed includes, which is the opposite of what I am used to.
        • There must be at least two spaces between code and comments.
    • oclint failed to build on my system. I did not put any more time into it.

@phadej
Copy link
Collaborator

phadej commented Nov 26, 2024

FWIW,

Rust bindgen generates tests also; it might be useful to take a closer look at exactly they are testing

Does very little.

And we can run cargo test to verify that the layout, size, and alignment of our generated Rust FFI structs match what bindgen thinks they should be:

#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct foo {
    pub sixty_four: u64,
    pub thirty_two: u32,
}
#[allow(clippy::unnecessary_operation, clippy::identity_op)]
const _: () = {
    ["Size of foo"][::std::mem::size_of::<foo>() - 16usize];
    ["Alignment of foo"][::std::mem::align_of::<foo>() - 8usize];
    [
        "Offset of field: foo::sixty_four",
    ][::std::mem::offset_of!(foo, sixty_four) - 0usize];
    [
        "Offset of field: foo::thirty_two",
    ][::std::mem::offset_of!(foo, thirty_two) - 8usize];
};

Rust doesn't have "Storable", the struct is generated so it's one-to-one with C. We could also generically derive our Storable instances, and then test that they do indeed what we think they should do (making the generated code system independent, leaving only tests system dependent).

@TravisCardwell
Copy link
Collaborator

I am currently investigating the idea of generating tests in separate directories per architecture. The goal is to allow users that need to support multiple architectures to easily maintain multiple generated test suites within a single project repository. Cabal conditionals can be used to enable/disable test suites depending on the current architecture.

I do not yet understand how we will support multiple architectures, so it is time for some experimentation. I thought that we would need to run libclang on the target architecture, but I learned about the --target option, which I want to try out.

When using the preprocessor, we output system-dependent values in the Haskell source code. It therefore seems like we would need to have separate library packages/modules for different architectures. With my current experimental test, I do not think there is anything that is architecture-dependent, as the differences are in the tested library.

There are various identifiers that identify the architecture.

clang -print-targets
Registered Targets:
  aarch64     - AArch64 (little endian)
  aarch64_32  - AArch64 (little endian ILP32)
  aarch64_be  - AArch64 (big endian)
  amdgcn      - AMD GCN GPUs
  arm         - ARM
  arm64       - ARM64 (little endian)
  arm64_32    - ARM64 (little endian ILP32)
  armeb       - ARM (big endian)
  avr         - Atmel AVR Microcontroller
  bpf         - BPF (host endian)
  bpfeb       - BPF (big endian)
  bpfel       - BPF (little endian)
  hexagon     - Hexagon
  lanai       - Lanai
  loongarch32 - 32-bit LoongArch
  loongarch64 - 64-bit LoongArch
  mips        - MIPS (32-bit big endian)
  mips64      - MIPS (64-bit big endian)
  mips64el    - MIPS (64-bit little endian)
  mipsel      - MIPS (32-bit little endian)
  msp430      - MSP430 [experimental]
  nvptx       - NVIDIA PTX 32-bit
  nvptx64     - NVIDIA PTX 64-bit
  ppc32       - PowerPC 32
  ppc32le     - PowerPC 32 LE
  ppc64       - PowerPC 64
  ppc64le     - PowerPC 64 LE
  r600        - AMD GPUs HD2XXX-HD6XXX
  riscv32     - 32-bit RISC-V
  riscv64     - 64-bit RISC-V
  sparc       - Sparc
  sparcel     - Sparc LE
  sparcv9     - Sparc V9
  systemz     - SystemZ
  thumb       - Thumb
  thumbeb     - Thumb (big endian)
  ve          - VE
  wasm32      - WebAssembly 32-bit
  wasm64      - WebAssembly 64-bit
  x86         - 32-bit X86: Pentium-Pro and above
  x86-64      - 64-bit X86: EM64T and AMD64
  xcore       - XCore

That list is also output by llc --version. Running a command like llc -march=x86-64 -mattr=help displays the CPUs and features supported by a platform, which I assume we do not need to worry about.

I read that Clang's support for target triplets is just for compatibility with GCC. We use clang_getTranslationUnitTargetInfo and clang_TargetInfo_getTriple to get that information for a translation unit, however. I found the documentation on cross-compilation using Clang and learned that the target triplet should be specified, not just the target architecture.

Setting the target to just the architecture (such as x86-64) results in a CallFailed failure with an unhelpful backtrace; perhaps we will want to handle such incorrect arguments more gracefully at some point in the future.

Targeting x86_64-pc-linux-gnu (the output of gcc -dumpmachine on my system) works.

$ cabal run hs-bindgen -- \
    --clang-option '--target=x86_64-pc-linux-gnu' \
    preprocess \
      --input hs-bindgen/examples/simple_structs.h \
      --output Generated.x86_64-pc-linux-gnu.hs

Targeting aarch64-linux-gnu fails with an UnrecognizedType error.

Details
$ cabal run hs-bindgen -- \
    --clang-option '--target=aarch64-linux-gnu' \
    preprocess \
      --input hs-bindgen/examples/simple_structs.h \
      --output Generated.aarch64-linux-gnu.hs
hs-bindgen: UnrecognizedType {unrecognizedTypeKind = simpleEnum CXType_Char_U, unrecognizedTypeTrace = CallStack (from HasCallStack):
  collectBacktrace, called at src/HsBindgen/C/Fold/Common.hs:92:30 in hs-bindgen-0.1.0-inplace:HsBindgen.C.Fold.Common
  unrecognizedType, called at src/HsBindgen/C/Fold/Type.hs:82:13 in hs-bindgen-0.1.0-inplace:HsBindgen.C.Fold.Type}

Targeting arm64-apple-darwin works! There are no differences, however.

Targeting i386-pc-linux-gnu works, and this 32-bit architecture has differences:

$ diff Generated.x86_64-pc-linux-gnu.hs Generated.i386-pc-linux-gnu.hs
103c103
<   sizeOf = \_ -> 16
---
>   sizeOf = \_ -> 12
105c105
<   alignment = \_ -> 8
---
>   alignment = \_ -> 4

Cabal provides the arch() condition to test the current architecture (as well as os() to test the current operating system). Arch defines the known architectures and has an OtherArch constructor for unknown architectures. If we want to generate Cabal configuration, we will need to translate from target triplets to that set of architectures known by Cabal.

(As I mentioned before, we may want to avoid modifying the user's .cabal file and instead write generated configuration to a separate file that the user can copy-and-paste from.)

@TravisCardwell
Copy link
Collaborator

TravisCardwell commented Nov 26, 2024

When using the preprocessor, we output system-dependent values in the Haskell source code. It therefore seems like we would need to have separate library packages/modules for different architectures.

One (untested) idea is to create architecture-specific sub-modules that are included only when building on a matching architecture (using conditions in the .cabal file), and using CPP to re-export from the appropriate sub-module. The following example uses the Cabal Arch constructor names for consistency.

Acme.Foo           -- Re-exports Acme.Foo.X86_64 when built on x86_64
Acme.Foo.AArch64   -- Not included when built on x86_64
Acme.Foo.I386      -- Not included when built on x86_64
Acme.Foo.X86_64    -- Included when built on x86_64

This example illustrates another design decision. Should separate modules for different architectures be created even when they are identical? This example creates separate modules, which I think is a good idea.

(BTW, I assume that cross-platform support is only relevant to preprocessor usage, as Template Haskell runs on the target platform.)

TravisCardwell added a commit that referenced this issue Dec 4, 2024
We were using a tuple, but that does not scale well when more data is
added.  I ran into this when working on annotations (#256, PR #276).  I
ran into it again when adding source information to support test
generation (#22).

The `Field` types are used by *both* `Struct`/`Record` and `Newtype`
types.

(Cherry-picked from `source-info` for experimentation)
TravisCardwell added a commit that referenced this issue Dec 5, 2024
We were using a tuple, but that does not scale well when more data is
added.  I ran into this when working on annotations (#256, PR #276).  I
ran into it again when adding source information to support test
generation (#22).

The `Field` types are used by *both* `Struct`/`Record` and `Newtype`
types.
TravisCardwell added a commit that referenced this issue Dec 9, 2024
We were using a tuple, but that does not scale well when more data is
added.  I ran into this when working on annotations (#256, PR #276).  I
ran into it again when adding source information to support test
generation (#22).

The `Field` types are used by *both* `Struct`/`Record` and `Newtype`
types.
TravisCardwell added a commit that referenced this issue Dec 17, 2024
We were using a tuple, but that does not scale well when more data is
added.  I ran into this when working on annotations (#256, PR #276).  I
ran into it again when adding source information to support test
generation (#22).

The `Field` types are used by *both* `Struct`/`Record` and `Newtype`
types.
TravisCardwell added a commit that referenced this issue Dec 18, 2024
We were using a tuple, but that does not scale well when more data is
added.  I ran into this when working on annotations (#256, PR #276).  I
ran into it again when adding source information to support test
generation (#22).

The `Field` types are used by *both* `Struct`/`Record` and `Newtype`
types.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants