Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add C source information to ASTs #338

Merged
merged 42 commits into from
Dec 23, 2024
Merged

Add C source information to ASTs #338

merged 42 commits into from
Dec 23, 2024

Conversation

TravisCardwell
Copy link
Collaborator

@TravisCardwell TravisCardwell commented Dec 17, 2024

The primary goal of this PR is to add C source information to the ASTs (#316), for use in test and documentation generation. This is the initial attempt at implementing this so that we have something to discuss (#316 (comment)). We plan to discuss during the call on Thursday (December 19th).

This needed to build upon the new minimal default name mangling (#331), so the single commit from that branch is cherry-picked. Since a primary motivation is to support test generation (#22), my initial implementation of test generation is also implemented in this branch. Some commits are cherry-picked, some commits are rewritten from scratch with this different design, and new commits are added.

This comment serves as a guide to the various commits. Some commits have descriptions, and links to related comments are included below.

  1. Prepare types for adding source information
    • Add Field type (cherry-picked)
    • Add OpaqueStruct and OpaqueEnum
  2. Add source locations
  3. Name mangling
  4. Add origin information (Add C source information to Hs and SHs ASTs #316 (comment))
    • Add origin info to Hs AST
    • Add origin info to SHs AST
  5. Test generation (cherry-picked)
    • Add hs-bindgen-testlib (WIP)
    • Continue hs-bindgen-testlib (WIP)
    • Rename to SameSemantics
    • Update SameSemantics properties/assertions
    • Update Storable properties/assertions
    • Add sizeof and alignof assertions/tests
    • Audit C types and headers
    • Rename to Preturb
    • Add size to Preturb
    • Add missing comments
    • Rewrite GenSeq
    • Rename to HsGenSeq1SameSemanticsCGenSeq1
    • Format C using clang-format
    • Apply C linter suggestions
    • Move Text.SimplePrettyPrint
    • Refactor Text.SimplePrettyPrint
    • Sort lists
    • Implement test generation (WIP)
  6. Test generation (new) (Add C source information to Hs and SHs ASTs #316 (comment))
    • Fix typos
    • Use Hs AST
    • Add DeclName (Add C source information to Hs and SHs ASTs #316 (comment))
    • Implement C header and source generation (WIP)
    • Add hangs
    • Implement Haskell test module generation (WIP)
    • Generate Spec and Main modules
    • Define/use common context
    • Add documentation
    • Add Main module to sample Cabal config
  7. Fix source location normalization (Add C source information to ASTs #338 (comment))
    • Render Hs AST using Data.TreeDiff.Expr
    • Revert make typedef source locations relative
    • Make SourcePath relative in toExpr instance

@TravisCardwell
Copy link
Collaborator Author

TravisCardwell commented Dec 17, 2024

Tests fail on Windows because the source paths in the .hs fixtures are displayed as strings and Windows uses backslash directory separators that do not match the forward slash directory separators in the fixtures. The declarations are rendered using the Show instance.

Note that this issue is avoided in the .tree-diff.txt fixtures by displaying source paths as a list instead.

One way to resolve this issue would be to change the Show SourcePath instance to render source paths in a platform-independent manner. Rendering them as lists like in the .tree-diff.txt fixtures is not a straightforward solution, however, because we render them in strings that specify the line and column (such as examples/simple_func.h:3:22) in Show instances of other types.

We could mangle source path strings, converting Windows directory separators to POSIX directory separators. Perhaps it is better to display source paths using actual/native directory separators to users, though. I can implement it if we decide to go this route.

My inclination is to not do special rendering in Show instances. The Show SourcePath instance could then render as lists, while the instances for the types that include SourePaths can derive Show instances. Special rendering of source paths can be implemented using separate functions, optionally using type classes (like TTC).

@edsko
Copy link
Collaborator

edsko commented Dec 18, 2024

Re rendering of paths, see

which I introduced for that precise reason.

@TravisCardwell
Copy link
Collaborator Author

Indeed, that newtype allows us to define our own instances. The two options suggested in the above comment:

  • Mangle SourcePath paths so that they are consistent across platforms for the purposes of our tests. Displayed paths would use POSIX directory separators even on Windows.
  • Use Show instances for debugging and testing only, rendering SourcePath as a list like in .tree-diff.txt fixtures. Use separate functions to render output, rendering SourcePath with the actual directory separators.

Note that this branch already includes a change that makes the paths relative to the package directory when under that directory (#316 (comment)).

@edsko
Copy link
Collaborator

edsko commented Dec 18, 2024

Ah, sorry. Yes, I think we should go for something again to what we do with the tree-diff tests, so that any massaging we do, is confined to the tests only.

We were using a tuple, but that does not scale well when more data is
added.  I ran into this when working on annotations (#256, PR #276).  I
ran into it again when adding source information to support test
generation (#22).

The `Field` types are used by *both* `Struct`/`Record` and `Newtype`
types.
Two tests fail because the source locations include absolute paths when
data types are generated for included `typedef`s.  This needs to be
resolved before merge.
This commit changes name mangling so that there are minimal changes by
default, not attempting to create Haskell-style names.  One significant
difference with the previous implementation is that it can now add a
prefix if/when the first character is not valid, used to handle leading
underscores when creating type constructors.

`MkHsName` is removed, replaced by `mkHsNamePrefixInvalid` and
`mkHsNameDropInvalid` functions that are now passed to `translateName`
like other options.

`ctxFieldVarSingleConstr` is removed since constructors now have the
same name as types by default, and we do not generate sum types anyway.
If we do so in the future, we need to decide how to do name mangling for
constructors as well as accessors.

`HsBindgen.C.Fold.Type.mkDefnName` is changed to follow the conventions
of the new defaults.
The initial implementation allowed users to override name mangling by
specifying the C name.  This is problematic when the name of an
anonymous structure/union needs to be overridden.  Specifying a C name
worked when some name mangling was performed while creating the C AST,
but it does not work when all name mangling is done by the name mangler.
With this change, users override by specifying the translated Haskell
name.

Overrides must now be performed after creating the Haskell name.  The
parameter order of `translateName` is updated for consistency.
This commit removes `DefnName` and instead tracks the "path" of a
structure, renaming the type to `DeclPath`.

`mkDefnName` is now implemented as part of name mangling (as
`getDeclPathParts`), in a way that works with name mangling options.

Context `AnonNamedFieldTypeConstrContext` is removed, replaced with
`StructTypeConstrContext` that contains a `DeclPath` instead of a
`CName`.  New function `translateDeclPath` is a high-level function
for translating `DeclPath`s.
At this point, there are no differences with the Hs AST, so those types
are used as-is.  If we ever introduce transformations, we should define
SHs origin types that record the transformations made.
The original design threaded the sequence because I thought we would
need to recursively generate sequential values.  It sounds like we will
not do that, however.

With this design, the Haskell and C APIs match and can therefore be
tested for equivalence.

Two (unit test) assertions are added:

* A value sequentially generated in Haskell has the expected value in C.
* A value sequentially generated in C has the expected value in Haskell.
We need this functionality for test generation, so it should be moved
out of the backend.  I am putting it outside of the `HsBindgen`
hierarchy to emphasize that it is the general part of pretty printing.
* `hs-bindgen` dependencies are removed
* Documentation is updated
* Context mutation is implemented using a function, for consistency
* The `Pretty` `Natural` instance is removed
When generating C code, we have to know how to write each C name (such
as `struct foo` vs. just `foo`).  This commit adds type `DeclName` for
use in the `DeclPathStruct` constructor of type `DeclPath` to do this.

Note that this could alternatively be done by making `CName` a phantom
type that specifies the namespace.  I implemented `DeclName` for now as
a simple/minimal change.
The motivation for this is to normalize representations just for tests.
`SourcePath` values are rendered as lists, like in the `tree-diff`
fixtures.

Note that `ediffGolden1` is *not* used because we do not have `Eq`
instances for the `Hs` AST.  We could use it if we decide to implement
`Eq` instances.  For now, per-line string comparisons are made.  This is
still an improvement over the previous implementation using `Show`
because the `Expr`s are pretty-printed across multiple lines.
This commit essentially reverts the previous commit that made `typedef`
source locations relative.  Tests fail (gain) due to use of absolute
paths.  This will be addressed in the next commit.
@TravisCardwell
Copy link
Collaborator Author

I changed the hs golden tests to pretty-print Data.TreeDiff.Expr representations of declarations instead of using Show. This allows us to use the same instance for SourcePath that is used in the tree-diff fixtures.

Note that I did not use ediffGolden1 because we do not have Eq instances for the Hs AST. We could use it if we decide to implement Eq instances. For now, per-line string comparisons are made. This is still an improvement over the previous implementation using Show because the Exprs are pretty-printed across multiple lines.

I imagine that we would like to make source locations relative in the tests only as well.

I planned on doing this by moving the type class to the tests package. Doing so at this level requires a lot more instances, of course, since the AST has to be traversed. Unfortunately, I ran into the age-old Haskell issue of public API vs testing. It worked great for the Haskell representations because we export some modules specifically for testing, though I had to add some. It did not work for the C API, however, because our HsBindgen.Lib.CHeader type is explicitly opaque.

The Cabal file includes the following comment. Perhaps we can use an internal library to expose more things for testing?

-- Exposed for the sake of tests
-- TODO: We should reconsider the proper way to export these.

Making paths relative in general requires context (current working directory), but we do not have to worry about the general case since it is specific to our tests. I implemented a fix in the ToExpr instance for SourcePath. It should work as long as we keep our include directories under hs-bindgen* package directories.

Alas, my initial implementation did not work in CI! Here is a diff:

- ["hs-bindgen", "musl-include", "bits", "alltypes.h"],
+ ["hs-bindgen-0.1.0", "musl-include", "bits", "alltypes.h"],

In the CI environment, Cabal appends the version number to the package directory names! I resolved this issue by normalizing package directory names.

@TravisCardwell TravisCardwell requested a review from edsko December 19, 2024 06:32
Copy link
Collaborator

@edsko edsko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this in a meet. Conclusions:

  • Name mangling will get a CName -> HsName map in addition to the HsName -> HsName map, or alternatively a Maybe CName -> HsName -> HsName map, where the latter allows the user to specify the CName if they so desire (useful to state intent and useful for disambiguation).
  • We will make paths relative to the cwd as we construct them, rather than doing this in tests.
  • We will use the tree-diff infrastructure for comparison, adding missing Eq instances (though it's not totally clear what they are required for).

Feel free to merge when ready.

We decided to implement relative path support when constructing a
`SourcePath`, so that it is consistent across the whole application.

The directory to make paths relative to is passed as configuration
throughout the whole application, so that we can enable users to
configure it.  For now, I implemented it by consistently passing it as
the first parameter to functions (`Reader`-style).  Every parameter is
named `relPath` and is documented in the same way.  If we implement more
configuration the future, we may want to consider using a `Reader`
transformer.

Threading configuration through an application changes the API for many
functions. orz
Since not all generated Haskell names have corresponding C names,
overriding is primarily done based on the generated Haskell name.  This
changes adds an *optional* C name that allows users to create overrides
in cases where more than one C name is being translated to the same
Haskell name.
@TravisCardwell TravisCardwell merged commit 983aba0 into main Dec 23, 2024
10 checks passed
@TravisCardwell TravisCardwell deleted the source-info branch December 23, 2024 00:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants