How to specify foreign import headers #333

Closed
phadej opened this issue Dec 5, 2024 · 17 comments

Comments

@phadej
Collaborator

phadej commented Dec 5, 2024

For CApiFFI we need to include a header for declarations, e.g.

foreign import capi "header.h foo" foo :: Int -> IO ()

However, with the file-processing interface we don't know what the canonical import is. Is it some-header.h or clang-c/Index.h (i.e. should directories be dropped or not)?

How to solve this?

Note: rust-bindgen doesn't have this problem, as it doesn't have a CApiFFI-like system. If we use the ordinary ForeignFunctionInterface, then the linker will find the symbol, so:

  • If we use CApiFFI, should the user specify how to import the header?
  • Or should we not use capi (but then we need to at least care about calling conventions)?
@TravisCardwell
Collaborator

My understanding is that users should specify header files using paths that are relative to an include directory, in which case we generate CApiFFI directives using the passed include file path (including any directory names).

For the libclang example, pkgconf sets include directory /usr/include (on my system), so clang-c/Index.h successfully references /usr/include/clang-c/Index.h.

When manually configuring, users should configure both include-dirs and install-includes in the .cabal file, where include files may be relative to an include directory. Users should then specify the header file to translate as it is specified in install-includes.

Am I missing or misunderstanding anything?

@phadej
Collaborator Author

phadej commented Dec 5, 2024

I'm not a fan of looking up extra information (system include paths, Cabal includes) if the alternative cost is just making the user specify a bit more information.

I think it will become clear once hs-bindgen usage, specifically as a preprocessor in Cabal, is fleshed out. Until then, doing the simplest thing feels best.

@TravisCardwell
Collaborator

A quick clarification: in my understanding described above, it is up to the user to specify include directories relative to include paths (as configured by pkgconfig or manually in their .cabal file). In our implementation, we do not have to query system include paths or Cabal configuration; we simply use the relative include paths that users specify.

@phadej
Collaborator Author

phadej commented Dec 5, 2024

@TravisCardwell even if I understand you, my point stands:

I think it will become clear once hs-bindgen usage, specifically as a preprocessor in Cabal, is fleshed out.

For TH usage, nothing you said really makes much sense: users would need to specify where to find the file, but also how to include it from C anyway (and the latter may have a default for common cases).

It might also make sense to always require some kind of wrapper header (so it's a file local to the package), even if it's literally just #include <something>. That would require #294 in some way; SelectFromMainFile won't work then.

@phadej
Collaborator Author

phadej commented Dec 5, 2024

TL;DR I think it's time to work towards a real example, not just golden-fixture runs in our test-suite. There is a very simple TH test, and that can be extended; but Cabal/preprocessor usage is completely unexplored.

@TravisCardwell
Collaborator

👍

We should probably test against real examples using pkgconf as well as manual configuration. My impression is that manual configuration keeps things simple when doing so is possible, while pkgconf is desired when the C code has (non-trivial) dependencies.

When using pkgconf for local C code, I found that I had to configure environment variables. Building required setting PKG_CONFIG_PATH to find the header, while executing required setting LD_LIBRARY_PATH to find the dynamically-linked shared object file.

@edsko
Collaborator

edsko commented Dec 20, 2024

Related to #71.

@edsko edsko added this to the 1: `Storable` instances milestone Dec 20, 2024
@edsko
Collaborator

edsko commented Jan 31, 2025

Perhaps this issue is about more than just paths. Suppose some package provides two headers a.h and b.h, and a.h does a #include of internal.h, some header used for internal organization but not part of the public interface. We now have two things to consider:

  1. Functions defined in internal.h will be imported from that header, not from a.h. It might be nicer to import from a.h instead?
  2. However, if we do import from a.h instead, then we might have the opposite problem. We don't currently support multiple headers (Generalize to C multiple headers and/or multiple Haskell modules #75). This means that (like Rust bindgen?) we require users to write a new header file that would include a.h and b.h; but we almost certainly do not want to import from that user-written header (it might not even be installed).

Of course, if a "public" header imports from a "private" header, this necessarily means that that "private" header must also be installed and accessible, so importing from the private header must at least work (I think?). As long as the bindgen generated bindings are only ever generated "last minute" I guess this is therefore not a big problem either way. It might be a problem if the bindings are checked in and the internal structure is different than expected, perhaps on different machines?

@phadej
Collaborator Author

phadej commented Jan 31, 2025

, so importing from the private header must at least work (I think?)

It won't in general. Consider

// public.h
#define FOO int

#include "private.h"

// private.h
FOO foobar(FOO x);

which is a reasonable pattern, where some header could set up various things ("configuration") for the rest of the library.
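
As a rough single-file illustration of this pattern, the contents of the two headers can be placed in one translation unit. The `#error` guard, the function body, and `check_foobar` are hypothetical additions for the sketch and are not part of the example above. Removing the `#define` shows why private.h cannot be processed on its own:

```c
#include <assert.h>

/* --- what public.h contributes: "configuration" for the library --- */
#define FOO int

/* --- what private.h contributes --- */
/* Hypothetical guard, added only for this sketch: it makes the hidden
   dependency on public.h explicit instead of yielding a parse error. */
#ifndef FOO
#error "private.h expects FOO to be defined by public.h"
#endif
FOO foobar(FOO x);

/* --- a stand-in definition so the sketch links (assumption) --- */
FOO foobar(FOO x) { return x + 1; }

/* Usage: this works only because FOO was defined before the declaration. */
void check_foobar(void) { assert(foobar(41) == 42); }
```

A capi import that names private.h directly would be compiled without the `#define`, which is the failure described above.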

@edsko
Collaborator

edsko commented Jan 31, 2025

Ah, yes. Nice example!

@phadej
Collaborator Author

phadej commented Jan 31, 2025

To clarify, importing foobar would work with a normal ccall (which cares only about the symbol name, though that too could be mangled, e.g. with ##), but it won't work with capi.
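
The `##` caveat can be sketched as follows. All names here (`LIB_SYM`, `mylib_v2_`, `call_via_header`, `call_via_symbol`) are hypothetical. The point is only that a call compiled against the header, which is roughly what a capi import relies on, sees the macro, while a lookup by symbol name, as with ccall, would need the pasted name:

```c
#include <assert.h>

/* Hypothetical library header: the source-level name `foobar` is a macro
   that expands to a versioned linker symbol built with ##. */
#define LIB_SYM(name) mylib_v2_##name
#define foobar LIB_SYM(foobar)

int foobar(int x);                    /* declares mylib_v2_foobar */
int foobar(int x) { return 2 * x; }   /* defines  mylib_v2_foobar */

/* A wrapper compiled with the header in scope resolves `foobar` through
   the preprocessor, so it does not need to know the real symbol name. */
int call_via_header(int x) { return foobar(x); }

/* Looking the function up by symbol requires the pasted name; after the
   #undef, the plain identifier `foobar` names no declared function. */
#undef foobar
int call_via_symbol(int x) { return mylib_v2_foobar(x); }
```

In this sketch a symbol-based import would have to name `mylib_v2_foobar`, since no symbol called `foobar` exists in the object file.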

@edsko
Collaborator

edsko commented Jan 31, 2025

My proposal would be:

  1. We use the name of the header as specified when running hs-bindgen, for everything that is exported in that header and in any of its (transitive) #includes.
  2. For libraries that consist of multiple public headers, without a single header to rule them all, we use the mappings that @TravisCardwell is working on, which we also use for external bindings.

@TravisCardwell
Collaborator

I have been digesting this for a while. This comment documents my current thoughts.

When generating external bindings, we may want to allow users to specify a list of headers to translate (not just one) as well as a list of external bindings to be used as dependencies. Standard library bindings will likely often be used as dependencies, but we should make it possible to have no dependencies as well.

In one mode, we can generate the external bindings configuration file (YAML) while the specified headers are translated. Only those specified headers are output in the generated configuration for the types (etc.), not the (possibly internal) header files where they are actually declared.

We need to support repeated declarations, where the same type is exposed by more than one header file. For example, we may first encounter int32_t in inttypes.h but need to also associate it with stdint.h when that file is translated.
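
A small C sketch of this situation, assuming a C99-or-later toolchain (the standard specifies that `<inttypes.h>` includes `<stdint.h>`); `add32` is a hypothetical helper used only to exercise the type:

```c
#include <assert.h>
#include <limits.h>
#include <inttypes.h>  /* C99: this header includes <stdint.h> */

/* int32_t is visible although <stdint.h> was never named directly, so a
   translator that first sees it here must still be able to associate it
   with <stdint.h> when that header is processed later. */
int32_t add32(int32_t a, int32_t b) { return a + b; }
```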

We need to take care to not include types (etc.) that are already included in dependencies. For example, if we create separate external bindings for threads.h since it is an optional part of the standard, we should generate the bindings using the base and hs-bindgen-runtime-libc bindings as dependencies, and type struct tm should not be included in the threads.h bindings even though that header includes time.h.

In some cases, users may not want to create bindings for everything. Note that users may not be able to parse everything if we do not support everything (such as long double). For such cases, a separate mode could use an existing external bindings configuration file to guide the translation. The file specifies the types (etc.) that should be parsed/included. For a type/macro/function to be included, however, all of the types that it depends on must also be included. If the configuration file does not specify such a necessary type, it is an error. If all of the header(s) parse without error, a user could first create the external bindings configuration file automatically (as described above) and then edit the file to remove what they do not want to include. Otherwise, they need to create the file manually.

Note that the standard library remains a special case that we implement and maintain by hand, as it defines the standard types (etc.), not those of a specific implementation.

@TravisCardwell
Collaborator

TravisCardwell commented Feb 10, 2025

How about we always import from the header file that is being translated? It exposes everything that it imports, after all.

When we support translating more than one header file, users would be able to specify all of the headers that they would like to import from. For example, if somelib.h is imported by foo.h, then the types for that library can import directly from somelib.h by specifying both somelib.h and foo.h as input paths. The order is significant when there are dependencies. Perhaps the preprocess command could look something like this:

$ hs-bindgen \
    --include-path "$(pwd)/cbits" \
    --system-include-path /usr/lib/clang/19/include \
    --system-include-path /usr/local/include \
    --system-include-path /usr/include \
    --select-all \
    preprocess \
      --input somelib.h \
      --input foo.h \
      --module Acme.Foo \
      --output src/Acme/Foo.hs

Note that input paths must be relative to an include path. In the above example, library header somelib.h may be in /usr/include while user header foo.h may be in $(pwd)/cbits.

I have unfortunately not been able to find a (good) way to get the include path from libclang. It is possible in the C++ API, but one has to create an instance and start the preprocessor before the information is available.

@edsko
Collaborator

edsko commented Feb 11, 2025

How about we always import from the header file that is being translated? It exposes everything that it imports, after all.

This is essentially what I am proposing as well, right? So it seems we are on the same page?

@TravisCardwell
Collaborator

Yes! We are indeed on the same page. I had to work through it before I understood why the other options I had in mind do not work. 🙇

At some point, there were comments about not wanting to import from header files that users write just to specify what they want to translate. This is no longer viable, correct? We would include from such a header, which must be in a directory added to the include path.

I think that specifying multiple headers is a good way for users to specify which headers may be imported from. On the command line, perhaps we could specify them via (one or more) arguments (instead of using options). For the TH API, perhaps we can have a [FilePath] argument instead of FilePath. Users must take care to order headers so that dependencies are satisfied.

@TravisCardwell
Collaborator

With PR #417, the user-specified header path (relative, as specified in a #include) is used for foreign import headers.
