Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filesystem significant languages and build directory implementation details #8725

Open
dcbaker opened this issue May 3, 2021 · 29 comments
Open
Labels
design discussion Discussions about meson design and features language:rust modules:python Issues specific to the python module
Milestone

Comments

@dcbaker
Copy link
Member

dcbaker commented May 3, 2021

Meson has a strict rule that the layout of the build directory as an implementation detail, this makes a lot of sense as it gives us (Meson) flexibility to change the layout when it makes sense (we've done this a number of times, including with things like shortening the names of generated directories, or adding the private dir). This is also completely fine for languages like C in which the filesystem layout is an implementation detail of the language, and can easily be fixed with include_directories(). There are however, a growing number of languages supported by Meson (Rust, Python), or with pending support in Meson (Cython, Zig), in which the filesystem layout is the import layout. This introduces real problems when trying to mix generated and static sources, as the compilers generally expect a single, unified tree.

I'd like to propose then a way to provide sources as a tree, but in an abstract way such that Meson can make configure time decisions about what it wants to do to fulfill these requirements. Specifically, I'm proposing:

executable(
  'foo',
  {
    '' : 'main.rs',
    'foo' : [generated_rs],
    'foo/bar' : ['other.rs'],
  }
)

This specifically would tell Meson that when compiling a file main.rs, it must be in a directory such that there is a subdir foo, which contains the output of generated_rs (whatever that happens to be), and foo must have a subdir bar, with other.rs. Meson could at configure time decide that all of the inputs are static (non-generated) files, and do nothing, or it could decide that since generated_rs is a custom_target, to copy/link all of the files into a new directory and compile them there.

This solves the problem that Rust, Zig, and Cython have of needing a single tree of filres, but also creates an abstraction such that Meson can continue to hide the actual layout of the build dir, as this tree could exist anywhere within the build dir.

@dcbaker dcbaker added modules:python Issues specific to the python module design discussion Discussions about meson design and features language:rust labels May 3, 2021
@tristan957
Copy link
Contributor

Copied from IRC:

Hmm. I think Java would have an issue with a generated file being off in the build folder as well due to the way imports work
But I think that can be solved through class paths similar to include directories
but don't quote me because I am not a Java genius

@tristan957
Copy link
Contributor

I did run into this with Cython earlier. Was not happy.

@bugaevc
Copy link

bugaevc commented May 18, 2021

In Rust+Cargo, they usually work around this by having Cargo set the OUT_DIR environment variable when invoking both the buildscript and the compiler. The buildscript thus knows where to write generated code to, and the static code can do tricks like

mod generated {
    include!(concat!(env!("OUT_DIR"), "/generated.rs"));
}

(or even

#[path = concat!(env!("OUT_DIR"), "/generated.rs")]
mod generated;

but that doesn't work.)

@dcbaker
Copy link
Member Author

dcbaker commented May 18, 2021

Ah, yes. There's another problem with that. Ninja doesn't support environment variables by design, so to handle that we would have to either implement a command line switch for rustc to override the env! macro, or Meson would have to have a wrapper that allowed passing environment variables as command line switches. I'll likely have to fix that too in the future.

@bugaevc
Copy link

bugaevc commented May 19, 2021

or Meson would have to have a wrapper that allowed passing environment variables as command line switches

That's just /bin/env, no?

@eli-schwartz
Copy link
Member

Not on Windows it isn't...

@jpakkane
Copy link
Member

This introduces real problems when trying to mix generated and static sources, as the compilers generally expect a single, unified tree

How do existing systems handle this then? Do they mandate that all generated source files must be put in the source directory instead of a build directory?

@dcbaker
Copy link
Member Author

dcbaker commented May 19, 2021

Cargo, AFAICT, doesn't handle this case at all. They rely on tricks in the source code with env vars (either using the concat! macro or the include! macro. I found a bug about this at one point, but can't find it now. That doesn't match the pattern in Meson of well, not doing ugly hacks. For Cython we could just hack the crap out of the PYTHONPATH (I guess java is the same?).

@bugaevc
Copy link

bugaevc commented May 19, 2021

Wouldn't it be cleanest to add a flag to rustc that would cause it to search for modules under multiple directories, much like include directories work in C-based languages?

Perhaps like this:

$ rustc lib.rs --mod-path build/generated/

Then upon seeing

#[path = "foo/bar.rs"]
mod meh;

rustc would search both at ./foo/bar.rs and at build/generated/foo/bar.rs

@tristan957
Copy link
Contributor

tristan957 commented May 19, 2021

@dcbaker the term you are looking for in Java-land is classpath.

https://docs.oracle.com/javase/7/docs/technotes/tools/windows/classpath.html

Unfortunately classpath looks at .class files (.java compiled) so there is a little bit more work that has to be done other than telling javac that these files exist.

I think this can be solved by maybe being able to teach dependency() about JAR files and/or more specialization in the JAR method. A generated Java file could have dependencies on static code in your project or have dependencies on something from Maven Central for instance.

@tristan957
Copy link
Contributor

OR maybe classpath isn't even the right approach to solving this problem in Java. The amount of hassle that I just described seems to tell me it isn't the best approach or would even work to begin with. If generated.java depends on static.java and static.java depends on generated.java classpath couldn't solve your issue at all. I need to do some research to be able to effectively give a good opinion.

@dcbaker
Copy link
Member Author

dcbaker commented May 19, 2021

@bugaevc For Rust that is definitely an option, though we'd need to be able to pass it multiple times, rustc --mod-path generated --mode-path some/other/dir src/lib.rs I'll have to look at cython for the same thing.

@tristan957
Copy link
Contributor

Ok I just got done doing some testing with javac. We should get this feature for free in Java.

com/micron/hse/MyStatic.java

package com.micron.hse;

import com.micron.hse.MyGenerated;

public class MyStatic {
        public static void main(String[] args) {
                System.out.println(MyGenerated.hello());
        }
}

build/com/micron/hse/MyGenerated.java

package com.micron.hse;

public class MyGenerated {
        public static String hello() { return "hello"; }
}

Looks like javac just cares about packages matching up.

javac -d . com/micron/hse/MyStatic.java build/com/micron/hse/MyGenerated.java

^ That works fine. If done correctly you should see 2 .class files in com/micron/hse

@dcbaker
Copy link
Member Author

dcbaker commented May 19, 2021

So it's really just Cython and Rust that this affects.

@dcbaker
Copy link
Member Author

dcbaker commented May 19, 2021

and possibly zig

@rgommers
Copy link
Contributor

I think for Cython the situation is a little worse than I expected. It relies not only on files in the same directory, but walks all the way up the package hierarchy. An example taken from SciPy:

scipy/
  __init__.py
  linalg.pxd
  linalg/
    __init__.py
    _generate_cy.py   # generated cython_blas.pxd (plus a bunch more files)
    _decomp_update.pyx.in   # is a template file, translated to _decomp_update.pyx by a custom_target step
    meson.build

The most relevant part of the linalg/meson.build file is:

_decomp_update_pyx = custom_target('_decomp_update',
  output : '_decomp_update.pyx',
  input : '_decomp_update.pyx.in')

# Extension relies on the generated cython_blas.pxd too
py3.extension_module('_decomp_update', _decomp_update_pyx, dependencies : py3_dep,
  install : true, subdir : 'scipy/linalg')

So there's a .pxd file at the scipy/ level, and at the scipy/linalg/ level that are both needed. To cimport, Cython also needs to see __init__.py files, or it will fail to compile the .c file (it will complain about "relative import outside source tree").

As a hack, I tried copying all the files that are needed to the build dir, like so:

# Needed to trick Cython, it won't do a relative import outside a package
_dummy_init = custom_target('_dummy_init',
  output : '__init__.py',
  input : '__init__.py',
  command : ['cp', '@INPUT@', '@OUTDIR@'])

Then the build succeeds. If I do not copy scipy/__init__.py to the build dir (extra annoying to do, because it needs to be done in the meson.build file one directory up from where we build the extension), then the generated C code that is supposed to look like:

#define __PYX_HAVE__scipy__linalg___decomp_update

silently changes to

#define __PYX_HAVE__linalg___decomp_update

and we get an incomprehensible error at import time (it'll look like pyximport related issues like cython/cython#2977).

So it can be made to work, but it's pretty hacky even with the "structured sources" idea in gh-8775.

@dcbaker
Copy link
Member Author

dcbaker commented Jun 14, 2021

Out of curiousity, why would something like:

_decomp_update_pyx = custom_target('_decomp_update',
  output : '_decomp_update.pyx',
  input : '_decomp_update.pyx.in')

py.extension_module(
  'foo',
  {
    '': ['__init__.py', 'lina1g.pyx',],
    'lina1g': ['__init__.py', '_generate_cy.py', _decomp_update_pyx], 
  }
)

Work? For Cython I suspect we need to set the --workdir= flag to point to the builddir, but otherwise that seems not too bad, unless I'm missing something?

@rgommers
Copy link
Contributor

Maybe that's not too bad indeed. The first part should then be:

_decomp_update_pyx = custom_target('_decomp_update',
  output : 'linalg/_decomp_update.pyx',
  input : 'linalg/_decomp_update.pyx.in')   # or some variant (is `/` portable?)

It seems like this code is then in the wrong meson.build file - basically if you have a lot of Cython code that all ends up in the top-level directory rather than right next to where to .pyx files live. But maybe that's not too much of an issue.

The alternative I thought of was to have one custom_target at the top level that simply copies over all __init__.py and all .pxd files into the build dir.

@dcbaker
Copy link
Member Author

dcbaker commented Jun 14, 2021

Yeah, the idea behind the structured inputs was that you could just describe the layout to Meson and it would ensure that it exists. It actually uses (many) custom targets behind the scene to do the copies if you need them.

Also, yes, / is portable in the Meson DSL, we os.normpath them

@dcbaker dcbaker added this to the 0.59.0 milestone Jun 22, 2021
@tristan957
Copy link
Contributor

@dcbaker can this be closed since we now have structured_sources()?

@dcbaker
Copy link
Member Author

dcbaker commented Feb 25, 2023

I think so. I guess we never got this implemented for cython though

@tristan957
Copy link
Contributor

Do envision structured source for pymod.extension_module()?

@dcbaker
Copy link
Member Author

dcbaker commented Feb 25, 2023

Yeah. If you’re generating pyx files you need to structure the sources appropriately

@rgommers
Copy link
Contributor

There's two reasons for having to copy files to the build dir for Cython:

  1. Mixing generated and non-generated files (e.g. a generated .pyx with a non-generated .pxd)
  2. Dealing with determining import names, which requires copying __init__.py files to the build dir for generated .pyx

To get rid of (2) there's now --module-name in the cython CLI:
Fully qualified module name. If not given, it is deduced from the import path if source file is in a package, or equals the filename otherwise. So I think we should integrate that to get rid of (2).

For (1), structured_sources may still be useful indeed.

@tristan957
Copy link
Contributor

1 is the issue that I currently have since I generate the pyx files to add documentation.

@eli-schwartz
Copy link
Member

Feels like this should be solved by something like -Iincludedir/ but for cython_args?

@rgommers
Copy link
Contributor

Feels like this should be solved by something like -Iincludedir/ but for cython_args?

You'd hope, but I don't think that that will work - there is such a thing as from . cimport xxx which does depend on two files being next to each other (for cimport xxx there's a search path indeed).

@eli-schwartz
Copy link
Member

It's not ideal to restrict which valid code you're allowed to use "because build systems", but I guess using absolute rather than relative cimports could be a low-effort way to solve that for situations where out of tree builds are done?

That would have the benefit of avoiding otherwise unneeded file copying.

The alternative would be... I dunno, a special version of -Iincludedir that treats two directories as overlays instead of simply adding them both to the search path?

@rgommers
Copy link
Contributor

rgommers commented Mar 2, 2023

Perhaps - it would need testing whether absolute imports work in both situations (generated .pyx + non-generated .pxd, and vice versa).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design discussion Discussions about meson design and features language:rust modules:python Issues specific to the python module
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants