Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[question] cmake_layout with subproject vs conan.tools.scm.Git #12360

Closed
db4 opened this issue Oct 21, 2022 · 16 comments
Closed

[question] cmake_layout with subproject vs conan.tools.scm.Git #12360

db4 opened this issue Oct 21, 2022 · 16 comments
Assignees
Milestone

Comments

@db4
Copy link
Contributor

db4 commented Oct 21, 2022

I'm trying to make layout with multiple subprojects work. My sources are stored in a git repo organized as follows:

(git root)
└─── pkg1
│   │   CMakeLists.txt
│   └─── src
│       │   source_file.cpp
│       │   ...
└─── pkg2
|   │   CMakeLists.txt
|   └─── src
|       |   ...
...

Trying to build them in Conan cache I wrote

    def layout(self):
        self.folders.root = ".."
        self.folders.subproject = "pkg1"
        cmake_layout(self)

    def source(self):
        git = Git(self)
        git.clone(URL, target='.')
        git.checkout(REVISION)
        self.output.info("Syncing git submodules")
        git.run("submodule sync --recursive")
        git.run("submodule update --init --recursive")

    def build(self):
        cmake = CMake(self)
        cmake.configure()
        cmake.build()

This doesn't work: git.clone() clones the whole git repo inside pkg1 subfolder and cmake.configure() fails. What's the recommended way to use scm.Git with package layout feature?

@seesturm
Copy link

I'd also have this question. Is this documented feature supported at all?

@memsharded
Copy link
Member

Hi all,

Sorry @db4 this was not responded back then, it probably fell through the cracks.

We would need a bit more detail what is failing, and something that we can actually reproduce. It is not clear why cmake.configure() fails, and what error message is printing. It is also not very clear what should happen:

  • When the pkg1 is built in the cache, the only source that is necessary is the source inside the pkg1 folder?
  • Where is the conanfile.py located in that layout?
  • I guess there is a conanfile.py inside each pkgX folder?

It seems that Git object indeed doesn't implement anything like "clone and checkout only a subfolder of a project", and that should be implemented manually. Still, it should be orthogonal to the layout().

Hi @seesturm, if it is the same question, could you please provide those details to understand better the problem? Thanks.

@seesturm
Copy link

For demo purposes I've created https://github.com/seesturm/subprj . I can build locally using "conan install ." + "conan build ." in pkg1 folder. But "conan create ." fails for me.

CMake Error: The source directory "$HOME/.conan2/p/t/pkg164137f9894528/b/pkg1" does not appear to contain CMakeLists.txt.

As the original submitter writes, the problem is that the git is not cloned into "b" but into "b/pkg1" which makes CMakeLists.txt end up in "b/pkg1/pkg1".

@memsharded
Copy link
Member

Thanks very much for the repro case.
I now understand better the issue. It seems we already provisioned some utility to deal with it, because this use case requires a folder manipulation in the source() method, but still didn't make it public, waiting for user feedback and probably for a smarter way to do or avoid the folder manipulation. This works in my fork:

diff --git a/pkg1/conanfile.py b/pkg1/conanfile.py
index 74ddfce..0be5162 100644
--- a/pkg1/conanfile.py
+++ b/pkg1/conanfile.py
@@ -2,6 +2,7 @@ from conan import ConanFile
 from conan.tools.scm import Git
 from conan.tools.files import update_conandata
 from conan.tools.cmake import CMakeToolchain, CMake, cmake_layout, CMakeDeps
+from conan.tools.files.files import swap_child_folder


 class pkg1Recipe(ConanFile):
@@ -28,6 +29,7 @@ class pkg1Recipe(ConanFile):
                sources = self.conan_data["sources"]
                git.clone(url=sources["url"], target=".")
                git.checkout(commit=sources["commit"])
+               swap_child_folder(self.source_folder, "pkg1")

        def generate(self):
                deps = CMakeDeps(self)

It would be a matter to polish the implementation, find a better name for it, and make it part of the public tools. I am annotating this for next 2.0.3, but if you want to give it a try (don't rely on it, because it is private and will certainly break), that would be useful

@memsharded memsharded added this to the 2.0.3 milestone Mar 21, 2023
@db4
Copy link
Contributor Author

db4 commented Mar 21, 2023

Hi @memsharded,

but if you want to give it a try, that would be useful

Unfortunately, I can't. When I started the issue, I tried to make the feature work for Conan 1.x. Recipes that need it haven't been ported to 2.x yet.

@memsharded
Copy link
Member

from conan.tools.files.files import swap_child_folder exists in latest 1.X too, I think I tested the above in 1.X. We usually target things first to 2.0 nowadays, but the above function was in 1.X

@db4
Copy link
Contributor Author

db4 commented Mar 23, 2023

Well, I tried your approach with swap_child_folder. It seems to be problematic: swap_child_folder cannot correctly handle child folders consisting of several directories (like packages/pkg1) and fails if the directory tree has repeated names somewhere (pkg1/pkg1). I created a better implementation that's free of these problems. I can share it but first I would like to know why all this is necessary. Why can't git.clone() use a proper location? All it has to do is just ignore self.folders settings. Quick and dirty proof of the concept:

    def source(self):
        tmp = self.folders.source
        self.folders.source = None
        with chdir(self, self.source_folder):
            shutil.rmtree(os.path.normpath(self.folders.subproject).split(os.sep)[0])
            git = tools.Git()
            ver = self.conan_data["sources"][self.version]
            git.clone(ver["url"])
            git.checkout(ver["revision"])
            self.folders.source = tmp

@memsharded
Copy link
Member

Thanks for pointing this out @db4
Indeed there are some limitations, the former implementation was too simplistic, this is the reason we are trying to come up with a more complete one in #13509, that takes those cases into account.

I have tried what you suggest, but unfortunately I couldn't make it work. The issue seems to be that source() method already executes inside the self.source_folder, it is the cwd. Trying to remove, overwrite or something like that inevitably crashes, as the resource is busy. Did your proof of concept work, did you try it?

@db4
Copy link
Contributor Author

db4 commented Mar 24, 2023

For the record, here is my implementation of swap_child_folder:

def move_child_folder(parent_folder, child_folder):
    """ move child_folder subtree into parent_folder """
    child_path = os.path.join(parent_folder, child_folder)
    for (root, dirs, files) in os.walk(child_path):
        rel_root = os.path.relpath(root, child_path)
        for d in dirs:
            os.makedirs(os.path.join(parent_folder, rel_root, d), exist_ok=True)
        for f in files:
            shutil.move(os.path.join(root, f), os.path.join(parent_folder, rel_root, f))
    # We don't care about empty directories left in child_folder. They don't affect git state
    # but cleaning them correctly could be challenging. Consider this case:
    # a/a/a/b -> a/a/b

The issue seems to be that source() method already executes inside the self.source_folder, it is the cwd. Trying to remove, overwrite or something like that inevitably crashes, as the resource is busy. Did your proof of concept work, did you try it?

Sure. Note self.folders.source = None in the code above. That effectively resets self.source_folder to the root. Then chdir to the root, remove empty subproject directories that are already created to the moment, clone repo, and restore everything in the end. Yes, I tried it on 1.59 and it works. It's obviously a hack, but maybe Git.clone() could implement it better internally?

@memsharded
Copy link
Member

#13509 was merged to develop (for Conan 1.60 and 2.0.3) with the new move_folder_contents() helper, taking into account those things. Could you please check it? Thanks!

@db4
Copy link
Contributor Author

db4 commented Mar 31, 2023

Hi @memsharded

Unfortunately, it doesn't work for me:

    def source(self):
        sources = self.conan_data["sources"][self.version]
        git = Git(self)
        git.clone(url=sources["url"], target=".")
        git.checkout(commit=sources["revision"])
        src_folder = os.path.normpath(self.source_folder)
        dst_folder = os.path.dirname(os.path.dirname(src_folder))
        self.output.info(f"{src_folder=} {dst_folder=}")
        move_folder_contents(src_folder, dst_folder)
Ocd/2.21.1: Configuring sources in D:/.conan\8959ae\1\ObjectClassificationDetection/Ocd\.
Ocd/2.21.1: Cloning git repo
Ocd/2.21.1: Checkout: e5cb9b414c4bb8affd86b38a82f0d3b226877e20
Ocd/2.21.1: src_folder='D:\\.conan\\8959ae\\1\\ObjectClassificationDetection\\Ocd' dst_folder='D:\\.conan\\8959ae\\1'
ERROR: Ocd/2.21.1: Error in source() method, line 66
        move_folder_contents(src_folder, dst_folder)
        OSError: Cannot change permissions for D:\.conan\8959ae\1\ObjectClassificationDetection\Ocd! Exception info: (<class 'PermissionError'>, PermissionError(13, 'The process cannot access the file because it is being used by another process'), <traceback object at 0x000001A4F371E8C0>)

BTW, it would be great if you do os.path.normpath(self.source_folder) inside layout() so that self.source_folder doesn't end with /.

@memsharded memsharded modified the milestones: 1.60, 1.61 May 8, 2023
@memsharded memsharded modified the milestones: 1.61, 1.62 Sep 11, 2023
@franramirez688 franramirez688 modified the milestones: 1.62, 1.63 Nov 7, 2023
@czoido czoido modified the milestones: 1.63, 1.64 Feb 12, 2024
@memsharded memsharded modified the milestones: 1.64, 2.3.0 Apr 4, 2024
@memsharded
Copy link
Member

Hi all,

This has been moved to Conan 2 scope, to make sure there isn't any gap or missing thing in Conan 2.
I am having a look, but struggling a bit to understand the issue. I have started with the code shared above by @seesturm, and made a unit test, but this one is working.

The first step is making sure we have a unit test that captures the scenario.

This is the current unit test, that is working in Conan 2:

def test_move_sources():
    pkg = textwrap.dedent("""
        import os
        from conan import ConanFile
        from conan.tools.scm import Git
        from conan.tools.files import update_conandata, move_folder_contents
        from conan.tools.cmake import CMakeToolchain, CMake, cmake_layout

        class pkg1Recipe(ConanFile):
            name = "pkg1"
            version = "1.0"
            settings = "os", "compiler", "arch", "build_type"

            def layout(self):
                self.folders.root = ".."
                self.folders.subproject = "pkg1"
                cmake_layout(self)

            def export(self):
                git = Git(self, self.recipe_folder)
                scm_url, scm_commit = git.get_url_and_commit()
                update_conandata(self, {"sources": {"commit": scm_commit, "url": scm_url,
                                                    "folder": "pkg1"}})

            def source(self):
                git = Git(self)
                sources = self.conan_data["sources"]
                git.clone(url=sources["url"], target=".")
                git.checkout(commit=sources["commit"])
                move_folder_contents(self, os.path.join(self.source_folder, sources["folder"]),
                                    self.source_folder)

            def generate(self):
                tc = CMakeToolchain(self)
                tc.generate()

            def build(self):
                cmake = CMake(self)
                cmake.configure()
                cmake.build()

            def package_info(self):
                self.cpp_info.libs = ["pkg1"]
        """)
    c = TestClient()
    c.init_git_repo({"pkg1/conanfile.py": pkg,
                     "pkg1/CMakeLists.txt": gen_cmakelists()})
    c.run("create pkg1")
    print(c.out)

If you can please have a look, and try to propose changes to the unit test that would capture the issue, that would help a lot, thanks!

@thorntonryan
Copy link
Contributor

thorntonryan commented May 1, 2024

@memsharded , thanks for the timely example!

I think this would also work:

def source(self):
    git = Git(self)
    sources = self.conan_data["sources"]
    with chdir(self, ".."):
       git.fetch_commit(url=sources['url'], commit=sources['commit'])

Then we wouldn't need to move files around.
It at least works for me in my initial testing with Conan 1.x, conan install for editable packages and conan create


I wonder if it'd be possible to teach the Git commands to cd to the self.folders.root prior to checkout when a subproject was detected...

It'd also be sort of amazing if we could recognize this sub-project setup and attempt to do a sparse1 checkout too

Footnotes

  1. https://git-scm.com/docs/git-sparse-checkout

@memsharded
Copy link
Member

I wonder if it'd be possible to teach the Git commands to cd to the self.folders.root prior to checkout when a subproject was detected...

In Conan 2, the constructor is Git.__init__(self, conanfile, folder=".", excluded=None), and the folder arg becomes the public self.folder attribute, so it can also be dynamically changed later if desired. This is documented in https://docs.conan.io/2/reference/tools/scm/git.html#git

https://git-scm.com/docs/git-sparse-checkout is still experimental, and might require quite a modern git version, so it sounds a bit too early to make it built-in. At the moment, Git is just a very thin layer over the git command, so git.run("sparse-checkout .... would be perfectly possible for users wanting to use it.

@puetzk
Copy link

puetzk commented May 2, 2024

Fair. Our thought process was that Git.fetch_commit is also experimental, and also requires a recent git (partial clone, shallow clone, and sparse checkout are all only mainstream in the last few years, though the implementation aremuch older). I think clone --depth is from git 1.5.0 and sparse-checkout from git 1.7.0
https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone/
https://github.blog/2021-11-10-make-your-monorepo-feel-small-with-gits-sparse-index/

As far as doing it yourself, the problem is you can't configure sparse filters before git init, and to be effective in reducing amount downloaded, you really need to insert the sparse checkout filters before git fetch, fetch_commit (usefully) combines those steps. So making the fetch sparse (based on .subproject) seemed like a fairly natural extension to its place as the way to fetch only the necessary sources.

But while we have some repos with multiple packages, they are not enormous monorepos, so it's only a nice-to-have.

@memsharded memsharded modified the milestones: 2.3.0, 2.4.0 May 6, 2024
@memsharded
Copy link
Member

Some of the features or at least the modern interfaces are way newer like git 2.25, as seen in https://github.blog/2020-01-17-bring-your-monorepo-down-to-size-with-sparse-checkout.

From what I see in those posts, this is far from trivial, and giving this is a nice to have (performance) it seems too much of an effort at the moment.

As the conversation has also deviated from the original post and the layout question has been responded, I am closing this ticket. If someone wants to contribute it, I think it should have no big risk if it can be limited to an opt-in method in the Git class, so it would be acceptable. Thanks all for the feedback!

@memsharded memsharded closed this as not planned Won't fix, can't repro, duplicate, stale May 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants