Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow multiple inclusion of symlinks #276

Closed
agoose77 opened this issue Jun 13, 2022 · 8 comments · Fixed by #299
Closed

Allow multiple inclusion of symlinks #276

agoose77 opened this issue Jun 13, 2022 · 8 comments · Fixed by #299

Comments

@agoose77
Copy link

agoose77 commented Jun 13, 2022

In the process of porting a package to Hatch, I noticed that it was failing to include a directory that was symlinked to elsewhere in the project. At the moment we have a first-seen-wins approach that means the source directory (in this instance) is being ignored.

In general, I think we should want users to be able to treat symlinks are distinct (rather than looking at the inode). Do you agree?

def safe_walk(path):
seen = set()
for root, dirs, files in os.walk(path, followlinks=True):
stat = os.stat(root)
identifier = stat.st_dev, stat.st_ino
if identifier in seen:
del dirs[:]
continue
seen.add(identifier)
yield root, dirs, files

See scikit-hep/pyhf#1888 for the project in question!

@agoose77 agoose77 changed the title Allow revisiting of symlinks Allow multiple inclusion of symlinks Jun 13, 2022
@ofek
Copy link
Collaborator

ofek commented Jun 13, 2022

Can we do that while also preventing infinite loops?

@agoose77
Copy link
Author

I think so, let me come up with a patch.

@ofek
Copy link
Collaborator

ofek commented Jun 13, 2022

@pytest.mark.requires_unix
def test_infinite_loop_prevention(self, temp_dir):
project_dir = temp_dir / 'project'
project_dir.ensure_dir_exists()
with project_dir.as_cwd():
config = {'tool': {'hatch': {'build': {'include': ['foo', 'README.md']}}}}
builder = Builder(str(project_dir), config=config)
(project_dir / 'README.md').touch()
foo = project_dir / 'foo'
foo.ensure_dir_exists()
(foo / 'bar.txt').touch()
(foo / 'baz').symlink_to(project_dir)
assert [f.path for f in builder.recurse_included_files()] == [
str(project_dir / 'README.md'),
str(project_dir / 'foo' / 'bar.txt'),
]

@ofek
Copy link
Collaborator

ofek commented Jun 13, 2022

Does that project currently work with setuptools? Seems like it does the same thing: https://github.com/pypa/setuptools/blob/78cb747d66bda1a6f6649e82690aaf5083a89d69/setuptools/_distutils/filelist.py#L275-L299

@agoose77
Copy link
Author

In pyhf (which is not my project, but I did look at contributing a new pyproject.toml for), the wheel building does currently work. There is a symlink from docs/.../a-symlink to src/pyhf/schemas. Hatch seems to find docs first, so by the time it encounters src it cannot visit schemas. The reason it's a problem here and not in setuptools is that I assume setuptools is not visiting the docs, so it only ever sees one instance of the (dev, inode) pair

@matthewfeickert
Copy link
Contributor

matthewfeickert commented Jun 21, 2022

Does that project currently work with setuptools?

Yes, pyhf currently uses setuptools as our build backend.

Hatch seems to find docs first, so by the time it encounters src it cannot visit schemas

I haven't looked at what is actually being done here yet, but why does hatch need to look at all of the directories in the repository when a project is very probably only packaging a subset?

edit: Seems like that shouldn't be necessary/a problem with tool.hatch.build.targets.

@ofek
Copy link
Collaborator

ofek commented Jun 21, 2022

Hatchling inclusion is based mostly on pattern filtering rather than raw paths. Let me think about this.

@ofek
Copy link
Collaborator

ofek commented Jul 5, 2022

matthewfeickert added a commit to scikit-hep/pyhf that referenced this issue Jan 20, 2023
* Migrate build system to Hatchling and add Hatch configuration in
  pyproject.toml.
   - Use pip v21.2+'s ability to handle recursive dependencies to define
     nested extras.
     c.f. https://hynek.me/articles/python-recursive-optional-dependencies/
   - Use hatch-vcs to dynamically determine the version information from Git.
   - Use 'only-include' option with `[tool.hatch.build.targets.sdist]` to avoid
     problem with hatchling discovering the symlink of src/pyhf/schemas under
     docs/ before src/ and skipping including it in the sdist.
     c.f. #791 and
          pypa/hatch#276
* Remove all setuptools related files: setup.py, setup.cfg, MANIFEST.in.
* Add .flake8 config file as the config for flake8 had previously been in setup.cfg
  as flake8 doesn't support pyproject.toml.
* Remove use of check-manifest from packaging GitHub Actions workflow.
* Remove mention of setup.cfg from release checklist.
* Add Scikit-HEP admins info to maintainers metadata.

Co-authored-by: Angus Hollands <goosey15@gmail.com>
Co-authored-by: Ofek Lev <oss@ofek.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants