Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'File exists' error when building pkg_tar from srcs containing directories #135

Open
cjmakes opened this issue Jul 18, 2019 · 17 comments
Open
Assignees
Labels
P3 An issue that we are not working on but will review quarterly wontfix

Comments

@cjmakes
Copy link

cjmakes commented Jul 18, 2019

Description of the problem / feature request:

When building a pkg_tar from a filegroup defined in an external repository containing multiple files in a directory, PackgeTar throws an io exception because the directory already exists. However, the issue is not present when using --spawn_strategy=local.

Bugs: what's the simplest, easiest way to reproduce this bug.

See reproduction below.

What's the output of bazel info release? Happens with Bazel up through 4.x

Have you found anything relevant by searching the web?

Any other information, logs, or outputs that you want to share?

log.txt

Is there something that I am not understanding about how the rules should work or is this really a bug? If this is just not how the rules should work what is the solution?

@buchgr
Copy link

buchgr commented Jul 29, 2019

@laurentlb why is this assigned to team-Remote-Exec?

@laurentlb
Copy link
Contributor

the issue is not present when using --spawn_strategy=local.

It sounds like an error related to execution. I don't know which exact team should look at this. Feel free to reassign.

@buchgr
Copy link

buchgr commented Jul 29, 2019

@laurentlb isn't that for the dev ex team, of who you are a member of, to find out?

@dslomov
Copy link
Contributor

dslomov commented Aug 7, 2019

Please do not drop issues on the floor - keep them assigned to some team.

@dslomov
Copy link
Contributor

dslomov commented Aug 7, 2019

Assigning to team-Rules-Server for initial investigation/triage.

@aiuto aiuto transferred this issue from bazelbuild/bazel Feb 20, 2020
@aiuto aiuto assigned cjmakes and unassigned aiuto Apr 17, 2020
@aiuto
Copy link
Collaborator

aiuto commented Apr 17, 2020

This got lost in the bazel repo but we move pkg_tar to this repo last year. I'm going through the issue backlog.

Do you have a standalone repo for this, against a newer bazel than 0.27?

@aiuto
Copy link
Collaborator

aiuto commented Aug 7, 2020

Closing as obsolete

@aiuto aiuto closed this as completed Aug 7, 2020
@ash2k
Copy link

ash2k commented Feb 15, 2021

This is still an issue. See https://github.com/ash2k/rules_pkg_bug.

$ bazel --version
bazel 4.0.0

$ bazel build --verbose_failures --sandbox_debug //:broken_package                                                      
DEBUG: Rule 'rules_python' indicated that a canonical reproducible form can be obtained by modifying arguments shallow_since = "1564776078 -0400"
DEBUG: Repository rules_python instantiated at:
  /Users/mikhail/src/rules_pkg_bug/WORKSPACE:13:23: in <toplevel>
  /private/var/tmp/_bazel_mikhail/96dd2b16606e402518a65fd887b915de/external/rules_pkg/deps.bzl:33:10: in rules_pkg_dependencies
  /private/var/tmp/_bazel_mikhail/96dd2b16606e402518a65fd887b915de/external/bazel_tools/tools/build_defs/repo/utils.bzl:201:18: in maybe
Repository rule git_repository defined at:
  /private/var/tmp/_bazel_mikhail/96dd2b16606e402518a65fd887b915de/external/bazel_tools/tools/build_defs/repo/git.bzl:199:33: in <toplevel>
INFO: Analyzed target //:broken_package (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /Users/mikhail/src/rules_pkg_bug/BUILD.bazel:12:8: Writing: bazel-out/darwin-fastbuild/bin/broken_package.tar failed: I/O exception during sandboxed execution: /Volumes/ramdisk/bazel-sandbox.332ac94c8699b84322a35ae9570fabb10a838e2c8d09c1ac9ed1b057a48dfea1/darwin-sandbox/11/execroot/rules_pkg_bug/dir1 (File exists)
Target //:broken_package failed to build
INFO: Elapsed time: 0.109s, Critical Path: 0.00s
INFO: 2 processes: 2 internal.
FAILED: Build did NOT complete successfully

$ ls -la /Volumes/ramdisk/bazel-sandbox.332ac94c8699b84322a35ae9570fabb10a838e2c8d09c1ac9ed1b057a48dfea1/darwin-sandbox/11/execroot/rules_pkg_bug/    
total 0
drwxr-xr-x  5 mikhail  staff  160 Feb 15 20:45 .
drwxr-xr-x  3 mikhail  staff   96 Feb 15 20:45 ..
drwxr-xr-x  4 mikhail  staff  128 Feb 15 20:45 bazel-out
drwxr-xr-x  3 mikhail  staff   96 Feb 15 20:45 dir1
drwxr-xr-x  3 mikhail  staff   96 Feb 15 20:45 external

@aiuto
Copy link
Collaborator

aiuto commented Feb 16, 2021

Inlining the reproduction

filegroup(
    name = "files",
    srcs = glob(
        ["dir1/**"],
        exclude_directories = 0,
    ),
)

pkg_tar(
    name = "broken_package",
    srcs = [":files"],
)

pkg_tar(
    name = "working_package",
    srcs = glob(
       ["dir1/**"],
    ),
    strip_prefix = "dir1",
)
dir1/
  file1.txt
  dir2/
    file2.txt

I don't have an idea on the best fix right now. We're trying to get strip_prefix out of pkg_tar and add it to pkg_filegroup, so the shape of the usage is likely to be different.

@aiuto aiuto reopened this Feb 16, 2021
@aiuto aiuto added P2 An issue that should be worked on when time is available and removed need more information labels May 3, 2021
@aiuto aiuto assigned aiuto and unassigned cjmakes May 12, 2021
@aiuto aiuto added P3 An issue that we are not working on but will review quarterly and removed P2 An issue that should be worked on when time is available labels May 12, 2021
@aiuto aiuto changed the title File exists error when building pkg_tar from external filegroup containing directories File exists error when building pkg_tar from filegroup containing directories May 12, 2021
@aiuto
Copy link
Collaborator

aiuto commented May 12, 2021

Before fixing this, let's agree on the expected output of broken_package.

I am going to assert that any previous behavior which flatten the paths returned by the glob is incorrect. The output, strip_prefix=dir1 or not should include:

  • dir1/file1.txt
  • dir1/dir2/file2.txt

So that is what I am working towards.

The snag I have hit is that the list of files in the glob appears to be broken. dir1 and dir1/dir2 are showing up as files with is_directory==False, instead of true. This is just broken. I'll have to investigate why.

@aisbaa
Copy link

aisbaa commented Jun 10, 2021

Any update on this?

I did hit this when packaging py_binary which uses python interpreter from rules_pyenv. Wasted 2 days with all sorts of workarounds without any success. --spawn_strategy=local provided tar that I can use, but then the other part of the project failed to build.

ref: digital-plumbers-union/rules_pyenv#5

@aiuto
Copy link
Collaborator

aiuto commented Jun 10, 2021

I had a good discussion with the core team today.

The fundamental problem is that directories as action inputs is a bad idea in Bazel (see this piece of the documentation). The glob() is not important, we can reproduce a failure with something simple.

pkg_tar(
    name = "broken",
    srcs = [
        "dir1",
        "dir1/dir2/file2.txt",
    ]
)

This might work for local execution, but it won't work for sandboxed execution. This is not likely to change for a long time.
So, the workaround is to never use exclude_directories=0 or provide raw directories as srcs to a pkg_* rule.
Given the example tree above, you can do.

Choice 1: Let pkg_tar create the directories

Either of these work.

pkg_tar(
    name = "tree_glob",
    srcs = glob(
       ["dir1/**"],
    ),
    strip_prefix = '.',
)

filegroup(
    name = "dir_glob",
    srcs = glob(["dir1/**"]),
)

pkg_tar(
    name = "tree_files",
    srcs = [":dir_glob"],
    strip_prefix = '.',
)

Choice 2: If you need to specify modes of a dir, use pkg_mkdirs

In a few weeks, you will be able to do this.

pkg_mkdirs(
    name = "dirs",
    dirs = [
        "dir1/dir2",
    ],
    attributes = pkg_attributes(mode='751', user='sys', group='adm')
)

pkg_tar(
    name = "precise_layout",
    srcs = [
        ":dirs",
        ":dir_glob"
    ],
    strip_prefix = '.',
)

which yields.

$ tar tvf bazel-bin/precise_layout.tar
drwxr-xr-x sys/adm           0 1999-12-31 19:00 ./
drwxr-xr-x sys/adm           0 1999-12-31 19:00 ./dir1/
drwxr-x--x sys/adm           0 1999-12-31 19:00 ./dir1/dir2/
-r-xr-xr-x 0/0               6 1999-12-31 19:00 ./dir1/dir2/file2.txt
-r-xr-xr-x 0/0               6 1999-12-31 19:00 ./dir1/file1.txt

@aiuto aiuto changed the title File exists error when building pkg_tar from filegroup containing directories 'File exists' error when building pkg_tar from srcs containing directories Jun 10, 2021
@aiuto aiuto added the wontfix label Jun 10, 2021
@aisbaa
Copy link

aisbaa commented Jun 10, 2021

Thank you, I've tested with exclude_directories=1 and it did solve File exists error. But I got hit by another fundamental issue link or target filename contains space. It is well described in bazelbuild/bazel#374 and was open since 2015. I think I'll park the idea of using rules_pyenv with rules_pkg.

Thank you for in-depth explanation!

@aiuto
Copy link
Collaborator

aiuto commented Jun 11, 2021

In the next release of rules_pkg you might see better results with space in target names. I've changed the way the rule passes the manifest of what goes into the tarball to the helper that builds it. Now everything should be unicode clean and not have problems with escaping on the command line.
That said, I don't know your particular use case and the new features are a week or two away from being available.

@aisbaa
Copy link

aisbaa commented Jun 11, 2021

That could solve my issues, I'll make sure to test it when its out. Thank you for maintaining and developing rules_pkg!

My use case is fairly simple, I just want to use rules_pyenv so that CI would not have to re-download python during each build.

  • The down side of using rule_pyenv with pkg_tar, is that pkg_tar includes python interperter if include_runfiles is set to True. I don't need interpreter archived together with the code, so I've created genrule to repackage same tar without the interpreter.
  • However I still need include_runfiles = True, so that pkg_tar would find and include non-python files (f.e.: certifi/cacert.pem).

Having that said, I do believe that rules_python should have a way to expose all files, not just python files...

@aiuto
Copy link
Collaborator

aiuto commented Jun 11, 2021

Yes. This is really a job for rules_python. There should be a few options for py_binary (or other rules like py_hermetic_binary).

  1. pure python code and data files, with a portable wrapper that invokes the system python
  2. python code with data and .so files, but now we have to be careful to compile the .so's for a particular target system, so this is not portable
  3. a fully, self-extracting binary with an embedded interpreter. Again, targeted for a particular system.

Case 1 is easy. Just pull in all the files
Case 3 is equally easy, it is always a single file.
Case 2 is hard, because people might want a combination of .so and .dll in the same package, so it can be more portable

@aisbaa
Copy link

aisbaa commented Jun 15, 2021

Yes. This is really a job for rules_python. There should be a few options for py_binary (or other rules like py_hermetic_binary).

Could not agree more and I've discovered that they have rules for packaging (rule_python/docs/packaging.md) which might do it.

  1. pure python code and data files, with a portable wrapper that invokes the system python

That is actually part of py_binary. The down side is that it pulls interpreter when rules_pyenv is used.

lazcamus added a commit to lazcamus/rules_pyenv that referenced this issue Mar 10, 2022
There are 2 incompatibilities with the python install that pyenv creates:
* empty directories - bazel's sandbox mode + rules_pkg doesn't work with
  them: bazelbuild/rules_pkg#135
* spaces in filenames - bazelbuild/bazel#374

So pass flags to glob() to exclude these sticking points, and you can now build
a container (or tar) with rules_pyenv built python.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P3 An issue that we are not working on but will review quarterly wontfix
Projects
None yet
Development

No branches or pull requests

7 participants