tailor generated a target for an invalid requirements.txt file #15276

Closed
davidbeers opened this issue Apr 28, 2022 · 7 comments
Labels: bug, onboarding (issues that affect a new user's onboarding experience)

Comments

@davidbeers

Describe the bug
I was getting started with Pants in my Python monorepo and quickly hit this error when running the test goal for the first time:

MappingError: Failed to parse ./ml_projects/roof_faces_ffs/BUILD:
'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Steps to reproduce:
Configured pants.toml like so:

[GLOBAL]
pants_version = "2.10.0"

backend_packages = [
    "pants.backend.python"
]

[source]
root_patterns = [
    "src/",
    "tests/"
]

[anonymous-telemetry]
enabled = false

Then ran ./pants tailor to autogenerate BUILD files.

This command put BUILD files in src and its subdirectories as well as tests and its subdirectories: any directory that contained a *.py file got a BUILD file. Visual inspection shows that many of the BUILD files in the `src` directories have a single line of code:

python_requirements()

Note that this python_requirements() macro doesn't seem to be well documented as no example code shows it, nor is it in the docs I could find readily. I found a passing mention of it here and it appears to have been automatically added because of the presence of requirements.txt in the same directory.

I noticed that other generated BUILD files nested under the /src directories contained:

python_sources()

Not sure what I had accomplished at this point (the step-by-step "Getting Started" instructions seem to peter out after telling you to run tailor) I figured I would see if the tests goal would run, so entered ./pants test path/to/a/pytest/test.py

After Pants crunches for a few seconds, I get the "Failed to parse" error listed above. If I go through the automatically generated BUILD files and add an empty line to the files containing python_requirements(), I can sometimes make the error go away (or move to a different file), but it's inconsistent. I have added blank lines at the top of all the BUILD files and the error persists, so I suspect I was only seeing variation in the order of execution before hitting the error. The error only occurs on BUILD files containing python_requirements().

Pants version
2.10.0

OS
macOS

Additional info
Please write some more docs about what we can expect to see happen when the tailor command is run since many things are mysterious about this to new users:

  1. I expected BUILD files to be created in the /src and /tests directories since those were configured root_patterns. I didn't expect to see BUILD files in every subdirectory of these configured roots: I have a nested structure to provide some namespacing, not to have a separate build for every leaf in my tree. Perhaps BUILD doesn't always mean BUILD and I should ignore these? Or perhaps tailor isn't working properly because I've structured my code in this nested manner? (which I think is common?) I can't tell at this point and feel pretty confused.
  2. The BUILD files that tailor adds to leaf directories contain the python_sources() macro, which also doesn't seem to be documented very well considering it's output from the very first command the Getting Started instructions have the new user execute.

Or if I find I can get some traction with Pants I would be happy to help with documenting things that I learn and that seem not to be documented.

@davidbeers added the bug label on Apr 28, 2022
@stuhood
Member

stuhood commented Apr 29, 2022

> MappingError: Failed to parse ./ml_projects/roof_faces_ffs/BUILD:
> 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Can you try passing --print-stacktrace (i.e. ./pants --print-stacktrace test $file)? It's possible that it's actually the content of the requirements.txt file that is the problem, rather than the BUILD file.

Are you able to attach the literal content of both/either of those files, unmodified? Perhaps renamed to *.txt so that git will let you.

> Note that this python_requirements() macro doesn't seem to be well documented as no example code shows it, nor is it in the docs I could find readily. I found a passing mention of it here and it appears to have been automatically added because of the presence of requirements.txt in the same directory.

It is documented by the thirdparty dependencies page and the python_requirements page.

Note that the version of the docsite you are looking at matters quite a bit: your link above is to v2.8, which won't have as much detail.

> I noticed that other generated BUILD files nested under the /src directories contained:
>
> python_sources()
>
> Not sure what I had accomplished at this point (the step-by-step "Getting Started" instructions seem to peter out after telling you to run tailor) I figured I would see if the tests goal would run, so entered ./pants test path/to/a/pytest/test.py

Yeah, sorry about that. There is a missing link between "Getting started" and the per-language introduction pages. For Python in particular, the next step is to work through the Python section.

> 1. I expected BUILD files to be created in the /src and /tests directories since those were configured `root_patterns`. I didn't expect to see BUILD files in every subdirectory of these configured roots: I have a nested structure to provide some namespacing, not to have a separate build for every leaf in my tree. Perhaps BUILD doesn't always mean BUILD and I should ignore these? Or perhaps tailor isn't working properly because I've structured my code in this nested manner? (which I think is common?) I can't tell at this point and feel pretty confused.

A BUILD file supplies metadata for files (test timeouts and tags, extra dependencies, etc), rather than necessarily demanding that Pants do anything: we've found that having metadata live near the files it applies to is the most scalable way to manage it over time... otherwise you end up with tons of metadata at the root of the repository, or arbitrarily far away from the files it applies to.
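
To make the "metadata near the files" point concrete, here is a minimal sketch of what a hand-edited BUILD file for a test directory might look like. The target name, timeout value, tag, and dependency address are all hypothetical and not taken from this repository; only the field names (`timeout`, `tags`, `dependencies`) are meant to reflect the `python_tests` target.

```python
# Hypothetical BUILD file for a directory of tests; all values are illustrative.
python_tests(
    name="tests",
    # Fail any test file in this target that runs longer than 120 seconds.
    timeout=120,
    # Tags let you select or exclude targets, e.g. with the --tag option.
    tags=["integration"],
    # Extra dependencies that Pants cannot infer from import statements.
    # This address is made up for the example.
    dependencies=["src/my_project/data:fixtures"],
)
```

A directory that needs none of this keeps the one-line form that tailor generates.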

> 2. The BUILD files that tailor adds to leaf directories contain the `python_sources()` macro, which also doesn't seem to be documented very well considering it's output from the very first command the Getting Started instructions have the new user execute.

python_sources has a dedicated page, but in most cases you'll want to start on the page that gives an overview of all file types: https://www.pantsbuild.org/docs/python-backend.


Most critically, though, getting tailor working is priority number one: once it is, the need to worry about the contents of individual BUILD files goes down considerably.

@davidbeers
Author

Thanks, @stuhood. Stacktrace:

Engine traceback:
  in select
  in pants.core.goals.test.run_tests
  in pants.backend.python.goals.pytest_runner.run_python_test (sundial_utils/tests/sundial_utils/analytics/test_rando.py)
  in pants.backend.python.goals.pytest_runner.setup_pytest_for_target
  in pants.engine.internals.graph.transitive_targets
  in pants.engine.internals.graph.transitive_dependency_mapping
  in pants.engine.internals.graph.resolve_targets (sundial_utils/tests/sundial_utils/analytics/test_rando.py)
  in pants.engine.internals.graph.resolve_unexpanded_targets (sundial_utils/tests/sundial_utils/analytics/test_rando.py)
  in pants.engine.internals.graph.resolve_dependencies (sundial_utils/tests/sundial_utils/analytics/test_rando.py)
  in pants.backend.python.dependency_inference.rules.infer_python_dependencies_via_imports (sundial_utils/tests/sundial_utils/analytics/test_rando.py)
  in pants.backend.python.dependency_inference.module_mapper.map_module_to_address
  in pants.backend.python.dependency_inference.module_mapper.map_third_party_modules_to_addresses
  in pants.backend.python.dependency_inference.module_mapper.find_all_python_projects
  in pants.engine.internals.graph.find_all_targets_singleton
  in pants.engine.internals.graph.find_all_targets
  in pants.engine.internals.graph.resolve_targets
  in pants.engine.internals.graph.resolve_unexpanded_targets
  in pants.engine.internals.build_files.addresses_from_address_specs
  in pants.engine.internals.graph.resolve_targets
  in pants.backend.python.macros.python_requirements.generate_from_python_requirement
Traceback (most recent call last):
  File "/Users/david.beers/.cache/pants/setup/bootstrap-Darwin-x86_64/2.10.0_py39/lib/python3.9/site-packages/pants/engine/internals/selectors.py", line 705, in native_engine_generator_send
    res = func.send(arg)
  File "/Users/david.beers/.cache/pants/setup/bootstrap-Darwin-x86_64/2.10.0_py39/lib/python3.9/site-packages/pants/backend/python/macros/python_requirements.py", line 100, in generate_from_python_requirement
    digest_contents[0].content.decode(), rel_path=requirements_full_path
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
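
For context, a byte 0xff at position 0 is what Python reports when text saved as UTF-16 (which usually starts with the byte-order mark FF FE) is decoded as UTF-8. Nothing in this thread confirms that this is what happened here. The sketch below only shows that such a file produces the same message, and the sample file content is made up.

```python
import codecs

# Hypothetical file content: a requirements.txt holding only a comment.
text = "# requirements are extracted from the Dockerfile\n"

# Simulate an editor that saved the file as UTF-16 LE with a byte-order mark.
utf16_bytes = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

try:
    # bytes.decode() defaults to UTF-8, like the call in the traceback above.
    utf16_bytes.decode()
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)

# The same text stored as UTF-8 decodes without error.
assert text.encode("utf-8").decode() == text
```

If the files in question were saved this way, re-saving them as UTF-8 would be one thing to try.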

@davidbeers
Author

davidbeers commented Apr 29, 2022

Ah, I think I may see a possible cause of the problem. There are a couple of subprojects whose requirements.txt files contain only a comment (because the actual requirements need to be extracted from a Dockerfile). They aren't in the subproject where the test is being run, but would that do it?

I'll try running tailor with --ignore-paths to exclude the parts of the repo that might not be tailor-ready.

@davidbeers
Author

That workaround got me around the problem. The error was indeed from requirements.txt files that didn't contain any dependency entries, which admittedly seems like a corner case that most users probably don't run into. Running tailor with those paths excluded got me to a state where I can run a test without hitting the bug.

@stuhood
Member

stuhood commented May 2, 2022

> That workaround got me around the problem. The error was indeed from requirements.txt files that didn't contain any dependency entries, which admittedly seems like a corner case that most users probably don't run into.

Great! Are you able to include any more detail about the content of those files? The fact that this was a unicode error would lead me to believe that they were actually binary?

In any case, this is related to #14974.

@stuhood changed the title from "Auto-generated BUILD files fail to parse" to "tailor generated a target for an invalid requirements.txt file" on May 2, 2022
@stuhood added the onboarding label on May 2, 2022
@davidbeers
Author

No, the requirements.txt files in question were not binary. Just text files containing a "#comment".
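
The thread does not settle whether the comment-only content or the file encoding triggered the error, so a quick scan for both conditions can help decide which paths to exclude from tailor. This is a rough sketch, not part of Pants: the helper names `classify_requirements` and `scan` are hypothetical, and it assumes the files are all named requirements.txt.

```python
from pathlib import Path


def classify_requirements(path: Path) -> str:
    """Return 'undecodable', 'empty', or 'ok' for one requirements file."""
    try:
        text = path.read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return "undecodable"
    # Keep only lines that are neither blank nor pure comments.
    entries = [
        line
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    return "ok" if entries else "empty"


def scan(root: Path) -> dict:
    """Map each suspicious requirements.txt under root to its classification."""
    results = {}
    for path in sorted(root.rglob("requirements.txt")):
        status = classify_requirements(path)
        if status != "ok":
            results[path.relative_to(root).as_posix()] = status
    return results
```

Calling `scan(Path("."))` from the repository root returns a dictionary of relative paths to either "undecodable" or "empty". Those paths are candidates for the --ignore-paths workaround mentioned above.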

@stuhood
Member

stuhood commented Jun 2, 2022

Thanks again for the report! I've opened #15734 for next steps here.

@stuhood closed this as completed on Jun 2, 2022