Python distribution task for user-defined setup.py + integration with ./pants {run/binary/test} #5141

Merged: 91 commits, Feb 7, 2018

Conversation

@CMLivingston (Contributor) commented Nov 28, 2017

Problem

There is no way to consume a python_library with c_extensions. Relates to #4740.

Solution

Create a new target, python_distribution, that specifies only a distribution name and maps to the directory the BUILD file lives in (a distribution's project directory). Pants treats the directory the python_distribution target is defined in as a distribution project directory (containing a setup.py and sources) and runs the pex command on that directory to build an inline wheel using the setup.py. The ./pants binary and ./pants run tasks dump this distribution into the binary pex (or into the requirements pex, in the ./pants run case) for consumption. The setup.py takes care of linking the package sources and c/cpp code together as setuptools Extensions. We've run a similar example by our customer to validate that this meets their needs.

Result

Running ./pants binary or ./pants run on a python_binary target that depends on a python distribution will package the distribution as a wheel and include it in the created pex binary (or in the requirements pex, for ./pants run) for consumption. The example has simple C code to demonstrate the ability to call C extensions from a package built using this python_distribution target.
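
For illustration, a minimal sketch of such a pair of targets, assuming the superhello example layout quoted in the review below (python_dist is the alias the target ultimately landed on):

python_dist(
  name='superhello',
  sources=[
    'setup.py',
    'super_greet.c',
    'hello_package/hello.py',
    'hello_package/__init__.py',
  ],
)

python_binary(
  name='main',
  source='main.py',
  dependencies=[
    ':superhello',
  ],
)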

@CMLivingston changed the title from "Python distribution example + ./pants binary task" to "WIP: Python distribution example + ./pants binary task" on Nov 28, 2017
@wisechengyi (Contributor) commented Nov 29, 2017

Quoting myself from #4740

How about naming it python_c_extension_library? python_distribution is a bit confusing to me at first as I thought it would provide the version of python I wanted

@CMLivingston @kwlzn thoughts?

Edit: correct issue #

@CMLivingston changed the title from "WIP: Python distribution example + ./pants binary task" to "WIP: Python distribution example + ./pants {binary/run} task" on Nov 29, 2017
@kwlzn (Member) commented Nov 29, 2017

How about naming it python_c_extension_library? python_distribution is a bit confusing to me at first as I thought it would provide the version of python I wanted

@wisechengyi it's not specific to c-extensions tho, so that'd be framing it up wrong. the term Distribution is well known in the python community (and is the 'dist' in sdist/bdist et al), but is definitely an overloaded term in general.

maybe python_dist or python_sdist would be slightly more idiomatic and a little clearer?

@CMLivingston (Contributor Author)

@wisechengyi I am pretty torn on it; on one hand I see the confusion that could be had there in terms of the name, however I agree with Kris that the convention of using Distribution in the community aligns with this use case. Abbreviating to python_dist sounds like a happy medium, but since we are building wheels here, I think python_bdist would be better than python_sdist?

@wisechengyi (Contributor)

Thanks @kwlzn @CMLivingston for the explanation. I think python_dist sounds good, and it leaves us the flexibility in case we want to build both sdist and bdist down the road.

@kwlzn (Member) left a comment

looks like a good start - handful of initial thoughts.

class PythonDistribution(PythonTarget):
"""A Python distribution containing c/cpp extensions.

:API: public
Member

once this or anything else goes in marked public you won't be able to remove or change it without a two-minor-release deprecation cycle. so it would probably make sense to revisit the marking once we've demonstrated good use of it and nothing else needs to change.



# Like Hello world, but built with Pants.

python_binary(
Member

as-is, this example isn't the most readable because of how many files/dirs it's split across. it would probably make sense to collapse this example into a single dir that had a singular BUILD file containing a python_dist + python_binary and all of the sources in one location.


from pants.base.payload_field import PrimitiveField

class PythonDistribution(PythonTarget):
"""A Python distribution containing c/cpp extensions.
Member

there's nothing specific to c/cpp extensions here - it's just a python dist, full stop.



python_distribution(
name='superhello'
)
Member

defining sources in the setup.py won't be enough to support things like changed or other features that rely on target->sources mappings.

since the Target impl subclasses PythonTarget, if you weren't overriding sources=[] in the super call it would have an implicit sources= field that would be partially correct in this example (i.e. only globs *.py at the moment, not the c files or other files). it would probably also make sense to 1) permit/require a definition of sources 2) explicitly define sources= in this example since the point is to be illustrative about how to use it 3) consider what the most appropriate default sources globs should be for python_dist.

@CMLivingston (Contributor Author) Nov 30, 2017

Along the lines of #3, what if we just used rglobs(*.h|*.py|*.c|*.cpp) as an implicit default? This way the user would not need to define the sources they care about in two places (setup.py and then have to also list them out in a sources param). I'm not sure I understand the value in allowing the user to define sources beyond restricting change detection to a subset of rglobs(*.h|*.py|*.c|*.cpp).

Member

  1. rglobs would prevent python_dist from being able to co-exist with other targets in a dir or any child dir, which I think would be bad in general. I'm fairly sure (but not positive) that all of the existing implicit sources globs you'd find would be regular globs for this reason.
  2. globbing all of the .h/.c/.cpp to me, for python_dist, feels unusual/unexpected as a user. I could see *.py - but I'd argue that C extension dists are typically an exception vs the norm in the general case, so we should probably require that users explicitly declare those when needed.

I'm open to other perspectives and counter arguments tho, but from an adoption strategy perspective in my experience it's way easier to add shortcuts like this later than it would be to take them away once delivered if we find they lead to bad/unwanted behavior - so IMHO, starting with the least surprising implicit behavior would be ideal.
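
For illustration, the contrast being drawn is roughly the following (file layout assumed from the superhello example):

# rglobs recursively claims matching files in this directory and every child
# directory, preventing other targets from owning sources below this one:
python_distribution(
  name='superhello',
  sources=rglobs('*.py', '*.c', '*.h'),
)

# Regular globs match only this directory, with C sources pulled in explicitly
# by a non-recursive pattern only when the dist actually has extensions:
python_distribution(
  name='superhello',
  sources=globs('*.py', '*.c'),
)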


payload.add_fields({
'platforms': PrimitiveField(tuple(maybe_list(platforms or [])))
})
super(PythonDistribution, self).__init__(sources=[], payload=payload, **kwargs)
Member

worth noting that subclassing PythonTarget is going to net you a handful of fields that may or may not make sense in the dist context, like provides, resources and resource_targets. should consider the ramifications of what would happen if a user defines these expecting them to Just Work - and potentially add tests to cover those cases or disallow their definition.


# Create a temporary working directory, pydistworkdir
pydist_workdir = os.path.join(workdir, 'pydistworkdir')
safe_mkdir(pydist_workdir)
pex_name = "%s.pex" % target.name
Member

I would expect to see this keying all writes to some path under e.g. .pants.d/pyprep/local_dists/<target fingerprint>/... - ideally with that key computed outside of this function and handed in as a target dir vs being handed the workdir directly. it would also probably be a good idea to include some form of no-op reuse of previously built wheels on re-keys to the same fingerprint for build performance.

in order to ensure that sources are properly declared, it should likely also copy the declared sources for the target into a temporary dir in that path before invoking the setup.py in it. this will ensure that any temporary files that are created as part of that land in the workdir and thus don't need manual cleanup such as what you're doing for *.egg-info below.
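
A sketch of the keying being suggested; the helper names are assumed, while payload.fingerprint() and safe_mkdir appear elsewhere in this PR:

import os

from pants.util.dirutil import safe_mkdir

def dist_build_dir(local_dists_workdir, dist_tgt):
  # Key all writes under e.g. .pants.d/pyprep/local_dists/<target fingerprint>/.
  return os.path.join(local_dists_workdir, dist_tgt.payload.fingerprint())

def ensure_build_dir(local_dists_workdir, dist_tgt):
  build_dir = dist_build_dir(local_dists_workdir, dist_tgt)
  if os.path.isdir(build_dir):
    # No-op reuse: a wheel was already built under this fingerprint key.
    return build_dir, True
  safe_mkdir(build_dir)
  return build_dir, False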


:param dist_targets: PythonDistribution targets to dump
:param workdir: Working directory for python targets (./pantsd/python)
:param log: Use this logger.
"""
Member

would be good, in general, to add type info to all docstrings. e.g.

:param list dist_targets: A list of `PythonDistribution` targets to build.

Contributor Author

We noticed that docstrings in most of the files we have touched don't have types. Should we add them to this file alone or should we address it in a separate pull request?

Member

it doesn't need to be uniform, but in general always adding type info going forward would be preferred.

whl_dist = chroot_deps_contents[0] # TODO: find better way to grab .whl from chroot
whl_location = os.path.join(pydist_workdir, fingerprint, '.deps', whl_dist)
return whl_location
return None
Member

not successfully building a dist here should probably be an error.


for tgt in dist_targets:
whl_location = build_python_distribution_from_target(tgt, workdir)
if whl_location:
locations.add(whl_location)
Member

if two different python_dist targets here key to the same wheel, that should probably be an error because otherwise one will be silently ignored.


builder = PEXBuilder(path=path, interpreter=interpreter, copy=True)
dump_requirements(builder, interpreter, req_libs, self.context.log)
if python_dist_targets:
dump_python_distributions(builder, python_dist_targets, self.workdir, self.context.log)
Member

what happens here if the py_dist target happens to conflict or overwrite a resolved requirement?

seems to me like we should be dumping those prior to the resolve and then considering them in the full transitive resolve closure to reveal any resolution impact.

this also raises the question of how to handle transitive requirements of python_dist setup.py.. ideally we could construct the wheel intransitively and then let the resolve handle pulling in the transitive requirements as part of this.

there's also a question of how to handle setup_requires in a given setup.py, but that's something we can probably just punt on for now and revisit.

Contributor Author

That's a good point. Would this approach require anything beyond moving this logic up higher (above dump_requirements) and letting dump_requirements de-dupe transitive requirements?

How would we ensure that 3rd party requirements resolved as part of a python distribution (specified by 3rdparty/python:dep) and requirements specified in setup.py are not duplicated? It seems like this would necessitate handling setup_requires in the here and now as well.

@UnrememberMe (Contributor) left a comment

Overall, looks promising. I gave verbal feedback before and I will add it as comments here:

  • I would suggest giving python_distribution a sources field, even if it is always used as sources=rglob('**/*'). This should trigger tests of dependent modules.
  • In the long term, I would like to see some way to break the C/C++ code down into libraries that we can create a build graph from. This should facilitate better modularization.

@CMLivingston changed the title from "WIP: Python distribution example + ./pants {binary/run} task" to "Python distribution example + ./pants binary task" on Dec 12, 2017
@@ -53,6 +56,7 @@ def register_goals():
task(name='interpreter', action=SelectInterpreter).install('pyprep')
task(name='requirements', action=ResolveRequirements).install('pyprep')
task(name='sources', action=GatherSources).install('pyprep')
task(name='python-dists', action=PythonCreateDistributions).install('pyprep')
Contributor Author

Installing this task in pyprep seems like the best place to put it; however, with this choice it will only work in ./pants run if the produced .whl is dumped into the leftmost runner pex, which seems like an anti-pattern. Thoughts?

@CMLivingston (Contributor Author) commented Dec 12, 2017

This is ready for another look when convenient. Unit/integration tests will follow after we solidify the task approach.

@CMLivingston changed the title from "Python distribution example + ./pants binary task" to "Python distribution task for user-defined setup.py + integration with ./pants {run/binary/test}" on Dec 13, 2017
@kwlzn (Member) left a comment

looking good mod a handful of questions and comments.

would also need some basic test coverage here before this can be merged.

'super_greet.c',
'hello_package/hello.py',
'hello_package/__init__.py',
'setup.py'
Member

indent


version='1.0.0',
ext_modules=[c_module],
packages=find_packages(),
)
Member

indent


python_tests(
name='superhello',
sources=[
'test_superhello.py'
Member

indent

Comment:

Hmm, so I guess the linter catches only certain indent errors in *.py files, but not in BUILD files? Pretty sure it flagged indent issues in *.py.




def test_superhello():
assert hello.hello() == "Super hello"
Member

indent

name='main',
source='main.py',
dependencies=[
'examples/src/python/example/python_distribution/hello/superhello:superhello',
Member

this should just reference :superhello, since its in the same BUILD file


fingerprint = dist_tgt.payload.fingerprint()
dist_target_dir = os.path.join(local_dists_workdir, fingerprint)
if not os.path.exists(dist_target_dir):
log.debug('Creating working directory for target %s with fingerprint %s',
Member

is there no way to use a FingerprintStrategy under care of a Task.invalidated() block to do this instead?
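
For reference, the Task.invalidated() pattern being suggested looks roughly like this (the shape later excerpts in this PR adopt; _create_dist is an assumed helper name):

with self.invalidated(dist_targets, invalidate_dependents=True) as invalidation_check:
  for vt in invalidation_check.all_vts:
    if vt.valid:
      # Fingerprint unchanged: reuse the wheel built on a previous run.
      built_dists.add(self._get_whl_from_dir(os.path.join(vt.results_dir, 'dist')))
    else:
      # Invalidated: rebuild into the fingerprint-keyed results_dir.
      built_dists.add(self._create_dist(vt.target, vt.results_dir))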

built_dists = set()

if dist_targets:
# Check for duplicate distribution names, since we write the pexes to <dist>/<name>.pex.
Member

this task produces wheels, not pex files tho - which I wouldn't expect to write to dist (until consumed by a pex).

afaict, it's possible for two python_dist targets to share the same base target name - but have totally different dist names based on their setup.py files. so I'm not sure this validation buys much here, fwict? thoughts?

Contributor

We don't really know the dist_name that will be created from the target definition currently. It seems like a reasonable approximation?

Comment:

Yeah, I think you're right that this doesn't really protect against much. Just removed.

if len(dists) == 0:
raise TaskError('No distributions were produced by python_create_distribution task.')
elif len(dists) > 1:
raise TaskError('Ambiguous whls found: %s' % (' '.join(dists)))
Member

this check is reasonable, but is this a case you've actually seen before? I wouldn't expect this to be possible without seriously non-standard setup.py.

Comment:

I think this was a check we added before we implemented the fingerprint strategy, so it may not be necessary anymore. Seems safe/reasonable enough to keep, but do you think we should remove?


:param req_libs: A list of :class:`PythonRequirementLibrary` targets to resolve.
:param local_python_dist_targets: A list of :class:`PythonDistribution` targets to resolve.
:returns a PEX containing target requirements and any specified python dist targets.
Member

nit: :returns:


built_dists = self.context.products.get_data(PythonCreateDistributions.PYTHON_DISTS)
if built_dists:
for dist in built_dists:
builder.add_dist_location(dist)
Member

seems to me like this is sidestepping the actual resolve, but ultimately I'm curious if this covers the case of:

  1. user has a target that depends on 3rdparty/python:pytorch which provides e.g. pytorch==1.0.0
  2. user also depends on a python_dist that provides pytorch==2.0.0
  3. when the pex is built, it blows up indicating two incompatible versions of pytorch?

if so, great. if not, this could spell out trouble at runtime - particularly because of transitive deps that users might not be aware of. either way this case should probably get covered with an integration test.

Comment:

Great question. Let us get back to you on that, definitely need an integ test regardless.

Contributor

that's a really good observation.

Contributor

Does this issue get addressed here?

Contributor Author

This is covered in the same way a resolve would blow up on normal python targets (i.e. throwing an error indicating two incompatible versions of pytorch). I have an integration test covering this.

@kwlzn requested a review from benjyw, December 15, 2017 21:27
safe_mkdir(dist_target_dir)

# Copy sources and setup.py over for packaging.
sources_rel_to_target_base = dist_tgt.sources_relative_to_target_base()
Contributor

The code for copying source files should be taken care of with an invalidated block, I believe. (https://github.com/pantsbuild/pants/blob/master/src/python/pants/task/task.py#L309)


if not 'setup.py' in sources_basenames:
raise TargetDefinitionException(self,
'A setup.py is required to create a python_dist. You must include a setup.py file in your sources field.')

Contributor

Also should check for multiple setup.py


if not os.path.exists(install_dir):
safe_mkdir(install_dir)
setup_runner = SetupPyRunner(dist_target_dir, 'bdist_wheel', interpreter=interpreter, install_dir=install_dir)
setup_runner.run()
Contributor

Just a thought: How long does a single setup runner run take? Is there any way to run multiple setup runners simultaneously?

Contributor Author

Not sure at the moment, and this will likely be a follow up concern after this PR has landed.

@@ -18,6 +18,7 @@

from pants.contrib.jax_ws.targets.jax_ws_library import JaxWsLibrary


Contributor

Is this because of a style issue? If not, I would rather not submit a change here.

Contributor Author

This is a style fix.

python_tests(
name='superhello',
sources=[
'test_superhello.py'
Contributor

4 spaces here vs. 2 below in dependencies


names[name] = dist_target

with self.invalidated(dist_targets, invalidate_dependents=True) as invalidation_check:
local_dists_workdir = self.local_dists_workdir
Contributor

no need for local_dists_workdir any more.


@@ -101,7 +104,8 @@ def extra_requirements(self):
def create_pex(self, pex_info=None):
"""Returns a wrapped pex that "merges" the other pexes via PEX_PATH."""
relevant_targets = self.context.targets(
lambda tgt: isinstance(tgt, (PythonRequirementLibrary, PythonTarget, Files)))
lambda tgt: isinstance(tgt, (PythonDistribution,
Contributor

indentation here is valid but a bit odd. I think we generally prefer to break earlier.

lambda tgt: isinstance(tgt, (
  PythonDistribution, ...))

or

lambda tgt: isinstance(
  tgt,
  (PythonDistribution...)
)

seems to be a bit better.


@@ -0,0 +1,50 @@
# coding=utf-8
# Copyright 2014 Pants project contributors (see CONTRIBUTORS.md).
Contributor

2017


output = subprocess.check_output(pex)
self.assertIn('Super hello', output)
else:
print('No python {} found. Skipping.'.format(version))
Contributor

suggestion: instead of skipping the test, if there is the wrong version of python installed, we should test that an error is generated appropriately.

Contributor Author

This is a pants convention to skip tests if the system does not have the target interpreter. The test util checks for the interpreter by making a subprocess.check_output call, so worrying about the error is out of the scope of these tests changes, I think. We can discuss further offline.


@UnrememberMe (Contributor) left a comment

One small suggestion, feel free to take or leave.

@@ -57,7 +57,12 @@ def __init__(self,
sources_basenames = [os.path.basename(source) for source in sources]
if not 'setup.py' in sources_basenames:
Contributor

SUGGESTION

  count_of_setup_py = [os.path.basename(s) for s in sources].count('setup.py')
  if count_of_setup_py == 0: 
    ...


@kwlzn (Member) left a comment

handful of comments - fwict it seems like there's still a decent functional gap here in that a python_dist's transitive deps won't be considered during resolution. this feels like a feature customers will need from day 1.

if count_of_setup_py > 1:
raise TargetDefinitionException(self,
'Multiple setup.py files detected. You can only specify one '
'setup.py in a python_dist target.')
Member

this second check seems like an arbitrary restriction to me - it's not clear why a file named setup.py in e.g. a nested child dir rglob'd into a python_dist would matter as long as we can identify the main setup.py entrypoint target?

conversely, if that invalid, nested setup.py was the only one in a target then this wouldn't complain either.

checking only for the existence of at least one "top-level" setup.py relative to the target definition would probably be more correct, since that's the shape of a typical python dist.

Comment:

Makes sense. ✅

"""Return whether `invalid_targets` contains at least one target from `targets`.

:param invalid_targets: A list of targets that have been invalidated by a cache manager.
:param targets: A list of targets to check for membership in `invalid_targets`.
Member

nit: docstring param ordering


# Dump built python distributions, if any, into the requirements pex.
# NB: If a python_dist depends on a requirement X and another target in play requires
# an incompatible version of X, this will cause the pex resolver to throw an incompatibility
# error and Pants will abort the build.
Member

this is good to test for, but I'm not sure I'd expect to see many outbound dependency edges from a python_dist other than for transitive pins. in fact, it could make sense to disallow dependencies= for this new target type altogether as a simplification measure since they don't make much sense in the dist building context (and pins could still be composed with a python_dist under a bare target).

but the test case I mentioned earlier (and the one I would be most worried about) would be if the python_dist's setup.py declared itself as pytorch==1.2.3 and then a target depended on that same requirement + e.g. python_requirement_library(pytorch==1.2.4) transitively. by just dumping the local dist into the pex chroot (vs considering that as part of the resolve), I believe this could lead to a pex that would be broken in subtle, weird ways.

I also don't see how any transitive deps of the python_dist would be resolved and included currently - and both of these problems appear to be somewhat intertwined at the impl level.

@lgvital Dec 27, 2017

These sound like fair concerns. Let me ask a couple clarifying questions to make sure I understand them correctly. I'm not entirely clear so this might be easier to clarify over chat / hangouts. Lmk if that's viable for you.

  1. I assume disallowing dependencies= would be mainly a #simplify measure, but it wouldn't cover the test case that you mention?
  2. Re: test case are you referring to a python_dist with a setup.py that has:
setup(
  name='pytorch',
  version='1.2.3'
)

and a target that depends on this python_dist and another target that depends on 1.2.4 transitively like:

python_binary(
  name='main',
  source='main.py',
  dependencies=[
    ':main2',
    ':pytorch',
  ]
)

python_library(
  name='main2',
  source='main2.py',
  requirements=[
    'python_requirement_library(pytorch==1.2.4)',
  ]
)
  3. Hope I'm wrong here, but it seems like resolving transitive deps is a pretty complex/nontrivial undertaking. I understand the need to #BeRigorous, but I'm wondering if we can balance it with "Iterative, team-based development"? What if "this feels like a feature customers will need from day 1" weren't true for our internal customer that requested this feature? Assuming we address all other concerns, is there a path we can take to unblock our customer now and tackle transitive deps in a separate pull request?

Member

  1. correct, two separate things there.

  2. that would be exactly it for the conflict case. for transitive dep resolution, the idea would be a python_binary that depended on a python_dist with a setup.py that declared an install_requires=['some_transitive_dep'] - which would be transitively imported from the python_binary's main.py during a test/run. I believe currently this would fail on the import of some_transitive_dep due to no resolver treatment for setup.py declared deps.

  3. hmm, I don't get the sense it'd be super complex. the first bit would be extracting the injectable requirement spec from the built wheel or setup.py, but you can probably get away with simple post-build wheel filename parsing for that (and then e.g. '=='.join([name, version]). the second bit would be injecting that inferred spec in with the rest of the transitive requirement specs at build time before they're all passed to the resolver (and ensuring the dir containing the wheel is injected into the find-links paths with precedence). afaict, this would solve for both cases described in 2. but a TODO (ideally w/ github issue link) and iteration is fine also.
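
A sketch of the wheel-filename parsing described in 3, with an assumed helper name (PEP 427 wheel filenames are {name}-{version}-{python}-{abi}-{platform}.whl):

import os

def requirement_spec_from_whl(whl_path):
  # e.g. 'superhello-1.0.0-cp27-cp27m-linux_x86_64.whl' -> 'superhello==1.0.0'
  name, version = os.path.basename(whl_path).split('-')[:2]
  return '=='.join([name, version])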

Comment:

Cool, thank you for the clarifications and guidance on #3. I'll take a stab and if it proves too complex / time consuming I may just TODO it.

So, related to #1 - the use case for including deps for a python_dist is that our internal customer wants to import tensorflow in setup.py so that they can load all the tensorflow includes into include_dirs. Would there be a way around this, if we weren't going to include deps in python_dist?

Member

...our internal customer wants to import tensorflow in setup.py so that they can load all the tensorflow includes into include_dirs. Would there be a way around this, if we weren't going to include deps in python_dist?

the dependencies declared on this target would never be made available to the executing setup.py for use in the current model. they would only be pulled in transitively for any target roots and resolved into deps for a final pex product (pex, chroot or otherwise).

setuptools/setup.py et al already make an affordance for this case with setup_requires. this ensures the declared setup_requires deps are present at setup.py execution time vs install time. making it so that the user could declare "i need tensorflow during setup.py execution time" would, afaict, involve plumbing full support for setup_requires in SetupPyRunner via either native setup.py declarations or BUILD file declarations. I think from the pants/pex side, this would involve composing a setup-time pex/chroot and executing the setup.py inside of that where it'd be able to import tensorflow from a hermetic environment.

we typically don't deal with setup_requires much in pants/pex land because of the emphasis on bdists (wherein setup.py is pre-executed for us) for repeatability - but it'd certainly be a nice feature to have for python_dist consumption, since it'd bring us closer to parity with pip.

format, e.g. ``'CPython>=3', or just ['>=2.7','<3']`` for requirements
agnostic to interpreter class.
"""
self.address = address
Member

afaict, this happens in Target.__init__() so can kill that here.



:return: A boolean indicating if any target in `targets` exists in `invalid_targets`.
"""
return any([target in invalid_targets for target in targets])
Member

could use set logic here:

>>> set([3,2,6]).isdisjoint(set([3,5,9]))
False
>>> set([3,2,6]).isdisjoint(set([1,5,9]))
True


if len(dists) == 0:
raise TaskError('No distributions were produced by python_create_distribution task.')
else:
return os.path.join(os.path.abspath(install_dir), dists[0])
Member

can kill the else here and treat the if check as a guard clause.
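
i.e., something like:

if len(dists) == 0:
  raise TaskError('No distributions were produced by python_create_distribution task.')
return os.path.join(os.path.abspath(install_dir), dists[0])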


# Note that we check for the existence of the directory, instead of for invalid_vts,
# to cover the empty case.
if not os.path.isdir(path):
if not os.path.isdir(path) or targets_are_invalid(python_dist_targets, invalid_targets):
Member

curious.. what is special about PythonDistribution targets vs others covered by the target_set_id path keying that necessitated this extra check?

Contributor Author

You were right, we didn't need that and removed it.


def resolve_requirements(self, req_libs):
with self.invalidated(req_libs) as invalidation_check:
def resolve_requirements(self, req_libs, python_dist_targets=()):
Member

based on how python_dist_targets is used below, it seems like you could easily get at this with an inner local_python_dists = self.context.targets(is_local_python_dist) vs the new signature + changing the calling pattern of all callers. this would #simplify and reduce duplication.


# Note that we check for the existence of the directory, instead of for invalid_vts,
# to cover the empty case.
if not os.path.isdir(path):
if not os.path.isdir(path) or targets_are_invalid(python_dist_targets, invalid_targets):
Member

ditto.


superhello_project = 'examples/src/python/example/python_distribution/hello/superhello'
superhello_tests = 'examples/tests/python/example/python_distribution/hello/test_superhello'

def test_pants_binary(self):
Member

probably better to use the skipif decorator for these?

@unittest.skipIf(has_python_version(...), "no python ...")
def test_thing(...):
  pass

..but I would also expect python2.7 to be a strict requirement of pants' CI and test environments, so it's probably not even worth doing this check in the first place.

Contributor Author

I wrote it like this to leave room for extension when we want to test with Python 3 (similar to interpreter selection tests).

In this snippet, wouldn't we be doing not has_python_version(...) instead?
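
Presumably the corrected form would be something like (version string assumed):

@unittest.skipIf(not has_python_version('2.7'), 'no python 2.7 on this machine')
def test_thing(self):
  pass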



if not 'setup.py' in sources:
raise TargetDefinitionException(
self, 'A setup.py in the top-level directory relative to the target definition is required.'
Contributor

A reference to the target address in the error message would be nice.

@CMLivingston (Contributor Author) Jan 10, 2018

TargetDefinitionException contains this information.

@@ -128,6 +131,12 @@ def _create_binary(self, binary_tgt, results_dir):
dump_sources(builder, tgt, self.context.log)
dump_requirements(builder, interpreter, req_tgts, self.context.log, binary_tgt.platforms)

# Dump built python distributions, if any, into builder's chroot.
built_dists = self.context.products.get_data(BuildLocalPythonDistributions.PYTHON_DISTS)
Contributor

If there are two local dists belonging to two separate targets (say A and B), running pants binary A B should generate a .pex for each that contains only the right local dist. Does this code handle the multiple-targets scenario correctly?


import super_greet

def hello():
print(super_greet.super_greet())
Comment:

Remove? I assume this was for debugging.

Contributor Author

Nope. That returns a string which needs to be printed for verification in stdout.


def test_pants_binary(self):
self._maybe_test_pants_binary('2.7')
command=['binary', '{}:main'.format(self.superhello_project)]
pants_run_27 = self.run_pants(command=command)
Comment:

Since its assumed that pants is running 2.7, is it necessary to specify that in these variables? Or are you future-proofing a bit?

Contributor Author

Def not necessary, I can clean them up.


python_dist(
name='superhello_with_conflicting_dep',
sources=[
Member

where is the 'conflicting dep' in this example case?

your setup.py provides a package called superhello. your python_dist and python_binary both depend on the same pex_lib @ pex==1.2.11.

to have a conflicting case here to exercise the core resolver consideration that we've been discussing, your setup.py would need to either stomp on the pex name as built into a dist with a differing version - or depend on one via install_requires - in order to create a conflict (ideally, tests for both).

and again, your python_dist should also not declare a dependencies= here because it is /providing/ a dep (and dragging its transitive dependencies along with it if/when the resolver is wired up correctly).

Contributor Author

This case was for declaring a dep in the binary below (3rdparty/pex, which conflicts iirc), but I think this is all probably unnecessary because we know targets can handle this correctly already.
So you are proposing we kill off the dependencies field of python_dist altogether and have internal teams list their deps (like tensorflow) in install_requires instead? Because writing python code that depends on TF in your py_dist and linking to 3rdparty/python:tensorflow in the dependencies field of a consuming binary target seems confusing.

"""Return the absolute path of the whl in a setup.py install directory."""
dists = glob.glob(os.path.join(install_dir, '*.whl'))
if len(dists) == 0:
raise TaskError('No distributions were produced by python_create_distribution task.')
Member

producing >1 wheels should probably be an error condition, or at least a warning - vs the silent masking happening here now.


extra_reqs_pex = [
self.resolve_requirements([self.context.build_graph.get_target(addr)])
]
pexes = extra_reqs_pex + pexes
Member

this was probably better before, as the one liner.

@@ -1,5 +1,5 @@
# coding=utf-8
# Copyright 2016 Pants project contributors (see CONTRIBUTORS.md).
# Copyright 2017 Pants project contributors (see CONTRIBUTORS.md).
Member

xx


@lgvital left a comment

My previous suggestions have been addressed in previous commits. Just one minor nit here.

@@ -30,6 +33,10 @@ def has_python_sources(tgt):
return isinstance(tgt, (PythonLibrary, PythonTests, PythonBinary)) and tgt.has_sources()


def is_local_python_dist(tgt):
Comment:

We have a similar helper method above (is_distribution) and we have another call to isinstance in setup_py.py. Would it be useful to have one helper method, maybe in a util file?

@CMLivingston force-pushed the python-distribution-task branch 3 times, most recently from ac3afc8 to cb4204d, January 23, 2018 23:03
@UnrememberMe (Contributor)

Is there something we need to do before we can land this change? The failed tests seem to come from TravisCI infrastructure.

@stuhood (Member) commented Jan 26, 2018

I've just restarted them.

But the answer to your question is really "get a shipit from @kwlzn", probably.

@UnrememberMe (Contributor)

The latest CI run's only failure comes from the golang lint check, Exception message: No <meta name="go-import"> tag found at google.golang.org/grpc/codes, which shouldn't have anything to do with this branch.
Please advise on the next step.

@kwlzn (Member) commented Feb 1, 2018

I've restarted the failing shard - looks good. Will review tomorrow.

@UnrememberMe (Contributor)

Ping.

@kwlzn (Member) left a comment

looking good - on the test cases in particular. handful of comments.

@@ -34,6 +37,7 @@ def build_file_aliases():
PythonBinary.alias(): PythonBinary,
PythonLibrary.alias(): PythonLibrary,
PythonTests.alias(): PythonTests,
'python_dist': PythonDistribution,
Member

this should use PythonDistribution.alias() vs 'python_dist'


if vt.valid:
built_dists.add(self._get_whl_from_dir(os.path.join(vt.results_dir, 'dist')))
else:
if vt.target.dependencies :
Member

nit: whitespace


raise TargetDefinitionException(
vt.target, 'The `dependencies` field is disallowed on `python_dist` targets. List any 3rd '
'party requirements in the install_requirements argument of your setup function.'
)
Member

I would expect this check for dependencies + TargetDefinitionException raising to happen in the PythonDistribution.__init__() vs at task execution time?

Contributor Author

AFAICT, this is not possible in the init method. My attempts have been blocked by AssertionError: Cannot retrieve dependencies of BuildFileAddress(testprojects/src/python/python_distribution/fasthello_with_install_requires/BUILD, fasthello) because it is not in the BuildGraph.

Member

Yea, not feasible elsewhere: target constructors don't see their dependencies currently.

Member

ah, k


if len(dists) == 0:
raise TaskError('No distributions were produced by python_create_distribution task.')
if len(dists) > 1:
raise TaskError('Ambiguous local python distributions found: %s' % (' '.join(dists)))
Member

nit: pants repo style dictates '{}'.format(blah) form (vs pex's %s).
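
i.e. the quoted line would become something like:

raise TaskError('Ambiguous local python distributions found: {}'.format(' '.join(dists)))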




def build_req_libs_provided_by_setup_file(context, local_built_dists, class_name):
"""Build a requirements library from a local wheel.
@kwlzn (Member) Feb 5, 2018

the use of the term 'build' here is misleading.. how about inject_synthetic_req_libs?

Member

also, how about taking just a build_graph vs the entire context?


@@ -19,6 +19,7 @@ def product_types(cls):

def execute(self):
req_libs = self.context.targets(has_python_requirements)
if req_libs:
dist_tgts = self.context.targets(is_local_python_dist)
if req_libs or dist_tgts:
Member

I'm not seeing how dist_tgts is used here?

..but maybe has_python_requirements should be expanded to also match PythonDistribution targets if they now provide a resolvable requirement on the surface?

Contributor Author

This would not be viable in the current implementation fwict, as PythonDistribution targets would get treated as requirements libraries elsewhere in the codebase and break things. I see the need for the distinction because tasks that operate on req libs need PythonRequirementsLibrary targets, and while these are provided by py dist targets, it appears to me that the current backend does not accept PythonDistribution targets as a has_python_requirements target as of right now.

"""
local_dist_targets = self.context.targets(is_local_python_dist)
tgts = req_libs + local_dist_targets
with self.invalidated(tgts) as invalidation_check:
Member

I'd expect this to conform more to the existing calling convention - so ideally the caller passes in local_dist_targets as either a combined or separate param here? that would align better with how things are used below too, afaict?


Contributor

tgts = req_libs + (local_dist_targets or []) works as well.

req_name = '=='.join([whl_metadata[0], whl_metadata[1]])
local_whl_reqs.append(PythonRequirement(req_name, repository=whl_dir))
if local_whl_reqs:
addr = Address.parse(class_name)
Member

the class name feels like an odd Address to use here? it won't be unique, for one - so this likely needs to be generative into some synthetic namespace.

  # binary targets that each have their own unique python_dist dependency.
  if any([tgt.id in dist for tgt in binary_tgt.closure(exclude_scopes=Scopes.COMPILE)]):
    builder.add_dist_location(dist)

Member
how are you ensuring alignment between the dump_requirements provided req->dists vs the dists explicitly added here?

i.e. how does dump_requirements know where to resolve custom dists from in the case where they're injected into req_tgts above?

  # This protects against the case where a single `./pants binary` command builds two
  # binary targets that each have their own unique python_dist dependency.
  if any([tgt.id in dist for tgt in binary_tgt.closure(exclude_scopes=Scopes.COMPILE)]):
    builder.add_dist_location(dist)
Member
might be good to add a test case for the case described here - did not see one below.

@kwlzn kwlzn requested a review from jsirois February 5, 2018 22:03
@kwlzn kwlzn (Member) commented Feb 5, 2018

+ @benjyw and @jsirois for a second pair of eyes on the Task impl et al.

@kwlzn kwlzn (Member) left a comment

LGTM - one last important comment about synthetic address namespacing and a couple of readability/doc string things - but otherwise I think we're good to land this and iterate.

in the binary for the case where a user specifies multiple binary targets in a single invocation of
`./pants binary`.
:return: a :class:`PythonRequirementLibrary` containing a local wheel and its
transitive dependencies.
Member

afaict, it doesn't actually include transitive dependencies here - just the surface requirement that aligns to the built wheel.

  addr = Address.parse(synthetic_address)
  build_graph.inject_synthetic_target(addr, PythonRequirementLibrary, requirements=local_whl_reqs)
  req_libs = [build_graph.get_target(addr)]
  return req_libs
Member

the logic here seems fine but I think this function would be more readable/maintainable if you decomposed it a bit - something along the lines of (note the renamed function name + docstring, too):

def inject_synthetic_dist_requirements(build_graph, local_built_dists, synthetic_address, binary_tgt=None):
  """Injects synthetic `PythonRequirement` objects from a `PythonDistribution` into the `BuildGraph`.
  ...
  """
  def should_create_req(bin_tgt, loc):
    if not bin_tgt:
      return True
    # <-- Add comment here about why this is necessary
    return any([tgt.id in loc for tgt in bin_tgt.closure()])

  def python_requirement_from_wheel(path):
    base = os.path.basename(path)
    whl_dir = os.path.dirname(path)
    whl_metadata = base.split('-')
    req_name = '=='.join([whl_metadata[0], whl_metadata[1]])
    return PythonRequirement(req_name, repository=whl_dir)

  local_whl_reqs = [
    python_requirement_from_wheel(whl_location)
    for whl_location in local_built_dists
    if should_create_req(binary_tgt, whl_location)
  ]

  if not local_whl_reqs:
    return []

  addr = Address.parse(synthetic_address)
  build_graph.inject_synthetic_target(addr, PythonRequirementLibrary, requirements=local_whl_reqs)
  return [build_graph.get_target(addr)]
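
For concreteness, here is a minimal sketch of what `python_requirement_from_wheel` above produces for a locally built wheel; the path and project name are invented for illustration:

```python
import os

# Hypothetical wheel emitted by a python_dist() build:
whl_path = '/tmp/pants-dists/my_dist-0.0.1-cp27-cp27m-linux_x86_64.whl'

base = os.path.basename(whl_path)
whl_metadata = base.split('-')  # ['my_dist', '0.0.1', 'cp27', 'cp27m', 'linux_x86_64.whl']
req_name = '=='.join([whl_metadata[0], whl_metadata[1]])  # 'my_dist==0.0.1'

# repository= points the resolver at the directory holding the local wheel, which
# (afaict) is how the synthetic requirement resolves without hitting an index:
# PythonRequirement(req_name, repository=os.path.dirname(whl_path))
```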

  if built_dists:
    req_tgts = inject_req_libs_provided_by_setup_file(self.context.build_graph,
                                                      built_dists,
                                                      binary_tgt.invalidation_hash(), binary_tgt) + req_tgts
@kwlzn kwlzn (Member) Feb 7, 2018

generally speaking, synthetic addresses would map to the path hierarchy of a given target w/ some synthetic name (e.g. //path/to/the/dir:my_injected_dist_0).. this maps to e.g. //<some_sha_here> in the root.

at a minimum I think you would want to key this into the target's path namespace, e.g. ~:

':'.join((binary_tgt.address.rel_path, binary_tgt.invalidation_hash()))

..it'd also be good to break out the last two params here each onto its own line for readability while you're there.

@kwlzn kwlzn (Member) Feb 7, 2018

and at that point, it's probably just as safe/sane to use a static name key like:

':'.join((binary_tgt.address.rel_path, '{}_dist_reqs'.format(binary_tgt.address.target_name)))
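
Concretely, a small sketch of the key this would produce, assuming a hypothetical binary target at `src/python/foo:bin`:

```python
# Hypothetical attribute values, for illustration:
rel_path = 'src/python/foo'  # binary_tgt.address.rel_path
target_name = 'bin'          # binary_tgt.address.target_name

synthetic_address = ':'.join((rel_path, '{}_dist_reqs'.format(target_name)))
# -> 'src/python/foo:bin_dist_reqs': stable across runs, keyed to the target's
#    path namespace, and parseable via Address.parse().
```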

    # Note that we check for the existence of the directory, instead of for invalid_vts,
    # to cover the empty case.
    if not os.path.isdir(path):
      with safe_concurrent_creation(path) as safe_path:
        # Handle locally-built python distribution dependencies.
        built_dists = self.context.products.get_data(BuildLocalPythonDistributions.PYTHON_DISTS)
Contributor

Just noticed something: What happens if only these local dists change? The invalidated() block won't notice that, so the requirements pex won't be rebuilt. It seems like their originating targets should participate in the invalidation?

Contributor

Oh, I guess this is why the local_dist_targets arg exists? But they aren't passed in practice, as far as I can tell. And the duplication is a bit weird. Any reason not to do the inject_synthetic_dist_requirements() when the distribution is created, i.e., in build_local_python_distributions.py? That seems more idiomatic. Then downstream tasks like this one don't have to distinguish between the different provenances of requirement libraries, and can treat them uniformly.

cosmicexplorer added a commit that referenced this pull request Jun 21, 2018
…with ctypes (#5815)

### Problem

It is currently possible to expose native code to Python by compiling it in a `python_dist()` target, specifying C or C++ source files as a `distutils.core.Extension` in the `setup.py` file, as well as in the target's sources. `python_dist()` was introduced in #5141. We introduced a "native toolchain" to compile native sources for this use case in #5490.
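
For reference, a minimal sketch of that pre-existing route (the project, module, and file names here are invented for illustration):

```python
# setup.py -- hypothetical python_dist() using the Python-native-API route,
# where c_greet.c does `#include <Python.h>`.
from distutils.core import Extension

from setuptools import find_packages, setup

setup(
  name='native_greeting',
  version='0.0.1',
  packages=find_packages(),
  ext_modules=[Extension('c_greet', sources=['c_greet.c'])],
)
```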

Exposing Python code this way requires using the Python native API and `#include <Python.h>` in your source files. However, python can also interact with native code that does not use the Python native API, using the provided `ctypes` library. For this to work, the `python_dist()` module using `ctypes` needs to have a platform-specific shared library provided within the package. This PR introduces the targets, tasks, and subsystems to compile and link a shared library from native code, then inserts it into the `python_dist()` where it is easily accessible.

### Solution

- Introduce the `ctypes_compatible_c_library()` target which covers C sources (`ctypes_compatible_cpp_library()` for C++), and can specify what to name the shared library created from linking the object files compiled from its sources and dependencies.
- Introduce `CCompile`, `CppCompile`, and `LinkSharedLibraries` to produce the shared libraries from the native sources. The compile tasks use options `CCompileSettings` or `CppCompileSettings` to define file extensions for "header" and "source" files.  
- Introduce the `CCompileSettings` and `CppCompileSettings` subsystems to control compile settings for those languages.
- Convert `BuildLocalPythonDistributions` to proxy to the native backend through the new `PythonNativeCode` subsystem.
- Move all the `BinaryTool` subsystems to a `subsystems/binaries` subdirectory, and expose them to the v2 engine through `@rule`s defined in the subsystem's file.
- Move some of the logic in `pex_build_util.py` to `setup_py.py`, and expose datatypes composing the setup.py environment through `@rule`s in `setup_py.py`. `SetupPyRunner.for_bdist_wheel()` was created to set the wheel's platform, if the `python_dist()` target contains any native sources of its own, or depends on any `ctypes_compatible_*_library`s. 

**Note:** the new targets are specifically prefixed with `ctypes_compatible_` because we don't yet eclipse the functionality of `contrib/cpp`. When the targets become usable for more than this one use case, the name should be changed.

### Result

To see how to link up native and Python code with `ctypes`, here's (most of) the contents of `testprojects/src/python/python_distribution/ctypes`:
*BUILD*:
```python
ctypes_compatible_c_library(
  name='c_library',
  sources=['some_math.h', 'some_math.c', 'src-subdir/add_three.h', 'src-subdir/add_three.c'],
  ctypes_dylib=native_artifact(lib_name='asdf-c'),
)

ctypes_compatible_cpp_library(
  name='cpp_library',
  sources=['some_more_math.hpp', 'some_more_math.cpp'],
  ctypes_dylib=native_artifact(lib_name='asdf-cpp'),
)

python_dist(
  sources=[
    'setup.py',
    'ctypes_python_pkg/__init__.py',
    'ctypes_python_pkg/ctypes_wrapper.py',
  ],
  dependencies=[
    ':c_library',
    ':cpp_library',
  ],
)
```
*setup.py*:
```python
setup(
  name='ctypes_test',
  version='0.0.1',
  packages=find_packages(),
  # Declare two files at the top-level directory (denoted by '').
  data_files=[('', ['libasdf-c.so', 'libasdf-cpp.so'])],
)
```
*ctypes_python_pkg/ctypes_wrapper.py*:
```python
import ctypes
import os


def get_generated_shared_lib(lib_name):
  # These are the same filenames as in setup.py.
  filename = 'lib{}.so'.format(lib_name)
  # The data files are in the root directory, but we are in ctypes_python_pkg/.
  rel_path = os.path.join(os.path.dirname(__file__), '..', filename)
  return os.path.normpath(rel_path)


asdf_c_lib_path = get_generated_shared_lib('asdf-c')
asdf_cpp_lib_path = get_generated_shared_lib('asdf-cpp')

asdf_c_lib = ctypes.CDLL(asdf_c_lib_path)
asdf_cpp_lib = ctypes.CDLL(asdf_cpp_lib_path)

def f(x):
  added = asdf_c_lib.add_three(x)
  multiplied = asdf_cpp_lib.multiply_by_three(added)
  return multiplied
```

Now, the target `testprojects/src/python/python_distribution/ctypes` can be depended on in a BUILD file, and other python code can freely use `from ctypes_python_pkg.ctypes_wrapper import f` to start jumping into native code.
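
For example, a hypothetical consumer module (whose BUILD target is assumed to depend on the `python_dist()` above):

```python
# src/python/example/use_ctypes.py (hypothetical)
from ctypes_python_pkg.ctypes_wrapper import f

# add_three(4) == 7 in C, then multiply_by_three(7) == 21 in C++.
print(f(4))
```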

#### Follow-up issues:
1. #5933 
2. #5934 
3. #5949
4. #5950
5. #5951
6. #5962
7. #5967
8. #5977
CMLivingston pushed a commit to CMLivingston/pants that referenced this pull request Jun 22, 2018
Closes pantsbuild#5831
Prep for release 1.8.0dev3 (pantsbuild#5937)

Ban bad `readonly` shell pattern (pantsbuild#5924)

This subverts `set -e` and means that failed commands don't fail the
script, which is responsible for late failures on CI such as
https://travis-ci.org/pantsbuild/pants/jobs/389174049 which failed to
download protoc, but only failed when something tried to use it.
Fixup macosx platform version. (pantsbuild#5938)

This needs to match the Travis CI osx platform we pre-build wheels on.

Work towards pantsbuild#4896.
Allow pants to select targets by file(s) (pantsbuild#5930)

Fixes pantsbuild#5912

Problem
There should be an option to accept literal files, and then select the targets that own those files, similar to how --changed-parent triggers a diff and the targets are selected based on the result.
The proposed syntax is something like:

$ ./pants \
  --owner-of=this/is/a/file/name.java \    # flag triggers owner lookup
  --owner-of=this/is/a/file/name/too.py \  # flag triggers owner lookup
  compile                                  # goal
Solution
I've created a global option --owner-of= that takes a list of files as a parameter, and created an OwnerCalculator class to handle the logic, similar to how ChangeCalculator works with the --changed-* subsystem.

Result
Now users will be able to run goals on files without needing to know which target owns those files.
Also, the list-owners goal can be deprecated in favor of --owner-of=some/file.py list

It is important to note that multiple target selection methods are not allowed, so it fails when more than one of --changed-*, --owner-of, or target specs is supplied.
e.g. this fails:

$ ./pants \
  --owner-of=this/is/a/file/name.java \    # flag triggers owner lookup
  --owner-of=this/is/a/file/name/too.py \  # flag triggers owner lookup
  compile
  <another target>
Integration test for daemon environment scrubbing (pantsbuild#5893)

This shows that pantsbuild#5898 works, which itself fixed pantsbuild#5854
Add the --owner-of= usage on Target Address documentation (pantsbuild#5931)

Problem
The documentation of the feature proposed in PR pantsbuild#5930

Solution
I decided to put it inside Target Addresses because that is where a user would look if they needed a feature like this, I think.

Result
More docs, and that's always good.
Add script to get a list of failing pants from travis (pantsbuild#5946)

[jvm-compile] template-methodify JvmCompile further; add compiler choices (pantsbuild#5923)

Introduce `JvmPlatform.add_compiler_choice(name)`, which allows plugins to register compilers that can be configured.
This patch pulls out some template methods in JvmCompile to make it easier to extend. It also pushes some of the implementations of those methods down into ZincCompile, where appropriate.

These changes should be covered by existing tests, but it could make sense to add tests around the interfaces of the new template methods. I don't anticipate there being a large number of implementations at this time though, so I didn't think it'd be worth it.

Add the following template methods

* `create_empty_extra_products` Allows subclasses to create extra products that other subclasses might not need, that ought to be constructed even if no compile targets are necessary.

* `register_extra_products_from_contexts` rename of `_register_vts`. This allows subclasses to register their extra products for particular targets.
* `select_runtime_context` Not 100% happy with this, but I'm working on something that needs to have different types of compile contexts. It allows subclasses to specify a context that provides paths for the runtime classpath if the default context isn't quite right for the usages in the base class.
* `create_compile_jobs` Pulled this out into a separate method so that subclasses can create multiple graph jobs per target.

* Pushed down behavior from JvmCompile that should live in zinc via the template methods extracted above. There's probably more that could be done here, but this was the first cut of it.
* Moved the execute definition from BaseZincCompile to ZincCompile so that it's possible to subclass BaseZincCompile with a different compiler name.
release notes for 1.7.0.rc1 (pantsbuild#5942)

Use target not make_target in some tests (pantsbuild#5939)

This pushes parsing of the targets through the engine, rather than
bypassing it.

This is important because I'm about to make these targets require an
EagerFilesetWithSpec as their source/sources arg, rather than being
happy with a list of strings.
Add new remote execution options (pantsbuild#5932)

As described in pantsbuild#5904, a few configuration values that are important to testing of remote execution are currently hardcoded.

Extract existing options to a `ExecutionOptions` collection (which should become a `Subsystem` whenever we add support for consuming `Subsystems` during bootstrap), and add the new options.

Fixes pantsbuild#5904.
Separate the resolution cache and repository cache in Ivy (pantsbuild#5844)

move glob matching into its own file (pantsbuild#5945)

See pantsbuild#5871, where we describe an encapsulation leak created by implementing all of the glob expansion logic in the body of `VFS`.

- Create `glob_matching.rs`, exporting the `GlobMatching` trait, which exports the two methods `canonicalize` and `expand`, which call into methods in a private trait `GlobMatchingImplementation`.

**Note:** `canonicalize` calls `expand`, and vice versa, which is why both methods were moved to `glob_matching.rs`.

Orthogonal glob matching logic is made into a trait that is implemented for all types implementing `VFS`, removing the encapsulation leak. The `VFS` trait is now just four method signature declarations, making the trait much easier to read and understand.
Enable fromfile support for --owner-of and increase test coverage (pantsbuild#5948)

The new `--owner-of` option was broken in the context of `pantsd`, but didn't have test coverage due to the `--changed` and `--owner-of` tests not running under `pantsd`. Additionally, `fromfile` support is useful for this option, but was not enabled.

Mark some integration tests as needing to run under the daemon, and enable `fromfile` support for `--owner-of`.
[pantsd] Robustify client connection logic. (pantsbuild#5952)

Fixes pantsbuild#5812

under full-on-assault stress testing via:

$ watch -n.1 'pkill -f "pantsd \[" pantsd-runner'
this will mostly behave like:

WARN] pantsd was unresponsive on port 55620, retrying (1/3)
WARN] pantsd was unresponsive on port 55620, retrying (2/3)
WARN] pantsd was unresponsive on port 55626, retrying (3/3)
WARN] caught client exception: Fallback(NailgunExecutionError(u'Problem executing command on nailgun server (address: 127.0.0.1:55630): TruncatedHeaderError(u"Failed to read nailgun chunk header (TruncatedRead(u\'Expected 5 bytes before socket shutdown, instead received 0\',)).",)',),), falling back to non-daemon mode

23:30:24 00:00 [main]
               (To run a reporting server: ./pants server)
23:30:38 00:14   [setup]
23:30:39 00:15     [parse]
...
mid-flight terminations (simulated via single-shot pkill calls) also result in a more descriptive error with traceback proxying:

23:40:51 00:04     [zinc]
23:40:51 00:04     [javac]
23:40:51 00:04     [cpp]
23:40:51 00:04     [errorprone]
23:40:51 00:04     [findbugs]CRITICAL]
CRITICAL] lost active connection to pantsd!
Exception caught: (<class 'pants.bin.remote_pants_runner.Terminated'>)
  File "/Users/kwilson/dev/pants/src/python/pants/bin/pants_loader.py", line 73, in <module>
    main()
  File "/Users/kwilson/dev/pants/src/python/pants/bin/pants_loader.py", line 69, in main
    PantsLoader.run()
  File "/Users/kwilson/dev/pants/src/python/pants/bin/pants_loader.py", line 65, in run
    cls.load_and_execute(entrypoint)
  File "/Users/kwilson/dev/pants/src/python/pants/bin/pants_loader.py", line 58, in load_and_execute
    entrypoint_main()
  File "/Users/kwilson/dev/pants/src/python/pants/bin/pants_exe.py", line 39, in main
    PantsRunner(exiter, start_time=start_time).run()
  File "/Users/kwilson/dev/pants/src/python/pants/bin/pants_runner.py", line 39, in run
    return RemotePantsRunner(self._exiter, self._args, self._env, bootstrap_options).run()
  File "/Users/kwilson/dev/pants/src/python/pants/bin/remote_pants_runner.py", line 162, in run
    self._run_pants_with_retry(port)
  File "/Users/kwilson/dev/pants/src/python/pants/java/nailgun_client.py", line 221, in execute
    return self._session.execute(cwd, main_class, *args, **environment)
  File "/Users/kwilson/dev/pants/src/python/pants/java/nailgun_client.py", line 94, in execute
    return self._process_session()
  File "/Users/kwilson/dev/pants/src/python/pants/java/nailgun_client.py", line 69, in _process_session
    for chunk_type, payload in self.iter_chunks(self._sock, return_bytes=True):
  File "/Users/kwilson/dev/pants/src/python/pants/java/nailgun_protocol.py", line 206, in iter_chunks
    chunk_type, payload = cls.read_chunk(sock, return_bytes)
  File "/Users/kwilson/dev/pants/src/python/pants/java/nailgun_protocol.py", line 182, in read_chunk
    raise cls.TruncatedHeaderError('Failed to read nailgun chunk header ({!r}).'.format(e))

Exception message: abruptly lost active connection to pantsd runner: NailgunError(u'Problem talking to nailgun server (address: 127.0.0.1:55707, remote_pid: -28972): TruncatedHeaderError(u"Failed to read nailgun chunk header (TruncatedRead(u\'Expected 5 bytes before socket shutdown, instead received 0\',)).",)',)
Re-shade zinc to avoid classpath collisions with annotation processors. (pantsbuild#5953)

Zinc used to be shaded before the `1.x.y` upgrade (pantsbuild#4729), but shading was removed due to an overabundance of optimism. When testing the zinc upgrade internally, we experienced a classpath collision between an annotation processor and zinc (in guava, although zinc has many other dependencies that could cause issues).

Shade zinc, and ensure that our annotation processor uses a very old guava in order to attempt to force collisions in future.
Improve PythonInterpreterCache logging (pantsbuild#5954)

When users have issues building their Python interpreter cache, they are often very confused, because Pants does not currently log much about the process to help users debug. Here we add log lines describing what/where Pants looks at to build the interpreter cache, and the results of what it found. This should help users better understand/debug the process.
use liblzma.dylib for xz on osx and add platform-specific testing to the rust osx shard (pantsbuild#5936)

See pantsbuild#5928. The `xz` archiver wasn't tested on osx at all, and failed to find `liblzma.so` on osx (it should have been `liblzma.dylib`). There were additional errors with library search paths reported in that PR which I was not immediately able to repro. This PR hopefully fixes all of those errors, and ensures they won't happen again with the addition of platform-specific testing (see previous issue at pantsbuild#5920).

- Switch to a statically linked `xz`.
- Fix the incorrect key `'darwin'` in the platform dictionary in the `LLVM` subsystem (to `'mac'`).
- Add the tag `platform_specific_behavior` to the new python target `tests/python/pants_test/backend/python/tasks:python_native_code_testing`, which covers the production of `python_dist()`s with native code.
- Add the `-z` argument to `build-support/bin/ci.sh` to run all tests with the `platform_specific_behavior` tag. Also clean up old unused options in the getopts call, and convert echo statements to a simpler heredoc.
- Change the name of the "Rust Tests OSX" shard to "Rust + Platform-specific Tests OSX", and add the `-z` switch to the `ci.sh` invocation.

**Note:** the tests in `tests/python/pants_test/backend/native/subsystems` are going to be removed in pantsbuild#5815, otherwise they would be tagged similarly.

`./pants test tests/python/pants_test/backend/python/tasks:python_native_code_testing` now passes on osx, and this fact is now being tested in an osx shard in travis.
Support output directory saving for local process execution. (pantsbuild#5944)

Closes pantsbuild#5860
reimplement a previous PR -- ignore this

This commit is a reimplementation of registering @rules for backends, because this PR began before
that one was split off.

add some simple examples to demonstrate how to use backend rules

...actually make the changes to consume backend rules in register.py

revert accidental change to a test target

remove extraneous log statement

fix lint errors

add native backend to release.sh

isolate native toolchain path and hope for the best

add config subdir to native backend

really just some more attempts

start trying to dispatch based on platform

extend EngineInitializer to add more rules from a backend

refactor Platform to use new methods in osutil.py

refactor the native backend to be a real backend and expose rules

register a rule in the python backend to get a setup.py environment

make python_dist() tests pass

make lint pass

create tasks and targets for c/c++ sources

- refactors the "native toolchain" and introduces the "binaries" subdirectory of subsystems

start by adding a new ctypes testproject

add c/c++ sources

add example BUILD file

add some native targets

add tasks dir

remove gcc

try to start adding tasks

clean some leftover notes in BuildLocalPythonDistributions

update NativeLibrary with headers

move DependencyContext to target.py

add native compile tasks

houston we have compilation

run:

./pants -ldebug --print-exception-stacktrace compile testprojects/src/python/python_distribution/ctypes:

for an idea

use the forbidden product request

change target names to avoid conflict with cpp contrib and flesh out cpp_compile

now we are compiling code

we can link things now

now we know how to infer headers vs sources

fix the test case and fix include dir collection

(un)?surprisingly, everything works, but bdist_wheel doesn't read MANIFEST.in

houston we have c++

bring back gcc so we can compile

halfway done with osx support

now things work on osx????????

ok, now it works on linux again too

first round of review

- NB: refactors the organization of the `results_dir` for python_dist()!
- move ctypes integration testing into python backend tests

revert some unnecessary changes

refactor native_artifact to be a datatype

fix some copyright years

add ctypes integration test

add assert_single_element method in collections.py

decouple the native tools for setup.py from the execution environment

streamline the selection of the native tools for setup.py invocation

make gcc depend on binutils on linux for the 'as' assembler

fix logging visibility by moving it back into the task

make the check for the external llvm url more clear

refactor local dist building a bit

- use SetupPyRunner.DIST_DIR as the source of truth
- add a separate subdir of the python_dist target's results_dir for the
  python_dist sources
- move shared libs into the new subdir

fix imports

second round of review

- fixed bugs
- expanded error messages and docstrings

make a couple docstring changes

fix dist platform selection ('current' is not a real platform)

lint fixes

fix broken regex which modifies the `.dylib` extension for python_dist()

fix the ctypes integration test

respond to some review comments

clear the error message if we can't find xcode cli tools

refactor the compilation and linking pipeline to use subsystems

- also add `PythonNativeCode` subsystem to bridge the native and python backends

refactor the compilation and linking pipeline to use subsystems

add some notes

fix rebase issues

add link to pantsbuild#5788 -- maybe use variants for args for static libs

move `native_source_extensions` to a new `PythonNativeCode` subsystem

update native toolchain docs and remove bad old tests

move tgt_closure_platforms into the new `PythonNativeCode` subsystem

remove unnecessary logging

remove compile_settings_class in favor of another abstractmethod

refactor `NativeCompile` and add documentation

improve debug logging in NativeCompile

document NativeCompileSettings

refactor and add docstrings

convert provides= to ctypes_dylib= and add many more docstrings

remove or improve TODOs

improve or remove FIXMEs

improve some docstrings, demote a FIXME, and add a TODO

link FIXMEs to a ticket

add notes to the ctypes testproject

update mock object for strict deps -- test passes

fix failing integration test on osx

add hack to let travis pass

fix the system_id key in llvm and add a shameful hack to pass travis

swap the order of alias_types

remove unused EmptyDepContext class

remove -settings suffix from compile subsystem options scopes

add AbstractClass to NativeLibrary

bump implementation_version for python_dist() build

- we have changed the layout of the results_dir in this PR

add ticket link and fix bug

revert indentation changes to execute() method

refactor `assert_single_element()`

revert addition of `narrow_relative_paths()`

add link to pantsbuild#5950

move common process invocation logic into NativeCompile

revert an unnecessary change

turn an assert into a full exception

revert unnecessary change

use get_local_platform() wherever possible

delete duplicate llvm subsystem

fix xcode cli tools resolution

change `ctypes_dylib=` to `ctypes_native_library=`

add a newline

move UnsupportedPlatformError to be a class field

remove unused codegen_types field

fix zinc-compiler options to be valid ones

Construct rule_graph recursively (pantsbuild#5955)

The `RuleGraph` is currently constructed iteratively, but can be more-easily constructed recursively.

Switch to constructing the `RuleGraph` recursively, and unify a few disparate diagnostic messages.

Helps to prepare for further refactoring in pantsbuild#5788.
Allow manylinux wheels when resolving plugins. (pantsbuild#5959)

Also plumb manylinux resolution support for the python backend, on by
default, but configurable via `python_setup.resolver_use_manylinux`.

Fixes pantsbuild#5958
`exclude-patterns` and `tag` should apply only to roots (pantsbuild#5786)

The `--exclude-patterns` flag currently applies to inner nodes, which causes odd errors. Moreover, tags should also apply only to roots. See pantsbuild#5189.

- added `tag` & `exclude_patterns` as params to `Specs`
- add tests for both
- modify changed tests to pass for inner node filtering

Fixes pantsbuild#5189.
Remove value wrapper on the python side of ffi. (pantsbuild#5961)

As explained in the comment in this change, the overhead of wrapping our CFFI "handle"/`void*` instances in a type that is shaped like the `Value` struct was significant enough to care about.

Since the struct has zero overhead on the rust side, whether we represent it as typedef or a struct on the python side doesn't make a difference (except in terms of syntax).

6% faster `./pants list ::` in Twitter's repo.
return an actual field

use temporary native-compile goal

Cobertura coverage: Include the full target closure's classpath entries for instrumentation (pantsbuild#5879)

Sometimes Cobertura needs access to the dependencies of class files being instrumented in order to rewrite them (pantsbuild#5878).

This patch adds an option that creates a manifest jar and adds an argument to the Cobertura call so that it can take advantage of it.

class files that need to determine a least upper bound in order to be rewritten can now be instrumented.

Fixes  pantsbuild#5878
add ticket link

fix xcode install locations and reduce it to a single dir_option

return the correct amount of path entries

Record start times per graph node and expose a method to summarize them. (pantsbuild#5964)

In order to display "significant" work while it is running on the engine, we need to compute interesting, long-running leaves.

Record start times per entry in the `Graph`, and add a method to compute a graph-aware top-k longest running leaves. We traverse the graph "longest running first", and emit the first `k` leaves we encounter.

While this will almost certainly need further edits to maximize its usefulness, visualization can begin to be built atop it.
Prepare the 1.8.0.dev4 release (pantsbuild#5969)

Mark a few options that should not show up in `./pants help`. (pantsbuild#5968)

`./pants help` contains core options that are useful to every pants command, and the vast majority of global options are hidden in order to keep it concise. A few non-essential options ended up there recently.

Hide them.
adding more documentation for python_app (pantsbuild#5965)

The python_app target doesn't have documentation specific to it; its current documentation is specific to jvm_app.

Added a few lines of documentation.

There is no system-wide change, only a documentation change.
Remove DeprecatedPythonTaskTestBase (pantsbuild#5973)

Use PythonTaskTestBase instead.

Fixes pantsbuild#5870
Chris first commit on fresh rebase

Merge branch 'ctypes-test-project' of github.com:cosmicexplorer/pants into clivingston/ctypes-test-project-third-party

unrevert reverted fix (NEEDS FOLLOWUP ISSUE!)

put in a better fix for the strict_deps error until the followup issue is made

add ticket link

Merge branch 'ctypes-test-project' of github.com:cosmicexplorer/pants into clivingston/ctypes-test-project-third-party

Shorten safe filenames further, and combine codepaths to make them readable. (pantsbuild#5971)

Lower the `safe_filename` path component length limit to 100 characters, since the previous 255 value did not account for the fact that many filesystems also have a limit on total path length. This "fixes" the issue described in pantsbuild#5587, which was caused by using this method via `build_invalidator.py`.

Additionally, merge the codepath from `Target.compute_target_id` which computes a readable shortened filename into `safe_filename`, and expand tests. This removes some duplication, and ensure that we don't run into a similar issue with target ids.

The specific error from pantsbuild#5587 should be prevented, and consumers of `safe_filename` should have safe and readable filenames.

Fixes pantsbuild#5587.
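
A minimal sketch of the truncate-then-hash idea (the name and exact digest scheme here are assumptions, not the actual pants implementation):

```python
import hashlib

def shortened_filename(name, max_len=100):
  """Sketch: keep short names readable; hash-suffix long ones so each path
  component stays under the limit while remaining unique."""
  if len(name) <= max_len:
    return name
  digest = hashlib.sha1(name.encode('utf-8')).hexdigest()  # 40 hex chars
  prefix = name[:max_len - len(digest) - 1]  # leave room for '.' + digest
  return '{}.{}'.format(prefix, digest)
```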
Whitelist the --owner-of option to not restart the daemon. (pantsbuild#5979)

Because the `--owner-of` option was not whitelisted as `daemon=False`, changing its value triggered unnecessary `pantsd` restarts.

Whitelist it.
Prepare the 1.8.0rc0 release. (pantsbuild#5980)

Robustify test_namespace_effective PYTHONPATH. (pantsbuild#5976)

The real problem is noted, but this quick fix should bolster against
interpreter cache interpreters pointing off to python binaries that
have no setuptools in their associated site-packages.

Fixes pantsbuild#5972
make_target upgrades sources to EagerFilesetWithSpec (pantsbuild#5974)

This better simulates how the engine parses BUILD files, giving a more
faithful experience in tests.

I'm about to make it a warning/error to pass a list of strings as the
sources arg, so this will make tests which use make_target continue to
work after that.

Also, make cloc use base class scheduler instead of configuring its own.
Lib and include as a dep-specific location

source attribute is automatically promoted to sources (pantsbuild#5908)

This means that either the `source` or `sources` attribute can be used
for any rule which expects sources. Places that `source` was expected
still verify that the correct number of sources are actually present.
Places that `sources` was expected will automatically promote `source`
to `sources`.

This is a step towards all `sources` attributes being
`EagerFilesetWithSpec`s, which will make them cached in the daemon, and
make them easier to work with with both v2 remote execution and in the
v2 engine in general. It also provides common hooks for input file
validation, rather than relying on them being done ad-hoc in each
`Target` constructor.

For backwards compatibility, both attributes will be populated on
`Target`s, but in the future only the sources value will be provided.

`sources` is guaranteed to be an `EagerFilesetWithSpec` whichever of
these mechanisms is used.

A hook is provided for rules to perform validation on `sources` at build
file parsing time. Hooks are put in place for non-contrib rule types
which currently take a `source` attribute to verify that the correct
number of sources are provided. I imagine at some point we may want to
add a "file type" hook too, so that rules can error if files of the
wrong type were added as sources.

This is a breaking change for rules which use both the `source` and `sources` attributes (and where the latter is not equivalent to the former), or where the `source` attribute is used to refer to something other than a file. `source` is now becoming a
reserved attribute name, as `sources` and `dependencies` already are.

This is also a breaking change for rules which use the `source`
attribute, but never set `sources` in a Payload. These will now fail to
parse.

This is also a slightly breaking change for the `page` rule - before,
omitting the `source` attribute would parse, but fail at runtime. Now,
it will fail to parse.

This is also a breaking change in that in means that the source
attribute is now treated like a glob, and so if a file is specified
which isn't present, it will be ignored instead of error. This feels a
little sketchy, but it turns out we did the exact same thing by making
all sources lists be treated like globs...
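
A small illustration of the promotion (hypothetical BUILD snippet; `page` expects a single source, as noted above):

```python
# BUILD (hypothetical): both spellings now parse for a rule expecting sources.
page(
  name='docs',
  source='README.md',  # automatically promoted to sources=['README.md']
)
```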
Override get_sources for pants plugins (pantsbuild#5984)

1.7.0 release notes (pantsbuild#5983)

No additional changes, so it's a very short release note.
Fixups for native third party work

hardcode in c/c++ language levels for now

remove all the unnecessary code relating to file extensions

Merge branch 'ctypes-test-project' of github.com:cosmicexplorer/pants into clivingston/ctypes-test-project-third-party

Caching tests are parsed through the engine (pantsbuild#5985)

fix osx failures and leave a ticket link

Add rang header-only lib for integration testing

Merge branch 'ctypes-test-project' of github.com:cosmicexplorer/pants into clivingston/ctypes-test-project-third-party

Fix TestSetupPyInterpreter.test_setuptools_version (pantsbuild#5988)

Previously the test failed to populate the `PythonInterpreter` data
product leading to a fallback to the current non-bare interpreter which
allowed `setuptools` from `site-packages` to leak in.

Fixes pantsbuild#5467
Refactor conan grab into subsystem

Engine looks up default sources when parsing (pantsbuild#5989)

Rather than re-implementing default source look-up.

This pushes sources parsing of default sources through the engine, in parallel,
rather than being later synchronous python calls.

This also works for plugin types, and doesn't change any existing APIs.

It updates the Go patterns to match those that the engine currently performs,
rather than ones which aren't actually used by any code.
Add unit tests, refactor

C/C++ targets which can be compiled/linked and used in python_dist() with ctypes (pantsbuild#5815)

Add simple integration test

Pull in master

Minor cleanup

Fix CI errors

Debug log stdout

Fix integration test

Fix integration test

Fix lint error on third party