monorepo versioning for python libraries using pants #10633

Closed
adabuleanu opened this issue Aug 17, 2020 · 11 comments
@adabuleanu

adabuleanu commented Aug 17, 2020

Hi all,

Does Pants support any versioning for Python libraries? From what I have read, you have to update the version number for each Python library in its BUILD file, under the setup_py construct, e.g.:

python_library(
    name = "my_lib",
    sources = ['**/*.py'],
    provides=setup_py(
        name="my_lib",
        description="My Lib",
        version="X.Y.Z",
    )
)

This becomes tedious and error-prone when you have multiple internal Python library dependencies.
E.g.: you have lib A -depends-> lib B -depends-> lib C. If you modify lib C and you want to build lib A, lib B and lib C, you have to update the version in the BUILD files for each of the 3 libraries. If you don't, you will overwrite existing artifacts.
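
To make the chain above concrete, here is a minimal sketch (plain Python, not a Pants feature) that works out which libraries need a new version when one of them changes. The library names, the `DEPENDENCIES` map and the `libraries_to_bump` helper are hypothetical and exist only for illustration.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical dependency map: each library lists the internal libraries it depends on.
DEPENDENCIES: Dict[str, List[str]] = {
    "lib_a": ["lib_b"],
    "lib_b": ["lib_c"],
    "lib_c": [],
}


def libraries_to_bump(changed: str, dependencies: Dict[str, List[str]]) -> Set[str]:
    """Return the changed library plus everything that depends on it, directly or transitively."""
    # Invert the map so we can walk from a library to the libraries that depend on it.
    dependents: Dict[str, List[str]] = {name: [] for name in dependencies}
    for name, deps in dependencies.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(name)

    # Breadth-first walk over the inverted graph.
    to_bump = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in to_bump:
                to_bump.add(dependent)
                queue.append(dependent)
    return to_bump


print(sorted(libraries_to_bump("lib_c", DEPENDENCIES)))  # ['lib_a', 'lib_b', 'lib_c']
```

Changing `lib_c` touches all three libraries, while changing `lib_a` touches only `lib_a`.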

Or at least this is my understanding after reading the documentation. I found something related to semantic version in the docs, but it is only available for java artifacts https://v1.pantsbuild.org/publish.html. I know versioning inside a monorepo is a hot topic when you move to a monorepo, but I was wondering how pants handles this for python libraries.

I am just starting with pants and I am currently using the latest stable version (1.30)

Sorry if I did not respect any ticket format, but I did not find any guide on this.

Thank you,
Adrian

@Eric-Arellano
Contributor

Hello @adabuleanu, great question.

This is indeed possible with Pants through creating a macro: https://www.pantsbuild.org/v2.0/docs/macros. You could create a macro like this:

def my_org_setup_py(**kwargs):
     if "version" not in kwargs:
         kwargs["version"] = "2.1.1"
     return setup_py(**kwargs)

Then, in your BUILD file:

python_library(
  ...
  provides=my_org_setup_py(
       name=...
  )

Would this work? There's another option if you want to define the version in some other file than your macro definition file, such as setting up a VERSION file, but it's more complex.

@adabuleanu
Author

I could not get pants macros to work. I have put build_file_prelude_globs = ["pants-plugins/macros.py"] in pants.toml according to the documentation, but still get the error NameError("name 'scm_setup_py' is not defined",).

My pants.toml

[GLOBAL]
pants_version = "1.30.0"
v1 =  false  # Turn off the v1 execution engine.
v2 = true  # Enable the v2 execution engine.
pantsd = true  # Enable the Pants daemon for better performance.
print_exception_stacktrace = true

backend_packages = []  # Deregister all v1 backends.

# List v2 backends here.
backend_packages2.add = [
  'pants.backend.python',
  'pants.backend.python.lint.docformatter',
  'pants.backend.python.lint.black',
  # 'pants.backend.python.typecheck.mypy',
  'pants.backend.python.lint.flake8',
  'pants.backend.python.lint.isort',
  'pants.backend.python.lint.pylint',
]

# List v2 plugins here.
plugins2 = []

build_file_prelude_globs = [
  "pants-plugins/macros.py",
]

[source]
root_patterns = [
  "/src",
  "/tests",
  "3rdparty"
]

[python-setup]
interpreter_constraints = [">=3.6"]
requirement_constraints = "3rdparty/python/constraints.txt"

[python-repos]
indexes = ["...."]

[test]
use_coverage = true

[black]
config = "pyproject.toml"

[docformatter]
args = ["--wrap-summaries=100", "--wrap-descriptions=100"]

# [mypy]
# config = "mypy.ini"

[flake8]
config = ".flake8"

[isort]
config = ".isort.cfg"

[pylint]
config = ".pylintrc"

My pants-build/macros.py

def scm_setup_py(**kwargs):
    print("test")
    if "version" not in kwargs:
        kwargs["version"] = "2.1.1"
    return setup_py(**kwargs)

and my BUILD file

python_library(
    name = "my-lib",
    sources = ['**/*.py'],
    dependencies = [
        "3rdparty/python:requests",
        "3rdparty/python:kubernetes",
        "3rdparty/python:semver",
        "3rdparty/python:ansible",
        "3rdparty/python:urllib3",
		"3rdparty/python:colorama",
		"3rdparty/python:dotty-dict",
		"3rdparty/python:distro",
		"3rdparty/python:typing-extensions",
    ],
    provides=scm_setup_py(
        name="my_lib",
        description="My lib",
    )
)

Macros seem helpful for automating versioning, but I was interested in a Pants feature that does this out of the box. Basically, something like: if you change lib C, the setup-py build bumps its semver patch version and also bumps the versions of the libraries that depend on it (lib A and lib B). Something similar to how Java artifacts are handled during publish: https://v1.pantsbuild.org/publish.html.
I don't care about publishing (I will use twine for that, as it is decoupled from the build process), but I need this behavior during the build stage. From those docs: "The publishing mechanism uses Semantic Versioning ("semver") for versioning. Versions are dotted number triples (e.g., 2.5.6); when Pants bumps a version, it specifically bumps the patch number part. Thus, if the current version is 2.5.6, Pants bumps to 2.5.7. To publish a minor or major version instead of a patch, you override the version number on the command line."
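
The bumping rule quoted above can be sketched in a few lines of plain Python. This is only an illustration of the rule, not the Pants v1 publish implementation; `parse_version` and `next_version` are hypothetical helper names.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a dotted number triple such as '2.5.6' into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def next_version(current: str, override: Optional[str] = None) -> str:
    """Bump the patch number unless an explicit version is supplied."""
    if override is not None:
        parse_version(override)  # Raises ValueError if the override is not a valid triple.
        return override
    major, minor, patch = parse_version(current)
    return f"{major}.{minor}.{patch + 1}"


print(next_version("2.5.6"))                    # 2.5.7
print(next_version("2.5.6", override="3.0.0"))  # 3.0.0
```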

@Eric-Arellano
Contributor

Hm, I gave you a bad recommendation on using a macro like the above. Normally, macros are meant for creating target types, like python_library. Note in the docs that there is no return statement, as calling python_library(**kwargs) triggers a side effect to create a target.

Instead, setup_py is what we call an "object", which is literally a Python object and is not a normal target. Objects aren't documented in our plugin docs because we were hoping to redesign them soon, but I think we should document them now. It turns out that objects and macros do not play nicely together at all. I'm sorry I didn't test that out before giving you the recommendation!

This should work instead.

# pants-plugins/setup_py/register.py
from pants.backend.python.python_artifact import PythonArtifact
from pants.build_graph.build_file_aliases import BuildFileAliases

def scm_setup_py(version: str = "2.1.1", **kwargs) -> PythonArtifact:
    return PythonArtifact(version=version, **kwargs)

def build_file_aliases():
    return BuildFileAliases(objects={"scm_setup_py": scm_setup_py})

This is inspired by

def pants_setup_py(
    name: str, description: str, additional_classifiers: Optional[List[str]] = None, **kwargs
) -> PythonArtifact:
    """Creates the setup_py for a Pants artifact.
    :param name: The name of the package.
    :param description: A brief description of what the package provides.
    :param additional_classifiers: Any additional trove classifiers that apply to the package,
                                   see: https://pypi.org/pypi?%3Aaction=list_classifiers
    :param kwargs: Any additional keyword arguments to be passed to `setuptools.setup
                   <https://pythonhosted.org/setuptools/setuptools.html>`_.
    :returns: A setup_py suitable for building and publishing Pants components.
    """
    if not name.startswith("pantsbuild.pants"):
        raise ValueError(
            f"Pants distribution package names must start with 'pantsbuild.pants', given {name}"
        )
    standard_classifiers = [
        "Intended Audience :: Developers",
        "License :: OSI Approved :: Apache Software License",
        # We know for a fact these OSs work but, for example, know Windows
        # does not work yet. Take the conservative approach and only list OSs
        # we know pants works with for now.
        "Operating System :: MacOS :: MacOS X",
        "Operating System :: POSIX :: Linux",
        "Programming Language :: Python",
        "Topic :: Software Development :: Build Tools",
    ]
    classifiers = FrozenOrderedSet(standard_classifiers + (additional_classifiers or []))
    notes = PantsReleases.global_instance().notes_for_version(PANTS_SEMVER)
    return PythonArtifact(
        name=name,
        version=VERSION,
        description=description,
        long_description=Path("src/python/pants/ABOUT.rst").read_text() + notes,
        long_description_content_type="text/x-rst",
        url="https://github.com/pantsbuild/pants",
        project_urls={
            "Documentation": "https://www.pantsbuild.org/",
            "Source": "https://github.com/pantsbuild/pants",
            "Tracker": "https://github.com/pantsbuild/pants/issues",
        },
        license="Apache License, Version 2.0",
        zip_safe=True,
        classifiers=list(classifiers),
        **kwargs,
    )

which shows some more complex things you could do, like setting default values. Because you can have imports, you would be able to use open() here to read from an external file. You'd want that file to live in the build root, and then use get_buildroot() from pants.base.build_environment so that the path to the file is correct, like this:

from pathlib import Path

from pants.backend.python.python_artifact import PythonArtifact
from pants.base.build_environment import get_buildroot

def scm_setup_py(**kwargs) -> PythonArtifact:
    # Resolve the file against the build root so the path does not depend on the cwd.
    version = Path(get_buildroot(), "_VERSIONS").read_text().strip()
    return PythonArtifact(version=version, **kwargs)

Activate this plugin in your pants.toml by following the instructions at https://www.pantsbuild.org/v2.0/docs/plugins-overview#enabling-plugins.

Then, you can use scm_setup_py in your BUILD files in lieu of setup_py.

--

The above approach wouldn't fully solve what you're asking for, though. The above "object" will allow you to do things like set the version based on the distribution name (your approach #1 in Slack), but it should be treated as read-only. It is not safe to write anything from it, because the function is called whenever your BUILD file is parsed, not only when you run setup-py.
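
As a rough illustration of the read-only case, the sketch below looks up a version by distribution name from a file and never writes anything. The `name=version` file layout, the `_VERSIONS` file name used in the demo, the `0.0.1` default and the helper names are all assumptions made for this example; they are not a Pants convention.

```python
import tempfile
from pathlib import Path
from typing import Dict


def load_versions(versions_file: Path) -> Dict[str, str]:
    """Parse 'name=version' lines, skipping blank lines and '#' comments."""
    versions: Dict[str, str] = {}
    for line in versions_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("=")
        versions[name.strip()] = version.strip()
    return versions


def version_for(name: str, versions_file: Path, default: str = "0.0.1") -> str:
    """Read-only lookup: return the recorded version, or a default if none is recorded."""
    return load_versions(versions_file).get(name, default)


# Demo against a throwaway file; in a real repo the file would live in the build root.
with tempfile.TemporaryDirectory() as tmp:
    versions_file = Path(tmp, "_VERSIONS")
    versions_file.write_text("# name=version\nnci_core=10.0.1\nmy_lib=2.1.1\n")
    print(version_for("nci_core", versions_file))     # 10.0.1
    print(version_for("unknown_lib", versions_file))  # 0.0.1
```

Inside a function like scm_setup_py, the result of such a lookup would be passed as the version keyword.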

If you want the more complex functionality, you'd want to set up a custom goal using the Rules API, which will allow you to do things like:

a) Safely write to the build root
b) Evaluate the dependencies of the distribution, so that you can consider all dependent distributions when calculating the version.

We'd be happy to help you write this plugin.

@adabuleanu
Author

Thx. I got my setup running. I've hit a blocker regarding the mechanism on how pants calls the version function. This is my version implementation per python libraries:

BUILD file

python_library(
    name = "nci-core-lib",
    sources = ['**/*.py'],
    dependencies = [
...
    ],
    provides=scm_setup_py(
        name="nci_core",
        description="NCI Core",
        path="src/python/nci_core"
    )
)

scm_setup_py implementation in register.py

from pants.backend.python.python_artifact import PythonArtifact
from pants.build_graph.build_file_aliases import BuildFileAliases
from git import Repo
from git.exc import InvalidGitRepositoryError
from semver import VersionInfo
import os
import logging

def scm_setup_py(name: str, path: str, **kwargs) -> PythonArtifact:

    repo_path = get_repo_path(os.getcwd())
    version = get_version(repo_path, path)
    logging.warning(f"version: {version}")
        
    return PythonArtifact(name=name, version=version, **kwargs)

def build_file_aliases():
    return BuildFileAliases(objects={"scm_setup_py": scm_setup_py})

def get_version(repo_path: str, subtree_path: str) -> str:
    """
    Function for returning the version based on the latest Git Tag on a subtree
    :type repo_path: str
    :type subtree_path: str
    :return: str
    """
    logging.warning(f"subtree path: {subtree_path}")
    repo = Repo(repo_path)
    subtree_commit_last = repo.git.log("--pretty=%H", "-n", "1", "--", subtree_path)
    logging.warning(f"subtree latest commit id: {subtree_commit_last}")
    subtree_tag_last = repo.git.describe("--abbrev=0", subtree_commit_last)
    logging.warning(f"subtree latest tag: {subtree_tag_last}")

    is_dirty: bool = repo.is_dirty(path=subtree_path)
    logging.warning(f"subtree dirty: {is_dirty}")

    commit_deltas = sum(1 for _ in repo.iter_commits(f"{subtree_tag_last}..{subtree_commit_last}"))
    logging.warning(f"subtree commit deltas: {commit_deltas}")

    commit_sha: str = repo.commit(subtree_commit_last).hexsha[0:7]
    logging.warning(f"subtree sha: {commit_sha}")

    if not is_dirty:
        if commit_deltas == 0:
            # Case 1: User commits the changes and adds a git tag
            return str(subtree_tag_last)

        # Case 2: User commits the changes without creating a new tag
        return f"{str(VersionInfo.parse(subtree_tag_last).bump_patch())}-{commit_deltas}-g{commit_sha}"

    # Case 3: User doesn't commit the changes
    return f"{str(VersionInfo.parse(subtree_tag_last).bump_patch())}-{commit_deltas}-g{commit_sha}-dirty"


def get_repo_path(repo_path: str) -> str:
    """
    Get the path for the repository
    :type repo_path: str
    :return: str
    """
    try:
        Repo(repo_path)
    except InvalidGitRepositoryError:
        return get_repo_path(os.path.join(repo_path, ".."))

    return repo_path
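
The three cases in get_version() can also be isolated from Git so that they are easy to check by hand. The sketch below is a simplified stand-in: `format_version` is a hypothetical helper that takes the values get_version() reads from the repository as plain arguments, and it bumps the patch number with string handling instead of the semver package.

```python
def format_version(last_tag: str, commit_deltas: int, commit_sha: str, is_dirty: bool) -> str:
    """Build a version string from the latest tag and the state of the subtree."""
    # Case 1: clean tree sitting exactly on the tag, so use the tag as-is.
    if not is_dirty and commit_deltas == 0:
        return last_tag

    # Cases 2 and 3: bump the patch number and append the commit count and short sha.
    major, minor, patch = (int(part) for part in last_tag.split("."))
    version = f"{major}.{minor}.{patch + 1}-{commit_deltas}-g{commit_sha[:7]}"

    # Case 3: uncommitted changes get a "-dirty" suffix.
    return f"{version}-dirty" if is_dirty else version


sha = "1f92d3f134b22dd0504b16af96ca51fa8e762e1a"
print(format_version("10.0.0", 0, sha, False))  # 10.0.0
print(format_version("10.0.0", 3, sha, False))  # 10.0.1-3-g1f92d3f
print(format_version("10.0.0", 3, sha, True))   # 10.0.1-3-g1f92d3f-dirty
```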

It looks like the versioning function does not work when pants "caches" setup-py runs. See below example:

# get the latest commit per subtree "src/python/nci_core"
$ git log -- src/python/nci_core | head -1
commit 1f92d3f134b22dd0504b16af96ca51fa8e762e1a

# get version for subtree
$ git describe 1f92d3f134b22dd0504b16af96ca51fa8e762e1a
10.0.0-3-g1f92d3f1

# see if pants versioning works (my version is correct because I bump the tag with a patch: 10.0.1-3-g1f92d3f)
$ ./pants setup-py src/python/nci_core:nci-core-lib
20:01:51 [INFO] initializing pantsd...
20:01:52 [INFO] pantsd initialized.
20:01:52.65 [WARN] subtree path: src/python/nci_core
20:01:52.66 [WARN] subtree latest commit id: 1f92d3f134b22dd0504b16af96ca51fa8e762e1a
20:01:52.67 [WARN] subtree latest tag: 10.0.0
20:01:52.68 [WARN] subtree dirty: False
20:01:52.69 [WARN] subtree commit deltas: 3
20:01:52.70 [WARN] subtree sha: 1f92d3f
20:01:52.70 [WARN] version: 10.0.1-3-g1f92d3f
20:01:52.76 [INFO] Completed: Find all code to be published in the distribution
Writing setup.py chroot for src/python/nci_core:nci-core-lib to dist/nci_core-10.0.1-3-g1f92d3f

# run it again without any changes (looks like doing some "caching")
$ ./pants setup-py src/python/nci_core:nci-core-lib
Writing setup.py chroot for src/python/nci_core:nci-core-lib to dist/nci_core-10.0.1-3-g1f92d3f

# make a change to subtree (BUILD file)
$ echo "" >> src/python/nci_core/BUILD

# check what git says (looks like an unstaged file)
$ git status
modified:   src/python/nci_core/BUILD

# run pants again (should generate version 10.0.1-3-g1f92d3f-dirty)
$ ./pants setup-py src/python/nci_core:nci-core-lib
20:11:18.06 [WARN] subtree path: src/python/nci_core
20:11:18.14 [WARN] subtree latest commit id: 1f92d3f134b22dd0504b16af96ca51fa8e762e1a
20:11:18.16 [WARN] subtree latest tag: 10.0.0
20:11:18.18 [WARN] subtree dirty: True
20:11:18.19 [WARN] subtree commit deltas: 3
20:11:18.20 [WARN] subtree sha: 1f92d3f
20:11:18.20 [WARN] version: 10.0.1-3-g1f92d3f-dirty
20:11:18.22 [INFO] Completed: Find all code to be published in the distribution
Writing setup.py chroot for src/python/nci_core:nci-core-lib to dist/nci_core-10.0.1-3-g1f92d3f-dirty

# commit change
$ git add src/python/nci_core/BUILD
$ git commit -m "test"

# get the latest commit per subtree
$ git log -- src/python/nci_core | head -1
commit fff44f36fa7acd3488c34f013d83654d57f731ca

# get version for subtree
$ git describe fff44f36fa7acd3488c34f013d83654d57f731ca
10.0.0-4-gfff44f36


# run pants again (should generate version 10.0.1-4-gfff44f36)
$ ./pants setup-py src/python/nci_core:nci-core-lib
Writing setup.py chroot for src/python/nci_core:nci-core-lib to dist/nci_core-10.0.1-3-g1f92d3f-dirty

# and it doesn't - it looks like still showing/running the previous ("cached") result

# removing dist directory does not help
$ rm -rf dist
$ ./pants setup-py src/python/nci_core:nci-core-lib
Writing setup.py chroot for src/python/nci_core:nci-core-lib to dist/nci_core-10.0.1-3-g1f92d3f-dirty

# how about modifying the version function?
$ echo "" >> pants-plugins/src/python/utilities/register.py 
$ ./pants setup-py src/python/nci_core:nci-core-lib
20:18:10 [INFO] initializing pantsd...
20:18:11 [INFO] pantsd initialized.
20:18:11.33 [WARN] subtree path: src/python/nci_core
20:18:11.34 [WARN] subtree latest commit id: fff44f36fa7acd3488c34f013d83654d57f731ca
20:18:11.35 [WARN] subtree latest tag: 10.0.0
20:18:11.37 [WARN] subtree dirty: False
20:18:11.37 [WARN] subtree commit deltas: 4
20:18:11.38 [WARN] subtree sha: fff44f3
20:18:11.38 [WARN] version: 10.0.1-4-gfff44f3
20:18:11.44 [INFO] Completed: Find all code to be published in the distribution
Writing setup.py chroot for src/python/nci_core:nci-core-lib to dist/nci_core-10.0.1-4-gfff44f

# this looks to be working, but why?

# more examples
# modify a file in the subtree
$ echo "" >> src/python/nci_core/common/constants.py

# confirm it is an unstaged file
$ git status
src/python/nci_core/common/constants.py

# what does pants say (version should be 10.0.1-4-gfff44f3-dirty)
$ ./pants setup-py src/python/nci_core:nci-core-lib
Writing setup.py chroot for src/python/nci_core:nci-core-lib to dist/nci_core-10.0.1-4-gfff44f3

# and is not, let's do the update in the versioning function again
$ echo "" >> pants-plugins/src/python/utilities/register.py 

# let's see if pants does it right this time
$ ./pants setup-py src/python/nci_core:nci-core-lib
20:21:57 [INFO] initializing pantsd...
20:21:58 [INFO] pantsd initialized.
20:21:58.71 [WARN] subtree path: src/python/nci_core
20:21:58.73 [WARN] subtree latest commit id: fff44f36fa7acd3488c34f013d83654d57f731ca
20:21:58.74 [WARN] subtree latest tag: 10.0.0
20:21:58.75 [WARN] subtree dirty: True
20:21:58.76 [WARN] subtree commit deltas: 4
20:21:58.76 [WARN] subtree sha: fff44f3
20:21:58.76 [WARN] version: 10.0.1-4-gfff44f3-dirty
20:21:58.82 [INFO] Completed: Find all code to be published in the distribution
Writing setup.py chroot for src/python/nci_core:nci-core-lib to dist/nci_core-10.0.1-4-gfff44f3-dirty

# and it does

Also, is there any way of "deducting" the library path from the BUILD file (something similar to %(buildroot)s used in pants.toml), or from inside the custom scm_setup_py function?
This is how I have it now, but I am looking for a way to get it from Pants.

    provides=scm_setup_py(
        name="nci_core",
        description="NCI Core",
        path="src/python/nci_core"
    )

@Eric-Arellano
Contributor

Eric-Arellano commented Aug 21, 2020

Hey @adabuleanu, thanks for sharing all this code. It's helpful to see in code the logic that you're aiming for.

The caching issue sounds like an instance of #10360. I'll ask another contributor if fixing #10360 would fix this issue. Otherwise, you'd want to create a plugin using the Rules API. https://www.pantsbuild.org/v2.0/docs/rules-api-concepts. Once I hear back from the contributor, I'll be happy to help you port the above logic into a plugin.

is there any way of "deducting" the library path from the BUILD file

There is, with a somewhat clunky API called "Context Aware Object Factory" (CAOF). You'd have something like this:

from pants.base.build_environment import get_buildroot
from pants.backend.python.python_artifact import PythonArtifact
from pants.build_graph.build_file_aliases import BuildFileAliases

class ScmSetupPy:
    def __init__(self, parse_context):
        self._parse_context = parse_context

    def __call__(self, name: str, **kwargs) -> PythonArtifact:
        build_root = get_buildroot()
        # This is relative to the build root, e.g. `src/python/demo`.
        setup_py_rel_path = self._parse_context.rel_path
        version = get_version(build_root, setup_py_rel_path)
        return PythonArtifact(name=name, version=version, **kwargs)


def build_file_aliases():
    return BuildFileAliases(context_aware_object_factories={"scm_setup_py": ScmSetupPy})

Note that this also uses the function get_buildroot() rather than os.getcwd().

@benjyw
Contributor

benjyw commented Aug 24, 2020

@adabuleanu Eric is out for a couple of days, so I will pick this up.

@benjyw
Contributor

benjyw commented Aug 24, 2020

Thanks for the detailed information above, that made it easy to figure out why you're seeing what you're seeing.

To explain the caching problem: Pants only runs your get_version() function when it parses the BUILD file, and if the BUILD file hasn't changed then it won't reparse it. It does reparse when you modify the get_version() function, because the pants daemon (pantsd) restarts when your plugin code changes, and so the BUILD graph (which is cached in memory in pantsd) has to be recreated.
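
The behavior described above can be imitated with a toy cache. This is only a model of the symptom, not how Pants or pantsd is implemented: `BuildFileCache`, `git_state` and this stand-in `get_version` are hypothetical, and keying on a content hash is a simplification.

```python
import hashlib
from typing import Callable, Dict


class BuildFileCache:
    """Toy model: re-evaluate a BUILD file only when its contents change."""

    def __init__(self, evaluate: Callable[[str], str]) -> None:
        self._evaluate = evaluate
        self._results: Dict[str, str] = {}
        self.evaluations = 0

    def parse(self, build_file_contents: str) -> str:
        key = hashlib.sha256(build_file_contents.encode()).hexdigest()
        if key not in self._results:
            self.evaluations += 1
            self._results[key] = self._evaluate(build_file_contents)
        return self._results[key]


# Stand-in for repository state that the BUILD file contents do not capture.
git_state = {"head": "1f92d3f"}


def get_version(_build_file_contents: str) -> str:
    return f"10.0.1-g{git_state['head']}"


cache = BuildFileCache(get_version)
build_file = "python_library(name='nci-core-lib')"

first = cache.parse(build_file)
git_state["head"] = "fff44f3"           # New commit, BUILD file untouched.
second = cache.parse(build_file)        # Cache hit: the stale version comes back.
third = cache.parse(build_file + "\n")  # BUILD file edited: get_version runs again.

print(first, second, third, cache.evaluations)
# 10.0.1-g1f92d3f 10.0.1-g1f92d3f 10.0.1-gfff44f3 2
```

The second call returns the old version because nothing the cache is keyed on has changed, which matches the stale output in the setup-py runs earlier in the thread.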

So what we need is to figure out a way to invalidate correctly here. Will think about it for a bit and reply here.

@benjyw
Contributor

benjyw commented Aug 26, 2020

OK, I think I have an idea here. @Eric-Arellano let's discuss Friday?

@benjyw
Contributor

benjyw commented Sep 1, 2020

Update: we've sketched out a plan to make it easy to use custom logic to generate a version (or any other setup() kwarg). Will post here when it's ready for use, within the next couple of days.

@Eric-Arellano Eric-Arellano self-assigned this Sep 1, 2020
Eric-Arellano added a commit that referenced this issue Sep 1, 2020
We often request a target in tests for some address. This is common enough to be worth factoring up.

We also port `run_setup_py_test.py` to use Pytest style, and to use this new util. In addition, we make some small improvements to `run_setup_py.py`. This is prework for #10633.
@benjyw
Contributor

benjyw commented Sep 17, 2020

Hey @adabuleanu , @Eric-Arellano, just wondering where this is at right now?

@Eric-Arellano
Contributor

Closing thanks to the plugin hook at https://www.pantsbuild.org/docs/plugins-setup-py. A couple users have used this, and it seems there are far too many different versioning schemes for Pants to ship with one canonical approach.

Thanks for prompting us to add this hook, Adrian!
