Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clientless pom #8

Closed
wants to merge 6 commits into from
Closed

Conversation

wfscheper
Copy link
Contributor

@pombredanne

Look over the API changes and let me know if you think this works for your use case. I'm still debating whether I really need to require a coordinate string, so I'd like to hear from you on that.

wfscheper added 3 commits June 7, 2017 16:00
Remove conary-isms and use pbr.
Use six for compatibility between python2 and python3 and tox for
running tests. Biggest source of actual code change is around supporting
rich comparision methods in python3.
Adds the classmethods Pom.fromstring and Pom.parse. These are modeled on
the lxml.etree functions of the same name. If a MavenClient is passed
in, then the resulting Pom object will attempt to dynamically load
data from external POM files available via the client.

Sem-Ver: feature
Fixes: sassoftware#4
@pombredanne
Copy link
Contributor

@wfscheper Let me review this in details tomorrow.

@@ -75,7 +75,7 @@ def __init__(self, cacheDir=None):
if cacheDir is None:
cacheDir = "/tmp/%s-maven-cache" % getpass.getuser()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is POSIX specific. I think you could use tempfile.gettempdir() for portability

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Will fix this.

setup.cfg Outdated

[bdist_wheel]
universal = 1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your would need to add license_file = LICENSE to ensure that the built wheel contains a proper license.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think pbr handles this for me, but I'll double check that the LICENSE file is installed and fix it if its not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be a pleasant surprise if pbr does this... Note also that AFAIK pbr is mostly obsolete now that the latest setuptools support extensive, declarative metadata in setup.cfg

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct, and I added the license_file entry to my setup.cfg.

pbr's main role is to generate the AUTHORS file and let us manage versioning through commit history.

I'm less enthused by its use of metadata in setup.cfg given that its basically a one-to-one rewrite of a setup.py. You're not really gaining much, though it is nice to have one place to put a lot of configuration. pbr also generates a changelog for you, but it's just a dump of your commit history which isn't terribly useful as a changelog.

@pombredanne
Copy link
Contributor

pombredanne commented Jun 8, 2017

@wfscheper you wrote:

I'm still debating whether I really need to require a coordinate string, so I'd like to hear from you on that.

So there are eventually two use cases:

  1. I provide coordinates and some mechanism to fetch and resolve data (e..g. a client)
  2. I provide a POM file

For case 2, there is no such thing as coordinates known ahead of time: these are in the POM and unknown until after parsing

So what could possibly work is this:
Update the constructor in https://github.com/wfscheper/pymaven/blob/01d41e6011e3f5281380230cb7bdbe17bab2c88e/pymaven/pom.py#L62 to:
def __init__(self, coordinate=None, client=None, pom_data=None):

Then something more or less like this:

  • parse pom_data if provided. Also extract the group_id, artifact_id and version from it.
  • if coordinates are provided... then check these are coherent with the parse (eventually dealing with properties? )

In all cases I never know coordinates ahead of time so in any subclass I would need to replicate most of the __init__ method.

return self._pom_factory(group, artifact, version)

@property
@memoize("_pom_data")
def pom_data(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I get why you are trying to recreate some XML here... why? If I provided an actual string of XML as an input why would you recreate this to then reparse? (and as a side note this also implies that coordinates were provided)

Copy link
Contributor Author

@wfscheper wfscheper Jun 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.pom_data is a property (a method that is used as if it were an attribute). The memoize decorator ensures that we only call self.pom_data() once, and store the result in self._pom_data.

If you used Pom.parse or Pom.fromstring to create the Pom object, then self.pom_data() is never actually called, the memoize decorator just returns the value of self._pom_data.

If you used Pom() and didn't pass a client, then I use the EMPTY_POM template and fill it out with the values from the coordinate you passed in. If you did pass in a client, then I use the client to get the POM data from the repository. Make sense?

Copy link
Contributor

@pombredanne pombredanne Jun 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wfscheper It does make sense, but I cannot see any use case where you have as an input coordinates but no pom file and no client. The library is useless in this case so this should not even be allowed: you cannot do anything of value.

from .versioning import VersionRange


EMPTY_POM = """\
Copy link
Contributor

@pombredanne pombredanne Jun 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebuilding a POM as XML is likely to be rather problematic: I may have provided deps and this would ignore them. And if I want to extend this with extra attributes for #3 my hunch that this will complicate things rather badly... Why would you need this?

Copy link
Contributor Author

@wfscheper wfscheper Jun 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention is that I would use this when I'm provided with coordinates but no client or xml. Would it be clearer if I called it MINIMAL_POM?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The combo of coordinates with no client and no xml should be an error IMHO.

wfscheper and others added 3 commits June 8, 2017 10:16
Ensure that our LICENSE file is correctly packaged and installed.
Use tempfile.mkdtemp to create the cacheDir rather than assuming we can
use /tmp.
Prefer a handwritten changelog to pbr's dump of the git history.
@pombredanne
Copy link
Contributor

I have been thinking of another possible approach....
So what about this?

  1. Pom is created with `init(self, coordinates, client=None)
  2. coordinates is string that is either a POM coordinates or a path to a local file. eg. something liek coordinates_or_path
    2.1 if coordinates and not client, and we have POM coordinates, then init the class with just the data from the coordinates.
    2.2 if coordinates and not client, and we have a path in coordinates, then init the class with loading and parsing the file at the path stored in coordinates
    2.3 if coordinates and client, and we have coordinates in coordinates, then init the class with letting the client fetch the POM file for coordinates
    2.4 if coordinates and client, and we have a path in coordinates, then init the class with loading and parsing the file at the path stored in coordinates

Now, I am less clear about the case of having both a path and a client....

Also, after all a client is something that can retrieve Artifacts: there could be a dumb client that knows nothing about repositories and deals only with a path as an input and can only fetch that one path (and if provided with coordinates instead, it would just initialize a Pom/Artifact with the coordinates and fetch nothing) ... This could be another way.
All food for thoughts

@wfscheper
Copy link
Contributor Author

wfscheper commented Jun 9, 2017

The more we discuss the Pom.__init__ signature, the more I think that the current implementation is an artifact of my original use case (which was bootstrapping the full dependency tree from just a coordinate string). That's no longer the primary use case.

I think Pom.__init__(pom_data, client=None) makes more sense for a general purpose Pom object, and we provide three classmethods that are Pom factories: Pom.parse(source, client=None), Pom.fromstring(text, client=None), and Pom.fromcoordinate(coordinate, client).
pom_data should be an xml string, and the Pom.__init__ method will handle parsing it into an ElementTree.

Sound reasonable to you?

@pombredanne
Copy link
Contributor

Sound reasonable to you?
@wfscheper It sure does, but I am still not sure if makes sense to recreate any XML string though. It feelds weird when the input is coordinates and would be imposed if the constructor of Pom is Pom.__init__(pom_data, client=None)

@wfscheper
Copy link
Contributor Author

wfscheper commented Jun 11, 2017 via email

@pombredanne
Copy link
Contributor

Are you referring to where the PR was using the EMPTY_POM template? That would go away with the new signature.

perfect!

@pombredanne
Copy link
Contributor

@wfscheper to give you a better feel of where I want to go (and what I would like to contribute back eventually for #3 #9 #10 #11 and #12 ) ... See these:

Note: this is based of master and not this branch, but merging will not be of issue and I will rebase as needed when times comes.

Note that I have over 500+ tests with several POMs with oddities that were collected mostly carefully.

@pombredanne
Copy link
Contributor

@wfscheper any feedback on the subclass I mentioned above? I can start creating a PR, but I would rather base this on top of your refactoring than master I guess.

@wfscheper
Copy link
Contributor Author

wfscheper commented Jun 21, 2017 via email

@pombredanne
Copy link
Contributor

@wfscheper this sounds like a plan! 👍

@pombredanne
Copy link
Contributor

@wfscheper FWIW my patches have been baked in the latest release of scancode https://github.com/nexB/scancode-toolkit/releases/ .... and I would like to know if this is OK to have your name in the credits :) as this is a major improvement on our side for the Maven thingies

@pombredanne
Copy link
Contributor

and FYI to give some context of what we are up to:

  1. here we are collecting package info: https://github.com/nexB/scancode-toolkit/tree/develop/src/packagedcode for Maven and many (eventually) most package format. This includes direct deps and any other deeper or pinned dep (when available in some lockfile of sorts)
  2. https://github.com/nexB/dependentcode/ will eventually use this data as an input to perform a full dep resolution (using a sat solver)

So the goals are 1. detect all the packages and all deps in a codebase. 2. report all the licensing and origin info. 3. report all known vulnerabilities for all detected or resolved packages (as part of https://github.com/nexB/vulnerablecode which is built by @kartiksibal as part of the GSoC https://summerofcode.withgoogle.com/organizations/4800434830049280/#4673431489478656 ) 4. report other useful info on this

@wfscheper
Copy link
Contributor Author

Closing. We will continue work on this in the clientless-pom branch of the main repo, and merge that in once its fully baked.

@wfscheper wfscheper closed this Jun 26, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants