Capture and report package dependencies separate from actual packages #1237
Comments
Obviously, I'd be curious whether we could leverage ORT's Analyzer component for this in some way instead of re-implementing much of the same logic in Python within ScanCode... we could modify the Analyzer to report back dependencies in whatever format ScanCode requires. Currently, we write out dependency information in YAML (or JSON) that looks like https://github.com/heremaps/oss-review-toolkit/blob/master/scanner/src/funTest/assets/analyzer-result.yml.
That would be awesome indeed! Especially since the approaches complement each other nicely: ORT dynamically collects deps by running the package manager's own commands, whereas ScanCode only does a static analysis of the manifests and runs nothing. So the two combined would cover all the use cases I can think of! Just to be sure I get this right, the dependency section is this part here: https://github.com/heremaps/oss-review-toolkit/blob/0c42b9351edfcbdc699287f24c25f36f728e19ec/scanner/src/funTest/assets/analyzer-result.yml#L37 correct? And it is fed by each analyzer, such as here: https://github.com/heremaps/oss-review-toolkit/blob/915cfb931297dfea1128c7f457de81dc92b2ae51/analyzer/src/main/kotlin/managers/Bower.kt#L63 And the main code is there: https://github.com/heremaps/oss-review-toolkit/blob/a297595fce3763b0e30eec0ebbbe9abb420905d9/model/src/main/kotlin/Scope.kt#L28 And your PackageId is close enough to a package URL that the conversion will be 100% easy.
Correct. We group dependencies by scope.
Right, except that line 63 you're quoting does not really contain any "feeding" code. Sticking to the example of Bower, probably a better line to quote is https://github.com/heremaps/oss-review-toolkit/blob/915cfb931297dfea1128c7f457de81dc92b2ae51/analyzer/src/main/kotlin/managers/Bower.kt#L105, which creates the actual Package entry, i.e. an entry such as starting at https://github.com/heremaps/oss-review-toolkit/blob/0c42b9351edfcbdc699287f24c25f36f728e19ec/scanner/src/funTest/assets/analyzer-result.yml#L78. The dependencies in the tree structure above are references to these packages.
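To make the structure described above concrete, here is a minimal Python sketch of how scope entries that only hold package identifiers could be resolved against a flat list of package entries. The dictionary layout, the key names (scopes, dependencies, packages, id) and the sample identifiers are assumptions made for illustration; they are not copied from the actual analyzer-result.yml.

```python
# Hypothetical, simplified analyzer result: scopes hold a tree of
# package identifiers, and the full package entries live in a flat list.
analyzer_result = {
    "scopes": [
        {
            "name": "dependencies",
            "dependencies": [
                {"id": "Bower::jquery:3.2.1", "dependencies": []},
                {
                    "id": "Bower::bootstrap:3.3.7",
                    "dependencies": [
                        {"id": "Bower::jquery:3.2.1", "dependencies": []},
                    ],
                },
            ],
        },
    ],
    "packages": [
        {"id": "Bower::jquery:3.2.1", "declared_licenses": ["MIT"]},
        {"id": "Bower::bootstrap:3.3.7", "declared_licenses": ["MIT"]},
    ],
}


def index_packages(result):
    """Map each package identifier to its full package entry."""
    return {package["id"]: package for package in result["packages"]}


def iter_dependency_ids(nodes):
    """Yield every identifier in a dependency tree, depth first."""
    for node in nodes:
        yield node["id"]
        yield from iter_dependency_ids(node["dependencies"])


def packages_by_scope(result):
    """Return {scope name: [package entries]} with duplicates removed."""
    packages = index_packages(result)
    resolved = {}
    for scope in result["scopes"]:
        seen = []
        for dep_id in iter_dependency_ids(scope["dependencies"]):
            if dep_id not in seen:
                seen.append(dep_id)
        resolved[scope["name"]] = [packages[dep_id] for dep_id in seen]
    return resolved
```

Keeping the tree as references means a package that appears several times in the tree is described only once in the flat list.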
Yes. Also see oss-review-toolkit/ort#20.
@pombredanne I am in favour of this new scanner for certain package types (python especially).
+1
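As a rough illustration of the identifier-to-package-URL conversion mentioned above, the following Python sketch assumes an identifier serialized as four colon-separated fields (type, namespace, name, version). That input format, the function name identifier_to_purl and the sample values are assumptions for this sketch, not ORT's or ScanCode's actual API, and the type-specific normalization rules of the package URL specification are ignored.

```python
from urllib.parse import quote


def identifier_to_purl(identifier):
    """Convert 'type:namespace:name:version' to a package URL string.

    The colon-separated input format is an assumption for this sketch.
    """
    parts = identifier.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected 4 colon-separated fields: {identifier!r}")
    pkg_type, namespace, name, version = parts
    if not pkg_type or not name:
        raise ValueError(f"type and name are required: {identifier!r}")

    purl = "pkg:" + pkg_type.lower() + "/"
    if namespace:
        # Namespace segments are separated by "/" and percent-encoded.
        purl += "/".join(quote(seg, safe="") for seg in namespace.split("/")) + "/"
    purl += quote(name, safe="")
    if version:
        purl += "@" + quote(version, safe="")
    return purl
```

For example, identifier_to_purl("Bower::jquery:3.2.1") gives "pkg:bower/jquery@3.2.1". A name or version that itself contains a colon would need a different serialization and is rejected here.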
I think that, rather than just listing bare dependencies, there is something larger and more generic, which is the notion of a From a higher-level point of view, there are about four sets of data we can collect on a project or package:
Each of these four data pieces may or may not exist. Their presence should dictate how we organize the normalized data returned from a scan. Metadata is essential to the definition of what we call a package. So IMHO when we have metadata and we can determine a Yet if there is no name (say a nameless private Composer package), this would no longer be a package (it cannot be published or consumed as such anywhere); it is only project/application-like. Dependencies are either for a package or a project. Their presence alone (without metadata) is the mark of a project. For instance we can infer from the presence of a Like deps, build instructions alone are the mark of a project. Version control information is either for a package or a project. Therefore, I want to add a new data structure and scanner called either
project and development_environment may not be equivalent in the case of an upstream project and a different downstream development_environment which can look quite different.
We think the same, which is why we have separate Project and Package classes in ORT, and they don't even inherit from each other although their properties are similar in large parts.
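The package-versus-project rule of thumb discussed earlier in the thread can be sketched in a few lines of Python. The ScannedData class, its field names and the classify function are hypothetical names invented for this illustration; they are not an existing ScanCode or ORT data structure, and the rule is a simplification of the discussion (a name makes a package; dependencies or build instructions without a name indicate a project; version control information alone decides nothing).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ScannedData:
    """Hypothetical container for the four kinds of collected data."""
    name: Optional[str] = None            # from metadata
    version: Optional[str] = None         # from metadata
    dependencies: list = field(default_factory=list)
    build_instructions: list = field(default_factory=list)
    vcs_url: Optional[str] = None         # version control information


def classify(data):
    """Return 'package', 'project' or None for undetermined data."""
    if data.name:
        # Named metadata is what allows publishing and consuming a package.
        return "package"
    if data.dependencies or data.build_instructions:
        # Dependencies or build instructions alone are the mark of a project.
        return "project"
    # Version control information alone fits either case.
    return None
```

With this sketch, a nameless manifest that only declares dependencies is classified as a project, while the same manifest with a name is classified as a package.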
We have all of that except the build instructions in ORT.
I'd prefer "project" for similarity to ORT 😉
@sschuberth thanks! we think along... in hindsight I wonder if
From a user / developer perspective, I believe that's simply what it is: a project. And we also simply couldn't think of a less overloaded but equally fitting term 😉
As much as it would be good to align with ORT terminology wherever possible, I think that project is not a good term in this context because what we are trying to name is typically a subset of a project. The most common uses of the term "project" in our domain seem to be:
What we are trying to name is a (sub)set of files that are logically related by origin, license and function within a project (as defined above). A package represents the case where this set of files is grouped together by the original project - whether in a package created by a package manager or something as simple as an archive. These points are separate from how we might add the definition of a Development Environment which is a much broader idea that would typically cover many Development projects.
That was not my understanding from reading the above text. What you describe here sounds a bit like what e.g. Gradle would call a "source set". But @pombredanne mentioned an additional attribute to capture: dependencies. As soon as dependencies come into the picture I find it less fitting to talk about sources / files, as usually dependencies are not managed / declared by the sources / files themselves, but by a higher-level concept / wrapper, i.e. the build system / package manager.
I wouldn't object to "component", simply because, like you said, it pretty much fits all and anything 😉
So here is where I will be going for now following the details of this conversation and #1237 (comment) :
The topic of
So here is the latest on this. I am pushing this for comment in a branch:
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
And here is yet another updated proposal:
ok one last round... back toward keeping things simple enough and making fewer changes:
These are for later:
@pombredanne I think in the context of consolidation and other summation techniques this makes a lot of sense.
This has been merged. Closing now!
Today, we collect dependencies from the package manifests, but these are mostly potential, first-level dependencies. In contrast we have several cases such as lockfiles where we have concrete dependencies but we do not have much in terms of package metadata.
So in order to support Godep, Gemfile lock, pip requirements.txt, etc. (and we do have parsers for several of these) we should have a new file-level attribute (and scanner) that deals exclusively with dependencies and nothing else. We could even go as far as decoupling this from the base --package scan and return dependencies only when requested. They could still be returned as package.dependencies as they are today when found in a package manifest or when they can be related clearly to a manifest... or just reported under dependencies when they come from some lockfile, or always. This needs some design and thinking of course.
Some of the dependency formats we miss or track:
@KinXer your input would be welcome since you reported #631
@DennisClark @tdruez @JonoYang @MaJuRG @sschuberth feedback welcome too.
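To illustrate the idea of a file-level attribute that carries only dependencies, here is a minimal Python sketch that reads pip requirements.txt content and reports each entry under a dependencies key, with no package metadata. The function names, the record keys (name, version, is_resolved) and the output layout are assumptions made for this illustration and not ScanCode's actual output format; extras, environment markers and URL requirements are deliberately not handled.

```python
import re

# Matches a pinned requirement such as "requests==2.19.1".
PINNED = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")
# Matches the name of a bare or ranged requirement such as "six>=1.10".
NAMED = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def parse_requirements(text):
    """Return a list of dependency records from requirements.txt content."""
    dependencies = []
    for raw_line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line or line.startswith("-") or "://" in line:
            # Skip blanks, pip options such as "-r other.txt", and URLs.
            continue
        pinned = PINNED.match(line)
        if pinned:
            name, version = pinned.groups()
            dependencies.append(
                {"name": name, "version": version, "is_resolved": True}
            )
            continue
        named = NAMED.match(line)
        if named:
            dependencies.append(
                {"name": named.group(1), "version": None, "is_resolved": False}
            )
    return dependencies


def scan_file(path, text):
    """Build a file-level result that carries only dependencies."""
    return {"path": path, "dependencies": parse_requirements(text)}
```

The is_resolved flag is one possible way to separate concrete, pinned dependencies from the potential ones declared as ranges in a manifest.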