Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chores(analyzer): Print warning if duplicate packages #9260

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

bennati
Copy link
Contributor

@bennati bennati commented Oct 9, 2024

When duplicate packages or projects are found in the dependency tree, print a warning instead of throwing an exception and inteerrupting the scan. Duplicate packages may arise when the same package is imported twice, in these cases the dependency tree will be completed as the package is imported at least once.

Please ensure that your pull request adheres to our contribution guidelines. Thank you!

@bennati bennati requested a review from a team as a code owner October 9, 2024 07:50
@@ -17,6 +17,8 @@
* License-Filename: LICENSE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message: interrupting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reworded commit message

"Unable to create the AnalyzerResult as it contains packages and projects with the same ids: " +
duplicates.values
if (duplicates.isNotEmpty()) {
logger.warn {"Unable to create the AnalyzerResult as it contains packages and projects with the same ids: ${duplicates.values}"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of just logging, I believe an OrtIssue should be created - because the problem can potentially be severe, so there should be no risk of going unnoticed.

However, with the current logic, it may be difficult to implement that, because the problem is detected way after the duplicate IDs got inserted, so undoing the insertion is probably no more possible.

"Unable to create the AnalyzerResult as it contains packages and projects with the same ids: " +
duplicates.values
if (duplicates.isNotEmpty()) {
logger.warn {"Unable to create the AnalyzerResult as it contains packages and projects with the same ids: ${duplicates.values}"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The log message now does not reflect what you're doing: You say an analyzer result cannot be created, but still create one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed log message

@sschuberth
Copy link
Member

Besides the current build errors, I'm actually not sure if we want the change. I currently don't oversee all side effects of allowing duplicate identifiers.

@bennati
Copy link
Contributor Author

bennati commented Oct 10, 2024

I appreciate the need of understanding in detail all side effects of allowing duplicate identifiers.
What this proposal is meant for, is making the user aware of the possibility of these side effects, which can be manually corrected, while allowing the scan to complete in case side effects are unlikely.

In my opinion, the scan should crash only if there is a non-recoverable error. Having ORT produce a notice file with potential errors is still safer than having no notice file.

@fviernau
Copy link
Member

Besides the current build errors, I'm actually not sure if we want the change. I currently don't oversee all side effects of allowing duplicate identifiers.

I agree, that side-effect are hard to oversee and probably huge. Also I am strongly for keeping identifiers unique.
What in my view would be the only option is not adding duplicate IDs to the result in the first place, and instead add an OrtIssue. This way it is indicated that something is missing while ids remain unique.

@bennati
Copy link
Contributor Author

bennati commented Oct 10, 2024

Is this what you had in mind?

Comment on lines +47 to +49
logger.warn {
"AnalyzerResult contains packages and projects with the same ids: ${duplicates.values}"
}

Check warning

Code scanning / detekt

Reports code blocks that are not followed by an empty line Warning

Missing empty line after block.
Comment on lines +51 to +59
for ((key, items) in duplicates) {
val issue = createAndLogIssue(
source = "analyzer",
message = "AnalyzerResult contains packages and projects with the same ids: ${items}"
)

val existingIssues = issues.getOrDefault(key, emptyList())
issues[key] = existingIssues + issue
}

Check warning

Code scanning / detekt

Reports code blocks that are not followed by an empty line Warning

Missing empty line after block.
for ((key, items) in duplicates) {
val issue = createAndLogIssue(
source = "analyzer",
message = "AnalyzerResult contains packages and projects with the same ids: ${items}"

Check warning

Code scanning / detekt

Detects simplifications in template strings Warning

Redundant curly braces
for ((key, items) in duplicates) {
val issue = createAndLogIssue(
source = "analyzer",
message = "AnalyzerResult contains packages and projects with the same ids: ${items}"

Check notice

Code scanning / QDJVMC

Redundant curly braces in string template Note

Redundant curly braces in string template
Copy link

codecov bot commented Oct 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 67.67%. Comparing base (546550e) to head (344d6cb).
Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #9260   +/-   ##
=========================================
  Coverage     67.67%   67.67%           
  Complexity     1187     1187           
=========================================
  Files           239      239           
  Lines          7796     7796           
  Branches        900      900           
=========================================
  Hits           5276     5276           
  Misses         2153     2153           
  Partials        367      367           
Flag Coverage Δ
funTest-non-docker 34.71% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@fviernau
Copy link
Member

fviernau commented Oct 11, 2024

Is this what you had in mind?

No it isn't. It disregards that:

  • Duplicates can be across projects / packages.
  • Removing afterwards probably has a different effect, than skipping over it when attempting to add it in the first place.
  • Also the scopes are already constructed, which contain reference to projects or packages and which correspond to a
    project each.

...but, I also do not see a do-able way how to implement what I had in mind.

@bennati
Copy link
Contributor Author

bennati commented Oct 11, 2024

Ok, let me know if any better solution than this comes to mind which is also feasible and I can try to implement it.
Otherwise let's please merge this one so I can finally generate some notice file for my projects.

@fviernau
Copy link
Member

fviernau commented Oct 11, 2024

Ok, let me know if any better solution than this comes to mind which is also feasible and I can try to implement it.

I won't have the time to look into this anytime soon.

Otherwise let's please merge this one so I can finally generate some notice file for my projects.

IMO it should not be merged, due to not understood side effects and possible broken referential integrity.

Would you be willing to invest the time / effort to find another solution @bennati ?

@bennati
Copy link
Contributor Author

bennati commented Oct 11, 2024

I can develop a different solution, but first I need someone to explain how the solution should work and the ORT maintainers to commit to merge it once developed.

@bennati
Copy link
Contributor Author

bennati commented Nov 1, 2024

Any update on this?

@sschuberth
Copy link
Member

Any update on this?

I believe this PR needs to make more clear which of the three different root causes for duplicate identifiers it targets to address.

In any case, I believe it's wrong to unconditionally remove all duplicates from packages as that, depending on the root cause, could result in a loss of information.

Instead, to get started with addressing the general topic, I plan to revive #6533 which targets the case where duplicates occur between projects and packages.

@dimitris-iliou
Copy link

We have identified another instance related to the RapidJSON library that may need to be addressed. Specifically, the OSS Review Toolkit (ORT) reported a failure because it detected a duplicate dependency across two distinct packages:

[[Package(id=Identifier(type=SpdxDocumentFile, namespace=, name=rapidjson, version=v1.1.0), 
purl=pkg:generic/rapidjson@v1.1.0?download_url=https://github.com/Tencent/rapidjson/archive/refs/tags/v1.1.0.tar.gz,
 cpe=null, authors=[], declaredLicenses=[MIT], 
declaredLicensesProcessed=ProcessedDeclaredLicense(spdxExpression=MIT, mapped={}, unmapped=[]), 
concludedLicense=MIT, description=A fast JSON parser/generator for C++ with both SAX/DOM style API, 
homepageUrl=https://rapidjson.org/, binaryArtifact=RemoteArtifact(url=, hash=Hash(value=, algorithm=)), 
sourceArtifact=RemoteArtifact(url=https://github.com/Tencent/rapidjson/archive/refs/tags/v1.1.0.tar.gz, hash=Hash(value=, 
algorithm=)), vcs=VcsInfo(type=Git, url=ssh://git@main.gitlab.in.here.com:3389/professional-services/hdo/hdo-client.git, 
revision=2e21d71c1840c8d795ef73247b73b052acd73b8e, path=external/rapidjson), vcsProcessed=VcsInfo(type=Git, 
url=ssh://git@main.gitlab.in.here.com:3389/professional-services/hdo/hdo-client.git, 
revision=2e21d71c1840c8d795ef73247b73b052acd73b8e, path=external/rapidjson), isMetadataOnly=false, 
isModified=false, sourceCodeOrigins=null), Package(id=Identifier(type=SpdxDocumentFile, namespace=, 
name=rapidjson, version=v1.1.0), purl=pkg:generic/rapidjson@v1.1.0?
download_url=https://github.com/Tencent/rapidjson/archive/refs/tags/v1.1.0.tar.gz, cpe=null, authors=[], 
declaredLicenses=[MIT], declaredLicensesProcessed=ProcessedDeclaredLicense(spdxExpression=MIT, mapped={}, 
unmapped=[]), concludedLicense=MIT, description=A fast JSON parser/generator for C++ with both SAX/DOM style API, 
homepageUrl=https://rapidjson.org/, binaryArtifact=RemoteArtifact(url=, hash=Hash(value=, algorithm=)), 
sourceArtifact=RemoteArtifact(url=https://github.com/Tencent/rapidjson/archive/refs/tags/v1.1.0.tar.gz, hash=Hash(value=,
 algorithm=)), vcs=VcsInfo(type=Git, url=ssh://git@main.gitlab.in.here.com:3389/professional-
services/dra/clients/vehicle.git, revision=0b876fe71ff6056818fe4eebe0db31e21751679c, path=external/rapidjson), 
vcsProcessed=VcsInfo(type=Git, url=ssh://git@main.gitlab.in.here.com:3389/professional-services/dra/clients/vehicle.git,
 revision=0b876fe71ff6056818fe4eebe0db31e21751679c, path=external/rapidjson), isMetadataOnly=false, 
isModified=false, sourceCodeOrigins=null)]]

When duplicate packages or projects are found in the dependency tree, print a
warning instead of throwing an exception and inteerrupting the scan.
Duplicate packages may arise when the same package is imported twice, in these
cases the dependency tree will be completed as the package is imported at
least once.

Signed-off-by: Stefano Bennati <stefano.bennati@here.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants