Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Vulnerabilites]Add IVulnerabilityWriter #9712

Merged
merged 6 commits into from
Oct 31, 2023
Merged

[Vulnerabilites]Add IVulnerabilityWriter #9712

merged 6 commits into from
Oct 31, 2023

Conversation

ryuyu
Copy link
Contributor

@ryuyu ryuyu commented Oct 17, 2023

Update the Github Vulnerability interaction library to support IVulnerabilityWriters for writing vulnerabilities.
This is in preparation for use by the new v3 vulnerability resource updater.

Spec for Vulnerabilities in V3: https://github.com/NuGet/Engineering/pull/4940

@ryuyu ryuyu self-assigned this Oct 17, 2023
@ryuyu ryuyu requested a review from a team as a code owner October 17, 2023 23:56
// Licensed under the Apache License, Version 2.0. See License.txt in the project root for license information.

using Microsoft.Extensions.Logging;
using NuGet.Services.Entities;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: order of dependencies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved System above to be consistent with other files.

{
public interface IVulnerabilityWriter
{
Task<bool> WriteVulnerabilityAsync(PackageVulnerability vulnerability, bool wasWithdrawn = false);
Copy link
Contributor

@zhhyu zhhyu Oct 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add the description of an interface.

public class Advisory
{

[JsonProperty(PropertyName = "url")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: empty line above.

public async void FlushAsync(string outputFileName = null)
{
// This method is unused for this implementation.
throw new NotImplementedException();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will this method be used for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method will be for when we aren't directly writing to the store location, which isn't the case for DB (DB implementation writes to DB on every call of WriteVulnerability).
The implementation planned for the v3 resource will internerally track all added vulnerabilities, then call Flush to write it out the final location (blob storage).
In short, IVulnerabilityWriter is treated sort of as a local cache of expected changes, and we call Flush to commit them. I didn't want to change DB writer behaviour, so this paradigm is ignored for DB implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: As mentioned in #9712 (comment) I have updated this to no-op instead.

@@ -66,6 +66,10 @@ protected void ConfigureIngestionServices(ContainerBuilder containerBuilder)
.RegisterType<PackageVulnerabilitiesManagementService>()
.As<IPackageVulnerabilitiesManagementService>();

containerBuilder.RegisterType<GalleryDbVulnerabilityWriter>()
.As<IVulnerabilityWriter>()
.SingleInstance();
Copy link
Contributor

@zhhyu zhhyu Oct 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little concerned about whether it is safe to use singleton here or whether there is a need to do so. Is there any consideration to use singleton?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GalleryDbVulnerabilityWriter is a very simple passthrough implementation.
Since the PacakgeVulnerabilitiesManagementService above it that it passes through to is not single instance, we can probably drop it here. Good catch.

/// Commit changes to source
/// </summary>
/// <param name="outputFileName">Optional parameter to output to disk.</param>
void FlushAsync(string outputFileName = null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

void

void -> Task. It is better to avoid using async-void.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

try
{
await _packageVulnerabilityService.UpdateVulnerabilityAsync(vulnerability, wasWithdrawn);
return true;
Copy link
Contributor

@zhhyu zhhyu Oct 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return true;

Do we need to return a bool value? If exceptions are thrown, it will always terminate the job because of the rethrow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question. I'll look in to this to find out if there was a reason I made it return a bool in the first place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't seem like I can think of a good reason that it returned bool here as it is unused. I'll wait on making changes for now to see if anybody else has any opinions on this, and then I'll change it.

}

public async Task<int> WriteVulnerabilitiesAsync(IEnumerable<PackageVulnerability> vulnerabilities)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that this method is not called by 2Db job. Do we need the implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great point.
It is a trivial implementation, so we can probably drop it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking about it a bit, I would like to leave this here in case the library changes more to call these methods on this interface. To this point, I have also updated the throwing implementation above to trivially do nothing and return.


namespace GitHubVulnerabilities2Db.Gallery
{
public class GalleryDbVulnerabilityWriter : IVulnerabilityWriter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding test coverage for this class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test coverage was effectively already added. The existing tests call through this class now.

}
catch (Exception ex)
{
_logger.LogError("Failed to write vulnerability: {ErrorMessage}", ex.Message);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is better to enrich this exception message, with the identity of a vulnerability like "GitHubDatabaseKey".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a new error logging line here that logs the github Advisory URL that failed.


namespace NuGet.Services.GitHub.Entities
{
public class IndexEntry
Copy link
Contributor

@zhhyu zhhyu Oct 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one and "Advisory" are not used in this PR. This "IndexEntry" seems matching the data schema in vulnerability documents. Will it be better that we place these data entities next to the new 2Blob job, rather than put it in the shared layer that is also called by 2Db job?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. That's a good point. I'll pull them out of this PR. Might be a leftover from when Writer implementations were inside this library instead of the responsibility of the caller.


foreach(var vulnerability in vulnerabilities)
{
if (await WriteVulnerabilityAsync(vulnerability))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WriteVulnerabilityAsync(vulnerability)

Any vulnerability in the list can be in the state of "Withdrawn", am I right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the PackageVulnerability Entity that is used to pass around here doesn't contain that information. Let me see how feasible it is to create a wrapper class that splits into the NuGet.Entity class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the API to take a list of Tuples instead of the single PacakgeVulnerability IEnumerable.
This should enable us to pass through the wasWithdrawn.

@@ -4,6 +4,7 @@
using System;
using System.Linq;
using System.Threading.Tasks;
using GitHubVulnerabilities2Db.Gallery;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: order of dependencies.

catch (Exception ex)
{
_logger.LogError("Failed to write vulnerability {GitHubAdvisoryUrl}", vulnerability.AdvisoryUrl);
_logger.LogError("Error Message: {ExceptionMessage}", ex.Message);
Copy link
Contributor

@erdembayar erdembayar Oct 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we making 2 separate error logs for same exception?
Usually we just log once per exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging 2 lines splits the lines in AI. I figured that having both information was useful, but since we are rethrowing the exception below, we can probably skip the log of the Exception message as it should show up in logs as exception. I'll take it out.

Copy link
Contributor

@erdembayar erdembayar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add link to tracking issue and spec in description?
And give little more description of exactly what part of spec being implemented here.

@ryuyu
Copy link
Contributor Author

ryuyu commented Oct 23, 2023

Could you please add link to tracking issue and spec in description? And give little more description of exactly what part of spec being implemented here.

I have included a link the Spec for Vulnerabilities in V3 to the description.
This is a prerequisite change to the existing code to enable a different write target for the vulnerabilities (this library was previously part of another job that was extracted in #9657

{
var dbVulnerability = vulnerability.Item1;
var wasWithdrawn = vulnerability.Item2;
await WriteVulnerabilityAsync(dbVulnerability, wasWithdrawn);
Copy link
Contributor

@agr agr Oct 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation seems to be inserting vulnerabilities one by one. I have two concerns about that: if we are going to be inserting rows in SQL DB, inserting single row at a time is considerably slower than inserting a batch of rows in a single statement/transaction (it might not matter though if the number of rows inserted is low). Second is about failure recovery: what happens if it starts failing after writing some vulnerabilities?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You concern is totally valid, but this is how the current implementation works.
I think if it fails after writing some, it should be ok as worst case, it should just rewrite/modify the existing vulnerabilities.
But I think fixing this at this time is beyond the scope of this PR.
I am again making minimal changes to implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is actually not used anywhere, is it? If so, maybe just throw NotImplementedException()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the very least, please file a work item to update IPackageVulnerabilitiesManagementService to allow batch updates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryuyu
Have you filed work item for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the delay on this folks. I have filed #9728 tracking the work @agr mentioned above.

public async Task FlushAsync(string outputFileName = null)
{
// This method is unused for this implementation.
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe throwing NotImplementedException() is better?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I debated this a lot internally, but with @agr 's suggestion at #9712 (comment), I think we will need a non throwing implementation.

@zhhyu
Copy link
Contributor

zhhyu commented Oct 27, 2023

Thanks for addressing the feedback! I suggest throwing NotImplementedException() for unused implementations.

zhhyu
zhhyu previously approved these changes Oct 27, 2023
erdembayar
erdembayar previously approved these changes Oct 27, 2023
@@ -37,7 +37,7 @@ public async Task IngestAsync(IReadOnlyList<SecurityAdvisory> advisories)
var vulnerabilityTuple = FromAdvisory(advisory);
var vulnerability = vulnerabilityTuple.Item1;
var wasWithdrawn = vulnerabilityTuple.Item2;
await _packageVulnerabilityService.UpdateVulnerabilityAsync(vulnerability, wasWithdrawn);
await _vulnerabilityWriter.WriteVulnerabilityAsync(vulnerability, wasWithdrawn);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should Flush be called after all writes are done?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good call. I'll add a call to flush here after so we don't need to make this change later. Note @zhhyu that this change will require a non throwing implementation of flush for the GalleryDbWriter.

@ryuyu ryuyu dismissed stale reviews from erdembayar and zhhyu via 6f43ced October 27, 2023 23:24
@ryuyu ryuyu merged commit 11a9f42 into dev Oct 31, 2023
@advay26 advay26 mentioned this pull request Nov 10, 2023
11 tasks
@ryuyu ryuyu deleted the ryuyu-githublib-update branch March 16, 2024 02:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants