graph-builder: add plugin to parse openshift/cincinnati-graph-data #231

@steveej steveej commented Feb 24, 2020

Add openshift_secondary_metadata_parser plugin:

  • Implement plugin logic

  • Test using fixtures produced with a custom graph-builder to obtain a
    snapshot of the following triple:

    • a vanilla graph produced by the registry scraper plugin
    • a vanilla graph augmented by the quay secondary metadata plugin
    • the data from the openshift/cincinnati-graph-data repository
      The test asserts equality of two results of the EdgeAddRemovePlugin.
      The first result is obtained by passing the raw graph through the new
      parser plugin and then through the EdgeAddRemovePlugin.
      The second is obtained by passing the quay metadata graph fixture
      through the new parser plugin.
      This ensures compatibility between the new plugin and the
      QuayMetadataPlugin with respect to equality for subsequent processing
      of their produced metadata.

    Note: for this to work I had to add raw metadata to the fixtures,
    which represents metadata present on quay, but not present as part of
    the existing cincinnati-graph-data. This raw metadata has also been
    added to the upstream data repository.


Split off from the original #226.

  • Plugin to parse the contents of the repository and mangle the Graph
    • Impl
    • Tests which rely on the previous plugin?
    • Independent tests, i.e. generate a known file structure on-the-fly in the test
  • Double-check to either use tokio::fs functions or use spawn_blocking

@openshift-ci-robot openshift-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Feb 24, 2020
@steveej steveej force-pushed the pr/plugin-parse-openshift-cincinnati-graph-metadatag branch from d99e81a to e50d406 Compare February 24, 2020 21:27
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 24, 2020
@steveej steveej force-pushed the pr/plugin-parse-openshift-cincinnati-graph-metadatag branch from e50d406 to 89cd657 Compare February 24, 2020 21:44

/// If true causes the removal of all processed metadata from the releases.
#[default(false)]
pub remove_consumed_metadata: bool,
Member

Do we need to make this tuneable? Can we just always do it?

Contributor Author

It can be helpful for debugging to leave them in, to see which annotations are on the release. Also, for now it's still off by default, which is compatible with the existing deployments.

Member

@LalatenduMohanty LalatenduMohanty Feb 26, 2020

Can this be used to force the plugin to discard previously processed metadata and start with a fresh metadata scan?

Contributor Author

This controls whether the metadata consumed by this plugin is removed after processing. For example, if a release carries previous.remove metadata with the value 1.2.3, this plugin will remove the corresponding edge from the graph. If remove_consumed_metadata is true in the plugin configuration, it will also remove the metadata entry with the key previous.remove.
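
To make the behaviour described above concrete, here is a minimal, self-contained sketch. It is not the plugin's actual code: the `Release` and `Graph` types, the `apply_previous_remove` function, the `example` key prefix, and the treatment of the value as a comma-separated list are all assumptions made for illustration.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical, simplified release: a version plus string metadata.
#[derive(Debug)]
struct Release {
    version: String,
    metadata: HashMap<String, String>,
}

/// Hypothetical, simplified graph: releases plus directed (from, to) edges.
#[derive(Debug)]
struct Graph {
    releases: Vec<Release>,
    edges: BTreeSet<(String, String)>,
}

/// For every release carrying `<key_prefix>.previous.remove`, drop the edges
/// from the listed versions into that release. If `remove_consumed_metadata`
/// is true, the metadata key is removed from the release afterwards.
fn apply_previous_remove(graph: &mut Graph, key_prefix: &str, remove_consumed_metadata: bool) {
    let key = format!("{}.previous.remove", key_prefix);
    let Graph { releases, edges } = graph;
    for release in releases.iter_mut() {
        let value = match release.metadata.get(&key) {
            Some(value) => value.clone(),
            None => continue,
        };
        // Assumption: the value is a comma-separated list of versions.
        for from in value.split(',').map(str::trim).filter(|v| !v.is_empty()) {
            edges.remove(&(from.to_string(), release.version.clone()));
        }
        if remove_consumed_metadata {
            release.metadata.remove(&key);
        }
    }
}

/// Builds a tiny graph: 1.2.2 -> 1.2.4 and 1.2.3 -> 1.2.4, where 1.2.4
/// asks for the edge from 1.2.3 to be removed.
fn example_graph() -> Graph {
    let release = |version: &str| Release {
        version: version.to_string(),
        metadata: HashMap::new(),
    };
    let mut target = release("1.2.4");
    target
        .metadata
        .insert("example.previous.remove".to_string(), "1.2.3".to_string());
    let edge = |from: &str, to: &str| (from.to_string(), to.to_string());
    Graph {
        releases: vec![release("1.2.2"), release("1.2.3"), target],
        edges: vec![edge("1.2.2", "1.2.4"), edge("1.2.3", "1.2.4")]
            .into_iter()
            .collect(),
    }
}

fn main() {
    let mut graph = example_graph();
    apply_previous_remove(&mut graph, "example", true);
    println!("{:#?}", graph);
}
```

With the flag set to false the edge is removed in the same way, but the `example.previous.remove` entry stays on the release, which is the debugging use case mentioned earlier in this thread.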

Member

@vrutkovs vrutkovs left a comment

Looks good, a few nitpicks and odd parts

@@ -11,4 +11,6 @@ if ! type -f "${YAML_LINTER}"; then
exit 1
fi

find . -type f \( -name '*.yaml' -o -name '*.yml' \) -print0 | xargs -L 1 -0 "${YAML_LINT_CMD[@]}"
find . \
-path "./graph-builder/src/plugins/openshift_secondary_metadata_parser/test_fixtures" -prune -o \
Member

nitpick: define path as a var

Contributor Author

there's only one usage of it though, and I could see us wanting to prune more than just this one. usually I'm all for constants if they benefit the DRY principle, but in this case it would just add indirection.

async fn process_channels(&self, io: &mut InternalIO) -> Fallible<()> {
let channels_dir = self.settings.data_directory.join("channels");
let channels: Vec<graph_data_model::Channel> =
deserialize_directory_files(&channels_dir, regex::Regex::new("ya+ml")?)
Member

Perhaps it's worth sticking to yaml only until file extensions go out of control :)

Contributor Author

@steveej steveej Feb 25, 2020

This is currently in line with #226. I think both (yml and yaml) are common extensions. Do you think we should make it configurable?

Member

They are common, but I'd rather not leave a space for errors here.

@wking @LalatenduMohanty do you agree we should stick to ".yaml" and ignore (possible) ".yml" in graph-data repo?

Member

Standard practice for YAML files is to use either .yaml or .yml, so we should support only these two extensions. Supporting just .yaml is fine too, but then someone might request that we add .yml as well. Supporting both .yaml and .yml makes us future-proof.
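
For illustration, accepting exactly these two extensions can be sketched without a regex. The helper name `has_yaml_extension` is hypothetical and is not part of the PR. Note also that a pattern written as `ya+ml`, as quoted in the diff excerpt above, requires at least one `a`, so on its own it would match `yaml` (and `yaaml`) but not `yml`.

```rust
use std::path::Path;

/// Returns true when the file extension is exactly `yaml` or `yml`
/// (case-sensitive). Hypothetical helper, shown for illustration only.
fn has_yaml_extension(path: &Path) -> bool {
    matches!(
        path.extension().and_then(|ext| ext.to_str()),
        Some("yaml") | Some("yml")
    )
}

fn main() {
    let names = ["stable-4.3.yaml", "fast-4.3.yml", "README.md", "notes.yaaml"];
    for name in names.iter() {
        println!("{}: {}", name, has_yaml_extension(Path::new(name)));
    }
}
```

An explicit allow-list like this leaves no room for near-miss extensions, which is the concern raised earlier in this thread.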

@@ -53,7 +57,8 @@ impl PluginSettings for EdgeAddRemovePlugin {
/// 2. next
/// 2. *.remove
/// 1. previous
Member

Seems both previous and previous_regex data would be applied. Shouldn't we prefer previous_regex instead?

Contributor Author

I think the existing syntax can coexist with the additional one, so I don't see why we should remove the existing one. What makes you think we should?

Member

I'd expect us to use either previous or previous_regex in blocked-edges, but the current implementation allows both. This may lead to a few odd situations. Let's throw an error when both are specified?

Contributor Author

I don't see why it would be an error. They could be used in conjunction as well. Why do you want to restrict it?

Member

It's not an error really, it just feels like it might have undesirable consequences. Maybe I'm overreacting, as I can't come up with a potential issue on this one.

Contributor Author

@steveej steveej Feb 26, 2020

If it helps, I don't see a potential issue in this part of the code either, and I've thought about it quite a bit 🙂

&to.to_string()
))?
.insert(
format!("{}.{}", self.settings.key_prefix, "previous.remove_regex"),
Member

I'm not entirely sure this would be a regex - is it an exact version?

Contributor Author

Member

Oh, correct, thanks!

@steveej steveej force-pushed the pr/plugin-parse-openshift-cincinnati-graph-metadatag branch from 89cd657 to a44fe52 Compare February 25, 2020 14:41
@steveej
Contributor Author

steveej commented Feb 25, 2020

Looks good, a few nitpicks and odd parts

Thanks for the review! I addressed your comments either by code changes or answers.

@steveej
Contributor Author

steveej commented Feb 25, 2020

/retest

@steveej steveej force-pushed the pr/plugin-parse-openshift-cincinnati-graph-metadatag branch 2 times, most recently from a7768b3 to 4f3357f Compare February 26, 2020 09:42
@steveej
Contributor Author

steveej commented Feb 26, 2020

Rebased and last TODO item completed!

@steveej steveej requested review from LalatenduMohanty and removed request for rrati February 26, 2020 09:44
@steveej
Contributor Author

steveej commented Feb 26, 2020

/retest

@LalatenduMohanty
Member

@steveej Can we merge some of the commits with single-line code changes and still keep them logically separate? IMO all code changes related to the plugin should be in just one commit.

@steveej steveej force-pushed the pr/plugin-parse-openshift-cincinnati-graph-metadatag branch from 4f3357f to 1c35ec0 Compare February 26, 2020 14:21
@steveej
Contributor Author

steveej commented Feb 26, 2020

IMO all code changes related to the plugin should be just in one commit.

I suggest not squashing them, as it makes reverting commits more difficult in the future. Take ab718b4ab985601172e4345060990380693b2ba0, for example, which exposes a constant. If this is squashed with the plugin commit and we later want to revert the plugin commit, there is no guarantee the revert won't break other code which in the meantime also depends on the public constant.

@vrutkovs
Member

It seems squashing 87a928f and ab718b4 would be a good idea.

Other than that it looks good

…adata removal setting

* Add `previous.remove_regex` metadata key to remove edges by regex
  matching.
* Add a `remove_consumed_metadata` configuration option to control the
  removal of metadata after it has been processed.
Add openshift_secondary_metadata_parser plugin:
* Implement plugin logic
* Test using fixtures produced with a custom graph-builder to obtain a
  snapshot of the following triple:
    * a vanilla graph produced by the registry scraper plugin
    * a vanilla graph augmented by the quay secondary metadata plugin
    * the data from the openshift/cincinnati-graph-data repository
  The test asserts equality of two results of the EdgeAddRemovePlugin.
  The first result is obtained by passing the raw graph through the new
  parser plugin and then through the EdgeAddRemovePlugin.
  The second is obtained by passing the quay metadata graph fixture
  through the new parser plugin.
  This ensures compatibility between the new plugin and the
  QuayMetadataPlugin with respect to equality for subsequent processing
  of their produced metadata.

  Note: for this to work I had to add raw metadata to the fixtures,
  which represents metadata present on quay, but not present as part of
  the existing cincinnati-graph-data. This raw metadata has also been
  added to the upstream data repository.
@steveej steveej force-pushed the pr/plugin-parse-openshift-cincinnati-graph-metadatag branch from 1c35ec0 to 09fcff9 Compare February 26, 2020 17:48
@steveej
Contributor Author

steveej commented Feb 26, 2020

It seems squashing 87a928f and ab718b4 would be a good idea.

Agreed, these two are not unreasonable to squash, done!

Member

@vrutkovs vrutkovs left a comment

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 26, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: steveeJ, vrutkovs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 7f68cf5 into openshift:master Feb 26, 2020
@steveej steveej deleted the pr/plugin-parse-openshift-cincinnati-graph-metadatag branch March 20, 2020 20:39
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
6 participants