Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

graph-builder: use openshift/cincinnati-graph-data instead of quay labels #232

Conversation

steveej
Copy link
Contributor

@steveej steveej commented Feb 25, 2020

This configures the graph-builder binary to use the
GithubOpenshiftSecondaryMetadataScraperPlugin and the
OpenshiftSecondaryMetadataParserPlugin in conjunction to use
https://github.com/openshift/cincinnati-graph-data as the new source for
secondary metadata.


@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 25, 2020
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 25, 2020
@steveej steveej force-pushed the pr/graph-builder-secondary-metadata-from-github branch from 9c82118 to 7efc17b Compare February 26, 2020 19:38
@steveej steveej changed the title [WIP/BLOCKED] graph-builder: use openshift/cincinnati-graph-data instead of quay labels graph-builder: use openshift/cincinnati-graph-data instead of quay labels Feb 26, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 26, 2020
@steveej steveej requested review from vrutkovs and removed request for rrati February 26, 2020 19:57
@steveej
Copy link
Contributor Author

steveej commented Feb 26, 2020

/retest

1 similar comment
@steveej
Copy link
Contributor Author

steveej commented Feb 27, 2020

/retest

@steveej steveej force-pushed the pr/graph-builder-secondary-metadata-from-github branch from 7efc17b to d29f684 Compare February 27, 2020 07:37
@vrutkovs
Copy link
Member

We're now using prod metadata - but tests are still using channels "a" and "b"

@steveej
Copy link
Contributor Author

steveej commented Feb 27, 2020

We're now using prod metadata - but tests are still using channels "a" and "b"

Note that we're only using prod secondary metadata, while using test primary metadata 😄

When I started working on a solution for this locally, I stumbled upon a real issue:

[2020-02-27T10:06:44Z ERROR graph_builder::graph] Parsing {"message":"API rate limit exceeded for 82.197.161.223. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)","documentation_url":"https://developer.github.com/v3/#rate-limiting"} to Vec<Branch>

Our default delay of 30 seconds is too short for GitHub's API, the good news is we're not that far away:

For unauthenticated requests, the rate limit allows for up to 60 requests per hour. Unauthenticated requests are associated with the originating IP address, and not the user making requests.

I.e., raising the delay to 60 seconds, assuming our deployments have different IPs, will do it until we have authentication against the GitHub API. If we have consensus I will raise the delay to 60 in this PR as well.

@vrutkovs
Copy link
Member

If we have consensus I will raise the delay to 60 in this PR as well.

SGTM, perhaps we should invest into token support (most likely we'll use that on stage / prod anyway)

@steveej
Copy link
Contributor Author

steveej commented Feb 27, 2020

SGTM, perhaps we should invest into token support (most likely we'll use that on stage / prod anyway)

Definitely a mid-term goal, but I wouldn't block on it, as I think 60s, or even 120s is a small duration compared to the release frequency and the cluster sync frequency.

dist/openshift/cincinnati.yaml Outdated Show resolved Hide resolved
@steveej
Copy link
Contributor Author

steveej commented Feb 27, 2020

As per @LalatenduMohanty comment I have changed my opinion on blocking on authentication. The comment reminded me of our intention to follow best continuous delivery practices, which means that we shouldn't compromise our quality in master (which feeds stage) if not necessary.
In this case it can be prevented.

I implemented authentication in a new commit. Now we need to get secrets for CI, staging and production and wire them through in the deployment templates.

@steveej steveej force-pushed the pr/graph-builder-secondary-metadata-from-github branch from 4c4d0a0 to 0cba757 Compare February 27, 2020 12:00
@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 27, 2020
@steveej steveej force-pushed the pr/graph-builder-secondary-metadata-from-github branch 3 times, most recently from 27a69e4 to 6261258 Compare February 27, 2020 14:14
@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 4, 2020
@steveej steveej force-pushed the pr/graph-builder-secondary-metadata-from-github branch from b7287c4 to 374778c Compare March 11, 2020 08:54
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 11, 2020
@steveej
Copy link
Contributor Author

steveej commented Mar 30, 2020

Amending to my previous comment, we're not dealing with actual fluctuation of the CI, but I'm certain that this PR doesn't degrade the the performance of the Cincinnati stack either. This PR does change the e2e deployment to use production data, which is the likely cause of performance changes.

@steveej
Copy link
Contributor Author

steveej commented Mar 30, 2020

/retest

@vrutkovs
Copy link
Member

This LGTM, lets squash in a single commit

steveej added 3 commits March 31, 2020 11:37
* GithubOpenshiftSecondaryMetadataScraperPlugin: fix path in debug output

  Unpacking happens to a tmpdir and not to the final output_directory.
  Reflect this in the debug message.

* GithubOpenshiftSecondaryMetadataScraperPlugin: add revision setting

  When `revision` is set, the plugin will try to download this revision
  from the configured repository instead of looking up the latest one.

* GithubOpenshiftSecondaryMetadataScraperPlugin: fix should_update evaluation

  Only download if the wanted and completed commits *do not* match.
  Previously this has been mistakenly inverted.

* GithubOpenshiftSecondaryMetadataScraperPlugin: make directory handling safe and robust

  This changes the plugin create and use two temproary directories inside
  the given output directory; one per extraction and final output
  respectively. The path of the final output directory is passed through
  the following plugin via the IO parameters at the key defined with the
  public variable `GRAPH_DATA_DIR_PARAM_KEY`.
  With this change, the plugin only touches files within the configured
  output directory, while never removing any pre-existing files, which it
  both previously did.

  Also the unit test is changed to call the plugin twice, to ensure the
  plugin does not fail on subsequent runs.

* OpenshiftSecondaryMetadataParserPlugin: lookup data directory in parameters

  The data directory found in the IO parameters, if found, has precedence
  over the configured data directory.
…ondaryMetadataParser

We want to use an OpenShift specific default value for the key_prefix
setting. However, using a default value for the key_prefix string which
is not the default for the string type, which would be the empty string,
is not intuitive.  This can be alleviated by making the default value a
publicly expose constant and thus make it transparent.
…bstract releases

This switches to using the `Graph`'s implementation of `Eq` for
comparison and exludes abstract releases in the edge debug output.
@steveej steveej force-pushed the pr/graph-builder-secondary-metadata-from-github branch from d900722 to 814bee1 Compare March 31, 2020 09:40
Copy link
Member

@vrutkovs vrutkovs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

🎉

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 31, 2020
@LalatenduMohanty
Copy link
Member

/hold I am still going through the PR

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 31, 2020
steveej added 7 commits March 31, 2020 14:59
I previously assumed the dev-dependency would refer to the version of
the same dependency from the regular dependency list.

According to this error my assumption was wrong.

```
cincinnati/Cargo.toml: dependency (prettydiff) specified without
providing a local path, Git repository, or version to use. This will be
considered an error in future versions
```

For now I don't see a better way than to manage these versions
separately, while bumping them in lockstep.
* graph-builder: by default use openshift/cincinnati-graph-data instead of quay labels

  This configures the graph-builder binary to by default use the
  GithubOpenshiftSecondaryMetadataScraperPlugin and the
  OpenshiftSecondaryMetadataParserPlugin in conjunction to use
  https://github.com/openshift/cincinnati-graph-data as the new source for
  secondary metadata.

  It also removes the NodeRemovePlugin because it's not envisioned to be
  used within the secondary metadata schema.

* tests/e2e: test with cincinnati-graph-data

  This switches the e2e test to verifying the a Cincinnati stack which is
  configured to fetch metadata from a specific revision of the
  cincinnati-graph-data repository [0].

  Also change the test to reuse the cincinnati graph comparison
  functionality which offers a helpful human readable failure output.

  [0]: https://github.com/openshift/cincinnati-graph-data

* dist/openshift: scrape secondary metadata from github

  This changes the OpenShift template plugin configuration to use the
  GitHub scraper plugins by default.
* Justfile: configure scraping from GitHub and add e2e recipes

* Justfile: refactor graph-builder arguments to global variables

  This makes it easier to override variables for the `run-graph-builder`
  and related recipes where it is a transitive dependency.
The previous change to using production data in the e2e graph-builder
configuration had a significant impact on the load balancing test
results. Until we have better tracing capabilities we simply lower the
expectations.
@steveej steveej force-pushed the pr/graph-builder-secondary-metadata-from-github branch from 814bee1 to a8cddc8 Compare March 31, 2020 13:00
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Mar 31, 2020
@steveej
Copy link
Contributor Author

steveej commented Mar 31, 2020

New changes are detected. LGTM label has been removed.

Just reworded a commit message, no content changes:

$ git diff 814bee1 a8cddc8 | wc -l
0

@LalatenduMohanty
Copy link
Member

/hold cancel
/lgtm

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 31, 2020
@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 31, 2020
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: LalatenduMohanty, steveeJ, vrutkovs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [LalatenduMohanty,steveeJ,vrutkovs]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 90efacd into openshift:master Mar 31, 2020
@steveej steveej deleted the pr/graph-builder-secondary-metadata-from-github branch March 31, 2020 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants