Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feat][ci] Add integration to ge.apache.org Gradle Enterprise server #19133

Merged
merged 4 commits into from
Jan 10, 2023

Conversation

lhotari
Copy link
Member

@lhotari lhotari commented Jan 4, 2023

Motivation

ASF Infra team together with Gradle Inc. has setup a dedicated Gradle Enterprise instance for Apache projects at https://ge.apache.org/ .

Gradle Enterprise works for both Gradle builds and Maven builds. There's a Maven extension which provides the integration for Maven builds, https://docs.gradle.com/enterprise/maven-extension/ .

You can find more information about the benefits of Gradle Enterprise on https://gradle.com . Gradle Enterprise is also for Maven builds!

Here's an example of builds scans for all Maven builds that are run as part of Pulsar CI workflow: https://ge.apache.org/scans?search.names=Git%20commit%20id%20short&search.timeZoneId=Europe/Helsinki&search.values=db7ab997&selection.buildScanB=ohtx53k6ymsgc
There are 69 maven builds that are run in total.

The build scans are also linked from the GitHub Actions results page, for example https://github.com/apache/pulsar/actions/runs/3839199234#summary-10438505199 shows this.

Mailing list discussion: https://lists.apache.org/thread/d36xbj8b7ntcyp85lv0730xfj4hodqxj

Modifications

Add configuration for Gradle Enterprise which will be activated only for builds in branches on apache/pulsar when a secret called GE_ACCESS_TOKEN exists.
A new environment variable called JOB_NAME is added so that the name of the GitHub Actions CI job can be passed on to Gradle Enterprise build scan values. GitHub Actions has a limitation that the job name isn't available as an environment variable.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@lhotari lhotari added the area/ci label Jan 4, 2023
@lhotari lhotari added this to the 2.12.0 milestone Jan 4, 2023
@lhotari lhotari self-assigned this Jan 4, 2023
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Jan 4, 2023
@codecov-commenter
Copy link

codecov-commenter commented Jan 4, 2023

Codecov Report

Merging #19133 (e4ad807) into master (9ef54fd) will decrease coverage by 13.06%.
The diff coverage is n/a.

Impacted file tree graph

@@              Coverage Diff              @@
##             master   #19133       +/-   ##
=============================================
- Coverage     47.22%   34.16%   -13.07%     
+ Complexity    10713     6485     -4228     
=============================================
  Files           713      607      -106     
  Lines         69697    57647    -12050     
  Branches       7485     5994     -1491     
=============================================
- Hits          32914    19693    -13221     
- Misses        33096    35302     +2206     
+ Partials       3687     2652     -1035     
Flag Coverage Δ
unittests 34.16% <ø> (-13.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ava/org/apache/pulsar/broker/admin/v1/Brokers.java 0.00% <0.00%> (-100.00%) ⬇️
...va/org/apache/pulsar/broker/admin/v1/Clusters.java 0.00% <0.00%> (-100.00%) ⬇️
.../org/apache/pulsar/broker/admin/v1/Properties.java 0.00% <0.00%> (-100.00%) ⬇️
.../apache/pulsar/broker/admin/v2/ResourceGroups.java 0.00% <0.00%> (-100.00%) ⬇️
...ar/common/naming/PartitionedManagedLedgerInfo.java 0.00% <0.00%> (-100.00%) ⬇️
...e/pulsar/broker/admin/impl/ResourceQuotasBase.java 0.00% <0.00%> (-96.43%) ⬇️
...he/pulsar/broker/service/AnalyzeBacklogResult.java 0.00% <0.00%> (-92.31%) ⬇️
.../apache/pulsar/broker/admin/v1/ResourceQuotas.java 0.00% <0.00%> (-91.43%) ⬇️
...e/pulsar/broker/stats/AllocatorStatsGenerator.java 0.00% <0.00%> (-91.38%) ⬇️
...pache/pulsar/broker/stats/MBeanStatsGenerator.java 0.00% <0.00%> (-84.38%) ⬇️
... and 227 more

@codelipenghui
Copy link
Contributor

/pulsarbot run-failure-checks

Copy link
Member

@tisonkun tisonkun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lhotari Thanks for your contribution!

It should be good to try. I have some questions here:

  1. How do you imagine we make use of the scan report?
  2. How much time does the new step consume?
  3. Is it possible to replace codecov with this scan so we reduce too many reports?

@tisonkun
Copy link
Member

tisonkun commented Jan 5, 2023

Also, maybe we can tag ready-for-test on this specific PR to preview the result now? I don't know exactly whether it works on a personal fork. Since it depends on ge.apache.org, I guess no?

@lhotari
Copy link
Member Author

lhotari commented Jan 5, 2023

@lhotari Thanks for your contribution!

It should be good to try. I have some questions here:

  1. How do you imagine we make use of the scan report?

there are many ways to make use of the scan reports. For example, the test execution reports are useful for analysing flaky tests and slow tests. Here's an example of test execution reports:
https://ge.apache.org/scans/tests?search.rootProjectNames=Pulsar&search.timeZoneId=Europe/Helsinki&tests.container=*
It seems that @poorbarcode has been working on improvements for slow tests in #17657 and child issues. Gradle Enterprise test reporting is a good way to find the slow tests.

For other parts of the build, the build scan contains detailed break downs of where time is spent in the build. It's simply a way to get a better understanding of what happens during the build. One of the challenges is that Gradle Enterprise doesn't seem to provide a nice way to group the related builds in list views. There are links in the build scan report header to find other builds in the same workflow run or for the same git commit hash.
image
However this isn't very helpful in the list views since there's no information there to see what build it actually is. That's why I added the links to the reports to the GitHub Actions workflow run summary view:
image

  1. How much time does the new step consume?

The Gradle Enterprise maven extension collects the information and logs and sends them when the build finishes. It doesn't seem to add more than a few seconds to execution. One detail is that this solution is only enabled for builds that happen on the apache/pulsar branches. The reason for this is that Gradle Enterprise has a secret token that is used for publishing to ge.apache.org . If we'd like to publish reports for pull requests, the token would have to be made public. I don't think that there are plans for doing that at the moment.

  1. Is it possible to replace codecov with this scan so we reduce too many reports?

This isn't a replacement for codecov. Currently codecov integration in apache/pulsar seems to be invalid and the results don't make sense. I think we should disable codecov reporting until it is fixed and it provides consistent results.

@lhotari
Copy link
Member Author

lhotari commented Jan 5, 2023

Also, maybe we can tag ready-for-test on this specific PR to preview the result now? I don't know exactly whether it works on a personal fork. Since it depends on ge.apache.org, I guess no?

@tisonkun I created lhotari#123 to ensure that the changes don't break builds in forks. You won't be able to publish build scans to ge.apache.org since the secret token is only available for builds on branches that exist in apache/pulsar repository. As a solution for forked builds, I think we could add a way to use public Gradle builds scans available at https://scans.gradle.com/ .

Currently, the maven extension will only get activated when the token for Gradle Enterprise is available. The activation happens in .github/actions/gradle-enterprise/action.yml by copying .mvn/ge-extensions.xml to .mvn/extensions.xml and by making the token available in GRADLE_ENTERPRISE_ACCESS_KEY environment variable that the Gradle Enterprise maven extension uses.

@lhotari
Copy link
Member Author

lhotari commented Jan 5, 2023

There seems to be some compatibility issue with Gradle Enterprise maven extension and the way how we retry tests with TestNG.

There's an error
Received a start event for 'DefaultTestDescriptor{id=-6948655823765472571, surefireForkId=2, name='org.apache.pulsar.client.api.SimpleSchemaTest', className='org.apache.pulsar.client.api.SimpleSchemaTest', parentId=3213794952218547303}' with duplicate id '-6948655823765472571'.

in build scan: https://ge.apache.org/s/lfbhlzzwce72c/failure?focused-exception-line=0-5#1

in GitHub Actions logs: https://github.com/apache/pulsar/actions/runs/3843180558/jobs/6546220728#step:11:1408

lhotari added a commit to lhotari/pulsar that referenced this pull request Jan 5, 2023
@lhotari
Copy link
Member Author

lhotari commented Jan 5, 2023

@lhotari lhotari marked this pull request as draft January 5, 2023 16:08
@lhotari lhotari marked this pull request as ready for review January 9, 2023 14:25
@lhotari lhotari closed this Jan 9, 2023
@lhotari lhotari reopened this Jan 9, 2023
@lhotari lhotari requested a review from tisonkun January 10, 2023 12:11
@lhotari
Copy link
Member Author

lhotari commented Jan 10, 2023

I have isolated the issue in https://github.com/lhotari/gradle-enterprise-duplicate-test-id-repro

There's now a workaround in place to handle the issue. There was only one occurrence of the issue in the Pulsar code base.

Copy link
Member

@tisonkun tisonkun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good. Let's see what it brings to us.

BTW, compatibility issues can be a risk that there're further unstable cases so we need to investigate it.

import org.testng.annotations.Test;

@Test(groups = "broker-api")
public class SimpleSchemaWithSchemaValidationEnforcedTest extends SimpleSchemaTest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please file an issue and add a reference for this trick.

Copy link
Contributor

@nicoloboschi nicoloboschi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lhotari
Copy link
Member Author

lhotari commented Jan 10, 2023

BTW, compatibility issues can be a risk that there're further unstable cases so we need to investigate it.

@tisonkun I have reported this issue https://github.com/lhotari/gradle-enterprise-duplicate-test-id-repro to Gradle support. They will work on a fix eventually. One of the workarounds that was suggested, was switching to use JUnit Jupiter to run tests with testng-engine. (the problem doesn't appear with testng-engine) This is probably something we would like to do if we would like to migrate to Junit 5 eventually. testng-engine doesn't support testng xml test suites that are used by Pulsar integration tests.

@lhotari lhotari merged commit 5171d81 into master Jan 10, 2023
@tisonkun tisonkun deleted the experiment-ge.apache.org branch March 3, 2023 04:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ci doc-not-needed Your PR changes do not impact docs ready-to-test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants