Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moved Gradle check into OpenSearch repo #1665

Closed
wants to merge 2 commits into from

Conversation

owaiskazi19
Copy link
Member

@owaiskazi19 owaiskazi19 commented Dec 7, 2021

Signed-off-by: Owais Kazi owaiskazi19@gmail.com

Description

Moved gradle check from private jenkins instance to OpenSearch repo.
Sample run: https://github.com/owaiskazi19/OpenSearch/runs/4705950671?check_suite_focus=true

Issues Resolved

Part of #2395

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@owaiskazi19 owaiskazi19 requested a review from a team as a code owner December 7, 2021 18:15
@opensearch-ci-bot
Copy link
Collaborator

Can one of the admins verify this patch?

@dblock
Copy link
Member

dblock commented Dec 7, 2021

Let's see if this actually succeeds on GHA. I was under the impression that it's too slow/takes too long. What are the time differences?

Copy link
Member

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments.

cd ../
REPORTS_DIR=`find ./search -name reports`
zip -9 -q -r gradle_check_${BUILD_NUMBER}_reports.zip ${REPORTS_DIR}
ls | grep "gradle_check_${BUILD_NUMBER}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do?

- name: Run Gradle
run: |
echo "./gradlew check --no-daemon --no-scan"
./gradlew check --no-daemon --no-scan
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK if this is the last line the build fails and we're done, no?

Copy link
Member Author

@owaiskazi19 owaiskazi19 Dec 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following the script from the private repo. Let me know if you are suggesting to change the run something like:

echo "./gradlew check --no-daemon --no-scan --parallel"
./gradlew check --no-daemon --no-scan ----parallel
GRADLE_CHECK_STATUS=$?

if [ "$GRADLE_CHECK_STATUS" != 0 ]
then
   echo Gradle Check Failed!
   exit $GRADLE_CHECK_STATUS
fi
          `

if [ "$GRADLE_CHECK_STATUS" != 0 ]
then
echo Gradle Check Failed!
exit 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should return $GRADLE_CHECK_STATUS

- name: Run Gradle
run: |
echo "./gradlew check --no-daemon --no-scan"
./gradlew check --no-daemon --no-scan
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have multiple projects, we can leverage --parallel, it will make the runs faster.
Ref: https://docs.gradle.org/current/userguide/performance.html

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Let me test it on my local.

@dblock
Copy link
Member

dblock commented Dec 7, 2021

1> [2021-12-08T02:47:40,694][INFO ][o.o.i.IndexingMemoryController] [testTranslogRecoveryWorksWithIMC] now throttling indexing for shard [[index][0]]: segment writing can't keep up
  1> [2021-12-08T02:47:41,087][INFO ][o.o.i.IndexingMemoryControllerTests] [testTranslogRecoveryWorksWithIMC] after test
  1> [2021-12-08T02:47:41,095][INFO ][o.o.i.IndexingMemoryControllerTests] [testNegativeMaxIndexBufferSize] before test
  1> [2021-12-08T02:47:41,097][INFO ][o.o.i.IndexingMemoryControllerTests] [testNegativeMaxIndexBufferSize] after test
  1> [2021-12-08T02:47:41,106][INFO ][o.o.i.IndexingMemoryControllerTests] [testShardAdditionAndRemoval] before test
  1> [2021-12-08T02:47:42,048][INFO ][o.o.i.IndexingMemoryControllerTests] [testShardAdditionAndRemoval] after test
  1> [2021-12-08T02:47:42,057][INFO ][o.o.i.IndexingMemoryControllerTests] [testSkipRefreshIfShardIsRefreshingAlready] before test
  1> [2021-12-08T02:47:42,277][INFO ][o.o.i.IndexingMemoryController] [testSkipRefreshIfShardIsRefreshingAlready] now throttling indexing for shard [[index][0]]: segment writing can't keep up
  1> [2021-12-08T02:47:52,284][INFO ][o.o.i.IndexingMemoryControllerTests] [testSkipRefreshIfShardIsRefreshingAlready] after test
         but: was <0L>
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
        at org.junit.Assert.assertThat(Assert.java:956)
  1> [2021-12-08T02:48:22,319][INFO ][o.o.i.IndexingMemoryControllerTests] [testActiveInactive] before test
  1> [2021-12-08T02:48:24,029][INFO ][o.o.i.IndexingMemoryController] [testActiveInactive] now throttling indexing for shard [[index][0]]: segment writing can't keep up
  1> [2021-12-08T02:48:24,048][INFO ][o.o.i.IndexingMemoryControllerTests] [testActiveInactive] after test
        at org.junit.Assert.assertThat(Assert.java:923)
        at org.opensearch.indices.IndexingMemoryControllerTests.lambda$testSkipRefreshIfShardIsRefreshingAlready$6(IndexingMemoryControllerTests.java:490)
        at org.opensearch.test.OpenSearchTestCase.assertBusy(OpenSearchTestCase.java:1048)
        at org.opensearch.test.OpenSearchTestCase.assertBusy(OpenSearchTestCase.java:1021)
        at org.opensearch.indices.IndexingMemoryControllerTests.testSkipRefreshIfShardIsRefreshingAlready(IndexingMemoryControllerTests.java:488)

This is similar to what I saw before, not sure if disabling or fixing or rewriting these tests is the thing to do.

@owaiskazi19
Copy link
Member Author

Let's see if this actually succeeds on GHA. I was under the impression that it's too slow/takes too long. What are the time differences?

The time it took is almost the same as Jenkins run but for a failed build.

@tlfeng
Copy link
Collaborator

tlfeng commented Dec 7, 2021

Let's see if this actually succeeds on GHA. I was under the impression that it's too slow/takes too long. What are the time differences?

The time it took is almost the same as Jenkins run but for a failed build.

I remembered previously @Rishikesh1159 did some experiments to run gradle check through Github Actions, the whole check took 120 minutes in the GitHub hosted machines.

@dblock
Copy link
Member

dblock commented Dec 7, 2021

Let's see if this actually succeeds on GHA. I was under the impression that it's too slow/takes too long. What are the time differences?

The time it took is almost the same as Jenkins run but for a failed build.

I remembered previously @Rishikesh1159 did some experiments to run gradle check through Github Actions, the whole check took 120 minutes in the GitHub hosted machines.

It's worth trying a little harder this time and seeing whether it's because some timeouts are being hit causing retries, or something else. Ideally we'd use GHA for gradle checks.

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Wrapper Validation failure 0c818d8

:alert: Gradle Wrapper integrity has been altered

@saratvemulapalli
Copy link
Member

Having gradle check in the open would bring in:

  • Community can see whats going on and how the code is tested.
  • No more maintaining Jenkins infrastructure for PR checks.
  • We can report code coverage to codecov.io.

For next steps, I would:

@dblock
Copy link
Member

dblock commented Dec 7, 2021

@saratvemulapalli hosted runners inside AWS infra are a non-starter, this has been explored plenty - there are alternatives to GHA but they aren't better than our Jenkins instance otherwise

@saratvemulapalli
Copy link
Member

@saratvemulapalli hosted runners inside AWS infra are a non-starter, this has been explored plenty - there are alternatives to GHA but they aren't better than our Jenkins instance otherwise

Interesting, curious why though?

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Precommit success 0c818d8

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 0c818d8
Log 1362

Reports 1362

@dblock
Copy link
Member

dblock commented Dec 7, 2021

@saratvemulapalli hosted runners inside AWS infra are a non-starter, this has been explored plenty - there are alternatives to GHA but they aren't better than our Jenkins instance otherwise

Interesting, curious why though?

I don't have the detail, but I'll point you where to ask.

There's also https://github.com/aws-actions/aws-codebuild-run-build, which may be a better solution.

@Rishikesh1159
Copy link
Member

Rishikesh1159 commented Dec 7, 2021

Let's see if this actually succeeds on GHA. I was under the impression that it's too slow/takes too long. What are the time differences?

The time it took is almost the same as Jenkins run but for a failed build.

I remembered previously @Rishikesh1159 did some experiments to run gradle check through Github Actions, the whole check took 120 minutes in the GitHub hosted machines.

When I tried to run gradle check on Github Actions, the rate of Random Test fails was higher compared to jenkins CI. This is because of slower machines Github Actions uses. There were couple integTests constantly failing and I had exclude them to see if gradle check works on Github Actions.

Signed-off-by: Owais Kazi <owaiskazi19@gmail.com>
Signed-off-by: Owais Kazi <owaiskazi19@gmail.com>
@dblock
Copy link
Member

dblock commented Jan 5, 2022

@owaiskazi19 So this just works if we fix the timeouts/random failures?

@owaiskazi19
Copy link
Member Author

@owaiskazi19 So this just works if we fix the timeouts/random failures?

Yeah, it does and takes around 180 mins if there's no flaky tests.

@dblock
Copy link
Member

dblock commented Jan 5, 2022

@owaiskazi19 So this just works if we fix the timeouts/random failures?

Yeah, it does and takes around 180 mins if there's no flaky tests.

Can we parallelize, get this down to ~1h?

@owaiskazi19
Copy link
Member Author

owaiskazi19 commented Jan 5, 2022

@owaiskazi19 So this just works if we fix the timeouts/random failures?

Yeah, it does and takes around 180 mins if there's no flaky tests.

Can we parallelize, get this down to ~1h?

I have it like ./gradlew check --parallel for now. Is there any other way to parallelize I should be looking into?

@owaiskazi19
Copy link
Member Author

owaiskazi19 commented Jan 5, 2022

Also, I think if we can get a better instance for GHA that would help to speed up the gradle check.

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success 5d97934
Log 1776

Reports 1776

@owaiskazi19
Copy link
Member Author

owaiskazi19 commented Jan 5, 2022

The failure on GHA is because of https://github.com/opensearch-project/OpenSearch/blob/main/server/src/test/java/org/opensearch/indices/IndexingMemoryControllerTests.java#L488-L490 and https://github.com/opensearch-project/OpenSearch/blob/main/server/src/test/java/org/opensearch/indices/IndexingMemoryControllerTests.java#L497.

./gradlew ':server:test' --tests "org.opensearch.indices.IndexingMemoryControllerTests.testSkipRefreshIfShardIsRefreshingAlready" -Dtests.seed=EFEC06DDA82AA01 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=es-CL -Dtests.timezone=America/Santa_Isabel -Druntime.java=17

@saratvemulapalli
Copy link
Member

@owaiskazi19 So this just works if we fix the timeouts/random failures?

Yeah, it does and takes around 180 mins if there's no flaky tests.

This is nice, do you know if all the tests (which are ignored on your fork) are flaky, and have issues opened ?

@owaiskazi19
Copy link
Member Author

owaiskazi19 commented Jan 5, 2022

@owaiskazi19 So this just works if we fix the timeouts/random failures?

Yeah, it does and takes around 180 mins if there's no flaky tests.

This is nice, do you know if all the tests (which are ignored on your fork) are flaky, and have issues opened ?

I might be missing a few but for the most recent failures I have opened: #1852 and reopened: #39.

@dblock
Copy link
Member

dblock commented Jan 5, 2022

@owaiskazi19 So this just works if we fix the timeouts/random failures?

Yeah, it does and takes around 180 mins if there's no flaky tests.

Can we parallelize, get this down to ~1h?

I have it like ./gradlew check --parallel for now. Is there any other way to parallelize I should be looking into?

I'm thinking multiple jobs across different kinds of tests. For example, If there's a way to generate a list of all the test tasks with gradle test:tasks, you could spawn GHA workflows for each of these tasks in parallel, see https://code.dblock.org/2021/09/03/generating-task-matrix-by-looping-over-repo-files-with-github-actions.html for an example.

@saratvemulapalli
Copy link
Member

@owaiskazi19 So this just works if we fix the timeouts/random failures?

Yeah, it does and takes around 180 mins if there's no flaky tests.

Can we parallelize, get this down to ~1h?

I have it like ./gradlew check --parallel for now. Is there any other way to parallelize I should be looking into?

I'm thinking multiple jobs across different kinds of tests. For example, If there's a way to generate a list of all the test tasks with gradle test:tasks, you could spawn GHA workflows for each of these tasks in parallel, see https://code.dblock.org/2021/09/03/generating-task-matrix-by-looping-over-repo-files-with-github-actions.html for an example.

The simplest I can think of is splitting BWC tests and the rest to start with. Because they take a lot of time to run on Jenkins too.
The rest would be unit and integration tests, if needed we should figure out how to split them up.

@dblock dblock mentioned this pull request Jan 6, 2022
4 tasks
@owaiskazi19
Copy link
Member Author

Closing this one. Created #2166.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants