[DocDB] Unit test failure due to Refactor TSDescriptor to a sys catalog entity. #23645

Open
1 task done
shishir2001-yb opened this issue Aug 27, 2024 · 1 comment
Assignees
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/medium Medium priority issue status/awaiting-triage Issue awaiting triage

Comments


shishir2001-yb commented Aug 27, 2024

Jira Link: DB-12556

Description

The test cases below appear to have started failing after the commit named in the title (Refactor TSDescriptor to a sys catalog entity).

XClusterYSqlTestConsistentTransactionsTest.MasterLeaderRestart:
Analyze Trends

../../src/yb/integration-tests/xcluster/xcluster_ysql-test.cc:277
Expected equality of these values:
  count % transaction_size
    Which is: 9
  0
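
The failed expectation above reads as a check that the observed row count is a whole multiple of the transaction size, meaning no partially applied transaction is visible. A minimal sketch of that check follows. The helper name and the interpretation are assumptions made for illustration and are not taken from `xcluster_ysql-test.cc`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the YugabyteDB test suite.
// Returns true when the row count is a whole multiple of the transaction
// size, i.e. no partially applied transaction is visible.
bool IsTransactionallyConsistent(int64_t count, int64_t transaction_size) {
  assert(transaction_size > 0);  // A zero or negative size makes no sense here.
  return count % transaction_size == 0;
}
```

Under this reading, a remainder of 9 means that some rows of a transaction were visible while others were not.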

org.yb.pgsql.TestLoadBalance.TestWithBlacklistedServer: Started failing in alma8-clang17-tsan
Analyze Trends

java.lang.AssertionError: Expected 6 tservers not found

org.yb.loadtester.TestClusterIsLoadBalancerIdle.testClusterIsLoadBalancerIdle: Started failing in alma8-clang17-tsan
Analyze Trends

java.lang.AssertionError: Assertion failed: expected numOps >= minOps, found numOps=10, minOps=250 for tserver process on bind IP 127.99.146.4, rpc port 28410, web port 17337, pid 10069

ForceMasterLookup/ClientTestForceMasterLookup.TestConcurrentLookups/1: Analyze Trends

LoadBalancerLegacyColocatedDBColocatedTablesTest.GlobalLoadBalancingWithLegacyColocatedDBColocatedTables:
Analyze trends

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@shishir2001-yb shishir2001-yb added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels Aug 27, 2024
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Aug 27, 2024
druzac added a commit that referenced this issue Aug 29, 2024
Summary:
A prior commit (15786f3) unintentionally re-ordered some logic in the heartbeat handling path. Before this commit, there was a sequence:

1. Potentially register heartbeating TServer.
2. Fill response with various metadata for piggy-backing features.
3. Look up heartbeating TServer.

That commit merged registering and looking up a TServer into a single step:
1. Potentially register heartbeating TServer and look it up.
2. Fill response with various metadata for piggy-backing features.

If a TServer needed to register, say after a master leader failover, the old code would stop handling the heartbeat request at the lookup stage, after filling in the response for the piggy-backing features. The new code after commit 15786f3, however, would stop handling the heartbeat before populating the response. This broke a few system tests.

This diff moves populating the response ahead of registering / looking up the TServer. This is not exactly the same as the original semantics: if registering a TServer yields an error, the old code wouldn't populate the response but the new code would. These are edge cases, though. It's not entirely clear that the old code ever returned an error on the registration path, or that this condition is ever triggered in practice. At any rate, preserving these semantics is not worth the complexity.
Jira: DB-12556
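
The ordering problem described in this summary can be sketched as follows. This is a simplified illustration, not the actual master code: `MasterSketch`, `HeartbeatRequest`, `HeartbeatResponse`, and every member name are hypothetical, and registration errors are not modelled.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-ins; none of these names come from the YugabyteDB source.
struct HeartbeatRequest {
  std::string ts_uuid;
  bool has_registration = false;  // True when the TServer sends registration info.
};

struct HeartbeatResponse {
  bool needs_reregister = false;
  bool metadata_populated = false;  // Stands in for the piggy-backed metadata.
};

class MasterSketch {
 public:
  // Registers the TServer if the request carries registration info, then
  // looks it up. Returns false when the TServer is unknown, for example
  // right after a master leader failover.
  bool RegisterAndLookup(const HeartbeatRequest& req) {
    if (req.has_registration) registered_.insert(req.ts_uuid);
    return registered_.count(req.ts_uuid) > 0;
  }

  void PopulateResponse(HeartbeatResponse* resp) {
    assert(resp != nullptr);
    resp->metadata_populated = true;
  }

  // Regressed order: returns before the response is populated.
  HeartbeatResponse HandleRegressed(const HeartbeatRequest& req) {
    HeartbeatResponse resp;
    if (!RegisterAndLookup(req)) {
      resp.needs_reregister = true;
      return resp;
    }
    PopulateResponse(&resp);
    return resp;
  }

  // Fixed order: populate the response first, then register / look up.
  HeartbeatResponse HandleFixed(const HeartbeatRequest& req) {
    HeartbeatResponse resp;
    PopulateResponse(&resp);
    if (!RegisterAndLookup(req)) resp.needs_reregister = true;
    return resp;
  }

 private:
  std::set<std::string> registered_;
};
```

In this sketch, a heartbeat from an unregistered TServer gets an empty response from `HandleRegressed`, while `HandleFixed` still returns the populated metadata alongside the re-registration flag.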

Test Plan:
```
./yb_build.sh --cxx-test master_heartbeat-itest --gtest_filter MasterHeartbeatITest.PopulateHeartbeatResponseWhenRegistrationRequired
```

Reviewers: jhe, hsunder

Reviewed By: jhe, hsunder

Subscribers: hsunder, slingam, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D37629
jasonyb pushed a commit that referenced this issue Aug 29, 2024
Summary:
 bfd17b5 [PLAT-14601]Support restore to Point in time part 2 - Backup/Restore/Restore preflight related changes - Add JSON Property
 91d000a [doc] Thirdparty to integrations (#23598)
 Excluded: 6456777 [#23626] allow loading old dumps that do not have index pg_class OIDs
 c8b8be3 [DB-12586] yugabyted: Update schema migration UI (#23217)
 Excluded: 4ea354b [#23521] YSQL: Cost YB Bitmap Table Scan remote filters
 b80999d [PLAT-14974][PLAT-15045] Added prometheus user as part of yugabyte group
 63f1d65 [#23669] YSQL: Add more logs to debug an assertion failure
 1ad4795 [docs] [TA] Added TA-23476: YCQL currenttimestamp() precision (#23642)
 41ae6b4 [#23653] docdb: Adjust waits for MasterPathHandlersItest.TestUndeletedParentTablet in TSAN
 78b0ae4 [DB-12587] yugabyted: Update data migration UI (#23291)
 d234b3a [PLAT-15046] Create log directory with correct permissions to allow users to export logs without using sudo
 8713c18 [doc][yba] Clarify pre-req for cloud provider image upgrades (#23285)
 9be5c91 [#23448] YSQL: fix failing test PgAutoAnalyzeTest.CheckTableMutationsCount
 Excluded: 9d54710 [#22147] YSQL, QueryDiagnostics: Pgss support for query diagnostics
 417092a [#23373] DocDB: Add max_disk_throughput_mbps gflag to control disk full rejection
 e3a1a36 [PLAT-15035] Add support to sync gflags secret mount location to actual gflag file used by services
 23a6a4c [PLAT-14525][PLAT-14953] Add local provider tests for switchover, failover, change replica, and restart
 6026029 [PLAT-15100][Master]Observed two Scheduled Backup Policies tabs in Backup page
 2cf648b [#23581] CDCSDK: Support dynamic table addition with table removal
 b14851d [#23702] xClusterDDLRepl: Add extra logging
 8a0d6ff [#23645] docdb: Reorder heartbeat handling logic to fix regression.
 2b30b5e [Docs] Changes for Experimental AI (#23714)

Test Plan: Jenkins: rebase: pg15-cherrypicks

Reviewers: jason, tfoucher

Differential Revision: https://phorge.dev.yugabyte.com/D37645
Contributor

druzac commented Sep 17, 2024

XClusterYSqlTestConsistentTransactionsTest.MasterLeaderRestart was fixed by 8a0d6ff. The other tests still need to be fixed.

druzac added a commit that referenced this issue Sep 19, 2024
Summary:
After 15786f3, a number of tests began regularly timing out on TSAN:
```
org.yb.pgsql.TestLoadBalance#TestWithBlacklistedServer
org.yb.pgsql.TestLoadBalance#TestWithBlacklistedServer
ForceMasterLookup/ClientTestForceMasterLookup.TestConcurrentLookups/1
LoadBalancerLegacyColocatedDBColocatedTablesTest.GlobalLoadBalancingWithLegacyColocatedDBColocatedTables
```

This diff adds point fixes for these tests.

Note that `XClusterYSqlTestConsistentTransactionsTest.MasterLeaderRestart` was already fixed by 8a0d6ff.
Jira: DB-12556

Test Plan:
```
./yb_build.sh tsan --java-test 'org.yb.pgsql.TestLoadBalance#TestWithBlacklistedServer' && \
  ./yb_build.sh tsan --java-test 'org.yb.pgsql.TestLoadBalance#TestWithBlacklistedServer' && \
  ./yb_build.sh tsan --cxx-test client-test --gtest_filter 'ForceMasterLookup/ClientTestForceMasterLookup.TestConcurrentLookups/1' && \
  ./yb_build.sh tsan --cxx-test load_balancer_colocated_tables-test --gtest_filter 'LoadBalancerLegacyColocatedDBColocatedTablesTest.GlobalLoadBalancingWithLegacyColocatedDBColocatedTables'
```

Reviewers: asrivastava

Reviewed By: asrivastava

Subscribers: ybase, slingam

Differential Revision: https://phorge.dev.yugabyte.com/D38139
3 participants