Skip to content

[BUG] NPE happends when replay colocate join index #5390

@wangbo

Description

@wangbo

code version: 0.13.11
Describe the bug

2021-02-18 10:27:29,961 ERROR (replayer|75) [EditLog.loadJournal():816] Operation Type 94
java.lang.NullPointerException: null
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:878) ~[spark-dpp-1.0.0.jar:1.0.0]
        at org.apache.doris.catalog.ColocateTableIndex.replayAddTableToGroup(ColocateTableIndex.java:444) ~[palo-fe.jar:3.4.0]
        at org.apache.doris.persist.EditLog.loadJournal(EditLog.java:628) [palo-fe.jar:3.4.0]
        at org.apache.doris.catalog.Catalog.replayJournal(Catalog.java:2563) [palo-fe.jar:3.4.0]
        at org.apache.doris.catalog.Catalog$3.runOneCycle(Catalog.java:2347) [palo-fe.jar:3.4.0]
        at org.apache.doris.common.util.Daemon.run(Daemon.java:116) [palo-fe.jar:3.4.0]

From the above log we can see, when replay a ColocatePersistInfo, its table id can't be find in Database, then FE crash.

//master fe.log, the same table create successfully twice
// request 1:
2021-02-18 10:17:11,823 INFO (thrift-server-pool-983|56147) [Catalog.createOlapTable():3924] successfully create table[app_rt_rider_accept_waybill_rider_offset;58010577]
// request 2:
2021-02-18 10:17:25,256 INFO (thrift-server-pool-1200|155296) [Catalog.createOlapTable():3924] successfully create table[app_rt_rider_accept_waybill_rider_offset;58016408]

// master fe.log,  master receive create table forward request from the same follower
// request 1:
2021-02-18 10:16:45,427 DEBUG (thrift-server-pool-983|56147) [FrontendServiceImpl.forward():592] receive forwarded stmt 20766895 from FE: xxx
2021-02-18 10:16:45,427 INFO (thrift-server-pool-983|56147) [StmtExecutor.analyze():399] begin to analyze stmt: 5074185, forwarded stmt id: 20766895
// request 2:
2021-02-18 10:17:05,450 DEBUG (thrift-server-pool-1200|155296) [FrontendServiceImpl.forward():592] receive forwarded stmt 20766895 from FE: xxx
2021-02-18 10:17:05,451 INFO (thrift-server-pool-1200|155296) [StmtExecutor.analyze():399] begin to analyze stmt: 5074223, forwarded stmt id: 20766895

// follower fe.log, receive create table request from client
2021-02-18 10:16:45,426 INFO (doris-mysql-nio-pool-19139|472265) [StmtExecutor.analyze():399] begin to analyze stmt: 20766895, forwarded stmt id: 0
2021-02-18 10:16:45,427 INFO (doris-mysql-nio-pool-19139|472265) [MasterOpExecutor.forward():98] Forward statement 20766895 to Master TNetworkAddress(hostname:xxx, port:xxx)

From the log above, we can conclude the bug reason:
1 A Follower receive a create table request from client.
2 The Follower forward the request to Master.
3 Master response throw exception because of unknown reason, so the Follower try to forward twice.
4 Master request the same create table request twice, the second request create a new table, but didn't write table to Edit log. And ColocatePersistInfo write to Edit log twice.
5 When replay ColocatePersistInfo, it can't find its table id, then NPE happends;

Metadata

Metadata

Assignees

No one assigned

    Labels

    kind/fixCategorizes issue or PR as related to a bug.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions