Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SEGFAULT] Querying distributed tables with window partition causes segmentation fault #7705

Open
superbug1000 opened this issue Oct 15, 2024 · 6 comments

Comments

@superbug1000
Copy link

Running the following SQL query always causes a segmentation fault in Citus 12 with Postgres 16:

CREATE TABLE gianluca_camp_test(
    id BYTEA,
    account_id BYTEA NOT NULL,
    start_timestamp INT,
    PRIMARY KEY(account_id, id)
);

SELECT create_distributed_table('gianluca_acc_test', 'id');
SELECT create_distributed_table('gianluca_camp_test', 'account_id');

INSERT INTO gianluca_acc_test VALUES ('\x5b3f8be2f3fd8f25cf402b46'),('\x5aafffc481d8c13b05606f03');
INSERT INTO gianluca_camp_test VALUES
('\x3b3f8be2f3fd8f25cf402123','\x5aafffc481d8c13b05606f03',10),
('\x3b3f8be2f3fd8f25cf402456','\x5b3f8be2f3fd8f25cf402b46',20),
('\x3b3f8be2f3fd8f25cf402789','\x5aafffc481d8c13b05606f03',NULL);

SELECT
    DISTINCT "gianluca_acc_test"."id",
    MAX("gianluca_camp_test"."start_timestamp") OVER (PARTITION BY "gianluca_camp_test"."id")
FROM
    "gianluca_acc_test"
    LEFT OUTER JOIN "gianluca_camp_test" ON "gianluca_acc_test"."id" = "gianluca_camp_test"."account_id"
WHERE
    "gianluca_acc_test"."id" IN (
        '\x5b3f8be2f3fd8f25cf402b46',
        '\x5aafffc481d8c13b05606f03'
    );

The query executes successfully if:

  • The window partition is removed
  • Only one of the 2 IDs is included in the filter
  • Postgres 14 is used
  • The tables are not distributed
  • The join is removed, like so:
SELECT
    DISTINCT "gianluca_camp_test"."account_id",
    MAX("gianluca_camp_test"."start_timestamp") OVER (PARTITION BY "gianluca_camp_test"."id")
FROM
    "gianluca_camp_test"
WHERE
    "gianluca_camp_test"."account_id" IN (
        '\x5b3f8be2f3fd8f25cf402b46',
        '\x5aafffc481d8c13b05606f03'
    );

Logs:

2024-10-15 08:13:43.555 UTC [8571] LOG:  server process (PID 12396) was terminated by signal 11: Segmentation fault
2024-10-15 08:13:43.555 UTC [8571] DETAIL:  Failed process was running: SELECT
            DISTINCT "gianluca_acc_test"."id",
            MAX("gianluca_camp_test"."start_timestamp") OVER (PARTITION BY encode("gianluca_camp_test"."id", 'hex'))
        FROM
            "gianluca_acc_test"
            LEFT OUTER JOIN "gianluca_camp_test" ON "gianluca_acc_test"."id" = "gianluca_camp_test"."account_id"
        WHERE
            "gianluca_acc_test"."id" IN (
                '\x5b3f8be2f3fd8f25cf402b46',
                '\x5aafffc481d8c13b05606f03'
            );
2024-10-15 08:13:43.555 UTC [8571] LOG:  terminating any other active server processes
2024-10-15 08:13:43.556 UTC [12428] postgres@postgres FATAL:  the database system is in recovery mode
2024-10-15 08:13:43.558 UTC [8571] LOG:  all server processes terminated; reinitializing
2024-10-15 08:13:43.599 UTC [12429] LOG:  database system was interrupted; last known up at 2024-10-15 08:09:03 UTC
2024-10-15 08:13:44.085 UTC [12429] LOG:  database system was not properly shut down; automatic recovery in progress
2024-10-15 08:13:44.093 UTC [12429] LOG:  redo starts at 0/21B52F0
2024-10-15 08:13:44.097 UTC [12429] LOG:  invalid record length at 0/22BC3B0: expected at least 24, got 0
2024-10-15 08:13:44.097 UTC [12429] LOG:  redo done at 0/22BC378 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2024-10-15 08:13:44.112 UTC [12430] LOG:  checkpoint starting: end-of-recovery immediate wait
2024-10-15 08:13:44.223 UTC [12430] LOG:  checkpoint complete: wrote 182 buffers (1.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.015 s, sync=0.069 s, total=0.120 s; sync files=154, longest=0.007 s, average=0.001 s; distance=1052 kB, estimate=1052 kB; lsn=0/22BC3B0, redo lsn=0/22BC3B0
@matikkane
Copy link

the versions used on Ubuntu 22.04 are:

ii  postgresql-16-citus-12.1-dbgsym  12.1.5.citus-1                          amd64        debug symbols for postgresql-16-citus-12.1
ii  postgresql-16-dbgsym             16.4-1.pgdg22.04+2                      amd64        debug symbols for postgresql-16

and here's the stack trace for the segfault, triggered by the test described by @superbug1000 above:

 continue
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0000555afa8bae34 in get_eclass_for_sort_expr (root=0x555afc3589c0, expr=0x555afc359218, opfamilies=0x555afc359c68, opcintype=<optimized out>, collation=<optimized out>, sortref=<optimized out>, rel=0x0, create_it=true) at optimizer/path/./build/../src/backend/optimizer/path/equivclass.c:737
737	optimizer/path/./build/../src/backend/optimizer/path/equivclass.c: No such file or directory.
(gdb) bt
#0  0x0000555afa8bae34 in get_eclass_for_sort_expr (root=0x555afc3589c0, expr=0x555afc359218, opfamilies=0x555afc359c68, opcintype=<optimized out>, collation=<optimized out>,
    sortref=<optimized out>, rel=0x0, create_it=true) at optimizer/path/./build/../src/backend/optimizer/path/equivclass.c:737
#1  0x0000555afa8ca5d0 in make_pathkey_from_sortinfo (root=0x555afc3589c0, expr=0x555afc359218, opfamily=1976, opcintype=23, collation=0, reverse_sort=<optimized out>,
    nulls_first=false, sortref=2, rel=0x0, create_it=true) at optimizer/path/./build/../src/backend/optimizer/path/pathkeys.c:233
#2  0x0000555afa8cb147 in make_pathkey_from_sortop (create_it=true, sortref=2, nulls_first=<optimized out>, ordering_op=<optimized out>, expr=0x555afc359218, root=0x555afc3589c0)
    at optimizer/path/./build/../src/backend/optimizer/path/pathkeys.c:275
#3  make_pathkeys_for_sortclauses_extended (root=0x555afc3589c0, sortclauses=0x555afc358bc0, tlist=0x555afc3590f8, remove_redundant=true, sortable=0x7ffe58dbf550)
    at optimizer/path/./build/../src/backend/optimizer/path/pathkeys.c:1190
#4  0x0000555afa8e1c5a in standard_qp_callback (root=<optimized out>, extra=<optimized out>) at optimizer/plan/./build/../src/backend/optimizer/plan/planner.c:3490
#5  0x0000555afa8d9da3 in query_planner (root=0x555afc3589c0, qp_callback=0x555afa8e1b00 <standard_qp_callback>, qp_extra=0x7ffe58dbf6d0)
    at optimizer/plan/./build/../src/backend/optimizer/plan/planmain.c:207
#6  0x0000555afa8dc5df in grouping_planner (tuple_fraction=<optimized out>, root=<optimized out>) at optimizer/plan/./build/../src/backend/optimizer/plan/planner.c:1495
#7  subquery_planner (glob=<optimized out>, parse=<optimized out>, parent_root=<optimized out>, hasRecursion=<optimized out>, tuple_fraction=<optimized out>)
    at optimizer/plan/./build/../src/backend/optimizer/plan/planner.c:1064
#8  0x0000555afa8de3c1 in standard_planner (parse=parse@entry=0x555afc357978, query_string=query_string@entry=0x0, cursorOptions=cursorOptions@entry=0, boundParams=boundParams@entry=0x0)
    at optimizer/plan/./build/../src/backend/optimizer/plan/planner.c:413
#9  0x00007f41fd9e8420 in BuildSelectStatementViaStdPlanner (combineQuery=combineQuery@entry=0x555afc357978, remoteScanTargetList=remoteScanTargetList@entry=0x555afc358358,
    remoteScan=remoteScan@entry=0x555afc357b88) at ./src/backend/distributed/planner/combine_query_planner.c:304
#10 0x00007f41fd9ea890 in PlanCombineQuery (distributedPlan=<optimized out>, distributedPlan=<optimized out>, remoteScan=0x555afc357b88)
    at ./src/backend/distributed/planner/combine_query_planner.c:117
#11 FinalizeNonRouterPlan (distributedPlan=<optimized out>, distributedPlan=<optimized out>, customScan=0x555afc357b88, localPlan=0x555afc3062b8)
    at ./src/backend/distributed/planner/distributed_planner.c:1489
#12 FinalizePlan (localPlan=0x555afc3062b8, distributedPlan=<optimized out>) at ./src/backend/distributed/planner/distributed_planner.c:1470
#13 0x00007f41fd9ee8e1 in CreateDistributedPlannedStmt (planContext=planContext@entry=0x7ffe58dbfb50) at ./src/backend/distributed/planner/distributed_planner.c:800
#14 0x00007f41fd9eec4e in PlanDistributedStmt (rteIdCounter=<optimized out>, planContext=0x7ffe58dbfb50) at ./src/backend/distributed/planner/distributed_planner.c:685
#15 distributed_planner (parse=0x555afc176f60,
    query_string=0x555afc1754d0 "SELECT\n    DISTINCT \"gianluca_acc_test\".\"id\",\n    MAX(\"gianluca_camp_test\".\"start_timestamp\") OVER (PARTITION BY \"gianluca_camp_test\".\"id\")\nFROM\n    \"gianluca_acc_test\"\n    LEFT OUTER JOIN \"gianluca_c"..., cursorOptions=<optimized out>, boundParams=<optimized out>)
    at ./src/backend/distributed/planner/distributed_planner.c:277
#16 0x0000555afa9be7b9 in planner (boundParams=<optimized out>, cursorOptions=<optimized out>,
    query_string=0x555afc1754d0 "SELECT\n    DISTINCT \"gianluca_acc_test\".\"id\",\n    MAX(\"gianluca_camp_test\".\"start_timestamp\") OVER (PARTITION BY \"gianluca_camp_test\".\"id\")\nFROM\n    \"gianluca_acc_test\"\n    LEFT OUTER JOIN \"gianluca_c"..., parse=0x555afc176f60) at optimizer/plan/./build/../src/backend/optimizer/plan/planner.c:279
#17 pg_plan_query (querytree=0x555afc176f60,
    query_string=query_string@entry=0x555afc1754d0 "SELECT\n    DISTINCT \"gianluca_acc_test\".\"id\",\n    MAX(\"gianluca_camp_test\".\"start_timestamp\") OVER (PARTITION BY \"gianluca_camp_test\".\"id\")\nFROM\n    \"gianluca_acc_test\"\n    LEFT OUTER JOIN \"gianluca_c"..., cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0)
    at tcop/./build/../src/backend/tcop/postgres.c:906
#18 0x0000555afa9be8aa in pg_plan_queries (querytrees=0x555afc2e37c8,
    query_string=query_string@entry=0x555afc1754d0 "SELECT\n    DISTINCT \"gianluca_acc_test\".\"id\",\n    MAX(\"gianluca_camp_test\".\"start_timestamp\") OVER (PARTITION BY \"gianluca_camp_test\".\"id\")\nFROM\n    \"gianluca_acc_test\"\n    LEFT OUTER JOIN \"gianluca_c"..., cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0)
    at tcop/./build/../src/backend/tcop/postgres.c:998
#19 0x0000555afa9c2d65 in exec_simple_query (
    query_string=0x555afc1754d0 "SELECT\n    DISTINCT \"gianluca_acc_test\".\"id\",\n    MAX(\"gianluca_camp_test\".\"start_timestamp\") OVER (PARTITION BY \"gianluca_camp_test\".\"id\")\nFROM\n    \"gianluca_acc_test\"\n    LEFT OUTER JOIN \"gianluca_c"...) at tcop/./build/../src/backend/tcop/postgres.c:1195
#20 0x0000555afa9c5bc7 in PostgresMain (dbname=<optimized out>, username=<optimized out>) at tcop/./build/../src/backend/tcop/postgres.c:4718
#21 0x0000555afa92e475 in BackendRun (port=0x555afc20fec0, port=0x555afc20fec0) at postmaster/./build/../src/backend/postmaster/postmaster.c:4464
#22 BackendStartup (port=0x555afc20fec0) at postmaster/./build/../src/backend/postmaster/postmaster.c:4192
#23 ServerLoop () at postmaster/./build/../src/backend/postmaster/postmaster.c:1782
#24 0x0000555afa924a20 in PostmasterMain (argc=5, argv=<optimized out>) at postmaster/./build/../src/backend/postmaster/postmaster.c:1466
#25 0x0000555afa5e9d5d in main (argc=5, argv=0x555afc0e12d0) at main/./build/../src/backend/main/main.c:198

@matikkane
Copy link

FYI - tested with Citus 12.1 & PG 15, no segfault there.

@nilroy
Copy link

nilroy commented Oct 15, 2024

Also it works if the tables in question has only 1 shard

  • Create the distributed tables like below
DROP TABLE gianluca_acc_test;
DROP TABLE gianluca_camp_test;
CREATE TABLE gianluca_acc_test(
    id BYTEA PRIMARY KEY
);
CREATE TABLE gianluca_camp_test(
    id BYTEA,
    account_id BYTEA NOT NULL,
    start_timestamp INT,
    PRIMARY KEY(account_id, id)
);

SELECT create_distributed_table('gianluca_acc_test', 'id', shard_count => 1);
SELECT create_distributed_table('gianluca_camp_test', 'account_id', shard_count => 1);
  • insert the data and test
INSERT INTO gianluca_acc_test VALUES ('\x5b3f8be2f3fd8f25cf402b46'),('\x5aafffc481d8c13b05606f03');
INSERT INTO gianluca_camp_test VALUES
('\x3b3f8be2f3fd8f25cf402123','\x5aafffc481d8c13b05606f03',10),
('\x3b3f8be2f3fd8f25cf402456','\x5b3f8be2f3fd8f25cf402b46',20),
('\x3b3f8be2f3fd8f25cf402789','\x5aafffc481d8c13b05606f03',NULL);

SELECT
    DISTINCT "gianluca_acc_test"."id",
    MAX("gianluca_camp_test"."start_timestamp") OVER (PARTITION BY encode("gianluca_camp_test"."id", 'hex'))
FROM
    "gianluca_acc_test"
    LEFT OUTER JOIN "gianluca_camp_test" ON "gianluca_acc_test"."id" = "gianluca_camp_test"."account_id"
WHERE
    "gianluca_acc_test"."id" IN (
        '\x5b3f8be2f3fd8f25cf402b46',
        '\x5aafffc481d8c13b05606f03'
    );
    
              id             | max
----------------------------+-----
 \x5aafffc481d8c13b05606f03 |  10
 \x5aafffc481d8c13b05606f03 |
 \x5b3f8be2f3fd8f25cf402b46 |  20
(3 rows)

Here we face no segfault.

@agedemenli
Copy link
Contributor

Worked with @thanodnl and found that varnullingrels is causing the issue.
Looking into bt, the last Citus function seems to be BuildSelectStatementViaStdPlanner,
which is calling standard_planner(combineQuery, NULL, 0, NULL)

Running the same query for PG 16.1 and PG 14.5, combineQuery is same for both pg versions in the logs:

POSTGRES 16.1
SELECT DISTINCT id, max(max) OVER (PARTITION BY worker_column_3) AS max FROM pg_catalog.citus_extradata_container(10, NULL::cstring(0), NULL::cstring(0), NULL::cstring(0)) remote_scan(id bytea, max integer, worker_column_3 bytea)

POSTGRES 14.5
SELECT DISTINCT id, max(max) OVER (PARTITION BY worker_column_3) AS max FROM pg_catalog.citus_extradata_container(10, NULL::cstring(0), NULL::cstring(0), NULL::cstring(0)) remote_scan(id bytea, max integer, worker_column_3 bytea)

To debug further, put a breakpoint on combine_query_planner.c:304, this line:
standardStmt = standard_planner(combineQuery, NULL, 0, NULL);

then we can see the entire object combineQuery using pprint.
Type p pprint(combineQuery) on gdb and it will be printed in the coordinator logs, like:

   {QUERY
   :commandType 1
   :querySource 0
   :canSetTag true
   :utilityStmt <>
   **<shortening it for readability>**
   :isReturn false
   :cteList <>
   :rtable (
      {RANGETBLENTRY
      :alias <>
   **<shortening it for readability>**
         :funccoltypes (o 17 23 17)
         :funccoltypmods (i -1 -1 -1)
         :funccolcollations (o 0 0 0)
         :funcparams (b)
         }
      )
      :funcordinality false
      :lateral false
      :inh false
      :inFromCl true
      :securityQuals <>
      }
   )
   :rteperminfos <>
   :jointree
   **<shortening it for readability>**
      }
   :mergeActionList <>
   :mergeUseOuterJoin false
   :targetList (
      {TARGETENTRY
      :expr
         {VAR
         :varno 1
         :varattno 1
         :vartype 17
         :vartypmod -1
         :varcollid 0
         :varnullingrels (b)
         :varlevelsup 0
         :varnosyn 1
         :varattnosyn 1
         :location -1
         }
      :resno 1
      :resname id
      :ressortgroupref 1
      :resorigtbl 17262
      :resorigcol 1
      :resjunk false
      }
      {TARGETENTRY
      :expr
         {WINDOWFUNC
         :winfnoid 2116
         :wintype 23
         :wincollid 0
         :inputcollid 0
         :args (
            {VAR
            :varno 1
            :varattno 2
            :vartype 23
            :vartypmod -1
            :varcollid 0
            :varnullingrels (b 3)  // **this is the culprit**
            :varlevelsup 0
            :varnosyn 2
            :varattnosyn 3
            :location 54
            }
         )
         :aggfilter <>
         :winref 1
         :winstar false
         :winagg true
         :location 50
         }
      :resno 2
      :resname max
      :ressortgroupref 2
      :resorigtbl 0
      :resorigcol 0
      :resjunk false
      }
      {TARGETENTRY
      :expr
         {VAR
         :varno 1
         :varattno 3
         :vartype 17
         :vartypmod -1
         :varcollid 0
         :varnullingrels (b)
         :varlevelsup 0
         :varnosyn 1
         :varattnosyn 3
         :location -1
         }
      :resno 3
      :resname <>
      :ressortgroupref 3
      :resorigtbl 0
      :resorigcol 0
      :resjunk true
      }
   )
   :override 0
   :onConflict <>
   :returningList <>
   :groupClause <>
   **<shortening it for readability>**
   :setOperations <>
   :constraintDeps <>
   :withCheckOptions <>
   :stmt_location 0
   :stmt_len 0
   }

varnullingrels is introduced in PG 16 I believe. So when running with PG14, it doesn't show up there.
On PG16, we somehow pass it to standard_planner incorrectly and the segfault happens.
Run below command in gdb to set it to null

(gdb) set ((Var *)(((WindowFunc *)(((TargetEntry *)(combineQuery->targetList->elements[1]->ptr_value))->expr))->args->elements[0]))->varnullingrels=NULL

This is not a suggestion/workaround for the users facing the issue. It is for documenting what we have found, for further debugging and fixing.

Now, after continuing gdb, the query should be executed without segfault, giving the correct output.
We should find out how varnullingrels gets added there, where it comes from, and fix it.

Assertion failure case, in case it helps somehow:
If you try to reproduce it in dev environment, you will get an assertion failure, instead of a segfault, here:
https://github.com/postgres/postgres/blob/master/src/backend/optimizer/path/equivclass.c#L738

in PG14 however, this line was like
Assert(rel->reloptkind == RELOPT_BASEREL || rel->reloptkind == RELOPT_DEADREL); so it wouldn't fail. For some reason, RELOPT_DEADREL is removed before PG16 and now this assertion fails.

Disable assertions first to get the segfault.

@thanodnl
Copy link
Member

thanodnl commented Oct 18, 2024

Note; the removed RELOPT_DEADREL is less likely to be of significance. When the assert hits we are reading. simple_rel_array indexed at 3 while it's length is 2, which is probably the cause of the segfault in production, except we hit it on an assert in dev.

The 3 comes from the varnullingrels that gets combined into relids with pull_varno(...) earlier in the function causing the segfault. Hence editing varnullingrels with gdb preventing the segfault/assert and running query as expected.

@matikkane
Copy link

Respectfully asking @thanodnl @agedemenli - any estimates for the timeline of producing a patch?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants