DROP SCHEMA fails with "cache lookup failed" #5099

Closed
SaitTalhaNisanci opened this issue Jul 7, 2021 · 6 comments · Fixed by #5372

Comments

@SaitTalhaNisanci
Contributor

CREATE SCHEMA k;
SET search_path to 'k';
CREATE TYPE comp_type AS (
    int_field_1 BIGINT,
    int_field_2 BIGINT
);

CREATE TABLE range_dist_table_2 (dist_col comp_type);
SELECT create_distributed_table('range_dist_table_2', 'dist_col', 'range');

[local] talha@talha:9700-25893=# DROP schema k CASCADE;
NOTICE:  drop cascades to 2 other objects
DETAIL:  drop cascades to type comp_type
drop cascades to table range_dist_table_2
ERROR:  cache lookup failed for type 16934
CONTEXT:  SQL statement "SELECT master_remove_distributed_table_metadata_from_workers(v_obj.objid, v_obj.schema_name, v_obj.object_name)"
PL/pgSQL function citus_drop_trigger() line 15 at PERFORM
Time: 62.606 ms

The workaround is to drop the table first and then drop the schema.

@onderkalaci
Member

@SaitTalhaNisanci, can you reproduce this on earlier versions? If so, this becomes a high-priority issue. Also, does it affect only range-distributed tables?

This seems related to #3741.

@SaitTalhaNisanci
Contributor Author

@onderkalaci, it seems to happen even on 9.5.

I think it is mostly about using a composite type, so I suspect it will happen with hash-distributed tables too. I will check and update here.

@onderkalaci
Member

I tried with hash-distributed tables, and it works fine. Otherwise, I think we would have noticed such a critical issue earlier.

@hanefi
Member

hanefi commented Oct 11, 2021

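I can reproduce this issue on both hash- and range-distributed tables. It appears that we try to build a CitusTableCacheEntry and fail to access the type in the schema that is about to be dropped. In the stack trace below, the lookup fails in getBaseType, which is reached via BuildCachedShardList → GetFunctionInfo → GetDefaultOpClass for the distribution column's type: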

errstart(int elevel, const char * domain) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\utils\error\elog.c:341)
errstart_cold(int elevel, const char * domain) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\utils\error\elog.c:326)
getBaseTypeAndTypmod(Oid typid, int32 * typmod) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\utils\cache\lsyscache.c:2497)
getBaseType(Oid typid) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\utils\cache\lsyscache.c:2472)
GetDefaultOpClass(Oid type_id, Oid am_id) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\commands\indexcmds.c:2112)
citus.so!GetFunctionInfo(Oid typeId, Oid accessMethodId, int16 procedureId) (\home\hanefi\code\citus\src\backend\distributed\worker\worker_partition_protocol.c:340)
citus.so!BuildCachedShardList(CitusTableCacheEntry * cacheEntry) (\home\hanefi\code\citus\src\backend\distributed\metadata\metadata_cache.c:1467)
citus.so!BuildCitusTableCacheEntry(Oid relationId) (\home\hanefi\code\citus\src\backend\distributed\metadata\metadata_cache.c:1351)
citus.so!LookupCitusTableCacheEntry(Oid relationId) (\home\hanefi\code\citus\src\backend\distributed\metadata\metadata_cache.c:1157)
citus.so!IsCitusTable(Oid relationId) (\home\hanefi\code\citus\src\backend\distributed\metadata\metadata_cache.c:409)
citus.so!MasterRemoveDistributedTableMetadataFromWorkers(Oid relationId, char * schemaName, char * tableName) (\home\hanefi\code\citus\src\backend\distributed\commands\drop_distributed_table.c:137)
citus.so!master_remove_distributed_table_metadata_from_workers(FunctionCallInfo fcinfo) (\home\hanefi\code\citus\src\backend\distributed\commands\drop_distributed_table.c:111)
ExecInterpExpr(ExprState * state, ExprContext * econtext, _Bool * isnull) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\executor\execExprInterp.c:749)
ExecInterpExprStillValid(ExprState * state, ExprContext * econtext, _Bool * isNull) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\executor\execExprInterp.c:1824)
ExecEvalExprSwitchContext(_Bool * isNull, ExprContext * econtext, ExprState * state) (\home\hanefi\.pgenv\src\postgresql-14.0\src\include\executor\executor.h:339)
ExecProject(ProjectionInfo * projInfo) (\home\hanefi\.pgenv\src\postgresql-14.0\src\include\executor\executor.h:373)
ExecResult(PlanState * pstate) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\executor\nodeResult.c:136)
ExecProcNodeFirst(PlanState * node) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\executor\execProcnode.c:463)
ExecProcNode(PlanState * node) (\home\hanefi\.pgenv\src\postgresql-14.0\src\include\executor\executor.h:257)
ExecutePlan(EState * estate, PlanState * planstate, _Bool use_parallel_mode, CmdType operation, _Bool sendTuples, uint64 numberTuples, ScanDirection direction, DestReceiver * dest, _Bool execute_once) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\executor\execMain.c:1551)
standard_ExecutorRun(QueryDesc * queryDesc, ScanDirection direction, uint64 count, _Bool execute_once) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\executor\execMain.c:361)
citus.so!CitusExecutorRun(QueryDesc * queryDesc, ScanDirection direction, uint64 count, _Bool execute_once) (\home\hanefi\code\citus\src\backend\distributed\executor\multi_executor.c:214)
pg_stat_statements.so!pgss_ExecutorRun(QueryDesc * queryDesc, ScanDirection direction, uint64 count, _Bool execute_once) (\home\hanefi\.pgenv\src\postgresql-14.0\contrib\pg_stat_statements\pg_stat_statements.c:1001)
ExecutorRun(QueryDesc * queryDesc, ScanDirection direction, uint64 count, _Bool execute_once) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\executor\execMain.c:303)
_SPI_pquery(QueryDesc * queryDesc, _Bool fire_triggers, uint64 tcount) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\executor\spi.c:2760)
_SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI, Snapshot snapshot, Snapshot crosscheck_snapshot, _Bool read_only, _Bool allow_nonatomic, _Bool fire_triggers, uint64 tcount, DestReceiver * caller_dest, ResourceOwner plan_owner) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\executor\spi.c:2532)
SPI_execute_plan_with_paramlist(SPIPlanPtr plan, ParamListInfo params, _Bool read_only, long tcount) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\executor\spi.c:651)
plpgsql.so!exec_run_select(PLpgSQL_execstate * estate, PLpgSQL_expr * expr, long maxtuples, Portal * portalP) (\home\hanefi\.pgenv\src\postgresql-14.0\src\pl\plpgsql\src\pl_exec.c:5762)
plpgsql.so!exec_stmt_perform(PLpgSQL_execstate * estate, PLpgSQL_stmt_perform * stmt) (\home\hanefi\.pgenv\src\postgresql-14.0\src\pl\plpgsql\src\pl_exec.c:2139)
plpgsql.so!exec_stmts(PLpgSQL_execstate * estate, List * stmts) (\home\hanefi\.pgenv\src\postgresql-14.0\src\pl\plpgsql\src\pl_exec.c:1991)
plpgsql.so!exec_for_query(PLpgSQL_execstate * estate, PLpgSQL_stmt_forq * stmt, Portal portal, _Bool prefetch_ok) (\home\hanefi\.pgenv\src\postgresql-14.0\src\pl\plpgsql\src\pl_exec.c:5915)
plpgsql.so!exec_stmt_fors(PLpgSQL_execstate * estate, PLpgSQL_stmt_fors * stmt) (\home\hanefi\.pgenv\src\postgresql-14.0\src\pl\plpgsql\src\pl_exec.c:2796)
plpgsql.so!exec_stmts(PLpgSQL_execstate * estate, List * stmts) (\home\hanefi\.pgenv\src\postgresql-14.0\src\pl\plpgsql\src\pl_exec.c:2023)
plpgsql.so!exec_stmt_block(PLpgSQL_execstate * estate, PLpgSQL_stmt_block * block) (\home\hanefi\.pgenv\src\postgresql-14.0\src\pl\plpgsql\src\pl_exec.c:1910)
plpgsql.so!exec_toplevel_block(PLpgSQL_execstate * estate, PLpgSQL_stmt_block * block) (\home\hanefi\.pgenv\src\postgresql-14.0\src\pl\plpgsql\src\pl_exec.c:1608)
plpgsql.so!plpgsql_exec_event_trigger(PLpgSQL_function * func, EventTriggerData * trigdata) (\home\hanefi\.pgenv\src\postgresql-14.0\src\pl\plpgsql\src\pl_exec.c:1182)
plpgsql.so!plpgsql_call_handler(FunctionCallInfo fcinfo) (\home\hanefi\.pgenv\src\postgresql-14.0\src\pl\plpgsql\src\pl_handler.c:272)
fmgr_security_definer(FunctionCallInfo fcinfo) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\utils\fmgr\fmgr.c:746)
EventTriggerInvoke(List * fn_oid_list, EventTriggerData * trigdata) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\commands\event_trigger.c:920)
EventTriggerSQLDrop(Node * parsetree) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\commands\event_trigger.c:791)
ProcessUtilitySlow(ParseState * pstate, PlannedStmt * pstmt, const char * queryString, ProcessUtilityContext context, ParamListInfo params, QueryEnvironment * queryEnv, DestReceiver * dest, QueryCompletion * qc) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\tcop\utility.c:1906)
standard_ProcessUtility(PlannedStmt * pstmt, const char * queryString, _Bool readOnlyTree, ProcessUtilityContext context, ParamListInfo params, QueryEnvironment * queryEnv, DestReceiver * dest, QueryCompletion * qc) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\tcop\utility.c:978)
citus.so!ProcessUtilityInternal(PlannedStmt * pstmt, const char * queryString, ProcessUtilityContext context, ParamListInfo params, struct QueryEnvironment * queryEnv, DestReceiver * dest, QueryCompletion * completionTag) (\home\hanefi\code\citus\src\backend\distributed\commands\utility_hook.c:582)
citus.so!multi_ProcessUtility(PlannedStmt * pstmt, const char * queryString, _Bool readOnlyTree, ProcessUtilityContext context, ParamListInfo params, struct QueryEnvironment * queryEnv, DestReceiver * dest, QueryCompletion * completionTag) (\home\hanefi\code\citus\src\backend\distributed\commands\utility_hook.c:264)
citus.so!ColumnarProcessUtility(PlannedStmt * pstmt, const char * queryString, _Bool readOnlyTree, ProcessUtilityContext context, ParamListInfo params, struct QueryEnvironment * queryEnv, DestReceiver * dest, QueryCompletion * completionTag) (\home\hanefi\code\citus\src\backend\columnar\columnar_tableam.c:2117)
pg_stat_statements.so!pgss_ProcessUtility(PlannedStmt * pstmt, const char * queryString, _Bool readOnlyTree, ProcessUtilityContext context, ParamListInfo params, QueryEnvironment * queryEnv, DestReceiver * dest, QueryCompletion * qc) (\home\hanefi\.pgenv\src\postgresql-14.0\contrib\pg_stat_statements\pg_stat_statements.c:1131)
ProcessUtility(PlannedStmt * pstmt, const char * queryString, _Bool readOnlyTree, ProcessUtilityContext context, ParamListInfo params, QueryEnvironment * queryEnv, DestReceiver * dest, QueryCompletion * qc) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\tcop\utility.c:523)
PortalRunUtility(Portal portal, PlannedStmt * pstmt, _Bool isTopLevel, _Bool setHoldSnapshot, DestReceiver * dest, QueryCompletion * qc) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\tcop\pquery.c:1147)
PortalRunMulti(Portal portal, _Bool isTopLevel, _Bool setHoldSnapshot, DestReceiver * dest, DestReceiver * altdest, QueryCompletion * qc) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\tcop\pquery.c:1304)
PortalRun(Portal portal, long count, _Bool isTopLevel, _Bool run_once, DestReceiver * dest, DestReceiver * altdest, QueryCompletion * qc) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\tcop\pquery.c:786)
exec_simple_query(const char * query_string) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\tcop\postgres.c:1214)
PostgresMain(int argc, char ** argv, const char * dbname, const char * username) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\tcop\postgres.c:4486)
BackendRun(Port * port) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\postmaster\postmaster.c:4506)
BackendStartup(Port * port) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\postmaster\postmaster.c:4228)
ServerLoop() (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\postmaster\postmaster.c:1745)
PostmasterMain(int argc, char ** argv) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\postmaster\postmaster.c:1417)
main(int argc, char ** argv) (\home\hanefi\.pgenv\src\postgresql-14.0\src\backend\main\main.c:209)

@hanefi hanefi self-assigned this Oct 11, 2021
@hanefi
Member

hanefi commented Oct 11, 2021

I created a small patch that prints the objects passed as parameters to master_remove_distributed_table_metadata_from_workers:

diff --git a/src/backend/distributed/sql/udfs/citus_drop_trigger/latest.sql b/src/backend/distributed/sql/udfs/citus_drop_trigger/latest.sql
index a440766ed..666dad843 100644
--- a/src/backend/distributed/sql/udfs/citus_drop_trigger/latest.sql
+++ b/src/backend/distributed/sql/udfs/citus_drop_trigger/latest.sql
@@ -11,6 +11,7 @@ BEGIN
     FOR v_obj IN SELECT * FROM pg_event_trigger_dropped_objects()
                  WHERE object_type IN ('table', 'foreign table')
     LOOP
+         RAISE NOTICE 'will drop object with [objid=% schema_name=% object_name=%]', v_obj.objid, v_obj.schema_name, v_obj.object_name;
         -- first drop the table and metadata on the workers
         -- then drop all the shards on the workers
         -- finally remove the pg_dist_partition entry on the coordinator
postgres=# DROP SCHEMA k cascade ;
DEBUG:  drop auto-cascades to composite type comp_type
DEBUG:  drop auto-cascades to type comp_type[]
DEBUG:  drop auto-cascades to type range_dist_table_2
DEBUG:  drop auto-cascades to type range_dist_table_2[]
DEBUG:  drop auto-cascades to toast table pg_toast.pg_toast_16921
DEBUG:  drop auto-cascades to index pg_toast.pg_toast_16921_index
DEBUG:  drop auto-cascades to trigger truncate_trigger_16926 on table range_dist_table_2
NOTICE:  drop cascades to 2 other objects
DETAIL:  drop cascades to type comp_type
drop cascades to table range_dist_table_2
DEBUG:  EventTriggerInvoke 16619
NOTICE:  will drop object with [objid=16921 schema_name=k object_name=range_dist_table_2]
ERROR:  cache lookup failed for type 16920
CONTEXT:  SQL statement "SELECT master_remove_distributed_table_metadata_from_workers(v_obj.objid, v_obj.schema_name, v_obj.object_name)"
PL/pgSQL function citus_drop_trigger() line 14 at PERFORM
postgres=# SELECT typname FROM pg_type where oid=16920;
  typname
-----------
 comp_type
(1 row)

@naisila
Member

naisila commented Sep 30, 2022

#5780 is a duplicate of this issue. The reason we don't experience the problem with the range-distributed table is that the example given in this issue doesn't create any shards:

CREATE SCHEMA k;
SET search_path to 'k';
CREATE TYPE comp_type AS (
    int_field_1 BIGINT,
    int_field_2 BIGINT
);

CREATE TABLE range_dist_table_2 (dist_col comp_type);
SELECT create_distributed_table('range_dist_table_2', 'dist_col', 'range');

SELECT count(*) FROM citus_shards WHERE table_name = 'range_dist_table_2'::regclass;

 count
-------
     0
(1 row)

@naisila naisila closed this as not planned Sep 30, 2022