Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not rely on cache on citus_drop_trigger #5372

Merged
merged 2 commits into from
Nov 18, 2021
Merged

Conversation

hanefi
Copy link
Member

@hanefi hanefi commented Oct 13, 2021

DESCRIPTION: Fixes a bug that prevents DROP SCHEMA CASCADE

Note: the current changes are were too intrusive and has a bad impact on performance. I will try to isolate isolated the calls that are made in the citus_drop_trigger codepath to remove the impact on performance.

TODO:

Fixes: #5099
Fixes: #3741

@codecov
Copy link

codecov bot commented Oct 13, 2021

Codecov Report

Merging #5372 (c0d43d4) into master (a3cc9b4) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #5372   +/-   ##
=======================================
  Coverage   92.76%   92.76%           
=======================================
  Files         215      215           
  Lines       45286    45332   +46     
=======================================
+ Hits        42008    42052   +44     
- Misses       3278     3280    +2     

@hanefi hanefi force-pushed the fix-broken-drop-schema branch from c79b2b1 to 176d998 Compare November 10, 2021 19:16
@hanefi hanefi marked this pull request as ready for review November 10, 2021 19:17
@@ -336,6 +337,16 @@ GetFunctionInfo(Oid typeId, Oid accessMethodId, int16 procedureId)
{
FmgrInfo *functionInfo = (FmgrInfo *) palloc0(sizeof(FmgrInfo));

/* if the type is being dropped, early exit with a null return value */
HeapTuple tup = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typeId));
if (!HeapTupleIsValid(tup))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should allow null heap tuples for a type only when we reach here via a drop trigger. There is no easy way to check if we are now doing a drop on the type.

{
ereport(DEBUG4, (errmsg("could not get the catalog entries for type %u",
typeId)));
return functionInfo;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the lookup for the type failed, we do not create function manager info, and hence we fail to sort the shard placements using the comparator functions defined for these types.

This may result in distributed deadlocks as we do not have a way to sort these placements and potentially we will try to drop the shards in an arbitrary order.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we return NULL here and have callers check for it? This 0 initialized struct seems a bit arbitrary.

src/backend/distributed/metadata/metadata_cache.c Outdated Show resolved Hide resolved
@hanefi hanefi force-pushed the fix-broken-drop-schema branch from 176d998 to 3bd03b9 Compare November 11, 2021 13:49
@hanefi hanefi force-pushed the fix-broken-drop-schema branch 2 times, most recently from 5f73f8e to afae282 Compare November 16, 2021 10:42
src/backend/distributed/metadata/metadata_sync.c Outdated Show resolved Hide resolved
Comment on lines 15 to 17
NOTICE: drop cascades to 2 other objects
DETAIL: drop cascades to type comp_type
drop cascades to table range_dist_table_2
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idea: do we want to decrease verbosity to remove these lines?

@hanefi hanefi force-pushed the fix-broken-drop-schema branch from 1a87dd6 to 35501b9 Compare November 16, 2021 19:33
src/backend/distributed/metadata/metadata_sync.c Outdated Show resolved Hide resolved
src/backend/distributed/metadata/metadata_sync.c Outdated Show resolved Hide resolved
{
ereport(DEBUG4, (errmsg("could not get the catalog entries for type %u",
typeId)));
return functionInfo;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we return NULL here and have callers check for it? This 0 initialized struct seems a bit arbitrary.

char partitionMethod = PartitionMethodViaCatalog(relationId);
Var *partitionColumn = PartitionColumnViaCatalog(relationId);
GetIntervalTypeInfo(partitionMethod, partitionColumn, &intervalTypeId,
&intervalTypeMod);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this function behave when the type is gone?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetIntervalTypeInfo does not do any cache lookups. It only uses the values in the arguments partitionMethod, and partitionColumn to populate intervalTypeId and intervalTypeMod.

If the partitionColumn that we read from catalog contains intervalTypeId for an already dropped object, and the table is range distributed, we might end up with an invalid intervalTypeId value.

Do you think this is worth investigating more?

Copy link
Member

@marcocitus marcocitus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, one clarification question above.


char partitionMethod = PartitionMethodViaCatalog(relationId);
bool hashDistributed = partitionMethod == DISTRIBUTE_BY_HASH;
bool citusTableWithNoDistKey = partitionMethod == DISTRIBUTE_BY_NONE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor, maybe we can push the partitionMethod == DISTRIBUTE_BY_NONE logic into the internal function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not come up with an easy way to accomplish this.

Right now, ShouldSyncTableMetadata calls IsCitusTableTypeCacheEntry that uses the cache whereas ShouldSyncTableMetadataViaCatalog directly does the comparison.

Maybe I can use IsCitusTableTypeInternal(partitionMethod, replicationModel, tableType) directly. To be able to do that, I will need to lookup the replication model via catalog. That seems like too much work.

Another solution would be to introduce yet another helper function like IsCitusTableTypeInternal that does not depend on replicationModel params. This idea also defeats the main purpose of having similar codepaths in the functions that use the catalog or cache.

@hanefi hanefi force-pushed the fix-broken-drop-schema branch from 305714a to 4da4ab3 Compare November 18, 2021 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DROP SCHEMA fails with "cache lookup failed" DROP SCHEMA public can fail with metadata syncing
3 participants