Create a single projection for multiple column masks #14420

Closed
wants to merge 2 commits into from
@@ -1267,7 +1267,14 @@ public List<ViewExpression> getColumnMasks(SecurityContext context, QualifiedObj
                     .forEach(masks::add);
         }

-        return masks.build();
+        // Currently the use case of multiple masks on a single column is not supported, the reason being there's no guarantee about the order
+        // in which masks will be applied and whether the functions from different masks are compatible with each other.
+        List<ViewExpression> combinedMasks = masks.build();
+        if (combinedMasks.size() > 1) {
+            throw new TrinoException(NOT_SUPPORTED, format("Multiple masks on a single column is not supported: %s", columnName));
+        }
+
+        return combinedMasks;
     }

     private ConnectorAccessControl getConnectorAccessControl(TransactionId transactionId, String catalogName)
@@ -305,26 +305,26 @@ private RelationPlan addColumnMasks(Table table, RelationPlan plan)
         PlanBuilder planBuilder = newPlanBuilder(plan, analysis, lambdaDeclarationToSymbolMap, session, plannerContext)
                 .withScope(analysis.getAccessControlScope(table), plan.getFieldMappings()); // The fields in the access control scope has the same layout as those for the table scope

+        Map<Symbol, Expression> assignments = new LinkedHashMap<>();
+        for (Symbol symbol : planBuilder.getRoot().getOutputSymbols()) {
+            assignments.put(symbol, symbol.toSymbolReference());
+        }
+
         for (int i = 0; i < plan.getDescriptor().getAllFieldCount(); i++) {
             Field field = plan.getDescriptor().getFieldByIndex(i);

             for (Expression mask : columnMasks.getOrDefault(field.getName().orElseThrow(), ImmutableList.of())) {
                 planBuilder = subqueryPlanner.handleSubqueries(planBuilder, mask, analysis.getSubqueries(mask));
-
-                Map<Symbol, Expression> assignments = new LinkedHashMap<>();
-                for (Symbol symbol : planBuilder.getRoot().getOutputSymbols()) {
-                    assignments.put(symbol, symbol.toSymbolReference());
-                }
                 assignments.put(plan.getFieldMappings().get(i), coerceIfNecessary(analysis, mask, planBuilder.rewrite(mask)));
-
-                planBuilder = planBuilder
-                        .withNewRoot(new ProjectNode(
-                                idAllocator.getNextId(),
-                                planBuilder.getRoot(),
-                                Assignments.copyOf(assignments)));
             }
         }

+        planBuilder = planBuilder
+                .withNewRoot(new ProjectNode(
+                        idAllocator.getNextId(),
+                        planBuilder.getRoot(),
+                        Assignments.copyOf(assignments)));
+
         return new RelationPlan(planBuilder.getRoot(), plan.getScope(), plan.getFieldMappings(), outerContext);
     }

@@ -209,6 +209,7 @@ public void testDenyTableFunctionCatalogAccessControl()
         }
     }

+    // TODO: need to properly handle the ordering of multiple masks as they are not allowed currently
Member

I think the TODO and the test name are incorrect. I would simply remove the TODO and rename the test to testMultipleColumnMasks.

     @Test
     public void testColumnMaskOrdering()
     {
@@ -259,16 +260,16 @@ public void checkCanShowCreateTable(ConnectorSecurityContext context, SchemaTabl
                     }
                 })));

-        transaction(transactionManager, accessControlManager)
+        assertThatThrownBy(() -> transaction(transactionManager, accessControlManager)
                 .execute(transactionId -> {
-                    List<ViewExpression> masks = accessControlManager.getColumnMasks(
+                    accessControlManager.getColumnMasks(
                             context(transactionId),
                             new QualifiedObjectName(TEST_CATALOG_NAME, "schema", "table"),
                             "column",
                             BIGINT);
-                    assertEquals(masks.get(0).getExpression(), "connector mask");
-                    assertEquals(masks.get(1).getExpression(), "system mask");
-                });
+                }))
+                .isInstanceOf(TrinoException.class)
+                .hasMessageMatching("Multiple masks on a single column is not supported: column");
     }
 }

@@ -212,7 +212,8 @@ public void testMultipleMasksOnSameColumn()
                 USER,
                 new ViewExpression(USER, Optional.empty(), Optional.empty(), "custkey * 2"));

-        assertThat(assertions.query("SELECT custkey FROM orders WHERE orderkey = 1")).matches("VALUES BIGINT '-740'");
+        // When there are multiple masks on the same column, the latter one overrides the previous ones
+        assertThat(assertions.query("SELECT custkey FROM orders WHERE orderkey = 1")).matches("VALUES BIGINT '740'");
Member

This is a change in semantics. Looking at this, I realize we never settled the question of why we'd ever have multiple masks on a single column: #11654 (comment). That still doesn't make a lot of sense to me. cc @kokosing @ksobolew
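
To make the change in semantics concrete, here is a minimal sketch in plain Java (not Trino code) contrasting the two behaviours. It assumes that the first mask registered in this test negates `custkey` (that mask is not visible in this hunk, but it would be consistent with the old expected value of -740) and that the unmasked value is 370. `MaskSemantics`, `applyChained` and `applyLastWins` are hypothetical names used only for illustration.

```java
import java.util.List;
import java.util.function.LongUnaryOperator;

final class MaskSemantics
{
    private MaskSemantics() {}

    // Chained evaluation: each mask is applied to the output of the previous one.
    static long applyChained(long value, List<LongUnaryOperator> masks)
    {
        long result = value;
        for (LongUnaryOperator mask : masks) {
            result = mask.applyAsLong(result);
        }
        return result;
    }

    // Single projection: every mask reads the original value, and the last
    // assignment to the column overrides the earlier ones.
    static long applyLastWins(long value, List<LongUnaryOperator> masks)
    {
        long result = value;
        for (LongUnaryOperator mask : masks) {
            result = mask.applyAsLong(value);
        }
        return result;
    }

    public static void main(String[] args)
    {
        // Assumption: the first mask is "-custkey"; the second is "custkey * 2" as in the diff.
        List<LongUnaryOperator> masks = List.of(v -> -v, v -> v * 2);
        System.out.println(applyChained(370, masks));  // -740
        System.out.println(applyLastWins(370, masks)); // 740
    }
}
```

Under these assumptions the two strategies reproduce the old and new expected values of the test, which is why the updated assertion amounts to a behaviour change rather than a refactoring.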

Member

We can have multiple system access controls, and each of them can return a mask for a given column.

Member

How are the masks expected to be applied in such a case? There's no guarantee about the order in which the system access controls are invoked.

Also, you're talking about multiple system access controls, but the SystemAccessControl interface allows a single access control interface to return multiple masks for a given column. That does not make much sense -- if only one mask is to be applied, the interface should return a single mask.

Contributor

There are multiple situations in which more than one mask may apply to a single column. The simplest example is when there are two roles enabled and each of them applies a different column mask on a particular column. This immediately raises the question of ordering, of course. While it looks like Trino does not make such guarantees explicitly, the ordering is in practice deterministic as long as the access controls return them in a deterministic order. The masks and filters are returned and processed as Lists, and this implies ordering. The engine has no trouble dealing with this and applies the masks iteratively.

Member

That still doesn't make a lot of sense to me. Masks are defined in terms of the values in the column. For example, if a column is a string that contains only numbers, one might implement a mask function by parsing the number, doing some math (a hash, etc.) on it, and converting it back to a string. If the engine applies the function to anything other than the original values (e.g., because another mask got injected in between), the mask will fail. This makes mask functions hard to reason about for whoever writes them, as it requires a non-local understanding of how every role, user, and mask relate to each other.
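
A small illustration of that failure mode, as a sketch only: both mask functions below are hypothetical and written in plain Java rather than as Trino mask expressions. `SCRAMBLE` is written against the original digit-only values; `REDACT` is an unrelated mask that someone else might attach to the same column.

```java
import java.util.function.UnaryOperator;

final class MaskComposition
{
    private MaskComposition() {}

    // Hypothetical mask written against the original column values:
    // parse the digits, do some arithmetic, convert back to a string.
    static final UnaryOperator<String> SCRAMBLE =
            value -> Long.toString(Long.parseLong(value) * 31 % 1_000_000);

    // Hypothetical second mask, written independently: keep only the last two characters.
    static final UnaryOperator<String> REDACT =
            value -> "***" + value.substring(Math.max(0, value.length() - 2));

    public static void main(String[] args)
    {
        // Works when applied to the original value.
        System.out.println(SCRAMBLE.apply("123456"));
        try {
            // Chained application: SCRAMBLE now receives "***56" instead of digits.
            SCRAMBLE.apply(REDACT.apply("123456"));
        }
        catch (NumberFormatException e) {
            System.out.println("mask failed: " + e.getMessage());
        }
    }
}
```

Neither function is wrong in isolation; the failure only appears once they are composed, which is the non-local reasoning problem described above.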

If multiple masks are possible, it should be up to the connector to decide which one is the most appropriate and go with it. In the case of multiple system access controls returning masks, there are two options:

  • Pick the first one. This can produce non-deterministic results depending on the order in which the access controls get called, and the result could change across restarts of the server.
  • Fail with an error indicating there's ambiguity.
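
The second option can be sketched as follows. This is a simplified stand-in for the check added to `getColumnMasks` in this PR, not the actual implementation: masks are represented as plain strings, `IllegalStateException` replaces `TrinoException` so the example stays self-contained, and `SingleMaskResolver` is a hypothetical name.

```java
import java.util.List;
import java.util.Optional;

final class SingleMaskResolver
{
    private SingleMaskResolver() {}

    // Returns the only mask for the column, or empty when none is defined.
    // Fails when several masks are supplied instead of silently picking one.
    static Optional<String> resolve(String columnName, List<String> maskExpressions)
    {
        if (maskExpressions.size() > 1) {
            throw new IllegalStateException(
                    String.format("Multiple masks on a single column is not supported: %s", columnName));
        }
        return maskExpressions.stream().findFirst();
    }

    public static void main(String[] args)
    {
        System.out.println(resolve("custkey", List.of("custkey * 2")));
        try {
            resolve("custkey", List.of("-custkey", "custkey * 2"));
        }
        catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing loudly keeps the result independent of the order in which access controls are invoked, at the cost of rejecting configurations that previously ran.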

Member

Shouldn't this now fail with .hasMessageMatching("Multiple masks on a single column is not supported: column");?

Contributor

That only happens after the second commit. Perhaps they should be reordered, so that the prohibition on multiple masks goes first.

Contributor

OTOH, it does pass in the CI apparently

Contributor

Ah, I see - the test is using TestingAccessControlManager, which replaces the actual AccessControlManager and does not check for multiple masks.

     }

     @Test
@@ -842,12 +843,13 @@ public void testMultipleMasksUsingOtherMaskedColumns()

         // Mask "comment" and "orderstatus" using "clerk" ("clerk" appears between "orderstatus" and "comment" in table definition)
         // "comment" and "orderstatus" are masked as the condition on "clerk" is satisfied
+        // This is to showcase that the three maskings are done simultaneously, not in a "sequential" or "chained" manner.
         accessControl.reset();
         accessControl.columnMask(
                 new QualifiedObjectName(LOCAL_CATALOG, "tiny", "orders"),
                 "clerk",
                 USER,
-                new ViewExpression(USER, Optional.empty(), Optional.empty(), "cast(regexp_replace(clerk,'(Clerk#)','***#') as varchar(15))"));
+                new ViewExpression(USER, Optional.empty(), Optional.empty(), "cast('###' as varchar(15))"));

         accessControl.columnMask(
                 new QualifiedObjectName(LOCAL_CATALOG, "tiny", "orders"),
@@ -862,6 +864,6 @@ public void testMultipleMasksUsingOtherMaskedColumns()
                 new ViewExpression(USER, Optional.empty(), Optional.empty(), "if(regexp_extract(clerk,'([1-9]+)') IN ('951'), '***', comment)"));

         assertThat(assertions.query(query))
-                .matches("VALUES (CAST('***' as varchar(79)), '*', CAST('***#000000951' as varchar(15)))");
+                .matches("VALUES (CAST('***' as varchar(79)), '*', CAST('###' as varchar(15)))");
     }
 }
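
The "simultaneous, not chained" behaviour that the last test exercises can be illustrated outside the planner. The sketch below is plain Java under stated assumptions: rows are modelled as maps, masks as functions over a row, and `SingleProjection`, `projectOnce`, `projectChained` and `firstNonZeroDigits` are hypothetical names. The chained variant is included only for contrast and is not claimed to match how the engine behaved before this change. The comment value is a placeholder, and the `orderstatus` mask is omitted because its definition is not visible in the diff.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SingleProjection
{
    private SingleProjection() {}

    // One projection: start from identity assignments, then let every mask
    // read the original row.
    static Map<String, String> projectOnce(
            Map<String, String> row,
            Map<String, Function<Map<String, String>, String>> masks)
    {
        Map<String, String> assignments = new LinkedHashMap<>(row);
        masks.forEach((column, mask) -> assignments.put(column, mask.apply(row)));
        return assignments;
    }

    // One projection per mask: later masks see the output of earlier ones.
    static Map<String, String> projectChained(
            Map<String, String> row,
            Map<String, Function<Map<String, String>, String>> masks)
    {
        Map<String, String> current = new LinkedHashMap<>(row);
        for (Map.Entry<String, Function<Map<String, String>, String>> entry : masks.entrySet()) {
            Map<String, String> next = new LinkedHashMap<>(current);
            next.put(entry.getKey(), entry.getValue().apply(current));
            current = next;
        }
        return current;
    }

    // Rough stand-in for regexp_extract(clerk, '([1-9]+)'); returns null when nothing matches.
    static String firstNonZeroDigits(String value)
    {
        Matcher matcher = Pattern.compile("([1-9]+)").matcher(value);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args)
    {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("clerk", "Clerk#000000951");
        row.put("comment", "original comment"); // placeholder value

        Map<String, Function<Map<String, String>, String>> masks = new LinkedHashMap<>();
        masks.put("clerk", r -> "###");
        masks.put("comment", r -> "951".equals(firstNonZeroDigits(r.get("clerk"))) ? "***" : r.get("comment"));

        System.out.println(projectOnce(row, masks));
        System.out.println(projectChained(row, masks));
    }
}
```

With a single projection the `comment` mask still sees the unmasked `clerk` value and fires; in the chained variant it sees `###` and leaves the comment untouched. Changing the `clerk` mask to a constant in the test is what makes the two strategies distinguishable.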