Core persistence refactor phase 3 (diffbase on phase 2) - Add AtomicOperationMetaStoreManager which passes all tests #1139

dennishuo · 2025-03-09T06:21:19Z

Add AtomicOperationMetaStoreManager class which starts as just a full copy/paste
of PolarisMetaStoreManagerImpl, but remove all its usage of runInTransaction
instead only using the "doFooAtomically" BasePersistence methods.

Implement the intended compare-and-swap semantics in writeEntity/writeEntities
in AbstractTransactionalPersistence so that the TreeMap and EclipseLink impls
can both actually successfully be used in "single-durable-operation" mode
as plain BasePersistence implementations (e.g. without using their
runInTransaction capabilities at all).

Add PolarisTreeMapAtomicOperationMetaStoreManagerTest which covers this new class
in polaris-core:test, and add InMemoryAtomicOperationMetaStoreManagerFactory
which allows making all integration tests and the default gradle run use the new atomic
operations implementation if polaris.persistence.type=in-memory-atomic.

Add TODO comments for all methods that still require additional thought to achieve "strict"
correctness in the face of concurrency, but which aren't easily exercised by existing test cases.

With this initial implementation, all current tests pass, and the main four core mutation methods
are indeed "strictly" correct with compare-and-swap mechanics:

updateEntityPropertiesIfNotChanged
updateEntitiesPropertiesIfNotChanged
createEntityIfNotExists
createEntitiesIfNotExists

For all other methods, generally the baseline functionality is still "correct" in the face of
concurrency, but certain happens-before relationships are violated when grants interact with
entity mutations, until we fix the noted TODOs.

Additionally, some multi-entity methods (e.g. CreateCatalog which also creates a CatalogRole and
assigns grants), currently can leave a partial-creation state, until the TODOs are implemented to
change the ordering of operations appropriately.

The following is what I believe to be a comprehensive list of ways the current AtomicOperationMetaStoreManager
deviates from the transactional PolarisMetaStoreManagerImpl in the face of concurrency and
server failures, none of which seem to violate typical user expectations beyond borderline nuisance scenarios:

-If the server fails during CreateCatalog, it's possible for a
partially-initialized catalog (e.g. lacking catalog_admin and grants) to exist;
this should just be deleted and the creation should be retried in such a case.

-If grants on parents of an entity (e.g. namespaces, catalogs) are changed
in the middle of an operation on the child entity, it's possible for
happens-before semantics to be violated up to the duration of a single
in-flight request if grants are not cached, and up to the duration of
cache expiration in the worse case if the server crashes after updating
grants but before updating grantRecordVersions on affected entities.

-If a new entity is being created under an empty namespace/catalog at
the same time the parent namespace/catalog is being deleted, despite
the semantic of preventing deletion of non-empty containers, the
creation may succeed and the deletion of the parent may succeed,
effectively deleting the child entity as well; this can only happen
within the concurrency window of a single request (e.g. happening
within milliseconds of each other).

-In general, in the face of server failures or concurrency collisions,
grantRecords may be stale up to the lifetime of EntityCache elements,
if the server fails after updating grantRecords but before updating
grantRecordVersions

-In general, whether or not there are server failures or concurrency
collisions, the grantsOnSecurable and grantsToGrantee of a given
entity may not be evaluated at the same effective SNAPSHOT_READ
point in time, but this shouldn't be a problem because the securable
side of grants can already be modified independently from the
grantee side of grants.

…ManagerImpl into BaseMetaStoreManager

…r they are one-shot atomic or if they are intended to be run as part of a caller-managed transaction, in which case the action will not be flushed/durable until the transaction is committed. Delegate all the BasePersistence methods to subclass variations of *InCurrentTxn so that the TransactionalPersistence implementations also happen to fully implement the real semantics of BasePersistence.

…ence-phase2

…of "new" methods defined in `BasePersistence` while the original "doFoo" will mean that it expects to be run in a current transaction. Removed all the InCurrentTxn suffixes.

…etaStoreManager as a diffbase.

… copy/paste of PolarisMetaStoreManagerImpl, but remove all its usage of runInTransaction instead only using the "doFooAtomically" BasePersistence methods. Implement the intended compare-and-swap semantics in writeEntity/writeEntities in AbstractTransactionalPersistence so that the TreeMap and EclipseLink impls can both actually successfully be used in "single-durable-operation" mode as plain BasePersistence implementations (e.g. without using their runInTransaction capabilities at all).

…f implications of failure modes in the absence of additional bulk methods,. Reorder the internals of dropEntity to drop the entity first, instead of grants, so that the drop attempt can't put the entity into a partially-broken but still existing state; instead, server failures just might cause garbage grant records which are safe/expected but need to be cleaned up in a background task somewhere. Fix the atomicity of name-collision-checks in createCatalog and createPrincipal. With these changes, the behavior of AtomicOperationMetaStoreManager should be fully "safe"/correct, with only the following suboptimal scenarios: -If the server fails during CreateCatalog, it's possible for a partially-initialized catalog (e.g. lacking catalog_admin and grants) to exist; this should just be deleted and the creation should be retried in such a case. -If grants on parents of an entity (e.g. namespaces, catalogs) are changed in the middle of an operation on the child entity, it's possible for happens-before semantics to be violated up to the duration of a single in-flight request if grants are not cached, and up to the duration of cache expiration in the worse case if the server crashes after updating grants but before updating grantRecordVersions on affected entities. -If a parent entity (e.g. namespace, catalog) is deleted in the middle of creating or updating a child entity, the create/update may still return "success" despite the parent being deleted, which effectively deletes the child entity as well. -If a new entity is being created under an empty namespace/catalog at the same time the parent namespace/catalog is being deleted, despite the semantic of preventing deletion of non-empty containers, the creation may succeed *and* the deletion of the parent may succeed, effectively deleting the child entity as well; this can only happen within the concurrency window of a single request (e.g. happening within milliseconds of each other). -In general, in the face of server failures or concurrency collisions, grantRecords may be stale up to the lifetime of EntityCache elements. -In general, whether or not there are server failures or concurrency collisions, the grantsOnSecurable and grantsToGrantee of a given entity may not be evaluated at the same effective SNAPSHOT_READ point in time, but this shouldn't be a problem because the securable side of grants can already be modified independently from the grantee side of grants.

polaris-core/src/main/java/org/apache/polaris/core/persistence/BasePersistence.java

…rentTxn" versions of methods in TransactionalPersistence. The fact that this unit compiles before we've renamed doFooAtomically back to doFoo helps prove that we didn't leave anything accidentally behind to call the atomic versions inside a transaction, because as of this moment there is not unqualified method name doFoo anymore; only either doFooInCurrentTxn or doFooAtomically. Committing this as a separate unit to have the intermediate proof-of-correctness.

…ence-phase3-add-atomic-ops-metastore-manager

…from all method names; now all the unqualified methods in BasePersistence are implicitly supposed to behave "atomically". All the methods that expect to be run inside a transaction that's managed by the outer caller will have the "InCurrentTxn" suffix.

…rsistence instead of a TransactionalPersistence. Just use blind casts for now in PolarisMetaStoreManagerImpl; refactoring can be done later to better define how the Persistence impl is injected in a more typesafe way for compatible PolarisMetaStoreManager impls.

dimas-b

LGTM overall.

dimas-b · 2025-03-11T03:05:41Z

polaris-core/src/main/java/org/apache/polaris/core/persistence/BaseMetaStoreManager.java

+   */
+  public Map<String, String> deserializeProperties(PolarisCallContext callCtx, String properties) {
+
+    Map<String, String> retProperties = null;


nit: this null appears to be returned on parsing exceptions. This looks odd to me... Whether getDiagServices().fail() throws is very non-intuitive.

Yeah that's a good point; for now these are moved from PolarisMetaStoreManagerImpl.java into this shared code, so I'm trying to just preserve the old (maybe buggy) behavior in this PR. I think we have some redundant serialize/deserialize helpers in various places so maybe we can consolidate in one place in a followup.

dimas-b · 2025-03-11T03:27:28Z

...java/org/apache/polaris/core/persistence/transactional/AbstractTransactionalPersistence.java

+      // This is an "update".
+      if (refreshedEntity == null
+          || refreshedEntity.getEntityVersion() != originalEntity.getEntityVersion()
+          || refreshedEntity.getGrantRecordsVersion() != originalEntity.getGrantRecordsVersion()) {


I believe this approach implies that the transaction isolation level is "serializable", which is fine from my POV, I just want to confirm my understanding.

… instead of using the atomic one.

dennishuo added 6 commits March 4, 2025 08:56

Initial sketch of how shared code can be lifted from PolarisMetaStore…

806caae

…ManagerImpl into BaseMetaStoreManager

Merge branch 'main' of github.com:dennishuo/polaris into dhuo-persist…

831c552

…ence-phase2

Invert the method naming semantics; "doFooAtomically" is now the set …

26be511

…of "new" methods defined in `BasePersistence` while the original "doFoo" will mean that it expects to be run in a current transaction. Removed all the InCurrentTxn suffixes.

Copy/paste PolarisMetaStoreManagerImpl.java to a new AtomicOperationM…

68343f3

…etaStoreManager as a diffbase.

dennishuo requested review from adutra, ashvina, dimas-b, eric-maynard, jackye1995, jbonofre, vvcephei, collado-mike, snazy, RussellSpitzer, takidau, MonkeyCanCode, flyrain and ebyhr as code owners March 9, 2025 06:21

dennishuo mentioned this pull request Mar 9, 2025

Core persistence refactor phase 2 - Distinguish "InCurrentTxn" variations of all BasePersistence methods in TransactionalPersistence #1135

Closed

sfc-gh-dhuo reviewed Mar 10, 2025

View reviewed changes

polaris-core/src/main/java/org/apache/polaris/core/persistence/BasePersistence.java Show resolved Hide resolved

dennishuo added 5 commits March 10, 2025 23:09

Merge branch 'main' of github.com:dennishuo/polaris into dhuo-persist…

79029be

…ence-phase3-add-atomic-ops-metastore-manager

Update javadoc for writeEntities per suggestion from offline discussion

0c8f61d

dimas-b approved these changes Mar 11, 2025

View reviewed changes

Restore application[-it].properties to default in-memory impl for now…

ea1af5a

… instead of using the atomic one.

dennishuo force-pushed the dhuo-persistence-phase3-add-atomic-ops-metastore-manager branch from 16ec9c2 to ea1af5a Compare March 11, 2025 04:00

dimas-b approved these changes Mar 11, 2025

View reviewed changes

dennishuo merged commit b6ebbaf into apache:main Mar 11, 2025
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Core persistence refactor phase 3 (diffbase on phase 2) - Add AtomicOperationMetaStoreManager which passes all tests #1139

Core persistence refactor phase 3 (diffbase on phase 2) - Add AtomicOperationMetaStoreManager which passes all tests #1139

dennishuo commented Mar 9, 2025 •

edited

Loading

dimas-b left a comment

dimas-b Mar 11, 2025

dennishuo Mar 11, 2025

dimas-b Mar 11, 2025

dennishuo Mar 11, 2025

Core persistence refactor phase 3 (diffbase on phase 2) - Add AtomicOperationMetaStoreManager which passes all tests #1139

Core persistence refactor phase 3 (diffbase on phase 2) - Add AtomicOperationMetaStoreManager which passes all tests #1139

Conversation

dennishuo commented Mar 9, 2025 • edited Loading

dimas-b left a comment

Choose a reason for hiding this comment

dimas-b Mar 11, 2025

Choose a reason for hiding this comment

dennishuo Mar 11, 2025

Choose a reason for hiding this comment

dimas-b Mar 11, 2025

Choose a reason for hiding this comment

dennishuo Mar 11, 2025

Choose a reason for hiding this comment

dennishuo commented Mar 9, 2025 •

edited

Loading