
Conversation

@adutra
Contributor

@adutra adutra commented Sep 15, 2025

As promised, this is a proposal for an alternative implementation of InMemoryBufferPolarisPersistenceEventListener, leveraging Mutiny and SmallRye Fault Tolerance, and avoiding having to deal with buffering and flushing manually.

For now I'm proposing a separate CDI bean, to facilitate reviews or comparisons.

But if the approach taken in this PR is considered a better solution for buffering events, I would suggest replacing InMemoryBufferPolarisPersistenceEventListener completely.
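
For readers less familiar with Mutiny, here is a minimal, self-contained sketch of the buffering idea (batch size, flush interval, and names are illustrative, not the exact code in this PR): producers push events into a per-realm UnicastProcessor, and Mutiny's group().intoLists().of(size, duration) flushes a batch whenever the buffer is full or the time window elapses.

```java
import io.smallrye.mutiny.infrastructure.Infrastructure;
import io.smallrye.mutiny.operators.multi.processors.UnicastProcessor;

import java.time.Duration;
import java.util.List;

class BufferingSketch {

  private final UnicastProcessor<String> processor = UnicastProcessor.create();

  BufferingSketch() {
    processor
        // hand buffered batches off to the default worker pool, away from the emitting thread
        .emitOn(Infrastructure.getDefaultWorkerPool())
        // flush when 100 events are buffered OR 5 seconds have elapsed, whichever comes first
        .group().intoLists().of(100, Duration.ofSeconds(5))
        .subscribe().with(this::flush);
  }

  /** Producers just push events; Mutiny handles buffering and size/time-based flushing. */
  void onEvent(String event) {
    processor.onNext(event);
  }

  private void flush(List<String> batch) {
    // placeholder for the actual JDBC write of the batch
    System.out.println("Flushing " + batch.size() + " events");
  }
}
```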

@adutra
Contributor Author

adutra commented Sep 15, 2025

\cc @adnanhemani

testImplementation("io.quarkus:quarkus-junit5-mockito")
testImplementation("io.quarkus:quarkus-rest-client")
testImplementation("io.quarkus:quarkus-rest-client-jackson")
testImplementation("io.quarkus:quarkus-jdbc-h2")
Contributor Author

I did some cleanup in this file which had a few duplicate dependencies.

Contributor

@adnanhemani adnanhemani left a comment

Thanks for this @adutra!! I agree, this looks quite clean and we should probably merge this and the previously-merged listeners together.

A few questions/comments to go, but I think this makes sense overall!

private void assertRows(String realmId, int expected, Duration timeout) {
String query = "SELECT COUNT(*) FROM polaris_schema.events WHERE realm_id = '" + realmId + "'";
await()
.atMost(timeout)
Contributor

Seems like we are testing with actual delays. Is there any way around this? Personally, I would like to avoid increasing the running time of the test suite as much as possible.

Contributor Author

Indeed, but then we need a separate test just for the timeouts. I will make the change.

Contributor Author

Done. It's indeed much faster now 👍
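
For context, a hedged sketch of how the assertRows helper shown in the snippet above might be completed (the real test has its own datasource wiring; names are illustrative). Awaitility's atMost() is only an upper bound, so the assertion returns as soon as the expected row count is observed rather than sleeping for the full timeout.

```java
import static org.awaitility.Awaitility.await;

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.time.Duration;
import javax.sql.DataSource;

class EventTableAssertions {

  private final DataSource dataSource; // provided by the test setup

  EventTableAssertions(DataSource dataSource) {
    this.dataSource = dataSource;
  }

  void assertRows(String realmId, int expected, Duration timeout) {
    String query =
        "SELECT COUNT(*) FROM polaris_schema.events WHERE realm_id = '" + realmId + "'";
    await()
        .atMost(timeout)
        .untilAsserted(() -> {
          try (Connection c = dataSource.getConnection();
              Statement s = c.createStatement();
              ResultSet rs = s.executeQuery(query)) {
            rs.next();
            int actual = rs.getInt(1);
            if (actual != expected) {
              throw new AssertionError("expected " + expected + " rows, got " + actual);
            }
          }
        });
  }
}
```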

}

protected UnicastProcessor<PolarisEvent> createProcessor(String realmId) {
UnicastProcessor<PolarisEvent> processor = UnicastProcessor.create();
Contributor

I don't believe that UnicastProcessor is thread-safe for multiple producers by default. However, if we pass in a ConcurrentLinkedQueue during create, we may be able to make it thread-safe. Thoughts?

Contributor Author

Very good question! But in fact, it is safe for multiple producers, as the onNext method is synchronized. No need to use an MPSC queue.
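
For reference, a sketch of the alternative floated above: passing an explicit queue at creation time. This assumes Mutiny's UnicastProcessor.create(Queue, Runnable) factory overload; as noted, the default create() is already safe for concurrent producers, so the PR does not need it.

```java
import io.smallrye.mutiny.operators.multi.processors.UnicastProcessor;

import java.util.concurrent.ConcurrentLinkedQueue;

class ExplicitQueueSketch {

  // Assumed create(Queue, Runnable) overload; not needed here because onNext is
  // synchronized and therefore already safe for multiple producers.
  UnicastProcessor<String> createWithExplicitQueue() {
    return UnicastProcessor.create(
        new ConcurrentLinkedQueue<>(), // thread-safe, unbounded queue
        () -> {});                     // termination callback (no-op)
  }
}
```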

Contributor Author

BTW: the tests cover this. If you look at the sendAsync method, it fires events from multiple threads.

protected UnicastProcessor<PolarisEvent> createProcessor(String realmId) {
UnicastProcessor<PolarisEvent> processor = UnicastProcessor.create();
processor
.emitOn(Infrastructure.getDefaultWorkerPool())
Contributor

Question on this: is this creating a separate execution service? Is it possible for this to become a bottleneck in terms of how fast events can be ingested?

Would it be better or worse to have the threads here managed by Quarkus instead?

Contributor Author

On a Quarkus platform, Infrastructure.getDefaultWorkerPool() always returns the Quarkus default worker pool. So this method is not creating a separate execution service – which is why we don't need to care about shutting it down either.

Contributor

Is there a way to configure this worker pool? Let's say one of my instances is handling a lot of realms?

Contributor Author

@adutra adutra Sep 19, 2025

Yes, it's configurable through the quarkus.thread-pool.* options:

https://quarkus.io/guides/all-config#quarkus-core_quarkus-thread-pool-core-threads

The core thread pool is responsible for running all blocking tasks in Quarkus. I think this is appropriate for this use case. Note: this thread pool does NOT handle HTTP requests.

If this becomes problematic one day, the solution would be to declare a separate Executor CDI bean and inject it here.
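
If that ever happens, the dedicated executor could look roughly like the sketch below (bean name and pool size are illustrative, and a CDI qualifier would likely be needed to disambiguate it from the built-in executor); until then, the shared pool can be sized via the quarkus.thread-pool.* settings linked above.

```java
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.inject.Produces;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

@ApplicationScoped
class EventListenerExecutorProducer {

  @Produces
  @ApplicationScoped
  ExecutorService eventListenerExecutor() {
    // dedicated pool so heavy event traffic cannot starve the shared Quarkus worker pool
    return Executors.newFixedThreadPool(4);
  }
}

// The listener would then inject this executor and use it instead of the default pool:
//   processor.emitOn(eventListenerExecutor)...
```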

.intoLists()
.of(configuration.maxBufferSize(), configuration.bufferTime())
.subscribe()
.with(events -> flush(realmId, events));
Contributor

Leaving here as a note for anyone else wondering:

Even though this pattern will dequeue events one by one when generating the list of events, I don't believe we will have the race condition where events are added at a higher rate than they are drained. This is because only a fixed number of events can be requested by the subscription at a time (which seems to be Long.MAX_VALUE). If this load is continuously sustained, then it may overwhelm the service by overflowing the queue that is backing the UnicastProcessor - but I don't think there are many other ways to get around that.

Contributor Author

If I understand the concern correctly, what matters here is the UnicastProcessor queue. For now, it's unbounded, so there should never be a BackPressureFailure raised, but the small risk is that the queue could grow if the database cannot keep up with the number of events coming in.

I think it's OK for now; if users report memory issues we can look into bounded queues. But I don't think the server can generate events faster than the database is able to write them.
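
Not part of this PR, but if memory ever did become a concern, one simple application-level alternative to a bounded Mutiny queue would be a counter-based guard that drops events beyond a cap, which is compatible with the at-most-once guarantee mentioned later in the review (all names illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

class BoundedBufferGuard {

  private static final int MAX_PENDING = 10_000; // illustrative cap
  private final AtomicInteger pending = new AtomicInteger();

  /** Called by producers before pushing an event to the processor. */
  boolean tryAcquire() {
    if (pending.incrementAndGet() > MAX_PENDING) {
      pending.decrementAndGet();
      return false; // drop the event instead of letting the queue grow unbounded
    }
    return true;
  }

  /** Called after a batch of events has been flushed to the database. */
  void release(int flushedCount) {
    pending.addAndGet(-flushedCount);
  }
}
```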

Contributor Author

I added a small safeguard: if the upstream terminates with a failure, a message is logged and the processor is evicted from the cache.
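
A hedged sketch of that safeguard: the subscription gets an explicit failure callback that logs the error (matching the message visible in a later diff fragment) and evicts the realm's processor so the next event recreates it. The cache field and flush method are placeholders.

```java
import io.smallrye.mutiny.Multi;
import io.smallrye.mutiny.operators.multi.processors.UnicastProcessor;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class FailureSafeguardSketch {

  private static final Logger LOGGER = LoggerFactory.getLogger(FailureSafeguardSketch.class);

  private final Map<String, UnicastProcessor<String>> processors = new ConcurrentHashMap<>();

  void subscribeWithSafeguard(String realmId, Multi<List<String>> batches) {
    batches
        .subscribe()
        .with(
            events -> flush(realmId, events),
            error -> {
              LOGGER.error(
                  "Unexpected error while processing events for realm '{}'; "
                      + "some events may have been dropped",
                  realmId,
                  error);
              // evict so the next event for this realm creates a fresh processor
              processors.remove(realmId);
            });
  }

  private void flush(String realmId, List<String> events) {
    // persist the batch (placeholder)
  }
}
```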

public record ContextSpecificInformation(long timestamp, @Nullable String principalName) {}

abstract ContextSpecificInformation getContextSpecificInformation();
protected abstract ContextSpecificInformation getContextSpecificInformation();
Contributor

Curious as to why these changes were made?

Contributor Author

Because otherwise you cannot access this method from outside this package.

Generally speaking, since this class is meant to be subclassed, even by implementors outside Polaris, it's not recommended to use package-private members.

Contributor

@adnanhemani adnanhemani left a comment

LGTM 👍

@adutra - I can ask others to review and merge this as well. Would you like me to work on gluing this together with the already existing Event Listener, or would you like to do it?

dimas-b previously approved these changes Sep 18, 2025
@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Sep 18, 2025
@adutra
Contributor Author

adutra commented Sep 18, 2025

> LGTM 👍
>
> @adutra - I can ask others to review and merge this as well. Would you like me to work on gluing this together with the already existing Event Listener, or would you like to do it?

I can take care of that. Let's merge this one first if everyone agrees? I'll need re-approvals because of a merge conflict 😄

Contributor

@singhpk234 singhpk234 left a comment

LGTM as well, thanks @adutra!
Added some questions for my better understanding.

Comment on lines +135 to +138
LOGGER.error(
"Unexpected error while processing events for realm '{}'; some events may have been dropped",
realmId,
error);
Contributor

I wonder if it's just okay to drop events, considering they are deemed to be audit logs?

Contributor Author

Yes, since the beginning the consistency guarantee has been "at most once", so it's understood that some events may be lost.

protected ContextSpecificInformation getContextSpecificInformation() {
var principal = securityContext.getUserPrincipal();
var principalName = principal == null ? null : principal.getName();
return new ContextSpecificInformation(clock.millis(), principalName);
Contributor

Maybe this is not in the scope of the PR, but do we know why just the principal and not the activated roles?

Contributor

This one is on me, I forgot to put it in the original PR - I will enhance this in the future to include activated roles.

@adutra
Contributor Author

adutra commented Sep 19, 2025

Merging this now, I will tackle the remaining improvements next.

@adutra adutra merged commit 4e2facb into apache:main Sep 19, 2025
14 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Sep 19, 2025
@adutra adutra deleted the event-bus branch September 19, 2025 09:05
snazy added a commit to snazy/polaris that referenced this pull request Nov 20, 2025
* Avoid calling deprecated `TableMetadataParser.read(FileIO, InputFile)` method. (apache#2609)

Call `read(InputFile)` instead, as instructed by Iceberg javadoc.

* Add doc notes about EclipseLink removal (apache#2605)

* chore(docs): add polaris-api-specs section (apache#2598)

* docs(README): Updating the READMEs to Reflect the Project Structure (apache#2599)

* docs(README): Updating the READMEs to Reflect the Project Structure

* fix(deps): update dependency io.opentelemetry:opentelemetry-bom to v1.54.1 (apache#2613)

* Add Code of Conduct entry to the ASF menu (apache#2537)

* Use the ASF Code Of Conduct

* Update site/hugo.yaml

Co-authored-by: Robert Stupp <snazy@snazy.de>

---------

Co-authored-by: Robert Stupp <snazy@snazy.de>

* fix(deps): update dependency org.postgresql:postgresql to v42.7.8 (apache#2619)

* chore(deps): update dependency mypy to >=1.18, <=1.18.2 (apache#2617)

* Update registry.access.redhat.com/ubi9/openjdk-21-runtime Docker tag to v1.23-6.1758133907 (apache#2612)

* Introduce alternate in-memory buffering event listener (apache#2574)

* fix(deps): update dependency org.assertj:assertj-core to v3.27.5 (apache#2618)

* chore(deps): update dependency virtualenv to >=20.34.0,<20.35.0 (apache#2614)

* Add Community Meeting 20250918 (apache#2622)

* Add 1.1.0-incubating release on the website (apache#2621)

* Add 1.1.0-incubating release content (apache#2625)

* chore(errorprone): Enabling EqualsGetClass, PatternMatchingInstanceof, and UnusedMethod in ErrorProne (apache#2600)

* fix(deps): update dependency com.adobe.testing:s3mock-testcontainers to v4.9.1 (apache#2626)

* Unify create/loadTable call paths (apache#2589)

In preparation for implementing sending non-credential config
to REST Catalog clients for apache#2207, this PR unifies call paths
for create/load table operations.

This change does not have any differences in authorization.

This change is not expected to have any material behaviour
differences in the affected code paths.

The main idea is to consolidate decision-making about what
to include in REST responses and use method parameters
like `EnumSet<AccessDelegationMode> delegationModes` for
driving those decisions.

* Remove numeric identifier from PolarisPrincipal (apache#2388)

This change removes the requirement for Polaris principals to have a numeric identifier, by removing the only sites where such an identifier was required:

- In the `Resolver`. Instead, the `Resolver` now performs a lookup by principal name.
- In  `PolarisAdminService`. Instead, the code now compares the principal name against the entity name.

Note: the lookup in the `Resolver` is still necessary, because the `Resolver` also needs to fetch the grant records.

* Include principal name in Polaris tokens (apache#2389)

* Include principal name in Polaris tokens

Summary of changes:

- Instead of including the principal id twice in the token, the principal name is now used as the subject claim. While the default authenticator doesn't need the principal name and works with just the principal id, not having the "real" principal name available could be a problem for other authenticator implementations.

- `DecodedToken` has been refactored and renamed to `InternalPolarisCredential`. It is also now a package-private component.

- `TokenBroker.verify()` now returns PolarisCredential.

* rename to InternalPolarisToken

* main: bump to 1.2.0-incubating-SNAPSHOT (apache#2624)

* bump version.txt to 1.2.0-incubating-SNAPSHOT

* virtualenv: wider version range (apache#2623)

see apache#2614 (comment)

* Remove ActiveRolesProvider (apache#2390)

Summary of changes:

- As proposed on the ML, `ActiveRolesProvider` is removed, and `DefaultActiveRolesProvider` is merged into `DefaultAuthenticator`. `ActiveRolesAugmentor` is also merged into `AuthenticatingAugmentor`.

- The implicit convention that no roles in credentials == all roles requested is removed as it is ambiguous. Credentials must explicitly include the `PRINCIPAL_ROLE:ALL` pseudo-role to request all roles available.

- PersistedPolarisPrincipal is removed. It existed merely as a means of passing the `PrincipalEntity` from the authenticator to the roles provider. This is not necessary anymore.

* NoSQL: adaptions

* Last merged commit d1d359a

---------

Co-authored-by: Dmitri Bourlatchkov <dmitri.bourlatchkov@gmail.com>
Co-authored-by: Artur Rakhmatulin <artur.rakhmatulin@gmail.com>
Co-authored-by: Adam Christian <105929021+adam-christian-software@users.noreply.github.com>
Co-authored-by: Mend Renovate <bot@renovateapp.com>
Co-authored-by: JB Onofré <jbonofre@apache.org>
Co-authored-by: Alexandre Dutra <adutra@apache.org>
