Skip to content

Conversation

@adnanhemani
Copy link
Contributor

As per discussion at this morning's Polaris Community Sync, I am introducing this event listener that will allow users to sink events to AWS CloudWatch Logs - where they can search through events for auditing purposes, if they wish. I do intend to introduce similar changes for Azure and GCP once this is merged so that we have parity across all CSPs - but this will help nail down the exact pattern to make this work.

Below is a screenshot of a sample event sent to CloudWatch:

Screenshot 2025-06-26 at 7 20 31 PM

@snazy
Copy link
Member

snazy commented Jun 27, 2025

Didn't look into the whole PR, but two things I noticed:

  • The PR's doing more than mentioned in the summary & description
  • There's no (integration) testing for the feature (IIRC there are solutions to test cloudwatch locally)

@adnanhemani
Copy link
Contributor Author

Hi @snazy,

The PR's doing more than mentioned in the summary & description

Yes, I'm assuming you're referring to the creation of one new event (onAfterCatalogCreated). It is much faster to use that to rapidly test new event listeners. If it is a sticking point, I'm happy to break that into a separate, few-line PR. WDYT?

There's no (integration) testing for the feature (IIRC there are solutions to test cloudwatch locally)

I didn't want to sink (no pun intended) more time trying to setup LocalStack for this repository if this approach was wildly off-the-mark. If you think the general approach is reasonable and the main things that will need change are implementation details, I will be glad to add that in the next revision. IIRC, this is also something that we only get with AWS - both Azure and GCP equivalents of CloudWatch do not have equivalent testing solutions (this could be outdated information but I can't find anything in a quick Google search). If having parity across CSPs on testing is important, maybe we settle for unit tests instead? Let me know your thoughts.

@adnanhemani
Copy link
Contributor Author

I've added a test using LocalStack, which should help emulate CloudWatch. This PR is ready for a full review.

Copy link
Member

@snazy snazy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some thread-safety/concurrency issues and memory leaks in this change, which have to be addressed.

The implementation can also be simplified.

I'd also recommend to move this to its own module to streamline CI and testing in general.

@adnanhemani adnanhemani requested a review from snazy July 21, 2025 07:39
@eric-maynard eric-maynard dismissed snazy’s stale review August 26, 2025 20:45

Hey @snazy, can you take another pass over this one? It's changed a lot since we initially reviewed.

* #transformAndSendEvent(HashMap)} method to define how the JSON event data should be transmitted
* or stored.
*/
public abstract class JsonEventListener extends PolarisEventListener {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A note, that I really like having this as a public class because if someone does want to write a plugin that doesn't interact with any of the Iceberg or other based libraries, they can always utilize this instead and only depend on Polaris.

}

@Test
void shouldCreateLogGroupAndStream() {
Copy link
Member

@RussellSpitzer RussellSpitzer Aug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are missing the negative test here, we also ned to test that it doesn't fail if the log group and stream already exists. You could just start up again after the the first asserts?

RussellSpitzer
RussellSpitzer previously approved these changes Aug 28, 2025
Copy link
Member

@RussellSpitzer RussellSpitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are just missing one test of the "already exists" branch but other than that I think this is ready to go

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Aug 28, 2025
RussellSpitzer
RussellSpitzer previously approved these changes Aug 28, 2025
Comment on lines 127 to 128
properties.put("realm", callContext.getRealmContext().getRealmIdentifier());
properties.put("principal", securityContext.getUserPrincipal().getName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit but I would recommend putting these into constants somewhere, e.g. AwsCloudWatchEventEventListener.KEY_REALM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting them as class-wide constants isn't a great idea IMO. These variables are RequestScoped, while the class as a whole is ApplicationScoped. The best we can probably do is to keep them as variables within the transformAndSendEvent function (which will run per request) - but I'm not sure what that helps to clarify.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, I'm talking about the string literals like "realm" here. Having a constant will be useful if someone else wants to extract the value from the event, e.g. they won't have to hard-code properties.get("realm")

@WithName("log-group")
@WithDefault("polaris-cloudwatch-default-group")
@Override
String awsCloudwatchlogGroup();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cloudwatch or CloudWatch? We are using a mix of both casings

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, changed.

| `polaris.config.rollback.compaction.on-conflicts.enabled` | `false` | When set to true Polaris will apply the deconfliction by rollbacking those REPLACE operations snapshots which have the property of `polaris.internal.rollback.compaction.on-conflict` in their snapshot summary set to `rollback`, to resolve conflicts at the server end. |
| `polaris.config.rollback.compaction.on-conflicts.enabled` | `false` | When set to true Polaris will apply the deconfliction by rollbacking those REPLACE operations snapshots which have the property of `polaris.internal.rollback.compaction.on-conflict` in their snapshot summary set to `rollback`, to resolve conflicts at the server end. |
| `polaris.event-listener.type` | `no-op` | Define the Polaris event listener type. Supported values are `no-op`, `aws-cloudwatch`. |
| `polaris.event-listener.aws-cloudwatch.log-group` | | Define the AWS CloudWatch log group name for the event listener. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we would IAM that polaris is running with have permissions to write to this log group right ? i don't know where to best write it ? thoughts ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added it to this page, let me know your thoughts!

singhpk234
singhpk234 previously approved these changes Sep 2, 2025
Copy link
Contributor

@singhpk234 singhpk234 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks a lot @adnanhemani for all the work !

I have added some suggestions, which are optional to address and are good to have !

String resourceType,
String resourceName) {
if (existsCheck.get()) {
LOGGER.debug("Log {} [{}] already exists", resourceType, resourceName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LOGGER.debug("Log {} [{}] already exists", resourceType, resourceName);
LOGGER.debug("Resource {} [{}] already exists", resourceType, resourceName);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No - the resourceType here will be "stream" or "group" to make a final log line stating "Log stream [xyz] already exists"

| `polaris.config.rollback.compaction.on-conflicts.enabled` | `false` | When set to true Polaris will apply the deconfliction by rollbacking those REPLACE operations snapshots which have the property of `polaris.internal.rollback.compaction.on-conflict` in their snapshot summary set to `rollback`, to resolve conflicts at the server end. |
| `polaris.event-listener.type` | `no-op` | Define the Polaris event listener type. Supported values are `no-op`, `aws-cloudwatch`. |
| `polaris.event-listener.aws-cloudwatch.log-group` | `polaris-cloudwatch-default-group` | Define the AWS CloudWatch log group name for the event listener. |
| `polaris.event-listener.aws-cloudwatch.log-stream` | `polaris-cloudwatch-default-stream`| Define the AWS CloudWatch log stream name for the event listener. Ensure that Polaris' IAM credentials have the following actions: "PutLogEvents", "DescribeLogStreams", and "DescribeLogGroups" on the specified log stream/group. If the specified log stream/group does not exist, then "CreateLogStream" and "CreateLogGroup" will also be required. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[not a blocker] IMHO adding IAM requirement might not be right place, is there a place where IAM with Polaris runs with requirements are there ? let me check too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see one other than the Getting Started - and I don't think that page is relevant here.

public void onAfterTableRefreshed(AfterTableRefreshedEvent event) {
HashMap<String, Object> properties = new HashMap<>();
properties.put("event_type", event.getClass().getSimpleName());
properties.put("table_identifier", event.tableIdentifier().toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[optional] wondering if we need catalog name too in this ? lets say i have a same namespace and table in the different catalogs in the realm ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this but we should put this as part of the instrumentation of the event itself. I will look into this for #2480.

@singhpk234 singhpk234 merged commit 20753ed into apache:main Sep 3, 2025
12 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Sep 3, 2025
@adnanhemani
Copy link
Contributor Author

Working on rebase for this - seems like it is now failing in main

snazy added a commit to snazy/polaris that referenced this pull request Nov 20, 2025
* Integration tests for Catalog Federation (apache#2344)

Adds a Junit5 integration test for catalog federation.

* Fix merge conflict in CatalogFederationIntegrationTest (apache#2420)

apache#2344 added a new test for catalog federation, but it looks like an undetected conflict with concurrent changes related to authentication have broken the test in main.

* chore(deps): update registry.access.redhat.com/ubi9/openjdk-21-runtime docker tag to v1.23-6.1755674729 (apache#2416)

* 2334 (apache#2427)

* Fix TableIdentifier in TaskFileIOSupplier (apache#2304)

we cant just convert a `TaskEntity` to a `IcebergTableLikeEntity` as the
`getTableIdentifier()` method will not return a correct value by using
the name of the task and its parent namespace (which is empty?).

task handlers instead need to pass in the `TableIdentifier` that they
already inferred via `TaskEntity.readData`.

* Fix NPE in CreateCatalog (apache#2435)

* Doc fix: Access control page update (apache#2424)

* 2418

* 2418

* fix(deps): update dependency software.amazon.awssdk:bom to v2.32.29 (apache#2443)

* Optimize PolicyCatalog.listPolicies (apache#2370)

this is a follow-up to apache#2290

the optimization is to use `listEntities` instead of `loadEntities` when
there is `policyType` filter to apply

* Add PolarisDiagnostics field to BaseMetaStoreManager (apache#2381)

* Add PolarisDiagnostics field to BaseMetaStoreManager

the ultimate goal is removing the `PolarisCallContext` parameter from every
`PolarisMetaStoreManager` interface method, so we make steps towards
reducing its usage first.

* Add feature flag to disallow custom S3 endpoints (apache#2442)

* Add new realm-level flag: `ALLOW_SETTING_S3_ENDPOINTS` (default: true)

* Enforce in `PolarisServiceImpl.validateStorageConfig()`

Fixes apache#2436

* Deprecate ActiveRolesProvider for removal (apache#2404)

* Client: fix openapi verbose output, remove doc generate, and skip test generations (apache#2439)

* Fix various issue in client code generation

* Use logger instead of print

* Add back exclude on __pycache__ as CI is not via Makefile

* Add back exclude on __pycache__ as CI is not via Makefile

* Add user principal tag in metrics (apache#2445)

* Added API change to enable tag

* Added test

* Added production readiness check

* fix(deps): update dependency io.opentelemetry.semconv:opentelemetry-semconv to v1.36.0 (apache#2454)

* fix(deps): update dependency com.google.cloud:google-cloud-storage-bom to v2.56.0 (apache#2447)

* fix(deps): update dependency gradle.plugin.org.jetbrains.gradle.plugin.idea-ext:gradle-idea-ext to v1.3 (apache#2428)

* Build: Make jandex dependency used for index generation managed (apache#2431)

Also allows specifying the jandex index version for the build.

This is a preparation step contributing to apache#2204, once a jandex fix for reproducible builds is available.

Co-authored-by: Alexandre Dutra <adutra@apache.org>

* Built: improve reproducible archive files (apache#2432)

As part of the effort for apache#2204, this change fixes a few aspects around reproducible builds:

Some Gradle projects produce archive files, but don't get the necessary Gradle archive-tasks settings applied: one not-published project but also the tarball&zip of the distribution. This change moves the logic to the new build-plugin `polaris-reproducible`.

Another change is to have some Quarkus generated jar files adhere to the same conventions, which are constant timestamps for the zip entries and a deterministic order of the entries. That's sadly not a full fix, as the classes that are generated or instumented by Quarkus differ in each build.

Contributes to apache#2204

* Remove commons-lang3 dependency (apache#2456)

outside of tests we can replace the functionality with jdk11 and guava.
also stop using `org.assertj.core.util` as its a non-public api.

* add refresh credentials property to loadTableResult (apache#2341)

* add refresh credentials property to loadTableResult

* IcebergCatalogAdapterTest: Added test to ensure refresh credentials endpoint is included

* delegate refresh credential endpoint configuration to storage integration

* GCP: Add refresh credential properties

* fix(deps): update dependency io.opentelemetry.semconv:opentelemetry-semconv to v1.37.0 (apache#2458)

* Add Delegator to all API Implementations (apache#2434)

Per the Dev ML, implements the Delegator pattern to add Events instrumentation to all Polaris APIs.

* Prefer java.util.Base64 over commons-codec (apache#2463)

`java.util.Base64` is available since java8 and we are already using it
in a few other spots.

in a follow-up we might be able to get rid of our `commons-codec` dependency
completely.

* Service: Move tests to the right package (apache#2469)

* Update versions in runtime LICENSE and NOTICE (apache#2468)

* fix(deps): update dependency com.adobe.testing:s3mock-testcontainers to v4.8.0 (apache#2475)

* fix(deps): update dependency com.gradleup.shadow:shadow-gradle-plugin to v9.1.0 (apache#2476)

* Service: Remove hadoop-common from polaris-runtime-service (apache#2462)

* Service: Always validate allowed locations from Storage Config (apache#2473)

* Add Community Sync Meeting 20250828 (apache#2477)

* Update dependency software.amazon.awssdk:bom to v2.33.0 (apache#2483)

* Remove PolarisCallContext.getDiagServices (apache#2415)

* Remove PolarisCallContext.getDiagServices usage

* Remove diagnostics from PolarisCallContext

* Feature: Expose resetCredentials via a new reset api to allow root user to reset credentials for an existing principal with custom values  (apache#2197)

* Add type-check to PolarisEntity subclass ctors (apache#2302)

currently one can freely "cast" any `PolarisEntity` to a more
specific type via their constructors.

this can lead to subtle bugs like we fixed in
a29f800

by adding type checks we discover a few more places where we need to be
more careful about how we construct new or handle existing entities.

note that we can add a check for `PolarisEntitySubType` in a followup,
but it requires more fixes currently.

* Fix CI (apache#2489)

Fix undetected merge conflict after apache#2197 + apache#2415 + apache#2434

* Use local diagnostics in TransactionWorkspaceMetaStoreManager

* Add resetCredentials to PolarisPrincipalsEventServiceDelegator

* Core: Prevent AIOOBE for negative codes in PolarisEntityType, PolarisPrivilege, ReturnStatus (apache#2490)

* feat(idgen): Start Implementation of NoSQL with the ID Generation Framework (apache#2131)

Create an ID Generation Framework.

Related to apache#650 & apache#844

Co-authored-by: Robert Stupp <snazy@snazy.de>
Co-authored-by: Dmitri Bourlatchkov <dmitri.bourlatchkov@gmail.com>

* perf(refactor): optimizing JdbcBasePersistenceImpl.listEntities (apache#2465)

- Reduced Column Selection: Only 6 columns instead of 16

- Eliminated Object Creation Overhead: Direct conversion to EntityNameLookupRecord without intermediate PolarisBaseEntity

* Add Polaris Events to Persistence (apache#1844)

* AWS CloudWatch Event Sink Implementation (apache#1965)

* Fix failing CI (apache#2498)

* Update actions/stale digest to 3a9db7e (apache#2499)

* Core: Prevent AIOOBE for negative policy codes in PredefinedPolicyType (apache#2486)

* Service: Add location tests for views (apache#2496)

* Update docker.io/jaegertracing/all-in-one Docker tag to v1.73.0 (apache#2500)

* Update dependency io.netty:netty-codec-http2 to v4.2.5.Final (apache#2495)

* Update actions/setup-python action to v6 (apache#2502)

* Update the Release Guide about the Helm Chart package (apache#2179)

* Update the Release Guide about the Helm Chart package

* Update release-guide.md

Co-authored-by: Pierre Laporte <pierre@pingtimeout.fr>

* Add missing commit message

* Whitespace

* Use Helm GPG plugin to sign the Helm chart

* Fix directories during Helm chart copy to SVN

* Add Helm index to SVN

* Use long name for svn checkout

* Ensure the Helm index is updated after the chart is moved to SVN dist release

* Do not publish any Docker image before the vote succeeds

* Typos

* Revert "Do not publish any Docker image before the vote succeeds"

This reverts commit 5617e65.

* Don't mention Helm values.yaml in the release guide as it doesn't contain version details

---------

Co-authored-by: Pierre Laporte <pierre@pingtimeout.fr>

* Update dependency com.azure:azure-sdk-bom to v1.2.38 (apache#2503)

* Update registry.access.redhat.com/ubi9/openjdk-21-runtime Docker tag to v1.23-6.1756793420 (apache#2504)

* Remove commons-codec dependency (apache#2474)

follow-up to f8ad77a

we can simply use guava instead and eliminate the extra dependency

* CLI: Remove SCRIPT_DIR and default config location to user home (apache#2448)

* Remove readInternalProperties helpers (apache#2506)

the functionality is already provided by the `PrincipalEntity`

* Add Events for Generic Table APIs (apache#2481)


This PR adds the Events instrumentation for the Generic Tables Service APIs, surrounding the default delegated call to the business logic APIs.

* Disable custom namespace locations (apache#2422)

When we create a namespace or alter its location, we must confirm that this location is within the parent location. This PR introduces introduces a check similar to the one we have for tables, where custom locations are prohibited by default. This functionality is gated behind a new behavior change flag `ALLOW_NAMESPACE_CUSTOM_LOCATION`. In addition to allowing us to revert to the old behavior, this flag allows some tests relying on arbitrarily-located namespaces to pass (such as those from upstream Iceberg).

Fixes: apache#2417

* fix for IcebergAllowedLocationTest (apache#2511)

* Remove unused config from SparkSessionBuilder (apache#2512)

Tests pass without it.

* Add Events for Policy Service APIs (apache#2479)

* Remove PolarisTestMetaStoreManager.jsonNode helper (apache#2513)

* Update dependency software.amazon.awssdk:bom to v2.33.4 (apache#2517)

* Update dependency com.nimbusds:nimbus-jose-jwt to v10.5 (apache#2514)

* Update dependency io.opentelemetry:opentelemetry-bom to v1.54.0 (apache#2515)

* Update dependency io.micrometer:micrometer-bom to v1.15.4 (apache#2519)

* Port missed OSS change

* NoSQL: adopt to updated test packages

* NoSQL: adapt to removed PolarisDiagnostics param

* NoSQL: fix libs.versions.toml

* NoSQL: include jandex plugin related changes from OSS

* NoSQL: changes for delete/set principal client-ID+secret

* Last merged commit c6176dc

---------

Co-authored-by: Pooja Nilangekar <poojan@umd.edu>
Co-authored-by: Eric Maynard <eric.maynard+oss@snowflake.com>
Co-authored-by: Mend Renovate <bot@renovateapp.com>
Co-authored-by: Yong Zheng <yongzheng0809@gmail.com>
Co-authored-by: Christopher Lambert <xn137@gmx.de>
Co-authored-by: Honah (Jonas) J. <honahx@apache.org>
Co-authored-by: Dmitri Bourlatchkov <dmitri.bourlatchkov@gmail.com>
Co-authored-by: Alexandre Dutra <adutra@apache.org>
Co-authored-by: fivetran-kostaszoumpatianos <kostas.zoumpatianos@fivetran.com>
Co-authored-by: Jason <jasonf20@gmail.com>
Co-authored-by: Adnan Hemani <adnan.h@berkeley.edu>
Co-authored-by: Yufei Gu <yufei@apache.org>
Co-authored-by: JB Onofré <jbonofre@apache.org>
Co-authored-by: fivetran-arunsuri <103934371+fivetran-arunsuri@users.noreply.github.com>
Co-authored-by: Adam Christian <105929021+adam-christian-software@users.noreply.github.com>
Co-authored-by: Artur Rakhmatulin <artur.rakhmatulin@gmail.com>
Co-authored-by: Pierre Laporte <pierre@pingtimeout.fr>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants