Add PolarisAdminService.loadEntities helper #2261
tmater left a comment
Thanks @XN137! Nifty change, overall LGTM!
  catalogPath = null;
  catalogId = 0;
} else {
  catalogPath = PolarisEntity.toCoreList(List.of(catalogEntity));
I think this could be simplified, PolarisEntity.toCoreList() returns null when the input is null or empty.
i had noticed that toCoreList has some null handling, but if we passed PolarisEntity.toCoreList(List.of(null)) i think it would still result in an NPE?
imo having the explicit "if null" makes the overall flow clearer either way
Ah, yeah, makes sense, forgot about that NPE.
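For reference, a small standalone sketch of the NPE discussed above. The JDK's `List.of` factories reject null elements, so the exception would be raised by `List.of` itself before any conversion helper runs. `toCoreListSketch` is a hypothetical stand-in that only mirrors the null/empty handling described in this thread; it is not the actual `PolarisEntity.toCoreList`, and plain strings stand in for entities.

```java
import java.util.List;
import java.util.stream.Collectors;

class ListOfNullSketch {

  // Hypothetical stand-in: returns null for null or empty input, as described in the thread.
  static List<String> toCoreListSketch(List<String> entities) {
    if (entities == null || entities.isEmpty()) {
      return null;
    }
    return entities.stream().map(e -> "core:" + e).collect(Collectors.toList());
  }

  // Explicit "if null" branch, similar in shape to the one kept in the PR.
  static List<String> pathFor(String catalogEntity) {
    if (catalogEntity == null) {
      return null;
    }
    return toCoreListSketch(List.of(catalogEntity));
  }

  // Returns true if List.of rejects a null element with a NullPointerException.
  static boolean listOfRejectsNull() {
    String missing = null;
    try {
      List.of(missing);
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("List.of rejects null: " + listOfRejectsNull());
    System.out.println("path for null entity: " + pathFor(null));
    System.out.println("path for entity: " + pathFor("catalog"));
  }
}
```

With the explicit branch, the null case never reaches `List.of`, which is the flow the reply above argues for.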
.map(
    nameAndId ->
        metaStoreManager.loadEntity(
            getCurrentPolarisContext(), catalogId, nameAndId.getId(), nameAndId.getType()))
Just a quick question, is there a specific reason we need to save the catalogId earlier? I'm asking because nameAndId already provides getCatalogId().
it might be true but i am only following what the existing code was doing... whether getCatalogId() always returns the right value for all types of entities idk, so i was just sticking to the existing code
to elaborate: in listCatalogsUnsafe the returned entity should likely return the same value for getId and getCatalogId (not sure if it does) but it seems like the loadEntity api still requires us to pass 0 since catalogs are top level entities
Thanks for clarifying it!
`PolarisAdminService` has multiple spots where it is working around the sub-optimal `PolarisMetaStoreManager` APIs. This results in multiple fixes like PR-1949 and PR-2258.
While eventually the underlying APIs should be improved, for now we can make a single central workaround and clean up some redundant code. Also we can improve the return types as callers are not interested in details of the entity layer.
7075d13 to 529774f
snazy left a comment
The culprit here is that although all matching entities are known during the list-operation, all you get from it is this EntityNameLookupRecord, so you have to load the same entities again.
Having a counterpart of PolarisMetaStoreManager#listEntities that doesn't yield the unnecessarily reduced result but the actual entities would help.
WDYT?
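The round trip described here can be sketched as follows. This is a minimal, self-contained illustration with hypothetical types (`NameRecord`, `Entity`, `InMemoryStore`), not the Polaris API: listing yields only lightweight records, so a central helper has to load each entity again. That a load may come back empty (for example, if the entity vanished between the two calls) is an assumption made for illustration, as is the use of catalog id 0 for top-level entities, which follows the remark earlier in this thread.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

class ListThenLoadSketch {

  // Hypothetical lightweight listing result: enough to identify an entity, nothing more.
  static final class NameRecord {
    final long id;
    final String name;

    NameRecord(long id, String name) {
      this.id = id;
      this.name = name;
    }
  }

  // Hypothetical full entity, including data the listing result leaves out.
  static final class Entity {
    final long catalogId;
    final long id;
    final String name;
    final Map<String, String> properties;

    Entity(long catalogId, long id, String name, Map<String, String> properties) {
      this.catalogId = catalogId;
      this.id = id;
      this.name = name;
      this.properties = properties;
    }
  }

  // Hypothetical store that only offers "list names" and "load one entity".
  static final class InMemoryStore {
    private final Map<Long, Entity> entitiesById = new LinkedHashMap<>();
    int loadCalls = 0;

    void put(Entity entity) {
      entitiesById.put(entity.id, entity);
    }

    void remove(long id) {
      entitiesById.remove(id);
    }

    List<NameRecord> listEntities(long catalogId) {
      return entitiesById.values().stream()
          .filter(e -> e.catalogId == catalogId)
          .map(e -> new NameRecord(e.id, e.name))
          .collect(Collectors.toList());
    }

    Optional<Entity> loadEntity(long catalogId, long id) {
      loadCalls++;
      Entity entity = entitiesById.get(id);
      return entity != null && entity.catalogId == catalogId
          ? Optional.of(entity)
          : Optional.empty();
    }
  }

  // Central helper: resolve listing records to full entities, skipping any that cannot be loaded.
  static List<Entity> loadEntities(InMemoryStore store, long catalogId, List<NameRecord> records) {
    return records.stream()
        .map(nameAndId -> store.loadEntity(catalogId, nameAndId.id))
        .flatMap(Optional::stream)
        .collect(Collectors.toList());
  }

  static InMemoryStore sampleStore() {
    InMemoryStore store = new InMemoryStore();
    store.put(new Entity(0L, 1L, "catalog_a", Map.of("owner", "alice")));
    store.put(new Entity(0L, 2L, "catalog_b", Map.of("owner", "bob")));
    return store;
  }

  public static void main(String[] args) {
    InMemoryStore store = sampleStore();
    List<NameRecord> records = store.listEntities(0L);
    store.remove(2L); // simulate an entity disappearing between list and load
    List<Entity> loaded = loadEntities(store, 0L, records);
    System.out.println(loaded.size() + " loaded with " + store.loadCalls + " load calls");
  }
}
```

Every listed record costs one extra load call in this shape, which is the overhead a listing variant returning full entities would avoid.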
flyrain left a comment
Thanks @XN137 for doing this. LGTM!
  catalogPath = PolarisEntity.toCoreList(List.of(catalogEntity));
  catalogId = catalogEntity.getId();
}
// TODO: add loadEntities method to PolarisMetaStoreManager
Minor: We will need a discussion on how the persistence layer is going to support it, as it affects all types of persistence. Loading everything in one call can provide a consistent view, which is nice, but there is a caveat: the uber call may be too large and hit limits (e.g., the memory limit). With that, I think it's premature to consider this as a TODO item.
afaict under the hood listEntities is already fetching the fully fledged PolarisBaseEntity instances from the database (for the jdbc case):
Lines 507 to 513 in af69d9f
datasourceOperations.executeSelectOverStream(
    query,
    new ModelEntity(),
    stream -> {
      var data = stream.filter(entityFilter);
      results.set(Page.mapped(pageToken, data, transformer, EntityIdToken::fromEntity));
    });
it includes streaming/pagination.
it just happens that the given transformer turns PolarisBaseEntity into EntityNameLookupRecord... and then later the record is used to look up the full entity again.
so afaict, the memory footprint would not be very different if we had a loadEntities function (or changed listEntities to not only return an EntityNameLookupRecord).
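A minimal sketch of this argument, using hypothetical types (`FullEntity`, `NameOnly`) rather than the Polaris persistence classes: if the listing step already materializes full rows and only then applies a caller-supplied transformer, then returning the full entity is a matter of passing a different transformer, and the rows read per call stay the same.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class TransformerListingSketch {

  // Hypothetical full row as read from the database.
  static final class FullEntity {
    final long id;
    final String name;
    final String properties;

    FullEntity(long id, String name, String properties) {
      this.id = id;
      this.name = name;
      this.properties = properties;
    }
  }

  // Hypothetical reduced result, comparable in spirit to a name lookup record.
  static final class NameOnly {
    final long id;
    final String name;

    NameOnly(long id, String name) {
      this.id = id;
      this.name = name;
    }
  }

  // Every matching row is already a FullEntity before the transformer decides what to keep.
  static <T> List<T> listEntities(
      List<FullEntity> rows, Predicate<FullEntity> filter, Function<FullEntity, T> transformer) {
    return rows.stream().filter(filter).map(transformer).collect(Collectors.toList());
  }

  static List<FullEntity> sampleRows() {
    return List.of(
        new FullEntity(1L, "ns1", "{\"k\":\"v\"}"),
        new FullEntity(2L, "ns2", "{}"));
  }

  public static void main(String[] args) {
    List<FullEntity> rows = sampleRows();
    // Reduced listing: drops the properties that were already read.
    List<NameOnly> names = listEntities(rows, e -> true, e -> new NameOnly(e.id, e.name));
    // Full listing: same rows, identity transformer, no second lookup needed.
    List<FullEntity> full = listEntities(rows, e -> true, Function.identity());
    System.out.println(names.size() + " names, " + full.size() + " full entities");
  }
}
```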
> so afaict, the memory footprint would not be very different if we had a loadEntities function
I agree. I also think the overall load is lower.
To be clear, I'm not against the idea, but we will need a discussion.
yeah this is mentioned in the commit message and implied by the added TODO.