Array resolution #163
Merged
Changes from 31 commits
39 commits:
All commits are by senthilb-devrev.
- ced5484 phase 1 changes
- d065d22 phase 2 changes
- c1f0d40 phase 3 complete and working
- 0396184 working with array + scalar resolution
- 11f30ec working with only a scalar resolution field
- c24c8a1 updating in meerkat browser
- 13b7f72 re-using dimensions instead of re-creating it
- 32f0d90 minor refactoring
- c00155e minor update
- 7600c52 unnest working as expected
- 75889b0 working properly
- dffc096 working properly
- 4f063ff working again
- 94a1e10 moving everything to browser too
- 2f077b1 mionr refactoring working
- c273540 final changes after testing and copy pasting same code from browser i…
- 06662f5 minor refactoring
- 9bb01ad udpated tests for resolution.ts
- 2b7a570 adding a test
- d2c4fba Merge remote-tracking branch 'refs/remotes/origin/main'
- 95beab1 ensuring we are having the same order by using row_id
- 4ae0e4e final ordering changes for row number
- 086736b minor
- 2f27183 final tests
- 6f27e74 fixed final tests
- 804383c fixing test
- 61d6bde cr comments
- 34ae940 moving code into dependent files for better readability
- b72c658 moving to use a merged flow
- 30f63dc changes after testing
- 1cb3e1b fixing lint error
- c8c7f61 cr comments
- b236c78 changing type of resolutionConfig isArrayType
- 2f350b7 minor updates
- 46d4884 splitting resolution file into multiple generators
- 4a32ce4 minor update
- aca247b cr comments
- 5b33f51 updating package version for meerkat-core
- a6351cc added a todo
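Commits 95beab1 ("ensuring we are having the same order by using row_id") and 4ae0e4e ("final ordering changes for row number") concern keeping output rows in their original order. The aggregation step in this PR hands its SQL to `wrapWithRowIdOrderingAndExclusion` from `@devrev/meerkat-core`, whose body is not part of this diff. Judging only from that name, a minimal sketch of such a wrapper could look like the following. `wrapWithRowIdOrderingSketch` and `quoteIdentifier` are hypothetical names, and the real helper may well generate different SQL; the sketch relies on DuckDB's `SELECT * EXCLUDE (...)` and on ORDER BY being allowed to reference a column that the projection drops.

```typescript
// Hypothetical sketch, NOT the meerkat-core implementation.
// Quotes an SQL identifier, doubling any embedded double quotes.
const quoteIdentifier = (name: string): string =>
  `"${name.replace(/"/g, '""')}"`;

// Wraps a query so that rows are ordered by the row id column and that
// column is removed from the final projection (DuckDB EXCLUDE syntax).
const wrapWithRowIdOrderingSketch = (
  sql: string,
  rowIdColumn: string
): string => {
  const rowId = quoteIdentifier(rowIdColumn);
  return `SELECT * EXCLUDE (${rowId}) FROM (${sql}) AS ordered_source ORDER BY ${rowId}`;
};

console.log(
  wrapWithRowIdOrderingSketch('SELECT 1 AS "__row_id", 2 AS total', '__row_id')
);
// SELECT * EXCLUDE ("__row_id") FROM (SELECT 1 AS "__row_id", 2 AS total) AS ordered_source ORDER BY "__row_id"
```

Ordering in the outermost query, rather than inside the subquery, is what makes the order reliable: SQL does not promise that an ORDER BY inside a derived table survives to the final result.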
meerkat-browser/src/browser-cube-to-sql-with-resolution/steps/aggregation-step.ts (new file: 114 additions, 0 deletions)
```typescript
import {
  ContextParams,
  getArrayTypeResolutionColumnConfigs,
  getNamespacedKey,
  Measure,
  MEERKAT_OUTPUT_DELIMITER,
  ResolutionConfig,
  ROW_ID_DIMENSION_NAME,
  TableSchema,
  wrapWithRowIdOrderingAndExclusion,
} from '@devrev/meerkat-core';
import { AsyncDuckDBConnection } from '@duckdb/duckdb-wasm';
import { cubeQueryToSQL } from '../../browser-cube-to-sql/browser-cube-to-sql';

/**
 * Re-aggregates the unnested, resolved rows to reverse the unnest.
 *
 * This function:
 * 1. Groups by row_id
 * 2. Uses MAX for non-array columns (their values are duplicated across the unnested rows)
 * 3. Uses ARRAY_AGG for resolved array columns
 *
 * @param connection - DuckDB-WASM connection used to generate the SQL
 * @param resolvedTableSchema - Schema from Phase 2 (contains all column info)
 * @param resolutionConfig - Resolution configuration
 * @param contextParams - Optional context parameters
 * @returns Final SQL with arrays containing resolved values
 */
export const getAggregatedSql = async ({
  connection,
  resolvedTableSchema,
  resolutionConfig,
  contextParams,
}: {
  connection: AsyncDuckDBConnection;
  resolvedTableSchema: TableSchema;
  resolutionConfig: ResolutionConfig;
  contextParams?: ContextParams;
}): Promise<string> => {
  const aggregationBaseTableSchema: TableSchema = resolvedTableSchema;

  // Identify which columns need ARRAY_AGG vs MAX
  const arrayColumns = getArrayTypeResolutionColumnConfigs(resolutionConfig);
  const baseTableName = aggregationBaseTableSchema.name;

  // A dimension belongs to a resolved array column when its name contains
  // `${arrayColumnName}${MEERKAT_OUTPUT_DELIMITER}`
  const isResolvedArrayColumn = (dimName: string) => {
    return arrayColumns.some((arrayCol) => {
      return dimName.includes(`${arrayCol.name}${MEERKAT_OUTPUT_DELIMITER}`);
    });
  };

  // Get row_id dimension for GROUP BY
  const rowIdDimension = aggregationBaseTableSchema.dimensions.find(
    (d) => d.name === ROW_ID_DIMENSION_NAME
  );

  if (!rowIdDimension) {
    throw new Error('Row id dimension not found');
  }

  // Create measures with MAX or ARRAY_AGG based on column type
  const aggregationMeasures: Measure[] = [];

  aggregationBaseTableSchema.dimensions
    .filter((dim) => dim.name !== rowIdDimension?.name)
    .forEach((dim) => {
      const isArrayColumn = isResolvedArrayColumn(dim.name);

      // The dimension's sql field already has the correct reference (e.g., __resolved_query."__row_id")
      // We just need to wrap it in the aggregation function
      const columnRef =
        dim.sql || `${baseTableName}."${dim.alias || dim.name}"`;

      // Use ARRAY_AGG for resolved array columns, MAX for others.
      // The FILTER clause keeps NULLs out of ARRAY_AGG, and COALESCE turns
      // a group with no non-NULL values into an empty array.
      const aggregationFn = isArrayColumn
        ? `COALESCE(ARRAY_AGG(DISTINCT ${columnRef}) FILTER (WHERE ${columnRef} IS NOT NULL), [])`
        : `MAX(${columnRef})`;

      aggregationMeasures.push({
        name: dim.name,
        sql: aggregationFn,
        type: dim.type,
        alias: dim.alias,
      });
    });

  // Update the schema with aggregation measures
  const schemaWithAggregation: TableSchema = {
    ...aggregationBaseTableSchema,
    measures: aggregationMeasures,
    dimensions: [rowIdDimension],
  };

  // Generate the final SQL
  const aggregatedSql = await cubeQueryToSQL({
    connection,
    query: {
      measures: aggregationMeasures.map((m) =>
        getNamespacedKey(baseTableName, m.name)
      ),
      dimensions: rowIdDimension
        ? [getNamespacedKey(baseTableName, rowIdDimension.name)]
        : [],
    },
    tableSchemas: [schemaWithAggregation],
    contextParams,
  });

  // Order by row_id to maintain consistent ordering before excluding it
  return wrapWithRowIdOrderingAndExclusion(
    aggregatedSql,
    ROW_ID_DIMENSION_NAME
  );
};
```
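To make the generated SQL concrete, here is a minimal in-memory TypeScript model of what `getAggregatedSql` asks the database to compute: rows are grouped by the row id, scalar columns collapse with SQL-style `MAX`, and resolved array columns become arrays of distinct non-NULL values (empty when nothing survives), ordered by row id with the row id dropped. It is a sketch under stated assumptions, not meerkat code: `reaggregate`, `sqlMax`, `distinctNonNull`, the `'__row_id'` column name and the `'__'` delimiter are hypothetical stand-ins for `ROW_ID_DIMENSION_NAME` and `MEERKAT_OUTPUT_DELIMITER`, whose real values are not shown in this diff, and the final ordering/exclusion is inferred from the name `wrapWithRowIdOrderingAndExclusion`.

```typescript
// Hypothetical in-memory model of the re-aggregation SQL; not meerkat APIs.
type Cell = string | number | null;
type Row = Record<string, Cell>;
type OutputRow = Record<string, Cell | Cell[]>;

// Assumed values; the real constants come from @devrev/meerkat-core.
const ROW_ID = '__row_id';
const DELIMITER = '__';

// Same test as isResolvedArrayColumn in aggregation-step.ts.
const isResolvedArrayColumn = (column: string, arrayFields: string[]): boolean =>
  arrayFields.some((field) => column.includes(`${field}${DELIMITER}`));

// Numbers compare numerically; anything else is compared as a string
// (a simplification of SQL's type-specific ordering).
const greater = (a: string | number, b: string | number): boolean =>
  typeof a === 'number' && typeof b === 'number' ? a > b : String(a) > String(b);

// SQL MAX(): ignores NULLs and yields NULL when every input is NULL.
const sqlMax = (values: Cell[]): Cell =>
  values.reduce<Cell>(
    (best, v) => (v !== null && (best === null || greater(v, best)) ? v : best),
    null
  );

// COALESCE(ARRAY_AGG(DISTINCT v) FILTER (WHERE v IS NOT NULL), []).
// Keeps first-seen order; SQL itself does not guarantee element order.
const distinctNonNull = (values: Cell[]): Cell[] =>
  Array.from(new Set(values.filter((v) => v !== null)));

const reaggregate = (rows: Row[], arrayFields: string[]): OutputRow[] => {
  // GROUP BY row id.
  const groups = new Map<number, Row[]>();
  for (const row of rows) {
    const id = Number(row[ROW_ID]);
    const group = groups.get(id) ?? [];
    group.push(row);
    groups.set(id, group);
  }

  // ORDER BY row id, then drop the row id column from each output row.
  return Array.from(groups.entries())
    .sort(([a], [b]) => a - b)
    .map(([, group]) => {
      const out: OutputRow = {};
      const columns = Object.keys(group[0]).filter((c) => c !== ROW_ID);
      for (const column of columns) {
        const values = group.map((r) => r[column] ?? null);
        out[column] = isResolvedArrayColumn(column, arrayFields)
          ? distinctNonNull(values)
          : sqlMax(values);
      }
      return out;
    });
};

// Usage: unnested rows for a hypothetical "owners" array field.
const unnestedRows: Row[] = [
  { __row_id: 2, title: 'b', owners__name: 'Ann' },
  { __row_id: 1, title: 'a', owners__name: 'Bob' },
  { __row_id: 1, title: 'a', owners__name: 'Cy' },
  { __row_id: 1, title: 'a', owners__name: 'Bob' },
  { __row_id: 3, title: 'c', owners__name: null },
];
console.log(JSON.stringify(reaggregate(unnestedRows, ['owners'])));
// [{"title":"a","owners__name":["Bob","Cy"]},{"title":"b","owners__name":["Ann"]},{"title":"c","owners__name":[]}]
```

The duplicated `Bob` in row 1 appears once because of `DISTINCT`, and row 3 gets `[]` because the `FILTER` removes its only (NULL) value and `COALESCE` supplies the empty array. Using `MAX` for scalar columns is safe only because unnesting copies the same scalar value onto every row that shares a row id.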