DUPE (Do not merge) #358

aaronsteers · 2024-09-04T18:06:33Z

Duplicate of #343. Created in order to see CI test results. Do not merge.

Summary by CodeRabbit

New Features
- Introduced a new normalization step for primary keys to standardize their format, enhancing data integrity.
- Added support for a new data stream configuration designed to handle primary keys with dots, expanding system capabilities.
Bug Fixes
- Improved error handling for nested primary keys, providing clearer context in error messages.
Tests
- Enhanced test coverage for normalization logic by adding new test cases to handle special characters.
Chores
- Updated the .gitignore file to exclude IDE-specific files, maintaining a cleaner repository.

Fix nested primary key not supported bug

Upstream main

coderabbitai · 2024-09-04T18:06:41Z

Walkthrough

The changes involve updates to multiple files, including the addition of a new stream configuration for handling primary keys with dots, enhancements to primary key normalization logic, and the introduction of a new test case for normalization behavior. Additionally, the .gitignore file has been modified to include the .idea directory to prevent tracking of IDE-specific files. Overall, these modifications improve data handling and testing capabilities within the codebase.

Changes

File Path	Change Summary
`.gitignore`	Added `.idea` directory to the list of ignored files.
`airbyte/shared/catalog_providers.py`	Introduced normalization for primary keys using `LowerCaseNormalizer` and improved error handling.
`tests/integration_tests/fixtures/source-test/source_test/run.py`	Added new stream configuration "primary-key-with-dot" and corresponding sample record.
`tests/unit_tests/test_text_normalization.py`	Added a new test case to check normalization behavior with a period in the string.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant System
    User->>System: Request to process data stream
    System->>System: Normalize primary keys
    System->>System: Validate primary keys
    alt Validation successful
        System->>System: Process data
        System->>User: Return processed data
    else Validation failed
        System->>User: Return error message
    end

Wdyt?

Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 1775cb4 and fac9a71.

Files selected for processing (4)

.gitignore (1 hunks)
airbyte/shared/catalog_providers.py (2 hunks)
tests/integration_tests/fixtures/source-test/source_test/run.py (3 hunks)
tests/unit_tests/test_text_normalization.py (1 hunks)

Files skipped from review due to trivial changes (1)

.gitignore

Additional comments not posted (3)

tests/unit_tests/test_text_normalization.py (1)

214-214: LGTM!

The new test case ("some.col", "some_col", False) is a great addition to improve the test coverage for the normalization logic. It checks that a period in the input is replaced with an underscore ("some.col" becomes "some_col"); the trailing False appears to be the flag for whether an error is expected, not a normalize_keys setting.

tests/integration_tests/fixtures/source-test/source_test/run.py (1)

68-84: The new stream configuration looks good! 👍

I like how you've:

Provided a clear description explaining the purpose of this stream

Defined the primary key and cursor settings appropriately

Specified the JSON schema for the expected data types

The code changes are approved.
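
For readers without the diff at hand, a stream definition of this kind might look roughly like the sketch below. This is an assumption, not the contents of run.py: the column names are borrowed from the sample record quoted later in this review, while the types, the cursor column, and the description text are hypothetical.

```python
# Hypothetical stream definition; the real one in run.py may differ.
primary_key_with_dot_stream = {
    "name": "primary-key-with-dot",
    "description": "Stream whose primary key column name contains a dot.",
    # Each primary key entry is a path. A one-element path is a top-level
    # column, even when the column name itself contains a dot.
    "source_defined_primary_key": [["table1.Column1"]],
    "source_defined_cursor": True,
    "default_cursor_field": ["table1.Column2"],  # assumed cursor column
    "supported_sync_modes": ["full_refresh", "incremental"],
    "json_schema": {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {
            "table1.Column1": {"type": "string"},
            "table1.Column2": {"type": "number"},
            "table1.empty_column": {"type": "string"},
            "table1.big_number": {"type": "number"},
        },
    },
}

pk_paths = primary_key_with_dot_stream["source_defined_primary_key"]
schema_columns = primary_key_with_dot_stream["json_schema"]["properties"]
```

The point of the fixture is that "table1.Column1" is one column name, not a path into a nested object, so the primary key path has exactly one node.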

airbyte/shared/catalog_providers.py (1)

153-155: The primary key normalization looks good! 👍

Normalizing the primary keys to lowercase using the LowerCaseNormalizer is a nice addition to standardize the format. This will help prevent potential issues related to case sensitivity.
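
As a rough illustration of the normalization step, here is a minimal sketch. The function `normalize_key` is a hypothetical stand-in and not the actual `LowerCaseNormalizer` implementation; it assumes that non-alphanumeric characters are replaced with underscores, which matches the ("some.col", "some_col", False) test case mentioned above.

```python
from __future__ import annotations

import re


def normalize_key(name: str) -> str:
    """Hypothetical stand-in for a lower-case normalizer.

    Lowercases the name, then replaces every character that is not
    a-z or 0-9 with an underscore.
    """
    return re.sub(r"[^a-z0-9]", "_", name.lower())


def normalize_primary_keys(primary_keys: list[str]) -> list[str]:
    """Apply the same normalization to every primary key column name."""
    return [normalize_key(pk) for pk in primary_keys]
```

Applying the same normalization to primary keys that is applied to column names keeps the two in sync, so a key such as "table1.Column1" is looked up under the same name the column ends up with.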

coderabbitai

Actionable comments posted: 0

Outside diff range, codebase verification and nitpick comments (2)

tests/integration_tests/fixtures/source-test/source_test/run.py (1)
157-169: The sample record looks great! 🙌

It correctly follows the structure defined in the "primary-key-with-dot" stream configuration.

Just one minor suggestion:
sample_record_primary_key_with_dot = {
    "type": "RECORD",
    "record": {
        "stream": "primary-key-with-dot",
        "emitted_at": 1704067200,
        "data": {
            "table1.Column1": "value1",
            "table1.Column2": 1,
            "table1.empty_column": None,
            "table1.big_number": 1234567890123456,
        },
    },
}
Wdyt about moving the "stream" and "emitted_at" keys before the "data" key? This would make it consistent with the other sample records in the file.

Other than that, the code changes are approved!
airbyte/shared/catalog_providers.py (1)
157-170: The primary key validation logic looks solid! Just a minor suggestion.

The check to ensure that each primary key consists of exactly one node is a good validation step. Raising an AirbyteError with a detailed message when a key contains more than one node provides clear error reporting.

One small suggestion: Maybe we could mention in the error message that the issue is with the stream's configured catalog? Something like:
raise exc.AirbyteError(
    message=(
        "Nested primary keys are not supported in the configured catalog. "
        "Each PK column should have exactly one node. "
    ),
    ...
)
What do you think?
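
The validation described above can be sketched as follows. This is a simplified illustration under stated assumptions: `flatten_primary_keys` and `NestedPrimaryKeyError` are hypothetical names, and the exception stands in for the `exc.AirbyteError` used in the real code.

```python
from __future__ import annotations


class NestedPrimaryKeyError(ValueError):
    """Hypothetical stand-in for the AirbyteError raised in the real code."""


def flatten_primary_keys(pk_paths: list[list[str]], stream_name: str) -> list[str]:
    """Return one column name per primary key, rejecting nested paths.

    Each primary key is given as a path (a list of nodes). Only paths
    with exactly one node are accepted.
    """
    columns: list[str] = []
    for path in pk_paths:
        if len(path) != 1:
            raise NestedPrimaryKeyError(
                "Nested primary keys are not supported in the configured catalog. "
                "Each PK column should have exactly one node. "
                f"Stream: {stream_name!r}, primary key path: {path!r}"
            )
        columns.append(path[0])
    return columns
```

Including the stream name and the offending path in the message is one way to give the clearer error context the summary mentions.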

sukantaroy01 and others added 22 commits July 22, 2024 15:33

Fix nested primary key not supported bug

dd3d52a

Update code to pass lint cheks

37ebcdb

Merge pull request #2 from pixisai/gads-fix-sr

a7ca8cd

Fix nested primary key not supported bug

Merge branch 'airbytehq:main' into main

1dc24e6

Merge branch 'airbytehq:main' into main

d9343f9

Merge branch 'airbytehq:main' into main

cf64dec

Merge branch 'airbytehq:main' into main

d88a765

Merge branch 'airbytehq:main' into main

3089e2d

Merge branch 'airbytehq:main' into main

a0b7134

Merge branch 'airbytehq:main' into main

c1f578d

Merge branch 'airbytehq:main' into main

4853670

Merge remote-tracking branch 'upstream/main'

5107547

Merge pull request #4 from pixisai/UPSTREAM_MAIN

fcce6bf

Upstream main

Merge branch 'airbytehq:main' into main

0975d3d

chore: fix tests

c1e620f

Merge pull request #5 from pixisai/UPSTREAM_MAIN

6580aa0

fix: fix import

7b2956c

fix: indent import

98066cd

fix: stream key (#7)

27d1c57

chore: reformat based on ruff

f788064

add unit test

9395913

improve parsing logic and guard statements for pk detection

fac9a71

coderabbitai bot reviewed Sep 4, 2024

coderabbitai bot approved these changes Sep 4, 2024

aaronsteers closed this Sep 9, 2024
