Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ci) Run codeflash in on PRs modifying the Python CDK #44427

Merged
merged 20 commits into from
Aug 26, 2024

Conversation

girarda
Copy link
Contributor

@girarda girarda commented Aug 19, 2024

What

Adds a github action running Codeflash on PRs modifying the Python CDK.

Codeflash will inspect the new code and provide suggestions to improve the performance.

The github action is not required to complete before merging. There is no requirement for updating the code as per Codeflash's suggestions.

For reference:

The goal is to try the tool and see if it can provide value by suggesting performance improvement as we go.

This PR is built on top of this PR adding codeflash to the CDK.

I can integrate with airbyte-ci as a follow up step but I want to get this in ASAP so we can try the tool.

Code review guide

The main thing I want to make sure is the GHA has the right security filters.

User Impact

There should be no user impact.

Can this PR be safely reverted and rolled back?

  • YES 💚

@girarda girarda requested a review from a team as a code owner August 19, 2024 22:10
Copy link

vercel bot commented Aug 19, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Aug 22, 2024 9:59pm

@octavia-squidington-iii octavia-squidington-iii added the CDK Connector Development Kit label Aug 19, 2024
@girarda girarda requested a review from alafanechere August 19, 2024 22:11
@@ -114,3 +115,10 @@ optional_poetry_groups = ["dev"]
poetry_extras = ["file-based", "sphinx-docs", "vector-db-based"]
poe_tasks = ["check-ci"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would ideally recommend adding poetry run code flash as a poe task and add this poe task here

Suggested change
poe_tasks = ["check-ci"]
poe_tasks = ["check-ci", "codeflash"]
required_env_variables = ["CODEFLASH_API_KEY", "CODEFLASH_PR_NUMBER"]

This would prevent the introduction of a new GHA workflow and make codeflash run on the listed python_versions.

You'd need to set the env vars in the GHA workflow here

@@ -149,6 +152,7 @@ runs:
SLACK_WEBHOOK: ${{ inputs.slack_webhook_url }}
SPEC_CACHE_BUCKET_NAME: ${{ inputs.spec_cache_bucket_name }}
SPEC_CACHE_GCS_CREDENTIALS: ${{ inputs.spec_cache_gcs_credentials }}
CODEFLASH_API_KEY: ${{ inputs.codeflash_api_key }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alafanechere am I missing something obvious here? After moving from a custom GHA to airbyte-ci, the workflow is failing because it can't access the CODEFLASH_API_KEY.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're missing one last change: the thing that would make airbyte-ci set this env var on the docker container running the poe task.
It has to be defined in [tool.airbyte-ci] in the pyproject.toml file. I'll add a suggestion comment below.

@@ -149,6 +152,7 @@ runs:
SLACK_WEBHOOK: ${{ inputs.slack_webhook_url }}
SPEC_CACHE_BUCKET_NAME: ${{ inputs.spec_cache_bucket_name }}
SPEC_CACHE_GCS_CREDENTIALS: ${{ inputs.spec_cache_gcs_credentials }}
CODEFLASH_API_KEY: ${{ inputs.codeflash_api_key }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're missing one last change: the thing that would make airbyte-ci set this env var on the docker container running the poe task.
It has to be defined in [tool.airbyte-ci] in the pyproject.toml file. I'll add a suggestion comment below.

airbyte-cdk/python/pyproject.toml Outdated Show resolved Hide resolved
Co-authored-by: Augustin <augustin@airbyte.io>
Comment on lines 5 to 11
for i in range(len(arr)):
# This is a diff
for j in range(len(arr) - 1):
if arr[j] > arr[j + 1]:
temp = arr[j]
arr[j] = arr[j + 1]
arr[j + 1] = temp
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for i in range(len(arr)):
# This is a diff
for j in range(len(arr) - 1):
if arr[j] > arr[j + 1]:
temp = arr[j]
arr[j] = arr[j + 1]
arr[j + 1] = temp
arr.sort()

Copy link

codeflash-ai bot commented Aug 22, 2024

⚡️ Codeflash found optimizations for this PR

📄 sorter() in airbyte-cdk/python/airbyte_cdk/bubble_sort.py

📈 Performance improved by 2,466,795% (24,667.95x faster)

⏱️ Runtime went down from 29.7 seconds to 1.21 millisecond

Explanation and details

Certainly! The current implementation of the sorter function is a simple Bubble Sort algorithm, which has a time complexity of O(n^2). This can be very slow for large arrays. We can make this more efficient by using Python's built-in sorting method, which is implemented as Timsort and has a time complexity of O(n log n).

Here's the optimized code.

This uses Python's built-in sorted() function, which is highly optimized for performance. If you need to sort the list in place, you can use the sort() method of the list.

Both versions will provide a significant performance improvement over the original Bubble Sort implementation.

Correctness verification

The new optimized code was tested for correctness. The results are listed below.

✅ 3 Passed − ⚙️ Existing Unit Tests

(click to show existing tests)
- test_bubble_sort.py

✅ 21 Passed − 🌀 Generated Regression Tests

(click to show generated tests)
# imports
import pytest  # used for our unit tests
from airbyte_cdk.bubble_sort import sorter

# unit tests

# Basic Functionality Tests
def test_simple_unsorted_list():
    codeflash_output = sorter([3, 1, 2])
    # Outputs were verified to be equal to the original implementation

def test_already_sorted_list():
    codeflash_output = sorter([1, 2, 3])
    # Outputs were verified to be equal to the original implementation

def test_reverse_sorted_list():
    codeflash_output = sorter([3, 2, 1])
    # Outputs were verified to be equal to the original implementation

# Edge Cases Tests
def test_empty_list():
    codeflash_output = sorter([])
    # Outputs were verified to be equal to the original implementation

def test_single_element_list():
    codeflash_output = sorter([1])
    # Outputs were verified to be equal to the original implementation

def test_two_elements_already_sorted():
    codeflash_output = sorter([1, 2])
    # Outputs were verified to be equal to the original implementation

def test_two_elements_reverse_sorted():
    codeflash_output = sorter([2, 1])
    # Outputs were verified to be equal to the original implementation

# Lists with Duplicates Tests
def test_all_identical_elements():
    codeflash_output = sorter([2, 2, 2])
    # Outputs were verified to be equal to the original implementation

def test_some_duplicates():
    codeflash_output = sorter([3, 1, 2, 2, 1])
    # Outputs were verified to be equal to the original implementation

# Lists with Negative Numbers Tests
def test_mixed_positive_and_negative_numbers():
    codeflash_output = sorter([3, -1, 2, -2, 1])
    # Outputs were verified to be equal to the original implementation

def test_all_negative_numbers():
    codeflash_output = sorter([-3, -1, -2])
    # Outputs were verified to be equal to the original implementation

# Lists with Floating Point Numbers Tests
def test_mixed_integers_and_floats():
    codeflash_output = sorter([3, 1.5, 2, 2.5, 1])
    # Outputs were verified to be equal to the original implementation

def test_all_floating_point_numbers():
    codeflash_output = sorter([3.1, 1.2, 2.3])
    # Outputs were verified to be equal to the original implementation

# Large Scale Test Cases
def test_large_list_random_integers():
    import random
    random_list = random.sample(range(10000), 1000)
    codeflash_output = sorter(random_list)
    # Outputs were verified to be equal to the original implementation

def test_large_list_sorted_integers():
    sorted_list = list(range(1000))
    codeflash_output = sorter(sorted_list)
    # Outputs were verified to be equal to the original implementation

def test_large_list_reverse_sorted_integers():
    reverse_sorted_list = list(range(999, -1, -1))
    codeflash_output = sorter(reverse_sorted_list)
    # Outputs were verified to be equal to the original implementation

# Performance and Scalability Tests
def test_very_large_list_random_integers():
    import random
    random_list = random.sample(range(100000), 10000)
    codeflash_output = sorter(random_list)
    # Outputs were verified to be equal to the original implementation

def test_very_large_list_sorted_integers():
    sorted_list = list(range(10000))
    codeflash_output = sorter(sorted_list)
    # Outputs were verified to be equal to the original implementation

def test_very_large_list_reverse_sorted_integers():
    reverse_sorted_list = list(range(9999, -1, -1))
    codeflash_output = sorter(reverse_sorted_list)
    # Outputs were verified to be equal to the original implementation

# Lists with Special Values Tests
def test_list_with_none_values():
    with pytest.raises(TypeError):
        sorter([3, 1, None, 2])
    # Outputs were verified to be equal to the original implementation

def test_list_with_mixed_types():
    with pytest.raises(TypeError):
        sorter([3, 'a', 2])
    # Outputs were verified to be equal to the original implementation

# Run the tests
if __name__ == "__main__":
    pytest.main()

🔘 (none found) − ⏪ Replay Tests

@octavia-squidington-iii octavia-squidington-iii removed the CDK Connector Development Kit label Aug 22, 2024
@girarda girarda dismissed alafanechere’s stale review August 26, 2024 14:19

wil refactor into using airbyte-ci separately

@girarda
Copy link
Contributor Author

girarda commented Aug 26, 2024

/approve-and-merge reason="adding a new non-required github action"

@octavia-approvington
Copy link
Contributor

Our work here is done
done

@octavia-approvington octavia-approvington merged commit b034e7d into master Aug 26, 2024
31 checks passed
@octavia-approvington octavia-approvington deleted the alex/codeflash_ci branch August 26, 2024 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants