Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dtypes: Add type converstion to pyspark types #574

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

aditya-nambiar
Copy link
Contributor

@aditya-nambiar aditya-nambiar commented Sep 26, 2024

Important

Add type conversion to PySpark types, new offline query feature set, and extensive logging for debugging.

  • Type Conversion:
    • Added to_spark_type() in schema.py to convert Python types to PySpark SQL DataType.
    • Handles Optional, List, Dict, dataclass, and basic types.
  • Feature Set:
    • Added QueryOffline feature set in test_featureset.py for offline query testing.
    • Includes extractors calc and calc2 for feature calculations.
  • Logging:
    • Extensive logging added in expr.py and branch.py for debugging purposes.
  • Testing:
    • Added tests for to_spark_type() in test_schema.py.
  • Dependencies:
    • Updated pyproject.toml to include pyspark dependency.

This description was created by Ellipsis for a21e5fc. It will automatically update as commits are pushed.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to 8769520 in 30 seconds

More details
  • Looked at 360 lines of code in 6 files
  • Skipped 1 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. fennel/testing/branch.py:40
  • Draft comment:
    Remove debugging print statements to clean up the code.
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_bDP4v3wby8KW0HJQ


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@@ -410,6 +410,35 @@ def pa_to_pd(pa_data, ret_type, parse=True):
ret_type = output_dtype

serialized_ret_type = get_datatype(ret_type).SerializeToString()
print("I AM HERE")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove debugging print statements to clean up the code.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 690673d in 32 seconds

More details
  • Looked at 332 lines of code in 6 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 drafted comments based on config settings.
1. fennel/expr/expr.py:412
  • Draft comment:
    Remove leftover print statements used for debugging to clean up the code.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
2. fennel/internal_lib/schema/test_schema.py:1068
  • Draft comment:
    Remove print statements used for debugging to clean up the code.
  • Reason this comment was not posted:
    Marked as duplicate.
3. fennel/testing/branch.py:40
  • Draft comment:
    Remove print statements used for debugging to clean up the code.
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_mpspA2FiuPKrlov2


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on a21e5fc in 33 seconds

More details
  • Looked at 330 lines of code in 6 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 drafted comments based on config settings.
1. fennel/expr/expr.py:410
  • Draft comment:
    Remove commented-out print statements to maintain code cleanliness and readability.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The PR description mentions extensive logging for debugging, but the code contains commented-out print statements. These should be removed to maintain code cleanliness.
2. fennel/testing/branch.py:40
  • Draft comment:
    Remove commented-out print statements to maintain code cleanliness and readability.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The PR description mentions extensive logging for debugging, but the code contains commented-out print statements. These should be removed to maintain code cleanliness.
3. fennel/internal_lib/schema/schema.py:1082
  • Draft comment:
    Use datetime.datetime instead of datetime for correct type checking.
    elif py_type is datetime.datetime:
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The import statement from datetime import datetime, date indicates that datetime is already referring to datetime.datetime. Therefore, the current usage of datetime in the code is correct. The comment seems to be incorrect because it doesn't consider the import statement.
    I might be missing some context about how datetime is used elsewhere in the code, but based on the import statement, the usage seems correct.
    The import statement is clear, and the usage of datetime aligns with it. The comment does not provide strong evidence for a necessary change.
    The comment should be deleted because the usage of datetime is correct given the import statement.
4. fennel/internal_lib/schema/schema.py:1084
  • Draft comment:
    Use datetime.date instead of date for correct type checking.
    elif py_type is datetime.date:
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment suggests a change for clarity or convention, but the current code is functionally correct. The import statement already imports date from datetime, so there is no functional issue. The comment does not indicate a necessary code change, as the current code will work as intended.
    I might be overlooking a convention or style guide that prefers fully qualified names for clarity. However, the current import statement is clear and common in Python code.
    The import statement is clear and common practice in Python, and there is no functional issue with using date as imported. The comment does not indicate a necessary code change.
    The comment does not indicate a necessary code change, as the current code is functionally correct. The comment should be deleted.

Workflow ID: wflow_KM4bPiuP0WsSNYJc


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant