Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support None values in array, map and row types #281

Merged

Conversation

mdesmet
Copy link
Contributor

@mdesmet mdesmet commented Nov 4, 2022

Description

Fixes #269

Also refactors mappers from lambda functions to separate classes for better readability.

Non-technical explanation

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Nov 4, 2022
Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % comment.

I did not comment on code that was just refactored even though they would have been majority of my comments. I'll submit separate PR with some suggestions in that area.

@@ -295,7 +295,7 @@ def __init__(

def retry(self, func, args, kwargs, err, attempt):
delay = self._get_delay(attempt)
time.sleep(delay)
sleep(delay)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoid unrelated changes in single commit

trino/client.py Outdated
Comment on lines 878 to 883
if value == 'Infinity':
return INF
if value == '-Infinity':
return NEGATIVE_INF
if value == 'NaN':
return NAN
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline all these constants for INF, NEGATIVE_INF and NAN. Not used in multiple places.

trino/client.py Outdated
}


class RowMapperFactory:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep NoOpRowMapper to be below RowMapperFactory (or just above).

trino/client.py Outdated
and returns a RowMapper instance which will process rows of data
"""
no_op_row_mapper = NoOpRowMapper()
T = TypeVar("T")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue type args should be defined up top with imports since over time they get reused across entire class so there's no point trying to "keep them close to where they are used".

trino/client.py Outdated
def map(self, values: List[Any]) -> Optional[Tuple[Optional[Any], ...]]:
if values is None:
return None
return tuple(self.mappers[idx].map(value) for idx, value in enumerate(values))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idx is no better than i. If you want longer name index is much better.

@hashhar
Copy link
Member

hashhar commented Nov 5, 2022

@ebyhr PTAL. Please avoid leaving comments on code that was moved from existing code - out of scope for this PR and I'm submitting another PR on top of this with some cleanups. will request a review there.

@mdesmet mdesmet force-pushed the bug/none-values-experimental-python-types branch from 69af6a3 to 1898a2c Compare November 8, 2022 17:51
Copy link
Member

@ebyhr ebyhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left only minor comments.

@@ -156,6 +156,8 @@ def test_array(trino_connection):
SqlTest(trino_connection) \
.add_field(sql="CAST(null AS ARRAY(VARCHAR))", python=None) \
.add_field(sql="ARRAY['a', 'b', null]", python=['a', 'b', None]) \
.add_field(sql="ARRAY[1.2, 2.4, null]", python=[Decimal("1.2"), Decimal("2.4"), None]) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Refactor value mappers to separate classes"

"Refactoring" commit should focus on refactoring. It would be better to change the commit title.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 175 to 176
.add_field(sql="TIME '01:23:45.123'", python=time_3) \
.add_field(sql="CAST('01:23:45.123' AS TIME(3))", python=time_3) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we want to test is type mapping between Python client and Trino server, not literal and cast in Trino server. I suppose we can remove either one. If we still want to test both, test_date misses the cast coverage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is indeed useless. I have removed the casts also we should not test precision of null values so I kept only one null test for each data type.

@mdesmet mdesmet force-pushed the bug/none-values-experimental-python-types branch from 7ee9f2c to e8cb004 Compare November 9, 2022 11:45
@mdesmet
Copy link
Contributor Author

mdesmet commented Nov 9, 2022

I made some further modifications:

  • simplified the code to only apply rounding if the input value exceeds Python's supported precision.
  • Removed usage of regex for performance
  • Usage of fromisoformat instead of strptotime to get a 54x boost in performance
  • Rounding using Decimal

@mdesmet mdesmet force-pushed the bug/none-values-experimental-python-types branch from e8cb004 to 35a6fa6 Compare November 9, 2022 20:24
trino/client.py Outdated
self.pattern = pattern
self.time_size = 9 + ms_size - ms_to_trim
def round_millis(value: str, precision: int) -> str:
if len(value) > precision:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes assumption that input is a non-fractional number? Let's defensively verify our assumption. Nothing prevents the caller from using the function in other contexts in future.

trino/client.py Outdated
self.time_size = 9 + ms_size - ms_to_trim
def round_millis(value: str, precision: int) -> str:
if len(value) > precision:
return str(round(Decimal(value) / Decimal(10 ** (len(value) - precision))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to perform half-odd rounding while docs say round does half-even. (https://docs.python.org/3/library/functions.html#round)

if two multiples are equally close, rounding is done toward the even choice

In [18]: round(Decimal('2.5'))
Out[18]: 2

In [19]: round(Decimal('3.5'))
Out[19]: 4
In [23]: round_millis('1234585', 6)
Out[23]: '123458'

In [24]: round_millis('1234575', 6)
Out[24]: '123458'

Also what if the value rounds up past 1 second? Receiver needs to perform the carry.

e.g. round_millis('9999999', 6) -> '1000000'.

trino/client.py Outdated
pattern += ".%f"
self.pattern = pattern
self.time_size = 9 + ms_size - ms_to_trim
def round_millis(value: str, precision: int) -> str:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll be unit-testable and reusable if it returns int and conversion to str is left to caller.

trino/client.py Outdated

class TimeValueMapper(ValueMapper[time]):
def __init__(self, precision):
self.time_default_size = 9 # size of 'HH:MM:SS.' (the time string up to the milliseconds)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why include the . in the length. I assume to make it easier to use slicing?

@mdesmet mdesmet force-pushed the bug/none-values-experimental-python-types branch from 35a6fa6 to 4649f02 Compare November 10, 2022 09:34
Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Timestamp changes have been extracted.

@hashhar hashhar merged commit dae4c41 into trinodb:master Nov 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
3 participants