Conversation

@davidhcoe
Contributor

Introduces a new setting to cap the maximum timestamp precision at Microsecond. When set, the default Nanosecond values are converted to Microsecond, avoiding the int64 overflow that occurs when a date is before the year 1678 or after 2262.

Provides a fix for #2811 by creating a workaround that can be set by the caller.

@davidhcoe
Contributor Author

Based on the response to snowflakedb/gosnowflake#1430, it didn't sound like a fix was going to happen in the gosnowflake driver, so I added this setting to work around the known issue with dates.

@davidhcoe davidhcoe marked this pull request as ready for review June 3, 2025 04:29
@github-actions github-actions bot added this to the ADBC Libraries 19 milestone Jun 3, 2025
@davidhcoe davidhcoe changed the title feat(go/adbc/driver/snowflake): New setting to enable the maximum timestamp precision to microseconds feat(go/adbc/driver/snowflake): New setting to set the maximum timestamp precision to microseconds Jun 3, 2025
Contributor

@CurtHagenlocher CurtHagenlocher left a comment


I don't feel especially qualified to review the Go code, but have left some feedback.


``adbc.snowflake.sql.client_option.use_max_microseconds_precision``
When ``true``, nanoseconds will be converted to microseconds
to avoid the overflow of the timestamp type.
Contributor


These two lines use tabs where everything else in this file is spaces. I suspect this is the cause of the checkin test failure.

Contributor


This does not affect time values, only timestamp values. It might be worth calling that out here in the documentation.

public bool UseHighPrecision { get; set; } = true;

/// <summary>
/// The Snowflake setting to only have a max timestamp precision of microseconds
Contributor


Suggested change
/// The Snowflake setting to only have a max timestamp precision of microseconds
/// The Snowflake setting to have a max timestamp precision of only microseconds

@davidhcoe
Contributor Author

@zeroshade or @lidavidm - any other input on this one?

Member

@lidavidm lidavidm left a comment


Is it correct to summarize this issue as "Snowflake returns nanosecond-precision timestamps, but in a representation with a range beyond that of Arrow timestamp[ns]"?

Is there a way we can detect overflow in the nanosecond case? Silent data corruption isn't great, it would be nice if we could return StatusInvalidData and instruct the user to change the option

@davidhcoe
Contributor Author

Is it correct to summarize this issue as "Snowflake returns nanosecond-precision timestamps, but in a representation with a range beyond that of Arrow timestamp[ns]"?

Is there a way we can detect overflow in the nanosecond case? Silent data corruption isn't great, it would be nice if we could return StatusInvalidData and instruct the user to change the option

I started down this path. The challenge is that the schema details are specified before you know the values that would cause the overflow, and I wasn't finding a good way to go back and change the schema details.

@lidavidm
Member

lidavidm commented Jun 5, 2025

I started down this path. The challenge is you specify the schema details before you know the values that would cause the overflow, and I wasn’t finding a good way to go back and change the schema details.

I'm not talking about automatically changing the schema, just detecting overflow and erroring

@lidavidm
Member

lidavidm commented Jun 5, 2025

If that's not possible that's not possible - but if we could do that that would be a better experience than silent corruption, IMO. (Or again, this is why I ask if the option should be a boolean or something more general - maybe the user is OK with disabling overflow checks)

@davidhcoe
Contributor Author

If that's not possible that's not possible - but if we could do that that would be a better experience than silent corruption, IMO. (Or again, this is why I ask if the option should be a boolean or something more general - maybe the user is OK with disabling overflow checks)

I added this in the latest push. I decided that default behavior should be to not throw an error, since that's what the driver does today, but to give the option to do the enforcement.

@lidavidm
Member

lidavidm commented Jun 5, 2025

So again, this is why I keep asking if the option should really be a boolean :)

Since the new option is only effective when we are asking for nanoseconds, it's effectively 3 possible states modeled with 4 possible configurations. Either we should always be checking for overflow regardless of the type or we should have a single option for nanoseconds, nanoseconds (but error on overflow), or microseconds.

@davidhcoe
Contributor Author

So again, this is why I keep asking if the option should really be a boolean :)

Since the new option is only effective when we are asking for nanoseconds, it's effectively 3 possible states modeled with 4 possible configurations. Either we should always be checking for overflow regardless of the type or we should have a single option for nanoseconds, nanoseconds (but error on overflow), or microseconds.

Overflow only applies to large date ranges (before year 1677 or after 2262) that use nanoseconds. The native Go behavior is to just overflow to the wrong value (which is weird, tbh) so the only options are:

1. Leave things alone
2. Strictly enforce the integrity of the data and throw an error when the overflow will happen
3. Use microseconds to avoid 1 and 2

@lidavidm
Member

lidavidm commented Jun 5, 2025

So I think we agree then: there are only 3 possibilities, but it's being modeled with two booleans (4 possibilities)

@davidhcoe
Contributor Author

So I think we agree then: there are only 3 possibilities, but it's being modeled with two booleans (4 possibilities)

ok. I will rework it to an enum
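The three behaviors map naturally onto a single enum-style option; a sketch with illustrative names (these are not the identifiers the driver actually uses):

```go
package main

import "fmt"

// TimestampPrecisionMode models the three behaviors discussed above as
// one option instead of two booleans, making the fourth (meaningless)
// boolean combination unrepresentable.
type TimestampPrecisionMode int

const (
	// Nanoseconds: current behavior; out-of-range values silently wrap.
	Nanoseconds TimestampPrecisionMode = iota
	// NanosecondsErrorOnOverflow: keep nanoseconds, but fail with an
	// invalid-data error instead of corrupting out-of-range values.
	NanosecondsErrorOnOverflow
	// Microseconds: reduce precision so the full date range fits.
	Microseconds
)

func (m TimestampPrecisionMode) String() string {
	switch m {
	case Nanoseconds:
		return "nanoseconds"
	case NanosecondsErrorOnOverflow:
		return "nanoseconds (error on overflow)"
	case Microseconds:
		return "microseconds"
	}
	return "unknown"
}

func main() {
	fmt.Println(Microseconds) // prints "microseconds"
}
```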

@davidhcoe
Contributor Author

anything else needed here @lidavidm ?

@lidavidm
Member

lidavidm commented Jun 6, 2025

@zeroshade any final comments?

@davidhcoe
Contributor Author

can we merge this?

Member

@zeroshade zeroshade left a comment


Sorry for the lack of response here. This looks good, @lidavidm already covered everything I was thinking of.

@zeroshade zeroshade merged commit be059d6 into apache:main Jun 7, 2025
45 checks passed
@davidhcoe davidhcoe deleted the dev/snowflake-large-dates branch October 24, 2025 10:15