Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store Timezone as Arc<str> #3976

Merged
merged 5 commits into from
Mar 30, 2023
Merged

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Mar 29, 2023

Which issue does this PR close?

Relates to #3955

Rationale for this change

Timezones are very often constant over a large number of types, e.g. if part of a session global default timezone. Maintaining separate allocations for every timestamp is therefore incredibly wasteful.

I originally wanted to propose storing Tz but this led to two major issues:

  • DataType: Ord and it isn't clear what this would mean
  • It would prevent roundtripping unrecognised timezones

Switching to Arc<str> allows reusing the same allocation whilst also reducing the size of the variant

What changes are included in this PR?

Are there any user-facing changes?

@tustvold tustvold added the api-change Changes to the arrow API label Mar 29, 2023
@github-actions github-actions bot added arrow Changes to the arrow crate parquet Changes to the parquet crate labels Mar 29, 2023
@tustvold
Copy link
Contributor Author

This, combined with #3965 and corresponding changes to the Union DataType we can get DataType down from 56 bytes to 24

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good idea to me.

To ease the API transition for people can we please add some doc strings / examples for how to construct such a variant? If you are into Arc<str> it seems obvious but I remember it was somewhat confusing at first for a Rust newcomer

Perhaps we could add convenience methods to construct these variants like

impl DataType {
  fn new_timestamp(timeunit: .., timezone: impl Into<Arc<str>>) -> Self {
  }

Or something

cc @maxburke and @jhorstmann and @viirya and @waitingkuo given this is a non trivial API change

@tustvold
Copy link
Contributor Author

Perhaps we could add convenience methods to construct these variants like

I'd rather avoid these as it results in a slightly confused API surface, but I will add some documentation to the variant itself

@tustvold tustvold merged commit 9a4374f into apache:master Mar 30, 2023
@tustvold tustvold mentioned this pull request Mar 30, 2023
@alamb
Copy link
Contributor

alamb commented Mar 30, 2023

I'd rather avoid these as it results in a slightly confused API surface, but I will add some documentation to the variant itself

Adding docs is a good start -- and we can always add a new method in the future if that would help

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants