Make pipelines aware of a timezone configuration #249

roelschr · 2020-09-23T18:16:38Z

Why? 📖

While Spark's TimestampType timezone is controlled by the spark.sql.session.timeZone configuration option, Python's datetime objects without a fixed tz suffix are interpreted in the system's timezone. This means the same transformation can convert timestamps differently when it runs on different systems.

One example of irregular results is when we automatically set the start_date of AggregatedFeatureSets (here). Spark and the system can have different timezones, so a timestamp from the Spark dataframe can change when it is collected into plain Python as a datetime object, producing a start_date different than expected.
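A minimal sketch of the effect described above, using only the standard library (no Spark involved). The helper name `epoch_seconds_under_tz` and the two POSIX-style TZ strings (`UTC0`, and `BRT3` for a fixed UTC-3 offset) are illustrative assumptions, not part of this PR. It relies on `time.tzset()`, so it only applies to Unix-like systems.

```python
import os
import time
from datetime import datetime


def epoch_seconds_under_tz(naive: datetime, tz: str) -> float:
    """Interpret a naive datetime as local time in `tz`; return epoch seconds.

    Hypothetical helper for illustration only. Temporarily overrides the
    process-wide TZ variable and restores it afterwards.
    """
    previous = os.environ.get("TZ")
    os.environ["TZ"] = tz
    time.tzset()  # Unix only: make the C library re-read TZ
    try:
        # For naive datetimes, timestamp() assumes the system's local timezone.
        return naive.timestamp()
    finally:
        if previous is None:
            os.environ.pop("TZ", None)
        else:
            os.environ["TZ"] = previous
        time.tzset()


# The same naive value, e.g. one collected from a Spark dataframe.
naive = datetime(2020, 1, 1, 0, 0, 0)

utc_epoch = epoch_seconds_under_tz(naive, "UTC0")  # system at UTC
brt_epoch = epoch_seconds_under_tz(naive, "BRT3")  # system at a fixed UTC-3

# The two systems disagree on the instant by the offset between them.
print(brt_epoch - utc_epoch)
```

The naive value is identical in both calls; only the system timezone differs, and the resulting instants are three hours apart.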

What? 🔧

This PR proposes a timezone configuration that each pipeline is aware of and that is applied to both Spark and the system, so the two always agree. The timezone is configurable.
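A rough sketch of what applying one timezone to both sides could look like. The function name `apply_timezone` and its `"UTC"` default are assumptions for illustration and may not match the code in this PR; `spark` is expected to be a SparkSession (only `spark.conf.set` is used).

```python
import os
import time


def apply_timezone(spark, timezone: str = "UTC") -> None:
    """Hypothetical helper: align Spark's session timezone with the system's.

    `spark` is assumed to expose `conf.set(key, value)`, as a
    pyspark.sql.SparkSession does.
    """
    # Controls how Spark interprets and renders TimestampType values.
    spark.conf.set("spark.sql.session.timeZone", timezone)

    # Controls how naive Python datetime objects are interpreted.
    os.environ["TZ"] = timezone
    if hasattr(time, "tzset"):  # not available on Windows
        time.tzset()
```

With a real session this would be called once before the pipeline runs, for example `apply_timezone(spark, "UTC")`. Note that changing TZ affects the whole driver process, not only the pipeline.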

Type of change

Please delete options that are not relevant.

New feature (non-breaking change which adds functionality)
This change requires a documentation update

How everything was tested? 📏

TODO.

Checklist

My code follows the style guidelines of this project (docstrings, type hinting and linter compliance);
I have performed a self-review of my own code;
I have made corresponding changes to the documentation;
I have added tests that prove my fix is effective or that my feature works;
New and existing unit tests pass locally with my changes;
Add labels to distinguish the type of pull request. Available labels are bug, enhancement, feature, and review.

Attention Points ⚠️

Replace me for what the reviewer will need to pay attention to in the PR or just to cover any concerns after the merge.

sonarcloud · 2020-09-23T18:27:26Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities (and 0 Security Hotspots to review)
0 Code Smells

No Coverage information
0.0% Duplication

AlvaroMarquesAndrade

LGTM! You could, however, add a test. What do you think?

rafaelleinio · 2020-10-06T17:39:36Z

butterfree/pipelines/feature_set_pipeline.py

+        timezone: timestamp feature transformations will assume this timezone 
+            when they don't have a tz suffix.


I'd just mention here that the Spark conf (spark.sql.session.timeZone) and an env variable (TZ) will be set with this value.

lilbee101920 · 2021-02-26T16:53:15Z

butterfree/pipelines/feature_set_pipeline.py

@@ -1,4 +1,6 @@
 """FeatureSetPipeline entity."""
+import os


roelschr added the enhancement New feature or request label Sep 23, 2020

roelschr requested a review from a team as a code owner September 23, 2020 18:16

roelschr self-assigned this Sep 23, 2020

add timezone property to pipeline

082d134

roelschr force-pushed the roelschr/add-timezone-config branch from e3d4e24 to 082d134 Compare September 23, 2020 18:26

AlvaroMarquesAndrade approved these changes Sep 23, 2020

roelschr added the work in progress Work in progress label Sep 23, 2020

rafaelleinio reviewed Oct 6, 2020

lilbee101920 approved these changes Feb 26, 2021

butterfree/pipelines/feature_set_pipeline.py

@@ -1,4 +1,6 @@

"""FeatureSetPipeline entity."""

import os

lilbee101920 Feb 26, 2021

14.4

lilbee101920 approved these changes Feb 26, 2021

albjoaov closed this Oct 10, 2024
