Fix categorical column after sequence_index column issue #357

Merged (10 commits) on Dec 16, 2021

fealho (Member) commented on Mar 23, 2021:

Resolve #314.

fealho requested review from csala and pvk-developer on March 23, 2021 02:17
tests/integration/timeseries/test_par.py (outdated, resolved)
    @@ -67,7 +67,8 @@ def _fit(self, timeseries_data):

             data_types = list()
             context_types = list()
    -        for field, meta in self._metadata.get_fields().items():
    +        for field in self._entity_columns + self._data_columns:
    +            meta = self._metadata.get_fields()[field]

Contributor:

I would capture the fields_metadata in a variable before the loop to avoid having to call self._metadata.get_fields() at each iteration.
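A minimal sketch of that suggestion, assuming a hypothetical `Metadata` stand-in whose `get_fields()` returns a dict mapping each field name to its metadata (the class and the helper `iter_field_metadata` are illustrative, not SDV's own code). The dict is fetched once, before the loop, and only indexed inside it:

```python
class Metadata:
    """Hypothetical stand-in for the table metadata; counts get_fields() calls."""

    def __init__(self, fields):
        self._fields = fields
        self.get_fields_calls = 0

    def get_fields(self):
        self.get_fields_calls += 1
        return dict(self._fields)


def iter_field_metadata(metadata, columns):
    """Yield (field, meta) pairs in `columns` order, calling get_fields() once."""
    fields_metadata = metadata.get_fields()  # captured before the loop
    for field in columns:
        yield field, fields_metadata[field]


metadata = Metadata({
    'col': {'type': 'id'},
    'date': {'type': 'datetime'},
    'category': {'type': 'categorical'},
})
pairs = list(iter_field_metadata(metadata, ['col', 'date', 'category']))
print(metadata.get_fields_calls)  # 1
```

The loop body stays the same; only the lookup moves out of it.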

    @@ -67,7 +67,8 @@ def _fit(self, timeseries_data):

             data_types = list()
             context_types = list()
    -        for field, meta in self._metadata.get_fields().items():
    +        for field in self._entity_columns + self._data_columns:

Contributor:

I think this might not work: the order of the columns would change, and the context_columns would be missing.

If the problem is the order of the key/value pairs returned by self._metadata.get_fields(), one option would be to iterate over self._output_columns instead (the list of columns from the input data).

Then, in order to solve the sequence_index problem, we could change line 74 (from the old code):

                if field == self._sequence_index:
                    data_types.append('continuous')

to

                if field == self._sequence_index:
                    data_types.extend(['continuous', 'continuous'])

And then just remove line 82 (from the old code).
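The proposal can be sketched as a standalone function. Everything here except the loop order and the `extend(['continuous', 'continuous'])` branch is an assumption for illustration: the `DATA_TYPES` mapping from metadata types to DeepEcho data types, the shape of `fields_metadata`, and the hypothetical function name `build_types`. Whether the two continuous entries belong together at that position depends on how the sequence index is transformed, which the thread does not show.

```python
# Hypothetical mapping from metadata field types to DeepEcho data types (assumption).
DATA_TYPES = {
    'numerical': 'continuous',
    'datetime': 'continuous',
    'categorical': 'categorical',
    'boolean': 'categorical',
}


def build_types(output_columns, fields_metadata, data_columns, context_columns,
                sequence_index):
    """Collect data and context types following the input column order."""
    data_types = []
    context_types = []
    for field in output_columns:  # input-data order, not metadata order
        if field == sequence_index:
            # Per the suggestion, the sequence index contributes two continuous entries.
            data_types.extend(['continuous', 'continuous'])
        elif field in data_columns:
            data_types.append(DATA_TYPES[fields_metadata[field]['type']])
        elif field in context_columns:
            context_types.append(DATA_TYPES[fields_metadata[field]['type']])
        # Entity columns fall through and get no type.
    return data_types, context_types


fields_metadata = {
    'col': {'type': 'id'},
    'date': {'type': 'datetime'},
    'category': {'type': 'categorical'},
    'value': {'type': 'numerical'},
}
data_types, context_types = build_types(
    output_columns=['col', 'date', 'category', 'value'],
    fields_metadata=fields_metadata,
    data_columns=['date', 'category', 'value'],
    context_columns=[],
    sequence_index='date',
)
print(data_types)  # ['continuous', 'continuous', 'categorical', 'continuous']
```

Because the loop follows the input columns, a categorical column placed after the sequence index keeps its own type in its own position.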

    def test_column_after_date():
        """Test that adding columns after the `sequence_index` column works."""
        date = datetime.datetime.strptime('2020-01-01', '%Y-%m-%d')
        daily_timeseries = pd.DataFrame({

Contributor:

I think it would be worth making this test slightly more complex, so that it has multiple data types and both entity columns and context columns.
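One way to follow this suggestion is sketched below. A hypothetical helper, `make_richer_timeseries`, builds column data with an entity column, a context column that is constant within each entity, the `date` sequence index, and data columns of several types, with a categorical column placed right after the sequence index. It returns a dict of column lists, the same shape the test passes to `pd.DataFrame`; the column names and values are made up.

```python
import datetime


def make_richer_timeseries(n_entities=2, n_steps=3):
    """Return column lists for a multi-type timeseries (hypothetical test data)."""
    start = datetime.datetime.strptime('2020-01-01', '%Y-%m-%d')
    columns = {
        'entity': [],    # entity column
        'context': [],   # context column, constant within each entity
        'date': [],      # sequence index
        'category': [],  # categorical data column right after the sequence index
        'value': [],     # numerical data column
        'flag': [],      # boolean data column
    }
    for entity in range(n_entities):
        for step in range(n_steps):
            columns['entity'].append(entity)
            columns['context'].append('even' if entity % 2 == 0 else 'odd')
            columns['date'].append(start + datetime.timedelta(days=step))
            columns['category'].append(['a', 'b', 'c'][step % 3])
            columns['value'].append(float(entity * 10 + step))
            columns['flag'].append(step % 2 == 0)
    return columns
```

In the test, this could be wrapped as `pd.DataFrame(make_richer_timeseries())` and fitted with `PAR(entity_columns=['entity'], context_columns=['context'], sequence_index='date', epochs=1)`, assuming the PAR version under test accepts `context_columns`.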

fealho requested a review from a team as a code owner on September 8, 2021 04:02
fealho requested a review from csala on September 8, 2021 14:44
sdv/timeseries/deepecho.py (outdated, resolved)
        })

        model = PAR(entity_columns=['col'], sequence_index='date', epochs=1)
        model.fit(daily_timeseries)

Contributor:

Should we be validating a bit more? For example, validating that the output types are actually correct.
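A hedged sketch of such a check: a hypothetical helper, `check_output_types`, compares column by column the Python types of the values in the real and sampled data, and flags columns in a different order. It works on dicts of column lists (with pandas, these could come from `DataFrame.to_dict('list')`, though the exact value types depend on how they are extracted); comparing the sampled frame's `dtypes` with the input frame's `dtypes` is a simpler alternative. The helper names and the subset rule are assumptions.

```python
def column_types(columns):
    """Map each column name to the set of type names of its non-null values."""
    return {
        name: {type(value).__name__ for value in values if value is not None}
        for name, values in columns.items()
    }


def check_output_types(real, sampled):
    """Return a list of problems; empty if sampled columns match the real types."""
    problems = []
    if list(real) != list(sampled):
        problems.append(f'column order differs: {list(real)} != {list(sampled)}')
    real_types = column_types(real)
    sampled_types = column_types(sampled)
    for name, types in sampled_types.items():
        expected = real_types.get(name, set())
        if not types <= expected:
            problems.append(f'{name}: got {sorted(types)}, expected {sorted(expected)}')
    return problems


real = {'date': [1.0, 2.0], 'category': ['a', 'b']}
good = {'date': [1.5, 2.5], 'category': ['b', 'a']}
bad = {'date': [1.5, 2.5], 'category': [0.3, 0.7]}  # categorical came back as floats
print(check_output_types(real, good))  # []
print(check_output_types(real, bad))   # ["category: got ['float'], expected ['str']"]
```

A categorical column sampled as floats, which is the symptom this PR addresses, shows up as a type mismatch for that column.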

fealho requested review from csala and pvk-developer and removed request for pvk-developer on October 6, 2021 19:55
sdv/timeseries/deepecho.py (outdated, resolved)
codecov-commenter commented:
Codecov Report

Merging #357 (9678f8d) into master (643de0a) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #357   +/-   ##
=======================================
  Coverage   65.01%   65.01%           
=======================================
  Files          34       34           
  Lines        2590     2590           
=======================================
  Hits         1684     1684           
  Misses        906      906           

Last update 643de0a...9678f8d.

pvk-developer (Member) left a comment:

:shipit:

fealho merged commit f87f503 into master on Dec 16, 2021
fealho deleted the par-wrong-type branch on December 16, 2021 18:00
Issues this pull request may close: Categorical column after sequence_index column
4 participants