Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Materialization Errors for Float and Double Lists #1640

Closed
Agent007 opened this issue Jun 11, 2021 · 9 comments
Closed

Materialization Errors for Float and Double Lists #1640

Agent007 opened this issue Jun 11, 2021 · 9 comments

Comments

@Agent007
Copy link
Contributor

Expected Behavior

1-hot encoded features should be materializeable to the online store without error.

Current Behavior

Version 0.10 apparently introduced a regression where 1-hot encoded feature materialization throws errors.

Steps to reproduce

from datetime import datetime, timedelta
from feast import Entity, Feature, FeatureStore, FeatureView, FileSource, ValueType
import numpy as np
import pandas as pd
from pathlib import Path
df = pd.DataFrame(columns=['device_type', 'encoding', 'datetime'])
df['device_type'] = ['A', 'B']
df['encoding'] = [
    np.array([0.0, 1.0], dtype=np.float32),
    np.array([1.0, 0.0], dtype=np.float32)
]
df['datetime'] = pd.to_datetime([datetime.now(), datetime.now()])
device_encodings_path_str = 'data/device_encodings.parquet'
device_encodings_path = Path(device_encodings_path_str)
device_encodings_abs_path_str = str(device_encodings_path.absolute())
df.to_parquet(device_encodings_abs_path_str)
device_encodings = FileSource(path=device_encodings_abs_path_str,
                              event_timestamp_column='datetime')
device_type = Entity('device_type',
                     value_type=ValueType.STRING,
                     description='')
device_encoding_view = FeatureView(
    name="device_encoding",
    entities=["device_type"],
    features=[
        Feature(name="encoding", dtype=ValueType.FLOAT_LIST),
    ],
    ttl=timedelta(minutes=60),
    input=device_encodings,
    online=True,
)
fm = FeatureStore(repo_path=".")
fm.apply([device_type, device_encoding_view])
fm.materialize(start_date=datetime(2021, 1, 1), end_date=datetime.now())

The above code yields the below error output:

Materializing 2 feature views from 2021-01-01 00:00:00-08:00 to 2021-06-10 15:24:51-07:00 into the sqlite online store.
driver_hourly_stats:
100%|███████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 2920.42it/s]
device_encoding:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/anaconda3/envs/feast/lib/python3.8/site-packages/feast/telemetry.py", line 155, in exception_logging_wrapper
    result = func(*args, **kwargs)
  File "/usr/local/anaconda3/envs/feast/lib/python3.8/site-packages/feast/feature_store.py", line 445, in materialize
    provider.materialize_single_feature_view(
  File "/usr/local/anaconda3/envs/feast/lib/python3.8/site-packages/feast/infra/local.py", line 200, in materialize_single_feature_view
    rows_to_write = _convert_arrow_to_proto(table, feature_view, join_keys)
  File "/usr/local/anaconda3/envs/feast/lib/python3.8/site-packages/feast/infra/provider.py", line 301, in _convert_arrow_to_proto
    value = python_value_to_proto_value(row[idx], feature.dtype)
  File "/usr/local/anaconda3/envs/feast/lib/python3.8/site-packages/feast/type_map.py", line 339, in python_value_to_proto_value
    return _python_value_to_proto_value(value_type, value)
  File "/usr/local/anaconda3/envs/feast/lib/python3.8/site-packages/feast/type_map.py", line 225, in _python_value_to_proto_value
    val=[
  File "/usr/local/anaconda3/envs/feast/lib/python3.8/site-packages/feast/type_map.py", line 228, in <listcomp>
    else _type_err(item, np.float64)
  File "/usr/local/anaconda3/envs/feast/lib/python3.8/site-packages/feast/type_map.py", line 191, in _type_err
    raise ValueError(f'Value "{item}" is of type {type(item)} not of type {dtype}')
ValueError: Value "0.0" is of type <class 'float'> not of type <class 'numpy.float64'>

Specifications

  • Version: 0.10.7
  • Platform: local mode
  • Subsystem: Python SDK
@achals achals added kind/bug good first issue Good for newcomers labels Jun 11, 2021
@karlhigley
Copy link

This specific issue was resolved in #1628, but also exists for integer list features.

@woop
Copy link
Member

woop commented Jul 5, 2021

Thanks for raising this folks. Will schedule for development.

@karlhigley
Copy link

@woop 👍🏻

It's worth mentioning that empty lists also seem to be problematic with the type map.

@woop
Copy link
Member

woop commented Jul 6, 2021

Thanks @karlhigley. We'll need to add a more comprehensive test to get a better idea of our type coverage.

@judahrand
Copy link
Member

@woop 👍🏻

It's worth mentioning that empty lists also seem to be problematic with the type map.

This seems like a much less obvious/trivial fix. How do you infer a type when you lack any information? I'd be be more than happy to take a look at it if someone has ideas or a suggestion.

@karlhigley
Copy link

I have fixes for the issues I've encountered, but haven't yet lined up the CLA through my company in order to submit a PR. The gist is that when you convert from Python values to proto values, you can default to the stated feature type if you can't infer a type. That default behavior is currently only applied if the feature value is None, but should probably also apply to empty lists.

Making that change also implies that the conversion from Feast value types back to Python types has to change a little bit too, because then sometimes the output of MessageToDict is an empty dictionary. For list fields, it's sufficient to default the value associated with the "val" key to an empty list. (No change required for scalar fields.)

@judahrand
Copy link
Member

judahrand commented Sep 18, 2021

@karlhigley Can you provide a test case in which the conversion from Feast back to Python fails? You can look at my PR to find an example of a test for this issue that I added.

@karlhigley
Copy link

@judahrand The test I'd try would be to round-trip an empty list through the type map for int and float list fields.

@judahrand
Copy link
Member

This seems fixed now - at least with the test case provided. Can we close? @woop

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants