Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Update Feast object metadata in the registry #4257

Merged

Conversation

msistla96
Copy link
Contributor

@msistla96 msistla96 commented Jun 5, 2024

What this PR does / why we need it:

This PR is to fix an issue observed when updating a feature view: Thecreated_timestamp and materialization_intervals values already present in the SQL registry for the feature view are overwritten by the respective values created when feature view is updated, which is None. So when one queries the feature view from the registry, it return None for both these fields.

The fix requires updating the apply_object function to have the previously stored values of the metadata fields like created_timestamp etc assigned to the feast objects respective fields, before it gets updated by the SQL registry.
Hence this change has been made for the relevant Feast objects like Entity, SavedDataset, FeatureService, FeatureView,OnDemandFeatureView,StreamFeatureView.
This change has also been applied for all other registries other than SQL.

Which issue(s) this PR fixes:

Fixes

@tokoko
Copy link
Collaborator

tokoko commented Jun 6, 2024

Hey, thanks for raising this. Just curious, maybe you have taken a look, is this the case only for sql registry? why isn't the same happening for file-based registries?

@msistla96
Copy link
Contributor Author

@tokoko I noticed this for the SQL registry as our team uses it, but I'm not sure if this is happening for other registry types as well. If other registries do require this change, it would be separate changes to each of their apply methods from what I've seen so far.

assert len(feature_services) == 1
assert feature_services[0] == fs

test_registry.delete_feature_service("my_feature_service_1", project)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you be doing this? If you delete and apply again, of course created_timestamp will be regenerated. Isn't the whole point to assert that it's unchanged even if new object is applied when during apply an object already exists?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, this was supposed to be present for test_apply_feature_service_success only. I'll remove this.

assert feature_services[0] == fs
# The created_timestamp for the feature service should be set to the created_timestamp value stored from the
# previous apply
assert feature_services[0].created_timestamp is not None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you drop delete above, then I think this should be an equality check to assert that it's not changed. is not None doesn't seem to be checking much, unless I'm missing something

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I initially wrote the code with the mindset that it shouldn't return a None when you do an apply, but it would be better to check this against the previous feature view's created timestamp as well.

@tokoko
Copy link
Collaborator

tokoko commented Jun 6, 2024

@msistla96 sure, I quickly took a look and this doesn't seem to be handled anywhere. That's why I find it strange that the newly added tests in test_universal_registry aren't failing for file-based registries.

Signed-off-by: msistla96 <msistla@expediagroup.com>
@msistla96 msistla96 force-pushed the update-feast-object-metadata branch from ad0ed8f to f8d3c85 Compare June 9, 2024 03:41
@msistla96
Copy link
Contributor Author

msistla96 commented Jun 9, 2024

@msistla96 sure, I quickly took a look and this doesn't seem to be handled anywhere. That's why I find it strange that the newly added tests in test_universal_registry aren't failing for file-based registries.

So the tests worked because for the other registries, the created_timestamp value does get set technically but it shouldn't for each time a Feast object gets updated. I've updated the code for all the other registries failing the test and also the tests as well.
I've also had to include an additional check to see if materialization_intervals should be updated or not, based on whether the feature view was materialized before or not.

Signed-off-by: msistla96 <msistla@expediagroup.com>
Signed-off-by: msistla96 <msistla@expediagroup.com>
@msistla96 msistla96 force-pushed the update-feast-object-metadata branch from af1c94d to ca88553 Compare June 9, 2024 03:55
@tokoko
Copy link
Collaborator

tokoko commented Jun 11, 2024

@msistla96 thanks. this is looking good. Can you add materialization_intervals test in test_universal_registry as well? You don't have to actually run anything, just call apply_materialization on test_registry.
@jeremyary Can you take a look why tests aren't running?

@msistla96 msistla96 changed the title fix: Update Feast object metadata in the SQL registry fix: Update Feast object metadata in the registry Jun 11, 2024
Signed-off-by: msistla96 <msistla@expediagroup.com>
@msistla96
Copy link
Contributor Author

msistla96 commented Jun 12, 2024

@tokoko, while adding the tests you mentioned, I noticed two bugs with Remote Registry:

  1. During apply_materialization, when Remote Registry converts the start and end dates to Timestamp before making the request to Registry Server here, Timestamp doesn't preserve the timezone information. So when RegistryServer converts them back to dates before calling the proxy apply_materialization here, the resultant date is timezone naive, i.e: it assumes the date relative to the local timezone rather than UTC timezone because of the mentioned reason. I made a change to RegistryServer in this PR to fix this as the integration tests were failing for Remote Registry.
    (FYI, this issue wrt Timestamp was raised long ago but Timestamp is not meant to store timezone info in the first place from what I understood)

To reproduce Timestamp's behavior:

from google.protobuf.timestamp_pb2 import Timestamp
from datetime import datetime
from pytz import utc

current_date = datetime.utcnow().replace(tzinfo=utc)
print("Current date: ",current_date, ", timezone: ",current_date.tzinfo)
sample_timestamp = Timestamp()
sample_timestamp.FromDatetime(current_date)

converted_date = datetime.fromtimestamp(
    sample_timestamp.seconds + sample_timestamp.nanos / 1e9)

print("Converted date: ", converted_date, ", timezone: ",converted_date.tzinfo)
  1. apply_materialization fails for Stream Feature Views. ApplyMaterializationRequest for RegistryServer doesn't have Stream Feature View as a property even though ApplyFeatureViewRequest does. Fixing this is out of scope for this PR, so for now I've omitted test cases for materializing stream feature views to prevent the integration tests from failing. I added a TODO in the file, but let me know if you want me to raise this as a bug as well.

and len(updated_feature_view.materialization_intervals) == 0
)

# Simulate materialization
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should put this before test_registry.apply_feature_view(updated_fv1, project) to make sure that an apply operation doesn't mess with materialization intervals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Signed-off-by: msistla96 <msistla@expediagroup.com>
@tokoko
Copy link
Collaborator

tokoko commented Jun 12, 2024

2. apply_materialization fails for Stream Feature Views. ApplyMaterializationRequest for RegistryServer doesn't have Stream Feature View as a property even though ApplyFeatureViewRequest does. Fixing this is out of scope for this PR, so for now I've omitted test cases for materializing stream feature views to prevent the integration tests from failing. I added a TODO in the file, but let me know if you want me to raise this as a bug as well.

I had the same bug in ApplyFeatureViewRequest and fixed it later on, missed ApplyMaterializationRequest apparently. Yup, a separate bug ticket for this would be good. thanks for catching this.

…mpty only

Signed-off-by: msistla96 <msistla@expediagroup.com>
@msistla96
Copy link
Contributor Author

msistla96 commented Jun 12, 2024

@tokoko Could I have another review please? I realised that I missed a test case when apply_materialization adds the materialization_intervals which shouldn't call update_materialization_intervals, which I added in the last commit.

@tokoko
Copy link
Collaborator

tokoko commented Jun 12, 2024

@msistla96 that's an interesting corner case. Why do you think that should be the desired behavior? That actually would sort of address an issue raised here #4222, but I'm still unsure. We would effectively be allowing materialization intervals to be updates in two ways (apply_materialization and apply_feature_view).

@msistla96
Copy link
Contributor Author

msistla96 commented Jun 12, 2024

@msistla96 that's an interesting corner case. Why do you think that should be the desired behavior? That actually would sort of address an issue raised here #4222, but I'm still unsure. We would effectively be allowing materialization intervals to be updates in two ways (apply_materialization and apply_feature_view).

apply_materialization takes care of updating materialization_intervals, but apply_feature_view will ensure the materialization_interval stored doesn't get erased( the only time we would erase it is if the feature view was created for the first time and that becomes a separate feature view anyway). I think renaming update_materialization_intervals to copy_materialization_intervals would avoid this confusion. Does that make sense?

@EXPEbdodla
Copy link
Contributor

  1. apply_materialization fails for Stream Feature Views. ApplyMaterializationRequest for RegistryServer doesn't have Stream Feature View as a property even though ApplyFeatureViewRequest does. Fixing this is out of scope for this PR, so for now I've omitted test cases for materializing stream feature views to prevent the integration tests from failing. I added a TODO in the file, but let me know if you want me to raise this as a bug as well.

I had the same bug in ApplyFeatureViewRequest and fixed it later on, missed ApplyMaterializationRequest apparently. Yup, a separate bug ticket for this would be good. thanks for catching this.

@tokoko What is the purpose/ use case of calling apply_materialization for StreamFeatureViews? I'm assuming it mostly make sense for Feature Views only.

@tokoko
Copy link
Collaborator

tokoko commented Jun 13, 2024

@EXPEbdodla yeah, you're right, my bad. I've had a similar use (not with feast) case when I needed to update online store both directly and from offline with different datasets, but until we can practically support something like that, there's no point.

@msistla96
Copy link
Contributor Author

@tokoko I see Registry support Stream Feature Views for apply_materialization currently, so I'm a bit confused about your comment on not having it. Are there any changes required for this PR?

@tokoko
Copy link
Collaborator

tokoko commented Jun 13, 2024

@msistla96 Technically it is, but as feast workflow for stream feature views doesn't really contain materialization from offline to online, it's never used. It would be impossible to restrict it in signature with types only as StreamFeatureView extends FeatureView instead of BaseFeatureView (a historical quirk that probably needs to be revisited at some point).

I was still unsure about the update we discussed above, but you're probably right, overwriting with new materialization intervals if provided seems as good as any other option.

Copy link
Collaborator

@tokoko tokoko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the fix

@msistla96
Copy link
Contributor Author

@tokoko Let me know when this would be merged.

Copy link
Collaborator

@HaoXuAI HaoXuAI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@franciscojavierarceo franciscojavierarceo merged commit 8028ae0 into feast-dev:master Jun 19, 2024
17 checks passed
nick-amaya-sp pushed a commit to sailpoint/feast that referenced this pull request Jul 23, 2024
franciscojavierarceo pushed a commit that referenced this pull request Jul 31, 2024
# [0.40.0](v0.39.0...v0.40.0) (2024-07-31)

### Bug Fixes

* Added missing type ([#4315](#4315)) ([86af60a](86af60a))
* Avoid XSS attack from Jinjin2's Environment(). ([#4355](#4355)) ([40270e7](40270e7))
* CGO Memory leak issue in GO Feature server ([#4291](#4291)) ([43e198f](43e198f))
* Deprecated the datetime.utcfromtimestamp(). ([#4306](#4306)) ([21deec8](21deec8))
* Fix SQLite import issue ([#4294](#4294)) ([398ea3b](398ea3b))
* Increment operator to v0.39.0 ([#4368](#4368)) ([3ddb4fb](3ddb4fb))
* Minor typo in the unit test. ([#4296](#4296)) ([6c75e84](6c75e84))
* OnDemandFeatureView type inference for array types ([#4310](#4310)) ([c45ff72](c45ff72))
* Remove redundant batching in PostgreSQLOnlineStore.online_write_batch and fix progress bar ([#4331](#4331)) ([0d89d15](0d89d15))
* Remove typo. ([#4351](#4351)) ([92d17de](92d17de))
* Retire the datetime.utcnow(). ([#4352](#4352)) ([a8bc696](a8bc696))
* Update dask version to support pandas 1.x ([#4326](#4326)) ([a639d61](a639d61))
* Update Feast object metadata in the registry ([#4257](#4257)) ([8028ae0](8028ae0))
* Using one single function call for utcnow(). ([#4307](#4307)) ([98ff63c](98ff63c))

### Features

* Add async feature retrieval for Postgres Online Store ([#4327](#4327)) ([cea52e9](cea52e9))
* Add Async refresh to Sql Registry ([#4251](#4251)) ([f569786](f569786))
* Add SingleStore as an OnlineStore ([#4285](#4285)) ([2c38946](2c38946))
* Add Tornike to maintainers.md ([#4339](#4339)) ([8e8c1f2](8e8c1f2))
* Bump psycopg2 to psycopg3 for all Postgres components ([#4303](#4303)) ([9451d9c](9451d9c))
* Entity key deserialization ([#4284](#4284)) ([83fad15](83fad15))
* Ignore paths feast apply ([#4276](#4276)) ([b4d54af](b4d54af))
* Move get_online_features to OnlineStore interface ([#4319](#4319)) ([7072fd0](7072fd0))
* Port mssql contrib offline store to ibis ([#4360](#4360)) ([7914cbd](7914cbd))

### Reverts

* Revert "fix: Avoid XSS attack from Jinjin2's Environment()." ([#4357](#4357)) ([cdeab48](cdeab48)), closes [#4355](#4355)
redhatHameed pushed a commit to RHEcosystemAppEng/feast that referenced this pull request Aug 5, 2024
# [0.40.0](feast-dev/feast@v0.39.0...v0.40.0) (2024-07-31)

### Bug Fixes

* Added missing type ([feast-dev#4315](feast-dev#4315)) ([86af60a](feast-dev@86af60a))
* Avoid XSS attack from Jinjin2's Environment(). ([feast-dev#4355](feast-dev#4355)) ([40270e7](feast-dev@40270e7))
* CGO Memory leak issue in GO Feature server ([feast-dev#4291](feast-dev#4291)) ([43e198f](feast-dev@43e198f))
* Deprecated the datetime.utcfromtimestamp(). ([feast-dev#4306](feast-dev#4306)) ([21deec8](feast-dev@21deec8))
* Fix SQLite import issue ([feast-dev#4294](feast-dev#4294)) ([398ea3b](feast-dev@398ea3b))
* Increment operator to v0.39.0 ([feast-dev#4368](feast-dev#4368)) ([3ddb4fb](feast-dev@3ddb4fb))
* Minor typo in the unit test. ([feast-dev#4296](feast-dev#4296)) ([6c75e84](feast-dev@6c75e84))
* OnDemandFeatureView type inference for array types ([feast-dev#4310](feast-dev#4310)) ([c45ff72](feast-dev@c45ff72))
* Remove redundant batching in PostgreSQLOnlineStore.online_write_batch and fix progress bar ([feast-dev#4331](feast-dev#4331)) ([0d89d15](feast-dev@0d89d15))
* Remove typo. ([feast-dev#4351](feast-dev#4351)) ([92d17de](feast-dev@92d17de))
* Retire the datetime.utcnow(). ([feast-dev#4352](feast-dev#4352)) ([a8bc696](feast-dev@a8bc696))
* Update dask version to support pandas 1.x ([feast-dev#4326](feast-dev#4326)) ([a639d61](feast-dev@a639d61))
* Update Feast object metadata in the registry ([feast-dev#4257](feast-dev#4257)) ([8028ae0](feast-dev@8028ae0))
* Using one single function call for utcnow(). ([feast-dev#4307](feast-dev#4307)) ([98ff63c](feast-dev@98ff63c))

### Features

* Add async feature retrieval for Postgres Online Store ([feast-dev#4327](feast-dev#4327)) ([cea52e9](feast-dev@cea52e9))
* Add Async refresh to Sql Registry ([feast-dev#4251](feast-dev#4251)) ([f569786](feast-dev@f569786))
* Add SingleStore as an OnlineStore ([feast-dev#4285](feast-dev#4285)) ([2c38946](feast-dev@2c38946))
* Add Tornike to maintainers.md ([feast-dev#4339](feast-dev#4339)) ([8e8c1f2](feast-dev@8e8c1f2))
* Bump psycopg2 to psycopg3 for all Postgres components ([feast-dev#4303](feast-dev#4303)) ([9451d9c](feast-dev@9451d9c))
* Entity key deserialization ([feast-dev#4284](feast-dev#4284)) ([83fad15](feast-dev@83fad15))
* Ignore paths feast apply ([feast-dev#4276](feast-dev#4276)) ([b4d54af](feast-dev@b4d54af))
* Move get_online_features to OnlineStore interface ([feast-dev#4319](feast-dev#4319)) ([7072fd0](feast-dev@7072fd0))
* Port mssql contrib offline store to ibis ([feast-dev#4360](feast-dev#4360)) ([7914cbd](feast-dev@7914cbd))

### Reverts

* Revert "fix: Avoid XSS attack from Jinjin2's Environment()." ([feast-dev#4357](feast-dev#4357)) ([cdeab48](feast-dev@cdeab48)), closes [feast-dev#4355](feast-dev#4355)
shuchu pushed a commit to shuchu/feast that referenced this pull request Aug 14, 2024
# [0.40.0](feast-dev/feast@v0.39.0...v0.40.0) (2024-07-31)

### Bug Fixes

* Added missing type ([feast-dev#4315](feast-dev#4315)) ([86af60a](feast-dev@86af60a))
* Avoid XSS attack from Jinjin2's Environment(). ([feast-dev#4355](feast-dev#4355)) ([40270e7](feast-dev@40270e7))
* CGO Memory leak issue in GO Feature server ([feast-dev#4291](feast-dev#4291)) ([43e198f](feast-dev@43e198f))
* Deprecated the datetime.utcfromtimestamp(). ([feast-dev#4306](feast-dev#4306)) ([21deec8](feast-dev@21deec8))
* Fix SQLite import issue ([feast-dev#4294](feast-dev#4294)) ([398ea3b](feast-dev@398ea3b))
* Increment operator to v0.39.0 ([feast-dev#4368](feast-dev#4368)) ([3ddb4fb](feast-dev@3ddb4fb))
* Minor typo in the unit test. ([feast-dev#4296](feast-dev#4296)) ([6c75e84](feast-dev@6c75e84))
* OnDemandFeatureView type inference for array types ([feast-dev#4310](feast-dev#4310)) ([c45ff72](feast-dev@c45ff72))
* Remove redundant batching in PostgreSQLOnlineStore.online_write_batch and fix progress bar ([feast-dev#4331](feast-dev#4331)) ([0d89d15](feast-dev@0d89d15))
* Remove typo. ([feast-dev#4351](feast-dev#4351)) ([92d17de](feast-dev@92d17de))
* Retire the datetime.utcnow(). ([feast-dev#4352](feast-dev#4352)) ([a8bc696](feast-dev@a8bc696))
* Update dask version to support pandas 1.x ([feast-dev#4326](feast-dev#4326)) ([a639d61](feast-dev@a639d61))
* Update Feast object metadata in the registry ([feast-dev#4257](feast-dev#4257)) ([8028ae0](feast-dev@8028ae0))
* Using one single function call for utcnow(). ([feast-dev#4307](feast-dev#4307)) ([98ff63c](feast-dev@98ff63c))

### Features

* Add async feature retrieval for Postgres Online Store ([feast-dev#4327](feast-dev#4327)) ([cea52e9](feast-dev@cea52e9))
* Add Async refresh to Sql Registry ([feast-dev#4251](feast-dev#4251)) ([f569786](feast-dev@f569786))
* Add SingleStore as an OnlineStore ([feast-dev#4285](feast-dev#4285)) ([2c38946](feast-dev@2c38946))
* Add Tornike to maintainers.md ([feast-dev#4339](feast-dev#4339)) ([8e8c1f2](feast-dev@8e8c1f2))
* Bump psycopg2 to psycopg3 for all Postgres components ([feast-dev#4303](feast-dev#4303)) ([9451d9c](feast-dev@9451d9c))
* Entity key deserialization ([feast-dev#4284](feast-dev#4284)) ([83fad15](feast-dev@83fad15))
* Ignore paths feast apply ([feast-dev#4276](feast-dev#4276)) ([b4d54af](feast-dev@b4d54af))
* Move get_online_features to OnlineStore interface ([feast-dev#4319](feast-dev#4319)) ([7072fd0](feast-dev@7072fd0))
* Port mssql contrib offline store to ibis ([feast-dev#4360](feast-dev#4360)) ([7914cbd](feast-dev@7914cbd))

### Reverts

* Revert "fix: Avoid XSS attack from Jinjin2's Environment()." ([feast-dev#4357](feast-dev#4357)) ([cdeab48](feast-dev@cdeab48)), closes [feast-dev#4355](feast-dev#4355)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants