Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove feature specs being able to declare their serving or warehouse stores #159

Merged
merged 17 commits into from
May 9, 2019

Conversation

tims
Copy link
Contributor

@tims tims commented Mar 21, 2019

Currently a Feature can declare it's own data stores in the feature spec, which must have been registered with Core before hand.. This adds a lot of complexity to declaring features and is very error prone.

Instead we should have the data stores dictated by Core, and feature specs should know nothing about them.

This means that a Feast deployment will now only be able to have 1 serving store and 1 warehouse store at a time.

Some things to note:

This changes the way features can configure some settings. For example redis expiry must now be set in FeatureSpec.options rather than FeautureSpec.datastores.serving.options

So the option key has changed from "expiry" to "redis.expiry". It is still called "expiry" when overriding the a default it in the StorageSpec.options however.
We need to find a better way to document this.

I think I like the idea of only have one place to set options in a FeatureSpec. But if it applies to the actual underlying storage or not depends on if that storage is actually being used. So it's not clear that these should be feature options at all.

@tims
Copy link
Contributor Author

tims commented Mar 21, 2019

/hold

in progress

@tims
Copy link
Contributor Author

tims commented Mar 21, 2019

ok... I think this covers everything... it really depends how much we trust our tests. Need to try some integration level testing and check things like the CLI and SDK still make sense.

@tims
Copy link
Contributor Author

tims commented Apr 1, 2019

I've rebased this against master, @zhilingc @pradithya please take a look

@tims
Copy link
Contributor Author

tims commented Apr 4, 2019

@woop Can you chime in on this based on our discussion this morning, we talked about whether having just a single serving store is the right strategy. And I think we agreed that it is in the short term.

Then we talked about how in the future we might make it possible to have multiple serving stores again; but they would be configured with their features or models, rather than the features being configured with stores.

So the strategy is:
• short term: remove multi serving store deployments via this pull request, and people will deploy multiple feasts if needed.
• long term: support having multiple serving stores again, but configure the stores not the features. Very much up for discussion on how this would work.

The short term strategy is to prioritise simpler deployment, configuration and usability.

Are we all aligned on this?

@woop
Copy link
Member

woop commented Apr 4, 2019

@woop Can you chime in on this based on our discussion this morning, we talked about whether having just a single serving store is the right strategy. And I think we agreed that it is in the short term.

Then we talked about how in the future we might make it possible to have multiple serving stores again; but they would be configured with their features or models, rather than the features being configured with stores.

So the strategy is:
• short term: remove multi serving store deployments via this pull request, and people will deploy multiple feasts if needed.
• long term: support having multiple serving stores again, but configure the stores not the features. Very much up for discussion on how this would work.

The short term strategy is to prioritise simpler deployment, configuration and usability.

Are we all aligned on this?

Just to be clear: We are proposing having only a single serving backend per deployment. We would still support clusters. We would still support different store types, just not within a single deployment of Feast. That may not be clear from the above phrasing. Anyway, I think it's the right move going forward. It would fix some of the complexity and provide abstraction.

Part of the problem was that the feature specs have a reference to the store. This seems unnatural. Another unrelated issue is that in many cases you want to have centralized feature management, but decentralized deployments in a production environment. This could lead to issues if only a single serving store is supported, since clients might want to have their own customized setups.

One proposal to alleviate this would be to have many serving deployments with their own store/backend, but with each deployment having only a single serving store and store type active at one time. Then we could have feature set specifications registered for each of these serving deployments. That could happen in a centralized core or even remotely at the serving deployment. Ultimately the serving layer would only pull in the feature data it needs for its clients, and nothing more, since the feature set specifications are driven by a specific serving need. How the ingestion stream knows which serving store to write to is also up for discussion. It could come from the fact that the serving deployment only has one store, or it could be captured in the feature set itself.

+--------+      +--------------------------+
|        |      |                          |
| Stream +----->+         Feast            |
|        |      |                          |
+--------+      +--+-----------------+-----+
                   |                 |
                   |                 |
                   |                 |
                   |                 |
                   |                 |
                   |                 |
         +---------v-+         +-----v-----+
         |           |         |           |
         | Serving 1 |         | Serving 2 |
         |  Redis    |         |  Bigtable |
         |           |         |           |
         +-----------+         +-----------+

Many different ways to go here, but I like the idea of the core/warehouse maintaining authority on features, but the serving layer being free to subscribe to only relevant features and using the appropriate stores.

@tims
Copy link
Contributor Author

tims commented Apr 4, 2019

100% agreed. This pull request is in alignment.

The only quirk I see is that it still allows features to provide options for stores, regardless of what the store actually is, like redis expiry. It just moves these out of the featureSpec.datastore.serving.options map into the featureSpec.options map, and adds a prefix eg: "redis.expiry". I would be okay with dropping them completely to be honest, just didn't want to do too much at once.

@zhilingc
Copy link
Collaborator

/test integration-batch

@zhilingc
Copy link
Collaborator

/retest

@thirteen37
Copy link
Collaborator

Is this somehow related to #38?

@tims
Copy link
Contributor Author

tims commented Apr 12, 2019

Is this somehow related to #38?

yes, in that it will make #38 redundant, it would not make sense to have options to disable serving or warehouse in a feature spec, if we want to remove features knowing about serving and warehouse stores themselves.

@zhilingc
Copy link
Collaborator

Looks fine to me, I made a PR against yours updating the CLI to be consistent with your API changes.

tims and others added 2 commits April 15, 2019 14:01
@tims
Copy link
Contributor Author

tims commented Apr 15, 2019

/cancel hold

@tims
Copy link
Contributor Author

tims commented Apr 15, 2019

/hold cancel

@tims
Copy link
Contributor Author

tims commented Apr 25, 2019

/retest

@tims
Copy link
Contributor Author

tims commented Apr 25, 2019

/test integration-batch

@tims
Copy link
Contributor Author

tims commented Apr 25, 2019

@pradithya @zhilingc All tests, including integration tests pass, can you lgtm+approve? :)

@davidheryanto
Copy link
Collaborator

davidheryanto commented Apr 25, 2019

@tims I see that we remove this command?

feast list storage
feast get storage STORAGE_ID

Actually I find being able to see the actual storage location is quite useful, especially if we're still dependent on BigQuery UI for people to quickly discover and access Warehouse data.

Since I want to know the Project ID and BigQuery dataset my Feast is storing the warehouse data to.

What do you think?

Ideally I believe Feast users should not need to know the storage details, but as of know, just using Feast dashboard, they cannot easily discover and play with Warehouse data, I think the Warehouse location may still be relevant?

And I think using Feast Python SDK to do Warehouse data exploration is not as effortless as using BigQuery UI?

@tims
Copy link
Contributor Author

tims commented Apr 26, 2019

I didn't remove that Zhiling removed it, @zhilingc what do you think?

@zhilingc
Copy link
Collaborator

I think it makes sense to remove the specific api (list_storage) since we only support default stores now, but I agree that maybe we do need some way for users to get information about the feast setup.

Although maybe it might be better to add the method back in (in this PR), mark it as deprecated, and implement that alternative in another PR. I'll pick this up if you guys agree @tims @davidheryanto

@tims
Copy link
Contributor Author

tims commented Apr 26, 2019

Yup I agree, that would be the easier way.

@davidheryanto
Copy link
Collaborator

Sounds good :)

@tims
Copy link
Contributor Author

tims commented Apr 29, 2019

@zhilingc @davidheryanto you need to give it lgtm or approve or it will just sit here for ever :)

@tims
Copy link
Contributor Author

tims commented May 8, 2019

/retest

@tims tims requested a review from zhilingc May 8, 2019 03:36
Add list storage functions back in, mark as deprecated
@zhilingc
Copy link
Collaborator

zhilingc commented May 8, 2019

/approve

@davidheryanto
Copy link
Collaborator

davidheryanto commented May 9, 2019

/approve

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidheryanto, zhilingc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [davidheryanto,zhilingc]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@davidheryanto
Copy link
Collaborator

/lgtm

@feast-ci-bot feast-ci-bot merged commit a401ae3 into feast-dev:master May 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants