Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace string references used in non user facing components in Feast #674

Closed
mrzzy opened this issue May 3, 2020 · 4 comments
Closed
Labels

Comments

@mrzzy
Copy link
Collaborator

mrzzy commented May 3, 2020

Background

Currently Feast uses string references in 3 areas:

  • Feature Set References
  • Feature References
  • Subscriptions

As a result many implementations of string reference parsing and formatting has cropped up all over Feast's code:

Problem

Changes to the feature/feature set references/subscriptions are difficult as we have to track down all the different string parsing/formatting implementations. Changes to these areas should not require a comb over the entire codebase to rectify all these parsing and formatting implementations.

Examples of currently WIP and potential changes that is currently affected by this and why the problem is relevant:

Scope

String references that should still be kept around:

  • those exposed by the user facing API (ie the SDKs, config files) as they need to be humanly readable.
  • those used by interface with components outside of feast (ie PostgreSQL, BigQuery, etc.), although non string references should be preferred if possible.

Possible Solution

Replace string references in non user facing components with their Protobuf equivalents.
Examples of this include:

  • Using message types directly in protobuf definitions:
message FeatureRow {
    repeated Field fields = 2;
    google.protobuf.Timestamp event_timestamp = 3;
    FeatureSetReference feature_set_ref = 8;
    string ingestion_id = 7;
}
  • Using references as keys in maps directly.
HashMap<FeatureSetReference, FeatureSetSpec> featureSetMap = new HashMap<>();

⚠️ There might be possible caveats to this

  • Serialize references without having to extract its internals:
String refStr = Base64.getEncoder().encodeToString(featureSetReference.toByteArray());
@mrzzy mrzzy changed the title Remove string references used in non user facing components in Feast Replace string references used in non user facing components in Feast May 3, 2020
@woop
Copy link
Member

woop commented May 3, 2020

Some comments:

I agree on the Proto message type, but in the code itself FeatureReferences/FeatureSetReferences should be a non-Proto model class ideally, with toProto() and fromProto() methods. This would allow us to also have a single toStringRef() or fromStringRef().

FeatureSetRef and FeatureRef seem very similar. I think we could possibly have the following after version removal

// Defines a composite key that refers to a unique FeatureSet
message FeatureSetReference {
    // Name of the project
    string project = 1;
    // Name of the FeatureSet
    string name = 2;
}
message FeatureReference {
    // Name of the feature set
    FeatureSetReference feature_set_ref = 1;

    // Name of the feature
    string name = 2;
}

Although this would increase nesting, which is also frowned upon by some.

@ches
Copy link
Member

ches commented May 4, 2020

I mostly agree with the problem, but I'm not convinced about the solution. Swapping strings with encoded protobuf is merely trading one compatibility issue for another (less readable) one, in my mind. I think we're missing clear stated goals to answer whether the solution will meet them.

Definitely agree with @woop in general on avoiding the generated classes in domain logic if this is part of a solution—we've been moving away from that and although there is overhead (CPU + allocation/GC, and code maintenance), I believe it is a net win. The noted hash key behavior is one of many example pitfalls.

There are perhaps two related problems conflated, it could help to distinguish goals for solving them:

  1. Code maintenance effort and fragility. This is primarily what the issue description elaborates so far.
  2. Representations in persistence/serialization create challenges for evolving the data model.

For the first, I believe some goals for improving could be met sheerly through code factoring, encapsulation of the domain concepts of references as first-class types instead of Strings pervading.

The second one is harder, I'm not sure there is an infinitely flexible solution with reasonable complexity (though I'd be glad to hear suggestions), rather some design decisions need to be reached on #479 that we're ready to live with for awhile and/or have a plan for evolving. It's a conceptual problem before it's a technical one. If we're ready to go with #479 (comment) as the first steps, I think that should be a clear part of specs in this issue or an explicit non-goal, since #631 seems meant to cover that at least in part.

With this perspective, the first problem could be appreciably addressed without being blocked by exhaustive answers for #479, except that it might overlap a lot with work for #631. Once goals of this issue are clearly established, it might just become a subtask of #631.

Definitively solving the second is blocked by #479 in such ways that it could put this issue at risk of not being closeable in the near term.

@ches
Copy link
Member

ches commented May 4, 2020

For the first, I believe some goals for improving could be met sheerly through code factoring, encapsulation of the domain concepts of references as first-class types instead of Strings pervading.

Putting this another way, I'd like to explore how this might look with only code changes that are not proto schema changes, then evaluate if and when proto changes should be made. If there can be some win from the first step, it seems more in line with @woop's inclination in #479 (comment) to be conservative about the conceptual + data model direction, to defer until we learn more.

@stale
Copy link

stale bot commented Jul 15, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Jul 15, 2020
@stale stale bot closed this as completed Jul 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants