Add Labels to Source Object #835
Comments
Unfortunately I think the labels need to be within the |
Looking at this I believe the Unless you feel these labels should only apply to |
Sources has been generalized in #685 to contain To get this to work with the least amount of engineering is to embed the labels in the |
This problem probably indicates that we have chosen a poor abstraction with |
@davidheryanto Under the data model each source is 1:1 with a Feature Set. This means each source is intrinsically tied to its Feature Set rather than being an independent source object (i.e. it might be more fitting to look at each source as an embedded field in a Feature Set). Furthermore, the current data model allows sources to be duplicated (i.e. same config and type), which makes categorizing and attributing them difficult even if this is implemented. I think a viable solution to this problem is to label the Feature Sets. Feature Sets are fundamentally a data ingestion concept, defining the data source and data schema by which data is ingested into Feast. As an added plus, Feature Sets already support labeling, and we recently added an API to filter Feature Sets by labels. As for labeling Ingestion Jobs, we could add automatic propagation of the labels from the Feature Sets to the Ingestion Jobs. |
I actually quite like this idea. |
This is not directly related to the issue topic but regarding the propagation of labels from FeatureSet objects to IngestionJob:
Since I believe an IngestionJob can currently ingest data for multiple FeatureSets, what should our merge approach be for the labels in the IngestionJob when multiple FeatureSets contain different labels? For example, an ingestion job that involves these 3 FeatureSets:
What should be the final labels in the IngestionJob?
|
I am in favor of appending the values from the individual Feature Sets when the label names conflict. This does not mean that multiple values are concatenated together to form a single value, but rather that a single label on an Ingestion Job can be associated with a set of unique values. In the case you described, the labels exposed by the Ingestion Job would look something like this:
A less abstract walkthrough of how this might be used for categorization
The corresponding Ingestion Job would expose the following labels:
The listing in the Ingestion Job API added in #548 can be extended to take labels into account and allow matching on any of a label's associated values so |
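To make the merge proposal above concrete, here is a minimal sketch in Python. It is not Feast code: `merge_labels`, `matches`, and the example label values are all hypothetical, and it assumes each Feature Set carries a flat string-to-string label map.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set


def merge_labels(feature_set_labels: Iterable[Mapping[str, str]]) -> Dict[str, Set[str]]:
    """Merge per-Feature-Set labels into one multi-valued label map (hypothetical helper)."""
    merged: Dict[str, Set[str]] = defaultdict(set)
    for labels in feature_set_labels:
        for key, value in labels.items():
            # Conflicting values are collected side by side, not concatenated.
            merged[key].add(value)
    return dict(merged)


def matches(job_labels: Mapping[str, Set[str]], selector: Mapping[str, str]) -> bool:
    """Return True if every selector key matches any of the job's values for that key."""
    return all(value in job_labels.get(key, set()) for key, value in selector.items())


# Hypothetical labels for three Feature Sets ingested by a single Ingestion Job.
job_labels = merge_labels([
    {"team": "payments", "country": "id"},
    {"team": "payments", "country": "sg"},
    {"team": "fraud"},
])

# The job is found when filtering on either country value.
print(matches(job_labels, {"country": "sg"}))
print(matches(job_labels, {"country": "id"}))
```

One consequence of this design is that a selector with several keys can match a job even when no single Feature Set carries that exact combination, because values are tracked per key rather than per Feature Set.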
Since FeatureSet specs are now provided (dynamically) AFTER ingestion job creation rather than BEFORE it, as was previously the case, I think it is trickier to propagate labels from FeatureSets to ingestion jobs. This is because (at least for Dataflow jobs) labels can only be provided at job creation time. With our current approach, this means that at Dataflow job creation time we have no FeatureSet labels to propagate to the Dataflow job. I will close this issue if we have no further alternatives for now. |
Is your feature request related to a problem? Please describe.
Currently, source in Feast is defined as the following in the FeatureSet spec:
From a management perspective, when we list all the sources we use in Feast, we sometimes want to be able to filter or group these sources according to the organisational structure, for instance by the team managing the source or the country where the source data comes from.
The current Source proto, however, does not allow us to provide such additional information.
Describe the solution you'd like
Add a labels field to the Source proto. This is similar to the labels field we currently have in FeatureSet: https://github.com/feast-dev/feast/blob/master/protos/feast/core/FeatureSet.proto#L63
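As a rough illustration of the use case, the sketch below groups sources by a label once such a field exists. The `Source` dataclass here is a hypothetical stand-in rather than the actual Source proto or any Feast SDK class, and its field names, the `group_by_label` helper, and the example values are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Source:
    """Hypothetical stand-in for a Source carrying the proposed labels field."""
    type: str
    config: str
    labels: Dict[str, str] = field(default_factory=dict)


def group_by_label(sources: List[Source], key: str) -> Dict[str, List[Source]]:
    """Group sources by the value of one label; sources without that label are skipped."""
    grouped: Dict[str, List[Source]] = defaultdict(list)
    for source in sources:
        if key in source.labels:
            grouped[source.labels[key]].append(source)
    return dict(grouped)


# Hypothetical sources, labeled by managing team and country of origin.
sources = [
    Source("KAFKA", "broker-a:9092/topic-1", {"team": "payments", "country": "id"}),
    Source("KAFKA", "broker-b:9092/topic-2", {"team": "fraud", "country": "id"}),
    Source("KAFKA", "broker-c:9092/topic-3", {"team": "payments", "country": "sg"}),
    Source("KAFKA", "broker-d:9092/topic-4"),
]

by_team = group_by_label(sources, "team")
for team, members in sorted(by_team.items()):
    print(team, [s.config for s in members])
```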
Describe alternatives you've considered
N/A
Additional context
The label information in the Source object could also be used to label the ingestion jobs that use the respective sources, which would help organize ingestion job objects.