[API Design] How to manage classification labels? #141
Replies: 8 comments 10 replies
-
With @andreybavt, we have decided that for the short term we will revert to the former vision of the API, in order to unblock short-term user issues. We will keep the discussion open to revamp the classification label mechanisms in the future, after collecting more user information.
-
There are two things we're trying to solve with
To me, these points should be treated separately. First, let's take the second point to understand why we need it. Translating
-
Hello @andreybavt,
Thanks for that detailed analysis. I agree that there are two different issues to solve. To me, the number one priority is to "map actual value in the Inspector UI". Without a user-friendly display name for labels, it is impossible to use AI Inspect. Since AI Inspect is currently the most used module, I think it takes priority and needs a short-term solution. The topic "Translating prediction_function results into single value model results" is only related to AI Test, correct?
Could you propose an "API design blueprint" for solving the two issues separately? Perhaps there is even a way to design a data structure that addresses both issues in one parameter.
Thanks, Alex
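One way to picture a single parameter covering both needs is an ordered mapping from stored target values to display names. The sketch below is purely illustrative: `LABELS`, `display_name`, and `predicted_value` are hypothetical names, not part of the actual API, and it assumes the prediction function returns one probability per class in the same order as the mapping.

```python
# Hypothetical ordered mapping: stored target value -> display name.
# Assumption: key order matches the order of the probabilities
# returned by the prediction function.
LABELS = {0: "Not default", 1: "Default"}


def display_name(actual_value):
    """Return the user-friendly name for a stored target value (Inspect use case)."""
    return LABELS[actual_value]


def predicted_value(probabilities):
    """Turn one row of class probabilities into a single stored value (Test use case)."""
    if len(probabilities) != len(LABELS):
        raise ValueError("expected one probability per label")
    # Index of the highest probability, then the stored value at that position.
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return list(LABELS)[best_index]
```

With this shape, the keys are what gets compared against the target column, and the values are what a UI could display.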
-
Here's the impact of
Inspect
Dataset's target column value and prediction value are both displayed on screen. To obtain them, here's what we do:
Test
For performance tests we need to compare prediction and actual values. That's why they need to come from the same set. Providing
if len(classification_labels) == 2:
    assert len(set(classification_labels).union(df[target].unique())) == 2, "classification_labels doesn't match stored labels"
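As a rough illustration of why both sets of values must come from the same set, here is a variant of that check that works for any number of classes. It is a sketch under assumptions, not the project's implementation: `check_labels_match_target` is a hypothetical helper, it takes a plain iterable of target values instead of a DataFrame column, and it tests that every stored value appears in `classification_labels` rather than checking the size of the union.

```python
def check_labels_match_target(classification_labels, target_values):
    """Raise if the target column holds a value absent from classification_labels."""
    unknown = set(target_values) - set(classification_labels)
    if unknown:
        raise ValueError(
            "classification_labels doesn't match stored labels: "
            f"{sorted(unknown, key=str)}"
        )


# Labels and stored values come from the same set: no error.
check_labels_match_target(["no", "yes"], ["yes", "no", "yes"])
```

Calling it with string labels against an integer target column, for example `check_labels_match_target(["no", "yes"], [0, 1, 1])`, raises `ValueError`, which is the kind of mismatch the comparison in performance tests cannot tolerate.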
-
From a user perspective, there are two possibilities for the target column in classification:
A. Target column is an integer
B. Target column is a string
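The two cases can be illustrated with a small sketch. It assumes, which the thread does not confirm, that `classification_labels` is a list ordered like the model's probability outputs and that in case A the integer stored in the target column is an index into that list. `actual_label` is a hypothetical helper, not part of the API.

```python
def actual_label(target_value, classification_labels):
    """Map a stored target value to a label comparable with a prediction."""
    if isinstance(target_value, int):
        # Case A: the target column stores integers, read here as list indices.
        return classification_labels[target_value]
    # Case B: the target column already stores the label string.
    if target_value not in classification_labels:
        raise ValueError(f"unknown label: {target_value!r}")
    return target_value
```

For example, with `classification_labels = ["cat", "dog"]`, a stored `1` (case A) and a stored `"dog"` (case B) both resolve to the label `"dog"`.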
-
The two "business cases" are clear; it's important to start from a real user context. Do we know which scenario is most common for users? Scikit-learn models always output integers, correct? Is it the same for TensorFlow, CatBoost and PyTorch? How does Hugging Face store labels in its datasets library?
-
Summary of the chosen solution
After discussion, we decided that
We chose this solution according to the following usage scenarios:
To meet scenarios 3 and 4, we decided that
If the target values are stored as integers in the dataset, the user can change
The consequences of the chosen solution are the following:
-
Final solution:
You will find the final solution in this doc:
This doc contains:
-
We are considering two different visions for the classification_labels parameter:
In this thread, let's discuss the real user implications of each option and evaluate pros/cons.