[API Design] How to manage classification labels? #141

alexcombessie · 2022-07-06T21:09:49Z

alexcombessie
Jul 6, 2022
Maintainer

We are considering 2 different visions for the classification_labels parameter:

1. We let users choose whatever classification labels they want (this was the former vision of the API). The classification labels are then just a way to name the values in the target column.
2. We do not let users choose the classification_labels freely. We can show them a warning message that pushes them to provide classification_labels that correspond to the values of the target column.

In this thread, let's discuss the real user implications of each option and evaluate pros/cons.

alexcombessie · 2022-07-07T19:05:32Z

alexcombessie
Jul 7, 2022
Maintainer Author

With @andreybavt we have decided that for the short term, we would revert to the former vision of the API, in order to unblock short-term user issues.

We will keep the discussion open to revamp the classification label mechanisms in the future, after collecting more user information.

0 replies

andreybavt · 2022-07-08T10:07:56Z

andreybavt
Jul 8, 2022

There are two things we're trying to solve with classification_labels at the same time:

1. trying to map the actual value in the Inspector UI
2. translating prediction_function results into single value model results

To me, these points should be treated separately. First, let's take the second point to understand why we need it.

Translating `prediction_function` results into single value model results

By design, for classification, the prediction_function that the user uploads is supposed to return a matrix of floats where columns represent categories, rows represent data points, and cells hold the probability of a given label for a given data point.

When a prediction is made, one of the results is what we call raw_prediction: for each data row it's a single number (chosen either by argmax or by classification_threshold) that holds the index of the predicted classification label, so it's always an integer in 0..N-1 for N labels.
But this result is useless by itself since we need to convert it to an actual classification label; that's why we need the user to provide a classification_labels list. We simply take the element at index = raw_prediction to get the final model prediction.
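The two steps described above can be sketched in plain Python. This is only an illustration, not Giskard's code: the function names are hypothetical, and applying classification_threshold to the second probability column in the binary case is an assumption.

```python
def raw_predictions(probabilities, classification_threshold=None):
    """Return one label index per row of a probability matrix."""
    indices = []
    for row in probabilities:
        if classification_threshold is not None and len(row) == 2:
            # Assumption: in the binary case the threshold applies to column 1.
            indices.append(1 if row[1] >= classification_threshold else 0)
        else:
            # General case: argmax over the row.
            indices.append(max(range(len(row)), key=row.__getitem__))
    return indices


def to_labels(indices, classification_labels):
    """Map each raw prediction (an index) to its classification label."""
    return [classification_labels[i] for i in indices]


# Rows are data points, columns follow the order of classification_labels.
probabilities = [[0.9, 0.1], [0.35, 0.65], [0.6, 0.4]]
classification_labels = ["alive", "dead"]

raw = raw_predictions(probabilities)
predictions = to_labels(raw, classification_labels)
```

The mapping is only meaningful if classification_labels is given in the same order as the columns returned by prediction_function.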

Even when the target column contains 0 and 1, we still need to provide classification_labels, because when the model is trained it's not guaranteed that it will internally store label "0" at position 0 of the resulting probability matrix.

For example when sklearn's DecisionTreeClassifier is trained on titanic dataset (that contains 0s and 1s in Survived target column), then model.classes_ is equal to [1,0]. It means that the mapping between raw prediction and prediction is:

0->1
1->0

Now, regarding the original question about what to provide in classification_labels list and why we should stick to solution №2:

When calculating performance metrics we need to take the raw target column values of the dataset and compare them to the predicted results. In this case it's important to have classification_labels that correspond to the values in the target column otherwise we won't be able to tell how model results are related to the ground truth.

Trying to map actual value in the Inspector UI

Currently, the second mission of classification_labels is to map the dataset's raw target column values to something that we present in the UI.
To me, this is a completely different use case that has nothing to do with classification_labels. It has never worked properly: if you take an example similar to the Titanic one above but replace the Survived column values with 'dead' and 'alive', model.classes_ will return ['alive', 'dead'], so the mapping is

0->alive
1->dead

which is correct for the model, but if you try to use this model to inspect a titanic dataset where Survived is stored as 0 and 1 you'll get a wrong result because in that dataset 0 should mean dead.

Conclusion

So my point is that:

classification_labels should be used to map the 0..N-1 raw prediction to the target column values stored in the dataset that was used for model training. If the target in the dataset contains dead/alive, classification_labels needs to be ['alive', 'dead'] (the order depends on the model); if the target column contains 0s and 1s, classification_labels should be [1, 0].
If we want to make the target values more understandable in the Inspector, there might be an extra step to define another mapping (on dataset level or on inspection level). This mapping will be used twice: to map the actual and the predicted value.
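A minimal sketch of this separation, assuming a hypothetical display_names mapping that is not part of the current API: classification_labels stays aligned with the stored target values, and the display mapping is applied to both the actual and the predicted value.

```python
# Order of the model's probability columns, expressed in raw target values.
classification_labels = [1, 0]

# Hypothetical extra mapping, defined on dataset level or on inspection level.
display_names = {0: "dead", 1: "alive"}


def present(value, display_names):
    """Return the display name of a raw value, or the raw value if none is defined."""
    return display_names.get(value, value)


raw_prediction = 0                                  # index returned by the model
predicted = classification_labels[raw_prediction]   # raw target value: 1
actual = 0                                          # value read from the target column

# The comparison uses raw values; only the presentation uses display names.
is_correct = predicted == actual
shown_prediction = present(predicted, display_names)
shown_actual = present(actual, display_names)
```

Because the same mapping is used twice, the actual and predicted values shown in the Inspector cannot drift apart.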

0 replies

alexcombessie · 2022-07-09T14:52:13Z

alexcombessie
Jul 9, 2022
Maintainer Author

Hello @andreybavt,

Thanks for that detailed analysis. I agree that there are 2 different issues to solve.

To me, the number 1 priority is to "map actual value in the Inspector UI". Without a user-friendly display name for labels, it is impossible to use AI Inspect. Since AI Inspect is currently the most used module, I think it takes priority and needs a short-term solution.
Just to be 100% sure, does the latest update of the giskard client solve this?

The topic "Translating prediction_function results into single value model results" is only related to AI Test, correct?

Could you propose an "API design blueprint" of how to solve the two issues separately?

Perhaps there is even a way to design a data structure that answers the two issues in one parameter.

Thanks,

Alex

0 replies

andreybavt · 2022-07-11T10:35:08Z

andreybavt
Jul 11, 2022

Here's the impact of classification_labels on Inspect and Test modules

Inspect

The dataset's target column value and the prediction value are both displayed on screen. Here's how we obtain them:

For prediction: we use classification_labels to map the predicted label index to the predicted label value, which is presented to the user.
For target: there are 2 options. If the target is an integer, we use it to select a label by position in the classification_labels list. If it's not a number, we simply use the raw column value as-is. To give an example where it won't work, try training a classifier on a target column containing "-1": it'll break the UI.
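The target rule above can be mimicked in a few lines of Python to see why a "-1" target is a problem. The function name displayed_target is hypothetical and this is not the actual front-end code; in this sketch the negative index wraps around silently, whereas another implementation may fail outright.

```python
def displayed_target(raw_target, classification_labels):
    """Mimic the display rule: integers are positions, anything else is shown as-is."""
    if isinstance(raw_target, int) and not isinstance(raw_target, bool):
        return classification_labels[raw_target]
    return raw_target


# Works when integer targets happen to be valid positions.
labels = ["dead", "alive"]
shown_int = displayed_target(1, labels)
shown_str = displayed_target("alive", labels)

# A classifier trained on a -1 / 1 target: the value -1 is read as "last position".
signed_labels = [-1, 1]
shown_negative = displayed_target(-1, signed_labels)
shown_positive = displayed_target(1, signed_labels)
```

Here both rows are displayed as 1, so the actual value -1 is never shown correctly.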

Test

For performance tests we need to compare prediction and actual values. That's why they need to be coming from the same set. Providing classification_labels as ['dead', 'alive'] for a dataset that contains [0, 1] in a target column won't work and will result in an error. To improve it, we can add an extra validation for binary classification like:

if len(classification_labels) == 2:
    assert len(set(classification_labels).union(df[target].unique())) == 2, "classification_labels doesn't match stored labels"

0 replies

jmsquare · 2022-07-11T13:51:16Z

jmsquare
Jul 11, 2022
Maintainer

From a user perspective, there are 2 different possibilities in terms of the target column in classification:

A. Target column is an integer
This happens a lot for binary classification, where the target column has 1 and 0 as values. It can also happen when the target column contains indexes of a mapping.
Two cases are possible:

1. classification_label does not contain the values of the target column
This is useful for the user to give an explicit description of the target values in AI Inspect.
Consequences: Business users will have an explicit description in AI Inspect. Performance tests are difficult to use. A workaround to compute performance tests is to treat the target column values as indexes, so predicted_lbl_idx = predicted_labels in run_predict. This can work, but if the target column contains a negative value, it will create an IndexError.
2. classification_label contains the values of the target column
This can be forced if we provide an explicit error message at upload time.
Consequences: having integers as target values in AI Inspect is not very explicit. Business users might be lost.

B. Target column is a string
Two cases are possible:

1. classification_label does not contain the values of the target column
This is possible when users make a typo in the way they write classification_label ("default" instead of "Default").
Consequences: Performance tests are not possible anymore. It's not possible to filter wrong examples in AI Inspect.
2. classification_label contains the values of the target column
This can be forced if we provide an explicit error message at upload time.
Consequences: perfect for the business user, AI Inspect and AI Test.

2 replies

alexcombessie Jul 11, 2022
Maintainer Author

To reduce the number of possibilities, two questions:

Do you agree that scenario B1 should always return an error?
Is it realistic to consider the use of classification_label in scenario A? It seems to me that both consequences of scenario A are negative for the user.

WDYT @jmsquare @andreybavt ?

andreybavt Jul 11, 2022

Yes, I do agree that if we allow B1 it's going to be a mess. it's fair to throw an error when we have enough info to validate it (both model and dataset). Besides, it's not how common frameworks work. If you train for example sklearn or catboost models on datasets with target dead/alive and then call predict they will return dead/alive and their model.classes_ will be ['dead', 'alive'] which is a simple and robust solution.

A2 is a good plan. If outputting 0s and 1s isn't understandable enough there needs to be another mapping, but it needs to be applied to both target and prediction. It's basically incorrect to mix classification_labels with raw target values.

alexcombessie · 2022-07-11T13:59:48Z

alexcombessie
Jul 11, 2022
Maintainer Author

Clear on the two "business cases", it's important to start from a real user context.
EITHER users store their classification labels as integers - scenario A
OR users store their classification labels as strings - scenario B

Do we know which scenario is most common for users?

Scikit-learn models always output integers, correct? Is it the same for tensorflow, catboost and pytorch?

How does huggingface store labels in their datasets library?
Same question for popular benchmark datasets: do they store labels as integers or strings?

6 replies

alexcombessie Jul 12, 2022
Maintainer Author

Indeed, the most critical question is how different types of ML models store labels/class names.
The question of how datasets store them is still interesting, as it gives us more context on how data scientists work.

I think it's important to decouple:

How people do things without Giskard, and how they expect things to work
What solution we propose

alexcombessie Jul 12, 2022
Maintainer Author

I would also add that we need to understand how other libraries solve this.

andreybavt Jul 12, 2022

sklearn, catboost, xgboost all have the same API. Their predict method outputs exactly what was stored in the target column at training time.

alexcombessie Jul 14, 2022
Maintainer Author

@jmsquare @andreybavt do you have an idea about my remaining questions?

Do we know which scenario is most common for users?

How does huggingface store labels in their datasets library?

Same question for popular benchmark datasets: do they store labels as integers or strings?

jmsquare Jul 14, 2022
Maintainer

Huggingface : integer
https://huggingface.co/docs/datasets/about_dataset_features

Tensorflow Keras: integer or string (Numpy array) https://www.tensorflow.org/api_docs/python/tf/keras/Model

PyTorch: custom, depending on the way you define your loss function. Usually an int

Sklearn : string or int. For example, the Logistic regression: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.fit

Catboost : string or int : https://catboost.ai/en/docs/concepts/python-reference_catboostclassifier_fit#y

Xgboost : string or int : https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.XGBRegressor.fit

jmsquare · 2022-07-15T16:31:21Z

jmsquare
Jul 15, 2022
Maintainer

Summary of the chosen solution

After discussion, we decided that Classification_label is the list containing the distinct values of the target variable of the uploaded dataset.

These labels should be given in the same order as the output of prediction_function. This has to be specified in the documentation
Giskard will throw an error if len(Classification_label) is different from the length of the output array returned by prediction_function. Here is the ValueError: "Prediction output label shape and classification_labels shape do not match". This is already implemented in the current version.
Classification_label must contain all the distinct values of the target column. Otherwise, here is the ValueError message: "Target column value {invalid_target_values} not declared in classification_labels parameter: {classification_labels}. Please modify classification_labels or rename the values of the target column in the dataset". This is a warning in the current version; it has to be changed.
We need to change the code of the front-end displaying the actual/predicted value in the case of integer target values. There is a little “magic” in the UI code that is buggy right now
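The two checks listed above could look roughly like the sketch below. The function name and its arguments are hypothetical; only the two error messages are taken from this summary, and the real implementation in the client may differ.

```python
def validate_classification_labels(classification_labels, prediction_row, target_values):
    """Raise ValueError when the labels do not fit the model output or the target column."""
    # Check 1: one label per column of the prediction_function output.
    if len(classification_labels) != len(prediction_row):
        raise ValueError(
            "Prediction output label shape and classification_labels shape do not match"
        )

    # Check 2: every distinct target value must be declared as a label.
    invalid_target_values = sorted(
        set(target_values) - set(classification_labels), key=str
    )
    if invalid_target_values:
        raise ValueError(
            f"Target column value {invalid_target_values} not declared in "
            f"classification_labels parameter: {classification_labels}. "
            "Please modify classification_labels or rename the values of "
            "the target column in the dataset"
        )


# Passes silently: two labels, two probability columns, all target values declared.
validate_classification_labels(["alive", "dead"], [0.2, 0.8], ["dead", "alive", "dead"])
```

Calling it with classification_labels=["dead", "alive"] on a target column that holds 0 and 1 raises the second error.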

We chose this solution according to the following usage scenarios:

1. Display of the predicted value in AI Inspect
2. Display of the actual value in AI Inspect
3. Filtering of the correct/wrong examples in AI Inspect + custom
4. Performance tests in AI Test

To meet scenarios 3 and 4, we decided that classification_label should contain the raw target values of the train set. It therefore cannot be only a human-readable description of the target labels, since we would not be able to assert equality with the values in the target variable in the backend.

If the target values are stored as integers in the dataset, the user can change classification_labels or rename the values of the target column before uploading the dataset in Giskard.
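As a rough illustration of that renaming step, using plain Python lists instead of a real dataset: the names mapping and the model_classes order are made up for the example.

```python
# Integer target values as stored in the original dataset.
target = [0, 1, 1, 0]

# Hypothetical human-readable names chosen by the user.
names = {0: "dead", 1: "alive"}

# Rename the target column before uploading the dataset.
renamed_target = [names[value] for value in target]

# Assumed order of the prediction_function output columns, in raw values.
model_classes = [0, 1]

# classification_labels keeps that column order, expressed in the renamed values.
classification_labels = [names[value] for value in model_classes]
```

After the renaming, the target column and classification_labels draw from the same set of values, so equality checks and the display both work on the readable names.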

The consequences of the chosen solution are the following:

It meets the 4 usage scenarios
The implementation is simpler, because there are fewer changes compared to the current version and fewer additional functionalities
It is easy to debug for the user since it's more WYSIWYG (we do not have to maintain both frontend labels and backend labels)
It needs a small change to the dataset to be able to display understandable labels in AI Inspect. This is necessary in the case of multi-label classification with integer labels. It can be avoided for binary classification.

2 replies

andreybavt Jul 15, 2022

Good summary

I have a last doubt about

Classification_label contains all the distinct values of the target column. Otherwise, here is the ValueError message: "Target column value {invalid_target_values} not declared in classification_labels parameter: {classification_labels}. Please modify classification_labels or rename the values of the target column in the dataset". This is a warning in the current version, it has to be changed.

I don't know if it's that big of a problem for a user if the new target column contains new values compared to what the model was trained on. It would just mean that the model will never predict certain values, which will degrade the performance. Also, among the four places that can be impacted:

1. Display of the predicted value in AI Inspect
2. Display of the actual value in AI Inspect
3. Filtering of the correct/wrong examples in AI Inspect + custom
4. Performance tests in AI Test

1,2,3 should work out of the box and 4 is to be checked and may require some modifications, but in general, I don't see a reason why it wouldn't work.

So do we actually want to be extra strict?
@jmsquare @alexcombessie

alexcombessie Jul 17, 2022
Maintainer Author

To me, we should allow the case where some (but fewer than 100%) of the values in the target column are not in the model's classification_labels. But we should throw an error/warning if 100% of the values are missing from the classification labels, as it may indicate that the dataset is wrong. For instance, if the model's classification_labels is default / alive but the dataset has 0 and 1.
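A sketch of this more lenient rule, under stated assumptions: the function name is hypothetical, raising on a 100% mismatch and only warning on a partial one is one possible reading of "error/warning", and the messages are invented for the example.

```python
import warnings


def check_target_against_labels(target_values, classification_labels):
    """Return the share of target values that are not declared as labels."""
    if not target_values:
        return 0.0

    known = set(classification_labels)
    unknown = [value for value in target_values if value not in known]
    share = len(unknown) / len(target_values)

    if share == 1.0:
        # Nothing matches: the dataset probably does not belong to this model.
        raise ValueError(
            f"No target value is declared in classification_labels: {classification_labels}"
        )
    if unknown:
        # Partial mismatch is allowed; the model will simply never predict these values.
        warnings.warn(
            f"Target values {sorted(set(unknown), key=str)} are not in classification_labels"
        )
    return share


# One value out of four is unknown to the model: allowed, with a warning.
share = check_target_against_labels(
    ["dead", "alive", "missing", "dead"], ["dead", "alive"]
)
```

With classification_labels ["default", "alive"] and a target column of 0 and 1, the same call raises instead of warning.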

jmsquare · 2022-08-05T10:41:28Z

jmsquare
Aug 5, 2022
Maintainer

Final solution:

You will find the final solution in this doc :
https://docs.google.com/spreadsheets/d/1VPwGNmuU4i-rhV6xZAfh2thkjUZFwVj3cKIY3yY0ego/edit#gid=541944328

This doc contains:

12 usage scenarios
The current solution and the final solution

0 replies
