Cost Sensitive One Against All (csoaa) multi class example
CSOAA stands for "Cost Sensitive One Against All" - a multi-class predictive modeling reduction in VW.
The option --csoaa <K>, where <K> is the number of distinct classes, directs vw to perform cost-sensitive K-way multi-class (as opposed to binary) classification. It extends --oaa <K> to support multiple labels per input example, with a cost associated with each label.
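For a quick flavor of the difference (the details are covered below): a plain --oaa 3 example line carries a single class label, e.g.

2 | a b

while a --csoaa 3 line may carry several labels, each with a cost (lower cost = preferred):

1:2.0 2:1.0 | a b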
- Data-set labels can be 0- or 1-indexed. Use the flag --indexing 0 to specify labels in the range {0 ... <K-1>}, or --indexing 1 to specify labels in the range {1 ... <K>}. Indexing is detected automatically if not specified (see the example command after this list).
- <K> is the maximum label value, and must be passed as an argument to --csoaa.
- The input/training format for --csoaa <K> is different from the traditional VW format:
  - It supports multiple labels on the same line.
  - Each label has a trailing cost.
  - Cost syntax looks just like weight syntax: a colon followed by a floating-point number. For example, 4:3.2 means class-label 4 with a cost of 3.2.
  - It is critical to note that costs are not weights; they work in the opposite direction. A label with a lower cost is preferred over a label with a higher cost on the same line. That's why they are called 'costs'.
  - Another difference from the traditional vw input format is that every line (both in training and testing) must include all the allowed labels at the beginning (before the 1st | char).
- The reduction with --csoaa is to a regression problem (i.e. conditional mean estimation), so forcing the loss function to logistic does not make much sense. Generally, when using multi-class, you should leave --loss_function alone and let the algorithm use its built-in default.
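For example, a minimal invocation sketch making the 1-based indexing of the data-set below explicit (using the --indexing flag mentioned above):

vw --csoaa 3 --indexing 1 -d csoaa.dat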
Assume we have a 3-class classification problem, and we label our 3 classes {1,2,3}. Our data set csoaa.dat is:
1:1.0 a1_expect_1| a
2:1.0 b1_expect_2| b
3:1.0 c1_expect_3| c
1:2.0 2:1.0 ab1_expect_2| a b
2:1.0 3:3.0 bc1_expect_2| b c
1:3.0 3:1.0 ac1_expect_3| a c
2:3.0 d1_expect_2| d
Notes:
- The first 3 examples (lines) have only one label (with a cost) each, and the next 3 examples have multiple labels on the same line. Any number of class-labels in {1 .. <K>} (1..3 in this case) is allowed on each line.
- We assign a lower cost to the label we want to be preferred, e.g. in line 4 (tagged ab1_expect_2) we have a cost of 1.0 for class-label 2, and a higher cost of 2.0 for class-label 1.
- The input feature section following the '|' is the same as in traditional VW: you may have multiple name-spaces, numeric features, and optional weights for features and/or name-spaces. (Note that in this section the weights are weights, not costs, so higher values make a feature count more.) A hypothetical line combining these elements is sketched below.
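For illustration only, here is a hypothetical training line (not part of csoaa.dat; the name-space and feature names are made up) combining two labels with costs, a tag, two name-spaces, and a weighted numeric feature:

1:2.0 2:1.0 mytag|colors red blue:0.5 |shapes circle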
We train:
vw --csoaa 3 -d csoaa.dat -f csoaa.model
This gives us the following progress output:
final_regressor = csoaa.model
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading from csoaa.dat
num sources = 1
average since example example current current current
loss last counter weight label predict features
0.000000 0.000000 3 3.0 known 3 2
0.833333 1.666667 6 6.0 known 1 3
finished run
number of examples = 7
weighted example sum = 7
weighted label sum = 0
average loss = 0.7143
best constant = 0
total feature number = 17
Now we can predict, loading the model csoaa.model and using the same data-set csoaa.dat as our test-set, writing the predictions to csoaa.predict:
vw -t -i csoaa.model -d csoaa.dat -p csoaa.predict
This is similar to what we do in vanilla classification or regression.
The resulting csoaa.predict
file has contents:
1 a1_expect_1
2 b1_expect_2
3 c1_expect_3
2 ab1_expect_2
2 bc1_expect_2
3 ac1_expect_3
2 d1_expect_2
This is a perfect classification:
- all the expect_1 lines have a predicted class of 1,
- all the expect_2 lines have a predicted class of 2,
- and all the expect_3 lines have a predicted class of 3.
QED
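As a quick mechanical check (a sketch that relies on the _expect_ tag convention used above), one can compare each prediction with the class encoded in its tag:

awk '{ split($2, a, "_expect_"); print ($1 == a[2] ? "OK" : "MISMATCH"), $0 }' csoaa.predict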
Test examples are different from standard VW test examples because you have to tell VW which labels are allowed. For example, assuming 4 possible labels (1,2,3,4), this is what a test line could look like:
1 2 3 4 | b d e
And here's another, where only labels (1,4) are allowed:
1 4 | b d e
At training time, if there's an example with label 2 that you know (for whatever reason) will never be label 4, you could specify it as:
1:1 2:0 3:1 | example...
This means that labels 1 and 3 have a cost of 1, label 2 has a cost of zero, and no other labels are allowed. You can do the same at test time:
1 2 3 | example...
VW will never predict anything other than the provided "possible" labels.
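For example (a sketch re-using the 3-class model trained above, so the allowed labels are drawn from {1,2,3}), a restricted test line can be fed to vw on stdin, and the prediction, limited here to labels 1 and 3, written to stdout:

echo "1 3 | a c" | vw -t -i csoaa.model -p /dev/stdout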
- Thanks to Ciemo for the example and for asking the right Qs on the mailing list.
- Thanks to Stephane for patiently answering Ciemo's Qs.
- See also Hal Daume's multi-class docs at https://www.umiacs.umd.edu/~hal/tmp/multiclassVW.html