Feature interactions
Since VW v7.10.2, a number of changes were introduced in the mechanism of feature interaction generation (`-q` and `--cubic`):
- Support of interactions of arbitrary length.
- Better hashing.
- Filtering out unnecessary feature interactions.
- Filtering out unnecessary namespace interactions.
These changes may result in a smaller number of generated features and in changes to their values and hashes.
VW's command line parameters `-q ab` or `--cubic abc` (where `a`, `b`, and `c` are namespaces) are commonly used to generate pairs or triples of features from the specified namespaces.
Now VW provides an additional `--interactions` parameter that works like `-q` or `--cubic`, but its argument may be longer, e.g. `--interactions abcd` or `--interactions abcdef`. Moreover, `-q` and `--cubic` arguments are internally converted to `--interactions` values. Both `-q` and `--cubic` are still supported for backward compatibility.
Although VW supports interactions of any length, be aware that long interactions may generate a huge number of features. Generating interactions longer than 3 also carries some performance overhead, because they are processed in a non-recursive loop.
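To get a feel for why long interactions explode the feature count, here is a small Python sketch (not VW code; the `interaction_features` helper is hypothetical) that enumerates the cross-product of features an interaction over several namespaces would produce:

```python
from itertools import product

def interaction_features(*namespaces):
    """Cartesian product of the features of each interacting namespace.

    Hypothetical helper for illustration only: each argument is a list of
    feature names belonging to one namespace in the interaction.
    """
    return ["*".join(combo) for combo in product(*namespaces)]

ns_a = ["a1", "a2", "a3"]  # placeholder features of namespace a
ns_b = ["b1", "b2"]        # placeholder features of namespace b

pairs = interaction_features(ns_a, ns_b)          # roughly what -q ab yields
triples = interaction_features(ns_a, ns_b, ns_a)  # roughly what --cubic aba yields

print(len(pairs))    # 3 * 2 = 6
print(len(triples))  # 3 * 2 * 3 = 18
```

The counts multiply with each extra namespace in the interaction string, which is why `--interactions abcdef` can easily be impractical on wide namespaces.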
VW currently uses the murmur3 hash to encode single features. Features generated from an interaction are stored in the same address space and thus need a hash too. But since there may be a huge number of generated features, it is feasible to use a slightly worse but much faster hash to encode each of them. Thus, instead of applying murmur3 to a generated feature, VW encodes it by hashing the murmur3 hashes of the single features that compose it.
For example, two features `a` and `b` will have hash values `murmur3(a)` and `murmur3(b)`. But a new feature `a*b` generated by the `-q` argument will have the hash value `hash(murmur3(a), murmur3(b))` instead of `murmur3(ab)`. This is faster because `murmur3(a)` and `murmur3(b)` are already known at the moment of encoding their interaction `a*b`.
Since version 7.10.2, VW uses a 32-bit FNV hash for hashing interactions of any length. It generates fewer collisions than the previously used hash and has comparable performance.
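As an illustration only, the following Python sketch combines precomputed 32-bit hashes using standard FNV-1a mixing, in the spirit of `hash(murmur3(a), murmur3(b))`. The constants are the standard 32-bit FNV-1a ones; VW's exact mixing scheme lives in its source and may differ from this:

```python
# Standard 32-bit FNV-1a constants (illustrative; not copied from VW).
FNV_PRIME_32 = 16777619
FNV_OFFSET_32 = 2166136261

def fnv1a_combine(hashes):
    """Fold a sequence of 32-bit component hashes into one 32-bit hash.

    Each component hash is fed into FNV-1a a byte at a time, so the result
    depends on the order of the components.
    """
    h = FNV_OFFSET_32
    for x in hashes:
        for _ in range(4):  # consume the 4 bytes of each 32-bit hash
            h = ((h ^ (x & 0xFF)) * FNV_PRIME_32) & 0xFFFFFFFF
            x >>= 8
    return h

h_a, h_b = 0x12345678, 0x9ABCDEF0   # stand-ins for murmur3(a), murmur3(b)
h_ab = fnv1a_combine([h_a, h_b])    # hash of the interacted feature a*b
```

The point of this scheme is that the expensive murmur3 hashes of the single features are computed once and then cheaply folded together for every interaction they participate in.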
In previous versions, VW generated permutations of features for self-interacting namespaces. This means that `-q ff` for `|f a b c` produced the following new features:

```
a*a, a*b, a*c, b*a, b*b, b*c, c*a, c*b, c*c
```
A group of these generated features won't improve your model: `b*a`, `c*a`, and `c*b` will have the same weights after training as `a*b`, `a*c`, and `b*c`, so processing them is just a waste of time. Although they'll have different hashes, it seems they can't improve the model's results by making it more robust to hash collisions. Removing such features may significantly reduce the time required for model training and slightly improve its prediction power.
Since 7.10.2, VW doesn't generate these unnecessary features for self-interacting namespaces. The new rule of feature generation for interacting namespaces is enabled by default, but it can be switched off by passing the `--permutations` flag on the VW command line.
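The difference between the old permutation behaviour (which the `--permutations` flag restores) and the default combination behaviour can be sketched in Python with `itertools`; the feature names here are placeholders:

```python
from itertools import combinations_with_replacement, product

features = ["a", "b", "c"]  # features of the self-interacting namespace f

# Old behaviour (and --permutations): the full cartesian product.
permutations = ["*".join(p) for p in product(features, repeat=2)]

# Default since 7.10.2: unordered combinations, so b*a, c*a, c*b are dropped.
combinations = ["*".join(c) for c in combinations_with_replacement(features, 2)]

print(permutations)  # 9 features, includes both a*b and b*a
print(combinations)  # 6 features: a*a, a*b, a*c, b*b, b*c, c*c
```

For a namespace with n features, a pairwise self-interaction shrinks from n² generated features to n(n+1)/2, which is where the training-time savings come from.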
Note: due to the implementation of the simple combinations-generation algorithm, the namespaces in the interaction string are sorted. This allows grouping the same namespaces together and efficiently detecting self-interacting namespaces. Sorting could change the order of namespaces in an interaction and thus its hash value, so VW prints a warning message if such changes have been made. For example:
```
$ vw --cubic bab --cubic aaa
creating cubic features for triples: bab aaa
WARNING: some interactions contain duplicate characters and their characters order has been changed.
Interactions affected: 1.
```
In the example above, VW will continue to work with the interactions `aaa` and `abb`.
P.S.: It may be noticed that the categorical features `a*a`, `b*b`, and `c*c` will have the same weights as the simple features `a`, `b`, and `c` unless they have weight values != 1.0. VW currently doesn't exclude such features, as it was shown that this rule is dataset-dependent.
By the same argument as in the previous paragraph, unnecessary features may be generated not only when a namespace interacts with itself but also in cases like `-q ab -q ba`. Although the interactions generated by `-q ba` will have different hashes than those generated by `-q ab`, they won't improve the model's results.
Such duplicate interactions may be unwittingly generated with wildcards like `-q :: --cubic :::` or even `--interactions ::::`.
Thus, since 7.10.2, VW automatically removes such interactions (printing a warning message):
```
$ vw --cubic :::
creating cubic features for triples: :::
WARNING: duplicate namespace interactions were found. Removed: 665942.
You can use --leave_duplicate_interactions to disable this behaviour.
```
This behaviour can be disabled with the `--leave_duplicate_interactions` flag.
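The duplicate filtering described above can be sketched as follows. This is an illustrative Python function, not VW's implementation, assuming an interaction is represented as a string of single-character namespaces:

```python
def remove_duplicate_interactions(interactions):
    """Keep one canonical copy of each equivalent interaction.

    Interactions that differ only in namespace order (e.g. "ab" vs "ba")
    produce equivalent models, so we sort each interaction's characters
    and drop anything already seen.
    """
    seen = set()
    kept = []
    for inter in interactions:
        key = "".join(sorted(inter))
        if key not in seen:
            seen.add(key)
            kept.append(key)
    return kept

kept = remove_duplicate_interactions(["ab", "ba", "abc", "cba"])
print(kept)  # ['ab', 'abc']
```

With wildcards like `--cubic :::`, every ordering of every namespace triple is expanded, so this kind of canonicalisation is what removes the hundreds of thousands of redundant entries reported in the warning above.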
NOTE: the performance hit mentioned below was mitigated in version 8.10 for quadratics (`-q ::`): interactions are generated only for namespaces that have been seen in the examples.
Pre VW 8.10, for quadratics: please note that the set of interactions is defined at a global level, not per example. This means that if you instantiate interactions via a wildcard, e.g. `-q ::`, all possible namespaces are interacted, modulo the filtering of unnecessary duplicates. This can add overhead to processing examples and reduce speed. For example, if you have a single namespace `a` in all examples, training with `-q aa` will in general be faster than `-q ::`; compare this issue.
Starting in VW 9.0, support has been added for specifying interactions using the entire namespace name, not just the first character. This can be done with the `--experimental_full_name_interactions <arg>` flag. The `arg` is a `|`-separated list of namespace names or the wildcard character `:`. This allows you to disambiguate between namespaces that start with the same character.