JSON
olgavrou edited this page Mar 17, 2021 · 17 revisions
The native C++ codebase can ingest JSON when the --json flag is passed.
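To feed JSON to the command-line tool, each example needs to be serialized compactly. The sketch below assumes that VW reads one JSON example per line when run with --json (for instance `vw --json -d train.json`, where `train.json` is a hypothetical file name). The example rows are taken from the samples further down this page, and `to_json_lines` is a hypothetical helper, not part of VW.

```python
import json

# Example rows taken from the samples on this page.
# Assumption: with --json, VW reads one JSON example per line,
# e.g. `vw --json -d train.json` (train.json is a hypothetical file name).
examples = [
    {"f1": 25, "f2": True, "_tag": "mytag"},
    {"ns1": {"location": "New York"}, "ns2": {"f2": 3.4}, "_label": 1},
]


def to_json_lines(rows):
    # json.dumps without indent emits no newlines, so each row stays on one line;
    # compact separators just drop the optional spaces.
    return "".join(json.dumps(row, separators=(",", ":")) + "\n" for row in rows)


if __name__ == "__main__":
    print(to_json_lines(examples), end="")
```

Redirecting the script's output to a file gives an input file for the command above.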
The following JSON format can be ingested into VW:
- Top-level properties are considered features for the default namespace.
- Top-level properties of type object or array are considered namespaces.
- Features can be JSON strings, integers, floats, booleans, or arrays of integers and/or floats.
- Top-level properties starting with _ are ignored, except if they match a special property (e.g. "_label", "_multi", "_text", "_tag").
- Labels can be passed using the top-level "_label" property. This is also supported for multiline examples, but there the label must be part of one of the entries in "_multi".
- If the JSON value is a string, integer, or float, it is converted to a string and passed directly to the VW label parser.
- If the JSON value is an object, its first property must match one of the JSON properties of SimpleLabel or ContextualBanditLabel.
- A number can only be used for simple labels. For other label types, such as multiclass, pass the label as a string (equivalent to the text format); it is then processed by the label parser.
- Tags can be applied by using the "_tag" property.
- Special text handling through "_text": the value of a property named "_text" is split on whitespace into individual features instead of being escaped into a single feature (see the sample below).
- Multi-line examples, as used by contextual bandits, are specified using the "_multi" property. Each entry is itself an example as described above and can optionally contain a label and/or tag. The top-level properties form the optional shared example.
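The single-line rules above can be sketched as a small converter from one JSON example to a VW text line. This is a hypothetical helper, not VW's parser. Several details are assumptions inferred from the sample table on this page: whitespace in string values becomes "_", a false boolean produces no feature, only the "Label"/"Weight" form of object labels is handled, and arrays and "_multi" are not covered.

```python
import json
import re

# Hypothetical helper, not VW's parser: a sketch of the single-line JSON rules.


def _escape(text):
    # Assumption inferred from the samples: whitespace in string values
    # becomes "_" ("New York" -> "New_York").
    return re.sub(r"\s+", "_", text.strip())


def _features(obj):
    """Turn the scalar properties of one JSON object into VW features."""
    feats = []
    for key, value in obj.items():
        if key == "_text":
            feats.extend(value.split())  # split on whitespace, not escaped
        elif key.startswith("_"):
            continue  # "_label", "_tag", "_aux", ... are not features
        elif isinstance(value, bool):  # checked before int: bool is an int
            if value:  # assumption: false produces no feature
                feats.append(key)
        elif isinstance(value, (int, float)):
            feats.append(f"{key}:{value}")
        elif isinstance(value, str):
            feats.append(key + _escape(value))  # name and value are concatenated
        elif isinstance(value, dict):
            continue  # objects are namespaces, handled by the caller
        else:
            raise NotImplementedError(f"{key!r}: arrays are not covered")
    return feats


def _label(value):
    if isinstance(value, dict):
        # Only the SimpleLabel-style "Label"/"Weight" properties are handled.
        parts = [value["Label"]] + ([value["Weight"]] if "Weight" in value else [])
        return " ".join(str(p) for p in parts)
    return str(value)  # strings and numbers go straight to the label parser


def json_to_vw_text(line):
    example = json.loads(line)
    head = ""
    if "_label" in example:
        head += _label(example["_label"]) + " "
    if "_tag" in example:
        head += example["_tag"]  # the tag sits directly before the first "|"
    segments = []
    default = _features(example)  # top-level scalars: default namespace
    if default:
        segments.append("| " + " ".join(default))
    for key, value in example.items():
        if isinstance(value, dict) and not key.startswith("_"):
            segments.append(f"|{key} " + " ".join(_features(value)))
    return head + " ".join(segments)
```

For example, `json_to_vw_text('{"x":2,"_text":"elections US iowa"}')` yields `| x:2 elections US iowa`, matching the corresponding sample row.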
The C# layer can ingest:
- JSON strings
- JSON.NET's JsonReader
- C# objects serializable to the above JSON format using JSON.NET serialization rules, so JsonProperty annotations and similar attributes are respected. This is particularly useful if an object must first be scored and then serialized to JSON for training, as it avoids a deserialization step when scoring.
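The object-serialization idea can be illustrated outside C#. The sketch below is a Python analogue only, not the C# layer's API: dataclasses stand in for C# objects, and a hypothetical "json_name" metadata key plays the role of a JsonProperty annotation, mapping a `label` field to "_label". The class and field names are made up for illustration, and renaming is applied to top-level fields only.

```python
import json
from dataclasses import asdict, dataclass, field, fields
from typing import Optional

# Python analogue only: JSON.NET's JsonProperty is approximated by a
# hypothetical "json_name" metadata key on top-level dataclass fields.


@dataclass
class Place:
    location: str


@dataclass
class Example:
    ns1: Place  # nested object -> namespace "ns1"
    f2: float  # scalar -> default-namespace feature
    label: Optional[int] = field(default=None, metadata={"json_name": "_label"})


def to_vw_json(obj):
    """Serialize a dataclass to VW JSON, renaming annotated top-level fields."""
    data = asdict(obj)  # nested dataclasses become plain dicts
    out = {}
    for f in fields(obj):
        value = data[f.name]
        if value is not None:  # e.g. no label yet while scoring
            out[f.metadata.get("json_name", f.name)] = value
    return json.dumps(out, separators=(",", ":"))
```

The same object can be serialized without a label for scoring and, once the label is known, serialized again with "_label" for training.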
Several example training sets can be found in the testing directory, and their usage can be looked up in the testing script. Keep in mind that they are intended for testing, but they are still a good reference point.
JSON | VW String |
---|---|
`{"f1":25,"f2":true,"_aux":"some ignored info"}` | `\| f1:25 f2` |
`{"f1":25,"f2":true,"_tag":"mytag"}` | `mytag\| f1:25 f2` |
`{"ns1":{"location":"New York"},"f2":[1,0.2,3]}` | `\|ns1 locationNew_York \| :1 :.2 :.3` |
`{"ns1":{"location":"New York"},"ns2":{"f2":3.4},"_label":1}` | `1 \|ns1 locationNew_York \|ns2 f2:3.4` |
`{"ns1":{"location":"New York", "f2":3.4},"_label":{"Label":2,"Weight":0.3}}` | `2 0.3 \|ns1 locationNew_York f2:3.4` |
`{"x":2,"_text":"elections US iowa"}` | `\| x:2 elections US iowa` |
`{"UserAge":15,"_multi":[{"_text":"elections maine", "Source":"TV"},{"Source":"www", "topic":4, "_label":"2:3:.3"}]}` | `shared \| UserAge:15`<br>`\| elections maine SourceTV`<br>`2:3:.3 \| Sourcewww topic:4` |
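The last row shows how a "_multi" example expands into several text lines: a shared line for the top-level properties, then one line per entry with its optional label and tag. The sketch below is a hypothetical helper, not VW's parser. It assumes entries hold only flat string, number, and boolean features plus "_text", "_label", and "_tag", and it reuses the same assumed escaping as the samples (whitespace becomes "_", false booleans are dropped).

```python
import json
import re

# Hypothetical sketch, not VW's parser: expand a "_multi" JSON example into
# the multi-line text format. Entries may only hold flat features here.


def _features(obj):
    feats = []
    for key, value in obj.items():
        if key == "_text":
            feats.extend(value.split())  # "_text" is split, not escaped
        elif key.startswith("_"):
            continue  # "_multi", "_label", "_tag" are not features
        elif isinstance(value, bool):  # checked before int: bool is an int
            if value:  # assumption: false produces no feature
                feats.append(key)
        elif isinstance(value, (int, float)):
            feats.append(f"{key}:{value}")
        elif isinstance(value, str):
            feats.append(key + re.sub(r"\s+", "_", value.strip()))
        else:
            raise NotImplementedError(f"{key!r}: nested values not covered")
    return feats


def multi_to_vw_lines(text):
    example = json.loads(text)
    lines = []
    shared = _features(example)
    if shared:  # top-level properties form the optional shared example
        lines.append("shared | " + " ".join(shared))
    for entry in example.get("_multi", []):
        head = ""
        if "_label" in entry:
            head += str(entry["_label"]) + " "  # e.g. "2:3:.3" for CB labels
        if "_tag" in entry:
            head += entry["_tag"]
        lines.append(head + "| " + " ".join(_features(entry)))
    return lines
```

Joining the returned list with newlines reproduces the three-line VW string in the last row of the table.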