training.py: two tweaks to feature selection (#226)
1. Include posting amounts as a feature. This allows us to distinguish different classes of payments to the same payee (e.g. recurring membership fees, which often have a constant amount, from individual purchases).

2. For example key/value pairs, include the key by itself (with no substring of the value) as a feature. This is useful because different account types often have non-overlapping sets of example keys, and including the bare key as a value allows the decision tree to be effectively segmented by account type fairly close to the root.

These two very small changes significantly improve training accuracy on my journal, from 94.81% to 99.32% (an 86% reduction in error rate!).
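A rough sketch of what the two tweaks might look like in a feature extractor. This is not the code from training.py: the function name `extract_features`, its parameters, the `"amount"` feature name, and the use of whitespace-separated words as the value substrings are all assumptions made for illustration.

```python
from decimal import Decimal
from typing import Dict, List, Tuple

# Hypothetical feature representation: (feature name, feature value) pairs.
Feature = Tuple[str, str]


def extract_features(
    amount: Decimal, currency: str, key_values: Dict[str, str]
) -> List[Feature]:
    """Build features for one training example (illustrative only)."""
    features: List[Feature] = []

    # Tweak 1: the posting amount itself is a feature, so a constant
    # recurring charge can be told apart from one-off purchases.
    features.append(("amount", f"{amount} {currency}"))

    for key, value in key_values.items():
        # Tweak 2: the bare key, with no part of the value attached.
        # Its mere presence hints at the account type.
        features.append((key, ""))

        # Assumed pre-existing behaviour: key paired with pieces of the
        # value (approximated here by whitespace-separated words).
        for word in value.split():
            features.append((key, word))

    return features
```

For instance, `extract_features(Decimal("9.99"), "USD", {"desc": "GYM MEMBERSHIP"})` would yield the amount feature, the bare `desc` key, and one `desc` feature per word of the description.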