training.py: two tweaks to feature selection #226

Merged 1 commit on Sep 21, 2024

Commits on Jan 8, 2024

  1. training.py: two tweaks to feature selection

    1. Include posting amounts as a feature. This allows us to distinguish
    different classes of payments to the same payee (e.g. recurring membership
    fees, which often have a constant amount, from individual purchases).
    
    2. For each example's key/value pairs, include the key by itself (with no
    substring of the value) as a feature. This is useful because different account
    types often have non-overlapping sets of example keys, and including the bare
    key as a feature allows the decision tree to be effectively segmented by
    account type fairly close to the root.
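
A minimal sketch of the two tweaks described above, not the code from this commit. The function name `extract_features`, its parameters, and the feature string formats are all hypothetical, and splitting the value into lowercase words stands in for whatever substrings of the value the real code uses.

```python
from decimal import Decimal
from typing import Iterable, List, Tuple


def extract_features(
    amount: Decimal,
    currency: str,
    key_value_pairs: Iterable[Tuple[str, str]],
) -> List[str]:
    """Build string features for one posting (hypothetical helper)."""
    features: List[str] = []

    # Tweak 1: the posting amount is a feature, so a constant recurring
    # charge can be told apart from one-off purchases at the same payee.
    features.append(f"amount:{amount} {currency}")

    for key, value in key_value_pairs:
        # Tweak 2: the bare key, with no part of the value attached.
        # Account types with disjoint key sets become separable on this alone.
        features.append(key)

        # Assumed existing behaviour: the key combined with pieces of the value.
        for word in value.lower().split():
            features.append(f"{key}:{word}")

    return features


gym = extract_features(Decimal("-35.00"), "USD", [("desc", "CITY GYM membership")])
print(gym)
# ['amount:-35.00 USD', 'desc', 'desc:city', 'desc:gym', 'desc:membership']
```

Because the bare key appears once per pair regardless of the value, a tree learner can split on its presence early, before looking at any value-specific feature.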
    
    These two very small changes significantly improve training accuracy on my
    journal, from 94.81% to 99.32% (roughly an 87% reduction in error rate!).
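
The error-rate reduction can be checked from the two accuracies quoted above. This is a small illustrative calculation; the helper name `error_reduction` is hypothetical.

```python
def error_reduction(acc_before: float, acc_after: float) -> float:
    """Relative reduction in error rate, given accuracies in percent."""
    err_before = 100.0 - acc_before  # 5.19 percentage points of error
    err_after = 100.0 - acc_after    # 0.68 percentage points of error
    return (err_before - err_after) / err_before


reduction = error_reduction(94.81, 99.32)
print(f"{reduction:.1%}")  # prints 86.9%
```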
    jktomer committed Jan 8, 2024
    14248ed