training.py: two tweaks to feature selection #226

Merged (1 commit) on Sep 21, 2024

jktomer (Contributor) commented on Jan 8, 2024:

  • Include posting amounts as a feature. This allows us to distinguish different classes of payments to the same payee (e.g. recurring membership fees, which often have a constant amount, from individual purchases).

  • For the key/value pairs attached to each example, include the key by itself (with no substring of the value) as a feature. This is useful because different account types often have non-overlapping sets of example keys, and including the bare key as a feature allows the decision tree to be effectively segmented by account type fairly close to the root.
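The two tweaks can be illustrated with a small feature-extraction sketch. This is a hypothetical illustration, not the code in training.py: the function name `extract_features`, its inputs, the feature-name prefixes, and the word-based "baseline" features are all assumptions made for the example. It shows an amount feature (tweak 1) and a bare-key feature alongside key+value-word features (tweak 2).

```python
from decimal import Decimal
from typing import Dict, List, Optional, Set


def extract_features(
    payee: str,
    amount: Optional[Decimal],
    key_value_pairs: Dict[str, List[str]],
) -> Set[str]:
    """Return a set of binary feature names for one example posting.

    Hypothetical sketch: names and inputs are illustrative only and are
    not the ones used in training.py.
    """
    features: Set[str] = set()

    # Baseline (assumed): one feature per lower-cased word of the payee.
    for word in payee.lower().split():
        features.add("payee_word:" + word)

    # Tweak 1: the posting amount itself is a feature, so a fixed monthly
    # fee and a one-off purchase at the same payee can be told apart.
    if amount is not None:
        features.add("amount:" + str(amount))

    for key, values in key_value_pairs.items():
        # Tweak 2: the bare key, with no part of the value attached.
        # A key that only occurs for one account type gives the tree a
        # single feature to split on for that account type.
        features.add("key:" + key)
        # Baseline (assumed): the key combined with each word of the value.
        for value in values:
            for word in value.lower().split():
                features.add("kv:" + key + ":" + word)

    return features


# Two postings to the same (made-up) payee: a membership fee and a purchase.
fee = extract_features("ACME Gym", Decimal("45.00"), {"desc": ["ACME GYM MEMBERSHIP"]})
shop = extract_features("ACME Gym", Decimal("12.50"), {"desc": ["ACME GYM SHOP"]})

# Both share the payee words and the bare "key:desc" feature; they differ
# in their amount features and in the value words that are not shared.
print(sorted(fee - shop))
print(sorted(shop - fee))
```

In a real journal, a recurring fee would produce the same `amount:` feature every month, which a decision tree can pick up directly. A purchase's amount varies, so its `amount:` features are mostly unique and carry little weight.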

These two very small changes significantly improve training accuracy on my journal, from 94.81% to 99.32% (a nearly 87% reduction in error rate!).
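The relative error-rate reduction follows from the two accuracy figures above. The error rate falls from 100 − 94.81 = 5.19% to 100 − 99.32 = 0.68%. The helper below is just a worked check of that arithmetic. Its name is illustrative.

```python
def relative_error_reduction(acc_before: float, acc_after: float) -> float:
    """Fraction of the original error rate removed, given accuracies in percent."""
    err_before = 100.0 - acc_before  # 5.19 for 94.81% accuracy
    err_after = 100.0 - acc_after    # 0.68 for 99.32% accuracy
    return (err_before - err_after) / err_before


r = relative_error_reduction(94.81, 99.32)
print(f"{r:.1%}")  # 4.51 / 5.19 ≈ 0.869, printed as 86.9%
```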

Zburatorul (Collaborator) commented:

Awesome! 99% is really good. Mine is much lower. Do you have any suggestions for how to diagnose that?

Zburatorul merged commit 30dc718 into jbms:master on Sep 21, 2024 (10 checks passed).