Update an existing model, rather than learning a new one from scratch each time? #672
Comments
+1
+1
This is my first time using dedupe - trying to dedupe 1.5 million records (mostly names/addresses). However, is this not what |
The |
+1 |
@adriennefranke @DustinReagan Were you able to add labeled examples to the model without having to reload all the data? If not, how did you handle the case where new data keeps arriving in batches and has to be combined with the master deduplicated data?
Hi. Note: the pull request has now been merged to master.
Thanks for this project, it's been very useful to me! However, I have a small question/issue:
Say I have a trained model and periodically get new training data that I'd like to use to update my model.
From what I can tell, it's impossible to load an existing settings file (which I believe contains the previously learned predicates?), add some new marked pairs, then train the model. Instead, it seems I have to:
It would be nice if I could skip step 1., since this seems to take the longest and in theory simply loading my existing settings file should get me to that point.
What I'm doing now (pseudocode):
What I'd like to be able to do:
Am I missing something?
Edit: thinking about it a bit more, I guess the model would need to have samples loaded up to re-train on the new data anyhow...so there's no way to skip the data load/sample step?
Thanks,
Dustin
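One way to make periodic retraining less painful, given that the data still has to be loaded and sampled each time, is to keep every labeled pair in a single accumulated training file and merge each new batch of labels into it before retraining. The sketch below uses only the standard library and does not call dedupe itself. It assumes the training file is JSON with `match` and `distinct` keys, each holding a list of record pairs; the function name `merge_labeled_pairs` and the file name are hypothetical, and how the merged file is then loaded for training depends on the dedupe version in use.

```python
import json
from pathlib import Path


def merge_labeled_pairs(training_path, new_pairs):
    """Merge newly labeled pairs into an accumulated training file.

    Assumption: the file is JSON shaped like
    {"match": [[rec_a, rec_b], ...], "distinct": [[rec_a, rec_b], ...]}.
    `new_pairs` uses the same two keys. Returns the merged dictionary.
    """
    path = Path(training_path)
    if path.exists():
        labeled = json.loads(path.read_text())
    else:
        # First run: start from an empty set of labels.
        labeled = {"match": [], "distinct": []}

    for label in ("match", "distinct"):
        existing = labeled.setdefault(label, [])
        # Serialize each pair to a canonical string so exact duplicates
        # can be detected even though dicts are not hashable.
        seen = {json.dumps(pair, sort_keys=True) for pair in existing}
        for pair in new_pairs.get(label, []):
            key = json.dumps(list(pair), sort_keys=True)
            if key not in seen:
                existing.append(list(pair))
                seen.add(key)

    path.write_text(json.dumps(labeled, indent=2))
    return labeled


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    a = {"name": "Ann Smith", "address": "1 Main St"}
    b = {"name": "Anne Smith", "address": "1 Main Street"}
    merged = merge_labeled_pairs("training.json", {"match": [(a, b)]})
    print(len(merged["match"]), "match pairs on file")
```

This only addresses the labels: it avoids relabeling pairs that were already marked, but it does not remove the data load and sampling step discussed above.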