if we have training data, make sure it gets indexed by the active learner #761

Merged Aug 9, 2019 (8 commits)


@fgregg (Contributor) commented Jul 22, 2019

Start addressing #757, #753

@victorpoluceno commented:

I still see this error intermittently: sometimes it works, sometimes it doesn't. In my case, though, I'm using RecordLink. Here is the stack trace:

Traceback (most recent call last):
  File "/Users/victorpoluceno/go/src/github.com/FindHotel/sovitin/scripts/candidates/build/dedupe/dedupe/predicates.py", line 199, in __call__
    centers = [self.index._doc_to_id[doc]]
AttributeError: 'NoneType' object has no attribute '_doc_to_id'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 82, in <module>
    main(args)
  File "train.py", line 70, in main
    args.provider_code, args.model_version)
  File "train.py", line 39, in train
    sample_size=15000)
  File "/Users/victorpoluceno/go/src/github.com/FindHotel/sovitin/scripts/candidates/build/dedupe/dedupe/api.py", line 901, in sample
    self.active_learner.mark(examples, y)
  File "/Users/victorpoluceno/go/src/github.com/FindHotel/sovitin/scripts/candidates/build/dedupe/dedupe/labeler.py", line 348, in mark
    learner.fit_transform(self.pairs, self.y)
  File "/Users/victorpoluceno/go/src/github.com/FindHotel/sovitin/scripts/candidates/build/dedupe/dedupe/labeler.py", line 198, in fit_transform
    recall=1.0)
  File "/Users/victorpoluceno/go/src/github.com/FindHotel/sovitin/scripts/candidates/build/dedupe/dedupe/training.py", line 31, in learn
    dupe_cover = Cover(self.blocker.predicates, matches)
  File "/Users/victorpoluceno/go/src/github.com/FindHotel/sovitin/scripts/candidates/build/dedupe/dedupe/training.py", line 383, in __init__
    self._cover(predicates, pairs)
  File "/Users/victorpoluceno/go/src/github.com/FindHotel/sovitin/scripts/candidates/build/dedupe/dedupe/training.py", line 391, in _cover
    in enumerate(pairs)
  File "/Users/victorpoluceno/go/src/github.com/FindHotel/sovitin/scripts/candidates/build/dedupe/dedupe/training.py", line 393, in <setcomp>
    set(predicate(record_2, target=True)))}
  File "/Users/victorpoluceno/go/src/github.com/FindHotel/sovitin/scripts/candidates/build/dedupe/dedupe/predicates.py", line 203, in __call__
    raise AttributeError("Attempting to block with an index "
AttributeError: Attempting to block with an index predicate without indexing records
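A hypothetical toy model of what the traceback shows: an index predicate whose `index` attribute is still `None` when it is called, so the lookup on `_doc_to_id` fails with an AttributeError that is then re-raised with a clearer message (hence the "During handling of the above exception" chaining). The class names and the `index_records` method are illustrative assumptions, not dedupe's actual API. The point is only the one in this PR's title: the records in the training pairs need to be indexed before the predicate is used on them.

```python
# Hypothetical toy model of the failure in the traceback above.
# ToyIndex, ToyIndexPredicate and index_records are illustrative names,
# not part of dedupe's API.


class ToyIndex:
    """Maps each indexed document to an integer id."""

    def __init__(self):
        self._doc_to_id = {}

    def add(self, doc):
        # Ids are assigned in insertion order; a repeated doc keeps its first id.
        self._doc_to_id.setdefault(doc, len(self._doc_to_id))


class ToyIndexPredicate:
    """A blocking predicate that only works after records have been indexed."""

    def __init__(self, field):
        self.field = field
        self.index = None  # stays None until index_records() is called

    def index_records(self, records):
        self.index = ToyIndex()
        for record in records:
            self.index.add(record[self.field])

    def __call__(self, record):
        doc = record[self.field]
        try:
            centers = [self.index._doc_to_id[doc]]
        except AttributeError:
            # self.index is None: nothing was ever indexed. Raising here
            # chains onto the original AttributeError, as in the traceback.
            raise AttributeError(
                "Attempting to block with an index predicate "
                "without indexing records")
        except KeyError:
            # The index exists but this doc was never added to it.
            return ()
        return tuple(str(c) for c in centers)


training_pairs = [({"name": "acme inc"}, {"name": "acme incorporated"})]
predicate = ToyIndexPredicate("name")

try:
    predicate(training_pairs[0][1])
except AttributeError as err:
    print(err)  # same message as the last line of the traceback above

# Index every record that appears in the training pairs, then block.
predicate.index_records(r for pair in training_pairs for r in pair)
print(predicate(training_pairs[0][1]))  # ('1',)
```

In this toy, the error depends only on whether `index_records` ran before the predicate was called. The sketch does not attempt to reproduce the intermittent behaviour reported above; it only illustrates the unindexed-predicate error path.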

@victorpoluceno commented:

@fgregg, after your latest changes I can confirm this now works for RecordLink. 👍

Thanks for fixing this!

@quillan86 commented:

Hi, have these changes been uploaded to pip?
