You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When trying to use an existing training.json file on a dataset instead of getting output I have errors thrown:
csvdedupe --config_file=processors/csvdedupe-config.json --training_file=training.json --settings_file=processors/learned_settings data/finished/arts-and-cultural-assets-massachusetts-clustered.csv > test2.csv
INFO:root:imported 2673 rows
INFO:root:using fields: ['Name', 'Municipality']
INFO:root:taking a sample of 1500 possible pairs
INFO:dedupe.training:Final predicate set:
INFO:dedupe.training:(SimplePredicate: (sortedAcronym, Municipality), SimplePredicate: (wholeFieldPredicate, Name))
INFO:root:reading labeled examples from training.json
INFO:dedupe.api:reading training from file
Traceback (most recent call last):
File "/Users/mzagaja/.virtualenvs/dedupe-examples/lib/python3.7/site-packages/dedupe/predicates.py", line 168, in __call__
doc_id = self.index._doc_to_id[doc]
AttributeError: 'NoneType' object has no attribute '_doc_to_id'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/mzagaja/.virtualenvs/dedupe-examples/lib/python3.7/site-packages/dedupe/api.py", line 650, in readTraining
self.markPairs(training_pairs)
File "/Users/mzagaja/.virtualenvs/dedupe-examples/lib/python3.7/site-packages/dedupe/api.py", line 730, in markPairs
self.active_learner.mark(examples, y)
File "/Users/mzagaja/.virtualenvs/dedupe-examples/lib/python3.7/site-packages/dedupe/labeler.py", line 359, in mark
learner.fit_transform(self.pairs, self.y)
File "/Users/mzagaja/.virtualenvs/dedupe-examples/lib/python3.7/site-packages/dedupe/labeler.py", line 195, in fit_transform
recall=1.0)
File "/Users/mzagaja/.virtualenvs/dedupe-examples/lib/python3.7/site-packages/dedupe/training.py", line 26, in learn
dupe_cover = Cover(self.blocker.predicates, matches)
File "/Users/mzagaja/.virtualenvs/dedupe-examples/lib/python3.7/site-packages/dedupe/training.py", line 379, in __init__
self._cover(predicates, pairs)
File "/Users/mzagaja/.virtualenvs/dedupe-examples/lib/python3.7/site-packages/dedupe/training.py", line 387, in _cover
in enumerate(pairs)
File "/Users/mzagaja/.virtualenvs/dedupe-examples/lib/python3.7/site-packages/dedupe/training.py", line 389, in<setcomp>
set(predicate(record_2, target=True)))}
File "/Users/mzagaja/.virtualenvs/dedupe-examples/lib/python3.7/site-packages/dedupe/predicates.py", line 170, in __call__
raise AttributeError("Attempting to block with an index "
AttributeError: Attempting to block with an index predicate without indexing records
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/mzagaja/.virtualenvs/dedupe-examples/bin/csvdedupe", line 8, in<module>sys.exit(launch_new_instance())
File "/Users/mzagaja/.virtualenvs/dedupe-examples/lib/python3.7/site-packages/csvdedupe/csvdedupe.py", line 180, in launch_new_instance
d.main()
File "/Users/mzagaja/.virtualenvs/dedupe-examples/lib/python3.7/site-packages/csvdedupe/csvdedupe.py", line 110, in main
self.dedupe_training(deduper)
File "/Users/mzagaja/.virtualenvs/dedupe-examples/lib/python3.7/site-packages/csvdedupe/csvhelpers.py", line 257, in dedupe_training
deduper.readTraining(tf)
File "/Users/mzagaja/.virtualenvs/dedupe-examples/lib/python3.7/site-packages/dedupe/api.py", line 653, in readTraining
raise UserWarning('Training data has records not known '
UserWarning: Training data has records not known to the active learner. Read training in before initializing the active learner with the sample method, or use the prepare_training method.
Allegedly resolved in #761 on the dedupe side, but still manifesting here.
The text was updated successfully, but these errors were encountered:
When trying to use an existing training.json file on a dataset instead of getting output I have errors thrown:
Allegedly resolved in #761 on the dedupe side, but still manifesting here.
The text was updated successfully, but these errors were encountered: