Get an error message while running match_string #63
Comments
Hi @iibarant, please provide more information. 88510 rows is small enough for my computer to handle (it can process >500,000 records).
Here's the code:
from string_grouper import match_strings
I run the code on a MacBook Pro (2.4 GHz 8-Core Intel Core i9, 32 GB 2667 MHz DDR4).
Thank you!
Thanks @iibarant. Curious! This is an unexpected error. Can you please provide the traceback (just copy and paste whatever Python prints) so that I can determine exactly where in the code the problem originates?
There you go ...
matches = match_strings(check2['full address'])
File "", line 1, in
File "/opt/anaconda3/lib/python3.8/site-packages/string_grouper/string_grouper.py", line 131, in match_strings
File "/opt/anaconda3/lib/python3.8/site-packages/string_grouper/string_grouper.py", line 264, in fit
File "/opt/anaconda3/lib/python3.8/site-packages/string_grouper/string_grouper.py", line 467, in _build_matches
File "/opt/anaconda3/lib/python3.8/site-packages/sparse_dot_topn/awesome_cossim_topn.py", line 119, in awesome_cossim_topn
File "sparse_dot_topn/sparse_dot_topn_threaded.pyx", line 133, in sparse_dot_topn.sparse_dot_topn_threaded.__pyx_fuse_0sparse_dot_topn_extd_threaded
File "sparse_dot_topn/sparse_dot_topn_threaded.pyx", line 168, in sparse_dot_topn.sparse_dot_topn_threaded.sparse_dot_topn_extd_threaded
OverflowError: value too large to convert to int
Looks like the error stems from ‘sparse_dot_topn’, a package dependency. Could you try the following command just to see what happens:
matches = match_strings(check2['full address'], max_n_matches=20)
(This limits the output a bit.)
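One possible reading of why limiting max_n_matches helps, offered only as a sketch: assume (this thread does not confirm it) that the dependency stores a size proportional to rows × max_n_matches in a 32-bit signed C int. The helper name fits_in_int32 is hypothetical and only illustrates the arithmetic for a dataset of 88510 rows.

```python
# Hypothetical back-of-the-envelope check, NOT the library's actual logic.
# Assumption: some internal size equals n_rows * max_n_matches and must
# fit in a 32-bit signed C int.
INT32_MAX = 2**31 - 1  # largest value a 32-bit signed int can hold


def fits_in_int32(n_rows: int, max_n_matches: int) -> bool:
    """Return True if n_rows * max_n_matches fits in a 32-bit signed int."""
    return n_rows * max_n_matches <= INT32_MAX


n_rows = 88510
print(fits_in_int32(n_rows, 20))      # the workaround value suggested above
print(fits_in_int32(n_rows, n_rows))  # one match slot for every row pair
```

If that assumption holds, the product is what matters, so a larger dataset would need a smaller max_n_matches to stay under the limit.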
Yes, that works. Thank you. Should I check whether the code also works with larger values of max_n_matches? I'm planning to keep only matches with a similarity score > 0.9. Would it be possible to apply that threshold in the call?
OK, good. Yes, I suggest you try successively larger values of
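The "try successively larger values" idea can be sketched as a small loop. This is a minimal illustration, not part of string_grouper: the helper name largest_working_max_n_matches and the candidate values are hypothetical, and the matching function is passed in so the loop does not depend on any particular library version.

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def largest_working_max_n_matches(
    match_fn: Callable[..., Any],
    strings: Any,
    candidates: Iterable[int],
    **kwargs: Any,
) -> Optional[Tuple[int, Any]]:
    """Try candidate max_n_matches values in ascending order.

    Returns (value, result) for the largest candidate that ran without
    raising OverflowError, or None if even the smallest one failed.
    """
    best = None
    for n in sorted(candidates):
        try:
            result = match_fn(strings, max_n_matches=n, **kwargs)
        except OverflowError:
            break  # larger values are assumed to fail as well
        best = (n, result)
    return best


# Possible usage with the call from this thread (candidate values are
# illustrative; min_similarity is assumed to be accepted by your
# installed version of match_strings, so check its documentation):
# from string_grouper import match_strings
# best = largest_working_max_n_matches(
#     match_strings, check2['full address'], [20, 100, 500, 2000],
#     min_similarity=0.9,
# )
```

Stopping at the first OverflowError keeps the search cheap, at the cost of assuming that a failure at one value implies failure at every larger one.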
Hi there,
I would like to run match_strings on the addresses in a DataFrame (df) with 88510 rows and 3 columns.
All I get is
OverflowError: value too large to convert to int.
Is there a quick fix?
Thank you very much!