detection is really slow in some cases #13
Comments
Hi @kaskawu! Thanks for your interest in the package and for reporting this issue. Strangely, I have a hard time replicating your results:
and with the change you propose:
What version of the
That said, it does seem to make a massive difference on your system, so I'm certainly open to making this change. I do however want to make sure I fully understand the cause before implementing any changes. Thanks!
That said, I tested across multiple Python versions. I tried Python 3.7 and 3.8, and the slowdown only happens on 3.8:
Python 3.7:
Python 3.8:
Wow, that's very interesting! Thanks for doing some more digging. I'll take a more detailed look at this soon; hopefully I can reproduce it in some way and figure out a good solution. Thanks again for reporting it!
Same here, performance drops with Python 3.8.
* Update URL regex to avoid catastrophic backtracking and increase performance. See [issue #13](#13) and [issue #15](#15). Thanks to @kaskawu for the fix and @jlumbroso for re-raising the issue.
* Add ``num_chars`` keyword argument to ``read_as_dicts`` and ``csv2df`` wrappers.
* Improve documentation w.r.t. handling large files. Thanks to @jlumbroso for raising this issue.
Hey there, first of all, great project!
The following command takes a significant amount of time:
After benchmarking a little bit, the apparent cause is that the `unix_path` and `url` regexes in the detector are susceptible to ReDoS. These changes, which replace the regexes with (hopefully) equivalent ones, fix the most obvious issues:
New results:
Python version: 3.8
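For readers unfamiliar with ReDoS, here is a minimal sketch of the general effect, not of the detector's actual regexes. The two patterns below are hypothetical and chosen only for illustration: both accept the same strings, but the one with nested quantifiers can backtrack exponentially on an input that almost matches. Exact timings will vary by machine and Python version.

```python
import re
import time

# Hypothetical patterns for illustration only; these are NOT the detector's regexes.
NESTED = re.compile(r"^(a+)+$")  # nested quantifiers: many ways to split a run of a's
FLAT = re.compile(r"^a+$")       # accepts the same strings with a single quantifier


def time_match(pattern, text):
    """Return (match_or_None, elapsed_seconds) for a single match attempt."""
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start


# A run of a's followed by one character that forces the overall match to fail,
# so the engine has to exhaust every way of splitting the run before giving up.
text = "a" * 18 + "b"

nested_result, nested_time = time_match(NESTED, text)
flat_result, flat_time = time_match(FLAT, text)

print(f"nested: {nested_time:.4f}s, flat: {flat_time:.6f}s")
```

Each additional `a` in the input should roughly double the time taken by the nested pattern, while the flat pattern stays close to constant. Rewriting a regex so that no two quantifiers can claim the same characters is the usual way to remove this kind of slowdown.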