Time required for sample_all function? #118
Comments
Thank you for reporting this, @clj2567. To be able to help you, we need some additional information about the data and metadata you are trying to sample. Could you also provide the code snippet that you are using to sample?
The data is the one referenced in the paper: https://www.kaggle.com/c/airbnb-recruiting-new-user-bookings/data This is the metadata file that I am using: meta_sample.txt. Here is the code snippet
Hi @clj2567, there is an error in the categorical value sampling implementation that causes this behavior: the time it takes to sample increases exponentially with the number of categorical columns in the dataset. The fix for this is in the issue_120_compatibility_with_rdt_issue72 branch and will be released soon.
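To make that kind of scaling concrete, here is a toy sketch. It is not the SDV implementation, which this thread does not show. It assumes a hypothetical sampler, `naive_joint_sample`, that enumerates every combination of category values before drawing rows, so its work grows as `n_categories ** n_columns`. The helper `time_sampler` is also made up for illustration.

```python
import itertools
import random
import time


def naive_joint_sample(n_columns, n_categories=4, num_rows=5, seed=0):
    """Hypothetical sampler that enumerates every category combination first.

    Illustration only; this is NOT the SDV code discussed in this issue.
    Returns the sampled rows and the number of combinations enumerated.
    """
    rng = random.Random(seed)
    categories = [range(n_categories)] * n_columns
    # The full cartesian product has n_categories ** n_columns entries.
    combinations = list(itertools.product(*categories))
    rows = [rng.choice(combinations) for _ in range(num_rows)]
    return rows, len(combinations)


def time_sampler(max_columns=8):
    """Return (n_columns, n_combinations, seconds) for 1..max_columns columns."""
    results = []
    for n_columns in range(1, max_columns + 1):
        start = time.perf_counter()
        _, n_combinations = naive_joint_sample(n_columns)
        elapsed = time.perf_counter() - start
        results.append((n_columns, n_combinations, elapsed))
    return results


if __name__ == "__main__":
    for n_columns, n_combinations, seconds in time_sampler():
        print(f"{n_columns} columns: {n_combinations} combinations, {seconds:.4f}s")
```

With four categories per column, each extra categorical column multiplies the enumerated combinations by four, which is why a dataset with many categorical columns can appear to hang under an approach like this.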
Hi @csala, thanks for the update. Is there a timeline for this?
Yes. This will most likely be released this week.
This should have been fixed in PR #121.
* Bump version: 0.3.1.dev0 → 0.3.1.dev1 * Validates discrete columns * Fix lint
The sample_all function never really returns anything. It keeps running in a loop. Is there any solution for this? In one instance it ran for 2 hours with still no output.