-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multithreaded v1 #37
Multithreaded v1 #37
Conversation
Oh wow; it seems you are really getting serious about multi-threading. @CarstVaartjes had mentioned that being on the todo list on the "categoricals" issue for bcolz but I thought that was more long-term... Quick heads up: I have a very strong suspicion that the iterblocks benchmarks in #22 are not representative. I believe I accidentally ran them with an input-database that had a MUCH too small chunklen. As iterblocks is implemented in python in bcolz that would obviously be an issue to the too frequent looping in python. Rerunning the benchmarks is on my todo list but I have not had the time yet. You mention that there are still problems with this branch. Let me know if there is anything I can do to help. |
7f42e72
to
a005bdc
Compare
Hi, thanks for your feedback, I was having problems because of copy pasting your code without thinking what was going on, your branch was really really helpful, no way I could have done without using it as a reference (I would have loved to cheery picked some stuff that is exactly the same in both but I struggled). Now seems to work, at least test pass now, though some more intensive tests will be needed I think. For the moment I am working on this but this could change rapidly, I don't really know I cannot give you any specifics about this. |
a005bdc
to
357e283
Compare
@FrancescElies I think I am getting far too much credit here. - And I feel kind of guilty because I had a nicer version of the helper function laying around that implemented @CarstVaartjes's suggestion to drop the Recipe for porting to your code:
You probably still need to tweak this a bit because it was created for the string helper function. (See in-line comments.) |
so far no performance improvements |
No unless your number of unique values becomes truely huge, it would not expect any performance improvements. C++ vectors are know to be very high-performance datastructures. One situation where you might see minor performance effects is if your factors have large load factors: If almost every item is unique, the vector will grow much more and re-allocation and copying of existing values will occur more often. But even then, I think it's allocated size is grown exponentially (or something similar) to limit the number of re-allocations. I think CarstVaartjes only meant that this would clean up the implementation by eliminating a redundant argument/variable. Personally, I really like the streamlined look of |
this has been inactive for a long time now, closing |
Based onto: #24
Most of this modifications come from #22 applied to int32, int64 and float specific functions (Thanks @ARF1)
Non functional branch, still having some problems.