Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iterblocks #24

Closed
wants to merge 8 commits into from
Closed

Conversation

FrancescElies
Copy link
Contributor

trying iterblocks as suggested in Blosc/bcolz#153

@FrancescElies FrancescElies force-pushed the iterblocks branch 2 times, most recently from 8b96e97 to af1eaf6 Compare March 10, 2015 12:01
@FrancescElies
Copy link
Contributor Author

This branch contains no new features but takes advantage of iterblocks to avoid code duplication while dealing chunks and leftovers.

Refactoring is based on #16

In terms of performance I experienced the same orders of magnitude compared to master (sometimes a bit slower, sometimes even faster). While refactoring I did my best but I might have written some new lines of code which might be optimized. Of course still some further testing is needed (I'll be working on that)

@CarstVaartjes & @esc between the different options we have, iterblocks and chunks iterator (atm not yet working), which one should be the way to go?
In case is decided to go for iterblocks, this will have impact on @ARF1's PR , would this work fine with cython.parallel? I hope so, but I do not know.

@ARF1
Copy link

ARF1 commented Mar 13, 2015 via email

@FrancescElies
Copy link
Contributor Author

@ARF1 I don't know if you are working on top of this branch, but would you be ok if I rebase on top of master? Since we already merged #16, templating commits are a distraction when one looks at 'Files changed'.

If you could share your parallel processing branch with iterblocks I would love to have a look at it too.
I also played around with cython parallel (your branch #22 was very helpful), I did some try outs but not on bquery, so far I did not achieve to make any real performance improvements. I also feel this is going to take a while

@FrancescElies
Copy link
Contributor Author

By the way if you already tried iterblocks did you experience any difference in terms of performance?

@FrancescElies FrancescElies force-pushed the iterblocks branch 2 times, most recently from fe93001 to 28bdd7d Compare March 13, 2015 16:13
@ARF1
Copy link

ARF1 commented Mar 13, 2015

@FrancescElies FrancescElies Please feel free to rebase.

I have a (non-functional) parallel processing iterblocks branch based on
an earlier revision of iterblocks and am working right now on
cherry-picking a functional parallel processing branch (based on earlier
revision of bquery master) to put on top of iterblocks.

Once I am done with the latter I will share them for you to look at. I
was hoping that by comparing both branches I could figure out what is
going wrong in my iterblocks branch. But I have a sneaking suspicion
that won't be the case.

Maybe you can figure out what is going on. BTW: with my functional
branch I was seeing about x1.8 speedup with 2 cores relative to master
if I remember correctly.

@CarstVaartjes
Copy link
Member

hi @FrancescElies :) how are you? just wondering -> is this still applicable with the latest version of bcolz? (and are left over arrays included there?)
because it would clean up the code quite a bit. about multi-threading: the point i never got around to explaining, is that i would make the "per-column" part multi threaded and not so much the iterblocks themselves. this has to do with the fact that quite a few algorithms rely on the order of data (and if that is broken by parallelism, stuff gets hard). But each aggregation (and groupby) column creation could run parallel without any issue

@FrancescElies
Copy link
Contributor Author

Hi Carst,
I'm doing good and you? I believe this still should be fine with newer versions of bcolz, leftovers were handled yes. But I would recommend to test it it I don't remember exactly in which state I left this branch

@CarstVaartjes
Copy link
Member

As busy as ever here :/ but otherwise good :) I think this is pre-template switch, but i'll rob the implementation in a new branch then ;)

@FrancescElies
Copy link
Contributor Author

If I remember correctly it was working, at least as far as I can see that's
what travis said a that time, but justfeel free to do what is best for
your needs, thanks for asking

On Sun, Sep 27, 2015 at 9:17 PM, Carst Vaartjes notifications@github.com
wrote:

As busy as ever here :/ but otherwise good :) I think this is pre-template
switch, but i'll rob the implementation in a new branch then ;)


Reply to this email directly or view it on GitHub
#24 (comment).

This was referenced Oct 5, 2015
@FrancescElies
Copy link
Contributor Author

closing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants