Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iterblocks #53

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

waylonflinn
Copy link
Contributor

Simplify the cython code with bcolz.iterblocks, as in @FrancescElies PR

@waylonflinn
Copy link
Contributor Author

Adapted from #24

@CarstVaartjes
Copy link
Member

Really nice work! the only thing what I need to check now if we can do this inside a nogil prange (for multi threading)

@waylonflinn
Copy link
Contributor Author

I've also got one more thing I want to add before this gets merged.

@waylonflinn
Copy link
Contributor Author

This is done, if you'd like to merge it.

@CarstVaartjes
Copy link
Member

I think that we cannot use bz.iterblocks because of this: http://docs.cython.org/src/userguide/external_C_code.html#nogil

@FrancescElies there is no cython implementation possible/planned right? as we cannot multi-thread python functions...

@waylonflinn
Copy link
Contributor Author

I've been thinking a lot about this and I'm planning a more general solution to problems like this. It will involve a map reduce style processing layer.

I don't think processing multiple columns simultaneously is the way forward. I'm going to focus on processing rows in a single chunk in parallel.


Sent from Mailbox

On Sun, Oct 11, 2015 at 2:18 PM, Carst Vaartjes notifications@github.com
wrote:

I think that we cannot use bz.iterblocks because of this: http://docs.cython.org/src/userguide/external_C_code.html#nogil

@FrancescElies there is not possible cython implementation right? as we cannot multi-thread python functions...

Reply to this email directly or view it on GitHub:
#53 (comment)

@CarstVaartjes
Copy link
Member

Let's see, really open to suggestions but the problem is that we have a few algorithms that depend on the processing order; so parallelism in the blocks that would disturb the order can give issues :/

P.s. why do you think column-parallelism is a bad idea?

@FrancescElies
Copy link
Contributor

...no cython implementation possible/planned? Not sure if I understand the question

@CarstVaartjes
Copy link
Member

Sorry! to explain myself: the iterblocks is currently defined here as a normal python function: https://github.com/Blosc/bcolz/blob/master/bcolz/toplevel.py

unless it's a nogil defined cython function, we cannot call it multi-threaded, so that's why we cannot use it (even though it makes the code much nicer as you can see in this pull request)

@FrancescElies
Copy link
Contributor

Not that I'm aware, but you could ask in the mailing list

On Sun, Oct 11, 2015 at 10:01 PM, Carst Vaartjes notifications@github.com
wrote:

Sorry! to explain myself: the iterblocks is currently defined here as a
normal python function:
https://github.com/Blosc/bcolz/blob/master/bcolz/toplevel.py

unless it's a nogil defined cython function, we cannot call it
multi-threaded, so that's why we cannot use it (even though it makes the
code much nicer as you can see in this pull request)


Reply to this email directly or view it on GitHub
#53 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants