Parallel access? #5

Closed
haroldsoh opened this issue Jul 18, 2015 · 6 comments

Comments

@haroldsoh

Is it possible to access a database from parallel Julia processes? I'm running into problems with the code below (it could be that I'm messing something up). All but one of the processes seem to hang (perhaps waiting for db access?):

# restart processes
if nprocs() > 1
    rmprocs(workers()) # remove all worker processes
end
wpids = addprocs(2) # add processes

println("Spawned ", nprocs(), " processes, ", nworkers()," workers")
println("Proc IDs: ", procs())

# load LMDB on all processes
@everywhere using LMDB

# create a sample database 
nsamples = 100
dbname = "simpleseq.db"
if !isdir(dbname)
    mkdir(dbname)
end

# the data are just {1:1, 2:2 ... }
create() do env
    #put!(env, :Flags, LMDB.NOSYNC)
    open(env, dbname)
    for i=1:nsamples
        start(env) do txn
            open(txn) do dbi
                open(txn, dbi) do cur
                    insert!(cur, string(i), string(i))    
                end
                commit(txn)
            end
        end
    end
end

# load up the functions (see below for specification)
@everywhere using ParHelperFuncs

# the following single process call works
miniBatchSum([1,2,3], dbname)

# the following (which does it in parallel) does not work
# we generate some ids to split across the nodes
# each node will process sample_size values
# the ids are put into proc_idxs
sample_size = 10;
idxs = randperm(nsamples);
idxs = idxs[1:(nworkers()*sample_size)]
proc_idxs = Any[]
st_idx = 1;
en_idx = sample_size;
for i=1:nworkers()
    push!(proc_idxs, idxs[st_idx:en_idx]);
    st_idx = en_idx+1;
    en_idx = en_idx+sample_size;
end

# spawn and run across all worker nodes
k = 1;
remrefs = Array(Any, nworkers());
for proc in workers()
    println("Remote call to: ", proc);
    remrefs[k] = remotecall(proc, miniBatchSum, proc_idxs[k], dbname); 
    k += 1;
end

# collect the results
results = Array(Any, nworkers());
for k = 1:length(remrefs)
    wait(remrefs[k]);
    results[k] = fetch(remrefs[k]);
end

And in ParHelperFuncs.jl

module ParHelperFuncs

using LMDB

export getSamplesFromDb, miniBatchSum

# we pull samples from the database
function getSamplesFromDb(env, idxs::Array{Int})
    txn = start(env)
    dbi = open(txn)
    xs = Int[]
    for idx in idxs
        key = string(idx)
        val = get(txn, dbi, key, String);
        val = int(val)
        push!(xs, val)
    end
    commit(txn)  # release the read transaction before the environment is closed
    return xs
end

# sum up a mini-batch of values read from the database
function miniBatchSum(idxs, dbname::String)
    # open the database 
    println("Opening ", dbname)
    env = create()
    open(env, dbname)
    xs = getSamplesFromDb(env, idxs)
    close(env)

    # the cost is the sum of all the values we get from the db
    cost = 0.0
    for x in xs
        cost += x;
    end 

    return cost
end

end
@wildart
Owner

wildart commented Jul 19, 2015

Julia's model of parallel computing is process based, so structure your parallel program according to that model. Here is a quote from the LMDB Python wrapper docs: “Environments may be opened by multiple processes on the same host, making it ideal for working around Python’s GIL”. The same approach works in Julia: open a separate environment in each process.
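
A minimal sketch of that pattern, reusing only the LMDB.jl calls that already appear in the snippet above (create, open, start, get, commit, close); the helper name read_key and the pmap call are just illustration, not part of the package:

@everywhere using LMDB

# each worker opens its own environment, reads one key, and closes everything again;
# no environment or transaction handle crosses a process boundary
@everywhere function read_key(dbname, key)
    env = create()
    open(env, dbname)
    txn = start(env)
    dbi = open(txn)
    val = get(txn, dbi, key, String)
    commit(txn)   # release the read transaction
    close(env)
    return val
end

# each of the 10 reads runs on whatever worker pmap schedules it on
vals = pmap(k -> read_key("simpleseq.db", string(k)), 1:10)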

@haroldsoh
Author

Sorry wildart, I couldn't understand your response. In my code I am already creating multiple environments (one per process), i.e., each miniBatchSum() call does a create() followed by an open(env, dbname). Is this what you mean, or something else? Thanks!

@wildart
Owner

wildart commented Jul 19, 2015

Yes, in each process you create an environment, like in your miniBatchSum.

@wildart
Owner

wildart commented Jul 19, 2015

I'm getting a bunch of segfaults on your code. I am going to take a closer look at parallel execution while I overhaul the wrapper code, #6.

@wildart
Owner

wildart commented Aug 8, 2015

@haroldsoh I updated the package. It works with the above example now (example code updated).

@wildart
Owner

wildart commented Nov 17, 2017

I added a parallel access example to the repository: misc/ptest.jl
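
(Illustration only, not the contents of misc/ptest.jl.) A parallel read test in the spirit of this thread can be as small as comparing the serial and parallel results of the miniBatchSum function from the original post:

# assumes the database from the original post ("simpleseq.db", values 1:100)
# and that ParHelperFuncs is loadable on every worker, as in the first snippet
@everywhere using LMDB
@everywhere using ParHelperFuncs

idxs = collect(1:20)
serial   = miniBatchSum(idxs, "simpleseq.db")
parallel = sum(pmap(ix -> miniBatchSum(ix, "simpleseq.db"), [idxs[1:10], idxs[11:20]]))
println("serial = ", serial, ", parallel = ", parallel)  # both should be 210.0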

@wildart wildart closed this as completed Nov 17, 2017