for loops #63

Open
kkranker opened this issue Apr 2, 2018 · 8 comments

@kkranker

kkranker commented Apr 2, 2018

I do not understand how to implement a for loop using the parallel command. I have something like this:

local ellist A B C D E F
foreach el of local ellist {
  cmd, option(`el')
}

... and I want to execute each instance of cmd in parallel rather than in sequence. That is, for each pass through the loop, I want to fire up a new instance of Stata to run cmd. If the length of ellist is greater than the number of clusters, I'd want parallel to manage the workload so that (1) the number of loops running at one time equals the number of clusters and (2) the next loop starts when a cluster becomes available.

Can the parallel command do this kind of thing? How? Does it make a difference if I'm working in Stata versus Mata? (I'm working in Mata.)

Thanks,
Keith

P.S. I see a parallel_for Mata program is in development, but I don't know how to use it.
P.P.S. I considered using pll_id inside the definition of cmd, but the problem is that the number of processes run will equal the number of clusters, not the number of elements in ellist.
P.P.P.S. If you can do this with for loops, can you also do it with while and do loops?

@gvegayon
Owner

gvegayon commented Apr 2, 2018

In the following example we create a program called myloop that copies data from the variable ellist to a new variable (named by the argument vname) by looping over the observations of ellist.

// Setup
clear all
set trace off
set more off

parallel setclusters 4

// Test data. You can specify the elements you want to loop in parallel here:
set obs 6
quietly {
  gen ellist = "A" if _n == 1
  replace ellist = "B" if _n == 2
  replace ellist = "C" if _n == 3
  replace ellist = "D" if _n == 4
  replace ellist = "E" if _n == 5
  replace ellist = "F" if _n == 6
}

// This program copies ellist into vname
program def myloop
    args vname
    
    // Creating the variable
    gen `vname' = ""

  // Looping through the data
    forval i = 1/`=_N' {
        qui replace `vname' = ellist[`i'] if _n == `i'
    }
end

// Calling the program in serial fashion
myloop ellist2

// Calling the program using parallel, we need to pass the program in prog
parallel, prog(myloop): myloop ellist2_pll

// Do we get the same output?
list 


// Same example but using mata --------------------------------------------------

mata
void myfunction(string scalar vname) {
    
    // Creating the data
    (void) st_addvar("str10", vname);
    
    string matrix D, A;
    D = st_sdata(., "ellist");
    A = st_sdata(.,vname);
    
    numeric scalar i;
    for (i = 1; i <= rows(A); i++)
        A[i] = D[i];
    
    st_sstore(.,vname, A);
    return;
}
end

// Serial and parallel fashion
m : myfunction("ellist_mata")
parallel, mata: m: myfunction("ellist_mata_pll")


// Do we get the same?
list
## 
## . // Setup
## . clear all
## 
## . set trace off
## 
## . set more off
## 
## . 
## . parallel setclusters 4
## N Clusters: 4
## Stata dir:  /usr/local/stata12/stata
## 
## . 
## . // Test data. You can specify the elements you want to loop in parallel here:
## . set obs 6
## obs was 0, now 6
## 
## . quietly {
## 
## . 
## . // This program copies ellist into vname
## . program def myloop
##   1.         args vname
##   2.         
## .         // Creating the variable
## .         gen `vname' = ""
##   3. 
## .   // Looping through the data
## .         forval i = 1/`=_N' {
##   4.                 qui replace `vname' = ellist[`i'] if _n == `i'
##   5.         }
##   6. end
## 
## . 
## . // Calling the program in serial fashion
## . myloop ellist2
## (6 missing values generated)
## 
## . 
## . // Calling the program using parallel, we need to pass the program in prog
## . parallel, prog(myloop): myloop ellist2_pll
## -------------------------------------------------------------------------------
## > -
## Exporting the following program(s): myloop
## 
## myloop:
##   1.         args vname
##   2.         gen `vname' = ""
##   3.         forval i = 1/`=_N' {
##   4.                 qui replace `vname' = ellist[`i'] if _n == `i'
##   5.         }
## -------------------------------------------------------------------------------
## > -
## -------------------------------------------------------------------------------
## Parallel Computing with Stata
## Clusters   : 4
## pll_id     : 3nc1i8tzl1
## Running at : /home/george/Documents/parallel/playground
## Randtype   : datetime
## 
## Waiting for the clusters to finish...
## cluster 0001 has exited without error...
## cluster 0002 has exited without error...
## cluster 0003 has exited without error...
## cluster 0004 has exited without error...
## -------------------------------------------------------------------------------
## Enter -parallel printlog #- to checkout logfiles.
## -------------------------------------------------------------------------------
## 
## . 
## . // Do we get the same output?
## . list 
## 
##      +-----------------------------+
##      | ellist   ellist2   ellist~l |
##      |-----------------------------|
##   1. |      A         A          A |
##   2. |      B         B          B |
##   3. |      C         C          C |
##   4. |      D         D          D |
##   5. |      E         E          E |
##      |-----------------------------|
##   6. |      F         F          F |
##      +-----------------------------+
## 
## . 
## . 
## . // Same example but using mata ----------------------------------------------
## > ----
## . 
## . mata
## ------------------------------------------------- mata (type end to exit) -----
## : void myfunction(string scalar vname) {
## >         
## >         // Creating the data
## >         (void) st_addvar("str10", vname);
## >         
## >         string matrix D, A;
## >         D = st_sdata(., "ellist");
## >         A = st_sdata(.,vname);
## >         
## >         numeric scalar i;
## >         for (i = 1; i <= rows(A); i++)
## >                 A[i] = D[i];
## >         
## >         st_sstore(.,vname, A);
## >         return;
## > }
## 
## : end
## -------------------------------------------------------------------------------
## 
## . 
## . // Serial and parallel fashion
## . m : myfunction("ellist_mata")
## 
## . parallel, mata: m: myfunction("ellist_mata_pll")
## -------------------------------------------------------------------------------
## Parallel Computing with Stata
## Clusters   : 4
## pll_id     : 3nc1i8tzl3
## Running at : /home/george/Documents/parallel/playground
## Randtype   : datetime
## 
## Waiting for the clusters to finish...
## cluster 0001 has exited without error...
## cluster 0002 has exited without error...
## cluster 0003 has exited without error...
## cluster 0004 has exited without error...
## -------------------------------------------------------------------------------
## Enter -parallel printlog #- to checkout logfiles.
## -------------------------------------------------------------------------------
## 
## . 
## . 
## . // Do we get the same?
## . list
## 
##      +---------------------------------------------------+
##      | ellist   ellist2   el~2_pll   ellist~a   el~a_pll |
##      |---------------------------------------------------|
##   1. |      A         A          A          A          A |
##   2. |      B         B          B          B          B |
##   3. |      C         C          C          C          C |
##   4. |      D         D          D          D          D |
##   5. |      E         E          E          E          E |
##      |---------------------------------------------------|
##   6. |      F         F          F          F          F |
##      +---------------------------------------------------+
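
Applied to the loop in the original question, the same pattern might look like the sketch below. It is an illustration under stated assumptions, not a tested recipe: cmd is the placeholder command from the question, runlist and status are hypothetical names chosen here, and the code relies on the pll_instance global, which parallel's help file describes as holding the number of the child process. Each element of the list becomes one observation, parallel hands each cluster a share of the observations, and the program loops over whatever rows it receives.

```stata
// Sketch only: -cmd- is the placeholder command from the question,
// and runlist/status are hypothetical names used for illustration.
clear all
parallel setclusters 4

// One observation per element of the list
local ellist A B C D E F
set obs `: word count `ellist''
gen str10 ellist = ""
forvalues i = 1/`=_N' {
    quietly replace ellist = "`: word `i' of `ellist''" in `i'
}

program define runlist
    // Inside a child cluster, _N counts only the rows given to that child
    gen str40 status = ""
    forvalues i = 1/`=_N' {
        local el = ellist[`i']
        // Put the real work here, for example: cmd, option(`el')
        quietly replace status = "`el' done on cluster $pll_instance" in `i'
    }
end

// prog() exports the program definition to the child clusters
parallel, prog(runlist): runlist
list
```

Note that parallel divides the observations among the clusters before the children start, so this gives a fixed split of the list rather than the on-demand queue described in point (2) of the question. If some elements take much longer than others, the clusters may finish at different times.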

@kkranker
Author

kkranker commented Apr 3, 2018

Thank you -- storing the element list and the output inside of variables is a neat trick that I hadn't thought of. I'll play around with this idea and see if I can make it work.

@kkranker kkranker closed this as completed Apr 3, 2018
@kkranker kkranker reopened this Feb 22, 2019
@kkranker
Author

I had this working a while ago, but I'm finding that something broke. Perhaps there is a bug in the latest version of your code? When I run the code above, I get the error

cluster #### Exited with error -3499- while running the command/dofile (view log)...

The log files have:

 /* Checking for break */
. mata: parallel_break()
.     m: myfunction("ellist_mata_pll")
                 <istmt>:  3499  myfunction() not found
r(3499);
.   }

It appears the Mata function (myfunction) is not being passed to the child clusters. Any suggestions?

I'm using the latest version of parallel from SSC. I can't figure out how to install directly from GitHub.

. which parallel
C:\Users\kkranker\Documents\Stata\Ado\plus\p\parallel.ado
*! version 1.15.8.19  19agol2015
*! PARALLEL: Stata module for parallel computing
*! by George G. Vega [cre,aut], Brian Quistorff [ctb]

@gvegayon
Owner

Perhaps you updated Stata? As you can see, the SSC version is pretty old. Instructions to install the dev version are here: https://github.com/gvegayon/parallel#development-version-latestmaster. Try following those and let us know.

@kkranker
Author

kkranker commented Feb 27, 2019 via email

@gvegayon
Owner

You should try downloading another version directly as a zip file as explained here: https://github.com/gvegayon/parallel/tree/sj-review#development-version-latestmaster
Follow those instructions and let us know how it goes.

@bquistorff
Collaborator

I just reconfirmed that the net install from GitHub worked for me on Stata 14 on Windows (and I've done it previously on Linux). It might be something with Stata v15 or with your local setup; it's hard to tell.

@kkranker
Author

kkranker commented Feb 27, 2019 via email
