Hitting API rate limit when creating docklets in parallel #108

Closed
simecek opened this issue Oct 8, 2015 · 6 comments


simecek commented Oct 8, 2015

I use analogsea to start DO machines for course participants (https://github.com/churchill-lab/sysgen2015). To send the same set of instructions to more than 30 docklets, I use a doParallel/foreach loop. For example, this pulls Docker images, including "churchill/doqtl", onto all docklets:

# pulling docker images
foreach(i = 1:N, .packages="analogsea") %dopar% {

  # select droplet
  d = droplet_list[[i]]

  # pull docker images
  d %>% docklet_pull("rocker/hadleyverse")
  d %>% docklet_pull("churchill/doqtl")
  d %>% docklet_images()
}

The problem appears when I try to parallelize docklet_create:

# starting docklet
droplet_list <- foreach(i = 1:N, .packages="analogsea") %dopar% {
  docklet_create(size = getOption("do_size", "8gb"),
                        region = getOption("do_region", "nyc2"))
}

For some reason, the package sent a huge number of API requests and hit the 5000 requests/hour API rate limit within a few seconds. I filed a ticket with Digital Ocean and got a graph of the number of requests per 5 minutes.

[Image: graph of the number of API requests per 5 minutes, provided by Digital Ocean]

When I use for instead of foreach, everything is fine (but slow).

droplet_list <- list()
for (i in 1:N) {
  print(i)
  # start i-th machine
  droplet_list[[i]] <- docklet_create(size = getOption("do_size", "8gb"),
                                      region = getOption("do_region", "nyc2"))
}

I believe the problem lies in docklet_create rather than in foreach or Digital Ocean.

Collaborator

sckott commented Oct 8, 2015

Thanks for the report @simecek - I'll have a look

@sckott sckott added this to the v0.4 milestone Oct 8, 2015
@sckott sckott self-assigned this Oct 8, 2015
Collaborator

sckott commented Oct 8, 2015

Hi again. Okay, I made a few small changes, so please reinstall from GitHub: devtools::install_github("sckott/analogsea")

The wait parameter is the key here. It is TRUE by default, which means we ping the DO API every 1 second to check whether the droplet is up yet. Once it's up, we exit the function call and return the droplet object.

You can set this to FALSE to skip all of those API pings. The returned object will be missing the IP address, though. You can do your own pinging until the droplet is up, or wait until it is up and then call droplet(d$id) to refresh the object's metadata.

I added an option, do.wait_time, that you can set. Its default is 1 second. So if you still want the wait to occur (pinging every X seconds until the droplet is up), you can do that with whatever time interval you like.

It makes sense that for would take a lot longer than foreach: since you had wait=TRUE, each droplet spin-up had to finish before the next could start.

let me know if the changes help.
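
To put rough numbers on the polling described above, here is a small back-of-the-envelope sketch. It counts only the once-per-interval status pings and ignores any other requests docklet_create makes. The helper names are hypothetical, and the worker count of 31 is just an illustrative value for "more than 30" droplets.

```r
# Hypothetical helpers: estimate API load from status polling alone.
# workers    = number of parallel docklet_create calls (illustrative value)
# interval_s = seconds between status pings (the do.wait_time option)

polls_per_hour <- function(workers, interval_s) {
  workers * 3600 / interval_s
}

seconds_to_limit <- function(workers, interval_s, limit = 5000) {
  limit * interval_s / workers
}

polls_per_hour(31, 1)    # 111600 pings/hour if polling ran for a full hour
polls_per_hour(31, 30)   # 3720 pings/hour, below the 5000/hour limit
seconds_to_limit(31, 1)  # seconds until 5000 pings at a 1 second interval
```

Under these assumptions, a 1 second interval cannot stay within the limit for long with this many workers, while a 30 second interval can.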

Author

simecek commented Oct 26, 2015

Hi sckott,

I reinstalled analogsea from GitHub and set do.wait_time to 30. It took longer to hit the API error, but I hit it anyway. I suspect that Sys.sleep in action_wait somehow does not work (= runs faster) when processed in parallel (as below):

library(parallel)
library(doParallel)
library("analogsea")

N <- 31
cl <- makeCluster(N)
registerDoParallel(cl)
options(do.wait_time=30)

droplet_list <- foreach(i = 1:N, .packages="analogsea") %dopar% {
  docklet_create(size = getOption("do_size", "512mb"),
                 region = getOption("do_region", "nyc2"))
}

However, when I set wait to FALSE, everything works fine and, as you suggested, I used the droplet function to get the IP later.

Thank you very much for your help. From my perspective, the issue is resolved.
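
The "create with wait = FALSE, then fetch the metadata later" approach can be sketched as a small polling helper that refreshes all pending droplets in rounds from a single process, so the request rate stays easy to reason about. This is a minimal sketch, not analogsea code: refresh_until_ready and its arguments are hypothetical names, and the fetch and is_ready functions are supplied by the caller.

```r
# Hypothetical helper: refresh items in rounds until all are ready.
# ids        = identifiers to look up
# fetch      = function(id) returning fresh metadata (caller-supplied)
# is_ready   = function(x) returning TRUE once the item is usable
# interval   = seconds to sleep between rounds
# max_rounds = give up after this many rounds
refresh_until_ready <- function(ids, fetch, is_ready,
                                interval = 30, max_rounds = 20) {
  results <- vector("list", length(ids))
  pending <- seq_along(ids)
  calls <- 0
  for (round in seq_len(max_rounds)) {
    for (i in pending) {
      # list() wrapper keeps the slot even if fetch() returns NULL
      results[i] <- list(fetch(ids[[i]]))
      calls <- calls + 1
    }
    ready <- vapply(results[pending], is_ready, logical(1))
    pending <- pending[!ready]
    if (length(pending) == 0) break
    if (round < max_rounds) Sys.sleep(interval)
  }
  list(results = results, calls = calls, pending = pending)
}
```

With analogsea, fetch could be the droplet function mentioned above and ids the d$id values of the objects returned by docklet_create(wait = FALSE). What is_ready should test is an assumption left to the caller, for example a status field equal to "active" if the returned object exposes one.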

@simecek simecek closed this as completed Oct 26, 2015
Collaborator

sckott commented Oct 26, 2015

@simecek Glad it's resolved.

I am suspicious that Sys.sleep in action_wait somehow does not work (=runs faster) when processed in parallel (as below)

Do you know whether your rate limit was already at its max when you tried that? I'll test this out and see if the wait time is ignored.

Author

simecek commented Oct 27, 2015

I re-ran the code and found the bug: do.wait_time needs to be set inside the foreach loop. With the modified version below, everything works fine and I do not get an API error. Thank you once more.

library(parallel)
library(doParallel)
library("analogsea")

N <- 31
cl <- makeCluster(N)
registerDoParallel(cl)

droplet_list <- foreach(i = 1:N, .packages="analogsea") %dopar% {
  options(do.wait_time=30)
  docklet_create(size = getOption("do_size", "512mb"),
                 region = getOption("do_region", "nyc2"))
}
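
A likely explanation for why the option has to be set inside the loop: with the default socket cluster from makeCluster, each worker is a separate R process, and options() set in the master session are not copied to it. The sketch below shows this using only the parallel package, with a two-worker cluster as an illustrative size. It also shows an alternative to setting the option in every iteration, namely setting it once per worker with clusterEvalQ.

```r
library(parallel)

# Two socket workers are enough to show the effect (illustrative size)
cl <- makeCluster(2)

# Set the option in the master session only
options(do.wait_time = 30)

# Ask each worker what it sees: the option is unset there
before <- clusterEvalQ(cl, getOption("do.wait_time"))

# Set the option once on every worker
clusterEvalQ(cl, options(do.wait_time = 30))

# Now each worker sees the value
after <- clusterEvalQ(cl, getOption("do.wait_time"))

stopCluster(cl)
```

Here before holds NULL for every worker and after holds 30. Fork-based clusters behave differently, since forked workers inherit the master's state at the time of the fork.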

Collaborator

sckott commented Oct 27, 2015

Great, glad it worked. I'll make a note in the docs about this so other users don't have to run into the same problem.

sckott added a commit that referenced this issue Oct 27, 2015