Update Dask Example #21

Open
bkmgit opened this issue May 23, 2021 · 3 comments

Comments

@bkmgit
Contributor

bkmgit commented May 23, 2021

  • It may not be realistic to assume that most clusters will allow a web server to be set up for viewing the scheduler.
  • Since most HPC clusters have batch schedulers, it may be worth using dask-jobqueue. A possible example script is below:
import argparse

import dask.array as da
from dask.distributed import Client
from dask_jobqueue import SLURMCluster


def inside_circle(total_count, chunk_size=-1):
    # Lazily draw points in the unit square; chunk_size=-1 puts everything in one chunk
    x = da.random.uniform(size=total_count, chunks=chunk_size)
    y = da.random.uniform(size=total_count, chunks=chunk_size)
    # Count the points inside the quarter circle on the workers and
    # return only the resulting integer to the client
    return int(((x * x + y * y) <= 1.0).sum().compute())


def estimate_pi(total_count, chunk_size):
    count = inside_circle(total_count, chunk_size)
    return 4.0 * count / total_count


def main():
    parser = argparse.ArgumentParser(
        description='Estimate Pi using a Monte Carlo method.')
    # nargs='?' makes the positional arguments optional so the defaults apply
    parser.add_argument('n_samples', metavar='N', type=int, nargs='?',
                        default=10000,
                        help='number of times to draw a random number')
    parser.add_argument('chunk_size', metavar='CHUNK', type=int, nargs='?',
                        default=1000,
                        help='chunk size')
    args = parser.parse_args()

    da.random.seed(2021)

    # Each SLURM job provides one worker process with 4 cores and 4GB of memory
    cluster = SLURMCluster(cores=4,
                           processes=1,
                           memory="4GB",
                           walltime="00:10:00")
    # Request one worker job; without scaling, no workers start and compute() waits forever
    cluster.scale(jobs=1)
    client = Client(cluster)

    my_pi = estimate_pi(args.n_samples, args.chunk_size)
    print("[dask version] pi is %f from %i samples with chunk size %i"
          % (my_pi, args.n_samples, args.chunk_size))

    client.close()
    cluster.close()


if __name__ == '__main__':
    main()
  • It may also be worth considering Ray
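
On the first point above, one quick way to find out whether the scheduler's web view can be reached at a given site is a plain TCP connection test. The sketch below uses only the Python standard library. The helper name `port_is_open` is hypothetical, and port 8787 is assumed because it is Dask's default dashboard port; a site may well configure a different one.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed name lookups
        return False


if __name__ == "__main__":
    # Assumption: the dashboard listens on the default port 8787.
    # Replace "localhost" with the node that runs the Dask scheduler.
    print(port_is_open("localhost", 8787))
```

A result of `False` only says that the port is not reachable from where the check runs; it does not tell you whether the site blocks the port or whether no scheduler is running.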
@psteinb
Member

psteinb commented May 27, 2021

It may not be realistic to assume that most clusters will allow setting up of a webserver for viewing the scheduler.

  • For sure. Whether this lesson can be delivered depends on the site you use. Typically, custom ports higher than 1024 are open for traffic on the landing pad (login nodes).

  • I'm not sure if things have changed. When I wrote the dask part, dask-jobqueue was an additional package to install. I decided not to use it in order to reduce dependencies. Maybe that needs to be revisited.

  • Ray is an interesting library. The main consideration for showcasing dask is to illustrate a paradigm shift that reached the parallel computing community in the 2000s, i.e. the client-server architecture, which can potentially auto-parallelize code. I must admit that this point was often hard to get across. In general, this lesson risks falling victim to feature envy, i.e. constantly updating the content with whatever new libraries are fashionable. I think the focus should remain on concepts.
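
To make the client-server idea from the last point concrete without any extra dependency, here is a minimal sketch that uses the standard library's `concurrent.futures` instead of dask. This is a swapped-in stand-in, not the lesson's code: the client submits chunks of work and collects futures, while the executor decides when and where each chunk runs. The function names and the per-chunk seeding scheme are assumptions made for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_inside(n_samples: int, seed: int) -> int:
    """Count random points in the unit square that fall inside the quarter circle."""
    rng = random.Random(seed)  # one generator per chunk keeps results reproducible
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside


def estimate_pi(n_samples: int, n_chunks: int) -> float:
    """Estimate pi by submitting one counting task per chunk to an executor."""
    chunk = n_samples // n_chunks
    with ThreadPoolExecutor(max_workers=4) as executor:
        # submit() returns a future right away; the executor schedules the work
        futures = [executor.submit(count_inside, chunk, seed)
                   for seed in range(n_chunks)]
        total_inside = sum(f.result() for f in futures)
    return 4.0 * total_inside / (chunk * n_chunks)


if __name__ == "__main__":
    print("pi is approximately %f" % estimate_pi(100_000, 10))
```

Threads will not speed up this CPU-bound loop in CPython, so the sketch shows the programming model only. `dask.distributed`'s `Client` offers a similar `submit` call that returns futures, with the work carried out by remote workers.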

@ocaisa
Member

ocaisa commented Jul 30, 2021

I have used dask-jobqueue a lot and have organised some tutorials on it. To me, it is a great way to introduce interactive supercomputing. It really is best used through JupyterHub though, where you get really nice visualisations. This can also be made to work well with remote systems. There are great lessons out there in this respect, but of course they use Jupyter notebooks rather than a Carpentries template; for example, see https://github.com/ExaESM-WP4/workshop-Dask-Jobqueue-cecam-2021-02

@ocaisa
Member

ocaisa commented Jul 30, 2021

There are solutions for that which could still allow us to stick (mostly) to the Carpentries template: https://jekyllnb.readthedocs.io/en/latest/ used within a GitHub Action could do this.


3 participants