Update Dask Example #21

bkmgit · 2021-05-23T16:27:01Z

It may not be realistic to assume that most clusters will allow setting up a web server for viewing the Dask scheduler dashboard.
Since most HPC clusters run a batch scheduler such as SLURM, it may be worth using dask-jobqueue instead. A possible example script is below:

import argparse

import dask.array as da
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

# Seed Dask's module-level random state so that runs are reproducible
da.random.seed(2021)


def inside_circle(total_count, chunk_size=-1):
    """Count random points in the unit square that fall inside the unit circle."""
    # chunk_size=-1 puts all samples into a single chunk
    x = da.random.uniform(size=total_count, chunks=chunk_size)
    y = da.random.uniform(size=total_count, chunks=chunk_size)
    # Compare the squared radius with 1 (no sqrt needed) and reduce on the
    # workers, so only the final count is sent back to the client
    count = (x * x + y * y <= 1.0).sum().compute()
    return int(count)


def estimate_pi(total_count, chunk_size):
    count = inside_circle(total_count, chunk_size)
    return 4.0 * count / total_count


def main():
    parser = argparse.ArgumentParser(
        description='Estimate Pi using a Monte Carlo method.')
    # nargs='?' makes the positional arguments optional so the defaults apply
    parser.add_argument('n_samples', metavar='N', type=int, nargs='?',
                        default=10000,
                        help='number of times to draw a random number')
    parser.add_argument('chunk_size', metavar='C', type=int, nargs='?',
                        default=1000,
                        help='number of samples per chunk')
    args = parser.parse_args()

    # Each SLURM job runs one worker process with 4 cores and 4 GB of memory
    cluster = SLURMCluster(cores=4,
                           processes=1,
                           memory="4GB",
                           walltime="00:10:00")
    # Submit SLURM jobs that start the workers; without this call the cluster
    # has no workers and the computation never runs.
    # 2 jobs is an arbitrary example value; adjust it to your allocation.
    cluster.scale(jobs=2)
    client = Client(cluster)

    my_pi = estimate_pi(args.n_samples, args.chunk_size)
    print("[dask version] pi is %f from %i samples with chunk size %i"
          % (my_pi, args.n_samples, args.chunk_size))

    # Release the workers and cancel the SLURM jobs
    client.close()
    cluster.close()


if __name__ == '__main__':
    main()

It may also be worth considering Ray.
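
To make the role of chunk_size concrete, here is a minimal serial sketch, using only the Python standard library, of the decomposition the Dask script expresses: the samples are split into chunks, each chunk is counted independently, and the partial counts are summed. It is not Dask's implementation, and the function names (count_inside, chunk_sizes, estimate_pi_chunked) are hypothetical.

```python
import random


def count_inside(n, rng):
    """Count how many of n random points in the unit square fall inside the unit circle."""
    hits = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def chunk_sizes(total_count, chunk_size):
    """Split total_count samples into chunks of chunk_size; the last chunk may be smaller."""
    full, rest = divmod(total_count, chunk_size)
    return [chunk_size] * full + ([rest] if rest else [])


def estimate_pi_chunked(total_count, chunk_size, seed=2021):
    # Each chunk gets its own deterministically seeded generator, so the
    # chunks are independent and could be evaluated on different workers.
    counts = [count_inside(n, random.Random(seed + i))
              for i, n in enumerate(chunk_sizes(total_count, chunk_size))]
    # Reduction step: the partial counts are simply added together.
    return 4.0 * sum(counts) / total_count


if __name__ == "__main__":
    print(estimate_pi_chunked(10000, 1000))
```

Smaller chunks expose more parallelism but add per-task overhead, which is the trade-off the chunk_size argument of the Dask script controls.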

psteinb · 2021-05-27T07:26:17Z

It may not be realistic to assume that most clusters will allow setting up a web server for viewing the Dask scheduler dashboard.

For sure. Whether this lesson can be delivered depends on the site you use. Typically, custom ports higher than 1024 are open for traffic on the landing pad (the login nodes).
I'm not sure whether things have changed. When I wrote the Dask part, dask-jobqueue was an additional package to install, and I decided not to use it in order to reduce dependencies. Maybe that needs to be revisited.
Ray is an interesting library. The main reason for showcasing Dask is to illustrate a paradigm shift that reached the parallel computing community in the 2000s: the client-server architecture, which can potentially auto-parallelize code. I must admit that this point was often hard to get across. In general, this lesson risks falling victim to feature envy, i.e. constantly updating the content as new, fashionable libraries come up. I think the focus should remain on concepts.
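
The client-server idea can be illustrated without Dask. The sketch below uses the standard library's concurrent.futures as a stand-in: the caller (the "client") submits independent tasks and immediately gets futures back, a pool of workers executes them, and the results are gathered at the end. Dask's Client.submit follows a similar futures-based pattern, but this is only an analogy, not how Dask's distributed scheduler works internally. ThreadPoolExecutor is used here only to keep the example portable; because of Python's GIL it will not speed up this pure-Python loop (a ProcessPoolExecutor would be needed for that). The function names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_inside(n, seed):
    """Worker task: count random points in the unit square inside the unit circle."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)


def estimate_pi_futures(total_count, n_tasks, max_workers=4):
    # Assumes total_count divides evenly into n_tasks; any remainder is dropped.
    per_task = total_count // n_tasks
    # The "client" only describes the work: each submit() returns a future
    # immediately, while the pool (standing in for the workers) runs the tasks.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(count_inside, per_task, seed)
                   for seed in range(n_tasks)]
        # Gathering the results blocks until every task has finished.
        total = sum(f.result() for f in futures)
    return 4.0 * total / (per_task * n_tasks)


if __name__ == "__main__":
    print(estimate_pi_futures(40000, 8))
```

Because each task seeds its own generator, the result does not depend on how many workers there are or in which order the tasks run.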

ocaisa · 2021-07-30T10:05:12Z

I have used dask-jobqueue a lot and have organised some tutorials on it. To me, it is a great way to introduce interactive supercomputing. It is really best used through JupyterHub, though, where you get really nice visualisations. This can also be made to work well with remote systems. There are great lessons out there in this respect, but of course they use Jupyter notebooks rather than a Carpentries template; for example, see https://github.com/ExaESM-WP4/workshop-Dask-Jobqueue-cecam-2021-02

ocaisa · 2021-07-30T10:49:07Z

There are solutions for that which could still allow us to stick (mostly) to the Carpentries template: for example, jekyllnb (https://jekyllnb.readthedocs.io/en/latest/) used within a GitHub Action could do this.
