
Remove the Dask Compute #131

Closed · rosepearson opened this issue Dec 19, 2022 · 8 comments · Fixed by #183

Labels: enhancement (New feature or request), performance

Comments

@rosepearson (Owner)

Remove the explicit Dask compute call (a sketch of the lazy-write alternative is below).
[image]

  1. Deal with this mask:
     [image]
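
As a rough illustration of the intent (the variable and file names below are placeholders, not the actual GeoFabrics code): instead of materialising the whole chunked DEM in memory with an explicit .compute() before writing, xarray can write the dask-backed dataset chunk by chunk, or return a delayed write that is triggered later.

import xarray

# hypothetical dask-backed DEM, chunked on open
dem = xarray.open_dataset("raw_dem_8m_input.nc", chunks={"x": 100, "y": 100})

# pattern to remove: forces every chunk into memory at once before writing
# dem = dem.compute()
# dem.to_netcdf("raw_dem_8m.nc")

# lazy alternative: to_netcdf computes and writes the chunks as it goes
dem.to_netcdf("raw_dem_8m.nc")

# or defer the write entirely and trigger it later (e.g. once a distributed client exists)
delayed_write = dem.to_netcdf("raw_dem_8m_deferred.nc", compute=False)
delayed_write.compute()
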
@rosepearson changed the title from "Remote the Dask Compute" to "Remove the Dask Compute" on Dec 19, 2022
@jennan (Collaborator) commented Dec 19, 2022

some lines that could cause issues

@rosepearson added the enhancement (New feature or request) and performance labels on Jan 8, 2023
@rosepearson (Owner, Author) commented May 11, 2023

Not quite the same subject, but when picking this up again we could look at using Dask with pandas to remove the horrendous bottleneck that exists when calculating the 'open waterway elevations'. See https://github.com/rosepearson/GeoFabrics/blob/main/src/geofabrics/processor.py#L2386 (move to issue #130).
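
For illustration only, a minimal sketch of the dask-with-pandas idea, assuming the waterway calculation can be expressed per partition; the file name, column names, and estimate_elevations helper are hypothetical, not the code in processor.py:

import dask.dataframe as dd
import pandas as pd

def estimate_elevations(waterways: pd.DataFrame) -> pd.DataFrame:
    # hypothetical per-waterway calculation standing in for the real logic
    waterways = waterways.copy()
    waterways["elevation"] = waterways["bed_level"] - waterways["depth"]
    return waterways

waterways = pd.read_csv("open_waterways.csv")  # hypothetical input table

# split the pandas frame into partitions and run the calculation in parallel
ddf = dd.from_pandas(waterways, npartitions=20)
result = ddf.map_partitions(estimate_elevations).compute()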

@rosepearson (Owner, Author)

Instructions for setup:

# load the NeSI environment modules and set the Cylc version and project code
module purge
module load NeSI
module load Miniconda3
export CYLC_VERSION=8.0.3
export PROJECT=niwa03440

# activate the shared conda environment (set +u avoids unbound-variable errors from conda's scripts)
set +u
module load Anaconda3
source $(conda info --base)/etc/profile.d/conda.sh
conda activate /nesi/project/niwa03440/conda/envs/htop
set -u

Running the code:

geofabrics_from_file --instruction /nesi/project/niwa03440/geofabrics/instructions/performance/issue131/small.json

@jennan (Collaborator) commented Jun 26, 2023

I have started some changes in the compute_to_disk branch. The part that saves results from the computation works, but the code crashes before the end (likely due to me commenting out saving the extents 😅).

@rosepearson (Owner, Author) commented Jun 26, 2023

Thanks for the catch-up. It's looking promising. Looking forward to hearing how the large example goes.

At this stage:

  1. I am expecting to only consider a more advanced Dask scheduler in a later issue.
  2. I am expecting to address various TODOs in the code, including moving, removing, or updating the 'get extents' code.

See issue #153 for notes around planned changes to how, or if, the extents are calculated.

@rosepearson linked a pull request on Jul 3, 2023 that will close this issue
@jennan (Collaborator) commented Jul 7, 2023

I have run the large example overnight, using a Māui ancil node. It took 6 hours and a maximum of 40 GB of memory.

A few additional notes (a minimal sketch of this worker/output configuration follows these notes):

  • 20 workers and a memory limit of 5 GB each (20 cores and 100 GB requested via Slurm)
  • chunk_size=100:
    • this lowered the memory pressure on workers but created more tasks (than chunk_size=200)
    • with chunk_size=200, workers kept being restarted due to too-high memory usage (and I wanted to avoid allocating too much memory per worker), and restarting a worker implies recomputing data
  • switched to the .zarr output format:
    • with netcdf, workers have to wait for a lock to write to disk
    • with chunk_size=100 + netcdf, the workers wait too long, leading to poor utilisation (low % of CPU usage) per worker
    • none of this happened with zarr

Ideally I should add the timing of converting from zarr to netcdf.
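
For reference, a minimal sketch of the worker/output configuration described in these notes; the cluster parameters mirror the notes, while the file names (and threads_per_worker=1) are assumptions rather than the actual GeoFabrics code:

import xarray
from dask.distributed import Client, LocalCluster

# 20 workers with a 5 GB memory limit each, matching the 20 cores / 100 GB Slurm request
cluster = LocalCluster(n_workers=20, threads_per_worker=1, memory_limit="5GB")
client = Client(cluster)

# open the (hypothetical) input lazily with 100 x 100 pixel chunks
dem = xarray.open_dataset("raw_dem_8m_input.nc", chunks={"x": 100, "y": 100})

# writing to zarr avoids the file lock that serialises netcdf writes across workers
dem.to_zarr("raw_dem_8m.zarr", mode="w")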

Efficiency of the job

 $ nn_seff 30023700
Cluster: maui_ancil
Job ID: 30023700
State: COMPLETED
Cores: 10
Tasks: 1
Nodes: 1
Job Wall-time:   59.2%  05:54:55 of 10:00:00 time limit
CPU Efficiency: 186.7%  4-14:26:59 of 2-11:09:10 core-walltime
Mem Efficiency:  37.0%  36.97 GB of 100.00 GB

And here is the Slurm job I used, for the record:

#!/usr/bin/env bash
#SBATCH --time=00-10:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=20
#SBATCH --mem=100GB
#SBATCH --output logs/%j-%x.out
#SBATCH --error logs/%j-%x.out

# exit on errors, undefined variables and errors in pipes
set -euo pipefail

# load required environment modules
module purge
module load NeSI
module load Miniconda3/22.11.1-1

# activate conda environment
set +u
source $(conda info --base)/etc/profile.d/conda.sh 
conda deactivate  # workaround for https://github.com/conda/conda/issues/9392
export PYTHONNOUSERSITE=1
conda activate ./venv_new
set -u

# disable spilling on disk and increase threshold to pause/kill dask worker
export DASK_DISTRIBUTED__WORKER__MEMORY__TARGET=False
export DASK_DISTRIBUTED__WORKER__MEMORY__SPILL=False
export DASK_DISTRIBUTED__WORKER__MEMORY__PAUSE=0.90
export DASK_DISTRIBUTED__WORKER__MEMORY__TERMINATE=0.95

# change output folder in the .json file
JSON_FILE="issue131/results/large_${SLURM_JOB_ID}.json"
sed "s/\"subfolder\": \"large\"/\"subfolder\": \"large_${SLURM_JOB_ID}\"/" \
    issue131/large.json > "${JSON_FILE}"

# run the large example
time geofabrics_from_file --instruction "${JSON_FILE}"

@jennan (Collaborator) commented Jul 9, 2023

Kia ora @rosepearson,

I have run more jobs to compare different scenarios (netcdf vs. zarr, smaller or larger chunks, with or without clipping). Here are my (unsorted 😅) notes and my conclusions/recommendations.

job 30020508

  • 20 cores, 100 GB
  • chunk_size=200, number_of_cores=20, memory_limit=5GiB
  • crashed after 20 min, when workers restarted due to a spike in memory usage

job 30021473

  • 20 cores, 100 GB
  • chunk_size=100, number_of_cores=20, memory_limit=5GB
  • tried export DASK_DISTRIBUTED__COMM__RETRY__COUNT=3
  • still saw timeout messages
  • manually stopped as CPU usage was quite low

job 30022496

  • 20 cores, 200 GB
  • chunk_size=400, number_of_cores=20, memory_limit=10GB
  • manually stopped as workers restarted due to a spike in memory usage

job 30022806

  • 20 cores, 100 GB
  • chunk_size=200, number_of_cores=20, memory_limit=5GB
  • save as .zarr
  • good cpu utilisation (> 90%)
  • manually stopped as workers restarted due to a spike in memory usage

Note: store-map tasks are rescheduled when a worker is restarted... but they could probably be cancelled once the data is written to disk.

job 30023700

  • 20 cores, 100 GB
  • chunk_size=100, number_of_cores=20, memory_limit=5GB
  • without rio.clip
  • save as .zarr
  • good cpu utilisation (> 90%)
  • lower worker memory usage (than chunk_size=200), seems < 2.5GiB / worker
 $ nn_seff 30023700
Cluster: maui_ancil
Job ID: 30023700
State: COMPLETED
Cores: 10
Tasks: 1
Nodes: 1
Job Wall-time:   59.2%  05:54:55 of 10:00:00 time limit
CPU Efficiency: 186.7%  4-14:26:59 of 2-11:09:10 core-walltime
Mem Efficiency:  37.0%  36.97 GB of 100.00 GB
  • output size
$ du -sh issue131/results/large_30023700/raw_dem_8m.zarr/
3.2G    issue131/results/large_30023700/raw_dem_8m.zarr/

job 30041464

  • 20 cores, 100 GB
  • chunk_size=100, number_of_cores=20, memory_limit=5GB
  • with rio.clip
  • save as .zarr
  • failed due to a chunking pattern not supported by zarr (see the rechunking sketch after the traceback below)
ValueError: Zarr requires uniform chunk sizes except for final chunk. Variable named 'data_source' has incompatible dask chunks: ((38, 100, 100, ..., 100), (100, 100, ..., 100)). Consider rechunking using `chunk()`.
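
A minimal, self-contained sketch of the fix this error points at (the array shape and names are made up to reproduce the undersized leading chunk): rechunk to uniform sizes before writing, so that only the final chunk in each dimension may be smaller.

import dask.array as da
import xarray

# toy stand-in for the clipped DEM: clipping left a leading chunk of 38 rows
data = da.zeros((1838, 1900), chunks=((38,) + (100,) * 18, (100,) * 19), dtype="float32")
dem = xarray.Dataset({"data_source": (("y", "x"), data)})

# dem.to_zarr("raw_dem_8m.zarr")          # raises the ValueError above
dem = dem.chunk({"y": 100, "x": 100})     # rechunk to uniform chunk sizes
dem.to_zarr("raw_dem_8m.zarr", mode="w")  # now accepted by zarr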

job 30059619

  • 20 cores, 400 GB
  • chunk_size=200, number_of_cores=20, memory_limit=20GB
  • with rio.clip
  • save as .nc
  • bad cpu utilisation (~30%)
  • bad ratio compute vs. transfer (almost 3-to-1)
  • workers seem to use less than 3GB in general (but go to ~5GB sometimes)
  • error messages at the end but it seems that it ran to completion
$ nn_seff 30059619
Cluster: maui_ancil
Job ID: 30059619
State: COMPLETED
Cores: 10
Tasks: 1
Nodes: 1
Job Wall-time:   40.1%  04:00:47 of 10:00:00 time limit
CPU Efficiency: 139.4%  2-07:55:47 of 1-16:07:50 core-walltime
Mem Efficiency:  15.1%  60.39 GB of 400.00 GB
  • output size
$ du -sh issue131/results/large_30059619/raw_dem_8m.zarr 
4.2G    issue131/results/large_30059619/raw_dem_8m.zarr

job 30064157

  • 20 cores, 100 GB
  • chunk_size=200, number_of_cores=20, memory_limit=10GB
  • without rio.clip
  • save as .zarr + Blosc compression at clevel=4 (see the encoding sketch after this job's output size)
  • good cpu utilisation (~95%)
  • workers seem to use less than 4GB in general (but go to ~5GB sometimes)
$ nn_seff 30064157
Cluster: maui_ancil
Job ID: 30064157
State: COMPLETED
Cores: 10
Tasks: 1
Nodes: 1
Job Wall-time:   34.0%  03:24:03 of 10:00:00 time limit
CPU Efficiency: 193.2%  2-17:42:40 of 1-10:00:30 core-walltime
Mem Efficiency:  56.9%  56.89 GB of 100.00 GB
  • lighter output
$ du -sh issue131/results/large_30064157/raw_dem_8m.zarr/
828M    issue131/results/large_30064157/raw_dem_8m.zarr/
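
For reference, a minimal sketch of writing compressed zarr output along these lines, assuming the zarr v2 format with a numcodecs Blosc compressor; only clevel=4 comes from the notes, while the cname, file names, and chunk sizes are assumptions:

import xarray
from numcodecs import Blosc

dem = xarray.open_dataset("raw_dem_8m_input.nc", chunks={"x": 200, "y": 200})  # hypothetical input

# compress every variable's zarr chunks with Blosc at compression level 4
compressor = Blosc(cname="lz4", clevel=4)
encoding = {name: {"compressor": compressor} for name in dem.data_vars}
dem.to_zarr("raw_dem_8m.zarr", mode="w", encoding=encoding)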

job 30069881

  • 20 cores, 100 GB
  • chunk_size=200, number_of_cores=20, memory_limit=10GB
  • without rio.clip
  • save as .nc
  • good cpu utilisation (~95%)
  • workers seem to use less than 4GB in general (but sometimes go a little above)
$ nn_seff 30069881
Cluster: maui_ancil
Job ID: 30069881
State: COMPLETED
Cores: 10
Tasks: 1
Nodes: 1
Job Wall-time:   33.2%  03:19:25 of 10:00:00 time limit
CPU Efficiency: 192.0%  2-15:48:21 of 1-09:14:10 core-walltime
Mem Efficiency:  54.3%  54.30 GB of 100.00 GB
  • output size (wooops, I forgot to change extension to .nc)
$ du -sh issue131/results/large_30069881/raw_dem_8m.zarr 
4.2G    issue131/results/large_30069881/raw_dem_8m.zarr

job 30073973

  • 20 cores, 200 GB
  • chunk_size=200, number_of_cores=20, memory_limit=15GB
  • without rio.clip
  • save as .nc + zlib compression at level=4 for "z" (see the encoding sketch after this job's output size)
  • good cpu utilisation (~95%)
  • workers seem to use less than 7GB in general
$ nn_seff 30073973
Cluster: maui_ancil
Job ID: 30073973
State: COMPLETED
Cores: 10
Tasks: 1
Nodes: 1
Job Wall-time:   27.5%  02:44:48 of 10:00:00 time limit
CPU Efficiency: 190.6%  2-04:21:23 of 1-03:28:00 core-walltime
Mem Efficiency:  40.9%  81.90 GB of 200.00 GB
  • output size, surprisingly larger than before
$ du -sh issue131/results/large_30073973/raw_dem_8m.nc 
9.0G    issue131/results/large_30073973/raw_dem_8m.nc
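
For reference, the netcdf-side equivalent of this compression would look roughly like the following, assuming the elevation variable is named "z" as in the notes and the netCDF4 backend is used; file names and chunk sizes are placeholders:

import xarray

dem = xarray.open_dataset("raw_dem_8m_input.nc", chunks={"x": 200, "y": 200})  # hypothetical input

# deflate the "z" variable with zlib at compression level 4 when writing netcdf
encoding = {"z": {"zlib": True, "complevel": 4}}
dem.to_netcdf("raw_dem_8m.nc", encoding=encoding)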

Conclusions

zarr vs. netcdf

  • zarr managed to produce lighter outputs (thanks to compressing chunks)
  • surprisingly, netcdf didn't compress the outputs well (probably me not using the compression properly)
  • in terms of timing, there is no real difference, so no need to replace netcdf with zarr
  • zarr didn't support clipping, i.e. it doesn't like chunks of different sizes (final chunks excepted, which is OK, but here the first row/column of chunks had different sizes)

clip vs no clip

  • clipping lowers the CPU utilisation of the dask workers (a sketch of the clipping step is below)
  • clipping has a negative impact on time, but not a large one (4h with clipping vs. 3h19m without, see job 30059619 vs. 30069881)
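
For context, the clipping being compared here is the rio.clip step; a minimal, hypothetical sketch of that operation (the file names and the EPSG:2193 CRS are assumptions, not the GeoFabrics code):

import geopandas
import rioxarray  # registers the .rio accessor on xarray objects
import xarray

dem = xarray.open_dataset("raw_dem_8m_input.nc", chunks={"x": 200, "y": 200})  # hypothetical input
dem = dem.rio.write_crs("EPSG:2193")                                           # assumed CRS (NZTM2000)
catchment = geopandas.read_file("catchment_boundary.geojson")                  # hypothetical boundary

# clip the DEM to the catchment polygons; drop=True discards pixels outside them,
# which is also what leaves the first row/column of chunks with a different size
clipped = dem.rio.clip(catchment.geometry, catchment.crs, drop=True)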

chunksize

  • a larger chunk size leads to faster computation
    • 2h44min with a chunk size of 300 (and no clip)
    • 3h19min with a chunk size of 200 (and no clip)
  • but this increases the memory used per worker
    • about 6-7 GB per worker for a chunk size of 300 (and no clip)
    • about 4 GB per worker for a chunk size of 200 (and no clip)

memory settings

  • worker memory can have temporary spikes, but otherwise usage is quite stable
  • workers that are restarted because they use too much memory will trigger re-computation of all the tasks they were hosting (even if the data has already been saved... not great), which means we should avoid worker restarts at all costs
  • to achieve this, set a high enough memory_limit per worker (e.g. I used 15GB with chunk size 300)
  • and also configure the thresholds for pausing/restarting a worker to a higher percentage (see https://distributed.dask.org/en/latest/worker-memory.html); the environment variables below do this, and an equivalent Python configuration sketch follows this list
# disable spilling on disk and increase threshold to pause/kill dask worker
export DASK_DISTRIBUTED__WORKER__MEMORY__TARGET=False
export DASK_DISTRIBUTED__WORKER__MEMORY__SPILL=False
export DASK_DISTRIBUTED__WORKER__MEMORY__PAUSE=0.90
export DASK_DISTRIBUTED__WORKER__MEMORY__TERMINATE=0.95
  • but workers will generally operate well below the limit, so the hosting Slurm job doesn't need to request memory_limit x number of workers and can request much less (for 20 workers and chunk size = 300, we could request 140GB, as workers stay below 7GB most of the time)
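
As an aside, the same thresholds can also be set from Python with dask.config.set before the workers are created; this is a minimal sketch mirroring the environment variables above (the cluster parameters are just the ones from these notes):

import dask
from dask.distributed import Client, LocalCluster

# same settings as the DASK_DISTRIBUTED__WORKER__MEMORY__* variables above
dask.config.set(
    {
        "distributed.worker.memory.target": False,    # no early spill-to-disk target
        "distributed.worker.memory.spill": False,     # never spill to disk
        "distributed.worker.memory.pause": 0.90,      # pause work at 90% of the limit
        "distributed.worker.memory.terminate": 0.95,  # restart the worker at 95%
    }
)

cluster = LocalCluster(n_workers=20, threads_per_worker=1, memory_limit="15GB")
client = Client(cluster)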

I hope you'll find these recommendations useful :).

@rosepearson (Owner, Author)

Great! Note that the above was all without the raw_extents calculations. Move away from doing this: consider either pulling in the geometries specified in the TileIndex files, or working them out before pulling in the coast DEM and buffering. Notes about the raw_extents issues to address as part of this are in #153.
