Remove the Dask Compute #131
Some lines that could cause issues:
|
Not quite the same subject, but when picking this up again we could look at using dask with pandas to remove the horrendous bottleneck that exists when calculating the 'open waterway elevations'. See https://github.com/rosepearson/GeoFabrics/blob/main/src/geofabrics/processor.py#L2386. Move to issue #130. |
Instructions for setup:
Running the code:
|
I have started some changes in the compute_to_disk branch. The part that saves the computation results works, but the code crashes before the end (likely because I commented out saving the extents 😅) |
Thanks for the catch-up. It's looking promising. Looking forward to hearing how the large example goes. At this stage:
See issue 153 for notes around planned changes to how or if the extents are calculated. |
I ran the large example overnight on a Māui ancil node. It took 6 hours and a maximum of 40 GB. A few additional notes:
Ideally I should add the timing of converting from zarr to netcdf.
Efficiency of the job:
And here is the Slurm job I used, for the record:
|
Kia ora @rosepearson, I have run more jobs to compare different scenarios (netcdf vs. zarr, smaller or larger chunks, with or without clipping). Here are my (unsorted 😅) notes and my conclusions/recommendations.
job 30020508
job 30021473
job 30022496
job 30022806
Note: job 30023700
job 30041464
job 30059619
job 30064157
job 30069881
job 30073973
Conclusions
zarr vs. netcdf
clip vs no clip
chunksize
memory settings
I hope you'll find these recommendations useful :). |
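The chunk size and memory conclusions come down to simple arithmetic. This is a minimal sketch that assumes a 2D raster of float64 values (8 bytes each). The helper names and the shapes used below are hypothetical and are not taken from the jobs listed above.

```python
import math


def chunk_memory_mib(chunk_shape, itemsize=8):
    """Memory held by one chunk, in MiB (itemsize=8 assumes float64)."""
    return math.prod(chunk_shape) * itemsize / 2**20


def n_chunks(array_shape, chunk_shape):
    """Number of chunks along all axes, counting partial chunks at the edges."""
    return math.prod(math.ceil(a / c) for a, c in zip(array_shape, chunk_shape))
```

For example, a hypothetical 40000 × 40000 float64 raster split into 1000 × 1000 chunks gives 1600 chunks of about 7.6 MiB each. With 4096 × 4096 chunks it gives 100 chunks of 128 MiB each. Smaller chunks mean less memory per task but more tasks for the scheduler to manage.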
Great! Note that all of the above was run without the raw_extents calculations. Move away from doing this: consider either pulling in the geometries specified in the TileIndex files, or working out the extents before pulling in the coast DEM and buffering. Notes about raw_extents issues to address as part of this are in #153. |
Remove the explicit dask compute call.
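Here is a minimal sketch of writing a lazy result without an explicit `.compute()`. It assumes the result is a dask array. `build_lazy_result` and `write_without_compute` are hypothetical, and a NumPy array stands in for the on-disk target, which in practice would be a zarr array or a netCDF variable. With a dask-backed xarray Dataset, calling `to_zarr` or `to_netcdf` directly should have a similar chunk-by-chunk effect.

```python
import numpy as np
import dask.array as da


def build_lazy_result(shape=(1000, 1000), chunks=(250, 250)):
    # Hypothetical stand-in for the processing chain: every step stays lazy,
    # so nothing has been evaluated when this function returns.
    x = da.ones(shape, chunks=chunks, dtype="float64")
    return (x * 2.0 + 1.0).astype("float32")


def write_without_compute(lazy, target):
    # Instead of `lazy.compute()`, which materialises the whole result in memory
    # before saving, `da.store` writes each chunk into the target as soon as it
    # is computed. Any target supporting NumPy-style slice assignment works.
    da.store(lazy, target)
    return target
```

Usage: `write_without_compute(build_lazy_result(), np.empty((1000, 1000), dtype="float32"))` fills the target with the evaluated values. No full in-memory copy of the result is created beforehand.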