This repository has been archived by the owner on Jun 13, 2023. It is now read-only.

ngen-parallel mode is not working. #24

Closed
arpita0911patel opened this issue Apr 7, 2023 · 3 comments · Fixed by #38
Labels
bug Something isn't working

Comments

@arpita0911patel
Contributor

Current behavior

When I use the "AWI_16_680661_001" data for the model and select the parallel run, it throws the error below:
"Missing required argument for partition file path."

Expected behavior

The run should complete and create the output files.

Steps to replicate behavior (include URLs)

apatel54@UA-W2RP43G:~/Desktop/ngen_data|⇒ docker run --rm -it -v "$(pwd)"/AWI_16_680661_001:/ngen/AWI_001 --platform=linux/amd64 awiciroh/ciroh-ngen-image:latest
Working directory is
/ngen
Found these Catchment files:
/ngen/ngen/data/catchment_data.geojson
/ngen/ngen/data/catchment_data_test1.geojson
/ngen/ngen/extern/cfe/cfe/params/data/hydrofabrics/releases/beta/01a/catchments.geojson
/ngen/ngen/extern/topmodel/topmodel/params/data/hydrofabrics/releases/beta/01a/catchments.geojson
/ngen/AWI_001/config/catchments.geojson
Found these Nexus files:
/ngen/ngen/data/nexus_data.geojson
/ngen/AWI_001/config/nexus.geojson
Found these Realization files:
/ngen/ngen/data/example_bmi_multi_realization_config_w_routing.json
/ngen/ngen/data/example_bmi_multi_realization_config_w_noah_pet_cfe.json
/ngen/ngen/data/example_bmi_multi_realization_config__macos.json
/ngen/ngen/data/test_bmi_multi_realization_config_w_netcdf.json
/ngen/ngen/data/lstm/example_lstm_realization_config.json
/ngen/ngen/data/test_realization_config.json
/ngen/ngen/data/example_realization_config.json
/ngen/ngen/data/test_bmi_multi_realization_config.json
/ngen/ngen/data/example_bmi_multi_realization_config.json
/ngen/ngen/data/example_realization_config_w_bmi_c__lin_mac.json
/ngen/ngen/data/example_bmi_multi_realization_config_w_netcdf.json
/ngen/ngen/data/test_bmi_multi_realization_config_w_noah_pet_cfe.json
/ngen/ngen/extern/sloth/test/data/sloth_cfe_realization.json
/ngen/ngen/extern/sloth/test/data/sloth_realization.json
/ngen/ngen/extern/SoilFreezeThaw/SoilFreezeThaw/configs/realization_config_multi_linux.json
/ngen/ngen/extern/SoilFreezeThaw/SoilFreezeThaw/configs/realization_config_multi_macos.json
/ngen/ngen/extern/SoilFreezeThaw/SoilFreezeThaw/configs/realization_config_standalone_linux.json
/ngen/ngen/extern/SoilFreezeThaw/SoilFreezeThaw/configs/realization_config_standalone_macos.json
/ngen/ngen/extern/SoilMoistureProfiles/SoilMoistureProfiles/config/realization_config_smp_macos.json
/ngen/ngen/extern/SoilMoistureProfiles/SoilMoistureProfiles/config/realization_config_smp_linux.json
/ngen/AWI_001/config/awi_simplified_realization.json

  1. ngen-parallel
  2. ngen-serial
  3. bash
#? 1
Enter the hydrofabric catchment file path: /ngen/AWI_001/config/catchments.geojson
/ngen/AWI_001/config/catchments.geojson selected
Enter the hydrofabric nexus file path: /ngen/AWI_001/config/nexus.geojson
/ngen/AWI_001/config/nexus.geojson selected
Enter the Realization file path: /ngen/AWI_001/config/awi_simplified_realization.json
/ngen/AWI_001/config/awi_simplified_realization.json selected

Your NGEN run command is ngen-parallel /ngen/AWI_001/config/catchments.geojson "" /ngen/AWI_001/config/nexus.geojson "" /ngen/AWI_001/config/awi_simplified_realization.json
Copy and paste it into the terminal to run your model.
The tested model is /dmod/bin/ngen-serial /ngen/data/catchment_data.geojson /ngen/data/nexus_data.geojson /ngen/ngen/data/example_realization_config.json
If your model didn't run, or encountered an error, try checking the Forcings paths in the Realizations file you selected.

Your model run is beginning!

NGen Framework 0.1.0
Missing required argument for partition file path.


@arpita0911patel arpita0911patel added the bug Something isn't working label Apr 7, 2023
@ZacharyWills
Contributor

Hey Arpita!

In the last PR:
https://github.com/AlabamaWaterInstitute/CloudInfra/pull/23/files

I fixed the compilation of the partition generator (which is its own binary).


The partition generator "cuts" the NGEN domain so it can run in parallel in a hydrologically consistent way, and ngen-parallel doesn't work without the partition file it generates. That partition file is passed as an additional argument at the end of the command.

So in this case, because the file was never generated, the error you got is the model framework refusing to parallelize without regard for the hydrology.

@ZacharyWills
Contributor

The partition generator is copied to the /dmod/bin directory.
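
For anyone following along, here is a rough sketch of the two-step workflow described above: generate the partition file first, then pass it as the last argument to ngen-parallel. The paths and argument order are copied from the commands run in this thread; the output name AWI_001_partition_file, the partition count of 5, and the `run` helper with its RUN switch are illustrative assumptions, and the meaning of the two trailing empty arguments to partitionGenerator is not explained here, so they are passed through unchanged.

```shell
#!/bin/sh
# Sketch only: prints the two commands by default; set RUN=1 to execute them.
set -eu

BIN_DIR=/dmod/bin
CATCHMENTS=/ngen/AWI_001/config/catchments.geojson
NEXUS=/ngen/AWI_001/config/nexus.geojson
REALIZATION=/ngen/AWI_001/config/awi_simplified_realization.json
PARTITION_FILE=AWI_001_partition_file   # illustrative output name
NUM_PARTITIONS=5                        # illustrative partition count

# Hypothetical helper: execute the command when RUN=1, otherwise print
# each argument single-quoted so empty arguments stay visible.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    for arg in "$@"; do printf "'%s' " "$arg"; done
    printf '\n'
  fi
}

# Step 1: generate the partition file.
run "$BIN_DIR/partitionGenerator" "$CATCHMENTS" "$NEXUS" "$PARTITION_FILE" "$NUM_PARTITIONS" "" ""

# Step 2: run the model, with the partition file as the final argument.
run "$BIN_DIR/ngen-parallel" "$CATCHMENTS" "" "$NEXUS" "" "$REALIZATION" "$PARTITION_FILE"
```

Run it once as-is to review the printed commands, then again with RUN=1 inside the container to execute them.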

@arpita0911patel
Contributor Author

I tried running the latest image, which has the fix:
apatel54@UA-W2RP43G:~/Desktop/ngen_data|⇒ docker run --rm -it -v "$(pwd)"/AWI_16_680661_001:/ngen/AWI_001 --platform=linux/amd64 awiciroh/ciroh-ngen-image:latest

bash-4.4# cd /dmod/bin/

bash-4.4# ./partitionGenerator /ngen/AWI_001/config/catchments.geojson /ngen/AWI_001/config/nexus.geojson AWI_001_partition_file 5 '' ''
Partitioning 210 catchments into 5 partitions.
Validating catchments...

Number of catchments is: 210
Catchment validation completed
Found 9 remotes in partition 0
Found 13 remotes in partition 1
Found 2 remotes in partition 2
Found 12 remotes in partition 3
Found 10 remotes in partition 4
Found 46 total remotes (average of approximately 9 remotes per partition)

bash-4.4# ./ngen-parallel /ngen/AWI_001/config/catchments.geojson "" /ngen/AWI_001/config/nexus.geojson "" /ngen/AWI_001/config/awi_simplified_realization.json AWI_001_partition_file
NGen Framework 0.1.0
Building Nexus collection
file read success
file_path: AWI_001_partition_file

root_tree: 1
Building Catchment collection
terminate called after throwing an instance of 'std::runtime_error'
what(): Can't init CFE; unreadable shared library file './extern/cfe/cmake_build/libcfebmi.so.1.0.0'
qemu: uncaught target signal 6 (Aborted) - core dumped
Aborted

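I am now seeing the error above: the partition file is read successfully, but ngen-parallel aborts because it cannot read the CFE shared library.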

Arpita
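
One thing that may be worth checking: the library path in the error message is relative ('./extern/...'), so it is resolved against the directory the command was started from, which was /dmod/bin in the run above. The small check below is only a sketch. The `resolves_from` helper is hypothetical, and /ngen/ngen is a guess based on the extern/cfe directory that appears in the catchment file listing earlier in this issue; it is not confirmed that the built library lives there.

```shell
#!/bin/sh
# Sketch only: reports from which directory the relative library path resolves.
set -eu

# Path exactly as it appears in the error message.
LIB='./extern/cfe/cmake_build/libcfebmi.so.1.0.0'

# Hypothetical helper: succeeds if LIB is a readable file when resolved
# from directory $1. Runs in a subshell so the caller's directory is unchanged.
resolves_from() {
  ( cd "$1" 2>/dev/null && [ -r "$LIB" ] )
}

# /dmod/bin is where the failing run started; /ngen/ngen is an assumption.
for dir in /dmod/bin /ngen/ngen; do
  if resolves_from "$dir"; then
    echo "$dir: library is readable"
  else
    echo "$dir: library is NOT readable"
  fi
done
```

If the library turns out to be readable from only one of these directories, starting ngen-parallel from that directory (or using an absolute library path in the realization file) would be the thing to try.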
