Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error while executing distributed version of ngen #456

Closed
pvbangalore opened this issue Oct 14, 2022 · 10 comments · Fixed by #475
Closed

Error while executing distributed version of ngen #456

pvbangalore opened this issue Oct 14, 2022 · 10 comments · Fixed by #475
Assignees
Labels
bug Something isn't working partitioner related to partition generation

Comments

@pvbangalore
Copy link

Short description explaining the high-level reason for the new issue.

Current behavior

Program terminates with an exception in 'boost::wrapexceptboost::property_tree::json_parser::json_parser_error' due to unexpected value in partition_config.json (output.log attached has complete output)

Expected behavior

Program should execute without any runtime errors.

Steps to replicate behavior (include URLs)

Attached file (build_notes.log) shows all the steps to build and execute ngen with MPI

Screenshots

build_notes.txt
output.log

@hellkite500
Copy link
Member

Can you add the partition file you generated? It looks like the issue is in parsing that file.

@pvbangalore
Copy link
Author

Here are the contents of the file partition_config.json:
{ "partitions":[ {"id":0, "cat-ids":["cat-67"], "nex-ids":["nex-68"], "remote-connections":[]}, {"id":1, "cat-ids":["cat-52"], "nex-ids":["nex-34"], "remote-connections":[]}, {"id":2, "cat-ids":["cat-27"], "nex-ids":["nex-26"], "remote-connections":[]}, ] }

@hellkite500
Copy link
Member

So I'm not sure the root cause of the poorly formatted partition file, but the issue causing the parsing error is here:

"remote-connections":[]}, ] }

There is a , at the end of the list, indicating another value should follow, but there isn't one. So this isn't valid JSON.

You can try removing that , and see if that works. It looks like this was generated by the compiled partition generator, so there may be a bug there to look into how the partitions are being serialized into JSON.

@hellkite500
Copy link
Member

hellkite500 commented Oct 26, 2022

Ok, looking a little more at the input data used to build the partition file, there are only 3 catchments and 3 nexuses in the input data, but the partition generator tried making 4 partitions. So I think the serialization loop didn't terminate the strings correctly, as it adds a , after each partition up to num_partitions which is the number of requested partitions, not the not number of computed partitions.

You should be able to generate a valid partitioning of the example data by using less than 4 partitions, but we should also fix this issue in the partitionGenerator and probably add a warning/message when trying to partition too few catchments.

@stcui007 can you maybe look into catching this condition where the number of possible partitions is less than the number of requested partitions and make sure this serialization loop doesn't write invalid JSON?

@hellkite500 hellkite500 added bug Something isn't working partitioner related to partition generation labels Oct 26, 2022
@mattw-nws
Copy link
Contributor

Noting that this partition file also falls afoul of the empty remote-connections issue we identified, though that is something to be fixed in ngen rather than in partitionGenerator ... @hellkite500 : Fix for this issue can be pursued separately, yes?

@hellkite500
Copy link
Member

Yes, if using current HEAD that bug will show up. I should have a fix for that up shortly, though.

@stcui007
Copy link
Contributor

stcui007 commented Nov 14, 2022 via email

@mattw-nws
Copy link
Contributor

Noting that this partition file also falls afoul of the empty remote-connections issue we identified...

See also #475

@mattw-nws
Copy link
Contributor

@pvbangalore Note that the fix in #474 basically prevents the creation of a partition file with more partitions than there are catchments in the catchment dataset, which it appears you did (3 catchments, 4 partitions). So you would continue to get this error with the same partition file, but that partitioning is not supported, basically. With only 3 catchments in the dataset, try creating only 2 partitions (and running only 2 processes), or try a larger dataset to run with > 3 partitions. If those configurations work, we can close this issue.

@mattw-nws mattw-nws linked a pull request Dec 22, 2022 that will close this issue
12 tasks
@mattw-nws
Copy link
Contributor

Believe this is fully fixed in #475... I just ran a config with zero remote nexuses in a partition and it succeeded. Will go ahead and close this but it can be reopened if the issue persists on the latest version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working partitioner related to partition generation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants