-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add csv-demand parser #995
Conversation
CI currently fails during building a test configuration, when |
The tutorial workflow in CI fails due to an empty power plants matching located in the requested region. Not sure in which way it is connected with the changes introduced by the PR, and can't reproduce it locally yet. |
@davide-f could you please have a look? CI is failing due an some troubles in finding power plants for the region, which I can't reproduce locally... Any ideas be very welcome on what may go wrong! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great @ekatef :D
Co-authored-by: Davide Fioriti <67809479+davide-f@users.noreply.github.com>
scripts/build_demand_profiles.py
Outdated
load_path = os.path.join(load_dir, continent + ext) | ||
if os.path.exists(load_path): | ||
load_paths.append(load_path) | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, move it inside the if, otherwise the .csv never gets executed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! Thanks :)
a159418
to
d9276c6
Compare
The changes in this PR should be aligned with #1017. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great @ekatef :D
The functionality is absolutely there!
Need to check why the CI is not working though
As minor comments:
- plase revise the release_notes
- it may be good to add something in the documentation to clarify on how to feed in the custom data and leverage on this amazing contribution. I understand this is time demanding. not sure if you have the time to craft something very very basic to be later improved or we leave it completely for a next PR.
What do you think?
This reverts commit 5b488b2.
Thanks a lot for reviewing @davide-f! 😄 I can confirm that the functionality has been tested successfully :) 🎉 The reason for CI failing has been that Snakemake is parsing all the inputs before starting to do anything. So, it didn't help much to place To fix this, I had to put back composing of Done for a release note :D Have also drafted a documentation on using custom load profiles. Could you please check if this addition looks clear enough to be used? PS Can't help but say that it's amazing to have a documentation build as a part of CI 💚 Thanks a lot to you and @GbotemiB for adding this feature! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, the CI is running :D so closer to merge :)
Snakefile
Outdated
if os.path.exists(os.path.join("data", config["load_options"]["ssp"])): | ||
load_data_paths = get_load_paths_gegis("data", config, check_existence=True) | ||
else: | ||
# demand profiles are not available yet -> no opportunity to use custom load files | ||
load_data_paths = get_load_paths_gegis("data", config, check_existence=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmmm I think the check_existence can lead to misleading effects due to when that is evaluated.
I think we should not have this option and keep the previous behavior maybe?
The reason is that the file may not exist when the function is triggered.
For example, with a fresh run, the file does not exists at the beginning because it is moved here from retrieve_databundle.
Maybe the check on whether the file is missing should be moved when the files are actually read, but if the file is missing, the workflow breaks so it may not be needed.
Except the check on whether all selected countries are found, I expected that check to be there already but may be good to crosscheck if you wanted to address that here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for double-checking @davide-f 🙂 This file-existence issue is quite a tricky part, in fact.
The point is snakemake parses all the input paths before building DAG and starting to execute anything. Which means that load_data_paths
is evaluated before any of the rule would run, including retrieve_databundle
. So, it's possible that load_data_paths
is evaluated when demand files are not loaded yet. It has been perfectly alright before, when we were not interested if the demand files exist. But the demand parser must check which exactly inputs present, and breaks if there is no any inputs yet.
If we would restore the original get_load_paths_gegis
removing the check-existence condition, that will lead to the error in a fresh run. That has been a reason of CI being unhappy previously, which I have looked to fix. Not sure the current implementation is the most elegant one, but it works and hopefully doesn't introduce breaking changes into the workflow.
Agree that it can also work if we would check if the data folder exist directly in the load_data_paths
(an example of a similar solution). Have implemented and tested the improvement and happy to have your opinion on that 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhhh right, good point! I catched the issue but not completely the side effects, great explanation.
I'm not a great fan of that long if case; I got another idea; adding a review comment and let's see what's your idea.
Already this is good :)
scripts/build_demand_profiles.py
Outdated
for continent in region_load: | ||
for ext in [".nc", ".csv"]: | ||
load_path = os.path.join(str(load_dir), str(continent) + str(ext)) | ||
if os.path.exists(load_path): | ||
load_paths.append(load_path) | ||
break | ||
|
||
avail_regions = [ | ||
os.path.split(os.path.abspath(pth))[1].split(".nc")[0] for pth in load_paths | ||
] | ||
|
||
logger.info(f" An assumed load folder {load_dir}, load path is {load_paths}.") | ||
|
||
if len(load_paths) == 0: | ||
logger.warning( | ||
f"No demand data file for {set(region_load).difference(avail_regions)}. An assumed load folder {load_dir}." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we drop the exists of scenario path and we do:
for continent in region_load: | |
for ext in [".nc", ".csv"]: | |
load_path = os.path.join(str(load_dir), str(continent) + str(ext)) | |
if os.path.exists(load_path): | |
load_paths.append(load_path) | |
break | |
avail_regions = [ | |
os.path.split(os.path.abspath(pth))[1].split(".nc")[0] for pth in load_paths | |
] | |
logger.info(f" An assumed load folder {load_dir}, load path is {load_paths}.") | |
if len(load_paths) == 0: | |
logger.warning( | |
f"No demand data file for {set(region_load).difference(avail_regions)}. An assumed load folder {load_dir}." | |
) | |
regions_found = [] | |
file_names = [] | |
for continent in region_load: | |
sel_ext = ".nc" | |
for ext in [".nc", ".csv"]: | |
load_path = os.path.join(str(load_dir), str(continent) + str(ext)) | |
if os.path.exists(load_path): | |
sel_ext = ext | |
regions_found.append(continent) | |
break | |
file_name = str(continent) + str(sel_ext) | |
load_path = os.path.join(str(load_dir), file_name) | |
load_paths.append(load_path) | |
file_names.append(file_name) | |
logger.info( | |
f"Demand data folder: {load_dir}, load path is {load_paths}.\n" + | |
f"Expected files: " + "; ".join(file_names) | |
) |
If the file does not exist, this solution should backup to the .nc case.
Not sure about how much info to disclose;
I'm unsure it is good to advice about whether demand data files are existing or not: for the standard user that may be confusing because of the ordering about when snakemake makes this evaluation.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love the solution! 😄 Implemented. Thanks a lot for that.
Agree that the no-demand warning feels redundant. Moreover, it's also not needed: in case no demand data provided and retrive_databundle: false
, Snakemake just throws an error even before running anything. So, we can just remove this checking part.
Co-authored-by: Davide Fioriti <67809479+davide-f@users.noreply.github.com>
for more information, see https://pre-commit.ci
@davide-f thanks :D Have implemented your suggestions, and the changes look now quite consistent with the original version which feels really great 🙃 Thanks a lot for reviewing! Both local and CI testing are successful now. Could you please check if everything looks well in the revised version? |
#987
The aim of the PR is to provide an option to use csv files as demand inputs.
Checklist
envs/environment.yaml
anddoc/requirements.txt
.config.default.yaml
andconfig.tutorial.yaml
.test/
(note tests are changing the config.tutorial.yaml)doc/configtables/*.csv
and line references are adjusted indoc/configuration.rst
anddoc/tutorial.rst
.doc/release_notes.rst
is amended in the format of previous release notes, including reference to the requested PR.