
Add csv-demand parser #995

Merged: 52 commits from `add_demand_parser` into `main`, Jun 17, 2024

Changes from 43 commits
0f5f09a
Account for the file extension
ekatef Mar 20, 2024
849c092
Add import
ekatef Mar 21, 2024
cbe0dce
Fix format of the message
ekatef Mar 22, 2024
a3b2f8c
Add csv loading
ekatef Mar 22, 2024
6470034
Refactor path generation
ekatef Mar 22, 2024
0ff2282
Merge branch 'main' into add_demand_parser
ekatef Mar 22, 2024
7eb36b4
Fix the check of demand availability
ekatef Mar 23, 2024
1d9a250
Merge branch 'add_demand_parser' of https://github.com/ekatef/pypsa-e…
ekatef Mar 23, 2024
5504b79
Comment-out the load path check
ekatef Mar 23, 2024
e12853e
Merge branch 'pypsa-meets-earth:main' into add_demand_parser
ekatef Apr 11, 2024
c37bf1b
Implement Davide's suggestion
ekatef Apr 11, 2024
397a6d9
Add a TODO comment
ekatef Apr 11, 2024
061c985
Put loading csv-s into a function
ekatef Apr 11, 2024
9464d55
Use na-safe csv loading
ekatef Apr 11, 2024
8c27cd5
Fix the function output
ekatef Apr 11, 2024
59fe870
Add an option to combine nc and csv inputs
ekatef Apr 11, 2024
a83693c
Merge branch 'main' into add_demand_parser
ekatef Apr 29, 2024
94378e5
Fix breaking the loop
ekatef Apr 29, 2024
bc66e5d
Merge branch 'main' into add_demand_parser
ekatef May 7, 2024
87fb099
Put the path check back
ekatef May 7, 2024
37c62ce
Add a string conversion
ekatef May 8, 2024
8650b7c
Add a test for paths checked
ekatef May 8, 2024
827deb9
Replace an error with a warning
ekatef May 8, 2024
8b031fe
Add a diagnostic info
ekatef May 8, 2024
0aabc66
Define load_paths to snakemake params
ekatef May 8, 2024
065a9cd
Fix structure
ekatef May 13, 2024
8c2c2aa
Fix the info message
ekatef May 13, 2024
cfb5432
Update the error message
ekatef May 13, 2024
5f383dd
Revert "Define load_paths to snakemake params"
ekatef May 25, 2024
4c5974d
Add an existence check
ekatef May 25, 2024
e63f6b3
Return default nc-path definition
ekatef May 25, 2024
7622818
Apply changes to the path definition
ekatef May 25, 2024
5b488b2
Remove a redundant definition
ekatef May 28, 2024
083a1ac
Add type conversion
ekatef May 28, 2024
b71eeea
Add a (temporary) warning
ekatef May 28, 2024
eaf8552
Merge branch 'main' into add_demand_parser
ekatef May 28, 2024
31eec33
Add a diagnostic logger info
ekatef May 28, 2024
9482e28
Revert "Remove a redundant definition"
ekatef May 29, 2024
18df6d3
Account for different states of the demand inputs
ekatef May 29, 2024
049aaa1
Modify handling an empty demand case
ekatef May 29, 2024
8e34213
Add release note
ekatef May 29, 2024
511140d
Remove temporary comments
ekatef May 29, 2024
25451d7
Add documentation
ekatef May 29, 2024
c1c80b4
Merge branch 'main' into add_demand_parser
ekatef Jun 4, 2024
0584f17
Move existence check into the demand parser
ekatef Jun 4, 2024
00b8d60
Clean-up implementation
ekatef Jun 4, 2024
5c8ce7f
Remove an outdate commit
ekatef Jun 4, 2024
4fadf74
Implement Davide's suggestion
ekatef Jun 6, 2024
d21bbe6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 6, 2024
c6aa7f1
Fix formatting
ekatef Jun 6, 2024
14f4bca
Remove an outdated check
ekatef Jun 6, 2024
ef4dc69
Remove a not-used variable
ekatef Jun 8, 2024
7 changes: 6 additions & 1 deletion Snakefile
@@ -43,7 +43,12 @@ run = config.get("run", {})
RDIR = run["name"] + "/" if run.get("name") else ""
CDIR = RDIR if not run.get("shared_cutouts") else ""

load_data_paths = get_load_paths_gegis("data", config)
davide-f marked this conversation as resolved.
if os.path.exists(os.path.join("data", config["load_options"]["ssp"])):
load_data_paths = get_load_paths_gegis("data", config, check_existence=True)
else:
# demand profiles are not available yet -> no opportunity to use custom load files
load_data_paths = get_load_paths_gegis("data", config, check_existence=False)
davide-f (Member) commented:

Mmmm, I think check_existence can lead to misleading effects because of when it is evaluated.
I think we should not have this option and keep the previous behaviour, maybe?

The reason is that the file may not exist when the function is triggered.
For example, in a fresh run the file does not exist at the beginning, because it is moved here by retrieve_databundle.

Maybe the check on whether the file is missing should be moved to where the files are actually read, but if the file is missing the workflow breaks anyway, so it may not be needed.
The exception is the check on whether all selected countries are found; I expected that check to be there already, but it may be good to crosscheck if you wanted to address it here.

ekatef (Member, Author) replied:

Thanks for double-checking @davide-f 🙂 This file-existence issue is quite a tricky part, in fact.

The point is that snakemake parses all the input paths before building the DAG and executing anything. This means load_data_paths is evaluated before any rule runs, including retrieve_databundle, so load_data_paths may be evaluated while the demand files are not available yet. That was perfectly fine before, when we didn't care whether the demand files existed. But the demand parser must check exactly which inputs are present, and it breaks if no inputs exist yet.

If we restored the original get_load_paths_gegis and removed the check-existence condition, that would lead to an error in a fresh run. That was the reason CI was unhappy previously, which I have tried to fix. I'm not sure the current implementation is the most elegant one, but it works and hopefully doesn't introduce breaking changes into the workflow.

Agree that it could also work to check whether the data folder exists directly in load_data_paths (an example of a similar solution). I have implemented and tested the improvement and am happy to have your opinion on it 🙂
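To illustrate the timing issue discussed here (a minimal standalone sketch with an invented path, not the actual workflow code): any top-level code in a Snakefile runs at parse time, before any rule such as retrieve_databundle has executed, so an existence check at that point sees a fresh, empty workspace:

```python
import os


def get_paths(check_existence=False):
    # Invented candidate path standing in for the GEGIS demand files.
    candidates = ["data/ssp2-2.6/2030/era5_2013/Africa.nc"]
    if check_existence:
        # This filter runs at Snakefile parse time: on a fresh checkout
        # the databundle has not been retrieved yet, so it removes
        # every candidate.
        return [p for p in candidates if os.path.exists(p)]
    return candidates


# Before retrieve_databundle runs, the filtered list is empty.
print(get_paths(check_existence=True))  # []
```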

davide-f (Member) replied:

Ahhhh right, good point! I caught the issue but not completely its side effects; great explanation.
I'm not a great fan of that long if case; I've got another idea. Adding a review comment, let's see what you think.
Already this is good :)


if config["enable"].get("retrieve_cost_data", True):
COSTS = "resources/" + RDIR + "costs.csv"
else:
14 changes: 14 additions & 0 deletions doc/customization_basic1.rst
@@ -82,6 +82,20 @@ Year-related parameters are also being used when specifying `load_options`:
The `weather_year` value corresponds to the weather data which was used to generate the electricity demand profiles for a selected area while `prediction_year` corresponds to the point of a `Shared Socioeconomic Pathways (SSP) <https://en.wikipedia.org/wiki/Shared_Socioeconomic_Pathways>`__ trajectory. PyPSA-Earth uses SSP2-2.6 scenario within the Shared Socioeconomic Pathways framework, which is characterized by medium challenges to mitigation and adaptation efforts resulting in a global warming of approximately 2.6°C by the end of the 21st century.
The available values for `weather_year` and `prediction_year` can be checked by looking into `pypsa-earth/data/ssp2-2.6` folder. Currently, there are pre-calculated demand data for 2011, 2013, 2018 weather years and for 2030, 2040, 2050, and 2100 scenario prediction years.

Use custom demand data
----------------------

It is possible to supply custom demand profiles. To do so, create a dedicated custom demand sub-folder inside the scenario folder `pypsa-earth/data/ssp2-2.6` and place a custom demand file there. The name of the sub-folder should correspond to the `weather_year` argument, which in this case serves as a general identifier of the demand input. The name of the demand input file should be the name of the continent to which the country of interest belongs. Both csv and nc formats can be used for demand files.


.. note::

For example, to provide custom inputs for Nigeria, you can put the time series into an `Africa.csv` file and place it into the `pypsa-earth/data/ssp2-2.6/2013_custom/` folder. To have it picked up, specify `weather_year: 2013_custom` under `load_options`.

The format of a custom csv demand file should correspond to the csv files supplied with the model: there are `region_code`, `time`, `region_name` and `Electricity demand` columns, and a semicolon is used as the separator.
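As an illustration (a minimal sketch with made-up values; only the column names and the semicolon separator follow the description above), such a file can be read with pandas:

```python
import io

import pandas as pd

# Two made-up rows in the documented layout: semicolon-separated,
# with region_code, time, region_name and "Electricity demand".
sample = io.StringIO(
    "region_code;time;region_name;Electricity demand\n"
    "NG;2013-01-01 00:00:00;Nigeria;1.23\n"
    "NG;2013-01-01 01:00:00;Nigeria;1.19\n"
)

df = pd.read_csv(sample, sep=";", parse_dates=["time"])
print(list(df.columns))
# ['region_code', 'time', 'region_name', 'Electricity demand']
```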


Configure `atlite` section
--------------------------

2 changes: 2 additions & 0 deletions doc/release_notes.rst
@@ -22,6 +22,8 @@ E.g. if a new rule becomes available describe how to use it `snakemake -j1 run_t

* Add an option to merge isolated networks into respective backbone networks by countries. `PR #903 <https://github.com/pypsa-meets-earth/pypsa-earth/pull/903>`__

* Add an option to use csv format for custom demand imports. `PR #995 <https://github.com/pypsa-meets-earth/pypsa-earth/pull/995>`__

**Minor Changes and bug-fixing**

* Minor bug-fixing to run the cluster wildcard min `PR #1019 <https://github.com/pypsa-meets-earth/pypsa-earth/pull/1019>`__
80 changes: 67 additions & 13 deletions scripts/build_demand_profiles.py
@@ -41,6 +41,7 @@
it returns a csv file called "demand_profiles.csv", that allocates the load to the buses of the network according to GDP and population.
"""
import os
import os.path
from itertools import product

import geopandas as gpd
@@ -49,7 +50,7 @@
import pypsa
import scipy.sparse as sparse
import xarray as xr
from _helpers import configure_logging, create_logger, read_osm_config
from _helpers import configure_logging, create_logger, read_csv_nafix, read_osm_config
from shapely.prepared import prep
from shapely.validation import make_valid

@@ -89,7 +90,7 @@ def get_gegis_regions(countries):
return regions


def get_load_paths_gegis(ssp_parentfolder, config):
def get_load_paths_gegis(ssp_parentfolder, config, check_existence=False):
"""
Create load paths for GEGIS outputs.

@@ -107,15 +108,39 @@ def get_load_paths_gegis(
ssp = config.get("load_options")["ssp"]

load_paths = []
for continent in region_load:
load_path = os.path.join(
ssp_parentfolder,
str(ssp),
str(prediction_year),
"era5_" + str(weather_year),
str(continent) + ".nc",
)
load_paths.append(load_path)
load_dir = os.path.join(
ssp_parentfolder,
str(ssp),
str(prediction_year),
"era5_" + str(weather_year),
)

if check_existence:
for continent in region_load:
for ext in [".nc", ".csv"]:
load_path = os.path.join(str(load_dir), str(continent) + str(ext))
if os.path.exists(load_path):
load_paths.append(load_path)
break

avail_regions = [
os.path.splitext(os.path.basename(pth))[0] for pth in load_paths
]

# TODO Remove after debugging
logger.info(f" An assumed load folder {load_dir}, load path is {load_paths}.")

if len(load_paths) == 0:
logger.warning(
f"No demand data file for {set(region_load).difference(avail_regions)}. An assumed load folder {load_dir}."
)
else:
for continent in region_load:
load_path = os.path.join(
str(load_dir),
str(continent) + ".nc",
)
load_paths.append(load_path)

return load_paths

@@ -135,6 +160,23 @@ def shapes_to_shapes(orig, dest):
return transfer


def load_demand_csv(path):
df = read_csv_nafix(path, sep=";")
df.time = pd.to_datetime(df.time, format="%Y-%m-%d %H:%M:%S")
load_regions = {c: n for c, n in zip(df.region_code, df.region_name)}

gegis_load = df.set_index(["region_code", "time"]).to_xarray()
gegis_load = gegis_load.assign_coords(
{
"region_name": (
"region_code",
[name for (code, name) in load_regions.items()],
)
}
)
return gegis_load


def build_demand_profiles(
n,
load_paths,
@@ -174,9 +216,21 @@ def build_demand_profiles(
substation_lv_i = n.buses.index[n.buses["substation_lv"]]
regions = gpd.read_file(regions).set_index("name").reindex(substation_lv_i)
load_paths = load_paths
# Merge load .nc files: https://stackoverflow.com/questions/47226429/join-merge-multiple-netcdf-files-using-xarray
gegis_load = xr.open_mfdataset(load_paths, combine="nested")

gegis_load_list = []

for path in load_paths:
if str(path).endswith(".csv"):
gegis_load_xr = load_demand_csv(path)
else:
# Merge load .nc files: https://stackoverflow.com/questions/47226429/join-merge-multiple-netcdf-files-using-xarray
gegis_load_xr = xr.open_mfdataset(path, combine="nested")
gegis_load_list.append(gegis_load_xr)

logger.info(f"Merging demand data from paths {load_paths} into the load data frame")
gegis_load = xr.merge(gegis_load_list)
gegis_load = gegis_load.to_dataframe().reset_index().set_index("time")
davide-f marked this conversation as resolved.

# filter load for analysed countries
gegis_load = gegis_load.loc[gegis_load.region_code.isin(countries)]

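The csv branch above (load_demand_csv) reshapes the long table into an xarray Dataset indexed by region_code and time, with region_name attached as a coordinate. A stripped-down sketch of that conversion (made-up sample data; only the reshaping, without the NA-safe reading) looks like:

```python
import pandas as pd
import xarray as xr

# Made-up sample in the documented csv layout.
df = pd.DataFrame(
    {
        "region_code": ["NG", "NG"],
        "time": pd.to_datetime(["2013-01-01 00:00", "2013-01-01 01:00"]),
        "region_name": ["Nigeria", "Nigeria"],
        "Electricity demand": [1.23, 1.19],
    }
)

# One region_name per unique region_code, as in load_demand_csv.
names = df.drop_duplicates("region_code")["region_name"].tolist()

# Long table -> Dataset with (region_code, time) dimensions, plus
# region_name as a coordinate on the region_code dimension.
ds = (
    df.set_index(["region_code", "time"])[["Electricity demand"]]
    .to_xarray()
    .assign_coords(region_name=("region_code", names))
)
print(ds["Electricity demand"].dims)  # ('region_code', 'time')
```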