Added ability to run at a low latency of less than 3 days (0-3) #112

Merged: 28 commits, Sep 9, 2024
9feb42b
Changed default behaviour of RAT:
SanchitMinocha Aug 24, 2024
595b04c
Updated comments
SanchitMinocha Aug 24, 2024
9447b6e
All IMERG data has been revised to V07B
SanchitMinocha Aug 24, 2024
f0d3767
Bug fix in read_write_chunk & updated CombinedNC
SanchitMinocha Aug 24, 2024
a08f89d
Dwnlod of new low latency data using GFS & GEFS
SanchitMinocha Aug 24, 2024
bb2b4a5
Divide forecasting into 2 to avoid circular import
SanchitMinocha Aug 24, 2024
e4dcb59
Vic_init_state_finder to run operationally
SanchitMinocha Aug 24, 2024
f67e024
Vic save date is no longer end date
SanchitMinocha Aug 24, 2024
e393b5f
divide forecasting into 2 to avoid circular import
SanchitMinocha Aug 24, 2024
c864ebf
Added low latency feature in RAT
SanchitMinocha Aug 24, 2024
79cb63b
updated storage scenario
pritamd47 Aug 25, 2024
c741d36
updated storage based scenario in forecasting
pritamd47 Aug 25, 2024
3394b9a
Removed duplicate of forecasting.py
SanchitMinocha Aug 25, 2024
f113700
Renamed imerg_latency as low_latency_limit
SanchitMinocha Aug 25, 2024
21de4dd
Added additional util functions for usage
SanchitMinocha Aug 25, 2024
d85a724
Added function to estimate quantile of res. area
SanchitMinocha Aug 30, 2024
c1b3077
Updated basin reservoir shpfile create fn
SanchitMinocha Aug 30, 2024
7c2ccd7
conversion of all inflow files to final outputs
SanchitMinocha Aug 30, 2024
7dea06d
forecast outfl. scenario by user & actual storage
SanchitMinocha Aug 30, 2024
d8a4e91
Easy forecasting for multiple dates in the past.
SanchitMinocha Aug 30, 2024
fb16a94
Updated forecasting docs
SanchitMinocha Aug 30, 2024
411eb72
Bug fix: rout init & added basedate parameter
SanchitMinocha Aug 30, 2024
7c11d59
added low_latency as parameter
SanchitMinocha Aug 30, 2024
fd9a9f6
Bug fix: loading of netcdf file to overwrite
SanchitMinocha Aug 30, 2024
17d67b6
added generic fn to create metsim inputs
SanchitMinocha Aug 30, 2024
08b5eba
Added resolution for review comments
SanchitMinocha Aug 30, 2024
8faea6c
Clearification of forecast_vic_init_state in the docs
SanchitMinocha Sep 9, 2024
240269a
Updated docs to describe changes in new version
SanchitMinocha Sep 9, 2024
2 changes: 1 addition & 1 deletion docs/Configuration/rat_config.md
@@ -808,7 +808,7 @@ This section of the configuration file describes the parameters defined by `rout
If `clean_previous_outputs` is `True`, the previous outputs are cleaned before executing any step in `steps`.

!!! tip_note "Tip"
You should use `clean_previous_outputs` if you want to have fresh outputs of RAT for a river basin. Otherwise, by default RAT will keep appending the new outputs to the same files and will concatenate data by calendar dates.
You should use `clean_previous_outputs` if you want to have fresh outputs of RAT for a river basin. Otherwise, by default RAT will keep appending the new outputs to the same files and will concatenate data by calendar dates. In case the new outputs and previous outputs have some coinciding dates, RAT will replace the previous outputs with the new outputs for these dates.

### Confidential
* <h6 class="parameter_heading">*`secrets:`* :</h6>
16 changes: 15 additions & 1 deletion docs/Development/PatchNotes.md
@@ -1,5 +1,17 @@
# Patch Notes

### v3.0.14
In this release, we have:

1. Enhanced Low-Latency Functionality: RAT can now run in operational mode with significantly reduced latency, as low as 0 days.
2. Updated Forecasting Plugin: The forecasting plugin has been upgraded to allow forecast generation for multiple past dates.
3. Updated IMERG Precipitation Web Links: The web links for downloading historical IMERG data (prior to 2024) have been updated to match those for current data. This change reflects the revision of the IMERG product version for historical data to V07B, which is now the same as the version for recent IMERG data. These updates were implemented on June 1, 2024, on the IMERG web servers.
4. Updated RAT documentation: to reflect the changes in the forecasting plugin and the possibility of using low latency in operational mode.

!!!note
1. Previously, a latency of 3 or more days was recommended due to delays in retrieving meteorological data from servers. However, RAT can now operate with latencies of less than 3 days, including real-time data (0-day latency). This is a major improvement over earlier versions, enabling users to generate data for the current day and produce forecasts up to 15 days ahead from the current day.
2. Previously, forecasts could only be generated for the final date of the RAT run, which worked well for operational use. Now, for case studies and research purposes, users can generate forecasts for several historical dates, offering greater flexibility and utility.

### v3.0.13
In this release, we have:

@@ -29,7 +41,9 @@ In this release, we have:

### v3.0.8

In this release, we have added rat.toolbox module to contain helpful and utility functions of user. Right now it has one function related to config and that is to update an existing config. Future work will be to add more functions to this module like to create config, or to plot outputs etc.
In this release, we have:

1. Added the `rat.toolbox` module to hold helper and utility functions for users. It currently has one config-related function, which updates an existing config. Future work will add more functions to this module, such as creating a config or plotting outputs.

### v3.0.7

5 changes: 5 additions & 0 deletions docs/Development/RecentAdjustments.md
@@ -1,5 +1,10 @@
# Recent Adjustments

### September 8, 2024
Since June 1, 2024, all historical IMERG data has been revised to Version 07B, along with the current IMERG data being generated daily. In the last patch we updated the download link for recent and current IMERG data, but the link for historical data was not updated to V07B. This raised a 'Connection Reset' error when running RAT over a period such as 2020 to 2024. The problem has been resolved in the [developer version](../../Development/DeveloperVersion/) of RAT and released in RAT [v3.0.14](../../Development/PatchNotes/#v3014). To [update](https://conda.io/projects/conda/en/latest/commands/update.html) RAT, please use `conda update rat` in your RAT environment.

Due to the non-availability of IMERG data for April and May 2024 on the IMERG web server, users may encounter issues when running RAT for time periods that include these months. This issue is on the IMERG data provider's side, and we are hopeful it will be resolved soon.

### June 30, 2024
There is a change in the required permissions for Google Service Accounts to access Earth Engine API.

11 changes: 8 additions & 3 deletions docs/Plugins/Forecasting.md
@@ -9,12 +9,17 @@ To run the forecast plugin, set the value of the `forecast` option in the PLUGIN
PLUGINS:
forecast: True
forecast_lead_time: 15
forecast_start_time: end_date # can either be “end_date” or a date in YYYY-MM-DD format
forecast_rule_curve_dir: /path/to/rule_curve
forecast_gen_start_date: end_date # can either be “end_date” or a date in YYYY-MM-DD format
forecast_gen_end_date: # a date in YYYY-MM-DD format (Optional)
forecast_rule_curve_dir: /path/to/rule_curve # (Optional)
forecast_reservoir_shpfile_column_dict: {column_id: GRAND_ID, column_capacity: CAP_MCM}
forecast_vic_init_state: # path of VIC state file or date in YYYY-MM-DD format (Optional)
forecast_rout_init_state: # path of Routing state file or date in YYYY-MM-DD format (Optional)
forecast_storage_scenario: [ST, GO, CO]
forecast_storage_change_percent_of_smax: [20, 10, 5]
```

The `forecast_start_time` option controls when the forecast will begin. If the value is set to `end_date`, the forecast will begin on the end date of RAT’s normal mode of running, i.e., in nowcast mode, which are controlled by `start_date` and `end_date` options in the BASIN section. Alternatively, a date in the YYYY-MM-DD format can also be provided to start the forecast from that date. The forecast window or the number of days ahead for which the forecast is generated is controlled by the `forecast_lead_time` option, with a maximum of 15 days ahead. The `forecast_rule_curve_dir` option should point to the directory containing the rule curve files for the reservoir. The `forecast_reservoir_shpfile_column_dict` option specifies the names of the columns that correspond to the ID and capacity of the reservoir, named `column_id` and `column_capacity`. These columns should be present in the [`reservoir_vector_file` in the GEE section](../../Configuration/rat_config/#gee) in the configuration file for running the forecasting plugin.
The `forecast_gen_start_date` option controls when forecast generation begins. If the value is set to `end_date`, forecast generation begins on the end date of RAT’s normal (nowcast) run, which is controlled by the `start_date` and `end_date` options in the BASIN section. Alternatively, a date in YYYY-MM-DD format can be provided to start forecast generation from that date. The `forecast_gen_end_date` option defines the last date for which forecasts will be generated. For each date on which a forecast is generated, the forecast window (the number of days ahead covered by the forecast) is controlled by the `forecast_lead_time` option, with a maximum of 15 days. The `forecast_rule_curve_dir` option should point to the directory containing the rule curve files for the reservoirs. The `forecast_reservoir_shpfile_column_dict` option specifies the names of the columns that correspond to the ID and capacity of the reservoir, named `column_id` and `column_capacity`. These columns should be present in the [`reservoir_vector_file` in the GEE section](../../Configuration/rat_config/#gee) of the configuration file used to run the forecasting plugin. The `forecast_vic_init_state` option can be either the date, in YYYY-MM-DD format, of the VIC init state file to use, or the path of the VIC init state file that the user wants to use. If a path is provided instead of a date, the state file is assumed to be for the `forecast_gen_start_date`. In that case, `forecast_rout_init_state` also becomes a required parameter and must be the actual path of the routing init state file.
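
The rule linking `forecast_vic_init_state` and `forecast_rout_init_state` can be sketched as a small check. This is only an illustration of the documented behaviour, not RAT's own code: the helper names `is_date` and `check_init_states` are hypothetical, and treating every non-date value as a file path is an assumption.

```python
from datetime import datetime


def is_date(value):
    """Return True if value parses as a YYYY-MM-DD date."""
    try:
        datetime.strptime(str(value), "%Y-%m-%d")
        return True
    except ValueError:
        return False


def check_init_states(vic_init_state=None, rout_init_state=None):
    """Hypothetical check mirroring the documented rule (not part of RAT)."""
    # Nothing given, or a date given: no extra requirement on the routing state.
    if vic_init_state is None or is_date(vic_init_state):
        return "date-or-default"
    # Assumption: anything that is not a date is treated as a file path.
    if rout_init_state is None or is_date(rout_init_state):
        raise ValueError(
            "forecast_rout_init_state must be a file path when "
            "forecast_vic_init_state is a file path"
        )
    return "paths"
```

With this sketch, `check_init_states("2024-08-01")` passes, while a VIC state path combined with a missing or date-valued routing state raises `ValueError`.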

!!!note
The files in the `forecast_rule_curve_dir` should be named according to their IDs, corresponding to the values in `column_id`. For instance, if the id of a reservoir is 7001, then the file name of the reservoir's rule curve should be `7001.txt`. The rule curve files should be in `.txt` format with the following columns - `Month,S/Smax`, corresponding to the month and the storage level as a fraction of the maximum storage level. Rule curve files for reservoirs in the [GRAND](https://www.globaldamwatch.org/grand) database can be downloaded [here](https://www.dropbox.com/scl/fi/jtquzasjdv2tz1vtupgq5/rc.zip?rlkey=4svbutjd3aup255pnrlgnxbkl&dl=0).
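
As a rough illustration of the rule curve format described in the note, the sketch below parses a `Month,S/Smax` file. It assumes the file is comma-separated with that exact header row and that months are given as integers 1 to 12; the function names and the sample values are made up for the example.

```python
import csv
import io


def rule_curve_filename(reservoir_id):
    """File name expected for a reservoir, e.g. 7001 -> '7001.txt'."""
    return f"{reservoir_id}.txt"


def read_rule_curve(text):
    """Parse rule-curve text with a `Month,S/Smax` header into {month: fraction}."""
    curve = {}
    for row in csv.DictReader(io.StringIO(text)):
        month = int(row["Month"])  # assumption: numeric months
        if not 1 <= month <= 12:
            raise ValueError(f"Month out of range: {month}")
        curve[month] = float(row["S/Smax"])
    return curve


# Illustrative values only, not taken from any real reservoir.
sample = "Month,S/Smax\n1,0.62\n2,0.58\n3,0.55\n"
curve = read_rule_curve(sample)
```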
31 changes: 21 additions & 10 deletions src/rat/core/run_metsim.py
@@ -38,17 +38,28 @@ def convert_to_vic_forcings(self, forcings_dir):
log.debug(f"Will create {len(years)} forcing files")
for year, ds, p in zip(years, dataset, paths):
if os.path.isfile(p):
existing = xr.open_dataset(p)#.load()
existing.close()

log.debug(f"Writing file for year {year}: {p} -- Updating existing")
# xr.merge([existing, ds], compat='override', join='outer').to_netcdf(p)
# xr.concat([existing, ds], dim='time').to_netcdf(p)
last_existing_time = existing.time[-1]
log.debug("Existing data: %s", last_existing_time)
# Open the existing NetCDF file
with xr.open_dataset(p) as existing:
last_existing_time = existing.time[-1]
log.debug(f"Writing file for year {year}: {p} -- Updating existing")
log.debug("Existing data: %s", last_existing_time)

# Load the data into memory before closing the dataset
existing_data = existing.sel(
time=slice(None, last_existing_time)
).load() # Load the data into memory

# Select the new data that needs to be appended
ds = ds.sel(time=slice(last_existing_time + np.timedelta64(6,'h'), ds.time[-1]))
#ds = ds.isel(time=slice(1, None))
xr.merge([existing, ds]).to_netcdf(p)

# Merge the existing data with the new data and save it back to the file
xr.merge([existing_data, ds]).to_netcdf(p)

# # Explicitly delete variables to free up memory
# del existing_data, ds

# # Manually trigger garbage collection to ensure memory is freed
# gc.collect()
else:
log.debug(f"Writing file for year {year}: {p} -- Updating new")
ds.to_netcdf(p)
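
The selection step in the new code can be tried in memory, without touching any NetCDF file. The sketch below is an assumption-laden illustration: the `prec` variable, the helper names, and the sample values are hypothetical, and `join` and `compat` are passed to `xr.merge` explicitly only to make the intended behaviour visible. It shows that only time steps later than the last existing one (offset by the 6-hour step) are appended, so existing forcing values are kept for overlapping times.

```python
import numpy as np
import xarray as xr

STEP = np.timedelta64(6, "h")  # 6-hourly step, as in the diff above


def make_forcing(start, n_steps, fill):
    """Build a tiny illustrative dataset on a 6-hourly time axis."""
    times = np.datetime64(start, "ns") + np.arange(n_steps) * STEP
    return xr.Dataset(
        {"prec": ("time", np.full(n_steps, fill))},
        coords={"time": times},
    )


def append_new_steps(existing, new):
    """Keep all existing steps and append only the later steps of `new`."""
    last_existing_time = existing.time.values[-1]
    tail = new.sel(time=slice(last_existing_time + STEP, None))
    return xr.merge([existing, tail], join="outer", compat="no_conflicts")


existing = make_forcing("2024-01-01T00", 4, 1.0)  # Jan 1: 00, 06, 12, 18
new = make_forcing("2024-01-01T12", 4, 2.0)       # Jan 1: 12, 18; Jan 2: 00, 06
merged = append_new_steps(existing, new)
```

In the actual change, the existing data is additionally loaded into memory inside the `with` block, so the file is closed before `to_netcdf` writes back to the same path.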
2 changes: 1 addition & 1 deletion src/rat/core/run_postprocessing.py
@@ -129,7 +129,7 @@ def calc_E(res_data, start_date, end_date, forcings_path, vic_res_path, sarea, s
# Concat the two dataframes into a new dataframe holding all the data (memory intensive):
complement = pd.concat([existing_data, new_data], ignore_index=True)
# Remove all duplicates:
complement.drop_duplicates(subset=['time'],inplace=True, keep='first')
complement.drop_duplicates(subset=['time'],inplace=True, keep='last')
complement.to_csv(savepath, index=False)
else:
data[['time', 'penman_E']].rename({'penman_E': 'OUT_EVAP'}, axis=1).to_csv(savepath, index=False)
2 changes: 1 addition & 1 deletion src/rat/core/run_routing.py
@@ -152,7 +152,7 @@ def generate_inflow(src_dir, dst_dir):
# Concat the two dataframes into a new dataframe holding all the data (memory intensive):
complement = pd.concat([existing_data, new_data], ignore_index=True)
# Remove all duplicates:
complement.drop_duplicates(subset=['date'], inplace=True, keep='first')
complement.drop_duplicates(subset=['date'], inplace=True, keep='last')
complement.sort_values(by='date', inplace=True)
complement.to_csv(outpath, index=False)
else:
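
The effect of switching `keep='first'` to `keep='last'` in the two hunks above can be seen on a toy example. This is a minimal sketch: the `streamflow` column, the `merge_outputs` helper, and the values are hypothetical, while the `concat`, `drop_duplicates`, and `sort_values` calls follow the diff.

```python
import pandas as pd


def merge_outputs(existing_data, new_data, key="date"):
    """Append new rows; for coinciding dates the new rows replace the old ones."""
    combined = pd.concat([existing_data, new_data], ignore_index=True)
    # keep='last' retains the row from new_data when a date appears in both.
    combined = combined.drop_duplicates(subset=[key], keep="last")
    return combined.sort_values(by=key).reset_index(drop=True)


existing = pd.DataFrame(
    {"date": ["2024-08-01", "2024-08-02", "2024-08-03"],
     "streamflow": [10.0, 11.0, 12.0]}
)
new = pd.DataFrame(
    {"date": ["2024-08-03", "2024-08-04"],
     "streamflow": [99.0, 13.0]}
)
merged = merge_outputs(existing, new)
```

Here the value for 2024-08-03 comes from `new`, which matches the documented behaviour that new outputs replace previous outputs on coinciding dates.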
4 changes: 2 additions & 2 deletions src/rat/core/run_vic.py
@@ -45,7 +45,7 @@ def generate_routing_input_state(self, ndays, rout_input_state_file, save_path,
first_existing_time = new_vic_output.time[0]
new_vic_output.close()

#Preprocessing function for merging netcdf files
# Preprocessing function for merging netcdf files: removes coinciding dates from the first file so that values from the latest file are used
def _remove_coinciding_days(ds, cutoff_time, ndays):
file_name = ds.encoding["source"]
file_stem = Path(file_name).stem
@@ -55,7 +55,7 @@ def _remove_coinciding_days(ds, cutoff_time, ndays):
return ds
remove_coinciding_days_func = partial(_remove_coinciding_days, cutoff_time=first_existing_time, ndays=ndays)

# Merging previous and new vic outputs
# Merge previous and new VIC outputs: take 365 days of data from the state file (previous VIC outputs) before the first date in the new VIC output, so coinciding data is removed from the state file.
try:
save_vic_output = xr.open_mfdataset([rout_input_state_file,self.vic_result],{'time':365}, preprocess=remove_coinciding_days_func)
save_vic_output.to_netcdf(save_path)