Fix numbering of extremes for havengetallen #101

Open
2 tasks
veenstrajelmer opened this issue Jun 25, 2024 · 1 comment

Comments

@veenstrajelmer
Collaborator

veenstrajelmer commented Jun 25, 2024

Fix "Exception: tidal wave numbering: HW numbers not always increasing", which occurs at least for HANSWT, BROUWHVSGT08, PETTZD and DORDT. This might no longer be relevant if we remove the moonculminations dependency, since the error probably comes from matching culminations to extremes. Note that havengetallen are also computed in kw.calc_gemiddeldgetij() when scaling is used.

import os
import hatyan
import kenmerkendewaarden as kw

dir_base = r'p:\11210325-005-kenmerkende-waarden\work'
dir_meas = os.path.join(dir_base,'measurements_wl_18700101_20240101')

data_pd_HWLW_all = kw.read_measurements(dir_output=dir_meas, station='HANSWT', extremes=True)
data_pd_HWLW_all_12 = hatyan.calc_HWLW12345to12(data_pd_HWLW_all) #convert 12345 to 12 by taking minimum of 345 as 2 (laagste laagwater)

df_havengetallen = kw.calc_havengetallen(df_ext=data_pd_HWLW_all_12.loc["2011":"2020"], return_df_ext=False)

#same error with
extended = hatyan.calc_HWLWnumbering(data_pd_HWLW_all_12.loc["2011":"2020"])
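
For illustration, the sketch below shows one way such an exception can arise. It is not hatyan's implementation: it assumes, purely hypothetically, that each high water is numbered by rounding its time offset from the first high water to the nearest whole M2 period (about 12.42 hours). The names `number_high_waters` and `M2_PERIOD_HOURS` are made up for this example. The times are the HANSWT high waters around 2020-04-01 listed in the comment further down.

```python
import numpy as np
import pandas as pd

# Assumption: mean M2 tidal period in hours, used only for this illustration.
M2_PERIOD_HOURS = 12.4206012


def number_high_waters(times: pd.DatetimeIndex) -> pd.Series:
    """Hypothetical numbering: whole M2 periods elapsed since the first high water."""
    elapsed_hours = (times - times[0]) / pd.Timedelta(hours=1)
    numbers = np.rint(np.asarray(elapsed_hours, dtype=float) / M2_PERIOD_HOURS).astype(int)
    return pd.Series(numbers, index=times, name="number")


# High waters (HWLWcode == 1) for HANSWT, including the pair 07:32 / 07:35.
hw_times = pd.to_datetime([
    "2020-03-30 06:11:00+01:00",
    "2020-03-30 18:36:00+01:00",
    "2020-03-31 06:45:00+01:00",
    "2020-03-31 19:15:00+01:00",
    "2020-04-01 07:32:00+01:00",
    "2020-04-01 07:35:00+01:00",
    "2020-04-01 20:02:00+01:00",
])

numbers = number_high_waters(hw_times)
# Strictly increasing means every step between consecutive numbers is positive.
strictly_increasing = bool((numbers.diff().dropna() > 0).all())
print(numbers)
print("strictly increasing:", strictly_increasing)
```

Under this assumed scheme, the two high waters three minutes apart receive the same number, so the sequence is no longer strictly increasing.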

Check for other stations
Code to check issues on a larger scale:

import os
import hatyan
import kenmerkendewaarden as kw

dir_base = r'p:\11210325-005-kenmerkende-waarden\work'
# dir_base = r'p:\11210325-005-kenmerkende-waarden\work\_backup_20240823'
dir_meas = os.path.join(dir_base,'measurements_wl_18700101_20240101')

station_list = ['A12','AWGPFM','BAALHK','BATH','BERGSDSWT','BROUWHVSGT02','BROUWHVSGT08','GATVBSLE','BRESKVHVN','CADZD',
                'D15','DELFZL','DENHDR','EEMSHVN','EURPFM','F16','F3PFM','HARVT10','HANSWT','HARLGN','HOEKVHLD','HOLWD','HUIBGT',
                'IJMDBTHVN','IJMDSMPL','J6','K13APFM','K14PFM','KATSBTN','KORNWDZBTN','KRAMMSZWT','L9PFM','LAUWOG','LICHTELGRE',
                'MARLGT','NES','NIEUWSTZL','NORTHCMRT','DENOVBTN','OOSTSDE04','OOSTSDE11','OOSTSDE14','OUDSD','OVLVHWT','Q1',
                'ROOMPBNN','ROOMPBTN','SCHAARVDND','SCHEVNGN','SCHIERMNOG','SINTANLHVSGR','STAVNSE','STELLDBTN','TERNZN','TERSLNZE','TEXNZE',
                'VLAKTVDRN','VLIELHVN','VLISSGN','WALSODN','WESTKPLE','WESTTSLG','WIERMGDN','YERSKE']
# station_list = ['DORDT', 'MAASMSMPL', 'PETTZD', 'ROTTDM']
tidalwavenumbers = []
for station in station_list:
    print(f"processing {station}")
    
    data_pd_HWLW_all = kw.read_measurements(dir_output=dir_meas, station=station, extremes=True, drop_duplicates=True)
    if data_pd_HWLW_all is None:
        print('no data file')
        continue
    data_pd_HWLW_all_12 = hatyan.calc_HWLW12345to12(data_pd_HWLW_all) #convert 12345 to 12 by taking minimum of 345 as 2 (laagste laagwater)
    
    df_ext = data_pd_HWLW_all_12.loc["2000":"2020"]
    if len(df_ext) == 0:
        print('no data in selected period')
        continue
    
    try:
        df_havengetallen = kw.calc_havengetallen(df_ext=df_ext, return_df_ext=False)
    except Exception as e:
        print(e)
        tidalwavenumbers.append(station)
        
    #same error with calc_HWLWnumbering() (after converting the timezone first)
    # extended = hatyan.calc_HWLWnumbering(data_pd_HWLW_all_12.loc["2011":"2020"])
print(tidalwavenumbers)

For the period 2000-2020 this prints: ['BROUWHVSGT08', 'HANSWT', 'IJMDBTHVN', 'DENOVBTN']. Previously the list also included ['DORDT', 'PETTZD'], but the data for these stations is no longer included in the download.

Todo:

Also happens with valid-only data
This also happens for IJMDBTHVN, for which the data is clean but the high water is asymmetric. It only becomes an issue after shifting the timezone, but it originates in hatyan.calc_HWLWnumbering(). So even if kenmerkendewaarden moves to another way of matching culminations and extremes, we will probably want to solve this in hatyan. A follow-up issue has been created there: Deltares/hatyan#329

Alternative
We could consider avoiding the numbering of extremes altogether by using one of the alternatives described in #174

@veenstrajelmer
Collaborator Author

veenstrajelmer commented Jun 25, 2024

HANSWT contains an almost-duplicate timestep on 2020-04-01:

                           values qualitycode         status  HWLWcode
time                                                                  
2020-03-30 00:05:00+01:00   -2.35          00  Gecontroleerd         2
2020-03-30 06:11:00+01:00    2.20          00  Gecontroleerd         1
2020-03-30 12:15:00+01:00   -2.28          00  Gecontroleerd         2
2020-03-30 18:36:00+01:00    2.25          00  Gecontroleerd         1
2020-03-31 00:45:00+01:00   -2.03          00  Gecontroleerd         2
2020-03-31 06:45:00+01:00    2.05          00  Gecontroleerd         1
2020-03-31 12:46:00+01:00   -2.33          00  Gecontroleerd         2
2020-03-31 19:15:00+01:00    1.86          00  Gecontroleerd         1
2020-04-01 01:20:00+01:00   -2.11          00  Gecontroleerd         2
2020-04-01 07:32:00+01:00    1.96          00  Gecontroleerd         1
2020-04-01 07:35:00+01:00    1.96          00  Gecontroleerd         1
2020-04-01 13:45:00+01:00   -1.90          00  Gecontroleerd         2
2020-04-01 20:02:00+01:00    1.94          00  Gecontroleerd         1

BROUWHVSGT08 has almost-duplicate timesteps on 2015-01-01:

                           values qualitycode           status  HWLWcode
time                                                                    
2015-01-01 04:11:00+01:00   -0.86          00  Ongecontroleerd         2
2015-01-01 05:33:00+01:00   -0.84          00    Gecontroleerd         2
2015-01-01 10:48:00+01:00    1.08          00    Gecontroleerd         1
2015-01-01 10:51:00+01:00    1.05          00    Gecontroleerd         1
2015-01-01 17:18:00+01:00   -1.07          00    Gecontroleerd         2
2015-01-01 17:30:00+01:00   -1.03          00    Gecontroleerd         2
2015-01-01 23:15:00+01:00    1.12          00    Gecontroleerd         1

After removing these, the algorithm runs successfully. These issues were already reported in Rijkswaterstaat/wm-ws-dl#43
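
A minimal sketch of how such almost-duplicate timesteps could be located automatically, assuming they show up as two consecutive extremes with the same HWLWcode within a short interval. The helper `find_repeated_extremes` and the 3-hour threshold are assumptions for this example, not part of hatyan or kenmerkendewaarden. The data is an excerpt of the HANSWT table above.

```python
import pandas as pd


def find_repeated_extremes(df_ext: pd.DataFrame,
                           max_gap: pd.Timedelta = pd.Timedelta(hours=3)) -> pd.DataFrame:
    """Return rows that repeat the HWLWcode of the previous row within max_gap."""
    same_code = df_ext["HWLWcode"].eq(df_ext["HWLWcode"].shift())
    gap = df_ext.index.to_series().diff()
    return df_ext.loc[same_code & (gap < max_gap)]


# Excerpt of the HANSWT extremes shown above.
times = pd.to_datetime([
    "2020-03-31 19:15:00+01:00",
    "2020-04-01 01:20:00+01:00",
    "2020-04-01 07:32:00+01:00",
    "2020-04-01 07:35:00+01:00",
    "2020-04-01 13:45:00+01:00",
    "2020-04-01 20:02:00+01:00",
])
df_ext = pd.DataFrame(
    {"values": [1.86, -2.11, 1.96, 1.96, -1.90, 1.94],
     "HWLWcode": [1, 2, 1, 1, 2, 1]},
    index=times,
)

flagged = find_repeated_extremes(df_ext)
print(flagged)
```

The helper always flags the second row of each pair, which is not necessarily the row to drop (for BROUWHVSGT08 the first, unchecked value at 04:11 is the suspect one), so the result is best treated as a list of candidates for manual review.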

More stations:

import os
import hatyan
hatyan.close("all")
import kenmerkendewaarden as kw # pip install git+https://github.com/Deltares-research/kenmerkendewaarden

dir_base = r'p:\11210325-005-kenmerkende-waarden\work'
dir_meas = os.path.join(dir_base,'measurements_wl_18700101_20240101')
# dir_meas = r"c:\Users\veenstra\Downloads\measurements_wl_18700101_20240101"

# station_list = ['A12','AWGPFM','BAALHK','BATH','BERGSDSWT','BROUWHVSGT02','BROUWHVSGT08','GATVBSLE','BRESKVHVN','CADZD',
#                 'D15','DELFZL','DENHDR','EEMSHVN','EURPFM','F16','F3PFM','HARVT10','HANSWT','HARLGN','HOEKVHLD','HOLWD','HUIBGT',
#                 'IJMDBTHVN','IJMDSMPL','J6','K13APFM','K14PFM','KATSBTN','KORNWDZBTN','KRAMMSZWT','L9PFM','LAUWOG','LICHTELGRE',
#                 'MARLGT','NES','NIEUWSTZL','NORTHCMRT','DENOVBTN','OOSTSDE04','OOSTSDE11','OOSTSDE14','OUDSD','OVLVHWT','Q1',
#                 'ROOMPBNN','ROOMPBTN','SCHAARVDND','SCHEVNGN','SCHIERMNOG','SINTANLHVSGR','STAVNSE','STELLDBTN','TERNZN','TERSLNZE','TEXNZE',
#                 'VLAKTVDRN','VLIELHVN','VLISSGN','WALSODN','WESTKPLE','WESTTSLG','WIERMGDN','YERSKE']

# almost-duplicate timesteps should still be defined for all of these stations
station_list = [
                # 'CADZD',
                'DELFZL',
                # 'DENHDR',
                # 'HOLWD',
                # 'K13APFM',
                # 'KORNWDZBTN',
                # 'KRAMMSZWT',
                # 'NIEUWSTZL', # duplicate timesteps
                # 'DENOVBTN',
                # 'ROOMPBNN', # duplicate timesteps
                # 'STAVNSE',
                # 'STELLDBTN', # duplicate timesteps
                # 'TERNZN',
                # 'VLAKTVDRN',
                # 'VLIELHVN',
                # 'VLISSGN', # indexError
                # 'WESTKPLE',
                ]

# TODO: also for DORDT and PETTZD, but data was not downloaded

list_fails = []
for station in station_list:
    print(f"processing {station}")
    data_pd_HWLW_all = kw.read_measurements(dir_output=dir_meas, station=station, extremes=True)
    if data_pd_HWLW_all is None:
        print("no measurement found for this station")
        continue
    
    if station=="HANSWT":
        drop_list = ["2020-04-01 07:35:00+01:00"]
    elif station=="BROUWHVSGT08":
        drop_list = ["2015-01-01 04:11:00+01:00",
                     "2015-01-01 10:51:00+01:00",
                     "2015-01-01 17:30:00+01:00"]
    elif station == "BERGSDSWT":
        drop_list = ["1996-07-05 01:09:00+01:00",
                     "1992-01-01 00:50:00+01:00"]
    elif station=="CADZD":
        drop_list = ["1993-01-01 00:30:00+01:00"]
        # TODO: more issues
    else:
        drop_list = []
    
    data_pd_HWLW_all = data_pd_HWLW_all.drop(drop_list)
    
    # data_pd_HWLW_all = data_pd_HWLW_all.loc["1960":"1965"]
    
    # convert 12345 to 12 by taking minimum of 345 as 2 (laagste laagwater)
    data_pd_HWLW_all_12 = hatyan.calc_HWLW12345to12(data_pd_HWLW_all)
    
    # hatyan.plot_timeseries(ts=data_pd_HWLW_all_12, ts_ext=data_pd_HWLW_all_12)
    
    # df_havengetallen = kw.calc_havengetallen(df_ext=data_pd_HWLW_all_12.loc["2011":"2020"], return_df_ext=False)
    extended = hatyan.calc_HWLWnumbering(data_pd_HWLW_all_12)
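
As a possible alternative to the if/elif chain in the script above, the known bad timestamps could be kept in a dictionary keyed by station. This is only a sketch: `DROP_TIMES` and `drop_known_duplicates` are hypothetical names, and `errors="ignore"` is a deliberate change in behavior, since timestamps that are absent from the data are then skipped silently instead of raising a KeyError.

```python
import pandas as pd

# Timestamps taken from the drop lists in the script above.
DROP_TIMES = {
    "HANSWT": ["2020-04-01 07:35:00+01:00"],
    "BROUWHVSGT08": ["2015-01-01 04:11:00+01:00",
                     "2015-01-01 10:51:00+01:00",
                     "2015-01-01 17:30:00+01:00"],
    "BERGSDSWT": ["1996-07-05 01:09:00+01:00",
                  "1992-01-01 00:50:00+01:00"],
    "CADZD": ["1993-01-01 00:30:00+01:00"],
}


def drop_known_duplicates(df_ext: pd.DataFrame, station: str) -> pd.DataFrame:
    """Drop the known almost-duplicate timesteps for a station, if any are listed."""
    drop_times = pd.to_datetime(DROP_TIMES.get(station, []))
    return df_ext.drop(drop_times, errors="ignore")


# Small example with three HANSWT extremes, one of which is listed above.
example = pd.DataFrame(
    {"values": [1.96, 1.96, -1.90], "HWLWcode": [1, 1, 2]},
    index=pd.to_datetime(["2020-04-01 07:32:00+01:00",
                          "2020-04-01 07:35:00+01:00",
                          "2020-04-01 13:45:00+01:00"]),
)
cleaned = drop_known_duplicates(example, "HANSWT")
unchanged = drop_known_duplicates(example, "DELFZL")
print(cleaned)
```

Stations without an entry are returned unchanged, so the loop body would not need a separate else branch.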

These data issues are reported in Rijkswaterstaat/wm-ws-dl#43
