Clean up duplicates for EURPFM extremes #23

Open
veenstrajelmer opened this issue Apr 3, 2024 · 0 comments

When retrieving extremes for EURPFM, the returned dataset contains duplicate rows:

import datetime as dt
import ddlpy

locations = ddlpy.locations()
bool_hoedanigheid = locations['Hoedanigheid.Code'].isin(['NAP'])
bool_stations = locations.index.isin(['EURPFM'])
bool_grootheid = locations['Grootheid.Code'].isin(['WATHTE'])
bool_groepering = locations['Groepering.Code'].isin(['GETETM2'])
selected = locations.loc[bool_grootheid & bool_hoedanigheid & bool_groepering & bool_stations]

# multiple parameters can be available per location, so pick the first row
records = selected.iloc[0]

# passing one row to the measurements function returns all measurements for that row
measurements = ddlpy.measurements(records, dt.datetime(2012,12,31,7,35,0), dt.datetime(2013,1,1,18,26,0), clean_df=False)

print("measurements")
print(measurements)
print()
print("measurements without duplicates")
print(measurements.drop_duplicates())

This prints the following. The raw data contains every row twice (including status, meetapparaat etc., since all of these are present in this dataframe):

measurements
                          WaarnemingMetadata.StatuswaardeLijst  ...             Y
time                                                            ...              
2012-12-31 09:35:00+01:00                        Gecontroleerd  ...  5.760829e+06
2012-12-31 09:35:00+01:00                        Gecontroleerd  ...  5.760829e+06
2012-12-31 15:44:00+01:00                        Gecontroleerd  ...  5.760829e+06
2012-12-31 15:44:00+01:00                        Gecontroleerd  ...  5.760829e+06
2012-12-31 20:50:00+01:00                        Gecontroleerd  ...  5.760829e+06
2012-12-31 20:50:00+01:00                        Gecontroleerd  ...  5.760829e+06
2013-01-01 04:04:00+01:00                        Gecontroleerd  ...  5.760829e+06
2013-01-01 04:04:00+01:00                        Gecontroleerd  ...  5.760829e+06
2013-01-01 09:34:00+01:00                        Gecontroleerd  ...  5.760829e+06
2013-01-01 09:34:00+01:00                        Gecontroleerd  ...  5.760829e+06
2013-01-01 16:26:00+01:00                        Gecontroleerd  ...  5.760829e+06
2013-01-01 16:26:00+01:00                        Gecontroleerd  ...  5.760829e+06
[12 rows x 54 columns]

measurements without duplicates
                          WaarnemingMetadata.StatuswaardeLijst  ...             Y
time                                                            ...              
2012-12-31 09:35:00+01:00                        Gecontroleerd  ...  5.760829e+06
2012-12-31 15:44:00+01:00                        Gecontroleerd  ...  5.760829e+06
2012-12-31 20:50:00+01:00                        Gecontroleerd  ...  5.760829e+06
2013-01-01 04:04:00+01:00                        Gecontroleerd  ...  5.760829e+06
2013-01-01 09:34:00+01:00                        Gecontroleerd  ...  5.760829e+06
2013-01-01 16:26:00+01:00                        Gecontroleerd  ...  5.760829e+06
[6 rows x 54 columns]

This is also the case for other timestamps.
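
As a possible workaround until the duplicates are cleaned up, the sketch below drops rows only when both the timestamp and every column value are repeated. It is not part of ddlpy: the helper name `drop_exact_duplicates` and the toy dataframe (column names and values) are made up for illustration. The reason for not relying on a bare `drop_duplicates()` is that pandas compares column values only and ignores the index, so two different timestamps with identical columns would also be collapsed.

```python
import pandas as pd


def drop_exact_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows that repeat both the index value and all column values.

    Hypothetical helper for illustration, not a ddlpy function.
    """
    # Move the index (e.g. "time") into a column so it takes part in the comparison
    flat = df.reset_index()
    # True for the first occurrence of each (index, columns) combination
    keep = ~flat.duplicated(keep="first")
    # Select by position so that repeated index labels are handled correctly
    return df.loc[keep.to_numpy()]


# Toy data that mimics the structure of the issue (values are invented)
times = pd.to_datetime(
    ["2012-12-31 09:35", "2012-12-31 09:35", "2012-12-31 15:44", "2012-12-31 15:44"]
)
toy = pd.DataFrame(
    {"status": ["Gecontroleerd"] * 4, "value": [1.0, 1.0, -0.5, -0.5]},
    index=pd.Index(times, name="time"),
)

cleaned = drop_exact_duplicates(toy)
n_removed = len(toy) - len(cleaned)
print(cleaned)
print(f"removed {n_removed} duplicate rows")
```

With real data, the same helper could be applied to the `measurements` dataframe from the snippet above. Rows that share a timestamp but differ in any column are kept, which makes such conflicts visible instead of silently dropping one of them.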

veenstrajelmer changed the title from "Clean up duplicates for EURPFM" to "Clean up duplicates for EURPFM extremes" on Apr 3, 2024