NAType values avoid using pandas features #227

meteoDaniel · 2020-11-18T16:51:02Z

If you collect data from the open data store like this:

climate_data = DWDObservationData(
    station_ids=[station_id],
    parameters=[DWDObservationParameterSet.PRECIPITATION],
    resolution=DWDObservationResolution.MINUTE_10,
    periods=[DWDObservationPeriod.HISTORICAL],
    humanize_column_names=True
    ).collect_safe()

And you try to work with the values:

climate_data.VALUE.astype(float) 
>>> TypeError: float() argument must be a string or a number, not 'NAType'

or

climate_data.pivot_table(values='VALUE', columns='ELEMENT', index='DATE')
>>>DataError: No numeric types to aggregate

Both errors come from a DataFrame that contains NAType values. Why do you do that? I have never seen something like that before. It makes the DataFrame really unattractive to use. If you are introducing such a type, I am asking myself why you provide the data as a DataFrame at all, because several benefits are gone if you cannot work with the data due to a suspicious NA type in a value column.

Btw.: You have made several breaking changes in the last weeks but only bumped the minor version by one. Versions are normally structured as Major.Minor.Patch. Major releases include breaking changes, meaning that main functionality is deprecated or changed. And keep in mind that shipping breaking changes every one or two months will annoy users.

kmuehlbauer · 2020-11-18T17:40:49Z

related upstream issue

kmuehlbauer · 2020-11-18T17:51:51Z

> Btw.: You have made several breaking changes in the last weeks but only bumped the minor version by one. Versions are normally structured as Major.Minor.Patch. Major releases include breaking changes, meaning that main functionality is deprecated or changed. And keep in mind that shipping breaking changes every one or two months will annoy users.

As long as Major is 0, the API isn't supposed to be stable (https://semver.org/#spec-item-4). Just my 2c.

amotl · 2020-11-18T18:31:10Z

Dear Daniel,

thanks for reporting this.

NAType issue

@kmuehlbauer: Thanks for quickly referencing the upstream issue pandas-dev/pandas#37626, you are a treasure of gold.
@meteoDaniel: I hear you, this is annoying.

So, shall we also apply the .astype("category") workaround somewhere, as suggested by @mlondschien?

In [4]: pd.Series(["1", "2", pd.NA], dtype="string").astype("category").astype("float")  # magic workaround
Out[4]: 
0    1.0
1    2.0
2    NaN
dtype: float64
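
Another way to get plain floats is sketched below, assuming the VALUE column is an object-dtype Series holding pd.NA. The `values` Series is a made-up stand-in, not real station data. `Series.to_numpy` accepts an `na_value`, so pd.NA can be swapped for np.nan during the conversion.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the VALUE column (assumption: object dtype holding pd.NA).
values = pd.Series([0.5, 1.2, pd.NA], dtype="object")

try:
    values.astype(float)
except (TypeError, ValueError) as exc:
    # NumPy cannot turn pd.NA into a float on its own.
    print(f"astype(float) failed: {exc}")

# Replace pd.NA with np.nan while converting to a plain float64 array.
as_float = pd.Series(
    values.to_numpy(dtype="float64", na_value=np.nan),
    index=values.index,
    name="VALUE",
)
print(as_float.dtype)
```

The result is an ordinary float64 Series, so the usual NaN-aware pandas operations apply to it.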

Versioning and stability

> As long as Major is 0 API isn't supposed to be stable.

That's true, we are still in beta mode here. For that reason, even Pandas switched to version 1.0.0 only recently, after being on 0.x.x for a very long time.

That said, we do not intend to break things on purpose, but moving forward on different levels is still required to fulfill our agenda. So, please bear with us.

> And keep in mind that providing breaking changes every one or two months will annoy users.

The most important thing is to provide releases. That way, people are able to pin their dependencies to e.g. wetterdienst==0.10.1.

With kind regards,
Andreas.

P.S.: Still, we might want to adjust our examples accordingly. Ping, @gutzbenj ;].

meteoDaniel · 2020-11-18T18:44:38Z

> Btw.: You have made several breaking changes in the last weeks but only bumped the minor version by one. Versions are normally structured as Major.Minor.Patch. Major releases include breaking changes, meaning that main functionality is deprecated or changed. And keep in mind that shipping breaking changes every one or two months will annoy users.

> As long as Major is 0, the API isn't supposed to be stable (https://semver.org/#spec-item-4). Just my 2c.

Thanks for introducing this :)

meteoDaniel · 2020-11-18T18:48:05Z

@amotl I checked out the example: .dropna is applied there, and afterwards all is fine. So NAType is detected as NaN and is dropped. np.nan in pandas is treated as a float, and that makes it really handy to work with. I am pretty sure there was a reason for using NAType, so I would like to understand why.

kmuehlbauer · 2020-11-18T18:49:58Z

> P.S.: Still, we might want to adjust our examples accordingly.

@amotl So, does that mean, you do not test the examples in your GitHub Actions?

gutzbenj · 2020-11-18T18:55:27Z

@kmuehlbauer the notebook ("climate_observations") is executed in full once per test run. But you are right, the regular ".py" examples should be executed once as well to ensure that the API is still usable.
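
A rough sketch of how such a check could look, assuming the example scripts live in a single directory and can run non-interactively. The helper name `run_examples` and the directory layout are hypothetical, not the project's actual test setup.

```python
import subprocess
import sys
from pathlib import Path


def run_examples(example_dir):
    """Run every *.py file in example_dir and return the names of failing scripts."""
    failures = []
    for script in sorted(Path(example_dir).glob("*.py")):
        # Each example runs in a fresh interpreter so state cannot leak between scripts.
        result = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            failures.append(script.name)
    return failures
```

A test could then assert that `run_examples("example")` returns an empty list, where "example" is a placeholder path.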

kmuehlbauer · 2020-11-18T19:03:18Z

@gutzbenj Yes, that would be worth setting up.

gutzbenj · 2020-11-18T19:09:49Z

> @amotl I checked out the example: .dropna is applied there, and afterwards all is fine. So NAType is detected as NaN and is dropped. np.nan in pandas is treated as a float, and that makes it really handy to work with. I am pretty sure there was a reason for using NAType, so I would like to understand why.

We have certain situations where the actual value is an integer (or even a string). Currently pandas does

- allow type conversion to float for arrays that contain NaNs
- not allow type conversion to integer for arrays that contain NaNs

For this purpose we have used pandas' internal dtypes, and that would be my guess for the error you are facing here. However, your example is missing a station id. I'd like to understand the error you are facing, but for that I need the FULL example including the station id.
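
The two conversion cases listed above can be sketched as follows, using a made-up Series rather than actual DWD data. The nullable "Int64" dtype is shown as one way to keep integers alongside missing values. Whether it matches the dtypes wetterdienst sets internally is an assumption.

```python
import numpy as np
import pandas as pd

# Made-up data: integer-like values with one missing entry.
raw = pd.Series([1.0, 2.0, np.nan])

# Conversion to float works, NaN is a valid float value.
as_float = raw.astype("float64")

# Conversion to a NumPy integer dtype fails, because NaN has no integer representation.
try:
    raw.astype("int64")
    int_cast_failed = False
except ValueError as exc:
    int_cast_failed = True
    print(f"astype('int64') failed: {exc}")

# The nullable extension dtype stores the gap as pd.NA instead.
as_nullable_int = raw.astype("Int64")
print(as_nullable_int)
```

This is the trade-off behind NAType showing up: the integer dtype survives, but missing entries are pd.NA rather than np.nan.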

meteoDaniel · 2020-11-19T07:23:06Z

@gutzbenj I am sorry, I thought I had added one: station 44, for example. What I am thinking about is converting NAType to floating-point NaN before returning the DataFrame?

Okay, pandas says it uses numpy.nan as the default for missing values but provides its own NA types? Sometimes I ask myself why we all use pandas :D There is so much bad stuff in it.

gutzbenj · 2020-11-28T11:51:30Z

I had another look at this problem within #260 and concluded that, at least at the moment, there is no solution. To be honest, I also don't see a problem that we could fix within our scope. Firstly, changing the column's dtype is not recommended, as we have already set it to a meaningful dtype during type coercion. If you still want to do that, pd.to_numeric() should be a better option. Secondly, the problems with .pivot_table() are seen elsewhere and do not seem to be related to NAType.
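
A minimal sketch of the pd.to_numeric() route, assuming a long-format frame with DATE, ELEMENT and VALUE columns where VALUE is object dtype and contains pd.NA. The frame and the element names "A" and "B" are made up for illustration. Note that errors="coerce" turns anything unparseable into NaN, so it can hide bad input.

```python
import pandas as pd

# Made-up long-format frame; VALUE is object dtype and contains pd.NA.
df = pd.DataFrame(
    {
        "DATE": pd.to_datetime(
            ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"]
        ),
        "ELEMENT": ["A", "B", "A", "B"],
        "VALUE": pd.Series([1.0, 2.0, pd.NA, 4.0], dtype="object"),
    }
)

# Convert to float64; pd.NA and unparseable entries become NaN.
df["VALUE"] = pd.to_numeric(df["VALUE"], errors="coerce")

# With a numeric VALUE column the default aggregation (mean) can run.
table = df.pivot_table(values="VALUE", columns="ELEMENT", index="DATE")
print(table)
```

After the conversion the column is plain float64, which is what the aggregation in pivot_table expects.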

simonsays1980 · 2023-04-06T17:03:45Z

This is still an issue on pandas==2.0.0 and pandas==1.3.5:

from wetterdienst.provider.dwd.observation import (
    DwdObservationRequest, 
    DwdObservationDataset, 
    DwdObservationPeriod, 
    DwdObservationResolution
)
from wetterdienst import Settings


settings = Settings(ts_shape="long", ts_humanize=True, ts_si_units=False)

parameters = [
    DwdObservationDataset.TEMPERATURE_AIR, # Dry_Bulb
    DwdObservationDataset.DEW_POINT,
    DwdObservationDataset.PRESSURE,
    DwdObservationDataset.PRECIPITATION,
    DwdObservationDataset.SOLAR,
    DwdObservationDataset.WIND,
    DwdObservationDataset.SUN,
    DwdObservationDataset.CLOUDINESS,
    DwdObservationDataset.WEATHER_PHENOMENA
]
request = DwdObservationRequest(
    parameter=parameters,
    resolution=DwdObservationResolution.MINUTE_10,
    start_date="2019-01-01",
    end_date="2020-01-01",
    settings=settings,
)

stations = request.filter_by_station_id(station_id=(2968,))

weather_df = stations.values.all()

I am now pulling the data via plain Python requests.

gutzbenj · 2023-04-07T20:31:40Z

@simonsays1980 what exactly is the issue you are facing there? I'm currently working on migrating wetterdienst to polars (#904), which would bring relief in this missing-values situation!

This was referenced Nov 22, 2020

Add tests for examples #240

Merged

Follow up on improvements to wradlib v1.9.0 #251

Closed

gutzbenj closed this as completed Dec 3, 2020
