NAType values avoid using pandas features #227
Comments
related upstream issue |
As long as Major is |
Dear Daniel, thanks for reporting this.
NAType issue
@kmuehlbauer: Thanks for quickly referencing the upstream issue pandas-dev/pandas#37626, you are a treasure of gold. So, shall we also apply to
In [4]: pd.Series(["1", "2", pd.NA], dtype="string").astype("category").astype("float")  # magic workaround
Out[4]:
0    1.0
1    2.0
2    NaN
dtype: float64
Versioning and stability
That's true, we are still in beta mode here. For comparison, even pandas switched to version 1.0.0 only recently, after being on 0.x.x for a very long time. That said, we do not intend to break things on purpose, but moving forward on different levels is still required to fulfill our agenda. So, please bear with us.
The most important thing is to provide releases. In this manner, people will be able to pin their dependencies to e.g.
With kind regards,
P.S.: Still, we might want to adjust our examples accordingly. Ping, @gutzbenj ;]. |
Thanks for introducing this :) |
@amotl I checked out the example and there |
@kmuehlbauer the notebook ("climate_observations") is executed in full once per test run. But you are right, the regular ".py" examples should be executed once as well, to ensure that the API is still usable. |
@gutzbenj Yes, that would be worth setting up. |
We have certain situations where the actual value is an integer (or even a string). Currently pandas does
For this purpose we have used pandas' internal dtypes, and that would be my guess for the error you are facing here. However, your example is missing a station id. I'd like to understand the error you are facing, but for that I need the FULL example including the station id. |
@gutzbenj I am sorry, I thought I had added one: the 44 e.g. . What I am thinking about is to parse NAType to a floating-point NaN before returning the DataFrame. Okay, pandas says they use numpy.nan as the default for missing values, yet they provide their own NA types? Sometimes I ask myself why we all use pandas :D There is so much bad stuff in it. |
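The idea above of turning NAType into a floating-point NaN before handing out the DataFrame could look roughly like the following. This is only a minimal sketch, not what wetterdienst does: the helper name `value_column_to_float`, the column names and the sample values are made up for illustration, and it assumes a pandas version that supports the nullable "Float64" dtype (1.2 or later).

```python
import numpy as np
import pandas as pd


def value_column_to_float(df: pd.DataFrame, column: str = "value") -> pd.DataFrame:
    """Return a copy of df whose `column` is plain float64, with pd.NA mapped to NaN.

    Hypothetical helper for illustration; not part of wetterdienst.
    """
    out = df.copy()
    # to_numpy() with na_value replaces every masked entry (pd.NA) by np.nan
    out[column] = out[column].to_numpy(dtype="float64", na_value=np.nan)
    return out


# Made-up sample data using a nullable extension dtype, so the gap is pd.NA
df = pd.DataFrame(
    {
        "parameter": ["temperature_air", "temperature_air", "humidity"],
        "value": pd.array([1.5, None, 80.0], dtype="Float64"),
    }
)

converted = value_column_to_float(df)
print(converted.dtypes)
```

After the conversion the value column behaves like any NumPy-backed float column, so functions that do not understand pd.NA see an ordinary NaN instead. The original frame is left untouched because the helper works on a copy.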
I had another look at this problem within #260, but I figured that there is currently no solution. To be honest, I also don't see a problem that we could change within our scope. Firstly, changing the column's dtype is not recommended, as we have already set it to a meaningful dtype within the type coercion. If you still want to do that, pd.to_numeric() should be a better option. Secondly, the problems with .pivot_table() are seen elsewhere and do not seem to be related to the NAType. |
This is still an issue on pandas==2.0.0 and pandas==1.3.5:
from wetterdienst.provider.dwd.observation import (
    DwdObservationRequest,
    DwdObservationDataset,
    DwdObservationPeriod,
    DwdObservationResolution,
)
from wetterdienst import Settings

settings = Settings(ts_shape="long", ts_humanize=True, ts_si_units=False)
parameters = [
    DwdObservationDataset.TEMPERATURE_AIR,  # Dry_Bulb
    DwdObservationDataset.DEW_POINT,
    DwdObservationDataset.PRESSURE,
    DwdObservationDataset.PRECIPITATION,
    DwdObservationDataset.SOLAR,
    DwdObservationDataset.WIND,
    DwdObservationDataset.SUN,
    DwdObservationDataset.CLOUDINESS,
    DwdObservationDataset.WEATHER_PHENOMENA,
]
request = DwdObservationRequest(
    parameter=parameters,
    resolution=DwdObservationResolution.MINUTE_10,
    start_date="2019-01-01",
    end_date="2020-01-01",
    settings=settings,
)
stations = request.filter_by_station_id(station_id=(2968))
weather_df = stations.values.all()
I am now pulling the data via plain Python requests instead. |
@simonsays1980 what exactly is the issue you are facing there? I'm currently working on migrating wetterdienst to polars #904 which would bring relief in this missing values situation! |
If you collect data from opendata store like this:
And you try to work with the values:
or
This yields a DataFrame that contains NAType. Why do you do that? I have never seen something like that before. It makes the DataFrame really unattractive to use. If you are introducing such a type, I ask myself why you provide the data as a DataFrame at all, because several benefits are gone if you cannot access the data due to a suspicious NaN type in a value column.
Btw.: You have made several breaking changes in the last weeks but only bumped the minor version by one. Versions are normally structured as Major.Minor.Patch. Major versions carry breaking changes, meaning that existing functionality is deprecated or changed. And keep in mind that shipping breaking changes every one or two months will annoy users.