Dataframe input for functions #39
Comments
@Calama-Consulting is right to express caution. Maybe we should restrict this sort of function signature to those that build on the core pvlib functionality and leave the core functions alone.
In this case the "explicit" variables would probably become kwargs, along with the new DataFrame argument.

Another option is to suggest that users unpack DataFrames in the function call. Here's a quick example that uses both of the above:

    import pandas as pd

    solpos = pd.DataFrame({'elevation': 60, 'zenith': 30, 'azimuth': 180, 'airmass': 2.5},
                          index=['2015-06-21T12:00'])

    def func_with_kwargs(elevation=None, zenith=None, **kwargs):
        # Named columns bind to the explicit parameters;
        # any remaining columns are collected in kwargs.
        print(elevation)
        print(zenith)
        print(kwargs)

    func_with_kwargs(**solpos)

Output:

    2015-06-21T12:00    60
    Name: elevation, dtype: int64
    2015-06-21T12:00    30
    Name: zenith, dtype: int64
    {'airmass': 2015-06-21T12:00    2.5
    Name: airmass, dtype: float64, 'azimuth': 2015-06-21T12:00    180
    Name: azimuth, dtype: int64}

I still get tripped up by the kwargs syntax, so this isn't something for beginners.
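The unpacking example above only works because the function accepts `**kwargs`; a core function with a strict signature would raise a `TypeError` on the extra columns. One way to leave the core functions alone is a small helper that passes along only the columns a function actually names. This is a minimal sketch, not anything pvlib provides: `call_with_columns` and `zenith_plus_elevation` are hypothetical names, and a plain dict stands in for the DataFrame (a DataFrame should behave the same way here, since `in` checks its column labels and `data[name]` returns the column).

```python
import inspect


def call_with_columns(func, data):
    """Call func with only the entries of data that match its parameter names."""
    params = inspect.signature(func).parameters
    # Keep only the keys that func names explicitly. Extra columns are
    # dropped here rather than being swallowed by **kwargs in the core function.
    selected = {name: data[name] for name in params if name in data}
    return func(**selected)


# Hypothetical core function with a strict signature and no **kwargs.
def zenith_plus_elevation(elevation, zenith):
    return elevation + zenith


# A dict stands in for the solpos DataFrame from the example above.
solpos = {'elevation': 60, 'zenith': 30, 'azimuth': 180, 'airmass': 2.5}
total = call_with_columns(zenith_plus_elevation, solpos)
print(total)  # 90
```

The trade-off is that a misspelled column name is silently skipped and only shows up as a missing-argument error from the core function.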
My intention was to leave the functions as they are and to develop something like a different API that uses these functions under the hood. So we agree here, I think.

The main advantage of such an API would be that "magical" objects could be implemented (if we use consistent dataframe keys) for standard tasks.

    # this function parses a time series provided e.g. by GeoModel (latitude and longitude,
    # time format and timezone, irradiance, ambient temperature) and "translates" all
    # GeoModel variable names to "our" names
    # could also be any other data provider: TMY's, meteonorm, ...
    tus = Location.from_geomodel('tuscon.tsv')
    fr = Location.from_geomodel('freiburg.tsv')

    # read and parse the system configuration
    system = PVSystem.from_json('system.json')

    # set a location for the system
    system.set_location(tus)

    # "magical" method, that looks at the data available and uses
    # a standard modelling chain (tilted irradiance, PV module behaviour,
    # Inverter, wiring, transformer, ...) to calculate energy yields.
    # The results of all modelling steps are stored in a dataframe.
    # By default a standard modelling chain is used, however you can
    # pass your own.
    tus_results = system.calculate_final_yield()
    print(tus_results.columns)
    [GHI, DHI, GPOA, E_module, ..., E_inverter, E_final]

    # print the mean annual energy yield in kWh per kWp
    print(tus_results['E_final'].resample('A').mean())
    1500

    system.set_location(fr)
    fr_results = system.calculate_final_yield()
    print(fr_results['E_final'].resample('A').mean())
    1200

I am not sure if this is in the scope of pvlib; maybe this could be implemented in a different library or application.
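The "translation" step described in the comments above, where provider-specific variable names are mapped to one consistent set of keys, could look roughly like the following. This is only a sketch under stated assumptions: `translate_columns` is a hypothetical helper, the provider-side names in `GEOMODEL_TO_STANDARD` are invented placeholders (the real GeoModel headers may differ), and the numbers are made-up sample values. A dict of lists stands in for the parsed file.

```python
# Hypothetical mapping from provider column names to the shared names.
# The left-hand names are placeholders, not actual GeoModel headers.
GEOMODEL_TO_STANDARD = {
    'GlobHor': 'GHI',
    'DiffHor': 'DHI',
    'Tamb': 'T_ambient',
}


def translate_columns(data, name_map):
    """Return a dict keyed by the shared names; unknown keys pass through."""
    return {name_map.get(key, key): values for key, values in data.items()}


# Made-up sample values standing in for a parsed provider file.
raw = {
    'GlobHor': [0, 450, 800],
    'DiffHor': [0, 120, 150],
    'Tamb': [12.0, 18.5, 24.0],
}
standard = translate_columns(raw, GEOMODEL_TO_STANDARD)
print(sorted(standard))  # ['DHI', 'GHI', 'T_ambient']
```

With an actual DataFrame, `DataFrame.rename(columns=name_map)` does the same renaming and keeps the index.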
What about adding a new module named "applications.py" containing meta functions? If this is not within the intended scope of pvlib, it could be a good idea to start a 'pvlib_app' repository with these meta functions. I would prefer adding a module.
I very much think that these high level functions should be a part of pvlib. They'll build upon the core library and illustrate how to use each part of it. IPython notebooks that walk through each step of the function could be interesting and educational. These high level functions would be useful for people like me that are not experts in every step of PV system modeling and would be happy to have somebody else choose some reasonable defaults for them. Where to put them? Some of my thoughts:
There was some related discussion at the old Sandia-Labs repository. But back to the issue title, I think that DataFrame inputs make a lot of sense for some of these high level functions.
I think we've basically implemented this via the PVSystem/Location/ModelChain classes.
From a conversation started in #37 by @bmu:
My view on this would be to stay away from passing dataframes as inputs, especially as the only form of input. One of the advantages of pvlib is the ability to use different inputs (irradiance sources, plane transposition models, etc.) and compare the outputs of the functions. This means that a user might have multiple versions of dni, ghi, pmp, etc. that they want to use in the functions. Though it is possible to repackage a new dataframe each time a variable is swapped out, this adds unnecessary steps on the user side and makes it harder to track explicitly what is being passed to a function. When I was originally making tools for myself, I wrote them with dataframe inputs, found that this led to too many hard-to-trace errors, and ended up switching to explicit inputs.
It might be interesting to have df input as an optional input along with the explicit variables (which might have been what you were suggesting), but I wouldn't want to move completely to df input.
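As a rough illustration of the "optional df input along with the explicit variables" idea, the sketch below lets explicit arguments take precedence over columns of an optional `data` argument, so one variable can be swapped without repackaging the dataframe. The names `resolve_inputs` and `diffuse_fraction` are hypothetical and not part of pvlib; the function simply computes DHI divided by GHI as a stand-in calculation, and a plain dict stands in for a DataFrame.

```python
def resolve_inputs(names, data=None, **explicit):
    """Collect inputs by name, preferring explicit values over data columns."""
    resolved = {}
    for name in names:
        if explicit.get(name) is not None:
            # An explicitly passed value always wins.
            resolved[name] = explicit[name]
        elif data is not None and name in data:
            # Otherwise fall back to the column of the same name.
            resolved[name] = data[name]
        else:
            raise ValueError(f"missing input: {name!r}")
    return resolved


# Hypothetical function taking explicit inputs plus an optional data argument.
def diffuse_fraction(ghi=None, dhi=None, data=None):
    inputs = resolve_inputs(('ghi', 'dhi'), data=data, ghi=ghi, dhi=dhi)
    return inputs['dhi'] / inputs['ghi']


# Made-up sample values; a dict stands in for a one-row DataFrame.
measured = {'ghi': 800.0, 'dhi': 200.0}
print(diffuse_fraction(data=measured))             # 0.25
print(diffuse_fraction(data=measured, dhi=400.0))  # 0.5, dhi overridden explicitly
```

Raising on a missing input, rather than defaulting silently, keeps it visible which values were actually passed, which is the tracing concern raised above.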