Dataframe input for functions #39


Closed

robwandrews opened this issue Mar 23, 2015 · 5 comments

Comments

@robwandrews
Contributor

From a conversation started in #37 by @bmu:

As an idea, they could possibly also be used by users in the future, if we were to switch to DataFrame inputs for some functions or a different kind of API. This would enable us to include some "magic" functions, e.g. the signature of the ominous globalinplane function could look like this:

def globalinplane(df, surface_tilt, surface_azimuth, diffuse_model='perez',
                  decomposition_model=None):
    """Determine GPOA from either GHI and DHI or from GHI only.

    Parameters
    ----------
    df : pandas.DataFrame
        A DataFrame containing all necessary columns according to naming conventions
        (maybe surface_tilt and surface_azimuth could also be contained in the df,
        e.g. for tracking systems.)
    decomposition_model : None or str
        The model to use if only GHI is given in the DataFrame
    ...

    Returns
    -------
    The input DataFrame plus columns `direct tilted`, `diffuse tilted`, `GPOA`, ...
    """

This function could look at which columns are in the DataFrame and compute all necessary columns (e.g. if time is given as local time, compute UTC or true solar time).
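A rough sketch of how that column inspection might look (purely illustrative; the decompose helper and the column names are placeholders, not existing pvlib functions):

def globalinplane(df, surface_tilt, surface_azimuth, diffuse_model='perez',
                  decomposition_model=None):
    # Illustrative only: inspect which columns are present and derive the
    # missing ones before the transposition step.
    df = df.copy()
    if 'dhi' not in df.columns:
        if decomposition_model is None:
            raise ValueError("df contains no 'dhi' column; pass a "
                             "decomposition_model to derive it from 'ghi'")
        # 'decompose' is a placeholder for a GHI decomposition model
        df['dni'], df['dhi'] = decompose(df['ghi'], model=decomposition_model)
    # ... transposition onto the tilted plane using diffuse_model would follow ...
    return df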

From my experience this is useful for beginners because there are quite a lot of simulation steps required to calculate GPOA from GHI (maybe convert times, decomposition into direct and diffuse, diffuse model, ground reflection, direct tilted, ...?).
And there are other options, e.g. calculate the expected energy yield from only a system description, a Location, and a DataFrame containing GHI and ambient temperature.

This is just an idea, and I'm not sure about the difficulties. There may be some complexity when implementing something like this (more on the definitions, not necessarily from a programmer's point of view) and maybe this is difficult to explain to users.

My view on this would be to stay away from DataFrame passing as inputs, especially as the only form of input. One of the advantages of pvlib is the ability to use different inputs (irradiance sources, plane transposition models, etc.) and compare their outputs from the functions. This means that a user might have multiple versions of dni, ghi, pmp, etc. that they want to use in the functions. Though it is possible to repackage a new DataFrame each time a variable is swapped out, this adds unnecessary steps on the user side and makes it harder to explicitly track what is being passed through a function. When I was originally making tools for myself, I did make them with DataFrame inputs, found that this led to too many hard-to-trace errors, and ended up switching to explicit inputs.

It might be interesting to have df input as an optional input along with the explicit variables (which might have been what you were suggesting), but I wouldn't want to move completely to df input.

@wholmgren
Member

@Calama-Consulting is right to express caution. Maybe we should restrict this sort of function signature to those that build on the core pvlib functionality and leave the core functions alone.

It might be interesting to have df input as an optional input along with the explicit variables (which might have been what you were suggesting), but I wouldn't want to move completely to df input.

In this case the "explicit" variables would probably become kwargs, along with the new df input. Most of these kwargs would probably be set to None and then we'd need to make sure that sensible error messages inform the user when something is missing from the sum of kwargs and df inputs.
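For illustration only, the merging logic could look something like this (the parameter and column names are placeholders):

def some_meta_function(df=None, zenith=None, azimuth=None):
    # Prefer explicitly passed values, fall back to DataFrame columns,
    # and fail with a clear message when a quantity is missing from both.
    def resolve(name, explicit):
        if explicit is not None:
            return explicit
        if df is not None and name in df.columns:
            return df[name]
        raise ValueError(name + ' was not passed explicitly and is not a column of df')

    zenith = resolve('zenith', zenith)
    azimuth = resolve('azimuth', azimuth)
    # ... calculations using zenith and azimuth would follow ...
    return zenith, azimuth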

Another option is to suggest that users unpack DataFrames in the parameter call.

Here's a quick example that uses both of the above:

import pandas as pd

solpos = pd.DataFrame({'elevation': 60, 'zenith': 30, 'azimuth': 180, 'airmass': 2.5},
                      index=['2015-06-21T12:00'])

def func_with_kwargs(elevation=None, zenith=None, **kwargs):
    # each keyword receives the corresponding DataFrame column as a Series
    print(elevation)
    print(zenith)
    print(kwargs)

# ** unpacks the DataFrame column-by-column into keyword arguments
func_with_kwargs(**solpos)

2015-06-21T12:00    60
Name: elevation, dtype: int64
2015-06-21T12:00    30
Name: zenith, dtype: int64
{'airmass': 2015-06-21T12:00    2.5
Name: airmass, dtype: float64, 'azimuth': 2015-06-21T12:00    180
Name: azimuth, dtype: int64}

I still get tripped up by the kwargs syntax, so this isn't something for beginners.

@bmu
Contributor

bmu commented Mar 25, 2015

My intention was to leave the functions as they are and to develop something like a different API that uses these functions under the hood. So we agree here, I think.
And I also think that one advantage of pvlib is that you can use different data sources and models to calculate whatever you want.

The main advantage of such an API would be that "magical" objects could be implemented (if we use consistent DataFrame keys) for standard tasks.
A full syntax example could be something like this:

# this function parses a time series provided e.g. by GeoModel (latitude and longitude,
# time format and timezone, irradiance, ambient temperature) and "translates" all 
# GeoModel variable names to "our" names
# could also be any other data provider: TMY's, meteonorm, ...
tus = Location.from_geomodel('tuscon.tsv')  
fr = Location.from_geomodel('freiburg.tsv')

# read and parse the system configuration
system = PVSystem.from_json('system.json')

# set a location for the system
system.set_location(tus)

# "magical" method, that looks at the data available and uses 
# a standard modelling chain (tilted irradiance, PV module behaviour,
# Inverter, wiring, transformer, ...) to calculate energy yields. 
# The results of all modelling steps are stored in a dataframe.
# By default a standard modelling chain is used, however you can
# pass your own.
tus_results = system.calculate_final_yield()
print(tus_results.columns)
[GHI, DHI, GPOA, E_module, ..., E_inverter, E_final]

# print the mean annual energy yield in kWh per kWp
print(tus_results['E_final'].resample('A').mean())
1500

system.set_location(fr)
fr_results = system.calculate_final_yield()
print(fr_results['E_final'].resample('A').mean())
1200

I am not sure if this is in the scope of pvlib, maybe this could be implemented in a different library or application.

@uvchik
Contributor

uvchik commented Mar 25, 2015

What about adding a new module named "applications.py" containing meta functions? If this is not within the scope of pvlib, it could be a good idea to start a 'pvlib_app' repository with these meta functions.

I would prefer adding a module.

@wholmgren wholmgren added the api label Mar 25, 2015
@wholmgren
Member

I very much think that these high level functions should be a part of pvlib. They'll build upon the core library and illustrate how to use each part of it. IPython notebooks that walk through each step of the function could be interesting and educational. These high level functions would be useful for people like me who are not experts in every step of PV system modeling and would be happy to have somebody else choose some reasonable defaults for them.

Where to put them? Some of my thoughts:

  • Functions that apply to only (mostly?) one aspect of modeling should go in an existing module.
  • Functions that cut across multiple modules should go in something like api.py.
  • Regardless of where it lives, a function could be imported to the pvlib level namespace, so that one could use pvlib.magic_function(df).
  • @bmu's nice "factory" methods should stay attached to their classes.

There was some related discussion at the old Sandia-Labs repository.

But back to the issue title, I think that DataFrame inputs make a lot of sense for some of these high level functions.
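As a very rough sketch of what a cross-cutting function in something like api.py could look like (the wrapper itself is hypothetical, and the pvlib function names below are from later releases than the one current at the time):

import pvlib

def get_poa_from_weather(weather, latitude, longitude,
                         surface_tilt, surface_azimuth):
    # weather: DataFrame with 'ghi', 'dni' and 'dhi' columns and a
    # timezone-aware DatetimeIndex
    solpos = pvlib.solarposition.get_solarposition(weather.index,
                                                   latitude, longitude)
    poa = pvlib.irradiance.get_total_irradiance(
        surface_tilt, surface_azimuth,
        solpos['apparent_zenith'], solpos['azimuth'],
        dni=weather['dni'], ghi=weather['ghi'], dhi=weather['dhi'])
    # return the input DataFrame plus the plane-of-array columns
    return weather.join(poa)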

@bmu bmu added this to the 0.2 milestone Apr 2, 2015
@bmu bmu self-assigned this Apr 2, 2015
@wholmgren wholmgren mentioned this issue Apr 23, 2015
@bmu bmu modified the milestones: Someday, 0.2 Jun 21, 2015
@wholmgren
Member

I think we've basically implemented this via the PVSystem/Location/ModelChain classes.
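For anyone reading this later, a minimal sketch of that workflow (the component choices are placeholders picked from the bundled SAM databases, and some names such as mc.results vary between pvlib versions):

import pandas as pd
from pvlib.location import Location
from pvlib.pvsystem import PVSystem, retrieve_sam
from pvlib.modelchain import ModelChain
from pvlib.temperature import TEMPERATURE_MODEL_PARAMETERS

# placeholder module and inverter parameters from the bundled databases
sandia_modules = retrieve_sam('SandiaMod')
cec_inverters = retrieve_sam('cecinverter')

location = Location(latitude=32.2, longitude=-110.9, tz='US/Arizona')
system = PVSystem(
    surface_tilt=20, surface_azimuth=180,
    module_parameters=sandia_modules.iloc[:, 0],   # pick a real module here
    inverter_parameters=cec_inverters.iloc[:, 0],  # pick a real inverter here
    temperature_model_parameters=TEMPERATURE_MODEL_PARAMETERS['sapm']['open_rack_glass_glass'])

mc = ModelChain(system, location)

# weather input is a DataFrame, as discussed in this issue
times = pd.date_range('2015-06-21 11:00', periods=3, freq='1h', tz='US/Arizona')
weather = pd.DataFrame({'ghi': [900, 950, 900],
                        'dni': [800, 850, 800],
                        'dhi': [100, 100, 100]}, index=times)
mc.run_model(weather)
print(mc.results.ac)   # mc.ac in older pvlib versions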
