Skip to content

modules are too long and have too many symbols #235

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mikofski opened this issue Aug 11, 2016 · 7 comments
Closed

modules are too long and have too many symbols #235

mikofski opened this issue Aug 11, 2016 · 7 comments
Labels
Milestone

Comments

@mikofski
Copy link
Member

Several modules have become quite long which IMO makes them difficult to use and maintain. Obviously modules can be any size and structure, but I think that long files are difficult to navigate, and modules with too many symbols are error prone due to shadowing and namespace collision.

module number of lines number of symbols
pvlib.pvsystem 2211 23
pvlib.clearsky 489 13
pvlib.atmosphere 472 6
pvlib.forecast 1113 7
pvlib.irradiance 1949 23
pvlib.location 287 1
pvlib.modelchain 722 3
pvlib.solarposition 840 12
pvlib.spa 1300 50
pvlib.tmy 497 8
pvlib.tools 259 14
pvlib.tracking 523 3

In particular pvsystem, forecast, irradiance and spa are very long and contain a lot of symbols.

I don't know what would be ideal, but IMO I think a maximum of 500 lines and 10 symbols is a good baseline.

The SPA algorithm presents a unique challenge, as this code was ported, so maybe for reference sake, it should stay as it is. However, I think that the very large arrays of coefficients in irradiance and spa take a lot of space. These could easily be moved to HDF5 files using h5py which would have some memory and speed enhancements, but I'll create a related issue for that.

References for large modules and many symbols.

  1. Programmers-SE: Are there any negative side effects of splitting up large modules? [closed]
  2. Wikipedia: God Object
  3. Wikipedia: Anti-Pattern
@wholmgren
Copy link
Member

I agree that it could be better, but I also don't know what would be ideal. Some things I've thought about in the past:

  • Rename irradiance.py to transposition.py. Move things like aoi, aoi_projection to tools.py, move extraradiation to solarposition.
  • A "classes" module that implements Location, PVSystem, SingleAxisTracker. Probably would leave ModelChain where it is.
  • Make the inline documentation less verbose while expanding the narrative-style documentation. Doing this well would take a lot of time.

Further point on documentation: a huge number of those lines are thorough documentation. There are many functions with documentation that's twice as long as the code. Some modules might be 50% documentation.

We created the style guide so that the api-level symbols would at least be uniform, but there are a lot of symbols in the PV world. I'm all for consolidation if you see an opportunity.

@uvchik
Copy link
Contributor

uvchik commented Aug 18, 2016

@mikofski as far as I understand it, the length of a module is just a matter of taste in python. An indicator for a bad structured project is more the length of classes, methods or functions. Restructuring the whole project includes the danger of creating new bugs. I would rather have a look at the biggest "god" classes/functions and split them step by step. What classes/functions do you have in mind?

@mikofski
Copy link
Member Author

Continuing comments from #261 the top level modules can be converted to subpackages which would preserve the current API, but allow more flexibility:

current state:

+-+- pvlib/
  |
  +- irradiance.py  # a big module, how to add even more to this?

becomes:

+-+- pvlib/
  |
  +-+- irradiance/
    |
    +- __init__.py  # import API's from new modules to keep backward compatibility
    |
    +- core.py  # has all of the core methods
    |
    +- decomposition.py  # has all of the decomposition methods
    |
    +- transposition.py  # has all of the transposition methods
    |
    +- ...  # other modules with methods used in irradiance

Then pvlib/irradiance/__init__.py contains the following:

from pvlib.irradiance.core import extraradiation, aoi_projection, aoi, ...
from pvlib.irradance.decomposition import disc, dirint, liujordan, erbs
from pvlib.irradance.transposition import isotropic, haydavies, perez, klucher, reindl, king

# any other symbols used by all irradiance methods

Then break up irradiance into separate modules.

core.py

# code for extraradiation, aoi, etc. that was in irradiance.py

Now when someone wants to add something new to irradiance, they import it into the __init__.py and then either add it to a current module or create a new one.

@wholmgren
Copy link
Member

First, while I don't want to break the API for fun, if it needs to change for the long term health of the library then we should change it. We've broken user code in every 0.x release and I don't intend to stop until we decide on a 1.0. So, don't feel too constrained by the past.

Next, when making the new API docs, it felt pretty natural to break up the irradiance and pvsystem sections into subsections. That's probably a clue that the modules themselves should be chopped up too, though you could certainly take issue with some of the function classifications. I didn't make subsections for atmosphere or clear sky, and I don't think I'd support subpackages for them at this point either. I can imagine a solarposition subpackage that contains spa.py, a new ephemeris.py, and the remaining functionality in core.py. modelchain is long and complicated, but it does something complicated that I don't know how to do otherwise. Some of forecast might end up in iotools.

Concerning implementation, pandas and some other pydata packages use api.py modules instead of __init__.py modules to control their namespaces. I don't know the reasoning behind this other than the "__init__.py should be empty" recommendation, but we should probably look into it.

Finally, we could wait and see if the subpackage approach for io tools actually leads to more people contributing more easily maintained code.

@mikofski
Copy link
Member Author

I like the api.py and empty __init__.py idea, very clever. I think the reasoning may be that in general people seem to dislike "dunder" modules and classes probably because they are hard to pronounce, look weird and in general violate the Zen of Python, namely the first 3 lines:

The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.

The nice thing about using api.py is that it is very explicit. I usually put this in __init__.py in a commented section called # API. Unfortunately, a Python package has to have __init__.py because that's how Python recognizes packages, but it (__init__.py) can be empty.

I wrote a post that might be related called recommended Python project layout that has links and some of my observations.

@mikofski
Copy link
Member Author

Anyone in favor of closing this? I think of this more as a long term strategy discussion than a particular bug that needs fixing or a feature to implement. Maybe continue the discussion in google groups or the wiki?

@wholmgren
Copy link
Member

@mikofski thanks for circling back on this. I agree we should close this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants