
Slow calculation and memory leaking with 0.5.1 #401

Closed
jkfm opened this issue Dec 14, 2017 · 31 comments

@jkfm (Contributor) commented Dec 14, 2017

Hi all,

I am using pvlib to calculate many pv arrays and do this by looping over the individual system specs. Since updating on 0.5.1, I noticed much longer calculation times and a steady increase in memory usage.

From a quick glance at the changes from 0.5.0 I couldn't determine an obvious cause for this behavior. Did I miss something?

My parameters for the calculation are:

ModelChain(system, location, dc_model='singlediode', ac_model='snlinverter',
           spectral_model='no_loss', temp_model='sapm', aoi_model='ashrae',
           solar_position_method='nrel_numba', transposition_model='haydavies',
           losses_model='pvwatts', clearsky_model='simplified_solis',
           orientation_strategy=None)

@wholmgren (Member)

It could be related to fixing ModelChain to actually use solar_position_method, especially since you're specifying nrel_numba. See #377 and #379. I suggest setting it to nrel_numpy and retesting the performance of your code. Also make sure that you're consistently using nrel_numba for every single solar position calculation in your whole script. This is because the spa module needs to be reimported when the method changes, and this takes some time, especially when the module needs to be JIT compiled again. Note that methods such as Location.get_clearsky will use the default solar position method (nrel_numpy) unless explicitly told otherwise.
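The reimport cost described above comes from a dispatch pattern like the following stdlib-only sketch (illustrative stand-in, not pvlib's actual code): the module providing the solar position implementation is reloaded whenever the requested method changes, and under numba each reload also triggers JIT compilation.

```python
# Illustrative stand-in for the spa dispatch, not the real implementation:
# switching methods forces a module reload (and, with numba, recompilation).
import importlib
import json  # stand-in for the real spa module

_state = {'how': None}
reload_count = 0

def get_spa_module(how):
    """Return the spa module, reloading it if the method changed."""
    global reload_count
    if _state['how'] is not None and _state['how'] != how:
        importlib.reload(json)  # the expensive step
        reload_count += 1
    _state['how'] = how
    return json

get_spa_module('numpy')
get_spa_module('numpy')  # same method: cached, no reload
get_spa_module('numba')  # method changed: reload (and re-JIT in real pvlib)
print(reload_count)  # 1
```

Keeping the method constant keeps the reload count at zero after the first call, which is why mixing nrel_numba and nrel_numpy in one script is costly.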

Is pvlib the only thing that's changed in your environment?

@jkfm (Contributor, Author) commented Dec 14, 2017

Thanks for your suggestions! I will look into them.

Updating pvlib was not the only change that occurred, but after reverting to 0.5.0 the behavior was normal again.

@markcampanelli (Contributor)

Looks like you are using the single diode model, so it could also be my changes to support ideal devices.

@wholmgren (Member)

@jkfm did you get a chance to look into what might have caused the pvlib performance problem?

@jkfm (Contributor, Author) commented Jan 31, 2018

Not yet :-( But I will report back.

@cwhanse (Member) commented Jun 11, 2018

@jkfm any reports back from your investigation? We'd like to fix the issue you experienced, if we can narrow down the cause.

@jkfm (Contributor, Author) commented Jun 19, 2018

Hey all, sorry for the delay. I am now looking into the problem.

I wrote a script which replicates my actual use case but is much simpler: https://gist.github.com/jkfm/6c5668da8f28403bf37268cda4afeb05

The code for tracking memory usage is from here.

With 30 iterations (my RAM can't handle more ;-) ), the code has the following stats:

0.5.0: duration 4 s; memory usage: [plot omitted]

0.5.1: duration 1 min 47 s; memory usage: [plot omitted]

I realise that the memory usage numbers are not really representative as such, but I guess the trend is clear.

@jkfm (Contributor, Author) commented Jun 19, 2018

As the script with 0.5.1 also throws different warnings, I suspected the reason for the lag to be in v_from_i and i_from_v in pvsystem.py.

I manually changed only those two functions back to their state from 0.5.0, et voilà:

Duration: 4.8 s
Memory usage: [plot omitted]

So I guess the error has to do with the changes in v_from_i and i_from_v.

@wholmgren
I also checked your suggestion regarding solar_position_method. There is no significant difference between nrel_numba and nrel_numpy.

Some more information regarding the warnings:
With the old code, I get these warnings once, during the first iteration:

/pvlib/pvsystem.py:1958: RuntimeWarning: invalid value encountered in true_divide
    I = -V/(Rs + Rsh) - (nNsVth/Rs)*lambertwterm + Rsh*(IL + I0)/(Rs + Rsh)
/pvlib/pvsystem.py:1869: RuntimeWarning: overflow encountered in exp
    argW = I0 * Rsh / nNsVth * np.exp(Rsh * (-I + IL + I0) / nNsVth)

With the new code, I get this warning on each iteration:

/pvlib/pvsystem.py:1919: RuntimeWarning: overflow encountered in exp
    (Gsh[idx_p]*a[idx_p]))

@cwhanse (Member) commented Jun 19, 2018

Confirmed. I get different numbers for mem_usage using your script, and not as dramatic as your results for v0.5.1, but the trend is the same: since v0.5.1 we have a memory leak.

@wholmgren (Member)

Thanks @jkfm for this very detailed report!

@wholmgren added this to the 0.6.1 milestone on Sep 12, 2018
@wholmgren (Member)

good candidate for #530

@cwhanse (Member) commented Nov 6, 2018

Looked into this a bit more with tracemalloc: the memory leak appears to depend on using numba with spa, because memory usage is stable with solar_position_method='nrel_numpy'. That is with pvlib v0.6.0.
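For reference, a minimal stdlib-only tracemalloc recipe along these lines (the allocation inside the loop is just a stand-in for the leaking pvlib call): snapshot before and after the loop, then diff by source line to see where memory accumulates.

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

retained = []
for _ in range(1000):
    retained.append(bytearray(1024))  # stand-in for the leaking loop body

after = tracemalloc.take_snapshot()
# diff by source line; the largest positive size_diff points at the growth
top_stat = after.compare_to(before, 'lineno')[0]
print(top_stat.size_diff > 1000 * 1024)  # True: the loop retained ~1 MB
```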

@wholmgren (Member) commented Nov 6, 2018 via email

@cwhanse (Member) commented Nov 6, 2018

Looking at the v0.5.1 release notes, maybe the issue was obscured in earlier releases and exposed by #482

@wholmgren (Member)

@cwhanse can you reproduce using only solarposition.spa_python(how='numba')?

@cwhanse (Member) commented Nov 6, 2018

Doesn't leak when using solarposition.spa_python(how='numba') directly.
Doesn't leak when using Location.get_solarposition(how='numba').
Leaks when using ModelChain.location.get_solarposition(how='numba').

@cwhanse (Member) commented Nov 6, 2018

@wholmgren (Member) commented Nov 6, 2018

That's a helpful report + gist.

@jkfm previously said

I also checked your suggestion regarding solar_position_method. There is no significant difference between nrel_numba and nrel_numpy.

@cwhanse what does your modelchain-based script report when changing the method?

@wholmgren (Member)

Looking for other differences between the 3 scripts... I'll speculate that result['output'] += mc.ac could cause python to hold on to the entire ModelChain object. In my experience gc.collect rarely helps.

@cwhanse (Member) commented Nov 7, 2018

@wholmgren when the model chain script has method=numpy there is no memory leak.

@cwhanse (Member) commented Nov 7, 2018

Still leaks with result['output'] += mc.ac commented out.

@wholmgren (Member)

I'm almost out of ideas.

Leaks when using ModelChain.location.get_solarposition(how='numba').

Does it matter if it's called directly or as part of the run_model method? In other words, what happens if you simplify the mc_loop to

def mc_loop(times):
    system = get_system()
    location = get_location()
    mc = get_mc(system, location)

    mc.solar_position = mc.location.get_solarposition(times)

@cwhanse (Member) commented Nov 8, 2018

I think I know why this memory leak happens, and I think there's an issue for us, but I haven't quite got it nailed down. In @jkfm's script, mc.solar_position is computed twice each loop: once explicitly using the Location.get_solarposition method, and again inside ModelChain.run_model at the prepare_inputs step:

for i in range(rg):
    system = get_system()
    location = get_location()
    mc = get_mc(system, location)
    mc.solar_position = mc.location.get_solarposition(times)
    weather = get_weather(mc, times)
    mc.run_model(times=times, weather=weather)

The memory leak only happens when both calculations are in the loop: comment either one out, and no leak.

The leak appears to be caused by cycling between nrel_numpy and nrel_numba values for the solar_position_method kwarg.

When mc is created, solar_position_method is set to nrel_numba. However, that attribute is NOT used by the attached Location object's method

mc.solar_position = mc.location.get_solarposition(times)

As a consequence, the first time solar position is calculated, nrel_numpy (the default) is used. The next time is via this line in ModelChain.prepare_inputs:

    self.solar_position = self.location.get_solarposition(
        self.times, method=self.solar_position_method)

which passes the value self.solar_position_method='nrel_numba', assigned when the ModelChain instance is created.

The issue for us is that ModelChain.location carries a get_solarposition method that does not inspect ModelChain.solar_position_method and hence, if called directly, defaults to nrel_numpy. How to fix this, or whether to fix it, is where I'm unclear.
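A toy stand-in (hypothetical names, not pvlib code) makes the flip-flop concrete: the explicit call uses the Location default while prepare_inputs passes the ModelChain attribute, so the effective method alternates on every call, and each switch would force a spa reload.

```python
reloads = 0
_loaded = {'method': None}

def get_solarposition(times, method='nrel_numpy'):
    """Stub: count method switches, which in pvlib mean spa reloads."""
    global reloads
    if _loaded['method'] is not None and _loaded['method'] != method:
        reloads += 1
    _loaded['method'] = method

for _ in range(3):
    get_solarposition(None)                       # explicit: Location default
    get_solarposition(None, method='nrel_numba')  # prepare_inputs: attribute

print(reloads)  # 5: every call after the first switches the method
```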

@wholmgren (Member) commented Nov 8, 2018

Ah, that is very interesting. pvlib tells python to reload the spa module when the method is changed. This is required due to how the numba jit function works.

An easy thing would be to emit a user warning each time the spa module is reloaded.

Also I have no idea why this leads to a memory leak, but I'm not shocked that it does.
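Such a warning could be hooked into the reload path roughly like this (names are illustrative, not pvlib's actual internals):

```python
import warnings

_loaded = {'how': None}

def _maybe_reload_spa(how):
    """Warn when a method switch would force the expensive spa reload."""
    if _loaded['how'] is not None and _loaded['how'] != how:
        warnings.warn(
            'reloading spa to switch solar position method from '
            '{} to {}; this is slow and repeated switching can leak '
            'memory'.format(_loaded['how'], how), UserWarning)
    _loaded['how'] = how

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    _maybe_reload_spa('numpy')   # first load: no warning
    _maybe_reload_spa('numba')   # switch: one warning
print(len(caught))  # 1
```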

@cwhanse (Member) commented Nov 8, 2018

Cause confirmed. Adding the method argument

mc.solar_position = mc.location.get_solarposition(times, method=mc.solar_position_method)

stops the leak.

@cwhanse (Member) commented Nov 8, 2018

Maybe a ModelChain.get_solarposition method would prevent this mishap, by simply wrapping ModelChain.location.get_solarposition.

I'm guessing @jkfm has the solar position computed explicitly because it's not obvious that prepare_inputs will do this for you. Maybe we could also inspect the ModelChain and only compute solar position if it's not already there:

if self.solar_position is None:
    ...

@wholmgren (Member)

It also takes a few seconds for python to reload the spa module when jitting it. Users should be discouraged from doing this. I don't think this is a ModelChain issue.

I once ran into a similar problem in some code in which I ran solarposition.get_solarposition and then later ModelChain.run_model. In my case I only noticed an unexplained slowness, not a memory leak, but it's possible there was a memory leak problem too. My point is only that there are multiple ways for users to accidentally switch between numba and non-numba calls, and we should try to prevent that from happening at a lower level.

@cwhanse (Member) commented Nov 9, 2018

Yes, the performance hit for numpy/numba swapping is noticeable. Unless you object I'll prepare a PR with:

  • a ModelChain.get_solarposition method that wraps the Location method. That way we can propagate the solar_position_method attribute. It can be bypassed, but at least we've made it convenient not to do so.

  • a warning when spa is reloaded.

I thought better of an if/then around that statement in prepare_inputs: there's no guarantee that solar_position is consistent with times if solar_position is calculated elsewhere.

@cwhanse (Member) commented Nov 9, 2018

I once ran into a similar problem in some code in which I ran solarposition.get_solarposition and then later ModelChain.run_model. In my case I only noticed an unexplained slowness, not a memory leak

That's basically what @jkfm was doing.

@wholmgren (Member)

@cwhanse thanks for volunteering to make a PR.

add a warning when spa is reloaded.

+1. Warning should probably go here and here.

a ModelChain.get_solarposition method, that wraps the Location method. That way we can propagate the solar_position_method attribute. It can be bypassed, but at least we've made it convenient not to do so.

I moderately object to this because the reload problem would already be covered by the new warning and ModelChain is already too complicated. I would be in favor of it if we could list a handful of compelling use cases (other than this numba issue). That might be a separate issue.

I thought better of an if/then around that statement in prepare_inputs, there's no guarantee that solarposition is consistent with times if solarposition is calculated elsewhere.

I agree that this could easily lead to trouble and should be avoided.

@cwhanse (Member) commented Nov 9, 2018

I agree that ModelChain is busy, but I don't see it as complicated. I'm in favor of taking the solar position calculation out of prepare_inputs as its own method, if an issue like this arises again.

If there were an easy way to access the solar_position_method attribute from the mc.location instance (where mc is an instance of ModelChain), we could smooth out this wrinkle. But I don't see how to do it without changing ModelChain.location = location to use inheritance.
