Detect clearsky modifications #510

Closed
wants to merge 2 commits into from

benbenboben

Brief description of the problem and proposed solution (if not already fully described in the issue linked to above):

The fifth condition (c5) of the clearsky.detect_clearsky function wasn't calculated as defined in the original reference (equations 12-14). This pull request also addresses the part of #507 concerning the implicit assumption of 1-minute data: all derivatives are now calculated by explicitly dividing by sample_interval. It does not address non-uniform time intervals.
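
As a minimal sketch of why the division matters (the helper name `slope_per_minute` and the sample values are illustrative assumptions, not code from this pull request), the same physical ramp gives different raw differences at different sampling rates, but the same slope once each difference is divided by the sample interval:

```python
import numpy as np


def slope_per_minute(values, sample_interval):
    """First difference divided by a uniform sample spacing given in minutes.

    Hypothetical helper for illustration; it is not part of pvlib.
    """
    return np.diff(np.asarray(values, dtype=float)) / sample_interval


# A ramp rising 2 units per minute, sampled every 1 minute and every 5 minutes.
ramp_1min = 2.0 * np.arange(0, 11, 1)
ramp_5min = 2.0 * np.arange(0, 11, 5)

# Raw differences depend on the sampling rate.
raw_1min = np.diff(ramp_1min)
raw_5min = np.diff(ramp_5min)

# Dividing by the interval recovers the same rate of change in both cases.
slope_1min = slope_per_minute(ramp_1min, 1)
slope_5min = slope_per_minute(ramp_5min, 5)
```

Any threshold expressed per unit time can then be compared against `slope_1min` and `slope_5min` on equal terms, which is not true of the raw differences.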

  • Closes issue #506, "Correct condition 5 in clearsky.detect_clearsky" (addresses part of #507, "Handle different time intervals in clearsky.detect_clearsky")
  • I am familiar with the contributing guidelines.
  • Fully tested. Added and/or modified tests to ensure correct behavior for all reasonable inputs. Tests (usually) must pass on the TravisCI and Appveyor testing services.
  • Updates entries to docs/sphinx/source/api.rst for API changes.
  • Adds description and name entries in the appropriate docs/sphinx/source/whatsnew file for all changes.
  • Code quality and style is sufficient. Passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • New code is fully documented. Includes sphinx/numpydoc compliant docstrings and comments in the code where necessary.
  • Pull request is nearly complete and ready for detailed review.

Member

@cwhanse left a comment

I feel strongly (perhaps to the point of insisting) that this function should issue a warning if it is passed data with timesteps sufficiently different from 1 minute (say, outside 30 s to 2 min) while the default threshold values are used. The validation of this algorithm used only ~1 min timesteps, and the default threshold values derive from that validation.
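
One way the requested check could look is sketched below. The function name, its parameters, and the way "default thresholds are in use" is passed in as a flag are all assumptions made for illustration; only the 30 s to 2 min range comes from the comment above.

```python
import warnings


def warn_if_interval_unvalidated(sample_interval, using_default_thresholds,
                                 lower=0.5, upper=2.0):
    """Warn when default thresholds are used outside the validated range.

    Hypothetical helper, not part of pvlib. `sample_interval`, `lower`, and
    `upper` are in minutes. Returns True if a warning was issued.
    """
    outside_range = not (lower <= sample_interval <= upper)
    if outside_range and using_default_thresholds:
        warnings.warn(
            "sample_interval of %s min is outside the %s to %s min range "
            "for which the default thresholds were validated; consider "
            "supplying thresholds suited to your data."
            % (sample_interval, lower, upper),
            UserWarning,
        )
        return True
    return False
```

Calling `warn_if_interval_unvalidated(30, True)` would emit a `UserWarning`, while 1-minute data, or 30-minute data with user-supplied thresholds, would pass silently.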

@@ -687,7 +687,7 @@ def detect_clearsky(measured, clearsky, times, window_length,
raise NotImplementedError('algorithm does not yet support unequal ' \
'times. consider resampling your data.')

-samples_per_window = int(window_length / sample_interval)
+samples_per_window = int(window_length / sample_interval) + 1
Member

Please revert this change. samples_per_window counts intervals, not endpoints. Because it's an intermediate variable, it could be renamed to make that clearer.

Author

I believe that not adding 1 gives the incorrect number of samples per window. For example, if my data is 30-minute frequency (sample_interval=30) and I want 60 minute windows (window_length=60), the current implementation would only give 2 points per window (when the Hankel matrix is constructed in the following lines). In this case, 2 points per window would only span a 30 minute window, not the intended 60.

Member

Your example makes my point. The algorithm operates on intervals, not on points in time. The value at a timestamp is considered the value for the following (?) interval. I'd have to look carefully at the Hankel matrix and the diff to see whether we adopted a left- or right-endpoint convention.
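
The arithmetic in this exchange can be illustrated with a small sketch. The helper `window_indices` is a hypothetical stand-in that builds a Hankel-style index matrix with plain NumPy; it is not the code under review, and it does not settle which convention the algorithm intends. It only shows what each count produces for 30-minute data and a 60-minute window.

```python
import numpy as np


def window_indices(n_samples, points_per_window):
    """Index matrix whose columns are consecutive sliding windows.

    Hypothetical helper: column j holds indices j, j+1, ..., the same
    layout a Hankel matrix of sample indices would have.
    """
    n_windows = n_samples - points_per_window + 1
    return (np.arange(points_per_window)[:, None]
            + np.arange(n_windows)[None, :])


sample_interval = 30                        # minutes between samples
window_length = 60                          # minutes per window
times = np.arange(0, 181, sample_interval)  # timestamps in minutes

n_intervals = int(window_length / sample_interval)

# Span from first to last timestamp of the first window, for each count.
spans = {}
for points in (n_intervals, n_intervals + 1):
    idx = window_indices(len(times), points)
    spans[points] = int(times[idx[-1, 0]] - times[idx[0, 0]])
```

With two points per window, the first and last timestamps are 30 minutes apart; with three, they are 60 minutes apart. Under an interval convention, where each value stands for the interval next to its timestamp, two values would still cover 60 minutes of data, which is the distinction the reviewer is drawing.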

@@ -697,25 +697,27 @@ def detect_clearsky(measured, clearsky, times, window_length,
# calculate measurement statistics
meas_mean = np.mean(measured[H], axis=0)
meas_max = np.max(measured[H], axis=0)
-meas_slope = np.diff(measured[H], n=1, axis=0)
+meas_ghi_diff = np.diff(measured[H], n=1, axis=0)
+meas_slope = np.diff(measured[H], n=1, axis=0) / sample_interval
Member

The non-uniform time steps could be handled here by

  • converting the time to UNIX timestamps
  • diffing the time to get time intervals
  • element-wise division in the calculation of meas_slope
  • using the diffed time in the calculation of meas_line_length, clear_slope, etc.

This could be the subject of a subsequent pull request. In hindsight, I could have separated #507 into two issues (time steps different from 1 minute, and non-uniform time steps).
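
The first three steps listed above can be sketched as follows for a plain 1-D series. The function name `slopes_nonuniform` and the sample data are assumptions for illustration; applying the same idea to the windowed arrays in detect_clearsky would need the time differences indexed the same way as the measurements, which is not shown here.

```python
import numpy as np


def slopes_nonuniform(values, times):
    """Slope per minute for samples taken at non-uniform times.

    Hypothetical helper, not part of pvlib. `times` is a NumPy
    datetime64 array; `values` has the same length.
    """
    # Step 1: convert the times to UNIX timestamps (seconds since epoch).
    seconds = times.astype("datetime64[s]").astype(np.int64)
    # Step 2: diff the timestamps to get each interval, here in minutes.
    dt_minutes = np.diff(seconds) / 60.0
    # Step 3: divide each value difference by its own interval.
    return np.diff(np.asarray(values, dtype=float)) / dt_minutes


times = np.array(
    ["2018-06-01T12:00", "2018-06-01T12:01",
     "2018-06-01T12:03", "2018-06-01T12:08"],
    dtype="datetime64[s]",
)
values = np.array([100.0, 102.0, 106.0, 116.0])

slopes = slopes_nonuniform(values, times)
```

The intervals here are 1, 2, and 5 minutes, and the value differences are 2, 4, and 10, so every element of `slopes` is 2 per minute even though the spacing is irregular.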

@wholmgren
Member

completed in #596

@wholmgren closed this Oct 10, 2018
3 participants