Added sphinx documentation for data shifts routine. #131

Merged
merged 11 commits into from
Aug 2, 2022
1 change: 1 addition & 0 deletions .readthedocs.yml
@@ -12,4 +12,5 @@ python:
path: .
extra_requirements:
- doc
- optional

52 changes: 52 additions & 0 deletions docs/examples/data-shifts.py
@@ -0,0 +1,52 @@
"""
Data Shift Detection & Filtering
================================

Identifying data shifts/capacity changes in time series data
"""

# %%
# This example covers identifying data shifts/capacity changes in a time series
# and extracting the longest time series segment free of these shifts, using
# :py:func:`pvanalytics.quality.data_shifts.detect_data_shifts` and
# :py:func:`pvanalytics.quality.data_shifts.get_longest_shift_segment_dates`.

import pvanalytics
import pandas as pd
import matplotlib.pyplot as plt
from pvanalytics.quality import data_shifts as dt
import pathlib

# %%
# As an example, we load a simulated pvlib AC power time series with a
# single changepoint, which occurs on October 28, 2015.

pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
data_shift_file = pvanalytics_dir / 'data' / 'pvlib_data_shift.csv'
df = pd.read_csv(data_shift_file)
df.index = pd.to_datetime(df['timestamp'])
df['value'].plot()
print("Changepoint at: " + str(df[df['label'] == 1].index[0]))
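
For readers without the bundled CSV at hand, the layout used above (a `timestamp` column, a `value` column, and a `label` column that is 1 on the changepoint date) can be mimicked with a toy frame. This is only a sketch: the 20% drop, the noise level, and the date range are assumptions made for illustration and are not taken from `pvlib_data_shift.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: two years of noisy daily values.
index = pd.date_range("2015-01-01", "2016-12-31", freq="D")
rng = np.random.default_rng(0)
values = 100 + rng.normal(0, 2, len(index))

# Assumed shift: scale everything from the changepoint onward by 0.8.
changepoint = pd.Timestamp("2015-10-28")
values[index >= changepoint] *= 0.8

# Same three columns as the example file; label is 1 only on the changepoint.
toy = pd.DataFrame({"timestamp": index, "value": values, "label": 0})
toy.loc[toy["timestamp"] == changepoint, "label"] = 1
toy.index = pd.to_datetime(toy["timestamp"])

# Look up the labeled changepoint the same way the example does.
first_flag = toy[toy["label"] == 1].index[0]
print("Changepoint at: " + str(first_flag))
```

A frame built this way can be substituted for `df` in the steps that follow, although the detector's output on toy data may differ from its output on the bundled file.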

# %%
# Now we run the data shift algorithm (with default parameters)
# on the data stream, using
# :py:func:`pvanalytics.quality.data_shifts.detect_data_shifts`. We then
# re-plot the time series with a vertical line at each detected changepoint.

shift_mask = dt.detect_data_shifts(df['value'])
shift_list = list(df[shift_mask].index)
df['value'].plot()
for cpd in shift_list:
    plt.axvline(cpd, color="green")
plt.show()

# %%
# We filter the time series by the detected changepoints, taking the longest
# continuous segment free of data shifts, using
# :py:func:`pvanalytics.quality.data_shifts.get_longest_shift_segment_dates`.
# The filtered time series is then plotted.

start_date, end_date = dt.get_longest_shift_segment_dates(df['value'])
df['value'][start_date:end_date].plot()
plt.show()
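
The idea of "longest segment free of data shifts" can be illustrated with plain pandas. The sketch below is not how pvanalytics implements `get_longest_shift_segment_dates`; `longest_segment_between_shifts` is a hypothetical helper, and the mask is a hand-built toy with a single flagged date.

```python
import pandas as pd


def longest_segment_between_shifts(shift_mask: pd.Series):
    """Return (start, end) of the longest run between flagged shift dates.

    Hypothetical helper for illustration only; not part of pvanalytics.
    """
    # Each True marks the start of a new segment, so a cumulative sum
    # assigns one integer label per segment.
    segment_id = shift_mask.astype(int).cumsum()
    # Count rows per segment and keep the label of the largest one.
    longest_id = segment_id.value_counts().idxmax()
    segment_index = shift_mask.index[(segment_id == longest_id).to_numpy()]
    return segment_index[0], segment_index[-1]


# Toy daily mask with one shift flagged on 2015-10-28.
index = pd.date_range("2015-01-01", "2015-12-31", freq="D")
mask = pd.Series(False, index=index)
mask.loc[pd.Timestamp("2015-10-28")] = True

start, end = longest_segment_between_shifts(mask)
print(start.date(), end.date())
```

In this sketch the flagged date belongs to the segment that follows it, so the segment before the shift ends on the previous day. The library routine may treat boundaries differently.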
2 changes: 2 additions & 0 deletions docs/whatsnew/0.1.1.rst
@@ -36,6 +36,8 @@ Documentation

* Started an example gallery and added an example for
  :py:func:`pvanalytics.features.clearsky.reno` (:issue:`125`, :pull:`127`)
* Added an example for the
  :py:mod:`pvanalytics.quality.data_shifts` routines (:pull:`131`)

Contributors
~~~~~~~~~~~~