[WIP] Price-taker framework for DISPATCHES #1201
Conversation
@lbianchi-lbl I have two questions: 1.) Are you aware of why the Check Spelling test is failing in some PRs (like this one) but not others? 2.) We'll probably need to add some dependencies for this PR (namely scikit-learn). While this PR is still a WIP and probably won't be merged for a while, we were curious if you had any thoughts on the matter or if there are any potential complications with doing so.
Thanks for bringing this to my attention. This seems to be due to a change in version of the spellchecking tool and I've created #1203 to track it.
Starting with #1133, optional dependencies are being managed within …
@lbianchi-lbl it seems the spell checker is identifying FOM as a typo, but it is a relevant acronym. How do we sidestep such issues, other than getting rid of the acronym?
Is it fine to just add a few more entries to `typos.toml` as was done in #1204?
@adam-a-a @MarcusHolly that's correct. You should be able to add an entry for FOM in …
@adam-a-a this doesn't look like it will make the final DISPATCHES release. Is that ok?
…-pse into adam-a-a-price-taker-model
@radhakrishnatg does this already have a planned completion date? Otherwise, I'll leave it off the August release board while it's still in draft.
```python
while j <= len(raw_data):
    daily_data[day] = raw_data[i:j].reset_index(drop=True)
    i = j
    j = j + 24
```
Use `horizon_length` instead of hardcoding 24. If the data vector has excess elements, either truncate the data or throw an exception asking the user to check the length of the data vector.
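A minimal sketch of what this suggestion could look like. The function name `split_into_periods` and the choice to raise rather than truncate are assumptions for illustration; the original code slices a pandas object, but plain sequences are used here to keep the sketch self-contained.

```python
def split_into_periods(raw_data, horizon_length):
    """Split raw_data into consecutive chunks of horizon_length entries.

    Hypothetical sketch: raises instead of silently truncating when the
    data length is not a multiple of the horizon.
    """
    if len(raw_data) % horizon_length != 0:
        raise ValueError(
            f"Data length {len(raw_data)} is not a multiple of "
            f"horizon_length={horizon_length}; check the length of the data vector."
        )
    daily_data = {}
    i, j, day = 0, horizon_length, 1
    while j <= len(raw_data):
        daily_data[day] = raw_data[i:j]
        i = j
        j = j + horizon_length  # horizon_length instead of hardcoded 24
        day += 1
    return daily_data
```

With 48 hourly prices and a 24-hour horizon this yields two periods; with 50 it raises instead of dropping the last two points silently.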
```python
        return daily_data

    @staticmethod
    def get_optimal_n_clusters(daily_data, kmin=None, kmax=None, sample_weight=None):
```
Combine the `get_optimal_n_clusters` and `get_elbow_plot` methods.
Add an argument for `seed` and save it as a hidden attribute. Ensure that all methods which use KMeans clustering use the same seed value. Define `seed` and `horizon_length` as properties, so you can access them through getters and setters.
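A hedged sketch of the property-based design being suggested. The class name, defaults, and validation rules are assumptions; the point is that `seed` and `horizon_length` live in hidden attributes behind getters/setters, and every clustering call would then read the same `self.seed` (e.g. as `KMeans(random_state=self.seed)` in scikit-learn).

```python
class PriceTakerModel:
    """Hypothetical sketch: seed and horizon_length as validated properties."""

    def __init__(self, seed=20, horizon_length=24):
        # hidden attributes; set via the property setters below
        self._seed = seed
        self._horizon_length = horizon_length

    @property
    def seed(self):
        return self._seed

    @seed.setter
    def seed(self, value):
        if not isinstance(value, int):
            raise ValueError("seed must be an integer")
        self._seed = value

    @property
    def horizon_length(self):
        return self._horizon_length

    @horizon_length.setter
    def horizon_length(self, value):
        if not isinstance(value, int) or value <= 0:
            raise ValueError("horizon_length must be a positive integer")
        self._horizon_length = value
```

Any method that clusters would then pass `random_state=self.seed`, guaranteeing reproducible and mutually consistent KMeans runs.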
Update kmin and kmax to 4 and 30. Add a warning if the optimal number of clusters is close to kmax. Re-run the code with a higher kmax value in that case.
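The warning logic above could be sketched as follows. The function name and the closeness margin of 2 are assumptions, not the PR's actual implementation:

```python
import logging

_logger = logging.getLogger(__name__)


def check_near_kmax(n_clusters, kmax, margin=2):
    """Hypothetical sketch: warn when the optimal cluster count is near kmax.

    If the elbow lands within `margin` of kmax, the true optimum may lie
    beyond the search range, so the user should re-run with a larger kmax.
    """
    if kmax - n_clusters <= margin:
        _logger.warning(
            "Optimal number of clusters (%d) is close to kmax (%d); "
            "consider re-running with a larger kmax.",
            n_clusters, kmax,
        )
        return True
    return False
```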
@ksbeattie - Actually, @radhakrishnatg coordinated with us and asked us (including @MarcusHolly and Tyler Jaffe, who should've passed the quiz and needs to be added to the repo) to deploy this. I would think we would want this in the final release, but I don't know what that release date is and have only recently gotten involved with DISPATCHES. I suppose this would be a better question for @radhakrishnatg.
This will get merged into the Spring 2024 IDAES release.
@ksbeattie @adam-a-a It's okay if this PR does not make it to the final DISPATCHES release. @lbianchi-lbl We do not have a planned completion date yet. For now, let's not include this PR in the August board.
Codecov Report

Patch coverage:

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #1201      +/-   ##
==========================================
- Coverage   76.83%   76.67%   -0.16%
==========================================
  Files         390      392       +2
  Lines       61852    62089     +237
  Branches    11386    11429      +43
==========================================
+ Hits        47523    47609      +86
- Misses      11867    12013     +146
- Partials     2462     2467       +5
```

☔ View full report in Codecov by Sentry.
```python
            raw_data=raw_data[i], day_list=day_list
        )

        return daily_data, scenarios
```
I'm sure there's a better way to call `scenarios` in `cluster_lmp_data` without having to return it here, but if this is fine as is, we can keep it. It just makes the code a bit clunkier imo.
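One alternative to threading `scenarios` through return values is to store it on the object so later methods can read it directly. This is only an illustration of the pattern under assumed names (`LMPData`, the placeholder transforms), not the PR's actual code:

```python
class LMPData:
    """Hypothetical sketch: keep scenarios as instance state, not a return value."""

    def __init__(self):
        self._scenarios = None

    def generate_daily_data(self, raw_data, day_list):
        # placeholder transform standing in for the real reshaping logic
        daily_data = {day: raw_data[day] for day in day_list}
        # side-channel: record the scenario count instead of returning it
        self._scenarios = len(day_list)
        return daily_data

    def cluster_lmp_data(self, raw_data, day_list):
        daily_data = self.generate_daily_data(raw_data, day_list)
        # self._scenarios is available here without any extra return plumbing
        return daily_data, self._scenarios
```

The trade-off is hidden state versus a wider return signature; either is defensible, which may be why the comment leaves it open.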
```python
        return daily_data, scenarios

    def get_optimal_n_clusters(self, daily_data, kmin=None, kmax=None, plot=False):
```
Adam mentioned that we may want to reconsider having a separate function for plotting the elbow plot rather than just an argument in `get_optimal_n_clusters`. I don't have a strong opinion either way.
```python
        Returns:
            lmp_data = {1: {1: 2, 2: 3, 3: 5}, 2: {1: 2, 2: 3, 3: 5}}
            weights = {1: 45, 2: 56}
```
Currently the output for weights is in the format {0: {0: 45, 1: 56, ...}}. I assume we want to get rid of the 0 in front so that it matches the format Radha initially put in the doc description, but I'm not sure how to go about doing that. I think it's just the header for the column, similar to how the output for lmp_data is in the format {0: {1: 2, 2: 3, 3: 5}, 1: {1: 2, 2: 3, 3: 5}}, where 0 is the header of the first column representing the first cluster. On that last note, is it okay if `lmp_data` and `weights` start at 0 rather than 1?
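If the extra leading 0 is indeed just the DataFrame column header surviving `DataFrame.to_dict()`, one way to drop it and shift the cluster labels to start at 1 is sketched below. The `weights_df` here is a made-up stand-in for whatever the clustering code actually produces:

```python
import pandas as pd

# hypothetical single-column DataFrame of cluster weights, mimicking the
# {0: {0: 45, 1: 56}} output described in the comment above
weights_df = pd.DataFrame({0: {0: 45, 1: 56}})

# DataFrame.to_dict() keeps the column header (0) as the outer key
nested = weights_df.to_dict()  # {0: {0: 45, 1: 56}}

# drop the column header and shift cluster labels to start at 1
weights = {cluster + 1: w for cluster, w in nested[0].items()}
print(weights)  # {1: 45, 2: 56}
```

Equivalently, squeezing the single column to a Series (`weights_df[0].to_dict()`) skips the outer key without the explicit `nested[0]` lookup.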
```python
assert "kmax was not set - using a default value of 30." in caplog.text

# TODO: The below test is not working because our data doesn't ever seem
# to arrive at an n_clusters close to kmax
```
I couldn't get this test working for the reason stated in the TODO note, even after playing around with it a bit. It might be the case that we just need to generate a new dataset that is capable of arriving at an n_clusters close to kmax so that we can test this logger warning. Or maybe we should just leave this warning untested for now.
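One way to test the warning without hunting for a real dataset that elbows near kmax is to drive a small synthetic stand-in and capture the log with pytest's `caplog` fixture. Everything here (`find_optimal_n_clusters`, the logger name, the margin of 2) is a hypothetical stand-in for the real routine, shown only to illustrate the testing pattern:

```python
import logging

import pytest

_logger = logging.getLogger("price_taker")


def find_optimal_n_clusters(n_samples, kmax=30):
    """Contrived stand-in: the 'optimal' count grows with the sample count,
    so enough samples will always push it up against kmax."""
    n_clusters = min(n_samples, kmax - 1)
    if kmax - n_clusters <= 2:
        _logger.warning(
            "Optimal number of clusters (%d) is close to kmax (%d)",
            n_clusters, kmax,
        )
    return n_clusters


def test_warning_near_kmax(caplog):
    with caplog.at_level(logging.WARNING, logger="price_taker"):
        find_optimal_n_clusters(n_samples=100, kmax=30)
    assert "close to kmax" in caplog.text
```

The real test would swap the stand-in for the actual method; the point is that a synthetic input chosen to saturate kmax makes the warning path reachable deterministically.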
```diff
 [pytest]
-addopts = --pyargs idaes
-    --durations=100
+addopts = --durations=100
```
Is it fine to push this change? I think the test file will fail without it, since the Excel file could not be imported.
@adam-a-a is this still something that will be done (given that DISPATCHES is done)?