-
Notifications
You must be signed in to change notification settings - Fork 116
Add Affiliate Window pulls to finance reporting task #735
Conversation
ebae712
to
52783b3
Compare
Codecov Report
@@ Coverage Diff @@
## master #735 +/- ##
==========================================
- Coverage 75.08% 74.95% -0.13%
==========================================
Files 204 206 +2
Lines 22985 23110 +125
==========================================
+ Hits 17258 17322 +64
- Misses 5727 5788 +61
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #735 +/- ##
==========================================
- Coverage 74.76% 74.62% -0.14%
==========================================
Files 205 207 +2
Lines 23206 23330 +124
==========================================
+ Hits 17349 17410 +61
- Misses 5857 5920 +63
Continue to review full report at Codecov.
|
4cb6ff4
to
45ad489
Compare
45ad489
to
384eb84
Compare
|
||
class LoadFeesToWarehouse(WarehouseMixin, VerticaCopyTask): | ||
""" | ||
Loads Affiliate Window table from Hive into the Vertica data warehouse. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: the class name suggests a general purpose, but the docstring treats Affiliate Window as the main purpose. Is the class correctly named?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the comment is just bad. I envisioned this being one place to load all of those random fees we're currently getting from pasted spreadsheets, as we become able to operationalise them. I'll fix the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only nits -- looks great! 👍
@@ -6,6 +6,7 @@ | |||
from edx.analytics.tasks.warehouse.financial.ed_services_financial_report import ( | |||
LoadInternalReportingEdServicesReportToWarehouse | |||
) | |||
from edx.analytics.tasks.warehouse.financial.fees import LoadFeesToWarehouse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This is unused.
# If this is the first run, start from the beginning | ||
print("Couldn't find last completed date, defaulting to start date: {}".format(self.interval_start)) | ||
|
||
self.run_interval = date_interval.Custom(self.interval_start, self.interval_end) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming that this is a closed interval, since interval_end is defined as today's date minus one day. Maybe just note that in the description of interval_end, because most of our intervals defined elsewhere are open on the end. That is, we define the interval as ending with today's date, so that it processes events (mostly events) up to midnight as of today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went back and forth on this a bit as I was trying to figure out the Affiliate Window approval process, I think I can adjust all of this to just be open on the end so I'll make that change wherever necessary in this PR.
""" | ||
run_date = luigi.DateParameter( | ||
default=datetime.date.today(), | ||
description='Default is today.', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should default be yesterday?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other open end changes should make this ok.
|
||
Pulls are made for only a single day. | ||
""" | ||
run_date = luigi.DateParameter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it's not actually the run_date, that is, the date the pull was run. It's the date being pulled. Would "pull_date" or "query_date" be better?
print("Fetching report from Affiliate Window for {}.".format(self.run_date)) | ||
transactions = self.fetch_report() | ||
|
||
# if there are no transactions in response something is wrong. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious -- is it possible that there are days with nothing to pull? Actually, if there were no transactions, would this actually be a json structure that is empty, but for which "if not transactions" would somehow be false? Or would response.json() really be empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were no days with empty transactions going back to 2019-01-01 so I'm assuming that doesn't happen. The if not
should catch any case where there is valid JSON that is an empty string, or empty list. If it's not valid JSON it will error getting it, and if it's something other than a list of dicts it should fail downstream.
If there really are no transactions the API returns an empty list, so that should mean we passed in an invalid date, or our advertiser id changed or some other weird situation like that.
|
||
# Forcing this to overwrite now, as this script will duplicate data | ||
# without it. This does not trickle down to the Affiliate Window calls. | ||
# TODO: Make this optional once VerticaCopyTask handles multiple input files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I understand that overwrite=True here means that the Vertica table is being overwritten each time this runs. I don't understand what it means for VerticaCopyTask to handle multiple input files. (And maybe I don't have to. : )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's entirely possible that I have a misapprehension about how it works, I've made a note to talk to you about this when I get back and either clear it up or make a ticket to make things work the way I think they should.
Add Affiliate Window imports to data warehouse (#735)
Analytics Pipeline Pull Request
Make sure that the following steps are done before merging: