Skip to content
This repository has been archived by the owner on May 1, 2024. It is now read-only.

Add Affiliate Window pulls to finance reporting task #735

Merged
merged 4 commits into from
Jun 19, 2019

Conversation

bmedx
Copy link
Contributor

@bmedx bmedx commented Jun 14, 2019

Analytics Pipeline Pull Request

Make sure that the following steps are done before merging:

  • If you have a migration please contact data engineering team before merging.
  • Before merging run full acceptance tests suite and provide URL for the acceptance tests run.
  • A member of data engineering team has approved the pull request.

@bmedx bmedx force-pushed the bmedx/add_affiliate_window branch from ebae712 to 52783b3 Compare June 14, 2019 17:59
@codecov
Copy link

codecov bot commented Jun 14, 2019

Codecov Report

Merging #735 into master will decrease coverage by 0.12%.
The diff coverage is 51.2%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #735      +/-   ##
==========================================
- Coverage   75.08%   74.95%   -0.13%     
==========================================
  Files         204      206       +2     
  Lines       22985    23110     +125     
==========================================
+ Hits        17258    17322      +64     
- Misses       5727     5788      +61
Impacted Files Coverage Δ
...ytics/tasks/warehouse/financial/finance_reports.py 91.66% <100%> (+0.75%) ⬆️
...tics/tasks/warehouse/financial/affiliate_window.py 45.54% <45.54%> (ø)
edx/analytics/tasks/warehouse/financial/fees.py 73.91% <73.91%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c535eef...4fed889. Read the comment docs.

@codecov-io
Copy link

codecov-io commented Jun 17, 2019

Codecov Report

Merging #735 into master will decrease coverage by 0.13%.
The diff coverage is 49.19%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #735      +/-   ##
==========================================
- Coverage   74.76%   74.62%   -0.14%     
==========================================
  Files         205      207       +2     
  Lines       23206    23330     +124     
==========================================
+ Hits        17349    17410      +61     
- Misses       5857     5920      +63
Impacted Files Coverage Δ
...tics/tasks/warehouse/financial/affiliate_window.py 44% <44%> (ø)
edx/analytics/tasks/warehouse/financial/fees.py 70.83% <70.83%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 769cb2e...2ba96e5. Read the comment docs.

@bmedx bmedx force-pushed the bmedx/add_affiliate_window branch 3 times, most recently from 4cb6ff4 to 45ad489 Compare June 18, 2019 18:24
@bmedx bmedx force-pushed the bmedx/add_affiliate_window branch from 45ad489 to 384eb84 Compare June 18, 2019 18:34

class LoadFeesToWarehouse(WarehouseMixin, VerticaCopyTask):
"""
Loads Affiliate Window table from Hive into the Vertica data warehouse.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the class name suggests a general purpose, but the docstring treats Affiliate Window as the main purpose. Is the class correctly named?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the comment is just bad. I envisioned this being one place to load all of those random fees we're currently getting from pasted spreadsheets, as we become able to operationalise them. I'll fix the comment.

Copy link
Contributor

@brianhw brianhw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only nits -- looks great! 👍

@@ -6,6 +6,7 @@
from edx.analytics.tasks.warehouse.financial.ed_services_financial_report import (
LoadInternalReportingEdServicesReportToWarehouse
)
from edx.analytics.tasks.warehouse.financial.fees import LoadFeesToWarehouse
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This is unused.

# If this is the first run, start from the beginning
print("Couldn't find last completed date, defaulting to start date: {}".format(self.interval_start))

self.run_interval = date_interval.Custom(self.interval_start, self.interval_end)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming that this is a closed interval, since interval_end is defined as today's date minus one day. Maybe just note that in the description of interval_end, because most of our intervals defined elsewhere are open on the end. That is, we define the interval as ending with today's date, so that it processes events (mostly events) up to midnight as of today.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back and forth on this a bit as I was trying to figure out the Affiliate Window approval process, I think I can adjust all of this to just be open on the end so I'll make that change wherever necessary in this PR.

"""
run_date = luigi.DateParameter(
default=datetime.date.today(),
description='Default is today.',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should default be yesterday?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other open end changes should make this ok.


Pulls are made for only a single day.
"""
run_date = luigi.DateParameter(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's not actually the run_date, that is, the date the pull was run. It's the date being pulled. Would "pull_date" or "query_date" be better?

print("Fetching report from Affiliate Window for {}.".format(self.run_date))
transactions = self.fetch_report()

# if there are no transactions in response something is wrong.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious -- is it possible that there are days with nothing to pull? Actually, if there were no transactions, would this actually be a json structure that is empty, but for which "if not transactions" would somehow be false? Or would response.json() really be empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were no days with empty transactions going back to 2019-01-01 so I'm assuming that doesn't happen. The if not should catch any case where there is valid JSON that is an empty string, or empty list. If it's not valid JSON it will error getting it, and if it's something other than a list of dicts it should fail downstream.

If there really are no transactions the API returns an empty list, so that should mean we passed in an invalid date, or our advertiser id changed or some other weird situation like that.


# Forcing this to overwrite now, as this script will duplicate data
# without it. This does not trickle down to the Affiliate Window calls.
# TODO: Make this optional once VerticaCopyTask handles multiple input files.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I understand that overwrite=True here means that the Vertica table is being overwritten each time this runs. I don't understand what it means for VerticaCopyTask to handle multiple input files. (And maybe I don't have to. : )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's entirely possible that I have a misapprehension about how it works, I've made a note to talk to you about this when I get back and either clear it up or make a ticket to make things work the way I think they should.

@bmedx bmedx merged commit 91d5aee into master Jun 19, 2019
@bmedx bmedx deleted the bmedx/add_affiliate_window branch June 19, 2019 00:54
doctoryes pushed a commit that referenced this pull request Jun 19, 2019
Add Affiliate Window imports to data warehouse (#735)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants