rvdss interface and new fn layout so current/historical data can be easily fetched #1551
Merged

Changes from 6 commits (of 11 total)
bae693b (nmdefries): add basic sql tables -- needs update with real col names
a44ad10 (nmdefries): rename files
47e9836 (nmdefries): add main fn with CLI; remove date range params in package frontend fn…
3492573 (nmdefries): start filling out historical fn stubs
f68e335 (nmdefries): rest of new fn layout. adds CLI
e9759cc (nmdefries): Merge branch 'add_rvdss_indicator' into ndefries/rvdss-framework
f7b40da (nmdefries): dashboard results can be stored directly in list in fetch_historical_…
c1a70a2 (cchuong): Add in archived dashboards, and calculate start year from data
f9b9842 (cchuong): address todos and fix historical fetching
2251198 (cchuong): Change misspelled CB to BC
e505076 (cchuong): Update imports
@@ -0,0 +1,121 @@
"""
===============
=== Purpose ===
===============

Stores data provided by rvdss Corp., which contains flu lab test results.
See: rvdss.py


=======================
=== Data Dictionary ===
=======================

`rvdss` is the table where rvdss data is stored.
+----------+-------------+------+-----+---------+----------------+
| Field    | Type        | Null | Key | Default | Extra          |
+----------+-------------+------+-----+---------+----------------+
| id       | int(11)     | NO   | PRI | NULL    | auto_increment |
| location | varchar(8)  | NO   | MUL | NULL    |                |
| epiweek  | int(11)     | NO   | MUL | NULL    |                |
| value    | float       | NO   |     | NULL    |                |
+----------+-------------+------+-----+---------+----------------+
id: unique identifier for each record
location: hhs1-10
epiweek: the epiweek during which the queries were executed
value: number of total test records per facility, within each epiweek


=================
=== Changelog ===
=================
2017-12-14:
  * add "need update" check

2017-12-02:
  * original version
"""

# standard library
import argparse

# third party
import mysql.connector

# first party
from delphi.epidata.acquisition.rvdss import rvdss
import delphi.operations.secrets as secrets
from delphi.utils.epidate import EpiDate
import delphi.utils.epiweek as flu
from delphi.utils.geo.locations import Locations

LOCATIONS = Locations.hhs_list
DATAPATH = "/home/automation/rvdss_data"


def update(locations, first=None, last=None, force_update=False, load_email=True):
    # download and prepare data first
    qd = rvdss.rvdssData(DATAPATH, load_email)
    if not qd.need_update and not force_update:
        print("Data not updated, nothing needs change.")
        return

    qd_data = qd.load_csv()
    qd_measurements = qd.prepare_measurements(qd_data, start_weekday=4)
    qd_ts = rvdss.measurement_to_ts(qd_measurements, 7, startweek=first, endweek=last)
    # connect to the database
    u, p = secrets.db.epi
    cnx = mysql.connector.connect(user=u, password=p, database="epidata")
    cur = cnx.cursor()

    def get_num_rows():
        cur.execute("SELECT count(1) `num` FROM `rvdss`")
        for (num,) in cur:
            pass
        return num

    # check from 4 weeks preceding the last week with data through this week
    cur.execute("SELECT max(`epiweek`) `ew0`, yearweek(now(), 6) `ew1` FROM `rvdss`")
    for (ew0, ew1) in cur:
        ew0 = 200401 if ew0 is None else flu.add_epiweeks(ew0, -4)
    ew0 = ew0 if first is None else first
    ew1 = ew1 if last is None else last
    print(f"Checking epiweeks between {int(ew0)} and {int(ew1)}...")

    # keep track of how many rows were added
    rows_before = get_num_rows()

    # check rvdss for new and/or revised data
    sql = """
        INSERT INTO
            `rvdss` (`location`, `epiweek`, `value`)
        VALUES
            (%s, %s, %s)
        ON DUPLICATE KEY UPDATE
            `value` = %s
    """

    total_rows = 0

    for location in locations:
        if location not in qd_ts:
            continue
        ews = sorted(qd_ts[location].keys())
        num_missing = 0
        for ew in ews:
            v = qd_ts[location][ew]
            sql_data = (location, ew, v, v)
            cur.execute(sql, sql_data)
            total_rows += 1
            if v == 0:
                num_missing += 1
        if num_missing > 0:
            print(f"  [{location}] missing {int(num_missing)}/{len(ews)} value(s)")

    # keep track of how many rows were added
    rows_after = get_num_rows()
    print(f"Inserted {int(rows_after - rows_before)}/{int(total_rows)} row(s)")

    # cleanup
    cur.close()
    cnx.commit()
    cnx.close()
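The `ON DUPLICATE KEY UPDATE` statement above is MySQL-specific insert-or-revise behavior: a first insert creates the row, and a later insert for the same (location, epiweek) overwrites `value` in place. A minimal sketch of the same pattern, using sqlite3's equivalent `ON CONFLICT` clause and a toy row (not the production code path or real data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Mirror the rvdss table: one row per (location, epiweek), with a revisable value.
cur.execute("""
    CREATE TABLE rvdss (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        location TEXT NOT NULL,
        epiweek INTEGER NOT NULL,
        value REAL NOT NULL,
        UNIQUE (location, epiweek)
    )
""")

sql = """
    INSERT INTO rvdss (location, epiweek, value) VALUES (?, ?, ?)
    ON CONFLICT (location, epiweek) DO UPDATE SET value = excluded.value
"""
cur.execute(sql, ("hhs1", 202401, 10.0))  # first insert creates the row
cur.execute(sql, ("hhs1", 202401, 12.5))  # revision updates the same row
cur.execute("SELECT count(*), value FROM rvdss")
print(cur.fetchone())  # a single row holding the revised value
```

This is why the parameter tuple in `update` repeats the value (`(location, ew, v, v)`): MySQL's syntax binds it once for the INSERT and once for the UPDATE branch.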
@@ -0,0 +1,106 @@
"""
Defines the command line interface for the rvdss indicator. Current data (covering the most recent epiweek) and historical data (covering all data before the most recent epiweek) can be generated together or separately.

Defines top-level functions to fetch data and save to disk or DB.
"""

import argparse
import os

import pandas as pd

from delphi.epidata.acquisition.rvdss.utils import get_weekly_data, get_revised_data, get_dashboard_update_date
from delphi.epidata.acquisition.rvdss.constants import DASHBOARD_BASE_URL, RESP_DETECTIONS_OUTPUT_FILE, POSITIVE_TESTS_OUTPUT_FILE, COUNTS_OUTPUT_FILE

Review comment (issue): check imports. (`fetch_dashboard_data`, `fetch_report_data`, and `fetch_historical_dashboard_data` are called below but not imported here.)
def update_current_data():
    ## TODO: what is the base path for these files?
    base_path = "."

    data_dict = fetch_dashboard_data(DASHBOARD_BASE_URL, 2024)

    table_types = {
        "respiratory_detection": RESP_DETECTIONS_OUTPUT_FILE,
        "positive": POSITIVE_TESTS_OUTPUT_FILE,
        # "count": COUNTS_OUTPUT_FILE,  # Dashboards don't contain this data.
    }
    for tt in table_types.keys():
        data = data_dict[tt]  # index by the table type key, not the whole dict

        # Write the tables to separate csvs
        path = base_path + "/" + table_types[tt]

        # Since this function generates new data weekly, we need to combine it with the existing data, if it exists.
        if not os.path.exists(path):
            data.to_csv(path, index=True)
        else:
            old_data = pd.read_csv(path).set_index(['epiweek', 'time_value', 'issue', 'geo_type', 'geo_value'])

            # If the index already exists in the data on disk, don't add the new data -- we may have already run the weekly data fetch.
            ## TODO: The check on index maybe should be stricter? Although we do deduplication upstream, so this probably won't find true duplicates
            if not data.index.isin(old_data.index).any():
                old_data = pd.concat([old_data, data], axis=0)
                old_data.to_csv(path, index=True)

    # ## TODO
    # update_database(data)
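The append-if-new check in `update_current_data` can be exercised in isolation. A sketch with a toy single-level index (hypothetical values, not real rvdss data or the multi-level index used above):

```python
import pandas as pd

old = pd.DataFrame({"epiweek": [202401], "value": [1.0]}).set_index("epiweek")
new = pd.DataFrame({"epiweek": [202402], "value": [2.0]}).set_index("epiweek")
dup = pd.DataFrame({"epiweek": [202401], "value": [9.0]}).set_index("epiweek")

# Same check as update_current_data: only append rows whose index is unseen.
if not new.index.isin(old.index).any():
    old = pd.concat([old, new], axis=0)
if not dup.index.isin(old.index).any():  # 202401 already on disk: skipped
    old = pd.concat([old, dup], axis=0)

print(len(old))  # the duplicate fetch was not re-appended
```

Note the check is all-or-nothing: if any row of the new batch overlaps the data on disk, the entire batch is skipped, which matches the TODO's concern about strictness.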
def update_historical_data():
    ## TODO: what is the base path for these files?
    base_path = "."

    report_dict_list = fetch_report_data()
    dashboard_dict_list = fetch_historical_dashboard_data()

    table_types = {
        "respiratory_detection": RESP_DETECTIONS_OUTPUT_FILE,
        "positive": POSITIVE_TESTS_OUTPUT_FILE,
        "count": COUNTS_OUTPUT_FILE,
    }
    for tt in table_types.keys():
        # Merge tables together from dashboards and reports for each table type.
        dashboard_data = [elem.get(tt, None) for elem in dashboard_dict_list]
        report_data = [elem.get(tt, None) for elem in report_dict_list]
        # A plain list has no .concat() method; combine the frames with
        # pd.concat, dropping sources that lack this table type (None entries).
        data = pd.concat([df for df in report_data + dashboard_data if df is not None])

        # Write the tables to separate csvs
        data.to_csv(base_path + "/" + table_types[tt], index=True)

    # ## TODO
    # update_database(data)
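The merge step above combines per-source tables of one type. Since a plain Python list has no `.concat()` method, the frames need `pd.concat`, with `None` entries (sources that don't provide that table type) filtered out first. A small sketch with toy frames (hypothetical values):

```python
import pandas as pd

# One report lacks this table type, so its entry is None, as produced
# by elem.get(tt, None) in update_historical_data.
report_data = [pd.DataFrame({"value": [1]}), None]
dashboard_data = [pd.DataFrame({"value": [2]})]

# Drop the None entries, then stack the remaining frames row-wise.
frames = [df for df in report_data + dashboard_data if df is not None]
data = pd.concat(frames, axis=0)
print(len(data))  # rows from both contributing sources
```

If every source were missing a table type, `frames` would be empty and `pd.concat` would raise, which may be worth guarding against.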
def main():
    # args and usage
    parser = argparse.ArgumentParser()
    # fmt: off
    parser.add_argument(
        "--current",
        "-c",
        action="store_true",
        help="fetch current data, that is, data for the latest epiweek"
    )
    parser.add_argument(
        "--historical",
        # no short flag: argparse reserves "-h" for --help
        action="store_true",
        help="fetch historical data, that is, data for all available time periods other than the latest epiweek"
    )
    # fmt: on
    args = parser.parse_args()

    current_flag, historical_flag = (
        args.current,
        args.historical,
    )
    if not current_flag and not historical_flag:
        raise Exception("no data was requested")

    # Decide what to update
    if current_flag:
        update_current_data()
    if historical_flag:
        update_historical_data()


if __name__ == "__main__":
    main()
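One catch in `main`: `ArgumentParser` auto-registers `-h` for `--help`, so declaring `-h` as a short flag for `--historical` fails at definition time rather than at parse time. A minimal demonstration of the conflict and the workaround of simply omitting the short flag:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--current", "-c", action="store_true")
try:
    # "-h" is already taken by the auto-added help option, so this raises.
    parser.add_argument("--historical", "-h", action="store_true")
except argparse.ArgumentError as e:
    print("conflict:", e)

# Dropping the short flag avoids the clash (add_help=False would be
# another option, at the cost of losing --help).
parser.add_argument("--historical", action="store_true")
args = parser.parse_args(["--current", "--historical"])
print(args.current, args.historical)
```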
Review comment (issue): Need to make sure key is one of the three standard keys ("county", "positive", etc)