incorporating other changes prior to new column importation #1450

melange396 · 2024-05-28T19:52:21Z

based on #1449 with techniques similar to those used in #1442 ...

This adds some previously missing signal "Description" fields, removes {}-style variable substitution and replaces it with appropriate text in other "Description" fields, and adds previously missing "Available Geography" fields.

After this is merged, i will run the google-doc syncronization GH action once more, and then verify that it it is only adding new fields.

sonarqubecloud · 2024-05-28T19:52:53Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

melange396 · 2024-05-28T19:57:39Z

this code, similar to that in #1442, verifies and extracts the changes to existing fields:

import pandas as pd

base_url = 'https://github.com/cmu-delphi/delphi-epidata/raw/{}/src/server/endpoints/covidcast_utils/db_signals.csv'

current = pd.read_csv(base_url.format('dev'), na_filter=False)
proposed = pd.read_csv(base_url.format('bot/update-docs'), na_filter=False)

# this code assumes columns have not been removed or renamed,
# and that no new rows have been added or had their ordering changed

new_cols = set(proposed.columns) - set(current.columns)
print("new cols:", sorted(new_cols))

non_matching = (proposed[current.columns] != current)
diffs_per_col = non_matching.apply(sum)
print(diffs_per_col)

mismatched_time = pd.concat([current[['Source Subdivision', 'Signal']], non_matching[['Description']]], axis=1)
print("Descriptions changed:")
print(mismatched_time[mismatched_time['Description']])

# produce alpha-sorted "Available Geography" from each csv for accurate comparison purposes:
current_ag_norm = current['Available Geography'].apply(lambda g: ','.join(sorted(g.split(','))))
proposed_ag_norm = proposed['Available Geography'].apply(lambda g: ','.join(sorted(g.split(','))))
diff_geos = (current_ag_norm != proposed_ag_norm)
print("Number of rows with different geographies:", sum(diff_geos))
mismatched_geos = pd.concat([current[['Source Subdivision', 'Signal']], diff_geos], axis=1)
print(mismatched_geos[mismatched_geos['Available Geography']])
# This shows that these 44 rows with different geographies are accounted for by these rows that have empty geos:
print(
      (current.iloc[380:394]['Available Geography']=='').all(),
      (current.iloc[419:449]['Available Geography']=='').all(),
      (394-380)+(449-419)==44
      ) 




intermediate = current.copy()
intermediate.iloc[380:394]['Available Geography'] = proposed.iloc[380:394]['Available Geography']
intermediate.iloc[419:449]['Available Geography'] = proposed.iloc[419:449]['Available Geography']
intermediate['Description'] = proposed['Description']
intermediate.to_csv('intermediate.csv', index=False)
import os
for _ in range(2):
    os.system("sed -i 's/,False,/,FALSE,/g' intermediate.csv")
    os.system("sed -i 's/,True,/,TRUE,/g' intermediate.csv")

carlynvandyke

Looks good to me!

incorporating other changes prior to new column importation

9858612

melange396 added documentation chore data quality user experience things that affect users of our software or services labels May 28, 2024

melange396 requested review from nmdefries and carlynvandyke May 28, 2024 19:52

melange396 requested a review from tinatownes May 28, 2024 19:58

carlynvandyke approved these changes May 28, 2024

View reviewed changes

melange396 merged commit 1559870 into dev May 28, 2024
7 checks passed

melange396 deleted the csv_update_prep branch May 28, 2024 22:16

This was referenced May 28, 2024

Update Google Docs Meta Data #1449

Closed

Release Delphi Epidata 4.1.22 #1454

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

incorporating other changes prior to new column importation #1450

incorporating other changes prior to new column importation #1450

melange396 commented May 28, 2024

sonarqubecloud bot commented May 28, 2024

melange396 commented May 28, 2024

carlynvandyke left a comment

incorporating other changes prior to new column importation #1450

incorporating other changes prior to new column importation #1450

Conversation

melange396 commented May 28, 2024

sonarqubecloud bot commented May 28, 2024

Quality Gate passed

melange396 commented May 28, 2024

carlynvandyke left a comment

Choose a reason for hiding this comment