Doctor Visits show an unexpected weekly trend #2044

Open
nolangormley opened this issue Aug 30, 2024 · 1 comment
Labels
data quality Missing data, weird data, broken data

@nolangormley

Actual Behavior:

The Doctor Visits signal shows a weekly pattern similar to that of signals with weekly reporting, where there is a spike at the beginning of each week.

[Plot: docvisit]

Expected behavior

Roni and I looked through this yesterday and couldn't work out why this happens. Since this is a daily reported signal, we expected it to be much smoother.

Context

Here's some code to reproduce the plot above:

import pandas as pd
import wget

# wget.download returns the path of the saved file, so read from that path
# instead of hard-coding the file name.
docvisit = wget.download("https://api.covidcast.cmu.edu/epidata/covidcast/csv?signal=doctor-visits:smoothed_cli&start_day=2024-05-29&end_day=2024-08-29&geo_type=nation")
docvisitadj = wget.download("https://api.covidcast.cmu.edu/epidata/covidcast/csv?signal=doctor-visits:smoothed_adj_cli&start_day=2024-05-29&end_day=2024-08-29&geo_type=nation")

df = pd.read_csv(docvisit)
dfadj = pd.read_csv(docvisitadj)

df.time_value = pd.to_datetime(df.time_value, utc=True)
dfadj.time_value = pd.to_datetime(dfadj.time_value, utc=True)
dfadj = dfadj[['time_value', 'value']].rename(columns={'value': 'valueadj'})

# Put both signals in one frame and plot them against the date.
foo = df[['time_value', 'value']].merge(dfadj, on='time_value', how='left')
foo.plot(x='time_value', y=['value', 'valueadj'])
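
To put a number on the weekly pattern rather than judging it from the plot, one option is to compare each day with its centered 7-day average and then average that ratio by day of the week. This is only a sketch: `weekday_profile` is a hypothetical helper, not part of any covidcast tooling, and it assumes a frame with a datetime `time_value` column and a numeric `value` column, like `df` above.

```python
import numpy as np
import pandas as pd


def weekday_profile(frame: pd.DataFrame, value_col: str = "value") -> pd.Series:
    """Mean ratio of each observation to its centered 7-day average, by weekday.

    Hypothetical helper. The result is indexed 0 (Monday) through 6 (Sunday).
    Values near 1.0 on every weekday mean there is no weekly pattern.
    """
    series = frame.set_index("time_value")[value_col].sort_index()
    # A centered 7-day mean follows the slow trend and averages out
    # any weekly cycle, so dividing by it isolates the weekday effect.
    trend = series.rolling(window=7, center=True).mean()
    ratio = (series / trend).dropna()
    return ratio.groupby(ratio.index.dayofweek).mean()
```

Calling `weekday_profile(df)` and `weekday_profile(dfadj, "valueadj")` would show whether the same weekdays are elevated in both the raw and the adjusted signal.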
nolangormley added the data quality label and self-assigned this on Aug 30, 2024
@RoniRos

RoniRos commented Sep 1, 2024

Back in July, Peter and I noticed this pattern in the "hospital admissions" signals in Texas. Dmitry investigated it and concluded that (1) it is already present in the raw signal we receive, and (2) in that signal it is present only in data from Texas, not from other states. The current signal (Doctor Visits) is from the same source. :-(

@dshemetov Did I remember your conclusions correctly? And did we ever file an issue about it? If so, we should link/consolidate them.
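
One way to check whether the pattern is again confined to Texas would be to download the same signal with `geo_type=state` and rank the states by how strong their weekday effect is. The sketch below is an assumption-laden illustration: `weekly_amplitude_by_geo` is a hypothetical helper, and it assumes the state-level CSV has `geo_value`, `time_value`, and `value` columns.

```python
import numpy as np
import pandas as pd


def weekly_amplitude_by_geo(frame: pd.DataFrame) -> pd.Series:
    """Spread (max - min) of the weekday profile for each geo_value.

    Hypothetical helper. A spread near 0 means no weekly pattern;
    larger values mean some weekdays run consistently high or low.
    """
    frame = frame.assign(time_value=pd.to_datetime(frame["time_value"], utc=True))

    def amplitude(group: pd.DataFrame) -> float:
        series = group.set_index("time_value")["value"].sort_index()
        # Ratio to the centered 7-day mean isolates the weekday effect.
        trend = series.rolling(window=7, center=True).mean()
        ratio = (series / trend).dropna()
        profile = ratio.groupby(ratio.index.dayofweek).mean()
        return float(profile.max() - profile.min())

    return (
        frame.groupby("geo_value")[["time_value", "value"]]
        .apply(amplitude)
        .sort_values(ascending=False)
    )
```

If only one or two states sit at the top with a clearly larger spread than the rest, that would be consistent with the hospital admissions finding.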
