New diagnostic: distribution of # of cohort 'events' per person #667

Open
pbr6cornell opened this issue Nov 10, 2021 · 2 comments
Labels
enhancement New feature or request

Comments

@pbr6cornell

Rupa developed a simple but very valuable new diagnostic: a histogram of the number of cohort starts per person.

For cohorts that allow recurrence (i.e. a person can enter and exit multiple times, as with an acute disease), one source of misclassification error is a 'new event' that is actually follow-up care for a prior event. Expanding the clean window within a cohort can reduce this error, but may introduce a different one: failing to identify a genuinely new event because it is misclassified as part of the prior event. So this is a distinct kind of 'sensitivity/specificity' tradeoff, one that applies to recurrent events among people with 1 or more events.
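To make the tradeoff concrete, here is a minimal Python sketch of how the number of events counted for one person shrinks as the clean window grows. The function name `count_new_events` and the rule used here (an event counts as new only if it starts more than `clean_window_days` after the start of the last event that was kept) are assumptions for illustration; an actual cohort definition may measure the window differently, for example from the prior event's end date.

```python
from datetime import date


def count_new_events(start_dates, clean_window_days):
    """Count events for one person, treating any start that falls within
    clean_window_days of the last kept start as follow-up care.

    Hypothetical rule for illustration only, not a specific tool's logic.
    """
    kept = []
    for start in sorted(start_dates):
        if not kept or (start - kept[-1]).days > clean_window_days:
            kept.append(start)
    return len(kept)


# Toy data: four cohort starts for a single person.
starts = [date(2020, 1, 1), date(2020, 1, 20), date(2020, 3, 1), date(2020, 8, 1)]

# A wider clean window absorbs more starts into the prior event.
counts_by_window = {w: count_new_events(starts, w) for w in (0, 30, 90)}
print(counts_by_window)
```

With a 0-day window all four starts count as events. A 30-day window absorbs the January 20 start, and a 90-day window also absorbs the March 1 start, which may or may not be a true recurrence.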

The count of persons by # of events shows the user how often persons have many recurrences vs. few. Combined with external context about the event, this can be used to judge whether the observed number of recurrences is plausible.

Simple query to compute the desired numbers (which can be displayed as a simple table):

select num_events, count(subject_id) as num_persons
from
(
    select subject_id, count(cohort_start_date) as num_events
    from cohort
    group by subject_id
) t1
group by num_events
order by num_events desc
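To try the query without a CDM database, the following sketch runs it against an in-memory SQLite table using Python's standard `sqlite3` module. The toy rows and the helper name `events_per_person_distribution` are made up for illustration, and the table holds a single cohort, as the query above implicitly assumes.

```python
import sqlite3

# The query from the issue, unchanged apart from whitespace.
QUERY = """
select num_events, count(subject_id) as num_persons
from
(
    select subject_id, count(cohort_start_date) as num_events
    from cohort
    group by subject_id
) t1
group by num_events
order by num_events desc
"""


def events_per_person_distribution(rows):
    """Load (subject_id, start, end) rows into a toy cohort table and
    return (num_events, num_persons) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table cohort ("
        "subject_id integer, cohort_start_date text, cohort_end_date text)"
    )
    conn.executemany("insert into cohort values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


# Hypothetical data: person 1 has 3 events, person 2 has 1, person 3 has 2.
toy_rows = [
    (1, "2020-01-01", "2020-01-10"),
    (1, "2020-03-01", "2020-03-05"),
    (1, "2020-06-01", "2020-06-03"),
    (2, "2020-02-01", "2020-02-04"),
    (3, "2020-02-15", "2020-02-20"),
    (3, "2020-09-01", "2020-09-02"),
]

distribution = events_per_person_distribution(toy_rows)
print(distribution)
```

Each tuple is one row of the proposed table: the number of events, then the number of persons with that many events.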

@pbr6cornell pbr6cornell added the enhancement New feature or request label Nov 10, 2021
@gowthamrao
Member

This is useful. OK, we will add it.

@gowthamrao
Member

I wonder if this is already something FeatureExtraction can do or should do, because I think it fits characterization. It is "the distribution of records by person/cohort", i.e. we can aggregate this to the cohort level or keep it at the person level.

@anthonysena

The reason is that I think the best place for this information is the covariate_value table, similar to age-strata. The alternative is a bigger technical lift, because we would need to create a new set of cohort characteristics and then a new table just to store them.
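As a rough illustration of the "aggregate to cohort level" option, the sketch below summarizes events per person for each cohort in one pass. It assumes the cohort table carries a `cohort_definition_id` column; the summary columns (`min_events`, `max_events`, `mean_events`) and the helper name `summarize_events_per_person` are hypothetical, and this does not reflect how FeatureExtraction or the covariate_value table actually store results.

```python
import sqlite3

# Person-level counts in the inner query, cohort-level summary in the outer one.
COHORT_LEVEL_QUERY = """
select cohort_definition_id,
       count(subject_id) as num_persons,
       min(num_events) as min_events,
       max(num_events) as max_events,
       avg(num_events) as mean_events
from
(
    select cohort_definition_id, subject_id,
           count(cohort_start_date) as num_events
    from cohort
    group by cohort_definition_id, subject_id
) person_level
group by cohort_definition_id
order by cohort_definition_id
"""


def summarize_events_per_person(rows):
    """Return one summary tuple per cohort for toy (id, subject, start) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table cohort ("
        "cohort_definition_id integer, subject_id integer, cohort_start_date text)"
    )
    conn.executemany("insert into cohort values (?, ?, ?)", rows)
    result = conn.execute(COHORT_LEVEL_QUERY).fetchall()
    conn.close()
    return result


# Hypothetical data for two cohorts.
toy_rows = [
    (1, 1, "2020-01-01"), (1, 1, "2020-03-01"), (1, 1, "2020-06-01"),
    (1, 2, "2020-02-01"),
    (2, 3, "2020-02-15"), (2, 3, "2020-09-01"),
    (2, 4, "2020-04-01"), (2, 4, "2020-11-01"),
]

summary = summarize_events_per_person(toy_rows)
print(summary)
```

Keeping the inner query's output instead would give the person-level version of the same information.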
