Add params to snp_allele_frequencies_advanced #694

Draft: wants to merge 4 commits into base: master
@leehart leehart commented Dec 10, 2024

Re: issue #391

@leehart leehart changed the title Add taxon_by param to snp_allele_frequencies_advanced(). Add example … Add params to snp_allele_frequencies_advanced Dec 10, 2024
@leehart (Collaborator, Author) commented Dec 13, 2024

Hi @alimanfoo. Do we know how we want to allow any value in the period_by column? Looking at the current code, it's not clear to me what we want to do here.

util.py currently has:

def prep_samples_for_cohort_grouping(
    *, df_samples, area_by, period_by, taxon_by="taxon"
):
    # Take a copy, as we will modify the dataframe.
    df_samples = df_samples.copy()

    # Fix "intermediate" or "unassigned" taxon values - we only want to build
    # cohorts with clean taxon calls, so we set other values to None.
    loc_intermediate_taxon = (
        df_samples[taxon_by].str.startswith("intermediate").fillna(False)
    )
    df_samples.loc[loc_intermediate_taxon, taxon_by] = None
    loc_unassigned_taxon = (
        df_samples[taxon_by].str.startswith("unassigned").fillna(False)
    )
    df_samples.loc[loc_unassigned_taxon, taxon_by] = None

    # Add period column.
    if period_by == "year":
        make_period = _make_sample_period_year
    elif period_by == "quarter":
        make_period = _make_sample_period_quarter
    elif period_by == "month":
        make_period = _make_sample_period_month
    else:  # pragma: no cover
        raise ValueError(
            f"Value for period_by parameter must be one of 'year', 'quarter', 'month'; found {period_by!r}."
        )
    sample_period = df_samples.apply(make_period, axis="columns")
    df_samples["period"] = sample_period

    # Add area column for consistent output.
    df_samples["area"] = df_samples[area_by]

    return df_samples

with

def _make_sample_period_month(row):
    year = row.year
    month = row.month
    if year > 0 and month > 0:
        return pd.Period(freq="M", year=year, month=month)
    else:
        return pd.NaT


def _make_sample_period_quarter(row):
    year = row.year
    month = row.month
    if year > 0 and month > 0:
        return pd.Period(freq="Q", year=year, month=month)
    else:
        return pd.NaT


def _make_sample_period_year(row):
    year = row.year
    if year > 0:
        return pd.Period(freq="Y", year=year)
    else:
        return pd.NaT
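
One possible reading of "allow any value", sketched here purely as an assumption and not as the agreed design: keep the three built-in period types, and otherwise treat period_by as the name of an existing column in the sample metadata whose values are copied into "period" unchanged. The names add_period_column and _BUILTIN_PERIODS are hypothetical and do not exist in util.py; the year helper is copied from the code quoted above so that the sketch runs on its own.

```python
import pandas as pd


def _make_sample_period_year(row):
    # Same logic as the helper quoted above: non-positive years mean "unknown".
    year = row.year
    if year > 0:
        return pd.Period(freq="Y", year=year)
    return pd.NaT


# Hypothetical registry of built-in period types. Only "year" is shown here;
# "quarter" and "month" would map to the other two helpers in the same way.
_BUILTIN_PERIODS = {"year": _make_sample_period_year}


def add_period_column(df_samples, period_by):
    """Hypothetical sketch: add a "period" column from a built-in period
    type or, failing that, from a column already present in df_samples."""
    # Take a copy, as we will modify the dataframe.
    df_samples = df_samples.copy()

    if period_by in _BUILTIN_PERIODS:
        # Built-in names take precedence, even if a column of the same
        # name exists (e.g. "year"). This is an assumption of the sketch.
        make_period = _BUILTIN_PERIODS[period_by]
        df_samples["period"] = df_samples.apply(make_period, axis="columns")
    elif period_by in df_samples.columns:
        # Assumed fallback: use the column values as-is, with no conversion
        # to pd.Period.
        df_samples["period"] = df_samples[period_by]
    else:
        raise ValueError(
            f"Value for period_by must be one of {sorted(_BUILTIN_PERIODS)} "
            f"or a column name in the sample metadata; found {period_by!r}."
        )
    return df_samples


# Made-up sample metadata, for illustration only.
df = pd.DataFrame(
    {"year": [2020, -1], "month": [3, -1], "season": ["wet", "dry"]}
)
by_year = add_period_column(df, "year")
by_season = add_period_column(df, "season")
```

The open questions this leaves are the ones raised above: whether arbitrary column values should be converted to pd.Period or kept as-is, and how to resolve a clash between a built-in name and a column of the same name.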
