Add codon_prob.py #24

Closed
wants to merge 10 commits into from

willdumm (Contributor)

Addresses #19

matsen (Contributor) commented Jul 9, 2024

I'm cleaning out old code and came across this, which seems to have been eclipsed but might come in handy at some point:

import pandas as pd


def codon_change_count(pcp_df):
    # Define a helper function to count nt differences in codons
    def count_nt_differences(codon1, codon2):
        return sum(1 for x, y in zip(codon1, codon2) if x != y)

    # Prepare a list to store results
    results = []

    # Iterate over rows in the DataFrame
    for index, row in pcp_df.iterrows():
        parent_seq = row['parent']
        child_seq = row['child']

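        # Ensure parent and child have equal length and that the length is
        # divisible by 3 (the length of a codon). Skipped rows are omitted from
        # the output, so its index will not line up with pcp_df.
        if len(parent_seq) != len(child_seq) or len(parent_seq) % 3 != 0:
            continue  # Skip sequences that are not properly formatted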

        # Break sequences into codons
        parent_codons = [parent_seq[i:i+3] for i in range(0, len(parent_seq), 3)]
        child_codons = [child_seq[i:i+3] for i in range(0, len(child_seq), 3)]

        codon_count = len(parent_codons)
        one_nt_change = 0
        two_nt_change = 0
        three_nt_change = 0

        # Count differences for each codon pair
        for pc, cc in zip(parent_codons, child_codons):
            differences = count_nt_differences(pc, cc)
            if differences == 1:
                one_nt_change += 1
            elif differences == 2:
                two_nt_change += 1
            elif differences == 3:
                three_nt_change += 1

        # Append results for this row
        results.append({
            'codon_count': codon_count,
            '1_nt_change': one_nt_change,
            '2_nt_change': two_nt_change,
            '3_nt_change': three_nt_change
        })

    # Convert results list to DataFrame
    return pd.DataFrame(results)

codon_change_df = codon_change_count(pcp_df)
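
For anyone who wants to sanity-check the counting logic without a `pcp_df` on hand, here is a minimal standalone sketch of the same per-codon comparison for a single parent/child pair. The name `codon_diff_counts` and the toy sequences are made up for illustration and are not part of the repository; raising on malformed input instead of skipping the row is also an assumption, not what the code above does.

```python
from collections import Counter


def codon_diff_counts(parent_seq, child_seq):
    """Count codons that differ at exactly 1, 2, or 3 nucleotide positions.

    Hypothetical helper for illustration only. Unlike the function above,
    it raises on malformed input instead of silently skipping it.
    """
    if len(parent_seq) != len(child_seq) or len(parent_seq) % 3 != 0:
        raise ValueError("sequences must have equal length divisible by 3")

    # Tally the number of mismatched positions within each codon pair.
    counts = Counter(
        sum(p != c for p, c in zip(parent_seq[i:i + 3], child_seq[i:i + 3]))
        for i in range(0, len(parent_seq), 3)
    )
    return {
        "codon_count": len(parent_seq) // 3,
        "1_nt_change": counts[1],
        "2_nt_change": counts[2],
        "3_nt_change": counts[3],
    }


# Toy sequences (made up): codons ATG/ATG, AAA/AAT, CCC/CTT, GGG/CAT
# differ at 0, 1, 2, and 3 positions respectively.
print(codon_diff_counts("ATGAAACCCGGG", "ATGAATCTTCAT"))
```

The returned dict uses the same keys as the rows built above, so a list of these dicts could be passed to `pd.DataFrame` in the same way.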

matsen (Contributor) commented Sep 12, 2024

Superseded by #50

@matsen matsen closed this Sep 12, 2024
@matsen matsen deleted the 19-add-codon-prob branch September 12, 2024 22:15

2 participants