make sure that the ebi-ena null values are lower case #3246

Closed
antgonza opened this issue Feb 6, 2023 · 0 comments

antgonza commented Feb 6, 2023

We currently don't force ebi-ena null values to be lower case, but we should.

The code is something like this:

update_values = {
    'not collected': 'not collected',
    'not provided': 'not provided',
    'restricted access': 'restricted access',
    'not applicable': 'not applicable',
    'unspecified': 'not applicable',
    'not_collected': 'not collected',
    'not_provided': 'not provided',
    'restricted_access': 'restricted access',
    'not_applicable': 'not applicable',
    'missing: not collected': 'not collected',
    'missing: not provided': 'not provided',
    'missing: restricted access': 'restricted access',
    'missing: not applicable': 'not applicable',
}

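# [sample_or_prep_object] is a placeholder for the sample or prep object.
# Lower-case every cell; missing values become empty strings first.
df = [sample_or_prep_object].to_dataframe().fillna("").applymap(str.lower)
# Keep only the rows that contain at least one recognised null value.
ddf = df[df.isin(update_values.keys()).any(axis=1)]
if ddf.shape[0] != 0:
    # Columns of those rows that hold at least one recognised null value.
    cols = [c for c in ddf.columns if set(update_values) & set(ddf[c].values)]
    to_replace = ddf[cols].copy()
    # Map each accepted spelling to its canonical lower-case form.
    to_replace.replace(update_values, inplace=True)
    [sample_or_prep_object].update(to_replace)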
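
For reference, the same mapping can be tried out without the real objects. The following is only a minimal sketch in plain Python, not the project's implementation: `normalize_null_value`, `normalize_rows` and the sample data are hypothetical names made up for illustration, and the rows are modelled as a nested dict rather than a dataframe. Unlike the snippet above, it lower-cases only the cells that match a known null value and leaves every other value as it was, which is an assumption about the desired behaviour.

```python
# Mapping copied from the issue: accepted spelling -> canonical lower-case value.
UPDATE_VALUES = {
    'not collected': 'not collected',
    'not provided': 'not provided',
    'restricted access': 'restricted access',
    'not applicable': 'not applicable',
    'unspecified': 'not applicable',
    'not_collected': 'not collected',
    'not_provided': 'not provided',
    'restricted_access': 'restricted access',
    'not_applicable': 'not applicable',
    'missing: not collected': 'not collected',
    'missing: not provided': 'not provided',
    'missing: restricted access': 'restricted access',
    'missing: not applicable': 'not applicable',
}


def normalize_null_value(value):
    """Hypothetical helper: canonical null value for `value`, else `value` unchanged."""
    if not isinstance(value, str):
        return value
    # Compare in lower case, but return non-matching values untouched.
    return UPDATE_VALUES.get(value.lower(), value)


def normalize_rows(rows):
    """Hypothetical helper: normalize every cell of a {sample: {column: value}} dict."""
    return {
        sample: {col: normalize_null_value(val) for col, val in cols.items()}
        for sample, cols in rows.items()
    }


# Made-up example data, only to exercise the helpers.
rows = {
    's1': {'ph': '7.2', 'host': 'Human'},
    's2': {'ph': 'Not Provided', 'host': 'UNSPECIFIED'},
    's3': {'ph': 'Missing: Not Collected', 'host': 'Mouse'},
}
normalized = normalize_rows(rows)
```

Here `s2` and `s3` have their null values rewritten to the canonical lower-case forms, while `'Human'` and `'Mouse'` keep their original case.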