Skip to content

Conversation

@wasade
Copy link
Contributor

@wasade wasade commented Aug 27, 2021

This PR adds in the ability to query a study for detail on a specific sample or set of samples. The returned details include EBI accession information, and what preparations the sample was observed on. The output is represented in an flat fashion suitable to be fed directly into a DataFrame or DataTable.

cc @dhakim87 @antgonza

@antgonza antgonza changed the base branch from master to dev August 27, 2021 17:09
@antgonza
Copy link
Member

Just FYI, moved the base branch to dev and I'm going to restart the builds.

Copy link
Member

@antgonza antgonza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @wasade, some minor comments.


# cache sample detail for lookup
study_samples = set(study.sample_template.keys())
sample_accessions = study.sample_template.ebi_sample_accessions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that ebi_sample_accessions will return all available samples with Nones where there is no accession; in other words, len(sample_accessions) == len(study_samples)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think that impacts the subsequent use

Comment on lines 33 to 47
study_samples = set(study.sample_template.keys())
sample_accessions = study.sample_template.ebi_sample_accessions

# cache preparation information that we'll need

# map of {sample_id: [indices, of, light, prep, info, ...]}
sample_prep_mapping = defaultdict(list)
pt_light = []
for idx, pt in enumerate(study.prep_templates()):
pt_light.append((pt.id, pt.ebi_experiment_accessions,
pt.status, pt.data_type()))

for ptsample in pt.keys():
sample_prep_mapping[ptsample].append(idx)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern with this block is that it will always load all samples, even when len(samples) == 1.

Another way to do this could be to first select which preps have the samples you are looking for and then build the details, something like this:

samples_set = set(samples) # not sure if this is required as its own var. 
prep_templates = [pt for pt in study.prep_templates() if set(pt) & samples_set]
...
for idx, pt in enumerate(prep_templates):
    ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the cost of this the same as it's still necessary to iterate over all preps?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes time wise but my concern is the memory (should have said this in my previous message) to store all prep info data in pt_light, in specific due to pt.ebi_experiment_accessions, the other values are pretty small; you can imagine that this can grow a lot for studies like the AGP. However, this is something internal and if you think this is not that large or important we can improve in a future iteration, if it actually becomes a problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay sounds good, last commit should reduce what's cached

Copy link
Member

@antgonza antgonza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wasade, thank you; looks great! Let's wait for tests ...

@coveralls
Copy link

coveralls commented Aug 27, 2021

Coverage Status

Coverage increased (+0.02%) to 91.171% when pulling 67b4fd6 on wasade:rest-sample-status into ab438cc on qiita-spots:dev.

@antgonza antgonza merged commit 7d4ad95 into qiita-spots:dev Aug 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants