RESTful sample status #3139
Just FYI, moved the base branch to dev and I'm going to restart the builds.
antgonza left a comment
Thank you @wasade, some minor comments.
```python
# cache sample detail for lookup
study_samples = set(study.sample_template.keys())
sample_accessions = study.sample_template.ebi_sample_accessions
```
Note that `ebi_sample_accessions` will return all available samples, with `None` where there is no accession; in other words, `len(sample_accessions) == len(study_samples)`.
I don't think that impacts the subsequent use.
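To make the point in this thread concrete, here is a minimal sketch. It assumes `ebi_sample_accessions` behaves as described in the comment above: a dict keyed by every sample ID, with `None` where no accession exists. The sample IDs and accession strings are hypothetical placeholders.

```python
# Hypothetical stand-in for study.sample_template.ebi_sample_accessions:
# every sample is a key, and samples without an EBI accession map to None.
sample_accessions = {
    "1.SKB1": "ERS000001",
    "1.SKB2": None,
    "1.SKB3": "ERS000003",
}
study_samples = set(sample_accessions)

# The lengths match because unaccessioned samples are still present as keys.
assert len(sample_accessions) == len(study_samples)

# A lookup for an unaccessioned sample returns None rather than raising
# KeyError, so downstream code only needs a None check.
accessioned = {s: acc for s, acc in sample_accessions.items() if acc is not None}
print(sorted(accessioned))  # ['1.SKB1', '1.SKB3']
```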
```python
# cache preparation information that we'll need

# map of {sample_id: [indices, of, light, prep, info, ...]}
sample_prep_mapping = defaultdict(list)
pt_light = []
for idx, pt in enumerate(study.prep_templates()):
    pt_light.append((pt.id, pt.ebi_experiment_accessions,
                     pt.status, pt.data_type()))

    for ptsample in pt.keys():
        sample_prep_mapping[ptsample].append(idx)
```
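The indexing scheme in the block under review can be exercised outside Qiita with stand-in objects. This is a minimal sketch. `FakePrep` and all sample and prep IDs are hypothetical. The sketch only assumes that a prep template exposes `id`, `status`, `data_type()`, `keys()` and an `ebi_experiment_accessions` mapping, because the reviewed code uses exactly those.

```python
from collections import defaultdict


class FakePrep:
    """Hypothetical stand-in for a Qiita prep template (illustration only)."""

    def __init__(self, prep_id, samples, data_type, status="private"):
        self.id = prep_id
        self.status = status
        self._data_type = data_type
        # No accessions yet, so every sample maps to None (hypothetical).
        self.ebi_experiment_accessions = {s: None for s in samples}

    def data_type(self):
        return self._data_type

    def keys(self):
        return self.ebi_experiment_accessions.keys()


preps = [
    FakePrep(10, ["1.SKB1", "1.SKB2"], "16S"),
    FakePrep(11, ["1.SKB2", "1.SKB3"], "Metagenomic"),
]

# map of {sample_id: [indices, into, pt_light, ...]}
sample_prep_mapping = defaultdict(list)
pt_light = []
for idx, pt in enumerate(preps):
    pt_light.append((pt.id, pt.ebi_experiment_accessions,
                     pt.status, pt.data_type()))
    for ptsample in pt.keys():
        sample_prep_mapping[ptsample].append(idx)

# A sample observed on several preps resolves to several pt_light entries.
print([pt_light[i][0] for i in sample_prep_mapping["1.SKB2"]])  # [10, 11]
```

Storing integer indices into `pt_light` rather than copying prep details per sample keeps each prep's tuple in a single place, no matter how many samples point at it.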
My concern with this block is that it will always load all samples, even when len(samples) == 1.
Another way to do this could be to first select which preps have the samples you are looking for and then build the details, something like this:
```python
samples_set = set(samples)  # not sure if this is required as its own var.
prep_templates = [pt for pt in study.prep_templates() if set(pt) & samples_set]
...
for idx, pt in enumerate(prep_templates):
    ...
```
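Here is a self-contained sketch of the suggested prefilter. Plain sets stand in for prep templates, and the prep IDs and sample IDs are hypothetical. The sketch also restricts the per-prep indexing to the requested samples. That is an extra step beyond the snippet above, not part of the suggestion itself.

```python
from collections import defaultdict

# Hypothetical prep contents: prep ID -> sample IDs observed on that prep.
prep_samples = {
    10: {"1.SKB1", "1.SKB2"},
    11: {"1.SKB2", "1.SKB3"},
    12: {"1.SKB4"},
}

samples = ["1.SKB1"]  # the samples requested by the caller
samples_set = set(samples)

# Keep only preps sharing at least one sample with the request. This is
# still one pass over every prep; the saving is in what gets cached after.
relevant = [pid for pid, ps in prep_samples.items() if ps & samples_set]

sample_prep_mapping = defaultdict(list)
for idx, pid in enumerate(relevant):
    # Index only the requested samples, not every sample on the prep.
    for s in prep_samples[pid] & samples_set:
        sample_prep_mapping[s].append(idx)

print(relevant)  # [10]
```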
Isn't the cost of this the same as it's still necessary to iterate over all preps?
Yes, time-wise, but my concern is the memory (I should have said this in my previous message) needed to store all prep info data in `pt_light`, specifically because of `pt.ebi_experiment_accessions`; the other values are pretty small. You can imagine that this can grow a lot for studies like the AGP. However, this is something internal, and if you think it is not that large or important, we can improve it in a future iteration if it actually becomes a problem.
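One way to bound that memory is to keep only the accession entries for the requested samples instead of a whole per-prep accession dict. This is a minimal sketch, assuming the accessions arrive as a plain dict of sample ID to accession. The accession values and sample IDs are hypothetical, and the sketch does not claim to reproduce what the follow-up commit does.

```python
# Hypothetical experiment accessions for one large prep: sample ID -> accession.
# For a study the size of AGP, a mapping like this could be very large.
all_experiment_accessions = {f"1.S{i}": f"ERX{i:06d}" for i in range(10000)}

samples_set = {"1.S5", "1.S42", "1.MISSING"}

# Cache only the requested samples that are present on this prep. Once the
# full mapping goes out of scope it can be garbage collected, instead of
# being kept alive inside pt_light for every prep.
cached = {s: all_experiment_accessions[s]
          for s in all_experiment_accessions.keys() & samples_set}

print(sorted(cached.items()))
# [('1.S42', 'ERX000042'), ('1.S5', 'ERX000005')]
```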
Okay, sounds good; the last commit should reduce what's cached.
antgonza left a comment
@wasade, thank you; looks great! Let's wait for tests ...
This PR adds the ability to query a study for details on a specific sample or set of samples. The returned details include EBI accession information and the preparations on which each sample was observed. The output is represented in a flat fashion, suitable to be fed directly into a DataFrame or DataTable.

cc @dhakim87 @antgonza
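Here is a minimal sketch of what "flat" can mean for this output, assuming one row per (sample, preparation) pair. The function `flatten_sample_details`, the column names, and all IDs and accessions are hypothetical and are not the endpoint's actual response format.

```python
def flatten_sample_details(samples, sample_accessions, preps, sample_prep_mapping):
    """Return one flat dict per (sample, preparation) pair (hypothetical schema).

    A sample observed on no preparation still gets a single row with the
    preparation fields set to None, so every requested sample appears.
    """
    rows = []
    for sample_id in samples:
        base = {"sample_id": sample_id,
                "ebi_sample_accession": sample_accessions.get(sample_id)}
        prep_ids = sample_prep_mapping.get(sample_id, [])
        if not prep_ids:
            rows.append({**base, "prep_id": None, "data_type": None,
                         "prep_status": None, "ebi_experiment_accession": None})
        for prep_id in prep_ids:
            prep = preps[prep_id]
            rows.append({**base,
                         "prep_id": prep_id,
                         "data_type": prep["data_type"],
                         "prep_status": prep["status"],
                         "ebi_experiment_accession":
                             prep["experiment_accessions"].get(sample_id)})
    return rows


sample_accessions = {"1.SKB1": "ERS000001", "1.SKB2": None, "1.SKB3": None}
preps = {
    10: {"data_type": "16S", "status": "private",
         "experiment_accessions": {"1.SKB1": "ERX000001", "1.SKB2": None}},
    11: {"data_type": "Metagenomic", "status": "sandbox",
         "experiment_accessions": {"1.SKB2": None}},
}
sample_prep_mapping = {"1.SKB1": [10], "1.SKB2": [10, 11]}

rows = flatten_sample_details(["1.SKB1", "1.SKB2", "1.SKB3"],
                              sample_accessions, preps, sample_prep_mapping)
print(len(rows))  # 4
```

Every row carries the same keys, so a list like `rows` can be passed straight to `pandas.DataFrame(rows)`, which accepts a list of dicts. A JavaScript DataTable can consume the same shape as an array of objects.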