Skip to content

Redbiom initial code #2282

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Sep 14, 2017
Merged

Conversation

antgonza
Copy link
Member

#2278 needs to be merged first.

Extra from adding redbiom:

  • I rm qiita_pet/templates/analysis_waiting.html cause it's not used anywhere
  • show_loading was duplicated so just left one and made it work on all other pages

@codecov-io
Copy link

codecov-io commented Sep 13, 2017

Codecov Report

Merging #2282 into dev will increase coverage by 0.03%.
The diff coverage is 97.2%.

Impacted file tree graph

@@            Coverage Diff             @@
##              dev    #2282      +/-   ##
==========================================
+ Coverage   93.78%   93.81%   +0.03%     
==========================================
  Files         161      163       +2     
  Lines       18249    18427     +178     
==========================================
+ Hits        17115    17288     +173     
- Misses       1134     1139       +5
Impacted Files Coverage Δ
qiita_pet/webserver.py 100% <100%> (ø) ⬆️
qiita_db/test/test_util.py 99.62% <100%> (+0.01%) ⬆️
qiita_db/util.py 96.83% <100%> (+0.15%) ⬆️
qiita_pet/handlers/qiita_redbiom.py 94.59% <94.59%> (ø)
qiita_pet/test/test_qiita_redbiom.py 98.41% <98.41%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 42dd9a9...079f4eb. Read the comment docs.

@antgonza antgonza mentioned this pull request Sep 13, 2017
9 tasks

if message == '':
config = redbiom.get_config()
get = redbiom._requests.make_get(config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a private module, it should not be necessary to call this method

'Not a valid search: "%s", your query is too small '
'(too few letters), try a longer query' % query)

if message == '':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you describe in english what the goal of this block is? it looks like the intention is to establish a dict mapping a study, to the artifacts contained in that study that overlap with requested samples? if yes, wouldn't it be easier to just query the qiita API for that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'm also a bit confused as to why the artifact IDs are necessary? I'm concerned with the subsequent code as it depends on the structure of redbiom IDs which we do not make guarantees about. If this is necessary (which I think can be handled by cross referencing with the Qiita API), can you please open an issue so that we can construct an API endpoint to retrieve the tags from a set of redbiom IDs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right! got it now, this was my first interaction with redbiom and you can tell. updating and simplifying, should also address the next comments.

for idx in redbiom.util.ids_from(query, True, 'feature',
ctx):
aid, sid = idx.split('_', 1)
sid = sid.split('.', 1)[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sample IDs in qiita are not prefixed with the qiita study id?

ctx):
aid, sid = idx.split('_', 1)
sid = sid.split('.', 1)[0]
study_artifacts[sid].append(aid)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, understanding this more. An API method should be exposed in redbiom which yields a the set of observed tags for a given iterable of redbiom IDs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, but for the time being, this works.

Copy link
Member Author

@antgonza antgonza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wasade, thanks!

'Not a valid search: "%s", your query is too small '
'(too few letters), try a longer query' % query)

if message == '':
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right! got it now, this was my first interaction with redbiom and you can tell. updating and simplifying, should also address the next comments.

ctx):
aid, sid = idx.split('_', 1)
sid = sid.split('.', 1)[0]
study_artifacts[sid].append(aid)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, but for the time being, this works.

Copy link
Contributor

@josenavas josenavas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments

qiita_db/util.py Outdated
del info["pi_email"]
del info["pi_name"]

infolist.append({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you do this:

info['status'] = qdb.study.Study(info['study_id']).status
infolist.append(info)

you get the same result right? It looks like you're creating a dictionary by using the same key values that info has. If the problem is that info has more keys than you need, I think the solution is to delete those keys (as you are doing a couple of lines above).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k

message = (
'Not a valid search: "%s", are you sure this is a '
'valid metadata %s?' % (
query, 'value' if search_on == 'metadata' else
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The if on this line is redundant (you are sure that search_on == 'metadata' -> see line 43

message = (
'Not a valid search: "%s", your query is too small '
'(too few letters), try a longer query' % query)
if message == '':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use if not message like in line 40, for consistency?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, you can use try/except/else:

In [1]: def f(s):
   ...:     try:
   ...:         if s == 'a':
   ...:             raise ValueError('Error!')
   ...:     except ValueError:
   ...:         print "Error happened"
   ...:     else:
   ...:         print "No error"
   ...:         

In [2]: f('a')
Error happened

In [3]: f('b')
No error

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k

self.render('redbiom.html')

@execute_as_transaction
def _redbiom_search(self, query, search_on, callback):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some trouble following this function, basically because this function is performing 3 different operations, depending on the value of search_on. I would create smaller functions that perform the single operation, so the code is easier to read. Something along this lines:

def _redbiom_metadata_search(self, query, contexts):
    ...
    return message, study_artifacts

def _redbiom_feature_search(self, query, contexts):
    ...
    return message, study_artifacts

def _redbiom_taxon_search(self, query, contexts):
   ...
   return message, study_artifacts

Then, this function can be simplified as shown below. I know the function is incomplete here but I think this exemplifies the structure that I'm proposing:

def _redbiom_search(self, query, search_on, callback):
    search_f = {'metadata': self._redbiom_metadata_search,
                        'feature': _redbiom_feature_search,
                        'taxon': _redbiom_taxon_search}
    try:
        df = readbiom.summarize.contexts()
    except ConnectionError:
        message = '...'
    else:
        contexts = df.ContextName.values
        message, study_artifacts = search_f(query, contexts)
        ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k

message = ('Incorrect search by: you can use metadata, '
'features or taxon and you passed: %s' % search_on)

if message == '':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use if not message

@coroutine
@execute_as_transaction
def post(self, search):
search = self.get_argument('search', None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like search and search_on are not really optional parameters.
I think the correct thing to do here is:

search = self.get_argument('search')
search_on = self.get_argument('search_on')

and let tornado handle if the arguments are not passed. It will return a 400 error message which is the correct code given that the request is incorrect.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k

@@ -115,6 +140,36 @@ function show_hide_process_list() {
}
}

/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation doesn't fit function signature. Can you update?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k

$(button).addClass("btn-info");
bootstrapAlert('We are adding ' + aids.length + ' artifact(s) to the analysis. This ' +
'might take some time based on the number of samples on each artifact.', "warning", 10000);
$.get('/artifact/samples/', {ids:aids})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Samples are added using a get? That is confusing... get should be use for read-only operations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's not actually adding the samples, it's requesting the sample names from the artifacts to add. The addition is actually done in this line: qiita_websocket.send('sel', data['data']);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation!

{% if current_user is not None %}
return '<div class="container" style="max-width: 5em;">'+
'<div class="row justify-content-md-center">' +
'<div class="col-md-1 text-center details-control">&nbsp;</div>' +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like these two blocks are the same except for this line? You can then do:

              return '<div class="container" style="max-width: 5em;">'+
                       '<div class="row justify-content-md-center">' +
            {% if current_user is not None %}
                         '<div class="col-md-1 text-center details-control">&nbsp;</div>' +
            {% end %}
                         '<div class="col-md-1 text-center">' + data.length + '</div>' +
                       '</div>' +
                     '</div>';

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k


{%block content%}
<small>
Redbiom only searches on public data and the data is updated nightly. Note that you will only be able to expand and add artifacts to analyses if you are signed into Qiita.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this accurate? @wasade will it be possible to update the redis db every night?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have not touched load mechanisms since June, and there is work to do there, but it should be feasible

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's a discussion for next week's meeting.

Copy link
Member Author

@antgonza antgonza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @josenavas

qiita_db/util.py Outdated
del info["pi_email"]
del info["pi_name"]

infolist.append({
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k

self.render('redbiom.html')

@execute_as_transaction
def _redbiom_search(self, query, search_on, callback):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k

message = (
'Not a valid search: "%s", your query is too small '
'(too few letters), try a longer query' % query)
if message == '':
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k

@coroutine
@execute_as_transaction
def post(self, search):
search = self.get_argument('search', None)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k

@@ -115,6 +140,36 @@ function show_hide_process_list() {
}
}

/*
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k

$(button).addClass("btn-info");
bootstrapAlert('We are adding ' + aids.length + ' artifact(s) to the analysis. This ' +
'might take some time based on the number of samples on each artifact.', "warning", 10000);
$.get('/artifact/samples/', {ids:aids})
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's not actually adding the samples, it's requesting the sample names from the artifacts to add. The addition is actually done in this line: qiita_websocket.send('sel', data['data']);

{% if current_user is not None %}
return '<div class="container" style="max-width: 5em;">'+
'<div class="row justify-content-md-center">' +
'<div class="col-md-1 text-center details-control">&nbsp;</div>' +
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k


{%block content%}
<small>
Redbiom only searches on public data and the data is updated nightly. Note that you will only be able to expand and add artifacts to analyses if you are signed into Qiita.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's a discussion for next week's meeting.

$(button).addClass("btn-info");
bootstrapAlert('We are adding ' + aids.length + ' artifact(s) to the analysis. This ' +
'might take some time based on the number of samples on each artifact.', "warning", 10000);
$.get('/artifact/samples/', {ids:aids})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation!

@josenavas josenavas merged commit 9e1c1b5 into qiita-spots:dev Sep 14, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants