Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task queue metrics helper methods #108

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Task queue metrics helper methods #108

wants to merge 6 commits into from

Conversation

alanhamlett
Copy link
Contributor

@alanhamlett alanhamlett commented Apr 24, 2018

Adds new helper classmethods:

  • Task.task_count_from_queue
  • Task.queue_metrics

Also fixed these pep8 linter rules:

E116 unexpected indentation (comment)
E127 continuation line over-indented for visual indent
E222 multiple spaces after operator
E226 missing whitespace around arithmetic operator
E231 missing whitespace after ','
E261 at least two spaces before inline comment
E301 expected 1 blank line, found 0
E302 expected 2 blank lines, found 1
E303 too many blank lines (2)
E306 expected 1 blank line before a nested definition, found 0
E502 the backslash is redundant between brackets
F403 'from ._internal import *' used; unable to detect undefined names
F841 local variable 'e' is assigned to but never used
W503 line break before binary operator

Related to #107.

@alanhamlett alanhamlett changed the title Task count from queue helper method Task queue metrics helper methods Apr 24, 2018
@alanhamlett
Copy link
Contributor Author

@thomasst ready for code review.

Copy link
Member

@thomasst thomasst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! See comment.

README.rst Outdated
keyword argument, which accepts an integer indicating how many executions
should be loaded.
To get a count of the number of tasks for a given queue and state, use
``Task.count_tasks_from_queue``. To get a list of all tasks for a given queue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually called task_count_from_queue in the code.

Also, another approach would be to let you pass 0 into tasks_from_queue (currently it's undefined). That way we wouldn't need to add a new function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tasks_from_queue uses mget and then loops over all the returned tasks. It's much faster to use zcount to only get the count of tasks from redis.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the readme with 843e016.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that passing limit=0 should make it just return the count and an empty list.

Copy link
Contributor Author

@alanhamlett alanhamlett Apr 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, got it. I prefer methods to not change behavior. Would you be ok with leaving them separate methods?

@alanhamlett
Copy link
Contributor Author

Should we leave the methods separate?

@alanhamlett
Copy link
Contributor Author

Ready for re-review 😄

@alanhamlett
Copy link
Contributor Author

Anything else needed for this PR?

@alanhamlett
Copy link
Contributor Author

Bringing this PR to attention again. It should be ready to merge pending the comment above.

@thomasst
Copy link
Member

Thanks! It's on my list to review.

@alanhamlett
Copy link
Contributor Author

Let me know if I can do anything to help with your review! 😄

@alanhamlett
Copy link
Contributor Author

Happy Friday! The best day for merging PRs 😉

prefix = tiger.config['REDIS_PREFIX'] + ':'

for state in metrics.keys():
queues = tiger.connection.smembers(prefix + state)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use tiger._key(state).

queues = tiger.connection.smembers(prefix + state)
for queue in queues:
metrics[state][queue] = {
'total': self.task_count_from_queue(tiger, queue, state),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use a pipeline to avoid round trips.

"""

key = tiger._key(state, queue)
count = tiger.connection.zcount(key, '-inf', '+inf')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zcard() does the same.

def task_count_from_queue(self, tiger, queue, state):
"""
Returns the number of tasks in a given queue and task state.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation details aside, I'm still unsure on whether this could be solved by just using tasks_from_queue(limit=0)[0]. I'm aware that currently passing a 0 limit to tasks_from_queue returns all the tasks but this behavior is not documented, and a more Pythonic way would be to say limit=None if we actually wanted all the tasks. We should at least define how tasks_from_queue should behave before we decide to add a new function here. @jkemp101 curious if you have any thoughts on this?

@jkemp101
Copy link
Member

Are these stats different compared to what get_queue_stats returns? I didn't compare the output of both functions. Our Prometheus exporter uses those stats with the following code for our metrics. It just summarizes some subqueues.

        stats = defaultdict(lambda: defaultdict(int))
        for queue, depths in self._tiger.get_queue_stats().items():
            if queue not in self._isolated:
                queue = queue.split('.', 1)[0]
            for state in ('active', 'queued', 'scheduled', 'error'):
                stats[queue][state] += depths.get(state, 0)
        return stats

@alanhamlett
Copy link
Contributor Author

@jkemp101 I wasn't aware of the get_queue_stats method! I could change this PR to just document and test that existing method instead of adding a new one, since they're the same?

@alanhamlett
Copy link
Contributor Author

One thing I wish get_queue_stats could do is only return stats for top-level queues instead of including subqueue stats individually. Any ideas how to do this efficiently in a redis pipeline so it could be an option to get_queue_stats, or does it need to be done in Python space?

@jkemp101
Copy link
Member

I can't think of an easy way to do this in Redis without some Lua scripting. And I kind of feel that computing stats should take second priority over the regular task processing so it makes sense to me to just offload this to Python code instead of putting any extra burden on Redis.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants