Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Websocket Optimization #51

Merged
merged 7 commits into from
Oct 5, 2020
Merged

Websocket Optimization #51

merged 7 commits into from
Oct 5, 2020

Conversation

EJH2
Copy link
Collaborator

@EJH2 EJH2 commented Oct 2, 2020

This pull request includes several changes:

  • Importing WebSocketProgressRecorder without proper configuration now results in a RuntimeError
    • Rather than flooding the console with the same error and providing no indication on the frontend, this aims to alert developers immediately
  • The default ProgressRecorder's set_progress and stop_task functions now return the task metadata
    • This will allow future progress bar implementations to easily extend the functionality of the original, without having to waste a Progress(task_id).get_info() to get the same info
  • WebSocketProgressRecorder now takes advantage of the aforementioned extension
    • This saves having to create an AsyncResult instance just to fill a dictionary with basic info

Any reviews would be greatly appreciated.

@EJH2 EJH2 requested review from czue and OmarWKH October 2, 2020 01:48
…mewhere internally to not break the entire system
Copy link
Owner

@czue czue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the change to return the meta object!

Had some questions mostly around the behavior when things aren't set up. Will give you a chance to respond before approving.

'Tried to use websocket progress bar, but dependencies were not installed / configured. '
'Use pip install celery-progress[websockets] and set up channels to enable this feature. '
'See: https://channels.readthedocs.io/en/latest/ for more details.'
)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that anything that ever imports from this file will raise an exception here if things aren't configured, yeah?

I'm not sure that's a problem, but it feels a bit less risky if the error gets raised when you actually try to use the class, not just import it. Is there a reason you didn't do it that way?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will error if the default channel layer has not been configured. I originally thought it would be best to catch the error before it happens, and only evaluate whether the channel layer exists once vs doing it every time a task is updated, but I understand if the other change would make more sense. Although, I now notice that that error will do nothing anyways, as it's not raised at all. How I managed to remove raise during testing and not put it back before committing I will never know. I will push a new commit moving it back into WebSocketProgressRecorder.

Copy link
Collaborator Author

@EJH2 EJH2 Oct 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you rather it stay as a logging error, or as an actual error?

As a logger error in __init__, the message would fire once for task, and the except AttributeError would silently drop all messages. As a real error in __init__, attempting to use the progress bar would just kill the task and put a traceback in the celery console.

Another interesting interaction that I've noticed is that currently, if a user has channels properly set up with the exception of having a channel layer named "default" configured, attempting to connect to the websocket via JS will throw AttributeError: 'NoneType' object has no attribute 'group_add'. This is due to the fact that, by default, the consumer will look for the "default" channel layer, and fail if it can't be found. Should we handle this as well?

)
except AttributeError: # No channel layer to send to, so ignore it
pass
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this path reachable anymore?

Copy link
Collaborator Author

@EJH2 EJH2 Oct 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the default channel layer cannot be found, and due to the lack of properly raising the error as explained above, the AttributeError exception will still be triggered by the post-run handler.

One thing I've been thinking of is that currently, the nature of get_channel_layer() just looks for a channel layer named "default", and if none can be found, returns None instead. Should we offer the ability to configure the name of this layer, so that other layers are not potentially impacted? I would investigate that as another PR if so, as to not clog this one with features.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never actually used the websocket implementation - so it's not a hugely informed opinion - but at a high level configuring it by name sounds useful.

celery_progress/websockets/tasks.py Show resolved Hide resolved
@OmarWKH
Copy link
Collaborator

OmarWKH commented Oct 2, 2020

I also like this change but I have two concerns:

  1. Previously the message sent to the client was generated by Progress.get_info. Now it's spread across Progress (for HTTP) and WebSocketProgressRecorder + task_postrun_handler (for WS). This makes it harder to understand and maintain. Maybe Progress.get_info could have more parameters? Or instead subclass AsyncResult and pass Progress.get_info a real or fabricated AsyncResult? Or..?
  2. Less important, but getting AsyncResult each time is a more defensive approach in case update_progress does not work as intended. Although I'm not sure being so defensive is warranted.

@EJH2
Copy link
Collaborator Author

EJH2 commented Oct 2, 2020

Omar, your concerns are certainly warranted, but the reason for this change is this:

  1. From what I can see in the source for AsyncResult, in order to fetch even just the state of the task, it has to hit the result backend at least once, if not more. This is perfectly fine for the HTTP progress bar, as the intervals between requests make this negligent (unless you set your request interval to an unreasonably small number, in which case you're just DDoSing yourself). However, one of the marketed benefits of the WS bar is that you get updates as soon as they happen. The need for this change stemmed from my own experience, where I improved my task so much that I would trigger a race condition for trying to fetch the data and send it before I needed to fetch it again. All of the data provided by getting the Progress info is supplied by both meta and the post-run handler kwargs.
  2. Not sure what you're referring to with update_progress, but see above.

@OmarWKH
Copy link
Collaborator

OmarWKH commented Oct 2, 2020

That makes sense.

  1. [..] All of the data provided by getting the Progress info is supplied by both meta and the post-run handler kwargs.

My proposal is to separate data collection from message generation.

Data collection: In the case of HTTP the data would be collected by AsyncResult. And in WS, by meta and kwargs.

Message generation: Both HTTP and WS would pass the data to a common function that generates the message to send to the client (like Progress.get_info does now).

  1. Not sure what you're referring to with update_progress, but see above.

Sorry I meant task.update_state.

@EJH2
Copy link
Collaborator Author

EJH2 commented Oct 2, 2020

Message generation: Both HTTP and WS would pass the data to a common function that generates the message to send to the client (like Progress.get_info does now).

I would be unopposed to a potential implementation if it makes developing and maintaining this easier for us.

Less important, but getting AsyncResult each time is a more defensive approach in case [task.update_state] does not work as intended. Although I'm not sure being so defensive is warranted.

From what I can tell at this point, if task.update_state is not working as intended, then there's a bigger problem afoot. task.update_state's sole purpose is to store the updated progress data so that Progress(task_id).get_info() can fetch it later, where AsyncResult is responsible for doing the fetching. One cannot reliably work without the other, so I don't see how we could possibly redesign that, unless there's a better way to store and later fetch the metadata.

@OmarWKH
Copy link
Collaborator

OmarWKH commented Oct 2, 2020

I would be unopposed to a potential implementation if it makes developing and maintaining this easier for us.

I'll try to come up with something tomorrow.

@EJH2
Copy link
Collaborator Author

EJH2 commented Oct 3, 2020

I've gone ahead and changed the behavior of the error down to a logger warning inside of WebSocketProgressRecorder, and waiting for @czue's comments on the other topics.

As an aside, would it be possible to add hacktoberfest to the repo topic?

Copy link
Owner

@czue czue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks good from my side. feel free to go ahead and merge if you're happy with it.

)
except AttributeError: # No channel layer to send to, so ignore it
pass
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never actually used the websocket implementation - so it's not a hugely informed opinion - but at a high level configuring it by name sounds useful.

@czue
Copy link
Owner

czue commented Oct 5, 2020

As an aside, would it be possible to add hacktoberfest to the repo topic?

Sure. How do I do this, exactly? Or are you able to?

@EJH2
Copy link
Collaborator Author

EJH2 commented Oct 5, 2020

Sure. How do I do this, exactly? Or are you able to?

This can be done by going to the repo's home page and clicking on the gear next to the About section here:
image
and adding the hacktoberfest label to the topics box here:
image
I believe only the owner can edit this, as I don't seem to have the ability myself. These screenshots were taken from my own repo.

@czue
Copy link
Owner

czue commented Oct 5, 2020

thanks - done!

@EJH2
Copy link
Collaborator Author

EJH2 commented Oct 5, 2020

@OmarWKH If you're ok with this, I'll go ahead and merge it in which would allow you to continue development on your PR.

Copy link
Collaborator

@OmarWKH OmarWKH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good

@EJH2 EJH2 merged commit ecb5008 into master Oct 5, 2020
@EJH2 EJH2 deleted the feat-websocket-optimization branch October 6, 2020 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants