-
Notifications
You must be signed in to change notification settings - Fork 308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Emit events from the Contents Service #954
Conversation
Here's an example snippet of an extension "listening" to events emitted by the contents manager in this PR: class MyExtension(ExtensionApp):
...
def initialize_settings(self):
event_logger = self.serverapp.event_logger
async def my_listener(logger: EventLogger, schema_id: str, data: dict) -> None:
print(f"THIS WAS SEEN!: {data}")
event_logger.add_listener(
schema_id="https://events.jupyter.org/jupyter_server/contents_service/v1", listener=my_listener)
... Let me know what you think! |
Codecov Report
@@ Coverage Diff @@
## main #954 +/- ##
==========================================
+ Coverage 72.07% 72.20% +0.13%
==========================================
Files 64 64
Lines 8031 8063 +32
Branches 1340 1342 +2
==========================================
+ Hits 5788 5822 +34
+ Misses 1837 1836 -1
+ Partials 406 405 -1
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
@Zsailer Wow! You beat me to it, I was just playing around with my own custom implementation. This looks great. |
@@ -395,6 +395,7 @@ def get(self, path, content=True, type=None, format=None): | |||
if type == "directory": | |||
raise web.HTTPError(400, "%s is not a directory" % path, reason="bad type") | |||
model = self._file_model(path, content=content, format=format) | |||
self.emit(data={"action": "get", "path": path}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a vague performance concern about emitting an event on every get and save. These are by far the most common ContentsManager actions called by clients. In my opinion, we should only emit events that have demonstrated a developer need. For my File ID manager, I do not need these events. If there are no known uses for these events, I suggest we remove these events entirely rather than attempt to anticipate future uses at the cost of performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, we should only emit events that have demonstrated a developer need
I think this is too stringent of a condition.
Events certainly have use-cases outside of just developers. Operators/admins look to use this event system to audit user activity—in fact, this was the original demand driving the jupyter events/telemetry efforts mentioned in this Jupyter Telemetry enhancement proposal (jupyter/enhancement-proposals#41).
If this work is causing a noticeable performance degradation, we should address it rather than limit the usage of the event system for operators.
Also, keep in mind, in scenarios where there are no handlers or listeners, the .emit
method immediately returns: https://github.com/jupyter/jupyter_events/blob/50746633a2adc7e41e2e0e1b0631db10ffb165db/jupyter_events/logger.py#L337-L342
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think we actually are in agreement; I forgot Jupyter maintains a distinction between developers and operators, and I was using the word "developer" to mean both. Yes, we should definitely be conscious of performance when building this, though I'm sure it's quite difficult to measure.
Also, keep in mind, in scenarios where there are no handlers or listeners, the .emit method immediately returns:
I took a look at that logic, and I'm not sure if that's sufficient. That checks if there is are any handlers/listeners attached to the current event logger, meaning if I add a listener to another schema or another event type, then a Contents Manager event still performs a deep copy, validation, and building an event capsule only to do nothing with it.
Of course, this is out of scope for the PR. I'll make an issue of this in Jupyter Events for further discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think we actually are in agreement; I forgot Jupyter maintains a distinction between developers and operators, and I was using the word "developer" to mean both. Yes, we should definitely be conscious of performance when building this, though I'm sure it's quite difficult to measure.
I think the main point is that Jupyter Server should reach a future state where it provides high volumes of fine-grained, structured event data from most areas across Jupyter Server, allowing operators to get detailed information about what's happening in a running server. In many deployment scenarios, it's required that operators keep a detailed log of everything happening in the application.
If performance is the concern, the answer isn't to avoid adding events; it's to improve performance. I think we can do it! 😎
I took a look at that logic, and I'm not sure if that's sufficient. That checks if there is are any handlers/listeners attached to the current event logger, meaning if I add a listener to another schema or another event type, then a Contents Manager event still performs a deep copy, validation, and building an event capsule only to do nothing with it.
Yeah, that's right. I think we should update jupyter_events to check if the specific event is being watched by any listeners.
This is a little trickier (not impossible) to do with handlers, since we don't keep a mapping of handlers to their specific events (consequence of using Python's logging libraries for the main EventLogger). All handlers listen to all events, unless someone adds a logging.Filter
object to look for specific events. The challenge is that we can't inspect the filter objects easily. I have some ideas on how we can solve this issue in jupyter_events, but this shouldn't block the PR here.
@Zsailer Completed my review. 👀 |
Thank you for the review @dlqqq! I appreciate it! It's probably worth doing some benchmarking tests to see how much event emission affects the server's performance. I'll try to find time to build these tests later this week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good so far. Great work! Not sure if you wanted to tackle performance before/after merging this. LMK your thoughts.
Thanks @dlqqq! I think all of the performance issues can/should be solved in the |
Uses the event logger in #862 to emit events from the Contents Manager.
Borrows the contents schema from #364