Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Giant refactor to move all state into a Kernel struct #1145

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

JamesWrigley
Copy link
Collaborator

Pretty much what the title says 😛 This is not ready yet but I'm opening it for transparency. There are two reasons for the refactor:

  1. Easier testing, so that we can more confidently make changes like Make message receive and handling async #1140 and Jupyter messaging updates/correctness #1138.
  2. Allow for creating a temporary kernel that can be used in a precompilation workload for Add precompile statements to reduce startup latency #1074.

Overview:

  • Almost all global state has been moved into the new Kernel struct, and all relevant internal methods now require a kernel be passed to them. That's the bulk of the diff. I've tried to minimize other changes as much as possible, so there should 🤞 be no change in behaviour from master.

  • There are certain fields like ans and n which are documented as globals and must stay global, so setproperty!(::Kernel,...) is implemented such that it will copy changes to those fields to their corresponding global variable. Similarly there are certain public methods like IJulia.history() that must still work without a kernel argument, so a new _default_kernel global is defined for normal usage and that's used by default for all public methods.

    An edge-case are the dicts In and Out which are by default assigned to the corresponding fields of _default_kernel and will not contain changes from temporary test kernels. I don't think that's a particularly important discrepancy so I'm inclined to ignore it.

  • I implemented a basic test in waitloop.jl that spawns the IJulia waitloop and then makes a request to it using jupyter_client. This isn't working very well right now because blocking calls to the client causes hangs, I suspect there's a deadlock somewhere.

This will need a lot more work before it can be merged, but feel free to review the design already (cc @stevengj, @fredrikekre).

@MasonProtter, you should be able to do... something... with this for a precompilation workload. I would suggest creating a Kernel, connecting to some of the sockets with ZMQ.jl, and then sending some mock requests following the wire protocol: https://jupyter-client.readthedocs.io/en/latest/messaging.html#the-wire-protocol

@JamesWrigley JamesWrigley self-assigned this Feb 5, 2025
Copy link

codecov bot commented Feb 7, 2025

Codecov Report

Attention: Patch coverage is 84.19118% with 43 lines in your changes missing coverage. Please review.

Project coverage is 61.14%. Comparing base (4390ef4) to head (d491800).

Files with missing lines Patch % Lines
src/handlers.jl 66.66% 8 Missing ⚠️
src/init.jl 82.92% 7 Missing ⚠️
src/IJulia.jl 92.59% 6 Missing ⚠️
src/comm_manager.jl 25.00% 6 Missing ⚠️
src/inline.jl 57.14% 6 Missing ⚠️
src/stdio.jl 82.35% 6 Missing ⚠️
src/eventloop.jl 86.36% 3 Missing ⚠️
src/execute_request.jl 96.29% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #1145       +/-   ##
===========================================
+ Coverage   10.21%   61.14%   +50.92%     
===========================================
  Files          14       14               
  Lines         822      893       +71     
===========================================
+ Hits           84      546      +462     
+ Misses        738      347      -391     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@JamesWrigley
Copy link
Collaborator Author

Figured out the deadlock, we can now start writing full tests with jupyter_client :)

@JamesWrigley JamesWrigley force-pushed the kernel-struct branch 4 times, most recently from 64ec133 to b19c5aa Compare February 10, 2025 00:11
using ZMQ, JSON, SoftGlobalScope
import Base.invokelatest
import Base: invokelatest, RefValue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably using rather than import since we only use these, not extend them? Also invokelatest is exported automatically these days.

(This code dates back to the days before using Foo: bar)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but I think I'd prefer to do those sorts of changes in a later PR. This one is already quite large 😅

src/IJulia.jl Outdated
setfield!(value, name, x)
end

function Base.close(kernel::Kernel)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be called by a finalizer?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately that wouldn't be easy because it wait's on the stdio tasks so it might yield. I think it's ok to rely on the do-constructor closing it in the tests.

# global counterparts.
if name ∈ (:ans, :n, :In, :Out, :inited, :verbose)
setproperty!(IJulia, name, x)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aren't ans, In, Out global in Main (not IJulia)?

Maybe this kind of global should be copied to/from Main before/after each execute request?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whereas n and verbose at least can probably be moved in to the Kernel?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're quite right 🤔 I need to figure out a good way to test that, for now I've added a @test_broken test in d9acb6e.

whereas n and verbose at least can probably be moved in to the Kernel?

n is documented to be module-global (https://julialang.github.io/IJulia.jl/dev/library/public/#IJulia.n) so I don't think we can move that fully into Kernel, but verbose should be possible. I think only @vprintln/@verror_show will need tweaking to look at _default_kernel.verbose instead of the global.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the tests for ans/In/Out being propagated to kernel.current_module in 2fd607f, also moved verbose fully into Kernel.

src/stdio.jl Outdated
@@ -274,8 +260,8 @@ function oslibuv_flush()
end

import Base.flush
function flush(io::IJuliaStdio)
function flush(io::IJuliaStdio, kernel)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a user-visible function, e.g. if the user calls flush(stdout).

It would be better to store a reference to the kernel in the IJuliaStdio object.

Copy link
Collaborator Author

@JamesWrigley JamesWrigley Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, fixed in d9acb6e and added a test for it.

@JamesWrigley
Copy link
Collaborator Author

JamesWrigley commented Feb 10, 2025

Progress update:

  • Moved the waitloop task into Kernel to make testing easier, and added a wait(::Kernel) method to wait for it to finish.
  • Added/expanded tests for autocompletion, MIME display, and clear_output() using jupyter_kernel_test. I think that covers everything we can use jupyter_kernel_test for.
  • Moved the calls to watch_stdio() and pushdisplay() into init(), now close(::Kernel) will ensure they're cleaned up properly.
  • Added a bunch of tests for some of the public API: clear_history(), load(), the hook system, and history().
  • Tweaked history() to print all the entries on new lines in 20dc591. From the docstring, I think that's what it should be doing?

To-do:

  • Address remaining review comments.
  • Add a test for kernel.jl to ensure that the kernel works outside of the test suite.
  • Figure out why the load() test is failing on Windows.
  • Go through the coverage reports to find more low-hanging fruit to test.

This is the only change in 5.4, in previous versions it was sent on the shell
socket.
The Jupyter messaging docs state that all messages on the shell (request)
channel shall be ended with a `status: idle` message, and that: "This idle
status message indicates that IOPub messages associated with a given request
have all been received."

Previously we did not attempt to ensure this, which meant that in some cases the
`execute_result` message would be published after the `execute_reply` message
had been sent. It's still not exactly guaranteed by the `yield()`, but it's
something and it makes the `jupyter_kernel_test` tests pass.
@JamesWrigley JamesWrigley marked this pull request as ready for review February 13, 2025 23:16
@JamesWrigley
Copy link
Collaborator Author

I believe this is ready to be reviewed now. One thorny issue is segfaults on windows that appear to be coming from ZMQ, using @threadcall for the heartbeat seemed to fix it: d491800
But I'm not convinced I understand the problem fully, will try to debug it a bit more before merging.

For reviewers, I've tried to keep the commits atomic so I would recommend reviewing each commit one-by-one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants