Timeout during request in several conditions #34
I guess the 2) issue may be due to a timeout, because we need some time to upload a big file. However, I set the sys env variable …
There are a number of reasons you can experience timeouts during kernel startup - especially when remote kernels are in play via Enterprise Gateway. To better diagnose these issues, please provide the complete log file (with DEBUG enabled) from the Enterprise Gateway instance. The reason we'd like the entire file is to capture the complete request - from start through error. Here are a few items worth noting...
At any rate, if you can post your EG log (with DEBUG), we can take a closer look - although setting …
The full log: …
By the way, I set …
Thank you for the log file - it was helpful. First, regarding your last comment, I have some questions.
As for the log file, things are looking completely normal. The kernel was started and had entered the ACCEPTED state on a given node...
All in all, that sequence was 20 seconds (and within the default 30 seconds, since I don't see an indication of a launch timeout). Then, at the 1 minute 5 second mark, Enterprise Gateway is interrupted...
which triggers the kill sequence on the kernel. What you see is all the fallback code to terminate a kernel quickly - like when shutting down the EG instance. More questions. 😄 Thank you.
@kevin-bates To answer your concerns,
Thank you for the responses. So it definitely sounds like this is purely related to the large file uploads (item 1). Let's not worry about the resources at the moment (items 2 and 3), especially if you see this on the first notebook (when large jars are uploaded). Please do not interrupt the startup, and especially do not interrupt EG. The interrupt of EG is what is terminating the kernel. Yes, it is running, but probably not entirely started. The default launch timeout is 30 seconds, not 20 seconds, so there may be something else here, but it's not evident in the logs. Please set envs …
It is critical that you check the YARN logs for the appropriate application ID (which you can also obtain from the log files once the host has been assigned). From above: … Also, please provide the release of EG you're running. This will be posted in the startup message in the EG log - like:
Thanks.
It just dawned on me what is happening. The default request timeout - enforced by tornado on HTTP requests - is 20 seconds! As a result, your request is abandoned at the 20-second mark even though the kernel's startup continues. Here's more log analysis that reminded me this must be happening. Your log shows: …
However, nothing happens after that - because the request timeout was met. What you should see is a sequence like this, following the …
Notice the subsequent websocket activity. However, on your system, the 20 seconds for the HTTP request had expired, so the client stopped moving forward with websocket initialization. Since you're running in JupyterHub, setting env variables for your notebook sessions is different. See the code that I linked to in this comment, and notice a bit further down as well. Given that kernel startup is taking 20 seconds, I suspect values around 40 seconds would be more than sufficient, but let's go with 120 or so right now.
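For reference, here is a minimal sketch of how a gateway client along the lines of nb2kg maps those env variables onto tornado's HTTP timeouts. The names mirror the KG_* settings discussed above, but the authoritative names and defaults live in nb2kg's managers.py, so treat this as illustrative:

```python
import os
from tornado.httpclient import AsyncHTTPClient, HTTPRequest

# Read once at import time, so the variables must be present in the notebook
# server's environment before the server process starts.
KG_CONNECT_TIMEOUT = float(os.getenv('KG_CONNECT_TIMEOUT', 20.0))
KG_REQUEST_TIMEOUT = float(os.getenv('KG_REQUEST_TIMEOUT', 20.0))

async def start_kernel(kg_url: str, body: bytes):
    """POST to the gateway's kernel endpoint, honoring the KG_* timeouts."""
    request = HTTPRequest(
        url=f"{kg_url}/api/kernels",
        method='POST',
        body=body,
        connect_timeout=KG_CONNECT_TIMEOUT,
        # tornado abandons the request after this many seconds - the
        # 20-second wall described above.
        request_timeout=KG_REQUEST_TIMEOUT,
    )
    return await AsyncHTTPClient().fetch(request)
```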
@kevin-bates It seems your linked comment is about config in Kubernetes? If JupyterHub is just started from a terminal, would setting a system environment variable work? If not, what does os.getenv() do? (Currently we are trying to use kernels connecting to a YARN cluster.)
Yes. For JupyterHub, the env for the notebook that is spawned needs to be set up via config files, since there's no way to access the shell prior to the start of the notebook server. This is irrespective of Kubernetes; you'll need to filter out the k8s stuff. os.getenv() is used in programs to "get" env variables. The issue you need to resolve is the "put" or "set" of the variable. Normally, "put" is fine in the shell via 'export'. Fyi, I'm away from my computer today.
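In plain Python terms, the get/set distinction looks like this (nothing nb2kg-specific here):

```python
import os

# "get": read an env variable from this process's environment.
timeout = os.getenv('KG_REQUEST_TIMEOUT', '20')  # falls back to the default if unset

# "put"/"set": write one. Children started *after* this inherit it, but it
# can never reach an unrelated process (such as a Hub-spawned notebook
# server) that was launched from a different environment.
os.environ['KG_REQUEST_TIMEOUT'] = '120'
```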
@kevin-bates Thanks for your help, I am just back from vacation. Yes, as you said, I just set the env variables via 'export', but I am not sure if it works - for example, might it be a similar issue to Spark config? If yes, how do I set this env variable?
Now that you're talking about Spark config, you should probably focus on the kernelspec files. Based on your latest response, I can't determine whether you're still experiencing timeouts or have moved forward, since you're now talking about other variables. I think you need to ensure you can get basic envs set via the JupyterHub configuration, then look into the kernelspec (kernel.json) files for the system-specific tuning you need to do. I'm not sure how much more I can help you without specific details or an online meeting. cc: @lresende
@kevin-bates Firstly, I still have the timeout problem; what I mentioned above is that I did not find a way to set the env variables correctly. So, according to what you said above, the first thing I can do is set env variables via Python code in …
@kevin-bates Sorry for the late response; here are the latest updates. Before that, I should explain my note above. "I just set the env variables via 'export', I am not sure if it works" means that I had set the env variables via the terminal - they could be seen via "echo" in the terminal - but I was not sure whether they still existed (or were valid) in the JupyterHub session; and judging by the result, this setting does not work. (The kernel error still occurs about 20 seconds after the new notebook starts, which means we failed to change the timeout.) Then, I tried using …
in both; as you say, the unit is seconds. But still the old problem: about 20 seconds until the error is raised. (I also added ….) In brief, my point is: how do I set this parameter? You have provided a way to set it through k8s, but common sense says there should be a more fundamental way (setting it without going through any containers), shouldn't there? Otherwise Enterprise Gateway would not work without k8s, which would be weird.
@kevin-bates I think we have found a temporary solution: modify the site-packages nb2kg code to …
And we do not encounter the timeout error now. However, this is not a perfect solution, so we are still waiting for your update.
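For readers following along, the kind of one-line change being described presumably looks like the following (in nb2kg's managers.py; the exact line may differ by version):

```python
import os

# Original: falls back to 20 seconds when the env variable is absent.
# KG_REQUEST_TIMEOUT = float(os.getenv('KG_REQUEST_TIMEOUT', 20))

# Temporary workaround: raise the fallback default itself, which takes effect
# even when the env variable never reaches the notebook server's environment.
KG_REQUEST_TIMEOUT = float(os.getenv('KG_REQUEST_TIMEOUT', 120))
```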
First, thank you for your patience on this. Setting env variables is quite common; it's just that, since the Notebook instance in this case is not spawned from the shell, it's slightly more complicated. Modifying the variable's default value in the code isn't sufficient (or necessary), although that experiment confirms the hypothesis that the Notebook's runtime environment is the issue. I'm a little surprised the env settings didn't take effect, though. Could you please provide the following:
I can't determine what exactly you've set and, unfortunately, I don't have a Hub installation to try this with. Perhaps someone can spot what needs to be done. Looking around, I see this issue that may provide useful information: jupyterhub/jupyterhub#330

Looks like you could set the KG timeout values in the env of JupyterHub prior to its launch, then use:

```python
c.Spawner.env_keep.append('KG_REQUEST_TIMEOUT')
c.Spawner.env_keep.append('KG_CONNECT_TIMEOUT')
```

Here's the doc for env_keep. Another thing you can do is just post a simple question on the JupyterHub forum: How do I set an environment variable for my spawned Notebook instance to use?
Thanks for still following this issue, @kevin-bates. Actually, the two things you want:
But after all, you are talking about …
I believe we're getting very close. Yes. The other approach is to set the environment variables via the spawner's configuration. I'm curious how you're getting the values set today. I'm adding two Hub experts - perhaps they have information I have not provided, or can confirm what I've stated. cc: @minrk @consideRatio - tl;dr: this is regarding the setting of environment variables into the spawned Notebook instance from a vanilla (non-k8s) Hub. Thank you.
Thanks, I think I almost understand. In terms of … But!!! I did this after I started EG, which means, without setting …
An … Port … So, if you set … Once you get things going that way, you should transition to using configuration files.
@kevin-bates Your concern makes sense; here is the order of config:
And I do not think it makes sense that … Currently, the only connection I can find between the notebook/JupyterHub and the kernel (KG or EG) is the args … After all, this vanilla Jupyter Notebook plus JupyterHub solution finally works - congratulations!
@Litchilitchy - I have some good news for you - if you're still interested. 😃 (I apologize for the inconvenience this caused and appreciate your patience!) I believe the reason is … I decided to install JupyterHub and get this working (I'm sorry I didn't do this sooner!). Here's what I did...
```python
from jupyterhub.spawner import LocalProcessSpawner

class NB2KGProcessSpawner(LocalProcessSpawner):
    """LocalProcessSpawner that forwards the Hub user's name to the gateway."""

    def user_env(self, env):
        # Pass the logged-in Hub user to nb2kg via KERNEL_USERNAME.
        env['KERNEL_USERNAME'] = self.user.name
        return env

c.JupyterHub.spawner_class = NB2KGProcessSpawner

# Point the spawned Notebook server at the nb2kg managers...
c.Spawner.args = ['--NotebookApp.session_manager_class=nb2kg.managers.SessionManager',
                  '--NotebookApp.kernel_manager_class=nb2kg.managers.RemoteKernelManager',
                  '--NotebookApp.kernel_spec_manager_class=nb2kg.managers.RemoteKernelSpecManager']

# ...and at the gateway itself, with raised timeouts.
c.Spawner.environment = {'KG_URL': 'http://my-gateway-server:8888',
                         'KG_CONNECT_TIMEOUT': '60',
                         'KG_REQUEST_TIMEOUT': '60'}
```

This uses a custom spawner class in order to set the KERNEL_USERNAME env variable from the Hub login.
Once the configuration is set, start JupyterHub. You should then be able to hit your EG server for kernels.
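As a side note, newer JupyterHub releases also accept callables as Spawner.environment values, which would avoid the custom spawner subclass entirely - a sketch, assuming your Hub version supports callable values:

```python
# jupyterhub_config.py - alternative sketch without subclassing the spawner.
# Callable values receive the spawner instance and are evaluated at spawn time.
c.Spawner.environment = {
    'KERNEL_USERNAME': lambda spawner: spawner.user.name,
    'KG_URL': 'http://my-gateway-server:8888',
    'KG_CONNECT_TIMEOUT': '60',
    'KG_REQUEST_TIMEOUT': '60',
}
```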
I've also gone ahead and increased the KG timeout defaults to 60 seconds: #35 |
@kevin-bates Thanks for your patience! Great - finally all of this makes sense: JupyterHub owns its own environment. I think this issue could be closed. Besides, may I ask just a few more questions?
Please feel free to ask questions any time - either here, at the Enterprise Gateway repo, the Gitter room, or the Discourse forum!
For the Python and R launchers, the launcher creates the SparkContext on behalf of the kernel. Since they're the same process, the context is available during execution of each cell. For the Scala kernels, the embedded Apache Toree kernel already has code to create the SparkContext. The other job of the launchers is to listen for interrupt/shutdown operations. I hope that helps. As suggested, I'm going to close this issue.
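To make the "same process" point concrete, here is a deliberately tiny, illustrative sketch - not Enterprise Gateway's actual launcher code, which also handles connection files and the interrupt/shutdown listener:

```python
from pyspark import SparkContext
from ipykernel.embed import embed_kernel

# The launcher creates the context on behalf of the kernel...
sc = SparkContext.getOrCreate()

# ...then starts the kernel in the same process, handing `sc` to the user
# namespace so every executed cell can reference it.
embed_kernel(local_ns={'sc': sc})
```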
I am using nb2kg via
```
pip install "git+https://github.com/jupyter-incubator/nb2kg.git#egg=nb2kg"
```
in a single-user JupyterHub (so I guess the behavior is the same as with Jupyter Notebook). I found there are several conditions under which I get a Timeout error, and I guess they are due to nb2kg:
1. Open multiple notebooks at the same time - e.g., while the first one is being created via the "New" button but its kernel is not yet ready, open a second one. In this case, at most one kernel will be ready and the others get this error.
2. Use `--jars` in the Spark config to upload a large file (in my case, 350 MB). This time I created the notebooks one by one (not creating the next until the previous one was ready or had failed). I created 5 notebooks; only the second one works, and the others get an error. The error seems to be the same.
So, any ideas?
By the way, I forked the newest nb2kg and found that `client.fetch` actually does not call `future.result()`. So is the nb2kg on pip obsolete?