@astroshim
Contributor

What is this PR for?

Release resource after cron schedule job.

What type of PR is it?

Improvement

Todos

  • Add a check-box for releasing resources to zeppelin-web.
  • Add a release-resource (interpreter restart) function to the notebook.

Is there a relevant Jira issue?

https://issues.apache.org/jira/browse/ZEPPELIN-524.

How should this be tested?

Please refer to the screenshots.

Screenshots (if appropriate)

release_resource

Questions:

  • Do the license files need an update? No
  • Are there breaking changes for older versions? No
  • Does this need documentation? No

Member

Since 'resource' is not a term used in Zeppelin, I think

"Restart interpreter (release resource) after cron execution"

is clearer.

Contributor Author

How about "auto-restart interpreter on cron execution", as @bzz suggested?

@Leemoonsoo
Member

Thanks @astroshim. This is a really useful feature.
Would you mind adding a unit test?

@astroshim
Contributor Author

Thanks for your feedback, @Leemoonsoo.
I'm going to add a unit test.

@bzz
Member

bzz commented Dec 21, 2015

Wow, I did not realize that by "release resources" you mean a full interpreter restart!
This will kill, e.g., the context associated with the interpreter and all the cached data in the cluster.

Maybe we could call it something like "auto-restart interpreter on cron execution", then? Both in the PR title and in the Zeppelin UI.

Or explore some ways to actually release resources without full interpreter restart (which seems much harder).

@astroshim
Contributor Author

@bzz Thanks for your feedback.
Yes, "release resource" means restarting all interpreters bound to the note.
I thought restarting the interpreters was the obvious way to release resources.
If you know a better way to release resources without restarting the interpreters, please let me know.
Thanks.
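
For illustration, the behaviour described in this comment can be sketched as follows. This is a minimal sketch and not the code in this PR: the class CronRestartSketch, the BoundInterpreter interface, and the "autoRestart" config key are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from the Zeppelin code base.
final class CronRestartSketch {

    /** Hypothetical handle for one interpreter bound to a note. */
    interface BoundInterpreter {
        void restart();
    }

    /**
     * Runs the note, then restarts every bound interpreter if the
     * (hypothetical) "autoRestart" flag is set in the note's config.
     * Returns the number of interpreters that were restarted.
     */
    static int runScheduled(Runnable runAllParagraphs,
                            Map<String, Object> noteConfig,
                            List<BoundInterpreter> bound) {
        runAllParagraphs.run();
        if (!Boolean.TRUE.equals(noteConfig.get("autoRestart"))) {
            return 0; // opt-in only: leave the interpreters running
        }
        for (BoundInterpreter interpreter : bound) {
            interpreter.restart();
        }
        return bound.size();
    }

    /** Records the order of events for a note with two bound interpreters. */
    static String demo() {
        List<String> events = new ArrayList<>();
        List<BoundInterpreter> bound = Arrays.<BoundInterpreter>asList(
                () -> events.add("restart spark"),
                () -> events.add("restart md"));
        Map<String, Object> config =
                Collections.<String, Object>singletonMap("autoRestart", true);
        runScheduled(() -> events.add("run note"), config, bound);
        return String.join(", ", events);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the restart covers every interpreter bound to the note, anything else using those interpreters is affected as well.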

@bzz
Member

bzz commented Dec 22, 2015

@astroshim I see, you are right, it's hard to say what else can do the job.

I think if we just update the PR and the text in Zeppelin UI to something like "Auto-restart interpreters on cron execution" + add some tests, if possible - that would be great.

What do you think?

@astroshim
Contributor Author

Dear @bzz
I totally agree with you. I'll update the text and testcase.
Thanks!

@astroshim changed the title from "Release resource after cron job." to "Auto-restart interpreters on cron execution." on Dec 23, 2015
@felixcheung
Member

Could the interpreter restart happen in the middle of another running notebook if more than one is scheduled around the same time?
Interpreters are shared, right?

@HeartSaVioR
Contributor

Actually, I first tried restarting the Spark interpreter before each job run via the REST API, but gave up because it throws "Scheduler already terminated" when the interpreter restart and the job run occur at nearly the same time.

EDIT: note 2B83NUA15 was scheduled by cron for every 1 min

ERROR [2015-12-22 18:05:00,001] ({DefaultQuartzScheduler_Worker-7} QuartzScheduler.java[schedulerError]:2425) - Job (note.2B83NUA15 threw an exception.
org.quartz.SchedulerException: Job threw an unhandled exception. [See nested exception: java.lang.RuntimeException: Scheduler already terminated]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
Caused by: java.lang.RuntimeException: Scheduler already terminated
        at org.apache.zeppelin.scheduler.RemoteScheduler.submit(RemoteScheduler.java:140)
        at org.apache.zeppelin.notebook.Note.runAll(Note.java:326)
        at org.apache.zeppelin.notebook.Notebook$CronJob.execute(Notebook.java:391)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:202)

After this, I couldn't control the interpreter via the UI. Restart didn't work, so I had to kill the interpreter with kill -9.

Furthermore, if my understanding is right, an interpreter restart aborts current jobs that are running or pending.

So it would be better to restart the interpreter manually.
If we really want to restart the interpreter automatically, we should take care of the scenarios above.
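
The failure mode above resembles what a plain java.util.concurrent executor does when a job is submitted after shutdown. The sketch below is an analogy under that assumption, not Zeppelin's RemoteScheduler: RestartableScheduler is a hypothetical wrapper that serialises submit and restart, so a job submitted right after a restart goes to the fresh executor instead of the terminated one.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper; an analogy for the race, not Zeppelin code.
final class RestartableScheduler {
    private ExecutorService executor = Executors.newSingleThreadExecutor();

    /** Submits a job to whichever executor is current. */
    synchronized void submit(Runnable job) {
        executor.execute(job);
    }

    /** Aborts running and pending jobs, then swaps in a fresh executor. */
    synchronized void restart() {
        executor.shutdownNow();
        executor = Executors.newSingleThreadExecutor();
    }

    /** Lets queued jobs finish and waits briefly for them. */
    synchronized void close() {
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Shows the unguarded case: submitting to a terminated executor fails. */
    static boolean plainExecutorRejectsAfterShutdown() {
        ExecutorService plain = Executors.newSingleThreadExecutor();
        plain.shutdownNow();
        try {
            plain.execute(() -> { });
            return false;
        } catch (RejectedExecutionException expected) {
            return true;
        }
    }

    /** Shows the guarded case: a job submitted after restart still runs. */
    static int jobsRunAfterRestart() {
        AtomicInteger ran = new AtomicInteger();
        RestartableScheduler scheduler = new RestartableScheduler();
        scheduler.restart();
        scheduler.submit(ran::incrementAndGet);
        scheduler.close();
        return ran.get();
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + plainExecutorRejectsAfterShutdown());
        System.out.println("ran after restart: " + jobsRunAfterRestart());
    }
}
```

The lock only removes the submit-versus-restart race. Jobs that were already running or queued when restart() is called are still aborted.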

@Leemoonsoo
Member

I think an interpreter restart should abort pending / running jobs.
If it raises "Scheduler already terminated", that needs to be fixed. @HeartSaVioR do you mind creating an issue for it? I think I can take care of it.

@HeartSaVioR
Contributor

@Leemoonsoo
It occurred while I was experimenting with several things at once, so I'm not 100% sure that the interpreter can't be controlled via the UI.
I'll try to reproduce it and report it if it is reproducible.

By the way, if we think it's valid to abort pending / running jobs, it may be better to automatically add the aborted jobs back to the pending queue so that the new interpreter process can run them, since we might not notice that some jobs were aborted by an automatic interpreter restart.
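
One way to picture the re-queue idea is with the standard ExecutorService API, where shutdownNow() returns the jobs that were queued but never started. The sketch below assumes jobs are plain Runnables run by a single-threaded executor; RequeueOnRestart and its method names are hypothetical and do not reflect how Zeppelin's scheduler tracks jobs.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of carrying pending jobs across a restart.
final class RequeueOnRestart {

    /** Stops the old executor and returns the jobs that never started. */
    static List<Runnable> stopAndCollectPending(ExecutorService old) {
        // shutdownNow() interrupts the running job and drains the queue.
        return old.shutdownNow();
    }

    /** Hands the previously pending jobs to the replacement executor. */
    static void resubmit(List<Runnable> pending, ExecutorService fresh) {
        for (Runnable job : pending) {
            fresh.execute(job);
        }
    }

    /** Returns how many queued jobs completed after the simulated restart. */
    static int demo() {
        ExecutorService old = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        AtomicInteger completed = new AtomicInteger();
        try {
            // A long-running job that occupies the only worker thread.
            old.execute(() -> {
                started.countDown();
                try {
                    Thread.sleep(60_000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // aborted by the restart
                }
            });
            started.await();

            // These two jobs wait in the queue behind the running one.
            old.execute(completed::incrementAndGet);
            old.execute(completed::incrementAndGet);

            List<Runnable> pending = stopAndCollectPending(old);

            ExecutorService fresh = Executors.newSingleThreadExecutor();
            resubmit(pending, fresh);
            fresh.shutdown();
            fresh.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("re-run after restart: " + demo());
    }
}
```

In this sketch only the queued jobs are carried over. The job that was running at restart time is interrupted and would need separate handling if it should also be retried.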

@astroshim
Contributor Author

Hello @felixcheung.
You're right. The interpreter process is shared for now, but I think separating the interpreter process will become a user option.
What do you think?

@felixcheung
Member

@astroshim Right, I think a separate interpreter process for cron jobs or similar would be a good approach for this issue. I know there are places where a dozen users share Zeppelin and its interpreters, and restarting an interpreter there seems dangerous.

@HeartSaVioR
Contributor

@Leemoonsoo
I took a look at the earlier logs, and the behavior was somewhat strange.
I'll file an issue about the strange behavior, with logs. (Would posting to the mailing list be preferred?)

By the way, I can't reproduce the issue, so my case shouldn't block this PR. But I agree with @felixcheung, and the other approach @astroshim mentioned is better than the current approach.

@astroshim
Contributor Author

@felixcheung @HeartSaVioR
I think this issue and the 'separated interpreter process' issue are different, so they are better handled separately.
As @felixcheung says, this feature might be dangerous, but many other Zeppelin users need it.

@Leemoonsoo
Member

Users can create a separate interpreter setting (e.g. 'spark-cronjob') for cron-scheduled notebooks to avoid an interpreter restart while other notebooks are using it. So I think this is good to merge, and further improvements (@felixcheung's suggestion) can be handled in a separate issue.

@astroshim
Contributor Author

@Leemoonsoo Thank you for making that clear.

@Leemoonsoo
Member

Tested and LGTM.
Will merge if there is no further discussion.

@asfgit closed this in 45ce8a2 on Dec 30, 2015
5 participants