Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IGNITE-23499 Integrate cancellation of compute job with sql kill handler. #4898

Merged
merged 9 commits into from
Dec 17, 2024

Conversation

xtern
Copy link
Contributor

@xtern xtern commented Dec 13, 2024

@@ -157,4 +283,21 @@ private static AsyncSqlCursor<InternalSqlRow> executeQueryInternal(Ignite node,

return await(fut);
}

private static class InfiniteJob implements ComputeJob<Void, Void> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we avoid copy-paste ? can it be public ItComputeSystemViewTest.InfiniteJob ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks

JobDescriptor<Void, Void> job = JobDescriptor.builder(InfiniteJob.class).units(List.of()).build();
JobExecution<Void> execution = node.compute().submit(JobTarget.node(clusterNode(node)), job, null);

Awaitility.await().until(execution::stateAsync, willBe(jobStateWithStatus(EXECUTING)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to wait status here ? let`s change this test - kill job immediately.

Copy link
Contributor Author

@xtern xtern Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, removed this condition.

why do we need to wait status here ?

I thought that we need to wait until task will be deployed and I assumed that if this is not done, there possible flaky failures (that we will cancel a non-existent task). But locally I don't see flaky failures so I removed this check here and in the killComputeJobFromRemote test.

Awaitility.await().until(execution::stateAsync, willBe(jobStateWithStatus(CANCELED)));

assertThat(executeKillJob(node, jobId), is(false));
assertThat(executeKill(node, COMPUTE, jobId, true), is(true));
}

@Test
public void killQueryFromRemote() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems we have no "real" tests for killing long-living queries ? WDYT, do we need such a tests ? Because we already have real infinite job in compute, but it not tested in sql.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean something that doesn't fit on a single page?
i.e. SELECT x FROM system_range(0, 100000)?

Comment on lines 253 to 254
private static boolean executeKillJob(Ignite node, UUID jonId) {
return executeKill(node, COMPUTE, jonId, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private static boolean executeKillJob(Ignite node, UUID jonId) {
return executeKill(node, COMPUTE, jonId, false);
private static boolean executeKillJob(Ignite node, UUID jobId) {
return executeKill(node, COMPUTE, jobId, false);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks

@@ -102,7 +102,7 @@ public CompletableFuture<Boolean> handle(KillCommand cmd) {

CompletableFuture<Boolean> killFut = invokeCancel(handler, cmd.operationId());

if (killFut.isDone() || !cmd.noWait()) {
if (killFut.isCompletedExceptionally() || !cmd.noWait()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I miss smth here, if noWait == true - we must return immediatelly i.e. return CompletableFuture.completedFuture(true); otherwise need to return killFut, isn`t it ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

if noWait == true - we must return immediatelly i.e. return CompletableFuture.completedFuture(true);

...except handling exception described in javadoc

@throws IllegalArgumentException If the operation identifier is not in the correct format.

For example ``KILL QUERY '123' NO WAIT` throws exception instead of returning true 🤔

@zstan
Copy link
Contributor

zstan commented Dec 17, 2024

Also seems we have uncompleted and probably erroneous (sql mention) java doc here: OperationKillHandler

    /**
     * Returns whether the handler can abort operations only on local node or across the entire cluster.
     *
     * <p>If a handler can only abort operations on the local node, then distributed coordination will
     * be performed by the SQL engine. <-???
     *
     * @return {@code True} if the handler can abort operations only on local node,
     *         {@code false} if the handler can abort operations across the entire cluster (in such a case . <- ???
     */
    boolean local();

@xtern
Copy link
Contributor Author

xtern commented Dec 17, 2024

Also seems we have uncompleted and probably erroneous (sql mention) java doc here: OperationKillHandler

    /**
     * Returns whether the handler can abort operations only on local node or across the entire cluster.
     *
     * <p>If a handler can only abort operations on the local node, then distributed coordination will
     * be performed by the SQL engine. <-???
     *
     * @return {@code True} if the handler can abort operations only on local node,
     *         {@code false} if the handler can abort operations across the entire cluster (in such a case . <- ???
     */
    boolean local();

This is done intentionally. If a local handler is registered, our KillCommandHandler (part of sql-engine) will call local handlers across the cluster.
Check this comment #4799 (comment)

@xtern xtern merged commit 977f6c4 into apache:main Dec 17, 2024
1 check passed
@xtern xtern deleted the ignite-23499 branch December 17, 2024 14:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants