[Task]: Document how Pipelines using Stream Lookup can run into deadlock #3740

Closed
gmitter-ef opened this issue Mar 20, 2024 · 3 comments

@gmitter-ef

Apache Hop version?

2.6, 2.7, 2.8

Java version?

OpenJDK 11

Operating system

Windows

What happened?

The pipeline deadlock problem described in
https://pentaho-public.atlassian.net/wiki/spaces/EAI/pages/386807182/Transformation+Deadlocks
also affects Apache Hop 2.6 through 2.8.

I tested the “Blocking Example - Stream Lookup” and “Blocking Example - Stream Lookup - Workaround 2” examples from the zip attached to the page linked above, after importing them into Apache Hop with the very handy import tool.

I suggest either fixing this (if possible) or documenting the behaviour (I could not find any existing documentation).
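
To make the reported behaviour concrete, here is a minimal, single-threaded sketch of the mechanism as I understand it from the linked page. It assumes a simplified model, not Hop's actual engine code: one source copies every row to two bounded hops, and the lookup transform reads its entire lookup input before it reads anything from its main input. The names `run_pipeline`, `rowset_size` and `block_main` are hypothetical and exist only for this illustration.

```python
from collections import deque


def run_pipeline(total_rows, rowset_size, block_main=False):
    """Return True if the simulated pipeline finishes, False if it stalls.

    Assumed model (not Hop engine code): the source copies each row to a
    main hop and a lookup hop, both bounded by rowset_size. The lookup
    transform loads all lookup rows before reading any main rows.
    """
    main_hop = deque()    # source -> main input of the lookup transform
    lookup_hop = deque()  # source -> lookup input of the lookup transform
    held = deque()        # unbounded buffer of the optional blocking transform
    produced = loaded = emitted = 0

    while emitted < total_rows:
        progressed = False

        # Source: a row is copied to both hops, so both need free space.
        if (produced < total_rows
                and len(main_hop) < rowset_size
                and len(lookup_hop) < rowset_size):
            main_hop.append(produced)
            lookup_hop.append(produced)
            produced += 1
            progressed = True

        # Optional workaround: a blocking transform on the main branch
        # absorbs rows without limit, so the main hop never stays full.
        if block_main and main_hop:
            held.append(main_hop.popleft())
            progressed = True

        main_input = held if block_main else main_hop
        if loaded < total_rows:
            # Phase 1: the lookup transform only reads its lookup input.
            if lookup_hop:
                lookup_hop.popleft()
                loaded += 1
                progressed = True
        elif main_input:
            # Phase 2: all lookup rows are cached, main rows can flow.
            main_input.popleft()
            emitted += 1
            progressed = True

        if not progressed:
            return False  # every transform is waiting on another: deadlock
    return True
```

In this model `run_pipeline(10, 10)` finishes, `run_pipeline(11, 10)` stalls because the main hop is full while the lookup transform is still waiting for lookup rows, and `run_pipeline(11, 10, block_main=True)` finishes again. The exact threshold in a real pipeline depends on the topology, so treat the numbers as illustrative only.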

BTW: I also posted this to the users@hop.apache.org mailing list.

Issue Priority

Priority: 3

Issue Component

Component: Pipelines

@hansva
Contributor

hansva commented Mar 25, 2024

This is how the engine works. You are right that it should be documented somewhere, and it isn't.
I am going to change this to a feature request for documentation.

@hansva changed the title from "[Bug]: Pipelines using Stream Lookup can run into deadlock" to "[Task]: Document how Pipelines using Stream Lookup can run into deadlock" on Mar 25, 2024
@hansva removed the bug label on Mar 25, 2024
@Adalennis
Contributor

.take-issue

@github-actions github-actions bot added this to the 2.11 milestone Oct 10, 2024
@dsanderbi
Contributor

The linked page describes how a deadlock can occur.

It says that this happens when the main stream is split. My question is specifically about the merge join example.
There, a merge join is performed and a blocking step is added to the upper stream only,
with the explanation that the lower stream does not need one because "sort rows" already blocks all rows.

When using the merge join step, you get a pop-up indicating that you need to sort on the key before joining, otherwise errors may occur.

Since sorting is required when using a merge join step, a deadlock shouldn't occur with
this step, right? Or am I mistaken?
At least, the deadlock no longer occurs when the blocking step in the example is replaced with "sort rows".

1.1) If such a case were to occur, how are the temporary
files generated by the blocking step handled? Since the sort order must be
preserved to avoid incorrect joins, the files must also be read back in the correct order.

  1. Is there any scenario in which a deadlock could still occur?
    Specifically, what if the main stream is not split: the streams
    are merged using either the lookup step or the merge join step, but
    they come from different sources rather than from the same one?

I would highly appreciate it if these questions could be addressed in the documentation to improve the overall understanding of deadlocks. Specifically, whether the order of files created by the blocking step is preserved, and whether a deadlock can occur at all during a correctly executed merge join.

Final remark/question:
The original documentation describes how to determine when a deadlock occurs,
using the formula: number of rows > x * (nr of rows in rowset) + x.
In the documentation for pipeline properties https://hop.apache.org/manual/latest/pipeline/create-pipeline.html#_pipeline_properties and also in the Hop GUI, I could not find the equivalent of "Nr of rows in rowset".
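
Written out as code, the quoted formula looks like the sketch below. The function name `may_deadlock` and its parameter names are made up for illustration. `x` is kept as an opaque parameter because its meaning is defined in the linked page and not in this thread, and the values in the example are illustrative, not Hop defaults.

```python
def may_deadlock(number_of_rows, x, rows_in_rowset):
    """Evaluate the quoted condition: number of rows > x * (nr of rows in rowset) + x.

    x is taken as given from the original documentation; this sketch does
    not interpret it.
    """
    return number_of_rows > x * rows_in_rowset + x
```

For example, with `x = 1` and 10000 rows per rowset the threshold is 10001, so `may_deadlock(10002, 1, 10000)` is true while `may_deadlock(10001, 1, 10000)` is false.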

hansva added a commit that referenced this issue Nov 4, 2024
added avoiding deadlocks with the stream lookup transform. #3740
@hansva hansva closed this as completed Nov 4, 2024