Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pipeline failures #443

Closed
wants to merge 2 commits into from
Closed

Conversation

marcominerva
Copy link
Contributor

Motivation and Context (Why the change? What's the scenario?)

See #432.

/// </summary>
[JsonPropertyOrder(19)]
[JsonPropertyName("logs")]
public string? Logs { get; set; } = null;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would avoid the "string" type which limits to a single unstructured entry, and use List<PipelineLogEntry> LogEntries if this is really needed. Thinking about the scenarios, I still don't see a use case though.

/// </summary>
[JsonPropertyOrder(6)]
[JsonPropertyName("failed")]
public bool Failed { get; set; } = false;
Copy link
Collaborator

@dluc dluc Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do we ensure this is consistent?

for instance, if ContentStorage becomes unavailable and the pipeline job expires, the value will remain false and never change to true because there's nothing left running to retry.

/// </summary>
[JsonPropertyOrder(19)]
[JsonPropertyName("logs")]
public string? Logs { get; set; } = null;
Copy link
Collaborator

@dluc dluc Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto about consistency. I think a better approach would be assigning an approximate completion time and using that to establish if the pipeline failed (one scenario would be "it failed because it timed out")

}
else
catch (Exception ex)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which code is throwing this exception? why stop retrying?

}
else
catch (Exception ex)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, where is the exception coming from? e.g. if UpdatePipelineStatusAsync is failing in the try block it will likely fail also in the catch block

@marcominerva
Copy link
Contributor Author

The Logs property was just an idea, but we can remove it for now, the important thing is how to understand if the pipeline actually failed. I have added the catch blocks in the Distributed and InProcess Orchestrators to handle all unexpected errors like an attempt to decode an invalid PDF file (that will cause an Exception in the stepHandler.InvokeAsync method).

@dluc Your idea about assigning an approximate completion time and use it to determine the status of the pipeline is quite interesting and will allow to add the failure management without touching the Orchestrators at all, but how to determine a suitable value for the completion time? Maybe it could be a setting in configuration, that is saved in the DataPipeline object when building it?

@dluc dluc added the waiting for author Waiting for author to reply or address comments label May 19, 2024
@dluc
Copy link
Collaborator

dluc commented Oct 16, 2024

Parking this for now, since nobody else has raised similar concerns.
In this area I think users will want a stateful orchestrator that exposes each job with some nice UI, allowing to inspect, to pause/cancel, to re-run etc. and KM is not meant to develop in that direction. I would leave KM to rely on logs, and potentially develop the orchestration features separately, leveraging existing solutions.

@dluc dluc closed this Oct 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
waiting for author Waiting for author to reply or address comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants