Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added error or warning facilitation for unscheduled algs. #1734

Merged
merged 9 commits into from
Jun 5, 2023

Conversation

RonShvarz
Copy link
Contributor

@RonShvarz RonShvarz commented May 31, 2023

#1638


This change is Reviewable

@RonShvarz
Copy link
Contributor Author

/deploy

@hkube-ci hkube-ci temporarily deployed to dev May 31, 2023 17:06 Inactive
@RonShvarz RonShvarz requested review from golanha and removed request for golanha May 31, 2023 17:06
Adding nodes resource for error message info, WIP.
@RonShvarz
Copy link
Contributor Author

/deploy

@hkube-ci hkube-ci temporarily deployed to dev June 1, 2023 10:42 Inactive
Added ability for pipeline-driver to detect warnings vs errors, upon unScheduling
@RonShvarz
Copy link
Contributor Author

/deploy

@hkube-ci hkube-ci temporarily deployed to dev June 4, 2023 10:59 Inactive
@RonShvarz
Copy link
Contributor Author

/deploy

@hkube-ci hkube-ci temporarily deployed to dev June 4, 2023 11:41 Inactive
@RonShvarz
Copy link
Contributor Author

/deploy

@hkube-ci hkube-ci temporarily deployed to dev June 4, 2023 12:44 Inactive
@RonShvarz
Copy link
Contributor Author

Error or warning distinction, format fixes, node graph object updates concerning batch use-case.

1 similar comment
@RonShvarz
Copy link
Contributor Author

Error or warning distinction, format fixes, node graph object updates concerning batch use-case.

@RonShvarz RonShvarz requested a review from golanha June 4, 2023 12:52
Copy link
Member

@golanha golanha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: 0 of 7 files reviewed, 4 unresolved discussions (waiting on @RonShvarz)


core/pipeline-driver/lib/datastore/graph-store.js line 128 at r5 (raw file):

    _handleBatch(n) {
        const isAnyBatchFailed = n.batch.any(b => b.status === 'FailedScheduling');

anyFailedScheduling

Code quote:

isAnyBatchFailed 

core/pipeline-driver/lib/state/state-manager.js line 121 at r5 (raw file):

                if (resources && resources[0] && resources[0].unScheduledAlgorithms) {
                    const algorithms = { ...resources[0].unScheduledAlgorithms, ...resources[0].ignoredUnscheduledAlgorithms };
                    const nodesFromEtcd = resources[0].nodes;

clusterNodes

Code quote:

nodesFromEtcd

core/pipeline-driver/lib/tasks/task-runner.js line 123 at r5 (raw file):

            n.status = event.reason;
            n.warnings = n.warnings || [];
            const { resourceMessage, isError } = this._nodeResourceMessageBuilder(event, nodesFromEtcd);

buildNodeRe....

Code quote:

_nodeResourceMessageBuilder

core/pipeline-driver/lib/tasks/task-runner.js line 156 at r5 (raw file):

            else {
                resourceMessage += `${unScheduledAlg.complexResourceDescriptor.numUnmatchedNodesBySelector} nodes don't match node selector: '${selectors}',\n`;
                ({ resourceMessage, isError } = this._specificNodesResourceMessageBuilder(unScheduledAlg, resourceMessage, isError, nodesFromEtcd));

buildSpecif....

Code quote:

specificNodesResourceMessageBuilde

Copy link
Member

@golanha golanha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 7 files reviewed, 5 unresolved discussions (waiting on @RonShvarz)


core/pipeline-driver/lib/tasks/task-runner.js line 191 at r5 (raw file):

                    resourceMessage += `${k} ,\n`;
                    const currentValue = resourceIdentifier.get(k);
                    resourceIdentifier.set(k, currentValue + 1);

BreachCountPerResource

Code quote:

resourceIdentifier

CR changes and fixes,
Robust over-capacity warning,
Unit tests overhaul
@RonShvarz
Copy link
Contributor Author

/deploy

@hkube-ci hkube-ci temporarily deployed to dev June 5, 2023 08:01 Inactive
@RonShvarz
Copy link
Contributor Author

/deploy

@hkube-ci hkube-ci temporarily deployed to dev June 5, 2023 08:26 Inactive
…hen it is not the first in the running order
@RonShvarz
Copy link
Contributor Author

/deploy

@hkube-ci hkube-ci temporarily deployed to dev June 5, 2023 11:21 Inactive
Copy link
Contributor Author

@RonShvarz RonShvarz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 7 files reviewed, 5 unresolved discussions (waiting on @golanha)


core/pipeline-driver/lib/datastore/graph-store.js line 128 at r5 (raw file):

Previously, golanha (Golan Hallel) wrote…

anyFailedScheduling

Done.


core/pipeline-driver/lib/state/state-manager.js line 121 at r5 (raw file):

Previously, golanha (Golan Hallel) wrote…

clusterNodes

Done.


core/pipeline-driver/lib/tasks/task-runner.js line 123 at r5 (raw file):

Previously, golanha (Golan Hallel) wrote…

buildNodeRe....

Done.


core/pipeline-driver/lib/tasks/task-runner.js line 156 at r5 (raw file):

Previously, golanha (Golan Hallel) wrote…

buildSpecif....

Done.


core/pipeline-driver/lib/tasks/task-runner.js line 191 at r5 (raw file):

Previously, golanha (Golan Hallel) wrote…

BreachCountPerResource

Done.

@RonShvarz RonShvarz requested a review from golanha June 5, 2023 11:29
Copy link
Member

@golanha golanha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 6 files at r3, 3 of 4 files at r6, 1 of 1 files at r7, 1 of 1 files at r8, 1 of 1 files at r9, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @RonShvarz)

@RonShvarz RonShvarz merged commit 7637785 into master Jun 5, 2023
@RonShvarz RonShvarz deleted the error-or-warning-resourcemessage branch June 5, 2023 12:21
hkube-ci pushed a commit that referenced this pull request Jun 5, 2023
* Added error or warning facilitation for unscheduled algs.

* Fixed unit tests,
Adding nodes resource for error message info, WIP.

* Edited capital letters in task-executer,

Added ability for pipeline-driver to detect warnings vs errors, upon unScheduling

* Added warnings and errors in case a node has batches in it.

* clearing the node object if there are no failures to schedule

* Fix wrong function usage @_handleBatch,
CR changes and fixes,
Robust over-capacity warning,
Unit tests overhaul

* Round up values of  missing resources to 2 decimal points.

* handle use-case of node stuck in prescheduling with failing batches when it is not the first in the running order

* fixed formatting for resources .... bump version [skip ci]
hkube-ci pushed a commit that referenced this pull request Jun 5, 2023
* Added error or warning facilitation for unscheduled algs.

* Fixed unit tests,
Adding nodes resource for error message info, WIP.

* Edited capital letters in task-executer,

Added ability for pipeline-driver to detect warnings vs errors, upon unScheduling

* Added warnings and errors in case a node has batches in it.

* clearing the node object if there are no failures to schedule

* Fix wrong function usage @_handleBatch,
CR changes and fixes,
Robust over-capacity warning,
Unit tests overhaul

* Round up values of  missing resources to 2 decimal points.

* handle use-case of node stuck in prescheduling with failing batches when it is not the first in the running order

* fixed formatting for resources .... bump version [skip ci]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants