Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] job saved objects initialization #82639

Merged

Conversation

jgowdyelastic
Copy link
Member

Adds a check during the plugin's start to see if any ml-job saved objects exist. If they don't, and ML jobs do exist, it will run an initialization process to create them.

Also moves the mlLog service to a better location.

Remake of #82450

// const { body } = await client.asInternalUser.count({
// index: '.ml-config',
// });
// return body.count > 0;
Copy link
Member Author

@jgowdyelastic jgowdyelastic Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the original PR:

@droberts195 @legrego
This function needs to check that there is at least one ML job in existence.
I believe the fastest way to do this would be to perform a count on .ml-config. However, the kibana system user does not have read access to that index.
Instead i'm having to call both AD and DFA get endpoints to perform the same check (below).
Do you think it would be safe to give the kibana system user the ability to make this check?

@legrego wrote:
I'm not terribly familiar with the contents of .ml-config. It appears to be a listing of jobs? Is there anything else in that index?

A more direct question might be: Is there any information in that index that's not accessible via the ML APIs granted to users with the manage_ml cluster privilege? I'd be ok granting read access to the .ml-config index if this is true, since kibana_system already has that cluster privilege.

On the other hand, if there is information in this index that the kibana_system user currently doesn't have access to via other APIs, then I'd need a better understanding of that data before answering one way or another.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@droberts195 what are your thoughts on this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this PR is holding up a lot of other work, i'm going to leave question open for a future PR.
I'll leave in this commented code in case the decision is to use count.

@elasticmachine
Copy link
Contributor

Pinging @elastic/ml-ui (:ml)

Copy link
Contributor

@peteharverson peteharverson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jgowdyelastic jgowdyelastic removed the request for review from a team November 5, 2020 12:34
@jgowdyelastic
Copy link
Member Author

@elasticmachine merge upstream

Copy link
Contributor

@alvarezmelissa87 alvarezmelissa87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ⚡

}
return results;
}

async function _loadAllJobSavedObjects() {
const { body } = await client.asInternalUser.search<SearchResponse<SavedObjectJob>>({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I overlooked this in the first PR. We need to be using the saved objects client to retrieve all jobs, instead of the raw ES client to query the index.

The way this is currently written, it won't support users who have changed their kibana index, and it may return results from indices that match this pattern that aren't actually the "kibana index".

Using the Saved Objects Client will also ensure that the Kibana index has been fully initialized, and is ready for plugins to access.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used a search here because I need a list of all saved objects across all spaces, even the ones the user doesn't have access to. As far as i'm aware the saved object client can't do that?
checkStatus needs to know if a job has a saved object, otherwise we'll end up with duplicates when repairing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@legrego i've updated these searches so it now uses the saved objects client


async function _jobSavedObjectsExist(size: number = 1) {
const { body } = await client.asInternalUser.search({
index: '.kibana*',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as earlier: we should be using the Saved Objects client for this operation

// check whether the job saved objects exist
// and create them if needed.
const { initializeJobs } = jobSavedObjectsInitializationFactory(coreStart);
initializeJobs().finally(() => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To improve visibility into the status of this operation, we could hook into Core's status service (exposed on CoreSetup). This would allow the platform, ML consumers, and end users to understand why jobs may be missing in the event of a failure here.

Don't consider this a blocker for merging, but I'd recommend at least exploring this in a followup. If nothing else, it will help us triage failures with the inevitable SDH comes in

Copy link
Member Author

@jgowdyelastic jgowdyelastic Nov 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"inevitable" excuse me?!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, that wasn't at all a knock on your work here, sorry if it came across that way. Coordinating something like this with a distributed system is bound to have some growing pains. We continue to have them on our team with respect to the authorization system we've built, despite our best efforts.

Copy link
Member Author

@jgowdyelastic jgowdyelastic Nov 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know, sorry! i just thought that was funny.

await initSavedObjects();
mlLog.info('Job saved objects initialized for * space');
} catch (error) {
mlLog.error('Error Initializing jobs');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be good to log the error that we caught to help triage failures in production. Otherwise we may never know why initialization failed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added in d2800ab

}

async function _needsInitializing() {
if (await _jobSavedObjectsExist()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we be adding a "repair" option to the management UI in a followup? I worry that a partially initialized set of jobs would confuse end users, and restarting the Kibana server wouldn't help them since we won't attempt initialization if a single SO exists.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, repairing the saved objects from the management UI will be coming very soon, for 7.11

);
const { initSavedObjects } = repairFactory(client, jobSavedObjectService);
await initSavedObjects();
mlLog.info('Job saved objects initialized for * space');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How easy is it to get the count of initialized jobs? That might be nice to record in the log message

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added in a959ee7

@jgowdyelastic
Copy link
Member Author

@elasticmachine merge upstream

Comment on lines +56 to +63
if (await _jobsExist()) {
// some ml jobs exist, we need to create those saved objects
return true;
}

// no ml jobs actually exist,
// that's why there were no saved objects
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
if (await _jobsExist()) {
// some ml jobs exist, we need to create those saved objects
return true;
}
// no ml jobs actually exist,
// that's why there were no saved objects
return false;
return await _jobsExist();

@@ -388,5 +191,5 @@ export function checksFactory(
});
}

return { checkStatus, repairJobs, initSavedObjects };
return { checkStatus };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we name it getCurrentStatus instead?

) {
const { checkStatus } = checksFactory(client, jobSavedObjectService);

async function repairJobs(simulate: boolean = false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about having thesimulate check on top and return the default object instead of checking in teach for loop?

const tasks: Array<() => Promise<void>> = [];

const status = await checkStatus();
for (const job of status.jobs['anomaly-detector']) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: as there is no break statements or async execution, what about using .filter and .forEach

@jgowdyelastic
Copy link
Member Author

@elasticmachine merge upstream

@jgowdyelastic
Copy link
Member Author

@elasticmachine merge upstream

@legrego legrego self-requested a review November 10, 2020 16:44
Copy link
Member

@legrego legrego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! One last comment. When spaces is disabled, the management screen returns a 400/bad request when querying for jobs. For the time being, you'll unfortunately need to strip out namespaces: ['*'] from the find request when spaces is disabled:

image

note your internal checks for initialization/repair can optionally keep namespaces: ['*'] in place if that makes your life easier, but this user-facing call will have to be smart enough to exclude it

Copy link
Member

@legrego legrego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with the understanding that we'll be adding comprehensive test coverage in a followup prior to release. Nice work!

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
ml 1242 1243 +1

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
ml 6.7MB 6.7MB +270.0B

Distributable file count

id before after diff
default 42766 42769 +3

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@jgowdyelastic jgowdyelastic merged commit 096acb4 into elastic:master Nov 10, 2020
jgowdyelastic added a commit to jgowdyelastic/kibana that referenced this pull request Nov 10, 2020
* [ML] job saved objects initialization

* fixing job count logic

* adding missing files

* attempting to fix build crash

* fixing kibana.json

* changes based on review

* removing accidentally added export

* adding intialization promise

* use finally so errors dont stop initialization

* function rename

* removing duplicate header

* adding job initialization count to log message

* adding error to log message

* moving initialization file

* moving intialization file back again to fix git stash issues

* removing .kibana index search

* creating internal saved object client

* code clean up

* removing commented code

* adding check for spaces enabled

* adding ids to saved objects

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
@jgowdyelastic jgowdyelastic deleted the job-saved-object-initialization-2 branch November 10, 2020 21:03
jgowdyelastic added a commit that referenced this pull request Nov 11, 2020
* [ML] job saved objects initialization

* fixing job count logic

* adding missing files

* attempting to fix build crash

* fixing kibana.json

* changes based on review

* removing accidentally added export

* adding intialization promise

* use finally so errors dont stop initialization

* function rename

* removing duplicate header

* adding job initialization count to log message

* adding error to log message

* moving initialization file

* moving intialization file back again to fix git stash issues

* removing .kibana index search

* creating internal saved object client

* code clean up

* removing commented code

* adding check for spaces enabled

* adding ids to saved objects

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants