-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Endpoint] Recursive resolver children #61914
[Endpoint] Recursive resolver children #61914
Conversation
…sive-resolver-children
…sive-resolver-children
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First half of the review
children: Node[]; | ||
events: ResolverEvent[]; | ||
alerts: ResolverEvent[]; | ||
lifecycle: ResolverEvent[]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how's this different than events
? is this specifically process events with an entity_id
that matches the id
field on Node
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, lifecycle
contains all the process events for a particular entityID including start
, still_running
, and end
events. events
are the related events for the node, i.e. file
, registry
, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎗️
@@ -46,6 +46,29 @@ export class EndpointAppConstants { | |||
static ALERT_LIST_DEFAULT_SORT = '@timestamp'; | |||
} | |||
|
|||
export interface NodeStats { | |||
totalEvents: number; | |||
totalAlerts: number; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is totalEvents
equal to the count of non-alert events + alert events
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this info? Isn't it expensive to get a count (esp. an exact count?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is
totalEvents
equal to the count ofnon-alert events + alert events
?
I think the info would useful when display totals in the UI.
Do we need this info? Isn't it expensive to get a count (esp. an exact count?)
I think this would be useful for displaying counts,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❔ Does totalAlerts
mean total related alerts? If so, can we add that to the interface name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonathan-buttner if we showed the user some total, we would need to explain what the total is. What is this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like we're probably going to remove the stats stuff for now, but I think what this is trying to communicate is totalAlerts
is the total number of alerts associated with a specific process node/entity_id aka how many alerts has this specific process generated.
totalEvents
is how many related events exists for this process.
@@ -72,6 +73,10 @@ export const Resolver = styled( | |||
<div className="loading-container"> | |||
<EuiLoadingSpinner size="xl" /> | |||
</div> | |||
) : hasError ? ( | |||
<div className="loading-container"> | |||
<div>error occured fetching data</div> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i18n reminder
|
||
export function registerResolverRoutes(router: IRouter, endpointAppContext: EndpointAppContext) { | ||
const log = endpointAppContext.logFactory.get('resolver'); | ||
|
||
router.get( | ||
{ | ||
path: '/api/endpoint/resolver/{id}/related', | ||
validate: validateRelatedEvents, | ||
path: '/api/endpoint/resolver/{id}/events', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain the reason for having an id
here? Or a separate events
resource below that?
Is the id
the id
of an alert? Or is a resolver
an entity somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the id is for the alert to retrieve the resolver tree for. The source proceess of the alert event is the origin of the resolver view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems non-obvious from the path
const uniquePid = action.payload.selectedEvent?.endgame?.unique_pid; | ||
const legacyEndpointID = action.payload.selectedEvent?.agent?.id; | ||
[{ lifecycle, children }] = await Promise.all([ | ||
context.services.http.get(`/api/endpoint/resolver/${uniquePid}`, { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the idea with having part of the ID in params and part in the query string? Why not just have everything in the query string. Could the schema demand one of either entity-id
or both of legacy-unique-pid
and legacy-endpoint-id
// process identifiers that updates the tree structure as we push values into the cache. | ||
// | ||
// When we query a single level of results for child process events we have a flattened, sorted result | ||
// list that we need to add into a constructed tree. We also need to signal in an API response whether |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this need to be added into a 'constructed tree'? Doesn't the API only need to return a flat list of (partial) process events?
// | ||
// When we query a single level of results for child process events we have a flattened, sorted result | ||
// list that we need to add into a constructed tree. We also need to signal in an API response whether | ||
// or not there are more child processes events that we have not yet retrieved, and, if so, for what parent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to signal in an API response whether or not there are more child process events? Even if we needed that, how would we calculate it?
There could be more events if:
- If there are still events on the endpoint itself that haven't been sent
- There are events in an intermediate queueing system e.g. Logstash
- There are events that have been received by ES but not yet propagated
- The endpoint hasn't been turned off, rebooted, or had the sensor removed, and at least 1 process in the graph hasn't been terminated.
Unless this API can definitely say that there are no more events, the UI will need to let the user check for events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might still be useful to have the API tell you that either there definitively are more events or that there were no more last time it looked. Maybe to the user the difference the difference would be a "get more events" button vs a "check for new events" button or similar.
// list that we need to add into a constructed tree. We also need to signal in an API response whether | ||
// or not there are more child processes events that we have not yet retrieved, and, if so, for what parent | ||
// process. So, at the end of our tree construction we have a relational layout of the events with no | ||
// pagination information for the given parent nodes. In order to actually construct both the tree and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What pagination information is needed, and why? The entity-id (or in the case of legacy events, the unique-pid and endpoint-id) should be all that's needed to call this API a subsequent time, right? So why is anything else needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To do pagination using search_after
we need the timestamp
and eventID
(unique identifier for the last event retrieved).
// pagination information for the given parent nodes. In order to actually construct both the tree and | ||
// insert the pagination information we basically do the following: | ||
// | ||
// 1. Using a terms aggregation query, we return an approximate roll-up of the number of child process |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please justify why this is necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is so we can so totals in the UI.
this.root = root; | ||
} | ||
|
||
public render() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please document all public methods. Also, please provide explicit return types when possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lot to unpack here
eventsPromise: Promise<Tree>, | ||
alertsPromise: Promise<Tree> | ||
) { | ||
const [children, ancestors, events, alerts] = await Promise.all([ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please consider moving this logic to the caller. This function should have a single responsibility. Until all of these promises return successfully, this function can't merge anything. If any promise rejects, this function can't run at all. Therefore I argue that resolving the promises (and handling any errors) should be done at a higher level.
} | ||
} | ||
|
||
private async doAlerts(tree: Tree, limit: number, after?: string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove commented out code
|
||
const fetcher = new Fetcher(client, id, endpointID); | ||
const tree = await Tree.merge( | ||
fetcher.children(children, generations, afterChild), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why cant all queries for a single ID be done concurrently via the msearch
API? https://www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's an optimization we could probably make. It will be even more useful now that we have the process ancestry array from the endpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ We'll plan on exploring using the process ancestry array to make improvements to the api
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to support events that don't have the ancestry array as well. For example, events that siem is aware of
This reverts commit 7fdf029.
…ical alignment of the process node icon
pagination: NodePagination; | ||
stats?: NodeStats; | ||
ancestors?: ResolverNode[]; | ||
parent?: ResolverNode | null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove this field, I can clean that up later if you'd like.
if (!this.root.ancestors) { | ||
this.root.ancestors = []; | ||
} | ||
this.root.ancestors.push(newParent); | ||
} | ||
this.cache[ancestorID].lifecycle.push(event); | ||
const currentAncestor = this.cache.get(ancestorID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could move this up into the previous if
statement and remove the if (currentAncestor !== undefined) {
check right?
7d00557
to
a607144
Compare
For any exported stuff, can you add comments to it? For exported functions, can you add explicit return types? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please consider removing any dead code, e.g. the stats part of the API.
export function parentEntityId(event: ResolverEvent) { | ||
if (isLegacyEvent(event)) { | ||
const ppid = event.endgame.unique_ppid; | ||
return String(ppid); // if unique_ppid is undefined return undefined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we shouldn't cast undefined to a string. Could you add an explicit return type here?
export function eventName(event: EndpointEvent | LegacyEndpointEvent): string { | ||
if (isLegacyEvent(event)) { | ||
return event.endgame.process_name ? event.endgame.process_name : ''; | ||
} else { | ||
return event.process.name; | ||
} | ||
} | ||
|
||
export function eventId(event: ResolverEvent) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind moving these methods that take "ResolverEvent" to a file called "resolver_event"
@@ -46,6 +46,29 @@ export class EndpointAppConstants { | |||
static ALERT_LIST_DEFAULT_SORT = '@timestamp'; | |||
} | |||
|
|||
export interface NodeStats { | |||
totalEvents: number; | |||
totalAlerts: number; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonathan-buttner if we showed the user some total, we would need to explain what the total is. What is this?
} | ||
|
||
export interface NodePagination { | ||
nextChild?: string | null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonathan-buttner @kqualters-elastic if there is special meaning to these interface fields, can we document it here?
Also, can you clarify what "you've reached the last page" means? Data could still be coming in right? So unless we know that there will never be any more data, we will probably need to assume that there is more.
Let me know if I'm wrong or unclear. I'm writing everything on a tablet and it's tedious 😅
} | ||
|
||
export interface NodePagination { | ||
nextChild?: string | null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO ask about this
@@ -31,12 +32,21 @@ export function registerResolverRoutes(router: IRouter, endpointAppContext: Endp | |||
handleChildren(log) | |||
); | |||
|
|||
router.get( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you delete it then?
@@ -39,8 +39,9 @@ interface ChildrenPathParams { | |||
export const validateChildren = { | |||
params: schema.object({ id: schema.string() }), | |||
query: schema.object({ | |||
after: schema.maybe(schema.string()), | |||
limit: schema.number({ defaultValue: 10, min: 1, max: 100 }), | |||
children: schema.number({ defaultValue: 10, min: 10, max: 100 }), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the UI isn't specifying these params (children and generations) dynamically, we should consider hard coding it in the server. I don't see a reason why this should be dynamic.
// Ideally we'd look for one of process_start or process_running, not both | ||
// so to solve this we'll probably want to either search for all of them and only return one if that's | ||
// possible in elastic search or in memory pull out a single event to return | ||
// https://github.com/elastic/endpoint-app-team/issues/168 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you please remove the link here
alerts: { | ||
filter: { term: { 'event.kind': 'alert' } }, | ||
aggs: { | ||
ids: { terms: { field: 'process.entity_id' } }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could just remove the stats part of this API. We aren't using it, right?
|
||
const fetcher = new Fetcher(client, id, endpointID); | ||
const tree = await Tree.merge( | ||
fetcher.children(children, generations, afterChild), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to support events that don't have the ancestry array as well. For example, events that siem is aware of
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great progress!
💛 Build succeeded, but was flaky
Test FailuresKibana Pipeline / kibana-xpack-agent / X-Pack API Integration Tests.x-pack/test/api_integration/apis/fleet/enrollment_api_keys/crud·ts.apis Fleet Endpoints fleet_enrollment_api_keys_crud GET /fleet/enrollment-api-keys should list existing api keysStandard Out
Stack Trace
History
To update your PR or re-run it, just comment with: |
* master: (60 commits) [SIEM] Create template timeline (elastic#63136) load react component lazily in so management section (elastic#64285) Cleanup .eslingignore and add target (elastic#64617) [Ingest] Support yaml variables in datasource (elastic#64459) typescript-ify portions of src/optimize (elastic#64688) [ngSanitize] add explicit dependencies to all uses of `ngSanitize` angular module (elastic#64546) Consolidate downloading plugin bundles to bootstrap script (elastic#64685) [Maps] disable edit layer button when flyout is open for add layer or map settings (elastic#64230) chore(NA): add async import into infra plugin to reduce apm bundle size (elastic#63292) [Maps] fix edit filter (elastic#64586) [SIEM][Detections] Adds large list support using REST endpoints Replace a number of any-ed styled(eui*) with accurate types (elastic#64555) [Endpoint] Recursive resolver children (elastic#61914) [ML] Fix new job wizard with multiple indices (elastic#64567) Use short URLs for legacy plugin deprecation warning (elastic#64540) [Uptime] Update uptime ml job id to limit to 64 char (elastic#64394) [Ingest] Fix GET /enrollment-api-keys/null error (elastic#64595) Consolidate cross-cutting concerns between region & coordinate maps in new maps_legacy plugin (elastic#64123) ES UI new platform cleanup (elastic#64332) [Event Log] use @timestamp field for queries (elastic#64391) ...
* alerting/np-migration: (64 commits) [ML] Changes Machine learning overview UI text (elastic#64625) [Uptime] Migrate client to New Platform (elastic#55086) Slim vis type timeseries (elastic#64631) [Telemetry] Fix inconsistent search behaviour in Advanced Settings (elastic#64510) removed unneeded dep and file [SIEM] Create template timeline (elastic#63136) load react component lazily in so management section (elastic#64285) Cleanup .eslingignore and add target (elastic#64617) [Ingest] Support yaml variables in datasource (elastic#64459) typescript-ify portions of src/optimize (elastic#64688) [ngSanitize] add explicit dependencies to all uses of `ngSanitize` angular module (elastic#64546) Consolidate downloading plugin bundles to bootstrap script (elastic#64685) [Maps] disable edit layer button when flyout is open for add layer or map settings (elastic#64230) chore(NA): add async import into infra plugin to reduce apm bundle size (elastic#63292) [Maps] fix edit filter (elastic#64586) [SIEM][Detections] Adds large list support using REST endpoints Replace a number of any-ed styled(eui*) with accurate types (elastic#64555) [Endpoint] Recursive resolver children (elastic#61914) [ML] Fix new job wizard with multiple indices (elastic#64567) Use short URLs for legacy plugin deprecation warning (elastic#64540) ...
Friendly reminder: Looks like this PR hasn’t been backported yet. |
Looks like this PR has a backport PR but it still hasn't been merged. Please merge it ASAP to keep the branches relatively in sync. |
1 similar comment
Looks like this PR has a backport PR but it still hasn't been merged. Please merge it ASAP to keep the branches relatively in sync. |
Summary
Changes resolver middleware along with defining a new resolver api that returns the entire tree in one http response. Adds error handling to initial request. Events are still flattened by the frontend, unifying the data structures between the UI and server to be done possibly in another pr.
Checklist
Delete any items that are not applicable to this PR.
For maintainers