
Fleet ignores pipelines when loading APM package #126695

Closed
simitt opened this issue Mar 2, 2022 · 6 comments · Fixed by #126915
Labels
bug Fixes for quality problems that affect the customer experience QA:Validated Issue has been validated by QA Team:Fleet Team label for Observability Data Collection Fleet team v8.1.0

Comments

@simitt
Contributor

simitt commented Mar 2, 2022

Kibana version: 8.1.0-BC5

Description of the problem including expected versus actual behavior:
When installing the APM package via Fleet, the index template's settings are empty and no apm pipelines are set up. The index templates should pick up the configured pipelines from the APM package and reference them in the settings such as:

{
  "index": {
    "default_pipeline": "traces-apm-8.1.0"
  }
}

@joshdover took an initial look and confirmed that this is a setup problem in Fleet, most likely caused by a bug when installing the package from the bundled packages instead of fetching it from EPR.

Steps to reproduce:

  1. Create a deployment with Integrations Server enabled on ESS for 8.1.0 (latest BC5)
  2. Navigate to Stack Management and check the settings on the apm index templates, for example on the traces-apm-default template. Observe that there are no settings.

Screenshot 2022-03-02 at 17 02 51

  3. Navigate to Ingest Pipelines and observe that no apm pipelines are loaded.

Screenshot 2022-03-02 at 17 02 40

Why is it critical
Without the pipelines, APM data is ingested in a different format than expected, with critical data missing. The APM UI cannot display the data and stays empty.

@simitt simitt added bug Fixes for quality problems that affect the customer experience blocker Team:Fleet Team label for Observability Data Collection Fleet team v8.1.0 labels Mar 2, 2022
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@joshdover
Contributor

It seems that we're not installing ingest pipelines for bundled packages. When I install the 8.1.0 package without the zip in the x-pack/plugins/fleet/server/bundled_packages directory, the ingest pipelines are there, but when I put the zip back they're gone. You can verify this in development.

@joshdover
Contributor

joshdover commented Mar 2, 2022

It seems we overlooked the differences and gaps in the upload package functionality. I've compared the PackageInfo object that we base all of the install code on: the one we fetch from the registry versus the one we generate from an uploaded package, and there are many functional differences. See the differences between the files in packageInfo.zip with a tool like http://jsondiff.com/

Here's the diff of the changes I made to output these files. You'll want to put them outside the kibana repo to avoid restarting the server in dev mode:

# macOS: copy the diff below to the clipboard, then run this command
$ pbpaste | git apply -

You may need to edit the output directories depending on your system:

diff --git a/x-pack/plugins/fleet/server/services/epm/packages/install.ts b/x-pack/plugins/fleet/server/services/epm/packages/install.ts
index 107b906a969..88a0fb6ee65 100644
--- a/x-pack/plugins/fleet/server/services/epm/packages/install.ts
+++ b/x-pack/plugins/fleet/server/services/epm/packages/install.ts
@@ -5,6 +5,8 @@
  * 2.0.
  */
 
+import fs from 'fs/promises';
+
 import { i18n } from '@kbn/i18n';
 import semverLt from 'semver/functions/lt';
 import type Boom from '@hapi/boom';
@@ -301,6 +303,11 @@ async function installPackageFromRegistry({
     // get package info
     const { paths, packageInfo } = await Registry.getRegistryPackage(pkgName, pkgVersion);
 
+    await fs.writeFile(
+      `~/src/package-info-registry-${packageInfo.name}-${packageInfo.version}.json`,
+      JSON.stringify(packageInfo, undefined, 2)
+    );
+
     if (!licenseService.hasAtLeast(packageInfo.license || 'basic')) {
       const err = new Error(`Requires ${packageInfo.license} license`);
       sendEvent({
@@ -391,6 +398,11 @@ async function installPackageByUpload({
   try {
     const { packageInfo } = await parseAndVerifyArchiveEntries(archiveBuffer, contentType);
 
+    await fs.writeFile(
+      `~/src/package-info-upload-${packageInfo.name}-${packageInfo.version}.json`,
+      JSON.stringify(packageInfo, undefined, 2)
+    );
+
     const installedPkg = await getInstallationObject({
       savedObjectsClient,
       pkgName: packageInfo.name,

@kpollich
Member

kpollich commented Mar 2, 2022

I've spent some time investigating the differences @joshdover mentioned above between the package info objects Fleet fetches from the registry, and those we generate from package archives.

  • Fleet implements validation logic in a parseAndVerify function that runs on uploaded package archives. This function is intended to run the same checks and data massaging that happen in the package-registry repo. This is called out by a comment.
  • Given the above, the initial implementation, which was at first a one-to-one copy of some of the package registry's logic, has fallen out of date. There are various field changes and additions that we're not accounting for.
    • e.g. the package registry now includes an assets array of file names, and some nested fields have been replaced with dot-notation field names like index_template.mappings and index_template.settings that Fleet relies on to install various assets
  • An ideal fix would be to include a "built" version of the package in JSON inside of the archive files hosted on EPR. If we could pull in a package_info.json file after unzipping a package archive, produced by the same validation logic that backs the package info endpoint, Fleet would no longer have to concern itself with reimplementing this logic
  • An interim fix would be to reimplement any of this logic we're currently missing and try to get any field name changes we're relying on to match, at least enough that APM can install as expected. I think this would mean making sure we set "ingest_pipeline.name": "default" and "ingest_pipeline": "default" in a few key places within the data stream objects

e.g. Here's the same data stream record in the uploaded version of APM and the Registry version:

Uploaded version:

 {
    "dataset": "apm.app",
    "elasticsearch": {
        "index_template": {
            "mappings": {
                "dynamic": false,
                "dynamic_templates": [
                    {
                        "numeric_labels": {
                            "mapping": {
                                "scaling_factor": 1000000,
                                "type": "scaled_float"
                            },
                            "path_match": "numeric_labels.*"
                        }
                    }
                ]
            }
        }
    },
    "ilm_policy": "logs-apm.app_logs-default_policy",
    "package": "apm",
    "path": "app_logs",
    "streams": [ // This is an unused field that could be removed, it appears
    ],
    "title": "APM application logs",
    "type": "logs"
},

Registry version:

{
    "dataset": "apm.app",
    "elasticsearch": {
        "index_template.mappings": { // 👈
            "dynamic": false,
            "dynamic_templates": [
                {
                    "numeric_labels": {
                        "mapping": {
                            "scaling_factor": 1000000,
                            "type": "scaled_float"
                        },
                        "path_match": "numeric_labels.*"
                    }
                }
            ]
        },
        "ingest_pipeline.name": "default" // 👈
    },
    "ilm_policy": "logs-apm.app_logs-default_policy",
    "ingest_pipeline": "default", // 👈
    "package": "apm",
    "path": "app_logs",
    "release": "ga",
    "title": "APM application logs",
    "type": "logs"
},

So, we'd need to account for at least the following edits in all data streams in our validation:

  • Rename the nested index_template value within the elasticsearch property to dot-notation keys like index_template.mappings
  • Ensure ingest_pipeline property is set to default (or another appropriate value? Not sure on this one.)
  • Add release property (might not be necessary)
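
As a rough sketch of the first two edits (a hypothetical helper, not Fleet's actual implementation), normalizing an archive-derived data stream would mean flattening the nested index_template object into dot-notation keys and defaulting the pipeline references:

```typescript
// Hypothetical sketch, not Fleet's actual code: normalize an
// archive-derived data stream toward the registry's shape.
interface DataStream {
  elasticsearch?: Record<string, unknown>;
  ingest_pipeline?: string;
  [key: string]: unknown;
}

function normalizeDataStream(ds: DataStream): DataStream {
  const es: Record<string, unknown> = { ...(ds.elasticsearch ?? {}) };
  const template = es['index_template'] as Record<string, unknown> | undefined;
  if (template) {
    // The registry flattens the nested index_template object into
    // dot-notation keys such as index_template.mappings.
    delete es['index_template'];
    for (const [key, value] of Object.entries(template)) {
      es[`index_template.${key}`] = value;
    }
  }
  // The registry references the default pipeline in both places.
  if (es['ingest_pipeline.name'] === undefined) {
    es['ingest_pipeline.name'] = 'default';
  }
  return { ...ds, elasticsearch: es, ingest_pipeline: ds.ingest_pipeline ?? 'default' };
}
```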

There are similar differences in inputs where various boolean values are explicitly set to false in the registry package info object while they're simply omitted in the archive version Fleet is generating.

I think a minimum implementation to unblock APM would be aligning the property names and pipeline values in the streams objects, but I'm not 100% sure. The vars changes and other minor field additions (file paths of screenshots are now included, for example) are non-blocking and we could likely skip them.

The root-cause solution here, as was called out as far back as November 2020 (probably earlier, since this file was renamed 🙃), is to rely on verification that comes from the package spec or is otherwise external to Fleet. If the verification/build process happened entirely within the package registry repo and the archive included a JSON payload of the resulting object (exactly what we serve at endpoints like https://epr.elastic.co/package/apm/8.1.0/), that would completely remove Fleet's need to do any duplicative validation/build-up of the package info object like we do currently. The current approach is brittle because we attempted to duplicate the package registry's behavior for our upload endpoint, and as soon as we stopped maintaining the upload logic, we fell out of sync with changes made to these package payloads.

@joshdover
Contributor

joshdover commented Mar 3, 2022

Thanks, @kpollich for the very detailed investigation and writeup.

  • An ideal fix would be to include a "built" version of the package in JSON inside of the archive files hosted on EPR. If we could pull in a package_info.json file after unzipping a package archive that runs through the same validation logic, etc behind the package info endpoint, Fleet would no longer have to concern itself with reimplementing this logic

This is an interesting idea and would probably be a viable short-term solution for bundled packages, however I don't see how we could solve #70582 with this approach for packages that are not in a registry at all. In #115032 it's also being requested that we move in the opposite direction, to stop relying on the registry API to do much processing of package contents for us. I think the main issue we need to figure out is whether we should fix our data stream processing code in Kibana now or go with a short-term solution like bundling the package info as you described.

IMO attempting to fix our data stream processing code would be time better spent. For the solution you suggested, we'd need to make adjustments to the upload package flow to support pulling the package info from this file/object, and not just in the install case, but also in the general case where the package is loaded from the epm-packages saved objects. To me, this sounds like a decent amount of effort to add what would essentially be considered tech debt. That said, if we go with trying to fix our processing code, I think we should timebox this and consider your alternative if it seems like too high a lift.

I think a minimum implementation to unblock APM would be aligning the property names and pipeline values in the streams objects, but I'm not 100% sure. The vars changes and other minor field editions (file paths of screenshots are now included for example) are non-blocking and we could likely skip them.

I agree, knowing what is missing is going to be difficult. The only way I can think of to make sure that we don't miss anything, at least for the known bundled packages, is to have a test that compares the package info response from the registry with the one that we generate from Kibana. We could use this test to verify that our logic matches whenever we bump the package versions in fleet_packages.json.
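
To illustrate the comparison idea (a hypothetical utility, nothing that exists in Fleet today), such a test could walk the registry's package info and report any keys absent from the Kibana-generated object:

```typescript
// Hypothetical helper: collect dot-separated paths of keys present in the
// registry's package info but missing from the archive-derived one.
function missingKeys(
  registry: Record<string, unknown>,
  generated: Record<string, unknown>,
  prefix = ''
): string[] {
  const missing: string[] = [];
  for (const [key, value] of Object.entries(registry)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (!(key in generated)) {
      missing.push(path);
    } else if (
      value !== null && typeof value === 'object' && !Array.isArray(value) &&
      generated[key] !== null && typeof generated[key] === 'object' && !Array.isArray(generated[key])
    ) {
      // Both sides are plain objects: recurse to find nested gaps.
      missing.push(
        ...missingKeys(value as Record<string, unknown>, generated[key] as Record<string, unknown>, path)
      );
    }
  }
  return missing;
}
```

Running this against the registry response and our generated PackageInfo for each entry in fleet_packages.json would surface gaps like the ingest_pipeline fields above.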

We could also do a one-off test of switching over to our own logic for every package and compare it to the registry to identify any remaining gaps that are not visible when testing just the bundled packages. This would help us be able to close the gap to support #70582.

@ghost

ghost commented Mar 17, 2022

Hi @joshdover,
We have validated this issue on the released 8.1.0 build and observed that the issue is fixed. Please find below the testing details:

  • The index templates pick up the configured pipelines from the APM package and display the data in the Settings tab.

Screen-Recording:

Home.-.Elastic.-.Google.Chrome.2022-03-17.14-03-29.mp4

Build Details:
VERSION: 8.1.0
COMMIT: 4aaeda2
BUILD: 50485

Thanks!

@amolnater-qasource amolnater-qasource added the QA:Validated Issue has been validated by QA label Mar 21, 2022