Skip to content

Conversation

@rubvs
Copy link
Contributor

@rubvs rubvs commented Aug 26, 2025

Closes #17603 & fixes #18313

This PR contains some confusion. Mostly, originating from how ES handles GeoIp pipelines. The GeoIP processor operates in two different phases:

  1. When the pipelines are created (think compile time)
  2. When data are ingest into the pipelines (think runtime)

If the GeoIp database is unavailable on creation, all following ingested data will be tagged, see Code

If the database was available on creation - but is currently unavailable on data ingestion (runtime) - the incoming data will only be tagged if ignore_missing = false, see Code. We have ignore_missing: true in our geoip pipeline for APM Server, see Code.

Therefore, the system tests are testing for 3 things:

  1. Compile Time: The database is unavailable on start (compile time). We have to restart the container without downloading - in which case, the tags field will be present the response.
  2. Runtime: Enrichment succeeded due to the GeoIp database being available (downloaded) when the container starts.
  3. Runtime: Enrichment fails due to missing ip address or the database was removed after initial creation (runtime). In this case, there should be no tags, but the client.geo fields in the response should be missing.

@github-actions
Copy link
Contributor

🤖 GitHub comments

Expand to view the GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@mergify
Copy link
Contributor

mergify bot commented Aug 26, 2025

This pull request does not have a backport label. Could you fix it @rubvs? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-7.17 is the label to automatically backport to the 7.17 branch.
  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
  • backport-9./d is the label to automatically backport to the 9./d branch. /d is the digit.
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

return false
}
return status.Status == "HEALTHY"
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused code. cleanup.

@rubvs rubvs force-pushed the fix-geoip-systemtest branch from 54e65eb to 7045b25 Compare August 27, 2025 18:29
@rubvs rubvs force-pushed the fix-geoip-systemtest branch from fa87eb7 to 7509e8f Compare August 27, 2025 19:03
@rubvs rubvs marked this pull request as ready for review August 27, 2025 19:23
@rubvs rubvs requested a review from a team as a code owner August 27, 2025 19:23
@rubvs rubvs requested a review from simitt August 27, 2025 20:05
@rubvs rubvs changed the title download geoip db instead of mounting geoip system tests Aug 27, 2025
Copy link
Member

@kruskall kruskall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this and digging into the flaky geoip download! 🙇

I'm afraid this has diverged significantly from what the initial ask/issue was about and increased in scope to a point which I'm unsure we should care in apm-server.

@simitt
Copy link
Contributor

simitt commented Aug 28, 2025

Here's how I understand this to work: The apm-data plugin defines download_database_on_pipeline_creation: false. This means that the geoip database is not downloaded when the apm assets are installed, but only on first actual need, so when the first document with the client.ip field set, arrives.
The question is why the behaviour is flaky when the database is not pre-downloaded. In previous conversations @rubvs indicated that this is not a timing issue, but I am not sure there is yet an alternative explanation for it.

@masseyke
Copy link
Member

Is the problem just that it takes a little bit of time for elasticsearch to download and install the database? It does not block requests while this is happening, so some will be tagged as if no geoip database exists.

@rubvs
Copy link
Contributor Author

rubvs commented Aug 29, 2025

Thanks @masseyke, the async parts cleared up some confusion. I have updated the system tests accordingly. @simitt the updated test will trigger a lazy download on the first request, which will then poll for the absence of tags for up to 3 minutes, while the database is being downloaded. On success, the resulting hits will be asserted with the approvals.

@rubvs
Copy link
Contributor Author

rubvs commented Aug 29, 2025

Mmh, don't know why the fips unit tests are failing though.

@rubvs rubvs requested a review from kruskall August 29, 2025 03:50
@rubvs rubvs force-pushed the fix-geoip-systemtest branch from 3edc38d to 5c093c2 Compare August 29, 2025 20:01
@rubvs rubvs requested a review from simitt August 29, 2025 20:13
@rubvs
Copy link
Contributor Author

rubvs commented Aug 29, 2025

  • Removed downloading the db on startup.
  • When required, download the geoip db lazily.
  • The approvals already check for tags. I tested this manually to make sure.

Moreover, from an abstraction level this PR reintroduces locality: The test that requires the geoip db, is responsible for downloading the db.

The GeoIpLazyDownload sends an event containing client.ip to trigger the download, then waits until finished. The datastreams have to cleared, since they will contain noisy tagged items.

isaacaflores2
isaacaflores2 previously approved these changes Aug 29, 2025
Copy link
Contributor

@isaacaflores2 isaacaflores2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sticking with this issue and sharing the geoIP database details you learned about ElasticSearch.

It looks like test requested in #17603 were already present. Like you mentioned the approvals.json files do not contain a tags field which will assert the absence of tags given that ElasticSearch has been properly setup with the geoip databases.

@rubvs rubvs force-pushed the fix-geoip-systemtest branch from 99a9c62 to 814ca4a Compare September 5, 2025 02:37
@rubvs rubvs force-pushed the fix-geoip-systemtest branch from 814ca4a to 910da49 Compare September 5, 2025 02:38
@rubvs rubvs requested a review from simitt September 5, 2025 02:38
simitt
simitt previously approved these changes Sep 5, 2025
Copy link
Contributor

@simitt simitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@simitt
Copy link
Contributor

simitt commented Sep 5, 2025

@kruskall please re-review as you had requested changes, so this can be brought over the finish line

@rubvs rubvs enabled auto-merge September 5, 2025 22:44
1pkg
1pkg previously approved these changes Sep 5, 2025
@rubvs rubvs dismissed stale reviews from 1pkg and simitt via 72e46f0 September 8, 2025 16:54
@rubvs rubvs added this pull request to the merge queue Sep 9, 2025
Merged via the queue into elastic:main with commit 45ff456 Sep 9, 2025
19 checks passed
@rubvs rubvs deleted the fix-geoip-systemtest branch September 9, 2025 12:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

failing geoip system test when run in isolation Add test for the absence of tags in apm documents

6 participants