[Meta] GeoIPv2 #68920
Labels:
* :Data Management/Ingest Node (Execution or management of Ingest Pipelines including GeoIP)
* Meta
* release highlight
* Team:Data Management (Meta label for data/management team)
* v7.14.0
* v8.0.0-alpha1
Comments
probakowski added the Meta, :Data Management/Ingest Node and v8.0.0 labels on Feb 11, 2021
elasticmachine added the Team:Data Management label on Feb 11, 2021
Pinging @elastic/es-core-features (Team:Core/Features)
probakowski added a commit that referenced this issue on Feb 23, 2021
This change adds a component that downloads new GeoIP databases from the infra service. New databases are downloaded in chunks and stored in the .geoip_databases index. Downloads are verified against the MD5 checksum provided by the server. The current state of all stored databases is kept in the cluster state, as part of the persistent task state. Relates to #68920
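As a rough illustration of the "download in chunks, verify against MD5" step, here is a minimal sketch. It assumes the chunks are already ordered by chunk number and that the server's checksum arrives as a lowercase hex string; `ChunkVerifier` and its methods are hypothetical names, not the actual GeoIpDownloader code.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical helper: reassembles database chunks (assumed to be ordered by
// chunk number) and checks the result against the MD5 hex digest from the server.
final class ChunkVerifier {

    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to support MD5.
            throw new IllegalStateException(e);
        }
    }

    static byte[] assemble(List<byte[]> orderedChunks, String expectedMd5) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : orderedChunks) {
            out.write(chunk, 0, chunk.length);
        }
        byte[] data = out.toByteArray();
        String actual = md5Hex(data);
        // Reject the download if the reassembled bytes don't match the server's checksum.
        if (!actual.equalsIgnoreCase(expectedMd5)) {
            throw new IllegalStateException("MD5 mismatch: expected " + expectedMd5 + " but got " + actual);
        }
        return data;
    }
}
```

Verifying the reassembled bytes, rather than each chunk, matches the commit's description of a single checksum per database.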
probakowski added a commit to probakowski/elasticsearch that referenced this issue on Feb 23, 2021
This change adds a component that downloads new GeoIP databases from the infra service. New databases are downloaded in chunks and stored in the .geoip_databases index. Downloads are verified against the MD5 checksum provided by the server. The current state of all stored databases is kept in the cluster state, as part of the persistent task state. Relates to elastic#68920
# Conflicts:
#   modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/IngestGeoIpPlugin.java
probakowski added a commit that referenced this issue on Feb 24, 2021
This change adds a component that downloads new GeoIP databases from the infra service. New databases are downloaded in chunks and stored in the .geoip_databases index. Downloads are verified against the MD5 checksum provided by the server. The current state of all stored databases is kept in the cluster state, as part of the persistent task state. Relates to #68920
probakowski added a commit that referenced this issue on Feb 24, 2021
This change adds a query parameter confirming that we accept the ToS of the GeoIP database service provided by Infra. It also changes the integration test to use a lower timeout when using the local fixture. Relates to #68920
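The commit does not name the ToS query parameter, so the sketch below uses a clearly hypothetical `tos_accepted=true` pair purely to show the idea of appending an acceptance flag to the download endpoint URL. It also assumes the endpoint has no fragment.

```java
import java.net.URI;

// Sketch only: TOS_PARAM and TOS_VALUE are hypothetical placeholders; the commit
// message does not say what the real query parameter is called.
final class TosQuery {
    static final String TOS_PARAM = "tos_accepted";
    static final String TOS_VALUE = "true";

    /** Returns a copy of the endpoint with the ToS-acceptance parameter appended to its query. */
    static URI withTosAccepted(URI endpoint) {
        if (endpoint.getRawFragment() != null) {
            throw new IllegalArgumentException("endpoint must not contain a fragment: " + endpoint);
        }
        // Start a query string if there is none, otherwise extend the existing one.
        String separator = endpoint.getRawQuery() == null ? "?" : "&";
        return URI.create(endpoint + separator + TOS_PARAM + "=" + TOS_VALUE);
    }
}
```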
probakowski added a commit to probakowski/elasticsearch that referenced this issue on Feb 24, 2021
This change adds a query parameter confirming that we accept the ToS of the GeoIP database service provided by Infra. It also changes the integration test to use a lower timeout when using the local fixture. Relates to elastic#68920
probakowski added a commit that referenced this issue on Feb 24, 2021
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue on Feb 24, 2021
…ownloader
This component is responsible for making the databases maintained by GeoIpDownloader available to ingest processors. It also provides a lookup mechanism for geoip processors, with fallback to {@link LocalDatabases}. All databases are downloaded into a geoip tmp directory, which is created at node startup. The following high-level steps are executed after each cluster state update:
1) Check which databases are available in {@link GeoIpTaskState}, which is part of the geoip downloader persistent task.
2) For each database, check whether it has changed (by comparing the local and remote md5 hashes) or is missing locally.
3) For each database identified in step 2, start downloading the database chunks. Each chunk is appended to a tmp file (inside the geoip tmp dir); after all chunks have been downloaded, the database is uncompressed and renamed to its final filename. The database is then loaded, and any old instance of it is closed.
4) Clean up locally loaded databases that are no longer mentioned in {@link GeoIpTaskState}.
Relates to elastic#68920
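Step 3 (append chunks to a tmp file, uncompress, rename to the final filename) can be sketched with plain `java.nio.file` calls. The commit only says the database is "uncompressed", so gzip is an assumption here, and `DatabaseInstaller` and the `.tmp.gz` / `.tmp` file names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.zip.GZIPInputStream;

// Hypothetical sketch of step 3; assumes gzip-compressed chunks.
final class DatabaseInstaller {
    static Path install(Path geoipTmpDir, String databaseName, List<byte[]> chunks) throws IOException {
        Path compressed = geoipTmpDir.resolve(databaseName + ".tmp.gz");
        Path uncompressed = geoipTmpDir.resolve(databaseName + ".tmp");
        Path target = geoipTmpDir.resolve(databaseName);

        // Append every chunk, in order, to a fresh temporary file.
        Files.deleteIfExists(compressed);
        for (byte[] chunk : chunks) {
            Files.write(compressed, chunk, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
        // Uncompress into a second temporary file.
        try (InputStream raw = Files.newInputStream(compressed);
             InputStream in = new GZIPInputStream(raw)) {
            Files.copy(in, uncompressed, StandardCopyOption.REPLACE_EXISTING);
        }
        Files.delete(compressed);
        // Rename within the same directory, so readers never see a half-written file.
        Files.move(uncompressed, target, StandardCopyOption.ATOMIC_MOVE);
        return target;
    }
}
```

Loading the new database and closing the old instance would follow the rename; that part depends on the reader implementation and is left out.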
martijnvg added a commit that referenced this issue on Mar 4, 2021
…ownloader (#69540)
This component is responsible for making the databases maintained by GeoIpDownloader available to ingest processors. It also provides a lookup mechanism for geoip processors, with fallback to {@link LocalDatabases}. All databases are downloaded into a geoip tmp directory, which is created at node startup. The following high-level steps are executed after each cluster state update:
1) Check which databases are available in {@link GeoIpTaskState}, which is part of the geoip downloader persistent task.
2) For each database, check whether it has changed (by comparing the local and remote md5 hashes) or is missing locally.
3) For each database identified in step 2, start downloading the database chunks. Each chunk is appended to a tmp file (inside the geoip tmp dir); after all chunks have been downloaded, the database is uncompressed and renamed to its final filename. The database is then loaded, and any old instance of it is closed.
4) Clean up locally loaded databases that are no longer mentioned in {@link GeoIpTaskState}.
Relates to #68920
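Steps 2 and 4 boil down to diffing two maps of database name to md5: what the task state advertises and what the node has loaded. A minimal sketch, assuming both sides are available as `Map<String, String>`; `DatabaseSyncPlan` is a hypothetical name, not the DatabaseRegistry API.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: decide which databases to (re)download and which to clean up.
final class DatabaseSyncPlan {
    final Set<String> toDownload = new TreeSet<>();
    final Set<String> toRemove = new TreeSet<>();

    static DatabaseSyncPlan compute(Map<String, String> remoteMd5, Map<String, String> localMd5) {
        DatabaseSyncPlan plan = new DatabaseSyncPlan();
        // Step 2: download databases that are missing locally or whose md5 differs.
        for (Map.Entry<String, String> e : remoteMd5.entrySet()) {
            String local = localMd5.get(e.getKey());
            if (local == null || !local.equals(e.getValue())) {
                plan.toDownload.add(e.getKey());
            }
        }
        // Step 4: remove databases loaded locally that the task state no longer mentions.
        for (String name : localMd5.keySet()) {
            if (!remoteMd5.containsKey(name)) {
                plan.toRemove.add(name);
            }
        }
        return plan;
    }
}
```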
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue on Mar 4, 2021
…GeoIpDownloader
Backport of elastic#69540 to the 7.x branch.
This component is responsible for making the databases maintained by GeoIpDownloader available to ingest processors. It also provides a lookup mechanism for geoip processors, with fallback to {@link LocalDatabases}. All databases are downloaded into a geoip tmp directory, which is created at node startup. The following high-level steps are executed after each cluster state update:
1) Check which databases are available in {@link GeoIpTaskState}, which is part of the geoip downloader persistent task.
2) For each database, check whether it has changed (by comparing the local and remote md5 hashes) or is missing locally.
3) For each database identified in step 2, start downloading the database chunks. Each chunk is appended to a tmp file (inside the geoip tmp dir); after all chunks have been downloaded, the database is uncompressed and renamed to its final filename. The database is then loaded, and any old instance of it is closed.
4) Clean up locally loaded databases that are no longer mentioned in {@link GeoIpTaskState}.
Relates to elastic#68920
martijnvg added a commit that referenced this issue on Mar 10, 2021
…ownloader (#69971)
Backport of #69540 to the 7.x branch.
This component is responsible for making the databases maintained by GeoIpDownloader available to ingest processors. It also provides a lookup mechanism for geoip processors, with fallback to {@link LocalDatabases}. All databases are downloaded into a geoip tmp directory, which is created at node startup. The following high-level steps are executed after each cluster state update:
1) Check which databases are available in {@link GeoIpTaskState}, which is part of the geoip downloader persistent task.
2) For each database, check whether it has changed (by comparing the local and remote md5 hashes) or is missing locally.
3) For each database identified in step 2, start downloading the database chunks. Each chunk is appended to a tmp file (inside the geoip tmp dir); after all chunks have been downloaded, the database is uncompressed and renamed to its final filename. The database is then loaded, and any old instance of it is closed.
4) Clean up locally loaded databases that are no longer mentioned in {@link GeoIpTaskState}.
Relates to #68920
Other cherry-picked commits:
* Fix ReloadingDatabasesWhilePerformingGeoLookupsIT (#70163)
  Wait for ingest threads to stop using the DatabaseReaderLazyLoader, so that during the next run the db update thread doesn't try to remove the db again (because the file hasn't been deleted yet). Also delete the tmp dirs this test creates at the end of the test, so that when the test is repeated many times it doesn't accumulate many directories with database files. Closes #69980
* Fix clean up of old entries in DatabaseRegistry.initialize (#70135)
  This change switches the clean up in DatabaseRegistry.initialize from Files.walk and stream operations to Files.walkFileTree, which can be made more robust in case of errors.
* Fix DatabaseRegistryTests (#70180)
  This test predefined the expected md5 hashes in constants, which matched the hashes produced with java15. However, java16 creates different md5 hashes, so the expected hashes didn't match the actual ones, which caused tests in this suite to fail (when running with java16 only). The tests now generate the expected md5 hash during the test instead of using predefined constants. Closes #69986
* Fix GeoIpDownloaderIT#testUseGeoIpProcessorWithDownloadedDBs(...) test (#70215)
  The test failure looks legit, because the same database could be downloaded twice. See the added comment in the DatabaseRegistry class. Relates to #69972
* Muted GeoIpDownloaderIT#testUseGeoIpProcessorWithDownloadedDBs(...) test, see #69972
Co-authored-by: Przemko Robakowski <przemko.robakowski@elastic.co>
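The switch to `Files.walkFileTree` matters because a `FileVisitor` can record a failure and keep going, whereas an exception thrown inside a `Files.walk` stream pipeline aborts the whole traversal. A minimal sketch of that pattern; `TmpDirCleaner` is a hypothetical name and this is not the DatabaseRegistry code itself.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: delete everything under root (keeping root itself),
// collecting paths that could not be removed instead of aborting on the first error.
final class TmpDirCleaner {
    static List<Path> deleteContents(Path root) throws IOException {
        List<Path> failed = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                try {
                    Files.delete(file);
                } catch (IOException e) {
                    failed.add(file);
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // The entry could not even be read; remember it and move on.
                failed.add(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) {
                if (exc != null) {
                    failed.add(dir);
                }
                // Directories are visited after their children, so they should be empty by now.
                if (!dir.equals(root)) {
                    try {
                        Files.delete(dir);
                    } catch (IOException e) {
                        failed.add(dir);
                    }
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return failed;
    }
}
```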
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue on Mar 16, 2021
This change adjusts where the geoip tmp directory is created, to avoid issues when running multiple nodes on the same machine. In the java tmp dir, a 'geoip-databases' directory is created, and directly under it a directory named after the node id is created. This allows multiple nodes to run safely on the same machine (which mainly happens during tests). Closes elastic#69972 Relates to elastic#68920
martijnvg added a commit that referenced this issue on Mar 17, 2021
This change adjusts where the geoip tmp directory is created, to avoid issues when running multiple nodes on the same machine. In the java tmp dir, a 'geoip-databases' directory is created, and directly under it a directory named after the node id is created. This allows multiple nodes to run safely on the same machine (which mainly happens during tests). Closes #69972 Relates to #68920
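The directory layout described above (`<java tmp dir>/geoip-databases/<node id>`) is easy to express with `java.nio.file.Path`. A minimal sketch; `GeoIpTmpDir` is a hypothetical helper, and the node id is assumed to be a plain string that is safe to use as a directory name.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: per-node geoip tmp dir under the JVM's tmp dir, so several
// nodes on one machine never share (and clobber) the same database files.
final class GeoIpTmpDir {
    static Path forNode(String javaTmpDir, String nodeId) {
        return Paths.get(javaTmpDir).resolve("geoip-databases").resolve(nodeId);
    }

    static Path forNode(String nodeId) {
        return forNode(System.getProperty("java.io.tmpdir"), nodeId);
    }
}
```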
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue on Mar 17, 2021
Backport of elastic#70462 to the 7.x branch. This change adjusts where the geoip tmp directory is created, to avoid issues when running multiple nodes on the same machine. In the java tmp dir, a 'geoip-databases' directory is created, and directly under it a directory named after the node id is created. This allows multiple nodes to run safely on the same machine (which mainly happens during tests). Closes elastic#69972 Relates to elastic#68920
martijnvg added a commit that referenced this issue on Mar 17, 2021
Backport of #70462 to the 7.x branch. This change adjusts where the geoip tmp directory is created, to avoid issues when running multiple nodes on the same machine. In the java tmp dir, a 'geoip-databases' directory is created, and directly under it a directory named after the node id is created. This allows multiple nodes to run safely on the same machine (which mainly happens during tests). Closes #69972 Relates to #68920
probakowski added a commit that referenced this issue on Mar 23, 2021
This change adds a _geoip/stats endpoint that can be used to collect basic data about the geoip downloader (successful, failed and skipped downloads, current db count and total time spent downloading). It also fixes missing or wrong origins for clients, which would break if used with security. Relates to #68920
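The counters the stats endpoint reports can be kept lock-free with atomics. This is a minimal sketch only: the class name, method names and JSON field names are illustrative assumptions, not the actual `_geoip/stats` response format, and it assumes failed attempts also count toward total download time.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of downloader statistics; field names are illustrative only.
final class GeoIpDownloaderStatsSketch {
    private final AtomicLong successfulDownloads = new AtomicLong();
    private final AtomicLong failedDownloads = new AtomicLong();
    private final AtomicLong skippedDownloads = new AtomicLong();
    private final AtomicLong totalDownloadTimeMillis = new AtomicLong();
    private final AtomicInteger databasesCount = new AtomicInteger();

    void downloadSucceeded(long tookMillis) {
        successfulDownloads.incrementAndGet();
        totalDownloadTimeMillis.addAndGet(tookMillis);
    }

    void downloadFailed(long tookMillis) {
        failedDownloads.incrementAndGet();
        totalDownloadTimeMillis.addAndGet(tookMillis);
    }

    // e.g. the remote md5 matched the stored one, so nothing was fetched
    void downloadSkipped() {
        skippedDownloads.incrementAndGet();
    }

    void setDatabasesCount(int count) {
        databasesCount.set(count);
    }

    String toJson() {
        // Locale.ROOT keeps digits ASCII regardless of the JVM's default locale.
        return String.format(Locale.ROOT,
            "{\"successful_downloads\":%d,\"failed_downloads\":%d,\"skipped_downloads\":%d,"
                + "\"databases_count\":%d,\"total_download_time\":%d}",
            successfulDownloads.get(), failedDownloads.get(), skippedDownloads.get(),
            databasesCount.get(), totalDownloadTimeMillis.get());
    }
}
```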
probakowski added a commit to probakowski/elasticsearch that referenced this issue on Mar 23, 2021
This change adds a _geoip/stats endpoint that can be used to collect basic data about the geoip downloader (successful, failed and skipped downloads, current db count and total time spent downloading). It also fixes missing or wrong origins for clients, which would break if used with security. Relates to elastic#68920
probakowski added a commit that referenced this issue on Apr 15, 2021
This PR adds documentation for the GeoIPv2 auto-update feature. It also renames the related settings from geoip.downloader.* to ingest.geoip.downloader.*, to follow the same convention as the current settings. Relates to #68920
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
probakowski added a commit that referenced this issue on Apr 15, 2021
* Enable GeoIP downloader by default (#71505)
  This change enables the GeoIP downloader by default. It removes the feature flag but adds a flag that tests use to disable it again (as we don't want to hammer the GeoIP database service with every test cluster we spin up). Relates to #68920
* fix compilation
* spotless
* packaging tests
* disableGeoIpDownloader
* fix packaging
probakowski added a commit to probakowski/elasticsearch that referenced this issue on Apr 15, 2021
This PR adds documentation for the GeoIPv2 auto-update feature. It also renames the related settings from geoip.downloader.* to ingest.geoip.downloader.*, to follow the same convention as the current settings. Relates to elastic#68920
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
probakowski added a commit that referenced this issue on Apr 15, 2021
* Update GeoIP processor documentation (#71211)
  This PR adds documentation for the GeoIPv2 auto-update feature. It also renames the related settings from geoip.downloader.* to ingest.geoip.downloader.*, to follow the same convention as the current settings. Relates to #68920
  Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
  Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
@probakowski should this be relabeled to
It should; I've updated it.
As all work for 7.x is done and there's just one task left, I'll close this issue as done.
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue on Oct 7, 2021
adjusted tests. Kept the `geolite2-databases` dependency for tests only. Relates to elastic#68920
martijnvg added a commit that referenced this issue on Oct 13, 2021
* Adjusted integration tests to use the geoip test fixture or test databases provided via config dirs (for the qa module / docs).
* Kept the geolite2-databases dependency only for most of the unit tests.
* Made the fallback_to_default_databases parameter on the geoip processor a no-op that emits a deprecation warning when used.
* If no geoip databases are available to a node yet, the geoip processor factory returns a processor implementation that tags documents to indicate that databases are unavailable. This allows these documents to be reindexed later with a pipeline. These documents will have a tag string array field containing the string _geoip_database_unavailable_{database_name} for each missing database in the pipeline.
* Added pipeline reload capabilities to IngestService, so that when databases become available again on a node, pipelines with a geoip processor definition can be reloaded.
Relates to #68920
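The "flag documents when databases are unavailable" behaviour can be sketched on a plain `Map` document. The field name `tags` is an assumption (the commit only says "a tag string array field"), and `DatabaseUnavailableTagger` is a hypothetical name; the real processor operates on an `IngestDocument`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: add one _geoip_database_unavailable_{name} tag per missing database.
final class DatabaseUnavailableTagger {
    static void tagMissing(Map<String, Object> document, List<String> requiredDatabases, Set<String> availableDatabases) {
        for (String db : requiredDatabases) {
            if (!availableDatabases.contains(db)) {
                // Assumes "tags" is either absent or a mutable list.
                @SuppressWarnings("unchecked")
                List<Object> tags = (List<Object>) document.computeIfAbsent("tags", k -> new ArrayList<>());
                String tag = "_geoip_database_unavailable_" + db;
                // Avoid duplicate tags if the document passes through the processor twice.
                if (!tags.contains(tag)) {
                    tags.add(tag);
                }
            }
        }
    }
}
```

Tagging rather than failing keeps the document indexable, so it can later be found by tag and reindexed once the database is present.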
This is a meta issue to track the progress of the GeoIPv2 work. These are the items currently identified that we need to complete for GA:
.tgz files from infra endpoint @probakowski (Add support for .tgz files in GeoIpDownloader #70725)