Add support for Geo IP #2120

Closed
neilmunday opened this issue May 24, 2020 · 61 comments · Fixed by #3493

Comments

@neilmunday

neilmunday commented May 24, 2020

Is your feature request related to a problem? Please describe.
No

Describe the solution you'd like
It would be great if Promtail/Loki had a GeoIP feature like Logstash's. For example, a regex identifies IP addresses in a log message and a GeoIP lookup adds fields that store the location. The Grafana Worldmap plugin could then use these fields, though the plugin may also need updating.

Describe alternatives you've considered

ELK. It already has this feature -> LogStash Geo IP filter + Kibana world map.


@adityacs
Contributor

@cyriltovena @slim-bean This is a good feature to support. However, it requires us to package the GeoLite2 database file, or a similar one, alongside Promtail. We should also figure out whether the license allows us to do this.

WDYT?

@WarraxUA

nginx with the ngx_http_geoip_module module can write a geodata tag to the access log.
All that's needed is a new "Worldmap Panel Plugin" for Grafana that supports Loki as a data source.

nginx log WITH GEODATA TAG -> Promtail -> Loki -> Grafana

P.S. I’m just surprised that Grafana Labs hadn’t implemented such a simple thing even by the time Loki was announced.

@wardbekker
Member

Hi @WarraxUA and folks. I've been using a preview branch of the upcoming metrics and field extraction feature. This allowed me to build the dashboard below, with metrics on high-cardinality fields. For the Worldmap, I added the GeoIP module to nginx and added the country name to the log output. With the following expression I was able to sum by country_code as input for the Worldmap panel (the syntax is still subject to change, and it's a bit double-escaped):

sum by (country_code) (count_over_time({filename=\"/var/log/nginx/access.log\"} | regexp \"HTTP\\\\/1\\\\.1\\\" (?P<statuscode>\\\\d{3}) (?P<bytessent>\\\\d+) (?P<refferer>\\\".*?\\\") \\\"(?P<useragent>.*)\\\" \\\"(?P<country_code>.*)\\\"\"[$__interval]))

web_analytics_dashboard_4
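
For readers who want to check the regexp from the query above outside Loki, here is a minimal Python sketch. It assumes the expression with the double escaping removed, and the sample access-log lines are hypothetical (they only imitate an nginx format that ends with a quoted country code). The `Counter` loosely mirrors what `sum by (country_code) (count_over_time(...))` does over one interval; it is not how Loki evaluates the query.

```python
import re
from collections import Counter

# The regexp from the query above with the double escaping removed (assumption).
LINE_RE = re.compile(
    r'HTTP\/1\.1" (?P<statuscode>\d{3}) (?P<bytessent>\d+) '
    r'(?P<refferer>".*?") "(?P<useragent>.*)" "(?P<country_code>.*)"'
)

# Hypothetical nginx access-log lines whose last quoted field is the country code.
SAMPLE_LINES = [
    '203.0.113.7 - - [24/May/2020:10:00:00 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.68.0" "NL"',
    '198.51.100.9 - - [24/May/2020:10:00:01 +0000] "GET /a HTTP/1.1" 404 153 "-" "Mozilla/5.0" "DE"',
    '203.0.113.8 - - [24/May/2020:10:00:02 +0000] "GET /b HTTP/1.1" 200 99 "-" "curl/7.68.0" "NL"',
]


def count_by_country(lines):
    """Count matching log lines per extracted country_code."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:  # lines that do not match are skipped
            counts[match.group("country_code")] += 1
    return counts


if __name__ == "__main__":
    print(count_by_country(SAMPLE_LINES))
```

Note that this only works when the web server already writes the country code into the log line, which is the setup described in the comment above.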

@neilmunday
Author

Looking good!

@Wnthr

Wnthr commented Jul 1, 2020

That looks better than good to me. Is the preview branch you mention publicly available? If not, is there a time frame for when it will be? I am currently in the prototyping stage of my project, so running unreleased code isn't a concern.

@cyriltovena
Contributor

Here is the repo https://github.com/cyriltovena/demo/blob/master/logql/docker-compose.yaml#L8

There’s a small README, and I also gave a talk at GrafanaCon about this: https://grafana.com/go/grafanaconline/loki-future/ See the end of it; when you hear my weird and funny French accent, you've found it 😂

As for an ETA, that's hard to say: we’re still trying to make sure the syntax is easy to use and learn, as we will live with it forever.

So soon TM.

@stale

stale bot commented Aug 1, 2020

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Aug 1, 2020

@dfoxg

dfoxg commented Aug 1, 2020

Are there any updates? It's an important feature.

@stale stale bot removed the stale label Aug 1, 2020

@afletch

afletch commented Aug 12, 2020

Being able to enrich data, either upon collection in Promtail (via a plugin?) or when the data lands in Loki, is really very important.

@adeleglise

I would love seeing this too.

@neilmunday
Author

Any news?

@jonkristian

Here's just a thought: wouldn't it be preferable to enrich the data after the logs are collected? Then one would not have to add extra overhead to the web server. To be honest, I really don't know how much overhead GeoIP would add, but if you have many sites it could have an impact.

@neilmunday
Author

A comment to keep this issue open.

@WarraxUA

example dashboard from wardbekker1 with Geo_IP
https://grafana.com/grafana/dashboards/12559

@afletch

afletch commented Nov 16, 2020

The dashboards shared in this thread are very nice, but they don't address the issue highlighted by the OP, which is that there is currently no way to enrich data either within Promtail or at the point of ingestion into Loki. GeoIP is a good example of this, but it would apply to any enrichment of collected log data using external lookups.
So, if you have a GeoIP field in the source log data, extracting it (and displaying it) is easy enough. If you don't have GeoIP in the source, then adding this label data is not possible in the flow at present.

(This is possible using Fluentd as a client, but then you're stepping outside the stack.)

Hope this helps clarify what is being requested here, as things seem to have got muddied over time.

@neilmunday
Author

A comment to keep this issue open.

@M-JobPixel

I'm interested in how you did this @pmorange.
Was this enrichment en route to the block store or enrichment of the data before it was displayed in grafana?

@mora-phi

mora-phi commented Nov 3, 2021

Hi,
What I wanted to achieve was to display on a map in Grafana the location of the IPs blocked by my geo iptables rules (those IPs outside my country).
All of this was new to me, of course :-)

Here are the steps:

  • First, I log my iptables geo-blocking events to a file with ulogd (I am now thinking I could have done that directly with rsyslog, but I don't want to break everything now that it's working).

I have lines like these in the iptables log file:

Nov  3 15:41:22 alpine GeoIP(OUTSIDE OF ZONE):  IN=eth0 OUT=br-d323db5ba3a3 MAC=00:0c:29:d4:68:ca:74:da:88:20:30:0a:08:00 SRC=80.82.65.247 DST=172.21.0.2 LEN=44 TOS=00 PREC=0x00 TTL=247 ID=54166 PROTO=TCP SPT=58694 DPT=443 SEQ=1345397855 ACK=0 WINDOW=1024 SYN URGP=0 MARK=0
  • What Grafana needs is a geo code to display, so I needed to convert each IP to a country code (I didn't need anything more specific, like city details, so I kept this simple, but it would not be difficult to add that level of detail).
    So I used rsyslog along with lognormalizer to parse the iptables lines, add the geo details and output all of this to a new log file.
    I set up an account to download the MaxMind country DB and a crontab to update the DB each week.
    In an rsyslog configuration file I have this ruleset:
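ruleset(name="geoip_ruleset") {
    # Parse the iptables line into message properties such as $!SRC
    action(type="mmnormalize" rulebase="/etc/lognormalizer/iptables_rule.rb")
    if ($parsesuccess == "OK") then {
        # Look up the source IP and store the ISO country code under the name "code".
        # The template below reads $!src_geo!code, which assumes the mmdblookup
        # module is loaded with container="!src_geo" (its default is !iplocation).
        action(type="mmdblookup" mmdbfile="/var/lib/libmaxminddb/GeoLite2-Country.mmdb"
               fields=[":code:!country!iso_code"]
               key="!SRC")

        action(type="omfile" file="/var/log/ulogd_iptables_geoloc.log" template="iptablesgeoip")
    }
    else if ($parsesuccess == "FAIL") then {
        # Keep lines that could not be parsed, for debugging
        action(type="omfile" file="/tmp/parse-failure")
    }
}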

along with this template:

template(name="iptablesgeoip" type="string" string="%$!date%-%$!SRC%-%$!src_geo!code%\n")

And the lognormalizer rulebase is the following (file iptables_rule.rb):

prefix=%date:date-rfc3164% %host:word% %tag:char-to:\x3a%:
rule=:%-:iptables%

With the log line above, I end up with this line in a new log file (ulogd_iptables_geoloc.log):

Nov  3 15:41:22-80.82.65.247-NL

=> date + IP + two-letter country code

  • Now that I have the information written in real time to a log file, along with the geolocation I need, I set up promtail to read the file and send it to Loki:
- job_name: iptables
  static_configs:
    - targets:
        - localhost
      labels:
        job: iptables
        __path__: /var/log/ulogd_iptables_geoloc.log
  pipeline_stages:
    - match:
        selector: '{job="iptables"}'
        stages:
        - regex:
            expression: '^(?P<time_local>.*)-(?P<source_ip>.*)-(?P<country_code>.*)'
        - labels:
            time_local:
            source_ip:
            country_code:
  • In Grafana I now receive this kind of data:
    image

  • All that is left is to make a new panel of type WorldMap (I am actually using the Panodata Map Panel, but the former works well).
    I was not able to make the newest GeoMap panel work; don't ask me why, I just haven't managed to do it yet.
    The configuration of the panel is like this:
    image

And now I see this result in real time: people outside my country who get denied access to my home network :-)
(The 3 biggest circles are tests I did with a VPN, but you can see I get visits from other countries too, and that's just the data from the last 4 hours.)
image

That's quite a lot of steps to achieve something that could be implemented directly in Loki.
It was not so easy to make all of this work together, as documentation is often scarce, but at least now it works :-)
I have not talked about how I filter in iptables, that was another adventure entirely :-p
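
The regex stage in the promtail configuration above can be tried offline before deploying it. The following is a minimal Python sketch that applies the same expression to the sample output line from the steps above; the function name `extract_labels` is hypothetical, and Python's `re` module stands in for the regex engine promtail uses, which is an assumption that holds for this simple pattern.

```python
import re

# Same expression as in the promtail regex stage above.
STAGE_RE = re.compile(
    r'^(?P<time_local>.*)-(?P<source_ip>.*)-(?P<country_code>.*)'
)


def extract_labels(line):
    """Return the named groups for one log line, or None if it does not match."""
    match = STAGE_RE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # The sample line written by the rsyslog template.
    print(extract_labels("Nov  3 15:41:22-80.82.65.247-NL"))
```

Because the first group is greedy, the line is split at its last two hyphens, so the expression relies on the IP address and the country code never containing a hyphen themselves.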

@M-JobPixel

Thanks for the detailed explanation.

This is enriching before putting the data into the blockstore.

I think I will have to do something similar with logstash, but I would be interested in moving the enrichment to Loki so it's only done for the loglines we display and not all the loglines.

Of course I'm not sure that's possible. ;-)

@lfasci

lfasci commented Nov 6, 2021

This functionality would be very useful; if implemented in Loki, it would be independent of the agent.
In the meantime, another solution would be to use https://vector.dev/

@stale

stale bot commented Jan 9, 2022

Hi! This issue has been automatically marked as stale because it has not had any
activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project.
A stalebot can be very useful in closing issues in a number of cases; the most common
is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly sort for closed issues which have a stale label sorted by thumbs up.

We may also:

  • Mark issues as revivable if we think it's a valid issue but isn't something we are likely
    to prioritize in the future (the issue will still remain closed).
  • Add a keepalive label to silence the stalebot if the issue is very common/popular/important.

We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task,
our sincere apologies if you find yourself at the mercy of the stalebot.

@onedr0p

onedr0p commented Jan 9, 2022

I've moved from promtail to vector and have this working pretty well. Vector is highly configurable.

@kittydoor

This issue is still relevant

@mpadinhabrandao

ping

@yuangu

yuangu commented Apr 8, 2022

Would you like to try this? It's another Loki client:
https://github.com/tsaikd/gogstash

@lfasci

lfasci commented May 27, 2022

ping

@Nihiue

Nihiue commented Sep 6, 2022

Hi, I have a temporary solution and it works well:

https://github.com/Nihiue/loki-enhance-middleware

@onedr0p

onedr0p commented Sep 6, 2022

Damn, I'd just use Vector instead of promtail at that point @Nihiue but good stuff.

@R-Studio

R-Studio commented Nov 15, 2022

any news?
(maybe a feature for the grafana agent?)

@benisai

benisai commented Jan 1, 2023

Interesting. Can someone share the Vector config or a how-to? Or the other methods?

@svenvg93

svenvg93 commented Jan 5, 2023

@adityacs any new update on this?

@benisai

benisai commented Jan 5, 2023

I would also like an update

@madmurl0c

It would be great if there were some kind of adapter to get the GeoIP data into Grafana dashboards. I collect access logs from multiple nginx instances running in Docker containers (using the Docker Loki logging driver) and don't want to bloat every container by including the GeoIP module.

@benisai

benisai commented Jan 25, 2023

Ping - Keeping this alive.

@jrx-sjg

jrx-sjg commented Mar 14, 2023

Step by step guide to have GeoIP information in nginx logs available to loki:

1.- create an account on MaxMind for geolite2:
https://dev.maxmind.com/geoip/geolite2-free-geolocation-data?lang=en

2.- once logged in, create a license key for geolite2.

3.- in your nginx ingress configuration add (for an ingress nginx deployed using helm):

controller:
  config:
    use-geoip2: "true" 
    log-format-upstream: '$remote_addr [$geoip2_city_country_code $geoip2_city_country_name $geoip2_city $geoip2_postal_code $geoip2_latitude $geoip2_longitude] $host $remote_user [$time_local] $request $status $body_bytes_sent $http_referer $http_user_agent $request_length $request_time [$proxy_upstream_name] [$proxy_alternative_upstream_name]  $upstream_addr $upstream_response_length $upstream_response_time $upstream_status $req_id'
                    
  maxmindLicenseKey: "<your_license_key>"

4.- Once in loki, you can make a LogQL query like this:

{app="ingress-nginx", cluster="do-fra1-k8s-develop-jx3"} |~ "ModSecurity" != "transaction" | pattern "<date> <time> [<level>] <_> [client <client_ip>] ModSecurity: <modsec_action> (Value: `<score>' ) [file \"/etc/nginx/owasp-modsecurity-crs/rules/<rule_set>.conf\"] <_> [id \"<rule_id>\"] <_> [msg \"<message>\"] <_> [severity \"<severity>\"] <_> <_> <_> <_> <_> [uri \"<uri>\"] [unique_id \"<unique_id>\"] <_>, <_>, <_>, request: \"<request_method> <resource> <http_version>\", host: \"<host>\"" | line_format "{{.client_ip}} ID: {{.unique_id}} [{{.level | upper | trunc 5}}] {{.message}} - {{.request_method}} {{.host}}{{.uri}} (Payload: {{.resource}}) - {{.rule_set}}:{{.rule_id}} [Severity: {{.severity}} Score: {{.score}}]"

5.- (optional) You can also do the parsing in promtail, for better performance and versatility.

Keep in mind that your nginx installation must be properly configured to see the clients' real IP addresses; how to do that can vary among cloud providers.
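
The query in step 4 targets ModSecurity entries; the geo fields themselves sit in the bracketed block right after `$remote_addr` in the log format from step 3. As a rough illustration of how that block could be picked apart, here is a minimal Python sketch. The sample line is hypothetical and shortened, and the function name `parse_geo` is made up for this example. Since country and city names may contain spaces, the sketch only trusts the first token (country code) and the last two tokens (latitude and longitude).

```python
import re

# "$remote_addr [<geo fields>] ..." as laid out in the log format from step 3.
GEO_RE = re.compile(r'^(?P<remote_addr>\S+) \[(?P<geo>[^\]]*)\]')

# Hypothetical, shortened access-log line in that format.
SAMPLE = (
    "203.0.113.7 [NL Netherlands Amsterdam 1012 52.3740 4.8897] "
    "example.com - [14/Mar/2023:10:00:00 +0000] GET / HTTP/1.1 200 612"
)


def parse_geo(line):
    """Extract client IP, country code and coordinates, or None on failure."""
    match = GEO_RE.match(line)
    if not match:
        return None
    tokens = match.group("geo").split()
    if len(tokens) < 3:
        return None
    try:
        latitude, longitude = float(tokens[-2]), float(tokens[-1])
    except ValueError:  # e.g. the fields were logged as "-"
        return None
    return {
        "remote_addr": match.group("remote_addr"),
        "country_code": tokens[0],
        "latitude": latitude,
        "longitude": longitude,
    }


if __name__ == "__main__":
    print(parse_geo(SAMPLE))
```

A log format with explicit separators or JSON output would make this parsing less fragile; the positional approach here is only a workaround for the space-separated layout.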

@mora-phi

Thanks a lot! The other issue only says "Add geoip stage in promtail", and I could not find any documentation about this new feature anywhere. I may not have looked hard enough, that's another possible problem, hehehe.

@jrx-sjg

jrx-sjg commented Mar 14, 2023

I wrote that because I struggled HARD to get it working, and there is no clear documentation anywhere.

@DoTheEvo

DoTheEvo commented Mar 18, 2023

Got it working today. Used latest docker image for loki and promtail with this version main-0295fd4

The documentation was good enough since I already was familiar with pipeline stages of promtail.

I manually downloaded the city mmdb from MaxMind (~70 MB) and bind-mounted it into the promtail container. I'm not sure if there's a function to just give promtail a URL so that it updates the database regularly on its own, like OPNsense has. It would be cool if there were.

This is my promtail-config.yml for the Caddy reverse proxy. The data Caddy sends is JSON, so it's cleaner than regex and named groups, I guess.

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: caddy_access_log

    static_configs:
      - targets:
          - localhost
        labels:
          job: caddy_access_log
          host: example.com
          agent: caddy-promtail
          __path__: /var/log/caddy/*.log

    pipeline_stages:
      - json:
          expressions:
            remote_ip: request.remote_ip

      - geoip:
          db: "/etc/GeoLite2-City.mmdb"
          source: remote_ip
          db_type: "city"

And the funny thing is that of the two IPs I tried, one was resolved to a city in another country, although the country itself was reported as mine. So the free MaxMind database is not exactly shining, but it's good enough, I guess.

Now just to spend hours tinkering on getting that cool world map thing going on...
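
To see which value the json stage in the config above hands to the geoip stage, the `request.remote_ip` lookup can be imitated in a few lines of Python. This is only a sketch: the log entry is a hypothetical, abbreviated Caddy access-log line, the helper names are made up, and a plain dotted-path walk stands in for the expression language promtail actually uses.

```python
import ipaddress
import json

# Hypothetical, abbreviated Caddy access-log entry (JSON, one object per line).
SAMPLE_ENTRY = (
    '{"level":"info","msg":"handled request",'
    '"request":{"remote_ip":"203.0.113.7","method":"GET",'
    '"host":"example.com","uri":"/"},"status":200}'
)


def extract(entry, dotted_path):
    """Walk a dotted path such as 'request.remote_ip' through nested dicts."""
    node = entry
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def remote_ip_from_line(line):
    """Return the client IP as a string, or None if it is absent or invalid."""
    value = extract(json.loads(line), "request.remote_ip")
    if value is None:
        return None
    try:
        return str(ipaddress.ip_address(value))
    except ValueError:
        return None


if __name__ == "__main__":
    print(remote_ip_from_line(SAMPLE_ENTRY))
```

Checking the extracted value this way can help when the geoip stage produces no labels, since a missing or malformed `remote_ip` field is one possible cause.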
