Ability to filter dataset by fields or regex #117

Open
allinurl opened this issue May 21, 2014 · 49 comments
Comments

@allinurl
Owner

allinurl commented May 21, 2014

Add the ability to filter the results within the UI (Terminal & HTML) - e.g. filter by fields such as host, request, etc. then display only data matching that filter criteria, or enter a regex to match in the request and restrict display to only those matching entries.

Ideally this would spin up a new thread so multiple datasets can be analyzed at the same time. Each dataset should live on its own dashboard.

@aphorise

Out of curiosity - is this functionality and those referenced / related intended for the TUI only?

@allinurl
Owner Author

Good question. The original thought was to make these filters for the terminal; I hadn't thought much about having them available in the HTML output.

Not sure yet how this would work. Perhaps the user could set initial filters in the config file or, since there are plans to make the HTML output real-time, there could be some sort of subset filtering on the client side. Any thoughts?

@aphorise

A small related note - I think most of the rapid requests that are coming in for additional functionality, aggregation and related UI - would be better grouped in a separate argument. For example:

--rui 'regex,average_files,average_hits,host_servers...'

for Rich-User-Interface. Thereafter and into the future it can be included as part of standard views if it's common to most user expectations, or perhaps adaptively enabled based on the log file and the scheme therein that matches RUI options.

Regarding HTML - if you don't mind using jQuery & DataTables then for the specific purposes of sort / filter I'd recommend:
http://datatables.net/examples/api/regex.html

It's a 160 KB addition in JavaScript but worth it for what it does.
This would also give us a footing into other light / efficient jQ based libraries for additional UI and eye-candy as required.

If however you do not wish to have such dependencies - then we have our work cut out :-D

@kyberorg

I am also waiting for this. It would be great to have it.

@imclean557

@kyberorg you can help if you're that keen.

@vezaynk

vezaynk commented Apr 3, 2023

@imclean557 I'm sure there's plenty of people willing to help if there was an active branch. Your comment is most unhelpful.

@nietzscheanic

+1

@kuon

kuon commented Nov 19, 2023

Just to add some usage context and workaround on this.

I have a cluster of web servers, and I run goaccess like this on my central syslog-ng machine:

goaccess /var/log/hosts/*/nginx/*.log \
  --log-format='%^:%^:%^:%^: %v %h %^[%d:%t %^] "%r" %s %b %L "%R" "%u"' \
  --date-format=%d/%b/%Y --time-format=%T --persist --restore \
  --db-path /var/goaccess/db -o /var/goaccess/www/index.html \
  -o /var/goaccess/www/report.json

This aggregates all requests of the cluster and produces one global report. The current issue would need to be implemented to be able to select which vhost to see in the main report.

As a workaround, I create "per-vhost" logs like this:

VHOSTS="vods.kuon.ch www.kuon.ch"


for f in /var/log/hosts/*/nginx/*.log
do
  for vhost in $VHOSTS
  do
    # Create destination directory, named after the source host
    # (/var/log/hosts/<host>/nginx/<file>.log)
    host=$(basename "$(dirname "$(dirname "$f")")")
    outdir="/var/goaccess/vhosts/$vhost/$host"
    mkdir -p "$outdir"
    out="$outdir/$(basename "$f")"

    # Filter logs
    # NOTE: $vhost will be matched as regex, you may need escaping
    rg "^\w+ \d+ \d+:\d+:\d+ \S+ \S+ \S+ access: $vhost " "$f" > "$out"

  done
done

# Remove empty logfiles
find /var/goaccess/vhosts -size 0 -delete

# Remove empty dirs
find /var/goaccess/vhosts -type d -empty -delete

for vhost in $VHOSTS
do
  db="/var/goaccess/db_vhosts/$vhost"
  mkdir -p "$db"
  out="/var/goaccess/www/$vhost"
  mkdir -p "$out"
  goaccess "/var/goaccess/vhosts/$vhost"/*/*.log \
   --log-format='%^:%^:%^:%^: %v %h %^[%d:%t %^] "%r" %s %b %L "%R" "%u"' \
   --date-format=%d/%b/%Y --time-format=%T --persist --restore \
   --db-path "$db" -o "$out/index.html" \
   -o "$out/report.json"
done

It is a bit "quick & dirty" but it works for the time being.
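
Regarding the NOTE in the script about `$vhost` being matched as a regex: a minimal sketch of one way to escape it first. The helper name `escape_regex` is made up for illustration and is not part of goaccess or ripgrep; it only relies on `sed`, and the demo uses `grep -E` so it runs without `rg` installed.

```shell
# Hypothetical helper: backslash-escape regex metacharacters so that a
# vhost such as "www.kuon.ch" is matched literally (the dots no longer
# match any character).
escape_regex() {
  printf '%s' "$1" | sed 's/[][\.*^$+?(){}|]/\\&/g'
}

# Demo with two made-up log fragments: only the first one is printed,
# because the escaped dot does not match the "X".
vhost="www.kuon.ch"
pattern=$(escape_regex "$vhost")
printf '%s\n' "access: www.kuon.ch GET /" "access: wwwXkuon.ch GET /" \
  | grep -E "access: $pattern "
```

In the loop above, the escaped value would take the place of `$vhost` inside the `rg` pattern, while the unescaped `$vhost` is still used for directory names.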

@rwjack

rwjack commented Nov 19, 2023

To anyone still following this thread, I found it way easier to just use promtail and grafana, instead of reinventing the wheel with goaccess log parsing, storage, etc.

@kuon

kuon commented Nov 19, 2023

> To anyone still following this thread, I found it way easier to just use promtail and grafana, instead of reinventing the wheel with goaccess log parsing, storage, etc.

I beg to differ. I had a setup with grafana and loki but it was very hard to get some particular insight.

Sure, you can have one very nice panel with the stats, that you can look at, but it doesn't really tell you anything. With goaccess, when there is an issue I can just grep (rg) the logs and get the info I want.

Also, I switched to syslog-ng and it is so much better and easier than all the new fancy solutions like promtail. Don't get me wrong, I get why all those solutions exist (having to route logs through the internet, better scalability...), but for our use at our size, plain log files are just easier.

I don't think those tools are exclusive. You can use goaccess to generate a .json and inject those metrics in prometheus or other to browse them in grafana.

Finally, this kind of setup depends on many things: the number of servers, the criticality of the mission, the size of the team, the skills of the team... I can only advise trying what fits your situation best.

@nodiscc

nodiscc commented Jan 21, 2024

My use case is described in #2599 (I persist the database on-disk, so there is currently no way to remove a visitor from the Visitor Hostnames and IPs table, as exclude-ip will only prevent new visitors from being inserted in the persistent database, but the ones already inserted will be kept).

I understand that this issue is trying to be "generic" (i.e. being able to filter based on any field), and real-time (i.e. ability to set a filter from TUI, command-line, or HTML report) - but I feel the scope is too wide to actually be actionable/possible to implement (need to write different filter mechanisms for the TUI/CLI/HTML interfaces...)

@allinurl I think it would be good to establish a list of what users actually expect to achieve with this feature.

For me, a simple --exclude-ip-from-report $IP1,$IP2,$IP3,... command-line flag during one-shot HTML report generation would be sufficient.
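
Note that `--exclude-ip-from-report` is a proposed flag, not an existing one. For one-shot reports built from raw logs (not from an already persisted database), a rough equivalent can be sketched by dropping the unwanted lines before GoAccess reads them. The helper name `exclude_ips` is hypothetical, and the sketch assumes the client IP is the first whitespace-separated field, as in the common/combined log formats.

```shell
# Hypothetical helper: drop log lines whose first field is one of the
# given IPs. Assumes the client IP is the first whitespace-separated field.
# Usage: exclude_ips "IP1,IP2,..." < access.log
exclude_ips() {
  awk -v ips="$1" '
    # Build a lookup table from the comma-separated list
    BEGIN { n = split(ips, list, ","); for (i = 1; i <= n; i++) skip[list[i]] = 1 }
    # Print every line whose first field is not in the table
    !($1 in skip)
  '
}

# Possible use (assumes a combined-format log; goaccess reads stdin via "-"):
#   exclude_ips "10.0.0.5,192.168.1.10" < access.log \
#     | goaccess - --log-format=COMBINED -o report.html
```

This does not remove visitors that are already stored in a persisted database; it only affects what is parsed from the piped log.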

@Hufschmidt

It's been a while since I started monitoring this, but I think my main requirement was also to exclude certain fixed IPs from my monitoring host from appearing on the list.

@allinurl
Owner Author

@nodiscc, good observation. As I previously explained, it's currently not practical to implement a direct "exclude-ip-from-report" functionality when retrieving data from the persisted store because, at that stage, the data has already been processed. To introduce this feature, we need to restructure how data is stored, making it a bit more complex than a straightforward filtering process. Although there are challenges, progress is being made and the feature will be out sooner rather than later.

@Hufschmidt, you can achieve exclusion using -e, for example: -e 127.0.0.1. Are you aiming to exclude in real-time?

@bear0330

I have been waiting for a date filter for a long time.

@allinurl
Owner Author

@bear0330 hard at work on this feature! wait won't be in vain ;)

@seiz

seiz commented Jul 31, 2024

I am also eagerly awaiting this enhancement. I wish I could filter the HTML report at least by date (range), to be able to show stats for a specific date (I use persistence and my reports include several days/months).
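
Until a built-in date filter exists, one possible workaround is to cut the raw log down to a date range and generate a separate report from that. The following is only a sketch: `filter_dates` is a made-up helper, and it assumes each line carries a bracketed timestamp of the form `[dd/Mon/yyyy:HH:MM:SS ...]` with English month abbreviations.

```shell
# Hypothetical helper: keep only lines whose [dd/Mon/yyyy:...] timestamp
# falls within an inclusive date range given as YYYYMMDD.
# Usage: filter_dates 20240701 20240731 < access.log
filter_dates() {
  awk -v from="$1" -v to="$2" '
    BEGIN {
      # Map month abbreviations to two-digit numbers
      split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
      for (i = 1; i <= 12; i++) month[names[i]] = sprintf("%02d", i)
    }
    match($0, /\[[0-9][0-9]\/[A-Za-z][A-Za-z][A-Za-z]\/[0-9][0-9][0-9][0-9]:/) {
      stamp = substr($0, RSTART + 1, 11)   # dd/Mon/yyyy
      # Rebuild the date as yyyymmdd so it can be compared numerically
      key = substr(stamp, 8, 4) month[substr(stamp, 4, 3)] substr(stamp, 1, 2)
      if (key + 0 >= from + 0 && key + 0 <= to + 0) print
    }
  '
}

# Possible use (assumes a combined-format log; goaccess reads stdin via "-"):
#   filter_dates 20240701 20240731 < access.log \
#     | goaccess - --log-format=COMBINED -o july.html
```

Like the per-vhost workaround earlier in the thread, this operates on the log files themselves, so it cannot narrow down data that only exists in a persisted database.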
