Log parser not streaming from stdin #282

Open
matt9mg opened this issue Oct 26, 2020 · 7 comments

Comments

matt9mg commented Oct 26, 2020

Running the code below from the documentation doesn't seem to pass anything to stdin.

This is placed in the vhost:

LogFormat "%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" matomoLogFormat
  CustomLog ${APACHE_LOG_DIR}/matomo_input.log matomoLogFormat

  CustomLog "||/var/www/html/import_logs.py \
	--debug --enable-http-errors --enable-http-redirects --enable-bots \
	--url=http://XXXXXXXXXX --output=${APACHE_LOG_DIR}/matomo.log --recorders=1 \
	--recorder-max-payload-size=1 --token-auth=XXXXXXXXXXXXXXXXXXXXXXXXXXXX --idsite=3 --log-format-name=common_complete -" matomoLogFormat

This is the output from the log file:

2020-10-26 16:30:24,118: [DEBUG] Accepted hostnames: all
2020-10-26 16:30:24,120: [DEBUG] Matomo Tracker API URL is: http://XXXXXXXXXX
2020-10-26 16:30:24,121: [DEBUG] Matomo Analytics API URL is: http://XXXXXXXXXX
2020-10-26 16:30:24,122: [DEBUG] Authentication token token_auth is: XXXXXXXXXXXXXX
2020-10-26 16:30:24,123: [DEBUG] Resolver: static
2020-10-26 16:30:24,201: [DEBUG] Launched recorder

Logs import summary
-------------------

    0 requests imported successfully
    0 requests were downloads
    0 requests ignored:
        0 HTTP errors
        0 HTTP redirects
        0 invalid log lines
        0 filtered log lines
        0 requests did not match any known site
        0 requests did not match any --hostname
        0 requests done by bots, search engines...
        0 requests to static resources (css, js, images, ico, ttf...)
        0 requests to file downloads did not match any --download-extensions

Website import summary
----------------------

    0 requests imported to 1 sites
        1 sites already existed
        0 sites were created:

    0 distinct hostnames did not match any existing site:



Performance summary
-------------------

    Total time: 0 seconds
    Requests imported per second: 0.0 requests per second

Processing your log data
------------------------

    In order for your logs to be processed by Matomo, you may need to run the following command:
     ./console core:archive --force-all-websites --force-all-periods=315576000 --force-date-last-n=1000 --url='http://XXXXXXX'

However, if I run the command below from a file via cron, it works as expected, and the log output shows the data being sent to my Matomo instance:

/var/www/html/import_logs.py --debug --enable-http-errors --enable-http-redirects --enable-bots --url=http://XXXXX --output=/var/log/apache2/matomo.log --recorders=1 --recorder-max-payload-size=1 --token-auth=XXXXXXXXXX --idsite=3 --log-format-name=common_complete /var/log/apache2/site.log

Author

matt9mg commented Oct 26, 2020

Checking the error.log file, I can see this:

AH00106: piped log program '/var/www/html/import_logs.py --debug --enable-http-errors --enable-http-redirects --enable-bots --url=http://XXXXXXXXXX --output=/var/log/apache2/matomo.log --recorders=1 --recorder-max-payload-size=1 --token-auth=XXXXXXXXXXXXXXXX --idsite=3 --log-format-name=common_complete -' failed unexpectedly

But when I check /var/log/apache2/matomo.log, I just get the same output as above.
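One way to narrow down an AH00106 failure is to do outside Apache what the piped CustomLog does: start the command and write log lines to its stdin, then look at the exit code and stderr. The following is only a sketch under assumptions: `run_piped`, the stand-in child command, and the sample log line are hypothetical and not taken from import_logs.py.

```python
import subprocess
import sys


def run_piped(command, log_lines):
    """Run `command`, feed `log_lines` to its stdin, and return the result."""
    payload = "".join(line + "\n" for line in log_lines)
    return subprocess.run(
        command,
        input=payload,        # written to the child's stdin, then stdin is closed
        capture_output=True,  # collect stdout and stderr for inspection
        text=True,
        check=False,          # a non-zero exit code is reported, not raised
    )


if __name__ == "__main__":
    # Stand-in child that counts the lines it receives on stdin.
    # Swap in the real piped command from the vhost as a list of arguments.
    command = [sys.executable, "-c", "import sys; print(sum(1 for _ in sys.stdin))"]
    # Hypothetical line in the shape of the matomoLogFormat shown above.
    sample = [
        'example.org 203.0.113.5 - - [26/Oct/2020:16:30:24 +0000] '
        '"GET / HTTP/1.1" 200 512 "-" "curl/7.68.0"'
    ]
    result = run_piped(command, sample)
    print("exit code:", result.returncode)
    print("stdout:", result.stdout.strip())
    print("stderr:", result.stderr.strip())
```

A non-zero exit code or a traceback on stderr here would point at the script rather than at Apache's piping.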

@Findus23
Member

Hi,

I don't know Apache at all, so I can't help here. But just for your information: this section of the README was written 8 years ago, so it may well no longer work that way. If you find out more, it would be great if you could create a PR that fixes it or, if it turns out not to work at all, removes it.

Author

matt9mg commented Oct 26, 2020

Using the same piped CustomLog setup in the Apache vhost with a PHP script in place of import_logs.py, the log lines do arrive on stdin.

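<?php

// Read log lines from stdin as Apache writes them to the pipe.
$stdin = fopen('php://stdin', 'rb');
ob_implicit_flush(true);
// Compare against false so that a falsy string such as "0" cannot end the loop early.
while (($line = fgets($stdin)) !== false) {
    $line = trim($line);
    // Append each line, newline-terminated, to tmp.txt next to this script.
    file_put_contents(__DIR__ . '/tmp.txt', $line . PHP_EOL, FILE_APPEND);
}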

But adding some debug logging to the .py script shows that nothing is being passed to Python. My Python skills aren't that strong, so I may need someone else to help with this, as it's a big requirement.
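For comparison, the same probe can be written in Python, which would show whether Apache's pipe reaches a Python process at all, independently of import_logs.py. This is a minimal sketch, not part of the repository: `copy_lines` and the file names are hypothetical. The demo at the bottom feeds it an in-memory stream so it runs anywhere; under Apache one would pass `sys.stdin` and a path Apache can write to.

```python
#!/usr/bin/env python3
import io
import os
import tempfile


def copy_lines(stream, path):
    """Append every line read from `stream` to `path` and return the line count."""
    count = 0
    # buffering=1 is line buffering, so each line reaches the file as it arrives.
    with open(path, "a", buffering=1) as out:
        for line in stream:
            out.write(line.rstrip("\n") + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Demo with an in-memory stream and a temporary file.
    # As a piped-log program this would be: copy_lines(sys.stdin, "<writable path>")
    target = os.path.join(tempfile.mkdtemp(), "stdin_probe.txt")
    received = copy_lines(io.StringIO("first line\nsecond line\n"), target)
    print(received, "lines written to", target)
```

If this probe receives lines under Apache while import_logs.py does not, the problem is more likely in how the script starts up than in the pipe itself.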

Author

matt9mg commented Oct 27, 2020

The latest version supplied in this repository (which is different from the one bundled with the Matomo application you download from the website) doesn't work at all:

AH00106: piped log program '/usr/bin/python3 /var/www/html/import_logs.py --debug --enable-http-errors --enable-http-redirects --enable-bots --url=http://XXXXXXXX --output=/var/log/apache2/matomo.log --recorders=1 --recorder-max-payload-size=1 --token-auth=XXXXXXXXXXXXXXX --idsite=3 --log-format-name=common_complete -' failed unexpectedly
Traceback (most recent call last):
  File "/var/www/html/import_logs.py", line 2661, in <module>
    config = Configuration()
  File "/var/www/html/import_logs.py", line 1024, in __init__
    self._parse_args(self._create_parser(), argv)
  File "/var/www/html/import_logs.py", line 934, in _parse_args
    sys.stdout = sys.stderr = open(self.options.output, 'a+', 0)
ValueError: can't have unbuffered text I/O

But this does support my theory that the buffer passed to Python is empty.

keykey7 commented Dec 1, 2020

Similar issue here: the --output option seems to be broken.
Dropping it and redirecting stdout manually helped in my case.

AdUser commented Feb 3, 2021

> The latest version supplied in this repository (which is different from the one bundled with the Matomo application you download from the website) doesn't work at all:
>
> AH00106: piped log program '/usr/bin/python3 /var/www/html/import_logs.py --debug --enable-http-errors --enable-http-redirects --enable-bots --url=http://XXXXXXXX --output=/var/log/apache2/matomo.log --recorders=1 --recorder-max-payload-size=1 --token-auth=XXXXXXXXXXXXXXX --idsite=3 --log-format-name=common_complete -' failed unexpectedly
> Traceback (most recent call last):
>   File "/var/www/html/import_logs.py", line 2661, in <module>
>     config = Configuration()
>   File "/var/www/html/import_logs.py", line 1024, in __init__
>     self._parse_args(self._create_parser(), argv)
>   File "/var/www/html/import_logs.py", line 934, in _parse_args
>     sys.stdout = sys.stderr = open(self.options.output, 'a+', 0)
> ValueError: can't have unbuffered text I/O
>
> But this does support my theory that the buffer passed to Python is empty.

It's a Python 3 migration issue: the third argument to open() is the buffering mode, and 0 (unbuffered) isn't allowed for text-mode I/O. It can be fixed with a small patch:

        if self.options.output:
-            sys.stdout = sys.stderr = open(self.options.output, 'a+', 0)
+            sys.stdout = sys.stderr = open(self.options.output, 'a+')
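The error can be reproduced in isolation, which also shows one possible variation on the patch. The sketch below is an illustration, not the project's code: `open_output` is a hypothetical helper, and using line buffering (`1`) instead of the default buffering chosen by the patch above is an assumption about what the original unbuffered call was meant to achieve.

```python
import os
import tempfile


def open_output(path):
    """Open `path` for appending in text mode with line buffering."""
    # buffering=1 selects line buffering, which text mode allows.
    # buffering=0 (unbuffered) is only valid with a binary mode such as 'ab'.
    return open(path, "a+", 1)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "matomo.log")

    try:
        open(path, "a+", 0)  # the call from the traceback
    except ValueError as exc:
        print("Python 3 rejects it:", exc)

    with open_output(path) as log:
        log.write("line written\n")
        # Read the file back before `log` is closed or flushed explicitly.
        with open(path) as check:
            print(repr(check.read()))
```

With default buffering, as in the patch, output may sit in memory until the buffer fills or the file is closed; line buffering writes each completed line out, at the cost of more frequent writes.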

Member

sgiehl commented Feb 3, 2021

@AdUser Would you mind creating a small PR for that, so someone from the team can review and merge it? Thanks.
