import_logs.py broken since 4.4.1 #319

Closed
jsandner opened this issue Aug 17, 2021 · 9 comments · Fixed by #324

@jsandner

Importing Apache log files with import_logs.py has not worked here since upgrading from Matomo 4.3.1 to 4.4.1.

The error message is:

Error when connecting to Matomo: 'utf-8' codec can't decode byte 0x80 in position 10: invalid start byte
  • The Matomo server runs version 4.4.1
  • On the import server I have two Matomo directories:
    • version 4.3.1 (works)
    • version 4.4.1 (error)
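
The error text says that the byte at offset 10 of whatever is being decoded is 0x80, which cannot start a UTF-8 sequence. A minimal sketch that reproduces the same message; the `payload` bytes are made up for illustration and are not the actual server response:

```python
# Hypothetical payload: ten ASCII bytes, then 0x80 at offset 10.
payload = b"0123456789" + b"\x80" + b"rest"

try:
    payload.decode("utf-8")
    message = None
except UnicodeDecodeError as exc:
    # exc.start is the offset of the offending byte; exc.reason explains why.
    message = str(exc)
    bad_offset = exc.start
    bad_byte = exc.object[exc.start]

print(message)
```

Since the message comes from decoding a byte string, the offset refers to that byte string and not necessarily to a position in the log file.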

Expected Behavior

Current Behavior

python3 /srv/www/vhosts/ANALYSE/matomo.mydomain/matomo/misc/log-analytics/import_logs.py --url=http://piwik.my.domain --enable-http-errors --enable-http-redirects --enable-bots --recorders=2 --idsite=16 --recorder-max-payload-size=50 --log-format-name=common_complete /srv/import/access_log-20210817

2021-08-17 09:28:16,766: [INFO] Error when connecting to Matomo: 'utf-8' codec can't decode byte 0x80 in position 10: invalid start byte
2021-08-17 09:28:16,767: [INFO] Retrying request, attempt number 2

Possible Solution

import_logs.py

response = opener.open(request, timeout = timeout)
#encoding = response.info().get_content_charset('utf-8')
result = response.read()
response.close()
#return result.decode(encoding)
return result

Omitting the decoding of result makes it work again.
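
Returning the raw bytes changes the return type from `str` to `bytes` for every caller. As a gentler alternative, here is a sketch of a decode step that never raises. This is an assumption for illustration, not what the project adopted; `decode_body` is a hypothetical helper and does not exist in import_logs.py:

```python
from typing import Optional


def decode_body(raw: bytes, charset: Optional[str] = None) -> str:
    """Decode a response body without raising on invalid byte sequences.

    Hypothetical helper: tries the declared charset (default utf-8) first,
    then falls back to utf-8 with U+FFFD replacement characters.
    """
    encoding = charset or "utf-8"
    try:
        return raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # LookupError covers an unknown charset name in the response header.
        return raw.decode("utf-8", errors="replace")
```

With this, callers always get a `str`, at the cost of silently replacing undecodable bytes, so it hides the symptom rather than explaining why the response was not valid text.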

Steps to Reproduce (for Bugs)

Context

Your Environment

  • Matomo Version: 4.4.1
  • PHP Version: 7.4.6
  • Python Version: 3.6.13
  • Server Operating System: SLES 15 SP3
  • Additionally installed plugins:
  • Browser:
  • Operating System:
@sgiehl
Member

sgiehl commented Aug 17, 2021

Thanks for the report. I'll move this to the log-importer repo.
I guess this was introduced with #316.
Does your Apache log file perhaps contain any unusual characters?
You could try passing the --encoding parameter with something other than utf-8 and check whether that changes anything.

@sgiehl sgiehl transferred this issue from matomo-org/matomo Aug 17, 2021
@sgiehl sgiehl added the bug label Aug 17, 2021
@jsandner
Author

Using --encoding=ascii or --encoding=iso-8859-1 makes no difference; it still doesn't work.
There may well be strange characters in the log file: hackers try odd URL parameters, and the requested URL is logged including its arguments.

@sgiehl
Member

sgiehl commented Aug 17, 2021

Could you try to identify the line that makes the importer fail and paste it here, so we can try to reproduce it?

@jsandner
Author

jsandner commented Aug 17, 2021

The importer command is:

python3 /srv/www/vhosts/ANALYSE/matomo.mydomain/matomo/misc/log-analytics/import_logs.py --url=http://piwik.my.domain --enable-http-errors --enable-http-redirects --enable-bots --recorders=2 --encoding=ascii --idsite=16 --recorder-max-payload-size=50 --log-format-name=common_complete apache.log

I get the error if apache.log contains this line:

www.my.domain 10.171.92.193 - - [16/Aug/2021:00:10:03 +0200] "GET /oberbayern/test.rss HTTP/1.0" 200 4496 "-" "axios/0.17.1"

If I replace .rss with .css, it works:

www.my.domain 10.171.92.193 - - [16/Aug/2021:00:10:03 +0200] "GET /oberbayern/test.css HTTP/1.0" 200 4496 "-" "axios/0.17.1"

@sgiehl
Member

sgiehl commented Aug 17, 2021

"if I replace .rss by .css it works:"

I guess that's because requests to static files such as CSS are ignored by default.

@metalocator

Any luck on this issue? We are seeing the same problem:

python3 misc/log-analytics/import_logs.py --replay-tracking-expected-tracker-file=piwik.php --replay-tracking --enable-http-errors --url=https://analytics.xxxxxxxxxx.com --debug-tracker --log-format-name=common_complete /var/log/apache2/other_vhosts_access.log.1
0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
Parsing log /var/log/apache2/other_vhosts_access.log.1...
2021-09-14 23:49:50,278: [INFO] Error when connecting to Matomo: 'utf-8' codec can't decode byte 0x80 in position 10: invalid start byte
2021-09-14 23:49:50,278: [INFO] Retrying request, attempt number 2
961 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
961 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)

@justinvelluppillai

Probably introduced by the encoding changes in #316.

@metalocator

Thanks for the reply. Is there any possible workaround?

Downgrade or revert this patch?

EreMaijala added a commit to EreMaijala/matomo-log-analytics that referenced this issue Nov 16, 2021
@EreMaijala
Contributor

If you have queued tracking enabled, see pull request #324. That's what seems to have been causing the issue for me.
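
The thread does not say what the server actually returned when queued tracking was enabled. One possibility, offered purely as an assumption, is that the body was an image rather than text: the header of a 1x1 GIF has the byte 0x80 at offset 10, which would match the reported error exactly. A short sketch of that observation; `looks_like_gif` is a hypothetical helper, not part of import_logs.py:

```python
import struct

# GIF header layout: 6-byte signature, logical screen width and height as
# little-endian 16-bit integers, then a packed flags byte. 0x80 in the flags
# byte means "a global color table follows".
gif_header = b"GIF89a" + struct.pack("<HHB", 1, 1, 0x80)


def looks_like_gif(body: bytes) -> bool:
    """Return True if the body starts with a GIF signature."""
    return body[:6] in (b"GIF87a", b"GIF89a")


try:
    gif_header.decode("utf-8")
    error_offset = None
except UnicodeDecodeError as exc:
    error_offset = exc.start

print(looks_like_gif(gif_header), error_offset)
```

If that assumption holds, checking the body (or the Content-Type header) before decoding would be enough to tell an image response from a text one; see pull request #324 for the fix that was actually merged.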
