This is not a typical PV/UV log analysis tool; it offers a different perspective on web logs. Based on its flexible and varied summaries, it provides fine-grained (down to one minute) troubleshooting and performance optimization for a specified time period.
- Python 3.4+
- pymongo-3.7.2+
- MongoDB-server 3.4+
First, let's clarify a couple of terms:

- `uri`: the current URI in the request (without the args/query string)
- `request_uri`: the full original request URI (with arguments)
- `args`: the arguments in the request line (following nginx terminology)
- `uri_abs` / `args_abs`: the strings produced by "abstracting" the uri and args, in order to classify them. E.g. "/sub/0/100414/4070?channel=ios&version=1.4.5" is abstracted to the uri_abs "/sub/*/*/*" and the args_abs "channel=*&version=*".
- Provides a unified log analysis portal through which you can view summary analyses of the logs a site generates on all servers, with filtering by time period and server name.
- Supports three main classification dimensions: `request_uri`, `ip` and `response_code`, each analyzed from three aspects: `hits`, `response_size` and `response_time`. Each dimension also has its own characteristics. `request_uri` analysis shows directly which kinds of requests come in large numbers, which are time-consuming and which take up the traffic; it can also show how each metric of a given kind of request is distributed over time at different granularities (minute, ten_min, hour, day), as well as the distribution of the different `args_abs` of a particular `uri_abs`. `ip` analysis divides all requests into three sources (`from_cdn/proxy`, `from_reverse_proxy`, `from_client_directly`), displays the top-N IP addresses by request count within each source, and can show the time distribution of each metric for a given IP, as well as the distribution of the different `uri_abs` generated by a specific IP.
- Core idea: the request_uri is abstractly categorized. The variable parts are represented as "*", so that the invariant parts can represent a specific type of request. It is really a different way of looking at logs: the "minimum analysis unit" changes from "a specific log line" to "a certain type of request".
- Supports both plaintext and JSON log formats.
- Easy to configure: no regex; simply copy the `log_format` defined in nginx into the config file.
- Describes `response_time` and `response_size` more accurately through quartiles (%50/%75/%100 points). In most cases the arithmetic mean is skewed by extreme values and does not represent the distribution of these values well; see the sketch after this list.
- Supports custom abstraction rules, with the flexibility to specify whether and how certain parts of a request are abstracted.
- Since the `log_analyse` script runs regularly on every web server (i.e. in a distributed fashion), its efficiency is important. In my test case (disk: 3×7200rpm RAID5, gigabit LAN), processing speed was 20,000-30,000 lines/s.
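To illustrate the quartile idea (a hypothetical helper, not the project's code), the summary reports the values below which 50%, 75% and 100% of the samples fall, instead of the mean:

```python
def quantile_summary(values):
    """Report the %50/%75/%100 points of a sample, in the style of the
    time_distribution(s) / bytes_distribution(B) columns shown below."""
    s = sorted(values)
    pick = lambda p: s[min(len(s) - 1, int(len(s) * p))]
    return {'%50': pick(0.50), '%75': pick(0.75), '%100': s[-1]}

# One extreme value (2.82 s) barely moves the quartiles, but it drags
# the arithmetic mean (~0.5 s) an order of magnitude above the median.
print(quantile_summary([0.01, 0.02, 0.03, 0.05, 0.06, 2.82]))
# -> {'%50': 0.05, '%75': 0.06, '%100': 2.82}
```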
log_analyse.py is deployed on the web servers and run via crontab. It uses Python's `re` module to parse every line, extracting the values of `uri`, `args`, `time_local`, `status`, `body_bytes_sent`, `request_time`, `server_name`, etc. These values are then processed into an intermediate result that is stored in MongoDB.
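A minimal sketch of the parsing idea, assuming a simplified `log_format` (this mirrors the "copy your nginx log_format into the config" feature, but it is not the project's actual implementation): each `$var` in the format string becomes a named regex group, and each line is matched against the resulting pattern.

```python
import re

# A simplified nginx log_format (the real one is copied from nginx.conf)
LOG_FORMAT = ('$remote_addr - - [$time_local] "$request" '
              '$status $body_bytes_sent $request_time')

def format_to_regex(log_format):
    """Escape literal regex characters, then turn each $var into (?P<var>...)."""
    pattern = re.sub(r'([.*+?^${}()|\[\]\\])', r'\\\1', log_format)
    pattern = re.sub(r'\\\$(\w+)', r'(?P<\1>.*?)', pattern)
    return re.compile(pattern + '$')

line_re = format_to_regex(LOG_FORMAT)
m = line_re.match('1.2.3.4 - - [09/Mar/2018:16:54:01 +0800] '
                  '"GET /sub/0/100414/4070?channel=ios HTTP/1.1" 200 61 0.030')
if m:
    print(m.group('request'), m.group('status'), m.group('request_time'))
# -> GET /sub/0/100414/4070?channel=ios HTTP/1.1 200 0.030
```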
log_show.py serves as the entry point for all final analysis/viewing, further processing the intermediate results.
Handling of abnormal requests
The main problem with splitting fields by spaces or double quotes is that abnormal records (which occur for a variety of reasons, in a variety of styles) can never be fully accounted for, so the project uses the `re` module rather than a simple `split()`. "Tolerable" abnormal records are handled by judgment logic in the code; for "intolerable" abnormal records, an empty string is returned and the log line is recorded in a file, as sketched below.
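A hedged sketch of that fallback (hypothetical names; `line_re` is a compiled pattern as in the parsing sketch above):

```python
def parse_line(line, line_re, error_file='/tmp/log_analyse_abnormal.log'):
    """Return a dict of parsed fields, or '' for an 'intolerable' record."""
    m = line_re.match(line)
    if m:
        return m.groupdict()
    # Intolerable record: keep the raw line on disk for later inspection
    with open(error_file, 'a') as f:
        f.write(line if line.endswith('\n') else line + '\n')
    return ''
```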
```
[ljk@demo ~]$ log_show --help
Usage:
  log_show <site_name> [options] request [distribution [<request>]|detail <uri>]
  log_show <site_name> [options] ip [distribution <ip>|detail <ip>]
  log_show <site_name> [options] error [distribution <error_code>|detail <error_code>]

Options:
  -h --help                 Show this screen.
  -f --from <start_time>    Start time. Format: %y%m%d[%H[%M]], %H and %M are optional
  -t --to <end_time>        End time. Format is same as --from
  -l --limit <num>          Number of lines in output, 0 means no limit. [default: 5]
  -s --server <server>      Web server hostname
  -g --group_by <group_by>  Group by every minute, every ten minutes, every hour or every day,
                            valid values: "minute", "ten_min", "hour", "day". [default: hour]

  distribution  Show distribution (hits, bytes, time, etc.) of:
                all or a specific 'request', a specific 'ip', or a specific 'error_code'
                in every period. The period is specified by --group_by
  detail        Show details of:
                'args' analysis of a specific 'uri' (if it has args);
                'uri' analysis of a specific 'ip' or 'error_code'

  Notice: it's best to put 'request_uri', 'uri' and 'ip' in quotation marks.
```
All subcommands support the `-f`, `-t` and `-s` options, which filter the results by start/end time and by a single server.
Analyze the data of a given site stored in MongoDB:
```
[ljk@demo ~]$ log_show api request -l 3
====================
Total_hits:999205 invalid_hits:581
====================
      hits  percent  time_distribution(s)         bytes_distribution(B)            uri_abs
    430210   43.06%  %50<0.03 %75<0.06 %100<2.82  %50<61 %75<63 %100<155           /api/record/getR
    183367   18.35%  %50<0.03 %75<0.06 %100<1.73  %50<196 %75<221 %100<344         /api/getR/com/*/*/*
    102299   10.24%  %50<0.02 %75<0.05 %100<1.77  %50<3862 %75<3982 %100<4512      /view/*/*/*/*.js
====================
Total_bytes:1.91 GB
====================
     bytes  percent  time_distribution(s)         bytes_distribution(B)            uri_abs
   1.23 GB   64.61%  %50<0.04 %75<0.1 %100<1.96   %50<17296 %75<31054 %100<691666  /api/NewCom/list
 319.05 MB   16.32%  %50<0.02 %75<0.05 %100<1.77  %50<3862 %75<3982 %100<4512      /view/*/*/*/*.js
 167.12 MB    8.55%  %50<0.19 %75<0.55 %100<2.93  %50<3078 %75<3213 %100<11327     /api/getR/com/*/*
====================
Total_time:117048s
====================
 cum. time  percent  time_distribution(s)         bytes_distribution(B)            uri_abs
     38747   33.10%  %50<0.03 %75<0.06 %100<2.82  %50<61 %75<63 %100<155           /api/record/getR
     22092   18.87%  %50<0.03 %75<0.06 %100<1.73  %50<196 %75<221 %100<344         /api/getR/com/*/*/*
     17959   15.34%  %50<0.19 %75<0.55 %100<2.93  %50<3078 %75<3213 %100<11327     /api/getRInfo/com/*/*
```
The above example shows the ranking along the three dimensions "hits/bytes/time", plus the distribution of response time and response size, within the specified period (by default from 00:00:00 of the current day until now). If a uri_abs has relatively few hits but returns larger responses or takes more time, it deserves closer attention.
Display IP-based analysis results:
```
[ljk@demo ~]$ log_show.py api ip -l 2
====================
From_cdn/Proxy:       hits    hits(%)  bytes      bytes(%)  time(%)
====================  199870  99.94    570.51 MB  99.99     99.99
  Last_cdn_ip
    xxx.57.xxx.189    1914    0.96     696.18 KB  0.12      0.68
    xxx.206.xxx.154   1741    0.87     1.56 MB    0.27      0.98
  User_ip_via_cdn
    xxx.249.xxx.56    787     0.39     154.82 KB  0.03      0.23
    xxx.60.xxx.86     183     0.09     1.05 MB    0.18      0.13
====================
From_reverse_proxy:   hits    hits(%)  bytes      bytes(%)  time(%)
====================  66      0.03     68.83 KB   0.01      0.01
  User_ip_via_proxy
    xxx.188.xxx.21    2       0.00     1.53 KB    0.00      0.00
    xxx.5.xxx.4       2       0.00     324.00 B   0.00      0.00
====================
From_client_directly: hits    hits(%)  bytes      bytes(%)  time(%)
====================  64      0.03     8.32 KB    0.00      0.00
  Remote_addr
    192.168.1.202     29      0.01     58.00 B    0.00      0.00
    192.168.1.200     29      0.01     58.00 B    0.00      0.00
```
The idea of IP analysis is to classify requests into three categories by source: "From_cdn/Proxy", "From_reverse_proxy" and "From_client_directly", then sort the IP addresses by request count within each category.
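Purely as an illustration of the idea (an assumption, not the project's actual classification logic), one plausible rule based on the X-Forwarded-For chain might look like this:

```python
def classify_source(remote_addr, x_forwarded_for=''):
    """ASSUMED rule: two or more X-Forwarded-For hops suggest CDN/proxy,
    exactly one suggests a reverse proxy, none means a direct client."""
    hops = [h.strip() for h in x_forwarded_for.split(',') if h.strip()]
    if len(hops) >= 2:
        return 'from_cdn/proxy', {'last_cdn_ip': hops[-1],
                                  'user_ip_via_cdn': hops[0]}
    if len(hops) == 1:
        return 'from_reverse_proxy', {'user_ip_via_proxy': hops[0]}
    return 'from_client_directly', {'remote_addr': remote_addr}

print(classify_source('10.0.0.2', '203.0.113.9, 198.51.100.7'))
# -> ('from_cdn/proxy', {'last_cdn_ip': '198.51.100.7',
#                        'user_ip_via_cdn': '203.0.113.9'})
```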
- Aggregate statistics for "all requests" or "specified uri/request_uri" at the granularity of "minutes/ten minutes/hours/days".
- Aggregate statistics for "specified IP" by "minute/ten minute/hour/day" granularity.
```
# Example 1: analyze the distribution of a specified request, aggregated by minute (5 rows shown by default)
[ljk@demo ~]$ python log_show.py api request distribution "/view/*/*.json" -g minute
====================
uri_abs: /view/*/*.json
Total_hits: 17130    Total_bytes: 23.92 MB
====================
      minute  hits  hits(%)  bytes    bytes(%)  time_distribution(s)        bytes_distribution(B)
201803091654  1543  9.01%    2.15 MB  8.98%     %50<0.03 %75<0.05 %100<1.07  %50<1593 %75<1645 %100<1982
201803091655  1527  8.91%    2.13 MB  8.88%     %50<0.04 %75<0.05 %100<1.04  %50<1592 %75<1642 %100<2143
201803091656  1464  8.55%    2.05 MB  8.57%     %50<0.04 %75<0.05 %100<1.03  %50<1592 %75<1642 %100<1952
201803091657  1551  9.05%    2.15 MB  8.97%     %50<0.03 %75<0.04 %100<0.89  %50<1594 %75<1639 %100<1977
201803091658  1458  8.51%    2.06 MB  8.61%     %50<0.03 %75<0.04 %100<2.35  %50<1596 %75<1644 %100<2146
```
The above example shows the distribution of "/view/*/*.json" over the specified time period, including hits/bytes/time and each metric's proportion of the total within each granularity bucket. This subcommand also reveals the "trend" of each metric over time.
Note:
- The `minute` field is the specified aggregation granularity; 201803091654 means "2018-03-09 16:54"
- The `-g` option specifies the aggregation granularity (minute/ten_min/hour/day)
- The `distribution` subcommand can be used with a specific "uri/request_uri" (showing how that uri/request_uri is distributed over time at the specified granularity), or without one (showing how all requests are distributed over time at the specified granularity)
```
# Example 2: analyze the number of requests/bandwidth generated by a specified IP over time;
# the default aggregation granularity is hour
[ljk@demo ~]$ python log_show.py api ip -t 180314 distribution "140.206.109.174" -l 0
====================
IP: 140.206.109.174
Total_hits: 10999    Total_bytes: 4.83 MB
====================
      hour  hits  hits(%)  bytes       bytes(%)
2018031306  1273  11.57%   765.40 KB   15.47%
2018031307  2133  19.39%   1004.74 KB  20.31%
2018031308  2211  20.10%   1.00 MB     20.74%
2018031309  2334  21.22%   1.05 MB     21.72%
2018031310  2421  22.01%   850.79 KB   17.20%
2018031311  627   5.70%    226.30 KB   4.57%
```
Note: `-l 0` means no limit on the number of output lines (all results are shown).
- Analyze a uri in detail to see the distribution of its different args
- Analyze a specific IP in detail to see how the requests it generates are distributed across different `uri_abs`

Applicable scenarios: for example, if a certain kind of `uri_abs` is found to be abnormal in some aspect (hits/bytes/time), the detail subcommand can drill into that `uri_abs` and pinpoint exactly which `args_abs` causes the anomaly. Or, if you observe abnormal access from an IP, you can drill down to see whether that IP requests uris broadly or is only interested in certain ones.
```
# Example 1:
[ljk@demo ~]$ python log_show.py api -f 180201 request detail "/recommend/update" -l 3
====================
uri_abs: /recommend/batchUpdate
Total_hits: 10069    Total_bytes: 7.62 MB
====================
hits  hits(%)  bytes      bytes(%)  time(%)  time_distribution(s)        bytes_distribution(B)       args_abs
4568  45.37%   3.46 MB    45.44%    47.96%   %50<0.06 %75<0.07 %100<0.47  %50<795 %75<845 %100<1484   uid=*&category_id=*&channel=*&version=*
4333  43.03%   3.25 MB    42.64%    42.30%   %50<0.05 %75<0.07 %100<0.48  %50<791 %75<840 %100<1447   category_id=*&channel=*&uid=*&version=*
 389  3.86%    314.15 KB  4.03%     0.88%    %50<0.03 %75<0.04 %100<0.06  %50<802 %75<850 %100<1203   category_id=*&channel=*&version=*
```
From the above example, you can observe the metrics of each args_abs under the uri "/recommend/update". An incidental finding is that the same parameter combination is not always written in the same order in the code; although this does not affect functionality, it causes trouble for accurate application performance monitoring.
Note: the `detail` subcommand is followed by a uri (without arguments; arguments are ignored if included).
```
# Example 2: analyze how many kinds of requests an IP generates and the metrics of each (hits/bytes/time)
[ljk@demo ~]$ python log_show.py m -t 180314 ip detail "1.2.3.4"
====================
IP: 140.206.109.174
Total_hits: 10999    Total_bytes: 4.83 MB
====================
 hits  hits(%)  bytes      bytes(%)  time(%)  uri_abs
10536  95.79%   405.47 KB  8.19%     92.01%   /introduction/watch
  147  1.34%    1.90 MB    39.31%    1.93%    /view/*/*.html
  138  1.25%    407.42 KB  8.23%     2.41%    /chapinfo/*/*.html
   42  0.38%    644.88 KB  13.03%    1.38%    /info/*.html
   30  0.27%    229.98 KB  4.65%     1.14%    /classify/*.json
```
The script is designed to be run from `crontab` on each web server; the final results are then viewed with log_show.py.

```
*/15 * * * * export LANG=zh_CN.UTF-8;python3 /home/ljk/log_analyse.py &> /tmp/log_analyse.log
```
Note:
- `uri_abs` and `args_abs` are abstractions of the uri and args; the default rules are:
  - uri: the request_uri is split into fields by "/" and "."; a field is abstracted to "*" if it consists entirely of digits
  - args: every value is replaced with "*"
- There are a few other interesting functions in `common/common.py`.