
Commit

version 3.0.0 - IP Geolocation integration
IP Geolocation integration, table & column renames, refinements - see changelog
WillTheFarmer committed Jan 27, 2025
1 parent 9ee4d95 commit c3232ee
Showing 20 changed files with 3,554 additions and 1,538 deletions.
21 changes: 17 additions & 4 deletions .github/CHANGELOG.md
@@ -9,6 +9,7 @@
- version 2.1.4 - 01/02/2025 - add import_device TABLE to separate import_client TABLE
- version 2.1.5 - 01/03/2025 - move platformNode column from import_client to import_device
- version 2.1.6 - 01/09/2025 - repository name change - ApacheLogs2MySQL to apache-logs-to-mysql
- version 3.0.0 - 01/28/2025 - IP Geolocation integration, several table & column renames, many process refinements - see changelog
- [1.0.1] apache_logs.error_systemCodeID corrected line - INTO logsystemCode to INTO logsystemCodeID
- [1.0.1] remove debugging - SELECT statement from apache_logs.process_access_import, process_error_import & normalize_useragent.
- [1.0.1] remove whitespace and commented out old code on all stored programs
@@ -33,12 +34,12 @@
- [2.0.0] add ServerName & ServerPort on import Combined & Error logs stage tables. Option allow adding domains to logs.
- [2.0.0] add ERROR_SERVERNAME,ERROR_SERVERPORT,COMBINED_SERVERNAME & COMBINED_SERVERPORT variables to settings.env.
- [2.0.0] add SET servername & serverport COLUMN values to LOAD DATA statements.
- [2.0.0] create log_referer, log_remotehost, log_servername, log_serverport TABLES to assoicate Access and Error logs.
- [2.0.0] create log_referer, log_remotehost, log_servername, log_serverport TABLES to associate Access and Error logs.
- [2.0.0] add server_name & server_port COLUMNS to import_file TABLE. Provides second option to update Apache logs without %v.
- [2.0.0] add compound indexes ACCESS_LOG and ERROR_LOG for ServerName and Serverport.
- [2.0.0] modify process_access_import & process_error_import to populate empty server_name & server_port with ServerName & ServerPort from import_file TABLE.
- [2.0.0] add WATCH_LOG to setting Log Level in watch4logs.py. 0=no messages, 1=message when files found, 2=message when checking for files & files found
- [2.0.0] add class bcolors to place RED BACKGROUND on all ERROR - messsages
- [2.0.0] add class bcolors to place RED BACKGROUND on all ERROR - messages
- [2.0.0] add file - mysql_user_and_grants.sql - MySQL USER and GRANTS file for CREATE USER apache_upload for Python module
- [2.0.0] add Start and End DATETIME to processLogs Function. Already had duration times.
- [2.0.0] add file - call_processes.sql - description and CALL command lines for 5 Stored Procedures
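The `WATCH_LOG` levels in the [2.0.0] entry behave like a threshold. A minimal Python sketch of that gating, assuming the level is read as a string from the environment; the message-kind names `checking` and `found` and both helper functions are hypothetical, not taken from `watch4logs.py`:

```python
import os
from typing import Mapping

# Hypothetical message kinds; the changelog describes them only in prose.
CHECKING = "checking"  # "checking for files" message
FOUND = "found"        # "files found" message

# Lowest WATCH_LOG level that shows each kind, per the changelog:
# 0 = no messages, 1 = files found, 2 = checking for files & files found.
MIN_LEVEL = {FOUND: 1, CHECKING: 2}


def watch_log_level(env: Mapping[str, str] = os.environ) -> int:
    """Read WATCH_LOG; a missing or non-numeric value falls back to 0 (assumption)."""
    try:
        return int(env.get("WATCH_LOG", "0"))
    except ValueError:
        return 0


def should_print(kind: str, level: int) -> bool:
    """True when a message of this kind is shown at the given level."""
    return level >= MIN_LEVEL[kind]
```

For example, at level 1 a "files found" message prints but a "checking for files" message does not, matching the changelog description.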
@@ -47,7 +48,7 @@
- [2.0.0] This version is the application baseline
- [2.1.0] add request_log_id to access and error formats functionality. Enables easier association with access and error records.
- [2.1.0] add columns to load_error_default & load_access_csv2mysql TABLES
- [2.1.0] modify process_error_parse - replace POSITION function with LOCATE function, removed unrequired brackets, add parsing logic for %v and %L String Formats.
- [2.1.0] modify process_error_parse - replace POSITION function with LOCATE function, removed not required brackets, add parsing logic for %v and %L String Formats.
- [2.1.0] modify process_error_import - add normalization for request_log_id, replace POSITION function with LOCATE function
- [2.1.0] modify process_access_parse - add parsing for request_log_id, replace POSITION function with LOCATE function
- [2.1.0] modify process_access_import - add normalization for request_log_id, replace POSITION function with LOCATE function
@@ -88,4 +89,16 @@
- [2.1.6] rename files - apachelogs2MySQL.py to logs2mysql.py, apachelogs2MySQL.sql to apache_logs_schema.sql
- [2.1.6] modify `logs2mysql.py` line `if useragent_process == 1:` to `if useragent_process >= 1:`
- [2.1.6] modify all files with refers to repository name. Changed `ApacheLogs2MySQL` to `apache-logs-to-mysql`
- [2.1.6] "application name" is still referred to as `ApacheLogs2MySQL` in `README.md`, `CITATION.md`, `logs2mysql.py`, `watch4logs.py`, `apache_logs_schema.sql`, `INSTALL.md` and `settings.env`
- [2.1.6] "application name" is still referred to as `ApacheLogs2MySQL` in `README.md`, `CITATION.md`, `logs2mysql.py`, `watch4logs.py`, `apache_logs_schema.sql`, `INSTALL.md` and `settings.env`
- [3.0.0] This version is NOT backward compatible to previous versions due to many database and process changes. These are final major changes required for Web interface in development.
- [3.0.0] Integration with MaxMind GeoIP2 Python API to enhance Client IP geolocation data for Log Data Visualization in charts, reports & data analysis interfaces.
- [3.0.0] modify `logs2mysql.py` to integrate IP data retrieval process and reorganizing encapsulation of all processes within the same "Import Load Process".
- [3.0.0] add TABLES `log_client_city`, `log_client_coordinate`, `log_client_country`, `log_client_network`, `log_client_organization` and `log_client_subdivision` for IP geolocation data.
- [3.0.0] add `normalize_client` STORED PROCEDURE to normalize IP Address geolocation data into 6 tables.
- [3.0.0] rename TABLES `log_clientname` to `log_client`, `log_servername` to `log_server`
- [3.0.0] rename COLUMNS `clientnameid` to `clientid`, `servernameid` to `serverid` throughout application tables and processes.
- [3.0.0] modify `process_access_parse` and `process_error_parse` WHERE CLAUSES for server_name UPDATE commands.
- [3.0.0] add 16 stored functions for log attribute tables to return names for Slice and dice is a data analysis in drill-down Web interface.
- [3.0.0] modify and reworded all console log messages in `logs2mysql.py` to standardize messages for each process. Added COLORS to coordinate message types for better readability.
- [3.0.0] modify all database INDEX NAMES for standardization and consolidation.
- [3.0.0] tested simultaneously uploading logs from 10 VPS with multiple VirtualHosts on each Server processing thousands of files in different formats and millions of log records.
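The [3.0.0] entries describe querying the MaxMind GeoIP2 Python API against the DB-IP Lite `IP to City` and `IP to ASN` databases and normalizing the result into six `log_client_*` tables. A minimal sketch of the lookup side only, assuming one flattened row per IP. The `ClientGeo` fields, the helper names and the field-to-table mapping in the comments are hypothetical and do not reproduce `normalize_client`; `geoip2.database.Reader`, its `city()` and `asn()` methods, and the response attributes read below are part of the `geoip2` package:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClientGeo:
    """One flattened geolocation row per client IP (hypothetical field names)."""
    ip: str
    country_code: Optional[str]    # -> log_client_country (assumed)
    country: Optional[str]         # -> log_client_country (assumed)
    subdivision: Optional[str]     # -> log_client_subdivision (assumed)
    city: Optional[str]            # -> log_client_city (assumed)
    latitude: Optional[float]      # -> log_client_coordinate (assumed)
    longitude: Optional[float]     # -> log_client_coordinate (assumed)
    network: Optional[str]         # -> log_client_network (assumed)
    asn: Optional[int]             # -> log_client_organization (assumed)
    organization: Optional[str]    # -> log_client_organization (assumed)


def open_readers(city_path: str, asn_path: str):
    """Open the DB-IP City and ASN .mmdb files (geoip2 imported lazily)."""
    import geoip2.database  # pip install geoip2

    return geoip2.database.Reader(city_path), geoip2.database.Reader(asn_path)


def lookup_client(ip: str, city_reader, asn_reader) -> ClientGeo:
    """Query both databases for one IP and flatten the two responses."""
    city = city_reader.city(ip)  # geoip2.models.City
    asn = asn_reader.asn(ip)     # geoip2.models.ASN
    return ClientGeo(
        ip=ip,
        country_code=city.country.iso_code,
        country=city.country.name,
        subdivision=city.subdivisions.most_specific.name,
        city=city.city.name,
        latitude=city.location.latitude,
        longitude=city.location.longitude,
        network=str(asn.network) if asn.network is not None else None,
        asn=asn.autonomous_system_number,
        organization=asn.autonomous_system_organization,
    )
```

Addresses missing from a database make `Reader.city()` or `Reader.asn()` raise `geoip2.errors.AddressNotFoundError`, so a batch loop over client IPs would catch that per address. Passing the readers in, rather than opening them inside `lookup_client`, keeps the `.mmdb` files open once for a whole import run.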
10 changes: 5 additions & 5 deletions .github/CONTRIBUTING.md
@@ -10,14 +10,14 @@ I volunteer for a nonprofit organization that wanted to import their Apache webs

First I installed the Apache log_sql_mysql modules which did create a single MySQL mostly empty table of the access log with no control or customization and many other issues. Next I experimented with several simple log file parsers but none normalized the parsed log data into a MySQL database. Finally I reviewed other available Apache logging solutions that didn't use MySQL including GoAccess, Logstash, Apache Viewer, DataDog and others as well as CrowdStrike and Solarwinds Loggly.

Mid-September 2024 after all my research I decided to write a simple solution which snowballed into a complete application. All October I worked long hours around the clock. November I spent incorporating the application into VPS websites and applications I oversee while making improvements along the way. Version 2.0.0 fixed the major issues encountered and is the application baseline. December I spent refining the major changes made in Version 2.0.0. Version 2.1.5 was last code change to fix client identification issue when OS version changes by adding `import_device` TABLE. The first week of January 2025 I spent processing millions of records from 10 VPS simultaneously to single MySQL Server.
Mid-September 2024 after all my research I decided to write a simple solution which snowballed into a complete application. All October I worked long hours around the clock. November I spent incorporating the application into VPS websites and applications I oversee while making improvements along the way. Version 2.0.0 fixed the major issues encountered and is the application baseline. December I spent refining the major changes made in Version 2.0.0. Version 2.1.5 was last code change to fix client identification issue when OS version changes by adding `import_device` TABLE.

Version 2.1.6 renames the repository, the 2 Python modules files and the MySQL schema creation script file. This version of application is production ready.
First 2 weeks of January 2025 I spent processing millions of records from 10 VPS simultaneously to single MySQL Server. Version 3.0.0 is last major change with IP Address geoLocation and a final pass through to fine tune processes and rename some tables and columns. This version of application is production ready.

The final version is less Python and more SQL and much faster processing millions of records. At this point, I have over 1050 hours of research, design, iteration & development into application. It is much more time then I intended to invest into this project but it did produce my first open-source software.

That's how volunteering, lack of a viable MySQL solution and a flexible schedule came together just right to allow me to dive deep into this project.

### “Timing, degree and conviction are the three wise men in this life.” — Robert I. Fitzhenry

The final version is less Python and more SQL and much faster processing millions of records. At this point, I have over 950 hours of research, design, iteration & development into application. It is much more time then I intended to invest into this project but it did produce my first open-source software.

Monetary contributions made will be reflected in development of Web Interface with Drill Down Capability and [apache/echarts](https://github.com/apache/echarts) Log Visualization integration for this MySQL `apache_logs` schema. Web Interface will be released in separate repository.
Monetary contributions made will be reflected in development of [Web Interface](https://github.com/WillTheFarmer/mysql-to-apache-echarts) for this MySQL `apache_logs` schema.
14 changes: 10 additions & 4 deletions .github/INSTALL.md
@@ -39,6 +39,7 @@ Python module links & install command lines for each platform. Single quotes aro
|[user-agents](https://pypi.org/project/user-agents/)|pip install pyyaml ua-parser user-agents|sudo apt-get install python3-user-agents|python3 -m pip install user-agents|[selwin/python-user-agents](https://github.com/selwin/python-user-agents)|
|[watchdog](https://pypi.org/project/watchdog/)|pip install watchdog|sudo apt-get install python3-watchdog|python3 -m pip install watchdog|[gorakhargosh/watchdog](https://github.com/gorakhargosh/watchdog/tree/master)|
|[python-dotenv](https://pypi.org/project/python-dotenv/)|pip install python-dotenv|sudo apt-get install python3-dotenv|python3 -m pip install python-dotenv|[theskumar/python-dotenv](https://github.com/theskumar/python-dotenv)|
|[geoip2](https://pypi.org/project/geoip2/)|pip install geoip2|sudo apt-get install python3-geoip2|python3 -m pip install python-geoip2|[maxmind/GeoIP2-python](https://github.com/maxmind/GeoIP2-python)|

### 4. Settings.env steps
First rename the settings.env file to .env
@@ -64,15 +65,15 @@ ERROR_LOG=2
ERROR_PATH=C:\Users\farmf\Documents\apacheLogs\**/*error*.*
ERROR_RECURSIVE=1
ERROR_PROCESS=2
ERROR_SERVERNAME=yourdomain.com
ERROR_SERVERPORT=443
ERROR_SERVER=errordomain.com
ERROR_SERVERPORT=911
COMBINED=1
COMBINED_LOG=2
COMBINED_PATH=C:\Users\farmf\Documents\apacheLogs\combined\**/*access*.*
COMBINED_RECURSIVE=1
COMBINED_PROCESS=2
COMBINED_SERVERNAME=yourdomain.com
COMBINED_SERVERPORT=443
COMBINED_SERVER=combodomain.com
COMBINED_SERVERPORT=311
VHOST=1
VHOST_LOG=2
VHOST_PATH=C:\Users\farmf\Documents\apacheLogs\vhost\**/*access*.*
@@ -86,6 +87,11 @@ CSV2MYSQL_PROCESS=2
USERAGENT=1
USERAGENT_LOG=2
USERAGENT_PROCESS=1
GEOIP2=1
GEOIP2_LOG=2
GEOIP2_CITY=C:\Users\farmf\Downloads\ip_databases\dbip-city-lite-2025-01.mmdb
GEOIP2_ASN=C:\Users\farmf\Downloads\ip_databases\dbip-asn-lite-2025-01.mmdb
GEOIP2_PROCESS=1
```
### 6. Run Application
If MySQL steps completed successfully, successfully installed Python modules, renamed file `settings.env` to `.env`, and updated MySQL server connection and log folder variables it is time to run application.
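The new `GEOIP2*` variables in `.env` could be gathered into one typed object before any lookup runs. A sketch assuming `GEOIP2=1` switches the feature on and that the numeric settings parse as integers; the class and function names are hypothetical, and in the application the `.env` file would first be loaded into the environment with python-dotenv's `load_dotenv()`:

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class GeoIP2Settings:
    enabled: bool   # GEOIP2 (assumed: "1" means on)
    log_level: int  # GEOIP2_LOG
    city_db: str    # GEOIP2_CITY, DB-IP "IP to City" .mmdb path
    asn_db: str     # GEOIP2_ASN, DB-IP "IP to ASN" .mmdb path
    process: int    # GEOIP2_PROCESS


def load_geoip2_settings(env: Mapping[str, str] = os.environ) -> GeoIP2Settings:
    """Build GeoIP2 settings from environment variables."""
    settings = GeoIP2Settings(
        enabled=env.get("GEOIP2", "0") == "1",
        log_level=int(env.get("GEOIP2_LOG", "0")),
        city_db=env.get("GEOIP2_CITY", ""),
        asn_db=env.get("GEOIP2_ASN", ""),
        process=int(env.get("GEOIP2_PROCESS", "0")),
    )
    if settings.enabled:
        # Both DB-IP Lite databases are required when the feature is on.
        missing = [
            name
            for name, path in (("GEOIP2_CITY", settings.city_db),
                               ("GEOIP2_ASN", settings.asn_db))
            if not path
        ]
        if missing:
            raise ValueError(f"missing setting(s): {', '.join(missing)}")
    return settings
```

Raising as soon as `GEOIP2=1` is set with an empty database path fails fast, before any log files are loaded.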
29 changes: 12 additions & 17 deletions .github/README.md
@@ -3,31 +3,25 @@
ApacheLogs2MySQL consists of two Python Modules & one MySQL Schema ***apache_logs*** to automate importing Access & Error files
and normalizing data into database designed for reports & data analysis.

Runs on Windows, Linux and MacOS & tested with MySQL versions 8.0.39, 8.4.3, 9.0.0 & 9.1.0.

Imports Access Logs in LogFormats - ***common***, ***combined*** and ***vhost_combined*** & additional ***csv2mysql***
LogFormat defined :point_down:

Imports Error Logs in ***default*** ErrorLogFormat & ***additional*** ErrorLogFormat defined below performing data harmonization
on Apache Codes & Messages, System Codes & Messages, and Log Messages to create a unified, standardized dataset.
Error Log view images :point_down:

Three options to associate ServerName & ServerPort with Access and Error logs missing `%v - canonical ServerName`
and `%p - canonical ServerPort` Format Strings described :point_down:
Imports Error Logs in ***default*** ErrorLogFormat & ***additional*** ErrorLogFormat defined below performing data harmonization on Apache Codes & Messages, System Codes & Messages, and Log Messages to create a unified, standardized dataset. Error Log view images :point_down:

4 LogFormats & 2 ErrorLogFormats can be loaded and 6 MySQL Stored Procedures can be processed in a single Python `ProcessLogs function` execution.
All processing stages are encapsulated within one "Import Load" that captures process metrics, notifications and errors into MySQL import tables. Every log data record is traceable back to the computer, folder, file, load process, parse process and import process it came from.

Database Schema ***apache_logs*** designed to accommodate unlimited servers & domains. Step-by-step guide for easy installation :point_down:
Multiple Access and Error logs and formats can be loaded, parsed and imported along with User Agent parsing and IP Address geoLocation retrieval in a single execution. A single execution can also be configured to only load logs to Server.
### Console Process Messages - 4 LogFormats, 2 ErrorLogFormats & 6 MySQL Stored Procedures
![Processing Messages Console](./assets/processing_messages_console.png)
New version has [MaxMind GeoIP2](https://github.com/maxmind/GeoIP2-python) Python API integration with 5 additional MySQL tables for IP geoLocation data. Two DB-IP Lite databases are required - `IP to City` and `IP to ASN`. Free DB-IP Lite databases can be found at [DB-IP](https://db-ip.com/db/lite.php)

The accompanying visualization tool for the MySQL Schema ***apache_logs*** is [MySQL2ApacheECharts](https://github.com/willthefarmer/mysql-to-apache-echarts)
created in a separate repository. The Web interface consists of Express.js web application frameworks with Drill Down Capability & Apache ECharts frameworks for Data Visualization.
A visualization tool for the MySQL Schema ***apache_logs*** is [MySQL2ApacheECharts](https://github.com/willthefarmer/mysql-to-apache-echarts) and currently under development. The Web interface consists of Express.js web application frameworks with Drill Down Capability & [Apache ECharts](https://github.com/apache/echarts) frameworks for Data Visualization.

New version with [MaxMind GeoIP2](https://github.com/maxmind/GeoIP2-python) Python API integration will be released end of January
with 5 additional tables for IP geolocation data. Tables are shown in updated diagram :point_down:
Database Schema ***apache_logs*** designed to accommodate unlimited servers & domains. Step-by-step guide for easy installation :point_down:
## Entity Relationship Diagram of apache_logs schema tables
![Entity Relationship Diagram](./assets/entity_relationship_diagram.png)
Diagram created with open-source database diagrams editor [chartdb/chartdb](https://github.com/chartdb/chartdb)
## Application Description
## Application runs on Windows, Linux and MacOS
This is a fast, reliable processing application with detailed logging and two stages of data parsing.
First stage is performed in `LOAD DATA LOCAL INFILE` statements.
Second stage is performed in `process_access_parse` and `process_error_parse` Stored Procedures.
@@ -49,7 +43,7 @@ All folder paths, filename patterns, logging, processing, MySQL connection setti

Two Python Client modules can run in PM2 daemon process manager for 24/7 online processing on multiple web servers feeding a single Server module simultaneous.

Application is developed with Python 3.12, MySQL and 4 Python modules. Modules are listed with Python Package Index link,
Application is developed with Python 3.12, MySQL and 5 Python modules. Modules are listed with Python Package Index link,
install command for each platform & GitHub Repository link.
## Four Supported Access Log Formats
Apache uses same Standard Access LogFormats (***common***, ***combined***, ***vhost_combined***) on all 3 platforms. Each LogFormat adds 2 Format Strings to the prior.
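The statement that each standard LogFormat adds two Format Strings to the prior one can be seen in a single pattern: ***combined*** appends `"%{Referer}i" "%{User-agent}i"` to ***common***, and ***vhost_combined*** prefixes `%v:%p`. The Python regex below is only an illustration of that layering, not how the application parses (it uses `LOAD DATA LOCAL INFILE` and stored procedures). It assumes quoted fields contain no escaped quotes and that `%v` contains no colon; the function name is hypothetical:

```python
import re
from typing import Optional

# One pattern for common, combined and vhost_combined lines:
#   combined       = common + ' "%{Referer}i" "%{User-agent}i"'
#   vhost_combined = "%v:%p " + combined
ACCESS_LINE = re.compile(
    r'^(?:(?P<server_name>[^:\s]+):(?P<server_port>\d+) )?'        # %v:%p
    r'(?P<remote_host>\S+) (?P<logname>\S+) (?P<remote_user>\S+) '  # %h %l %u
    r'\[(?P<time>[^\]]+)\] '                                        # %t
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+)'        # "%r" %>s bytes
    r'(?: "(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)")?$'          # combined extras
)


def parse_access_line(line: str) -> Optional[dict]:
    """Return the named fields of one access log line, or None if it doesn't match."""
    match = ACCESS_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None
```

Fields a format does not carry come back as `None`, which mirrors why ***common*** and ***combined*** logs need another source for ServerName and ServerPort.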
@@ -144,7 +138,7 @@ To use this format place `ErrorLogFormat` before `ErrorLog` in `apache2.conf` to
|%v|The canonical ServerName of the server serving the request.|
|%L|Log ID of the request. A %L format string is also available in `mod_log_config` to allow to correlate access log entries with error log lines. If [mod_unique_id](https://httpd.apache.org/docs/current/mod/mod_unique_id.html) is loaded, its unique id will be used as log ID for requests.|

## Three options to attach ServerName & ServerPort to Access & Error logs
## Three options to associate ServerName & ServerPort to Access & Error logs
Apache LogFormats - ***common***, ***combined*** and Apache ErrorLogFormat - ***default*** do not contain `%v - canonical ServerName` and `%p - canonical ServerPort`.

In order to consolidate logs from multiple domains `%v - canonical ServerName` is required and `%p - canonical ServerPort` is optional.
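When a log lacks `%v` and `%p`, the values have to come from somewhere else. A sketch of one plausible fallback order across the three options, assuming the value in the log line wins, then the `.env` setting (for example `ERROR_SERVER` / `ERROR_SERVERPORT`), then the `server_name` / `server_port` stored on the `import_file` row. The function name and this exact precedence are assumptions, not the application's stored-procedure logic:

```python
from typing import Optional, Tuple


def resolve_server(
    record_name: Optional[str],
    record_port: Optional[str],
    setting_name: Optional[str],
    setting_port: Optional[str],
    file_name: Optional[str],
    file_port: Optional[str],
) -> Tuple[Optional[str], Optional[str]]:
    """Pick ServerName and ServerPort for one log record.

    Assumed precedence, first non-empty value wins:
      1. %v / %p written by Apache into the log line
      2. *_SERVER / *_SERVERPORT from .env, applied at load time
      3. server_name / server_port stored on the import_file row
    """
    def first(*values: Optional[str]) -> Optional[str]:
        return next((v for v in values if v), None)

    return (first(record_name, setting_name, file_name),
            first(record_port, setting_port, file_port))
```

Name and port are resolved independently, so a record that carries `%v` but no `%p` still picks up a port from a later source, consistent with `%p` being optional.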
@@ -187,6 +181,7 @@ command line under '2. Python Steps' below. If that works you are all set.
|[user-agents](https://pypi.org/project/user-agents/)|pip install pyyaml ua-parser user-agents|sudo apt-get install python3-user-agents|python3 -m pip install user-agents|[selwin/python-user-agents](https://github.com/selwin/python-user-agents)|
|[watchdog](https://pypi.org/project/watchdog/)|pip install watchdog|sudo apt-get install python3-watchdog|python3 -m pip install watchdog|[gorakhargosh/watchdog](https://github.com/gorakhargosh/watchdog/tree/master)|
|[python-dotenv](https://pypi.org/project/python-dotenv/)|pip install python-dotenv|sudo apt-get install python3-dotenv|python3 -m pip install python-dotenv|[theskumar/python-dotenv](https://github.com/theskumar/python-dotenv)|
|[geoip2](https://pypi.org/project/geoip2/)|pip install geoip2|sudo apt-get install python3-geoip2|python3 -m pip install python-geoip2|[maxmind/GeoIP2-python](https://github.com/maxmind/GeoIP2-python)|

## Installation Instructions
Steps make installation quick and straightforward. Application will be ready to import Apache logs on completion.
Binary file modified .github/assets/call_processes.png
Binary file modified .github/assets/check_domain_columns.png
Binary file modified .github/assets/entity_relationship_diagram.png
Binary file modified .github/assets/load_settings_variables.png
Binary file modified .github/assets/mysql_user_and_grants.png
Binary file added .github/assets/processing_messages_console.png
Binary file modified .github/assets/settings.png