This analysis uses a server log file from a NASA fan website that generates a large amount of Internet traffic. I perform basic analytics on the log file, report useful metrics, and implement basic security measures. The analysis covers the following tasks:
- Clean the unstructured log data and load it into a dataframe.
- List the top 10 most active host/IP addresses that have accessed the site.
- Identify the 10 resources that consume the most bandwidth on the site.
- List the top 10 busiest (or most frequently visited) 60-minute periods.
- Detect three failed login attempts from the same IP address within a 20-second window, so that all further attempts to access the site from that IP address can be blocked for 5 minutes. Log those possible security breaches.
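
The blocking rule in the last task can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the project's actual implementation: the function name `find_blocked_attempts`, the `(ip, timestamp, failed)` event format, the reset of the failure count on a successful login, and the sample events are all hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(seconds=20)   # failures must fall within this span
BLOCK = timedelta(minutes=5)     # how long an IP stays blocked
MAX_FAILURES = 3                 # failures that trigger a block


def find_blocked_attempts(events):
    """Return (ip, timestamp) for every attempt made while its IP was blocked.

    `events` is an iterable of (ip, timestamp, failed) tuples sorted by
    timestamp; `failed` is True for a failed login (assumed input format).
    """
    recent_failures = defaultdict(deque)  # ip -> timestamps of recent failures
    blocked_until = {}                    # ip -> end of the current block
    blocked_attempts = []

    for ip, ts, failed in events:
        until = blocked_until.get(ip)
        if until is not None and ts < until:
            # The IP is currently blocked: log the attempt and move on.
            blocked_attempts.append((ip, ts))
            continue

        if not failed:
            # Assumption: a successful login resets the failure count.
            recent_failures[ip].clear()
            continue

        failures = recent_failures[ip]
        failures.append(ts)
        # Drop failures that are older than the 20-second window.
        while failures and ts - failures[0] > WINDOW:
            failures.popleft()

        if len(failures) >= MAX_FAILURES:
            blocked_until[ip] = ts + BLOCK
            failures.clear()

    return blocked_attempts


# Hypothetical events for illustration only.
t0 = datetime(1995, 7, 1, 0, 0, 0)
sample_events = [
    ("10.0.0.1", t0, True),
    ("10.0.0.1", t0 + timedelta(seconds=5), True),
    ("10.0.0.1", t0 + timedelta(seconds=10), True),   # third failure: block starts
    ("10.0.0.1", t0 + timedelta(seconds=60), False),  # inside the block: logged
    ("10.0.0.2", t0 + timedelta(seconds=61), True),   # different IP, unaffected
    ("10.0.0.1", t0 + timedelta(minutes=6), False),   # block has expired
]
blocked = find_blocked_attempts(sample_events)
```

A deque per IP keeps only the failures inside the window, so each event is processed in amortized constant time and the log can be streamed in a single pass.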

Libraries used:

- pandas
- NumPy
- re
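
With `re` and pandas, the cleaning step and the three top-10 metrics might look roughly like the sketch below. It assumes the log lines follow the Common Log Format, which the description above does not confirm; the pattern, the column names, the `parse_log` helper, and the sample lines are hypothetical. For simplicity, the busiest periods are computed over fixed clock-hour buckets rather than sliding 60-minute windows.

```python
import re

import pandas as pd

# Assumed line layout (Common Log Format):
# host - - [01/Jul/1995:00:00:01 -0400] "GET /path HTTP/1.0" 200 6245
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)$'
)
COLUMNS = ["host", "timestamp", "request", "status", "bytes"]


def parse_log(lines):
    """Parse raw log lines into a DataFrame, skipping lines that do not match."""
    matches = (LOG_PATTERN.match(line.strip()) for line in lines)
    rows = [m.groupdict() for m in matches if m]
    df = pd.DataFrame(rows, columns=COLUMNS)
    # Normalize all timestamps to UTC so mixed offsets cannot cause trouble.
    df["timestamp"] = pd.to_datetime(
        df["timestamp"], format="%d/%b/%Y:%H:%M:%S %z", utc=True
    )
    df["status"] = df["status"].astype(int)
    # A "-" byte count means no body was sent; treat it as 0 bytes.
    df["bytes"] = pd.to_numeric(df["bytes"], errors="coerce").fillna(0).astype(int)
    # The resource is the second token of the request line, e.g. "/index.html".
    df["resource"] = df["request"].str.split().str[1]
    return df


# Hypothetical sample lines for illustration only.
sample_lines = [
    'host-a.example.com - - [01/Jul/1995:00:00:01 -0400] "GET /index.html HTTP/1.0" 200 1000',
    'host-a.example.com - - [01/Jul/1995:00:10:00 -0400] "GET /images/logo.gif HTTP/1.0" 200 5000',
    'host-b.example.com - - [01/Jul/1995:01:05:00 -0400] "GET /index.html HTTP/1.0" 304 -',
    'not a log line',
]
df = parse_log(sample_lines)

# Top 10 most active hosts by number of requests.
top_hosts = df["host"].value_counts().head(10)

# Top 10 resources by total bytes transferred.
top_resources = df.groupby("resource")["bytes"].sum().nlargest(10)

# Top 10 busiest clock hours (a simplification of sliding 60-minute windows).
busiest_hours = (
    df.set_index("timestamp")["host"].resample("60min").count().nlargest(10)
)
```

Named groups in the pattern map directly onto dataframe columns, so adding a field to the regex is enough to carry it through to the analysis.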