The traffic.csv
file and the traffic.db
SQLite database contain the same web traffic data ("events") from a few different pages ("links") over a period of 7 days including various categorical dimensions about the geographic origin of that traffic as well as a page's content.
Our goal is to understand this traffic better, in particular the volume and distribution of events, and to develop ideas how to increase the links' clickrates. With that in mind, please analyze the data using plain SQL statements as well as the Python Pandas library, providing answers to the following questions:
- [SQL + Pandas] How many total pageview events did the links in the provided dataset receive in the full period, how many per day?
- [SQL + Pandas] What about the other recorded events?
- [SQL + Pandas] Which countries did the pageviews come from?
- [SQL + Pandas] What was the overall click rate (clicks/pageviews)?
- [Pandas] How does the clickrate distribute across different links?
Please think about how to best convey this information to both technical and commercial stakeholders and prepare 3 different types of reports:
- A low-level, ad-hoc analysis for your product team (product manager + engineers), including code and results (e.g. in a Jupyter Notebook)
- A high-level presentation for commercial stakeholders, focussing on narrative and visualizations (e.g. with Google Slides). The team is particularly interested in what territories to focus on.
- A mockup for a dashboard (e.g. with Miro or Figma) or a live dashboard (e.g. with Tableau or Preset) highlighting relevant metrics.
We expect you to spend 2-3 hours to prepare these reports. Please work on them in the displayed order. In case you are not able to finish everything within 3 hours, this is not a problem and we will just talk through the rest. Besides, please don't hesitate to contact us in case you have any questions.