{"cells": [{"cell_type": "markdown", "id": "db7f08c1", "metadata": {}, "source": ["## Open Government Data, provided by **Canton Zurich**\n", "*Autogenerated Python starter code for data set with identifier* **339@statistisches-amt-kanton-zuerich**"]}, {"cell_type": "markdown", "id": "164c4327", "metadata": {}, "source": ["## Dataset\n", "# **Grosse Betriebe (250+ VZ\u00c4) [Anz.]**"]}, {"cell_type": "markdown", "id": "46d13c8b-4ac3-4e2f-940f-0c14ec5b7c0d", "metadata": {}, "source": ["## Description\n", "\n", "Number of large establishments employing 250 or more full-time equivalents (FTE). Figures for the most recent year are provisional."]}, {"cell_type": "markdown", "id": "cad813bb-c986-4bb4-b52b-f78bb608086f", "metadata": {}, "source": ["## Data set links\n", "\n", "[Direct data shop link for dataset](https://www.zh.ch/de/politik-staat/statistik-daten/datenkatalog.html#/datasets/339@statistisches-amt-kanton-zuerich)"]}, {"cell_type": "markdown", "id": "9d4813e9", "metadata": {}, "source": ["## Metadata\n", "- **Issued** `2016-01-20T20:16:00`\n- **Modified** `2024-08-27T07:53:29`\n- **Startdate** `2011-12-31`\n- **Enddate** `None`\n- **Theme** `['http://publications.europa.eu/resource/authority/data-theme/ECON']`\n- **Keyword** `['betriebe', 'bezirke', 'gemeinden', 'kanton_zuerich', 'unternehmensstruktur', 'ogd']`\n- **Publisher** `['Statistisches Amt des Kantons Z\u00fcrich']`\n- **Landingpage** `https://www.zh.ch/de/politik-staat/gemeinden/gemeindeportraet.html`\n"]}, {"cell_type": "markdown", "id": "8a857d65", "metadata": {"jp-MarkdownHeadingCollapsed": true, "tags": []}, "source": ["## Imports and helper functions"]}, {"cell_type": "code", "execution_count": null, "id": "93b39602-1c1e-46d2-ae70-1716b1481e9b", "metadata": {"tags": []}, "outputs": [], "source": ["%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "plt.style.use('ggplot')\n", "\n", "params = {\n", "    'text.color': (0.25, 0.25, 0.25),\n", "    'figure.figsize': [18, 6],\n", "}\n", 
"\n", "plt.rcParams.update(params)\n", "\n", "import pandas as pd "]}, {"cell_type": "code", "execution_count": null, "id": "aa6611d7-e1c0-40a7-b0ff-601a8ef8439b", "metadata": {"tags": []}, "outputs": [], "source": ["# helper function for reading datasets with proper separator\n", "def get_dataset(url):\n", "    if url[-3:] != \"csv\":\n", "        print(\"The data set URL has no proper 'csv' extension. Reading the dataset might not have worked as expected.\\nPlease check the dataset link and adjust pandas' read_csv() parameters accordingly.\")\n", "    data = pd.read_csv(url, sep=\",\", on_bad_lines='warn', encoding_errors='ignore', low_memory=False)\n", "    # if dataframe only has one column or less the data is not comma separated, use \";\" instead\n", "    if data.shape[1] <= 1:\n", "        data = pd.read_csv(url, sep=';', on_bad_lines='warn', encoding_errors='ignore', low_memory=False)\n", "        if data.shape[1] <= 1:\n", "            print(\"The data wasn't imported properly. 
Very likely the correct separator couldn't be found.\\nPlease check the dataset manually and adjust the code.\")\n", "    return data"]}, {"cell_type": "markdown", "id": "02ce518f", "metadata": {}, "source": ["## Load data\n", "\n", "- The dataset has **`1` distribution(s)** in CSV format.\n", "- All available CSV distributions are listed below and can be read into a pandas dataframe."]}, {"cell_type": "code", "execution_count": null, "id": "0", "metadata": {"tags": []}, "outputs": [], "source": "# Distribution 0\n# Ktzhdistid               : 262\n# Title                    : Grosse Betriebe (250+ VZ\u00c4) [Anz.]\n# Description              : None\n# Issued                   : 2016-01-21T16:30:35\n# Modified                 : 2024-08-27T07:53:29\n\ndf = get_dataset('https://www.web.statistik.zh.ch/ogd/data/KANTON_ZUERICH_587.csv')\n\n"}, {"cell_type": "markdown", "id": "4ce1f78f", "metadata": {}, "source": ["## Analyze data"]}, {"cell_type": "code", "execution_count": null, "id": "3e3dab86", "metadata": {}, "outputs": [], "source": ["# drop columns that have no values\n", "df.dropna(how='all', axis=1, inplace=True)"]}, {"cell_type": "code", "execution_count": null, "id": "841bd8d2", "metadata": {}, "outputs": [], "source": ["print(f'The dataset has {df.shape[0]:,.0f} rows (observations) and {df.shape[1]:,.0f} columns (variables).')\n", "print(f'There seem to be {df.duplicated().sum()} exact duplicates in the data.')"]}, {"cell_type": "code", "execution_count": null, "id": "75e73c96", "metadata": {}, "outputs": [], "source": ["df.info(memory_usage='deep', verbose=True)"]}, {"cell_type": "code", "execution_count": null, "id": "02f3df4d", "metadata": {}, "outputs": [], "source": ["df.head()"]}, {"cell_type": "code", "execution_count": null, "id": "a0d7d898", "metadata": {}, "outputs": [], "source": ["# display a small random sample transposed in order to see all variables\n", "df.sample(3).T"]}, {"cell_type": "code", "execution_count": null, "id": "786806dd", 
"metadata": {}, "outputs": [], "source": ["# describe non-numerical features\n", "try:\n", "    with pd.option_context('display.float_format', '{:,.2f}'.format):\n", "        display(df.describe(exclude='number'))\n", "except Exception:\n", "    print(\"No categorical data in dataset.\")"]}, {"cell_type": "code", "execution_count": null, "id": "e744a6b6", "metadata": {}, "outputs": [], "source": ["# describe numerical features\n", "try:\n", "    with pd.option_context('display.float_format', '{:,.2f}'.format):\n", "        display(df.describe(include='number'))\n", "except Exception:\n", "    print(\"No numerical data in dataset.\")"]}, {"cell_type": "code", "execution_count": null, "id": "7a65d95d", "metadata": {}, "outputs": [], "source": ["# check missing values with missingno\n", "# https://github.com/ResidentMario/missingno\n", "import missingno as msno\n", "msno.matrix(df, labels=True, sort='descending');"]}, {"cell_type": "code", "execution_count": null, "id": "fcc604b7", "metadata": {}, "outputs": [], "source": ["# plot a histogram for each numerical feature\n", "try:\n", "    df.hist(bins=25, rwidth=.9)\n", "    plt.tight_layout()\n", "    plt.show()\n", "except Exception:\n", "    print(\"No numerical data to plot.\")"]}, {"cell_type": "code", "execution_count": null, "id": "e13cc6eb", "metadata": {}, "outputs": [], "source": ["# continue your code here..."]}, {"cell_type": "code", "execution_count": null, "id": "18886378", "metadata": {}, "outputs": [], "source": []}, {"cell_type": "code", "execution_count": null, "id": "59fce071", "metadata": {}, "outputs": [], "source": []}, {"cell_type": "code", "execution_count": null, "id": "f42f25aa", "metadata": {}, "outputs": [], "source": []}, {"cell_type": "markdown", "id": "c6b87d83", "metadata": {}, "source": ["**Contact**: Statistisches Amt des Kantons Z\u00fcrich | Data Shop | datashop@statistik.zh.ch"]}], "metadata": {"kernelspec": {"display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3"}, "language_info": 
{"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.15"}}, "nbformat": 4, "nbformat_minor": 5}