diff --git a/notebooks/jupyter-timesketch-demo.ipynb b/notebooks/jupyter-timesketch-demo.ipynb new file mode 100644 index 0000000000..ead7a64308 --- /dev/null +++ b/notebooks/jupyter-timesketch-demo.ipynb @@ -0,0 +1,1115 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "mZWOhSLNRInc" + }, + "source": [ + "# Timesketch and Jupyter\n", + "\n", + "This is a small notebook that is built to demonstrate how to interact with Timesketch from a jupyter notebook in order to do some additional exploration of the data.\n", + "\n", + "Jupyter can greatly complement investigations by providing the analyst with access to the powers of using python to manipulate the data stored in Timeskech. Additionally it provides developers with the ability to do research on the data in order to speed up developments of analyzers, aggregators and graphing. The purpose of this notebook is simply to briefly introduce the powers of jupyter notebooks to analysts and developers, with the hope of inspiring more to take advantage of this powerful platform. It is also possible to use [colab](https://colab.research.google.com) to do these explorations, and a colab copy of this notebook can be found [here](https://colab.research.google.com/github/google/timesketch/blob/master/notebooks/colab-timesketch-demo.ipynb).\n", + "\n", + "Each code cell (denoted by the [] and grey color) can be run simply by hitting \"shift + enter\" inside it. In order to run the notebook you'll need to install the jupyter notebook on your machine, and start it (with a python3 kernel). Then you can connect to your local runtime. It is also possible to use [mybinder](https://mybinder.org/v2/gh/google/timesketch/master?urlpath=notebooks) to start up a docker instance with the jupyter notebook. Remember, you can easily add new code cells, or modify the code that is already there to experiment.\n", + "\n", + "## README\n", + "\n", + "If you want to have your own copy of the notebook to make some changes or do some other experimentation you can simply select \"File / Save as\" button.\n", + "\n", + "If you want to connect the notebook to your own Timesketch instance (that is if it is not publicly reachable) you simply run the jupyter notebook binary on a machine that can reach your instance, and configure the SERVER/USER/PASSWORD parameters below to match yours.\n", + "\n", + "Once you have your local runtime setup you should be able to reach your local Timesketch instance.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "lWnyv25sRUyk" + }, + "source": [ + "# Import Libraries\n", + "\n", + "We need to start by importing some libraries that we'll use in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "nh4pwmG-RI2w" + }, + "outputs": [], + "source": [ + "import altair as alt # For graphing.\n", + "import numpy as np # Never know when this will come in handy.\n", + "import pandas as pd # We will be using pandas quite heavily.\n", + "\n", + "from timesketch_api_client import client" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "uMYsDkCgYK5y" + }, + "source": [ + "## Connect to TS" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "9IUzNc8yRanl" + }, + "source": [ + "And now we can start creating a timesketch client. The client is the object used to connect to the TS server and provides the API to interact with it.\n", + "\n", + "This will connect to the public demo of timesketch, you may want to change these parameters to connect to your own TS instance." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "colab": {}, + "colab_type": "code", + "id": "1QQFoUFWRP4N" + }, + "outputs": [], + "source": [ + "SERVER = 'https://demo.timesketch.org'\n", + "USER = 'demo'\n", + "PASSWORD = 'demo'\n", + "\n", + "ts_client = client.TimesketchApi(SERVER, USER, PASSWORD)\n", + "\n", + "# To enable altair renderers in the notebook.\n", + "alt.renderers.enable('notebook')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "aKC_0-qlfzYA" + }, + "source": [ + "(hint the above cell is just a small piece of code, you can see the code by clicking the \"three dots\" and Form/Show code )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "pX1B1gccRv9L" + }, + "source": [ + "### Let's Explore\n", + "And now we can start to explore. The first thing is to get all the sketches that are available. Most of the operations you want to do with TS are available in the sketch API." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "QN2r9x3uRvRG" + }, + "outputs": [], + "source": [ + "sketches = ts_client.list_sketches()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "r6Qmi4v_SGX1" + }, + "source": [ + "Now that we've got a lis of all available sketches, let's print out the names of the sketches as well as the index into the list, so that we can more easily choose a sketch that interests us." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "Wn0zDL6SRuYY" + }, + "outputs": [], + "source": [ + "for i, sketch in enumerate(sketches):\n", + " print('[{0:d}] {1:s}'.format(i, sketch.name))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "c2ykfp9unWUK" + }, + "source": [ + "Another way is to create a dictionary where the keys are the names of the sketchces and values are the sketch objects." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "GzQjPxiem7di" + }, + "outputs": [], + "source": [ + "sketch_dict = dict((x.name, x) for x in sketches)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "J8Rvpi-XnEZ8" + }, + "outputs": [], + "source": [ + "sketch_dict" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "NK0EqQ-1Siaa" + }, + "source": [ + "Let's now take a closer look at some of the data we've got in the \"Greendale\" investigation." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "lT6Oh0GRSFg1" + }, + "outputs": [], + "source": [ + "gd_sketch = sketch_dict.get('The Greendale incident - 2019', sketches[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "KM5Gum5kWfim" + }, + "source": [ + "Now that we've connected to a sketch we can do all sorts of things.\n", + "\n", + "Try doing: `gd_sketch.`\n", + "\n", + "In colab you can use TAB completion to get a list of all attributes of the object you are working with. See a function you may want to call? Try calling it with `gd_sketch.function_name?` and hit enter.. let's look at an example:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "e7oEZ80sYzc7" + }, + "outputs": [], + "source": [ + "gd_sketch.explore?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "8h2S3dXBY6c0" + }, + "source": [ + "This way you'll get a list of all the parameters you may want or need to use. You can also use tab completion as soon as you type, `gd_sketch.e` will give you all options that start with an `e`, etc.\n", + "\n", + "You can also type `gd_sketch.explore()` and get a pop-up with a list of what parameters this function provides.\n", + "\n", + "But for now, let's look at what views are available to use here:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "AYgCmg_yZOO7" + }, + "outputs": [], + "source": [ + "views = gd_sketch.list_views()\n", + "\n", + "for index, view in enumerate(views):\n", + " print('[{0:d}] {1:s}'.format(index, view.name))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "Tt5SWKzgZZe5" + }, + "source": [ + "You can then start to query the API to get back results from these views. Let's try one of them...\n", + "\n", + "Word of caution, try to limit your search so that you don't get too many results back. The API will happily let you get all the results back as you choose, but the more records you get back the longer the API call will take (10k events per API call). " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "yY8jk_UzSpCE" + }, + "outputs": [], + "source": [ + "# You can change this number if you would like to test out another view.\n", + "# The way the code works is that it checks first of you set the \"view_text\", and uses that to pick a view, otherwise the number is used.\n", + "view_number = 1\n", + "view_text = '[phishy_domains] Phishy Domains'\n", + "\n", + "if view_text:\n", + " for index, view in enumerate(views):\n", + " if view.name == view_text:\n", + " view_number = index\n", + " break\n", + "\n", + "print('Fetching data from : {0:s}'.format(views[view_number].name))\n", + "print( ' Query used : {0:s}'.format(views[view_number].query_string if views[view_number].query_string else views[view_number].query_dsl))\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "zLgLBXXMlDKa" + }, + "source": [ + "If you want to issue this query, then you can run the cell below, otherwise you can change the view_number to try another one." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "MmlF6oYcj8wh" + }, + "outputs": [], + "source": [ + "greendale_frame = gd_sketch.explore(view=views[view_number], as_pandas=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "jEki5_BmZpKu" + }, + "source": [ + "Did you notice the \"`as_pandas=True`\" parameter that got passed to the \"`explore`\" function? That means that the data that we'll get back is a pandas DataFrame that we can now start exploring. \n", + "\n", + "Let's start with seeing how many entries we got back." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "1_fjRL4XZ-XW" + }, + "outputs": [], + "source": [ + "greendale_frame.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "XWjrJnivaSo9" + }, + "source": [ + "This tells us that the view returned back 670 events with 12 columns. Let's explore the first few entries, just so that we can wrap our head around what we got back." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "ymR_NtseaRrO" + }, + "outputs": [], + "source": [ + "greendale_frame.head(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "bdPUgvNtl82r" + }, + "source": [ + "Let's look at what columns we got back... and maybe create a slice that contains fewer columns." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "zuu7VFCAmB9e" + }, + "outputs": [], + "source": [ + "greendale_frame.columns" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "Dqv6gXlKmEUW" + }, + "outputs": [], + "source": [ + "greendale_slice = greendale_frame[['datetime', 'timestamp_desc', 'tag', 'message', 'label']]\n", + "\n", + "greendale_slice.head(4)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "D-5xReYEHxKc" + }, + "source": [ + "Since this is a result from the analyzers we have few extra fields we can pull in. \n", + "\n", + "When running `gd_sketch.explore?` did you notice the field called `return_fields`:\n", + "\n", + "```\n", + " return_fields: List of fields that should be included in the\n", + " response.\n", + " ```\n", + " \n", + " We can use that to specify what fields we would like to get back. Let's add few more fields (you can see what fields are available in the UI)\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "EMvcyumPH4eW" + }, + "outputs": [], + "source": [ + "greendale_frame = gd_sketch.explore(view=views[view_number], return_fields='datetime,message,source_short,tag,timestamp_desc,url,domain,human_readable', as_pandas=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "_mvDnsFfIful" + }, + "source": [ + "Let's briefly look at these events." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "ip3vimOaIhhY" + }, + "outputs": [], + "source": [ + "greendale_slice = greendale_frame[['datetime', 'timestamp_desc', 'tag', 'human_readable', 'url', 'domain']]\n", + "\n", + "greendale_slice.head(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "PKioTKqoI85E" + }, + "source": [ + "OK,.... since this is a phishy domain analyzer, and all the results we got back are essentially from that analyzer, let's look at few things. First of all let's look at the tags tha are available." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "MFMRBxtRJDcK" + }, + "outputs": [], + "source": [ + "greendale_frame['tag_string'] = greendale_frame.tag.str.join('|')\n", + "\n", + "greendale_frame.tag_string.unique()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "KP_joLR7JVJ8" + }, + "source": [ + "OK... so we've got some that are part of the whitelisted-domain... let's look at those the domains that are marked as \"phishy\" yet excluding those that are whitelisted." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "h9l165w_JdtT" + }, + "outputs": [], + "source": [ + "greendale_frame[~greendale_frame.tag_string.str.contains('whitelisted-domain')].domain.value_counts()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "VADR_gpAJzMz" + }, + "source": [ + "OK... now we get to see all the domains that the domain analyzer considered to be potentially \"phishy\"... is there a domain that stands out??? what about that grendale one?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "-1PBtCtjJ5Ag" + }, + "outputs": [], + "source": [ + "greendale_slice[greendale_slice.domain == 'grendale.xyz']" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "4PbDcNMzJ-8S" + }, + "source": [ + "OK... this seems odd.. let's look at few things, a the `human_readable` string as well as the URL..." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "greendale_slice[greendale_slice.domain == 'grendale.xyz']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "vxnMcThfKEOs" + }, + "outputs": [], + "source": [ + "grendale = greendale_slice[greendale_slice.domain == 'grendale.xyz']\n", + "\n", + "string_set = set()\n", + "for string_list in grendale.human_readable:\n", + " new_list = [x for x in string_list if 'phishy_domains' in x]\n", + " _ = list(map(string_set.add, new_list))\n", + "\n", + "for entry in string_set:\n", + " print('Human readable string is: {0:s}'.format(entry))\n", + "\n", + "print('')\n", + "print('Counts for URL connections to the grendale domain:')\n", + "grendale_count = grendale.url.value_counts()\n", + "for index in grendale_count.index:\n", + " print('[{0:d}] {1:s}'.format(grendale_count[index], index))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "1QH7LFksLuLd" + }, + "source": [ + "We can start doing a lot more now if we want to... let's look at when these things occurred..." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "-EShYajvL1RE" + }, + "outputs": [], + "source": [ + "grendale_array = grendale.url.unique()\n", + "\n", + "greendale_slice[greendale_slice.url.isin(grendale_array)]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "DrvPXI8zMETu" + }, + "source": [ + "OK... we can then start to look at surrounding events.... let's look at one date in particular... \"2015-08-29 12:21:06\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "kwJrIsErMNGv" + }, + "outputs": [], + "source": [ + "query_dsl = \"\"\"\n", + "{\n", + "\t\"query\": {\n", + "\t\t\"bool\": {\n", + "\t\t\t\"filter\": {\n", + "\t\t\t\t\"bool\": {\n", + "\t\t\t\t\t\"should\": [\n", + "\t\t\t\t\t\t{\n", + "\t\t\t\t\t\t\t\"range\": {\n", + "\t\t\t\t\t\t\t\t\"datetime\": {\n", + "\t\t\t\t\t\t\t\t\t\"gte\": \"2015-08-29T12:20:06\",\n", + "\t\t\t\t\t\t\t\t\t\"lte\": \"2015-08-29T12:22:06\"\n", + "\t\t\t\t\t\t\t\t}\n", + "\t\t\t\t\t\t\t}\n", + "\t\t\t\t\t\t}\n", + "\t\t\t\t\t]\n", + "\t\t\t\t}\n", + "\t\t\t},\n", + "\t\t\t\"must\": [\n", + "\t\t\t\t{\n", + "\t\t\t\t\t\"query_string\": {\n", + "\t\t\t\t\t\t\"query\": \"*\"\n", + "\t\t\t\t\t}\n", + "\t\t\t\t}\n", + "\t\t\t]\n", + "\t\t}\n", + "\t},\n", + "\t\"size\": 10000,\n", + "\t\"sort\": {\n", + "\t\t\"datetime\": \"asc\"\n", + "\t}\n", + "}\n", + "\"\"\"\n", + "\n", + "data = gd_sketch.explore(query_dsl=query_dsl, return_fields='message,human_readable,datetime,timestamp_desc,source_short,data_type,tags,url,domain', as_pandas=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "6NSD4izoNExQ" + }, + "outputs": [], + "source": [ + "data[['datetime', 'message', 'human_readable', 'url']].head(4)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "gEQ9yZbTNzjJ" + }, + "source": [ + "Let's find the grendale and just look at events two seconds before/after" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "wA9lQ1JANdGg" + }, + "outputs": [], + "source": [ + "data[(data.datetime > '2015-08-29 12:21:04') & (data.datetime < '2015-08-29 12:21:08')][['datetime', 'message', 'timestamp_desc']]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "0tA4pHRLOj5g" + }, + "source": [ + "## Let's look at logins...\n", + "\n", + "Let's do a search to look at login entries..." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "vbsb58imPVCQ" + }, + "outputs": [], + "source": [ + "login_data = gd_sketch.explore(\n", + " 'data_type:\"windows:evtx:record\" AND event_identifier:4624', \n", + " return_fields='datetime,timestamp_desc,human_readable,message,tag,event_identifier,computer_name,record_number,recovered,strings,username',\n", + " as_pandas=True\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "YydMkgThab09" + }, + "source": [ + "This will produce quite a bit of events... let's look at how many." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "-PxPYIA5Pvks" + }, + "outputs": [], + "source": [ + "login_data.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "ZsM5me3KQwp4" + }, + "source": [ + "Let's look at usernames...." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "kfuzWPfJQynH" + }, + "outputs": [], + "source": [ + "login_data.username.value_counts()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "ccCwN92sQ5qb" + }, + "source": [ + "The login analyzer in the demo site wasn't checked in, and therefore didn't extract all those usernames. Let's do this manually for logon entries." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "cg1xPWKXa9QY" + }, + "outputs": [], + "source": [ + "login_data['account_name'] = login_data.message.str.extract(r'Account Name:.+Account Name:\\\\t\\\\t([^\\\\]+)\\\\n', expand=False)\n", + "login_data['account_domain'] = login_data.message.str.extract(r'Account Domain:.+Account Domain:\\\\t\\\\t([^\\\\]+)\\\\n', expand=False)\n", + "login_data['process_name'] = login_data.message.str.extract(r'Process Name:.+Process Name:\\\\t\\\\t([^\\\\]+)\\\\n', expand=False)\n", + "login_data['date'] = pd.to_datetime(login_data.datetime)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "5ApoSLqLfXbX" + }, + "source": [ + "What accounts have logged in:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "tKF9UR3mbSC-" + }, + "outputs": [], + "source": [ + "login_data.account_name.value_counts()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "9fy6EAURSpvK" + }, + "source": [ + "Let's look at all the computers in there..." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "DFU56oKaSUMC" + }, + "outputs": [], + "source": [ + "login_data.computer_name.value_counts()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "8IApcGWwfaMS" + }, + "source": [ + "Let's graph.... and you can then interact with the graph... try zomming in, etc.\n", + "\n", + "First we'll define a graph function that we can then call with parameters..." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "HwU4K4MnaYdt" + }, + "outputs": [], + "source": [ + "def GraphLogins(data_frame, machine_name=None):\n", + " \n", + " if machine_name:\n", + " data_slice = data_frame[data_frame.computer_name == machine_name]\n", + " title = 'Accounts Logged In - {0:s}'.format(machine_name)\n", + " else:\n", + " data_slice = data_frame\n", + " title = 'Accounts Logged In'\n", + " \n", + " data_grouped = data_slice[['account_name', 'date']].groupby('account_name', as_index=False).count()\n", + " data_grouped['count'] = data_grouped.date\n", + " del data_grouped['date']\n", + "\n", + " return alt.Chart(data_grouped, width=400).mark_bar().encode(\n", + " x='account_name', y='count',\n", + " tooltip=['account_name', 'count']\n", + " ).properties(\n", + " title=title\n", + " ).interactive()\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "9stCil8TgXhq" + }, + "source": [ + "Start by graphing all machines" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "T-oUET5AgYyW" + }, + "outputs": [], + "source": [ + "GraphLogins(login_data)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "drEG2TTSncoS" + }, + "source": [ + "Or we can look at this for a particular machine:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "SP1vf_xBUr2a" + }, + "outputs": [], + "source": [ + "GraphLogins(login_data, 'Student-PC1.internal.greendale.edu')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "2axKKhp7Unfe" + }, + "source": [ + "Or we can look at this as a scatter plot...\n", + "\n", + "First we'll define a function that munches the data for us. This function will essentially graph all logins in a day with a scatter plot, using colors to denote the count value.\n", + "\n", + "**This graph will be very interactive... try selecting a time period by clicking with the mouse on the upper graph and drawing a selection.**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "D9DiG03hazwY" + }, + "outputs": [], + "source": [ + "login_data['day'] = login_data['date'].dt.strftime('%Y-%m-%d')\n", + "\n", + "def GraphScatterLogin(data_frame, machine_name=''):\n", + " if machine_name:\n", + " data_slice = data_frame[data_frame.computer_name == machine_name]\n", + " title = 'Accounts Logged In - {0:s}'.format(machine_name)\n", + " else:\n", + " data_slice = data_frame\n", + " title = 'Accounts Logged In'\n", + " \n", + " login_grouped = data_slice[['day', 'computer_name', 'account_name', 'message']].groupby(['day', 'computer_name', 'account_name'], as_index=False).count()\n", + " login_grouped['count'] = login_grouped.message\n", + " del login_grouped['message']\n", + " \n", + " brush = alt.selection_interval(encodings=['x'])\n", + " click = alt.selection_multi(encodings=['color'])\n", + " color = alt.Color('count:Q')\n", + "\n", + " chart1 = alt.Chart(login_grouped).mark_point().encode(\n", + " x='day', \n", + " y='account_name',\n", + " color=alt.condition(brush, color, alt.value('lightgray')),\n", + " ).properties(\n", + " title=title,\n", + " width=600\n", + " ).add_selection(\n", + " brush\n", + " ).transform_filter(\n", + " click\n", + " )\n", + " \n", + " chart2 = alt.Chart(login_grouped).mark_bar().encode(\n", + " x='count',\n", + " y='account_name',\n", + " color=alt.condition(brush, color, alt.value('lightgray')),\n", + " tooltip=['count'],\n", + " ).transform_filter(\n", + " brush\n", + " ).properties(\n", + " width=600\n", + " ).add_selection(\n", + " click\n", + " )\n", + " \n", + " return chart1 & chart2" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "Z4s-lEHxhQXH" + }, + "source": [ + "OK, let's start by graphing for all logins..." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "PuaJmcJMhShS" + }, + "outputs": [], + "source": [ + "GraphScatterLogin(login_data)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "dpLDsdSGhT1r" + }, + "source": [ + "And now just for the Student-PC1" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "2XaBqZqRVIoL" + }, + "outputs": [], + "source": [ + "GraphScatterLogin(login_data, 'Student-PC1.internal.greendale.edu')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "z0f-qAxyhYa4" + }, + "source": [ + "And now it is your time to shine, experiment with python pandas, the graphing library and other data science techniques." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "HejGxei3hfnM" + }, + "outputs": [], + "source": [] + } + ], + "metadata": { + "colab": { + "name": "Timesketch And Colab.ipynb", + "private_outputs": true, + "provenance": [], + "version": "0.3.2" + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.5.4rc1" + } + }, + "nbformat": 4, + "nbformat_minor": 1 +}