Merge pull request #35 from LaunchCodeEducation/python-databases
Python databases
jwoolbright23 authored May 28, 2024
2 parents 0a03143 + 74b6e9e commit 20ddebf
Showing 8 changed files with 304 additions and 1 deletion.
@@ -44,7 +44,7 @@ import pandas as pd
# Create a pandas DataFrame by providing a list of lists
movie_list_of_lists = pd.DataFrame([["Interstellar", "Pride and Prejudice", "Inception", "Barbie"],["Marley & Me", "Two Weeks Notice", "The Guardian", "Bridesmaids"]])

-# Create a pandas Series from a pre-existing list of lists
+# Create a pandas DataFrame from a pre-existing list of lists
movies_dataframe_data = [["Interstellar", "Pride and Prejudice", "Inception", "Barbie"],["Marley & Me", "The Proposal", "The Guardian", "Bridesmaids"]]

dataframe_from_existing_list = pd.DataFrame(movies_dataframe_data)
26 changes: 26 additions & 0 deletions content/python-pandas-databases/_index.md
@@ -0,0 +1,26 @@
+++
pre = "<b>23. </b>"
chapter = true
title = "Databases with Python and pandas"
date = 2024-04-17T10:00:24-05:00
draft = false
weight = 23
+++

## Learning Objectives
Upon completing all the content in this chapter, you should be able to do the following:
1. Establish a connection to a sqlite3 database using Python.
1. Create a cursor object to interact with the database.
1. Create a pandas DataFrame using data from a sqlite3 database.
1. Add data from a pandas DataFrame into a sqlite3 database.

## Key Terminology

### Databases with Python
1. sqlite3
1. Cursor object
1. Parameterized queries

## Content Links

{{% children %}}
16 changes: 16 additions & 0 deletions content/python-pandas-databases/exercises/_index.md
@@ -0,0 +1,16 @@
+++
title = "Exercises: Working with Databases in Python"
date = 2021-10-01T09:28:27-05:00
draft = false
weight = 2
+++

## Getting Started

Open the `data-analysis-projects/databases-python-pandas/studio/databases-and-py.ipynb` file in Jupyter Notebook and begin working through the exercises!

## Submitting Your Work

When finished, make sure to push your changes up to GitHub.

Copy the link to your GitHub repository and paste it into the submission box in Canvas for **Exercises: Working with Databases in Python** and click *Submit*.
12 changes: 12 additions & 0 deletions content/python-pandas-databases/next-steps.md
@@ -0,0 +1,12 @@
+++
title = "Next Steps"
date = 2021-10-01T09:28:27-05:00
draft = false
weight = 4
+++

You are now ready to dive into [Tableau](https://www.tableau.com/), the first major visualization tool that we will use. If you would like to further explore interacting with databases using Python and pandas, you can find some of our favorite resources below:

1. [GeeksforGeeks: Working with Databases using pandas](https://www.geeksforgeeks.org/working-with-database-using-pandas/)
1. [Tutorialspoint: Python - Databases and SQL](https://www.tutorialspoint.com/python_network_programming/python_databases_and_sql.htm)
1. [DigitalOcean: How To Use the sqlite3 Module in Python 3](https://www.digitalocean.com/community/tutorials/how-to-use-the-sqlite3-module-in-python-3)
10 changes: 10 additions & 0 deletions content/python-pandas-databases/reading/_index.md
@@ -0,0 +1,10 @@
+++
title = "Reading"
date = 2024-04-17T10:00:24-05:00
draft = false
weight = 1
+++

## Reading Content

{{% children %}}
76 changes: 76 additions & 0 deletions content/python-pandas-databases/reading/pandas-databases/_index.md
@@ -0,0 +1,76 @@
+++
title = "Databases with pandas"
date = 2024-04-17T10:00:24-05:00
draft = false
weight = 2
+++

In addition to all the great things pandas is capable of, the library also makes it possible to load data stored elsewhere into a pandas DataFrame or Series. This lesson walks through the process of creating a pandas DataFrame from an existing table within a SQLite database.

This lesson uses `sqlite3` to demonstrate how to interact with a database through a separate library (pandas). Since we have already covered how to manipulate data with pandas in previous lessons, we will instead focus on the following:
1. Reading data from the database
1. Storing the data inside of a pandas DataFrame
1. Creating a new table inside of the database
1. Adding the DataFrame data into the new table

{{% notice blue Note "rocket" %}}
The following examples can be found within the `data-analysis-projects/databases-python-pandas/pandas-db-walkthrough.ipynb` file.
{{% /notice %}}

## Create a DataFrame

{{% notice blue Example "rocket" %}}
```python
import sqlite3
import pandas as pd

# Create SQLite connection to Movies.db file
movies_db = sqlite3.connect('Movies.db')

# Use the pandas read_sql_query function to return a pandas DataFrame
df = pd.read_sql_query('Select * from movies;', movies_db)

# Use the .head() method to return the first five rows (the table currently has only five rows)
df.head()
```
{{% /notice %}}

{{% notice blue Note "rocket" %}}
The pandas `read_sql_query` function in the above example runs a SQL query and returns the result as a DataFrame. You can find its documentation here: [pandas.read_sql_query API reference](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_query.html)
{{% /notice %}}
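
`read_sql_query` also accepts a `params` argument, so the database can filter rows before they ever reach pandas. The following is a minimal sketch, assuming an in-memory database as a stand-in for `Movies.db`; the table layout mirrors the columns used in this lesson, and the sample rows and scores are placeholder values rather than the walkthrough's actual data.

```python
import sqlite3
import pandas as pd

# Assumption: an in-memory database stands in for Movies.db so the sketch runs anywhere
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, genre TEXT, release INTEGER, rt_score INTEGER)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?, ?)",
    [
        # Placeholder rows for illustration only
        ("Inception", "Science Fiction", 2010, 87),
        ("Bridesmaids", "Comedy", 2011, 89),
        ("Barbie", "Comedy", 2023, 88),
    ],
)

# The ? placeholder is filled from params, so only matching rows are returned
comedies = pd.read_sql_query(
    "SELECT title, release FROM movies WHERE genre = ? ORDER BY release",
    conn,
    params=("Comedy",),
)
print(comedies)
conn.close()
```

Filtering in the query keeps the DataFrame small, which matters more as tables grow.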

## Create New Table from DataFrame

After exploring, cleaning, or manipulating data with pandas, you can add that data back into your database. In the scenario below we will add a new movie to an existing DataFrame and then store the DataFrame inside of a new table within the SQLite database.

{{% notice blue Example "rocket" %}}
We will start by adding a row to our existing DataFrame:

```python
new_movie = pd.DataFrame([{'title':'Dune', 'genre':'Science Fiction', 'release':2021, 'rt_score': 83}])
df = pd.concat([df, new_movie], ignore_index=True)
```

Updating the DataFrame is not required before adding a new table to the database, but the extra row makes it easy to confirm, when reading the data back, that the new table was populated correctly.

```python {linenos=table}
# Write the DataFrame to the database as a table named df; if the table already exists, replace it
df.to_sql('df', movies_db, if_exists="replace")
# Create a new table called new_movie_table from the rows of the df table
movies_db.execute(
"""
create table new_movie_table as
select * from df
"""
)
```

Documentation for the pandas `DataFrame.to_sql` method used in the above code block can be found here: [pandas.DataFrame.to_sql](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html)

```python
# Read data from newly created table, passing in existing movies_db connection as parameter
new_movies_df = pd.read_sql_query('Select * from new_movie_table;', movies_db)
# Read first 6 rows
new_movies_df.head(6)
```
{{% /notice %}}
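
By default, `to_sql` also writes the DataFrame's row labels to the table as an extra `index` column. If you do not want that column, you can pass `index=False`. The sketch below assumes an in-memory database and a small hand-built DataFrame in place of the walkthrough's data, and writes the DataFrame straight into `new_movie_table` in one step.

```python
import sqlite3
import pandas as pd

# Assumption: an in-memory database stands in for Movies.db
conn = sqlite3.connect(":memory:")

# Placeholder DataFrame with the same kind of columns used in this lesson
df = pd.DataFrame(
    [
        {"title": "Inception", "genre": "Science Fiction", "release": 2010},
        {"title": "Dune", "genre": "Science Fiction", "release": 2021},
    ]
)

# index=False keeps the DataFrame's row labels out of the table;
# if_exists="replace" drops and recreates the table if it is already there
df.to_sql("new_movie_table", conn, index=False, if_exists="replace")

# Read the table back to confirm which columns were stored
round_trip = pd.read_sql_query("SELECT * FROM new_movie_table", conn)
print(round_trip.columns.tolist())
conn.close()
```

Because `if_exists="replace"` recreates the table, this cell can be run repeatedly without raising a "table already exists" error.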
136 changes: 136 additions & 0 deletions content/python-pandas-databases/reading/python-databases/_index.md
@@ -0,0 +1,136 @@
+++
title = "Databases with Python"
date = 2024-04-17T10:00:24-05:00
draft = false
weight = 1
+++

Working within a database command line interface can be cumbersome, especially when you need to execute multiple commands. Because of this, analysts often prefer to use a graphical user interface (GUI) or to leverage programming languages like Python and supported libraries to interact with databases. We will use Python and the **sqlite3** library to complete the following:
1. Create a new SQLite database
1. Add a table
1. Perform CRUD operations on the table

While you can accomplish more than the above using Python and pandas, such as performing joins, doing so is not always best practice. Database engines are built and optimized to perform joins extremely well, so a join is often better left to the database. It is always important to know what you will be doing with your data before acting.

{{% notice blue Note "rocket" %}}
The examples below are also available in the `data-analysis-projects/databases-python-pandas/python-db-walkthrough.ipynb` file.
{{% /notice %}}

## sqlite3 with Python

[sqlite3](https://docs.python.org/3/library/sqlite3.html) works in conjunction with Python by allowing you to establish a connection to a database file located on your machine. You can then reference the connection variable to begin executing SQL commands.

The basic syntax is as follows:

```python
import sqlite3

# If the 'Movies.db' database does not already exist, sqlite3 will create one!
movies_db = sqlite3.connect('Movies.db') # connect to database
```

{{% notice blue Note "rocket" %}}
If we were to print the `movies_db` variable, we would see output similar to the following:

```python
<sqlite3.Connection object at 0x7334db1d3940> # the 0x7334db1d3940 portion will vary
```

This shows that a `sqlite3.Connection` object was created and can now be referenced using the `movies_db` variable.
{{% /notice %}}
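
A connection should also be closed once you are done with it. The following is a minimal sketch, assuming the special `":memory:"` name (a temporary database held in RAM) in place of `Movies.db`, and using `contextlib.closing` from the standard library so the connection is closed even if an error occurs.

```python
import sqlite3
from contextlib import closing

# ":memory:" creates a temporary database in RAM instead of a file on disk
with closing(sqlite3.connect(":memory:")) as conn:
    # Confirm that connect() returned a sqlite3.Connection object
    is_connection = isinstance(conn, sqlite3.Connection)
    # Run a trivial query to show the connection is usable
    version = conn.execute("SELECT sqlite_version()").fetchone()[0]

# The connection is closed at this point
print(is_connection, version)
```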

## Cursor Objects

Now that we have established a connection to the database we need a way to execute commands. The **cursor object** is a database cursor which allows us to do so.

We can create a new cursor object by calling the connection's `cursor()` method and storing the result in a variable:

```python
# variable named "cur" that holds a cursor created from the connection object:
cur = movies_db.cursor()
```

The basic syntax for executing a command with the cursor object is as follows:

```python
cur.execute("SQL statement")
```

## Creating a table

```python
cur.execute("CREATE TABLE table_name (column DATA TYPE, column DATA TYPE, etc..)")
```

{{% notice blue Note "rocket" %}}
You can find a list of SQLite data types here: [Data Types in SQLite](https://sqlite.org/datatype3.html).
{{% /notice %}}
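
As a concrete version of the template above, the sketch below creates a `movies` table and then lists its columns. It assumes an in-memory database and the column names used elsewhere in this chapter (`title`, `genre`, `release`, `rt_score`); the column types are our own choice for illustration.

```python
import sqlite3

# Assumption: an in-memory database stands in for Movies.db
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# IF NOT EXISTS lets the statement run again without raising an error
cur.execute(
    "CREATE TABLE IF NOT EXISTS movies "
    "(title TEXT, genre TEXT, release INTEGER, rt_score INTEGER)"
)

# PRAGMA table_info returns one row per column; the name is the second field
columns = [row[1] for row in cur.execute("PRAGMA table_info(movies)")]
print(columns)
conn.close()
```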

### Insert Table Values

```python
cur.execute("INSERT INTO table_name VALUES ('value-one', 'value-two', etc..)")
```
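
The sketch below shows the insert syntax with real values, assuming an in-memory database and a small `movies` table with placeholder rows. It also uses `executemany` to insert several rows at once and `commit` to save the changes, which matters when the database is a file on disk.

```python
import sqlite3

# Assumption: an in-memory database stands in for Movies.db
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE movies (title TEXT, genre TEXT, release INTEGER)")

# Insert a single row; values are listed in the same order as the table's columns
cur.execute("INSERT INTO movies VALUES ('Inception', 'Science Fiction', 2010)")

# Insert several rows at once, one tuple per row, using ? placeholders
new_rows = [("Barbie", "Comedy", 2023), ("Interstellar", "Science Fiction", 2014)]
cur.executemany("INSERT INTO movies (title, genre, release) VALUES (?, ?, ?)", new_rows)

# Save the pending inserts
conn.commit()

row_count = cur.execute("SELECT COUNT(*) FROM movies").fetchone()[0]
print(row_count)
conn.close()
```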

### Reading Data

There are a couple of strategies that you can use to read data from your database. Because the cursor object is itself an iterator, you can iterate over it to fetch data.

{{% notice blue Example "rocket" %}}

```python
# For loop to iterate over cursor object
for row in cur.execute("SELECT column FROM table_name"):
    print(row)
```

The above for loop prints every row of the result, with each row containing only the column named in the `SELECT` statement. You could also use the `*` wildcard to return all columns for every row in the table.
{{% /notice %}}

You can also use the `fetchall()` method to read data from the database like so:

```python
cur.execute("SELECT * FROM table_name").fetchall()
```
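
To make the shape of the returned data concrete, the sketch below reads from a small placeholder table in an in-memory database (both are assumptions for illustration). Each row comes back as a tuple; `fetchone()` returns the next row, and `fetchall()` returns a list of whatever rows remain.

```python
import sqlite3

# Assumption: an in-memory database with placeholder rows stands in for Movies.db
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE movies (title TEXT, release INTEGER)")
cur.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [("Inception", 2010), ("Interstellar", 2014), ("Barbie", 2023)],
)

cur.execute("SELECT title, release FROM movies ORDER BY release")
first_row = cur.fetchone()       # the next row as a tuple
remaining_rows = cur.fetchall()  # a list of the rows not yet fetched

print(first_row)
print(remaining_rows)
conn.close()
```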

### Updating Data

When running dynamic queries against a database, there are some risks to be aware of, specifically SQL injection (SQLi) attacks. While there are multiple strategies for avoiding SQLi attacks, the one we will focus on in this class is using **parameterized queries**.

Parameterized queries allow you to put a placeholder (`?`) into your SQL statement and pass in the desired value as a separate parameter, so the value is treated as data rather than as SQL.

{{% notice blue Example "rocket" %}}
```python
# Desired values
update_release_year = 1997 # New value for the release column
movie_to_update = 'Good Will Hunting' # Title of the row to update
# Execute an UPDATE statement using ? placeholders, passing in the update variables as a list
cur.execute("UPDATE movies SET release = ? WHERE title = ?", [update_release_year, movie_to_update])
```
{{% /notice %}}
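
To see why the placeholder matters, the sketch below compares a query built with string formatting against a parameterized one. It assumes an in-memory database, placeholder rows, and a made-up malicious input string; `rowcount` reports how many rows each `UPDATE` changed.

```python
import sqlite3

# Assumption: an in-memory database with placeholder rows stands in for Movies.db
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE movies (title TEXT, release INTEGER)")
cur.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [("Good Will Hunting", 1998), ("Inception", 2010)],
)
conn.commit()

# Hypothetical malicious input: the quote characters change the meaning of the SQL
user_input = "nothing' OR '1'='1"

# Unsafe: the input is pasted into the statement, so the WHERE clause matches every row
unsafe_sql = f"UPDATE movies SET release = 1997 WHERE title = '{user_input}'"
cur.execute(unsafe_sql)
unsafe_rows_changed = cur.rowcount
conn.rollback()  # undo the damage before trying the safe version

# Safe: the input is bound as a plain value, so it only matches a title with that exact text
cur.execute("UPDATE movies SET release = ? WHERE title = ?", [1997, user_input])
safe_rows_changed = cur.rowcount

print(unsafe_rows_changed, safe_rows_changed)
conn.close()
```

The formatted query rewrites every row in the table, while the parameterized query changes none, because no movie has that literal title.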

### Deleting Data

As with updating data, we will want to use parameterized queries as a safe best practice!

{{% notice blue Example "rocket" %}}
```python
movie_to_delete = 'Inception' # Too many sci fi movies!
# Execute a DELETE statement using the ? placeholder, passing in the variable as a list literal
cur.execute("DELETE FROM movies WHERE title = ?", [movie_to_delete])
```
{{% /notice %}}
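
After a delete, it can be useful to confirm how many rows were removed and to save the change. The sketch below assumes an in-memory database with two placeholder rows; it checks `rowcount`, commits, and then reads back the remaining titles.

```python
import sqlite3

# Assumption: an in-memory database with placeholder rows stands in for Movies.db
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE movies (title TEXT)")
cur.executemany("INSERT INTO movies VALUES (?)", [("Inception",), ("Barbie",)])

movie_to_delete = "Inception"
# Parameters can be passed as a list or a tuple; a one-item tuple needs the trailing comma
cur.execute("DELETE FROM movies WHERE title = ?", (movie_to_delete,))
deleted = cur.rowcount  # number of rows removed by the DELETE

# Save the change so it persists when the database is a file on disk
conn.commit()

remaining_titles = [row[0] for row in cur.execute("SELECT title FROM movies")]
print(deleted, remaining_titles)
conn.close()
```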

## Check Your Understanding

{{% notice green Question "rocket" %}}
What type of database is SQLite?

<!-- Solution: disk-based database, does not require its own server. Stored inside of a file on your machine -->
{{% /notice %}}

{{% notice green Question "rocket" %}}
What is the primary reason for creating a cursor object?

<!-- Solution: Executing commands inside of the datastore -->
{{% /notice %}}
27 changes: 27 additions & 0 deletions content/python-pandas-databases/studio/_index.md
@@ -0,0 +1,27 @@
+++
title = "Studio: Working with Databases in Python"
date = 2021-10-01T09:28:27-05:00
draft = false
weight = 3
+++

## Getting Started

Before getting started with the coding, create a `watchlist` of 10 TV shows/movies that you, personally, are interested in watching.

Open up the `data-analysis-projects/databases-python-pandas/studio/databases-and-py.ipynb` file in Jupyter Notebook.

## In Your Notebook

We will be using the [TV Shows dataset](https://www.kaggle.com/datasets/ruchi798/tv-shows-on-netflix-prime-video-hulu-and-disney) from Kaggle. We have included the CSV for you in the repository you just cloned.

You will also be using the `watchlist` you created to figure out which streaming services contain the shows that you want to watch next so you can decide which one is worth the money to you.

As you complete the different tasks in the studio, you may choose between using pandas or SQL.
Remember that during the prep work, we learned that one is oftentimes more efficient at certain tasks than the other, so choose wisely!

## Submitting Your Work

When finished, make sure to push your changes up to GitHub!

Copy the link to your GitHub repository and paste it into the submission box in Canvas for **Studio: Working with Databases in Python** and click *Submit*.
