Merge pull request #92 from LaunchCodeEducation/christian-issue-58
Issue 58 - DA class 18 textbook changes
mlambert125 authored Sep 23, 2024
2 parents fcad4c2 + ceb1989 commit 462d018
Showing 3 changed files with 8 additions and 6 deletions.
2 changes: 1 addition & 1 deletion content/data-manipulation/reading/aggregation/_index.md
````diff
@@ -31,7 +31,7 @@ Let's take things a step further and aggregate the data within the grouped colum
 grouping_variable = your_data.groupby(["column_name"]).sum()
 ```

-The above code will return the sum of all values within the provided column, giving you a count of each unique value inside.
+The above code will group the dataset by 'column_name', and will then return the sum of all values within each column grouped by each unique value in the 'column_name' column
 {{% /notice %}}

 The `.groupby()` method can take multiple columns as a parameter upon creation, but it is best practice to only provide as many columns as needed for your analysis. As you increase the amount of grouped columns, you are also increasing the amount of compute power and memory needed, which can lead to performance issues.
````
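The following minimal sketch shows why the revised wording in this hunk is more accurate than the removed sentence. It uses a small made-up DataFrame whose `fruit`, `qty`, and `price` columns are hypothetical and do not come from the course material. Grouping by one column and calling `.sum()` produces one row per unique value, and each row holds the per-group total of every other numeric column. Counting the rows in each group is a separate call, `.size()`:

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
your_data = pd.DataFrame({
    "fruit": ["apple", "pear", "apple", "pear", "plum"],
    "qty": [3, 1, 2, 7, 6],
    "price": [1.0, 2.0, 1.5, 2.5, 3.0],
})

# Group by each unique value in "fruit", then sum every remaining column per group.
grouping_variable = your_data.groupby(["fruit"]).sum()
print(grouping_variable)

# Counting rows per group is a different operation: .size() returns group sizes.
counts = your_data.groupby(["fruit"]).size()
print(counts)
```

In this example, `apple` gets a `qty` of 5 (3 + 2) from `.sum()` and a count of 2 from `.size()`. Those are two different answers to two different questions.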
4 changes: 2 additions & 2 deletions content/data-manipulation/reading/recoding-data/_index.md
````diff
@@ -46,7 +46,7 @@ data["survived"] = data["survived"].replace(to_replace={0: False, 1: True})
 Creating a function to aggregate data or create new columns is another common practice used when analyzing data. Pandas utilizes the `.apply()` method to execute a function on a pandas Series or DataFrame.

 {{% notice blue Example "rocket" %}}
-SUppose you wanted to know how many survivors under the age of 20 are still alive from the titanic dataset:
+Suppose you wanted to know how many survivors under the age of 20 are still alive from the titanic dataset:

 ```python
 import pandas as pd
@@ -74,6 +74,6 @@ print(data["under_21_survivors"].value_counts())
 ## Summary

 When recoding your data there are some things you should think about:
-1. Does the original data need to remain in-tact?
+1. Does the original data need to remain intact?
 1. What data tyes should be replaced with new values, and what type of data should the new value be?
 1. Would a function be useful for repetitive tasks and manipulation?
````
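This file describes the `.apply()` pattern, but the full example is collapsed in this view. The sketch below is therefore not the course's version. It applies a row-wise function to a tiny hypothetical stand-in for the Titanic data. The `age` values, the helper `under_20_survivor`, and the `under_20_survivors` column name are all assumptions made for illustration:

```python
import pandas as pd

# Hypothetical stand-in for a few Titanic rows; real data would be loaded from a file.
data = pd.DataFrame({
    "age": [22.0, 15.0, 38.0, 19.0, 4.0],
    "survived": [False, True, True, False, True],
})

def under_20_survivor(row):
    """Hypothetical helper: True when a passenger survived and was younger than 20."""
    return bool(row["survived"]) and bool(row["age"] < 20)

# axis=1 passes each row (as a Series) to the function; results form a new column.
data["under_20_survivors"] = data.apply(under_20_survivor, axis=1)
print(data["under_20_survivors"].value_counts())
```

For a condition this simple, the vectorized mask `data["survived"] & (data["age"] < 20)` produces the same column without a Python-level loop. `.apply()` is more useful when the per-row logic doesn't map cleanly onto column operations.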
8 changes: 5 additions & 3 deletions content/data-manipulation/studio/_index.md
````diff
@@ -7,13 +7,15 @@ weight = 3

 ## Getting Started

-For the studio, we will be continuing to work with pumpkins in small groups.
+This week's studio will be continuing our work with the USDA Pumpkin Prices dataset.

-## In Your Notebooks
+The starting notebook for this studio can be found at `data-analysis-projects/data-manipulation/studio/data-manipulation-studio/data-manipulation-studio.ipynb`
+
+## In Your Notebook

 As always confer with a partner if you want to talk something through!

-You are more than welcome to also pull up your notebook from the last lesson to help you speed through cleaning the data and get straight to manipulating it.
+You are more than welcome to pull up your notebook from the last lesson to help you speed through cleaning the data and get straight to manipulating it.

 ## Submitting Your Work

````