change "dataframe(s)" to "data frame(s)" #20

Merged
merged 1 commit into from
Sep 2, 2023
2 changes: 1 addition & 1 deletion boolean-data.ipynb
@@ -668,7 +668,7 @@
"source": [
"### Logical subsetting\n",
"\n",
"Although we've been effectively using this all along, it's useful to make it explicit: booleans can be used to logically subset a dataframe. Let's say we only want the bits of a dataframe where `x` is greater than `y`:"
"Although we've been effectively using this all along, it's useful to make it explicit: booleans can be used to logically subset a data frame. Let's say we only want the bits of a data frame where `x` is greater than `y`:"
]
},
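The subsetting described in the cell above can be sketched with a small, made-up data frame (the column names `x` and `y` follow the text; the values are invented):

```python
import pandas as pd

# minimal sketch with made-up data: keep only rows where x is greater than y
df = pd.DataFrame({"x": [3, 1, 4], "y": [2, 5, 1]})
subset = df[df["x"] > df["y"]]  # boolean mask keeps rows 0 and 2
```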
{
4 changes: 2 additions & 2 deletions communicate-plots.ipynb
@@ -230,7 +230,7 @@
"\n",
"There are two possible sources of labels: ones that are part of the data, which we'll add with `geom_text`; and ones that we add directly and manually as annotations using `geom_label`.\n",
"\n",
"In the first case, you might have a dataframe that contains labels.\n",
"In the first case, you might have a data frame that contains labels.\n",
"In the following plot we pull out the cars with the highest engine size in each drive type and save their information as a new data frame called `label_info`. In creating it, we pick out the mean values of \"hwy\" by \"drv\" as the points to label—but we could do any aggregation we feel would work well on the chart."
]
},
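A rough sketch of how such a `label_info` table can be built with **pandas** (the `drv`, `displ`, and `hwy` columns follow the text's mpg data; the values here are invented):

```python
import pandas as pd

# invented mini stand-in for the mpg data discussed in the text
mpg = pd.DataFrame({
    "drv": ["f", "f", "r", "r"],
    "displ": [1.8, 2.0, 5.2, 6.2],  # engine size
    "hwy": [29, 31, 23, 20],
})
# one row per drive type: the car with the largest engine size
label_info = mpg.loc[mpg.groupby("drv")["displ"].idxmax()]
```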
@@ -401,7 +401,7 @@
"\n",
"1. Use `geom_text()` with infinite positions to place text at the four corners of the plot.\n",
"\n",
"2. Use `geom_label()` to add a point geom in the middle of your last plot without having to create a dataframe\n",
"2. Use `geom_label()` to add a point geom in the middle of your last plot without having to create a data frame.
" Customise the shape, size, or colour of the point.\n",
"\n",
"3. How do labels with `geom_text()` interact with faceting?\n",
2 changes: 1 addition & 1 deletion data-tidy.ipynb
@@ -192,7 +192,7 @@
"\n",
"![](https://pandas.pydata.org/docs/_images/reshaping_unstack.png)\n",
"\n",
"Let's define a multi-index dataframe to demonstrate this:"
"Let's define a multi-index data frame to demonstrate this:"
]
},
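One way to build such a multi-index data frame, matching the diagram above (the level and column labels are invented):

```python
import pandas as pd

# two-level row index, as in the unstack diagram
index = pd.MultiIndex.from_tuples(
    [("one", "a"), ("one", "b"), ("two", "a"), ("two", "b")],
    names=["first", "second"],
)
df = pd.DataFrame({"values": [1, 2, 3, 4]}, index=index)
wide = df.unstack()  # the inner index level moves into the columns
```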
{
22 changes: 11 additions & 11 deletions data-transform.ipynb
@@ -164,7 +164,7 @@
"\n",
"**pandas** is a really comprehensive package, and this book will barely scratch the surface of what it can do. But it's built around a few simple ideas that, once they've clicked, make life a lot easier.\n",
"\n",
"Let’s start with the absolute basics. The most basic pandas object is a dataframe. A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data, even lists) in columns. It is made up of rows and columns (with each row-column cell containing a value), plus two bits of contextual information: the index (which carries information about each row) and the column names (which carry information about each column).\n",
"Let’s start with the absolute basics. The most basic pandas object is a DataFrame. A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data, even lists) in columns. It is made up of rows and columns (with each row-column cell containing a value), plus two bits of contextual information: the index (which carries information about each row) and the column names (which carry information about each column).\n",
"\n",
"![](https://pandas.pydata.org/docs/_images/01_table_dataframe.svg)\n",
"\n",
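The anatomy just described (typed columns, a row index, column names) can be seen in a tiny, made-up example:

```python
import pandas as pd

# columns of different types, plus an index and column names
df = pd.DataFrame(
    {"name": ["ada", "bob"], "score": [9.5, 7.0], "passed": [True, False]},
    index=["row0", "row1"],
)
```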
@@ -212,7 +212,7 @@
"\n",
"1. we will use `query` to find only the rows where the destination `\"dest\"` column has the value `\"IAH\"`. This doesn't change the index, it only removes irrelevant rows. In effect, this step removes rows we're not interested in.\n",
"2. we will use `groupby` to group rows by the year, month, and day (we pass a list of columns to the `groupby` function). This step changes the index; the new index will have three columns that track the year, month, and day. In effect, this step changes the index.\n",
"3. we will choose which columns we wish to keep after the groupby operation by passing a list of them to a set of square brackets (the double brackets are because it's a list within a dataframe). Here we just want one column, `\"arr_delay\"`. This doesn't affect the index. In effect, this step removes columns we're not interested in.\n",
"3. we will choose which columns we wish to keep after the groupby operation by passing a list of them to a set of square brackets (the double brackets are because it's a list within a data frame). Here we just want one column, `\"arr_delay\"`. This doesn't affect the index. In effect, this step removes columns we're not interested in.\n",
"4. finally, we must specify what groupby operation we wish to apply; when aggregating the information in multiple rows down to one row, we need to say how that information should be aggregated. In this case, we'll use the mean. In effect, this step applies a statistic to the variable(s) we selected earlier, across the groups we created earlier."
]
},
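The four steps above can be sketched on a tiny, invented `flights`-like table (real column names from the text, made-up values):

```python
import pandas as pd

flights = pd.DataFrame({
    "year": [2013, 2013, 2013],
    "month": [1, 1, 2],
    "day": [1, 1, 1],
    "dest": ["IAH", "IAH", "BOS"],
    "arr_delay": [11.0, 20.0, 3.0],
})
result = (
    flights.query('dest == "IAH"')       # 1. keep only relevant rows
    .groupby(["year", "month", "day"])   # 2. new three-column index
    [["arr_delay"]]                      # 3. keep only one column
    .mean()                              # 4. aggregate each group
)
```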
@@ -231,14 +231,14 @@
"id": "b8b85551",
"metadata": {},
"source": [
"You can see here that we've created a new dataframe with a new index. To do it, we used four key operations:\n",
"You can see here that we've created a new data frame with a new index. To do it, we used four key operations:\n",
"\n",
"1. manipulating rows\n",
"2. manipulating the index\n",
"3. manipulating columns\n",
"4. applying statistics\n",
"\n",
"Most operations you could want to do to a single dataframe are covered by these, but there are different options for each of them depending on what you need.\n",
"Most operations you could want to do to a single data frame are covered by these, but there are different options for each of them depending on what you need.\n",
"\n",
"Let's now dig a bit more into these operations."
]
@@ -299,7 +299,7 @@
"id": "18124edd",
"metadata": {},
"source": [
"But you can also access particular rows based on their location in the dataframe using `.iloc`. Remember that Python indices begin from zero, so to retrieve the first row you would use `.iloc[0]`:\n"
"But you can also access particular rows based on their location in the data frame using `.iloc`. Remember that Python indices begin from zero, so to retrieve the first row you would use `.iloc[0]`:\n"
]
},
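For instance, on a small made-up table:

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20], "b": [30, 40]})
first_row = df.iloc[0]  # positional access: the first row, as a Series
```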
{
@@ -381,7 +381,7 @@
"id": "f5e03f63",
"metadata": {},
"source": [
"In fact, there are lots of options that work with `query`: as well as `>` (greater than), you can use `>=` (greater than or equal to), `<` (less than), `<=` (less than or equal to), `==` (equal to), and `!=` (not equal to). You can also use the commands `and` as well as `or` to combine multiple conditions. Here's an example of `and` from the `flights` dataframe:"
"In fact, there are lots of options that work with `query`: as well as `>` (greater than), you can use `>=` (greater than or equal to), `<` (less than), `<=` (less than or equal to), `==` (equal to), and `!=` (not equal to). You can also use the commands `and` as well as `or` to combine multiple conditions. Here's an example of `and` from the `flights` data frame:"
]
},
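A minimal sketch of combining conditions with `and` (an invented stand-in for the `flights` table):

```python
import pandas as pd

# made-up stand-in for the flights table
flights = pd.DataFrame({"month": [1, 1, 2], "day": [1, 2, 1]})
jan1 = flights.query("month == 1 and day == 1")  # both conditions must hold
```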
{
@@ -410,7 +410,7 @@
"source": [
"### Re-arranging Rows\n",
"\n",
"Again and again, you will want to re-order the rows of your dataframe according to the values in a particular column. **pandas** makes this very easy via the `.sort_values` function. It takes a data frame and a set of column names to sort by. If you provide more than one column name, each additional column will be used to break ties in the values of preceding columns. For example, the following code sorts by the departure time, which is spread over four columns."
"Again and again, you will want to re-order the rows of your data frame according to the values in a particular column. **pandas** makes this very easy via the `.sort_values` function. It takes a data frame and a set of column names to sort by. If you provide more than one column name, each additional column will be used to break ties in the values of preceding columns. For example, the following code sorts by the departure time, which is spread over four columns."
]
},
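A small sketch of tie-breaking with multiple sort columns (invented values):

```python
import pandas as pd

# made-up rows: year and month tie, so day breaks the tie
df = pd.DataFrame({
    "year": [2013, 2013],
    "month": [1, 1],
    "day": [2, 1],
    "dep_time": [517, 830],
})
df_sorted = df.sort_values(["year", "month", "day", "dep_time"])
```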
{
@@ -516,7 +516,7 @@
"source": [
"### Creating New Columns\n",
"\n",
"Let's now move on to creating new columns, either using new information or from existing columns. Given a dataframe, `df`, creating a new column with the same value repeated is as easy as using square brackets with a string (text enclosed by quotation marks) in."
"Let's now move on to creating new columns, either using new information or from existing columns. Given a data frame, `df`, creating a new column with the same value repeated is as easy as using square brackets containing a string (text enclosed by quotation marks)."
]
},
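A one-line sketch, with a hypothetical column name:

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 12.0]})
df["tax_rate"] = 0.2  # one value, repeated down the whole new column
```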
{
@@ -577,7 +577,7 @@
"id": "10792ddd",
"metadata": {},
"source": [
"Very often, you will want to create a new column that is the result of an operation on existing columns. There are a couple of ways to do this. The 'stand-alone' method works in a similar way to what we've just seen except that we refer to the dataframe on the right-hand side of the assignment statement too:"
"Very often, you will want to create a new column that is the result of an operation on existing columns. There are a couple of ways to do this. The 'stand-alone' method works in a similar way to what we've just seen except that we refer to the data frame on the right-hand side of the assignment statement too:"
]
},
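The 'stand-alone' method might look like this (hypothetical columns):

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 12.0], "quantity": [3, 2]})
# the data frame appears on both sides of the assignment
df["total"] = df["price"] * df["quantity"]
```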
{
@@ -902,7 +902,7 @@
"source": [
"## Review of How to Access Rows, Columns, and Values\n",
"\n",
"With all of these different ways to access values in data frames, it can get confusing. These are the different ways to get the first column of a dataframe (when that first column is called `column` and the dataframe is `df`):\n",
"With all of these different ways to access values in data frames, it can get confusing. These are the different ways to get the first column of a data frame (when that first column is called `column` and the data frame is `df`):\n",
"\n",
"- `df.column`\n",
"- `df[\"column\"]`\n",
@@ -921,7 +921,7 @@
"- `df.iloc[0, 0]`\n",
"- `df.loc[\"row\", \"column\"]`\n",
"\n",
"In the above examples, square brackets are instructions about *where* to grab bits from the data frame. They are a bit like an address system for values within a dataframe. Square brackets *also* denote lists though. So if you want to select *multiple* columns or rows, you might see syntax like this:\n",
"In the above examples, square brackets are instructions about *where* to grab bits from the data frame. They are a bit like an address system for values within a data frame. Square brackets *also* denote lists though. So if you want to select *multiple* columns or rows, you might see syntax like this:\n",
"\n",
"`df.loc[[\"row0\", \"row1\"], [\"column0\", \"column2\"]]`\n",
"\n",
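A quick check that the different access styles really do return the same value, on a made-up single-column table:

```python
import pandas as pd

df = pd.DataFrame({"column": [1, 2]}, index=["row", "other"])

# all of these pick out the same first value
assert df.column.iloc[0] == 1
assert df["column"].iloc[0] == 1
assert df.loc["row", "column"] == 1
assert df.iloc[0, 0] == 1
```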
8 changes: 4 additions & 4 deletions databases.ipynb
@@ -42,7 +42,7 @@
"## Database Basics\n",
"\n",
"At the simplest level, you can think about a database as a collection of data frames, called **tables** in database terminology.\n",
"Like a **pandas** dataframe, a database table is a collection of named columns, where every value in the column is the same type.\n",
"Like a **pandas** data frame, a database table is a collection of named columns, where every value in the column is the same type.\n",
"There are three high level differences between data frames and database tables:\n",
"\n",
"- Database tables are stored on disk (ie on file) and can be arbitrarily large.\n",
@@ -113,7 +113,7 @@
"id": "2992b718",
"metadata": {},
"source": [
"Note that the output here is in the form a Python object called a tuple. If we wanted to put this into a **pandas** dataframe, we can just pass it straight in:"
"Note that the output here is in the form of a Python object called a tuple. If we wanted to put this into a **pandas** data frame, we can just pass it straight in:"
]
},
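A sketch of passing tuples straight into a data frame, using a throwaway in-memory SQLite database (table and values invented):

```python
import sqlite3
import pandas as pd

# throwaway in-memory database, purely for illustration
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (name TEXT, value REAL)")
con.executemany("INSERT INTO t VALUES (?, ?)", [("a", 1.0), ("b", 2.0)])

rows = con.execute("SELECT * FROM t").fetchall()  # list of tuples
df = pd.DataFrame(rows, columns=["name", "value"])  # tuples straight in
```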
{
@@ -424,7 +424,7 @@
"id": "2bb51bf6",
"metadata": {},
"source": [
"One nice feature of this is that the column names in SQL get passed straight to the column names in our dataframe.\n",
"One nice feature of this is that the column names in SQL get passed straight to the column names in our data frame.\n",
"\n",
"Now, when you're writing Python in Visual Studio Code (at least with the Python extensions installed), you get a lot of high quality syntax and auto-completion support. Extensions to the Python language also allow you to take a great deal of care over the types of variables that you are dealing with. Wouldn't it be nice to have all of that with SQL too (even when accessing it via Python)? The next two packages we'll look at provide that. Both make working with SQL databases from Python a lot easier and more productive."
]
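One way to see the column names carrying over, using `pd.read_sql_query` against a throwaway SQLite database (made-up table and values):

```python
import sqlite3
import pandas as pd

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (name TEXT, value REAL)")
con.executemany("INSERT INTO t VALUES (?, ?)", [("a", 1.0), ("b", 2.0)])

# the SQL column names become the data frame's column names
df = pd.read_sql_query("SELECT name, value FROM t", con)
```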
@@ -436,7 +436,7 @@
"source": [
"## SQL with **ibis**\n",
"\n",
"It's not exactly satisfactory to have to write out your SQL queries in text. What if we could create commands directly from **pandas** commands? You can't *quite* do that, but there's a package that gets you pretty close and it's called [**ibis**](https://ibis-project.org/). **ibis** is particularly useful when you are reading from a database and want to query it just like you would a **pandas** dataframe.\n",
"It's not exactly satisfactory to have to write out your SQL queries in text. What if we could create commands directly from **pandas** commands? You can't *quite* do that, but there's a package that gets you pretty close and it's called [**ibis**](https://ibis-project.org/). **ibis** is particularly useful when you are reading from a database and want to query it just like you would a **pandas** data frame.\n",
"\n",
"**Ibis** can connect to local databases (eg a SQLite database), server-based databases (eg Postgres), or cloud-based databases (eg Google's BigQuery). The syntax to make a connection is, for example, `ibis.bigquery.connect`.\n",
"\n",
12 changes: 6 additions & 6 deletions dates-and-times.ipynb
@@ -521,7 +521,7 @@
"id": "ce8c06d8",
"metadata": {},
"source": [
"Following the discussion of the previous chapter on timezones, you can also localise timezones directly in **pandas** dataframes:\n"
"Following the discussion of the previous chapter on timezones, you can also localise timezones directly in **pandas** data frames:\n"
]
},
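A minimal sketch of localising a data frame's datetime index (timestamps and timezone chosen arbitrarily):

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [1, 2]},
    index=pd.to_datetime(["2021-06-01 09:00", "2021-06-01 10:00"]),
)
# attach a timezone to the naive timestamps, keeping the wall-clock times
df.index = df.index.tz_localize("Europe/London")
```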
{
@@ -699,7 +699,7 @@
"source": [
"### Creating a datetime Index and Setting the Frequency\n",
"\n",
"For the subsequent parts, we'll set the datetime column to be the index of the dataframe. *This is the standard setup you will likely want to use when dealing with time series.*"
"For the subsequent parts, we'll set the datetime column to be the index of the data frame. *This is the standard setup you will likely want to use when dealing with time series.*"
]
},
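The standard setup might look like this (hypothetical column name and dates):

```python
import pandas as pd

df = pd.DataFrame(
    {"datetime": pd.to_datetime(["2021-01-01", "2021-01-02"]), "value": [1, 2]}
)
df = df.set_index("datetime")  # the datetime column becomes the index
```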
{
@@ -718,7 +718,7 @@
"id": "8d5017ed",
"metadata": {},
"source": [
"Now, if we look at the first few entries of the index of dataframe (a datetime index) using `head` as above, we'll see that the `freq=` parameter is set as `None`."
"Now, if we look at the first few entries of the index of the data frame (a datetime index) using `head` as above, we'll see that the `freq=` parameter is set as `None`."
]
},
{
@@ -736,7 +736,7 @@
"id": "3bd0fe54",
"metadata": {},
"source": [
"This can be set for the whole dataframe using the `asfreq` function:"
"This can be set for the whole data frame using the `asfreq` function:"
]
},
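A sketch of declaring the frequency with `asfreq` (invented daily data):

```python
import pandas as pd

# daily observations, but the index starts out with freq=None
df = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
)
assert df.index.freq is None
df = df.asfreq("D")  # declare the daily frequency explicitly
```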
{
@@ -789,7 +789,7 @@
"source": [
"## Making Quick Time Series Plots\n",
"\n",
"Having managed to put your time series into a dataframe, perhaps converting a column of type string into a colume of type datetime in the process, you often just want to see the thing! We can achieve this using the `plot` command, as long as we have a datetime index.\n"
"Having managed to put your time series into a data frame, perhaps converting a column of type string into a column of type datetime in the process, you often just want to see the thing! We can achieve this using the `plot` command, as long as we have a datetime index.\n"
]
},
{
@@ -813,7 +813,7 @@
"\n",
"### Resampling\n",
"\n",
"Quite frequently, there is a situation in which one would like to change the frequency of a given time series. A time index-based dataframe makes this easy via the `resample` function. `resample` must be told *how* you'd like to resample the data, for example via the mean or median. Here's an example resampling the monthly data to annual and taking the mean:"
"Quite frequently, there is a situation in which one would like to change the frequency of a given time series. A time index-based data frame makes this easy via the `resample` function. `resample` must be told *how* you'd like to resample the data, for example via the mean or median. Here's an example resampling the monthly data to annual and taking the mean:"
]
},
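A sketch of monthly-to-annual resampling via the mean (made-up data; the `"YS"` alias for year-start is one choice of annual frequency, and aliases vary slightly between pandas versions):

```python
import pandas as pd

# two years of made-up monthly data
idx = pd.date_range("2020-01-01", periods=24, freq="MS")
df = pd.DataFrame({"value": range(24)}, index=idx)
annual = df.resample("YS").mean()  # one row per year, aggregated by the mean
```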
{