From 67b013041b857f2f6fe5c67074a9846577c62785 Mon Sep 17 00:00:00 2001 From: johncfrauen-lc101 Date: Sun, 22 Sep 2024 18:32:07 -0600 Subject: [PATCH] updating book content for readability --- content/data-visualization/_index.md | 2 +- .../reading/chart-styles/_index.md | 16 ++++++-- .../reading/viz-best-practices/_index.md | 38 ++++++++++++++----- .../reading/vizzes-with-python/_index.md | 14 +++++-- 4 files changed, 52 insertions(+), 18 deletions(-) diff --git a/content/data-visualization/_index.md b/content/data-visualization/_index.md index f523f8f5..907a5a00 100644 --- a/content/data-visualization/_index.md +++ b/content/data-visualization/_index.md @@ -13,7 +13,7 @@ hidden = false Upon completing all the content in this chapter, you should be able to do the following: 1. Talk about what to do and what not to do with data visualizations. -1. Differentiate between chart styles and pros and cons to each style. Scatter plots, histograms, bar charts, pie charts, etc. +1. Differentiate between chart styles and provide the pros and cons for each style. Scatter plots, histograms, bar charts, pie charts, etc. 1. Use Matplotlib and Seaborn to create visualizations. ## Key Terminology diff --git a/content/data-visualization/reading/chart-styles/_index.md b/content/data-visualization/reading/chart-styles/_index.md index 36aa8372..22e6386d 100644 --- a/content/data-visualization/reading/chart-styles/_index.md +++ b/content/data-visualization/reading/chart-styles/_index.md @@ -29,23 +29,31 @@ Once you figure out what category of chart you need, then you can dive deeper in {{< /rawhtml >}} -For now, we just want you to focus in on practicing making visualizations, trying out different chart styles, and developing your thought process around making an effective visualization. You do not need to memorize all the different chart styles at this time. 
+For now, we just want you to practice making visualizations, trying out different chart styles, and developing your thought process around making an effective visualization. You do not need to memorize all the different chart styles at this time. Let's dive deeper into a few of the chart styles that you may recognize from earlier chapters in this book. ## Bar Charts and Column Charts -Bar charts and column charts both fall into the category of comparison charts. A bar chart has the categories on the y-axis, so the bars are horizontal, whereas a column chart has the categories on the x-axis so the bars are vertical. While the difference in orientation might seem minor, it can make a huge difference in the readability of the chart. For example, if the categories have very long names or there are a lot of categories, your colleagues might find it easier to read those labels if the labels are displayed on the y-axis as opposed to the x-axis. Don't hesitate to switch between a column and a bar chart to find something that looks nice to you! +Bar charts and column charts both fall into the category of comparison charts. + +A bar chart has the categories on the y-axis, so the bars are horizontal, whereas a column chart has the categories on the x-axis so the bars are vertical. While the difference in orientation might seem minor, it can make a huge difference in the readability of the chart. + +For example, if the categories have very long names or there are a lot of categories, your colleagues might find it easier to read those labels if the labels are displayed on the y-axis as opposed to the x-axis. Don't hesitate to switch between a column and a bar chart to find something that looks nice to you! You want to be mindful of using a stacked column chart though because that falls into the category of composition charts. ## Scatterplots -Scatterplots are relationship plots. A scatterplot can help pinpoint what the relationship exactly is in a dataset. 
For example, we want to visualize the number of butterflies seen in an area and the number of gardeners signed up with a pollinator planting program. We might assume before we assemble our visualization that there is a relationship between the two so a scatterplot makes sense here. +Scatterplots are relationship plots. A scatterplot can help pinpoint what the relationship exactly is in a dataset. + +For example, we want to visualize the number of butterflies seen in an area and the number of gardeners signed up with a pollinator planting program. We might assume before we assemble our visualization that there is a relationship between the two so a scatterplot makes sense here. ## Histogram -A histogram is a type of distribution chart that looks a little bit like a bar chart. The key with a distribution chart is that we are trying to understand how the data is distributed. Is the data spread out or is it tightly packed? Histograms are oftentimes used in EDA to shed light on oddities in summary statistics and it is because they are distribution charts. If we wanted to plot the daily butterfly population over the course of the summer, a histogram would be a great choice! +A histogram is a type of distribution chart that looks a little bit like a bar chart. The key with a distribution chart is that we are trying to understand how the data is distributed. Is the data spread out or is it tightly packed? + +Histograms are oftentimes used in EDA to shed light on oddities in summary statistics and it is because they are distribution charts. If we wanted to plot the daily butterfly population over the course of the summer, a histogram would be a great choice! 
## Pie Charts diff --git a/content/data-visualization/reading/viz-best-practices/_index.md b/content/data-visualization/reading/viz-best-practices/_index.md index fa61777a..957735f8 100644 --- a/content/data-visualization/reading/viz-best-practices/_index.md +++ b/content/data-visualization/reading/viz-best-practices/_index.md @@ -5,27 +5,37 @@ draft = false weight = 1 +++ -So far, we have shown quite a few visualizations. As much as visualizations are an integral part of EDA and cleaning data, they are also an integral part of conveying your findings to project stakeholders and other business leaders. We need to now focus in on what we should be doing when creating visualizations to present to our colleagues. While we want our presentations to stand out, our visualizations should also be effective without us having to be there to explain what is going on. +So far, we have shown quite a few visualizations. As much as visualizations are an integral part of EDA and cleaning data, they are also an integral part of conveying your findings to project stakeholders and other business leaders. + +We now need to focus in on what we should be doing when creating visualizations to present to our colleagues. While we want our presentations to stand out, our visualizations should also be effective without us having to be there to explain what is going on. Since these visualizations are part of communicating our findings to our collaborators, even before we start choosing what chart we want to use, we can follow some best practices for all types of charts. ## Labeling Your Chart -No matter what chart style you choose, you need to properly label your charts. Labels include axes labels and chart titles. Your labels should be clear and concise. +No matter what chart style you choose, you need to properly label your charts. Labels include axes labels (X-axis, Y-axis, in rare instances Z-axis) and chart titles. Your labels should be clear and concise. 
Let's say that we work for a plant nursery and we want to visualize a tomato plant's growth over the course of six months. A poor chart title might be "Plant Growth". While accurate, since we are talking about a tomato plant, our colleagues at the nursery would not know it is a tomato plant unless you are present to explain. Keeping in mind that we want our visualizations to stand on their own, a better title would be "Tomato Plant Growth March 2024 - August 2024". -When it comes to labeling the axes, we want to make sure that we are clear and that we include units of measurement, if necessary. Let's assume we want to put the time on the x-axis and the height of the plant at that time on the y-axis. Before we start labeling our axes, we need to look at our data points for mapping the tomato plant's growth. Was a new measurement taken hourly, daily, weekly, or monthly? How is the data formatted? If it was daily, is the data in a date format, such as `03-16-2024`, or is it listed as number of days such as `15`? If we title the x-axis "Date", then our colleagues are going to have to study the x-axis much closer to figure out the answers to these questions. If we title it "Days of Growth Since 3/1/24", then our colleagues will be able to figure out that our data was measured in number of days as opposed to a date format and that the measurements were taken daily as opposed to the alternatives. Your colleagues should not have to study an axis of a visualization very closely to determine what is on that axis. The label should tell them what they are looking at. Furthermore, you may want to make adjustments to an axis so the points are legible and not all scrunched up and if you do, that may make it even harder for your colleagues to understand what is on that axis without a proper label. +When it comes to labeling the axes, we want to make sure that we are clear and that we include units of measurement, if necessary. 
Let's assume we want to put the time on the x-axis and the height of the plant at that time on the y-axis. + +Before we start labeling our axes, we need to look at our data points for mapping the tomato plant's growth. Was a new measurement taken hourly, daily, weekly, or monthly? How is the data formatted? If it was daily, is the data in a date format, such as `03-16-2024`, or is it listed as number of days such as `15`? + +If we title the x-axis "Date", then our colleagues are going to have to study the x-axis much closer to figure out the answers to these questions. If we title it "Days of Growth Since 3/1/24", then our colleagues will be able to figure out that our data was measured in number of days as opposed to a date format and that the measurements were taken daily as opposed to the alternatives. + +Your colleagues should not have to study an axis of a visualization very closely to determine what is on that axis. The label should tell them what they are looking at. Furthermore, you may want to make adjustments to an axis so the points are legible and not all scrunched up and if you do, that may make it even harder for your colleagues to understand what is on that axis without a proper label. + +With the x-axis labeled, we can turn our attention to the y-axis. We need to look at the data points again. -With the x-axis labeled, we can turn our attention to the y-axis. We need to look at the data points again. What measurement system was used to measure the plant? If we plot out the points and label the y-axis, "Height of Central Stem", our colleagues will not know if the height was measured in inches, centimeters, feet, or meters? We need to make sure we add units to our labels so everyone understands what they are looking at. When adding units of measurements to an axis label, you include it in parantheses at the end of the label: "Height of Central Stem (in)". +What measurement system was used to measure the plant? 
If we plot out the points and label the y-axis, "Height of Central Stem", our colleagues will not know if the height was measured in inches, centimeters, feet, or meters. We need to make sure we add units to our labels so everyone understands what they are looking at. When adding units of measurement to an axis label, you include them in parentheses at the end of the label: "Height of Central Stem (in)". ## Time to Add Color! -Next, you want to pay attention to the colors you choose. Sometimes, you may find that the default color options are perfectly fine, but other times you may want to use colors from your company's branding. Colors should be contrasting. Look at a color wheel, is the color you want to use on the opposite side of the color wheel? +Next, you want to pay attention to the colors you choose. Sometimes, you may find that the default color options are perfectly fine, but other times you may want to use colors from your company's branding. Colors should be contrasting. Look at a color wheel: is the color you want to use on the opposite side of the color wheel? -With your chart properly labelled, you can turn your attention to the colors used in your chart. In general, Google Sheets and Python libraries provide beautiful default color options for your visualization, however, during your career as an analyst, you may need to customize the color choices a bit. When you are working as an analyst, you may want to choose colors from your company's branding to ensure that your visualizations look sleek with the rest of your presentation. However, if you do not have a color palette provided for you, you may need to turn your attention to the color wheel. +With your chart properly labelled, you can turn your attention to the colors used in your chart. In general, Google Sheets and Python libraries provide beautiful default color options for your visualization. However, during your career as an analyst, you may need to customize the color choices a bit. 
When you are working as an analyst, you may want to choose colors from your company's branding to ensure that your visualizations look sleek with the rest of your presentation. However, if you do not have a color palette provided for you, you may need to turn your attention to the color wheel. -Humans have been refining the color wheel for centuries, ever since Isaac Newton mapped out the first one. The color wheel can be a helpful reference when you find yourself having to pick out your own color scheme for visualizations. Since most visualizations will be presented on-screen, we are going to focus on the RGB color wheel, however, you may have already seen the RYB color wheel commonly used for print. +Humans have been refining the color wheel for centuries, ever since Isaac Newton mapped out the first one. The color wheel can be a helpful reference when you find yourself having to pick out your own color scheme for visualizations. Since most visualizations will be presented on-screen, we are going to focus on the RGB color wheel. {{< rawhtml >}}
@@ -34,10 +44,20 @@ Humans have been refining the color wheel for centuries, ever since Isaac Newton
{{< /rawhtml >}} -The RGB color wheel has three primary colors: red, green, and blue. The three **primary colors** are equidistant from each other on the color wheel. When picking three colors that go well together, choosing three colors that are equidistant from each other on the wheel is a pretty safe bet. You may notice that another popular color scheme in the world of tech, cyan, yellow, and magenta, are also equidistant from each other on the RGB color wheel. These three are the **secondary colors** on the color wheel and are made by mixing the three primary colors. When picking two colors that go well together, you can choose **complementary colors**, which are two colors that reside directly across from each other on the color wheel. In the case of the RGB color wheel, magenta and green are complementary colors and as Lily Pulitzer has already demonstrated, those two colors look good together. You can find any number of wonderful combinations by using the color wheel. +The RGB color wheel has three primary colors: red, green, and blue. The three **primary colors** are equidistant from each other on the color wheel. When picking three colors that go well together, choosing three colors that are equidistant from each other on the wheel is a pretty safe bet. + +You may notice that the colors of another popular color scheme in the world of tech (cyan, yellow, and magenta) are also equidistant from each other on the RGB color wheel. These three are the **secondary colors** on the color wheel and are made by mixing the three primary colors. When picking two colors that go well together, you can choose **complementary colors**, which are two colors that reside directly across from each other on the color wheel. + +In the case of the RGB color wheel, magenta and green are complementary colors and, as Lilly Pulitzer has already demonstrated, those two colors look good together. You can find any number of wonderful combinations by using the color wheel. 
## Keep It Simple! -Finally, you should make sure that you are not putting too much info on one chart. The more info on one chart, the harder it is for someone to read that one chart. If you are trying to map out multiple lines on one line chart, the chart might become difficult to read once you add the third line and will possibly become too busy once you add a fifth. While you may find yourself wanting to minimize the number of visualizations you make, you should not do so at the expense of your visualizations' readability. For example, imagine that you are trying to visualize all quiz scores over time for a third grade class of 30 students. If you wanted to map out each student's scores as one line on the chart, that would be 30 lines on one chart, making for a very busy chart! On the other hand, if you wanted to map out each student's scores as their own chart, you would need to make 30 charts, making a lot of work for you. However, if the teacher has already divided the class into 6 pods of 5 students, then it might make sense to make a chart for each pod with a line for each student. +Finally, you should make sure that you are not putting too much info on one chart. + +The more info on one chart, the harder it is for someone to read that one chart. If you are trying to map out multiple lines on one line chart, the chart might become difficult to read once you add the third line and will possibly become too busy once you add a fourth. While you may find yourself wanting to minimize the number of visualizations you make, you should not do so at the expense of your visualizations' readability. + +For example, imagine that you are trying to visualize all quiz scores over time for a third grade class of 30 students. If you wanted to map out each student's scores as one line on the chart, that would be 30 lines on one chart, making for a very busy chart! 
On the other hand, if you wanted to map out each student's scores as their own chart, you would need to make 30 charts, making a lot of work for you. + +However, if the teacher has already divided the class into 6 pods of 5 students, then it might make sense to make a chart for each pod with a line for each student. By making sure that all your charts are properly labelled, that the colors are a good combination, and that you are making sure that it is not too busy, you are well on your way to making some beautiful visualizations. The only other best practice you want to follow when making visualizations is choosing the right chart style for the chart you are putting together. Let's dive into that next. \ No newline at end of file diff --git a/content/data-visualization/reading/vizzes-with-python/_index.md b/content/data-visualization/reading/vizzes-with-python/_index.md index dccbace4..731e9c55 100644 --- a/content/data-visualization/reading/vizzes-with-python/_index.md +++ b/content/data-visualization/reading/vizzes-with-python/_index.md @@ -5,7 +5,11 @@ draft = false weight = 3 +++ -Armed with visualization best practices, we can dive into Matplotlib and Seaborn, the two Python libraries oftentimes used with pandas to create beautiful visualizations. Seaborn is built on top of Matplotlib and can make more complex visualizations with customized color palettes. Matplotlib might show up more as a tool for EDA where the visualizations are mainly tools for us as analysts as opposed to visualizations to present to business leaders. +Armed with visualization best practices, we can dive into Matplotlib and Seaborn, the two Python libraries oftentimes used with pandas to create beautiful visualizations. + +Seaborn is built on top of Matplotlib and can make more complex visualizations with customized color palettes. 
+ +Matplotlib might show up more as a tool for EDA, where the visualizations are mainly tools for us as analysts as opposed to visualizations to present to business leaders. To import Matplotlib, we can use the following common syntax: @@ -21,7 +25,9 @@ import seaborn as sns {{% notice blue Note %}} -Throughout your career, you may notice that sometimes you need to use `plot.show()` to get your visualization to actually appear on the screen. This depends on what environment you are running your code in. If you are running your code in the terminal, you need to add `plot.show()`, however, if you are running your code in an environment like a Jupyter notebook, you do not need `plot.show()`. +Throughout your career, you may notice that sometimes you need to use `plot.show()` to get your visualization to actually appear on the screen. This depends on what environment you are running your code in. If you are running your code in the terminal, you need to add `plot.show()`. + +However, if you are running your code in an environment like a Jupyter notebook, you do not need `plot.show()`. {{% /notice %}} @@ -29,7 +35,7 @@ Throughout your career, you may notice that sometimes you need to use `plot.show ### Matplotlib -For column charts, you will need to use `bar()` method and for bar charts, you will need to use the `bar()` method. Both methods have similar parameters so once you master one, it will be simpler to master the other one. +For column charts, you will need to use the `bar()` method and for bar charts, you will need to use the `barh()` method. Both methods have similar parameters so once you master one, it will be simpler to master the other one. Let's revisit the plant nursery. @@ -85,7 +91,7 @@ sns.scatterplot(data=pollinator_program, x="pollinator_participants", "butterfli ### Matplotlib -Let's use histograms to butterfly population at the nursery. When we chart a histogram, we need to decide on the bin size. **Bins** are the columns in a histogram. 
If someone counted the number of butterflies every week for 8 weeks, then it makes sense for us to have 8 bins. +Let's use histograms to better understand the butterfly population at the nursery. When we chart a histogram, we need to decide on the bin size. **Bins** are the columns in a histogram. If someone counted the number of butterflies every week for 8 weeks, then it makes sense for us to have 8 bins. ```python {linenos=table} butterflies=[2,6,17,28,59,112,56,24]