
<section class="red" data-type="chapter">
<header>
  <div class="icon"><img src="../images/sections/05/checkbox.png" /></div>
  <p>Chapter 13</p>
  <h1>Graphing the Results of Checkbox Responses</h1>
  <p data-type="author">By Ellen Cooper</p>
</header>

<section data-type="sect1">
<p>This chapter focuses on checkbox questions, also called multiple-response questions, which let a respondent select more than one answer when more than one applies.</p>

<h2>Checkboxes Vs. Radio Buttons</h2>

<p>Let&rsquo;s say you&rsquo;re doing a survey and you&rsquo;re interested in what multimedia devices your respondents have used over the last six months. You would use a checkbox question if you wanted to find out all of the devices that people used over the six-month period. A radio button only allows respondents to select a single answer, so you could only use it to find out, for example, which one device a person used most often during that same six-month period. Each type of question has merit; which one you should use depends on the purpose of your question and how you are going to use the results.</p>

<h2>What a Checkbox Question Really Is</h2>

<p>So here&rsquo;s the most important thing to know about checkbox questions, and it&rsquo;s why you have to consider how you graph the results of checkbox questions differently than you do the results of other types of questions. Checkbox questions aren&rsquo;t really their own question type! They&rsquo;re actually just a shorthand way to write a <em>series</em> of yes/no questions. A respondent checks a box if an answer choice applies and leaves it blank if it doesn&rsquo;t.</p>

<p>We have the checkbox format because it makes surveys more streamlined and easier to understand. In the example below, we asked, &ldquo;Which of the following electronic devices have you used in the past 6 months? Please select all that apply.&rdquo; The premise behind the question is that it&rsquo;s likely that a respondent could use more than one electronic device over a 6-month period, such as a cell phone and a tablet.</p>

<p>If we were to pose this as a series of yes/no questions, it would read something like this:</p>

<table>
	<tbody>
		<tr>
			<th colspan="2">In the last 6 months, have you used a/an:</th>
		</tr>
		<tr>
			<td>Desktop PC?</td>
			<td>Y / N</td>
		</tr>
		<tr>
			<td>Desktop Mac?</td>
			<td>Y / N</td>
		</tr>
		<tr>
			<td>iPad?</td>
			<td>Y / N</td>
		</tr>
		<tr>
			<td>Tablet (other than an iPad)?</td>
			<td>Y / N</td>
		</tr>
		<tr>
			<td>Laptop (Mac or PC)?</td>
			<td>Y / N</td>
		</tr>
		<tr>
			<td>Cell phone?</td>
			<td>Y / N</td>
		</tr>
	</tbody>
</table>

<p>With the checkbox question, survey respondents only need to check the answers that apply to them, while in a series of yes/no questions, they would need to respond to every question, even if all their answers were &ldquo;No&rdquo;. With a checkbox question, you can simply provide a &ldquo;None&rdquo; option at the bottom of your choice list to handle this situation. When several yes/no questions are related, checkbox questions also prevent repetition of instructions, since all the questions are grouped into one.</p>

<p>These changes can help improve survey readability, flow, length, and overall response rates. However, if you want to handle the resulting data correctly, it is very important to remember that the underlying structure of a checkbox question is actually a series of dichotomous (yes/no) questions.</p>

<h2>How Checkbox Answers are Received</h2>

<p>How your results or raw data are compiled will, of course, depend on the program you are using to design and distribute your survey. One of the more common formats is shown in the table below; this particular data structure reflects how a checkbox question serves as a quick way to represent a series of yes/no questions. A &ldquo;1&rdquo; is shown when a device was selected and a &ldquo;0&rdquo; if a device was not selected.</p>

<table>
	<tbody>
		<tr>
			<th>Date</th>
			<th>Q1_PC</th>
			<th>Q1_Mac</th>
			<th>Q1_Tablet</th>
			<th>Q1_iPad</th>
			<th>Q1_Laptop</th>
			<th>Q1_Cellphone</th>
			<th>Q1_None</th>
		</tr>
		<tr>
			<td>10/02/2013</td>
			<td>1</td>
			<td>0</td>
			<td>1</td>
			<td>0</td>
			<td>0</td>
			<td>1</td>
			<td>0</td>
		</tr>
		<tr>
			<td>10/01/2013</td>
			<td>0</td>
			<td>1</td>
			<td>0</td>
			<td>1</td>
			<td>1</td>
			<td>1</td>
			<td>0</td>
		</tr>
		<tr>
			<td>09/30/2013</td>
			<td>1</td>
			<td>0</td>
			<td>0</td>
			<td>1</td>
			<td>0</td>
			<td>1</td>
			<td>0</td>
		</tr>
		<tr>
			<td>09/30/2013</td>
			<td>1</td>
			<td>0</td>
			<td>1</td>
			<td>0</td>
			<td>0</td>
			<td>1</td>
			<td>0</td>
		</tr>
		<tr>
			<td>09/30/2013</td>
			<td>0</td>
			<td>1</td>
			<td>0</td>
			<td>0</td>
			<td>1</td>
			<td>1</td>
			<td>0</td>
		</tr>
		<tr>
			<td>09/30/2013</td>
			<td>0</td>
			<td>1</td>
			<td>1</td>
			<td>0</td>
			<td>0</td>
			<td>1</td>
			<td>0</td>
		</tr>
		<tr>
			<td>09/30/2013</td>
			<td>1</td>
			<td>0</td>
			<td>0</td>
			<td>0</td>
			<td>1</td>
			<td>1</td>
			<td>0</td>
		</tr>
		<tr>
			<td>09/27/2013</td>
			<td>1</td>
			<td>0</td>
			<td>0</td>
			<td>0</td>
			<td>1</td>
			<td>1</td>
			<td>0</td>
		</tr>
		<tr>
			<td>09/26/2013</td>
			<td>1</td>
			<td>0</td>
			<td>0</td>
			<td>0</td>
			<td>1</td>
			<td>1</td>
			<td>0</td>
		</tr>
		<tr>
			<td>09/26/2013</td>
			<td>0</td>
			<td>0</td>
			<td>0</td>
			<td>0</td>
			<td>0</td>
			<td>0</td>
			<td>1</td>
		</tr>
		<tr>
			<td>Total</td>
			<td>6</td>
			<td>3</td>
			<td>3</td>
			<td>2</td>
			<td>5</td>
			<td>9</td>
			<td>1</td>
		</tr>
	</tbody>
</table>
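<p>As a minimal sketch of how the totals row in the table above comes about, the snippet below sums each 0/1 column. The names <code>COLUMNS</code>, <code>ROWS</code>, and <code>column_totals</code> are illustrative assumptions, not part of any particular survey tool&rsquo;s export.</p>

```python
# Indicator (0/1) rows copied from the table above; one row per respondent.
COLUMNS = ["Q1_PC", "Q1_Mac", "Q1_Tablet", "Q1_iPad",
           "Q1_Laptop", "Q1_Cellphone", "Q1_None"]
ROWS = [
    [1, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 1, 1, 1, 0],
    [1, 0, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 1, 0],
    [1, 0, 0, 0, 1, 1, 0],
    [1, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 1],
]


def column_totals(columns, rows):
    """Sum each 0/1 column: the number of respondents who checked that box."""
    return {name: sum(row[i] for row in rows) for i, name in enumerate(columns)}


totals = column_totals(COLUMNS, ROWS)
for name, count in totals.items():
    print(f"{name}: {count}")
```

<p>Because each column is its own yes/no question, each total is counted independently of the others.</p>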

<p>You might also receive results like this:</p>

<table>
	<tbody>
		<tr>
			<th>Date</th>
			<th>Response</th>
		</tr>
		<tr>
			<td>10/02/2013</td>
			<td>PC, Tablet, Cellphone</td>
		</tr>
		<tr>
			<td>10/01/2013</td>
			<td>Mac, iPad, Laptop, Cellphone</td>
		</tr>
		<tr>
			<td>09/30/2013</td>
			<td>PC, iPad, Cellphone</td>
		</tr>
		<tr>
			<td>09/30/2013</td>
			<td>PC, Tablet, Cellphone</td>
		</tr>
		<tr>
			<td>09/30/2013</td>
			<td>Mac, Laptop, Cellphone</td>
		</tr>
		<tr>
			<td>09/30/2013</td>
			<td>Mac, Tablet, Cellphone</td>
		</tr>
		<tr>
			<td>09/30/2013</td>
			<td>PC, Laptop, Cellphone</td>
		</tr>
		<tr>
			<td>09/27/2013</td>
			<td>PC, Laptop, Cellphone</td>
		</tr>
		<tr>
			<td>09/26/2013</td>
			<td>PC, Laptop, Cellphone</td>
		</tr>
		<tr>
			<td>09/26/2013</td>
			<td>None</td>
		</tr>
	</tbody>
</table>

<p>Or like this:</p>

<table>
	<tbody>
		<tr>
			<th>Date</th>
			<th>Q1_PC</th>
			<th>Q1_Mac</th>
			<th>Q1_Tablet</th>
			<th>Q1_iPad</th>
			<th>Q1_Laptop</th>
			<th>Q1_Cellphone</th>
			<th>Q1_None</th>
		</tr>
		<tr>
			<td>10/02/2013</td>
			<td>Q1_PC</td>
			<td>&nbsp;</td>
			<td>Q1_Tablet</td>
			<td>&nbsp;</td>
			<td>&nbsp;</td>
			<td>Q1_Cellphone</td>
			<td>&nbsp;</td>
		</tr>
		<tr>
			<td>10/01/2013</td>
			<td>&nbsp;</td>
			<td>Q1_Mac</td>
			<td>&nbsp;</td>
			<td>Q1_iPad</td>
			<td>Q1_Laptop</td>
			<td>Q1_Cellphone</td>
			<td>&nbsp;</td>
		</tr>
		<tr>
			<td>09/30/2013</td>
			<td>Q1_PC</td>
			<td>&nbsp;</td>
			<td>&nbsp;</td>
			<td>Q1_iPad</td>
			<td>&nbsp;</td>
			<td>Q1_Cellphone</td>
			<td>&nbsp;</td>
		</tr>
		<tr>
			<td>09/30/2013</td>
			<td>Q1_PC</td>
			<td>&nbsp;</td>
			<td>Q1_Tablet</td>
			<td>&nbsp;</td>
			<td>&nbsp;</td>
			<td>Q1_Cellphone</td>
			<td>&nbsp;</td>
		</tr>
		<tr>
			<td>09/30/2013</td>
			<td>&nbsp;</td>
			<td>Q1_Mac</td>
			<td>&nbsp;</td>
			<td>&nbsp;</td>
			<td>Q1_Laptop</td>
			<td>Q1_Cellphone</td>
			<td>&nbsp;</td>
		</tr>
		<tr>
			<td>09/30/2013</td>
			<td>&nbsp;</td>
			<td>Q1_Mac</td>
			<td>Q1_Tablet</td>
			<td>&nbsp;</td>
			<td>&nbsp;</td>
			<td>Q1_Cellphone</td>
			<td>&nbsp;</td>
		</tr>
		<tr>
			<td>09/30/2013</td>
			<td>Q1_PC</td>
			<td>&nbsp;</td>
			<td>&nbsp;</td>
			<td>&nbsp;</td>
			<td>Q1_Laptop</td>
			<td>Q1_Cellphone</td>
			<td>&nbsp;</td>
		</tr>
		<tr>
			<td>09/27/2013</td>
			<td>Q1_PC</td>
			<td>&nbsp;</td>
			<td>&nbsp;</td>
			<td>&nbsp;</td>
			<td>Q1_Laptop</td>
			<td>Q1_Cellphone</td>
			<td>&nbsp;</td>
		</tr>
		<tr>
			<td>09/26/2013</td>
			<td>Q1_PC</td>
			<td>&nbsp;</td>
			<td>&nbsp;</td>
			<td>&nbsp;</td>
			<td>Q1_Laptop</td>
			<td>Q1_Cellphone</td>
			<td>&nbsp;</td>
		</tr>
		<tr>
			<td>09/26/2013</td>
			<td>&nbsp;</td>
			<td>&nbsp;</td>
			<td>&nbsp;</td>
			<td>&nbsp;</td>
			<td>&nbsp;</td>
			<td>&nbsp;</td>
			<td>Q1_None</td>
		</tr>
	</tbody>
</table>

<p>All three of the above examples represent the same answers, but they&rsquo;re formatted in different ways. Since different survey collection tools format checkbox responses in different ways, you may need to reformat your data to match the specific format required by the visualization software you are using.</p>
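<p>If your visualization software expects the 0/1 layout, the other two layouts can be converted with a few lines of code. The sketch below is one possible approach, written under two assumptions: the answer labels match those in the tables above, and no label contains a comma. The function names <code>to_indicators</code> and <code>from_labels</code> are hypothetical.</p>

```python
# Answer options in the same order as the indicator columns shown earlier.
OPTIONS = ["PC", "Mac", "Tablet", "iPad", "Laptop", "Cellphone", "None"]


def to_indicators(response, options=OPTIONS):
    """Turn a comma-separated response such as 'PC, Tablet, Cellphone'
    into a 0/1 list aligned with `options`."""
    chosen = {item.strip() for item in response.split(",") if item.strip()}
    unknown = chosen - set(options)
    if unknown:
        # Fail loudly rather than silently dropping an unexpected answer.
        raise ValueError(f"unrecognized answer(s): {sorted(unknown)}")
    return [1 if option in chosen else 0 for option in options]


def from_labels(cells):
    """Turn label-or-blank cells (e.g. 'Q1_PC', '') into a 0/1 list:
    any non-blank cell counts as a checked box."""
    return [1 if cell.strip() else 0 for cell in cells]


print(to_indicators("PC, Tablet, Cellphone"))
print(from_labels(["Q1_PC", "", "Q1_Tablet", "", "", "Q1_Cellphone", ""]))
```

<p>Both calls describe the first respondent in the tables above, so they should produce the same row of ones and zeros.</p>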

<p>Let&rsquo;s take a look at a summary of possible responses to the checkbox question posed above.</p>

<table class="tableizer-table">
	<tbody>
		<tr class="tableizer-firstrow">
			<th>Table 1 Electronic Devices Used</th>
			<th>Total</th>
		</tr>
		<tr>
			<td>PC</td>
			<td>421 (84%)</td>
		</tr>
		<tr>
			<td>Mac (desktop)</td>
			<td>300 (60%)</td>
		</tr>
		<tr>
			<td>Tablet (any kind)</td>
			<td>285 (57%)</td>
		</tr>
		<tr>
			<td>iPad</td>
			<td>185 (37%)</td>
		</tr>
		<tr>
			<td>Laptop</td>
			<td>200 (40%)</td>
		</tr>
		<tr>
			<td>Cell phone (any kind)</td>
			<td>450 (90%)</td>
		</tr>
	</tbody>
</table>

<p>You may notice that the total number of responses (1,841) is greater than the number of people who took the survey (N=500)! Why? It&rsquo;s because a checkbox question is really a series of yes/no questions rolled into one. The maximum number of boxes that can be checked in a checkbox question is (number of &ldquo;real&rdquo; answer options) &times; (number of respondents). (Here, a &ldquo;real&rdquo; answer option means one that isn&rsquo;t &ldquo;None,&rdquo; &ldquo;N/A,&rdquo; or &ldquo;Prefer not to answer,&rdquo; since selecting one of those options would prevent a person from choosing any other answer in addition to it.) For this survey, there were 6 device options (aside from &ldquo;None&rdquo;) that a person could select, and 500 people answered the survey. So the total number of boxes that could have been checked during the survey was 3,000, not just 500.</p>
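<p>The arithmetic in this paragraph can be checked with a short sketch. The counts come from Table 1; the variable names are illustrative assumptions.</p>

```python
# Counts from Table 1 (number of respondents who checked each device).
COUNTS = {
    "PC": 421,
    "Mac (desktop)": 300,
    "Tablet (any kind)": 285,
    "iPad": 185,
    "Laptop": 200,
    "Cell phone (any kind)": 450,
}
RESPONDENTS = 500

# Total boxes actually checked across all respondents.
total_mentions = sum(COUNTS.values())

# Upper bound: every respondent checks every "real" option.
max_possible = len(COUNTS) * RESPONDENTS

# Percentages based on respondents; these need not add up to 100.
pct_of_respondents = {
    name: round(100 * n / RESPONDENTS) for name, n in COUNTS.items()
}

print(total_mentions, max_possible)
print(pct_of_respondents)
print(sum(pct_of_respondents.values()))
```

<p>The last line adds the respondent-based percentages together, which makes it easy to see how far past 100% they go.</p>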

<h2>Displaying Your Results</h2>

<p>Since the total number of responses is greater than the number of respondents, you need to use some caution when creating graphs based on these data. There are a few different ways to illustrate your results, depending on what your overall question of interest is.</p> 

<h3>Bar Charts</h3>

<p>One way is to construct a bar graph and base the percentages on the number of respondents that selected each answer choice, as in the graph below. Clearly, cellphones (90%) and PCs (84%) are the most commonly cited electronic devices used in the past six months.</p>

<figure><img alt="Electronic devices used" src="../images/sections/05/electronic-devices-used-1a.png" /></figure>

<p>However, the fact that the percentages add up to more than 100% can be unsettling for some. An alternative way to illustrate the results is to base the percentages on the total mentions (1,841). Analyzing results based on mentions is useful when you want the percentages to total 100%.</p>

<figure><img alt="Electronic devices used" src="../images/sections/05/electronic-devices-used-1b.png" /></figure>

<p>Keep in mind that this way of displaying data is based on the number of mentions of a device, not the number of consumers who use that device. While you may be tempted to say, &ldquo;24% of consumers used a cellphone in the past six months,&rdquo; the bar chart above isn&rsquo;t actually displaying that information.</p>

<p>Rather than the number of respondents (500), the chart shows the number of responses (1,841) to our survey on electronic devices. So, it would be better to say, &ldquo;Based on all devices mentioned, cellphones were mentioned approximately one-quarter (24%) of the time, followed closely by PCs (23%).&rdquo; This percentage represents the number of cellphone mentions (450) out of the total mentions (1,841) and accurately reflects the information on display.</p>
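<p>To make the distinction concrete, here is a small sketch that computes each device&rsquo;s share of total mentions from the Table 1 counts. The function name <code>share_of_mentions</code> is a hypothetical helper, not something provided by a survey or charting tool.</p>

```python
# Counts from Table 1 (number of respondents who checked each device).
COUNTS = {
    "PC": 421,
    "Mac (desktop)": 300,
    "Tablet (any kind)": 285,
    "iPad": 185,
    "Laptop": 200,
    "Cell phone (any kind)": 450,
}


def share_of_mentions(counts):
    """Return each option's percentage of all checked boxes (mentions)."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


shares = share_of_mentions(COUNTS)
for name, pct in shares.items():
    print(f"{name}: {pct:.0f}%")
```

<p>The unrounded shares total 100%, although percentages rounded for display may be off by a point. The denominator here is mentions, so none of these figures says what share of people used a device.</p>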

<p>Depending on your question of interest, you can also group your data. Maybe you&rsquo;re more interested in reporting how many devices people used than in exactly which devices they were. You could make a column chart like the one below.</p>

<figure><img alt="Percent by grouping" src="../images/sections/05/devices-reported.png" /></figure>
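<p>Grouping by the number of devices means counting the checked boxes in each respondent&rsquo;s row. The sketch below does this for the ten-row sample shown earlier in the chapter, not for the 500-respondent data behind the chart, and the names <code>DEVICE_ROWS</code> and <code>devices_per_respondent</code> are illustrative assumptions.</p>

```python
from collections import Counter

# Device columns only (Q1_None is left out), one row per respondent,
# copied from the ten-row sample table earlier in the chapter.
DEVICE_ROWS = [
    [1, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 1],
    [1, 0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1, 1],
    [0, 1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],  # the respondent who chose "None"
]


def devices_per_respondent(rows):
    """Count how many respondents checked 0, 1, 2, ... device boxes."""
    return Counter(sum(row) for row in rows)


groups = devices_per_respondent(DEVICE_ROWS)

# Percentage of respondents in each group.
shares = {k: 100 * n / len(DEVICE_ROWS) for k, n in sorted(groups.items())}
print(shares)
```

<p>Each respondent falls into exactly one group, so these percentages are based on respondents and still add up to 100%.</p>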

<div data-type="warning"><h3>Warning about pie charts and checkbox questions</h3>

<p>Don&rsquo;t use pie charts if you&rsquo;re basing your percentages on the number of respondents that selected each answer choice! Pie charts are used to represent part-to-whole relationships and the total percentage of all the groups has to equal 100%. Since the possible sum of the percentages is greater than 100% when you base these calculations on the number of respondents, pie charts are an incorrect choice for displaying these results.</p>
</div>

<h3>Over Time</h3>

<p>If your checkbox responses have been collected over a period of time, say 2009&ndash;2013, you could display the responses in a line chart as shown below.</p>

<figure><img alt="Line chart for checkboxes" src="../images/sections/05/checkbox-line.png" /></figure>

<h2>Attitudinal Measurements</h2>

<p>So far, we&rsquo;ve been looking at the use of checkbox questions to gather data on basic counts (e.g. electronic devices used). Checkbox responses can also be used to assess agreement with <a class="glossterm" href="glossary01.html#statement-attitudinal" target="_blank">attitudinal statements</a>. The graph below shows attitudes of homeowners towards owning their home. In this case, since the statistic of interest is what percentage of homeowners agree with each statement, it is probably best to keep the graph as is, with the total exceeding 100%.</p>

<figure><img alt="Home ownership bar graph" src="../images/sections/05/home-ownership.png" /></figure>
</section>
</section>