<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<!--This page was produced by pandoc using a shell script. See http://wcm1.web.rice.edu/colophon.html for more information.-->
<meta name="author" content="Alyssa Anderson" />
<title>Using Voyant for Text Analysis | Digital History Methods</title>
<link rel="stylesheet" href="./bootstrap.css" type="text/css" />
<link rel="stylesheet" href="./main.css" type="text/css" />
<link rel="stylesheet" href="//fonts.googleapis.com/css?family=Crimson+Text:400,400italic,700,700italic" type="text/css" />
<link href='http://fonts.googleapis.com/css?family=Oswald:400,700' rel='stylesheet' type='text/css' />
</head>
<body>
<div class="container">
<div class="row">
<div class="span2"> </div>
<div class="span7">
<div class="header">
<h1><a href="index.html" style="color:#CD86C1">DIGITAL HISTORY METHODS</a></h1>
<div class="nav">
<a href="index.html" id="hover"><img src="./digital-runawayad.png" class="gray" /><p class="caption poptext">Home</p></a>
<a href="01-palladio.html" id="hover"><img src="./01-palladio.png" class="gray" /><p class="caption poptext">Mapping points</p></a>
<a href="02-voyant.html" id="hover"><img src="./02-voyant.png" /><p class="caption poptext">Text mining</p></a>
<a href="03-ner.html" id="hover"><img src="./03-ner.png" class="gray" /><p class="caption poptext">Finding locations</p></a>
<a href="04-mallet.html" id="hover"><img src="./04-mallet.png" class="gray" /><p class="caption poptext">Topic modelling</p></a>
<a href="05-twitterbot.html" id="hover"><img src="./05-twitterbot.png" class="gray" /><p class="caption poptext">Tweeting history</p></a>
</div>
</div>
<article>
<!--The title is produced by the pandoc template using the title block at the top of the markdown file-->
<h1>Using Voyant for Text Analysis</h1>
<!--The author is produced by the pandoc template using the title block at the top of the markdown file-->
<div id="dateline">
By Alyssa Anderson
</div>
<!--Begin content of the main markdown file for this page, which is processed by pandoc and output as html.-->
<p>Open with an embedded image from Voyant, and a <a href="http://voyant-tools.org/?corpus=1398028124350.4052&stopList=stop.en.taporware.txt">link to the full corpus of Texas ads</a>.</p>
<iframe width="0" height="0" src="http://voyant-tools.org/?corpus=1398028124350.4052&stopList=stop.en.taporware.txt"></iframe>
<h1 id="rationale">Rationale</h1>
<p>Digital text-mining tools can help researchers understand document collections that are prohibitively large for close reading. Our collection of runaway slave advertisements from Texas, Arkansas, and Mississippi totals over 2,500 individual ads. Reading the entire collection would be extremely time-consuming, and the consistently short, boilerplate format of runaway ads makes them difficult to tell apart. Ads from the three states begin to look practically indistinguishable, so close reading alone struggles to recognize where patterns break between the states without the assistance of computational data. This is where Voyant comes in. We hoped that the “distant-reading” capabilities of Voyant would pick up on larger trends in word usage that are not immediately apparent to the human eye.</p>
<p>Additionally, as we read through our initial collection of Texas runaway ads, it became clear that these ads tell us little about the perspectives of runaway slaves themselves. Information about the slave is always filtered through the attitudes and beliefs of the slaveholder. Working within the limitations of this category of primary source, we became increasingly interested in analyzing the language of the slaveholders. In what sort of discourse on slavery were these subscribers engaged, and were there any differences across state lines? Across time? Voyant’s efficient “reading” skills and visualization capabilities make it easy to spot differences between multiple corpora or trends across one corpus. For students like us, who have primarily worked with close reading in the past, digital text-mining tools such as Voyant offer another angle of analysis into a text.</p>
<h1 id="methodology">Methodology</h1>
<p><a href="http://voyant-tools.org/">Voyant</a> is a free, online text-analysis resource. Among its tools, it generates a word cloud of most frequent words, generates graphs of word frequency across the corpus, and lets you compare multiple documents. Once you have a text uploaded, you can play around within the Voyant “skin”, opening and closing different tools, or clicking on a particular word to see trends for that word specifically. It’s also possible to generate a link to the skin that can then be shared or embedded into a page, allowing others to play around with the data on their own.</p>
<p>Once we had cleaned and prepared our collection of ads for analysis, using Voyant was very straightforward. The Voyant platform is compatible with a wide range of document formats, including plain text, HTML, XML, PDF, RTF, and MS Word. Once in the Voyant skin, it is easy to play around with tools simultaneously, as selecting a word in the “Summary” tool, for example, will automatically pull it up in the “Word Trends” tool. Use the “Words in the Entire Corpus” tool to easily search and store key words into a collection of “favourites”. Select multiple words in “Words in the Entire Corpus” to analyze them side-by-side.</p>
<div class="figure">
<img src="https://cloud.githubusercontent.com/assets/6466141/2782288/92621190-cb1d-11e3-8ea8-54b94c7d1c9a.png" alt="voyant-screenshot" /><p class="caption">voyant-screenshot</p>
</div>
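<p>To make the idea behind a list of most frequent words more concrete, here is a minimal Python sketch of a frequency count that ignores a stop list of very common words. It is not how Voyant itself is implemented: the <code>STOPWORDS</code> set, the <code>tokenize</code> and <code>top_words</code> helpers, and the sample sentence are all invented for illustration.</p>

```python
import re
from collections import Counter

# Hypothetical stop list for illustration only; Voyant ships its own lists.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "from", "is", "for"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_words(text, n=5, stopwords=STOPWORDS):
    """Return the n most frequent non-stopword tokens with their raw counts."""
    counts = Counter(t for t in tokenize(text) if t not in stopwords)
    return counts.most_common(n)


# Invented sample text, not a transcription of a real advertisement.
sample = (
    "Ranaway from the subscriber a man named Tom. "
    "The subscriber will pay a reward of fifty dollars."
)

print(top_words(sample, n=3))
```

<p>Changing the stop list changes which words rise to the top, which is one reason the choice of stopwords deserves the same attention as the choice of corpus.</p>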
<p>There are certain decisions that have to be made before even uploading your documents into Voyant. Throughout the digital reading process, one of our primary questions has been how to split the corpus of ads. For example, uploading a single document of all ads from a single state is useful for looking at language data in aggregate for a single state. This screenshot shows the word “negro” selected from all the Texas ads uploaded as one document.</p>
<div class="figure">
<img src="https://cloud.githubusercontent.com/assets/6466141/2782332/48828cca-cb1e-11e3-8d7b-5bd9282937de.png" alt="texas-all-screenshot" /><p class="caption">texas-all-screenshot</p>
</div>
<p>Because we were careful to keep our ads in chronological order, the Word Trends tool also shows trends over time. However, you can also make the choice to manually split a corpus into separate documents by time increments to more easily track specific changes across the corpus. Or, for an example of how Voyant could be applied to a different kind of primary source, a single book could be split into separate chapters. This screenshot shows the Texas ads split by decade:</p>
<div class="figure">
<img src="https://cloud.githubusercontent.com/assets/6466141/2782338/5902bc8c-cb1e-11e3-8e8a-d0c3cab9d319.png" alt="texas-all-decade-split-screenshot" /><p class="caption">texas-all-decade-split-screenshot</p>
</div>
<p>At a glance, it is easy to see that the Word Trends graph looks very different depending on whether the corpus is segmented into 10 even parts by Voyant or split into 10-year increments. There is no single right way to split a corpus; it depends on what kind of analysis you want to focus on. It is good to keep in mind, though, that these decisions, made before the Voyant skin is even opened, will alter the appearance of the data.</p>
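<p>The effect of the splitting decision can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the <code>ADS</code> list is invented rather than drawn from our corpus, the helper names are hypothetical, and the counts are raw rather than the relative frequencies a charting tool might plot.</p>

```python
from collections import defaultdict

# Invented (year, text) pairs in chronological order; not real ads.
ADS = [
    (1836, "african man ranaway"),
    (1837, "african boy ranaway"),
    (1838, "mulatto man ranaway"),
    (1845, "mulatto boy ranaway"),
    (1851, "negro man ranaway"),
    (1852, "negro boy ranaway"),
    (1855, "negro man ranaway"),
    (1858, "african man ranaway"),
]


def count_word(texts, word):
    """Count exact-token occurrences of word across a list of texts."""
    return sum(text.lower().split().count(word) for text in texts)


def even_segments(ads, parts):
    """Split the ads into `parts` consecutive chunks of near-equal size."""
    size, extra = divmod(len(ads), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks each take one additional ad.
        end = start + size + int(i in range(extra))
        chunks.append([text for _, text in ads[start:end]])
        start = end
    return chunks


def decade_segments(ads):
    """Group the ads by decade (1830, 1840, ...), sorted by decade."""
    groups = defaultdict(list)
    for year, text in ads:
        groups[year // 10 * 10].append(text)
    return dict(sorted(groups.items()))


even = [count_word(chunk, "african") for chunk in even_segments(ADS, 4)]
by_decade = {
    decade: count_word(texts, "african")
    for decade, texts in decade_segments(ADS).items()
}

print(even)
print(by_decade)
```

<p>In this toy data the even split places the 1838 and 1845 ads in the same segment, while the decade split keeps them apart, so the same word produces differently shaped trends.</p>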
<p>Finally, certain corpus categories can be split by geographical location. This method of dividing a corpus is useful for comparing language trends across geographical lines. This screenshot shows the Texas corpus of ads split by the Houston area (<em>The Telegraph and Texas Register</em> 1836 to 1860) and the Austin area (<em>Texas Gazette</em> 1850 to 1860), and reveals that the word “negro” appears relatively more frequently in the Austin area.</p>
<div class="figure">
<img src="https://cloud.githubusercontent.com/assets/6466141/2782511/cc32c678-cb20-11e3-9729-9ff9e808fe37.png" alt="tx-split-screenshot" /><p class="caption">tx-split-screenshot</p>
</div>
<p>A few useful tips for using Voyant: To apply stopwords, click on the wrench icon in the upper right-hand corner to choose a pre-made list or add your own. To save a URL to the current Voyant skin, create an HTML link to embed, or download an image, click on the floppy disk icon.</p>
<p>A list of all Voyant tools, including some that are not pre-loaded into the basic Voyant skin, is available <a href="http://docs.voyant-tools.org/tools/">here</a>. For a tutorial on how to compare one Voyant skin corpus to a separate Voyant skin corpus, visit <a href="http://www.briancroxall.net/2012/07/18/comparing-corpora-in-voyant-tools/">this post</a> from Brian Croxall.</p>
<h1 id="conclusions">Conclusions</h1>
<p>Our primary findings in Voyant focused heavily on the use of racial and ethnic descriptors in the ads. In most runaway ads, the subscriber tends to give some description of the runaway’s complexion or racial status. We were interested in tracking variations in these terms across states.</p>
<p>This graph shows trends for the word “African” across the entire collection of Texas ads from 1836 to 1860. Over time, occurrences of this word decline until they eventually disappear. As a class, we speculated about potentially finding evidence of the Atlantic slave trade continuing in the early years of Texas. These trends would suggest that to be the case. Although the United States criminalized international slave trade in 1807, as an independent nation from 1836 to 1846, Texas was not subject to these restrictions. The heaviest use of the word “African” occurs from 1836 (the first year for which we have data) to 1838, and the word appears only sporadically after that. The latest mention of an African runaway in Texas occurs in 1857, for a slave 38 years old. Presumably, over time, the growing international abolition movement and the eventual annexation of Texas to the United States contributed to the disappearance of notices for runaway Africans.</p>
<div class="figure">
<img src="https://cloud.githubusercontent.com/assets/6466141/2796762/75804006-cc17-11e3-8170-4af6666883ec.png" alt="tx-african-screenshot" /><p class="caption">tx-african-screenshot</p>
</div>
<iframe width="395" height="560" src="http://voyant-tools.org/tool/TypeFrequenciesChart/?corpus=1398028860042.2957&docIdType=d1397967000233.76be2c7d-45ec-314d-199d-d2220e9971e2%3Aafrican&stopList=stop.en.taporware.txt&mode=document&limit=2&freqsMode=raw"></iframe>
<p>Additionally, “African” appears most frequently in Texas compared to the other states, and slightly more frequently in Mississippi than in Arkansas. This is consistent with our class’s suspicion from close reading that Texas had higher rates of Africans than the other states. We also speculated that Mississippi and Texas, with access to ocean ports, had higher rates of African slaves than landlocked Arkansas due to easier shipping trade routes.</p>
<div class="figure">
<img src="https://cloud.githubusercontent.com/assets/6466141/2796767/87ddfbc6-cc17-11e3-9d93-c56157d33125.png" alt="all-states-african-screenshot" /><p class="caption">all-states-african-screenshot</p>
</div>
<iframe width="395" height="560" src="http://voyant-tools.org/tool/TypeFrequenciesChart/?corpus=1398028124350.4052&stopList=stop.en.taporware.txt&type=african&mode=corpus"></iframe>
<p>Our explorations of the word “African” in Voyant exemplify the ways that digital text-mining tools can not only help confirm hypotheses from close reading but also raise previously unconsidered questions for further close reading. The peak in the graph’s early years encouraged us to read more closely into the history of the international slave trade and abolitionism, and how these relate to the Texas timeline and the position of Texas as an independent nation.</p>
<p>We also used Voyant to track variations in language descriptors by state. We found that the French word Griff(e) occurred much more frequently in Texas, followed by Mississippi, and not at all in Arkansas. (Just as a side note, this sometimes occurring “e” highlights the importance of checking for variations or abnormalities in spelling when conducting digital research.) Tracking the word “Griffe” alongside “Mulatto” reveals that while Texas and Mississippi have higher use of the word Griff(e), Arkansas has higher use of the word Mulatto. Additionally, Texas and Mississippi – the states where the French word “Griff” is used – also have higher occurrences of the word “French” suggesting a more significant presence of French people or the French language in these states. In Arkansas subscribers were more likely to prefer the term Mulatto to refer to someone of part white, part black ancestry, whereas in Texas and Mississippi they used both Mulatto and Griff(e).</p>
<div class="figure">
<img src="https://cloud.githubusercontent.com/assets/6466141/2796768/8de69852-cc17-11e3-9500-086826a53d52.png" alt="all-comparison-descriptors" /><p class="caption">all-comparison-descriptors</p>
</div>
<iframe width="395" height="560" src="http://voyant-tools.org/tool/TypeFrequenciesChart/?corpus=1398028124350.4052&stopList=stop.en.taporware.txt&type=mulatto&type=french&type=griff&type=griffe&mode=corpus"></iframe>
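<p>The point about spelling variants can be sketched in Python as well. The example below folds “Griff” and “Griffe” into one pattern and reports each descriptor per 1,000 words so that corpora of different sizes can be compared. Everything in it is an assumption made for illustration: the <code>CORPORA</code> snippets are invented, not taken from our ads, and <code>per_thousand</code> is a hypothetical helper rather than anything Voyant provides.</p>

```python
import re

# Invented snippets standing in for each state's ads; not real data.
CORPORA = {
    "Texas": "a griffe woman and a griff man and a mulatto boy",
    "Mississippi": "a griff man and a mulatto man",
    "Arkansas": "a mulatto man and a mulatto woman",
}

# One pattern per descriptor; the optional "e" matches both Griff and Griffe.
PATTERNS = {
    "griff(e)": re.compile(r"\bgriffe?\b", re.IGNORECASE),
    "mulatto": re.compile(r"\bmulatto\b", re.IGNORECASE),
}


def per_thousand(text, pattern):
    """Return occurrences of pattern per 1,000 word tokens in text."""
    total = len(re.findall(r"\w+", text))
    hits = len(pattern.findall(text))
    return 1000 * hits / total if total else 0.0


for state, text in CORPORA.items():
    rates = {
        name: round(per_thousand(text, pattern), 1)
        for name, pattern in PATTERNS.items()
    }
    print(state, rates)
```

<p>Searching for only one spelling would undercount the term in any corpus that uses both, which is why the variants are grouped under a single pattern here.</p>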
<p>Overall, Voyant has been a very useful tool for our historical project on Texas runaway ads. The majority of Voyant’s limitations are due to the fact that the software is still in beta. The platform can be jerky or slow to load, and it sometimes gets stuck loading. Scrolling through the tools is not always a smooth, straightforward process. Unfortunately, while the “favourites” feature is useful, the list can’t be saved for future use when generating a skin URL. Despite these hiccups, we recommend the tool for historians hoping to take a “distant reading” of their documents.</p>
<!--End content from the main markdown file for this page.-->
</article>
<p class="revision"><a href="https://github.com/ricedh/drafts/commits/master/02-voyant.md" title="version history on GitHub">revision history for this page </a></p>
<footer>
<!--Begin contents of _footer.html which are inserted using the include-after-body option in pandoc.-->
<!--Footer-->
<div class="splashthumb">
<a href="index.html"><img src="./digital-runawayad.png" /></a>
<p class="caption">Home</p>
</div>
<div class="splashthumb">
<a href="01-palladio.html"><img src="./01-palladio.png" /></a>
<p class="caption">Mapping points</p>
</div>
<div class="splashthumb">
<a href="02-voyant.html"><img src="./02-voyant.png" /></a>
<p class="caption">Text mining</p>
</div>
<div class="splashthumb">
<a href="03-ner.html"><img src="./03-ner.png" /></a>
<p class="caption">Finding locations</p>
</div>
<div class="splashthumb">
<a href="04-mallet.html"><img src="./04-mallet.png" /></a>
<p class="caption">Topic modelling</p>
</div>
<div class="splashthumb">
<a href="05-twitterbot.html"><img src="./05-twitterbot.png" /></a>
<p class="caption">Tweeting history</p>
</div>
<!--Statcounter-->
<script type="text/javascript">
var sc_project=2874620;
var sc_invisible=0;
var sc_partition=29;
var sc_security="cc89e61f";
</script>
<script type="text/javascript" src="http://www.statcounter.com/counter/counter_xhtml.js"></script>
<noscript><div class="statcounter"><a class="statcounter" href="http://www.statcounter.com/"><img class="statcounter" src="http://c30.statcounter.com/2874620/0/cc89e61f/0/" alt="invisible hit counter" /></a></div></noscript>
<!--End contents of _footer.html which are inserted using the include-after-body option in pandoc.-->
</footer>
<a rel="license" style="clear:both; display:block; padding-top:2em;" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by/4.0/88x31.png" /></a><br /><br />This work by <a xmlns:cc="http://creativecommons.org/ns#" href="http://ricedh.github.io" property="cc:attributionName" rel="cc:attributionURL">http://ricedh.github.io</a> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.<br />Permissions beyond the scope of this license may be available at <a xmlns:cc="http://creativecommons.org/ns#" href="http://github.com/ricedh/drafts" rel="cc:morePermissions">http://github.com/ricedh/drafts</a>.
</div>
<div class="span3"> </div>
</div>
</div>
<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>
</body>
</html>