Topic Model of Deported Migrant Children's Narratives
This script is part of a larger joint research project studying the consequences of United States immigration and refugee policies for children. Researchers interviewed 203 children who had been deported to Mexico; local graduate students conducted the interviews in Spanish. Questions covered demographics, immigration, background, experience with authorities, and other topics. The children could also give open-ended responses about their future aspirations or anything else they wished to share. These open-ended responses are the data analyzed in this project.
Latent Dirichlet Allocation (LDA), an unsupervised machine learning algorithm, was used to identify clusters of words in the survey responses. Researchers then interpreted these clusters as “topics” shared by the deported children. The scripts analyze the responses and create three outputs: a text file listing the five topics and their top 50 words, a frequency chart for the topic model, and a word cloud image file.
*The outputs of the analysis are provided, but the original survey responses have been excluded for privacy reasons.
Andreas Mueller. word_cloud, https://github.com/amueller/word_cloud, (2012)
Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Édouard Duchesnay. Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research, 12, 2825-2830 (2011) http://jmlr.org/papers/v12/pedregosa11a.html
John D. Hunter. Matplotlib: A 2D Graphics Environment, Computing in Science & Engineering, 9, 90-95 (2007), DOI:10.1109/MCSE.2007.55 http://aip.scitation.org/doi/abs/10.1109/MCSE.2007.55
lda developers. lda: Topic modeling with latent Dirichlet allocation, http://pythonhosted.org/lda/index.html#, (2014)
Stéfan van der Walt, S. Chris Colbert and Gaël Varoquaux. The NumPy Array: A Structure for Efficient Numerical Computation, Computing in Science & Engineering, 13, 22-30 (2011), DOI:10.1109/MCSE.2011.37 http://aip.scitation.org/doi/abs/10.1109/MCSE.2011.37
Steven Bird, Ewan Klein, Edward Loper. NLTK: Natural Language Processing with Python, O’Reilly Media, (2009), http://www.nltk.org/