From ee9cf8fd448603d0cd7442068f6eb01f870e4ebf Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Moritz=20M=C3=A4hr?=
Date: Wed, 18 Dec 2019 13:40:32 +0100
Subject: [PATCH] zenodo dataset added and urls to ilo fixed #258

---
 lessons/working-with-batches-of-pdf-files.md | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/lessons/working-with-batches-of-pdf-files.md b/lessons/working-with-batches-of-pdf-files.md
index 15f3ecea5..f9190381e 100644
--- a/lessons/working-with-batches-of-pdf-files.md
+++ b/lessons/working-with-batches-of-pdf-files.md
@@ -121,7 +121,7 @@ If you are on a Mac and receive an error message that the file is from an “uni
 
 ## Data
 
-Throughout this lesson you will work with historical documents from the [1st International Conference of Labour Statisticians](https://www.ilo.org/global/statistics-and-databases/meetings-and-events/international-conference-of-labour-statisticians/WCMS_221512/lang--en/index.htm) from 1923. The data of all past conferences is provided by the [International Labour Organization (ILO)](https://www.ilo.org/global/about-the-ilo/history/lang--en/index.htm) and is [publicly available](https://www.ilo.org/public/libdoc/ilo/ILO-SR/).
+Throughout this lesson you will work with historical documents from the [1st International Conference of Labour Statisticians](https://ilostat.ilo.org/resources/methods/icls/icls-documents/) from 1923. The data of all past conferences is provided by the [International Labour Organization (ILO)](https://www.ilo.org/global/about-the-ilo/history/lang--en/index.htm) and is [publicly available](https://www.ilo.org/public/libdoc/ilo/ILO-SR/).
 
 To make it easier for you to navigate through the file system and create folders, here are some basic commands of the Bash Command Line:
 
@@ -163,7 +163,7 @@ Always make a backup copy of your data before using the commands in this course.
 
 # Assessing your PDF(s)
 
-In order to make this lesson as realistic as possible, you will be guided by a concrete historical case study. The study draws on the extensive collection of the [International Labour Organization (ILO)](https://www.ilo.org/global/about-the-ilo/history/lang--en/index.htm), in particular the sources of the [1st International Conference of Labour Statisticians](https://www.ilo.org/global/statistics-and-databases/meetings-and-events/international-conference-of-labour-statisticians/WCMS_221512/lang--en/index.htm).
+In order to make this lesson as realistic as possible, you will be guided by a concrete historical case study. The study draws on the extensive collection of the [International Labour Organization (ILO)](https://ilostat.ilo.org/resources/methods/icls/icls-documents/), in particular the sources of the 1st International Conference of Labour Statisticians.
 
 You are interested in what topics were discussed by the labour statisticians. For this purpose you would like to analyze all available documents of this conference using Topic Modelling. This assumes that all documents are available in plain text.
 
@@ -284,6 +284,11 @@ Now that you have performed all the steps of the PDF processing on some examples
 3. Create the Topic Model.
 4. Evaluate the Topic Model.
 
+
+Both the download and the processing of the corpus are very time- and resource-consuming. At doi.org/10.5281/zenodo.3582736 you can download the collection as a ZIP file and go directly to step
+3.
+
+
 ### Download the corpus
 
 To avoid confusion create a new folder with `mkdir` and open it with `cd`.