Test corpus for Glossa (http://github.com/noklesta/glossa_svn) consisting of the English version of the EU constitution, copied from http://urd.let.rug.nl/tiedeman/OPUS/.
- A Glossa installation, of course. See http://github.com/noklesta/glossa_svn.
- The finished corpus is included in this repository and does not need to be re-created, but you can regenerate it by running the `create_corpus.rb` Ruby script (after setting `PATH` in the registry file to `corpus_data/TEST`). In that case you will need the nokogiri Ruby gem (`sudo gem install nokogiri`).
- Copy the `test` directory that is found in the `conf` directory to the location where you want to keep your configuration files. This location should be specified in the file `cgi-bin/glossa/paths.conf` in your Glossa installation. In other words, if the `conf` parameter given in `paths.conf` is `/opt/glossa/conf/`, you should now end up with `/opt/glossa/conf/test/cgi.conf`.
- Edit the `cgi.conf` file to specify the following:
  - username and password to your Glossa database, as well as the database name and the host it is running on
  - `htmlRoot` (the root URL for HTML, PHP, and JavaScript files) and `cgiRoot` (the root URL for CGI files)
  - the path to your registry directory
  - the location that will be used for exports of search results (`dat_files`). Since the exported files will be served from here, it needs to be located below the document root of your web server.
  - `config_dir`, which should be set to the same as the `conf` parameter in `paths.conf`
- Copy the `TEST` directory found in the `corpus_data` directory to the location where you want to store corpus data.
- Copy the file `corpus_registry/test` to your CWB registry location and edit the `PATH` line in this file to point to the `TEST` directory.
- Copy the contents of the `html` directory to the `html` directory below `htmlRoot` in your Glossa installation.
- Copy the contents of the `js` directory to the `js` directory below `htmlRoot` in your Glossa installation. Edit `test.conf.js` and set the `htmlRoot` and `cgiRoot` variables to the same values as in the `cgi.conf` file.
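
The copy steps above can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name `install_test_corpus` and its five directory arguments are hypothetical, the last argument is assumed to be the directory on disk that `htmlRoot` is served from, and the registry file is assumed to contain a line whose first word is `PATH`. Nothing is read from `paths.conf` or `cgi.conf`.

```shell
#!/bin/sh
# Hypothetical helper: performs the copy steps from the list above.
# All directory arguments are placeholders supplied by the caller.
install_test_corpus() {
    repo=$1        # checkout of this repository
    conf_dir=$2    # the conf location named in cgi-bin/glossa/paths.conf
    data_dir=$3    # where corpus data should be stored
    registry=$4    # CWB registry directory
    html_root=$5   # assumed: directory on disk that htmlRoot is served from

    mkdir -p "$conf_dir" "$data_dir" "$registry" \
             "$html_root/html" "$html_root/js" || return 1

    # Configuration and corpus data are copied as whole directories.
    cp -R "$repo/conf/test" "$conf_dir/" || return 1
    cp -R "$repo/corpus_data/TEST" "$data_dir/" || return 1

    # Only the contents of html/ and js/ are copied into existing directories.
    cp -R "$repo/html/." "$html_root/html/" || return 1
    cp -R "$repo/js/." "$html_root/js/" || return 1

    # Copy the registry file, rewriting its PATH line (assumed to start
    # with the word PATH) so that it points to the copied TEST directory.
    awk -v p="$data_dir/TEST" \
        '$1 == "PATH" { print "PATH " p; next } { print }' \
        "$repo/corpus_registry/test" > "$registry/test"
}

# Example call with made-up locations:
# install_test_corpus . /opt/glossa/conf /opt/glossa/data /opt/cwb/registry /var/www/glossa
```

The database settings in `cgi.conf` and the `htmlRoot` and `cgiRoot` values in `test.conf.js` still have to be edited by hand afterwards, since they depend on your installation.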
Please cite the following article if you use any part of the corpus in your own work:
Jörg Tiedemann, Lars Nygaard, 2004, The OPUS corpus - parallel & free. In Proceedings of the
Fourth International Conference on Language Resources and Evaluation (LREC'04). Lisbon, Portugal