forked from jweese/thrax

Hadoop-based tool for extraction of large scale synchronous grammars for paraphrasing and machine translation


joshua-decoder/thrax

Thrax uses Apache Hadoop (an open-source implementation of MapReduce) to
efficiently extract a synchronous context-free grammar translation model
for use in modern machine translation systems.

Thrax currently supports both Hiero-style grammars (with a single
non-terminal symbol) and SAMT-style grammars (where non-terminal symbols are
assigned by projecting the labels of a target-side parse tree onto each span).

COMPILING:

First, you need to set two environment variables:
$HADOOP should point to the directory where Hadoop is installed.
$AWS_SDK should point to the directory where the Amazon Web Services SDK
is installed.
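
As a minimal sketch, the two variables could be set as shown below. The
paths /opt/hadoop and /opt/aws-java-sdk are placeholders chosen for
illustration, not locations required by Thrax; substitute the directories
used on your system.

```shell
# Hypothetical install locations; replace with the real paths on your system.
# Values already set in the environment are left untouched.
export HADOOP="${HADOOP:-/opt/hadoop}"
export AWS_SDK="${AWS_SDK:-/opt/aws-java-sdk}"

# Warn early if either variable does not point at an existing directory.
for var in HADOOP AWS_SDK; do
    eval "dir=\${$var}"
    if [ ! -d "$dir" ]; then
        echo "warning: \$$var ($dir) is not a directory" >&2
    fi
done
```

The check only prints a warning, so a wrong path does not abort the shell
session.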

To compile, type

    ant

This will compile all classes and package them into a jar for use on a 
Hadoop cluster.

At the end of the compilation, ant should report that the build was successful.

RUNNING THRAX:
Thrax can be invoked with the following command, where $THRAX is assumed to
be the root directory of this distribution:

    hadoop jar $THRAX/bin/thrax.jar <configuration file>

Some example configuration files have been included with this distribution:

    example/hiero.conf
    example/samt.conf
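
For instance, a run with the Hiero example configuration might look like the
sketch below. It assumes that $THRAX is the root of this distribution
(defaulting to the current directory if unset), that the example files live
under that directory, and that the hadoop command is on your PATH. The
pre-flight checks are an illustration and not part of Thrax itself.

```shell
# Assumption: THRAX is the root of this distribution; default to the
# current directory when it is not already set.
THRAX="${THRAX:-$PWD}"
jar="$THRAX/bin/thrax.jar"
conf="$THRAX/example/hiero.conf"

# Run the extraction only if both the jar and the configuration exist.
if [ ! -f "$jar" ]; then
    echo "missing $jar; run ant first" >&2
elif [ ! -f "$conf" ]; then
    echo "missing $conf" >&2
else
    hadoop jar "$jar" "$conf"
fi
```

To extract an SAMT-style grammar instead, point conf at example/samt.conf.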

COPYRIGHT AND LICENSE:
Copyright (c) 2010-13 by the Thrax team:
    Jonny Weese <jonny@cs.jhu.edu>
    Juri Ganitkevitch <juri@cs.jhu.edu>

See LICENSE.txt (included with this distribution) for the complete terms.
