Added support for parallel execution. Also fixed a bug in enhanced semantic roles. #422

Draft: wants to merge 24 commits into master

Commits (24):
588b830: Started Keith's parallel implementation (MihaiSurdeanu, Sep 12, 2020)
a57592d: Added cloneBuilder() (MihaiSurdeanu, Sep 13, 2020)
9ae1135: Added THreadable (MihaiSurdeanu, Sep 13, 2020)
aaa46b7: added clone() everywhere. does not compile (MihaiSurdeanu, Sep 16, 2020)
0e910b5: Added support for parallelism. Fixed a few bugs in the enhanced seman… (MihaiSurdeanu, Sep 16, 2020)
fbb43bd: CHANGES (MihaiSurdeanu, Sep 16, 2020)
61b5e54: CHANGES (MihaiSurdeanu, Sep 16, 2020)
881fc90: fixed comment (MihaiSurdeanu, Sep 16, 2020)
46a4086: Merge branch 'master' into parallel (MihaiSurdeanu, Sep 16, 2020)
667bec3: Empty copy() to this (kwalcock, Sep 16, 2020)
57cf778: clone returns this for Greedy (MihaiSurdeanu, Sep 16, 2020)
5191664: clone for viterbi (MihaiSurdeanu, Sep 16, 2020)
d28b702: Match other changes (kwalcock, Sep 17, 2020)
6106001: Update fatdynet version (kwalcock, Sep 17, 2020)
e566d27: Remove comments (kwalcock, Sep 17, 2020)
47f0310: Merge pull request #423 from clulab/kwalcock-parallel (kwalcock, Sep 17, 2020)
8473aba: Fixed bug in issue #424 (MihaiSurdeanu, Sep 19, 2020)
1a51124: Merge branch 'parallel' of https://github.com/clulab/processors into … (MihaiSurdeanu, Sep 19, 2020)
649d097: Merge branch 'master' into parallel (kwalcock, Oct 1, 2020)
55e8c3d: Modify example program (kwalcock, Oct 5, 2020)
fdeefe8: Merge branch 'master' into kwalcock-parallel (kwalcock, Oct 6, 2020)
da1c96c: Merge branch 'master' into parallel (kwalcock, Nov 6, 2020)
8c8e7b6: Merge branch 'kwalcock-parallel' into parallel (kwalcock, Nov 6, 2020)
0a1d842: Fix merge (kwalcock, Nov 6, 2020)
3 changes: 3 additions & 0 deletions CHANGES.md
@@ -1,6 +1,9 @@
# Changes
+ **8.3.0** - Added support for proper multi-threaded execution of CluProcessor by cloning parameters in each thread.
+ **8.2.3** - Do not allow loops in enhanced semantic roles.
+ **8.2.2** - Bug fix: we now guarantee that the SRL graph has the same number of nodes as the words in the sentence.
+ **8.2.2** - Bug fixes in the generation of enhanced semantic roles.
+ **8.2.1** - Improvements in the generation of enhanced semantic roles.
+ **8.2.1** - Bug fix in LexiconNER: we were ignoring case information before.
+ **8.2.1** - Improvements to the generation of enhanced semantic roles.
+ **8.2.0** - Added simple lexicon-based sentiment analyzer, using Bing Liu's lexicon.
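The headline 8.3.0 entry is the point of this PR: once the mutable DyNet state is cloned per thread, a single CluProcessor instance can be shared across threads. Here is a minimal sketch of the intended usage, inferred from the example program further down in this diff; the sample texts and the use of Scala parallel collections are illustrative, not taken from the PR.

```scala
import org.clulab.dynet.Utils
import org.clulab.processors.clu.CluProcessor

object ParallelAnnotateSketch extends App {
  // The example program below passes train = false on the parallel path,
  // so we assume inference-mode initialization is what multi-threading needs.
  Utils.initializeDyNet(train = false)
  val processor = new CluProcessor()

  val texts = Seq("John eats cake.", "Mary reads books.", "Keith writes code.")

  // Scala parallel collections call annotate() from multiple worker threads;
  // with the per-thread cloning added in 8.3.0 this should be safe.
  texts.par.foreach { text =>
    val doc = processor.annotate(text)
    println(s"${doc.sentences.length} sentence(s)")
  }
}
```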
ProcessorShell.scala
@@ -67,7 +67,7 @@ object ProcessorShell extends App {
case ":clu" =>
reader.setPrompt("(clu)>>> ")
println("Preparing CluProcessor...\n")
Utils.initializeDyNet()
Utils.initializeDyNet(train = false)
proc = clu
proc.annotate("initialize me!")

ParallelProcessorExample.scala
@@ -13,6 +13,8 @@ import org.clulab.dynet.Utils
import org.clulab.processors.Document
import org.clulab.processors.Processor
import org.clulab.processors.clu.CluProcessor
import org.clulab.processors.fastnlp.FastNLPProcessor
import org.clulab.processors.fastnlp.FastNLPProcessorWithSemanticRoles
import org.clulab.serialization.DocumentSerializer

import scala.collection.parallel.ForkJoinTaskSupport
@@ -96,7 +98,7 @@ object ParallelProcessorExample {
forkJoinPoolConstructor.newInstance(threads.asInstanceOf[Integer])

// For the record, this is the standard version
//new ForkJoinPool(threads)
// new ForkJoinPool(threads)
}

val forkJoinPool = newForkJoinPool(threads)
@@ -113,22 +115,33 @@
val outputDir = args(1)
val extension = args(2)
val threads = args(3).toInt
val parallel = true

val files = findFiles(inputDir, extension)
val sortedFiles = files.sortBy(-_.length)
// Parallelizing by file results in a quick crash.
val parFiles = parallelize(files, threads)
val parFiles = parallelize(sortedFiles, threads)

Utils.initializeDyNet()
val startupTimer = new Timer("This is how long it takes to start up")
startupTimer.start()

Utils.initializeDyNet(train = !parallel)

val processor: Processor = new CluProcessor()
// val processor: Processor = new FastNLPProcessor()
// val processor: Processor = new FastNLPProcessorWithSemanticRoles()

val documentSerializer = new DocumentSerializer

val untimed = processor.annotate("I am happy to join with you today in what will go down in history as the greatest demonstration for freedom in the history of our nation.")
startupTimer.stop()
println(startupTimer.toString)


val timer = new Timer(s"$threads threads processing ${parFiles.size} files")
timer.start()

parFiles.foreach { file =>
(if (parallel) parFiles else files).foreach { file =>
println(s"Processing ${file.getName}...")

val text = {
@@ -140,7 +153,15 @@
}

val outputFile = new File(outputDir + "/" + file.getName)
val document = processor.annotate(text)
val document = try {
val document = processor.annotate(text)
document
}
catch {
case throwable: Throwable =>
println(s"Threw exception for ${file.getName}")
throw throwable
}
val printedDocument = {
val stringWriter = new StringWriter
val printWriter = new PrintWriter(stringWriter)
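For reference, the reflective newForkJoinPool in this file exists because different Scala versions disagree on which ForkJoinPool class ForkJoinTaskSupport expects. On a single modern Scala version (2.12+), the commented-out "standard version" boils down to the sketch below; the helper name parallelize matches the call sites above, but the body is a reconstruction, not the PR's exact code.

```scala
import java.util.concurrent.ForkJoinPool

import scala.collection.parallel.ForkJoinTaskSupport
import scala.collection.parallel.ParSeq

// Give the parallel collection a dedicated pool so the degree of
// parallelism is `threads`, not the default commonPool() size.
def parallelize[T](seq: Seq[T], threads: Int): ParSeq[T] = {
  val parSeq = seq.par
  parSeq.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(threads))
  parSeq
}
```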
FastNLPProcessorWithSemanticRoles.scala
@@ -17,7 +17,7 @@ class FastNLPProcessorWithSemanticRoles(tokenizerPostProcessor:Option[TokenizerS

/** Used for SRL */
lazy val cluProcessor = {
Utils.initializeDyNet()
Utils.initializeDyNet(train = false)
new CluProcessor()
}

2 changes: 1 addition & 1 deletion main/build.sbt
@@ -27,7 +27,7 @@ libraryDependencies ++= {
// for machine learning
"de.bwaldvogel" % "liblinear" % "2.30",
"tw.edu.ntu.csie" % "libsvm" % "3.23",
"org.clulab" %% "fatdynet" % "0.2.5", // "0-cuda.2.6-SNAPSHOT", // "0.2.5"
"org.clulab" %% "fatdynet" % "0.2.6", // "0-cuda.2.6-SNAPSHOT", // "0.2.5"

// NLP tools used by CluProcessor
"org.antlr" % "antlr4-runtime" % "4.6", // for tokenization
2 changes: 1 addition & 1 deletion main/src/main/scala/org/clulab/dynet/CnnExample.scala
@@ -8,7 +8,7 @@ import edu.cmu.dynet.{Dim, Expression, ParameterCollection, UnsignedVector}
// https://github.com/neubig/nn4nlp-code/blob/970d91a51664b3d91a9822b61cd76abea20218cb/05-cnn/cnn-class.py#L45
//
object CnnExample extends App {
Utils.initializeDyNet()
Utils.initializeDyNet(train = true)
val pc = new ParameterCollection()

val embSize = 3
2 changes: 1 addition & 1 deletion main/src/main/scala/org/clulab/dynet/CoNLLSRLToMetal.scala
@@ -363,7 +363,7 @@ object CoNLLSRLToMetal {

def main(args: Array[String]): Unit = {
assert(args.length == 2)
Utils.initializeDyNet()
Utils.initializeDyNet(train = false)

val file = new File(args(0))
val reader = new CoNLLSRLToMetal
ConstEmbeddingsGlove.scala
@@ -34,6 +34,7 @@ class ConstEmbeddingsGlove(matrixResourceName: String, isResource:Boolean = true

override def dim: Int = lookupParameters.dim().get(0).toInt

/** Read-only access to the embedding for this word */
def get(word:String): Expression = {
if(w2i.contains(word)) {
Expression.constLookup(lookupParameters, w2i(word))
@@ -70,7 +71,7 @@ object ConstEmbeddingsGlove {

def apply(matrixResourceName: String, isResource: Boolean): ConstEmbeddingsGlove = {

DyNetSync.synchronized {
this.synchronized {
// these objects are read-only and they use a lot of RAM, so let's reuse them if they exist
if(SINGLETON.isEmpty) {
SINGLETON = Some(new ConstEmbeddingsGlove(matrixResourceName, isResource))
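The change above replaces a dedicated global lock (DyNetSync, deleted below) with synchronization on the companion object itself; the singleton is built once and then shared read-only. A reduced sketch of the pattern, with a hypothetical ExpensiveResource standing in for the RAM-heavy embedding matrix:

```scala
object CachedResource {
  private var singleton: Option[ExpensiveResource] = None

  // Lock on the companion object itself rather than a separate global
  // lock object, which is the role the deleted DyNetSync used to play.
  def apply(): ExpensiveResource = this.synchronized {
    if (singleton.isEmpty)
      singleton = Some(new ExpensiveResource)
    singleton.get
  }
}

// Hypothetical stand-in for the read-only, memory-heavy embeddings.
class ExpensiveResource
```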
6 changes: 0 additions & 6 deletions main/src/main/scala/org/clulab/dynet/DyNetSync.scala

This file was deleted.

55 changes: 31 additions & 24 deletions main/src/main/scala/org/clulab/dynet/EmbeddingLayer.scala
@@ -17,33 +17,40 @@
* This layer takes a sequence of words and produces a sequence of Expression that stores the words' full embeddings
* @author Mihai
*/
class EmbeddingLayer (val parameters:ParameterCollection,
val w2i:Map[String, Int], // word to index
val w2f:Counter[String], // word to frequency
val c2i:Map[Char, Int], // character to index
val tag2i:Option[Map[String, Int]], // POS tag to index
val ne2i:Option[Map[String, Int]], // NE tag to index
val learnedWordEmbeddingSize: Int, // size of the learned word embedding
val charEmbeddingSize: Int, // size of the character embedding
val charRnnStateSize: Int, // size of each one of the char-level RNNs
val posTagEmbeddingSize: Int, // size of the POS tag embedding
val neTagEmbeddingSize: Int, // size of the NE tag embedding
val distanceEmbeddingSize: Int,
val distanceWindowSize: Int, // window considered for distance values (relative to predicate)
val positionEmbeddingSize: Int,
val useIsPredicate: Boolean, // if true, add a Boolean bit to indicate if current word is the predicate
val wordLookupParameters:LookupParameter,
val charLookupParameters:LookupParameter,
val charFwRnnBuilder:RnnBuilder, // RNNs for the character representation
val charBwRnnBuilder:RnnBuilder,
val posTagLookupParameters:Option[LookupParameter],
val neTagLookupParameters:Option[LookupParameter],
val distanceLookupParameters:Option[LookupParameter],
val positionLookupParameters:Option[LookupParameter],
val dropoutProb: Float) extends InitialLayer {
case class EmbeddingLayer (parameters:ParameterCollection,
w2i:Map[String, Int], // word to index
w2f:Counter[String], // word to frequency
c2i:Map[Char, Int], // character to index
tag2i:Option[Map[String, Int]], // POS tag to index
ne2i:Option[Map[String, Int]], // NE tag to index
learnedWordEmbeddingSize: Int, // size of the learned word embedding
charEmbeddingSize: Int, // size of the character embedding
charRnnStateSize: Int, // size of each one of the char-level RNNs
posTagEmbeddingSize: Int, // size of the POS tag embedding
neTagEmbeddingSize: Int, // size of the NE tag embedding
distanceEmbeddingSize: Int,
distanceWindowSize: Int, // window considered for distance values (relative to predicate)
positionEmbeddingSize: Int,
useIsPredicate: Boolean, // if true, add a Boolean bit to indicate if current word is the predicate
wordLookupParameters:LookupParameter,
charLookupParameters:LookupParameter,
charFwRnnBuilder:RnnBuilder, // RNNs for the character representation
charBwRnnBuilder:RnnBuilder,
posTagLookupParameters:Option[LookupParameter],
neTagLookupParameters:Option[LookupParameter],
distanceLookupParameters:Option[LookupParameter],
positionLookupParameters:Option[LookupParameter],
dropoutProb: Float) extends InitialLayer {

lazy val constEmbedder: ConstEmbeddings = ConstEmbeddingsGlove()

override def clone(): EmbeddingLayer = {
copy(
charFwRnnBuilder = cloneBuilder(charFwRnnBuilder),
charBwRnnBuilder = cloneBuilder(charBwRnnBuilder),
)
}

override def forward(sentence: AnnotatedSentence,
doDropout: Boolean): ExpressionVector = {
setCharRnnDropout(doDropout)
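Turning EmbeddingLayer into a case class is what makes the clone() above cheap: the compiler-generated copy() shares every immutable field (vocabularies, lookup parameters), and only the two mutable character RNN builders are replaced with fresh clones. A reduced sketch of the idiom, with hypothetical stand-ins for DyNet's RnnBuilder and Utils.cloneBuilder:

```scala
// Hypothetical stand-in for DyNet's mutable RnnBuilder.
class BuilderStub {
  def duplicate(): BuilderStub = new BuilderStub
}

case class LayerSketch(w2i: Map[String, Int], // immutable: shared across threads
                       fwBuilder: BuilderStub, // mutable: cloned per thread
                       bwBuilder: BuilderStub) {
  // copy() shares everything except the fields we explicitly replace.
  override def clone(): LayerSketch =
    copy(fwBuilder = fwBuilder.duplicate(), bwBuilder = bwBuilder.duplicate())
}
```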
4 changes: 3 additions & 1 deletion main/src/main/scala/org/clulab/dynet/FinalLayer.scala
@@ -2,7 +2,7 @@ package org.clulab.dynet

import edu.cmu.dynet.{Expression, ExpressionVector}

trait FinalLayer extends Saveable {
trait FinalLayer extends Saveable with Cloneable {
def forward(inputExpressions: ExpressionVector,
headPositionsOpt: Option[IndexedSeq[Int]],
doDropout: Boolean): ExpressionVector
@@ -15,4 +15,6 @@ trait FinalLayer extends Saveable {
def inference(emissionScores: Array[Array[Float]]): IndexedSeq[String]

def inferenceWithScores(emissionScores: Array[Array[Float]]): IndexedSeq[IndexedSeq[(String, Float)]]

override def clone(): FinalLayer = ???
}
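The default body ??? (scala.Predef.???) deserves a note: it satisfies the compiler for any return type but throws scala.NotImplementedError the moment clone() is called on a subclass that forgot to override it, turning a missed override into a loud runtime failure rather than a silently shared layer. A small illustration of the behavior:

```scala
trait LayerLike extends Cloneable {
  // Compiles for any return type; throws NotImplementedError if invoked.
  override def clone(): LayerLike = ???
}

case class ImmutableLayer(name: String) extends LayerLike {
  override def clone(): ImmutableLayer = this // immutable, safe to share
}

// ImmutableLayer("x").clone()  // returns the same instance
// (new LayerLike {}).clone()   // throws scala.NotImplementedError
```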
26 changes: 15 additions & 11 deletions main/src/main/scala/org/clulab/dynet/ForwardLayer.scala
@@ -7,16 +7,20 @@ import org.clulab.dynet.Utils.{ByLineIntBuilder, fromIndexToString, mkTransition
import org.clulab.struct.Counter
import org.clulab.utils.Configured

abstract class ForwardLayer (val parameters:ParameterCollection,
val inputSize: Int,
val isDual: Boolean,
val t2i: Map[String, Int],
val i2t: Array[String],
val H: Parameter,
val rootParam: Parameter,
val nonlinearity: Int,
val dropoutProb: Float)
extends FinalLayer {
abstract class ForwardLayer extends FinalLayer {

//
// all these accessor methods will be redefined as vals in the children classes
//
def parameters:ParameterCollection
def inputSize: Int
def isDual: Boolean
def t2i: Map[String, Int]
def i2t: Array[String]
def H: Parameter
def rootParam: Parameter
def nonlinearity: Int
def dropoutProb: Float

def forward(inputExpressions: ExpressionVector,
headPositionsOpt: Option[IndexedSeq[Int]],
@@ -91,7 +95,7 @@
}

object ForwardLayer {
val logger: Logger = LoggerFactory.getLogger(classOf[ViterbiForwardLayer])
val logger: Logger = LoggerFactory.getLogger(classOf[ForwardLayer])

val DEFAULT_DROPOUT_PROB: Float = Utils.DEFAULT_DROPOUT_PROBABILITY

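The refactoring above moves ForwardLayer's constructor parameters into abstract def accessors. The payoff shows up in GreedyForwardLayer below: a subclass can now be a case class whose parameters override those defs with vals, gaining the compiler-generated copy() (and equality) useful for cloning. A minimal sketch of the pattern, using hypothetical names:

```scala
// The parent declares abstract accessors instead of constructor vals...
abstract class ParentSketch {
  def inputSize: Int
  def dropoutProb: Float
}

// ...so the child can be a case class whose parameters implement them,
// picking up a compiler-generated copy() for cheap cloning.
case class ChildSketch(override val inputSize: Int,
                       override val dropoutProb: Float) extends ParentSketch
```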
24 changes: 13 additions & 11 deletions main/src/main/scala/org/clulab/dynet/GreedyForwardLayer.scala
@@ -4,21 +4,23 @@ import java.io.PrintWriter

import edu.cmu.dynet.{Dim, Expression, ExpressionVector, Parameter, ParameterCollection}
import org.clulab.dynet.ForwardLayer.TYPE_GREEDY
import org.clulab.dynet.Utils.{ByLineFloatBuilder, ByLineIntBuilder, ByLineStringMapBuilder, fromIndexToString, save}
import org.clulab.dynet.Utils.{ByLineFloatBuilder, ByLineIntBuilder, ByLineStringMapBuilder, cloneBuilder, fromIndexToString, save}
import ForwardLayer._

import scala.collection.mutable.ArrayBuffer

class GreedyForwardLayer (parameters:ParameterCollection,
inputSize: Int,
isDual: Boolean,
t2i: Map[String, Int],
i2t: Array[String],
H: Parameter,
rootParam: Parameter,
nonlinearity: Int,
dropoutProb: Float)
extends ForwardLayer(parameters, inputSize, isDual, t2i, i2t, H, rootParam, nonlinearity, dropoutProb) {
case class GreedyForwardLayer (override val parameters:ParameterCollection,
override val inputSize: Int,
override val isDual: Boolean,
override val t2i: Map[String, Int],
override val i2t: Array[String],
override val H: Parameter,
override val rootParam: Parameter,
override val nonlinearity: Int,
override val dropoutProb: Float)
extends ForwardLayer {

override def clone(): GreedyForwardLayer = this

override def loss(finalStates: ExpressionVector, goldLabelStrings: IndexedSeq[String]): Expression = {
val goldLabels = Utils.toIds(goldLabelStrings, t2i)
4 changes: 3 additions & 1 deletion main/src/main/scala/org/clulab/dynet/InitialLayer.scala
@@ -5,9 +5,11 @@ import edu.cmu.dynet.ExpressionVector
/**
* First layer that occurs in a sequence modeling architecture: goes from words to Expressions
*/
trait InitialLayer extends Saveable {
trait InitialLayer extends Saveable with Cloneable {
def forward(sentence: AnnotatedSentence,
doDropout: Boolean): ExpressionVector

def outDim:Int // output dimension

override def clone(): InitialLayer = ???
}
4 changes: 3 additions & 1 deletion main/src/main/scala/org/clulab/dynet/IntermediateLayer.scala
@@ -5,10 +5,12 @@ import edu.cmu.dynet.ExpressionVector
/**
* Intermediate layer in a sequence modeling architecture: goes from ExpressionVector to ExpressionVector
*/
trait IntermediateLayer extends Saveable {
trait IntermediateLayer extends Saveable with Cloneable {
def forward(inputExpressions: ExpressionVector,
doDropout: Boolean): ExpressionVector

def inDim: Int
def outDim: Int

override def clone(): IntermediateLayer = ???
}