Using AWS Comprehend as a source of models #233
-
First of all, thank you so much for open-sourcing this library. I am planning to leverage it and tried to invoke it from AWS Lambda. As called out in the README, when using the Stanford NLP models the package size goes north of 400 MB, crossing the limit permitted by AWS Lambda. I tried to use AWS Comprehend as suggested. Below is how my setup looks.
I still get a RuntimeException saying the model files are missing; the stack trace is below. Could you please help resolve this issue?
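For reference, here is a minimal sketch of what calling Comprehend directly through the AWS SDK for Java v2 looks like (illustrative only; the class name and region are placeholders, not my actual code):

import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.comprehend.ComprehendClient;
import software.amazon.awssdk.services.comprehend.model.DetectSyntaxRequest;
import software.amazon.awssdk.services.comprehend.model.DetectSyntaxResponse;
import software.amazon.awssdk.services.comprehend.model.SyntaxLanguageCode;

public class ComprehendSyntaxDemo {

    public static void main(String[] args) {
        // Comprehend does the tagging server-side, so no model files
        // need to ship inside the Lambda deployment package.
        try (ComprehendClient client = ComprehendClient.builder()
                .region(Region.US_EAST_1) // placeholder region
                .build()) {

            DetectSyntaxResponse response = client.detectSyntax(
                    DetectSyntaxRequest.builder()
                            .text("The quick brown fox jumps over the lazy dog.")
                            .languageCode(SyntaxLanguageCode.EN)
                            .build());

            // Print each token with its detected part of speech.
            response.syntaxTokens().forEach(token ->
                    System.out.println(token.text() + " -> "
                            + token.partOfSpeech().tagAsString()));
        }
    }
}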
-
@AdityaReddyY In retrospect, I don't think this library should support AWS Comprehend, since that would add cost. We can keep using the CoreNLP library if we remove all of the bloat. For example, I removed unused dependencies and used the shade plugin to strip the unused model packages, like this:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.example</groupId>
<artifactId>demo</artifactId>
<version>0.0.1-SNAPSHOT</version>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>11</maven.compiler.source>
<maven.compiler.target>11</maven.compiler.target>
<maven.compiler.release>11</maven.compiler.release>
</properties>
<dependencies>
<dependency>
<groupId>io.whelk.flesch.kincaid</groupId>
<artifactId>whelk-flesch-kincaid</artifactId>
<version>0.1.6</version>
</dependency>
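<!-- CoreNLP itself, with all transitive dependencies excluded via the
     *:* wildcard; anything it still needs at runtime (e.g. protobuf
     below) is re-added explicitly. -->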
<dependency>
<groupId>edu.stanford.nlp</groupId>
<artifactId>stanford-corenlp</artifactId>
<version>4.3.2</version>
<exclusions>
<exclusion>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>edu.stanford.nlp</groupId>
<artifactId>stanford-corenlp</artifactId>
<version>4.3.2</version>
<classifier>models</classifier>
<exclusions>
<exclusion>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
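<!-- Re-added explicitly: CoreNLP still uses protobuf at runtime, and the
     wildcard exclusions above dropped it. -->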
<dependency>
<groupId>com.google.protobuf</groupId>
<artifactId>protobuf-java</artifactId>
<version>3.11.4</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.4.2</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
</execution>
</executions>
<configuration>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>com.example.demo.Example</mainClass>
</transformer>
</transformers>
<filters>
<filter>
<artifact>*:*</artifact>
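<!-- Strip the model files for annotators that readability scoring never
     touches (coref, NER, parsers, sentiment, SUTime, ...); note the
     pos-tagger models are deliberately not excluded. -->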
<excludes>
<exclude>edu/stanford/nlp/models/coref/*</exclude>
<exclude>edu/stanford/nlp/models/coref/fastneural/*</exclude>
<exclude>edu/stanford/nlp/models/coref/neural/*</exclude>
<exclude>edu/stanford/nlp/models/coref/statistical/*</exclude>
<exclude>edu/stanford/nlp/models/ner/*</exclude>
<exclude>edu/stanford/nlp/models/dcoref/*</exclude>
<exclude>edu/stanford/nlp/models/gender/*</exclude>
<exclude>edu/stanford/nlp/models/kbp/english/*</exclude>
<exclude>edu/stanford/nlp/models/kbp/english/gazetteers/*</exclude>
<exclude>edu/stanford/nlp/models/kbp/english/semgrex/*</exclude>
<exclude>edu/stanford/nlp/models/kbp/english/tokensregex/*</exclude>
<exclude>edu/stanford/nlp/models/lexparser/*</exclude>
<exclude>edu/stanford/nlp/models/naturalli/*</exclude>
<exclude>edu/stanford/nlp/models/naturalli/affinities/*</exclude>
<exclude>edu/stanford/nlp/models/parser/nndep/*</exclude>
<exclude>edu/stanford/nlp/models/quoteattribution/*</exclude>
<exclude>edu/stanford/nlp/models/sentiment/*</exclude>
<exclude>edu/stanford/nlp/models/supervised_relation_extractor/*</exclude>
<exclude>edu/stanford/nlp/models/sutime/*</exclude>
<exclude>edu/stanford/nlp/models/truecase/*</exclude>
<exclude>edu/stanford/nlp/models/ud/*</exclude>
<exclude>edu/stanford/nlp/models/upos/*</exclude>
</excludes>
</filter>
</filters>
</configuration>
</plugin>
</plugins>
</build>
</project>
This resulted in a jar of roughly 16 MB, well within the 250 MB (unzipped) upper limit for AWS Lambda deployment packages. Hope this helps!
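For completeness, the com.example.demo.Example main class referenced in the shade configuration can be as small as the sketch below (assuming whelk-flesch-kincaid's ReadabilityCalculator entry points; the sample text is arbitrary):

package com.example.demo;

import io.whelk.flesch.kincaid.ReadabilityCalculator;

public class Example {

    public static void main(String[] args) {
        String content = "The quick brown fox jumps over the lazy dog. "
                + "It never once looked back at the farmer's field.";

        // Both calls drive CoreNLP under the hood to split sentences and
        // tokens, which is why the pos-tagger models stay in the jar.
        double ease = ReadabilityCalculator.calculateReadingEase(content);
        double grade = ReadabilityCalculator.calculateGradeLevel(content);

        System.out.printf("Reading ease: %.2f, grade level: %.2f%n", ease, grade);
    }
}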