|  | 
| 184 | 184 |       </example> | 
| 185 | 185 |     </example> | 
| 186 | 186 | 
 | 
|  | 187 | +    <member name="WordEmbeddings"> | 
|  | 188 | +      <summary> | 
|  | 189 | +        The Word Embeddings transform is a text featurizer that converts vectors of text tokens into sentence vectors using a pre-trained model. | 
|  | 190 | +      </summary> | 
|  | 191 | +      <remarks> | 
|  | 192 | +        WordEmbeddings wraps different embedding models, such as GloVe. Users can specify which embedding to use. | 
|  | 193 | +        The available options are various versions of <a href="https://nlp.stanford.edu/projects/glove/">GloVe Models</a>, <a href="https://en.wikipedia.org/wiki/FastText">FastText</a>, and <a href="http://anthology.aclweb.org/P/P14/P14-1146.pdf">Sswe</a>. | 
|  | 194 | +        <para> | 
|  | 195 | +          Note: WordEmbedding requires an input column containing a text vector, e.g. &lt;'This', 'is', 'good'&gt;, so users need to create an input column by: | 
|  | 196 | +          <list type="bullet"> | 
|  | 197 | +            <item><description>concatenating columns with TX type,</description></item> | 
|  | 198 | +            <item> | 
|  | 199 | +              <description>or setting output_tokens=True on NGramFeaturizer() to convert a column with sentences like "This is good" into &lt;'This', 'is', 'good'&gt;. | 
|  | 200 | +              The output token column is named with the suffix '_TransformedText'.</description> | 
|  | 201 | +            </item> | 
|  | 202 | +          </list> | 
|  | 203 | +          In the following example, after the NGramFeaturizer, features named ngram.__ are generated. A new column named ngram_TransformedText is | 
|  | 204 | +          also created with the text vector, similar to the result of running .split(' '). However, because this column has variable length, it cannot be | 
|  | 205 | +          converted to a pandas dataframe, so any pipeline or transform that outputs this text vector column will throw an error. When we instead use | 
|  | 206 | +          ngram_TransformedText as the input to WordEmbedding, the ngram_TransformedText column is overwritten by the output from | 
|  | 207 | +          WordEmbedding. The output from WordEmbedding is named ngram_TransformedText.__ | 
|  | 208 | +        </para> | 
|  | 209 | +      </remarks> | 
|  | 210 | +    </member> | 
|  | 211 | +    <example name="WordEmbeddings"> | 
|  | 212 | +      <example> | 
|  | 213 | +        <code language="csharp"> | 
|  | 214 | +          pipeline.Add(new WordEmbeddings(("InTextCol", "OutTextCol"))); | 
|  | 215 | +        </code> | 
|  | 216 | +      </example> | 
|  | 217 | +    </example> | 
|  | 218 | + | 
| 187 | 219 |   </members> | 
| 188 | 220 | </doc> | 