Add TinyMS text module

Signed-off-by: leonwanghui <wanghui71leon@gmail.com>
tinyms-ai · Apr 1, 2021 · 6166d12
1 parent 16ca574
Showing 7 changed files with 48 additions and 3 deletions.
diff --git a/docs/en/source/index.rst b/docs/en/source/index.rst
@@ -46,6 +46,7 @@ designed to providing quick-start guidelines for machine learning beginners.
    tinyms/tinyms.context
    tinyms/tinyms.data
    tinyms/tinyms.vision
+   tinyms/tinyms.text
    tinyms/tinyms.primitives
    tinyms/tinyms.layers
    tinyms/tinyms.model

diff --git a/docs/en/source/quickstart/install.md b/docs/en/source/quickstart/install.md
@@ -6,7 +6,7 @@
 
 For users who own a clean environment, it is recommended to use [pypi](https://pypi.org/) to install TinyMS given that the following requirements are meet. For those who don't, [Anaconda](https://www.anaconda.com/products/individual#Downloads) is a good choice for setting up the python environment.
 
-Prerequisites  
+Prerequisites
 
 - OS: `Ubuntu 18.04` or `Windows 10`
 - Python: `3.7.5`
@@ -49,7 +49,7 @@ docker run -it --net=host tinyms/tinyms:0.1.0-jupyter
 
 Open a browser on the local machine, type in
 
-```URL
+```
 <Your_external_IP_address>:8888
 ```
 

diff --git a/docs/en/source/tinyms/tinyms.text.rst b/docs/en/source/tinyms/tinyms.text.rst
@@ -0,0 +1,5 @@
+tinyms.text
+===========
+
+.. automodule:: tinyms.text
+   :members:
diff --git a/docs/zh_CN/source/index.rst b/docs/zh_CN/source/index.rst
@@ -43,6 +43,7 @@ Welcome to TinyMS's documentation!
    tinyms/tinyms.context
    tinyms/tinyms.data
    tinyms/tinyms.vision
+   tinyms/tinyms.text
    tinyms/tinyms.primitives
    tinyms/tinyms.layers
    tinyms/tinyms.model

diff --git a/docs/zh_CN/source/quickstart/install.md b/docs/zh_CN/source/quickstart/install.md
@@ -50,7 +50,7 @@ docker run -it --net=host tinyms/tinyms:0.1.0-jupyter
 
 在本地打开浏览器，输入
 
-```URL
+```
 <公网IP地址>:8888
 ```
 

diff --git a/docs/zh_CN/source/tinyms/tinyms.text.rst b/docs/zh_CN/source/tinyms/tinyms.text.rst
@@ -0,0 +1,5 @@
+tinyms.text
+===========
+
+.. automodule:: tinyms.text
+   :members:
diff --git a/tinyms/text/__init__.py b/tinyms/text/__init__.py
@@ -0,0 +1,33 @@
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""
+This module is to support text processing for NLP tasks. It is a high performance
+NLP text processing module which is developed with ICU4C and cppjieba.
+"""
+from mindspore.dataset.text.transforms import Lookup, JiebaTokenizer, UnicodeCharTokenizer, Ngram, \
+    WordpieceTokenizer, TruncateSequencePair, \
+    ToNumber, SlidingWindow, SentencePieceTokenizer, PythonTokenizer
+from mindspore.dataset.text.utils import to_str, to_bytes, Vocab, SentencePieceVocab, SentencePieceModel, \
+    SPieceTokenizerOutType, SPieceTokenizerLoadType
+
+text_transform = ["Lookup", "JiebaTokenizer", "UnicodeCharTokenizer", "Ngram",
+                  "WordpieceTokenizer", "TruncateSequencePair",
+                  "ToNumber", "SlidingWindow", "SentencePieceTokenizer", "PythonTokenizer"]
+text_utils = ["to_str", "to_bytes", "Vocab", "SentencePieceVocab", "SentencePieceModel",
+              "SPieceTokenizerOutType", "SPieceTokenizerLoadType"]
+
+__all__ = []
+__all__.extend(text_transform)
+__all__.extend(text_utils)
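
The new `tinyms/text/__init__.py` re-exports MindSpore names and builds `__all__` from two name lists, which controls what `from tinyms.text import *` exposes. A minimal sketch of that pattern follows, assuming a stand-in module so it runs without MindSpore installed. The `demo_text` module and its placeholder classes are hypothetical and only mirror the shape of the real file.

```python
import sys
import types

# Hypothetical stand-in for a package such as tinyms.text. The real module
# imports these names from mindspore.dataset.text instead of defining them.
demo_text = types.ModuleType("demo_text")

source = '''
class Lookup: ...
class Ngram: ...
def to_str(array, encoding="utf8"): ...
def _private_helper(): ...

text_transform = ["Lookup", "Ngram"]
text_utils = ["to_str"]

# Same aggregation as in the diff: start empty, then extend per group.
__all__ = []
__all__.extend(text_transform)
__all__.extend(text_utils)
'''
exec(source, demo_text.__dict__)
sys.modules["demo_text"] = demo_text  # make the stand-in importable

# A star import copies only the names listed in __all__.
namespace = {}
exec("from demo_text import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)
```

Because `text_transform` and `text_utils` are not listed in `__all__`, a star import leaves those helper lists (and any underscore-prefixed names) out of the caller's namespace, while the grouped lists stay available for documentation or introspection.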