|
| 1 | +Using the API |
| 2 | +============= |
| 3 | + |
| 4 | +The `Google Natural Language`_ API can be used to reveal the |
| 5 | +structure and meaning of text via powerful machine |
| 6 | +learning models. You can use it to extract information about |
| 7 | +people, places, events and much more, mentioned in text documents, |
| 8 | +news articles or blog posts. You can use it to understand |
| 9 | +sentiment about your product on social media or parse intent from |
| 10 | +customer conversations happening in a call center or a messaging |
| 11 | +app. You can analyze text uploaded in your request or integrate |
| 12 | +with your document storage on Google Cloud Storage. |
| 13 | + |
| 14 | +.. warning:: |
| 15 | + |
| 16 | + This is a Beta release of Google Cloud Natural Language API. This |
| 17 | + API is not intended for real-time usage in critical applications. |
| 18 | + |
| 19 | +.. _Google Natural Language: https://cloud.google.com/natural-language/docs/getting-started |
| 20 | + |
| 21 | +Client |
| 22 | +------ |
| 23 | + |
| 24 | +:class:`~gcloud.language.client.Client` objects provide a |
| 25 | +means to configure your application. Each instance holds |
| 26 | +both a ``project`` and an authenticated connection to the |
| 27 | +Natural Language service. |
| 28 | + |
| 29 | +For an overview of authentication in ``gcloud-python``, see |
| 30 | +:doc:`gcloud-auth`. |
| 31 | + |
| 32 | +Assuming your environment is set up as described in that document, |
| 33 | +create an instance of :class:`~gcloud.language.client.Client`. |
| 34 | + |
| 35 | + .. code-block:: python |
| 36 | +
|
| 37 | + >>> from gcloud import language |
| 38 | + >>> client = language.Client() |
| 39 | +
|
| 40 | +By default the ``language`` is ``'en'`` and the ``encoding`` is |
| 41 | +UTF-8. To override these values: |
| 42 | + |
| 43 | + .. code-block:: python |
| 44 | +
|
| 45 | + >>> client = language.Client(language='es', |
| 46 | + ... encoding=language.Encoding.UTF16) |
| 47 | +
|
| 48 | +The encoding can be one of |
| 49 | +:attr:`Encoding.UTF8 <gcloud.language.document.Encoding.UTF8>`, |
| 50 | +:attr:`Encoding.UTF16 <gcloud.language.document.Encoding.UTF16>`, or |
| 51 | +:attr:`Encoding.UTF32 <gcloud.language.document.Encoding.UTF32>`. |
| 52 | + |
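The choice of encoding matters mostly for character offsets. As a minimal sketch, assuming (as in the REST API's ``encodingType`` setting) that offsets are counted in code units of the chosen encoding, the following standard-library snippet shows how the length of the same string differs per encoding. It makes no API call and does not use ``gcloud-python``.

```python
# Illustration only: count the code units one string occupies under each
# encoding. No Natural Language API call is made here.
text = 'La vaca salt\u00f3 sobre la luna.'

utf8_units = len(text.encode('utf-8'))            # 1 byte per code unit
utf16_units = len(text.encode('utf-16-le')) // 2  # 2 bytes per code unit
utf32_units = len(text.encode('utf-32-le')) // 4  # 4 bytes per code unit

# The accented character takes two UTF-8 code units but only one
# UTF-16 or UTF-32 code unit, so the UTF-8 count is one larger.
print(utf8_units, utf16_units, utf32_units)
```

Picking the encoding that matches your program's native string representation keeps any returned offsets directly usable as string indices.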
| 53 | +Methods |
| 54 | +------- |
| 55 | + |
| 56 | +The Google Natural Language API has three supported methods |
| 57 | + |
| 58 | +- `analyzeEntities`_ |
| 59 | +- `analyzeSentiment`_ |
| 60 | +- `annotateText`_ |
| 61 | + |
| 62 | +and each method uses a `Document`_ for representing text. To |
| 63 | +create a :class:`~gcloud.language.document.Document`: |
| 64 | + |
| 65 | + .. code-block:: python |
| 66 | +
|
| 67 | + >>> text_content = ( |
| 68 | + ... 'Google, headquartered in Mountain View, unveiled the ' |
| 69 | + ... 'new Android phone at the Consumer Electronic Show. ' |
| 70 | + ... 'Sundar Pichai said in his keynote that users love ' |
| 71 | + ... 'their new Android phones.') |
| 72 | + >>> document = client.document_from_text(text_content) |
| 73 | +
|
| 74 | +By using :meth:`~gcloud.language.client.Client.document_from_text`, |
| 75 | +the document's type is plain text: |
| 76 | + |
| 77 | + .. code-block:: python |
| 78 | +
|
| 79 | + >>> document.doc_type == language.Document.PLAIN_TEXT |
| 80 | + True |
| 81 | +
|
| 82 | +In addition, the document's language defaults to the language on |
| 83 | +the client: |
| 84 | + |
| 85 | + .. code-block:: python |
| 86 | +
|
| 87 | + >>> document.language |
| 88 | + 'en' |
| 89 | + >>> document.language == client.language |
| 90 | + True |
| 91 | +
|
| 92 | +The |
| 93 | +:meth:`~gcloud.language.client.Client.document_from_html` |
| 94 | +factory can be used to create an HTML document. In both this |
| 95 | +method and :meth:`~gcloud.language.client.Client.document_from_text`, |
| 96 | +the language can be overridden: |
| 97 | + |
| 98 | + .. code-block:: python |
| 99 | +
|
| 100 | + >>> html_content = """\ |
| 101 | + ... <html> |
| 102 | + ... <head> |
| 103 | + ... <title>El Tiempo de las Historias</title> |
| 104 | + ... </head> |
| 105 | + ... <body> |
| 106 | + ... <p>La vaca saltó sobre la luna.</p> |
| 107 | + ... </body> |
| 108 | + ... </html> |
| 109 | + ... """ |
| 110 | + >>> document = client.document_from_html(html_content, |
| 111 | + ... language='es') |
| 112 | +
|
| 113 | +The ``language`` argument can be either ISO-639-1 or BCP-47 language |
| 114 | +codes; at this time, only English, Spanish, and Japanese `are supported`_. |
| 115 | +However, the ``analyzeSentiment`` method `only supports`_ English text. |
| 116 | + |
| 117 | +.. _are supported: https://cloud.google.com/natural-language/docs/ |
| 118 | +.. _only supports: https://cloud.google.com/natural-language/reference/rest/v1beta1/documents/analyzeSentiment#body.request_body.FIELDS.document |
| 119 | + |
| 120 | +The document type (``doc_type``) value can be one of |
| 121 | +:attr:`Document.PLAIN_TEXT <gcloud.language.document.Document.PLAIN_TEXT>` or |
| 122 | +:attr:`Document.HTML <gcloud.language.document.Document.HTML>`. |
| 123 | + |
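To make the relationship between content, document type and language concrete, here is a rough sketch of the kind of dictionary a REST-style ``Document`` resource carries. The helper ``document_payload`` is hypothetical and is not part of ``gcloud-python``; the field names ``type``, ``language`` and ``content`` are taken from the v1beta1 ``Document`` reference, and how the library actually serializes a document is an assumption here.

```python
def document_payload(content, doc_type='PLAIN_TEXT', language='en'):
    """Hypothetical helper: build a REST-style Document dictionary.

    This only illustrates the shape of the data; it is not how the
    library is known to build its requests.
    """
    return {
        'type': doc_type,
        'language': language,
        'content': content,
    }


payload = document_payload('<p>La vaca salt\u00f3 sobre la luna.</p>',
                           doc_type='HTML', language='es')
print(sorted(payload))
```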
| 124 | +In addition to supplying the text / HTML content, a document can refer |
| 125 | +to content stored in `Google Cloud Storage`_. We can use the |
| 126 | +:meth:`~gcloud.language.client.Client.document_from_blob` method: |
| 127 | + |
| 128 | + .. code-block:: python |
| 129 | +
|
| 130 | + >>> document = client.document_from_blob(bucket='my-text-bucket', |
| 131 | + ... blob='sentiment-me.txt') |
| 132 | + >>> document.gcs_url |
| 133 | + 'gs://my-text-bucket/sentiment-me.txt' |
| 134 | + >>> document.doc_type == language.Document.PLAIN_TEXT |
| 135 | + True |
| 136 | +
|
| 137 | +Alternatively, use the :meth:`~gcloud.language.client.Client.document_from_uri` |
| 138 | +method. In either case, the document type can be specified with |
| 139 | +the ``doc_type`` argument: |
| 140 | + |
| 141 | + .. code-block:: python |
| 142 | +
|
| 143 | + >>> gcs_url = 'gs://my-text-bucket/sentiment-me.txt' |
| 144 | + >>> document = client.document_from_uri( |
| 145 | + ... gcs_url, doc_type=language.Document.HTML) |
| 146 | + >>> document.gcs_url == gcs_url |
| 147 | + True |
| 148 | + >>> document.doc_type == language.Document.HTML |
| 149 | + True |
| 150 | +
|
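The two factories above differ only in how the storage location is spelled. As a small sketch, the hypothetical helpers below (not part of ``gcloud-python``) show how a bucket name and blob name combine into the ``gs://`` URL shown in the output above, and how such a URL can be split back apart.

```python
def make_gcs_url(bucket, blob):
    """Hypothetical helper: join a bucket and blob name into a gs:// URL."""
    return 'gs://%s/%s' % (bucket, blob)


def split_gcs_url(gcs_url):
    """Hypothetical helper: split a gs:// URL into (bucket, blob)."""
    prefix = 'gs://'
    if not gcs_url.startswith(prefix):
        raise ValueError('Not a Cloud Storage URL: %r' % (gcs_url,))
    # Everything up to the first slash is the bucket; the rest is the blob.
    bucket, _, blob = gcs_url[len(prefix):].partition('/')
    return bucket, blob


gcs_url = make_gcs_url('my-text-bucket', 'sentiment-me.txt')
print(gcs_url)
print(split_gcs_url(gcs_url))
```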
| 151 | +.. _analyzeEntities: https://cloud.google.com/natural-language/reference/rest/v1beta1/documents/analyzeEntities |
| 152 | +.. _analyzeSentiment: https://cloud.google.com/natural-language/reference/rest/v1beta1/documents/analyzeSentiment |
| 153 | +.. _annotateText: https://cloud.google.com/natural-language/reference/rest/v1beta1/documents/annotateText |
| 154 | +.. _Document: https://cloud.google.com/natural-language/reference/rest/v1beta1/Document |
| 155 | +.. _Google Cloud Storage: https://cloud.google.com/storage/ |
| 156 | + |
| 157 | +Analyze Entities |
| 158 | +---------------- |
| 159 | + |
| 160 | +The :meth:`~gcloud.language.document.Document.analyze_entities` method |
| 161 | +finds named entities (i.e. proper names) in the text and returns them |
| 162 | +as a :class:`list` of :class:`~gcloud.language.entity.Entity` objects. |
| 163 | +Each entity has a corresponding type, salience (prominence), associated |
| 164 | +metadata and other properties. |
| 165 | + |
| 166 | + .. code-block:: python |
| 167 | +
|
| 168 | + >>> text_content = ("Michelangelo Caravaggio, Italian painter, is " |
| 169 | + ... "known for 'The Calling of Saint Matthew'.") |
| 170 | + >>> document = client.document(text_content) |
| 171 | + >>> entities = document.analyze_entities() |
| 172 | + >>> for entity in entities: |
| 173 | + ... print('=' * 20) |
| 174 | + ... print(' name: %s' % (entity.name,)) |
| 175 | + ... print(' type: %s' % (entity.entity_type,)) |
| 176 | + ... print('metadata: %s' % (entity.metadata,)) |
| 177 | + ... print('salience: %s' % (entity.salience,)) |
| 178 | + ==================== |
| 179 | + name: Michelangelo Caravaggio |
| 180 | + type: PERSON |
| 181 | + metadata: {'wikipedia_url': 'http://en.wikipedia.org/wiki/Caravaggio'} |
| 182 | + salience: 0.75942981 |
| 183 | + ==================== |
| 184 | + name: Italian |
| 185 | + type: LOCATION |
| 186 | + metadata: {'wikipedia_url': 'http://en.wikipedia.org/wiki/Italy'} |
| 187 | + salience: 0.20193423 |
| 188 | + ==================== |
| 189 | + name: The Calling of Saint Matthew |
| 190 | + type: WORK_OF_ART |
| 191 | + metadata: {'wikipedia_url': 'http://en.wikipedia.org/wiki/index.html?curid=2838808'} |
| 192 | + salience: 0.03863598 |
| 193 | +
|
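Because salience measures prominence, a common follow-up step is to keep only the most prominent entities. The sketch below uses a hypothetical ``FakeEntity`` named tuple as a stand-in for :class:`~gcloud.language.entity.Entity` (so it runs without an API call), filled with the values from the output above; the ``0.1`` threshold is an arbitrary choice for illustration.

```python
import collections

# Hypothetical stand-in for Entity objects; only the attributes used
# below are modelled.
FakeEntity = collections.namedtuple(
    'FakeEntity', ['name', 'entity_type', 'salience'])

entities = [
    FakeEntity('Italian', 'LOCATION', 0.20193423),
    FakeEntity('Michelangelo Caravaggio', 'PERSON', 0.75942981),
    FakeEntity('The Calling of Saint Matthew', 'WORK_OF_ART', 0.03863598),
]


def prominent(entities, threshold=0.1):
    """Return entities at or above ``threshold``, most salient first."""
    kept = [entity for entity in entities if entity.salience >= threshold]
    return sorted(kept, key=lambda entity: entity.salience, reverse=True)


names = [entity.name for entity in prominent(entities)]
print(names)
```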
| 194 | +Analyze Sentiment |
| 195 | +----------------- |
| 196 | + |
| 197 | +The :meth:`~gcloud.language.document.Document.analyze_sentiment` method |
| 198 | +analyzes the sentiment of the provided text and returns a |
| 199 | +:class:`~gcloud.language.sentiment.Sentiment`. Currently, this method |
| 200 | +only supports English text. |
| 201 | + |
| 202 | + .. code-block:: python |
| 203 | +
|
| 204 | + >>> text_content = "Jogging isn't very fun." |
| 205 | + >>> document = client.document(text_content) |
| 206 | + >>> sentiment = document.analyze_sentiment() |
| 207 | + >>> print(sentiment.polarity) |
| 208 | + -1 |
| 209 | + >>> print(sentiment.magnitude) |
| 210 | + 0.8 |
| 211 | +
|
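Polarity gives the direction of the sentiment (from -1 to 1) and magnitude gives its overall strength (zero or greater). One possible way to turn the pair into a readable label is sketched below. The helper ``describe_sentiment`` is hypothetical, and the ``0.5`` cut-off between "weak" and "strong" is an arbitrary assumption, not a value defined by the API.

```python
def describe_sentiment(polarity, magnitude, strong=0.5):
    """Hypothetical helper: summarize a polarity/magnitude pair as text."""
    if polarity > 0:
        direction = 'positive'
    elif polarity < 0:
        direction = 'negative'
    else:
        direction = 'neutral'
    # ``strong`` is an illustrative threshold, not an API constant.
    strength = 'strong' if magnitude >= strong else 'weak'
    return '%s %s' % (strength, direction)


print(describe_sentiment(-1, 0.8))
```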
| 212 | +Annotate Text |
| 213 | +------------- |
| 214 | + |
| 215 | +The :meth:`~gcloud.language.document.Document.annotate_text` method |
| 216 | +analyzes a document and is intended for users who are familiar with |
| 217 | +machine learning and need in-depth text features to build upon. |
| 218 | + |
| 219 | +The method returns a named tuple with four entries: |
| 220 | + |
| 221 | +* ``sentences``: A :class:`list` of sentences in the text |
| 222 | +* ``tokens``: A :class:`list` of :class:`~gcloud.language.token.Token` |
| 223 | + objects (e.g. words, punctuation) |
| 224 | +* ``sentiment``: The :class:`~gcloud.language.sentiment.Sentiment` of |
| 225 | + the text (as returned by |
| 226 | + :meth:`~gcloud.language.document.Document.analyze_sentiment`) |
| 227 | +* ``entities``: A :class:`list` of :class:`~gcloud.language.entity.Entity` |
| 228 | + objects extracted from the text (as returned by |
| 229 | + :meth:`~gcloud.language.document.Document.analyze_entities`) |
| 230 | + |
| 231 | +The :meth:`~gcloud.language.document.Document.annotate_text` method has |
| 232 | +three arguments, ``include_syntax``, ``include_entities`` and |
| 233 | +``include_sentiment``, which all default to :data:`True`. However, each of these |
| 234 | +`Features`_ can be selectively turned off by setting the corresponding |
| 235 | +arguments to :data:`False`. |
| 236 | + |
| 237 | +When ``include_syntax=False``, ``sentences`` and ``tokens`` in the |
| 238 | +response are :data:`None`. When ``include_sentiment=False``, ``sentiment`` in |
| 239 | +the response is :data:`None`. When ``include_entities=False``, ``entities`` in |
| 240 | +the response is :data:`None`. |
| 241 | + |
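The three flags correspond to the ``Features`` object of the REST API. As a sketch only, the hypothetical helper below maps the ``include_*`` arguments onto the v1beta1 field names ``extractSyntax``, ``extractEntities`` and ``extractDocumentSentiment``; whether ``gcloud-python`` builds its request exactly this way is an assumption.

```python
def features_payload(include_syntax=True, include_entities=True,
                     include_sentiment=True):
    """Hypothetical helper: map include_* flags to REST Features fields."""
    return {
        'extractSyntax': include_syntax,
        'extractEntities': include_entities,
        'extractDocumentSentiment': include_sentiment,
    }


# Request entities and sentiment only, skipping syntax analysis.
features = features_payload(include_syntax=False)
print(features)
```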
| 242 | + .. code-block:: python |
| 243 | +
|
| 244 | + >>> text_content = 'The cow jumped over the Moon.' |
| 245 | + >>> document = client.document(text_content) |
| 246 | + >>> annotations = document.annotate_text() |
| 247 | + >>> # Sentences present if include_syntax=True |
| 248 | + >>> print(annotations.sentences) |
| 249 | + ['The cow jumped over the Moon.'] |
| 250 | + >>> # Tokens present if include_syntax=True |
| 251 | + >>> for token in annotations.tokens: |
| 252 | + ... msg = '%11s: %s' % (token.part_of_speech, token.text_content) |
| 253 | + ... print(msg) |
| 254 | + DETERMINER: The |
| 255 | + NOUN: cow |
| 256 | + VERB: jumped |
| 257 | + ADPOSITION: over |
| 258 | + DETERMINER: the |
| 259 | + NOUN: Moon |
| 260 | + PUNCTUATION: . |
| 261 | + >>> # Sentiment present if include_sentiment=True |
| 262 | + >>> print(annotations.sentiment.polarity) |
| 263 | + 1 |
| 264 | + >>> print(annotations.sentiment.magnitude) |
| 265 | + 0.1 |
| 266 | + >>> # Entities present if include_entities=True |
| 267 | + >>> for entity in annotations.entities: |
| 268 | + ... print('=' * 20) |
| 269 | + ... print(' name: %s' % (entity.name,)) |
| 270 | + ... print(' type: %s' % (entity.entity_type,)) |
| 271 | + ... print('metadata: %s' % (entity.metadata,)) |
| 272 | + ... print('salience: %s' % (entity.salience,)) |
| 273 | + ==================== |
| 274 | + name: Moon |
| 275 | + type: LOCATION |
| 276 | + metadata: {'wikipedia_url': 'http://en.wikipedia.org/wiki/Natural_satellite'} |
| 277 | + salience: 0.11793101 |
| 278 | +
|
| 279 | +.. _Features: https://cloud.google.com/natural-language/reference/rest/v1beta1/documents/annotateText#Features |