Commit 9a6bf7f

Merge pull request #2062 from dhermes/language-usage-doc
Adding usage doc for Natural Language API.
2 parents 0bf3b68 + d36ad4d commit 9a6bf7f

2 files changed

+286
-0
lines changed

docs/index.rst

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -148,6 +148,13 @@
148148

149149
vision-usage
150150

151+
.. toctree::
152+
:maxdepth: 0
153+
:hidden:
154+
:caption: Natural Language
155+
156+
language-usage
157+
151158
.. toctree::
152159
:maxdepth: 0
153160
:hidden:

docs/language-usage.rst

Lines changed: 279 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,279 @@
1+
Using the API
2+
=============
3+
4+
The `Google Natural Language`_ API can be used to reveal the
5+
structure and meaning of text via powerful machine
6+
learning models. You can use it to extract information about
7+
people, places, events and much more, mentioned in text documents,
8+
news articles or blog posts. You can use it to understand
9+
sentiment about your product on social media or parse intent from
10+
customer conversations happening in a call center or a messaging
11+
app. You can analyze text uploaded in your request or integrate
12+
with your document storage on Google Cloud Storage.
13+
14+
.. warning::
15+
16+
This is a Beta release of Google Cloud Natural Language API. This
17+
API is not intended for real-time usage in critical applications.
18+
19+
.. _Google Natural Language: https://cloud.google.com/natural-language/docs/getting-started
20+
21+
Client
22+
------
23+
24+
:class:`~gcloud.language.client.Client` objects provide a
25+
means to configure your application. Each instance holds
26+
both a ``project`` and an authenticated connection to the
27+
Natural Language service.
28+
29+
For an overview of authentication in ``gcloud-python``, see
30+
:doc:`gcloud-auth`.
31+
32+
Assuming your environment is set up as described in that document,
33+
create an instance of :class:`~gcloud.language.client.Client`.
34+
35+
.. code-block:: python
36+
37+
>>> from gcloud import language
38+
>>> client = language.Client()
39+
40+
By default the ``language`` is ``'en'`` and the ``encoding`` is
41+
UTF-8. To override these values:
42+
43+
.. code-block:: python
44+
45+
>>> client = language.Client(language='es',
46+
... encoding=language.Encoding.UTF16)
47+
48+
The encoding can be one of
49+
:attr:`Encoding.UTF8 <gcloud.language.document.Encoding.UTF8>`,
50+
:attr:`Encoding.UTF16 <gcloud.language.document.Encoding.UTF16>`, or
51+
:attr:`Encoding.UTF32 <gcloud.language.document.Encoding.UTF32>`.
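The encoding matters mainly for character offsets: according to the REST reference for ``encodingType``, offsets in a response are counted in units of the requested encoding. The following is a minimal sketch in plain Python that makes no ``gcloud`` calls; ``offset_in_units`` and ``UNIT_SIZES`` are hypothetical names used only for illustration.

```python
# Hypothetical helper for illustration; not part of gcloud-python.
# Bytes per code unit for each codec (little-endian variants avoid a BOM).
UNIT_SIZES = {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}


def offset_in_units(text, substring, codec):
    """Return where ``substring`` starts, counted in code units of ``codec``."""
    prefix = text[:text.index(substring)]
    return len(prefix.encode(codec)) // UNIT_SIZES[codec]


# The emoji takes 4 UTF-8 bytes, 2 UTF-16 code units and 1 UTF-32 code unit.
text = u'Hola \U0001F600 luna'
offsets = {codec: offset_in_units(text, u'luna', codec)
           for codec in sorted(UNIT_SIZES)}
print(offsets)
```

The same word therefore starts at a different offset under each encoding, so it is usually simplest to pick the encoding that matches how your application indexes strings.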
52+
53+
Methods
54+
-------
55+
56+
The Google Natural Language API has three supported methods
57+
58+
- `analyzeEntities`_
59+
- `analyzeSentiment`_
60+
- `annotateText`_
61+
62+
and each method uses a `Document`_ for representing text. To
63+
create a :class:`~gcloud.language.document.Document`,
64+
65+
.. code-block:: python
66+
67+
>>> text_content = (
68+
... 'Google, headquartered in Mountain View, unveiled the '
69+
... 'new Android phone at the Consumer Electronic Show. '
70+
... 'Sundar Pichai said in his keynote that users love '
71+
... 'their new Android phones.')
72+
>>> document = client.document_from_text(text_content)
73+
74+
By using :meth:`~gcloud.language.client.Client.document_from_text`,
75+
the document's type is plain text:
76+
77+
.. code-block:: python
78+
79+
>>> document.doc_type == language.Document.PLAIN_TEXT
80+
True
81+
82+
In addition, the document's language defaults to the language on
83+
the client:
84+
85+
.. code-block:: python
86+
87+
>>> document.language
88+
'en'
89+
>>> document.language == client.language
90+
True
91+
92+
In addition, the
93+
:meth:`~gcloud.language.client.Client.document_from_html`
94+
factory can be used to create an HTML document. In this
95+
method and in ``document_from_text``, the language can be
96+
overridden:
97+
98+
.. code-block:: python
99+
100+
>>> html_content = """\
101+
... <html>
102+
... <head>
103+
... <title>El Tiempo de las Historias</title>
104+
... </head>
105+
... <body>
106+
... <p>La vaca salt&oacute; sobre la luna.</p>
107+
... </body>
108+
... </html>
109+
... """
110+
>>> document = client.document_from_html(html_content,
111+
... language='es')
112+
113+
The ``language`` argument can be either ISO-639-1 or BCP-47 language
114+
codes; at the time of writing, only English, Spanish, and Japanese `are supported`_.
115+
However, the ``analyzeSentiment`` method `only supports`_ English text.
116+
117+
.. _are supported: https://cloud.google.com/natural-language/docs/
118+
.. _only supports: https://cloud.google.com/natural-language/reference/rest/v1beta1/documents/analyzeSentiment#body.request_body.FIELDS.document
119+
120+
The document type (``doc_type``) value can be one of
121+
:attr:`Document.PLAIN_TEXT <gcloud.language.document.Document.PLAIN_TEXT>` or
122+
:attr:`Document.HTML <gcloud.language.document.Document.HTML>`.
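To make the relationship between the document type, language, and content concrete, here is a rough sketch of the JSON ``Document`` payload that the REST API expects for inline content. The field names (``type``, ``language``, ``content``) follow the v1beta1 REST reference; ``inline_document`` is a hypothetical helper, not a ``gcloud`` function, and the library may build its requests differently.

```python
import json

# Document types named in the REST reference for inline content.
SUPPORTED_TYPES = ('PLAIN_TEXT', 'HTML')


def inline_document(content, doc_type='PLAIN_TEXT', language='en'):
    """Build a REST-style ``Document`` dict (hypothetical helper)."""
    if doc_type not in SUPPORTED_TYPES:
        raise ValueError('Unsupported document type: %r' % (doc_type,))
    return {'type': doc_type, 'language': language, 'content': content}


body = inline_document('<p>La vaca salt&oacute; sobre la luna.</p>',
                       doc_type='HTML', language='es')
print(json.dumps(body, indent=2, sort_keys=True))
```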
123+
124+
In addition to supplying the text / HTML content, a document can refer
125+
to content stored in `Google Cloud Storage`_. We can use the
126+
:meth:`~gcloud.language.client.Client.document_from_blob` method:
127+
128+
.. code-block:: python
129+
130+
>>> document = client.document_from_blob(bucket='my-text-bucket',
131+
... blob='sentiment-me.txt')
132+
>>> document.gcs_url
133+
'gs://my-text-bucket/sentiment-me.txt'
134+
>>> document.doc_type == language.Document.PLAIN_TEXT
135+
True
136+
137+
The :meth:`~gcloud.language.client.Client.document_from_uri`
138+
method accepts the full URL instead. In either case, the document type can be specified with
139+
the ``doc_type`` argument:
140+
141+
.. code-block:: python
142+
143+
>>> gcs_url = 'gs://my-text-bucket/sentiment-me.txt'
144+
>>> document = client.document_from_uri(
145+
... gcs_url, doc_type=language.Document.HTML)
146+
>>> document.gcs_url == gcs_url
147+
True
148+
>>> document.doc_type == language.Document.HTML
149+
True
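The two factories above differ only in how the storage location is spelled. As an illustrative sketch, assuming the ``gs://<bucket>/<blob>`` form shown in the examples and the ``gcsContentUri`` field from the v1beta1 REST ``Document`` resource, the mapping can be written in plain Python. Both helpers are hypothetical and not part of ``gcloud-python``.

```python
def gcs_url(bucket, blob):
    """Compose a ``gs://`` URL from a bucket name and a blob name."""
    return 'gs://%s/%s' % (bucket, blob)


def gcs_document(url, doc_type='PLAIN_TEXT'):
    """Build a REST-style ``Document`` dict that points at stored content."""
    if not url.startswith('gs://'):
        raise ValueError('Expected a gs:// URL, got %r' % (url,))
    return {'type': doc_type, 'gcsContentUri': url}


url = gcs_url('my-text-bucket', 'sentiment-me.txt')
print(url)
print(gcs_document(url, doc_type='HTML'))
```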
150+
151+
.. _analyzeEntities: https://cloud.google.com/natural-language/reference/rest/v1beta1/documents/analyzeEntities
152+
.. _analyzeSentiment: https://cloud.google.com/natural-language/reference/rest/v1beta1/documents/analyzeSentiment
153+
.. _annotateText: https://cloud.google.com/natural-language/reference/rest/v1beta1/documents/annotateText
154+
.. _Document: https://cloud.google.com/natural-language/reference/rest/v1beta1/Document
155+
.. _Google Cloud Storage: https://cloud.google.com/storage/
156+
157+
Analyze Entities
158+
----------------
159+
160+
The :meth:`~gcloud.language.document.Document.analyze_entities` method
161+
finds named entities (i.e. proper names) in the text and returns them
162+
as a :class:`list` of :class:`~gcloud.language.entity.Entity` objects.
163+
Each entity has a corresponding type, salience (prominence), associated
164+
metadata and other properties.
165+
166+
.. code-block:: python
167+
168+
>>> text_content = ("Michelangelo Caravaggio, Italian painter, is "
169+
... "known for 'The Calling of Saint Matthew'.")
170+
>>> document = client.document_from_text(text_content)
171+
>>> entities = document.analyze_entities()
172+
>>> for entity in entities:
173+
... print('=' * 20)
174+
... print(' name: %s' % (entity.name,))
175+
... print(' type: %s' % (entity.entity_type,))
176+
... print('metadata: %s' % (entity.metadata,))
177+
... print('salience: %s' % (entity.salience,))
178+
====================
179+
name: Michelangelo Caravaggio
180+
type: PERSON
181+
metadata: {'wikipedia_url': 'http://en.wikipedia.org/wiki/Caravaggio'}
182+
salience: 0.75942981
183+
====================
184+
name: Italian
185+
type: LOCATION
186+
metadata: {'wikipedia_url': 'http://en.wikipedia.org/wiki/Italy'}
187+
salience: 0.20193423
188+
====================
189+
name: The Calling of Saint Matthew
190+
type: WORK_OF_ART
191+
metadata: {'wikipedia_url': 'http://en.wikipedia.org/wiki/index.html?curid=2838808'}
192+
salience: 0.03863598
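Since salience expresses prominence, a common follow-up is to keep only the most prominent entities. The sketch below uses a ``namedtuple`` stand-in carrying the values printed above, so it runs without the service; ``FakeEntity``, ``prominent``, and the ``0.1`` threshold are assumptions made for illustration.

```python
import collections

# Stand-in for gcloud.language.entity.Entity with only the attributes used here.
FakeEntity = collections.namedtuple(
    'FakeEntity', ['name', 'entity_type', 'salience'])

entities = [
    FakeEntity('The Calling of Saint Matthew', 'WORK_OF_ART', 0.03863598),
    FakeEntity('Michelangelo Caravaggio', 'PERSON', 0.75942981),
    FakeEntity('Italian', 'LOCATION', 0.20193423),
]


def prominent(entities, min_salience=0.1):
    """Return entities at or above ``min_salience``, most salient first."""
    kept = [entity for entity in entities if entity.salience >= min_salience]
    return sorted(kept, key=lambda entity: entity.salience, reverse=True)


for entity in prominent(entities):
    print('%-25s %s' % (entity.name, entity.entity_type))
```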
193+
194+
Analyze Sentiment
195+
-----------------
196+
197+
The :meth:`~gcloud.language.document.Document.analyze_sentiment` method
198+
analyzes the sentiment of the provided text and returns a
199+
:class:`~gcloud.language.sentiment.Sentiment`. Currently, this method
200+
only supports English text.
201+
202+
.. code-block:: python
203+
204+
>>> text_content = "Jogging isn't very fun."
205+
>>> document = client.document_from_text(text_content)
206+
>>> sentiment = document.analyze_sentiment()
207+
>>> print(sentiment.polarity)
208+
-1
209+
>>> print(sentiment.magnitude)
210+
0.8
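The two numbers are easier to read together: per the REST reference, polarity indicates the direction of the sentiment and magnitude its overall strength. A small sketch of one way to combine them follows; ``describe_sentiment`` and the ``strong`` threshold of ``1.0`` are arbitrary assumptions for illustration, not values defined by the API.

```python
def describe_sentiment(polarity, magnitude, strong=1.0):
    """Summarize a sentiment as a short label (hypothetical helper)."""
    if polarity > 0:
        direction = 'positive'
    elif polarity < 0:
        direction = 'negative'
    else:
        direction = 'neutral'
    # The cutoff between "mild" and "strong" is an assumption of this sketch.
    strength = 'strong' if magnitude >= strong else 'mild'
    return '%s %s' % (strength, direction)


print(describe_sentiment(-1, 0.8))
```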
211+
212+
Annotate Text
213+
-------------
214+
215+
The :meth:`~gcloud.language.document.Document.annotate_text` method
216+
analyzes a document and is intended for users who are familiar with
217+
machine learning and need in-depth text features to build upon.
218+
219+
The method returns a named tuple with four entries:
220+
221+
* ``sentences``: A :class:`list` of sentences in the text
222+
* ``tokens``: A :class:`list` of :class:`~gcloud.language.token.Token`
223+
objects (e.g. words, punctuation)
224+
* ``sentiment``: The :class:`~gcloud.language.sentiment.Sentiment` of
225+
the text (as returned by
226+
:meth:`~gcloud.language.document.Document.analyze_sentiment`)
227+
* ``entities``: A :class:`list` of :class:`~gcloud.language.entity.Entity`
228+
objects extracted from the text (as returned by
229+
:meth:`~gcloud.language.document.Document.analyze_entities`)
230+
231+
The :meth:`~gcloud.language.document.Document.annotate_text` method has
232+
three arguments, ``include_syntax``, ``include_entities`` and
233+
``include_sentiment``, which all default to :data:`True`. However, each of these
234+
`Features`_ can be selectively turned off by setting the corresponding
235+
arguments to :data:`False`.
236+
237+
When ``include_syntax=False``, ``sentences`` and ``tokens`` in the
238+
response are :data:`None`. When ``include_sentiment=False``, ``sentiment`` in
239+
the response is :data:`None`. When ``include_entities=False``, ``entities`` in
240+
the response is :data:`None`.
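The rules above can be sketched as a mapping from the three ``include_*`` arguments to the REST ``Features`` object, plus the list of result fields that would come back as :data:`None`. The ``extractSyntax``, ``extractEntities``, and ``extractDocumentSentiment`` keys follow the v1beta1 REST reference; ``build_features`` and ``empty_fields`` are hypothetical helpers, and the library's internals may differ.

```python
def build_features(include_syntax=True, include_entities=True,
                   include_sentiment=True):
    """Translate the ``include_*`` flags into a REST-style ``Features`` dict."""
    return {
        'extractSyntax': include_syntax,
        'extractEntities': include_entities,
        'extractDocumentSentiment': include_sentiment,
    }


def empty_fields(include_syntax=True, include_entities=True,
                 include_sentiment=True):
    """List the result fields expected to be ``None`` for the given flags."""
    fields = []
    if not include_syntax:
        fields.extend(['sentences', 'tokens'])
    if not include_sentiment:
        fields.append('sentiment')
    if not include_entities:
        fields.append('entities')
    return fields


print(build_features(include_syntax=False))
print(empty_fields(include_syntax=False, include_entities=False))
```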
241+
242+
.. code-block:: python
243+
244+
>>> text_content = 'The cow jumped over the Moon.'
245+
>>> document = client.document_from_text(text_content)
246+
>>> annotations = document.annotate_text()
247+
>>> # Sentences present if include_syntax=True
248+
>>> print(annotations.sentences)
249+
['The cow jumped over the Moon.']
250+
>>> # Tokens present if include_syntax=True
251+
>>> for token in annotations.tokens:
252+
... msg = '%11s: %s' % (token.part_of_speech, token.text_content)
253+
... print(msg)
254+
DETERMINER: The
255+
NOUN: cow
256+
VERB: jumped
257+
ADPOSITION: over
258+
DETERMINER: the
259+
NOUN: Moon
260+
PUNCTUATION: .
261+
>>> # Sentiment present if include_sentiment=True
262+
>>> print(annotations.sentiment.polarity)
263+
1
264+
>>> print(annotations.sentiment.magnitude)
265+
0.1
266+
>>> # Entities present if include_entities=True
267+
>>> for entity in annotations.entities:
268+
... print('=' * 20)
269+
... print(' name: %s' % (entity.name,))
270+
... print(' type: %s' % (entity.entity_type,))
271+
... print('metadata: %s' % (entity.metadata,))
272+
... print('salience: %s' % (entity.salience,))
273+
====================
274+
name: Moon
275+
type: LOCATION
276+
metadata: {'wikipedia_url': 'http://en.wikipedia.org/wiki/Natural_satellite'}
277+
salience: 0.11793101
278+
279+
.. _Features: https://cloud.google.com/natural-language/reference/rest/v1beta1/documents/annotateText#Features
