Speech: long_running_recognize on serverless #6162

jdupl123 · 2018-10-04T06:07:03Z

I asked this on Stack Overflow https://stackoverflow.com/questions/52417481/speech-long-running-recognize-on-serverless but received no response.

I would like to use long_running_recognize on serverless. I have a constraint that I can only run for 5 minutes at a time. In order to gather a transcription which takes longer than 5 minutes, I would like to resume the operation.result call in a new process.

I could implement my own client but this is not an ideal solution as it requires me calling

gcloud auth application-default print-access-token

to get an access token. It also makes it harder to keep up with changes in the API.

Any advice/thoughts would be appreciated.

Cheers,
Jaco.

tseaver · 2018-10-04T07:10:22Z

@jdupl123 You could dump the state of the google.longrunning.Operation out from the OperationFuture returned from SpeechClient.long_running_recognize, e.g.:

future = speech_client.long_running_recognize(...)
operation = future.operation
# save the message in your backing store of choice

Then you could reconstitute it later:

from google.api_core.operation import from_gapic
from google.cloud.speech_v1.proto import cloud_speech_pb2

# load operation from backing store
future = from_gapic(
    operation,
    speech_client.transport._operations_client,
    cloud_speech_pb2.LongRunningRecognizeResponse,
    metadata_type=cloud_speech_pb2.LongRunningRecognizeMetadata,
)
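
Once the future is reconstituted, each invocation can wait for only a slice of its time limit and report whether the result has arrived. The following is a minimal sketch of that idea, not part of the library. `wait_within_budget` is a hypothetical helper name, and a standard-library `Future` stands in for the Speech operation so the snippet runs without credentials. It assumes the future raises `concurrent.futures.TimeoutError` on timeout, which before Python 3.11 is a different class from the builtin `TimeoutError`.

```python
import concurrent.futures


def wait_within_budget(future, budget_seconds):
    """Wait up to budget_seconds; return (done, result_or_None)."""
    try:
        return True, future.result(timeout=budget_seconds)
    except concurrent.futures.TimeoutError:
        # Not finished yet: the caller can keep the stored operation and
        # try again from a later invocation.
        return False, None


# Stand-in futures for demonstration only.
pending = concurrent.futures.Future()
finished = concurrent.futures.Future()
finished.set_result("transcript")

pending_outcome = wait_within_budget(pending, 0.01)
finished_outcome = wait_within_budget(finished, 0.01)
```

With a 5 minute limit, the budget passed in would need to leave headroom for start-up and for persisting state before the process is stopped.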

jdupl123 · 2018-10-04T07:34:44Z

Thanks @tseaver for the tip.

Since future.operation is of type google.longrunning.operations_pb2.Operation, I can't just JSON-dump it to save it in Redis. Would you recommend pickle.dumps(), or is there a better way to convert the operation to text and back again? I always have some security concerns around using pickle.

jdupl123 · 2018-10-04T07:40:45Z

By the way, I tried saving just the name ("1234324") printed out by future.operation in a named tuple, but then the done attribute was missing. This felt like quite a hacky solution, so I didn't pursue it further.

tseaver · 2018-10-05T14:08:17Z

@jdupl123 To convert a protobuf message to JSON:

from google.protobuf import json_format

# Convert operation protobuf message to JSON-formatted string
operation_json = json_format.MessageToJson(operation)

Converting it back:

from google.longrunning import operations_pb2
from google.protobuf import json_format

# Convert JSON-formatted string to proto message
new_operation = json_format.Parse(operation_json, operations_pb2.Operation())

jdupl123 · 2018-10-05T22:43:29Z

Many thanks @tseaver much appreciated.

jdupl123 · 2018-10-09T12:14:06Z

I've tried the above but ran into the following issue.

The code is as follows:

import concurrent.futures

from google.api_core.operation import from_gapic
from google.cloud.speech_v1.proto import cloud_speech_pb2
from google.longrunning import operations_pb2
from google.protobuf import json_format


def serialize_operation(future):
    operation = future.operation

    # Convert operation protobuf message to JSON-formatted string
    operation_json = json_format.MessageToJson(operation)
    return operation_json


def restore_operation(operation_json, speech_client):
    # Convert JSON-formatted string to proto message
    operation = json_format.Parse(operation_json, operations_pb2.Operation())

    # Rebuild the future from the restored operation message
    future = from_gapic(
        operation,
        speech_client.transport._operations_client,
        cloud_speech_pb2.LongRunningRecognizeResponse,
        metadata_type=cloud_speech_pb2.LongRunningRecognizeMetadata
    )

    return future

    # Method of the transcriber class (excerpt)
    def gather_transcripts(self):
        client = self.client
        operationJson = self.persist.get_json(f'{self.jobUUID}_operation')
        operation = restore_operation(operationJson, client)

        try:
            # result() returns the response message once the operation finishes
            response = operation.result(timeout=100)
            done = True
        except concurrent.futures.TimeoutError as e:
            # Still running; try again in a later invocation
            print(e)
            done = False

Traceback (most recent call last):
  File "googletest.py", line 16, in <module>
    res = gt.gather_transcripts()
  File "/Users/jdp/Projects/transcribers/google.py", line 73, in gather_transcripts
    response = operation.result(timeout=100)
  File "/Users/jdp/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/future/polling.py", line 115, in result
    self._blocking_poll(timeout=timeout)
  File "/Users/jdp/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/future/polling.py", line 94, in _blocking_poll
    retry_(self._done_or_raise)()
  File "/Users/jdp/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/retry.py", line 260, in retry_wrapped_func
    on_error=on_error,
  File "/Users/jdp/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/retry.py", line 177, in retry_target
    return target()
  File "/Users/jdp/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/future/polling.py", line 73, in _done_or_raise
    if not self.done():
  File "/Users/jdp/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/operation.py", line 140, in done
    self._refresh_and_update()
  File "/Users/jdp/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/operation.py", line 132, in _refresh_and_update
    self._set_result_from_operation()
  File "/Users/jdp/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/operation.py", line 112, in _set_result_from_operation
    self._result_type, self._operation.response)
  File "/Users/jdp/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/protobuf_helpers.py", line 57, in from_any_pb
    any_pb.__class__.__name__, pb_type.__name__))
TypeError: Could not convert Any to LongRunningRecognizeResponse

Any ideas or tips for how to debug this further? @tseaver

jdupl123 · 2018-10-09T23:29:57Z

I solved this issue. The problem was mixing speech_v1 with speech_v1p1beta1.
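
The TypeError in the traceback above is what a type mismatch inside a google.protobuf.Any looks like: the operation's response is packed as one message type and the caller asks to unpack it as another. Checking the packed type name is a quick way to spot this. The sketch below is self-contained and uses two well-known protobuf types as stand-ins for the two Speech API versions; Duration and Timestamp are placeholders, not the real response types.

```python
from google.protobuf import any_pb2, duration_pb2, timestamp_pb2

# Pack one message type into an Any, the way a response is carried
# inside a long-running Operation.
packed = any_pb2.Any()
packed.Pack(duration_pb2.Duration(seconds=5))

# The type URL records which message type is really inside.
packed_type_name = packed.TypeName()

# Unpacking into a different type does not raise; Unpack returns False.
wrong_target = timestamp_pb2.Timestamp()
unpacked_as_wrong = packed.Unpack(wrong_target)

# Unpacking into the matching type succeeds and fills the target.
right_target = duration_pb2.Duration()
unpacked_as_right = packed.Unpack(right_target)
```

On a restored future, inspecting the type URL of future.operation.response in the same way should show which API version produced the response, assuming the response field has been populated.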

jdupl123 · 2018-11-28T03:53:40Z

@tseaver This method works for me most of the time, but occasionally I get:

Traceback (most recent call last):
  File "handlers/managed_transcribe.py", line 82, in handler
  File "handlers/managed_transcribe.py", line 221, in __call__
  File "handlers/managed_transcribe.py", line 158, in run_transcriber
  File "transcribers/google.py", line 84, in gather_transcripts
  File "/var/task/google/api_core/future/polling.py", line 120, in result
              raise self._exception
google.api_core.exceptions.GoogleAPICallError: None Server unavailable, please try again later.

This happens when there is noise or silence at the end of the file to be transcribed.

Any ideas what may be causing this?

tseaver · 2018-12-07T19:31:30Z

@jdupl123 In #6305, I added support for retrying transient errors to Future.result. That fix is available in released versions since google-api-core 1.5.2.
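
For anyone pinned to an older release, a transient failure like the one above could also be retried by the caller. The following is a minimal sketch of retry with exponential backoff and is not the implementation from #6305. `TransientError` and `call_with_retry` are hypothetical names for illustration; real code would catch whichever exception the client raises for the "Server unavailable" case.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable server error."""


def call_with_retry(func, attempts=3, initial_delay=0.01, multiplier=2.0):
    """Call func, retrying on TransientError with exponential backoff."""
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay)
            delay *= multiplier


calls = []


def flaky():
    # Fails twice, then succeeds, to exercise the retry path.
    calls.append(1)
    if len(calls) < 3:
        raise TransientError("Server unavailable, please try again later.")
    return "ok"


outcome = call_with_retry(flaky)
```

Judging by the `raise self._exception` line in the traceback, a future that has already recorded an exception would likely re-raise it on every later call, so the function being retried would probably need to rebuild the future from the stored operation on each attempt.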

tseaver added the "type: question" and "api: speech" labels Oct 4, 2018

jdupl123 closed this as completed Oct 5, 2018

jdupl123 reopened this Oct 9, 2018

jdupl123 closed this as completed Oct 9, 2018

jdupl123 reopened this Nov 28, 2018

tseaver closed this as completed Dec 7, 2018

JustinBeckwith assigned jdupl123 Feb 1, 2021
