Speech: long_running_recognize on serverless #6162

Closed
jdupl123 opened this issue Oct 4, 2018 · 9 comments
Labels
api: speech Issues related to the Speech-to-Text API. type: question Request for information or clarification. Not an issue.

Comments

jdupl123 commented Oct 4, 2018

I asked this on Stack Overflow (https://stackoverflow.com/questions/52417481/speech-long-running-recognize-on-serverless) but received no response.

I would like to use long_running_recognize on serverless. I have a constraint that I can only run for 5 minutes at a time. To gather a transcription that takes longer than 5 minutes, I would like to resume the operation.result call in a new process.

I could implement my own client but this is not an ideal solution as it requires me calling

gcloud auth application-default print-access-token

to get an access token. It also makes it harder to keep up with changes in the API.

Any advice/thoughts would be appreciated.

Cheers,
Jaco.

@tseaver tseaver added type: question Request for information or clarification. Not an issue. api: speech Issues related to the Speech-to-Text API. labels Oct 4, 2018
Contributor

tseaver commented Oct 4, 2018

@jdupl123 You could dump the state of the google.longrunning.Operation from the OperationFuture returned by SpeechClient.long_running_recognize, e.g.:

future = speech_client.long_running_recognize(...)
operation = future.operation  # property holding the Operation protobuf message
# save the message in your backing store of choice

Then you could reconstitute it later:

from google.api_core.operation import from_gapic
from google.cloud.speech_v1.proto import cloud_speech_pb2

# load operation from backing store
future = from_gapic(
    operation,
    speech_client.transport._operations_client,
    cloud_speech_pb2.LongRunningRecognizeResponse,
    metadata_type=cloud_speech_pb2.LongRunningRecognizeMetadata,
)

Author

jdupl123 commented Oct 4, 2018

Thanks @tseaver for the tip.

Since future.operation is of type google.longrunning.operations_pb2.Operation, I can't just JSON-dump it to save it in Redis. Would you recommend pickle.dumps(), or is there a better way to convert the operation to text and back again? I always have some security concerns around using pickle.

Author

jdupl123 commented Oct 4, 2018

By the way, I tried saving just the name ("1234324") printed by future.operation in a named tuple, but then the done attribute was missing. That felt like quite a hacky solution, so I didn't pursue it further.

Contributor

tseaver commented Oct 5, 2018

@jdupl123 To convert a protobuf message to JSON:

from google.protobuf import json_format

# Convert operation protobuf message to JSON-formatted string
operation_json = json_format.MessageToJson(operation)

Converting it back:

from google.longrunning import operations_pb2
from google.protobuf import json_format

# Convert JSON-formatted string to proto message
new_operation = json_format.Parse(operation_json, operations_pb2.Operation())

Author

jdupl123 commented Oct 5, 2018

Many thanks @tseaver much appreciated.

@jdupl123 jdupl123 closed this as completed Oct 5, 2018
@jdupl123 jdupl123 reopened this Oct 9, 2018
Author

jdupl123 commented Oct 9, 2018

I've tried the above but I run into the following issue.

The code is as follows:

from google.api_core.operation import from_gapic
from google.cloud.speech_v1.proto import cloud_speech_pb2
from google.protobuf import json_format
from google.longrunning import operations_pb2


def serialize_operation(future):
    operation = future.operation

    # Convert operation protobuf message to JSON-formatted string
    operation_json = json_format.MessageToJson(operation)
    return operation_json


def restore_operation(operation_json, speech_client):
    # Convert JSON-formatted string to proto message
    operation = json_format.Parse(operation_json, operations_pb2.Operation())

    # load operation from backing store
    future = from_gapic(
        operation,
        speech_client.transport._operations_client,
        cloud_speech_pb2.LongRunningRecognizeResponse,
        metadata_type=cloud_speech_pb2.LongRunningRecognizeMetadata
    )

    return future

    # Method on the transcriber class (rest of the class omitted)
    def gather_transcripts(self):
        client = self.client
        operationJson = self.persist.get_json(f'{self.jobUUID}_operation')
        operation = restore_operation(operationJson, client)

        try:
            response = operation.result(timeout=100)
            done = response.done()
        except TimeoutError as e:
            print(e)
            done = False

Traceback (most recent call last):
  File "googletest.py", line 16, in <module>
    res = gt.gather_transcripts()
  File "/Users/jdp/Projects/transcribers/google.py", line 73, in gather_transcripts
    response = operation.result(timeout=100)
  File "/Users/jdp/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/future/polling.py", line 115, in result
    self._blocking_poll(timeout=timeout)
  File "/Users/jdp/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/future/polling.py", line 94, in _blocking_poll
    retry_(self._done_or_raise)()
  File "/Users/jdp/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/retry.py", line 260, in retry_wrapped_func
    on_error=on_error,
  File "/Users/jdp/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/retry.py", line 177, in retry_target
    return target()
  File "/Users/jdp/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/future/polling.py", line 73, in _done_or_raise
    if not self.done():
  File "/Users/jdp/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/operation.py", line 140, in done
    self._refresh_and_update()
  File "/Users/jdp/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/operation.py", line 132, in _refresh_and_update
    self._set_result_from_operation()
  File "/Users/jdp/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/operation.py", line 112, in _set_result_from_operation
    self._result_type, self._operation.response)
  File "/Users/jdp/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/google/api_core/protobuf_helpers.py", line 57, in from_any_pb
    any_pb.__class__.__name__, pb_type.__name__))
TypeError: Could not convert Any to LongRunningRecognizeResponse

Any ideas or tips for how to debug this further? @tseaver

Author

jdupl123 commented Oct 9, 2018

I solved this issue. The problem was mixing speech_v1 with speech_v1p1beta1.
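
A version mismatch like this can be spotted from the stored JSON before calling from_gapic, because the proto3 JSON form of an Any field records its message type under "@type". Below is a small sketch using only the standard library. The helper names `any_type_names` and `check_api_version` are hypothetical, and `SAMPLE` is a hand-written illustration, not captured output.

```python
import json

# Hand-written illustration of a stored operation, not a real API response.
SAMPLE = json.dumps({
    "name": "1234324",
    "metadata": {
        "@type": "type.googleapis.com/"
                 "google.cloud.speech.v1p1beta1.LongRunningRecognizeMetadata",
        "progressPercent": 40,
    },
})


def any_type_names(operation_json):
    """Map each Any-typed field in the stored operation to its message type."""
    data = json.loads(operation_json)
    names = {}
    for field in ("metadata", "response"):
        value = data.get(field)
        if isinstance(value, dict) and "@type" in value:
            # The type URL ends with the fully qualified message name.
            names[field] = value["@type"].rsplit("/", 1)[-1]
    return names


def check_api_version(operation_json, expected_package):
    """Raise ValueError if a stored type lies outside expected_package."""
    for field, name in any_type_names(operation_json).items():
        # The trailing dot keeps "v1" from matching "v1p1beta1".
        if not name.startswith(expected_package + "."):
            raise ValueError(
                f"{field} is {name}, expected a type from {expected_package}"
            )
```

Calling `check_api_version(SAMPLE, "google.cloud.speech.v1")` raises `ValueError` here, since the sample was written with v1p1beta1 types, while passing `"google.cloud.speech.v1p1beta1"` returns without error.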

@jdupl123 jdupl123 closed this as completed Oct 9, 2018
@jdupl123 jdupl123 reopened this Nov 28, 2018
@jdupl123
Author

@tseaver this method works for me most of the time, but occasionally I get:

Traceback (most recent call last):
  File "handlers/managed_transcribe.py", line 82, in handler
  File "handlers/managed_transcribe.py", line 221, in __call__
  File "handlers/managed_transcribe.py", line 158, in run_transcriber
  File "transcribers/google.py", line 84, in gather_transcripts
  File "/var/task/google/api_core/future/polling.py", line 120, in result
              raise self._exception
google.api_core.exceptions.GoogleAPICallError: None Server unavailable, please try again later.

This happens when there is noise or silence at the end of the file being transcribed.

Any ideas what may be causing this?

Contributor

tseaver commented Dec 7, 2018

@jdupl123 In #6305, I added support for retrying transient errors to Future.result. That fix is available in google-api-core 1.5.2 and later releases.
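
For the 5-minute limit raised at the start of the thread, one option is to poll the restored future inside a fixed time budget and leave the operation for the next invocation if the budget runs out. This is a sketch under assumptions: `poll_within_budget` is a made-up helper, `future` is anything with `done()` and `result()` (such as the object returned by `from_gapic`), and `transient_errors` is a caller-chosen tuple of exception classes to swallow while polling, for example `google.api_core.exceptions.ServiceUnavailable`.

```python
import time


def poll_within_budget(future, budget_seconds, interval_seconds=5.0,
                       transient_errors=(), sleep=time.sleep,
                       clock=time.monotonic):
    """Poll until the future is done or the budget is spent.

    Returns (True, result) when finished, otherwise (False, None).
    """
    deadline = clock() + budget_seconds
    while True:
        try:
            if future.done():
                return True, future.result()
        except transient_errors:
            # Treat a listed error as "not done yet" and poll again.
            pass
        remaining = deadline - clock()
        if remaining <= 0:
            return False, None
        # Never sleep past the deadline.
        sleep(min(interval_seconds, remaining))
```

When the helper returns `(False, None)`, the stored operation is still valid and a later process can restore it and poll again. Errors not listed in `transient_errors` propagate to the caller unchanged.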
