
Importing TF and ProdClient in the same file #5

Open
FrancescoSaverioZuppichini opened this issue Apr 20, 2018 · 1 comment

Comments

@FrancescoSaverioZuppichini

Hello,

I need to import some functions from Keras in order to preprocess the input before sending it to the tf-serving server. I am doing sentiment analysis on tweets. My client is pretty basic:

# client.py
from predict_client.prod_client import ProdClient
from Config import Config
from data import get_tokenizer, make_inputs

client = ProdClient(Config.HOST, Config.MODEL_NAME, Config.MODEL_VERSION)


def predict(text):

    tokenizer = get_tokenizer()
    data = make_inputs(tokenizer, text)
    print(data)
    req_data = [{'in_tensor_name': 'inputs', 'in_tensor_dtype': 'DT_FLOAT', 'data': data}]

    prediction = client.predict(req_data, request_timeout=10)

    return prediction


text = 'Trump is bad'
print(predict(text))

From data.py I import two utility functions that use Keras:

# data.py
import numpy as np
import redis

from tensorflow.python.keras.preprocessing.text import Tokenizer
from tensorflow.python.keras.preprocessing.sequence import pad_sequences

from Config import Config

def get_tokenizer():
    # TODO: a better approach would be to subclass Tokenizer
    conn = redis.Redis(Config.REDIS_HOST, port=Config.REDIS_PORT)

    tokenizer = Tokenizer(num_words=Config.N_WORDS + 1, oov_token="<OOV>")

    # NOTE: redis-py returns bytes keys/values by default, so this mapping
    # may need decoding before the tokenizer can use it
    tokenizer.word_index = conn.hgetall('word_index')

    return tokenizer

def make_inputs(tokenizer, data):
    sequences = tokenizer.texts_to_sequences(data)
    sequences_pad = pad_sequences(sequences, maxlen=Config.MAX_LEN)
    inputs = np.array(sequences_pad).reshape([-1, Config.MAX_LEN])
    return inputs
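A side observation (not raised in the original thread): redis-py's `hgetall` returns bytes keys and values by default, while `Tokenizer.word_index` expects a `str -> int` mapping, so the hash likely needs decoding. A minimal sketch, using a plain dict to stand in for the Redis reply:

```python
# Stand-in for conn.hgetall('word_index'): redis-py returns bytes by default
raw = {b'good': b'1', b'bad': b'2', b'trump': b'3'}

# Decode keys to str and values to int so Tokenizer.word_index can use them
word_index = {k.decode('utf-8'): int(v) for k, v in raw.items()}

print(word_index['good'])  # → 1
```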

I get this error:

TypeError: Couldn't build proto file into descriptor pool!
Invalid proto descriptor for file "tensorflow/core/framework/tensor_shape.proto":
  tensorflow.TensorShapeProto.dim: "tensorflow.TensorShapeProto.dim" is already defined in file "tensor_shape.proto".
  tensorflow.TensorShapeProto.unknown_rank: "tensorflow.TensorShapeProto.unknown_rank" is already defined in file "tensor_shape.proto".
  tensorflow.TensorShapeProto.Dim.size: "tensorflow.TensorShapeProto.Dim.size" is already defined in file "tensor_shape.proto".
  tensorflow.TensorShapeProto.Dim.name: "tensorflow.TensorShapeProto.Dim.name" is already defined in file "tensor_shape.proto".
  tensorflow.TensorShapeProto.Dim: "tensorflow.TensorShapeProto.Dim" is already defined in file "tensor_shape.proto".
  tensorflow.TensorShapeProto: "tensorflow.TensorShapeProto" is already defined in file "tensor_shape.proto".
  tensorflow.TensorShapeProto.dim: "tensorflow.TensorShapeProto.Dim" seems to be defined in "tensor_shape.proto", which is not imported by "tensorflow/core/framework/tensor_shape.proto".  To use it here, please add the necessary import.

I am pretty sure this is because I import a file that imports Keras from TF. Any idea how to fix it?

Thank you for your time,

Cheers,

Francesco Saverio Zuppichini

@stianlp

stianlp commented May 18, 2018

Hi, and sorry for the super late reply!

TLDR; see the bottom of this comment.

At first I thought this had to do with naming (which it sort of does) and that renaming packages in the client would work, but that would then cause problems when sending requests to the server, because client and server must use the same message names for gRPC to work.

I thought I could "fool" Keras into believing that we are using the same .proto files that are located in tensorflow/core/framework/ and make it work. But it looks like packages and naming aren't the only issue; it also has to do with loading the same files multiple times (exactly what the error message says).

So, I found this:
protocolbuffers/protobuf#3002 (comment)

export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION='python'

It fixes the issue for me, but I am not sure why, or whether it's a good long-term solution.
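For reference, the same workaround can be applied from Python, as long as it happens before any protobuf-dependent import (a sketch, not tested against this exact setup):

```python
import os

# Must be set BEFORE importing tensorflow or predict_client, since protobuf
# reads this variable at import time to pick its implementation
os.environ['PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION'] = 'python'

# from predict_client.prod_client import ProdClient
# import tensorflow as tf
```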

EDIT: I've only tried importing these two lines, and sending a request worked:

from tensorflow.python.keras.preprocessing.text import Tokenizer
from tensorflow.python.keras.preprocessing.sequence import pad_sequences

But I haven't actually tried calling the tokenizer functions yet.
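Another way to sidestep the conflict entirely (a hypothetical sketch, not from the thread) is to replicate the tiny preprocessing step in plain NumPy, so the client never imports TensorFlow at all. `pad_sequences` defaults to 'pre' padding and 'pre' truncating with zeros, which is easy to reproduce:

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Minimal stand-in for Keras pad_sequences (default 'pre' padding and
    'pre' truncating), so the client need not import TensorFlow."""
    out = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for i, seq in enumerate(sequences):
        trunc = seq[-maxlen:]                 # 'pre' truncating keeps the tail
        out[i, maxlen - len(trunc):] = trunc  # 'pre' padding: left-pad with zeros
    return out

print(pad_pre([[1, 2, 3], [4, 5, 6, 7, 8]], maxlen=4))
# → [[0 1 2 3]
#    [5 6 7 8]]
```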
