Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

how to train model with tfgo? #55

Open
ZonghaoWang opened this issue Jul 14, 2021 · 8 comments
Open

how to train model with tfgo? #55

ZonghaoWang opened this issue Jul 14, 2021 · 8 comments

Comments

@ZonghaoWang
Copy link

No description provided.

@galeone
Copy link
Owner

galeone commented Jul 14, 2021

If you're interested in using a model for inference, it's better to train the model in Pyton, export it as a SavedModel and then use it from tfgo.

If instead you're interested in training a model with tfgo (but currently we don't support model saving, hence you have to keep it in memory after training and you can't save it on disk), you must export the training graph + model in Python as a SavedModel and use it from tfgo. I wrote an article that explains this process: https://pgaleone.eu/tensorflow/go/2020/11/27/deploy-train-tesorflow-models-in-go-human-activity-recognition/

@CodingBeard
Copy link

@galeone

I read through your guide and figured out how to save a model after training in golang.

Note that I am using python TF 2.4.1 and github.com/tensorflow/tensorflow v2.0.3+incompatible. It may work on other TF versions and also with tfgo.

When I saved a model tensorflow python it had an additional two StatefulPartitionCall outputs and a saver_filename input. These are not visible using saved_model_cli but you can see them by looping through .Operations() on a loaded model in golang.

The first of the two extra outputs, when given a string value E.G. tf.NewTensor("save_dir/variables/variables") for saver_filename will save the variables (I.E. variables.index, variables.data-00000-of-00001) in save_dir/variables.

You can just then copy the original saved_model.pb into save_dir, and when you load from that dir it will have the golang trained weights.

Thanks for your article, it pointed me in the right direction.

@galeone
Copy link
Owner

galeone commented Nov 17, 2021

@CodingBeard thanks for reading the article and for the feedback!

I'm not aware of that method of getting the operation name, but your experience can help other readers. Would you be so kind to open a merge request to the blog repo (https://github.com/galeone/galeone.github.io/blob/master/_posts/2020-11-27-deploy-train-tesorflow-models-in-go-human-activity-recognition.md) adding some lines on how to find the node names if you can't find them using saved_model_cli?

It will be a great addition!

@CodingBeard
Copy link

I'm not using tfgo in my project, but with an older version of tensorflow's golang package it looks like this:

model, e := tf.LoadSavedModel("model_dir", []string{"serve"}, nil)
if e != nil {
    panic(e.Error())
}

for _, operation := range model.Graph.Operations() {
    fmt.Println(operation.Name())
}

It seems the save node is always the first hidden StatefulPartitionCall after the ones visible in saved_model_cli. So for example in your article where the learn signature is StatefulPartitionCall and the predict signature is StatefulPartitionCall_1 then the one to call for saving the variables will be StatefulPartitionCall_2

The full example using python TF 2.4.1 and golang tensorflow/tensorflow/go v2.0.3 is as follows:

Python:

class GolangModel(tf.Module):
    def __init__(self):
        super().__init__()

        bool_input = k.layers.Input(
            shape=(3,),
            name='bool_input',
            dtype='float32',
            batch_size=10
        )

        output = k.layers.Dense(
            1
        )(bool_input)

        self.model = Model(bool_input, output)
        self._global_step = tf.Variable(0, dtype=tf.int32, trainable=False)
        self._optimizer = k.optimizers.Adam()
        self._loss = k.losses.binary_crossentropy

    @tf.function(
        input_signature=[
            tf.TensorSpec(shape=(None, 3), dtype=tf.float32),
            tf.TensorSpec(shape=(None, 1), dtype=tf.float32),
        ]
    )
    def learn(self, data, labels):
        self._global_step.assign_add(1)
        with tf.GradientTape() as tape:
            loss = self._loss(labels, self.model(data))

        gradient = tape.gradient(loss, self.model.trainable_variables)
        self._optimizer.apply_gradients(zip(gradient, self.model.trainable_variables))
        return {"loss": loss}

    @tf.function(input_signature=[tf.TensorSpec(shape=(None, 3), dtype=tf.float32)])
    def predict(self, data):
        prediction = self.model(data)
        return {"prediction": prediction}

gm = GolangModel()

gm.learn(
    tf.zeros([10, 3], dtype=tf.float32),
    tf.zeros([10, 1], dtype=tf.float32),
)
gm.predict(tf.zeros((10, 3), dtype=tf.float32))

tf.saved_model.save(
    gm,
    "/data/models/gm",
    signatures={
        "learn": gm.learn,
        "predict": gm.predict,
    },
)

golang:

        gm, e := tf.LoadSavedModel("/data/models/gm", []string{"serve"}, nil)
	if e != nil {
		errorHandler.Error(e)
		return nil
	}

	boolInput, e := tf.NewTensor([][]float32{{0.5, 0.5, 0.5}, {0, 0, 0}})

	result, e := gm.Session.Run(
		map[tf.Output]*tf.Tensor{
			gm.Graph.Operation("predict_data").Output(0): boolInput,
		},
		[]tf.Output{
			gm.Graph.Operation("StatefulPartitionedCall_1").Output(0),
		},
		nil,
	)
	if e != nil {
		errorHandler.Error(e)
		return e
	}

	floatResults, ok := result[0].Value().([][]float32)
	if !ok {
		fmt.Println("No float results")
		return nil
	}

	fmt.Println(floatResults)

	trainData, e := tf.NewTensor([][]float32{
		{0.5, 0.5, 0.5},
		{0.5, 0.5, 0.5},
		{0.5, 0.5, 0.5},
		{0.5, 0.5, 0.5},
		{0.5, 0.5, 0.5},
		{0, 0, 0},
		{0, 0, 0},
		{0, 0, 0},
		{0, 0, 0},
		{0, 0, 0},
	})
	trainLabels, e := tf.NewTensor([][]float32{
		{1},
		{1},
		{1},
		{1},
		{1},
		{0},
		{0},
		{0},
		{0},
		{0},
	})

	for i := 0; i < 1000; i++ {
		_, e := gm.Session.Run(
			map[tf.Output]*tf.Tensor{
				gm.Graph.Operation("learn_data").Output(0):   trainData,
				gm.Graph.Operation("learn_labels").Output(0): trainLabels,
			},
			[]tf.Output{
				gm.Graph.Operation("StatefulPartitionedCall").Output(0),
			},
			nil,
		)
		if e != nil {
			errorHandler.Error(e)
			return e
		}
	}

	boolTest, e := tf.NewTensor([][]float32{{0.5, 0.5, 0.5}, {0, 0, 0}})

	test, e := gm.Session.Run(
		map[tf.Output]*tf.Tensor{
			gm.Graph.Operation("predict_data").Output(0): boolTest,
		},
		[]tf.Output{
			gm.Graph.Operation("StatefulPartitionedCall_1").Output(0),
		},
		nil,
	)
	if e != nil {
		errorHandler.Error(e)
		return e
	}

	testResults, ok := test[0].Value().([][]float32)
	if !ok {
		fmt.Println("No post training float results")
		return nil
	}

	fmt.Println(testResults)

	os.RemoveAll("gm-trained")
	os.MkdirAll("gm-trained/variables", os.ModePerm)
	savedModel, e := ioutil.ReadFile("/data/models/gm/saved_model.pb")
	if e != nil {
		errorHandler.Error(e)
		return e
	}

	e = ioutil.WriteFile("gm-trained/saved_model.pb", savedModel, os.ModePerm)
	if e != nil {
		errorHandler.Error(e)
		return e
	}

	filenameInput, e := tf.NewTensor("gm-trained/variables/variables")
	if e != nil {
		errorHandler.Error(e)
		return e
	}

	_, e = gm.Session.Run(
		map[tf.Output]*tf.Tensor{
			gm.Graph.Operation("saver_filename").Output(0): filenameInput,
		},
		[]tf.Output{
			gm.Graph.Operation("StatefulPartitionedCall_2").Output(0),
		},
		nil,
	)
	if e != nil {
		errorHandler.Error(e)
		return e
	}

	gmTrained, e := tf.LoadSavedModel("gm-trained", []string{"serve"}, nil)
	if e != nil {
		errorHandler.Error(e)
		return nil
	}

	boolTest, e = tf.NewTensor([][]float32{{0.5, 0.5, 0.5}, {0, 0, 0}})

	test, e = gmTrained.Session.Run(
		map[tf.Output]*tf.Tensor{
			gmTrained.Graph.Operation("predict_data").Output(0): boolTest,
		},
		[]tf.Output{
			gmTrained.Graph.Operation("StatefulPartitionedCall_1").Output(0),
		},
		nil,
	)
	if e != nil {
		errorHandler.Error(e)
		return e
	}

	testResults, ok = test[0].Value().([][]float32)
	if !ok {
		fmt.Println("No post training float results")
		return nil
	}

	fmt.Println(testResults)

Hopefully that makes sense and you can use it for your article, and maybe even add a method to save the model in this repo.

@galeone
Copy link
Owner

galeone commented Nov 17, 2021

woah, thanks for sharing! I guess I'm going to extract something from your code for tfgo 😄

@CodingBeard
Copy link

Hey, I've continued digging and managed to get the human readable graph def of that saved model. Under the hood the save method seems to be using op.SaveV2 to save the variables and you can see the function definition of StatefulPartitionedCall_2 by searching the graph def for: __inference__traced_save_719

Human readable graph: https://gist.github.com/CodingBeard/769a42d06a9b9d518e69f6c1ae41e45b

I got the graph in that format by making use of tensorflow/c/c_api_experimental.h:TF_GraphDebugString by adding the following function (I'm aware of the lack of memory management) to github.com/galone/tensorflow/go/graph.go

func (g *Graph) GetDebugString() string {
	tmp := C.ulong(1)
	graphDebugChar := C.TF_GraphDebugString(g.c, (*C.ulong)(&tmp))
	goString := C.GoString(graphDebugChar)

	return goString
}

You can figure out what's going on under the hood of a StatefulPartitionedCall (which is a just the graph of a tf.function) using that debug string method.

@MIchaelFU0403
Copy link

How can i save a trained model using golang? With the help of tf-go and Mr.CodingBeard 's code, i can train simple machine leanring model in an online scenario and test them by evaluating related performance metric. Could you please give me some insights to save trained model in save_model or checkpoint format.

@MIchaelFU0403
Copy link

i tried to use privided "saver_filename" and "StatefulPartitionedCall_2" but did not work in other self-build models that are similar to Mr.CodingBeard 's gm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants