Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Issue 4175] [pulsar-function-go] Add Go Function heartbeat (and gRPC service) for production usage #6031

Merged
merged 13 commits into from
Feb 4, 2020

Conversation

devinbost
Copy link
Contributor

@devinbost devinbost commented Jan 11, 2020

Partial fix required for #4175.
Addresses issue documented here: grpc/grpc-go#3310
Part of ongoing work to add the gRPC service for heartbeat functionality and related support for running Go functions.

Motivation

Progress was blocked by the issue documented above.

Modifications

  1. Updated ./generate.sh script in pulsar-function-go module to use the grpc plugin.
  2. Rebuilt Go gRPC files using the grpc plugin. This updated the proto version and was mostly an additive change that included adding the new registration methods required for creating the Go gRPC service.

Verifying this change

  • Unable to test changes because Go tests were already failing. Currently investigating.
  • More testing will be done in subsequent commits because the Go gRPC functionality is not yet complete. (Should be low impact.)

@devinbost
Copy link
Contributor Author

Let's not merge this until I can get the Go tests to pass. I just wanted to get this upstream to share the fix with anyone else who might be working on this feature.

@devinbost
Copy link
Contributor Author

FYI, I ran the Go tests from master (without my changes), and they failed with the same results as the tests with the changes from this PR.

@sijie sijie requested a review from wolfstudy January 11, 2020 03:04
@wolfstudy
Copy link
Member

FYI, I ran the Go tests from master (without my changes), and they failed with the same results as the tests with the changes from this PR.

Thanks @devinbost Can you provide the error information? Or how can I reproduce this problem

@devinbost
Copy link
Contributor Author

devinbost commented Jan 11, 2020

@wolfstudy Thanks for taking a look.
If you clone the latest from my go-features branch on my devinbost/pulsar fork (to /Users/[yourUserName]/github.com/apache/pulsar/), then:
cd /Users/[yourUserName]/github.com/apache/pulsar/pulsar-function-go/pf
go test

You should get this result:

OCPC-LM31977:pf dbost$ go test

`# github.com/apache/pulsar/pulsar-function-go/pf [github.com/apache/pulsar/pulsar-function-go/pf.test]
./function.go:167:44: undefined: port
FAIL github.com/apache/pulsar/pulsar-function-go/pf [build failed]
OCPC-LM31977:pf dbost$ go test
--- FAIL: TestContext (0.00s)
context_test.go:38:
Error Trace: context_test.go:38
Error: Not equal:
expected: []string{"topic-1", "topic-2"}
actual : []string(nil)

                            Diff:
                            --- Expected
                            +++ Actual
                            @@ -1,5 +1,2 @@
                            -([]string) (len=2) {
                            - (string) (len=7) "topic-1",
                            - (string) (len=7) "topic-2"
                            -}
                            +([]string) <nil>
                             
            Test:           TestContext
context_test.go:42: 
            Error Trace:    context_test.go:42
            Error:          Not equal: 
                            expected: "topic-3"
                            actual  : "persistent://public/default/topic-02"
                            
                            Diff:
                            --- Expected
                            +++ Actual
                            @@ -1 +1 @@
                            -topic-3
                            +persistent://public/default/topic-02
            Test:           TestContext

2020/01/11 11:16:50.741 log.go:76: [error] process function error:[function is nil]

2020/01/11 11:16:50.741 log.go:76: [error] process function error:[function kind struct is not func]

2020/01/11 11:16:50.741 log.go:76: [error] process function error:[functions may not take more than two arguments, but function takes 3]

2020/01/11 11:16:50.741 log.go:76: [error] process function error:[function takes two arguments, but the first is not Context. got string]

2020/01/11 11:16:50.741 log.go:76: [error] process function error:[function may not return more than two values]

2020/01/11 11:16:50.741 log.go:76: [error] process function error:[function returns two values, but the second does not implement error]

2020/01/11 11:16:50.742 log.go:76: [error] process function error:[function returns a single value, but it does not implement error]

2020/01/11 11:16:50.744 log.go:76: [error] process function error:[bad stuff]

2020/01/11 11:16:50.744 log.go:76: [error] process function error:[bad stuff]

2020/01/11 11:16:50.744 log.go:76: [error] process function error:[bad stuff]

2020/01/11 11:16:50.744 log.go:76: [error] process function error:[bad stuff]

FAIL
exit status 1
FAIL github.com/apache/pulsar/pulsar-function-go/pf 0.123s

@sijie
Copy link
Member

sijie commented Jan 13, 2020

ping @wolfstudy ?

@devinbost
Copy link
Contributor Author

devinbost commented Jan 13, 2020

I ran each of the Go tests individually. Here are the results:

util_test.go passed.
instanceConf_test.go passed.
In function_test.go:
    TestInvalidFunctions passed.
    TestInvokes passed.
In context_test.go:
    TestContext failed. 

So, most of them passed. It’s only context_test.go that failed.

Regarding the TestContext failure, here's what I got from my breakpoint:

Screen Shot 2020-01-13 at 8 39 50 AM

Note that the inputTopics value at the bottom is nil.

@devinbost
Copy link
Contributor Author

It turns out that the issue in the context_test.go test was in FunctionContext.GetInputTopics().
The instance method gets its valued populated by goInstance.setupConsumer(), which must not have been getting run in the test. I changed the getter on the functionContext to get the inputTopic values directly from functionContext.instanceConf.funcDetails.GetSource().InputSpecs.
We may want to remove dependencies on the functionContext.inputTopics field (and remove the field altogether) to prevent confusion later on.
Also, the conf.yaml values must have been changed, so I updated the test to match the values that were being provided.
It's passing locally for me.

I also added a new test file instanceControlServicer_test.go that tests that a gRPC server can be created and successfully connected to.

We probably still need to test the entire thing end-to-end to ensure that concurrency issues weren't introduced anywhere and that I'm starting the servicer in the correct location.

@wolfstudy
Copy link
Member

It turns out that the issue in the context_test.go test was in FunctionContext.GetInputTopics().
The instance method gets its valued populated by goInstance.setupConsumer(), which must not have been getting run in the test. I changed the getter on the functionContext to get the inputTopic values directly from functionContext.instanceConf.funcDetails.GetSource().InputSpecs.
We may want to remove dependencies on the functionContext.inputTopics field (and remove the field altogether) to prevent confusion later on.
Also, the conf.yaml values must have been changed, so I updated the test to match the values that were being provided.
It's passing locally for me.

I also added a new test file instanceControlServicer_test.go that tests that a gRPC server can be created and successfully connected to.

We probably still need to test the entire thing end-to-end to ensure that concurrency issues weren't introduced anywhere and that I'm starting the servicer in the correct location.

Thanks @devinbost agree with you, maybe we can remove inputTopics filed of FunctionContext . Can you help to fix the context_test.go file?

@devinbost
Copy link
Contributor Author

Thanks @devinbost agree with you, maybe we can remove inputTopics filed of FunctionContext . Can you help to fix the context_test.go file?

Done.

@devinbost devinbost changed the title [Issue 4175] [pulsar-function-go] (ongoing) Fixed gRPC file to contain service registration method [Issue 4175] [pulsar-function-go] (ongoing) Make Go-functions production ready Jan 14, 2020
@devinbost devinbost changed the title [Issue 4175] [pulsar-function-go] (ongoing) Make Go-functions production ready [Issue 4175] [pulsar-function-go] (ongoing) Add Go Function heartbeat and statistics for production usage Jan 14, 2020
@devinbost
Copy link
Contributor Author

Instrumenting Prometheus might be more work than anticipated.

I ran into a blocking issue due to missing features in the Go client of Prometheus and got some pushback from one of their maintainers when I asked about getting those features.
I'm discussing the issue with them, but we will likely need to work around the issue since they don't seem very eager to add the desired functionality.
For reference, see the discussion here: https://groups.google.com/d/msgid/prometheus-users/3480a316-748c-4351-8a15-1ac88971fd24%40googlegroups.com?utm_medium=email&utm_source=footer

@devinbost
Copy link
Contributor Author

So, we probably need to have the Prometheus wiring in a different PR due to its complexity and the larger changes that will be required.

@devinbost devinbost changed the title [Issue 4175] [pulsar-function-go] (ongoing) Add Go Function heartbeat and statistics for production usage [Issue 4175] [pulsar-function-go] (ongoing) Add Go Function heartbeat (and gRPC service) for production usage Jan 14, 2020
@devinbost devinbost changed the title [Issue 4175] [pulsar-function-go] (ongoing) Add Go Function heartbeat (and gRPC service) for production usage [Issue 4175] [pulsar-function-go] Add Go Function heartbeat (and gRPC service) for production usage Jan 15, 2020
Copy link
Member

@wolfstudy wolfstudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes LGTM, just little comments, please fix them.

pulsar-function-go/pb/doc.go Show resolved Hide resolved
pulsar-function-go/pf/context.go Outdated Show resolved Hide resolved
pulsar-function-go/pf/context.go Outdated Show resolved Hide resolved
"math"
"time"

"github.com/apache/pulsar/pulsar-client-go/pulsar"
log "github.com/apache/pulsar/pulsar-function-go/logutil"
"github.com/apache/pulsar/pulsar-function-go/pb"
pb "github.com/apache/pulsar/pulsar-function-go/pb"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pb "github.com/apache/pulsar/pulsar-function-go/pb"
"github.com/apache/pulsar/pulsar-function-go/pb"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I make this change, I can't get it to build. I get unresolved variables, even if I remove pb from the references in the code.

pulsar-function-go/pf/instance.go Outdated Show resolved Hide resolved
@@ -24,7 +24,7 @@ import (
"time"

"github.com/apache/pulsar/pulsar-function-go/conf"
"github.com/apache/pulsar/pulsar-function-go/pb"
pb "github.com/apache/pulsar/pulsar-function-go/pb"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pb "github.com/apache/pulsar/pulsar-function-go/pb"
"github.com/apache/pulsar/pulsar-function-go/pb"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, when I make this change, I can't get it to build.

"net"
"google.golang.org/grpc"
log "github.com/apache/pulsar/pulsar-function-go/logutil"
pb "github.com/apache/pulsar/pulsar-function-go/pb"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pb "github.com/apache/pulsar/pulsar-function-go/pb"
"github.com/apache/pulsar/pulsar-function-go/pb"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, when I make this change, I can't get it to build.

pulsar-function-go/pf/instance_test.go Outdated Show resolved Hide resolved
)


func testProcessSpawnerHealthCheckTimer(tkr *time.Ticker, lastHealthCheckTs int64, expectedHealthCheckInterval int32, counter *int ){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func testProcessSpawnerHealthCheckTimer(tkr *time.Ticker, lastHealthCheckTs int64, expectedHealthCheckInterval int32, counter *int ){
func testProcessSpawnerHealthCheckTimer(tkr *time.Ticker, lastHealthCheckTs int64, expectedHealthCheckInterval int32, counter int ){

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change breaks the test because the counter variable doesn't propagate as needed for the assertion.

}
}

func testStartScheduler(counter *int){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func testStartScheduler(counter *int){
func testStartScheduler(counter int){

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this change breaks the test because the counter variable doesn't propagate as needed for the assertion.

@wolfstudy
Copy link
Member

@devinbost I'm adding Action CI for Go Functions, make ensure that test cases can pass and code style is good

@devinbost
Copy link
Contributor Author

I made the changes that I could. Regarding everything that wasn't modified, please see my replies to the comments.

Copy link
Member

@wolfstudy wolfstudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1

@wolfstudy
Copy link
Member

retest this please

@sijie
Copy link
Member

sijie commented Jan 17, 2020

@devinbost can you rebase this pull request?

…omission of methods for gRPC server registration in generated gRPC files for Go. (apache#4175)

Generated updated gRPC files that contain service registration methods for creating gRPC service in Go. Also, upgraded proto version to 3. (apache#4175)

Fixed build errors by prefixing pulsar-function-go/pb with pb alias. (apache#4175).

Added instanceControlServicer.go as the servicer responsible for serving the gRPC service for the Go Function instances (apache#4175). Rough draft right now.

Added changes to show intent behind passing port value to Start in function.go. Also, added some code to support healthcheck and added methods to support instanceConrolServicer. Just needed to commit changes to allow reproducible test errors. (apache#4175).

Updated function.go Start method to make it more clear where we need to provide a port value (apache#4175).

Added port and expectedHealthCheckInterval to use of function context. Updated all references. (apache#4175)

Added Apache license to gRPC-generated files in attempt to get license check test to pass (apache#4175).

Created instanceControlServicer_test.go to test gRPC server and validate that HealthCheck method returns true as expected (apache#4175).

Fixed bug in FunctionContext (and context_test.go) where the inputTopics field was being referenced when it wasn't getting populated. Updated GetInputTopics method to get input topics from the source location (apache#4175).

Fixed bug in FunctionContext (and context_test.go) where the inputTopics field was being referenced when it wasn't getting populated. Updated GetInputTopics method to get input topics from the source location. (Should have been part of previous commit.) Also, added expectedHealthCheckInterval to conf.yaml for testing. (apache#4175).

Fixed license formatting by running mvn license:format (apache#4175).

Added logic and tests to allow healthCheck to kill instances that aren't receiving their regular health checks. Still needs an end-to-end test involving FunctionManager to check for possible issues that could kill instances incorrectly (apache#4175).

Removed inputTopics field from FunctionContext (apache#4175).

Adding the progress I've made so far on migrating the Prometheus code to Go... currently blocked due to missing methods from the Go client. Waiting for information from the Prometheus maintainers to find a workaround. (apache#4175).

Fixed license check. (apache#4175)

Reverting the last two commits since they should go into a separate PR. (apache#4174).

Re-added test file that was accidentially deleted (apache#4175).

Added a few comments to make review easier (apache#4175).

Made minor (non-functional) changes as per PR review (apache#4175).

Fixed print statements (apache#4175).

Re-added comment after getting maven license formatting correct (apache#4175).
@devinbost
Copy link
Contributor Author

@sijie It's rebased and ready to go.

@wolfstudy
Copy link
Member

run java8 tests
run integration tests
run cpp tests

@wolfstudy
Copy link
Member

run integration tests
run java8 tests

@wolfstudy
Copy link
Member

run java8 tests

@wolfstudy
Copy link
Member

run integration tests

@wolfstudy
Copy link
Member

run cpp tests

@sijie
Copy link
Member

sijie commented Jan 22, 2020

run java8 tests

Devin Bost added 4 commits January 24, 2020 22:25
@devinbost
Copy link
Contributor Author

devinbost commented Jan 27, 2020

@sijie Do we need all of the Github Action tests to pass before this can be merged?
All of the tests pass on my local machine.
@wolfstudy and I have re-triggered the tests many times, and they're still not all passing due to the intermittent issues.

@devinbost
Copy link
Contributor Author

devinbost commented Jan 29, 2020

@merlimat @jerrypeng How do we get this PR merged? We've tried re-running the tests at least 12 times, and the intermittent test failures continue to prevent the tests from all passing.
(If you look back through the history of the test runs, every test has passed at least once.)

@sijie
Copy link
Member

sijie commented Jan 31, 2020

@devinbost the actions are marked with required. we can't merge it right now. working on how to get around the github action issues.

@devinbost
Copy link
Contributor Author

@sijie Thanks for the help. I really appreciate it.

@tuteng tuteng merged commit e1c8d14 into apache:master Feb 4, 2020
huangdx0726 pushed a commit to huangdx0726/pulsar that referenced this pull request Aug 24, 2020
… service) for production usage (apache#6031)

Partial fix required for apache#4175. 
Addresses issue documented here: grpc/grpc-go#3310
Part of ongoing work to add the gRPC service for heartbeat functionality and related support for running Go functions. 

### Motivation
Progress was blocked by the issue documented above. 

### Modifications

1. Updated ./generate.sh script in pulsar-function-go module to use the grpc plugin. 
2. Rebuilt Go gRPC files using the grpc plugin. This updated the proto version and was mostly an additive change that included adding the new registration methods required for creating the Go gRPC service. 

### Verifying this change

- Unable to test changes because Go tests were already failing. Currently investigating. 
- More testing will be done in subsequent commits because the Go gRPC functionality is not yet complete. (Should be low impact.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants